Architecture Review

yenknip · ‎04-14-2010

Summary of Queries:
- Would my design still be suitable if the need arose to scale up even further?
- How can I avoid potential race conditions in data logging?

I work in Catalyst Research, where I support about 100 scientists, which includes building automated test rigs. I have a joint honours degree and 5 years' working experience in both electronics and software engineering, of which 3 years is LabVIEW development.

I can bounce around electronics ideas on site, but as I am the only software person, I was hoping some of you could take 10 minutes and review my general software architecture.

The rigs will usually have several pieces of hardware attached, usually in the form of:
- NI USB DAQ Device
- Eurotherm Controllers (Mini8 controller, 2000 series, 3000 series) on RS232, 485 or TCP.
- The hardware protocol is irrelevant, as I communicate with them via OPC
- Other RS232/RS485 instruments, communicating via NI VISA.
- Other USB instruments, communicating via directx, or device drivers, et al.

A typical run will last upwards of 12 hours, so the rigs need to be safe to leave unattended. Mostly this is in the form of hardware interlocks, as I don't trust computers not to lock up. Furthermore, this means actual error handling in the software rather than popping an error message up on screen 😛

I like my software designs to be modular and abstracted (although never to excess). Each hardware device is managed in its own class data structure; values are read, buffered, and saved to a shift register. Values are continually compared against safe operating limits. The only time the controller calls an external or higher-level VI is to fetch a config data structure during initialisation. Actions can be sent to the device controller through a queue. If a value goes outside of safe operating conditions, the particular condition is sent to an alarm handler via a queue.
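As a rough textual stand-in for the device controller described above (LabVIEW itself is graphical), the sketch below uses Python. All names (`DeviceController`, `step`, the `"set_limits"` and `"enable"` actions) are hypothetical and not taken from the original code; `last_value` plays the role of the shift register.

```python
import queue
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeviceController:
    """One controller per device: reads, buffers, and checks limits."""
    name: str
    low: float
    high: float
    alarms: queue.Queue                      # shared queue to the alarm handler
    commands: queue.Queue = field(default_factory=queue.Queue)
    last_value: Optional[float] = None       # stands in for the shift register
    enabled: bool = True

    def step(self, read_hardware):
        """Run one iteration: apply queued actions, read, check limits."""
        while True:
            try:
                action, arg = self.commands.get_nowait()
            except queue.Empty:
                break
            if action == "set_limits":
                self.low, self.high = arg
            elif action == "enable":
                self.enabled = arg
        if not self.enabled:
            return None
        value = read_hardware()
        self.last_value = value
        if not (self.low <= value <= self.high):
            # Report the condition; the alarm handler decides what to do.
            self.alarms.put((self.name, value))
        return value
```

Other parts of the program only ever touch `commands` and `alarms`, which is what keeps the hardware details inside the controller.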

Thankfully, an experiment can often be organised into a sequence of user configurable steps. This takes the form of an array of clusters. The cluster instance stores setpoints for temperatures, dwell times, device states and so on. When a sequence step is advanced, the setpoints are fed by queues to each device controller. This all works fine and dandy.
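A minimal sketch of the sequence idea, written in Python as a stand-in for the array of clusters. The `Step` fields and the `("setpoint", value)` message shape are assumptions made for illustration, not the original data layout.

```python
import queue
from dataclasses import dataclass


@dataclass
class Step:
    """One sequence step: the equivalent of one cluster in the array."""
    setpoints: dict      # device name -> setpoint value
    dwell_s: float       # how long to hold this step


def advance(step, device_queues):
    """Feed the step's setpoints to each device controller's queue."""
    for device, setpoint in step.setpoints.items():
        device_queues[device].put(("setpoint", setpoint))


sequence = [
    Step({"furnace": 150.0, "mfc_1": 20.0}, dwell_s=600.0),
    Step({"furnace": 300.0, "mfc_1": 35.0}, dwell_s=3600.0),
]
device_queues = {"furnace": queue.Queue(), "mfc_1": queue.Queue()}
advance(sequence[0], device_queues)
```

Because a step is plain data, saving the sequence to disk for crash recovery amounts to serialising the list.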

A level above that, test runs can be queued as well; a user will put a sample into the rig, log on to the system, create a test run sequence, and leave it to get on with it. I should also add that sequences and run conditions are saved to disk, so should the PC crash, things can get up and running again quickly.

With regard to the user interface: that is all it is. None of the VIs in the UI deal with the running of the rig. Changes made in the UI are written to the config file data structure, directly to the device controllers, the alarm handler, or some miscellaneous process controller. The readings displayed in the UI have a set update rate, at which data is queried from each device controller.

At the request of the scientists, all data that can be logged is logged, at a user-defined frequency. Once the log file reaches a set number of lines, it is closed and logging continues in a new file with an incremented filename. This is so the files can be imported into Excel, and to ease the strain on a computer post-processing the data. On each log iteration, data is queried from wherever readings are taken.
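The file roll-over described above might look roughly like this in Python. The class name, the `stem_0001.csv` naming pattern, and the `max_lines` parameter are hypothetical; the original post does not say how files are named.

```python
import os


class RotatingLog:
    """Closes the current file after max_lines and starts a numbered new one."""

    def __init__(self, directory, stem, max_lines):
        self.directory = directory
        self.stem = stem
        self.max_lines = max_lines
        self.paths = []          # every file written so far, in order
        self._file = None
        self._lines = 0
        self._index = 0

    def write(self, line):
        if self._file is None or self._lines >= self.max_lines:
            self._roll()
        self._file.write(line + "\n")
        self._lines += 1

    def _roll(self):
        self.close()
        self._index += 1
        name = f"{self.stem}_{self._index:04d}.csv"
        path = os.path.join(self.directory, name)
        self._file = open(path, "w", newline="")
        self._lines = 0
        self.paths.append(path)

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
```

Keeping the line count in the logger rather than re-reading the file keeps each write cheap, which matters on a 12-hour run.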

I seem to have a stable, reliable and safe way of building the rigs, but I remain unsatisfied with the methods I have tried for presenting data to the user and logging data. Given that data is being read from a number of sources, I want a good way of keeping the data reasonably synchronised (reasonable being under 50 ms). Temperatures, pH levels, pressures, mass flow controllers, and motorised actuators don't fluctuate fast enough for sampling above a few hundred hertz to matter (although oversampling and smoothing occur inside the data controller / on the device).

My current method is to create a temporary notifier and add it to the front of the device queue. On the next device iteration, it is popped off the queue and the last buffered value is returned via the notifier to the calling function. I favour this over an action engine because, with no easy way to indicate when to take a reading, the device controller had to update the action engine every time it took a reading. I am still unhappy with my current method of querying data, as I cannot ensure that the device controller has updated its values by the time the data logging section queries them. This may even cause the same value to be logged twice (a race condition). The problem would persist with an action engine.
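To make the query mechanism concrete, here is a small Python sketch, assuming a `deque` as the device queue and a one-slot `queue.Queue` as the temporary notifier. All names are hypothetical. It also reproduces the problem described: if the device has not taken a new reading between two queries, both return the same buffered value.

```python
import queue
from collections import deque


class Device:
    def __init__(self, initial):
        self.inbox = deque()      # the device queue
        self.buffered = initial   # last buffered reading

    def iterate(self, read_hardware=None):
        """One device iteration: answer queries, then (maybe) read."""
        while self.inbox:
            action, payload = self.inbox.popleft()
            if action == "query":
                payload.put(self.buffered)   # reply via the one-shot notifier
        if read_hardware is not None:        # None models a stalled read
            self.buffered = read_hardware()


def request_reading(device):
    """Create a temporary notifier and push it to the FRONT of the queue."""
    reply = queue.Queue(maxsize=1)
    device.inbox.appendleft(("query", reply))
    return reply


device = Device(initial=21.5)
first = request_reading(device)
device.iterate()                  # no new hardware reading this time
second = request_reading(device)
device.iterate()
same_value_twice = first.get_nowait() == second.get_nowait()
```

Nothing in the reply tells the caller how old the value is, which is the gap the rest of the thread addresses.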

_____________________________
- Cheers, Ed

yenknip · ‎04-14-2010

To clarify that last paragraph with an extreme example;

Let's say the device controller is running slowly for whatever reason. The automatic antivirus scanner has kicked in, and the shift register is being updated at 2 Hz (every 500 ms).
If the data logging were querying the device data at 5 Hz (every 200 ms), it would log each value two or three times, so three readings in every five would be repeats.

I don't want to have to resort to hacking in some flags indicating that the values have been updated, as this would make the data logging hang for readings and not have the correct timestamp. I am aware that a computer is not a real time system, but timestamps should be regular and correct to a few ms.
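The extreme example can be checked with a few lines of arithmetic. The sketch below (Python, hypothetical function name) steps a logger at 200 ms against a device that updates every 500 ms and marks which log rows repeat the previous reading.

```python
def simulate(update_ms, log_ms, duration_ms):
    """Return (log_time, reading_time, is_repeat) for each log iteration."""
    rows = []
    last_seen = None
    for t in range(0, duration_ms, log_ms):
        # Time of the most recent device update at or before t.
        reading_time = (t // update_ms) * update_ms
        rows.append((t, reading_time, reading_time == last_seen))
        last_seen = reading_time
    return rows


rows = simulate(update_ms=500, log_ms=200, duration_ms=2000)
repeats = sum(1 for _, _, is_repeat in rows if is_repeat)
print(f"{repeats} of {len(rows)} logged rows repeat the previous reading")
```

Note that the log timestamps stay perfectly regular; only the age of the data varies, which suggests recording that age rather than blocking the logger.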

_____________________________
- Cheers, Ed

F._Schubert · ‎04-15-2010

Just some ideas:

* timestamp each reading

* Instead of logging the timestamp to the file, you can raise a flag 'Old data' or the like. I think the OPC protocol has some flag that indicates how new/recent any value is. Maybe you want to copy it from this standard.

* In such scenarios, I'd just have the readings broadcast the data via a notifier all the time instead of using the temporary callback notifier. The logger (or whatever) just goes through the list of notifiers and takes the most recent value.
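A sketch of how these three ideas could combine, in Python rather than LabVIEW. `LatestValue` stands in for a notifier that always holds the most recent reading; the names and the `max_age_s` threshold are assumptions for illustration, not part of any LabVIEW or OPC API.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    value: float
    timestamp: float     # when the device controller took the reading


class LatestValue:
    """Holds only the newest reading; readers never block the writer for long."""

    def __init__(self):
        self._lock = threading.Lock()
        self._reading = None

    def publish(self, value, timestamp):
        with self._lock:
            self._reading = Reading(value, timestamp)

    def latest(self):
        with self._lock:
            return self._reading


def log_row(channels, now, max_age_s):
    """Take the most recent value of every channel and flag old data."""
    row = {}
    for name, channel in channels.items():
        reading = channel.latest()
        stale = reading is None or (now - reading.timestamp) > max_age_s
        row[name] = (None if reading is None else reading.value, stale)
    return row
```

The logger runs on its own clock, so its timestamps stay regular, and a slow device shows up as a raised flag instead of a stalled log.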

Felix

yenknip · ‎04-15-2010

Cheers Felix, I reckon my best bet would be to broadcast.

I do think it's a bit of a shame that there is always lots of help for those learning a programming language, but it dwindles a bit once you need guidelines on how to design a large system 😞 I guess it is largely to do with needing to know the problem intimately before designing the system, so casual aid is a lot more difficult

_____________________________
- Cheers, Ed

Mark_SAI · ‎04-19-2010

Hi,

Following Felix's excellent comments, I thought about pointing you towards this page, if you haven't already seen it! It doesn't discuss anything specifically, but I noticed your comment about information on large system development and it's very true, so perhaps the link and the links inside it will help.

ST5 · ‎04-19-2010

I don't know your (LabVIEW) knowledge, nor what you already have.

One piece of advice I can give you, keeping in mind that:

-it's an experimental setup

-auto reconnect can be needed

-there are multiple I/O connection

-multi user, but one programmer

-...

Instead of initialising all connections only at the beginning of the program, it's better to make it possible anywhere, using an FSM (finite state machine) for every connection.

Otherwise, you may have to restart your program after every single connection breakdown.

How I do it: Enum-Typedef-Case, which is very well known.

My cases contain ReadIniFile - ConnectionInit - Wait - TryToConnect - Connected - ConnectionError - Closedown

Depending on the state of the connection, the user interface, the measurement made, errors etc., you can now easily/automatically jump from one state to another, reinitialising connections etc.

Good luck!
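The state names below come from the list above; the transitions between them are an assumption, since the post does not spell them out. This is a Python sketch of the Enum-Typedef-Case pattern, with the `case` structure written as a plain function.

```python
from enum import Enum, auto


class State(Enum):
    READ_INI_FILE = auto()
    CONNECTION_INIT = auto()
    WAIT = auto()
    TRY_TO_CONNECT = auto()
    CONNECTED = auto()
    CONNECTION_ERROR = auto()
    CLOSEDOWN = auto()


def next_state(state, link_ok=False, stop=False):
    """Pick the next state; assumed transitions, one call per loop iteration."""
    if stop:
        return State.CLOSEDOWN
    if state is State.READ_INI_FILE:
        return State.CONNECTION_INIT
    if state is State.CONNECTION_INIT:
        return State.TRY_TO_CONNECT
    if state in (State.TRY_TO_CONNECT, State.CONNECTED):
        # A failed attempt or a dropped link both lead to the error state.
        return State.CONNECTED if link_ok else State.CONNECTION_ERROR
    if state is State.CONNECTION_ERROR:
        return State.WAIT          # back off before retrying
    if state is State.WAIT:
        return State.TRY_TO_CONNECT
    return State.CLOSEDOWN
```

With one such machine per connection, a dropped link cycles through ConnectionError, Wait and TryToConnect on its own while the rest of the program keeps running.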

Ben · ‎04-19-2010

yenknip wrote:
Cheers Felix, I reckon my best bet would be to broadcast.

I do think it's a bit of a shame that there is always lots of help for those learning a programming language, but it dwindles a bit once you need guidelines on how to design a large system 😞 I guess it is largely to do with needing to know the problem intimately before designing the system, so casual aid is a lot more difficult

1) To get good feedback on architectures, an Architect is handy.

Did you take a look at the design docs I posted in this thread (see reply #22)?

2) Answering trivia questions about LV is something I can do on the way to or from my breaks. Architectures take some thought, since no two apps are the same and there are an infinite number of ways to implement them, not all of which are good.

3) Architectures are what I get paid to develop/implement so I am constantly watching myself to ensure I don't "give away the shop".

4) If you have a specific Q about the architecture please summarize the Q and I'll try to help out.

Ben

Retired Senior Automation Systems Architect with Data Science Automation LabVIEW Champion Knight of NI and Prepper LinkedIn Profile YouTube Channel

yenknip · ‎04-22-2010

Hi all, sorry for the delayed reply, I had a few days off 🙂

Mark, that's a great link to some sound software design theory

ST5, My current hardware controllers are based on a state machine as you described. The class holds any data regarding connection addresses and references. I would describe my structure as a queued-state-machine, because other parts of the software can queue actions to the controller to perform an action such as changing a COM port or enabling/disabling reading of an unused channel. Additional states I use include signal conditioning and limit checking.

Ben, I think I really need to find a spare week somewhere and do the LV advanced training course, but in the meantime, I think I can summarise:

In the system I currently implement, data needs to be constantly streaming into the system for safety checking. This data is also used in a number of other places, namely safe limit checking, logging, the PID controller, and display to the user. Following Felix's suggestion, I am broadcasting the data from a notifier, which is updated as fast as the data can be read, buffered and prepared, which takes only a few milliseconds. From the looks of it, this is a similar approach to the one outlined in your PowerPoint presentation: a liberal use of state machines and the synchronisation palette.

This is holding up for the time being but if I need to run an even more complex rig with a number of RS232/485/USB devices, I am unsure how stable and quick the system would be. The way I would fix this would be to put hardware read/write directly in the most greedy data user, the PID controller(s) and notify the other data sinks, but I would lose the abstraction of the hardware checking. Alternately, the whole hardware state machine could go in the PID loop, but then I might as well be sharing data around as above.

Basically, I just need some peers to share and explain my ideas to, and to steer me away from pit traps. Even just a thumbs up to my current architecture would be great.

_____________________________
- Cheers, Ed

Ben · ‎04-22-2010

Here is some stuff from another app where I was concerned about safety, since I had 4 PIDs that would be bad if out of control.

This is a rough interaction diagram to show how the various components interact. The important point in this diagram is that all Output Control (an AE) functions are invoked via the "Safety Limits" (another AE).

The dotted lines are for error and exception handling and in normal conditions can be ignored. The following is a cut and paste of my notes to myself that were transferred to the code.

1)Event Logger – Will log and display all posted events and will allow users to identify a list of background processes. The nature of the logged events will be dependent on the entities posting the events but will include at the minimum the time the application was started and, if the application is shut down in an orderly fashion, the time and date when it was shut down. NOTES: The Event Logger will be capable of monitoring the running state of the application. It will close itself when it has detected that the application has been stopped. All interaction with the Event Logger is one directional and will only be initiated in the event of an error or an abnormal condition being detected. The connections between the various entities of the system and the Event Logger are shown with dotted lines to emphasize the logging of events as not playing a part in the operation of the application.

2)DAQ – The DAQ sub-system will acquire and scale measurements for all configured channels. It will obtain a list of input channels from the “Configuration”. The DAQ sub-system will utilize the offset, scale, and unit information stored in the “Configuration” when converting physical measurements to engineering units. The DAQ sub-system will access the most recent channel configuration for every input operation to allow for changes to the configuration to be applied “on-the-fly”. As measurements are acquired they will be passed to the “Logging” and “Safety Limits” sub-systems via queues as well as being posted to the “Input Status”. The DAQ sub-system will accept shutdown commands from the GUI when appropriate. Additional logic will be implemented in the DAQ sub-system to support the monitoring of the amount of CO2 in the loop. It will track the total amount of mass in the loop. It will use the total (acquired from the “Configuration”) when it starts as the base-line value. It will also allow this value to be over-ridden when indicated to do so in response to the user.

3)Configuration – The Configuration object will provide the current configuration information to the DAQ sub-system and the configuration screen on the user interface (GUI). It will recall the previously used configuration from file when it starts. It will also accept changes to the configuration that are initiated by the user and will also load the specified configuration if selected by the user. The Configuration object will also offer the previous CO2 mass total that was saved when the system was previously shutdown for use by the DAQ sub-system. When indicated by commands from the GUI, the Configuration will update the Total Mass. This will typically happen at system shut-down.

4)Input Status – The Input Status sub-system will store the most recent values obtained by the DAQ sub-system. The various GUI sub-systems will use the “Input Status” when updating the various GUI indicators and charts.

5)Logging – The Logging sub-system will start logging when the application starts. It will monitor a queue fed by the DAQ sub-system for new measurements. It will accept commands from the GUI to allow new files to be opened at any time.

6)GUI – The GUI will display the readings stored in the “Input Status” entity on the various graphs and indicators present in its various tabs. The GUI will also interact with the “Configuration”, “DAQ”, and “Logging” to allow user control over the operation of those entities. It will also interact with the “Safety Limits” entity to allow the user to control the outputs of the system.

7)Safety Limits – The “Safety Limits” (SL) sub-system will continually monitor the state of critical measurements in the system to ensure safe operation. If an un-safe condition is detected, the appropriate action will be taken to place the test loop in a safe condition. The “Safety Limits” sub-system will monitor the current state of the system by monitoring a queue fed by the DAQ sub-system. The SL sub-system will monitor requests from the GUI to change output set-points and provided the system is in a safe state, will initiate the changes indicated. The SL will also accept updates of the safety set-points when indicated to do so by the GUI.

8)Output Control – Will provide an interface to all output devices and will coordinate all updates as required. The setting of the two water control valves will be coordinated such that the valve that should open will be updated before the valve that should close.
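Items 7 and 8 can be sketched together. The Python below is an illustration only, not Ben's code: the class and method names, the valve names `hot` and `cold`, and the limit values are all hypothetical. It shows output requests passing through the safety layer, and valve writes ordered so that the opening valve is updated before the closing one.

```python
class OutputControl:
    """Single interface to the output devices (valve positions in percent)."""

    def __init__(self, valves):
        self.valves = dict(valves)
        self.write_order = []        # record of writes, for inspection

    def set_valves(self, targets):
        def closing(item):
            name, target = item
            return target <= self.valves.get(name, 0.0)

        # False sorts before True, so opening valves are written first.
        for name, target in sorted(targets.items(), key=closing):
            self.valves[name] = target
            self.write_order.append(name)


class SafetyLimits:
    """All output changes go through here; unsafe states force safe outputs."""

    def __init__(self, outputs, limits, safe_targets):
        self.outputs = outputs
        self.limits = limits             # channel -> (low, high)
        self.safe_targets = safe_targets
        self.safe = True

    def check(self, measurements):
        """Called with each new set of measurements from the DAQ queue."""
        self.safe = all(
            low <= measurements[ch] <= high
            for ch, (low, high) in self.limits.items()
            if ch in measurements
        )
        if not self.safe:
            self.outputs.set_valves(self.safe_targets)
        return self.safe

    def request(self, targets):
        """GUI set-point request: applied only while the system is safe."""
        if not self.safe:
            return False
        self.outputs.set_valves(targets)
        return True
```

Because the GUI holds no reference to `OutputControl`, there is no code path that can move an output without the safety check having a say.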

have fun,

Ben

Message Edited by Ben on 04-22-2010 07:59 AM


F._Schubert · ‎04-22-2010

Ed,

I published the way I architect my apps in the Event nugget.

One thing I consider important is reuse. So I like to have things loosely coupled. This means use of strings instead of type def'ed enums. If a module isn't present (you can load modules using some kind of plug-in architecture or LVOOP, but I haven't done this yet on a large scale), it doesn't break the code (very useful if the device is actually missing and you need to start setting up the ATE).

I've a PID on the event architecture (well, at the moment there is a lot of duct tape/glue/global variables involved, but I'm currently rewriting it as a state machine), and it runs fine. Some benchmarking I did showed that I can lose events when I directly call two Generate Event VIs (a 1 ms wait between them did help) AND I did use the filter VIs (but it was merely an academic challenge).

Felix
