11-06-2009 03:12 PM
Dear All,
I have a nasty problem with a large app. that uses the DataSocket server to communicate between its two component parts.
The first part (the "Logger") is a VI which reads data from various pieces of custom H/W attached to its host PC, logs them to disc and also 'publishes' them using the DataSocket interface. The publishing is done using the dstp protocol and is configured for each indicator through the front panel menus (i.e. not programmatically).
The second part (the "Monitor") is made up of several VIs which 'subscribe' to the published indicators and share the data amongst themselves using global variables. The Monitor always runs on the same PC as the Logger but can also run on any other machine on the LAN at the same time.
I have been developing these VIs for some years, without problem in the past. The most common symptom is that, after a period of typically some days, one particular (boolean) control on the Monitor constantly receives a corrupt value. The corresponding indicator on the Logger is always correct and its little green LED is on. The little green LED on the Monitor's corresponding control is also always on but once the problem has arisen its value is always wrong. If I start the Monitor on another PC (once the problem has occurred) it too receives the wrong value. Running a cut-down 'Monitor' also reproduces the problem -- 'cut-down' meaning a dummy VI comprising just the problem Control by itself inside a while-loop.
The only way to be sure of clearing the problem is to stop the Logger and the DataSocket server and then restart them both.
I tried sniffing on TCP port 3015 on the other PC running e.g. the 'cut-down' Monitor and recording the traffic (pcap format). Using Ethereal I can see what looks like an initialisation at the start of the conversation in which the Monitor requests the Control's name (as a string) and the Logger's PC replies using the string together with various binary data which I imagine contains the Control's value plus other metadata. Is the dstp protocol documented anywhere? If so, it ought to be possible to write e.g. an Ethereal filter to interpret the traffic and maybe shed some light here...
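In the absence of protocol documentation, the recorded capture can at least be pulled apart offline. The following is a minimal Python sketch, assuming the capture is a classic libpcap file (not pcapng) with Ethernet framing and IPv4; the function names `tcp_payloads` and `printable_runs` are made up for illustration. It makes no assumption about the dstp payload layout, which is unknown here; it only extracts the raw TCP payload bytes on port 3015 and the printable strings inside them.

```python
import re
import struct

DSTP_PORT = 3015  # TCP port named in the post


def tcp_payloads(pcap_bytes, port=DSTP_PORT):
    """Yield (src_port, dst_port, payload) for IPv4/TCP packets using `port`.

    Assumes a classic libpcap file with Ethernet link type.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"          # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"          # file written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    pos = 24                  # skip the 24-byte global header
    while pos + 16 <= len(pcap_bytes):
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[pos:pos + 16])
        frame = pcap_bytes[pos + 16:pos + 16 + incl_len]
        pos += 16 + incl_len
        # Need Ethernet (14) + minimal IPv4 (20) and EtherType 0x0800.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        if ip[9] != 6:        # IP protocol 6 = TCP
            continue
        ihl = (ip[0] & 0x0F) * 4
        total_len = struct.unpack("!H", ip[2:4])[0]
        tcp = ip[ihl:total_len]   # trims any Ethernet padding
        if len(tcp) < 20:
            continue
        src, dst = struct.unpack("!HH", tcp[:4])
        if port not in (src, dst):
            continue
        data_offset = (tcp[12] >> 4) * 4
        payload = tcp[data_offset:]
        if payload:
            yield src, dst, payload


def printable_runs(data, min_len=4):
    """Return runs of printable ASCII, e.g. item names, found in `data`."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
```

Running this over one capture taken while the Monitor is healthy and one taken after the fault has appeared, then comparing the payloads that follow the item name, might show whether the wrong value is already present on the wire.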
My 'analysis', such as it is, is that the problem is occurring somewhere between where the Logger VI's data is 'published' (i.e. input) to the DataSocket server and where the DataSocket server writes it to TCP port 3015 for 'subscription' via either the PC's loopback interface or NIC.
I have checked the Logger PC's Event Log for errors/warnings corresponding to the occurrence of the problem but found nothing.
Can anyone suggest how best to proceed in diagnosing this one?
Many thanks
Tom Crane.
System details:
LabVIEW 6.1 development version.
WinXP Pro SP2 or SP3 (all PCs).
DataSocket server version 4.0 (377).
Publishing URL eg. dstp://localhost/SM1000_STATUS_blinking
Subscribing eg. URL: dstp://pcxyz.rhul.ac.uk/SM1000_STATUS_blinking
11-12-2009 08:19 AM
Hello Tom,
Intermittent occurrences like this, along with heavy use of global variables, generally point to race conditions.
I would advise first looking into the code very carefully, and follow the flow of data within and outside towards your monitoring VIs.
Once you are 100% certain that there are no race conditions, then we can consider looking into the communications.
I hope this helps,
Kind Regards,
Michael S.
Applications Engineer
NI UK & Ireland
11-18-2009 11:21 PM
Hello Michael,
Thanks for the follow-up. I am pretty sure it is not a race condition amongst the 'monitor' VIs, since only one of those VIs actually subscribes to the controls which are published by the Logger VI. Moreover, the above-mentioned 'dummy monitor' VI, which does nothing but display the subscribed control in a while-loop, displays identical symptoms to the full 'monitor' suite of VIs.
To recap: The 'Logger' VI only publishes the problem variables (as indicators) and the 'dummy monitor' VI only subscribes to them (as controls). The 'Logger's published indicator values are always correct but the 'Monitor's corresponding subscribed controls display the fault when present.
Another type of failure I have encountered is that the 'Monitor' VI's subscribed controls sometimes stop updating, usually after several days of working fine. The symptoms in detail are as follows:
o 'Logger' VI works fine, constantly updates its published indicators (inc. current time [no. of seconds since 1/1/1904 00:00:00]), DataSocket LEDs all remain green.
o 'Monitor' VI appears to be fine, but subscribed controls show values present at the time the fault occurred, eg. current time control shows that date/time value and is unchanging, DataSocket LEDs all remain green.
o Connections to TCP port 3015 on the 'Logger' PC remain open ('ESTABLISHED').
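Since the published current-time value freezes when this fault occurs, it could double as a heartbeat. Below is a minimal Python sketch of that check, assuming the value counts seconds from 1/1/1904 00:00:00 in UTC and that the two PCs' clocks are roughly in step; `labview_to_datetime`, `is_stale` and the 10 second threshold are hypothetical choices for illustration, not part of DataSocket.

```python
from datetime import datetime, timedelta, timezone

# Epoch quoted in the post; UTC is an assumption.
LABVIEW_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def labview_to_datetime(seconds):
    """Convert seconds since 1/1/1904 00:00:00 to an aware datetime."""
    return LABVIEW_EPOCH + timedelta(seconds=seconds)


def is_stale(published_seconds, now, max_age_s=10.0):
    """True if the subscribed time value lags `now` by more than max_age_s.

    `now` must be a timezone-aware datetime; max_age_s should comfortably
    exceed the Logger's update period plus any clock skew between the PCs.
    """
    age = (now - labview_to_datetime(published_seconds)).total_seconds()
    return age > max_age_s
```

A Monitor-side check along these lines would flag the stalled state even though the DataSocket LEDs stay green and the TCP connection stays 'ESTABLISHED'.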
Cheers
Tom.
Ps. I notice I typoed 'Datatsocket' in the thread's subject. Would it be possible/desirable for the forum admin to correct this for me to aid keyword searching/indexing?
12-07-2009 05:53 AM
Hello,
Have you had a chance to test the reliability of the same app on LabVIEW 2009? This may well be a known bug that has been fixed since 6.1, as at least 6 versions have been released since then. I have found bug reports (CAR:2W98AHMO) describing similar behaviour that has been fixed in subsequent versions.
Kind Regards,
12-09-2009 04:28 PM
Hello Mark,
Thanks for the feedback. I do not know my way round this forum too well and could not interpret your "(CAR:2W98AHMO)" link! Could you clarify / explain ?
Thanks
Tom.