03-30-2023 09:59 PM - last edited on 04-02-2023 08:36 PM by markwni
Hi, all,
I am maintaining and extending a LabVIEW app that runs on a cRIO-9067. We developed our own FPGA code in LabVIEW as part of this app.
It maintains a bidirectional communication path with an HMI running on a nearby PC (also written in LabVIEW). The cRIO sends regular status and data messages to the HMI on a 10-second cycle. Both directions of this link use LabVIEW network streams. The messages sent from the cRIO to the HMI are short, a few hundred bytes each.
Every few hours, the cRIO app also transmits a 15 Mb file to an FTP server. I've observed this to take about 15 seconds each time it's kicked off. For this operation, the software uses LabVIEW's built-in FTP support: it first writes the data to a local file and then requests that the file be sent to the server. The VI that does this runs at background priority.
My question is this: will the LabVIEW TCP/IP stack interleave these operations, or is it possible that the FTP transfer, once it kicks off, disrupts the routine message flow? What I'm seeing, in a network with four of these instruments, is the cRIO randomly losing contact with the HMI and throwing a variety of seemingly unrelated errors.
As it turns out, the FTP server runs on the same machine as one of the HMIs. Would that make any difference?
Thanks for your help,
Chuck
03-31-2023 02:53 AM - edited 03-31-2023 02:56 AM
Logically it should not; performance-wise, of course, it does. My suspicion is that your cRIO<->HMI protocol is not entirely fail-safe: it works under normal load but breaks down as soon as the network gets congested.
These kinds of problems are very common, and making a TCP/IP communication channel fully reliable is not trivial. Unlike some other things, not every error returned by a TCP/IP node is fatal. Timeouts, for instance, should simply be treated as "can't do it now, try again". Other errors should generally be treated as "let's disconnect and reconnect".
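In LabVIEW this decision is made by inspecting the error cluster coming out of the TCP or network-stream node. As a language-neutral illustration, here is a minimal sketch of the same policy using Python's standard `socket` module. The `Action` enum and the `classify_error`/`read_once` names are hypothetical, and treating every non-timeout `OSError` as "reconnect" is the simplification described above, not a complete error taxonomy.

```python
import socket
from enum import Enum


class Action(Enum):
    RETRY = "retry"          # nothing arrived in time; just try the read again
    RECONNECT = "reconnect"  # connection is suspect; close it and start over


def classify_error(exc: Exception) -> Action:
    """Map an exception raised by a socket call to a recovery action.

    A timeout is not fatal: it only means no data arrived before the
    deadline. Every other error is treated as a broken connection.
    """
    # socket.timeout is a subclass of OSError, and an alias of TimeoutError
    # since Python 3.10; listing both keeps older versions working too.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return Action.RETRY
    return Action.RECONNECT


def read_once(sock: socket.socket, nbytes: int):
    """Attempt a single read.

    Returns (data, None) on success, or (None, action) when the read
    produced no usable data and the caller must retry or reconnect.
    """
    try:
        data = sock.recv(nbytes)
    except OSError as exc:
        return None, classify_error(exc)
    if not data:
        # recv() returning b"" means the peer closed the connection.
        return None, Action.RECONNECT
    return data, None
```

The caller loops on `read_once`: on `RETRY` it discards the (empty) result and reads again, on `RECONNECT` it closes the socket and re-establishes the connection.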
For the server, the latter usually means simply closing the connection and waiting for a new one. The client should close the connection and actively try to reconnect.
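The client half of that scheme could look roughly like this, again sketched with Python's `socket.create_connection`. The function name, the attempt limit, and the capped exponential backoff are assumptions for illustration; the post only says the client should actively try to reconnect. The server half needs no special code: after closing a broken connection it simply goes back to `accept()` and waits.

```python
import socket
import time


def connect_with_retry(host: str, port: int, *, attempts: int = 5,
                       initial_delay: float = 0.1, max_delay: float = 2.0,
                       timeout: float = 1.0) -> socket.socket:
    """Client side: try to (re)connect, doubling the wait between attempts.

    Returns a connected socket, or re-raises the last OSError once every
    attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(attempts):
        try:
            # The timeout also stays set on the returned socket, so later
            # reads fail with a timeout instead of blocking forever.
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the real error
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Whenever a non-timeout error occurs, the caller closes the old socket and calls this function again. The backoff keeps a client from hammering an already congested network while the server is unavailable.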
This is the basic scheme. The exact implementation details can get a little messy, and you need to selectively clear certain errors after having handled them appropriately (a timeout means: don't try to process the message that isn't there, simply retry and forget the error).
Things can get especially messy because you have to decide whether a retry means simply waiting again for the message or resending the command. Resending can, in certain situations, get the connection completely out of sync: if the previous command was held up just a little longer than your timeout, the next read returns the answer to the previous command instead of the current one. One way to detect this is to add a counter value to each command header and have the server copy that value into each response header. If the response you read does not match the counter value you sent, it is an out-of-order message; you should likely discard it and keep reading until you get the right response.
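A minimal sketch of the counter-in-header idea, using Python's `struct` module. The 6-byte header layout (a 32-bit sequence number followed by a 16-bit payload length), the function names, and the use of a list of frames to stand in for successive reads from the connection are all hypothetical. A real client would also wrap its counter (for example modulo 2^32) and bound how long it keeps discarding stale replies.

```python
from __future__ import annotations

import struct

# Hypothetical header: 32-bit sequence number, 16-bit payload length,
# both big-endian ("network order").
HEADER = struct.Struct("!IH")


def pack_message(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with the header (payload must be < 64 KiB)."""
    return HEADER.pack(seq, len(payload)) + payload


def unpack_message(frame: bytes) -> tuple[int, bytes]:
    """Split a frame back into (sequence number, payload)."""
    seq, length = HEADER.unpack_from(frame)
    return seq, frame[HEADER.size:HEADER.size + length]


def server_reply(request_frame: bytes, compute) -> bytes:
    """Server: copy the request's sequence number into the response header."""
    seq, payload = unpack_message(request_frame)
    return pack_message(seq, compute(payload))


def read_matching_response(frames, expected_seq: int) -> bytes:
    """Client: discard stale responses until the one for expected_seq arrives."""
    for frame in frames:
        seq, payload = unpack_message(frame)
        if seq == expected_seq:
            return payload
        # Late answer to an earlier command: drop it and keep reading.
    raise TimeoutError(f"no response with sequence number {expected_seq}")
```

In the failure case described above, the late reply to command 1 arrives first; because its sequence number does not match, the client drops it and returns only the reply to command 2.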