03-28-2012 03:31 PM
We have a huge data acquisition system in design. Multiple PCs running Linux do the actual talking to the custom acquisition modules via sockets. They then turn around and act as a socket server so you can establish a connection and stream packets to your PC. I am noticing a corrupted packet every few million transactions. The Linux server's logs say everything went out OK.
To troubleshoot, I had a colleague run a C-based app against the server to see if it saw the same problem. It did, but I have since found out that he was running CVI. We have written a similar app in Microsoft C.net and it is now running tests, but there are no results yet.
The question is: do LV and CVI use the same TCP/IP stack, and has anybody seen corrupted packets?
I am only receiving 60 packets a second and the app is queued. Processor usage is around 1% and memory usage is steady.
03-28-2012 03:40 PM
Depends on what you mean by "share the same code." In both languages you're making calls to the operating system's TCP stack. There might be slightly less wrapping around the CVI version, but it's the same underlying implementation provided by the operating system. I've never seen corrupted packets due to either LabVIEW or the operating system. What sort of corruption - is the data in the packet bad, or are you losing a packet?
03-28-2012 10:31 PM
Thanks for your reply. All my TCP/IP stack experience was 10+ years ago on Unix. I was wondering if the code just called the operating system's TCP/IP stack or if it handled it on its own. The more I think about it, the more I realize that it pretty much has to be part of the OS, since multiple apps can be using the same connection.
The packets are 64 bytes. The corruption looks like this: a 64-byte packet contains the first part of one packet and then, roughly midway through, starts a new packet (it has another header). The next packet starts with the remainder of the packet that began in the middle of the previous one and then picks up the header of the following packet, and so on. According to the Linux logs, maybe 200-600 packets were actually lost. I can stop the stream and restart, and everything goes back to normal.
We wrote the same type of program in C.NET with no CVI and it is seeing the same type of problems. The developer of the server code in Linux says he has a similar app running on a Linux machine that is not seeing the problem. I haven't personally seen it running on Linux but that is my next step.
03-29-2012 02:13 AM - edited 03-29-2012 02:15 AM
This looks quite likely to be an interpretation fault at the receiver end. How robust is your packet format? The simplest implementation, with just a length field and the payload, is highly susceptible to such corruption. If you misinterpret anything or get one byte off for whatever reason, your entire decoding goes haywire. A proper header should contain both some sort of ID that can be verified and a length indicator for the payload that follows. Then when receiving the data you look for that ID, read the length, and read as much data as indicated. If the ID doesn't match, you log the frame (and preferably the previous frame too, if possible) and raise an error on the receiving end. On such an error it is usually best to close the connection and reconnect to the server.
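To make the ID-plus-length idea concrete, here is a minimal Python sketch. The frame ID value, the header layout (4-byte ID, 2-byte length, big-endian) and the names `MAGIC`, `build_frame` and `parse_frames` are all assumptions for illustration; the actual packet format in this system is not known from the thread.

```python
import struct

MAGIC = 0xA55A1234             # hypothetical frame ID, not from the thread
HEADER = struct.Struct(">IH")  # assumed layout: 4-byte ID, 2-byte payload length


class FrameError(Exception):
    """Raised when the data does not start with a valid frame header."""


def build_frame(payload: bytes) -> bytes:
    """Prefix the payload with the ID and its length."""
    return HEADER.pack(MAGIC, len(payload)) + payload


def parse_frames(buffer: bytes):
    """Return (frames, leftover) for all complete frames found in buffer."""
    frames = []
    pos = 0
    while len(buffer) - pos >= HEADER.size:
        magic, length = HEADER.unpack_from(buffer, pos)
        if magic != MAGIC:
            # Out of sync: report it so the caller can log and reconnect.
            raise FrameError(f"bad frame ID 0x{magic:08X} at offset {pos}")
        end = pos + HEADER.size + length
        if end > len(buffer):
            break  # frame not complete yet, wait for more data
        frames.append(buffer[pos + HEADER.size:end])
        pos = end
    return frames, buffer[pos:]
```

The leftover bytes are meant to be prepended to the next chunk read from the socket, and a `FrameError` is the point where you would log the current and previous frame and then close and reopen the connection.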
03-29-2012 08:27 AM
Each and every packet in the current configuration is supposed to be 64 bytes, period. I read and verify the header on each packet. I will get a packet with a good header, followed by corrupted data starting somewhere in the packet. The corruption is easy to see because the packet appears to start carrying a new packet's data partway through. The next packet suffers from the misaligned data as well.
I can fix the offset and get things sync'd again but that is not the issue. I am trying to validate the operation of both ends of the link and that there is basically a zero error rate. At the moment I can't seem to prove who is at fault or what is causing it.
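One receiver-side behaviour that is cheap to rule out: TCP delivers a byte stream, so a single `recv()` call may return fewer bytes than requested, and a reader that assumes one call equals one 64-byte packet can drift out of alignment. Nothing in this thread establishes that this is the cause here. The sketch below, in Python, only shows the usual "read exactly N bytes" loop; `recv_exact` is a hypothetical helper name.

```python
import socket

PACKET_SIZE = 64  # fixed packet size stated in the thread


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result means the peer closed the connection.
            raise ConnectionError(
                f"peer closed with {remaining} bytes outstanding"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Calling `recv_exact(sock, PACKET_SIZE)` once per packet keeps the reader on 64-byte boundaries as long as the sender really emits whole packets, which makes any remaining misalignment easier to attribute to the sending side.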
03-29-2012 11:06 AM
@t_houston wrote:
I can fix the offset and get things sync'd again but that is not the issue. I am trying to validate the operation of both ends of the link and that there is basically a zero error rate. At the moment I can't seem to prove who is at fault or what is causing it.
If you haven't already, you might want to try a tool like Wireshark (http://www.wireshark.org/) to see the data/packets "on the wire". This would let you verify whether or not the sending software is sending the correct packets, and whether the receiving software is properly receiving/interpreting the packets.
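Once the raw bytes of the stream are available (for example, saved from a capture tool as a raw dump of the TCP payload), a short script can locate where the 64-byte alignment first slips. This is only a sketch in Python: the 4-byte header value `HEADER` is hypothetical, since the real header is not given in the thread, and `check_alignment` is an illustrative name.

```python
PACKET_SIZE = 64               # fixed packet size stated in the thread
HEADER = b"\xDE\xAD\xBE\xEF"   # hypothetical header bytes, not from the thread


def check_alignment(stream: bytes, header: bytes = HEADER,
                    size: int = PACKET_SIZE):
    """Walk the stream in fixed-size steps and report suspicious packets.

    Returns (bad_start, embedded):
      bad_start - offsets of packets that do not begin with the header
      embedded  - (packet offset, inner offset) pairs where the header
                  shows up again inside a packet
    Note: payload bytes can match the header by chance, so an 'embedded'
    hit is a hint to inspect, not proof of corruption.
    """
    bad_start = []
    embedded = []
    for offset in range(0, len(stream) - size + 1, size):
        packet = stream[offset:offset + size]
        if not packet.startswith(header):
            bad_start.append(offset)
        inner = packet.find(header, 1)  # skip the legitimate header at 0
        if inner != -1:
            embedded.append((offset, inner))
    return bad_start, embedded
```

Running the same check on a dump taken near the server and on one taken at the receiving PC would show on which side of the wire the first embedded header appears.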
Mark Moss
Electrical Validation Engineer
GHSP