05-22-2012 05:15 PM
This may be a question more suited for the BioBench or somewhere else. If so, please let me know where I should post.
PROBLEM:
I wrote a program for data acquisition from an A/D board. The files are huge and take a long time to read back for analysis.
DETAILS:
I built a small animal plethysmograph and metabolic assessment system. For full disclosure, I use a 6220 board with an SCB-68 for acquisition. I collect 5 channels at 1 kHz (respiration - pressure transducer, EKG - self-built amplifier with filters, and oxygen, CO2, and chamber temperature - all out of the box). I have no formal training in writing this kind of software (as will become evident); I am self-taught from a book. The software that I wrote works well enough. However, my biggest problem is that the file sizes are HUGE!!! A half hour of data collection can leave me with a 1 GB file. I have a computer that can handle this, but it is slow, and I think I am doing something wrong. ADI systems (that I have demo'ed) don't produce files this size even as text files, while I am writing to a binary file. My suspicion is that on each loop iteration my program resets t0 and writes out a complete waveform, header and all.
After I write the files, I open them in a second program that I wrote. This program splits the channels and lets me pick peaks and valleys (respiration frequency and amplitude, as well as heart rate) or take the RMS (for the gas content). Opening these files takes 10-15 minutes when they are very large, and peak/valley marking is slow. That software may have problems of its own, but I think the large file size is the most fundamental one.
I'll post the acquisition program, and I am happy to post the peak detection program as well if it will help anyone.
RESOLUTION:
I was hoping to get 1) ideas on what I am doing wrong, or why my files are so large, and 2) an overall critique of my code with ideas for improvement where warranted.
Thank you for any help offered. It would be great if I could streamline these experiments!
05-22-2012 10:00 PM
It's large because you are collecting a lot of data.
5 channels at 1 kHz is 5,000 samples per second. Each sample is 8 bytes since it is a double datatype, so that's 40,000 bytes per second, 2,400,000 bytes per minute, and 72 million bytes for 30 minutes. That is big, but nowhere near the 1 GB you are seeing.
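If you want to sanity-check that arithmetic, here it is as a few lines of Python (the channel count and rate are just the numbers from your post):

```python
# Expected raw data size for the acquisition described above.
channels = 5          # respiration, EKG, O2, CO2, chamber temp
rate_hz = 1000        # samples per second per channel
bytes_per_sample = 8  # LabVIEW DBL (double-precision float)
minutes = 30

raw_bytes = channels * rate_hz * bytes_per_sample * 60 * minutes
print(f"{raw_bytes:,} bytes = {raw_bytes / 1e6:.0f} MB")
# 72,000,000 bytes = 72 MB, so the rest of that 1 GB file must be overhead.
```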
I think the datalog format probably has a lot of overhead: some from file headers, and some from the rest of the data in the waveform cluster such as t0 and dt. I tested writing a single waveform with a single value in the array, and the file came out to 636 bytes even though it held only one data point.
I also think the way you are writing waveforms to the file is not the best way. NI recommends the TDMS file format for high-speed streaming data, so try that instead of the Write Waveform to File VI.
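In LabVIEW the TDMS Open/Write/Close VIs are the direct replacement. Just to show the streaming pattern in text form, here is a rough sketch using Python and the open-source npTDMS package; the group and channel names are made up, and the random data stands in for a DAQmx read:

```python
# Sketch: appending chunks of multichannel data to a TDMS file with the
# open-source npTDMS package (pip install npTDMS). Names are placeholders.
import numpy as np
from nptdms import TdmsWriter, ChannelObject

CHANNELS = ["Respiration", "EKG", "O2", "CO2", "ChamberTemp"]

with TdmsWriter("acquisition.tdms") as writer:
    for _ in range(10):                      # one iteration per DAQ read
        chunk = np.random.rand(5, 1000)      # stand-in for real samples
        segment = [ChannelObject("Plethysmograph", name, chunk[i])
                   for i, name in enumerate(CHANNELS)]
        writer.write_segment(segment)        # appends raw data, no per-chunk waveform cluster
```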
Otherwise your VI actually looks pretty good. One thing that could help performance is a producer/consumer architecture, where one loop collects the data from the DAQ card and passes it through a queue to a second loop that handles the file operations. Doing this, you could accumulate several chunks of data, concatenate them, and write them to the file in larger blocks.
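In LabVIEW that's two parallel while loops joined by a queue; the same pattern sketched in Python with threads looks like this (the random data again stands in for DAQmx Read):

```python
# Producer/consumer sketch: one thread acquires, the other accumulates
# chunks and writes them to disk in larger blocks.
import queue
import threading
import numpy as np

data_q = queue.Queue()

def producer(n_chunks=100):
    for _ in range(n_chunks):
        data_q.put(np.random.rand(5, 1000))  # placeholder for DAQmx Read
    data_q.put(None)                         # sentinel: acquisition done

def consumer():
    buffered = []
    with open("data.bin", "ab") as f:
        while (chunk := data_q.get()) is not None:
            buffered.append(chunk)
            if len(buffered) >= 10:          # accumulate, then write one big block
                np.concatenate(buffered, axis=1).tofile(f)
                buffered.clear()
        if buffered:                         # flush whatever is left
            np.concatenate(buffered, axis=1).tofile(f)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```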
05-23-2012 12:42 PM
One minor comment about your VI: it's a bit odd that you've wired the number of samples to the timeout terminal of DAQmx Read. That happens to work in your case because the timeout input is in milliseconds and you're expecting one sample per millisecond, but I recommend you simply wire a reasonable timeout value and an explicit number of samples to read, and remove the DAQmx property node. If the buffer overflows you'll get an error from DAQmx Read, but I don't think your Buffer Overflow indicator will ever become true, because there will never be more samples available than can fit in the buffer.
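For what it's worth, the equivalent fix in NI's nidaqmx Python API looks like the sketch below; "Dev1" is a placeholder device name:

```python
# Sketch: request an explicit number of samples per read and a real
# timeout, instead of overloading the timeout input.
import nidaqmx
from nidaqmx.constants import AcquisitionType

with nidaqmx.Task() as task:
    task.ai_channels.add_ai_voltage_chan("Dev1/ai0:4")   # 5 channels
    task.timing.cfg_samp_clk_timing(rate=1000,
                                    sample_mode=AcquisitionType.CONTINUOUS)
    task.start()
    for _ in range(30):
        # Exactly 1000 samples per channel (one second of data); DAQmx may
        # take up to 10 s before raising a timeout error.
        data = task.read(number_of_samples_per_channel=1000, timeout=10.0)
        # ...hand `data` off to the logging loop here...
```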
05-23-2012 02:31 PM
Good catch Nathan. I missed that.
That could be contributing to the problem. With the number of samples unwired, DAQmx Read returns the default, which is all available samples. Depending on how fast the rest of the loop runs, that could be a lot of samples or very few. If it is very few, then each write is a much shorter section of waveform happening more often, which means more overhead bytes and fewer real data bytes.
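A quick back-of-the-envelope comparison makes the point; the ~630 bytes of per-write overhead is an assumption extrapolated from the 636-byte single-sample file I measured above:

```python
# Rough comparison: many tiny waveform writes vs. a few big ones.
OVERHEAD = 630                       # assumed bytes of header/cluster per write
BYTES_PER_SAMPLE = 8
TOTAL_SAMPLES = 5 * 1000 * 60 * 30   # 5 channels, 1 kHz, 30 minutes

for chunk in (10, 100, 5000):        # samples written per call
    writes = TOTAL_SAMPLES // chunk
    total = TOTAL_SAMPLES * BYTES_PER_SAMPLE + writes * OVERHEAD
    print(f"{chunk:5d} samples/write -> {total / 1e6:7.1f} MB")
# 10 samples/write balloons 72 MB of data to ~639 MB; 5000/write adds only ~1 MB.
```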