12-12-2010 03:37 PM
Hello all,
I am designing a verification tool at work for an RFID device. I have a dedicated PC box with the NI PCIe 7841 DAQ for this tool.
Question
My question is about segmenting data in TDMS files between files, groups, and channels. How do you pick a maximum file size to break storage into? Is storing one group of very long channels efficient, or should you break it up into multiple smaller groups to improve read access times later?
Event Definition
My application is storing 'events' defined by the following fields (each event has these fields):
Event Activity Rate
These events are acquired at an average rate of ~50k events/sec and will be processed offline. Sample sessions will last ~1-2 hours, yielding ~7-8 GB, and will only happen 1-3 times per week.
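Those figures are consistent with each other: at ~50k events/sec over two hours, ~8 GB works out to roughly 20-24 bytes per event. A quick back-of-envelope check (Python used purely as a calculator; the figures are taken from the post, and the 8 GB value is the upper end of the stated range):

```python
# Back-of-envelope sizing for the session described above.
EVENT_RATE = 50_000          # events per second (average, from the post)
SESSION_SEC = 2 * 60 * 60    # 2-hour session
SESSION_BYTES = 8 * 1024**3  # ~8 GB (upper end of the stated range)

events_total = EVENT_RATE * SESSION_SEC          # total events per session
bytes_per_event = SESSION_BYTES / events_total   # implied record size

print(f"{events_total:,} events, ~{bytes_per_event:.1f} bytes/event")
```

That ~24 bytes/event figure is what ultimately drives the file-size and partitioning decisions below.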
Storage Method
Forget about the FPGA->Host mechanism for a moment, and just consider the storage of this data. What I have here is a continuous stream of these events. My method of storage so far is:
Final Question Summary
Is this a good way to store the data? I have a few questions here:
Thanks!
-Justin Reina
12-12-2010 09:24 PM
Justin,
What's the problem you want to solve? Improve the TDMS Write and/or Read performance?
Do you have any benchmarks for your current HDD and TDMS throughput? If the TDMS throughput is close to your HDD's physical throughput limit, you can do nothing but upgrade your HDD hardware. If not, you might analyze your VIs to find the bottleneck.
FYI, LabVIEW 2010 ships with a new feature, "Advanced TDMS VIs and Functions", to improve TDMS performance. Have you tried it?
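A first-order version of the benchmark Bo suggests is just measuring raw disk write throughput and comparing it against the application's data rate. The sketch below is a Python stand-in (the real benchmark would use TDMS Write in LabVIEW); the file name, chunk size, and total size are arbitrary choices:

```python
import os
import tempfile
import time

# Rough sequential-write throughput check. Writes a scratch file in 4 MB
# chunks, fsyncs so the OS cache doesn't inflate the number, and reports MB/s.
CHUNK = b"\x00" * (4 * 1024 * 1024)   # 4 MB write buffer
TOTAL_MB = 256                        # total bytes written: 256 MB

path = os.path.join(tempfile.gettempdir(), "disk_bench.bin")
start = time.perf_counter()
with open(path, "wb") as f:
    for _ in range(TOTAL_MB // 4):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())              # force data to disk before stopping the clock
elapsed = time.perf_counter() - start
throughput = TOTAL_MB / elapsed       # MB/s

print(f"Wrote {TOTAL_MB} MB in {elapsed:.2f} s -> {throughput:.0f} MB/s")
os.remove(path)
```

If the measured figure is comfortably above the session's sustained rate (~8 GB over ~2 hours is only ~1 MB/s average), the disk is not the bottleneck and any slowness lies in the TDMS write path itself.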
Best Regards,
Bo Xie
12-12-2010 11:21 PM
Hi Bo,
Clarification of Question
I am sorry; perhaps the length of my post obscured its intent. I was really looking for someone to say "sure, go ahead and store single channels of 250M U16 points", or conversely to say that it's a bad idea, and why.
Answers
What's the problem you want to solve? Improve the TDMS Write and/or Read performance?
Do you have any benchmark for your current HDD and TDMS throughput performance?
FYI, LabVIEW 2010 ships with a new feature "Advanced TDMS VIs and Functions" to improve TDMS performance. Did you ever try it?
Again, sorry if the post is too long. I am mainly looking for a reason to make the decision one way or the other, rather than just running empirical tests to figure it out.
Thanks,
Justin Reina
12-13-2010 11:13 AM
Hey Justin,
That seems like a reasonable setup to use. Storing each of these entities as channels, with every event corresponding to a single point in those channels, seems like the most straightforward layout.
You asked about the idea of dividing into multiple groups within the same file:
I don't see any need to do this in your case. If, however, you have some distinction that you want to maintain between data sets, groups would be handy: for example, if you see an interesting event and want to note that something is different about the subsequent points, creating a new group would keep the data sets logically separate.
As far as storing that much data in each file, that seems fine. The only thing to bear in mind is that when processing the file you won't be able to load that much data into memory at once, so you will need to process it piece-meal. As for the magic number of samples to put into each file, it is a bit arbitrary. I don't know of a good formula for deciding where to partition data sets, but in general make sure to consider:
1) the overhead in switching files (opening/closing/writing a new header)
2) how you'd like to process the data
3) your risk tolerance: if something were to happen to a file (deletion, volume corruption, etc.), what is the most acceptable balance between file size and number of files for your needs?
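The piece-meal processing mentioned above can be sketched as a simple chunked-read loop. This is a Python stand-in using a plain binary stream of U16 samples rather than a real TDMS channel (in LabVIEW the same pattern is a TDMS Read loop with an offset and count per iteration); the chunk size, the per-chunk max reduction, and the in-memory demo data are all illustrative:

```python
import array
import io

# Stream a long U16 channel in fixed-size chunks instead of loading it all
# at once, applying a reduction (here: max) to each chunk as it arrives.
CHUNK_SAMPLES = 4  # tiny chunk for the demo; ~1M samples is more realistic

def max_per_chunk(stream, chunk_samples=CHUNK_SAMPLES):
    """Yield the max of each chunk of native-endian U16 samples."""
    while True:
        raw = stream.read(chunk_samples * 2)   # 2 bytes per U16 sample
        if not raw:
            break                              # end of channel data
        samples = array.array("H", raw)        # 'H' = unsigned 16-bit int
        yield max(samples)

# Demo on 10 in-memory samples 0..9, read in chunks of 4 samples.
data = array.array("H", range(10)).tobytes()
maxima = list(max_per_chunk(io.BytesIO(data)))
print(maxima)   # [3, 7, 9]
```

Because only one chunk is resident at a time, the same loop handles a 250M-sample channel in constant memory; only the reduction step changes per analysis.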
12-13-2010 12:51 PM
That's exactly what I was looking for. Thanks!
-Justin Reina