12-12-2010 03:37 PM
Hello all,
I am designing a verification tool at work for an RFID device. I have a dedicated PC box with the NI PCIe 7841 DAQ for this tool.
Question
My question is about segmenting data in TDMS files between files, groups, and channels. How do you pick a maximum file size to break storage into? Is storing one group of very long channels efficient, or should you break it up into multiple smaller groups to improve read access times later?
Event Definition
My application is storing 'events' defined by the following fields (each event has these fields):
Event Activity Rate
These events are acquired at an average rate of ~50k events/sec and will be processed offline. Sample sessions will last ~1-2 hours, yielding ~7-8 GB, and will only happen 1-3 times per week.
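Those figures are consistent with each other: at ~50k events/sec over two hours, ~8 GB works out to roughly 20-24 bytes per event. A quick back-of-envelope check (Python used purely as a calculator; the figures are taken from the post, and the 8 GB value is the upper end of the stated range):

```python
# Back-of-envelope sizing for the session described above.
EVENT_RATE = 50_000          # events per second (average, from the post)
SESSION_SEC = 2 * 60 * 60    # 2-hour session
SESSION_BYTES = 8 * 1024**3  # ~8 GB (upper end of the stated range)

events_total = EVENT_RATE * SESSION_SEC          # total events per session
bytes_per_event = SESSION_BYTES / events_total   # implied record size

print(f"{events_total:,} events, ~{bytes_per_event:.1f} bytes/event")
```

That ~24 bytes/event figure is what ultimately drives the file-size and partitioning decisions below.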
Storage Method
Forget about the FPGA->Host mechanism for a moment, and just consider the storage of this data. What I have here is a continuous stream of these events. My method of storage so far is:
Final Question Summary
Is this a good way to store the data? I have a few questions here:
Thanks!
-Justin Reina
12-12-2010 09:24 PM
Justin,
What's the problem you want to solve? Improve the TDMS Write and/or Read performance?
Do you have any benchmarks for your current HDD and TDMS throughput? If the TDMS throughput is close to your HDD's physical throughput limit, you can do nothing but upgrade your HDD hardware. If not, you might analyze your VIs to find the bottleneck.
FYI, LabVIEW 2010 ships with a new feature, "Advanced TDMS VIs and Functions", to improve TDMS performance. Have you tried it?
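A first-order version of the benchmark Bo suggests is just measuring raw disk write throughput and comparing it against the application's data rate. The sketch below is a Python stand-in (the real benchmark would use TDMS Write in LabVIEW); the file name, chunk size, and total size are arbitrary choices:

```python
import os
import tempfile
import time

# Rough sequential-write throughput check. Writes a scratch file in 4 MB
# chunks, fsyncs so the OS cache doesn't inflate the number, and reports MB/s.
CHUNK = b"\x00" * (4 * 1024 * 1024)   # 4 MB write buffer
TOTAL_MB = 256                        # total bytes written: 256 MB

path = os.path.join(tempfile.gettempdir(), "disk_bench.bin")
start = time.perf_counter()
with open(path, "wb") as f:
    for _ in range(TOTAL_MB // 4):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())              # force data to disk before stopping the clock
elapsed = time.perf_counter() - start
throughput = TOTAL_MB / elapsed       # MB/s

print(f"Wrote {TOTAL_MB} MB in {elapsed:.2f} s -> {throughput:.0f} MB/s")
os.remove(path)
```

If the measured figure is comfortably above the session's sustained rate (~8 GB over ~2 hours is only ~1 MB/s average), the disk is not the bottleneck and any slowness lies in the TDMS write path itself.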
Best Regards,
Bo Xie
12-12-2010 11:21 PM
Hi Bo,
Clarification of Question
I am sorry; perhaps the length of my post obscured its intent. I was really looking for someone to say "sure, go ahead and store single channels of 250M U16 points", or conversely to say that it's a bad idea, and why.
Answers
What's the problem you want to solve? Improve the TDMS Write and/or Read performance?
Do you have any benchmark for your current HDD and TDMS throughput performance?
FYI, LabVIEW 2010 ships with a new feature "Advanced TDMS VIs and Functions" to improve TDMS performance. Did you ever try it?
Again, sorry if the post is too long. I am mainly looking for a reason to make the decision one way or the other, rather than just running empirical tests to figure it out.
Thanks,
Justin Reina
12-13-2010 11:13 AM
Hey Justin,
That seems like a reasonable setup to use. Storing each of these entities as channels, with every event corresponding to a single point in those channels, seems like the most straightforward layout.
You asked about the idea of dividing into multiple groups within the same file:
I don't see any need to do this in your case. If, however, you have some distinction that you want to maintain between data sets, groups would be handy: for example, if you see an interesting event and want to note that something is different about the subsequent points, creating a new group would keep the data sets logically separate.
As far as storing that much data in each file, that seems fine. The only thing to bear in mind is that when processing the file you won't be able to load that much data into memory at once, so you will need to process it piece-meal. As for the magic number of samples to put into each file, it is a bit arbitrary. I don't know of a good formula for deciding where to partition data sets, but in general make sure to consider:
1) the overhead in switching files (opening/closing/writing a new header)
2) how you'd like to process the data
3) your risk tolerance: if something were to happen to a file (deletion, volume corruption, etc.), what is the most acceptable balance between file size and number of files for your needs?
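The piece-meal processing mentioned above can be sketched as a simple chunked-read loop. This is a Python stand-in using a plain binary stream of U16 samples rather than a real TDMS channel (in LabVIEW the same pattern is a TDMS Read loop with an offset and count per iteration); the chunk size, the per-chunk max reduction, and the in-memory demo data are all illustrative:

```python
import array
import io

# Stream a long U16 channel in fixed-size chunks instead of loading it all
# at once, applying a reduction (here: max) to each chunk as it arrives.
CHUNK_SAMPLES = 4  # tiny chunk for the demo; ~1M samples is more realistic

def max_per_chunk(stream, chunk_samples=CHUNK_SAMPLES):
    """Yield the max of each chunk of native-endian U16 samples."""
    while True:
        raw = stream.read(chunk_samples * 2)   # 2 bytes per U16 sample
        if not raw:
            break                              # end of channel data
        samples = array.array("H", raw)        # 'H' = unsigned 16-bit int
        yield max(samples)

# Demo on 10 in-memory samples 0..9, read in chunks of 4 samples.
data = array.array("H", range(10)).tobytes()
maxima = list(max_per_chunk(io.BytesIO(data)))
print(maxima)   # [3, 7, 9]
```

Because only one chunk is resident at a time, the same loop handles a 250M-sample channel in constant memory; only the reduction step changes per analysis.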
12-13-2010 12:51 PM
That's exactly what I was looking for. Thanks!
-Justin Reina