10-24-2022 03:32 PM
I am trying to save a large amount of data, as it is collected, and prevent memory leaks. I have not dealt with data sets large enough for memory to be a concern with LabVIEW before. My goals for the data saving process are below. I would appreciate any insight and memory conscious suggestions.
Goals:
1. Save data as it is collected -- everything is saved as a row in a giant string array. If the distributable crashes, I want as much data already saved as possible.
2. Not duplicate data -- if a user triggers the saving process more than once, each individual run should only be counted once. Two runs producing exactly the same line of information is unlikely, but possible, so identical lines can't simply be discarded.
3. All data contained in a single file (.xls or text file) -- I want one file for all data collected in a day, regardless of whether the program is restarted, the file is manually altered, etc.
4. Must be a reliable method that has a low failure/glitch rate.
5. Speed -- as fast as possible, but memory consumption and reliability are higher priorities.
I had originally planned to append to the file; however, this appears incredibly slow and frequently fails. Is there a better option?
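For illustration, here are roughly the two append patterns in question, sketched in Python since a VI can't be pasted as text (the file, function, and class names are all made up, not my actual code):

    # Pattern 1: reopen per row -- the file is always left closed and
    # consistent, but pays the open/close cost on every single row.
    def append_row_reopen(path: str, row: str) -> None:
        with open(path, "a", encoding="utf-8") as f:
            f.write(row + "\n")

    # Pattern 2: hold the file open and flush per row -- much faster,
    # but the handle stays open if the program crashes.
    class RowLogger:
        def __init__(self, path: str):
            self._f = open(path, "a", encoding="utf-8")

        def append(self, row: str) -> None:
            self._f.write(row + "\n")
            self._f.flush()  # hand off to the OS so a crash loses at most the last row

        def close(self) -> None:
            self._f.close()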
10-24-2022 04:06 PM
This is an excellent article to start with - https://www.ni.com/en-us/innovations/white-papers/09/comparing-common-file-i-o-and-data-storage-appr...
In short, text-based files (txt, csv, xlsx) are among the worst choices for storing large amounts of data.
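As a rough illustration of why (a small Python sketch, not taken from the article; the values are hypothetical):

    import struct

    values = [3.141592653589793] * 1000  # hypothetical data

    # Text: roughly 18 bytes per value plus a newline, and a parse step on every read.
    text_blob = "\n".join(repr(v) for v in values).encode("utf-8")

    # Binary: exactly 8 bytes per double, read back with a single unpack.
    binary_blob = struct.pack(f"{len(values)}d", *values)

    print(len(text_blob), len(binary_blob))  # roughly 18000 vs 8000 bytes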
10-25-2022 01:55 PM
Hello, @CatDoe.
It seems to me you have two (slightly different) problems: writing data "efficiently" (which might mean in some form of binary encoding) and writing data "safely" (so nothing is lost if you get a hardware or software glitch and the program halts with the file in an "unknown" state).
You failed to attach any code (note that a picture of part of a Block Diagram isn't "code" -- only files with the extension ".vi" and related, such as .ctl, .lvproj, etc., count), so we can't "see for ourselves" what you are doing or how. You also didn't give us much quantitative data, like how much you are writing and how fast the data are arriving (is it in "bursts", or more-or-less continuous at, say, 10 kHz?). What do you do when the file I/O throws an error? Does the program crash, or do you ignore the error and start a new file, or what?
My colleagues describe some of the virtues of TDMS. I don't have that much experience with this format, and can't say how it handles "crashes" and "restarts". However, if your data come in as "bursts" (for example, sampling 1000 points at a time at 1 kHz), then there's a fairly easy way to minimize file corruption with text files, and keep it fairly "fast" -- open the file, append the new burst, and close the file again, so that between bursts the file always sits closed and consistent on disk.
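Sketched in Python rather than G (the file name, data shape, and function name are placeholders, not any LabVIEW API):

    import csv

    def write_burst(path: str, burst: list[list[float]]) -> None:
        # Append one burst, leaving the file closed between bursts so a
        # crash can only affect the burst currently being written.
        with open(path, "a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(burst)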
If you've got a Producer/Consumer design going, you can make such a "Write" routine a Consumer, so it runs in parallel with everything else, and should take way less than the 1 second before the next request (which will be just as fast).
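In text form, the idea looks roughly like this (Python threads standing in for parallel LabVIEW loops; the queue and file names are illustrative, and the consumer reuses the open-append-close pattern sketched above):

    import queue
    import threading

    data_q: "queue.Queue[list[str] | None]" = queue.Queue()  # None signals shutdown

    def consumer(path: str) -> None:
        # The "Write" loop: runs in parallel, draining bursts as they arrive.
        while True:
            burst = data_q.get()
            if burst is None:
                break
            with open(path, "a", encoding="utf-8") as f:
                f.write("\n".join(burst) + "\n")

    writer = threading.Thread(target=consumer, args=("log.txt",))
    writer.start()
    data_q.put(["row 1", "row 2"])  # producer side, after each acquisition
    data_q.put(None)                # at shutdown
    writer.join()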
Now, if you'd attached your code, I wouldn't have to ask whether you are using a Producer/Consumer design, and could (instead) have tried to explain how LabVIEW "lets you do two things at the same time" (also called "Parallel Processing").
Bob Schor
10-26-2022 02:48 AM
TDMS Streaming suffers from massive slowdown if many incremental writes are performed; I'm not sure it will maintain its speed when used as the OP intends. I looked at this years ago for a similar application, but had to remove it from the selection due to this slowdown.
10-26-2022 03:32 AM
@Intaris wrote:
TDMS Streaming suffers from massive slowdown if many incremental writes are performed; I'm not sure it will maintain its speed when used as the OP intends. I looked at this years ago for a similar application, but had to remove it from the selection due to this slowdown.
Interesting. That is not something I have seen, and I am initially skeptical of such a generic statement. I'm sure you tried all the advanced TDMS functions too. If you have a forum post about it, I am interested to see it, to learn when or if TDMS is not ideal. Even more interesting would be if the OP would post more info or some code, so we can help.
10-26-2022 07:21 AM
https://forums.ni.com/t5/LabVIEW/TDMS-flexibility-performance/m-p/3222453
I no longer have the details in my cranium, but I do remember that the initial feedback was "That can't be" -- yet when we looked at it closer, the way we wanted to write was basically incompatible with "fast".
10-26-2022 08:27 AM - edited 10-26-2022 08:31 AM
Writing small amounts of data very often to a TDMS file can result in some less than ideal situations, usually fragmentation and a very large index file. Periodically defragmenting is one solution; simply flushing the buffer to disk less often is another. I made a toolkit, posted on VIPM.io, which has a couple of modes where it can handle the flushing periodically, or make new files at time intervals, and a few other modes.
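The flush-less-often idea, as a generic Python sketch (the class name and interval are made up; the toolkit below handles this sort of thing in G):

    import time

    class BufferedWriter:
        # Collect small writes and flush them as one larger write every
        # flush_interval seconds, which limits fragmentation.
        def __init__(self, path: str, flush_interval: float = 5.0):
            self.path = path
            self.flush_interval = flush_interval
            self._buffer: list[bytes] = []
            self._last_flush = time.monotonic()

        def write(self, chunk: bytes) -> None:
            self._buffer.append(chunk)
            if time.monotonic() - self._last_flush >= self.flush_interval:
                self.flush()

        def flush(self) -> None:
            if self._buffer:
                with open(self.path, "ab") as f:
                    f.write(b"".join(self._buffer))
                self._buffer.clear()
            self._last_flush = time.monotonic()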
https://www.vipm.io/package/hooovahh_tremendous_tdms/
Since TDMS files can be merged with a binary copy, you can also start logging into new files and combine them at the end, as in the sketch below.
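A minimal sketch of that merge step (plain Python; the paths are placeholders, and any .tdms_index files should be deleted so they get regenerated for the merged file):

    import shutil

    def merge_parts(part_paths: list[str], out_path: str) -> None:
        # Binary-concatenate the daily part files into one final file.
        with open(out_path, "wb") as out:
            for part in part_paths:
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)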
10-26-2022 08:33 AM - edited 10-26-2022 08:34 AM
Hello Bob_Schor,
Attached is my VI. It is called in the message-handling loop at the end of my testing sequence's message queue.
This VI does not reliably create or edit files. It frequently fails to generate a file or append to an existing one, yet no error code is produced, and probing / highlight execution reveals no issue. I would like something more reliable, even if it is a slower method.
As stated in my original post, I need to be memory conscious and preserve as much data as possible, and to be as fast as possible while keeping those first two as the top priorities.