10-24-2022 03:32 PM
I am trying to save a large amount of data, as it is collected, and prevent memory leaks. I have not dealt with data sets large enough for memory to be a concern with LabVIEW before. My goals for the data saving process are below. I would appreciate any insight and memory conscious suggestions.
Goals:
1. Save data as it is collected -- everything is saved as a row in a giant string array. If the distributable crashes, I want as much data already saved as possible.
2. Not duplicate data -- if a user triggers the saving process more than once, each individual run should only be counted once. Two runs producing exactly the same line of information is unlikely, but possible, so identical lines can't simply be discarded.
3. All data contained in a single file (.xls or text file) -- I want one file for all data collected in a day, regardless of whether the program is restarted, the file is manually altered, etc.
4. Must be a reliable method that has a low failure/glitch rate.
5. Speed -- as fast as possible, but memory consumption and reliability are higher priorities.
I had originally planned to append to the file; however, this appears incredibly slow and frequently fails. Is there a better option?
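For illustration, here are roughly the two append patterns in question, sketched in Python since a VI can't be pasted as text (the file, function, and class names are all made up, not my actual code):

    # Pattern 1: reopen per row -- the file is always left closed and
    # consistent, but pays the open/close cost on every single row.
    def append_row_reopen(path: str, row: str) -> None:
        with open(path, "a", encoding="utf-8") as f:
            f.write(row + "\n")

    # Pattern 2: hold the file open and flush per row -- much faster,
    # but the handle stays open if the program crashes.
    class RowLogger:
        def __init__(self, path: str):
            self._f = open(path, "a", encoding="utf-8")

        def append(self, row: str) -> None:
            self._f.write(row + "\n")
            self._f.flush()  # hand off to the OS so a crash loses at most the last row

        def close(self) -> None:
            self._f.close()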
10-24-2022 04:06 PM
This is an excellent article to start with - https://www.ni.com/en-us/innovations/white-papers/09/comparing-common-file-i-o-and-data-storage-appr...
In short, text-based files (txt, csv, xlsx) are among the worst choices for storing large amounts of data.
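As a rough illustration of why (a small Python sketch, not taken from the article; the values are hypothetical):

    import struct

    values = [3.141592653589793] * 1000  # hypothetical data

    # Text: roughly 18 bytes per value plus a newline, and a parse step on every read.
    text_blob = "\n".join(repr(v) for v in values).encode("utf-8")

    # Binary: exactly 8 bytes per double, read back with a single unpack.
    binary_blob = struct.pack(f"{len(values)}d", *values)

    print(len(text_blob), len(binary_blob))  # roughly 18000 vs 8000 bytes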
10-25-2022 01:55 PM
Hello, @CatDoe.
It seems to me you have two (slightly different) problems: writing data "efficiently" (which might mean in some form of binary encoding) and writing data "safely" (so nothing is lost if you get a hardware or software glitch and the program halts with the file in an "unknown" state).
You failed to attach any code (note that a picture of part of a Block Diagram isn't "code" -- only files with the extension ".vi" and related, such as .ctl, .lvproj, etc., count), so we can't "see for ourselves" what you are doing or how. You also didn't give us much quantitative data, like how much you are writing and how fast the data are arriving (is it in "bursts", or more-or-less continuous at, say, 10 kHz?). What do you do when the file I/O throws an error? Does the program crash, or do you ignore the error and start a new file, or what?
My colleagues describe some of the virtues of TDMS. I don't have that much experience with this format, and can't say how it handles "crashes" and "restarts". However, if your data come in as "bursts" (for example, sampling 1000 points at a time at 1 kHz), then there's a fairly easy way to minimize file corruption with text files, and keep it fairly "fast" -- open the file, append the new burst, and close the file again, so that between bursts the file always sits closed and consistent on disk.
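Sketched in Python rather than G (the file name, data shape, and function name are placeholders, not any LabVIEW API):

    import csv

    def write_burst(path: str, burst: list[list[float]]) -> None:
        # Append one burst, leaving the file closed between bursts so a
        # crash can only affect the burst currently being written.
        with open(path, "a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(burst)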
If you've got a Producer/Consumer design going, you can make such a "Write" routine a Consumer, so it runs in parallel with everything else, and should take way less than the 1 second before the next request (which will be just as fast).
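In text form, the idea looks roughly like this (Python threads standing in for parallel LabVIEW loops; the queue and file names are illustrative, and the consumer reuses the open-append-close pattern sketched above):

    import queue
    import threading

    data_q: "queue.Queue[list[str] | None]" = queue.Queue()  # None signals shutdown

    def consumer(path: str) -> None:
        # The "Write" loop: runs in parallel, draining bursts as they arrive.
        while True:
            burst = data_q.get()
            if burst is None:
                break
            with open(path, "a", encoding="utf-8") as f:
                f.write("\n".join(burst) + "\n")

    writer = threading.Thread(target=consumer, args=("log.txt",))
    writer.start()
    data_q.put(["row 1", "row 2"])  # producer side, after each acquisition
    data_q.put(None)                # at shutdown
    writer.join()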
Now, if you'd attached your code, I wouldn't have to ask whether you are using a Producer/Consumer design, and could (instead) have tried to explain how LabVIEW "lets you do two things at the same time" (also called "Parallel Processing").
Bob Schor
10-26-2022 02:48 AM
TDMS Streaming suffers from massive slowdown if many incremental writes are performed; I'm not sure it will maintain its speed when used as the OP intends. I looked at this years ago for a similar application, but had to remove it from the selection due to this slowdown.
10-26-2022 03:32 AM
@Intaris wrote:
TDMS Streaming suffers from massive slowdown if many incremental writes are performed; I'm not sure it will maintain its speed when used as the OP intends. I looked at this years ago for a similar application, but had to remove it from the selection due to this slowdown.
Interesting. That is not something I have seen, and I am initially skeptical of such a generic statement. I'm sure you tried all the advanced TDMS functions too. If you have a forum post about it, I am interested to see it, to learn when or if TDMS is not ideal. Even more interesting would be if the OP would post more info or some code, so we can help.
10-26-2022 07:21 AM
https://forums.ni.com/t5/LabVIEW/TDMS-flexibility-performance/m-p/3222453
I no longer have the details in my cranium, but I do remember that the initial feedback was "That can't be" -- yet when we looked at it closer, the way we wanted to write was basically incompatible with "fast".
10-26-2022 08:27 AM - edited 10-26-2022 08:31 AM
Writing small amounts of data very often to a TDMS file can result in some less than ideal situations, usually fragmentation and a very large index file. Periodically defragmenting is one solution; simply flushing the buffer to disk less often is another. I made a toolkit, posted on VIPM.io, which has a couple of modes where it can handle the flushing periodically, or make new files at time intervals, and a few other modes.
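The flush-less-often idea, as a generic Python sketch (the class name and interval are made up; the toolkit below handles this sort of thing in G):

    import time

    class BufferedWriter:
        # Collect small writes and flush them as one larger write every
        # flush_interval seconds, which limits fragmentation.
        def __init__(self, path: str, flush_interval: float = 5.0):
            self.path = path
            self.flush_interval = flush_interval
            self._buffer: list[bytes] = []
            self._last_flush = time.monotonic()

        def write(self, chunk: bytes) -> None:
            self._buffer.append(chunk)
            if time.monotonic() - self._last_flush >= self.flush_interval:
                self.flush()

        def flush(self) -> None:
            if self._buffer:
                with open(self.path, "ab") as f:
                    f.write(b"".join(self._buffer))
                self._buffer.clear()
            self._last_flush = time.monotonic()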
https://www.vipm.io/package/hooovahh_tremendous_tdms/
Since TDMS files can be merged with a binary copy, you can also start logging into new files and combine them at the end, as in the sketch below.
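A minimal sketch of that merge step (plain Python; the paths are placeholders, and any .tdms_index files should be deleted so they get regenerated for the merged file):

    import shutil

    def merge_parts(part_paths: list[str], out_path: str) -> None:
        # Binary-concatenate the daily part files into one final file.
        with open(out_path, "wb") as out:
            for part in part_paths:
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)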
10-26-2022 08:33 AM - edited 10-26-2022 08:34 AM
Hello Bob_Schor,
Attached is my VI. It is called in the message-handling loop at the end of my testing sequence's message queue.
This VI does not reliably create or edit files. It frequently fails to generate a file or append to an existing one, yet no error code is produced, and probing / highlight execution reveals no issue. I would like something more reliable, even if it is a slower method.
As stated in my original post, I need to be memory conscious and preserve as much data as possible, and to be as fast as possible while keeping those first two as the top priorities.