LabVIEW


Help/Guidance on writing to Binary Files to be read by other languages - Varying DataTypes

Solved!
Go to solution

I currently write a 2D array of Float to a binary file. I prepend a header describing each column, including each column's data type, and of course, since this is a 2D array, they're all Float.  Other users I work with are able to read this binary file using Python/Matlab.

 

However, we've run into limitations.  Again, Float is used because it's a 2D array and some of the values really are floats.  But this causes the following problems:

  1. File size: The file is already large.  Some of the values saved to this file were originally just bytes, so every byte stored as a Float grows by 3 bytes, once per row, which makes the file larger.
  2. We ran into an issue (I wasn't thinking about this originally) where U32 counters converted to Float hit the precision limit of Float and eventually just repeat.  I fixed this by typecasting the U32 bits into a Float instead.  The catch is making sure I tell all users which columns they need to typecast back to U32 or I32.

So basically the question is this: if I save an array of clusters (containing all the appropriate data types: U32, Float, Byte) to a binary file, will Python/Matlab be able to read it back into whatever equivalent structure they use?  NumPy?  Or is that too easy?  An array of clusters is not necessarily a 2D array of varying data types, so would the binary file only be readable by LabVIEW due to something in the file identifying it as an array of clusters?

 

Thoughts?  White Papers?  Thank you.

Message 1 of 15

@LuminaryKnight wrote:

I currently write a 2D array of Float to a binary file. [...]

Thoughts?  White Papers?  Thank you.


You are in the deep end now : ) 

 

Dealing with binary data files that mix data types is work on both ends (packing and unpacking). Is there a reason you must use binary?

 

One thing to know: XML files handle multiple data types nicely and are human readable, there are XML libraries in many programming languages, XML files compress nicely, and LabVIEW has an XML schema. JSON files can also be used, but they don't handle binary data types directly, so I never use them.

 

So, to directly answer your question: no, those other programs are not going to be able to directly read some unknown binary file generated by LabVIEW unless you tell the other program exactly how the binary data is packed into the file. If it is one data type, as in the entire file is n floats, it's not too hard to unpack. If it is flattened clusters of multiple data types, well, you have some work to do.
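For the easy case mentioned above (the whole file is nothing but floats), a minimal Python sketch using only the standard library's `struct` module looks like this. The sample bytes and the 2 x 3 layout are stand-ins for illustration; a real file would be read with `open(path, 'rb').read()`, and LabVIEW writes big-endian by default (hence the `>` prefix):

```python
import struct

# Stand-in for a file that is nothing but big-endian single-precision floats.
raw = struct.pack('>6f', 1.0, 2.5, -3.0, 4.0, 5.5, 6.0)

n = len(raw) // 4                      # 4 bytes per SGL
values = struct.unpack(f'>{n}f', raw)  # unpack the whole buffer in one call
# Regroup into the original 2 x 3 layout once the column count is known:
rows = [values[i:i + 3] for i in range(0, n, 3)]
```

NumPy users can do the same in one call with `numpy.frombuffer(raw, dtype='>f4')`.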

 

If you want to do the minimum amount of work to move files of mixed arbitrary data types, including metadata, between programs, you can save the data as a TDMS or HDF5 file and use the built-in libraries for each programming language (Python, Matlab, C++, LabVIEW, etc.) to read the data in and out. If the files get too big, compress them.

______________________________________________________________
Have a pleasant day and be sure to learn Python for success and prosperity.
Message 2 of 15
Solution
Accepted by topic author LuminaryKnight

Hi,

 

When converted to binary in LabVIEW:

 - A cluster is simply the concatenation of the binaries of its elements.

 - An array is 4 bytes representing the array size (interpreted as an I32), followed by the concatenation of the binaries of its elements.

 - A string is 4 bytes representing the string size (interpreted as an I32), followed by the actual string as a sequence of bytes.

 - Numeric types such as double (float64), single (float32), I8/16/32/64, U8/16/32/64, and Enum U8/16/32 are decomposed into bytes (in big-endian order by default).

 - A boolean is just a byte where only the least significant bit is used.

 - More details are in NI's documentation on how LabVIEW flattens data.

 

Then, depending on whether the Python side knows the actual cluster element data types, you may or may not need to add a header listing the types.

 

Finally, knowing how LabVIEW flattens the data and which types were flattened, nothing prevents you from unflattening it in Python.
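The rules above can be sketched in pure Python. This assumes a hypothetical cluster of (U32 counter, SGL value, U8 flag), flattened big-endian as a LabVIEW array with an I32 size prefix; with the `>` prefix, `struct` adds no padding, matching LabVIEW's byte-for-byte concatenation:

```python
import struct

# Build sample bytes the way LabVIEW would flatten an array of clusters:
records = [(7, 1.5, 1), (8, -2.0, 0)]
flat = struct.pack('>i', len(records))                 # array size prefix (I32)
for counter, value, flag in records:
    flat += struct.pack('>IfB', counter, value, flag)  # cluster = concatenated elements

# Unflatten on the Python side:
(count,) = struct.unpack_from('>i', flat, 0)
rec_size = struct.calcsize('>IfB')                     # 4 + 4 + 1 = 9 bytes per cluster
parsed = [struct.unpack_from('>IfB', flat, 4 + i * rec_size) for i in range(count)]
```

Note that a 2D array of floats would carry two I32 size prefixes (one per dimension) before the data.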

 

Regards,

Raphaël.

Message 3 of 15

I recommend the TDMS or HDF5 route rather than creating something from scratch.

Santhosh
Soliton Technologies

Message 4 of 15

Let's be clear that by "float" you mean SGL, not DBL. Converting a U32 to SGL should not cause any repeats; you'll just lose precision once the U32 value no longer fits into the 24-bit mantissa. Can you explain your observation? "Repeats" will only happen if you do the raw counting already in SGL, where at some point n+1 == n. Bad idea.
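The precision limit is easy to demonstrate outside LabVIEW. This sketch emulates storing a value in a SGL by round-tripping it through 4 bytes with Python's `struct` module; the `to_sgl` helper is just for illustration:

```python
import struct

def to_sgl(x):
    """Round a Python float to single precision, as storing it in a SGL would."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

# 2**24 = 16,777,216 is the last point where consecutive integers are all
# representable in single precision; beyond it the spacing becomes 2.
n = float(2**24)
assert to_sgl(n) == 16777216.0
assert to_sgl(n + 1) == 16777216.0   # n + 1 rounds back to n: a SGL counter stalls here
assert to_sgl(n + 2) == 16777218.0   # representable again, but n + 1 was skipped
```

So a counter incremented in SGL stops advancing at 16,777,216, while a U32 converted to SGL keeps growing and merely loses the low bits.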

 

An "array of clusters" only makes sense if all data types have the same number of elements and you are, e.g., streaming new structures to disk. To save all the data at once, a cluster of arrays might be easier.

 

Any binary structure can be read/written in any software as long as the specifications are given (sections, sizes, headers, byte order, data types, etc.), but I agree with the others that you should stick to well-defined, known file structures.

 

 

Message 5 of 15

In Python, you can use struct.unpack to convert binary data.

For Matlab, you might have to read element by element, which is going to be slow. Maybe with R2024b you can use typecast with a custom struct as an argument?

If you do not care about size, just use doubles for all datatypes.
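On the `struct.unpack` suggestion: for a stream of identical fixed-size records, `struct.iter_unpack` walks the buffer in one pass. The (U32 timestamp, SGL reading) record layout here is hypothetical, records packed back to back with no length prefix:

```python
import struct

# Stand-in bytes: two big-endian (U32 timestamp, SGL reading) records.
data = struct.pack('>If', 1, 0.5) + struct.pack('>If', 2, 0.25)

# iter_unpack yields one tuple per record until the buffer is exhausted:
samples = list(struct.iter_unpack('>If', data))
```

NumPy can do the same in one vectorized call with a structured dtype, e.g. `numpy.frombuffer(data, dtype=numpy.dtype([('t', '>u4'), ('v', '>f4')]))`.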

Message 6 of 15

I heartily recommend not reinventing this particular wheel. Try TDMS.

Message 7 of 15

"File size: The file is already large"

 

Can you say how big?

 

Some might feel like 1 meg is a "large file".  Others wouldn't start calling it a "large file" until it hits gigabyte levels.

 

Is there a limitation here?  Is it directly related to size (disk space, transmission time over a network)?  Or is it indirect (time it takes to save/load the file, pack/unpack data)?  Or is it more of a fear of a limitation that hasn't been reached yet?

 

Are there other potential data integrity issues here, like making sure floats are loaded and saved bit-perfectly as opposed to getting some rounding on the least significant digits?

 

My view here is that unless you are coming up against a limitation on the technical side, simplicity and ease of use should rule.  Take the route that creates the least potential for misunderstandings, requires the least amount of "computer science" knowledge to decode, and uses encoding that works across as many platforms with premade tools as possible.  Right now people are using Python/Matlab, but what happens tomorrow when someone wants to import it to Excel?  Then someone wants to upload it to a database?

 

Storing things in direct binary has its uses where performance rules all, but it doesn't sound like this is the case yet... hard to know without specifics.

Message 8 of 15

See if I can answer/respond to some of the questions and suggestions:

 

File Size: We've seen close to 2 GB.  We're unpacking the data into something readable, and need to do it quickly.  As much as I like XML and what it can do, XML would be significantly larger.  That also rules out DBL, since it would double the file size.

 

TDMS: I have thought about TDMS files, but I have very little experience with them.  I had always assumed they were proprietary to LabVIEW (no real reason why I thought that).  Is that something Python/Matlab can read?

 

 U32 to SGL: Sorry, yes, by Float I mean a 32-bit single-precision float.  I created a very simple VI that starts with a U32 and increments it each iteration, carrying it around in a shift register.  Each iteration, the incremented value is converted to single precision, and the loop stops if the previous SGL equals the current SGL.  Attached are the final numbers once it hits a repeated value.

[Attached image: LuminaryKnight_0-1738868396742.png]

 

Message 9 of 15