LabVIEW Idea Exchange

Petru_Tarabuta

"Get Data Size" node

Status: New

It would be useful to have a node that could be fed any wire and return the size (in bytes) of the data on that wire. In other words, the node would return the number of bytes that the wire occupies in memory.

 

1 (edited).png

For example, the node would return a value of:

  • 1 byte when fed a U8 wire
  • 2 bytes when fed a U16
  • 4 bytes when fed an I32
  • 8 bytes when fed a DBL
  • 800 bytes when fed a 1D array that contains 100 DBL elements
  • 9 bytes when fed a cluster that contains a DBL and a U8
  • 9 bytes when fed an object that contains a DBL and a U8
  • 18 bytes when fed an object that contains two other objects that occupy 9 bytes each
  • and so on
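
As a rough illustration of the rule the list above follows (scalars report their native width; arrays and clusters report the sum of their elements), here is a small Python sketch. The `LV_SIZES` table and helper names are purely hypothetical stand-ins for the proposed node, not any LabVIEW API:

```python
import struct

# Hypothetical mapping from LabVIEW scalar types to byte widths,
# using struct format codes as stand-ins (not a LabVIEW API).
LV_SIZES = {
    "U8":  struct.calcsize("B"),  # 1
    "U16": struct.calcsize("H"),  # 2
    "I32": struct.calcsize("i"),  # 4
    "DBL": struct.calcsize("d"),  # 8
}

def array_size(element_type, n):
    """Payload size of a 1D array: element size times element count."""
    return LV_SIZES[element_type] * n

def cluster_size(*element_types):
    """Payload size of a cluster: sum of its elements (padding ignored)."""
    return sum(LV_SIZES[t] for t in element_types)

print(array_size("DBL", 100))     # 800, as in the list above
print(cluster_size("DBL", "U8"))  # 9, as in the list above
```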

Notes

  • The node would enhance LabVIEW programmers' ability to monitor and audit memory usage.
  • The node may serve as an additional tool to detect memory leaks (by repeatedly calling the node on the same wire and checking whether the size is going up).
  • The node would also appeal to programmers interested in performance, and would enable them to learn more about LabVIEW internals.
  • The node would be especially useful for querying the size of complex data structures, such as objects that contain other objects that themselves contain objects, or clusters that contain arrays, or arrays that contain clusters, or objects that contain DVRs.
  • I would be happy if the node had a second input named "Mode" (or similar). This input may be a typedef enum with items named "Shallow Measurement" and "Deep Measurement" (or similar). This input could be required, recommended, or optional.
    • When "Shallow Measurement" mode is selected, the node would return the size of all the by-value data fields in the main input wire, but would not add up the size of data referenced by DVRs or other references. For example, a wire that contains a cluster that contains a DBL, a U8, and a DVR would return perhaps 13 bytes (8 + 1 + presumably 4 bytes for the DVR reference itself). It would not add to the result the size of the data referenced by the DVR.
    • When "Deep Measurement" is selected, the node would recursively scan all data structures, including DVRs and other references.
  • In both "Shallow Measurement" and "Deep Measurement" modes there should be no limit to the scanning depth. In other words, if a cluster contains a cluster that contains a cluster and so on, they should all be measured regardless of the nesting depth. Similarly, if "Deep Measurement" is selected, and a DVR contains a DVR that contains a DVR and so on, the data behind all these DVRs should be added to the total.
  • When in "Deep Measurement" mode and fed a Queue reference wire, the node could perhaps return the size of all the data in the queue. In other words, the size of all the elements present in the queue.
  • Perhaps the LabVIEW compiler already uses a "Get Data Size" function internally? If such a function already exists, perhaps it would be relatively straightforward to expose it as a node in the palettes?
  • Perhaps the best location for this node would be in the Programming >> Application Control >> Memory Control palette.
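
To make the proposed "Mode" input concrete, here is a minimal Python model of the two behaviors. Everything here is a hypothetical sketch: scalars are modeled as (type, bytes) tuples, clusters/arrays as lists, and the 4-byte reference size is an assumption, not a measured LabVIEW value:

```python
from enum import Enum

class Mode(Enum):
    SHALLOW = "Shallow Measurement"
    DEEP = "Deep Measurement"

REF_SIZE = 4  # assumed size of the reference value itself (an assumption)

class DVR:
    """Hypothetical stand-in for a LabVIEW Data Value Reference."""
    def __init__(self, value):
        self.value = value

def get_data_size(data, mode: Mode) -> int:
    """Recursively measure data, with no limit on nesting depth."""
    if isinstance(data, DVR):
        size = REF_SIZE                      # the reference itself is always counted
        if mode is Mode.DEEP:
            size += get_data_size(data.value, mode)  # follow the reference
        return size
    if isinstance(data, list):               # cluster or array: sum the elements
        return sum(get_data_size(e, mode) for e in data)
    _, nbytes = data                         # scalar, e.g. ("DBL", 8)
    return nbytes

# Cluster of {DBL, U8, DVR -> cluster of two DBLs}, as in the example above
cluster = [("DBL", 8), ("U8", 1), DVR([("DBL", 8), ("DBL", 8)])]
print(get_data_size(cluster, Mode.SHALLOW))  # 13 = 8 + 1 + 4
print(get_data_size(cluster, Mode.DEEP))     # 29 = 13 + 16
```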

2.png

Thanks!

12 Comments
fefepeto_kb
Member

I'm coming from the perspective of a low-level programmer (C for microcontrollers) here, so if this feels nitpicky, I'm really sorry. Also note that my goal is to clarify the memory management behind the scenes of the processors we widely use today, and I oversimplify things here.

 

So, first of all, what does a variable look like in memory? A type descriptor and a value. Why is that? Simply put: what is the difference if an Int8 or a UInt8 is stored in memory? From the value's perspective, nothing. If one sees 0b10000001, they cannot tell whether that's -127, 129, or ü (an extended-ASCII character). So, in order for the language to handle and display values correctly, it stores a type descriptor before the value.
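
That same-bits, different-types point can be demonstrated with Python's struct module (using code page 437, one of the legacy 8-bit code pages where 0x81 maps to ü):

```python
import struct

raw = bytes([0b10000001])  # the single byte 0x81

as_i8 = struct.unpack("b", raw)[0]   # interpreted as signed 8-bit
as_u8 = struct.unpack("B", raw)[0]   # interpreted as unsigned 8-bit
as_chr = raw.decode("cp437")         # interpreted as an 8-bit character

print(as_i8, as_u8, as_chr)  # -127 129 ü
```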

 

Well, that sounds good, but what about arrays (and also strings, since they are arrays of chars at the memory level)? Arrays are similar, with a little difference. The type descriptor also stores whether the variable is an array, and the low-level type of the variables inside the array, but before storing the values it stores the size(s) of the array. So it looks like: type descriptor, length, values. Don't quote me on the exact order here.

 

Sounds good, but what about complex types, like clusters? Well, they are similar but different. A cluster also has a leading type descriptor, signaling to the language that it is a complex type or structure. Then it stores the length of the data, including further type descriptors and their specific values.

 

Things are getting complicated, but let's complicate them a little more. Let's introduce pointers. A pointer is a type that stores the memory address of a variable. It allows referencing the same value without making data copies of it, which also has its own overhead. But how is a pointer stored in memory? Surprise surprise: it has a type definition telling the processor that it is a pointer, then a type definition of the variable, then a value, which points to the beginning of the variable's memory storage, to its type definition or length.

 

Great, we covered everything, right? No. One more commonly used complex type is the linked list. Note that it is not directly available in LabVIEW. It is similar to an array on the surface and very different at the low level. Basically, a linked list is a bunch of variables in different spots of memory, not necessarily stored contiguously like arrays. The additional thing is a pointer to the next element in the list (plus another pointer to the previous element if we are talking about a doubly linked list). It is easy to see that this allows a lot of operations to happen more efficiently than with arrays; for example, removing one element is simply changing the pointer of the previous element to point to the next element. But it also makes seeking more time consuming.
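
The per-element pointer overhead and the cheap removal described above are easy to see in a sketch. This is illustrative Python (a hypothetical `Node` class), not anything LabVIEW provides; `sys.getsizeof` reports the per-node footprint, which exceeds the 8-byte double payload:

```python
import sys

class Node:
    """Singly linked list node: an 8-byte double payload plus a 'next' pointer."""
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

# Build a 3-element list: 1.0 -> 2.0 -> 3.0
head = Node(1.0, Node(2.0, Node(3.0)))

# Removing the middle element is just one pointer change; no data moves:
head.next = head.next.next

# The per-node footprint (object header + two slots) exceeds the 8 data bytes:
print(sys.getsizeof(head))
```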

 

So, with all this knowledge made available, the following questions need answers:

  1. Are we looking for the data size or the actual memory occupation?
  2. If I recall correctly, LabVIEW also stores pointers to arrays, strings, and variants inside clusters. What are we interested in here: the storage size of the cluster, or should the array's memory occupation also count? In the latter case, the array will be its own occupation + 4 or 8 bytes (in the case of 64-bit memory addressing) for the pointer.

And, to provide something that can be investigated already: type casting anything to a U8 array and getting the array size will return the size of the values stored in memory, in bytes.

If the overall memory occupation is the point of interest then flattening to string and parsing out the sizes might be helpful.
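
Both workarounds rely on the flattened form carrying the payload plus small size prefixes. This Python sketch mimics the flattened layout of a 1D DBL array (a big-endian i32 element count followed by big-endian 8-byte elements, the layout LabVIEW's Flatten To String uses for arrays by default); the helper name is hypothetical:

```python
import struct

def flatten_dbl_array(values):
    """Mimic the flattened form of a 1D DBL array: a big-endian i32
    element count followed by the big-endian 8-byte elements."""
    return struct.pack(">i", len(values)) + struct.pack(f">{len(values)}d", *values)

flat = flatten_dbl_array([1.0] * 100)
print(len(flat))      # 804 flattened bytes in total
print(len(flat) - 4)  # 800 payload bytes once the 4-byte size is parsed out
```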

 

For reference leaks, I personally use DETT, desktop execution trace toolkit.

For memory leaks, well, that is trickier. LabVIEW depends on the garbage collector of the OS. For example, if you resize an array and there is not enough free space behind the original allocation, LabVIEW makes a memory copy to a space where it can allocate the whole new size. But the old data does not get deleted (in the sense of zeroed out). It gets "freed", meaning that the OS gets the information that the memory space is no longer in use and can reallocate it for other purposes. On Windows the garbage collector is quite lazy, so you will see apparent memory leaks, but the proposed node would not represent this. For this, you can follow the memory consumption in Task Manager. A simple demonstration of how lazy the garbage collector is: load a variant with thousands of attributes, then terminate the wire (maybe in a loop with a manual stop condition so you can observe what happens). The second scenario would be a variant loaded with thousands of attributes which get deleted before the wire is let loose (here, having execution highlighting on might help you observe how the memory usage changes).

 

Another thing to consider is the Tools -> Profile -> Performance and Memory... tool.

 

References to LabVIEW help to get an in detail overview of LabVIEW types inside memory:

Numeric Data Types Table - NI

How LabVIEW Stores Data in Memory - NI

Type Descriptors - NI

 

Another useful thing for these types of questions is the LabVIEW performance training. I did it while I was at NI, and it is really a game changer for writing both memory- and processor-efficient algorithms.

Petru_Tarabuta
Active Participant

Hi fefepeto_kb,

Thanks for the detailed reply, and for highlighting the question of type descriptors, pointers, and array size. For brevity I will refer to these additional (but vital) pieces of data as "metadata".

 

I would be happy with any/all of the following three implementations:

  • Implementation 1: The node returns the size of the actual data and ignores the size of metadata. For example, the node would return a value of 800 bytes when fed a 1D array of 100 DBL elements.
  • Implementation 2: The node returns the size of (data + metadata). For example, the node would return a value of 812* bytes when fed the same 1D array of 100 DBL elements. *assuming 812 = Size of pointer (8 bytes) + Array Size (4 bytes) + 800 bytes (size of data).
  • Implementation 3: The node returns both values: the data size and (data + metadata) size, or data size and metadata size.

I would slightly prefer implementation 2 or 3 over implementation 1. Ignoring the size of metadata may lead to underestimating the size of some data structures. For example, a linked list where each element is a DBL may contain more metadata than data (by size).
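
The 812* arithmetic above generalizes to a simple formula. This Python sketch uses the pointer and dimension sizes assumed in the post (8 and 4 bytes); both constants are assumptions, not measured LabVIEW values:

```python
POINTER_SIZE = 8  # assumed 64-bit handle/pointer (an assumption)
DIM_SIZE = 4      # assumed i32 dimension size stored per array dimension

def array_sizes(n_elements, element_bytes, n_dims=1):
    """Return (data, metadata, total) byte counts for an array under the
    assumed handle layout discussed above."""
    data = n_elements * element_bytes
    metadata = POINTER_SIZE + n_dims * DIM_SIZE
    return data, metadata, data + metadata

print(array_sizes(100, 8))  # (800, 12, 812): the 812* figure for 100 DBLs
```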

Mads
Active Participant

A primitive to return size would be more efficient than having to do a flatten and length check. The function could return both the data size and full memory footprint.

 

As for the VI profiler, I think it is less than optimal for detecting memory leaks and the like, as it only presents the current numbers.

It leaves it to us as users to manually trend and detect issues. I have suggested improvements to that here:

https://forums.ni.com/t5/LabVIEW-Idea-Exchange/Trends-and-analytical-features-in-the-Profile-Perform...

fefepeto_kb
Member

A clarification, after realizing that my response was ambiguous.

I did not specify the intent behind those suggestions: they are stopgaps, there to get something as close to the desired functionality as possible now.

I mentioned profiler for getting a feel of the overall memory usage of the VI, not for an in depth analysis or leakage detections.

 

I also mentioned the LabVIEW Performance Course because it gives basic instruction on getting a quick overview of application performance, and also best practices to avoid memory leaks in the first place.

 

An addition to my first response: using the buffer allocation tool can help predict places where memory leaks could occur, by showing where memory allocations are likely to happen. Notice the hedged words ("can help", "likely"). I think those are the areas that need to improve in order to have high confidence in the output of the Buffer Allocation tool for these use cases.

wiebe@CARYA
Knight of NI

There are ways to do this.

wiebeCARYA_2-1715590175987.png

 

I've never needed this myself, and I don't see a mention of a use case. "Useful especially to query the size of complex data structures" comes closest, but "query the size of a complex data structure" isn't exactly a rationale for getting the size...

Petru_Tarabuta
Active Participant

Hi Wiebe, thanks for the snippet. It's good that there are straightforward ways to get the size of a by-value wire, but the limitations are:

  • No easy, recursive measurement of data that is referenced by DVRs, Queues, Events (and other references)
  • Metadata size (type descriptors, array sizes) may not be correctly reported. For example, in your snippet a value of 4 bytes is output in some cases. Presumably this accounts for the array size. But it seems that the type descriptor size is not added to that output. Moreover, the proposed node would output two values, such that it would be possible to distinguish the "real" data size from the metadata size.

I don't have a particular use case in mind, but I think that the node would enable lots of people (myself included) to understand LabVIEW better. It would enable us to optimise code in ways we may not be aware of right now.

fefepeto_kb
Member

Desktop Execution Trace Toolkit did that, and also allowed you to check the timing of memory operations.

Unfortunately it is not supported beyond 2021, and I'm not aware of any replacements. I use LabVIEW 2021, and it allows me to detect reference leaks (with the origin of the references identified), queue operations (crucial with parallel execution of multiple threads), and also events.

Maybe all we need is a tool that can do this, or even more.

fefepeto_kb_0-1715667512783.png

wiebe@CARYA
Knight of NI

@Petru_Tarabuta wrote:

But it seems that the type descriptor size is not added to that output. Moreover, the proposed node would output two values, such that it would be possible to distinguish the "real" data size from the metadata size.


I'm not sure why I'd need this, so I also don't know why the type descriptor should be included.

 

A few of these nodes return the type descriptor:

wiebeCARYA_0-1715705164528.png

I think the biggest downside is that the data often needs to be copied to determine its size (or type descriptor).

raphschru
Active Participant

@Petru_Tarabuta wrote:

[...] No easy, recursive measurement of data that is referenced by DVRs, Queues, Events (and other references)


What if the same reference appears several times in your data structure? Recursing several times into the same DVR/Queue/Event would be irrelevant to measuring the size of your total data, unless you ensure each instance of referenced data is evaluated only once.
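
One way to honor that requirement is to deduplicate by reference identity during the recursive walk. A minimal Python sketch, with a hypothetical `Ref` class standing in for DVR/Queue/Event references and scalars modeled directly as their byte counts:

```python
REF_SIZE = 4  # assumed size of the reference value itself (an assumption)

class Ref:
    """Hypothetical stand-in for a DVR/Queue/Event reference."""
    def __init__(self, value):
        self.value = value

def deep_size(data, seen=None) -> int:
    """Sum scalar byte counts, following references but counting each
    referenced block only once (deduplicated by identity)."""
    if seen is None:
        seen = set()
    if isinstance(data, Ref):
        if id(data.value) in seen:
            return REF_SIZE                  # already counted: only the reference
        seen.add(id(data.value))
        return REF_SIZE + deep_size(data.value, seen)
    if isinstance(data, list):               # cluster or array
        return sum(deep_size(e, seen) for e in data)
    return data                              # scalar, modeled as its byte count

payload = [8, 8]             # 16 bytes of referenced data
shared = Ref(payload)
cluster = [shared, shared]   # the same reference appears twice
print(deep_size(cluster))    # 24 = 16 (counted once) + 2 * 4 (references)
```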

 


@Petru_Tarabuta wrote:

Metadata size (type descriptors, array sizes) may not be correctly reported. For example, in your snippet a value of 4 bytes is output in some cases. Presumably this accounts for the array size. But it seems that the type descriptor size is not added to that output. Moreover, the proposed node would output two values, such that it would be possible to distinguish the "real" data size from the metadata size.


The array/string/... sizes are always reported correctly by the flatten functions. Also, I'm not sure why you need the type descriptors to measure the data size. They are only descriptions of the wire types and do not count toward the data size (unless you systematically turn all your data into variants for no reason, which is considered bad practice).

Petru_Tarabuta
Active Participant

"I think the biggest downside is that the data often needs to be copied to determine it's size (or type descriptor)." - I agree that the Get Data Size operation may be expensive. If that's the case then it would be up to the programmer to use it judiciously. Perhaps the node could output a "Time Elapsed (ms)" output. This would highlight that the operation can be expensive.

 

"What if the same reference appears several times in your data structure? Recursing several times in the same DVR/Queue/Event would be irrelevant to measure the size of your total data, unless you only enable to evaluate each instance of referenced data once." - The underlying data should be counted only once.