10-01-2024 07:55 PM - edited 10-01-2024 08:00 PM
I'm curious about how memory is used when inserting into an array vs reshaping an array to make it one larger and then replacing the value. I know it's generally bad practice to grow an array using a loop, but I was surprised to see such a difference between these two examples.
Example 1) Insert into array.
From what I could find, Insert Into Array makes a copy of the array with the new value added, so I expected this to be slow for large arrays.
Example 2) Reshape array and replace
I would have thought this would take about the same amount of time as Insert Into Array. However, when I timed each, I found that reshaping and replacing is much faster once there are more than 10,000 iterations, and it scales better.
Does anyone have any idea why reshaping and replacing appears to be faster here? I can't find any information about how memory is handled by Reshape Array, but maybe I haven't looked in the right places. Again, I know this is bad practice and this probably isn't a thorough timing test, but I didn't expect this result.
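Since the snippets don't paste as text, here is a rough C analogy of the two patterns (illustrative only; the function names are mine, and this is not how LabVIEW is actually implemented under the hood):

#include <stdlib.h>
#include <string.h>

/* Example 1 analogy: Insert Into Array -- allocate a new buffer one
   element larger and copy all existing data on every iteration. */
double *insert_copy(double *arr, size_t n, double value)
{
    double *bigger = malloc((n + 1) * sizeof *bigger);
    memcpy(bigger, arr, n * sizeof *arr);   /* full copy every time */
    bigger[n] = value;
    free(arr);
    return bigger;
}

/* Example 2 analogy: Reshape Array + Replace Array Subset -- ask the
   allocator to grow the existing block, which can often be extended
   in place without copying. (NULL checks omitted for brevity.) */
double *reshape_replace(double *arr, size_t n, double value)
{
    double *bigger = realloc(arr, (n + 1) * sizeof *bigger);
    bigger[n] = value;
    return bigger;
}

If LabVIEW's reshape behaves anything like the realloc path, that alone could explain a large timing difference, since the insert path always pays for a full copy.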
10-01-2024 10:02 PM - edited 10-01-2024 10:04 PM
It is hard to tell from a snippet; for example, we can't tell the debugging and other settings. It is also not obvious why you don't use a for loop, since the number of iterations is known before the loop starts. Why do you think you need the tick subtraction with every single iteration even though only the last one matters? What's wrong with the +1 primitive?
You should also be aware that a new allocation for the larger array is not done with each iteration, but preemptively in larger and larger chunks, and the allocation algorithm could well differ slightly between the two versions. It would need detailed analysis to test that.
As a first task, create a proper testing harness and redo the tests.
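The skeleton of such a harness, sketched here in C because diagrams don't paste as text (the same idea applies on the diagram: time only the code under test, and repeat the measurement):

#include <time.h>

/* High-resolution timestamp (assumes POSIX clock_gettime; the LabVIEW
   counterpart would be High Resolution Relative Seconds). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Run the code under test several times and keep the best time, which
   filters out interruptions from the OS and other processes. */
static double best_of(void (*fn)(size_t), size_t n, int repeats)
{
    double best = 1e300;
    for (int r = 0; r < repeats; r++) {
        double t0 = now_seconds();
        fn(n);
        double dt = now_seconds() - t0;
        if (dt < best)
            best = dt;
    }
    return best;
}

You would then compare, say, best_of(insert_version, n, 5) against best_of(reshape_version, n, 5) over a range of n (the function names are placeholders for the two implementations under test).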
All that said, why test two inefficient ways to do something if saner ways exist?
10-02-2024 01:35 AM
Hi JC,
@JC990 wrote:
I know it's generally bad practice to grow an array using a loop, but I was surprised to see such a difference between these two examples.
10-02-2024 09:06 AM
@altenbach wrote:
It is hard to tell from a snippet; for example, we can't tell the debugging and other settings. It is also not obvious why you don't use a for loop, since the number of iterations is known before the loop starts.
This is just me playing around. I often don't know the size I'll need ahead of time, so I'll either buffer the data or pre-allocate more space than I'll need. Also, I sometimes like to deepen my understanding by "breaking" things and figuring out what's going on.
You should also be aware that a new allocation for the larger array is not done with each iteration, but preemptively in larger and larger chunks, and the allocation algorithm could well differ slightly between the two versions. It would need detailed analysis to test that.
I assume this would be "under the hood" and information about the algorithms isn't available to the public?
Why do you think you need the tick subtraction with every single iteration even though only the last one matters? What's wrong with the +1 primitive?
I am embarrassed I did that and my only (rather poor) excuse is that I was tired and distracted. Regardless, I would hope those inefficiencies wouldn't slow things down much compared to the array functions.
10-02-2024 10:04 AM - edited 10-02-2024 10:04 AM
@JC990 wrote:
I assume this would be "under-the-hood" and information about the algorithms aren't available to the public?
Yes, that is an internal detail and not documented openly. Not because it is top secret, but for the simple reason that it is part of the LabVIEW optimizations and as such is regularly reviewed and revised as new insights and use patterns are discovered and new and different hardware platforms are introduced. There is even a chance that it is not the same on every LabVIEW hardware platform, depending on the capabilities of the underlying hardware and the actual LLVM compiler version used in the final compilation step to the hardware target code. So documenting it would either create a hard fact that prevents future improvements (if they want to keep the documented method intact) or produce old and stale documentation that people stumble over later and complain about when the implementation no longer adheres to it.
10-02-2024 10:12 AM
In theory, Reshape Array could reserve some extra space in advance, so the number of reallocations would be lower. You might be able to see this when profiling with High Resolution Relative Seconds (which has much higher precision than Tick Count); then you would probably see some "steps" on the graph. Maybe. I usually preallocate as much as possible and then use Replace Array Subset; this is common practice.
10-02-2024 10:19 AM
There is absolutely nothing wrong with testing and comparing various implementations of an algorithm. You could even create an array of loop times to potentially see at which sizes allocations occur (but please use High Resolution Relative Seconds, because Tick Count is not accurate enough). You could even fill the array with the loop times instead of boring random numbers to get much richer information.
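In C-ish form, that idea looks roughly like this (a sketch only, reusing the grow-by-one pattern; in LabVIEW you would wire High Resolution Relative Seconds into the growing array instead):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    size_t n = 0;
    double *times = NULL;
    for (size_t i = 0; i < 100000; i++) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        /* Grow by one element and store the timestamp itself, so the
           data doubles as the timing record. (Error handling omitted.) */
        times = realloc(times, (n + 1) * sizeof *times);
        times[n++] = ts.tv_sec + ts.tv_nsec * 1e-9;
    }
    /* Spikes in consecutive differences mark the iterations where the
       allocator had to move the block and copy everything over. */
    for (size_t i = 1; i < n; i++)
        printf("%zu\t%.9f\n", i, times[i] - times[i - 1]);
    free(times);
    return 0;
}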
(And yes, as Gerd mentioned, autoindexing at the output tunnel would give the same results, but when done on a while loop with an initially unknown number of iterations, this has exactly the same potential memory-thrashing issues. No advantage except code simplicity.)
Good benchmarking is quite difficult and there are many landmines that can give false results. For some points, have a look at our talk from a few years ago.
In a typical scenario, you have reasonable guesses of the final size and even pre-allocating double that, then trimming at the end would be much more efficient.
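As a sketch of that "guess big, trim later" pattern (in C; keep_going() and next_sample() are hypothetical stand-ins for whatever produces the data):

#include <stdlib.h>

/* Hypothetical data source, stubbed out so the sketch is complete. */
static int keep_going(void) { static int i; return i++ < 1000; }
static double next_sample(void) { return 42.0; }

/* Guess big, fill, trim once: one allocation up front and one shrink
   at the end, instead of thousands of incremental grows. */
static double *collect(size_t expected, size_t *out_n)
{
    size_t cap = 2 * expected;             /* generous over-allocation */
    double *arr = malloc(cap * sizeof *arr);
    size_t n = 0;
    while (n < cap && keep_going())
        arr[n++] = next_sample();
    *out_n = n;
    return realloc(arr, n * sizeof *arr);  /* single trim to final size */
}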
For arrays up to "medium" size (whatever that means!) it does not really matter, but nobody in his right mind would fill hundreds of millions of samples one element at a time in tight loops. Whenever the array grows past the allocated slack space, a new contiguous block of memory needs to be found and all current data copied over. Due to the resulting memory fragmentation, you will run out of sufficient contiguous memory way before you run out of physical memory. Remember, arrays MUST be contiguous; they cannot be fragmented.
Happy testing!!!!
10-02-2024 10:20 AM - edited 10-02-2024 10:46 AM
@Andrey_Dmitriev wrote:
In theory reshape could insert some extra space in advance and amount of reallocations will be lower.
Basically, this is the approach the while loop uses with autoindexing output tunnels. It starts with a relatively moderate array size of a few dozen elements and then doubles the amount whenever needed. When the loop finally terminates, the array is reshaped one last time to the final size and then the code continues. This reduces the number of memory allocations to about log2(n), which is considerable for anything but short arrays.
Even better, of course, is a for loop. The array is fully pre-allocated at the beginning of the loop and then possibly reshaped at the end if the loop was aborted prematurely through its conditional terminal. This causes at most two memory manager calls (a huge saving of time and CPU!).
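A quick C sketch of the difference, counting allocator calls (the real chunk sizes and growth factor inside LabVIEW are not public, so the starting size of 32 and the doubling here are only illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t n = 1000000;

    /* While-loop style: start small, double the capacity when full,
       trim once at the end -> roughly log2(n) reallocations. */
    size_t cap = 32, allocs = 1;
    double *a = malloc(cap * sizeof *a);
    for (size_t i = 0; i < n; i++) {
        if (i == cap) {
            cap *= 2;
            a = realloc(a, cap * sizeof *a);
            allocs++;
        }
        a[i] = (double)i;
    }
    a = realloc(a, n * sizeof *a);   /* final trim to the exact size */
    allocs++;
    printf("doubling: %zu allocator calls for %zu elements\n", allocs, n);

    /* For-loop style: the size is known up front -> one allocation. */
    double *b = malloc(n * sizeof *b);
    for (size_t i = 0; i < n; i++)
        b[i] = (double)i;
    printf("preallocated: 1 allocator call\n");

    free(a);
    free(b);
    return 0;
}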
10-02-2024 10:23 AM
@rolfk wrote:
@Andrey_Dmitriev wrote:
In theory reshape could insert some extra space in advance and amount of reallocations will be lower.
Basically this is the approach the for loop uses when using autoindexing output tunnels.
I think you are talking about WHILE loops. In FOR loops, the final size is exactly known before the loop starts and the entire output can be allocated at once.
(Well, now we also have the conditional terminal, but that's not typical)
10-02-2024 10:49 AM - edited 10-02-2024 10:53 AM
@altenbach wrote:
@rolfk wrote: Basically this is the approach the for loop uses when using autoindexing output tunnels.
I think you are talking about WHILE loops.
Headslap! Of course you are right, and thanks for pointing out the brain short circuit during typing. I fixed it for clarity but want to acknowledge your correction as valid!