How does a parallel for loop affect the iteration terminal


@Quiztus2 wrote:

 

It turned out that indexing works on chunks as I expected. Every instance starts with the expected index, just as it would in a plain sequential loop. The ambiguities came from my combination of test data and processing.

 

I have a region of interest in my 3D array. I use the index values instead of Array Subset + autoindexing. I hope that I save some memory this way, since my 3D input array is several gigabytes in size.


Sorry, I cannot look at your snippet, and we also don't have any of your subVIs. Are they from a toolkit or home-made? What do they do? Are all the slow ones reentrant or even inlined?

For a 3D array, it probably also depends on how you "slice it" during processing. It helps if, for example, the processed planes are adjacent in memory (see comment below).

 

(For example, if you index into a just-transposed 2D array to process columns instead of rows, the compiler might decide to just swap the indices instead of doing an actual transpose first, making the accessed elements non-adjacent. You might get a significant speedup if you do a real transpose, e.g. with an "always copy" or by using the transpose function from the linear algebra palette instead of "transpose array".)
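
LabVIEW code doesn't paste well as text, but the memory-adjacency effect is easy to sketch in another language. A minimal NumPy illustration with a made-up array size (the principle, not anyone's actual code):

```python
import numpy as np
import timeit

a = np.random.rand(4000, 4000)

# a.T is only a view with swapped strides, so its rows are
# non-adjacent in memory -- analogous to the compiler merely
# swapping indices instead of really transposing.
lazy_t = a.T
# Forcing an actual copy makes the rows adjacent again,
# analogous to a real transpose ("always copy").
real_t = np.ascontiguousarray(a.T)

# Row-wise sums stride through memory on the view but run over
# adjacent elements on the copy; the copy is typically much faster.
print(timeit.timeit(lambda: lazy_t.sum(axis=1), number=20))
print(timeit.timeit(lambda: real_t.sum(axis=1), number=20))
```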

 

Message 11 of 25
(1,741 Views)

The maximum number of parallel loop instances will affect the behavior.

 

IIRC, it's 32 by default, as it is for some reason linked to a fictional max number of logical cores that seemed reasonable at the time it was made up.

 

In LabVIEW.ini:

ParallelLoop.MaxNumLoopInstances=500000

 

There doesn't need to be a limit...

 

Although it makes some sense to tie the number of parallel instances to the number of logical cores (as an attempt to distribute the load evenly across them), there's no guarantee that that will happen. There's other stuff (like Windows itself) utilizing the cores.

 

Also, spreading CPU load is not the only use case for parallel loop execution.

 

 

Message 12 of 25
(1,713 Views)

I attached a simplified VI. I have to work on every specific pixel along a set of consecutive shots. Since my first coordinate is the shot index (z) and not the pixel coordinates (x, y), I can't use autoindexing well here.

simplified.png

Message 13 of 25
(1,692 Views)

You may be able to save some memory and improve speed if the "whack-a-mole" dots I am seeing are correct.

 

  1. After you extract your data points from the 3D array, DO NOT reshape the array.
  2. DO NOT change the array to doubles, let the multiplication handle it.
  3. The summation works on multidimensional arrays, so no need to reshape.

This may improve speed and some memory usage.

 

mcduff_0-1696522343135.png
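
For readers without LabVIEW in front of them, a rough text-language sketch of those three points, assuming NumPy as a stand-in (the shape and scale factor are made up):

```python
import numpy as np

# Hypothetical U16 data block: 5000 shots of a 10x10 pixel ROI.
block = np.random.randint(0, 65535, size=(5000, 10, 10), dtype=np.uint16)
scale = 0.123  # made-up scale factor

# Point 3: summation works on multidimensional arrays, no reshape needed.
# Point 2: multiplying by a float scalar performs the conversion to double
# and the math in one step, instead of a separate conversion pass.
result = (block * scale).sum()
```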

 

Message 14 of 25
(1,682 Views)

@Quiztus2 wrote:

I attached a simplified VI. I have to work on every specific pixel along a set of consecutive shots. Since my first coordinate is the shot index (z) and not the pixel coordinates (x, y), I can't use autoindexing well here.


 

All your cluster elements are zero by default. Does the ROI correspond to "everything" or to a subset of the full array? I would probably take the 3D ROI subset before the loop, then use "Index Array" to get the data columns directly as 1D arrays.
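
Sketched in text form, assuming NumPy as a stand-in and made-up ROI bounds, the suggestion is roughly:

```python
import numpy as np

stack = np.random.randint(0, 65535, size=(5000, 64, 64), dtype=np.uint16)

# Take the 3D ROI subset once, before the loop...
y0, y1, x0, x1 = 10, 20, 30, 40          # hypothetical ROI bounds
roi = stack[:, y0:y1, x0:x1]

# ...then index each pixel's z-column directly as a 1D array.
for y in range(roi.shape[1]):
    for x in range(roi.shape[2]):
        column = roi[:, y, x]            # one pixel across all shots
        # ...heavy lifting on `column` goes here...
```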

 

Not sure why you even enter the loop stack if there is an error. Is there anything (in the full code) in the inner loop that can generate an error?

 

Message 15 of 25
(1,678 Views)

@mcduff wrote:

You may be able to save some memory and improve speed if the "whack-a-mole" dots I am seeing are correct.

 

  1. After you extract your data points from the 3D array, DO NOT reshape the array.

My impression was that the "some heavy lifting" code is very different in the real code and the summing is just a simplified stand-in.

 

altenbach_0-1696523559996.png

 

Still, if that's the real code, we should take the sum first and scale it later (1 multiplication instead of N multiplications!). Of course, the compiler might figure it out either way. 😄

 

altenbach_0-1696524327737.png
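
In text form (NumPy as a stand-in, hypothetical scale factor), the difference is:

```python
import numpy as np

column = np.random.randint(0, 65535, size=5000, dtype=np.uint16)
c = 0.123  # hypothetical scale factor

n_multiplies = (column * c).sum()   # N multiplications plus a temporary array
one_multiply = column.sum() * c     # sum first, then a single multiplication
assert np.isclose(n_multiplies, one_multiply)
```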

 

Message 16 of 25
(1,669 Views)

@altenbach wrote:

@mcduff wrote:

You may be able to save some memory and improve speed if the "whack-a-mole" dots I am seeing are correct.

 

  1. After you extract your data points from the 3D array, DO NOT reshape the array.

My impression was that the "some heavy lifting" code is very different in the real code and the summing is just a simplified stand-in.

 

altenbach_0-1696523559996.png

 

Still, if that's the real code, we should take the sum first and scale it later (1 multiplication instead of N multiplications!). Of course, the compiler might figure it out either way. 😄

 


Just like Ivory soap, you are 99.99% correct. But in this case, if the code is real, the pixels are in U16. You would need to convert to double to avoid potential overflow, but that still may be faster than a multi-million-point multiplication. That is why I suggested letting the multiplication do the conversion for you; take care of the memory allocation and the math operation at the same time.

Message 17 of 25
(1,662 Views)

Never mind. @altenbach already mentioned this. I'll keep my comment; maybe two different explanations help.


@mcduff wrote:

You may be able to save some memory and improve speed if the "whack-a-mole" dots I am seeing are correct.

 

  1. After you extract your data points from the 3D array, DO NOT reshape the array.
  2. DO NOT change the array to doubles, let the multiplication handle it.
  3. The summation works on multidimensional arrays, so no need to reshape.

This may improve speed and some memory usage.

 

mcduff_0-1696522343135.png

 


It is also redundant to multiply the array, especially twice.

 

Simply add all the elements and then multiply the result.

 

This works because A*c + B*c = (A + B)*c...

 

So, sum the array, then multiply by the scalar twice.

 

Message 18 of 25
(1,653 Views)

@mcduff wrote:
You would need to convert to double to avoid potential overflow, but that still may be faster than a multi-million-point multiplication. That is why I suggested letting the multiplication do the conversion for you; take care of the memory allocation and the math operation at the same time.

Both the conversion to DBL and your coercion will need to allocate the full-size DBL array to be summed later (unless the compiler can do some real magic).

 

According to the diagram comments, we typically have 5000 pages, so that's the maximum number of column elements. The sum would comfortably fit in U32 (5000 × 65535 ≈ 3.3×10⁸, well below the U32 maximum of ~4.3×10⁹), so I would probably convert to U32, do the integer sum, then scale with a DBL.

 

(In fact, I would probably try a couple of different ways and compare. You can never be sure what's best. 😄)
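
One way to try a couple of different ways and compare, in a text language with NumPy as a stand-in. The array here is deliberately oversized so the timing difference shows up, which is also why the integer accumulator is U64 rather than U32 (a multi-million-element U16 sum would overflow U32; the 5000-element columns above would not):

```python
import numpy as np
import timeit

column = np.random.randint(0, 65535, size=5_000_000, dtype=np.uint16)
scale = 0.123  # hypothetical scale factor

def dbl_first():
    # Convert to DBL (allocates a full float64 copy), then sum.
    return (column.astype(np.float64) * scale).sum()

def int_first():
    # Integer sum (no float copy), then a single scalar multiply.
    return column.sum(dtype=np.uint64) * scale

assert np.isclose(dbl_first(), int_first())
print("DBL first:", timeit.timeit(dbl_first, number=20))
print("int first:", timeit.timeit(int_first, number=20))
```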

Message 19 of 25
(1,638 Views)

I think there's a (less obvious) optimization.

 

As the OP is iterating the sum over a varying range of subsets (ROIs), depending on the numbers it might be worth building a 2D array of cumulative sums (a.k.a. an integral image) first.

 

You'd be able to get the sum of any rectangular subset from just four lookups in that array: bottom-right minus top-right minus bottom-left plus top-left, if I'm not mistaken.

 

This is a bit of work to set up, but once you have the 2D integral array, getting the ROI sums will be blazingly fast.
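
A minimal sketch of the summed-area-table idea, assuming NumPy and made-up sizes. The integral array is padded with a zero row and column so the four-corner lookup needs no edge cases, and the lookups are grouped into two additions before the subtraction so the unsigned arithmetic can't underflow:

```python
import numpy as np

img = np.random.randint(0, 65535, size=(480, 640), dtype=np.uint16)

# Build the 2D cumulative-sum (integral) array once, zero-padded.
integral = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.uint64)
integral[1:, 1:] = img.astype(np.uint64).cumsum(axis=0).cumsum(axis=1)

def roi_sum(y0, y1, x0, x1):
    """Sum of img[y0:y1, x0:x1] from just four lookups."""
    return ((integral[y1, x1] + integral[y0, x0])
            - (integral[y0, x1] + integral[y1, x0]))

# Sanity check against a direct sum over one hypothetical ROI.
assert roi_sum(10, 20, 30, 40) == img[10:20, 30:40].sum()
```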

Message 20 of 25
(1,635 Views)