LabVIEW


How to make this histogram function faster

Hi there,

I was timing different parts of my code, and the histogram takes up a large share of the analysis time, so I want to speed it up. My typical data is a 1600x1200 image of U16 values, and it takes about 13 ms to run this code.

The histogram should tell me how many pixels have a value of 0, 1, 2, ... , 65535. Thank you for any insights.
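For readers who don't have the VI: here is a text-language sketch of the same counting loop. It assumes the image arrives as a flattened sequence of U16 values. Python is used only for illustration, and `u16_histogram` is a hypothetical name. Each pixel value is used directly as the bin index (the "index, increment, replace" pattern), so no sorting is involved.

```python
def u16_histogram(pixels):
    """Count how many pixels take each value 0..65535 (hypothetical helper)."""
    counts = [0] * 65536          # one bin per possible U16 value
    for v in pixels:              # a single pass over the flattened image
        counts[v] += 1            # the pixel value itself is the bin index
    return counts

# Tiny stand-in for a flattened 1600x1200 frame
hist = u16_histogram([0, 1, 1, 65535, 2, 1])
print(hist[0], hist[1], hist[2], hist[65535])  # 1 3 1 1
```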


To form a histogram of all 1600 by 1200 points, you need to look at every value and count how many are 0, how many are 1, and so on. Let's not even worry about the algorithm: you have to "process" every point in some way. Assume you have a 3 GHz machine that takes 13 ms to make the histogram for you. How many machine cycles per point is it using? Remember, it has to do a bit of work... By my calculation, the answer is about 20. That includes array access, index arithmetic, reading and writing the counter, and so on. I think this is pretty darn efficient...
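Bob's back-of-the-envelope number, written out under the same assumptions (a 3 GHz clock, one core doing all of the work, and the 13 ms measured by the original poster):

```latex
\begin{aligned}
N &= 1600 \times 1200 = 1.92 \times 10^{6}\ \text{pixels} \\
C &= (3 \times 10^{9}\ \text{cycles/s}) \times (13 \times 10^{-3}\ \text{s}) = 3.9 \times 10^{7}\ \text{cycles} \\
C / N &= \frac{3.9 \times 10^{7}}{1.92 \times 10^{6}} \approx 20.3\ \text{cycles per pixel}
\end{aligned}
```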

Bob Schor


Yeah, I'm also wondering why you think 13 ms is slow. Is your UI unresponsive while this is being generated? Can this be moved to a separate loop? Or be done only when the system is idle? Or less often? You can try using In Place Element structures to improve performance, but I'm betting the compiler is smart enough to be doing that already.


First, your data structures are at least twice as large as they need to be.

The array in the shift register and all integer diagram constants should be I32, not DBL.

You can replace the "index, increment, replace" sequence with an In Place Element (IPE) structure. (It probably won't make much of a difference, but it is cleaner code.)

Do you really encounter all possible U16 values? If not, you can use Array Max & Min to find a tighter upper bound for the histogram size.

You could split the problem: run several histograms in parallel on a multicore machine and add them up at the end.

If performance matters, disable debugging. (For an honest benchmark, make sure the compiler cannot constant-fold your test data.)
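A rough text equivalent of the I32 and Array Max & Min suggestions, written in Python for illustration only (the thread is about a LabVIEW block diagram, and `histogram_i32` is a hypothetical name). `array("i")` stores C ints, which are 32 bits on common platforms, standing in for LabVIEW's I32.

```python
from array import array

def histogram_i32(pixels):
    """Histogram with 32-bit integer bins, sized from the data's maximum."""
    if not pixels:
        return array("i")
    top = max(pixels)                       # plays the role of Array Max & Min
    counts = array("i", [0]) * (top + 1)    # 'i' = C int, 32 bits on common platforms
    for v in pixels:
        counts[v] += 1                      # same index-increment-replace step
    return counts

h = histogram_i32([16, 32, 32, 4080])
print(len(h), h[32], h[4080])  # 4081 2 1
```

With 65536 bins, I32 counts take 256 KiB instead of 512 KiB for DBL. A 1600x1200 frame has 1,920,000 pixels, far below the I32 limit of 2,147,483,647, so overflow is not a concern.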


See if this is any faster. 😉


Also note that if you include these large diagram constants, you should check "Separate compiled code from source file" (VI Properties > General). Otherwise the VI doubles in size on disk (3.6 MB vs. 1.8 MB), because the constant exists both in the diagram AND in the compiled code.


Here's one way to parallelize some of the operations, but it does not seem to be any faster. 😞

(Well, yes, parallelization speeds things up, but the double-loop structure itself is significantly slower, roughly 10x, so on my 16-core Xeon I end up at about the same speed.)
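The split-and-sum idea can be sketched in Python as follows. `partial_histogram` and `split_histogram` are hypothetical names, and the chunking scheme (contiguous slices, one per worker) is an assumption, not a description of the attached VI.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_histogram(chunk, nbins):
    """Histogram of one chunk of pixels."""
    counts = [0] * nbins
    for v in chunk:
        counts[v] += 1
    return counts

def split_histogram(pixels, nbins=65536, workers=4):
    """Histogram each contiguous chunk separately, then add the partials bin by bin."""
    size = max(1, -(-len(pixels) // workers))            # ceiling division, at least 1
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_histogram, chunks, [nbins] * len(chunks)))
    total = [0] * nbins
    for p in partials:                                   # merge: touches every bin once per chunk
        for b, c in enumerate(p):
            total[b] += c
    return total

data = [1, 2, 2, 3, 65535, 7, 7, 7]
print(split_histogram(data, workers=3) == partial_histogram(data, 65536))  # True
```

This shows the structure only. In CPython the GIL keeps these threads from running the counting loops truly in parallel. Also, the merge step touches all 65536 bins once per chunk, which is extra work that the single-loop version never does.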


@Bob_Schor: Thank you for your insight. I guess I had a personal bias because this function was so simple to write, while some more mathematically complex ones were executing much faster.

@Hooovahh: My entire image analysis runs in a separate loop and takes about 50 ms. I only let the analysis run every 100 ms (I figure that is "live" enough for most users), but I was wondering what happens if I want to run the analysis more often. Thank you for suggesting the In Place Element structure; I had only used it for cluster manipulation, not for arrays!

@Altenbach: Thank you for the tips. Disabling debugging (which I had left on because my timing mechanism was a nifty custom probe I downloaded) dropped the execution time from 13 ms to 7.7 ms. Using the In Place Element structure dropped it to 6.5 ms. Changing the array from double precision to I32 dropped it to 5.2 ms.

Do I encounter all possible U16 values? Not exactly. The camera is 12-bit but outputs U16, so at most 4096 distinct values ever appear, though I'm not sure exactly which ones... We do try to make the most of the range we have, so we usually tune the settings to just below saturating the detector.


On second glance, it looks like only every 16th value can be non-zero: indices 0, 16, 32, ...


@Gregory wrote:

On second glance, it looks like every 16th value could be non-zero. Index 0, 16, 32 ...


No, the spacing seems uneven.


Even after reducing to every 16th bin, you'll still get some zeros.

 

Unfortunately, keeping only the reduced number of bins slows things down because of the required -4 bit shift (a right shift by 4) on every value. More math!
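Here is a sketch of the reduced-bin variant. It assumes the 12 significant bits sit in the top of the 16-bit word, so the low 4 bits are always zero. The every-16th-bin observation suggests this, but the thread does not confirm it. `histogram_12bit` is a hypothetical name. It illustrates the trade-off described above: 4096 bins instead of 65536, but one extra shift per pixel.

```python
def histogram_12bit(pixels):
    """4096-bin histogram of 12-bit data assumed to be stored MSB-aligned in U16."""
    counts = [0] * 4096
    for v in pixels:
        counts[v >> 4] += 1   # right shift by 4 maps 0, 16, 32, ..., 65520 onto 0..4095
    return counts

c = histogram_12bit([0, 16, 16, 65520])
print(c[0], c[1], c[4095])  # 1 2 1
```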


@altenbach: On third glance, you're right. I've tried a couple of different images, and it looks like, starting at index 32, every 48th element is zero. I know next to nothing about image compression, so I'm not sure why this is.
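To check whether that empty-bin pattern holds across more images, a small helper can list the empty bins among the multiples of 16 and the spacing between them. This is a Python sketch: `empty_bins` and `gaps` are hypothetical helpers, and the histogram below is synthetic data built to mimic the reported pattern, not camera data.

```python
def empty_bins(hist, step=16, start=0):
    """Indices start, start+step, ... whose count is zero."""
    return [i for i in range(start, len(hist), step) if hist[i] == 0]

def gaps(indices):
    """Differences between consecutive indices."""
    return [b - a for a, b in zip(indices, indices[1:])]

# Synthetic 256-bin histogram: every 16th bin filled, then a few emptied
hist = [0] * 256
for i in range(0, 256, 16):
    hist[i] = 5
for i in (32, 80, 128, 176, 224):
    hist[i] = 0
print(empty_bins(hist))        # [32, 80, 128, 176, 224]
print(gaps(empty_bins(hist)))  # [48, 48, 48, 48]
```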
