LabVIEW


Why is C code so much faster? (Image Processing)


@altenbach wrote:

One of the big expenses is the operation on DBLs. All values are quantized to 256 possibilities, so instead of all these multiplications, all you need is a tiny LUT (lookup table) for each color that gives a U8 result for all 256 possible multiplications. This keeps everything in U8. I am sure it would be faster.


OK, here's what I had in mind. I have not benchmarked it but speed should be OK. Try it! Note that the code is 100% U8.

 

(The subVI is inlined and the outer loop is parallelized. Also test without parallelization to see whether it gains anything.)
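For readers without the attached VI, the per-color LUT idea can be sketched in C. This is only a sketch under assumptions: it assumes the per-pixel operation is a weighted sum of the three channels (the thread implies this, but the original code is not shown here), and the weights, `build_luts`, and `weighted_sum_u8` are hypothetical names and values chosen for illustration. Floating point is used only once, while filling the tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical channel weights, for illustration only. They sum to 1.0,
   so the three truncated products can never add up to more than 255. */
#define WEIGHT_R 0.299
#define WEIGHT_G 0.587
#define WEIGHT_B 0.114

/* One 256-entry U8 table per color: 3 x 256 = 768 bytes in total. */
static uint8_t lut_r[256], lut_g[256], lut_b[256];

/* Fill the tables once; this is the only place floating point is used. */
void build_luts(void)
{
    for (int v = 0; v < 256; ++v) {
        lut_r[v] = (uint8_t)(WEIGHT_R * v);   /* truncates toward zero */
        lut_g[v] = (uint8_t)(WEIGHT_G * v);
        lut_b[v] = (uint8_t)(WEIGHT_B * v);
    }
}

/* Convert interleaved RGB pixels to one U8 value per pixel
   using three table lookups and two integer additions. */
void weighted_sum_u8(const uint8_t *rgb, uint8_t *out, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; ++i) {
        const uint8_t *p = rgb + 3 * i;
        out[i] = (uint8_t)(lut_r[p[0]] + lut_g[p[1]] + lut_b[p[2]]);
    }
}
```

Because each table entry is truncated separately, the result can differ by a count or two from rounding the floating-point sum once; whether that matters depends on the application.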

 

 

Message 11 of 16

@altenbach wrote:

OK, here's what I had in mind. I have not benchmarked it but speed should be OK. Try it! Note that the code is 100% U8.


I thought you were going to use a 3D array and then just index the values with a single Index Array and you magically have the final value.  No need to add either.


Message 12 of 16

@crossrulz wrote:
I thought you were going to use a 3D array and then just index the values with a single Index Array and you magically have the final value.  No need to add either.

Then the LUT would no longer be tiny (a 24-bit index!) 😮 That is 16 MB (2^24 one-byte entries, compared to my 768 bytes) and won't fit into the cache of a typical CPU.
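The size comparison above can be checked with a short C sketch of the single-lookup variant. The names `rgb_index` and `build_full_lut` and the weights are hypothetical, and the weighted-sum operation is again an assumption about what the original code computes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three 256-entry U8 tables: 768 bytes. */
enum { SMALL_LUT_BYTES = 3 * 256 };

/* One U8 entry per possible 24-bit RGB value: 2^24 bytes = 16 MB. */
#define FULL_LUT_ENTRIES ((size_t)1 << 24)

/* Pack three U8 channels into the 24-bit table index. */
size_t rgb_index(uint8_t r, uint8_t g, uint8_t b)
{
    return ((size_t)r << 16) | ((size_t)g << 8) | (size_t)b;
}

/* Allocate and fill the full table (hypothetical weights, illustration only).
   Returns NULL if the allocation fails; the caller frees the result. */
uint8_t *build_full_lut(void)
{
    uint8_t *lut = malloc(FULL_LUT_ENTRIES);
    if (lut == NULL)
        return NULL;
    for (int r = 0; r < 256; ++r)
        for (int g = 0; g < 256; ++g)
            for (int b = 0; b < 256; ++b)
                lut[rgb_index((uint8_t)r, (uint8_t)g, (uint8_t)b)] =
                    (uint8_t)(0.299 * r + 0.587 * g + 0.114 * b);
    return lut;
}
```

With this layout each pixel needs one lookup and no addition, at the cost of a table 21,845 times larger than the three small ones.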

 

As an alternative to my solution above, we can also index directly into the 2D array. Not sure which is better.
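In C terms, indexing directly into a 2D array might look like the sketch below: one 3 x 256 table with a row per color instead of three separate 1D tables. The names (`lut2d`, `build_lut2d`, `lookup_pixel`) and the weights are hypothetical, and the weighted-sum operation is an assumption.

```c
#include <stdint.h>

enum { CH_R, CH_G, CH_B, N_CHANNELS };

/* Hypothetical weights, for illustration only. */
static const double weights[N_CHANNELS] = { 0.299, 0.587, 0.114 };

/* Still 768 bytes, but stored as a single 2D array: one row per channel. */
static uint8_t lut2d[N_CHANNELS][256];

/* Fill every row once, truncating each product to U8. */
void build_lut2d(void)
{
    for (int c = 0; c < N_CHANNELS; ++c)
        for (int v = 0; v < 256; ++v)
            lut2d[c][v] = (uint8_t)(weights[c] * v);
}

/* Index the 2D table directly with (channel, value) and add the results. */
uint8_t lookup_pixel(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(lut2d[CH_R][r] + lut2d[CH_G][g] + lut2d[CH_B][b]);
}
```

The memory footprint is identical to the three-table version, so any speed difference would come from how the compiler handles the indexing, which is worth measuring rather than guessing.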

 

Message 13 of 16

@altenbach wrote:

One of the big expenses is the operation on DBLs. All values are quantized to 256 possibilities, so instead of all these multiplications, all you need is a tiny LUT (lookup table) for each color that gives a U8 result for all 256 possible multiplications. This keeps everything in U8. I am sure it would be faster.


In this case, SGL precision should be plenty, and it should be a bit faster than DBL. It'd be interesting to compare SGL, DBL and your lookup table.
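As a starting point for such a comparison, here is a minimal C sketch of the same per-pixel computation in double (DBL) and single (SGL) precision. The weights and function names are hypothetical, and the weighted-sum operation is an assumption; no timing results are implied.

```c
#include <stdint.h>

/* DBL variant: products and sum are computed in double precision. */
uint8_t weighted_sum_dbl(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(0.299 * r + 0.587 * g + 0.114 * b);
}

/* SGL variant: the 'f' suffix keeps the constants in single precision. */
uint8_t weighted_sum_sgl(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(0.299f * r + 0.587f * g + 0.114f * b);
}
```

For 8-bit inputs the two variants can differ by at most one count after truncation, so precision is unlikely to be the deciding factor; speed would have to be measured on the target machine.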

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified-LabVIEW-Developer
Message 14 of 16

@altenbach wrote:

 

As an alternative to my solution above, we can also index directly into the 2D array. Not sure which is better.

 


I'd guess the above example would be better, since constant folding will just start with three 1D arrays, but who knows, the compiler can be crazy. I do like the 3D array idea. In either case, all of these options have to be faster than anything we've come up with so far involving doubles (or floating point in general).

Message 15 of 16

@Hooovahh wrote:

I'd guess the above example would be better, since constant folding will just start with three 1D arrays, but who knows, the compiler can be crazy. I do like the 3D array idea. In either case, all of these options have to be faster than anything we've come up with so far involving doubles (or floating point in general).


Sometimes logic plays a trick on us. When Doom 3 (I think) was developed, they created a lookup table for sin to improve the frame rate. By that time, FPUs were so good that they were quite a bit faster than the memory access needed to read the table.

The SGL idea could be surprisingly fast, especially if the lookup table gets large.

 

Though, this sounds like the type of idea that should be run on a GPU. 🙂

/Y

Message 16 of 16