04-10-2016 07:36 PM - edited 04-10-2016 07:38 PM
@altenbach wrote: One of the big expenses is the operation on DBLs. All values are quantized to 256 possibilities, so instead of all these multiplications, all you need is a tiny LUT (lookup table) for each color that gives a U8 result for all 256 possible multiplications. This keeps everything in U8. I am sure it would be faster.
OK, here's what I had in mind. I have not benchmarked it, but speed should be OK. Try it! Note that the code is 100% U8.
(The subVI is inlined and the outer loop is parallelized. Also test without parallelization to see whether it gains anything.)
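Since the block diagram is not reproduced here, the per-color LUT idea can be sketched in text form. This is a rough Python illustration, not the posted VI: the weights are hypothetical (the thread does not state the actual coefficients), and `build_luts` and `to_gray` are made-up names.

```python
# Hypothetical channel weights; the thread does not give the real coefficients.
WEIGHTS = (0.299, 0.587, 0.114)

def build_luts(weights=WEIGHTS):
    """Build one 256-entry U8 table per channel: lut[v] = round(weight * v)."""
    return [bytes(int(w * v + 0.5) for v in range(256)) for w in weights]

def to_gray(pixels, luts):
    """Replace three multiplications per pixel with three lookups and two adds."""
    r_lut, g_lut, b_lut = luts
    return bytes(r_lut[r] + g_lut[g] + b_lut[b] for r, g, b in pixels)

luts = build_luts()
gray = to_gray([(0, 0, 0), (255, 0, 0), (255, 255, 255)], luts)
```

With these assumed weights the three table entries for 255 add up to exactly 255, so the sum stays within U8. Other weights would need the same check.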
04-10-2016 07:49 PM
@altenbach wrote: OK, here's what I had in mind. I have not benchmarked it, but speed should be OK. Try it! Note that the code is 100% U8.
I thought you were going to use a 3D array and then just index the values with a single Index Array and you magically have the final value. No need to add either.
04-10-2016 07:57 PM - edited 04-10-2016 08:01 PM
@crossrulz wrote:
I thought you were going to use a 3D array and then just index the values with a single Index Array and you magically have the final value. No need to add either.
Then the LUT would no longer be tiny (24 bits!) 😮 This is 16 MB (compared to my 768 bytes) and won't fit into the cache of a typical CPU.
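The size comparison works out as follows. This is only a sketch of the arithmetic: `flat_index` is a hypothetical helper showing how a 24-bit color would address a flattened 3D table, and the table itself is not built.

```python
def flat_index(r, g, b):
    """Position of (r, g, b) in a flattened 256 x 256 x 256 table of U8 entries."""
    return (r << 16) | (g << 8) | b

FULL_LUT_BYTES = 256 ** 3   # one U8 result per 24-bit color
SMALL_LUT_BYTES = 3 * 256   # three 256-entry U8 tables, one per color

print(FULL_LUT_BYTES // (1024 * 1024), "MB vs", SMALL_LUT_BYTES, "bytes")
```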
As an alternative to my solution above, we can also index directly into the 2D array. Not sure which is better.
04-11-2016 03:46 AM
@altenbach wrote: One of the big expenses is the operation on DBLs. All values are quantized to 256 possibilities, so instead of all these multiplications, all you need is a tiny LUT (lookup table) for each color that gives a U8 result for all 256 possible multiplications. This keeps everything in U8. I am sure it would be faster.
In this case converting to SGL should be plenty and a bit faster. It'd be interesting to compare the SGL, DBL and your lookup table.
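To get a feel for whether SGL precision is "plenty" here, single-precision arithmetic can be emulated in Python and compared against double precision. This is only a precision check, not a speed benchmark, and it assumes hypothetical weights (the thread does not state them); `f32`, `gray_dbl`, `gray_sgl` and `max_sgl_error` are illustrative names.

```python
import struct

# Hypothetical channel weights; the thread does not give the real coefficients.
WEIGHTS = (0.299, 0.587, 0.114)

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def gray_dbl(r, g, b):
    """Weighted sum in double precision."""
    wr, wg, wb = WEIGHTS
    return wr * r + wg * g + wb * b

def gray_sgl(r, g, b):
    """Same weighted sum, rounded to single precision after every operation."""
    wr, wg, wb = (f32(w) for w in WEIGHTS)
    acc = f32(wr * r)
    acc = f32(acc + f32(wg * g))
    acc = f32(acc + f32(wb * b))
    return acc

def max_sgl_error(step=17):
    """Largest SGL-vs-DBL difference over a coarse grid of color values."""
    levels = range(0, 256, step)
    return max(abs(gray_sgl(r, g, b) - gray_dbl(r, g, b))
               for r in levels for g in levels for b in levels)

print(max_sgl_error())
```

The difference before rounding to U8 is far below one gray level, so SGL and DBL can only disagree on inputs whose weighted sum lands almost exactly on a rounding boundary.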
/Y
04-11-2016 07:39 AM
@altenbach wrote:
As an alternative to my solution above, we can also index directly into the 2D array. Not sure which is better.
I'd guess the above example would be better, since constant folding will just start with three 1D arrays, but who knows, the compiler can be crazy. I do like the 3D array idea. In either case, all of these options have to be faster than anything we've come up with so far involving doubles (or floating point in general).
04-11-2016 08:53 AM - edited 04-11-2016 08:54 AM
@Hooovahh wrote: I'd guess the above example would be better, since constant folding will just start with three 1D arrays, but who knows, the compiler can be crazy. I do like the 3D array idea. In either case, all of these options have to be faster than anything we've come up with so far involving doubles (or floating point in general).
Sometimes logic plays a trick on us. When Doom 3 (I think) was developed, they created a lookup table for sine to improve the frame rate. By that time, FPUs were so good that they were quite a bit faster than the memory access needed to read the table.
The SGL idea could be surprisingly fast, especially if the lookup table gets large.
Though, this sounds like the type of idea that should be run on a GPU. 🙂
/Y