05-15-2014 01:35 PM
Oh and, to answer your other question, LabVIEW does use the Intel MKL Libraries:
LabVIEW and Intel® Math Kernel Library (Intel® MKL) Version Specifications: http://digital.ni.com/public.nsf/allkb/0927A32F3F2532C4862576EA006A8408
05-16-2014 02:59 AM
Hi altenbach,
The FFT function does not use MKL. Some linear algebra functions use MKL underneath. Your link to the MATLAB discussion explains the performance degradation of the FFT function in 32-bit LV.
Based on my benchmark, checking for NaN in the input array takes roughly 10% of the FFT time.
So the question is: do we want to improve FFT performance for the NaN input case at the expense of slowing down the valid input case by 10%? I would vote for not checking for NaN in the FFT.
Best Regards,
Michael
05-16-2014 11:20 AM - edited 05-16-2014 11:21 AM
DSPmchen wrote: Based on my benchmark, checking for NaN in the input array takes roughly 10% of the FFT time.
Well, checking for NaN is O(N) and FFT is O(N log N), so for large arrays the penalty is smaller. For small arrays the speed might be irrelevant. Maybe it is worth checking only for very large arrays, where NaN could cause seconds of delay and the check would be relatively cheap (probably <1%). The check could also be done in steps, e.g. check if the first element is NaN, and only check the rest if it is.
DSPmchen wrote: The FFT function does not use MKL. Some linear algebra functions use MKL underneath. Your link to the MATLAB discussion explains the performance degradation of the FFT function in 32-bit LV.
Thanks, that's what I actually thought. I remember long ago when all FFTs got overhauled and vastly improved (LV 5?). LabVIEW FFT performs very well overall, even for random sizes (probably similar to FFTW, but I haven't checked in a long time).
Based on the linked discussions you might look at the compile options used. Could it be that SSE is not enabled for FFTs?
I wonder if the FFTs in the multicore/sparse matrix toolkit suffer from the same problem. I need to find a different machine to test...
05-18-2014 09:21 PM
Hi altenbach,
I agree that for large arrays the penalty is smaller. However, since FFT is O(N log N), the relative cost of the O(N) check shrinks only with log N, so it takes a very large increase in size to go from 10% to 1%.
On my machine (i7-4770), checking for NaN takes 10% of the FFT time for a 20k-point FFT and 7% for a 200k-point FFT. Even for a 5M-point FFT, checking for NaN still takes 4% of the FFT time. The numbers might also depend on the CPU and RAM size, so it might be hard to determine the size from which we should do the check. Also, doing the NaN check for large FFTs but not for small ones might look inconsistent to customers.
Yes, LV FFT mainly optimizes at the algorithm level, not at the SSE instruction level. The FFT function in the Multicore Analysis and Sparse Matrix Toolkit uses MKL.
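As a rough sanity check of the scaling argument, here is a minimal Python sketch. It assumes an idealized model in which the check costs exactly O(N) and the FFT exactly O(N log2 N), calibrated to the 10% measured at 20k points; the function name and the model are illustrative assumptions, not part of the benchmark.

```python
import math

REF_N = 20_000          # reference size from the benchmark above
REF_OVERHEAD = 0.10     # measured: NaN check takes 10% of FFT time at REF_N


def predicted_overhead(n, ref_n=REF_N, ref_overhead=REF_OVERHEAD):
    """Relative cost of an O(N) scan versus an O(N log N) FFT.

    Idealized model: overhead is proportional to 1 / log2(N),
    scaled so that it matches the reference measurement.
    """
    return ref_overhead * math.log2(ref_n) / math.log2(n)


for n in (20_000, 200_000, 5_000_000):
    print(f"{n:>9} points: predicted overhead {predicted_overhead(n):.1%}")

# log2 of the array size at which the idealized overhead would drop to 1%
bits_for_one_percent = math.log2(REF_N) * REF_OVERHEAD / 0.01
print(f"1% overhead would need N of about 2^{bits_for_one_percent:.0f}")
```

Under this model the overhead only falls to about 8% at 200k points and about 6% at 5M points, and reaching 1% would need an array far beyond any practical size. The measured 7% and 4% drop somewhat faster than the model, presumably because of effects it ignores (cache and memory behavior, for instance), but the conclusion is the same: the penalty shrinks slowly.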
Best Regards,
Michael
05-19-2014 01:54 PM
I can't believe we're considering adding a check for NaN to an FFT function; that would be a very poor move IMO. For those who like passing NaNs through their FFTs, let them do the check themselves rather than slowing down core functionality.
Stick a warning note in the FFT function documentation regarding the issue and job's a good'n.
05-19-2014 02:09 PM - edited 05-19-2014 05:55 PM
No, I am also against anything that impacts performance. In any case, 10 years from now everything will be 64-bit, so the problem will dissolve. 😄
(I wonder if the NaN check itself is also artificially slow under 32-bit. Not tested yet. :D)
05-19-2014 05:10 PM
Phew, thanks for clearing that up, Altenbach! 😄
Also thanks to the OP for supporting my suggested solution.
05-20-2014 02:19 AM - edited 05-20-2014 02:20 AM
A NaN check should simply be a boolean operation or two that detects the bit pattern indicating NaN. Note that NaN is not a single bit pattern but a whole range of them; I believe LabVIEW itself uses a canonical NaN value but still honors the entire range of patterns as NaN.
The issue is about NaN numbers being passed to the floating point unit when SSE2 is not used. And SSE2 can only be used when it is certain that the code will never run on systems that lack SSE2 support.
The Intel MKL would probably use conditional code paths for the FFT, as it is specifically written to account for a very large range of CPUs (well, at least Intel CPUs), but according to an earlier post the NI FFT does not seem to make use of the MKL. In order to be sure that the function will still work on older and non-Intel CPUs, lvanlys.dll can't be compiled with SSE2 enabled, so it's a bit of a catch-22 here.
And unfortunately the SSE2 compile setting in the Build properties of newer LabVIEW versions won't help, since it only affects what code the LabVIEW compiler emits, not how the underlying DLLs are compiled.
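To illustrate the bit-pattern test, here is a minimal Python sketch. It assumes IEEE 754 double precision, where a value is NaN when all 11 exponent bits are set and the 52-bit fraction is nonzero. The function names are hypothetical, and this says nothing about how LabVIEW or lvanlys.dll actually implement the check.

```python
import struct

EXP_MASK = 0x7FF0000000000000   # 11 exponent bits of an IEEE 754 double
FRAC_MASK = 0x000FFFFFFFFFFFFF  # 52 fraction (mantissa) bits


def is_nan_pattern(bits: int) -> bool:
    """True if the 64-bit pattern encodes a NaN.

    NaN: exponent all ones AND fraction nonzero (the sign bit is ignored).
    Exponent all ones with a zero fraction is +/-infinity instead.
    """
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_nan_bits(x: float) -> bool:
    """Reinterpret a Python float as a 64-bit integer and test the pattern."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return is_nan_pattern(bits)


print(is_nan_bits(float("nan")))           # a quiet NaN
print(is_nan_bits(float("inf")))           # infinity is not NaN
print(is_nan_pattern(0x7FF0000000000001))  # a non-canonical NaN pattern
```

Two masks and two comparisons per element cover the whole range of NaN encodings, not just one canonical value, and never hand the value to the floating point unit.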