10-06-2009 04:54 AM
I am trying to use a multi-core CPU to speed up the computation of 32 FFTs by running them in four parallel threads, as shown in the code example image below. However, even on a Core 2 Quad CPU, the multithreaded version is only about 10% faster than the single-threaded one.
I have already tried a few things, such as moving the array split and merge functions, or the waveform graph, outside the timed section, but this has very little effect: the main delay still occurs in the FFT VIs. These VIs are already set to reentrant execution, but somehow they still don't perform well in parallel. Why?
Can someone demonstrate a better performance gain in a similar VI? I am using LabVIEW 7.1, so replies that use images rather than VI files would be greatly appreciated!
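The split / parallel FFT / merge structure described above can be sketched in text form. This is a minimal Python sketch, not the poster's LabVIEW VI: `fft`, `fft_batch`, `split` and `parallel_fft` are hypothetical names, the FFT is a textbook radix-2 Cooley-Tukey transform (signal length assumed to be a power of two), and the 32 signals of 256 random samples are made up. Note that in standard CPython builds the global interpreter lock keeps this pure-Python FFT from actually running on several cores at once, so the sketch only shows the structure, not the speedup.

```python
import cmath
import math
import random
from concurrent.futures import ThreadPoolExecutor


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_batch(signals):
    """Transform one group of signals: the workload of a single thread."""
    return [fft(s) for s in signals]


def split(items, n_groups):
    """Split a list into at most n_groups contiguous chunks (the 'array split' step)."""
    size = max(1, math.ceil(len(items) / n_groups))
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_fft(signals, n_threads=4):
    groups = split(signals, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() returns results in submission order, so the merge keeps the input order
        results = list(pool.map(fft_batch, groups))
    # 'Merge' step: flatten the per-thread results back into one list of spectra
    return [spectrum for group in results for spectrum in group]


if __name__ == "__main__":
    random.seed(0)
    signals = [[random.uniform(-1, 1) for _ in range(256)] for _ in range(32)]
    spectra = parallel_fft(signals, n_threads=4)
    print(len(spectra), len(spectra[0]))
```

Because each group is transformed independently, the threads share no state; that independence is what makes this kind of split safe to parallelize, provided the FFT routine itself can run concurrently.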
Thanks!
10-06-2009 05:06 AM
The FFT VI probably already does this internally. I remember hearing that some functions automatically use multiple cores when they are available.
Do you see a difference in the per-core CPU load when you run the two cases?
10-06-2009 06:55 PM
Hello,
Thanks for your response. Actually, I found the solution. Rather than doing its own internal multithreading, the Express VI did just the opposite: it broke multithreading internally by including several subVIs that were not reentrant. As a result, the overall Spectral Analysis Express VI is effectively not reentrant either, and it will not accelerate properly on a multi-core CPU.
My solution was to dig down into the Express VI until I reached the most basic levels (DLL function calls, etc.), which actually were fully reentrant. By extracting these and saving just this essential code as a new, fully reentrant subVI, I was able to unlock the full multi-core potential. My FFT benchmark VI now runs 5x faster, simply because I replaced the Express VI with my own stripped-down FFT VI.
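The reentrancy problem explained above can be illustrated with a small Python analogy; it is not LabVIEW code. A non-reentrant VI has a single shared instance, so parallel callers queue up and run one at a time. The sketch below models that shared instance as one lock, models a reentrant VI as having no such lock, and uses `time.sleep` as a stand-in for the FFT work. `CallTracker`, `make_subvi`, `wrap` and `run_parallel` are hypothetical names chosen for this illustration. The `wrap` case mirrors the Express VI: a wrapper with no lock of its own is still serialized if it calls a non-reentrant subVI.

```python
import threading
import time


class CallTracker:
    """Records how many threads are inside the modeled subVI at the same time."""

    def __init__(self):
        self._guard = threading.Lock()
        self.active = 0
        self.max_active = 0

    def enter(self):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)

    def leave(self):
        with self._guard:
            self.active -= 1


def make_subvi(reentrant, tracker, work_s=0.05):
    """Hypothetical subVI model: a non-reentrant one has a single shared
    instance, modeled as one lock that every caller must acquire."""
    shared_instance = threading.Lock()

    def body():
        tracker.enter()
        time.sleep(work_s)  # stand-in for the FFT work; sleeping releases the GIL
        tracker.leave()

    def call():
        if reentrant:
            body()  # every caller runs its own copy concurrently
        else:
            with shared_instance:
                body()  # callers wait for the single instance, one at a time

    return call


def wrap(inner):
    """A wrapper with no lock of its own, like a reentrant top-level VI."""
    def outer():
        inner()  # still serialized if `inner` is non-reentrant
    return outer


def run_parallel(call, n_threads=4):
    """Call `call` from n_threads threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=call) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for label, reentrant, wrapped in [("non-reentrant", False, False),
                                      ("reentrant", True, False),
                                      ("wrapper around non-reentrant", False, True)]:
        tracker = CallTracker()
        call = make_subvi(reentrant, tracker)
        elapsed = run_parallel(wrap(call) if wrapped else call)
        print(f"{label}: max concurrent calls={tracker.max_active}, elapsed={elapsed:.2f}s")
```

With four threads, the non-reentrant and wrapped cases never let more than one call run at a time, so their elapsed time is at least four times the work time. Only the reentrant case overlaps the calls, which corresponds to replacing the Express VI with a stripped-down, fully reentrant FFT VI.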
As a courtesy, I am attaching my new, 5x faster Multi-Core FFT VI.
It scales as follows on an Intel Core 2 Quad CPU, relative to the Express VI:
LabVIEW Spectral Analysis Express VI (one or more instances): 1x (baseline)
Multi-Core FFT VI (one instance): 2.3x to 2.4x faster
Multi-Core FFT VI (two instances): 3.7x to 4.0x faster
Multi-Core FFT VI (four instances): 4.8x to 6.1x faster
Multi-Core FFT VI (eight instances): 4.8x to 6.1x faster (an 8-core CPU would probably be needed to see a benefit)
Here are the internals of the stripped-down Multi-Core FFT VI: