LabVIEW


Intel i7 processor with hyper-threading (only every second core is used)


Dear altenbach, I expect all eight logical processors (= threads) to be engaged when the algorithm allows for it. That is how I understand hyper-threading: 4 physical cores, each with 2 threads = 8 virtual processing units. That is also what the article at http://www.ni.com/white-paper/3558/en#toc7 states in its very first paragraph. G

0 Kudos
Message 11 of 20
(1,544 Views)

@altenbach wrote:

If you turn off parallelization, how much slower is it?


I tried to switch off parallelisation (I was inspired by http://www-w2k.gsi.de/controls/CS/How-To/cs_multithreading.htm, see the "How and why to turn multithreading off" section), but I could not find any such option in LabVIEW 2012. I then decided to enable only a single core in the BIOS options, but the start-up of Win7 64-bit took more than four times longer than with all 4 cores enabled. I reverted this BIOS setting after watching the "please wait..." Windows boot screen for 20 minutes. Any hint on where to switch off parallelisation in LabVIEW only, please? G

0 Kudos
Message 12 of 20
(1,539 Views)

altenbach wrote:

If you turn off parallelization, how much slower is it?


I think altenbach was referring to turning off the parallelism on your "for loops".

 

Go to: Tools>>Profile>>Find Parallelizable Loops...

 

You will get a list of all "for loops" and their parallelized status.

Unfortunately you can't change the settings here.

Double-click on a "for loop" in the list to go to its block diagram and change the setting.

There is a refresh button to update the list status.

 

steve

 

----------------------------------------------------------------------------------------------------------------
Founding (and only) member of AUITA - the Anti UI Thread Association.
----------------------------------------------------------------------------------------------------------------
0 Kudos
Message 13 of 20
(1,525 Views)
Solution
Accepted by ghighuphu

@ghighuphu wrote:
 I expect that all the eight logical processors (= threads) will be engaged, when the algorithm allows for it.


Hi,

 

I guess your algorithm does not utilize all eight cores. You should adapt it for a multicore PC.

 

Do a pretty simple test - put 8 while loops without any delays and see how many cores will be utilized. Then you should see something like this (in my case I have 2 CPUs, each with 6 cores, and hyperthreading is enabled - therefore 24 while-loops):

 

cpu.png

 

As you can see - overall 100% cpu load.
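
For readers without LabVIEW at hand, the same check can be sketched in Python. This is a rough equivalent, not the VI from the screenshot: the names `BUSY_LOOP` and `load_all_cores` are made up for illustration, and it starts separate processes because threads inside a single CPython interpreter would not load several cores at once.

```python
import os
import subprocess
import sys

# Child program: spin without any delay until the deadline, then report
# how many iterations were completed.
BUSY_LOOP = """
import sys, time
deadline = time.perf_counter() + float(sys.argv[1])
n = 0
while time.perf_counter() < deadline:
    n += 1
print(n)
"""


def load_all_cores(seconds=5.0, workers=None):
    """Run one delay-free loop per logical processor for `seconds`.

    Returns the iteration count reported by each loop.
    """
    # os.cpu_count() reports logical processors (8 on a 4-core i7 with HT).
    workers = workers or os.cpu_count() or 1
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", BUSY_LOOP, str(seconds)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(workers)
    ]
    # All children are already running; now wait for each one to finish.
    return [int(p.communicate()[0]) for p in procs]
```

Calling `load_all_cores(seconds=10)` while watching Task Manager should show every logical processor busy if the machine and OS are set up as expected.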

 

Andrey.

 

Message 14 of 20
(1,513 Views)

Thank you Andrey! It was a very easy check of what my PC can do. All eight virtual cores are alive and running 🙂 OK, so I have to rethink the algorithm and its possibilities. (I was sure it was supposed to use all available cores. Obviously, it did not...)

0 Kudos
Message 15 of 20
(1,499 Views)

@ghighuphu wrote:

Next I ran the "4 Calculate N Digits of Pi.vi" with N set to 10000 (ten thousand). The result was that only four cores out of eight were engaged. Do you get similar or different results please? KR, M


OK, I had a glance at the code and it is NOT optimized for multiple processors at all. What makes you think it is???

 

There is a place where 4 reentrant "series" subVIs are called in parallel, so there will be some mild parallelization with an upper limit of 4x.
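
The arithmetic behind that upper limit can be written down in a few lines of Python (not LabVIEW). This is a best-case sketch with made-up function names: it assumes the parallel branches do equal amounts of work and ignores all scheduling overhead.

```python
def max_speedup(parallel_tasks, logical_cores):
    """Best-case speedup over serial execution: the smaller number wins."""
    return min(parallel_tasks, logical_cores)


def peak_utilization(parallel_tasks, logical_cores):
    """Largest fraction of logical cores that can be busy at the same time."""
    return min(parallel_tasks, logical_cores) / logical_cores


# Four parallel subVI calls on an i7 with eight logical processors:
print(max_speedup(4, 8))       # 4
print(peak_utilization(4, 8))  # 0.5
```

Under these assumptions, four parallel calls can keep at most half of eight logical processors busy, which matches the "only four cores out of eight" observation.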

 

Some simple profiling shows that significant effort is spent in the "powers of two" subVI. Once you disable debugging and inline it, this code will be folded and will take 0 time, overall speeding the calculations dramatically. There are quite a few other places where significant optimization is possible. Please try. I have not.
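
As a loose analogy in Python: the LabVIEW compiler does the folding itself, as described above, whereas here it is done by hand. What the real "powers of two" subVI computes is not shown in this thread, so `power_of_two` and the exponent 32 below are hypothetical stand-ins.

```python
def power_of_two(exponent):
    """Hypothetical stand-in for a small helper that is called over and over."""
    result = 1
    for _ in range(exponent):
        result *= 2
    return result


def scale_all_unfolded(values, exponent=32):
    # The helper runs once per element although its input never changes.
    return [v * power_of_two(exponent) for v in values]


SCALE = power_of_two(32)  # evaluated once, up front ("folded")


def scale_all_folded(values):
    # Same result; the precomputed constant is simply reused.
    return [v * SCALE for v in values]
```

Both functions return the same list; the second one just never repeats work whose inputs are constant.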

 

 

0 Kudos
Message 16 of 20
(1,474 Views)

@altenbach wrote:

OK, I had a glance at the code and it is NOT optimized for multiple processors at all. What makes you think it is???

 

There is a place where 4 reentrant "series" subVIs are called in parallel, so there will be some mild parallelization with an upper limit of 4x.


I did not think it through so far. It was a coincidence that the four re-entrant VIs engaged my four processors.

0 Kudos
Message 17 of 20
(1,467 Views)

OK, I did a benchmark on my old non-hyperthreaded 4 core machine (Intel Q9300) before and after my 2 minute modifications mentioned above and here are the results for 10000 digits:

 

stock: 304 seconds, ~75% CPU utilization

my modification: 43 seconds, ~88% CPU utilization

 

As you can see, a few trivial changes can speed things up by more than a factor of 7, and this is only the tip of the iceberg salad! We can tell that in the stock implementation, a huge percentage of the CPU is just pumping hot air (thread swapping, other overhead, etc.) instead of doing real work. There is no telling what's possible if the entire thing is rearchitected from scratch to optimize for multiprocessor use. There is no upper limit, because the problem can be split into an infinite number of separate calculations. Try it!

0 Kudos
Message 18 of 20
(1,460 Views)

@Andrey_Dmitriev wrote:
Do a pretty simple test - put 8 while loops without any delays and see how many cores will be utilized. Then you should see something like this (in my case I have 2 CPUs, each with 6 cores, and hyperthreading is enabled - therefore 24 while-loops):

 ... and don't forget to keep the fire extinguisher nearby. 😄

0 Kudos
Message 19 of 20
(1,457 Views)

altenbach wrote:

stock: 304 seconds, ~75% CPU utilization

my modification: 43 seconds, ~88% CPU utilization


This also tells you that core utilization is NOT a useful tool to assess the quality of the code. The goal should be to do it quickest with the least CPU effort. The only measure that counts is the elapsed time to achieve the task. If you have inefficient parallel code that burns 100% of all CPUs, and a better serial algorithm that can get the same result in 10% of the time using a single core, you should go with the latter.
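
The quoted numbers make the point when worked through. The Python sketch below assumes that the utilization figures are averages over all four cores of the Q9300; the "core-seconds" measure and the function names are introduced here only for illustration.

```python
CORES = 4  # Q9300: four cores, no hyper-threading (from the benchmark post)


def speedup(old_seconds, new_seconds):
    """How many times faster the new run finished."""
    return old_seconds / new_seconds


def core_seconds(elapsed_seconds, utilization, cores=CORES):
    """Approximate CPU effort: elapsed time x average utilization x cores."""
    return elapsed_seconds * utilization * cores


stock = core_seconds(304, 0.75)    # about 912 core-seconds
modified = core_seconds(43, 0.88)  # about 151 core-seconds

print(f"speedup: {speedup(304, 43):.2f}x")                         # speedup: 7.07x
print(f"CPU effort: {stock:.0f} -> {modified:.0f} core-seconds")   # CPU effort: 912 -> 151 core-seconds
```

Under that assumption, the modified version shows the higher utilization figure yet needs roughly a sixth of the total CPU effort, so the utilization percentage alone says little about code quality.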

 

Maxing out all cores at 100% is never the primary goal. It also has disadvantages, such as placing high demands on the thermal management of the computer as well as impacting everything else you are trying to do on the computer at the same time. For a quick test, run Andrey's code above, then try to browse the web or watch a YouTube clip. 😮

0 Kudos
Message 20 of 20
(1,441 Views)