10-17-2023 02:22 AM
Hi,
This is a question about the implicit multithreading (implicit parallelism) of LabVIEW.
I created a VI named Implicit Multithreading Main VI.vi that calls multiple instances of subVI Processor-Intensive Single-Thread subVI.vi. The subVI performs dummy single-threaded computations, simply to keep a virtual processor core busy for around 10 seconds.
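Since LabVIEW diagrams can't be shown as text, here is a rough analogy of what the subVI does, sketched in Python (purely illustrative; the names and the exact dummy computation are my own, not the actual subVI):

```python
import time

def burn_cpu(seconds: float = 10.0) -> int:
    """Keep one logical core busy with dummy arithmetic for ~`seconds`."""
    deadline = time.monotonic() + seconds
    x = 1
    while time.monotonic() < deadline:
        # Dummy computation (a simple linear congruential step) whose
        # only purpose is to saturate a single core, like the subVI's loop.
        x = (x * 1103515245 + 12345) % (2 ** 31)
    return x
```

The key property is that the work is pure CPU with no waits or I/O, so each instance should be able to saturate exactly one logical core.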
Running the code on machine 1
I created and initially ran the code on a desktop machine with 12 virtual cores using LabVIEW 2023 Q3 32-bit, running on a 64-bit Windows 11 machine. There are 12 instances of the processor-intensive subVI in the Main VI, one for each of the virtual cores. When the subVI's execution is set to non-reentrant (the default), running the Main VI results in the CPU experiencing around 8% utilization. This is as expected, as 100% / 12 cores = 8.33%. Using the Threads tab in Process Explorer I can see that a single thread is utilizing all 8%.
The unexpected results occur when I switch the subVI's execution setting to Shared clone or Preallocated clone reentrant. With either of these settings, the Main VI continues to fully utilize a single thread, resulting in around 8% CPU usage.
The behaviour changes when enabling the "Inline" setting of the subVI. When running the Main VI LabVIEW now utilises 12 threads at around 8% each, for a total of approx. 100% CPU utilization, which is what I expected.
Running the code on machine 2
Machine 2 is a laptop with 8 virtual cores running LabVIEW 2023 Q3 64-bit on 64-bit Windows 11.
In the Main VI I diagram-disabled the bottom four instances of the subVI, to have 8 instances being called - one for each virtual core.
When the subVI is set to Non-reentrant, Shared clone or Preallocated clone reentrant, the CPU utilization is around 37.5%. When the subVI is set to Inline, the CPU utilization is around 100%. Unfortunately I don't have Process Explorer on the laptop to see how many threads are being used by LabVIEW. The 37.5% figure suggests that perhaps 3 threads are being used (3 threads / 8 virtual cores = 37.5%).
Questions
Main question: On both machines, why does LabVIEW not utilise 100% of the CPU when the subVI is set to Shared clone or Preallocated clone? Why does it use 100% only when the subVI is inlined? Is this expected? Is this a bug?

Secondary question: When running as Non-reentrant, Shared clone or Preallocated clone why does the code seem to use 3 threads on machine 2, but only 1 thread on machine 1? In other words, why is the implicit parallelism different between the machines? Is it because machine 1 uses 32-bit LabVIEW and machine 2 uses 64-bit LabVIEW? Is 64-bit LabVIEW "more parallel" than 32-bit LabVIEW?
Thanks!
Below are screenshots of the VIs:
Screenshot 1: The project containing the two VIs
Screenshot 2: The Main VI calls a number of subVI instances equal to the number of logical cores (12 for machine 1, and 8 for machine 2)
Screenshot 3: The processor-intensive, single-threaded subVI. It performs dummy computations to keep the processor busy.
Please note that the "Random Number (Range)" VI has the "Inline" setting turned on. The multithreading behaviour doesn't change when that VI is diagram-disabled.
10-18-2023 02:24 AM - edited 10-18-2023 02:51 AM
First of all: we cannot edit/test pictures. Please post real code, and don't forget to save the VI/project for a previous version (at least LabVIEW 2019). Only a few developers at NI might be able to give a competent answer to your question. I can only offer some comments.
Great, you found out that reentrant execution sometimes needs more than enabling the option in a dialog box 🙂
If you use this kind of massive calculation, then think about a better algorithm. If that is not possible, then use an asynchronous call by reference (call and collect) for such massive calculations, and carefully select the options when opening the VI references. You can also use a For Loop with iteration parallelism.
I can't test it, because I'm unable to edit your pictures, but I'm sure that placing a very small delay into the loop forces LabVIEW to execute the VIs in parallel.
In very old versions of LabVIEW, it was easy to force LabVIEW to use all the CPU time. This resulted in a totally unresponsive system while LabVIEW did its calculations. That changed at some point. It may have happened accidentally, or there may be some algorithm that prevents LabVIEW from doing this in most situations nowadays. The description of the Open VI Reference function also gives some possible hints.
10-18-2023 03:20 AM - edited 10-18-2023 03:21 AM
Note that parallel execution allows the clumps to be executed in parallel.
If there are 12 parallel VIs, there's no link to 12 available threads. It just means your "things to be executed" can be distributed over available resources.
So, disabling the bottom 4 VIs when there are only 8 threads is rather pointless (IMO), because it simply means you have less stuff to distribute. If you keep them, the 'stuff to execute' will be distributed over the available resources nonetheless.
Note that all cores are being used all the time by other processes.
The spreading of parallel load across resources isn't something LabVIEW has much influence on.
Why the VI loads aren't (always) spread over all cores is a mystery. I too would assume similar results for the reentrant code and the inlined code.
I agree that it would help if we could run something without having to recreate it.
10-18-2023 03:51 AM - edited 10-18-2023 04:08 AM
wiebe@CARYA wrote:
Why the VI loads aren't (always) spread over all cores is a mystery.
The mystery is why LabVIEW executes these VIs sequentially.
I just made a similar VI with 5 calls of a reentrant subVI. The subVI pops up its front panel when called ("Show front panel when called"). In LabVIEW 2021 only one reentrant VI pops up at a time instead of all 5.
There is no hardware limitation: my notebook has plenty of CPU cores (20 logical processors) and plenty of RAM (64 GB), and LabVIEW is configured to use 50 threads instead of the default lower number (but that's another story).
10-18-2023 04:02 AM - edited 10-18-2023 04:03 AM
...and with this variant (see attachment) I placed a 2 ms delay into the For Loop... and now LabVIEW executes these VIs in parallel.
10-18-2023 09:18 AM - edited 10-18-2023 09:23 AM
Thanks both for your insightful comments.
Please find attached the VIs saved for LabVIEW 2016. The reason I had not attached them from the beginning is that my question was a general one about the implicit multithreading/parallelism of LabVIEW, so I thought it was unnecessary to attach the particular dummy VIs I had created.
I agree with Martin's statement that the mystery is why LabVIEW is executing the VIs sequentially when it could execute them in parallel.
I added a Wait (ms) primitive inside the For Loop inside the single-threaded subVI, and fed a value of 0 ms as the input. This made LabVIEW 2023 32-bit execute all twelve subVI instances in parallel, one on each thread, resulting in almost 100% CPU usage. This occurred regardless of whether the subVI was set to shared clone, preallocated clone and/or inlined.
In short, adding a "dummy" 0 ms wait inside a loop makes a difference in terms of how LabVIEW multithreads the tasks. This seems very odd, even a bug. Surely LabVIEW should execute the code in the most parallel way possible regardless of whether a "Wait (ms)" (or any other primitive) is present in the loop or not?
When executing the same attached VIs in LabVIEW 2016 32-bit on the same machine, twelve threads are being used regardless of whether the 0 ms wait is present or not. This shows that LabVIEW 2023 Q3 implements implicit multithreading differently (and worse?) than LabVIEW 2016.
The following table summarises my results with the attached subVIs:
In the table above, green cells represent situations where the implicit multithreading works as expected, and red cells represent situations where it doesn't.
10-18-2023 10:20 AM
@Petru_Tarabuta wrote:In short, adding a "dummy" 0 ms wait inside a loop makes a difference in terms of how LabVIEW multithreads the tasks. This seems very odd, even a bug.
That is a well known property of the wait function.
Wait 0 ms is a thread switch, always has been.
It is even documented (you didn't read the manual, did you 😉)
10-18-2023 10:26 AM
You can speed up the execution if you place the 0 ms wait outside of the loop. It doesn't matter where or when the 0 ms wait is executed.
10-18-2023 10:26 AM
I forwarded this. If this did change it needs to be looked into.
10-18-2023 10:40 AM - edited 10-18-2023 10:43 AM
Thanks again for the replies. It's good to know that placing the 0 ms wait outside the For Loop will speed the subVI up, and that it's documented that a 0 ms wait will yield the thread. This explains why adding the 0 ms wait changes the behaviour, and these are good practical workarounds.
But my question was broader: why is the "trick" of using a 0 ms wait needed in the first place? The way LabVIEW is taught in the Core 1 and Core 2 courses suggests that reentrant VIs would execute in parallel as much as possible. In other words, that LabVIEW would use n threads if there are n logical cores and at least n chunks of code that can run in parallel.
Perhaps the implicit multithreading algorithm used by LabVIEW needs to be documented better? (please let me know if it is already and I haven't seen it) What is the expected behaviour?
At the risk of being pedantic I will add: