02-04-2016 10:56 AM
James, in your core optimized version, you mention 8 cores. Does this include hyperthreading? That is, do you have 4 physical cores with 8 logical "processors"? If that is the case, then I am unlikely to have 64 devices. The typical number for most implementations is between 20 and 40 devices.
For my current testing, I have 10 devices. 9 out of 10 are disconnected. The execution of the parallel loop completes in about 12 seconds. The extra 2 seconds might be accounted for by thread scheduling, but more likely they come from the data processing that happens on the one device that is connected. I am trying to understand my results in light of your testing with core limits. Does this sound like it is operating as expected? And will I run into scaling limits? I may have to add a bunch of dummy devices to the array to test the upper limits of execution time.
02-04-2016 11:14 AM - edited 02-04-2016 11:17 AM
Sorry, yes this is including all 8 logical processors. When the snippet above is run, the Tick values are all within a tick or two of each other. This means that each iteration ran in parallel instead of waiting for the 100ms Wait I put in there.
The parallel processing will always take as much time as the slowest iteration. So if your one device takes 12 seconds, that's probably what you're seeing. I would run the code on the single device alone and see if it takes 12 seconds, to compare the performance. My total runtime for the above loops is 100 ticks every time.
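To make the "slowest iteration" point concrete, here is a minimal sketch in Python rather than LabVIEW. The device names and delays are made up, and time is scaled down 100x so that a 10 second timeout becomes 0.10 s. Every simulated device call runs at once in a thread pool, so the total should land near the longest single delay rather than the sum of all delays.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-device delays in seconds (scaled down 100x):
# nine "disconnected" devices hit the timeout, one "connected" device answers fast.
DELAYS = {f"dev{i}": 0.10 for i in range(9)}
DELAYS["dev9"] = 0.01


def query_device(name):
    """Stand-in for the real device call: just sleep for the device's delay."""
    time.sleep(DELAYS[name])
    return name


def run_all_parallel():
    """Run every device call at once and return the elapsed wall-clock time."""
    start = time.perf_counter()
    # One worker per device, so no call has to wait for a free worker.
    with ThreadPoolExecutor(max_workers=len(DELAYS)) as pool:
        list(pool.map(query_device, DELAYS))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_all_parallel()
    print(f"slowest single call: {max(DELAYS.values()):.2f} s")
    print(f"sum of all calls:    {sum(DELAYS.values()):.2f} s")
    print(f"parallel total:      {elapsed:.2f} s")
```

If the parallel total is clearly larger than the slowest single call, something other than the slowest iteration is adding time.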
Did you run it with the nested For loops I included above? Btw, you can drag a Snippet (like above) to your block diagram to import the code.
Cheers
--------, Unofficial Forum Rules and Guidelines ,--------
'--- >The shortest distance between two nodes is a straight wire> ---'
02-04-2016 11:21 AM
The slowest iteration is going to be when an attempt is made to connect to a disconnected device. This will incur the 10 second timeout. The devices that are connected and respond appropriately do so in the sub-second range. There is some added processing of sampled data, but that is usually negligible. So, I am just trying to track down the extra 2 seconds. That may be coming from the way I am timing the execution. My fear is that it is occurring because I have run into the core limit you were discussing. I will try your nested loop version and see if it makes any difference to the execution times. Thanks again for your efforts. I appreciate the invested time and effort.
02-04-2016 11:37 AM - last edited on 12-16-2024 02:58 PM by Content Cleaner
Just add the Tick counter like I have so you can see what the tick count is when each iteration starts. This will tell you whether they're all starting at the same time or not. I assume you already set the VI to reentrant?
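The same check can be sketched outside LabVIEW. The Python below is only an analogy: `measure_start_spread` and its parameters are hypothetical names, and a thread pool stands in for the parallel loop. Each task records its own start time, and the largest start offset shows whether the tasks really began together or were queued behind a limited number of workers.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def measure_start_spread(n_tasks, n_workers, work_s=0.05):
    """Run n_tasks on n_workers; return (largest start offset, sorted offsets) in seconds."""
    t0 = time.perf_counter()
    offsets = []
    lock = threading.Lock()

    def task(_):
        started = time.perf_counter() - t0  # the "tick count" when this iteration starts
        with lock:
            offsets.append(started)
        time.sleep(work_s)  # stand-in for the real work

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(task, range(n_tasks)))
    return max(offsets), sorted(offsets)


if __name__ == "__main__":
    spread_full, _ = measure_start_spread(n_tasks=10, n_workers=10)
    spread_limited, _ = measure_start_spread(n_tasks=10, n_workers=2)
    print(f"10 workers: last task started after {spread_full:.3f} s")
    print(f" 2 workers: last task started after {spread_limited:.3f} s")
```

With as many workers as tasks, the offsets should all be close to zero. With fewer workers, the later tasks start only after earlier ones finish, which shows up as start offsets that are multiples of the work time.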
Thanks are best given in the form of Kudos and Marked Solutions (Unofficial Forum Rules and Guidelines). You received assistance from a top-tier LabVIEW expert here (RavensFan), who I'm sure would appreciate your gratitude. Marked Solutions help others find this post when they have the same issue, and Kudos motivate us all to keep coming back to help!
Cheers
02-04-2016 11:52 AM
@Arcus111 wrote: Here is what I boiled it down to:
I just tested it and the entire loop takes approximately 12 seconds to execute. That seems about right with the data processing involved. Thanks to all for the input and suggestions. I have a much better handle on data flow and its impact on making things execute in parallel.
That's looking better, but you can still improve!
If you dig way down deep into the help file, you can learn a bit more about reentrant VIs and how they handle creating the dataspaces for the clones. Creating the dataspaces can be time consuming if you need more clones than are currently in the clone pool. The default is 1 clone per core and you can't create fewer than that. HOWEVER, you CAN create more with the Preallocate Clones method. This buries the time it takes to stop and create new dataspaces in the initialization of your application, and since the clones are already there when you need them, the execution becomes much faster.
There is a GREAT shipping example called "Benchmarking Asynchronous Calls.vi" that you should look into. There is good information in the documentation of that VI that you will find enlightening.
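The idea of paying the clone creation cost up front can be illustrated with a toy model. This is Python, not LabVIEW, and `ClonePool` and `count_on_demand` are hypothetical names that say nothing about how LabVIEW implements its clone pool. The sketch only counts how many instances have to be created on demand, which is the slow path, when a number of calls need a clone at the same time.

```python
class ClonePool:
    """Toy model of a pool of reusable worker instances ("clones")."""

    def __init__(self, preallocate=0):
        # Instances created here are paid for once, at initialization.
        self.free = [object() for _ in range(preallocate)]
        self.created_on_demand = 0

    def acquire(self):
        """Hand out a free clone, creating one on demand if the pool is empty."""
        if self.free:
            return self.free.pop()
        self.created_on_demand += 1  # slow path: creation happens mid-run
        return object()

    def release(self, clone):
        """Return a clone to the pool so a later call can reuse it."""
        self.free.append(clone)


def count_on_demand(preallocate, concurrent_calls):
    """Return how many clones must be created mid-run for simultaneous calls."""
    pool = ClonePool(preallocate)
    in_use = [pool.acquire() for _ in range(concurrent_calls)]
    for clone in in_use:
        pool.release(clone)
    return pool.created_on_demand


if __name__ == "__main__":
    print(count_on_demand(preallocate=4, concurrent_calls=10))
    print(count_on_demand(preallocate=10, concurrent_calls=10))
```

In this model, a pool sized to the number of simultaneous calls never hits the slow path during the run, which is the effect described above.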
02-04-2016 01:19 PM - last edited on 12-16-2024 02:58 PM by Content Cleaner
@James.M wrote:
@Arcus111 wrote:
Here is what I boiled it down to:
I just tested it and the entire loop takes approximately 12 seconds to execute. That seems about right with the data processing involved. Thanks to all for the input and suggestions. I have a much better handle on data flow and its impact on making things execute in parallel.
This will still be limited by the number of cores.
I would look into Call and Collect as a way to do infinite launches like RavensFan suggests. I made this below if you want to do nested For loops. This is where I stop spending so much time on this though, haha.
So, I tested this version and it is returning in the expected 10 second timeframe. The extra seconds must have been coming from the processor core limit. So, I am pretty sure most of our implemented systems will have fewer than 512 devices since they will be running on quad cores with hyperthreading. So, this solution works perfectly for me. Thanks again.
02-04-2016 01:22 PM
Glad to hear it! I just noticed I put that 100ms Wait outside the "replace with subVI" structure, so make sure you remove that Wait in your code.
Cheers
02-04-2016 01:24 PM
@James.M wrote: Glad to hear it! I just noticed I put that 100ms Wait outside the "replace with subVI" structure, so make sure you remove that Wait in your code.
Already removed it, but thanks for the follow-up.
02-04-2016 02:32 PM
One more option is to use the Call and Collect method, using two For loops.
For loop 1 calls your GetData.vi by reference and starts it executing. LabVIEW does not wait for the result and immediately starts the next VI.
For loop 2 retrieves all the results from the VIs when they are done.
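The two-loop pattern can be sketched in text form. The Python below is an analogy rather than LabVIEW code: `get_data` is a hypothetical stand-in for GetData.vi, the delays are made up, and a thread pool plays the role of the asynchronous calls. The first loop starts every call without waiting, and the second loop collects each result once it is ready.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_data(device_id, delay_s):
    """Hypothetical stand-in for GetData.vi: wait, then return a result."""
    time.sleep(delay_s)
    return device_id, f"data from device {device_id}"


def call_and_collect(delays):
    """Launch one call per device, then collect every result in launch order."""
    results = {}
    # max(1, ...) avoids an error from the pool when the device list is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(delays))) as pool:
        # Loop 1: start each call and move on without waiting for it.
        futures = [pool.submit(get_data, i, d) for i, d in enumerate(delays)]
        # Loop 2: wait for each call to finish and pick up its result.
        for future in futures:
            device_id, data = future.result()
            results[device_id] = data
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    results = call_and_collect([0.10, 0.01, 0.10, 0.02])
    print(results)
    print(f"elapsed: {time.perf_counter() - start:.2f} s")
```

Because the second loop only waits on calls that are already running, the total time is governed by the slowest call, not by the order in which the results are collected.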