07-15-2016 04:34 PM - edited 07-15-2016 04:37 PM
@EngrStudent wrote:
Here is the FP and block diagram (appended)
Please attach the actual VI.
(An image often does not tell the whole story and we cannot test for ourselves)
Thanks!
I don't understand the point of the "in place" code. The code executes in place anyway for one of the wires. The 60x60 could even be taken out of the loop.
07-15-2016 05:19 PM - edited 07-15-2016 05:23 PM
@altenbach - I put "in place" there because I am trying to learn; I am trying to find a way to reach the understanding you have without spending a few man-years living in your office.
It is attached. Note: only one version is attached. The error wire can be rerouted and the array displays can be moved.
I know, now, that I am converting days to seconds. In a year I might not remember that, and if someone else reads it they might not realize it. I don't always have time to write out all the details in documentation, so I try to build some hints in at the DNA level. The compiler should fold it into a constant, so it shouldn't cost anything at runtime or in memory.
Can you tell me what you do with the VI?
07-15-2016 06:04 PM - edited 07-15-2016 06:07 PM
@EngrStudent wrote: Can you tell me what you do with the VI?
Just playing around a little bit.
The 24x60x60 only gets folded if you change the order; see the picture (look for the fuzzy wires!). (Not sure whether the compiler reorders things, but we seem to gain 5-10%.)
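For readers outside LabVIEW, the same idea holds in text languages: grouping the constants lets the compiler fold them into one literal while the expression stays self-documenting. A minimal Python sketch (the names are made up for illustration):

    # Grouping the constants lets the compiler fold them into a single
    # literal at compile time while the source stays self-documenting.
    SECONDS_PER_DAY = 24 * 60 * 60   # folded to 86400 before the code runs

    def days_to_seconds(days):
        # Written as days * 24 * 60 * 60, evaluation is left to right, so
        # the variable mixes into every multiply and the constants never
        # meet. Grouping them restores the folding:
        return days * (24 * 60 * 60)

    print(days_to_seconds(2))   # 172800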
Once you disable debugging in the VI, the times for the non-MathScript code are basically identical. With debugging enabled, you are giving the formula node an advantage, because the formula node does not contain debugging code.
You definitely want all array indicators outside the main loop. Right now the UI might steal cycles from code running later in the loop in order to update the indicators.
None of the sequence frames on the right serve any purpose.
To get a more honest value, use the array min instead of the mean. All external artifacts (e.g. OS jobs, scheduling) only ever make the times longer, so the min is asymptotically a better estimate of the pure algorithm time.
I would use high-resolution relative seconds and format the time display with "%.2ps", for example.
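For the non-LabVIEW readers, a minimal Python sketch of that min-of-N timing idea (the workload is a placeholder, and Python has no direct "%.2ps" SI format, so the unit is fixed by hand):

    import time

    def benchmark(fn, repeats=50):
        # Keep the minimum over many runs: interference from the OS can only
        # ever add time, so the min approaches the pure algorithm time.
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()      # high-resolution relative seconds
            fn()
            best = min(best, time.perf_counter() - t0)
        return best

    # placeholder_workload is a stand-in for the code under test.
    placeholder_workload = lambda: sum(x * x for x in range(60 * 60))
    print(f"{benchmark(placeholder_workload) * 1e3:.2f} ms")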
I'll look at it some more... 😉
07-15-2016 07:11 PM - edited 07-15-2016 07:12 PM
Once you take the random number generation out of the inner loops and remove the vestigial FOR loops, the various versions (incl. MathScript!) are basically identical in speed; only the formula node is about 2x slower.
Here's a quick benchmark rewrite (LV2013). See if it makes any sense. Let me know if you have any questions.
(Not fully tested; of course there could be bugs.)
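For comparison outside LabVIEW, here is a rough Python/NumPy sketch of why the random number generation has to come out of the timed loop (the sizes are arbitrary placeholders):

    import time
    import numpy as np

    N, RUNS = 1_000_000, 10        # arbitrary sizes for illustration
    rng = np.random.default_rng()

    # Misleading: the RNG runs inside the timed loop, so the generator
    # dominates the measurement instead of the algorithm under test.
    t0 = time.perf_counter()
    for _ in range(RUNS):
        rng.random(N).sum()
    inside = time.perf_counter() - t0

    # Better: hoist the random data out of the loop and time only the work.
    data = rng.random(N)
    t0 = time.perf_counter()
    for _ in range(RUNS):
        data.sum()
    hoisted = time.perf_counter() - t0

    print(f"RNG inside loop: {inside:.3f} s, hoisted: {hoisted:.3f} s")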
07-16-2016 07:39 AM
My results looked a little different. I just opened your VI and clicked run.
Also, there is a startup effect where things run faster at first, then take a step increase in time. Is that RAM consumption or something similar? A transition to swap? A Windows background process?
I am wondering about the relationship between the mean and the min in evaluating the algorithm. I use the mean because I want to measure realistic performance. If the algorithm is faster by the min, shouldn't it be faster by the mean too? The mean is confounded by many Windows processes, so maybe there is extra variance in the mean. There is likely a bias too. ...
07-16-2016 10:03 AM - edited 07-18-2016 10:04 AM
If you use the mean, a single significant outlier can skew one of the results (e.g. when some other program loads at just the wrong time, Windows checks for updates, etc.).
If you are worried about "realistic performance", you could report both the min and the max, giving you a range from the best case to the worst case. Absolute performance differs from machine to machine and is thus not very interesting. If I am really worried about the distribution, I typically do a histogram of many runs. Often the mean is a bad measure of the actual distribution, because the distribution is not Gaussian.
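A quick Python/NumPy sketch of that kind of analysis (the workload and the run count are placeholders, not your benchmark):

    import time
    import numpy as np

    def timed_run():
        t0 = time.perf_counter()
        np.sort(np.random.default_rng().random(100_000))  # placeholder work
        return time.perf_counter() - t0

    samples = np.array([timed_run() for _ in range(200)])
    print(f"min  {samples.min()*1e3:.2f} ms  (best case, most reproducible)")
    print(f"mean {samples.mean()*1e3:.2f} ms  (pulled up by slow outliers)")
    print(f"max  {samples.max()*1e3:.2f} ms  (worst case)")
    counts, edges = np.histogram(samples, bins=20)  # inspect the skew directly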
Are you running on battery or plugged in? What is your OS and power plan? Sometimes the results differ because the system adjusts the clock frequency on the fly to boost performance (Turbo Boost), save power (SpeedStep), or prevent overheating.
There are also boundaries that depend on the CPU cache sizes and such. How do your results look if you change the size to 1M or 100k, for example? What is the exact make and model of your CPU? An AMD processor will behave differently from an Intel one.
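A rough Python/NumPy sketch of such a size sweep (interpreter and NumPy overhead will blur the steps compared to compiled code, and the exact boundaries depend on your cache sizes):

    import time
    import numpy as np

    rng = np.random.default_rng()
    for n in (100_000, 1_000_000, 10_000_000):    # ~100k, 1M, 10M elements
        data = rng.random(n)
        t0 = time.perf_counter()
        for _ in range(20):
            data.sum()
        per_element = (time.perf_counter() - t0) / (20 * n)
        # When the array stops fitting in a cache level, ns/element steps up.
        print(f"n={n:>10}: {per_element * 1e9:.3f} ns/element")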
What version of LabVIEW are you running? Every version has new compiler improvements. I am running 2015SP1.
07-16-2016 10:28 AM - edited 07-16-2016 10:30 AM
Here are my results for 50M; same picture.
If I go down to a size of 10k, things get a bit noisier and the MathScript slows down a little. It seems there is slightly more overhead in launching the MathScript node (that's why it is so slow if you place it inside a FOR loop instead of operating on entire arrays!).
Here it is also more obvious that the "min" is a more accurate comparison. The distributions are highly skewed, always with slower outliers, but the min forms a relatively stable and reproducible lower boundary.