07-14-2009 06:26 PM
Hello,
I am currently acquiring a large number of data points. I record 1 out of every 1000 points to a spreadsheet, and when the program finishes I also save the latest 1000 points taken to a spreadsheet. I would like to be able to know the overall standard deviation of all the points (approximately 36000000 data points, of which I will only record 36000), but I will not be saving all the data points. Is there a way to approximate the standard deviation (of all the points, not just the ones I record)?
I know that I can find the exact mean by continuously summing and then dividing by the number of points taken. I also know that if the mean remains constant, I can start with a standard deviation, find the sum of the residuals squared, and add the new residual for each data point. The problem is that the mean is changing, so each time a new data point is added, the residuals for all the earlier points are also changed.
I'm sorry if this is more of a math problem than a coding issue, but I thought perhaps there would be a way to approximate it in LabVIEW.
Thanks
07-14-2009 07:23 PM
According to this, you need to keep three sums: S0 = N (the number of points), S1 = sum of all x, and S2 = sum of all x².
You can then calculate the running standard deviation at any time as SQRT(S0×S2 − S1²)/S0.
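For readers who want the three-sum idea in text form, here is a minimal Python sketch (the thread itself uses LabVIEW). The class name `RunningStd` and its method names are made up for illustration. It computes the population form of the standard deviation (divide by N), and the clamp to zero is an added safeguard against rounding, not part of the formula above.

```python
import math


class RunningStd:
    """Running population standard deviation from three sums (hypothetical helper)."""

    def __init__(self):
        self.s0 = 0      # S0: number of points seen so far
        self.s1 = 0.0    # S1: sum of x
        self.s2 = 0.0    # S2: sum of x squared

    def add(self, x):
        # Only the three sums are updated; the point itself is not stored.
        self.s0 += 1
        self.s1 += x
        self.s2 += x * x

    def std(self):
        if self.s0 == 0:
            raise ValueError("no data yet")
        # Rounding can push the difference slightly below zero, so clamp it.
        radicand = max(self.s0 * self.s2 - self.s1 ** 2, 0.0)
        return math.sqrt(radicand) / self.s0


rs = RunningStd()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    rs.add(x)
print(rs.std())  # 2.0
```

One caveat: when the mean is large compared with the spread, S0×S2 and S1² are nearly equal, so the subtraction can lose precision in floating point. It is worth checking against a known data set before trusting it on 36000000 points.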
07-14-2009 07:55 PM
Here is a quick example, showing that you get the same result keeping only an array with 3 elements as described.
(Shown for both the sample and the population standard deviation. For large arrays the difference is negligible.)
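The attached VI is not reproduced here, so the following is only a rough Python equivalent of the same check, not the original code. It keeps a 3-element list `[S0, S1, S2]` and compares the result with the standard library's `statistics.pstdev` and `statistics.stdev`, which work on the full data. The function names and the test data are assumptions chosen for illustration.

```python
import math
import statistics


def update(sums, x):
    """Fold one new point into the 3-element list [S0, S1, S2]."""
    sums[0] += 1
    sums[1] += x
    sums[2] += x * x


def population_std(sums):
    """Standard deviation dividing by N."""
    n, s1, s2 = sums
    return math.sqrt(max(n * s2 - s1 * s1, 0.0)) / n


def sample_std(sums):
    """Standard deviation dividing by N - 1 (needs at least two points)."""
    n, s1, s2 = sums
    return math.sqrt(max(n * s2 - s1 * s1, 0.0) / (n * (n - 1)))


data = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]  # made-up values
sums = [0, 0.0, 0.0]
for x in data:
    update(sums, x)

# Both forms agree with the full-array results to within rounding.
print(math.isclose(population_std(sums), statistics.pstdev(data)))
print(math.isclose(sample_std(sums), statistics.stdev(data)))
```

The two forms differ only by a factor of sqrt(N/(N−1)), which is why the difference disappears for large N.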
07-14-2009 08:55 PM
What about the Point by Point SD VI? If you didn't want such a huge array in memory (36000 elements), you could at least always have the last 1000 or so rolled into the SD by using this VI.
07-15-2009 08:58 AM
Broken Arrow wrote: What about the Point by Point SD VI?
Well, that's exactly what the OP did not want... 😄
(quote: I would like to be able to know the overall standard deviation of all the points (approximately 36000000 data points, of which I will only record 36000), but I will not be saving all the data points. Is there a way to approximate the standard deviation (of all the points, not just the ones I record)?)
However, you can use the standard_deviation_ptbypt to get the global value without much memory impact by setting the sample length to zero.
Of course, my example can be further simplified, as attached. (The earlier version was a verbatim translation of the article.) Modify as needed. I also added the ptbypt version. You can look at the code of the ptbypt version; it's quite similar (look for the "infinite horizon" case). Personally, I prefer to use a size=3 array instead of multiple scalar shift registers. 😉
07-15-2009 09:28 AM
altenbach wrote:
Broken Arrow wrote: What about the Point by Point SD VI?
Well, that's exactly what the OP did not want... 😄
Knights shouldn't laugh at the common folk.
07-15-2009 11:04 AM
Thank you very much altenbach!
This is exactly what I was looking for; I must have overlooked it. Thank you also for your sample VI implementing the solution.