Cross-correlation values are too good

DaveSA · ‎01-18-2024

Hi Everyone

I am looking to use the cross-correlation function to determine the similarity between two bioelectrical signals (EMG) of the same length. The aim is to establish the reliability of two measures of the signal. Since the signals are the same length, with no delay or offset, I believe I can simply take the max value of the output array. With actual data this worked really well, suspiciously well, with high correlation coefficients. To test whether the code behaves as I expected, I extracted the cross-correlation code I am using so that it works with just two arrays. One array is real data and the other is an array of 1s, which should result in a very low correlation, but the max value here was 0.986. I cannot figure out why this output is so high. I am also not certain what the difference between the crosscorrelation.vi and TSA cross-correlation function.vi would be - these give similar results. Any help understanding why the output is so high would be very appreciated!

Attached is a VI with default values for the input arrays (real data and another array of 1s). Also attached are two .csv files with two trials of real data.

GerdW · ‎01-19-2024

Hi Dave,

@DaveSA wrote:

I cannot figure out why this output is so high.

You use the "default" crosscorrelation with "none" normalization. What happens with any of the other normalization modes? (They are described in the help for this function!)

@DaveSA wrote:

I am also not certain what the difference between the crosscorrelation.vi and TSA cross-correlation function.vi would be - these give similar results.

One function comes with "default" LabVIEW (maybe with Professional edition due to SignalAnalysis package), the other is part of a specific toolkit for specific tasks...

@DaveSA wrote:

Attached is a VI with default values for the input arrays (real data and another array of 1s).

Unfortunately you use a very recent LabVIEW version. You would reach a broader audience when downconverting the VI before attaching. (File->Save for previous, I prefer LV2019.)

As your "real" data aren't the same as the ones shown in your screenshot I don't know the expected result for those "real" data. And I don't have that toolkit installed so you need to read the help for the TSA function on your own...

Best regards,
GerdW

using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019

DaveSA · ‎01-20-2024

Thanks for the response. I have gone through the documentation for both VIs. I did look through the normalization methods, and for interpretability, I should be using an unbiased normalization to get values between -1 and 1, with 1 being a perfect match, for the regular crosscorrelation VI.

However, when I run this code and load the same data set twice, the output does not return a max value of 1. It's clear that something is not correct, or I would get a value of 1 when the same signal is used as both inputs.

I took your advice and saved the VI for 2017, and also changed the program to read in the spreadsheets. Any insights would be appreciated to get the correct output for signal similarity.

alexderjuengere · ‎01-22-2024

Cross-correlation is a measure of the similarity between two data sets.

(simplified) R_XY = ∑_i X_i Y_i

The larger the absolute value of R_XY, the higher the correlation.

A value of 0 indicates no correlation.
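As a minimal sketch of the simplified formula above (written in NumPy rather than LabVIEW; the sample values and the name `raw_cross_correlation` are made up for illustration), the unnormalized cross-correlation at zero lag is just the dot product of the two arrays. Against an array of 1s it therefore equals the plain sum of the samples, which says nothing about similarity in shape:

```python
import numpy as np

def raw_cross_correlation(x, y):
    """Unnormalized cross-correlation at every lag.

    For two arrays of length N the result has 2N-1 entries and
    index N-1 corresponds to zero lag.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.correlate(x, y, mode="full")

# Hypothetical rectified samples standing in for real data.
signal = np.array([0.2, 0.5, 0.9, 0.4, 0.1])
ones = np.ones_like(signal)

r = raw_cross_correlation(signal, ones)
zero_lag = r[len(signal) - 1]

print(zero_lag, signal.sum())  # both are the plain sum of the samples
print(r.max())                 # grows with amplitude and array length
```

With nonnegative data like this, the maximum sits at zero lag and simply grows with the amplitude and the number of samples.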

If you use unbiased or biased normalization and compare real data 1 and real data 2, their max cross-correlation is 0,06..

If you use biased normalization and compare signal 1 with the "array of 1s", their max cross-correlation is 0,27..

But if you compare an array of 1s with an array of 1s, the cross-correlation is 1,00
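One possible reading of these three numbers, assuming the biased mode simply divides the raw result by the array length N (an assumption based on how the function help describes it, not verified against the VI itself): an array of 1s against itself gives N/N = 1 at zero lag, a nonnegative signal against the 1s gives its mean, and a signal against itself gives its mean square. A NumPy sketch with made-up data:

```python
import numpy as np

def biased_cross_correlation(x, y):
    """Raw cross-correlation divided by N (assumed 'biased' normalization)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("inputs must have the same length")
    return np.correlate(x, y, mode="full") / len(x)

rng = np.random.default_rng(0)
signal = np.abs(rng.normal(0.0, 0.3, size=1000))  # hypothetical rectified data
ones = np.ones_like(signal)

ones_vs_ones = biased_cross_correlation(ones, ones).max()
signal_vs_ones = biased_cross_correlation(signal, ones).max()
signal_vs_self = biased_cross_correlation(signal, signal).max()

print(ones_vs_ones)                          # N / N
print(signal_vs_ones, signal.mean())         # mean of the (nonnegative) signal
print(signal_vs_self, np.mean(signal ** 2))  # mean square of the signal, not 1
```

Under that assumption the maximum is tied to the amplitude of the data, so it is not a similarity score bounded by 1.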


Just like all correlations, it does not represent causation, only statistical association.

so in terms of max correlation value, signal 1 and signal 2 are less correlated than signal 1 and the "array of 1s"

which appears to be counter-intuitive - or is it?

keep in mind:

- Cross-correlation analysis is usually conducted to understand the relationship between two stochastic processes.

- I'd say the array of 1s is hardly or very unlikely the outcome of a stochastic process

- Cross-correlation measures the temporal similarity for two data series, and it assesses the information between peaks (Derrick and Thomas, 2004; Ruppert and Matteson, 2015).

- the array of 1s is just one big plateau, with no peaks at all ...

this is signal 2 (red) compared to arrays of 1 (white):

this is signal 2(red) compared to signal 1 (white):

as far as i am concerned, crosscorrelation vi works as expected:

I used 2d convolution to reverse engineer this small CNN (convolutional neural network; on the page, search for Convolution Demo). As 2d convolution is almost the same as 2d cross-correlation, but with a flipped convolution mask, I also tried exchanging those VIs, which worked. I also tried the correlation VIs in the free sparse matrix toolkit, as it provides a double and a single datatype version of crosscorrelation.vi.

DaveSA · ‎01-22-2024

@alexderjuengere - Thanks very much for that response! Agree with everything you said. The ultimate goal is to see the similarity between the two signals. Like you say, 0 would be no match and 1 a perfect match. I'd therefore expect a max value of 1 if I cross-correlate a signal with itself. This happens when I use the array of 1s but not if I use the same data array. Using real data 1.csv as both inputs, the max value is only 0.052 where I would be expecting 1. I do get 1 when using the TSA cross-correlation function. However, it also outputs 0.969 for the cross-correlation between my two sample data sets, which intuitively feels far too high for this data.

The question I have left is why the cross correlation code is not returning a max value of 1 when cross correlating the signal with itself?

For a little more context - I added two more .csv files which represent the mean of 15 trials each. If I use the correlation coefficient VI, correlation between real data 3.csv & real data 4.csv is 0.949. And perhaps I am overthinking this, and the correlation coefficient VI is sufficient since I know there is no lag between the signals.

Thanks again!

alexderjuengere · ‎01-22-2024

@DaveSA wrote:
The question I have left is why the cross correlation code is not returning a max value of 1 when cross correlating the signal with itself?

I don't know the answer to this - I suppose it is an expectation rather than a fact: it is not generally true, but only true under certain conditions.

however, the cross correlation of signal 1 with an array of 0 is obviously always zero 😉

so, if we multiply one copy of signal 1 by different factors x, the max cross-corr value is about 1 when x=4,45

x	max cross-corr value
0	0
0,01	0,00223537
0,1	0,0223537
0,25	0,0558843
0,5	0,111769
1 (autocorrelation)	0,22
4,45	0,994741

@DaveSA wrote:

For a little more context - I added two more .csv files which represent the mean of 15 trials each. If I use the correlation coefficient VI, correlation between real data 3.csv & real data 4.csv is 0.949. And perhaps I am overthinking this, and the correlation coefficient VI is sufficient since I know there is no lag between the signals.

It looks like crosscorrelation.vi is the wrong tool for your task, but linear correlation coefficient is the proper tool.

so basically you learn a linear model and compare how well the model fits both signals.

If the signals are identical, this correlation coefficient will be 1. If you do it with an array of 1s or an array of 0s and one of the two signals, this will result in NaN.

But this also appears to be somewhat scale-invariant: when multiplying one of the two signals by a scalar factor, the correlation coefficient stays exactly the same, even though the signals look completely different:
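A sketch of what a linear (Pearson) correlation coefficient computes, written in NumPy rather than LabVIEW and using made-up data; `pearson_r` is a hypothetical helper. It only looks at zero lag, subtracts each mean and divides by both standard deviations, so identical signals give 1 and a constant array leads to 0/0, reported here as NaN:

```python
import numpy as np

def pearson_r(x, y):
    """Linear correlation coefficient of two equal-length arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    if denom == 0.0:
        return float("nan")  # a constant array has zero variance
    return float(np.sum(xc * yc) / denom)

rng = np.random.default_rng(2)
signal = np.abs(rng.normal(0.0, 0.3, size=500))  # hypothetical data

print(pearson_r(signal, signal))                 # identical signals
print(pearson_r(signal, np.ones_like(signal)))   # constant array -> nan
print(pearson_r(signal, np.zeros_like(signal)))  # constant array -> nan
```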


x = 0,2; cross-corr-max = 13,5; corr-coeff = 0,9492

x = 1; cross-corr-max = 67,54 ; corr-coeff = 0,9492

x = 10; cross-corr-max = 675,46 ; corr-coeff = 0,9492


- which is probably mostly due to z-normalization, which involves subtracting mean values... -
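The same pattern can be reproduced with a short NumPy sketch (synthetic data, so the values differ from the ones quoted; `raw_max` and `corr_coeff` are hypothetical helpers). Scaling one signal by x scales the raw cross-correlation maximum by x, while the correlation coefficient is unchanged because the scale factor cancels when dividing by the standard deviations:

```python
import numpy as np

rng = np.random.default_rng(3)
base = np.abs(rng.normal(0.0, 0.3, size=500))
signal_a = base + 0.05 * rng.normal(size=500)  # two noisy copies of one shape
signal_b = base + 0.05 * rng.normal(size=500)

def raw_max(x, y):
    """Max of the unnormalized cross-correlation over all lags."""
    return np.correlate(x, y, mode="full").max()

def corr_coeff(x, y):
    """Pearson correlation coefficient at zero lag."""
    return np.corrcoef(x, y)[0, 1]

for x in (0.2, 1.0, 10.0):
    scaled = x * signal_b
    print(f"x={x}: cross-corr max={raw_max(signal_a, scaled):.3f}, "
          f"corr-coeff={corr_coeff(signal_a, scaled):.4f}")
```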

DaveSA · ‎01-22-2024

@alexderjuengere

Awesome stuff! I think your input led me to the correct tool for the job! Luckily scaling shouldn't be a problem - the signals are already scaled to the relative maximum as part of the data processing.

If anyone does know the answer to why cross-correlating a signal with itself does not lead to a value of 1, I am still interested to know.

As a thanks (and if you're interested) here is a video of the movement we are recording the electromyographic data from for this research 🙂 https://drive.google.com/file/d/1nZDUJUwIexENOmTzyH2xeyVFpGG4_KR_/view?usp=sharing

Thanks for the help!

Dave

alexderjuengere · ‎01-23-2024

@DaveSA wrote:

If anyone does know the answer to why cross-correlating a signal with itself does not lead to a value of 1, I am still interested to know.

me too 😄

@DaveSA wrote:

As a thanks (and if you're interested) here is a video of the movement we are recording the electromyographic data from for this research 🙂 https://drive.google.com/file/d/1nZDUJUwIexENOmTzyH2xeyVFpGG4_KR_/view?usp=sharing

nice! I appreciate it - watch out for your ankle joints 🙂
