
Before starting, let me say that I'm not a mathematician. I'm just a curious engineer with a master's degree in engineering and computer science.

I have been using the book The Scientist and Engineer's Guide to Digital Signal Processing for years, as I have developed many audio applications throughout my life.

This book has been very useful to me, but recently I have been struggling with correlation between signals.

Suppose I have a discrete signal $S$, with a length of $N$ samples in the time domain (where $N > 10000$). Along with that, I have another signal, $S2$ with a length of $1024$ samples, also in the time domain.

Both $S$ and $S2$ consist of samples with amplitude ranging from -128 to 127.

If I want to figure out whether $S2$ appears within $S$, even in the presence of background noise, I can use correlation. According to Chapter 7 of that book, correlation can be implemented in a way similar to convolution, producing a third signal, the cross-correlation of the two input signals, which I will call $C$.

Now, how should I interpret the cross-correlation $C$ to extract the useful information from it, that is: is $S2$ present in $S$ and, if so, at which sample of $S$ does $S2$ start?

2 Answers

1

Cross-correlation slides the shorter target signal $S2$ along $S$, looking for places where the two waveforms are similar. To do this, $S2$ is aligned with a given position in $S$, each sample of $S2$ is multiplied by the sample of $S$ it overlaps, and the products are summed. That sum becomes one sample of the cross-correlation signal $C$. The target $S2$ is then shifted by one sample (leftward or rightward) and the operation is repeated, until the whole of $S$ has been explored. The amplitude of each sample in $C$ therefore indicates how closely $S$ resembles $S2$ at that alignment. So you have to check whether $C$ has peaks: a peak shows that $S2$ is aligned with a similar pattern in $S$, which suggests that $S2$ appears in $S$ at that location.
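The sliding procedure described above can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `cross_correlate` and the toy data are made up for the example, and a direct double loop like this would be slow for real audio lengths.

```python
def cross_correlate(s, s2):
    """Slide s2 along s; at each offset, sum the products of overlapping samples."""
    n, m = len(s), len(s2)
    c = []
    for offset in range(n - m + 1):  # every position where s2 fits entirely inside s
        total = 0
        for k in range(m):
            total += s[offset + k] * s2[k]
        c.append(total)  # one sample of C per alignment
    return c

# Toy data (invented for illustration): the pattern s2 is embedded in s at index 3.
s2 = [5, -7, 9, -4]
s = [1, -2, 0, 5, -7, 9, -4, 2, -1, 0]

c = cross_correlate(s, s2)
peak = max(range(len(c)), key=lambda i: c[i])  # index of the largest sample of C
print(c)     # [-1, 63, -134, 171, -142, 95, -43]
print(peak)  # 3
```

The largest value of `c` sits at index 3, which is where the pattern starts in `s`.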

Anatoly
  • 17,079
  • Thanks, @Anatoly! So you mean there isn't a single "magic" value that determines whether $S2$ lies within $S$, and instead I'll have to look for peaks in $C$? To generate $C$ in the time domain, if $S$ has $N$ samples, I'll have to "move" $S2$ $N$ times to the "right", starting at sample $0$ of $S$? Is this correct? – carlosrafaelgn Sep 18 '14 at 14:55
  • Yes, you are right! There are no magic values. You have to move $S2$ a total of $N - 1024$ times to the "right", starting where sample $1$ of $S2$ is aligned with sample $1$ of $S$, and ending where sample $1024$ of $S2$ is aligned with the last sample of $S$. – Anatoly Sep 18 '14 at 15:17
  • @Anatoly, I thought about asking the question here, but decided to create a new question (http://math.stackexchange.com/questions/1477135) to provide more details. – Izhaki Oct 12 '15 at 21:46
1

Your problem is a classic one in detection theory where one has to decide whether a signal with unknown delay is present within the observation interval.

To obtain the optimal detection rule you need to specify (that is, make assumptions about) the statistics of the additive noise. In many cases, such as when the noise is additive white Gaussian noise (AWGN), the optimal detection rule is the following:

(a) perform a (sliding) correlation of the observed signal with the signal of interest,

(b) find the sample where the correlation is maximum. This is the maximum likelihood estimate of the sample at which the signal of interest starts (if present),

(c) compare the maximum correlation with a threshold. If the correlation is greater than the threshold, you decide that your signal of interest is indeed present.
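As a rough sketch of steps (a) to (c), assuming small uniform noise and a hand-picked threshold: the helper names `sliding_correlation` and `detect`, the test pattern, the noise range, and the choice of half the pattern energy as threshold are all assumptions made for this example, not part of the rule itself.

```python
import random

def sliding_correlation(s, s2):
    # (a) correlate s2 with every window of s that it fits into
    m = len(s2)
    return [sum(s[i + k] * s2[k] for k in range(m)) for i in range(len(s) - m + 1)]

def detect(s, s2, threshold):
    """Return (present, start_index, peak_value)."""
    c = sliding_correlation(s, s2)
    start = max(range(len(c)), key=lambda i: c[i])  # (b) location of the maximum
    return c[start] > threshold, start, c[start]    # (c) compare with the threshold

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical pattern: a 13-sample Barker code scaled by 100, chosen only
# because its correlation peak is sharp.
s2 = [100 * x for x in (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1)]

true_start = 40
noise_only = [rng.randint(-10, 10) for _ in range(200)]
s = list(noise_only)
for k, v in enumerate(s2):
    s[true_start + k] += v  # bury the pattern in the noise

energy = sum(v * v for v in s2)
threshold = energy // 2  # illustrative choice, not an optimal threshold

print(detect(s, s2, threshold))           # pattern present
print(detect(noise_only, s2, threshold))  # pattern absent
```

With noise this small the decision is easy. As the noise level approaches the signal level, the choice of threshold starts to matter much more.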

Due to the simplicity of the above approach, this procedure is widely followed even though it may not be optimal. Note that, in any case, you cannot be 100% certain that you have made the right decision, i.e., even if the correlation is larger than the threshold there is a non-zero probability that the signal is not present (due to a "bad" noise sample realization). However, you can try to choose the threshold in such a way that it minimizes this probability (or any other cost function).

In order to find the optimal threshold you have two options:

(a) perform calculations by hand (this requires knowledge of detection/estimation theory and that your model assumptions lead to tractable analysis),

(b) find the threshold by Monte Carlo simulation (which is what I would suggest).
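One possible way to set up such a simulation, sketched under stated assumptions: the noise is taken to be zero-mean Gaussian with a known standard deviation `sigma`, and the threshold is chosen so that noise alone exceeds it in roughly a target fraction of trials (a false-alarm criterion, which is only one of many possible cost functions). The function names, the pattern, and all parameter values are hypothetical.

```python
import random

def peak_correlation(s, s2):
    # Largest value of the sliding correlation of s2 against s
    m = len(s2)
    return max(sum(s[i + k] * s2[k] for k in range(m)) for i in range(len(s) - m + 1))

def monte_carlo_threshold(s2, n, sigma, false_alarm_rate, trials, seed=0):
    """Estimate a threshold that noise-only peaks exceed in about
    false_alarm_rate of the simulated trials."""
    rng = random.Random(seed)
    peaks = []
    for _ in range(trials):
        noise = [rng.gauss(0.0, sigma) for _ in range(n)]  # assumed noise model
        peaks.append(peak_correlation(noise, s2))
    peaks.sort()
    # Empirical (1 - false_alarm_rate) quantile of the noise-only peaks
    index = min(trials - 1, int((1.0 - false_alarm_rate) * trials))
    return peaks[index]

# Hypothetical pattern and parameters, for illustration only
s2 = [100 * x for x in (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1)]
threshold = monte_carlo_threshold(s2, n=100, sigma=10.0,
                                  false_alarm_rate=0.01, trials=300)
print(threshold)
```

To trade false alarms against missed detections, the same loop can be repeated with the pattern added to the noise, counting how often the peak falls below each candidate threshold.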

Stelios
  • 3,077