
Suppose that we have two different discrete $N$-dimensional signal vectors, namely $\mathbf{x}[i]$ and $\mathbf{y}[i]$, each consisting of a total of $M$ sample vectors.

$\mathbf{x}[m] = [x_{m,1} \;\; x_{m,2} \;\; x_{m,3} \;\; \cdots \;\; x_{m,N}]^\text{T}, \qquad 1 \leq m \leq M$
$\mathbf{y}[m] = [y_{m,1} \;\; y_{m,2} \;\; y_{m,3} \;\; \cdots \;\; y_{m,N}]^\text{T}, \qquad 1 \leq m \leq M$

I then build a cross-covariance matrix between these signals,

$\{C\}_{ij} = E\left\{(\mathbf{x}[i] - \bar{\mathbf{x}}[i])^\text{T}(\mathbf{y}[j] - \bar{\mathbf{y}}[j])\right\}, \qquad 1 \leq i,j \leq M$

where $E\{\cdot\}$ is the expected-value operator.

What is the proof that, for arbitrary vector sets $\mathbf{x}$ and $\mathbf{y}$, the covariance matrix $C$ is always positive semi-definite ($C \succeq 0$), i.e., all of its eigenvalues are non-negative?
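For concreteness, here is one possible numerical sketch of this construction, under the assumption that $E\{\cdot\}$ is estimated by averaging over $K$ independent realizations of each signal (the question does not specify this); numpy and the variable names are purely illustrative:

```python
import numpy as np

# Illustrative sizes: M signal vectors of dimension N, each observed over K realizations.
rng = np.random.default_rng(0)
M, N, K = 4, 3, 1000

# x[k, m, :] and y[k, m, :] hold the k-th realization of x[m] and y[m].
x = rng.normal(size=(K, M, N))
y = rng.normal(size=(K, M, N))

x_bar = x.mean(axis=0)   # estimate of the mean vector of each x[m]
y_bar = y.mean(axis=0)   # estimate of the mean vector of each y[m]

# C[i, j] estimates E{(x[i] - xbar[i])^T (y[j] - ybar[j])} by a sample average.
C = np.zeros((M, M))
for i in range(M):
    for j in range(M):
        C[i, j] = np.mean(np.sum((x[:, i, :] - x_bar[i]) * (y[:, j, :] - y_bar[j]), axis=1))

print(C.shape)  # (M, M)
```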

hkBattousai
  • 4,543

3 Answers


A symmetric matrix $C$ of size $n\times n$ is positive semi-definite if and only if $u^tCu\geqslant0$ for every $n\times1$ (column) vector $u$, where $u^t$ is the $1\times n$ transposed (row) vector. If $C$ is a covariance matrix in the sense that $C=\mathrm E(XX^t)$ for some $n\times 1$ random vector $X$, then the linearity of the expectation yields that $u^tCu=\mathrm E(Z_u^2)$, where $Z_u=u^tX$ is a real-valued random variable; in particular, $u^tCu\geqslant0$ for every $u$.

If $C=\mathrm E(XY^t)$ for two centered random vectors $X$ and $Y$, then $u^tCu=\mathrm E(Z_uT_u)$ where $Z_u=u^tX$ and $T_u=u^tY$ are two real-valued centered random variables. Thus, there is no reason to expect that $u^tCu\geqslant0$ for every $u$ (and, indeed, $Y=-X$ provides a counterexample).
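A quick numerical illustration of both points (a sketch assuming numpy, with the expectations replaced by sample averages; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, samples = 3, 100_000

# Zero-mean random vector X; estimate C = E(X X^t) by a sample average.
X = rng.normal(size=(samples, n))
C_xx = X.T @ X / samples

# u^t C u = E((u^t X)^2) >= 0; equivalently, all eigenvalues of C are non-negative.
print(np.linalg.eigvalsh(C_xx))   # all non-negative (up to sampling noise)

# Counterexample for the cross-covariance: Y = -X gives C = E(X Y^t) = -E(X X^t).
Y = -X
C_xy = X.T @ Y / samples
u = rng.normal(size=n)
print(u @ C_xy @ u)               # strictly negative, so this C is not PSD
```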

Did
  • 279,727

The covariance matrix $\mathbf{C}$ is defined by the formula $$ \mathbf{C} \triangleq E\{(\mathbf{x}-\bar{\mathbf{x}})(\mathbf{x}-\bar{\mathbf{x}})^T\}. $$ We are going to use the definition of a positive semi-definite matrix, which says:

A real symmetric matrix $\mathbf{A}$ is positive semi-definite if and only if
$\mathbf{b}^T\mathbf{A}\mathbf{b}\geq0$
holds for every real column vector $\mathbf{b}$ of appropriate size.

For an arbitrary real vector $\mathbf{u}$, we can write $$ \begin{array}{rcl} \mathbf{u}^T\mathbf{C}\mathbf{u} & = & \mathbf{u}^TE\{(\mathbf{x}-\bar{\mathbf{x}})(\mathbf{x}-\bar{\mathbf{x}})^T\}\mathbf{u} \\ & = & E\{\mathbf{u}^T(\mathbf{x}-\bar{\mathbf{x}})(\mathbf{x}-\bar{\mathbf{x}})^T\mathbf{u}\} \\ & = & E\{s^2\} \\ & = & \sigma_s^2, \\ \end{array} $$ where $\sigma_s^2$ is the variance of the zero-mean scalar random variable $s$, defined by $$ s = \mathbf{u}^T(\mathbf{x}-\bar{\mathbf{x}}) = (\mathbf{x}-\bar{\mathbf{x}})^T\mathbf{u}. $$ Since $s^2 \ge 0$ for every realization of $s$, its expectation is also non-negative, $$ \sigma_s^2 = E\{s^2\} \ge 0. $$ Thus, $$ \mathbf{u}^T\mathbf{C}\mathbf{u} = \sigma_s^2 \ge 0, $$ which implies that the covariance matrix of any real random vector is always positive semi-definite.
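As a sanity check, the identity $\mathbf{u}^T\mathbf{C}\mathbf{u} = \sigma_s^2$ can be verified on simulated data; this is only a sketch assuming numpy, with $E\{\cdot\}$ replaced by sample averages and all names illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, samples = 4, 50_000

# Correlated samples of the random vector x (one realization per row).
x = rng.normal(size=(samples, n)) @ rng.normal(size=(n, n))
x_bar = x.mean(axis=0)

# Sample estimate of C = E{(x - xbar)(x - xbar)^T}.
C = (x - x_bar).T @ (x - x_bar) / samples

u = rng.normal(size=n)
s = (x - x_bar) @ u               # realizations of s = u^T (x - xbar)

# u^T C u equals the variance of s; both are non-negative.
print(u @ C @ u, s.var())
```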

hkBattousai
  • 4,543
  • Interesting... Did you compare your approach to (a part of) an answer posted one year earlier? – Did Apr 24 '14 at 06:19
  • Rereading this answer five years later, I realize it is actually completely wrong, confusing random variables with real numbers. Nice upvotes though... – Did Nov 24 '18 at 19:51
  • @Did: Completely agree with you. No idea why this answer is even allowed here. – Akshay Bansal Feb 15 '19 at 09:25
  • @Did what do you mean, what's the matter here? Reading that second answer made me understand that the whole proof of the first answer lies upon the fact that there is a number that is squared, thus non-negative. What's wrong with that second answer? – Marine Galantin Apr 16 '20 at 15:39
  • @MarineGalantin: In the second calculation hkBattousai states that "s = u^T(x-xbar)" and in the third calculation "sigma_s = u^T(x-xbar)". s is a stochastic variable and sigma_s is the variance. – aaa May 29 '20 at 09:40
  • @aaa what do you mean? I don't understand your comment. – Marine Galantin May 29 '20 at 12:50
  • @MarineGalantin The incorrectness lies in that hkBattousai mixes the stochastic variable "s" with the variance "sigma", that is, he treats them as if they were equal. (That's at least how I understood it, I might be wrong.) – aaa May 29 '20 at 13:12
  • For anyone reading this now, the answer has been corrected. – dkarkada Feb 22 '24 at 20:41

This is the correct way to justify and elaborate on hkBattousai's answer.

Let $\mu=(E(X_{1}),...,E(X_{n}))^{T}$ and let $\Sigma$ be the covariance matrix of the random vector $(X_{1},...,X_{n})^{T}$. Then for any $v\in\Bbb{R}^{n}$ we have

\begin{align}\langle v,\Sigma v\rangle&=\sum_{i,j=1}^{n}v_{i}\Sigma_{ij}v_{j}\\ &= \sum_{i,j=1}^{n}v_{i}E\bigg((X_{i}-\mu_{i})(X_{j}-\mu_{j})\bigg)v_{j}.\end{align}

Now consider the random variable $\displaystyle Y=\sum_{i=1}^{n}v_{i}X_{i}$.

Then $\displaystyle\bigg(Y-\sum_{i=1}^{n}v_{i}\mu_{i}\bigg)^{2}=\sum_{i,j=1}^{n}\left(v_{i}X_{i}-v_{i}\mu_{i}\right)\left(v_{j}X_{j}-v_{j}\mu_{j}\right)=\sum_{i,j=1}^{n}v_{i}\left(X_{i}-\mu_{i}\right)(X_{j}-\mu_{j})v_{j}$

Thus $\displaystyle E\bigg(\sum_{i,j=1}^{n}v_{i}\left(X_{i}-\mu_{i}\right)(X_{j}-\mu_{j})v_{j}\bigg)=\sum_{i,j=1}^{n}v_{i}E\bigg((X_{i}-\mu_{i})(X_{j}-\mu_{j})\bigg)v_{j}=Var(Y)\geq0$

So $\langle v,\Sigma v\rangle\geq 0$ for all $v\in\Bbb{R}^{n}$, and that's basically it.
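The closing identity $\langle v,\Sigma v\rangle=\mathrm{Var}(Y)$ can also be checked numerically; a sketch assuming numpy, where `np.cov(..., bias=True)` is used so the normalization matches `Y.var()`:

```python
import numpy as np

rng = np.random.default_rng(3)
n, samples = 3, 200_000

# Samples of the random vector (X_1, ..., X_n), one realization per row.
X = rng.normal(size=(samples, n)) @ rng.normal(size=(n, n))
Sigma = np.cov(X, rowvar=False, bias=True)   # sample covariance matrix

v = rng.normal(size=n)
Y = X @ v                                    # realizations of Y = sum_i v_i X_i

print(v @ Sigma @ v, Y.var())                # equal, and non-negative
```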