This is the equation given to me in the lectures, and it doesn't make sense to me when I think about it. The $x_n$ are $D$-dimensional vectors for $D$ features, so subtracting the mean again gives a vector. Multiplying by the transpose then gives a scalar. So as far as I can see, this equation ends up as a scalar instead of a matrix.

How would I be able to correctly interpret this equation? I know there are different equations in matrix form, but I should use this equation.

1 Answer

Let $\mathbf{X}$ be a vector of random variables:

$$\mathbf{X}=(X_1, X_2, ... , X_D)^{\mathrm T}.$$

Then the covariance matrix of $\mathbf{X}$ is defined as

$$ \operatorname{K}_{\mathbf{X}\mathbf{X}} = \begin{bmatrix} \mathrm{E}[(X_1 - \operatorname{E}[X_1])(X_1 - \operatorname{E}[X_1])] & \cdots & \mathrm{E}[(X_1 - \operatorname{E}[X_1])(X_D - \operatorname{E}[X_D])] \\ \\ \vdots & \ddots & \vdots \\ \\ \mathrm{E}[(X_D - \operatorname{E}[X_D])(X_1 - \operatorname{E}[X_1])] & \cdots & \mathrm{E}[(X_D - \operatorname{E}[X_D])(X_D - \operatorname{E}[X_D])] \end{bmatrix}.$$

or

$$ \operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[(\mathbf{X}-\mathbf{\mu_X})(\mathbf{X}-\mathbf{\mu_X})^{\rm T}],$$

where $\mathbf{\mu_X} = \operatorname{E}[\mathbf{X}]$.
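As a small worked illustration (the shorthand $a_i = X_i - \operatorname{E}[X_i]$ is introduced here only for brevity), take $D = 2$. The outer product inside the expectation is

$$ \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \begin{bmatrix} a_1 & a_2 \end{bmatrix} = \begin{bmatrix} a_1^2 & a_1 a_2 \\ a_2 a_1 & a_2^2 \end{bmatrix}, $$

a $2 \times 2$ matrix whose entrywise expectation is the matrix form of the definition given earlier. With the transpose on the first factor instead, the product collapses to a scalar, namely the trace of that matrix:

$$ \begin{bmatrix} a_1 & a_2 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = a_1^2 + a_2^2. $$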

Notice that this is a definition involving random variables. In your example, the $x_n \in \mathbb{R}^D$ are real-valued vectors sampled from the random vector $\mathbf{X}$. The equation you provided is a way to estimate $\operatorname{K}_{\mathbf{X}\mathbf{X}}$ using the samples $x_1, \cdots, x_N$.

I assume your $\bar{x}$ is the mean of your samples, defined as

$$ \bar{x} = \frac{1}{N} \sum_{n=1}^N x_n.$$

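Then notice that each $(x_n - \bar{x})(x_n - \bar{x})^T$ is actually a matrix in $\mathbb{R}^{D \times D}$, because the transpose is on the second factor: this is an outer product, not an inner product. A $(D \times 1)$ column times a $(1 \times D)$ row gives a $(D \times D)$ matrix. You would get a scalar only if the transpose were on the first factor, $(x_n - \bar{x})^T (x_n - \bar{x})$. The expression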

$$ S_N = \frac{1}{N} \sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^{\rm T}, $$

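is the sample covariance matrix, a Monte Carlo approximation of $\operatorname{K}_{\mathbf{X}\mathbf{X}}$. In other words, it is a way to estimate the true covariance matrix from data: the expectation is replaced by an average over samples and $\mathbf{\mu_X}$ by $\bar{x}$. Notice the similarity to $\operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[(\mathbf{X}-\mathbf{\mu_X})(\mathbf{X}-\mathbf{\mu_X})^{\rm T}].$ If your samples are drawn independently from the same distribution, and that distribution has finite second moments, the law of large numbers gives (almost surely) the property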

$$ \lim_{N \rightarrow \infty} S_N = \operatorname{K}_{\mathbf{X}\mathbf{X}}.$$
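The shapes and the convergence can be checked numerically. The following is a minimal NumPy sketch, not part of the lecture material: the Gaussian distribution, the "true" covariance `K`, the seed, and the helper name `sample_cov` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Assumed "true" covariance matrix (symmetric, positive definite) for D = 3.
K = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
D = K.shape[0]


def sample_cov(X):
    """S_N = (1/N) * sum_n (x_n - x_bar)(x_n - x_bar)^T, one sample per row of X."""
    N = X.shape[0]
    x_bar = X.mean(axis=0)              # sample mean, shape (D,)
    S = np.zeros((X.shape[1], X.shape[1]))
    for x_n in X:
        d = (x_n - x_bar).reshape(-1, 1)  # column vector, shape (D, 1)
        S += d @ d.T                      # (D, 1) @ (1, D) -> (D, D)
    return S / N


# Small sample: look at a single term of the sum.
X_small = rng.multivariate_normal(np.zeros(D), K, size=100)
d = (X_small[0] - X_small.mean(axis=0)).reshape(D, 1)
outer = d @ d.T   # outer product: a D x D matrix
inner = d.T @ d   # inner product: a 1 x 1 array, i.e. a scalar
print(outer.shape, inner.shape)

# The explicit sum agrees with NumPy's 1/N estimator.
S_small = sample_cov(X_small)
S_numpy = np.cov(X_small, rowvar=False, bias=True)

# Vectorized version on a larger sample, to see the estimate approach K.
X_large = rng.multivariate_normal(np.zeros(D), K, size=200_000)
centered = X_large - X_large.mean(axis=0)
S_large = centered.T @ centered / X_large.shape[0]

err_small = np.abs(S_small - K).max()
err_large = np.abs(S_large - K).max()
print(err_small, err_large)
```

Here `bias=True` makes `np.cov` divide by $N$, matching $S_N$; the default divides by $N - 1$ instead. The largest entrywise error should typically shrink as $N$ grows, which is the limit statement in action.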

See these pages for further details:

https://en.wikipedia.org/wiki/Covariance_matrix

https://en.wikipedia.org/wiki/Monte_Carlo_method

Brian Lai