12

This assertion came up in a Deep Learning course I am taking. I understand intuitively that the eigenvector with the largest eigenvalue will be the direction in which the most variance occurs. I understand why we use the covariance matrix's eigenvectors for Principal Component Analysis.

However, I do not see why the variance along each eigenvector equals its corresponding eigenvalue. I would prefer a formal proof, but an intuitive explanation may be acceptable.

(Note: this is not a duplicate of this question.)

AlexMayle
  • 258
  • If you build the matrix $M$ as a sum of outer products, $M = \sum vv^T$, then what ends up in the respective entries are estimates of the expected values $M_{ij} = E[v_{i}v_{j}]$, with $i$ and $j$ being vector positions. This is all before any transformation to the space of principal vectors is done. Maybe it helps with figuring out the rest. – mathreadler Feb 16 '17 at 14:52
  • As mentioned in Omnom's answer, $M_{ij}$ will contain $\sum v_iv_j$, which is one way to estimate $E[X_i X_j]$, assuming the components of the vectors $v$ are samples drawn from the random variables $X_i$ (see the sketch after these comments). – mathreadler Feb 16 '17 at 15:07
  • 1
    An eigenvector of a covariance matrix is not a random vector, so the variance of an eigenvector does not make sense. If it was a random vector, it would make more sense to talk about the covariance matrix of this random vector and not the variance. – Cm7F7Bb Jun 30 '17 at 12:44
  • Eigenvectors are unit vectors having variance … 1. – Sergey Bushmanov Jan 10 '21 at 06:23
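
To make the outer-product estimate described in the comments above concrete, here is a minimal numpy sketch (the mixing matrix and all names are illustrative assumptions, not from the thread). Averaging the outer products $vv^T$ of centered sample vectors $v$ produces a matrix whose $(i,j)$ entry estimates $E[X_i X_j]$:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 centered samples of a 3-dimensional random vector
# (the mixing matrix below is arbitrary, chosen just to correlate the components)
samples = rng.normal(size=(1000, 3)) @ np.array([[2.0, 0.5, 0.0],
                                                 [0.0, 1.0, 0.3],
                                                 [0.0, 0.0, 0.5]])
samples -= samples.mean(axis=0)

# Covariance as an average of outer products: M_ij estimates E[X_i X_j]
M = sum(np.outer(v, v) for v in samples) / len(samples)

# Same estimate from numpy's built-in covariance (1/n normalization via bias=True)
print(np.allclose(M, np.cov(samples.T, bias=True)))  # True
```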

1 Answer

23

Here's a formal proof: suppose that $v$ denotes a length-$1$ eigenvector of the covariance matrix, which is defined by $$ \Sigma = \Bbb E[XX^T] $$ where $X = (X_1,X_2,\dots,X_n)$ is a column vector of random variables with mean zero (which is to say that we've already absorbed the means into the variables' definitions). So we have $\Sigma v = \lambda v$ (for some $\lambda \geq 0$) and $v^Tv = 1$.

Now, what do we really mean by "the variance of $v$"? $v$ is not a random variable. Really, what we mean is the variance of the associated component of $X$. That is, we're asking about the variance of $v^TX$ (the dot product of $X$ with $v$). Since the $X_i$s have mean zero, so does $v^TX$, so its variance is simply $\Bbb E([v^TX]^2)$. We then find $$ \Bbb E([v^TX]^2) = \Bbb E([v^TX][X^Tv]) = \Bbb E[v^T(XX^T)v] = v^T\Bbb E(XX^T) v \\ = v^T\Sigma v = v^T\lambda v = \lambda(v^Tv) = \lambda, $$ which is what we wanted to show.
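
For readers who want a numerical check of the proof above, here is a small numpy sketch (not part of the original answer; the covariance matrix and sample size are arbitrary illustrative choices). It projects zero-mean samples onto each unit eigenvector of their sample covariance matrix and confirms that the variance of the projection matches the corresponding eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)

# Zero-mean samples of a correlated 3-dimensional random vector X
X = rng.multivariate_normal(mean=np.zeros(3),
                            cov=[[4.0, 1.0, 0.0],
                                 [1.0, 2.0, 0.5],
                                 [0.0, 0.5, 1.0]],
                            size=100_000)
X -= X.mean(axis=0)

# Sample covariance Sigma = E[X X^T] and its eigendecomposition
Sigma = (X.T @ X) / len(X)
eigvals, eigvecs = np.linalg.eigh(Sigma)  # columns of eigvecs are unit eigenvectors

# The variance of the component v^T X equals the corresponding eigenvalue
for lam, v in zip(eigvals, eigvecs.T):
    projection = X @ v             # v^T x for every sample x
    print(lam, projection.var())   # the two numbers agree to machine precision
```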

Ben Grossmann
  • 225,327
  • Perfect answer. Thank you very much. – AlexMayle Feb 16 '17 at 15:34
  • 2
    This part is crucial: what do we really mean by "the variance of $v$"? $v$ is not a random variable. Really, what we mean is the variance of the associated component of $X$. Thanks! – Abramo Sep 05 '20 at 21:58