
I am coming from reading the selected answer to this question. I have a question about the following bit:

It’s not hard to show that if the covariance matrix of the original data points $x_i$ was $\Sigma$, the variance of the new data points is just $u^T \Sigma u$.

I have been playing around with projecting two-dimensional data for random variables $(x_1,x_2)$ onto the horizontal axis corresponding to $x_1$. As expected, with $u$ pointing along the horizontal axis, the result is $u^T \Sigma u = \operatorname{var}[x_1]$, since only $x_1$ is taken into account. However, when setting $u$ to point along the identity line, the result is $$u^T \Sigma u = \operatorname{var}[x_1] + \operatorname{var}[x_2] + 2\operatorname{cov}[x_1,x_2].$$ I don’t know much about statistics and don’t understand how this represents the variance of the data when projected onto the identity line. Is there a more formal proof of why the quoted bit is true? An intuitive explanation of the result would be appreciated as well.
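For concreteness, here is a sketch of the computation I have been doing in NumPy. The covariance matrix `Sigma` is made up just to have something to plug in:

```python
import numpy as np

# Made-up covariance matrix for (x1, x2): Var[x1] = 2, Var[x2] = 1, Cov[x1,x2] = 0.5
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Direction along the horizontal axis: u = (1, 0)
u_h = np.array([1.0, 0.0])
print(u_h @ Sigma @ u_h)  # equals Var[x1] = 2.0

# Direction along the identity line (not normalized): u = (1, 1)
u_id = np.array([1.0, 1.0])
print(u_id @ Sigma @ u_id)  # equals Var[x1] + Var[x2] + 2 Cov[x1,x2] = 4.0
```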

Edit: My question is why $u^T \Sigma u$ is the variance of the new data points as stated in the question linked above.

Rócherz
  • 3,976
Lucas Alanis
  • 1,406

1 Answer


I’m not sure why you are interested in projecting onto the identity line, but: assuming zero-mean variables, projecting onto the line $x_2=x_1$ is the same as computing the average $z = (x_1+x_2)/2$ (the projected point has both coordinates equal to this average).

Now, $E[z^2]=\frac{1}{4}E[(x_1+x_2)^2]=\frac{1}{4}\left(E[x_1^2]+E[x_2^2]+2E[x_1 x_2]\right)$.

Since the variables are zero-mean, $E[z^2]=\operatorname{Var}(z)$ and $E[x_i^2]=\operatorname{Var}(x_i)$, so $\operatorname{Var}(z)=\frac{1}{4}(\operatorname{Var}(x_1)+\operatorname{Var}(x_2)+2\operatorname{Cov}(x_1,x_2))$.
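More generally, for any zero-mean $x$ and any direction $u$, with $z=u^T x$ we get $\operatorname{Var}(z)=E[(u^T x)^2]=E[u^T x x^T u]=u^T E[x x^T]u=u^T\Sigma u$, which is the quoted claim. A quick numerical sanity check (the covariance matrix and sample size below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up zero-mean covariance matrix and a large sample of (x1, x2) pairs
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)

# Unit vector along the identity line x2 = x1
u = np.array([1.0, 1.0]) / np.sqrt(2)

# Scalar projection of each sample onto u
z = X @ u

# Empirical variance of the projections vs. the closed form u^T Sigma u
print(np.var(z))      # close to u @ Sigma @ u
print(u @ Sigma @ u)  # (Var[x1] + Var[x2] + 2 Cov[x1,x2]) / 2 = 2.0
```

With a large sample the two printed numbers agree to a couple of decimal places, matching the identity above.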

leonbloy
  • 63,430