If $\hat{Y}$ is the OLS linear regression model for $Y$, what can I say about $\operatorname{Cov}(\hat{Y},Y)$? Is this value $0$?
-
In linear minimum-mean-square error estimation where $\hat{Y}=aX+b$ with $a$ and $b$ chosen so as to minimize $E[(Y-\hat{Y})^2]$, the residual error $Y-\hat{Y}$ is orthogonal to the estimate $\hat{Y}$, that is, $\text{cov}(Y-\hat{Y},\hat{Y}) = 0$, and so $$\text{cov}(\hat{Y},Y) = \text{var}(\hat{Y})$$ – Dilip Sarwate Dec 27 '12 at 17:16
-
I would like to ask further whether $\text{var}(\hat{Y}) = \text{var}(Y)$. And, on the space of random variables, why do we in some cases use the expectation as the inner product and in other cases the covariance? – Peter Dec 27 '12 at 17:49
-
No, $\text{var}(\hat{Y}) = \rho^2\,\text{var}(Y)$ (where $\rho$ is the Pearson correlation coefficient), which is generally smaller than $\text{var}(Y)$. – Dilip Sarwate Dec 27 '12 at 19:47
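For a concrete sense of both claims in these comments, here is a minimal numerical sketch for simple (one-covariate) OLS; the sample size, coefficients, and noise level are purely illustrative assumptions, not taken from the thread.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simple one-covariate OLS with made-up coefficients and noise level.
n = 200_000
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=2.0, size=n)

a, b = np.polyfit(x, y, 1)          # slope, intercept of the least-squares line
y_hat = a * x + b

rho = np.corrcoef(x, y)[0, 1]
print(np.cov(y_hat, y)[0, 1], np.var(y_hat, ddof=1))      # cov(Y_hat, Y) = var(Y_hat)
print(np.var(y_hat, ddof=1), rho**2 * np.var(y, ddof=1))  # var(Y_hat) = rho^2 var(Y)
```

In each printed pair the two numbers agree (up to rounding), matching $\text{cov}(\hat{Y},Y) = \text{var}(\hat{Y}) = \rho^2\,\text{var}(Y)$.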
1 Answer
$\newcommand{\var}{\operatorname{var}}$ $\newcommand{\cov}{\operatorname{cov}}$ If you know matrix algebra, one often writes $$ \begin{array}{ccccccccccc} Y & = & X & \beta & + & \varepsilon \\ \\ (n\times1) & & (n\times p) & (p\times1) & & (n\times1) \end{array} $$ where $X$ is observable and "fixed" (i.e. not random), $\beta$ is unobservable and fixed, $\varepsilon$ is unobservable and random, and $Y$ is observable and random.

The $n\times n$ matrix $H = X(X^T X)^{-1}X^T$ projects orthogonally onto the column space of $X$, and $$ \hat Y = HY. $$

Recall that if $Y$ is an $n\times 1$ random column vector, then $$ V=\var(Y) = \mathbb E\Big( (Y-\mathbb E Y)(Y - \mathbb E Y)^T \Big) $$ is an $n\times n$ matrix. And $$ \begin{array}{cccccccccccccccc} \cov\Big( & A & Y & , & B & Y & \Big) & = & A & \var(Y) & B^T \\ \\ & (j\times n) & (n\times1) & & (k\times n) & (n\times1) & & & (j\times n) & (n\times n) & (n\times k) \end{array} $$ is a $j\times k$ matrix.
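As a sanity check, here is a minimal numpy sketch (the design matrix, $\beta$, and $\sigma$ below are made-up illustrative values) that builds $H$ from $X$ and confirms that $HY$ coincides with the OLS fitted values $X\hat\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n = 50 observations, p = 3 columns (intercept + 2 covariates).
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Hat matrix H = X (X^T X)^{-1} X^T, the orthogonal projector onto col(X).
H = X @ np.linalg.solve(X.T @ X, X.T)

# One simulated response Y = X beta + eps, with made-up beta and sigma.
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5
Y = X @ beta + sigma * rng.normal(size=n)

# OLS fit; H Y should equal the fitted values X beta_hat.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(H @ Y, X @ beta_hat))  # True
```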
So, assuming the errors are uncorrelated with common variance $\sigma^2$, i.e. $\var(\varepsilon) = \sigma^2 I_{n\times n}$ and hence $\var(Y) = \sigma^2 I_{n\times n}$, $$ \cov(\hat Y, Y) = \cov(HY, Y) = H \cov(Y,Y) = H\sigma^2 I_{n\times n} = \sigma^2 H. $$
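Continuing the sketch above (same made-up $X$, $H$, $\beta$, $\sigma$), a quick Monte Carlo check shows the sample cross-covariance matrix of $\hat Y$ and $Y$ approaching $\sigma^2 H$.

```python
# Monte Carlo check that cov(Y_hat, Y) = sigma^2 H, reusing X, H, beta, sigma from above.
reps = 100_000
Y_sim = (X @ beta)[:, None] + sigma * rng.normal(size=(n, reps))  # each column is one draw of Y
Yhat_sim = H @ Y_sim

# Sample cross-covariance between Y_hat and Y (an n x n matrix).
Yc = Y_sim - Y_sim.mean(axis=1, keepdims=True)
Yhc = Yhat_sim - Yhat_sim.mean(axis=1, keepdims=True)
cov_sample = Yhc @ Yc.T / (reps - 1)

print(np.max(np.abs(cov_sample - sigma**2 * H)))  # small; shrinks as reps grows
```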
You could also write $Y = \hat Y + \hat\varepsilon$, where $\hat\varepsilon = (I-H)Y$ is the residual vector, which is uncorrelated with $\hat Y$. Then $$ \cov(\hat Y, Y) = \cov(\hat Y, \hat Y) + \cov(\hat Y, \hat\varepsilon) $$ $$ = \cov(HY, HY) + 0 = H\cov(Y,Y) H^T = H(\sigma^2 I_{n\times n})H^T = \sigma^2 HH^T. $$
But, being the matrix of an orthogonal projection, $H$ is both its own transpose and its own square ($H = H^T$ and $H^2 = H$), so $\sigma^2 HH^T = \sigma^2 H$ and this reduces to the same thing we got by the other method.
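The same sketch verifies this last step numerically: $H$ built above is symmetric and idempotent, so $\sigma^2 HH^T$ collapses back to $\sigma^2 H$.

```python
# H from the sketch above is symmetric and idempotent, so sigma^2 H H^T = sigma^2 H.
print(np.allclose(H, H.T))                              # H = H^T
print(np.allclose(H @ H, H))                            # H^2 = H
print(np.allclose(sigma**2 * (H @ H.T), sigma**2 * H))  # both derivations agree
```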