Consider the standard regression equation $$ Y = X\beta + \varepsilon $$ with predicted value $\hat{Y} = P_XY$, where $P_X = X(X'X)^{-1}X'$ is the projection onto the column space of $X$ (the "hat" matrix). Also let $\mathrm{Var}(\varepsilon) = \Sigma$ be an arbitrary symmetric, positive semi-definite matrix.
The question: What is $\mathrm{cov}(\hat{Y}, Y)$?
This question is answered here for the case $\Sigma = \sigma^2I$, but with a general $\Sigma$ the two derivations given there (reproduced below) no longer lead to the same answer.
Derivation 1
$$ \mathrm{cov}(\hat{Y}, Y) = \mathrm{cov}(P_XY, Y) = P_X \mathrm{cov}(Y, Y) = P_X\mathrm{Var}(Y) = P_X\Sigma $$
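(For reference, the only step used here is the bilinearity of covariance under fixed linear maps, $$ \mathrm{cov}(AY, BY) = A\,\mathrm{Var}(Y)\,B', $$ applied with $A = P_X$ and $B = I$.)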
Derivation 2
Note $\hat{\varepsilon} = Y - \hat{Y}$ and $\mathrm{cov}(\hat{Y}, \hat{\varepsilon}) = 0$ (Edit: this is not true for $\Sigma \neq \sigma^2I$). Then
$$ \mathrm{cov}(\hat{Y}, Y) = \mathrm{cov}(\hat{Y}, \hat{Y} + \hat{\varepsilon}) = \mathrm{Var}(\hat{Y}) = \mathrm{Var}(P_XY) = P_X\mathrm{Var}(Y)P_X = P_X\Sigma P_X \neq P_X\Sigma $$
Derivation 2, version 2
(Not included in the linked post.)
Recall $\hat{Y} = X\hat{\beta}$ where $\hat{\beta} = (X'X)^{-1}X'Y$ and $\mathrm{Var}(\hat{\beta}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1}$. Then
$$ \mathrm{cov}(\hat{Y}, Y) = \mathrm{cov}(\hat{Y}, \hat{Y} + \hat{\varepsilon}) = \mathrm{Var}(\hat{Y}) = \mathrm{Var}(X\hat{\beta}) = X\mathrm{Var}(\hat{\beta})X' = P_X\Sigma P_X $$
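(Spelling out the substitution in the last step: $$ X\mathrm{Var}(\hat{\beta})X' = X(X'X)^{-1}X'\,\Sigma\,X(X'X)^{-1}X' = P_X\Sigma P_X. $$)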
Note that when spherical errors are assumed, so $\Sigma = \sigma^2I$, all of the above collapse to $\sigma^2P_X$, which is what the previous post found.
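(The collapse uses symmetry and idempotency of the projection, $P_X = P_X'$ and $P_X^2 = P_X$: $$ P_X(\sigma^2I) = \sigma^2P_X \quad\text{and}\quad P_X(\sigma^2I)P_X = \sigma^2P_X^2 = \sigma^2P_X. $$)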
Where is the error?
I believe Derivation 2 is probably the correct one, but Derivation 1 seems so straightforward that I'm not sure what the problem is.
As a bonus, the following Python code numerically verifies a few of the above identities, in particular $\mathrm{cov}(\hat{Y}, Y) = \mathrm{Var}(\hat{Y})$, though obviously only for the case of spherical errors.
import numpy as np
import scipy.linalg as la
# Setup
N = 100
y = np.random.normal(size=N)
X = np.random.normal(size=(N, 2))
X[:, 0] = 1
# Regression
P = X.dot(la.inv(X.T.dot(X))).dot(X.T)
yhat = P.dot(y)
ehat = y - yhat
# Covariances
cov1 = np.cov(yhat, y)
cov2 = np.cov(yhat, ehat)
print(cov1[0, 0] - cov1[0, 1])  # var(yhat) = cov(yhat, y), so this is ~0
print(cov2[0, 1])               # cov(yhat, ehat) = 0 for spherical errors
Edit:
Solution (Correction to Derivation 2)
Many thanks to Arin Chaudhuri for pointing out my mistake. With general $\Sigma$ we have $$ \mathrm{cov}(\hat{Y}, \hat{\varepsilon}) = \mathrm{cov}(P_XY, (I - P_X)Y) = P_X\Sigma(I - P_X), $$ using symmetry of $I - P_X$. This reduces to $0$ when $\Sigma = \sigma^2I$ (since $P_X(I - P_X) = 0$), but is otherwise non-zero. Adding that term back into Derivation 2 yields the correct answer, in agreement with Derivation 1: $$ \mathrm{cov}(\hat{Y}, Y) = \mathrm{Var}(\hat{Y}) + \mathrm{cov}(\hat{Y}, \hat{\varepsilon}) = P_X\Sigma P_X + P_X\Sigma(I - P_X) = P_X\Sigma. $$
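As a further check (not part of the original question), the sketch below repeats the numerical experiment with a non-spherical $\Sigma$, chosen here as an illustrative AR(1)-style correlation matrix, and estimates $\mathrm{cov}(\hat{Y}, Y)$ by Monte Carlo over many error draws; the estimate matches $P_X\Sigma$, which in turn visibly differs from $P_X\Sigma P_X$. The particular $\Sigma$, seed, and number of replications are arbitrary choices.
import numpy as np
import scipy.linalg as la
# Setup: small design, non-spherical errors
rng = np.random.default_rng(0)
N, reps = 5, 200000
X = rng.normal(size=(N, 2))
X[:, 0] = 1
P = X.dot(la.inv(X.T.dot(X))).dot(X.T)
# Illustrative non-spherical covariance: AR(1)-style correlations
rho = 0.7
Sigma = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
L = np.linalg.cholesky(Sigma)
# Many mean-zero error draws with Var = Sigma; take beta = 0 so Y is just the errors
Y = L.dot(rng.normal(size=(N, reps)))
Yhat = P.dot(Y)
# Empirical cov(Yhat, Y) (means are zero by construction)
emp_cov = Yhat.dot(Y.T) / reps
print(np.abs(emp_cov - P.dot(Sigma)).max())              # small: matches P_X Sigma
print(np.abs(P.dot(Sigma) - P.dot(Sigma).dot(P)).max())  # not small: P_X Sigma != P_X Sigma P_X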