I'm a bit confused about the universality of this statement:
Suppose we have real-valued random variables $Y, X$ and a differentiable function $f(X)$ (perhaps some model). Do not assume that $f$ is convex.
$$ \mathbb{E}[Y \mid X] = \text{argmin}_f \mathbb{E}[(Y - f(X))^2] $$
Is this always true? And if so, why? Most of these proofs rely on reducing the statement above to (e.g., here):
$$ \text{argmin}_f \mathbb{E}[(\mathbb{E}[Y \mid X] - f(X))^2] $$
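As far as I understand, this reduction comes from the following decomposition (a sketch only, assuming $\mathbb{E}[Y^2] < \infty$ and $\mathbb{E}[f(X)^2] < \infty$; the shorthand $m(X) := \mathbb{E}[Y \mid X]$ is my own notation):

$$
\begin{aligned}
\mathbb{E}[(Y - f(X))^2]
&= \mathbb{E}[(Y - m(X))^2] + 2\,\mathbb{E}[(Y - m(X))(m(X) - f(X))] + \mathbb{E}[(m(X) - f(X))^2] \\
&= \mathbb{E}[(Y - m(X))^2] + \mathbb{E}[(m(X) - f(X))^2]
\end{aligned}
$$

The cross term vanishes by the tower property, since $\mathbb{E}[Y - m(X) \mid X] = 0$, and the first remaining term does not depend on $f$.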
Then they take the derivative to find the minimizer and show the result, but this would seem to require $f(X)$ to be convex. Does the statement above always hold?