When I try to find the gradient of the MSE loss $$L(w)=\|y-Xw\|_2^2$$ (ignoring constant factors), I find two different solutions, one the transpose of the other:

  1. "Compute the gradient of mean square error" claims the gradient is $$\nabla L(w)=X^TXw-X^Ty$$

  2. "MSE Loss function and derivatives" claims it is $$\nabla L(w)=w^TX^TX-y^TX$$

I also tried doing it myself:

$$ \begin{aligned} L(w) &= \|y-Xw\|^2 \\ &= (y-Xw)^T(y-Xw) \\ &= (y^T-w^TX^T)(y-Xw) \\ &= (y^Ty-y^TXw-w^TX^Ty+w^TX^TXw) \end{aligned} $$

and

$$\nabla L(w) = -y^TX - y^TX + w^T(X^TX+(X^TX)^T) = 2(w^TX^TX-y^TX)$$

which matches the second post, up to the constant factor of 2. Why am I getting the transposed solution?
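
As a sanity check on the algebra, here is a minimal sketch in plain Python that evaluates both forms on random data and compares them with a central finite-difference approximation of $L(w)$. The helper names (`column_form`, `row_form`, `finite_difference`), the sizes and the seed are my own choices for illustration, not taken from either post. If the derivation is right, the two forms should hold the same numbers, just arranged as a column versus a row.

```python
import random

def matvec(A, v):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def loss(X, y, w):
    # L(w) = ||y - Xw||^2
    r = [yi - pi for yi, pi in zip(y, matvec(X, w))]
    return sum(ri * ri for ri in r)

def column_form(X, y, w):
    # 2 (X^T X w - X^T y) = 2 X^T (Xw - y), read as a column vector.
    Xt = [list(col) for col in zip(*X)]
    r = [pi - yi for pi, yi in zip(matvec(X, w), y)]
    return [2.0 * g for g in matvec(Xt, r)]

def row_form(X, y, w):
    # 2 (w^T X^T X - y^T X) = 2 (Xw - y)^T X, read as a row vector.
    r = [pi - yi for pi, yi in zip(matvec(X, w), y)]
    return [2.0 * sum(r[i] * X[i][j] for i in range(len(X)))
            for j in range(len(X[0]))]

def finite_difference(X, y, w, eps=1e-6):
    # Central differences: partial derivative of L with respect to each w_j.
    grad = []
    for j in range(len(w)):
        wp, wm = list(w), list(w)
        wp[j] += eps
        wm[j] -= eps
        grad.append((loss(X, y, wp) - loss(X, y, wm)) / (2 * eps))
    return grad

random.seed(0)  # arbitrary seed, only for reproducibility
n, d = 5, 3     # hypothetical sizes: 5 samples, 3 features
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(d)]

g_col = column_form(X, y, w)
g_row = row_form(X, y, w)
g_num = finite_difference(X, y, w)

print(g_col)
print(g_row)
print(g_num)
```

The flat lists here cannot tell a row from a column, so this only checks that the entries agree. It says nothing about which orientation should be called the gradient.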

R3lay