When I try to find the gradient of the MSE loss $$L(w)=\|y-Xw\|_2^2$$ (ignoring constant factors), I find two different solutions; one is the transpose of the other:
The post "Compute the gradient of mean square error" claims the gradient is $$\nabla L(w)=X^TXw-X^Ty,$$
while "MSE Loss function and derivatives" claims it is $$\nabla L(w)=w^TX^TX-y^TX.$$
I also tried doing it myself:
$$ \begin{aligned} L(w) &= \|y-Xw\|_2^2 \\ &= (y-Xw)^T(y-Xw) \\ &= (y^T-w^TX^T)(y-Xw) \\ &= y^Ty-y^TXw-w^TX^Ty+w^TX^TXw \end{aligned} $$
and
$$\nabla L(w) = -y^TX - y^TX + w^T\left(X^TX+(X^TX)^T\right) = 2\left(w^TX^TX-y^TX\right)$$
(using that $w^TX^Ty$ is a scalar, so $w^TX^Ty = (w^TX^Ty)^T = y^TXw$, and both middle terms differentiate to $y^TX$),
which is equivalent to the second post. Why am I getting the transposed solution?
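To make the comparison concrete, here is a quick numerical sanity check (a sketch using NumPy with made-up data; the names `X`, `y`, `w` are illustrative, not taken from either post) that the column-vector form $2(X^TXw-X^Ty)$ matches a finite-difference gradient:

```python
import numpy as np

# Hypothetical small problem: random data just for checking the formula.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
w = rng.standard_normal(3)

def loss(w):
    r = y - X @ w
    return r @ r  # ||y - Xw||^2

# Closed-form gradient, column-vector (denominator) layout.
grad_closed = 2 * (X.T @ X @ w - X.T @ y)

# Central finite differences as an independent check.
eps = 1e-6
grad_fd = np.array([
    (loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(grad_closed, grad_fd, atol=1e-5))  # True
```

The row-vector answer $2(w^TX^TX - y^TX)$ is exactly the transpose of `grad_closed`, so the two posts only differ in layout convention.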
So if I want to get a column vector, I should use $\frac{d(Ax)}{dx} = A^T$ and $\frac{d(x^TAx)}{dx} = (A^T + A)x$?
– R3lay Jul 22 '23 at 09:38