
Let $y = \begin{pmatrix} y_1 \\ \vdots \\ y_N\end{pmatrix}$ and $X = \begin{pmatrix} x_{11} & \cdots & x_{1D} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{ND}\end{pmatrix}$. Let also $e = y - Xw$, where $x_n^T$ denotes the $n$th row of $X$, and write the mean square error as $L(w) = \frac{1}{2N} \sum_{n=1}^{N} (y_n - x_n^T w)^2 = \frac{1}{2N} e^T e$.

I want to prove that the gradient of $L(w)$ is $-\frac{1}{N} X^T e$. What would be a way of proving this?

the_candyman
user1868607

1 Answer


Since

$$ L(w) = \frac{1}{2N}\sum_{n=1}^N(y_n - (Xw)_n)^2 $$

it follows that

$$ \frac{\partial L}{\partial w_j} = -\frac{1}{N}\sum_{n=1}^N x_{nj}(y_n - (Xw)_n) = -\frac{1}{N}x_j^Te, $$

where $x_j$ is the $j$th column of $X$. Therefore,

$$ \nabla L(w) = -\frac{1}{N}X^Te $$
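As a quick numerical sanity check of this formula (a sketch using NumPy with randomly generated data, not part of the proof), you can compare the analytic gradient $-\frac{1}{N}X^Te$ against central finite differences of $L$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
w = rng.standard_normal(D)

def L(w):
    # Mean square error L(w) = (1/2N) e^T e with e = y - Xw
    e = y - X @ w
    return e @ e / (2 * N)

# Analytic gradient: -(1/N) X^T e
e = y - X @ w
grad = -(X.T @ e) / N

# Central finite differences, one coordinate at a time
h = 1e-6
fd = np.array([
    (L(w + h * np.eye(D)[j]) - L(w - h * np.eye(D)[j])) / (2 * h)
    for j in range(D)
])

print(np.allclose(grad, fd, atol=1e-6))
```

The two should agree to within finite-difference precision, which matches the coordinate-wise derivation above.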

K. Miller