Let $y = \begin{pmatrix} y_1 \\ \vdots \\ y_N\end{pmatrix}$ and $X = \begin{pmatrix} x_{11} & \cdots & x_{1D} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{ND}\end{pmatrix}$, and write $x_n^T$ for the $n$-th row of $X$, with $w \in \mathbb{R}^D$. Let also $e = y - Xw$, and write the mean squared error as $L(w) = \frac{1}{2N} \sum_{n=1}^{N} (y_n - x_n^Tw)^2 = \frac{1}{2N} e^T e$.
I want to prove that the gradient of $L(w)$ is $-\frac{1}{N} X^T e$. What would be a way of proving this?
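
One possible route, sketched here rather than claimed as the definitive proof, is to differentiate componentwise. It assumes $e_n = y_n - x_n^T w$ denotes the $n$-th entry of $e$ and $w_j$ the $j$-th entry of $w$ (the index $j$ is introduced only for this sketch):

$$
\begin{aligned}
\frac{\partial e_n}{\partial w_j} &= \frac{\partial}{\partial w_j}\Big(y_n - \sum_{d=1}^{D} x_{nd} w_d\Big) = -x_{nj}, \\
\frac{\partial L}{\partial w_j} &= \frac{1}{2N} \sum_{n=1}^{N} 2\, e_n \frac{\partial e_n}{\partial w_j} = -\frac{1}{N} \sum_{n=1}^{N} x_{nj}\, e_n = -\frac{1}{N} \big(X^T e\big)_j .
\end{aligned}
$$

Stacking the $D$ partial derivatives into a column vector would then give $\nabla L(w) = -\frac{1}{N} X^T e$, if each step above holds.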