Let $x_i \in \mathbb{R}^n$, $y_i\in\mathbb{R}$, $i=1,\dots,l$, be a training set for a linear model of the form $y = w^Tx$ for some $w\in\mathbb{R}^n$.
The loss function is the mean squared error (MSE): $$L(w) = \frac{1}{l} \sum_{i=1}^l(w^Tx_i-y_i)^2 = \frac{1}{l}\|Xw-y\|^2,$$ where $X = \begin{bmatrix}x_1^T\\\vdots\\ x_l^T\end{bmatrix}$ is the $l\times n$ matrix whose rows are the $x_i^T$, and $y = (y_1,\dots,y_l)^T$.
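To make sure the two forms of the loss really agree, here is a small numerical check. It is only a sketch: the sizes `l = 50`, `n = 3` and the random data are made up for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Made-up random data, purely for illustration.
rng = np.random.default_rng(0)
l, n = 50, 3
X = rng.normal(size=(l, n))   # rows are x_i^T
y = rng.normal(size=l)
w = rng.normal(size=n)

# Sum form: (1/l) * sum_i (w^T x_i - y_i)^2
loss_sum = sum((w @ X[i] - y[i]) ** 2 for i in range(l)) / l

# Matrix form: (1/l) * ||Xw - y||^2
loss_mat = np.linalg.norm(X @ w - y) ** 2 / l

print(loss_sum, loss_mat)
```

The two printed values should coincide up to floating-point rounding, since the $i$-th entry of $Xw - y$ is exactly $w^Tx_i - y_i$.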
So, can someone explain to me why setting $L'(w) = 0$ gives $w = (X^TX)^{-1}X^Ty$ (assuming $X^TX$ is invertible)?
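For what it is worth, the formula can be checked numerically. The sketch below uses made-up random data (so $X^TX$ is invertible with probability 1) and assumes the gradient is $\nabla L(w) = \frac{2}{l}X^T(Xw-y)$, which is what differentiating the matrix form of the loss should give. It is a sanity check under those assumptions, not a derivation.

```python
import numpy as np

# Made-up random data, purely for illustration.
rng = np.random.default_rng(1)
l, n = 100, 4
X = rng.normal(size=(l, n))
y = rng.normal(size=l)

def grad(w):
    # Assumed gradient of L(w) = (1/l) * ||Xw - y||^2
    return (2.0 / l) * X.T @ (X @ w - y)

# Candidate minimizer: solve (X^T X) w = X^T y instead of forming the inverse.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Independent reference: NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

grad_norm = np.linalg.norm(grad(w_closed))
print(grad_norm)                        # should be at rounding-error level
print(np.allclose(w_closed, w_lstsq))
```

Solving the linear system $X^TXw = X^Ty$ is used here rather than computing $(X^TX)^{-1}$ explicitly, which is generally the more stable choice numerically. Both give the same $w$ when $X^TX$ is invertible.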