While reading a derivation of least squares minimization (in matrix form), I came across the expression:
$$0=\frac{dS}{d\beta}\left( \hat{\beta}\right)= \frac{d}{d\beta}\left.\left( y^Ty-\beta^T X^Ty-y^TX\beta+\beta^TX^TX\beta \right)\right\rvert_{\beta=\hat{\beta}} = -2X^Ty+ 2X^TX\hat{\beta}$$
I am not sure I follow the rules applied here. What are the main matrix derivative rules used in this expression?
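
For reference, here is a sketch of the identities that appear to be involved. It assumes the gradient is written as a column vector (denominator layout), which the source does not state. The symbols $a$ (a constant vector) and $A$ (a constant square matrix) are placeholders introduced here for illustration:
$$\begin{aligned}
\frac{d}{d\beta}\left(a^T\beta\right) &= \frac{d}{d\beta}\left(\beta^T a\right) = a,\\
\frac{d}{d\beta}\left(\beta^T A\beta\right) &= \left(A+A^T\right)\beta = 2A\beta \quad \text{when } A=A^T,\\
\frac{d}{d\beta}\left(y^Ty\right) &= 0 \quad \text{since } y \text{ does not depend on } \beta.
\end{aligned}$$
Taking $a = X^Ty$ and $A = X^TX$ (which is symmetric), the four terms would give $0 - X^Ty - X^Ty + 2X^TX\beta$, which agrees with the right-hand side of the expression above.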