For the linear regression loss function $L(w) = (y - Xw)^T(y - Xw)$, where $y$ is $N \times 1$, $X$ is $N \times D$, and $w$ is $D \times 1$, I tried to apply the product rule:
$\nabla_wL(w) = -(y - Xw)^T X - X^T(y- Xw) $.
But as you can see, the dimension of the first term ($1 \times D$) does not match that of the second term ($D \times 1$). Also, if I transpose the first term, I get the desired gradient, $-2X^T(y - Xw)$. Where am I going wrong?
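
For reference, here is a rough numerical check that the transposed version, $-2X^T(y - Xw)$, agrees with a finite-difference gradient of $L$. It is only a sketch: the random data, the sizes `N = 5` and `D = 3`, and the helper names `loss`, `grad_candidate`, and `numerical_grad` are all made up for illustration, and the vectors are stored as 1-D NumPy arrays rather than explicit column matrices.

```python
import numpy as np

def loss(w, X, y):
    # L(w) = (y - Xw)^T (y - Xw), with y and w stored as 1-D arrays
    r = y - X @ w
    return float(r @ r)

def grad_candidate(w, X, y):
    # Candidate gradient obtained by transposing the first term: -2 X^T (y - Xw)
    return -2.0 * X.T @ (y - X @ w)

def numerical_grad(w, X, y, eps=1e-6):
    # Central finite differences, one coordinate of w at a time
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e, X, y) - loss(w - e, X, y)) / (2 * eps)
    return g

# Arbitrary illustrative data (not from any real problem)
rng = np.random.default_rng(0)
N, D = 5, 3
X = rng.normal(size=(N, D))
y = rng.normal(size=N)
w = rng.normal(size=D)

# Compare the candidate gradient against the finite-difference estimate
print(np.allclose(grad_candidate(w, X, y), numerical_grad(w, X, y), atol=1e-5))
```

Because the vectors are 1-D arrays here, the check says nothing about the row-versus-column layout, which is exactly the part I am unsure about.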