Given a matrix $A \in \mathbb R^{m \times n}$ and a vector $y \in \mathbb R^m$, I want to compute the gradient of the following scalar field with respect to $x\in \mathbb R^n$:
$$x \mapsto (Ax - y)^T(Ax - y).$$
$\textbf{Attempt}.$
\begin{align}
\frac{\partial}{\partial x} \big((Ax - y)^T(Ax - y) \big) &= \frac{\partial}{\partial x} \big( x^TA^TAx - x^TA^Ty - y^TAx+ y^Ty \big)\\
&= \frac{\partial}{\partial x}x^TA^TAx - \frac{\partial}{\partial x}x^TA^Ty - \frac{\partial}{\partial x}y^TAx+ \frac{\partial}{\partial x}y^Ty \\
&= 2 A^TAx - A^Ty - y^TA\qquad\,\,\mathbf{(1*)}\\
&= 2 A^TAx - 2A^Ty. \qquad\qquad\,\mathbf{(2*)}
\end{align}
$\textbf{Question}.$ Two lines above are marked $(1*)$ and $(2*)$. I don't understand the justification for going from $(1*)$ to $(2*)$. In fact, the dimensions in $(1*)$ don't match: $A^TAx$ and $A^Ty$ are $n \times 1$, while $y^TA$ is $1 \times n$. This makes me think that there is a mistake in $(1*)$. Can someone explain the basic rules involved in these matrix manipulations?
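
For what it's worth, here is a rough numerical sanity check of the final expression $(2*)$, comparing it against a central finite-difference gradient. It is only a sketch: it assumes NumPy, and the sizes `m = 5`, `n = 3`, the random seed, and the helper name `numerical_gradient` are arbitrary choices of mine, not part of the problem. The last comparison checks the single term $y^TAx$ against $A^Ty$, which is where I suspect the slip in $(1*)$ is.

```python
import numpy as np

# Arbitrary test data (assumed sizes and seed, chosen only for illustration).
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
x = rng.standard_normal(n)


def f(x):
    """Scalar field x -> (Ax - y)^T (Ax - y)."""
    r = A @ x - y
    return r @ r


def numerical_gradient(func, x, h=1e-6):
    """Central finite-difference approximation of the gradient of func at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g


# Candidate gradient from line (2*).
grad_2star = 2 * A.T @ A @ x - 2 * A.T @ y
grad_fd = numerical_gradient(f, x)

# Gradient of the single term y^T A x, compared with A^T y.
term_fd = numerical_gradient(lambda v: y @ A @ v, x)
term_candidate = A.T @ y

print("(2*) matches finite differences:", np.allclose(grad_2star, grad_fd, atol=1e-5))
print("d/dx (y^T A x) matches A^T y:   ", np.allclose(term_candidate, term_fd, atol=1e-5))
```

Note that NumPy's 1-D arrays do not distinguish row vectors from column vectors, so this check can only confirm the entries, not settle the row-versus-column question I am asking about.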