I am currently following Andrew Ng's Stanford Online Machine Learning course and decided to prove the formulas on my own, since that's how I understand them better. I am a bit stuck on the linear regression formula.
$$\textrm{Let } X \textrm{ represent the matrix of inputs, where } m \textrm{ is the number of input features and } n \textrm{ is the number of inputs}$$ $$X = \begin{bmatrix} 1 & x_{1}^{(1)} & \cdots & x_{m}^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1}^{(n)} & \cdots & x_{m}^{(n)} \end{bmatrix} $$
$$\textrm{Y is the vector of the outputs}$$ $$Y = \begin{pmatrix} y^{(1)} & \cdots & y^{(n)} \end{pmatrix} ^{T}$$
$$\theta \textrm{ is the vector of the line parameters, including the intercept } \theta_0 \textrm{ matching the leading column of ones in } X$$ $$\theta = \begin{pmatrix} \theta_{0} & \theta_{1} & \cdots & \theta_{m} \end{pmatrix} ^{T}$$
$$\textrm{And the loss function } L(\theta) \textrm{ would be } ||X\theta - Y||^2$$
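To keep the definitions straight, here is a minimal NumPy sketch with made-up toy data (the shapes and the loss expression are the only things taken from the setup above) that builds $X$, $Y$, $\theta$ and evaluates $L(\theta) = \|X\theta - Y\|^2$:

```python
import numpy as np

# Toy data, purely illustrative: n = 4 examples, m = 2 features.
# X gets a leading column of ones, so it is n x (m + 1).
rng = np.random.default_rng(0)
n, m = 4, 2
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m))])
Y = rng.normal(size=(n, 1))          # outputs, n x 1
theta = rng.normal(size=(m + 1, 1))  # parameters, (m + 1) x 1

# L(theta) = ||X theta - Y||^2 = (X theta - Y)^T (X theta - Y)
residual = X @ theta - Y
loss = float(residual.T @ residual)  # equivalently np.sum(residual ** 2)
```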
So to minimize this, $$ \frac{\partial{L(\theta)}}{\partial{\theta}} = \frac{\partial{((X\theta - Y)^T (X\theta - Y))}}{\partial{\theta}} $$ $$ =\frac{\partial{(\theta^T X^T X\theta - \theta^T X^T Y - Y^T X\theta + Y^T Y)}}{\partial{\theta}} $$ $$ =2X^T X \theta - X^T Y - Y^T X $$
But according to Wikipedia, it's supposed to be, $$ 2X^T X \theta - 2X^T Y $$
But $X^T Y \neq Y^T X$, since $X$ isn't a vector (the two products don't even have the same dimensions), so where did I make a mistake? I've been trying to find it for a while now.
Alternatively, using the chain rule, $$ \frac{\partial{((X\theta - Y)^T (X\theta - Y))}}{\partial{\theta}} = \left(\frac{\partial (X\theta - Y)}{\partial \theta}\right)^T \frac{\partial{((X\theta - Y)^T (X\theta - Y))}}{\partial{(X\theta - Y)}}$$ $$ = 2X^T (X\theta - Y)$$ which is the same as Wikipedia.
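As a numerical sanity check (a sketch with arbitrary random data, not part of the course material), the factored gradient $2X^T(X\theta - Y)$, which expands to Wikipedia's $2X^TX\theta - 2X^TY$, can be compared against a central finite-difference approximation of $L$:

```python
import numpy as np

# Arbitrary random problem: n = 5 examples, m = 3 features plus intercept.
rng = np.random.default_rng(1)
n, m = 5, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m))])
Y = rng.normal(size=(n, 1))
theta = rng.normal(size=(m + 1, 1))

def loss(t):
    r = X @ t - Y
    return float(r.T @ r)

# Closed-form gradient: 2 X^T (X theta - Y)
analytic = 2 * X.T @ (X @ theta - Y)

# Central finite differences, one coordinate of theta at a time.
eps = 1e-6
numeric = np.zeros_like(theta)
for i in range(theta.size):
    e = np.zeros_like(theta)
    e[i] = eps
    numeric[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)

# The two gradients should agree to within floating-point error.
max_diff = float(np.max(np.abs(analytic - numeric)))
```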