
Suppose I have a vector $y$ of dimension $N \times 1$, a matrix $X$ of dimension $N \times p$, and a vector $\beta$ of dimension $p \times 1$. I wish to differentiate the matrix expression

$RSS(\beta) = (y-X\beta)^T(y-X\beta)$ with respect to $\beta$.

I know that, in general, for a vector $x$ we have $\frac{d}{dx} x^Tx = 2x$, so I expect the result to be something like (using the chain rule):

$\frac{\partial RSS}{\partial \beta} = 2 (y-X\beta) \frac{\partial}{\partial \beta} (y-X\beta)$, and I know the resulting answer is $-2 X^T(y-X\beta)$, but I am confused about how this is obtained. Why is $X^T$ on the left? I'm not very experienced with differentiating matrix expressions, so any general concept that helps me see this would be much appreciated.
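As a sanity check on the claimed answer, here is a minimal NumPy sketch that compares $-2X^T(y-X\beta)$ with a central finite-difference gradient of $RSS(\beta)$. It assumes random test data and represents $y$ and $\beta$ as 1-D arrays rather than explicit $N \times 1$ and $p \times 1$ columns; the helper names `rss`, `rss_grad` and `finite_diff_grad` are hypothetical, chosen only for illustration.

```python
import numpy as np

def rss(beta, X, y):
    # Residual sum of squares: (y - X beta)^T (y - X beta)
    r = y - X @ beta
    return float(r @ r)

def rss_grad(beta, X, y):
    # Closed-form gradient stated in the question: -2 X^T (y - X beta)
    return -2.0 * X.T @ (y - X @ beta)

def finite_diff_grad(f, beta, h=1e-6):
    # Central differences, perturbing one coordinate of beta at a time
    g = np.zeros_like(beta)
    for i in range(beta.size):
        e = np.zeros_like(beta)
        e[i] = h
        g[i] = (f(beta + e) - f(beta - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
N, p = 20, 3
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)
beta = rng.standard_normal(p)

analytic = rss_grad(beta, X, y)
numeric = finite_diff_grad(lambda b: rss(b, X, y), beta)
print(analytic.shape, np.allclose(analytic, numeric, rtol=1e-5, atol=1e-6))
```

The shapes also hint at why $X^T$ sits on the left: $X^T$ is $p \times N$ and $y - X\beta$ is $N \times 1$, so $X^T(y-X\beta)$ is $p \times 1$, the same shape as $\beta$, whereas $(y-X\beta)X$ would not even be defined.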

2 Answers


First, expand: $$R(\beta) = (y^T - \beta^T X^T)(y-X\beta) = y^Ty-y^TX\beta-\beta^TX^Ty+\beta^TX^TX\beta.$$ Next, observe that $$(y^TX\beta)^T=\beta^TX^Ty,$$ and since both sides are scalars, the two terms are equal, so their derivatives are the same (equal to $X^Ty$). The derivative of $\beta^TX^TX\beta$ is $2X^TX\beta$: in general the gradient of $\beta^TA\beta$ is $(A+A^T)\beta$, and $A = X^TX$ is symmetric. So you have $$R'(\beta) = 0 - X^Ty - X^Ty+2X^TX\beta = -2X^T(y-X\beta).$$
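To see the term-by-term argument above concretely, the following sketch (assuming NumPy and random test data, with $y$ and $\beta$ stored as 1-D arrays) checks each of the four terms of the expansion against a central finite-difference gradient, then confirms that summing them with their signs gives $-2X^T(y-X\beta)$. The helper `num_grad` and the `terms` dictionary are hypothetical names for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 15, 4
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)
beta = rng.standard_normal(p)

def num_grad(f, b, h=1e-6):
    # Central-difference gradient of a scalar function f at b
    g = np.zeros_like(b)
    for i in range(b.size):
        e = np.zeros_like(b)
        e[i] = h
        g[i] = (f(b + e) - f(b - e)) / (2 * h)
    return g

# Each term of R(beta) = y^T y - y^T X beta - beta^T X^T y + beta^T X^T X beta,
# paired with the gradient claimed for it in the answer
terms = {
    "y^T y":             (lambda b: y @ y,           np.zeros(p)),
    "y^T X beta":        (lambda b: y @ X @ b,       X.T @ y),
    "beta^T X^T y":      (lambda b: b @ X.T @ y,     X.T @ y),
    "beta^T X^T X beta": (lambda b: b @ X.T @ X @ b, 2 * X.T @ X @ beta),
}

results = {name: bool(np.allclose(num_grad(f, beta), claimed, atol=1e-5))
           for name, (f, claimed) in terms.items()}
for name, ok in results.items():
    print(f"{name:20s} matches: {ok}")

# Combine the term gradients with the signs from the expansion
total = 0 - X.T @ y - X.T @ y + 2 * X.T @ X @ beta
print(np.allclose(total, -2 * X.T @ (y - X @ beta)))
```

The two cross terms produce identical gradients, which is exactly the observation that they are the same scalar written two ways.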

Student

Let $\partial_i:=\frac{\partial}{\partial\beta_i}$ and sum over repeated indices. Then $\partial_i(y-X\beta)_j=-X_{ji}$, and $$\partial_i\left[(y-X\beta)_j(y-X\beta)_j\right]=2(X\beta-y)_jX_{ji}=2\left[X^T(X\beta-y)\right]_i,$$ i.e. $\nabla_\beta(y-X\beta)^T(y-X\beta)=2X^T(X\beta-y)$.
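The index computation above can be mirrored almost literally in code. This is a sketch assuming NumPy and random test data: `grad_by_indices` (a hypothetical helper) builds component $i$ of the gradient by explicitly summing $2(X\beta-y)_jX_{ji}$ over the repeated index $j$, and the result is compared with the vectorized form $2X^T(X\beta-y)$ and with an equivalent `np.einsum` contraction.

```python
import numpy as np

def grad_by_indices(X, y, beta):
    # Component i of the gradient: sum over j of 2 (X beta - y)_j X_{ji}
    N, p = X.shape
    r = X @ beta - y              # r_j = (X beta - y)_j
    g = np.zeros(p)
    for i in range(p):
        for j in range(N):        # explicit sum over the repeated index j
            g[i] += 2 * r[j] * X[j, i]
    return g

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))
y = rng.standard_normal(10)
beta = rng.standard_normal(3)

looped = grad_by_indices(X, y, beta)
vectorized = 2 * X.T @ (X @ beta - y)
# Same contraction written with einsum: "j,ji->i" sums over j and keeps i
einsum_form = np.einsum("j,ji->i", 2 * (X @ beta - y), X)
print(np.allclose(looped, vectorized), np.allclose(einsum_form, vectorized))
```

Reading `X[j, i]` as $(X^T)_{ij}$ is what turns the sum over $j$ into the matrix product $X^T(X\beta-y)$, which is why $X^T$ appears on the left.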

J.G.