16

I want to solve the following equation $$\frac{\partial}{\partial {\bf \beta}} \left[||{\bf y}-{\bf X}{\bf \beta}||^2 + ||{\bf \beta}||^2\right] = 0$$ for ${\bf \beta}$. Here ${\bf y}$ and ${\bf \beta}$ are vectors and ${\bf X}$ is a matrix. I am having trouble with the differentiation. I can split the left-hand side into $$\frac{\partial}{\partial {\bf \beta}} ||{\bf y}-{\bf X}{\bf \beta}||^2 + \frac{\partial}{\partial {\bf \beta}}||{\bf \beta}||^2$$ and then handle the second term with the rule $$\frac{\partial}{\partial a}||a||^2 = 2a$$

The problem is with the other part. I can use the product rule, but I am still left with $\frac{\partial}{\partial {\bf \beta}}||{\bf y} - {\bf X}{\bf \beta}||^2$.

user61300
  • 163

2 Answers

19

$$ \frac{\partial}{\partial \beta} \left(\|F(\beta)\|^2\right) = \frac{\partial}{\partial \beta} \left(F(\beta) \cdot F(\beta)\right) = 2 \left( \frac{\partial}{\partial \beta} F(\beta) \right) \cdot F(\beta) $$ Here $F(\beta) \in \mathbb{R}^D$, where $D$ is the dimension of $F(\beta)$.
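If it helps to see the formula in action, here is a minimal numerical sanity check, assuming $F(\beta) = y - X\beta$ as in the question, for which the rule above gives $-2X^T(y - X\beta)$. The data and the helper names (`matvec`, `sq_norm_residual`, and so on) are made up for illustration; the sketch just compares the analytic gradient against central finite differences.

```python
def matvec(X, v):
    """Return the matrix-vector product X v (X is a list of rows)."""
    return [sum(x_ij * v_j for x_ij, v_j in zip(row, v)) for row in X]

def sq_norm_residual(X, y, beta):
    """f(beta) = ||y - X beta||^2."""
    return sum((y_i - p_i) ** 2 for y_i, p_i in zip(y, matvec(X, beta)))

def analytic_gradient(X, y, beta):
    """Chain-rule result: -2 X^T (y - X beta)."""
    residual = [y_i - p_i for y_i, p_i in zip(y, matvec(X, beta))]
    # Component i of X^T r is the dot product of column i of X with r.
    return [-2.0 * sum(row[i] * r for row, r in zip(X, residual))
            for i in range(len(beta))]

def numeric_gradient(X, y, beta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(beta)):
        plus = list(beta)
        plus[i] += eps
        minus = list(beta)
        minus[i] -= eps
        diff = sq_norm_residual(X, y, plus) - sq_norm_residual(X, y, minus)
        grad.append(diff / (2 * eps))
    return grad

# Arbitrary illustrative data (an assumption, not from the question).
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
y = [1.0, 2.0, 0.5]
beta = [0.3, -0.7]

max_gap = max(abs(a - n) for a, n in
              zip(analytic_gradient(X, y, beta), numeric_gradient(X, y, beta)))
print(max_gap)  # should be tiny (rounding error only)
```

Because the function is quadratic in $\beta$, central differences are exact up to rounding, so any sign or transpose mistake in the analytic formula would show up as a large gap.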

BoltzBooz
  • 162
Robert Israel
  • 448,999
  • 1
    How would I get rid of the $\frac{\partial}{\partial \beta} F(\beta) = \frac{\partial}{\partial \beta} ||y-X\beta||$? – user61300 Feb 06 '13 at 23:20
  • No, take $F(\beta) = y - X \beta$. $\dfrac{\partial}{\partial \beta} (y - X \beta) = - X^T$ (i.e. $\dfrac{\partial}{\partial \beta_i} (y - X \beta)_j = - X_{ji}$). – Robert Israel Feb 07 '13 at 02:28
  • 1
    What would you do if you had a cubed norm instead? – Translunar Dec 31 '15 at 14:15
  • I guess you substitute a variable for the norm, e.g. $z = \|y - X\beta\|$, so the expression becomes $z^2$ for the square and $z^3$ for the cube; then the usual rules of differentiation (the chain rule) apply. – Swas_99 Apr 25 '19 at 15:58
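
For what it's worth, the substitution suggested in the last comment can be carried through explicitly. A sketch, writing $z = \|y - X\beta\|$ and assuming $y - X\beta \neq 0$ so that the intermediate step $\partial z / \partial \beta$ is well defined:

$$ \frac{\partial}{\partial \beta} z^3 = 3 z^2 \frac{\partial z}{\partial \beta}, \qquad \frac{\partial z}{\partial \beta} = \frac{1}{2z} \frac{\partial}{\partial \beta} z^2 = -\frac{X^T (y - X\beta)}{z} $$

so that

$$ \frac{\partial}{\partial \beta} \|y - X\beta\|^3 = -3 \, \|y - X\beta\| \, X^T (y - X\beta) $$
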
2

Let's do a directional derivative instead, eventually building up to some voodoo magic.

$$a \cdot \nabla_\beta [(y - \underline X(\beta))^2 + \beta^2] = -\underline X(a) \cdot [2(y - \underline X(\beta))] + 2 \beta \cdot a$$

But $\underline X(a) \cdot b = \overline X(b) \cdot a$. This exchanges a linear operator with its adjoint.
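
In matrix terms (assuming $\underline X$ is just multiplication by a real matrix $X$, so that $\overline X$ is multiplication by $X^T$), the identity says $(Xa) \cdot b = (X^T b) \cdot a$. A small check with made-up integer data, where `apply` and `apply_adjoint` are illustrative helper names:

```python
def apply(X, a):
    """X(a): multiply the matrix X (a list of rows) by the vector a."""
    return [sum(x * a_j for x, a_j in zip(row, a)) for row in X]

def apply_adjoint(X, b):
    """Adjoint of X applied to b: multiply X^T by b."""
    return [sum(row[i] * b_j for row, b_j in zip(X, b))
            for i in range(len(X[0]))]

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

# Arbitrary illustrative data (an assumption, not from the thread).
X = [[1, 2], [0, 1], [3, -1]]
a = [2, -1]
b = [1, 4, -2]

lhs = dot(apply(X, a), b)          # X(a) . b
rhs = dot(apply_adjoint(X, b), a)  # adjoint(b) . a
print(lhs, rhs)  # -18 -18
```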

We can then use this to write the result as

$$2a \cdot [\overline X(\underline X(\beta)-y) + \beta]$$

Since this holds for every direction $a$, we can drop the $a$ and read off the gradient:

$$2[\overline X(\underline X(\beta)-y) + \beta]$$
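
Setting this gradient to zero gives, in matrix notation (assuming $\overline X$ acts as $X^T$), the linear system $(X^T X + I)\beta = X^T y$, and $X^T X + I$ is positive definite, so the solution is unique. Here is a minimal sketch that solves the system for some made-up data and checks that the gradient vanishes there. The helper names and the small Gaussian-elimination solver are illustrative only; in practice one would probably use a linear algebra library.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

# Arbitrary illustrative data (an assumption, not from the question).
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
y = [1.0, 2.0, 0.5]

Xt = transpose(X)
A = matmul(Xt, X)
for i in range(len(A)):
    A[i][i] += 1.0                      # A = X^T X + I
beta = solve(A, matvec(Xt, y))          # solves (X^T X + I) beta = X^T y

residual = [p - y_i for p, y_i in zip(matvec(X, beta), y)]   # X beta - y
gradient = [2.0 * (g + b) for g, b in zip(matvec(Xt, residual), beta)]
print(beta, gradient)  # gradient entries should be ~0
```

For this data the system is $\begin{pmatrix} 11 & -1 \\ -1 & 7 \end{pmatrix} \beta = \begin{pmatrix} 2.5 \\ 3.5 \end{pmatrix}$, whose exact solution is $\beta = (21/76,\ 41/76)$.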

Muphrid
  • 19,902