So I was under the impression that the derivative of the squared L2 norm of a vector x is just 2x, but the example in the screenshot I have linked to says otherwise. What gives? I can't figure out why there's an extra A transpose factor in the derivative.
Because you have $f(x) = n(l(x))$, where $l(x) = Ax-b$ and $n(y) = {1\over 2} \|y\|^2$, so you need to use the chain rule. Alternatively, just compute $f(x+h)-f(x)$ and look at the terms that are linear in $h$. – copper.hat Nov 08 '15 at 03:37
Does this answer your question? How to take the gradient of the quadratic form? – Rodrigo de Azevedo Jul 19 '23 at 07:45
1 Answer
You can use the chain rule for this problem. But for matrix/vector problems the intermediate derivatives required by the chain rule often involve complicated 3rd and 4th order tensors. So my preferred approach is to use successive change-of-variables within differential expressions.
Define the variable $y=Ax+b$, so that $dy=A\,dx$. Then the norm (written in terms of the Frobenius product, which for vectors is just $a:b=a^Tb$) and its differential are $$\eqalign{ f &= \|y\|_F^2 \cr &= y:y \cr\cr df &= 2y:dy \cr &= 2y:A\,dx \cr &= 2A^Ty:dx \cr }$$ Since $df=\big(\frac{\partial f}{\partial x}:dx\big),\,$ the gradient is $$\eqalign{ \frac{\partial f}{\partial x} &= 2A^Ty \cr }$$ Note that your initial impression is correct, i.e. with respect to $y$ the gradient is simply $$\eqalign{ df &= 2y:dy \cr \frac{\partial f}{\partial y} &= 2y \cr }$$
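If you want to convince yourself numerically, here is a minimal sketch (assuming NumPy, with a randomly generated $A$, $b$ and $x$ that are purely illustrative, and helper names `f`, `grad_f` and `numerical_grad` that are mine) comparing $2A^T(Ax+b)$ against a central finite-difference approximation of the gradient.

```python
import numpy as np

def f(x, A, b):
    # f(x) = ||Ax + b||^2
    y = A @ x + b
    return y @ y

def grad_f(x, A, b):
    # Gradient from the derivation: 2 A^T y with y = Ax + b
    return 2 * A.T @ (A @ x + b)

def numerical_grad(func, x, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (func(x + e) - func(x - e)) / (2 * eps)
    return g

# Illustrative random data (assumption: any shapes with A of size m x n work)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

analytic = grad_f(x, A, b)
numeric = numerical_grad(lambda v: f(v, A, b), x)
print(np.allclose(analytic, numeric, atol=1e-5))
```

Setting `A` to the identity and `b` to zero reduces `grad_f` to `2 * x`, which is the special case from the question.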
