
Could someone please provide a proof for why the gradient of the squared $2$-norm of $x$ is equal to $2x$?

$$\nabla\|x\|_2^2 = 2x$$

4 Answers


Use the definition. If $$f(x)=\|x\|^2_2= \left(\left(\sum_{k=1}^n x_k^2 \right)^{1/2}\right)^{2}=\sum_{k=1}^n x_k^2 ,$$ then $$\frac{\partial}{\partial x_j}f(x) =\frac{\partial}{\partial x_j}\sum_{k=1}^n x_k^2=\sum_{k=1}^n \underbrace{\frac{\partial}{\partial x_j}x_k^2}_{\substack{=0, \ \text{ if } j \neq k,\\=2x_j, \ \text{ else }}}= 2x_j.$$ It follows that $$\nabla f(x) = 2x.$$
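As a sanity check (a small NumPy sketch, not part of the original answer), the coordinate-wise formula $\frac{\partial f}{\partial x_j} = 2x_j$ can be compared against central finite differences:

```python
import numpy as np

def f(x):
    # f(x) = ||x||_2^2 = sum_k x_k^2
    return np.sum(x ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

# central finite difference for each partial derivative df/dx_j
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(len(x))])

print(np.allclose(grad_fd, 2 * x))  # True
```

Since $f$ is quadratic, the central difference is exact up to rounding, so the agreement with $2x$ is essentially to machine precision.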

Surb
  • Thanks. One follow up question, does this still hold if x is complex? – user167133 Jul 30 '14 at 20:56
  • Isn't the $L_2$ norm defined as $$f(x)=\|x\|^2_2= \left(\left(\sum_{k=1}^n |x_k|^2 \right)^{1/2}\right)^{2}=\sum_{k=1}^n |x_k|^2?$$ There is an absolute value sign surrounding $x_k$, so when you take the derivative, shouldn't what pops out be $2|x_j|$ rather than just $2x_j$? Correct me if I am wrong. – Fraïssé Oct 23 '15 at 17:46
  • @Lookbehindyou well $|t|^2 = t^2$ for any $t\in\Bbb R$. Moreover, note that $\frac{d}{dt}|t|^2 = 2|t|\operatorname{sign}(t)=2t$, where $\operatorname{sign}$ is the sign function. – Surb Feb 29 '16 at 14:49
  • Could you please explain what happened to the sigma in the last step? – Gigili Aug 16 '16 at 12:38
  • @Gigili: If $k\neq j$, then $\frac{\partial }{\partial x_j}x_k^2=0$. Then, $$\sum_{k=1}^n\frac{\partial }{\partial x_j}x_k^2=0+...+0+2x_j+0+...+0=2x_j.$$ – Surb Aug 16 '16 at 12:44
  • Ah, got it. Thank you very much. :) – Gigili Aug 16 '16 at 12:45
  • @Surb Thank you for the clarification. However, is this an $\ell_2$ norm? Or an $L_2$ norm? Aren't those for the infinite-dimensional cases (countable and otherwise)? I did not remove any tags. I added the tag [scalar-fields] because the powers-that-be removed the tag [gradient], which makes it hard to find questions on computing the gradient of a scalar field. – Rodrigo de Azevedo Mar 15 '22 at 08:55
  • @Surb I introduced neither $\ell^2$ nor $L^2$. I used "$2$-norm" instead. Also, tags are not necessarily for humans. If Stack Exchange is minimally well-designed, then proper tagging should lead to more relevant entries in the Related column, which should make it easier (for humans) to find duplicates or related questions. The lack of discipline and rigor in tagging is a major weakness of Math SE, in my opinion. – Rodrigo de Azevedo Mar 15 '22 at 11:49
  • @RodrigodeAzevedo Fair enough :). – Surb Mar 15 '22 at 13:40

Another approach that extends to more general settings is to use the connection between the norm and the inner product, $$\|x\|^2 = (x,x).$$

We have the finite difference, \begin{align} \|x+sh\|^2 - \|x\|^2 &= (x+sh,x+sh) - (x,x) \\ &= (x,x) + 2s(x,h) + s^2(h,h) - (x,x) \\ &= 2s(x,h) + s^2(h,h). \end{align}

The gradient acting in the direction $h$ is the limit of this finite difference as the stepsize goes to zero, \begin{align} (\nabla\|x\|^2, h) &:= \lim_{s \rightarrow 0} \frac{1}{s}\left[\|x+sh\|^2 - \|x\|^2\right] \\ &= \lim_{s \rightarrow 0} \frac{1}{s}\left[2s(x,h) + s^2(h,h)\right] \\ &= (2x,h). \end{align} Since this holds for any direction $h$, the gradient must be $\nabla \|x\|^2 = 2x$.
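To illustrate the limit numerically (a small NumPy sketch, assuming the standard dot product as the inner product), note that the algebra above makes the difference quotient exactly $2(x,h) + s(h,h)$, so it converges to $2(x,h)$ at rate $O(s)$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
h = rng.standard_normal(4)

f = lambda v: v @ v  # ||v||^2 = (v, v)

s = 1e-4
quotient = (f(x + s * h) - f(x)) / s
# exact identity from the derivation: quotient = 2(x, h) + s(h, h)
print(np.isclose(quotient, 2 * (x @ h) + s * (h @ h)))  # True
print(np.isclose(quotient, 2 * (x @ h), atol=1e-2))     # True: error is s(h, h) = O(s)
```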

Hanno
Nick Alger

I'm not sure if this is rigorous enough to count as a proof, but an elegant way to obtain derivatives of vector expressions is to use matrix differential calculus.

Let $y = \lVert x \rVert_2^2 = x^{T} x$ with $x \in \mathbb{R}^{n}$. Using the product rule, the differential of $y$ is $$ dy = dx^{T} x + x^{T} dx = 2 x^{T} dx $$

We can then set $$ dy = \frac{dy}{dx} dx = (\nabla_{x} y)^{T} dx = 2x^{T} dx $$ where $dy/dx \in \mathbb{R}^{1 \times n}$ is called the derivative (a linear operator) and $\nabla_{x} y \in \mathbb{R}^{n}$ is called the gradient (a vector).

Now we can see $\nabla_{x} y = 2 x$.
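The identity $dy = 2x^T dx$ can also be checked numerically (a small NumPy sketch, not from the original answer): for a small perturbation $dx$, the actual change in $y = x^T x$ matches the first-order prediction $2x^T dx$ up to the quadratic remainder $dx^T dx$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(4)
dx = 1e-7 * rng.standard_normal(4)

dy = (x + dx) @ (x + dx) - x @ x  # actual change in y = x^T x
# the differential predicts dy ≈ 2 x^T dx, with error dx^T dx ~ 1e-14
print(np.isclose(dy, 2 * (x @ dx), atol=1e-12))  # True
```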


If $x$ is complex, the complex derivative does not exist because $z \mapsto |z|^{2}$ is not a holomorphic function.

We can, however, instead consider the real derivatives with respect to the two real components of $x$. Let $x = u + i v$ with $u, v \in \mathbb{R}^{n}$. With this definition, $y$ is a real function of $u$ and $v$ given by $$ y = x^* x = (u + i v)^* (u + i v) = u^T u + v^T v $$ Taking the differential $$ dy = 2 u^T du + 2 v^T dv = \frac{\partial y}{\partial u} du + \frac{\partial y}{\partial v} dv $$ and therefore $$ \nabla_{u} y = 2 u \enspace , \qquad \nabla_{v} y = 2 v $$
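Since $x^* x = u^T u + v^T v$ is a real-valued function of $(u, v)$, both real gradients come out as $2u$ and $2v$, which can be verified by finite differences (a small NumPy sketch, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(3)
v = rng.standard_normal(3)

def y(u, v):
    x = u + 1j * v
    return np.vdot(x, x).real  # x^* x = u^T u + v^T v (vdot conjugates its first argument)

eps = 1e-6
grad_u = np.array([(y(u + eps * e, v) - y(u - eps * e, v)) / (2 * eps)
                   for e in np.eye(3)])
grad_v = np.array([(y(u, v + eps * e) - y(u, v - eps * e)) / (2 * eps)
                   for e in np.eye(3)])

print(np.allclose(grad_u, 2 * u), np.allclose(grad_v, 2 * v))  # True True
```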


For an introduction to matrix differential calculus, see Geoff Gordon's lecture on YouTube or Mike Giles's paper on matrix derivatives.

pterojacktyl

Here is another simple proof, using the definition of differentiability at a point directly.

1- But first, let's remember that $f(\vec{x})$ is said to be differentiable at the point $\vec{x}$ if, for all $\vec{h}$, one can write $ f(\vec{x}+ \vec{h})= f(\vec{x}) + L(\vec{h}) + o(\vec{h})$, with $L(\vec{h})$ a linear mapping in $\vec{h}$ and $\lim_{\vec{h} \to \vec{0}} \frac{|o(\vec{h})|} {\|\vec{h}\|} = 0$.

2- Here $f(\vec{x})= \|\vec{x}\|^2$, so $$\|\vec{x}+\vec{h}\|^2 = \|\vec{x}\|^2 + \langle\vec{x},\vec{h}\rangle + \langle\vec{h},\vec{x}\rangle + \|\vec{h}\|^2 = f(\vec{x}) + \langle\vec{x},\vec{h}\rangle + \langle\vec{h},\vec{x}\rangle + \|\vec{h}\|^2 $$
We set $o(\vec{h}) = \|\vec{h}\|^2$; indeed $\lim_{\vec{h} \to \vec{0}} \frac{\|\vec{h}\|^2} {\|\vec{h}\|} = \lim_{\vec{h} \to \vec{0}} \|\vec{h}\| = 0$.
The remaining term $L(\vec{h}) = \langle\vec{x},\vec{h}\rangle + \langle\vec{h},\vec{x}\rangle$ is a linear mapping in $\vec{h}$. Because we work with real numbers, $\langle\vec{x},\vec{h}\rangle = \langle\vec{h},\vec{x}\rangle$, so $L(\vec{h}) = 2\langle\vec{x},\vec{h}\rangle$.

3- Now, again by definition, the gradient is the unique vector $\nabla f(\vec{x})$ satisfying $2\langle\vec{x},\vec{h}\rangle = \langle\nabla f(\vec{x}), \vec{h}\rangle$ for all $\vec{h}$.
Thus it follows that $\nabla f(\vec{x}) = 2\vec{x}$.
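The decomposition above can be confirmed numerically (a small NumPy sketch, not part of the original answer): the remainder after subtracting the linear term is exactly $\|\vec{h}\|^2$, not merely $o(\vec{h})$:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(5)
h = rng.standard_normal(5)

# f(x + h) - f(x) - L(h) should equal the remainder o(h) = ||h||^2 exactly
remainder = (x + h) @ (x + h) - x @ x - 2 * (x @ h)
print(np.isclose(remainder, h @ h))  # True
```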

OffHakhol