Given the matrix $A \in \Bbb R^{k \times n}$, let the scalar field $f: \Bbb R^n \to \Bbb R$ be defined by $$f(x) := \frac12 \|Ax\|_2^2$$ Calculate the directional derivative $\partial_{v}f(x)$, $v \in \Bbb R^n$ and deduce that $\nabla f(x) = A^T A x$.


Attempt:

$$\partial_{v}f(x) = \lim_{t \to 0} \frac{f(x+tv) - f(x)}{t} = \lim_{t \to 0} \frac{\|Ax + t Av\|_2^2 - \|Ax\|_2^2}{2t}$$

I am unsure how to proceed from here since, to my understanding, $\|Ax + tAv\|^2 - \|Ax\|^2$ cannot be simplified to $\|tAv\|^2$. If it could, $\partial_{v}f(x) = \dfrac{1}{2}\|Av\|^2$.

Azorbz
  • 185

2 Answers


Write the squared norm as an inner product and expand:

$$\frac{\|Ax+tAv\|^2-\|Ax\|^2}{2t}=\frac{\langle Ax+tAv,Ax+tAv \rangle-\langle Ax, Ax \rangle}{2t}=\frac{2\langle tAv,Ax \rangle+\langle tAv, tAv \rangle}{2t}=\langle Av,Ax \rangle +\frac{t}{2} \langle Av,Av \rangle.$$

Letting $t \to 0$ gives $\partial_v f(x) = \langle Av, Ax \rangle$. From here we can check directly that $A^TAx$ is the gradient of $f$. Cancelling the factor $t$,

$$\lim_{t\to 0} \frac{f(x+tv)-f(x)-\langle A^TA x,tv \rangle}{t}=\lim_{t\to 0} \left(\langle Av,Ax \rangle +\frac{t}{2} \langle Av,Av \rangle-\langle A^TA x,v \rangle\right)=\langle Av,Ax \rangle-\langle A^TA x,v \rangle=0.$$

The last equality holds because the transpose satisfies $\langle v,Aw \rangle=\langle A^Tv,w \rangle$, so $\langle Av,Ax \rangle = \langle v, A^TAx \rangle$.

P.S. Checking this one-dimensional limit is enough to verify that $A^TAx$ is the gradient, because I already know that $f$ is differentiable (it is a composition of $C^\infty$ functions).
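As a quick sanity check of the formula, here is a small worked instance. The matrix and vectors are arbitrary illustrative choices, not taken from the question. Take

$$A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \qquad x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad v = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$

Then $Ax = (3,1)^T$ and $Av = (1,0)^T$, so

$$f(x+tv) = \frac12\left\|\begin{pmatrix} 3+t \\ 1 \end{pmatrix}\right\|^2 = 5 + 3t + \frac{t^2}{2}, \qquad \frac{f(x+tv)-f(x)}{t} = 3 + \frac{t}{2} \xrightarrow[t\to 0]{} 3 = \langle Av, Ax \rangle.$$

On the other side, $A^TAx = A^T (3,1)^T = (3,7)^T$, and $\langle A^TAx, v \rangle = 3$ as well.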

Bongo
  • 1,320

Note that $$\|Ax\|^2=x^T A^T Ax,$$ where $A^T$ denotes the transpose of $A$.

By definition,

\begin{align} &\|Ax+tAv\|^2-\|Ax\|^2\\ &=(x^TA^T+tv^TA^T)(Ax+tAv)-x^T A^T Ax\\ &=x^TA^TAx+tx^TA^TAv+tv^TA^TAx+t^2v^TA^TAv-x^T A^T Ax\\ &=tx^TA^TAv+tv^TA^TAx+t^2v^TA^TAv \end{align}

Dividing by $2t$, we obtain $$\frac{1}{2t}\left({tx^TA^TAv+tv^TA^TAx+t^2v^TA^TAv}\right)=\frac12(x^TA^TAv+v^TA^TAx+tv^TA^TAv),$$ and as $t\to0$ this tends to $$\frac12(x^TA^TAv+v^TA^TAx).$$

Finally, note that $$x^TA^TAv=v^TA^TAx=\langle Ax,Av\rangle$$ as a real inner product (each side is a scalar, hence equal to its own transpose). In conclusion, we have $$\partial_{v}f(x) = \langle Ax,Av\rangle.$$

On the other hand, we can also rewrite $\langle Ax,Av\rangle$ as $\langle A^TAx,v\rangle$, because $$\langle Ax,Av\rangle=v^T\left(A^TAx\right).$$ Since $f$ is differentiable on $\mathbb R^n$, a standard theorem of calculus tells us that $$\partial_{v}f(x)=\langle\nabla f(x),v\rangle$$ for every $v \in \mathbb R^n$. Hence $\nabla f(x) = A^T A x.$
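For anyone who wants to check the result numerically, here is a minimal sketch in plain Python. It compares a forward difference quotient of $f$ with $\langle A^TAx, v\rangle$. The helper names, the step size `t`, and the particular `A`, `x`, `v` are all illustrative assumptions, not part of the question.

```python
# Numerical sanity check of  d_v f(x) = <A^T A x, v>  for f(x) = 0.5 * ||A x||^2.
# All names and sample values below are illustrative choices.

def matvec(M, u):
    """Matrix-vector product M u, with M given as a list of rows."""
    return [sum(m * ui for m, ui in zip(row, u)) for row in M]

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def dot(u, w):
    """Euclidean inner product <u, w>."""
    return sum(ui * wi for ui, wi in zip(u, w))

def f(A, x):
    """f(x) = 0.5 * ||A x||_2^2."""
    Ax = matvec(A, x)
    return 0.5 * dot(Ax, Ax)

def grad_f(A, x):
    """Claimed gradient A^T A x."""
    return matvec(transpose(A), matvec(A, x))

def directional_derivative_fd(A, x, v, t=1e-6):
    """Forward difference quotient (f(x + t v) - f(x)) / t for a small fixed t."""
    x_shifted = [xi + t * vi for xi, vi in zip(x, v)]
    return (f(A, x_shifted) - f(A, x)) / t

# A is 2 x 3, so k = 2 and n = 3 in the notation of the question.
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]
x = [1.0, 1.0, 2.0]
v = [1.0, 0.0, -1.0]

numeric = directional_derivative_fd(A, x, v)
exact = dot(grad_f(A, x), v)
print("difference quotient:", numeric)
print("<A^T A x, v>       :", exact)
```

The two printed values should agree up to an error of order `t`, which matches the leftover term $\frac{t}{2}\langle Av, Av\rangle$ in the expansion above.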

León
  • 406