Regarding the derivative of Euclidean L2 norm, Definition of differentiation in Rudin.

Question

I am trying to understand the answer posted by hemanth in this post.

I understand how he derived the derivative of $f:\Bbb R^n \rightarrow \Bbb R$ defined by $f(x)=\lVert x \rVert$,
$$ Df(x) = \nabla_x\lVert x\rVert_2= \frac{x}{\lVert x \rVert}. $$ So we have $Df:\Bbb R^n \rightarrow L(\Bbb R^n,\Bbb R)$, but according to Rudin, $Df(x)$ must be a linear transformation from $\Bbb R^n$ to $\Bbb R$, and I do not see why $Df(x)h \in \Bbb R$ for every $h \in \Bbb R^n$.
I am trying to derive $Df(x)$ such that $$ \lim_{h\rightarrow0}\frac{\lVert f(x+h)-f(x)-Df(x)h \rVert}{\lVert h \rVert}=0,$$ so $Df(x)h$ must be a real number.

Could it be that $Df(x)h$ is a product of a column vector and a row vector?

If so, how does $$ \begin{split} \frac{\lVert f(x+h)-f(x)-Df(x)h \rVert}{\lVert h \rVert} &= \frac{\big\lVert\lVert x+h \rVert - \lVert x \rVert -Df(x)h \big\rVert}{\lVert h\rVert}\\ &\le\frac{\big\lVert\lVert x \rVert+\lVert h \rVert - \lVert x \rVert -Df(x)h \big\rVert}{\lVert h\rVert}=\frac{\big\lVert\lVert h \rVert -\frac{x}{\lVert x \rVert}h \big\rVert}{\lVert h \rVert}\rightarrow 0 \end{split} $$ as $h \rightarrow 0$?
Can anyone help me with this confusion?
It is my first time asking a question here, so my formatting might be bad.

Thank you for understanding.

M A Pelto · Accepted Answer · 2022-04-27T13:05:21.057

Note: the Jacobian here is the row vector, or in other words, the $1 \times n$ matrix $$Df({\bf{x}} )=\frac{1}{\|{\bf{x}}\|}{\bf{x}}^{T} \quad \quad \left({\bf x} \in \mathbb{R}^n \setminus \{ {\bf 0} \} \right).$$

By definition of the standard inner product on $\mathbb{R}^n$, we have $$Df({\bf x}){\bf h}=\frac{1}{\|{\bf{x}}\|}{\bf{x}}^{T}{\bf h}=\frac{1}{\|{\bf{x}}\|}\left( {\bf h}\cdot {\bf{x}}\right) \in \mathbb{R} \;\text{ whenever } \;{\bf x} \in \mathbb{R}^n \setminus \{ {\bf 0} \}.$$
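
As a quick sanity check that $Df({\bf x}){\bf h}$ really is a single real number, here is a minimal sketch in plain Python. The helper names `norm` and `df_apply` and the sample vectors are illustrative assumptions, not part of the original answer.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))

def df_apply(x, h):
    """Apply the 1 x n Jacobian x^T / ||x|| to h (hypothetical helper).

    The formula is only valid for x != 0, so the zero vector is rejected.
    """
    nx = norm(x)
    if nx == 0.0:
        raise ValueError("f(x) = ||x|| is not differentiable at x = 0")
    # Row vector times column vector: an ordinary dot product, scaled by 1/||x||.
    return sum(xi * hi for xi, hi in zip(x, h)) / nx

x = [3.0, 4.0]   # sample point, ||x|| = 5
h = [1.0, 2.0]   # sample increment
value = df_apply(x, h)  # (3*1 + 4*2) / 5
print(value)
```

The result is a scalar because a $1 \times n$ row vector times an $n \times 1$ column vector is a $1 \times 1$ matrix.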

Let ${\bf x} \in \mathbb{R}^n \setminus \{ {\bf 0} \}$, and consider $A=\|{\bf x}\|^{-1}{\bf x}^T$. Then for ${\bf h} \neq {\bf 0}$ we have

$$\begin{aligned}
\frac{\left| f({\bf x}+{\bf h}) - f({\bf x})-A{\bf h}\right|}{\| {\bf h}\|}
&=\frac{ \left|\sqrt{({\bf x}+{\bf h})^T({\bf x}+{\bf h})}-\sqrt{{\bf x}^T{\bf x}}- \| {\bf x}\|^{-1}{\bf x}^{T}{\bf h}\right| }{\| {\bf h}\|} \\
&=\frac{ \left|\sqrt{{\bf x}^T{\bf x}+2{\bf x}^T{\bf h}+{\bf h}^T{\bf h}}-\sqrt{{\bf x}^T{\bf x}}- \| {\bf x}\|^{-1}{\bf x}^{T}{\bf h}\right| }{\| {\bf h}\|} \\
&=\frac{ \left| 2{\bf x}^T{\bf h}+{\bf h}^T{\bf h}- \left(\sqrt{{\bf x}^T{\bf x}+2{\bf x}^T{\bf h}+{\bf h}^T{\bf h}}+\sqrt{{\bf x}^T{\bf x}} \right)\| {\bf x}\|^{-1}{\bf x}^{T}{\bf h}\right| }{\left(\sqrt{{\bf x}^T{\bf x}+2{\bf x}^T{\bf h}+{\bf h}^T{\bf h}}+\sqrt{{\bf x}^T{\bf x}}\right)\|{\bf h}\|} \\
&\le\frac{ \left\| 2{\bf x}^T+{\bf h}^T- \left(\sqrt{{\bf x}^T{\bf x}+2{\bf x}^T{\bf h}+{\bf h}^T{\bf h}}+\sqrt{{\bf x}^T{\bf x}} \right)\| {\bf x}\|^{-1}{\bf x}^{T}\right\| \|{\bf h}\|}{\left(\sqrt{{\bf x}^T{\bf x}+2{\bf x}^T{\bf h}+{\bf h}^T{\bf h}}+\sqrt{{\bf x}^T{\bf x}} \right)\|{\bf h}\|} \\
&=\frac{ \left\| 2{\bf x}^T+{\bf h}^T- \left(\sqrt{{\bf x}^T{\bf x}+2{\bf x}^T{\bf h}+{\bf h}^T{\bf h}}+\sqrt{{\bf x}^T{\bf x}} \right)\| {\bf x}\|^{-1}{\bf x}^{T}\right\| }{\sqrt{{\bf x}^T{\bf x}+2{\bf x}^T{\bf h}+{\bf h}^T{\bf h}}+\sqrt{{\bf x}^T{\bf x}}}.
\end{aligned}$$

As ${\bf h} \to {\bf 0}$, the last expression tends to

$$\frac{ \left\| 2{\bf x}^T- 2\| {\bf x}\|\| {\bf x}\|^{-1}{\bf x}^{T}\right\| }{2\| {\bf x}\|}=\frac{ \| 2{\bf x}^T- 2{\bf x}^{T}\| }{2\| {\bf x}\|}=\frac{ \|{\bf 0}\| }{2\| {\bf x}\|}=0.$$

The inequality is Cauchy-Schwarz, $|{\bf v}^T{\bf h}| \le \|{\bf v}\|\|{\bf h}\|$, applied to the row vector that multiplies ${\bf h}$ in the numerator. Since the quotient is nonnegative and bounded above by a quantity that tends to $0$, its limit is $0$.

Thus by Definition 9.11 of Baby Rudin, $f$ is differentiable on $\mathbb{R}^n \setminus \{ {\bf 0} \}$ and $Df({\bf{x}} )=A$.
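
The limit can also be checked numerically. The sketch below, in plain Python, evaluates the difference quotient from Rudin's definition along one fixed direction for shrinking step sizes. The helper `remainder_quotient`, the point `x`, and the direction are arbitrary illustrative choices, and a numerical check of this kind is only a plausibility test, not a proof.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))

def remainder_quotient(x, h):
    """|f(x+h) - f(x) - A h| / ||h|| for f = norm and A = x^T / ||x||.

    Hypothetical helper; assumes x != 0 and h != 0.
    """
    nx = norm(x)
    linear = sum(xi * hi for xi, hi in zip(x, h)) / nx  # A h, a real number
    shifted = [xi + hi for xi, hi in zip(x, h)]         # x + h
    return abs(norm(shifted) - nx - linear) / norm(h)

x = [3.0, 4.0]            # sample point away from the origin
direction = [1.0, -2.0]   # not parallel to x, so the remainder is nonzero
steps = (1e-1, 1e-2, 1e-3, 1e-4)
quotients = [remainder_quotient(x, [t * d for d in direction]) for t in steps]

for t, q in zip(steps, quotients):
    print(f"t = {t:g}: quotient = {q:.3e}")
```

For this choice the quotient shrinks roughly in proportion to $\|{\bf h}\|$, which is consistent with the limit being $0$.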

Thank you for your reply. Just for clarification, does the Jacobian matrix uniquely define the linear transformation Df(x)? I do not doubt your answer, but I know that the existence of the Jacobian matrix does not imply differentiability of the function. So could it be that the Jacobian matrix exists, but Df(x) does not? — Whenever you is, Apr 27 '22 at 03:21
Sorry for the messy comment; I am still quite unclear about what I am asking. I think what I should ask is: does the Jacobian in your post allow me to say that f is differentiable on $\mathbb{R}^n \setminus \{0\}$ because the partial derivatives are continuous on $\mathbb{R}^n \setminus \{0\}$? — Whenever you is, Apr 27 '22 at 03:25
Updated my answer, sorry I wanted to get this little note out of the way before typing up this mess of latex code — M A Pelto, Apr 27 '22 at 04:12
Further updated to better reflect how we find $Df({\bf{x}} )$ according to Rudin. — M A Pelto, Apr 27 '22 at 04:50

score 3 · Answer 2 · answered Apr 27 '22 at 11:09

I would simply write $$ Df(\mathbf{x})[\mathbf{h}] = \frac{\partial f}{\partial \mathbf{x}}:\mathbf{h} $$ where the colon operator denotes the inner product in $\mathbb{R}^N$. This says how much (up to first order) $f$ will change when you move from $\mathbf{x}$ to $\mathbf{x+h}$.
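
Written out in components, as a sketch that uses only $f(\mathbf{x})=\|\mathbf{x}\|$ from the question and assumes $\mathbf{x}\neq\mathbf{0}$, the inner product above reads
$$ \frac{\partial f}{\partial x_i}=\frac{x_i}{\|\mathbf{x}\|}, \qquad Df(\mathbf{x})[\mathbf{h}]=\sum_{i=1}^{N}\frac{x_i\,h_i}{\|\mathbf{x}\|}=\frac{\mathbf{x}\cdot\mathbf{h}}{\|\mathbf{x}\|}, $$
which is a real number and agrees with the row-vector form $\|\mathbf{x}\|^{-1}\mathbf{x}^{T}\mathbf{h}$.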
