Transposition problems inside the Gradient of squared l2 norm

Question

For the notations, I put all vectors in lowercase boldface letters. The notation $(.)^T$ stands for the transpose and $||.||_2^2$ denotes the squared $l_2$ norm.

Suppose $\mathbf{x} =[x_1, x_2, \cdots, x_n]^T\in \mathbb{R}^n$, $\mathbf{y} = [y_1, y_2, \cdots, y_n]^T \in \mathbb{R}^n$, and $\mathbf{z} = [z_1, z_2, \cdots, z_n]\in \mathbb{R}^{1 \times n}$.

I know that $||\mathbf{x}||_2^2 = \sum \limits_{i=1}^n x_i^2$. The derivative of $||\mathbf{x}||_2^2$ w.r.t. $\mathbf{x}$ is $2\mathbf{x}$, that is, $\frac{\partial ||\mathbf{x}||_2^2}{\partial \mathbf{x}} = 2\mathbf{x}$. A proof is given in @Surb's answer to @user167133's question. It also follows that $\frac{\partial ||\mathbf{x} - \mathbf{y}||_2^2}{\partial \mathbf{x}} = 2(\mathbf{x} - \mathbf{y})$.

My main question, on which I would appreciate suggestions and details:

I am wondering whether $\frac{\partial ||\mathbf{x}^T||_2^2}{\partial \mathbf{x}}$ is equal to $2\mathbf{x}$ or to $2\mathbf{x}^T$. I know that $||\mathbf{x}^T||_2^2 = \sum \limits_{i=1}^n x_i^2 = ||\mathbf{x}||_2^2$. Since $||\mathbf{x}^T||_2^2 = ||\mathbf{x}||_2^2$, it seems to me that $\frac{\partial ||\mathbf{x}^T||_2^2}{\partial \mathbf{x}} = 2\mathbf{x}$.

But what if I have $||\mathbf{x}^T - \mathbf{z}||_2^2$ and want to compute $\frac{\partial ||\mathbf{x}^T - \mathbf{z}||_2^2}{\partial \mathbf{x}}$?

In fact, $||\mathbf{x}^T - \mathbf{z}||_2^2 = \sum \limits_{i=1}^n (x_i - z_i)^2$. If I write $\frac{\partial ||\mathbf{x}^T - \mathbf{z}||_2^2}{\partial \mathbf{x}} = 2(\mathbf{x} - \mathbf{z})$, this cannot be correct, because $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{z}\in\mathbb{R}^{1 \times n}$ have different shapes, so $\mathbf{x} - \mathbf{z}$ is not even defined. So if $\frac{\partial ||\mathbf{x}^T||_2^2}{\partial \mathbf{x}}$ really is $2\mathbf{x}$, is $\frac{\partial ||\mathbf{x}^T - \mathbf{z}||_2^2}{\partial \mathbf{x}}$ equal to $2(\mathbf{x}^T - \mathbf{z})$ or to $2(\mathbf{x}^T - \mathbf{z})^T$?

I am quite confused, and any further details would be much appreciated.

rych · Answer 1 · 2016-10-31T10:52:35.953

You are looking for the Fréchet derivative of the map $f:\mathbb R^n\to\mathbb R$, $f:x\mapsto\|x\|^2$ at $x$. It is a linear map in the direction $h$, often denoted by $Df_x(h)$. We can find it as follows: $$ D\|x\|^{2}(h)=\left.\tfrac{d}{dt}\right|_{t=0}\|x+th\|^{2}=\left.\tfrac{d}{dt}\right|_{t=0}\langle x+th,x+th\rangle=2\langle x,h\rangle $$
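In case the last equality is not obvious, here is the intermediate step written out using only bilinearity and symmetry of the inner product:

$$ \langle x+th,\,x+th\rangle=\|x\|^{2}+2t\langle x,h\rangle+t^{2}\|h\|^{2},\qquad \tfrac{d}{dt}\Big(\|x\|^{2}+2t\langle x,h\rangle+t^{2}\|h\|^{2}\Big)=2\langle x,h\rangle+2t\|h\|^{2}, $$

and setting $t=0$ leaves $2\langle x,h\rangle$.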

Now, you may introduce the "gradient vector" $\operatorname{grad} f$, which is what some people mean when they write $\tfrac{\partial f}{\partial x}$. It is defined by the requirement that $Df_x(h)\equiv\langle \operatorname{grad} f(x),h\rangle$ for all $h$. In this case, $\operatorname{grad} f(x)=2x$.
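If you want to convince yourself numerically, a rough finite-difference check along the following lines may help. It is only a sketch under my own assumptions: I store $x$ as an $n\times 1$ NumPy array and $z$ as a $1\times n$ array, and the helper names (`numerical_gradient`, `squared_norm`, `squared_norm_shifted`) are made up for illustration. The point it is meant to illustrate is that the gradient lives in the same space as $x$, so for $\|x^T - z\|^2$ it should come out as $2(x - z^T)$, with the shape of $x$.

```python
import numpy as np


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar f at x.

    The result has the same shape as x, entry by entry.
    """
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step.flat[i] = eps
        grad.flat[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal((n, 1))  # assumed column vector, shape (n, 1)
z = rng.standard_normal((1, n))  # assumed row vector, shape (1, n)


def squared_norm(v):
    # ||v||^2, which is also ||v^T||^2: the sum of squares ignores the layout
    return float(np.sum(v ** 2))


def squared_norm_shifted(v):
    # ||v^T - z||^2, where v^T and z are both rows
    return float(np.sum((v.T - z) ** 2))


grad_plain = numerical_gradient(squared_norm, x)
grad_shifted = numerical_gradient(squared_norm_shifted, x)

# Compare with the closed forms 2x and 2(x - z^T)
print(np.allclose(grad_plain, 2 * x, atol=1e-5))
print(np.allclose(grad_shifted, 2 * (x - z.T), atol=1e-5))
print(grad_shifted.shape)
```

Whether you then display that gradient as a column or as a row is a layout convention; the components are the same either way.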
