
Various resources claim

$ \nabla_x x^\top x = 2x $ .

I only know

$ \nabla_X f(X) = \begin{bmatrix} \frac{\partial f(X)}{\partial x_{11}} & \cdots & \frac{\partial f(X)}{\partial x_{1m}}\\ \vdots & & \\ \frac{\partial f(X)}{\partial x_{n1}} & \cdots & \frac{\partial f(X)}{\partial x_{nm}} \end{bmatrix} $

How do I use the above rule to get $\nabla_x x^\top x = 2x$?

Note: As I understand it, even if a question looks like homework, it should be on topic as long as the OP explains their point of view and their own efforts to understand, and only asks for confirmation, rejection, or details.

I tried the following, but cannot find the answer.

$ \begin{align*} \nabla_Z f(Z) =& \nabla_Z\begin{bmatrix} z_{11} & \cdots & z_{1m}\\ \vdots & & \\ z_{n1} & \cdots &z_{nm} \end{bmatrix}^\top \begin{bmatrix} z_{11} & \cdots & z_{1m}\\ \vdots & & \\ z_{n1} & \cdots &z_{nm} \end{bmatrix}\\ =& \nabla_Z\begin{bmatrix} z_{11} & \cdots & z_{n1}\\ \vdots & & \\ z_{1m} & \cdots &z_{nm} \end{bmatrix} \begin{bmatrix} z_{11} & \cdots & z_{1m}\\ \vdots & & \\ z_{n1} & \cdots &z_{nm} \end{bmatrix} \notag \\ =& \nabla_Z\begin{bmatrix} z_{11}z_{11}+z_{21}z_{21}+\cdots +z_{n1}z_{n1} & \cdots & z_{11}z_{1m}+z_{21}z_{2m}+\cdots +z_{n1}z_{nm}\\ \vdots & & \\ z_{1m}z_{11}+z_{2m}z_{21}+\cdots +z_{nm}z_{n1} & \cdots & z_{1m}z_{1m}+z_{2m}z_{2m}+\cdots +z_{nm}z_{nm} \end{bmatrix} \notag \\ =& \begin{bmatrix} \frac{\partial}{\partial z_{11}} z_{11}z_{11}+z_{21}z_{21}+\cdots +z_{n1}z_{n1} & \cdots & \frac{\partial}{\partial z_{1m}}z_{11}z_{1m}+z_{21}z_{2m}+\cdots +z_{n1}z_{nm}\\ \vdots & & \\ \frac{\partial}{\partial z_{n1}}z_{1m}z_{11}+z_{2m}z_{21}+\cdots +z_{nm}z_{n1} & \cdots & \frac{\partial}{\partial z_{nm}}z_{1m}z_{1m}+z_{2m}z_{2m}+\cdots +z_{nm}z_{nm} \end{bmatrix} \notag \\ = & \begin{bmatrix} 2z_{11} & z_{11} & \cdots & z_{11}\\ & 2z_{22} \\ \vdots & \\ z_{nm} & z_{nm} & \cdots & 2z_{nm} \end{bmatrix} \notag \end{align*} $

user85503
Gqqnbig

2 Answers


By the definition of the gradient, it only applies to a function $f$ whose value is a real number.

Then, since we assume $\nabla_x x^\top x = 2x$ is valid, $x$ must be a vector, i.e. an $n$-by-1 matrix, so that $x^\top x$ is a real number. We still treat $x$ as a matrix because we are using matrix operations, e.g. transpose and matrix multiplication.

$$f(x)=x^\top x= x_1^2+x_2^2+\cdots+x_n^2$$

Also, based on the definition, the matrix size of $\nabla_x f(x)$ is the same as the size of x.

Now we can apply the definition in the OP's question to $x^\top x$.

$$\begin{align*} \nabla_x x^\top x =& \begin{bmatrix} \frac{\partial x^\top x}{\partial x_{1}} \\ \vdots \\ \frac{\partial x^\top x}{\partial x_{n}} \end{bmatrix} \\ =& \begin{bmatrix} \frac{\partial (x_1^2+x_2^2+\cdots+x_n^2)}{\partial x_{1}} \\ \vdots \\ \frac{\partial (x_1^2+x_2^2+\cdots+x_n^2)}{\partial x_{n}} \end{bmatrix}\\ =& \begin{bmatrix} 2x_1 \\ \vdots \\ 2x_n \end{bmatrix} \\ =& 2 \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \\ =& 2x \end{align*}$$
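As a sanity check on the derivation, here is a minimal numerical sketch in plain Python. It approximates each partial derivative of $x^\top x$ with a central difference and compares the result with $2x$. The helper names `f` and `numerical_gradient`, the step size `h`, and the sample vector are my own choices for illustration, not part of the question.

```python
def f(x):
    """Return x^T x for a vector given as a list of floats."""
    return sum(xi * xi for xi in x)


def numerical_gradient(func, x, h=1e-6):
    """Approximate the gradient of func at x with central differences."""
    grad = []
    for i in range(len(x)):
        # Perturb only the i-th coordinate, in both directions.
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


x = [1.0, -2.0, 3.5]                 # arbitrary sample point (assumption)
approx = numerical_gradient(f, x)    # finite-difference estimate
exact = [2 * xi for xi in x]         # the claimed gradient 2x

print("numerical:", approx)
print("claimed  :", exact)
```

The two printed vectors should agree up to floating-point rounding, which is consistent with $\nabla_x x^\top x = 2x$ at that point. This is only a spot check at one vector, not a proof.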

Gqqnbig

For $\boldsymbol x=(x,y,z)^T$, we have $\boldsymbol x^T\boldsymbol x=x^2+y^2+z^2$.

Then

$$\nabla_{\boldsymbol x}(\boldsymbol x^T\boldsymbol x)=\left(\frac{\partial}{\partial x}(x^2+y^2+z^2),\frac{\partial}{\partial y}(x^2+y^2+z^2),\frac{\partial}{\partial z}(x^2+y^2+z^2)\right)=(2x,2y,2z)=2\boldsymbol x.$$