
Let $x \in \mathbb{R}^n$

What is

$$\frac{\partial}{\partial x} [ x^Tx ]$$

My guess is: $\frac{\partial}{\partial x} [ x^Tx ] = 0$, because $[x^Tx] \in \mathbb{R}^1$, hence it is a real number and is treated as a scalar in this differentiation.

Mahoni
  • I think the norm of any vector is always a positive real number, so it is constant, and the derivative of a constant function is zero. – Kns Apr 16 '12 at 09:16
  • @Kunjanshah: Do you think every function $\Bbb R^n\to \Bbb R$ is constant? – anon Apr 16 '12 at 09:25
  • Great question. This has stumped me as well for the same reason :) – noPE Apr 11 '23 at 12:10

2 Answers


Write $x$ as $(x_1, x_2, \ldots, x_n)$. Then $x^t x = \sum_i x_i^2$. So, for example, $$\frac{\partial}{\partial x_1} x^t x = \frac{\partial}{\partial x_1} \left( \sum_i x_i^2\right) = \frac{\partial}{\partial x_1} x_1^2 = 2x_1,$$ and similarly for each of the other components of $x$. From this it should be clear that $$\frac{d}{dx} x^t x = 2x^t.$$ (The transpose is there because the derivative at a point is a linear map $\mathbb{R}^n\rightarrow\mathbb{R}$, so expressed as a matrix it must have dimension $1\times n$; alternatively, as a linear map it must live in the dual space of $\mathbb{R}^n$, i.e. the space of linear maps $\mathbb{R}^n \rightarrow \mathbb{R}$.)

Your question perhaps betrays some confusion as to what the derivative is. Although for each $x$ the value of $x^t x$ is a single number, i.e. a scalar, the derivative expresses the amount by which $x^t x$ changes as the entries of $x$ change. This is surely nonzero, since the value of $x^t x$ depends on the entries of $x$.
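
A minimal numerical sketch of this point, assuming vectors are plain Python lists. The helper names `sq_norm` and `numeric_gradient` and the sample vector are illustrative assumptions, not part of the original discussion. It compares a central-difference approximation of each partial derivative with the claimed value $2x_i$:

```python
def sq_norm(x):
    """Return x^T x for a vector given as a list of floats."""
    return sum(xi * xi for xi in x)


def numeric_gradient(f, x, h=1e-6):
    """Approximate each partial derivative of f at x by a central difference."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h   # nudge only the i-th entry up
        x_minus[i] -= h  # and down
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


x = [1.0, -2.0, 3.0]                   # arbitrary sample point (assumption)
approx = numeric_gradient(sq_norm, x)  # finite-difference estimate
exact = [2 * xi for xi in x]           # the claimed derivative, 2x

for a, e in zip(approx, exact):
    print(f"numeric: {a:.6f}   claimed: {e:.6f}")
```

The two columns should agree up to floating-point rounding, and neither is zero unless the corresponding entry of $x$ is zero.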

Gavin

Let $u:\mathbb R^n\to\mathbb R$, $x\mapsto u(x)=x^Tx$. There exists a linear map $\ell_x:\mathbb R^n\to\mathbb R$, called the gradient of $u$ at $x$, such that

$$u(x+z)=u(x)+\ell_x(z)+o(\|z\|)$$

when $z\to0$. To compute $\ell_x$, note that $$ u(x+z)=(x+z)^T(x+z)=x^Tx+z^Tx+x^Tz+z^Tz=u(x)+2x^Tz+o(\|z\|), $$ since $z^Tx=x^Tz$ and $z^Tz=\|z\|^2=o(\|z\|)$; hence $$ \ell_x(z)=2x^Tz. $$ Every linear form $\ell$ on $\mathbb R^n$ has the form $\ell:z\mapsto w^Tz$ for some $w$ in $\mathbb R^n$, hence one often identifies $\ell$ with $w$ (technically, this is identifying the dual of $\mathbb R^n$ with $\mathbb R^n$). In the present case, one may identify the gradient $\ell_x$ of $u$ at $x$ (a linear map from $\mathbb R^n$ to $\mathbb R$) with the vector $2x$ (an element of $\mathbb R^n$), and indeed, one often reads the formula $$ (\text{grad}\ u)(x)=2x. $$
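
The expansion above can be checked numerically. The following is a small sketch, assuming plain Python lists for vectors; the names `dot`, `u`, `remainder` and the sample values of `x` and `direction` are illustrative assumptions. It evaluates the remainder $u(x+z)-u(x)-2x^Tz$ for shrinking $z$ and divides it by $\|z\|$:

```python
def dot(a, b):
    """Return a^T b for two vectors given as lists of floats."""
    return sum(ai * bi for ai, bi in zip(a, b))


def u(x):
    """u(x) = x^T x."""
    return dot(x, x)


def remainder(x, z):
    """Return u(x+z) - u(x) - l_x(z), with l_x(z) = 2 x^T z."""
    x_plus_z = [xi + zi for xi, zi in zip(x, z)]
    return u(x_plus_z) - u(x) - 2 * dot(x, z)


x = [1.0, -2.0, 3.0]          # arbitrary base point (assumption)
direction = [0.5, 1.0, -1.5]  # arbitrary direction (assumption)

ratios = []
for t in (1.0, 0.1, 0.01):
    z = [t * d for d in direction]
    r = remainder(x, z)
    norm_z = dot(z, z) ** 0.5
    ratios.append(r / norm_z)  # should shrink as z -> 0
    print(f"t={t}: remainder={r:.6g}, remainder/|z|={r / norm_z:.6g}")
```

For this particular $u$ the remainder is exactly $z^Tz$, so the ratio equals $\|z\|$ and tends to $0$ with $z$, which is what the $o(\|z\|)$ term asserts.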

Did
  • Thanks for this specific answer, but I am afraid this does not help me. Is this a counterexample? Why do you introduce new variables? What is $\frac{\partial}{\partial x} [x^Tx]$ after all? – Mahoni Apr 16 '12 at 09:40
  • Nothing specific here, please read again: the object you call $\frac{\partial}{\partial x}(x^Tx)$ (using a notation I cannot recommend) is $(\text{grad}\ u)(x)$, that is, $2x$. – Did Apr 16 '12 at 09:45
  • This clarifies things a lot. And what about $(\text{grad}\ x)(x^T x)$? – Mahoni Apr 16 '12 at 09:53
  • Not a valid expression. – Did Apr 16 '12 at 09:57
  • I understood most of the answer and I would say that this explanation is beautiful (if not "elegant"), but can you please explain this part: "technically, this is identifying the dual of $\mathbb R^n$ with $\mathbb R^n$"? – Satish May 01 '20 at 14:46