
If I want to find the gradient of

$$f(x) = x^Tx+c$$

where $x$ is a vector of size $n$ and $c$ is a constant, can I write it using the following notation?

$$\frac{\partial}{\partial x} \left( x^Tx+c \right) = \frac{\partial}{\partial x}x^Tx+\frac{\partial}{\partial x}c = \frac{\partial}{\partial x}x^Tx = \frac{\partial}{\partial x}\sum_{i=1}^nx_ix_i = \sum_{i=1}^n 2x_i = 2x$$

Or, more specifically, does

$$\frac{\partial}{\partial x}\sum_{i=1}^nx_ix_i = \sum_{i=1}^n 2x_i = 2x$$

make any sense, or am I mixing element and vector notation, i.e. differentiating with respect to a vector but writing the result as a sum? It seems unintuitive that I can go from a sum to a scalar times a vector in the last step.

  • @RodrigodeAzevedo - this is exactly my question, as I write in the last sentence. But this comment confirms my suspicion :) Does the first part even make sense? - $$\frac{\partial}{\partial x} \sum_{i=1}^n x_ix_i$$ –  Nov 27 '19 at 15:16
  • You want to compute $n$ partial derivatives. The partial derivative wrt $x_j$ can be written using the Kronecker delta. Alternatively, you can include $\mathrm e_i$ in your sum, where $\mathrm e_i$ is the $i$-th vector of the standard basis, with a $1$ at entry $i$ and zeros elsewhere. – Rodrigo de Azevedo Nov 27 '19 at 15:19
  • The first part makes sense, but I prefer using $\nabla_x$ to denote the gradient. – Rodrigo de Azevedo Nov 27 '19 at 15:19
  • Related: https://math.stackexchange.com/q/222894/339790 – Rodrigo de Azevedo Nov 27 '19 at 15:20
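
As a sketch of what the comments above describe (the index $j$ and the Kronecker delta $\delta_{ij}$, equal to $1$ when $i=j$ and $0$ otherwise, are notation assumed here for illustration), the partial derivative with respect to a single entry is

$$\frac{\partial}{\partial x_j}\sum_{i=1}^n x_ix_i = \sum_{i=1}^n 2x_i\,\delta_{ij} = 2x_j$$

and stacking the $n$ partial derivatives with the standard basis vectors $\mathrm e_j$ gives

$$\nabla_x \sum_{i=1}^n x_ix_i = \sum_{j=1}^n 2x_j\,\mathrm e_j = 2x$$

Written this way, the sum in the last step runs over the vectors $2x_j\,\mathrm e_j$ rather than the scalars $2x_i$, which is why it yields a vector.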

2 Answers


Why is it unintuitive?

The derivative with respect to a vector is defined as $$\frac{\partial f(x)}{\partial x} = \begin{bmatrix}\frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{bmatrix} \tag{1}$$

So in your case, $$\frac{\partial f(x)}{\partial x_k} = \frac{\partial }{\partial x_k} (x^Tx + c ) = \frac{\partial }{\partial x_k} x^Tx = \frac{\partial }{\partial x_k} \sum_{i=1}^n x_i^2 = \frac{\partial }{\partial x_k} (x_1^2 + \ldots + x_k^2 + \ldots + x_n^2) = 2x_k \tag{2}$$ where every term other than $x_k^2$ is constant in $x_k$ and drops out. Substituting $(2)$ into $(1)$, we get $$\frac{\partial f(x)}{\partial x} = \begin{bmatrix}2x_1 \\ \vdots \\ 2x_n \end{bmatrix} = 2x$$
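
If a numerical sanity check helps, here is a minimal sketch in plain Python that estimates each partial derivative in $(1)$ by central differences and compares the stacked result with $2x$. The helper names `f` and `numerical_gradient`, the value `c=3.0`, the step `h`, and the test vector are all illustrative assumptions, not part of the question.

```python
def f(x, c=3.0):
    """f(x) = x^T x + c for a vector x given as a list of floats (c is an arbitrary constant)."""
    return sum(xi * xi for xi in x) + c


def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of each partial derivative, stacked into a list."""
    grad = []
    for k in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[k] += h   # perturb only the k-th entry
        backward[k] -= h
        grad.append((func(forward) - func(backward)) / (2 * h))
    return grad


x = [1.0, -2.0, 0.5]                 # arbitrary test vector
approx = numerical_gradient(f, x)    # one partial derivative per entry of x
exact = [2 * xi for xi in x]         # the claimed gradient 2x

print(approx)
print(exact)
```

The two printed lists should agree up to floating-point rounding. The result has one entry per component of $x$, so it is a vector, not the single number that $\sum_{i=1}^n 2x_i$ would give.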

Ahmad Bazzi
  • This does not seem unintuitive. However, you are also differentiating with respect to the elements of $x$, not the vector itself (when considering sums). –  Nov 27 '19 at 16:43

There will be sticklers who define $\partial_xf$ as the transpose of @AhmadBazzi's definition so the chain rule $df=dx^i(\partial_xf)_i$ contracts according to the Einstein convention. On this view, the derivative would be $2x^T$. The same ideas apply when we differentiate a scalar with respect to a matrix.
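
To spell this out as a worked sketch (taking $x$ as a column vector and $dx$ as a column of the increments $dx^i$, which is an assumption about layout rather than something fixed above), the differential of $f(x)=x^Tx+c$ is

$$df = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\,dx^i = \sum_{i=1}^n 2x_i\,dx^i = 2x^T\,dx$$

so the object multiplying $dx$ is the row vector $2x^T$, the transpose of the column-vector gradient $2x$.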

J.G.