Gradient of norms - general advice

Question

I have something of the following sort:

$$ F: \mathbb{R}^n \to \mathbb{R} $$

where $F$ maps a vector in $\mathbb{R}^n$ to a real number. For example, I may have functions of the form

$$ F(x) = \|x - x_0\|_2^2 $$

or

$$ F(x) = \|Ax - b\|_2^2 $$

Now, I would like to know how to find the gradient of squared $\ell_2$ norms such as these:

$$ \nabla F(x)$$

I also know that

$$ F(x) = \|x - x_0\|_2^2 = (x - x_0)^T(x-x_0)$$

Unfortunately, I am not well versed in vector/norm calculus, so I would like to know general methods for applying calculus to these mathematical objects, or books to consult on how to do so. I know how to break the matrices/vectors down into components and compute the gradient that way, but I would like a general way to perform these computations on whole matrices, vectors, and norms without resorting to element-wise operations.

Thanks for these comments. This is very helpful in figuring out the general mathematics of calculus of norms. — qxzsilver, Jul 06 '20 at 16:52

littleO · Accepted Answer · 2020-07-03T19:12:58.410

Background info: If $F:\mathbb R^n \to \mathbb R^m$ is differentiable at $x$, then $F'(x)$ is an $m \times n$ matrix which satisfies $$ \tag{1} \underbrace{F(x + \Delta x)}_{m \times 1} \approx \underbrace{F(x)}_{m \times 1} + \underbrace{F'(x)}_{m \times n} \underbrace{\Delta x}_{n \times 1}. $$ The approximation is good when $\Delta x$ is small. The local linear approximation (1) is sometimes called "Newton's approximation", and it is the key to understanding and computing derivatives in calculus. It is the basic idea at the heart of differential calculus. Most formulas of calculus can be derived easily just by applying Newton's approximation.

In the special case that $F:\mathbb R^n \to \mathbb R$, $F'(x)$ is a $1 \times n$ matrix (a row vector). Often we use the convention that the gradient of $F$ at $x$ is a column vector, so that $$ \nabla F(x) = F'(x)^T. $$

For $F(x) = \|x \|_2^2$, if you don't want to compute the partial derivatives of $F$ (which would be easy in this case), you could think directly in terms of Newton's approximation. With this choice of $F$, we have \begin{align} F(x + \Delta x) &= \|x + \Delta x \|_2^2 \\ &= \| x \|_2^2 + 2 x^T \Delta x + \| \Delta x \|_2^2 \\ &\approx F(x) + 2 x^T \Delta x. \end{align} The last step drops the term $\| \Delta x \|_2^2$, which is negligible compared with the linear term when $\Delta x$ is small. Comparing with Newton's approximation, we discover that $$ F'(x) = 2 x^T. $$ If we use the convention that $\nabla F(x)$ is a column vector, then $$ \nabla F(x) = F'(x)^T = 2x. $$ This is the result we would expect or guess based on what we know about single-variable calculus. (In single-variable calculus, if $F(x) = x^2$, then $F'(x) = 2x$.)
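If you want to sanity-check a gradient formula like this numerically, a finite-difference comparison is one option. The following is only a minimal sketch, assuming NumPy is available; the helper name `numerical_gradient`, the step size `h`, and the random test point are illustrative choices, not part of the derivation above.

```python
import numpy as np

def F(x):
    """Squared Euclidean norm, F(x) = ||x||_2^2."""
    return float(np.dot(x, x))

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x (hypothetical helper)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

# Compare the formula derived above, grad F(x) = 2x, with the numerical estimate.
analytic = 2 * x
numeric = numerical_gradient(F, x)
max_abs_error = np.max(np.abs(analytic - numeric))

# The remainder of Newton's approximation is exactly ||dx||_2^2 for this F.
dx = 1e-3 * rng.standard_normal(5)
remainder = F(x + dx) - (F(x) + 2 * np.dot(x, dx))

print("max abs gradient error:", max_abs_error)
print("remainder:", remainder, "||dx||^2:", F(dx))
```

The gradient error should be tiny, and the remainder should match $\| \Delta x \|_2^2$ up to rounding, which is the term dropped in the approximation.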

To compute the gradient of the function $F(x) = \| Ax - b \|_2^2$, I recommend using the chain rule, as I explained here: https://math.stackexchange.com/a/3508376/40119 This makes the calculation simple and elegant.
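For reference, here is one way the chain rule calculation can go. This is a sketch in the notation of this answer, not necessarily the wording of the linked post, and the helper names $g$ and $h$ are introduced here only for illustration. Write $F = g \circ h$ with $h(x) = Ax - b$ and $g(y) = \| y \|_2^2$. Then $h'(x) = A$, and from the calculation above $g'(y) = 2 y^T$, so \begin{align} F'(x) &= g'(h(x)) \, h'(x) \\ &= 2 (Ax - b)^T A. \end{align} With the column-vector convention, $$ \nabla F(x) = F'(x)^T = 2 A^T (Ax - b). $$ Taking $A = I$ and $b = x_0$ recovers the gradient of $\| x - x_0 \|_2^2$, namely $2(x - x_0)$.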

Thank you very much for the analytical and intuitive approach to thinking about these. This helped immensely. — qxzsilver, Jul 06 '20 at 16:53
