Why is it that if $f(x) = x^T A x$ then $\nabla f = \frac{1}{2} (A+A^T) x$

Question

I have that $$f(\vec{x},\vec{y}) = \vec{x}^T A \vec{x}$$

I have seen the result online that $$\nabla f = \frac{1}{2} (A+A^T) \vec{x}$$ yet I can't understand why this is the case. How do you get from the first expression to the second?

The thing I am actually trying to figure out is why we can go from (if $A$ is symmetric)

$$f(\vec{x},\vec{y}) = \begin{bmatrix} \vec{x} \\ \vec{y} \end{bmatrix}^T A \begin{bmatrix} \vec{x} \\ \vec{y} \end{bmatrix}$$ $$f(\vec{x},\vec{y}) = \begin{bmatrix} \vec{x} \\ \vec{y} \end{bmatrix}^T \frac{1}{2} (A+A^T) \begin{bmatrix} \vec{x} \\ \vec{y} \end{bmatrix}$$

To

$$\nabla f = \frac{1}{2}(A+A^T) \begin{bmatrix} \vec{x} \\ \vec{y} \end{bmatrix}$$

Then apparently the Hessian is as follows:

$$H = \frac{1}{2} (A + A^T) = A$$

Could someone help me to understand?

Write $x^T A x$ as a sum, then take the partial derivative with respect to $x_j$. — thewatcher, Apr 11 '20 at 22:30
Have you tried expanding $x^TAx$ by coordinates? It’s a scalar-valued function, after all, so you should be able to compute its gradient directly. — amd, Apr 11 '20 at 22:30
You’ve got some rather odd notation here. Are you sure that you don’t mean for $x$ and $y$ in $f(\vec x,\vec y)$ and in the partial expansion of $\vec x^TA\vec x$ to be scalars? — amd, Apr 11 '20 at 22:32
If a square matrix $B$ is skew symmetric, meaning $B^T = - B,$ then for any column vector $v$ we get $v^T B v = 0.$ Because the transpose of a number (a 1 by 1 matrix) is itself — Will Jagy, Apr 12 '20 at 00:03
Does this answer your question? How to take the gradient of the quadratic form? — Rodrigo de Azevedo, Apr 12 '20 at 06:04
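The comment about skew-symmetric matrices explains why $A$ can be swapped for $\frac{1}{2}(A+A^T)$ inside the quadratic form. Here is a minimal numerical sketch of that fact in plain Python. The matrix `A`, the vector `x`, and the helper `quad_form` are illustrative choices, not taken from the thread.

```python
# Illustrative check: only the symmetric part of A contributes to x^T A x.
# A and x are arbitrary example values; quad_form is a hypothetical helper.

def quad_form(M, v):
    """Return v^T M v for a square matrix M (list of rows) and a vector v."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

A = [[1.0, 4.0, -2.0],
     [0.0, 3.0, 5.0],
     [7.0, -1.0, 2.0]]
x = [0.5, -1.5, 2.0]
n = len(A)

# Split A into its symmetric part S and skew-symmetric part K, so A = S + K.
S = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
K = [[(A[i][j] - A[j][i]) / 2 for j in range(n)] for i in range(n)]

print(quad_form(A, x))  # value of x^T A x
print(quad_form(S, x))  # same value, using only the symmetric part
print(quad_form(K, x))  # the skew part contributes zero (up to rounding)
```

Note that this only shows the two quadratic forms agree as functions. It says nothing yet about the factor in front of the gradient.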

maciek97x · Accepted Answer · 2020-04-11T22:44:47.763

2

Let $A=\begin{bmatrix} a & b\\ c & d\end{bmatrix}$, then $$ f(x,y)=ax^2+(b+c)xy+dy^2 $$ $$ \nabla f=\begin{bmatrix} 2ax+(b+c)y \\ (b+c)x+2dy\end{bmatrix}= \begin{bmatrix} 2a & b+c \\ b+c & 2d\end{bmatrix}\cdot \begin{bmatrix} x \\ y\end{bmatrix}= (A+A^T)\cdot \begin{bmatrix} x \\ y\end{bmatrix}. $$ Why is there a $\frac12$ in your result?
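The same computation can be sketched for general $n$ in index notation, following the suggestion in the comments to expand the quadratic form as a sum. Here $a_{ij}$ denotes the entries of $A$ and $k$ is a fixed index; this notation is introduced for illustration only.

$$ f(x) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j $$

The variable $x_k$ appears in the terms with $i = k$ and in the terms with $j = k$, so the product rule gives

$$ \frac{\partial f}{\partial x_k} = \sum_{j=1}^n a_{kj} x_j + \sum_{i=1}^n a_{ik} x_i = (Ax)_k + (A^T x)_k. $$

Stacking the components gives $\nabla f = (A + A^T)x$, which reduces to $2Ax$ when $A$ is symmetric.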

edited Apr 11 '20 at 22:44

answered Apr 11 '20 at 22:39


You're right, the 1/2 shouldn't be there. (Another way to recognize this is to think about the case where $x$ and $A$ are scalars, in which case $f(x) = Ax^2$ and $f'(x) = 2Ax$.) – littleO Apr 11 '20 at 22:57

score 2 · Answer 2 · answered Apr 11 '20 at 22:44

We want to compute the gradient with finesse, and that means we should think in terms of Newton's approximation: $$ \tag{1} f(x + \Delta x) \approx f(x) + f'(x) \Delta x. $$ If $f(x) = x^T Ax$, and $\Delta x$ is a small vector, then \begin{align} f(x + \Delta x) &= (x + \Delta x)^T A (x + \Delta x) \\ &= x^T A x + x^T A \Delta x + \Delta x^T A x + \Delta x^T A \Delta x \\ &= x^T A x + x^T A \Delta x + x^T A^T \Delta x + \text{very small term} \\ &\approx x^T A x + x^T (A + A^T) \Delta x. \end{align} Comparing this with (1), we discover that $$ f'(x) = x^T(A + A^T). $$ So, $$ \nabla f(x) = f'(x)^T = (A + A^T) x. $$
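As a rough numerical check of this result, the sketch below compares a central finite-difference gradient with $(A + A^T)x$. It is plain Python; the example matrix, the point `x`, the step `eps`, and all function names are assumptions made for illustration.

```python
# Compare a finite-difference gradient of f(x) = x^T A x with (A + A^T) x.
# All names and values below are illustrative, not from the original answer.

def f(A, x):
    """Quadratic form x^T A x, with A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numerical_gradient(A, x, eps=1e-6):
    """Central differences, one coordinate at a time."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        grad.append((f(A, x_plus) - f(A, x_minus)) / (2 * eps))
    return grad

def closed_form_gradient(A, x):
    """Entries of (A + A^T) x."""
    n = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]

A = [[1.0, 4.0], [-2.0, 3.0]]  # deliberately non-symmetric
x = [0.5, -1.5]

grad_num = numerical_gradient(A, x)
grad_exact = closed_form_gradient(A, x)
print(grad_num)    # finite-difference estimate
print(grad_exact)  # (A + A^T) x
```

With a factor of $\frac{1}{2}$ in front, the closed form would be off from the finite-difference estimate by a factor of two.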

score 1 · Answer 3 · answered Apr 11 '20 at 23:01

Assuming you meant $f(x) = x^T A x$, notice that for any vector $h$, we have:

\begin{align*} f(x + h) &= (x + h)^T A (x + h) \\ &= x^T A x + h^T A x + x^T A h + h^T A h \\ &= f(x) + h^T (A + A^T)x + h^T A h \end{align*}

With $L(h) = h^T(A + A^T)x$, which is clearly linear, we find that

\begin{align*} \lim_{h \to 0} \frac{f(x + h) - f(x) - L(h)}{||h||} &= \lim_{h \to 0} \frac{h^T A h}{||h||} = 0 \end{align*}

So that $\nabla f(x) = (A + A^T)x$.
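The limit argument above can be illustrated numerically. The sketch below, in plain Python, shrinks $h$ along a fixed direction and prints the remainder $f(x+h) - f(x) - L(h)$ next to $h^T A h$, along with the ratio to $\|h\|$. The matrix, the point, the direction `d`, and the helper `bilinear` are hypothetical example choices.

```python
import math

# Illustrative check that f(x + h) - f(x) - L(h) equals h^T A h and that
# the ratio to ||h|| shrinks with h. All values are example assumptions.

def bilinear(u, M, v):
    """Return u^T M v for vectors u, v and a square matrix M (list of rows)."""
    n = len(u)
    return sum(u[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

A = [[1.0, 4.0], [-2.0, 3.0]]  # example matrix, deliberately non-symmetric
x = [0.5, -1.5]                # example point
d = [1.0, 2.0]                 # example direction; h = t * d

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h = [t * di for di in d]
    x_plus_h = [xi + hi for xi, hi in zip(x, h)]
    # L(h) = h^T (A + A^T) x = h^T A x + x^T A h
    L_h = bilinear(h, A, x) + bilinear(x, A, h)
    remainder = bilinear(x_plus_h, A, x_plus_h) - bilinear(x, A, x) - L_h
    norm_h = math.sqrt(sum(hi * hi for hi in h))
    ratios.append(remainder / norm_h)
    print(t, remainder, bilinear(h, A, h), remainder / norm_h)
```

Along a fixed direction the ratio scales linearly with `t`, which is consistent with the limit being zero.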

score 1 · Answer 4 · 2020-04-11T23:47:53.700

Young's theorem is that $$ \dfrac{\partial^2 f(x)}{\partial x_i \partial x_j} = \dfrac{\partial^2 f(x)}{\partial x_j \partial x_i} $$ so that cross derivatives of differentiable functions do not depend on the order of differentiation. $A$ is the Hessian of a quadratic form, so it must be symmetric. To make this really explicit, if you multiplied it out as $$ x'Ax = \sum_{i=1}^N \sum_{j=1}^N a_{ij} x_i x_j $$ and took the cross partial you would get $a_{ij} $ or $a_{ji}$ depending on the order in which you did it, so you must conclude $a_{ij} = a_{ji}$ and $A$ is symmetric, or else $f(x) = x'Ax$ is violating Young's theorem.

Why does this matter?

Consider a vector field $F(x)$ and any piecewise smooth curve from $x$ to $x'$, so the line integral is $$ \int_{0}^{1} F(\alpha(t)) d \alpha(t) $$ where $\alpha(0) = x$ and $\alpha(1) = x'$. We might ask, "does the vector field $F(x)$ characterize a proper function, so $F(x) = \nabla f(x)$?" If so, we'll need it to be the case that $$ f(x') - f(x) = \int_{0}^{1} F(\alpha(t)) d \alpha(t) $$ for any piecewise smooth curve $\alpha$. Basically, does the fundamental theorem of calculus hold?

If $A \neq A'$, however, $2A'x$ cannot be the gradient of a function, because it will have curl and violate Young's theorem (sometimes people use $\frac{1}{2}(A+A')$ to "symmetrize" $A$). So a necessary condition for $2A'x$ to be the gradient of a function is that $A$ be symmetric. Otherwise, the vector field $A'x$ would not be conservative, and line integrals would depend on the path, so they could not be integrated to get a potential function that makes sense of the vector field. The answer is not that $A$ is not symmetric and the Hessian is $\frac{1}{2}(A'+A)$: it is that $A$ must be symmetric because the Hessian of $x'Ax$ is $A$, and if you wrote $A$ as non-symmetric, you might be lucky enough that the way you do the computations will correct it for you.
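For readers who want to test the symmetry question directly, here is a minimal sketch in plain Python that estimates the Hessian of $x'Ax$ by second differences for a non-symmetric example matrix. The matrix, the point, the step `eps`, and the function names are all assumptions for illustration. In this example the estimate agrees with $A + A'$, which is symmetric even though $A$ itself is not.

```python
# Estimate the Hessian of f(x) = x'Ax by second differences.
# A, x, eps and the helper names are illustrative assumptions.

def f(A, x):
    """Quadratic form x'Ax, with A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numerical_hessian(A, x, eps=1e-3):
    """H[i][j] ~ (f(x+e_i+e_j) - f(x+e_i) - f(x+e_j) + f(x)) / eps^2,
    where e_i is eps times the i-th unit vector."""
    n = len(x)

    def shifted(*indices):
        y = list(x)
        for k in indices:
            y[k] += eps
        return y

    base = f(A, x)
    return [[(f(A, shifted(i, j)) - f(A, shifted(i)) - f(A, shifted(j)) + base)
             / eps ** 2
             for j in range(n)]
            for i in range(n)]

A = [[1.0, 4.0], [-2.0, 3.0]]  # deliberately non-symmetric
x = [0.5, -1.5]

H = numerical_hessian(A, x)
for row in H:
    print(row)  # compare with A + A' = [[2, 2], [2, 6]]
```

Because $f$ is quadratic, the second-difference formula is exact up to floating-point rounding, so the step size does not need to be tuned carefully here.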
