Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be given by $f(x) = x^T A x$. I write $\nabla_x$ or $\nabla$ for the gradient with respect to a vector variable and $\nabla^2$ or $H$ for the Hessian.

The lecturer postulated that $\nabla f(x) = 2 A x$, and that $\nabla^2 f(x) = 2A$.

It's not immediately clear to me why this is true. My thought was that $\nabla f(x)$ always yields a column vector, and that therefore $\nabla f(x) = 2 A x$. But this feels more like a mnemonic than a proof.

How does one derive $\nabla_x x^T A x = 2 A x$? Why can't it be $2 (x^T A)^T = 2 A^T x$?

Jasper
  • Not in general, but $A$ only occurs in the quadratic form and nowhere else. Therefore the antisymmetric part cancels (right?), so you can take $B = \frac{1}{2}(A+A^T)$. However, if we leave $A$ as it is, then $Ax \neq (x^T A)^T$ when $A$ is not symmetric, which confuses me. – Jasper Oct 23 '16 at 12:35
  • Search the web for the keyword "gradient". Searching for "gradient of X^TAX" immediately gave me many answers, such as (http://math.stackexchange.com/q/482742) – Jean Marie Oct 23 '16 at 12:54
  • For an in-depth understanding of quadratic forms, in both matrix and graphical terms, I recommend this document (http://www2.econ.iastate.edu/classes/econ501/Hallam/documents/Quad_Forms_000.pdf). – Jean Marie Oct 23 '16 at 13:02

1 Answer

You need $A$ to be symmetric for that.

$x^TAx=\langle Ax, x\rangle=:f(x).$

$f(x+h)=\langle A(x+h), x+h \rangle=\langle Ax, x \rangle + \langle Ax,h \rangle +\langle x,Ah \rangle+ \langle Ah,h \rangle.$

The first term is $f(x)$ and the last term is $O(\|h\|^2)$, so the derivative at $x$ is the part that is linear in $h$: $f'_x=\langle Ax, \cdot \rangle +\langle x, A \cdot \rangle.$

Because $A$ is symmetric, (*) $$f'_x=\langle 2Ax, \cdot \rangle.$$

Since the definition of $\nabla_x f$ is the vector such that $f'_x= \langle \nabla_x f, \cdot \rangle$ (which exists and is unique by Riesz), we get $$\nabla_x f= 2Ax.$$

(*) Note that $f'_x=\langle Ax, \cdot \rangle +\langle x, A \cdot \rangle=\langle (A+A^T)x, \cdot\rangle .$ Therefore, $f'_x=\langle 2Ax, \cdot \rangle$ if and only if $2A=A+A^T$, which occurs if and only if $A$ is symmetric.
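
If you want to convince yourself numerically, here is a minimal sketch (not part of the proof) that compares central finite differences of $f(x) = x^T A x$ against $(A+A^T)x$ and $2Ax$ for a random non-symmetric $A$. Differentiating the linear map $x \mapsto (A+A^T)x$ once more gives the constant Hessian $A+A^T$, which reduces to $2A$ when $A$ is symmetric; the sketch checks that too. The helper names (`numerical_gradient`, `numerical_hessian`), the step size, the tolerances and the random seed are my own assumptions for illustration.

```python
import numpy as np


def f(A, x):
    """Quadratic form f(x) = x^T A x."""
    return x @ A @ x


def numerical_gradient(A, x, eps=1e-3):
    """Central-difference approximation of the gradient of f at x."""
    n = x.size
    grad = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        grad[i] = (f(A, x + e) - f(A, x - e)) / (2 * eps)
    return grad


def numerical_hessian(A, x, eps=1e-3):
    """Central-difference approximation of the Hessian (Jacobian of the gradient)."""
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        H[:, j] = (numerical_gradient(A, x + e) - numerical_gradient(A, x - e)) / (2 * eps)
    return H


rng = np.random.default_rng(0)      # arbitrary seed, assumption for reproducibility
n = 4
A = rng.standard_normal((n, n))     # generic matrix, not symmetric
x = rng.standard_normal(n)

grad_fd = numerical_gradient(A, x)

# General formula: gradient is (A + A^T) x.
matches_general = np.allclose(grad_fd, (A + A.T) @ x, atol=1e-6)
# The formula 2 A x is only expected to hold for symmetric A.
matches_2Ax = np.allclose(grad_fd, 2 * A @ x, atol=1e-6)

# Replacing A by its symmetric part S leaves the quadratic form unchanged,
# and for S the gradient is 2 S x.
S = 0.5 * (A + A.T)
same_form = np.isclose(f(A, x), f(S, x))
matches_sym = np.allclose(numerical_gradient(S, x), 2 * S @ x, atol=1e-6)

# Hessian is the constant matrix A + A^T (equal to 2A only if A is symmetric).
hessian_matches = np.allclose(numerical_hessian(A, x), A + A.T, atol=1e-6)

print(matches_general, matches_2Ax, same_form, matches_sym, hessian_matches)
```

Since $f$ is quadratic, the central differences are exact up to rounding, so the comparison with $(A+A^T)x$ should succeed while the comparison with $2Ax$ should fail for a generic non-symmetric $A$. This also ties in with the comment under the question: swapping $A$ for $B = \frac{1}{2}(A+A^T)$ does not change $f$, and for that symmetric $B$ the gradient is $2Bx$.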