I am taking the derivative of

$$X^TAX, \quad X \in \mathbb{R}^n$$

using the Fréchet derivative, where $$f(x + h) = f(x) + \langle \nabla f(x), h \rangle + o(\|h\|).$$

So I have

$$f(x + h) = X^TAX + X^TAh + h^TAX +h^TAh$$

and with the two terms in between, I have

$$\langle X^T(A+A^T), h \rangle$$

and I think this $X^T(A+A^T)$ is the derivative of $X^TAX$. However, $X$ is an $n \times 1$ vector, while $X^T(A+A^T)$ is a $1 \times n$ vector. Am I doing anything wrong here? I have seen some matrix calculus references that also give this answer, so I don't know what is happening.

So the problem is that if I am doing gradient descent, I will have to compute $x - \nabla f(x)$, but the dimensions don't match, so I think there must be something wrong.

  • The way you use the notation $<\cdot>$ is kind of unusual and inconsistent. – user251257 Feb 13 '17 at 19:37
  • $X^TAX$ is a $1\times 1$ vector. $\nabla X^TAX$ is a $1\times n$ vector. – Doug M Feb 13 '17 at 19:41
  • @user251257 I'm sorry. It's inner product. – user3716774 Feb 13 '17 at 20:22
  • @Doug M Yes. My problem is that $X$ is an $n \times 1$ vector, while the gradient is $1 \times n$. Isn't this weird? If I am doing gradient descent, I have to compute $x - \nabla f(x)$. If the dimensions don't match, how can I do the gradient descent? – user3716774 Feb 13 '17 at 20:24
  • @user3716774 In such a context, the gradient is usually the transpose of the Jacobian. So just transpose your result. – user251257 Feb 13 '17 at 20:46
  • @user3716774 Thank you very much! So both $X^T(A^T + A)$ and its transpose are the gradient of the function? Or are there some rules in this context? Do you have a reference that I can look into? Thank you again. – user3716774 Feb 13 '17 at 22:13

1 Answer

Let $f : \mathbb R^n \to \mathbb R$ be defined by $f (\mathrm x) := \mathrm x^{\top} \mathrm A \, \mathrm x$, where $\mathrm A \in \mathbb R^{n \times n}$ is given. Hence,

$$f (\mathrm x + h \mathrm v) = (\mathrm x + h \mathrm v)^{\top} \mathrm A \, (\mathrm x + h \mathrm v) = f (\mathrm x) + h \langle \mathrm v, \mathrm A \, \mathrm x \rangle + h \langle \mathrm A^{\top} \mathrm x, \mathrm v \rangle + h^2 \mathrm v^{\top} \mathrm A \, \mathrm v$$
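
To make the next step explicit (assuming the usual limit definition of the directional derivative), subtract $f (\mathrm x)$, divide by $h$, and let $h \to 0$. The quadratic term carries a factor of $h$ and vanishes in the limit:

$$\lim_{h \to 0} \frac{f (\mathrm x + h \mathrm v) - f (\mathrm x)}{h} = \lim_{h \to 0} \Big( \langle \mathrm v, \mathrm A \, \mathrm x \rangle + \langle \mathrm A^{\top} \mathrm x, \mathrm v \rangle + h \, \mathrm v^{\top} \mathrm A \, \mathrm v \Big) = \langle \mathrm v, \mathrm A \, \mathrm x \rangle + \langle \mathrm A^{\top} \mathrm x, \mathrm v \rangle$$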

The directional derivative of $f$ in the direction of $\mathrm v$ at $\mathrm x$ is, thus,

$$D_{\mathrm v} f (\mathrm x) = \langle \mathrm v, \mathrm A \, \mathrm x \rangle + \langle \mathrm A^{\top} \mathrm x, \mathrm v \rangle = \langle \mathrm v, (\mathrm A + \mathrm A^{\top}) \, \mathrm x \rangle$$

and the gradient of $f$ is

$$\boxed{\quad \nabla f (\mathrm x) = (\mathrm A + \mathrm A^{\top}) \, \mathrm x = 2\left(\frac{\mathrm A + \mathrm A^{\top}}{2}\right) \, \mathrm x \quad}$$

where $\dfrac{\mathrm A + \mathrm A^{\top}}{2}$ is the symmetric part of $\mathrm A$.
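
As a sanity check, the boxed formula can be compared against a finite-difference approximation. The following is only a minimal sketch in Python with NumPy. The helper names (`f`, `grad_f`, `numerical_grad`), the random test data, and the step size are illustrative assumptions, not part of the derivation above. It also shows the shape bookkeeping raised in the question: the $1 \times n$ row $\mathrm x^{\top} (\mathrm A + \mathrm A^{\top})$ is the Jacobian, and its transpose is the $n \times 1$ gradient that can be subtracted from $\mathrm x$.

```python
import numpy as np

def f(A, x):
    """Quadratic form f(x) = x^T A x for a 1-D array x."""
    return float(x @ A @ x)

def grad_f(A, x):
    """Closed-form gradient (A + A^T) x; has the same shape as x."""
    return (A + A.T) @ x

def numerical_grad(A, x, eps=1e-6):
    """Central-difference approximation of the gradient, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(A, x + e) - f(A, x - e)) / (2 * eps)
    return g

# Illustrative test data (assumed, not from the question).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # deliberately non-symmetric
x = rng.standard_normal(n)

g = grad_f(A, x)
print("matches finite differences:", np.allclose(g, numerical_grad(A, x), atol=1e-5))

# Explicit row/column bookkeeping, as in the question.
x_col = x.reshape(n, 1)           # n x 1 column vector
jac = x_col.T @ (A + A.T)         # 1 x n row vector (the Jacobian)
grad_col = jac.T                  # n x 1 column vector (the gradient)
print("Jacobian shape:", jac.shape, "gradient shape:", grad_col.shape)

# One gradient descent step; the step size 0.1 is a hypothetical choice.
step = 0.1
x_next = x - step * g
print("x_next shape:", x_next.shape)
```

Because $f$ is quadratic, the central difference is exact up to rounding error, so the comparison should hold for any $\mathrm A$, symmetric or not.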