3

Please excuse the stupid question, but I can't find anything online.

If $$f(\vec{x}) = \vec{x}^TA\vec{x}$$ with $A$ being a matrix, then $$ \frac{df}{d\vec{x}} = \vec{x}^T(A+A^T)$$ Can someone tell me why this is? I am also interested in knowing what the derivatives of the following terms are: $$ \frac{d}{d\vec{x}}\vec{x}^T A, \qquad \frac{d}{d\vec{x}}A \vec{x}$$ as well as the derivatives with respect to a matrix $H$: $$ \frac{d}{dH}H A , \qquad \frac{d}{dH}A H^T$$

Many thanks for your help.

Siminore
  • 35,136

2 Answers

4

So you have a function $f \colon \mathbb{R}^n \to \mathbb{R}$, given by $f(x) = x^T Ax$. If you know the concept of derivative in $\mathbb{R}^n$, you just compute $$ f(x+h) -f(x) = (x+h)^TA(x+h) -x^TAx= x^TAh + h^TAx + O(|h|^2), $$ where $O(|h|^2)$ stands for the remainder $h^TAh$, which is bounded by a constant times $|h|^2$ as $h \to 0$. Since $h^TAx$ is a scalar, it equals its own transpose, $h^TAx = x^TA^Th$, so $$ f(x+h) -f(x) = x^T(A+A^T)h + O(|h|^2), $$ and therefore $$ \lim_{|h| \to 0} \frac{f(x+h) -f(x) - x^T(A+A^T)h}{|h|}=0, $$ and, by definition, $Df(x)=x^T(A+A^T)$.
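If you want to convince yourself numerically, here is a minimal sketch that compares $x^T(A+A^T)$ with central finite differences. The matrix size, the random seed and the step `eps` are arbitrary choices made for this illustration, not part of the argument above.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
n = 4                               # arbitrary dimension
A = rng.standard_normal((n, n))     # generic, non-symmetric matrix
x = rng.standard_normal(n)

def f(v):
    """Quadratic form f(v) = v^T A v."""
    return v @ A @ v

# Claimed derivative: the row vector x^T (A + A^T)
claimed = x @ (A + A.T)

# Central differences along each coordinate direction e_k
eps = 1e-6
numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

# Compare the two up to floating-point rounding
print(np.allclose(claimed, numeric, atol=1e-6))
```

Because $f$ is quadratic, the central difference has no truncation error, so the two vectors should agree up to rounding.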

For the other functions, first of all check if they are linear (in $x$, or in $H$), because the derivative of a (continuous) linear function coincides with the function itself.
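To spell this out for the four maps in the question, here is a sketch using the same kind of expansion as above, where $K$ denotes an increment of the matrix $H$ (a symbol introduced here only for illustration): $$ A(x+h) - Ax = Ah, \qquad (x+h)^TA - x^TA = h^TA, $$ $$ (H+K)A - HA = KA, \qquad A(H+K)^T - AH^T = AK^T. $$ There is no remainder term in any of these, so the derivatives are the linear maps $h \mapsto Ah$, $h \mapsto h^TA$, $K \mapsto KA$ and $K \mapsto AK^T$, respectively.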

Siminore
  • 35,136
  • Thank you @Siminore, the part with the function is clear now. But what do you mean by "the derivative of a linear function coincides with the function itself"? Is the derivative of $x^TA$ with respect to $x$ equal to $A^T$? But then what about the derivative of $Ax$? – bananamanana Jun 18 '14 at 12:06
  • If $f$ is linear, then $f'(x)=f$. The derivative of $x \mapsto Ax$ is $A$, i.e. the linear map $h \mapsto Ah$. – Siminore Jun 18 '14 at 12:21
  • Okay, I got that. But what about the derivative of $$ x \mapsto x^T A $$? I think this is equal to $$ (A^T x)^T $$ and the transposed solution would be $A$? – bananamanana Jun 18 '14 at 12:28
  • The derivative acts as $h \mapsto h^TA$, you are right. – Siminore Jun 18 '14 at 14:20
3

Well, you just have to write things out...

$$f(x) = x^TAx = \sum_{i = 1}^n \sum_{j = 1}^n A_{i,j} x_i x_j$$ so using the classic definition of partial derivatives you get $$\frac{\partial}{\partial x_k}f(x) = \sum_{j = 1}^n A_{k,j}x_j+ \sum_{i = 1}^n A_{i,k} x_i = (Ax)_k+(A^Tx)_k=((A+A^T)x)_k,$$ which shows $$\nabla f(x) = (A+A^T)x.$$

Note that you can see a matrix $H \in \mathbb{R}^{m \times n}$ as a vector $h \in \mathbb{R}^{nm}$ and then you just "reshape" the function to make the dimensions consistent.
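As a rough illustration of the "reshape" remark, the sketch below takes the example $g(A) = x^TAx$ (chosen here only for illustration), treats $A$ as a vector in $\mathbb{R}^{n^2}$, differentiates entry by entry with finite differences, and reshapes the result back into an $n \times n$ matrix. The dimension, the seed and the step `eps` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed, for reproducibility only
n = 3                               # arbitrary dimension
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def g_flat(a):
    """g(A) = x^T A x, viewed as a function of the flattened matrix a in R^(n*n)."""
    return x @ a.reshape(n, n) @ x

# Partial derivatives with respect to each entry of the flattened matrix
eps = 1e-6
a0 = A.ravel()
grad_flat = np.array([
    (g_flat(a0 + eps * e) - g_flat(a0 - eps * e)) / (2 * eps)
    for e in np.eye(n * n)
])

# Reshape back so that entry (k, l) holds the partial derivative w.r.t. A_{k,l}
grad_matrix = grad_flat.reshape(n, n)

# Entry (k, l) should be x_k * x_l, i.e. the matrix x x^T
print(np.allclose(grad_matrix, np.outer(x, x), atol=1e-6))
```

The only design choice is that the flattening and the final reshape use the same (row-major) ordering, so the entries land back in the positions they came from.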

Surb
  • 55,662
  • Thank you for the interesting way of showing it. – Yola Jan 05 '15 at 19:44
  • This was also how I learnt it. But how can I compute the derivative with respect to a matrix? – Kong Jan 13 '18 at 11:33
  • @kong Exactly in the same way. With this example: if $g(A)=x^TAx$ then $\frac{\partial}{\partial {A_{k,l}}} g(A) =\frac{\partial}{\partial {A_{k,l}}}\sum_{i,j}A_{i,j}x_ix_j = x_kx_l$ – Surb Jan 13 '18 at 11:46
  • Thanks, and so $\frac{dg}{dA} = xx^T$. Is there an intermediate step I can arrive at before the solution $xx^T$? I mean, I know the solution is $xx^T$ because it has to be a matrix. But going from $\frac{\partial g}{\partial A_{k,l}} = x_kx_l$ to $\frac{dg}{dA} = xx^T$ is quite unintuitive for me (when handling bigger expressions), and I was wondering if there are additional steps in between that would make it clearer. – Kong Jan 13 '18 at 12:41
  • @kong $(xx^T)_{k,l}=?$ – Surb Jan 13 '18 at 13:16
  • @Surb yes that was very simple and very intuitive :) Thank you very much – Kong Jan 13 '18 at 13:18