
I see here (Vector derivative w.r.t its transpose $\frac{d(Ax)}{d(x^T)}$) that the identity stated in that title is true. However, I ran into trouble when I tried deriving it myself:

$U = \mathbf{x}^T\mathbf{A}$

$\frac{\partial(\mathbf{x}^T\mathbf{Ax})}{\partial \mathbf{x}} = \frac{\partial U}{\partial\mathbf{x}} \mathbf{x} + U\frac{\partial\mathbf{x}}{\partial\mathbf{x}}$

$\frac{\partial(\mathbf{x}^T\mathbf{Ax})}{\partial \mathbf{x}} = \frac{\partial \mathbf{x}^T\mathbf{A}}{\partial\mathbf{x}} \mathbf{x} + \mathbf{x}^T\mathbf{A}$

$\frac{\partial(\mathbf{x}^T\mathbf{Ax})}{\partial \mathbf{x}} = \mathbf{A}^T\mathbf{x} + \mathbf{x}^T\mathbf{A} = 2\mathbf{x}^T\mathbf{A} \neq \mathbf{x}^T\mathbf{A}^T+\mathbf{x}^T\mathbf{A}$

Am I applying product rule correctly?
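
As a quick numerical sanity check (a sketch, not part of the original question; the dimension, random seed, and NumPy usage are illustrative), a central finite difference of $f(x)=x^TAx$ should match $(A+A^T)x$, and should match $2Ax$ only when $A$ is symmetric:

```python
import numpy as np

# Numerical check: the gradient of f(x) = x^T A x is (A + A^T) x,
# which equals 2 A x only when A is symmetric.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # deliberately non-symmetric
x = rng.standard_normal(n)

def f(x):
    return x @ A @ x

eps = 1e-6
# Central finite-difference gradient, one coordinate direction at a time.
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5))   # True
print(np.allclose(grad_fd, 2 * A @ x, atol=1e-5))        # False in general
```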

3 Answers


Let $$f(x)=x^t A x$$ Then we have that $$f(x+h)=(x+h)^t A (x+h)=x^tAx+x^tAh+h^tAx+h^tAh$$ I.e. $$f(x+h)-f(x)=x^tAh+h^tAx+h^tAh=x^tAh+x^tA^th+h^tAh=(x^tA+x^tA^t)h+h^tAh$$ Can you continue?
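
To see the expansion above concretely, here is a small sketch (NumPy, with arbitrary random data for illustration) verifying that $f(x+h)-f(x) = (x^tA+x^tA^t)h + h^tAh$ holds exactly, up to floating point:

```python
import numpy as np

# Check the expansion f(x+h) - f(x) = (x^T A + x^T A^T) h + h^T A h
# for random A, x, h. The part linear in h is x^T (A + A^T).
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = rng.standard_normal(n)

lhs = (x + h) @ A @ (x + h) - x @ A @ x
rhs = (x @ A + x @ A.T) @ h + h @ A @ h
print(np.isclose(lhs, rhs))   # True
```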

Botond
  • 11,938
  • Ah thanks! This makes the most intuitive sense to me. Although I'm still confused why the shortcuts don't work :( – Richard Nai Dec 17 '19 at 21:57
  • @RichardNai I'm going to look through your work again, be right back. – Botond Dec 17 '19 at 22:08
  • @RichardNai I think the problem happens when you are taking the derivative of $U$, because the other term is fine. And note that if $x$ is a column vector, then $A^t x$ is a column vector as well, but $x^t A$ will be a row vector. – Botond Dec 17 '19 at 22:28
  • Ahh that makes sense, the $x$ should go out in front to preserve the dimensionality, so it should be $x^tA^t$. Thanks! – Richard Nai Dec 21 '19 at 15:23

Explicit indices help:$$\begin{align}\frac{\partial(x^TAx)}{\partial x_i}&=\partial_i(x_jA_{jk}x_k)\\&=\delta_{ij}A_{jk}x_k+x_jA_{jk}\delta_{ik}\\&=A_{ik}x_k+x_jA_{ji}\\&=(Ax+A^Tx)_i,\end{align}$$so the derivative you sought is $(A+A^T)x$ or the transpose, $x^T(A+A^T)$, depending on how you define it. The second option results from $df=\frac{df}{dx}dx$.
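
The index computation above can be mirrored with `np.einsum`; a minimal sketch (the random $A$ and $x$ are illustrative):

```python
import numpy as np

# Index form: d/dx_i (x_j A_jk x_k) = A_ik x_k + x_j A_ji = (A x + A^T x)_i.
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

term1 = np.einsum('ik,k->i', A, x)   # A_ik x_k
term2 = np.einsum('j,ji->i', x, A)   # x_j A_ji
print(np.allclose(term1 + term2, (A + A.T) @ x))   # True
```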

J.G.
  • 115,835

Maybe changing notation helps? I'll do it in a different way since you already got two good answers. You're looking for the gradient of the function $f\colon \Bbb R^n \to \Bbb R$ given by $f(x) = \langle Ax,x\rangle$, where $A\colon \Bbb R^n\to \Bbb R^n$ is linear. For every bilinear map $B$ we have that $$DB(x,y)(h,k) = B(x,k)+B(h,y),$$and $f = B \circ \Delta$, where $B(x,y) = \langle Ax,y\rangle$ is bilinear and $\Delta(x) = (x,x)$ is the (linear) diagonal embedding, so that $D\Delta(x)(h) = (h,h)$. So the chain rule kicks in and we have that $$\begin{align} Df(x)(h) &= D(B\circ \Delta)(x)(h) = DB(\Delta(x))\bigl(D\Delta(x)(h)\bigr) \\ &= DB(x,x)(h,h) = B(x,h)+B(h,x) \\ &= \langle Ax,h\rangle + \langle Ah,x\rangle = \langle Ax,h\rangle + \langle h, A^\top x\rangle \\ &= \langle Ax+A^\top x, h\rangle. \end{align}$$This means that $\nabla f(x) = Ax+A^\top x$, as wanted.
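
A short sketch checking the final identity $Df(x)(h) = B(x,h)+B(h,x) = \langle (A+A^\top)x, h\rangle$ numerically (the helper `B` and the random data here are illustrative, not part of the answer above):

```python
import numpy as np

# Directional derivative via the bilinear form B(x, y) = <A x, y>:
# Df(x)(h) = B(x, h) + B(h, x) should equal <(A + A^T) x, h>.
rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = rng.standard_normal(n)

def B(u, v):
    return (A @ u) @ v   # <A u, v>

lhs = B(x, h) + B(h, x)
rhs = ((A + A.T) @ x) @ h
print(np.isclose(lhs, rhs))   # True, so grad f(x) = (A + A^T) x
```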

Ivo Terek
  • 77,665
  • Ahh this is a bit too high level for me to wrap my head around. I haven't worked with bilinear maps, and I don't understand the intuition behind the derivative in equation 1. But thanks for this answer! – Richard Nai Dec 17 '19 at 22:06