4

I do not want to use index notation.

I want to compute the derivative

$$ D_x (Axx^\top A) = ? $$ where $A$ is an $n\times n$ symmetric matrix and $x$ is a vector in $\mathbb{R}^n$. I tried resources such as the Matrix Cookbook, but they don't cover scenarios like this: here the function $f(x) = Axx^\top A$ takes a vector as input and returns a matrix as output.

It is possible to express this without using index notation, and I would like answers of that type. I would also like a step-by-step derivation, so that I can figure out how to perform similar calculations in the future.

Attempt

One attempt uses the Fréchet derivative definition (I will use the Frobenius norm): $$ \begin{align} \lim_{\|v\|\to 0} \frac{\|A(x+v)(x+v)^\top A - Axx^\top A - Dv\|_F}{\|v\|} &= \lim_{\|v\|\to 0} \frac{\|A(xv^\top + vx^\top + vv^\top)A - Dv\|_F}{\|v\|} \end{align} $$
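The algebra behind this limit can be sanity-checked numerically. The following is a small NumPy sketch (random symmetric $A$ and random $x$, $v$ are my own illustrative choices) verifying that $f(x+v) - f(x) = A(xv^\top + vx^\top + vv^\top)A$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                    # symmetric A, as in the question
x = rng.standard_normal(n)
v = rng.standard_normal(n)

f = lambda x: A @ np.outer(x, x) @ A         # f(x) = A x x^T A

lhs = f(x + v) - f(x)
rhs = A @ (np.outer(x, v) + np.outer(v, x) + np.outer(v, v)) @ A
print(np.allclose(lhs, rhs))                 # True up to floating point
```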

Euler_Salter
  • Since the result is a third-order tensor you cannot avoid writing ${\left(\frac{\partial T}{\partial x}\right)^{lm}}_i = {A^l}_i\, x_j A^{jm} + {A^m}_i\, x^k {A^l}_k = {A^l}_i\,(x^\top A)^m + {A^m}_i\,(Ax)^l$, where $T = Axx^\top A$.
    – Ted Black Mar 07 '24 at 22:38

2 Answers

6

Let's look at perturbations: $$f(x+v) = A(x+v)(x+v)^TA = Axx^TA + Axv^TA + Avx^TA + Avv^TA$$

The derivative is often defined as the unique linear map such that $$f(x+v) = f(x) + D_{f;x}(v) + o(\|v\|)$$ as $v\rightarrow 0$.

Thus $D_{f;x}: v\mapsto A(xv^T+vx^T)A$ is the derivative, viewed as a linear map. We can express it not as a matrix but as a third-order tensor (a matrix being a second-order tensor). However, compact expressions for higher-order tensors are elusive, which is why index notation is used.
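This derivative map can be checked against a finite difference. A quick NumPy sketch (the random symmetric $A$, the step size $h$, and the tolerance are my own illustrative choices): $(f(x+hv)-f(x))/h$ should agree with $A(xv^T+vx^T)A$ up to an $O(h)$ remainder.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                    # symmetric A
x = rng.standard_normal(n)
v = rng.standard_normal(n)

f = lambda x: A @ np.outer(x, x) @ A
Df = lambda x, v: A @ (np.outer(x, v) + np.outer(v, x)) @ A  # candidate derivative

h = 1e-6
fd = (f(x + h * v) - f(x)) / h       # finite-difference directional derivative
err = np.linalg.norm(fd - Df(x, v))
print(err < 1e-3)                    # remainder is O(h), so the error is tiny
```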

Snake707
1

Let $v\in\mathbb R^n$ with $v\neq 0_n$. Then \begin{align*} 0&\leq\frac{\Vert A(xv^\top + vx^\top)A - D_xv + Avv^\top A\Vert}{\Vert v\Vert} \\&\leq \frac{\Vert A(xv^\top+vx^\top)A - D_xv\Vert}{\Vert v\Vert} + \frac{\Vert Avv^\top A\Vert}{\Vert v\Vert} \\ &= \frac{\Vert A(xv^\top+vx^\top)A - D_xv\Vert}{\Vert v\Vert} + \frac{\Vert(Av)(Av)^\top\Vert}{\Vert v\Vert} \\ &\leq \frac{\Vert A(xv^\top+vx^\top)A - D_xv\Vert}{\Vert v\Vert} + \frac{\Vert Av\Vert\cdot\Vert Av\Vert}{\Vert v\Vert} \\ &\leq\frac{\Vert A(xv^\top+vx^\top)A - D_xv\Vert}{\Vert v\Vert} + \frac{\Vert A\Vert^2\Vert v\Vert^2}{\Vert v\Vert} \\ &=\frac{\Vert A(xv^\top+vx^\top)A - D_xv\Vert}{\Vert v\Vert} + \Vert A\Vert^2\Vert v\Vert,\end{align*} by the triangle inequality and the operator-norm inequality (used twice). The term $\Vert A\Vert^2\Vert v\Vert$ converges to $0$ as $\Vert v\Vert\rightarrow 0$. Now consider the linear map $D_x:\mathbb R^n\rightarrow\mathbb R^{n\times n}$ given by $D_xv = A(xv^\top + vx^\top)A$. For this choice, we have $$\frac{\Vert A(xv^\top+vx^\top)A - D_xv\Vert}{\Vert v\Vert} = 0.$$ Consequently, this expression (trivially) converges to $0$ as $\Vert v\Vert\rightarrow 0$. That is, this $D_x$ is the Fréchet derivative of $f(x) = Axx^\top A$.
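The two norm facts the chain relies on can also be verified numerically. A hedged NumPy sketch (random symmetric $A$ and $v$; I use the spectral norm for $\Vert A\Vert$, matching the operator-norm inequality): the rank-one remainder satisfies $\Vert(Av)(Av)^\top\Vert_F = \Vert Av\Vert^2 \leq \Vert A\Vert^2\Vert v\Vert^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
v = rng.standard_normal(n)

Av = A @ v
R = np.outer(Av, Av)                 # the remainder term A v v^T A
opA = np.linalg.norm(A, 2)           # operator (spectral) norm of A

# Frobenius norm of a rank-one matrix: ||u u^T||_F = ||u||^2, here ||Av||^2
print(np.isclose(np.linalg.norm(R, 'fro'), np.linalg.norm(Av) ** 2))
# operator-norm inequality applied twice: ||Av||^2 <= ||A||^2 ||v||^2
print(np.linalg.norm(R, 'fro') <= opA ** 2 * np.linalg.norm(v) ** 2 + 1e-12)
```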

But be careful: we can only compute the "point evaluation" $D_x$ (and not $D$ without the subscript) in this sense, since $D$ would be an element of $\mathbb R^{n\times n\times n}$ (i.e., a third-order tensor).
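If one does want the full third-order object, it can be stored as an $n\times n\times n$ array. A NumPy sketch under my own (arbitrary) index convention `D[i] = ∂f/∂x_i`: each slice is $A(x e_i^\top + e_i x^\top)A$, and contracting the tensor with $v$ recovers the linear map $D_xv$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                    # symmetric A
x = rng.standard_normal(n)
v = rng.standard_normal(n)

# D[i] = partial derivative of f(x) = A x x^T A with respect to x_i,
# i.e. the matrix A (x e_i^T + e_i x^T) A
I = np.eye(n)
D = np.stack([A @ (np.outer(x, I[i]) + np.outer(I[i], x)) @ A for i in range(n)])

Dv = np.einsum('ilm,i->lm', D, v)    # contract the first index with v
expected = A @ (np.outer(x, v) + np.outer(v, x)) @ A
print(np.allclose(Dv, expected))     # True: the tensor encodes the linear map
```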