4

I want to compute the derivative of

\begin{align} f(x) = Axx^\top B^\top \label{eqn} \end{align}

with respect to $x$ where $A$ and $B$ are $n\times n$ matrices and $x$ is a (column) vector of size $n \times 1$. By this I mean the derivative of each component of $f(x)$ with respect to each component of $x$.

I can prove that if $$ g(x) = xx^\top $$

then the derivative can be expressed as $$ \frac{\partial g}{\partial x} = x \otimes I_n + I_n \otimes x, $$ where $I_n$ is the $n\times n$ identity matrix. Here I am vectorizing $xx^\top$ and then taking the derivative with respect to each component of $x$.
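For what it's worth, this known result can be spot-checked numerically. Here is a NumPy sketch (the size $n$, the seed, and the test vector are arbitrary choices; the vectorization is column-major):

```python
import numpy as np

# Check d vec(x x^T)/dx = x (kron) I_n + I_n (kron) x, column-major vec.
rng = np.random.default_rng(0)
n = 3
x = rng.standard_normal(n)
I = np.eye(n)
xc = x.reshape(n, 1)  # x as an explicit column vector

# Claimed Jacobian.
J = np.kron(xc, I) + np.kron(I, xc)

# Central finite differences, one column per component of x.
h = 1e-6
J_fd = np.column_stack([
    (np.outer(x + h * e, x + h * e) - np.outer(x - h * e, x - h * e))
    .flatten(order="F") / (2 * h)
    for e in I  # rows of I are the standard basis vectors
])

print(np.allclose(J, J_fd, atol=1e-8))  # True
```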

Question: Is there a way to extend this result to $f(x)$? My gut feeling is that this should be possible. Any thoughts? If that's not possible, how do I go about computing it?

EDIT (After Rodrigo de Azevedo's comment): You are right. But I mean the derivative in the following flattened sense. I hope this makes it a bit clearer.

Let us consider the $2 \times 2$ case. Then $ Y =f(x)$ is a $2 \times 2$ matrix. If I vectorize $f(x)$ then I can view $f$ as, $$ f: \mathbb{R}^2 \to \mathbb{R}^4 $$ More precisely \begin{align} f: \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \to \begin{bmatrix} Y_{11} \\ Y_{21} \\ Y_{12} \\ Y_{22} \end{bmatrix} \end{align} Then by the symbol $\frac{\partial{f(x)}}{\partial{x}}$ I mean the following: \begin{align} \frac{\partial{f(x)}}{\partial{x}} & = \begin{bmatrix} \frac{\partial{Y_{11}}}{\partial{x_1}} & \frac{\partial{Y_{11}}}{\partial{x_2}} \\ \frac{\partial{Y_{21}}}{\partial{x_1}} & \frac{\partial{Y_{21}}}{\partial{x_2}} \\ \frac{\partial{Y_{12}}}{\partial{x_1}} & \frac{\partial{Y_{12}}}{\partial{x_2}} \\ \frac{\partial{Y_{22}}}{\partial{x_1}} & \frac{\partial{Y_{22}}}{\partial{x_2}} \end{bmatrix} \end{align}
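To make this flattening convention concrete, here is a small numerical illustration in NumPy ($A$, $B$, and $x$ are arbitrary example values; the column-major flattening matches the ordering $[Y_{11}, Y_{21}, Y_{12}, Y_{22}]$ above):

```python
import numpy as np

# Arbitrary 2x2 example values, for illustration only.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
x = np.array([1.0, 2.0])

Y = A @ np.outer(x, x) @ B.T
# Column-major flattening gives [Y11, Y21, Y12, Y22].
y = Y.flatten(order="F")
print(y.tolist())  # [85.0, 187.0, 115.0, 253.0]
```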

Cousin
  • 3,525

3 Answers

2

Clearly, the derivative of $f$ is the linear map $Df(x):v\mapsto Avx^TB^T + Axv^TB^T$. Using the identity $\operatorname{vec}(XYZ)=(Z^T\otimes X)\operatorname{vec}(Y)$, we get \begin{align} \operatorname{vec}(Avx^TB^T + Axv^TB^T) &=\operatorname{vec}(Avx^TB^T) + \operatorname{vec}(Axv^TB^T)\\ &=[(Bx)\otimes A]\operatorname{vec}(v) + [B\otimes(Ax)]\operatorname{vec}(v^T)\\ &=[(Bx)\otimes A+B\otimes(Ax)]v. \end{align} Therefore the Jacobian matrix of $f$ is $(Bx)\otimes A + B\otimes(Ax)$.
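A quick finite-difference sanity check of this Jacobian, sketched in NumPy ($n$ and the seed are arbitrary; the vectorization is column-major):

```python
import numpy as np

# Verify that the Jacobian of f(x) = vec(A x x^T B^T)
# equals (Bx) (kron) A + B (kron) (Ax), with column-major vec.
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)

Bx = (B @ x).reshape(n, 1)
Ax = (A @ x).reshape(n, 1)
J = np.kron(Bx, A) + np.kron(B, Ax)

f = lambda z: (A @ np.outer(z, z) @ B.T).flatten(order="F")
h = 1e-6
J_fd = np.column_stack([(f(x + h * e) - f(x - h * e)) / (2 * h)
                        for e in np.eye(n)])

print(np.allclose(J, J_fd, atol=1e-6))  # True
```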

user1551
  • 139,064
  • Can you explain a bit more? How do I see that the derivative of $f$ is the linear map as given in the first equation? – Cousin Mar 20 '18 at 06:19
  • 1
    @minibuffer It follows from the definition of derivative: $f(x+v)-f(x)=Avx^TB^T+Axv^TB^T$ plus a second-order term in $v$. Therefore $Df(x)(v)=Avx^TB^T+Axv^TB^T$. – user1551 Mar 20 '18 at 14:17
  • The Jacobian can be written, without the commutation matrix, as $$(Bx)\otimes A + B\otimes(Ax)$$ – greg Apr 30 '19 at 19:56
  • You've calculated $(Ax\otimes B)$. My comment is that $K(Ax\otimes B)$ can be written without the commutator matrix as $(B\otimes Ax)$. – greg May 01 '19 at 12:55
  • @greg Oh, you are right. Thanks. – user1551 May 01 '19 at 14:49
2

Let matrix-valued function $\mathrm F : \mathbb R^n \to \mathbb R^{n \times n}$ be defined by

$$\mathrm F (\mathrm x) := \mathrm A \mathrm x \mathrm x^\top \mathrm B^\top$$

where $\mathrm A, \mathrm B \in \mathbb R^{n \times n}$ are given. The $(i,j)$-th entry of $\rm F$ is a scalar field given by

$$f_{ij} (\mathrm x) := \mathrm e_i^\top \mathrm A \mathrm x \mathrm x^\top \mathrm B^\top \mathrm e_j = \mathrm a_i^\top \mathrm x \mathrm x^\top \mathrm b_j = \mathrm x^\top \mathrm b_j \mathrm a_i^\top \mathrm x$$

where $\mathrm a_i^\top$ and $\mathrm b_j^\top$ are the $i$-th and $j$-th rows of matrices $\rm A$ and $\rm B$, respectively. Hence, the gradient of $f_{ij}$ is

$$\nabla f_{ij} = \color{blue}{\left( \mathrm a_i \mathrm b_j^\top + \mathrm b_j \mathrm a_i^\top \right) \mathrm x}$$
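This entrywise gradient can also be checked numerically; a NumPy sketch (sizes and seed are arbitrary):

```python
import numpy as np

# Check grad f_ij = (a_i b_j^T + b_j a_i^T) x for every (i, j),
# where a_i^T, b_j^T are rows of A, B and f_ij(x) = (a_i^T x)(b_j^T x).
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = 1e-6

ok = True
for i in range(n):
    for j in range(n):
        a_i, b_j = A[i], B[j]
        grad = (np.outer(a_i, b_j) + np.outer(b_j, a_i)) @ x
        fij = lambda z: (a_i @ z) * (b_j @ z)
        fd = np.array([(fij(x + h * e) - fij(x - h * e)) / (2 * h)
                       for e in np.eye(n)])
        ok = ok and np.allclose(grad, fd, atol=1e-6)

print(ok)  # True
```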

1

Your intuition is correct: the known derivative $$\eqalign{ G &= xx^T \\ g &= {\rm vec}(G) = x\otimes x \\ \frac{\partial g}{\partial x} &= x\otimes I + I\otimes x \\ }$$ can be used to calculate the new derivative. Just take care to distinguish between a matrix and its flattened vector form.

The calculation is straightforward. $$\eqalign{ F &= Axx^TB^T \\&= AGB^T \\ f &= {\rm vec}(F) \\&= (B\otimes A)\,{\rm vec}(G) \\&= (B\otimes A)\,g \\ \frac{\partial f}{\partial x} &= (B\otimes A)\;\frac{\partial g}{\partial x} \\ &= (B\otimes A)\;(x\otimes I + I\otimes x) \\ &= (Bx\otimes AI) + (BI\otimes Ax) \\ &= (Bx\otimes A) + (B\otimes Ax) \\ }$$
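Each step in this chain rests on standard vec/Kronecker identities, and both the vec rule and the mixed-product step can be spot-checked numerically, e.g. with NumPy (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)
I = np.eye(n)
xc = x.reshape(n, 1)

# vec(A G B^T) = (B (kron) A) vec(G), column-major vec.
G = np.outer(x, x)
assert np.allclose((A @ G @ B.T).flatten(order="F"),
                   np.kron(B, A) @ G.flatten(order="F"))

# (B (kron) A)(x (kron) I + I (kron) x) = (Bx (kron) A) + (B (kron) Ax).
lhs = np.kron(B, A) @ (np.kron(xc, I) + np.kron(I, xc))
rhs = np.kron(B @ xc, A) + np.kron(B, A @ xc)
print(np.allclose(lhs, rhs))  # True
```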

greg
  • 35,825