
I have an $n \times r$ matrix $A$ and an $r \times m$ matrix $B$.

What will the derivative of $ABB^T$ with respect to $A$ be? Is it simply $BB^{T}$?

qwerty
  • The derivative of a matrix-valued function with respect to a matrix is a 4-dimensional matrix. The derivative of each entry of the output with respect to the input is a (2-dimensional) matrix itself. – Rodrigo de Azevedo Aug 19 '18 at 09:41
  • Can you please explain with an example – qwerty Aug 19 '18 at 09:43
  • The derivative is a linear map which approximates the given map in the first order. As a consequence, the derivative of a linear map is always the map itself. So the derivative of $A\mapsto ABB^T$ is $V\mapsto VBB^T$, since the original map is linear in $A$. – Thomas Aug 19 '18 at 09:52
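
A quick numerical sanity check of this point (a minimal sketch, assuming NumPy; the shapes are chosen arbitrarily): since the map is linear in $A$, a finite-difference quotient in any direction $V$ should reproduce $VBB^T$ essentially exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 4, 3, 5
A = rng.standard_normal((n, r))
B = rng.standard_normal((r, m))
V = rng.standard_normal((n, r))  # an arbitrary direction in which to perturb A

# Finite-difference directional derivative of A -> A B B^T in direction V
h = 1e-6
fd = ((A + h * V) @ B @ B.T - A @ B @ B.T) / h
print(np.allclose(fd, V @ B @ B.T, atol=1e-4))  # True: the derivative is V B B^T
```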

2 Answers


Consider the matrix-valued function

$$\mathrm F (\mathrm X) := \mathrm X \mathrm C$$

The $(i,j)$-th entry of the output is

$$f_{ij} (\mathrm X) := \mathrm e_i^\top \mathrm F (\mathrm X) \,\mathrm e_j = \mathrm e_i^\top \mathrm X \mathrm C \,\mathrm e_j = \mathrm e_i^\top \mathrm X \,\mathrm c_j = \mbox{tr} \left( \mathrm c_j\mathrm e_i^\top \mathrm X \right) = \langle \mathrm e_i\mathrm c_j^\top, \mathrm X \rangle$$

where $\mathrm c_j$ is the $j$-th column of $\mathrm C$ and $\langle \cdot, \cdot \rangle$ denotes the Frobenius inner product. Hence, the gradient of $f_{ij}$ with respect to $\mathrm X$ is

$$\nabla f_{ij} (\mathrm X) = \mathrm e_i\mathrm c_j^\top$$
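
As a sanity check of this gradient, here is a minimal NumPy sketch (not part of the original answer; all dimensions and the indices $i, j$ are arbitrary) that builds the finite-difference gradient of the scalar $f_{ij}$ entry by entry and compares it with $\mathrm e_i \mathrm c_j^\top$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 4, 3, 5            # X is n x p, C is p x q (sizes chosen arbitrarily)
X = rng.standard_normal((n, p))
C = rng.standard_normal((p, q))
i, j = 2, 4                  # which output entry f_ij to differentiate

# Finite-difference gradient of the scalar f_ij(X) = (X C)_{ij} w.r.t. X
h = 1e-6
grad_fd = np.zeros((n, p))
for k in range(n):
    for l in range(p):
        E = np.zeros((n, p))
        E[k, l] = h
        grad_fd[k, l] = (((X + E) @ C)[i, j] - (X @ C)[i, j]) / h

e_i = np.zeros(n); e_i[i] = 1.0
c_j = C[:, j]                # j-th column of C
print(np.allclose(grad_fd, np.outer(e_i, c_j), atol=1e-4))  # True: grad = e_i c_j^T
```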

  • You can develop this answer further by writing $$\nabla f_{ij}=e_ie_j^TC^T=E_{ij}C^T$$ where $E_{ij}$ is the single-entry matrix. But this is really just the $(i,j)^{th}$ component of a fourth-order tensor: ${\mathcal E}_{ijkl}=\delta_{ik}\delta_{jl}$. Thus you can write the result as a full tensor equation $$\nabla F={\mathcal E}C^T$$ – greg Aug 19 '18 at 16:20
  • @greg Unfortunately, I know nothing about tensors. – Rodrigo de Azevedo Aug 19 '18 at 16:21

This answer provides a different way of writing the result than the one given by Rodrigo. I have every confidence his analysis is correct, but I personally find this particular method a little more intuitive and general, since it avoids introducing tensor concepts.

Consider the operation of restructuring a matrix into a long vector by stacking its successive columns. For a given matrix $W$, this operator is usually denoted $vec\left( W \right)$ (see this Wikipedia page). Then there is a useful property: for any three conformably sized matrices $X$, $Y$ and $Z$, $$vec\left( {XYZ} \right) = \left( {{Z^T} \otimes X} \right)vec\left( Y \right)$$ where $\otimes$ is the Kronecker product (see this Wikipedia page).
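
A minimal NumPy sketch of this identity (not from the original answer; shapes are arbitrary). The one subtlety is that $vec$ stacks *columns*, which corresponds to column-major flattening, i.e. `order='F'` in NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))
Y = rng.standard_normal((3, 5))
Z = rng.standard_normal((5, 2))

vec = lambda M: M.flatten(order='F')  # column-stacking vec operator
print(np.allclose(vec(X @ Y @ Z), np.kron(Z.T, X) @ vec(Y)))  # True
```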

Now the derivative of an $n \times m$ matrix $W$ with respect to itself is $$\frac{{\partial W}}{{\partial W}} = \frac{{\partial \,vec\left( W \right)}}{{\partial \,vec\left( W \right)}} = {I_{\left[ {nm} \right]}}$$ Then, for $X = {I_{\left[ n \right]}}$, $Y = A$, and $Z = B{B^T}$ (noting that $B B^T$ is symmetric, so $Z^T = Z$), $$vec\left( {AB{B^T}} \right) = \left( {B{B^T} \otimes {I_{\left[ n \right]}}} \right)vec\left( A \right)\quad \Rightarrow \quad \frac{{\partial \left( {AB{B^T}} \right)}}{{\partial A}} = B{B^T} \otimes {I_{\left[ n \right]}}$$
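
And a matching sketch (same NumPy assumptions, illustrative dimensions) checking that $B{B^T} \otimes {I_{\left[ n \right]}}$ really does act as the Jacobian of $vec\left( A \right) \mapsto vec\left( {AB{B^T}} \right)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, m = 4, 3, 5
A = rng.standard_normal((n, r))
B = rng.standard_normal((r, m))

vec = lambda M: M.flatten(order='F')      # column-stacking vec operator
J = np.kron(B @ B.T, np.eye(n))           # claimed Jacobian, (nr) x (nr)
print(np.allclose(J @ vec(A), vec(A @ B @ B.T)))  # True: J is the derivative
```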