Derivative of $A^\top A$

Question

Let $f(A) := A^\top A$ where $A$ is an $m \times n$ matrix. We want to find the derivative of $f$ with respect to $A$. By derivative I mean the Jacobian matrix of all partial derivatives of the entries of $f(A)$ with respect to the entries of $A$. Here is how I proceed.

The derivative of $f$ at $A$ is the linear map $D f(A): X \mapsto A^\top X + X^\top A$. Let $K_{n,n}$ be the commutation matrix such that $K_{n,n}\operatorname{vec}(X^\top A) = \operatorname{vec}(A^\top X)$. Then,

\begin{align}
\operatorname{vec}(A^\top X + X^\top A) & = \operatorname{vec}(A^\top X) + \operatorname{vec}(X^\top A) \\
& = (I_n \otimes A^\top) \operatorname{vec}(X) + \operatorname{vec}(X^\top A) \\
& = (I_n \otimes A^\top) \operatorname{vec}(X) + K_{n,n} \operatorname{vec}(A^\top X) \\
& = (I_n \otimes A^\top) \operatorname{vec}(X) + K_{n,n} (I_n \otimes A^\top) \operatorname{vec}(X)
\end{align}

It now follows that \begin{align} \frac{\partial f}{\partial A} & = (I_n \otimes A^\top) + K_{n, n} (I_n \otimes A^\top) \end{align}

Here I am using the fact that $\operatorname{vec}(AXB) = (B^\top \otimes A)\operatorname{vec}(X)$, where $\operatorname{vec}$ is the vectorization operator.
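A quick numerical sanity check of the manipulation above might look like the following sketch. It uses plain Python lists and small integer matrices so that equality is exact. The helper names (`vec`, `kron`, `commutation`, and so on) and the test matrices `A` and `X` are illustrative assumptions, not part of the derivation itself.

```python
# Sanity check of the vec/Kronecker manipulation on a small integer example.
# Plain Python lists only; all helper names are made up for this sketch.

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def add(P, Q):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(P, Q)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def vec(M):
    # Stack the columns of M into one flat list (column-major order).
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def kron(P, Q):
    # Kronecker product: block (i, j) of the result is P[i][j] * Q.
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def commutation(m, n):
    # K such that K vec(M) == vec(M^T) for every m x n matrix M.
    K = [[0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(n):
            # M[i][j] sits at index j*m + i in vec(M) and at i*n + j in vec(M^T).
            K[i * n + j][j * m + i] = 1
    return K

def matvec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

m, n = 3, 2
A = [[1, 2], [3, 4], [5, 6]]
X = [[1, 0], [-2, 1], [0, 3]]
At = transpose(A)

# Jacobian from the question: kron(I_n, A^T) + K_{n,n} kron(I_n, A^T).
IA = kron(identity(n), At)
J = add(IA, matmul(commutation(n, n), IA))

# Directional derivative A^T X + X^T A, computed directly.
D = add(matmul(At, X), matmul(transpose(X), A))
assert matvec(J, vec(X)) == vec(D)

# f is quadratic, so f(A + X) = f(A) + D + X^T X holds exactly.
AX = add(A, X)
lhs = matmul(transpose(AX), AX)
rhs = add(add(matmul(At, A), D), matmul(transpose(X), X))
assert lhs == rhs
```

Both assertions passing on one example does not prove the formula, but it does catch ordering mistakes in the Kronecker factors or in the commutation matrix.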

I was inspired by this answer and the corresponding equation under the section Differentials of Quadratic Products on this webpage.

My Questions:

Is this approach correct? If not, how do I go about finding the desired derivative?
Where can I find references regarding this type of manipulation? (I don't mean this particular manipulation, but a reference for derivatives of matrices in general.) I looked in Horn and Johnson's Matrix Analysis, but a 'commutation matrix' is nowhere to be found. When I say reference, I mean a rigorous linear algebraic exposition.

greg · Accepted Answer · 2018-07-07T18:31:23.100

Take the differential of the expression
$$\eqalign{
F &= A^TA \cr
dF &= dA^T\,A + A^T\,dA \cr
}$$

At this point, you can either use vectorizations (with $f={\rm vec}(F)$ and $a={\rm vec}(A)$)
$$\eqalign{
{\rm vec}(dF) &= {\rm vec}(dA^T\,A) + {\rm vec}(A^T\,dA) \cr
df &= (A^T\otimes I)(K\,da) + (I\otimes A^T)\,da \cr
\frac{\partial f}{\partial a} &= (A^T\otimes I)K + (I\otimes A^T) \cr
}$$

or tensor methods
$$\eqalign{
dF &= (I{\mathcal E}A^T):({\mathcal K}:dA) + (A^T{\mathcal E}I):dA \cr
\frac{\partial F}{\partial A} &= ({\mathcal E}A^T):{\mathcal K} + A^T{\mathcal E} \cr
}$$

where a colon represents the double-contraction product, i.e.
$$(X:{\mathcal E})_{kl} = \sum_{ij} X_{ij} {\mathcal E}_{ijkl}$$

while juxtapositions represent single-contractions
$$(X{\mathcal E}Y)_{ikmr} = \sum_{jp} X_{ij} {\mathcal E}_{jkmp} Y_{pr}$$

The isotropic 4th order tensors have components
$$\eqalign{
{\mathcal E}_{ijkl} &= \delta_{ik} \delta_{jl} \cr
{\mathcal K}_{ijkl} &= \delta_{il} \delta_{jk} \cr
}$$

For references, try
"Matrix Differential Calculus" by Magnus and Neudecker
"Complex-Valued Matrix Derivatives" by Are Hjorungnes
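For comparison with the expression in the question, the two vectorized Jacobians appear to coincide. A short check, assuming the convention $K_{m,n}\,{\rm vec}(X) = {\rm vec}(X^T)$ for an $m\times n$ matrix $X$ (the subscripts on $K$ and $I$ are added here for bookkeeping; the vectorized formula in this answer writes a bare $K$ and $I$):

$$\eqalign{
K_{n,n}(I_n\otimes A^T)\,{\rm vec}(X) &= K_{n,n}\,{\rm vec}(A^TX) \cr
&= {\rm vec}(X^TA) \cr
&= (A^T\otimes I_n)\,{\rm vec}(X^T) \cr
&= (A^T\otimes I_n)\,K_{m,n}\,{\rm vec}(X) \cr
}$$

Since this holds for every $X$, it follows that $K_{n,n}(I_n\otimes A^T) = (A^T\otimes I_n)K_{m,n}$, so under this convention the formula in the question and the vectorized formula in this answer describe the same $n^2 \times mn$ matrix.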

score 0 · Answer 2 · answered Jul 07 '18 at 13:22

You are close. By my calculation (checked on a $2 \times 2$ example)
$$\frac{\partial}{\partial \underline{\overline{\bf A}}}\left( \underline{\overline{\bf A}}^T \underline{\overline{\bf A}} \right) = \left( \underline{\overline{\bf I}}_{[n]} \otimes \underline{\overline{\bf A}}^T \right) + \left( \underline{\overline{\bf A}}^T \otimes \underline{\overline{\bf I}}_{[n]} \right) \underline{\overline{\bf K}}_{[m,n]}$$

Derivation:
$$\frac{\partial}{\partial \underline{\overline{\bf A}}}\left( \underline{\overline{\bf A}}^T \underline{\overline{\bf A}} \right) = \left. \frac{\partial}{\partial \underline{\overline{\bf A}}}\left( \underline{\overline{\bf A}}^T \underline{\overline{\bf A}} \right) \right|_{\underline{\overline{\bf A}}^T\ {\rm constant}} + \left. \frac{\partial}{\partial \underline{\overline{\bf A}}}\left( \underline{\overline{\bf A}}^T \underline{\overline{\bf A}} \right) \right|_{\underline{\overline{\bf A}}\ {\rm constant}}$$

For the first term
$${\rm vec}\left( \underline{\overline{\bf A}}^T \underline{\overline{\bf A}} \right) = {\rm vec}\left( \underline{\overline{\bf A}}^T \underline{\overline{\bf A}}\, \underline{\overline{\bf I}}_{[n]} \right) = \left( \underline{\overline{\bf I}}_{[n]} \otimes \underline{\overline{\bf A}}^T \right) {\rm vec}\left( \underline{\overline{\bf A}} \right)$$
so that
$$\left. \frac{\partial}{\partial \underline{\overline{\bf A}}}\left( \underline{\overline{\bf A}}^T \underline{\overline{\bf A}} \right) \right|_{\underline{\overline{\bf A}}^T\ {\rm constant}} = \left( \underline{\overline{\bf I}}_{[n]} \otimes \underline{\overline{\bf A}}^T \right)$$

For the second term
$${\rm vec}\left( \underline{\overline{\bf A}}^T \underline{\overline{\bf A}} \right) = {\rm vec}\left( \underline{\overline{\bf I}}_{[n]} \underline{\overline{\bf A}}^T \underline{\overline{\bf A}} \right) = \left( \underline{\overline{\bf A}}^T \otimes \underline{\overline{\bf I}}_{[n]} \right) {\rm vec}\left( \underline{\overline{\bf A}}^T \right) = \left( \underline{\overline{\bf A}}^T \otimes \underline{\overline{\bf I}}_{[n]} \right) \underline{\overline{\bf K}}_{[m,n]} {\rm vec}\left( \underline{\overline{\bf A}} \right)$$
so that
$$\left. \frac{\partial}{\partial \underline{\overline{\bf A}}}\left( \underline{\overline{\bf A}}^T \underline{\overline{\bf A}} \right) \right|_{\underline{\overline{\bf A}}\ {\rm constant}} = \left( \underline{\overline{\bf A}}^T \otimes \underline{\overline{\bf I}}_{[n]} \right) \underline{\overline{\bf K}}_{[m,n]}$$

I found it a challenge to stitch together all the different results required to do this type of calculation proficiently (which I needed to compute the Jacobian determinant of SVD transformations). One very useful reference that dealt with elimination and commutation matrices is:

Magnus, J., and Neudecker, H., “The Elimination Matrix: Some Lemmas and Applications,” SIAM J. on Algebraic and Discrete Methods, Vol. 1, No. 4, pp. 422-449, Dec. 1980.

However, this doesn’t cover anything to do with the calculus side of things. I ended up compiling my own list of useful results, which (for the real case) can be found here in Section 3. The fact that it is Rev 8 gives you a sense of how easy it is to mess things up.
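A check of the kind mentioned at the top of this answer can be sketched in a few lines. The version below is only an illustration under stated assumptions: it uses a $3 \times 2$ integer matrix rather than a $2 \times 2$ one (so that $K_{[m,n]}$ is not confused with $K_{[n,n]}$), plain Python lists, and hypothetical helper names. Because $A^TA$ is quadratic in $A$, a central difference with unit step reproduces each Jacobian column exactly.

```python
# Entry-by-entry check of kron(I_n, A^T) + kron(A^T, I_n) K_{m,n} on a 3 x 2 example.
# Helper names and the test matrix are illustrative choices for this sketch.

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def add(P, Q):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(P, Q)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def vec(M):
    # Stack the columns of M into one flat list (column-major order).
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def kron(P, Q):
    # Kronecker product: block (i, j) of the result is P[i][j] * Q.
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]

def commutation(m, n):
    # K such that K vec(M) == vec(M^T) for every m x n matrix M.
    K = [[0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(n):
            K[i * n + j][j * m + i] = 1
    return K

def f(A):
    return matmul(transpose(A), A)

m, n = 3, 2
A = [[2, -1], [0, 3], [1, 4]]
At = transpose(A)

# Closed form: kron(I_n, A^T) + kron(A^T, I_n) K_{m,n}, an (n*n) x (m*n) matrix.
closed = add(kron(identity(n), At),
             matmul(kron(At, identity(n)), commutation(m, n)))

# Jacobian columns d vec(F) / d A[i][j], visited in the same order as vec(A).
columns = []
for j in range(n):
    for i in range(m):
        plus = [row[:] for row in A]
        minus = [row[:] for row in A]
        plus[i][j] += 1
        minus[i][j] -= 1
        # For a quadratic map the unit-step central difference is exact.
        diff = [(a - b) // 2 for a, b in zip(vec(f(plus)), vec(f(minus)))]
        columns.append(diff)
numeric = transpose(columns)

assert numeric == closed
```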
