I want to know how to find an expression for $$\frac{\partial (AXA^T)}{\partial A}$$ where no information is given a priori on the dimensions of $A$ and $X$.
The question comes from a machine-learning context, but I am not given any additional details on the nature of the matrices; I am only given the result: $$\frac{\partial (AXA^T)}{\partial A}=A(X+X^T)$$ (in Andrew's answer below it is shown that this is the result only if $A$ has size $(1\times k)$, i.e. it is a row vector).
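For what it is worth, here is a quick numerical sanity check of that restricted claim that I ran (a NumPy sketch; the size $k=4$ and the random seed are arbitrary choices of mine): for a row vector $a$ of shape $(1,k)$, $f(a)=aXa^T$ is a scalar, and a finite-difference gradient matches $a(X+X^T)$.

```python
import numpy as np

# Sanity check: for a row vector a of shape (1, k), f(a) = a X a^T
# is a scalar, and the claimed derivative is the row vector a (X + X^T).
rng = np.random.default_rng(0)
k = 4
a = rng.standard_normal((1, k))
X = rng.standard_normal((k, k))

f = lambda a: (a @ X @ a.T).item()

# Central finite-difference gradient of f with respect to each entry of a.
eps = 1e-6
grad_fd = np.zeros((1, k))
for i in range(k):
    e = np.zeros((1, k)); e[0, i] = eps
    grad_fd[0, i] = (f(a + e) - f(a - e)) / (2 * eps)

grad_formula = a @ (X + X.T)
print(np.allclose(grad_fd, grad_formula, atol=1e-6))  # True
```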
I have seen similar questions on the forum and was trying to approach this by differentiating the given product: \begin{align}\mathrm{d}(AXA^T)&=\mathrm{d}(AX)\,A^T+AX\,\mathrm{d}(A^T)=\left[(\mathrm{d}A)X+A\,\mathrm{d}X\right]A^T+AX(\mathrm{d}A)^T\\ &=(\mathrm{d}A)XA^T+AX(\mathrm{d}A)^T+A\,(\mathrm{d}X)\,A^T\end{align}
Then, setting $\mathrm{d}X$ to zero (since we are differentiating with $X$ held constant): \begin{align} \partial(AXA^T)= (\partial A)XA^T+AX(\partial A)^T \end{align}
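This first-order identity does seem to hold for a general $(m\times k)$ matrix $A$; here is a quick check I ran (again a NumPy sketch, with the sizes, seed, and perturbation scale chosen arbitrarily):

```python
import numpy as np

# First-order check with dX = 0: for a small perturbation H of A,
# (A + H) X (A + H)^T - A X A^T  ≈  H X A^T + A X H^T,
# with the error being the O(|H|^2) term H X H^T.
rng = np.random.default_rng(1)
m, k = 3, 5
A = rng.standard_normal((m, k))
X = rng.standard_normal((k, k))
H = 1e-6 * rng.standard_normal((m, k))  # small perturbation playing the role of dA

lhs = (A + H) @ X @ (A + H).T - A @ X @ A.T
rhs = H @ X @ A.T + A @ X @ H.T
print(np.allclose(lhs, rhs, atol=1e-10))  # True up to second-order terms
```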
Here I get stuck, because I am unable to rewrite the right-hand side so that $\partial A$ appears as a single left factor that I could cancel (e.g. by premultiplying by $(\partial A)^{-1}$) to obtain my derivative.
I have tried transposing the second term on the right-hand side twice, to get \begin{align}\partial(AXA^T)= (\partial A)XA^T + \left((\partial A)X^TA^T\right)^T, \end{align} and thought that perhaps the solution I was given relies on some symmetry assumption that finally leads to it. I have also seen quite similar results in the Matrix Cookbook (e.g. formulas 79 and 80), but they are not the same and are given in index notation, which confuses me a little more. Beyond that, I would like to actually learn how to compute these derivatives, since I have never come across derivatives with respect to matrices and do not even know how exactly they are defined.
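For concreteness, the componentwise definition I have managed to piece together (which I believe is the convention behind the Cookbook's index formulas; please correct me if this is wrong) would read $$\left[\frac{\partial F(A)}{\partial A}\right]_{ij,kl} = \frac{\partial F_{ij}}{\partial A_{kl}},$$ so that for $F(A)=AXA^T$ with $X$ held constant, $$\frac{\partial (AXA^T)_{ij}}{\partial A_{kl}} = \frac{\partial}{\partial A_{kl}}\sum_{p,q}A_{ip}X_{pq}A_{jq} = \delta_{ik}\,(XA^T)_{lj} + (AX)_{il}\,\delta_{jk},$$ which, if I am reading it correctly, is a fourth-order object rather than a single matrix.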
I have also tried to proceed with the usual calculus rules (the product rule in particular), but I felt I was probably missing something and am not sure whether those rules hold in their usual form here.
I would appreciate your help with any of these questions.
EDIT:
The clarification given by the authors of this exercise is to just use the simple product rule (I am unsure whether this is actually possible with matrices, at least without introducing any special products): \begin{align} \frac{\partial (AXA^T)}{\partial A} = \frac{\partial A}{\partial A}XA^T+A\frac{\partial (XA^T)}{\partial A} = (XA^T)^T+AX=AX^T+AX=A(X+X^T). \end{align} They say on the side that they have applied the property $\frac{\partial A}{\partial A}B= B^T$, which according to them follows from $\left[ \frac{\partial A}{\partial A}B\right]_i=\frac{\partial \sum_{k=1}^n A_k B_k}{\partial A_i}=B_i$, "for the $i$-th element". I cannot see how the property follows from this either, nor how these operations can be performed with just a single index and with matrices of different dimensions.
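The only reading of that single-index computation that I can make sense of is the one already consistent with the note at the top: $A=a$ is a $(1\times n)$ row vector and $B=b$ an $(n\times 1)$ column vector, so that $ab=\sum_k a_k b_k$ is a scalar and its gradient with respect to $a$ is the row vector $b^T$. A quick finite-difference check (once more a NumPy sketch with arbitrary sizes/seed) is consistent with that reading:

```python
import numpy as np

# Check of the claimed property dA/dA B = B^T in the one case where the
# single-index computation parses: A = a is a (1, n) row vector and
# B = b is an (n, 1) column vector, so a @ b = sum_k a_k b_k is a scalar.
rng = np.random.default_rng(2)
n = 5
a = rng.standard_normal((1, n))
b = rng.standard_normal((n, 1))

eps = 1e-6
grad = np.zeros((1, n))
for i in range(n):
    e = np.zeros((1, n)); e[0, i] = eps
    grad[0, i] = ((a + e) @ b - (a - e) @ b).item() / (2 * eps)

print(np.allclose(grad, b.T))  # True: the gradient of a @ b w.r.t. a is b^T
```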