I want to know how to find an expression for $$\frac{\partial (AXA^T)}{\partial A}$$ where no information is given a priori on the dimensions of $A$ and $X$.
The question comes from a machine-learning context, but I am not given any additional details on the nature of the matrices; I am only given the result: $$\frac{\partial (AXA^T)}{\partial A}=A(X+X^T)$$ (in Andrew's answer below it is shown that this is the result only if $A$ has size $(1\times k)$, i.e. it is a row vector).
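For what it is worth, here is a quick numerical sanity check of that restricted claim that I ran (a NumPy sketch; the size $k=4$ and the random seed are arbitrary choices of mine): for a row vector $a$ of shape $(1,k)$, $f(a)=aXa^T$ is a scalar, and a finite-difference gradient matches $a(X+X^T)$.

```python
import numpy as np

# Sanity check: for a row vector a of shape (1, k), f(a) = a X a^T
# is a scalar, and the claimed derivative is the row vector a (X + X^T).
rng = np.random.default_rng(0)
k = 4
a = rng.standard_normal((1, k))
X = rng.standard_normal((k, k))

f = lambda a: (a @ X @ a.T).item()

# Central finite-difference gradient of f with respect to each entry of a.
eps = 1e-6
grad_fd = np.zeros((1, k))
for i in range(k):
    e = np.zeros((1, k)); e[0, i] = eps
    grad_fd[0, i] = (f(a + e) - f(a - e)) / (2 * eps)

grad_formula = a @ (X + X.T)
print(np.allclose(grad_fd, grad_formula, atol=1e-6))  # True
```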
I have seen similar questions on the forum and was trying to approach this by differentiating the given product: \begin{align}\mathrm{d}(AXA^T)&=\mathrm{d}(AX)\,A^T+AX\,\mathrm{d}(A^T)=\left[(\mathrm{d}A)X+A\,\mathrm{d}X\right]A^T+AX(\mathrm{d}A)^T\\ &=(\mathrm{d}A)XA^T+AX(\mathrm{d}A)^T+A\,(\mathrm{d}X)\,A^T\end{align}
Then, setting $\mathrm{d}X$ to zero (since we are differentiating with $X$ held constant): \begin{align} \partial(AXA^T)= (\partial A)XA^T+AX(\partial A)^T \end{align}
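This first-order identity does seem to hold for a general $(m\times k)$ matrix $A$; here is a quick check I ran (again a NumPy sketch, with the sizes, seed, and perturbation scale chosen arbitrarily):

```python
import numpy as np

# First-order check with dX = 0: for a small perturbation H of A,
# (A + H) X (A + H)^T - A X A^T  ≈  H X A^T + A X H^T,
# with the error being the O(|H|^2) term H X H^T.
rng = np.random.default_rng(1)
m, k = 3, 5
A = rng.standard_normal((m, k))
X = rng.standard_normal((k, k))
H = 1e-6 * rng.standard_normal((m, k))  # small perturbation playing the role of dA

lhs = (A + H) @ X @ (A + H).T - A @ X @ A.T
rhs = H @ X @ A.T + A @ X @ H.T
print(np.allclose(lhs, rhs, atol=1e-10))  # True up to second-order terms
```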
Here I get stuck, because I am unable to rewrite the right-hand side so that $\partial A$ appears as a single left factor that I could cancel (e.g. by premultiplying by $(\partial A)^{-1}$) to obtain my derivative.
I have tried transposing the second term on the right-hand side twice, to get \begin{align}\partial(AXA^T)= (\partial A)XA^T + \left((\partial A)X^TA^T\right)^T, \end{align} and thought that perhaps the solution I was given relies on some symmetry assumption that finally leads to it. I have also seen quite similar results in the Matrix Cookbook (e.g. formulas 79 and 80), but they are not the same and are given in index notation, which confuses me a little more. Beyond that, I would like to actually learn how to compute these derivatives, since I have never come across derivatives with respect to matrices and do not even know how exactly they are defined.
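For concreteness, the componentwise definition I have managed to piece together (which I believe is the convention behind the Cookbook's index formulas; please correct me if this is wrong) would read $$\left[\frac{\partial F(A)}{\partial A}\right]_{ij,kl} = \frac{\partial F_{ij}}{\partial A_{kl}},$$ so that for $F(A)=AXA^T$ with $X$ held constant, $$\frac{\partial (AXA^T)_{ij}}{\partial A_{kl}} = \frac{\partial}{\partial A_{kl}}\sum_{p,q}A_{ip}X_{pq}A_{jq} = \delta_{ik}\,(XA^T)_{lj} + (AX)_{il}\,\delta_{jk},$$ which, if I am reading it correctly, is a fourth-order object rather than a single matrix.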
I have also tried to proceed with the usual calculus rules (the product rule in particular), but I felt I was probably missing something and am not sure whether those rules hold in their usual form here.
I would appreciate your help with any of these questions.
EDIT:
The clarification given by the authors of this exercise is to just use the simple product rule (I am unsure whether this is actually possible with matrices, at least without introducing any special products): \begin{align} \frac{\partial (AXA^T)}{\partial A} = \frac{\partial A}{\partial A}XA^T+A\frac{\partial (XA^T)}{\partial A} = (XA^T)^T+AX=AX^T+AX=A(X+X^T). \end{align} They say on the side that they have applied the property $\frac{\partial A}{\partial A}B= B^T$, which according to them follows from $\left[ \frac{\partial A}{\partial A}B\right]_i=\frac{\partial \sum_{k=1}^n A_k B_k}{\partial A_i}=B_i$, "for the $i$-th element". I cannot see how the property follows from this either, nor how these operations can be performed with just a single index and with matrices of different dimensions.
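The only reading of that single-index computation that I can make sense of is the one already consistent with the note at the top: $A=a$ is a $(1\times n)$ row vector and $B=b$ an $(n\times 1)$ column vector, so that $ab=\sum_k a_k b_k$ is a scalar and its gradient with respect to $a$ is the row vector $b^T$. A quick finite-difference check (once more a NumPy sketch with arbitrary sizes/seed) is consistent with that reading:

```python
import numpy as np

# Check of the claimed property dA/dA B = B^T in the one case where the
# single-index computation parses: A = a is a (1, n) row vector and
# B = b is an (n, 1) column vector, so a @ b = sum_k a_k b_k is a scalar.
rng = np.random.default_rng(2)
n = 5
a = rng.standard_normal((1, n))
b = rng.standard_normal((n, 1))

eps = 1e-6
grad = np.zeros((1, n))
for i in range(n):
    e = np.zeros((1, n)); e[0, i] = eps
    grad[0, i] = ((a + e) @ b - (a - e) @ b).item() / (2 * eps)

print(np.allclose(grad, b.T))  # True: the gradient of a @ b w.r.t. a is b^T
```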