I am trying to figure out a the derivative of a matrix-matrix multiplication, but to no avail. This document seems to show me the answer, but I am having a hard time parsing it and understanding it.
Here is my problem: We have $\mathbf{D} \in \Re^{m n}$, $\mathbf{W} \in \Re^{m q}$, and $\mathbf{X} \in \Re^{q n}$. Furthermore, $\mathbf{D} = \mathbf{W}\mathbf{X}$. (NOT an element wise multiplication - a normal matrix-matrix multiply).
I am trying to derive the derivative of $\mathbf{D}$, w.r.t $\mathbf{W}$, and the derivative of $\mathbf{D}$, w.r.t $\mathbf{X}$.
My class note this is taken from seems to indicate that $$ \frac{\delta \mathbf{D}}{\delta \mathbf{W}} = \mathbf{X}^{T} \text{ and that } \frac{\delta \mathbf{D}}{\delta \mathbf{X}} = \mathbf{W}^{T}, $$ but I am floored as to how he derived this. Furthermore, in taking the derivatives, we are asking ourselves how every element in $\mathbf{D}$ changes with perturbations by every element in, say, $\mathbf{X}$, - so wouldn't the resulting combinations blow up to be a-lot more than what $\mathbf{W}^{T}$ has? I cant even see how the dimensionality is right here.
EDIT: Id like to add the context of this question. It's coming from here, and here is my marked screen-shot of my problem. How are they deriving those terms? (Note: I understand the chain-rule aspect, and I am not wondering about that. I am asking about the simpler intermediate step).
Thanks.