
Here is my problem: We have $\mathbf{D} \in \mathbb{R}^{m \times n}$, $\mathbf{W} \in \mathbb{R}^{m \times q}$, and $\mathbf{X} \in \mathbb{R}^{q \times n}$. Furthermore, $\mathbf{D} = \mathbf{W}\mathbf{X}$ (an ordinary matrix-matrix product, NOT an element-wise multiplication).

I am trying to find the derivative of $f(\mathbf{D})$ with respect to $\mathbf{W}$, and the derivative of $f(\mathbf{D})$ with respect to $\mathbf{X}$.

The class notes this is taken from seem to indicate that $$ \frac{\partial f}{\partial \mathbf{W}} = \frac{\partial f}{\partial \mathbf{D}} \mathbf{X}^{T} \text{ and that } \frac{\partial f}{\partial \mathbf{X}} = \mathbf{W}^{T} \frac{\partial f}{\partial \mathbf{D}}. $$

I understand the chain rule, but I struggle to see where the transposes come from, and why $\frac{\partial f}{\partial \mathbf{D}}$ is sometimes on the left and sometimes on the right. Please try to explain as cleanly and simply as possible.
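
For reference, here is a sketch of how the element-wise chain rule produces both the transposes and the left/right placement. It assumes that $f$ is scalar-valued and that $\frac{\partial f}{\partial \mathbf{D}}$ denotes the $m \times n$ matrix with entries $\frac{\partial f}{\partial D_{ij}}$; neither is stated in the notes, so treat both as assumptions. The factor that appears in both results is $\frac{\partial f}{\partial \mathbf{D}}$.

$$
\begin{aligned}
D_{ij} &= \sum_{k} W_{ik} X_{kj}, \qquad
\frac{\partial D_{ij}}{\partial W_{ab}} = \delta_{ia} X_{bj}, \qquad
\frac{\partial D_{ij}}{\partial X_{ab}} = W_{ia} \delta_{jb}, \\
\frac{\partial f}{\partial W_{ab}} &= \sum_{i,j} \frac{\partial f}{\partial D_{ij}} \, \delta_{ia} X_{bj}
= \sum_{j} \frac{\partial f}{\partial D_{aj}} \, X_{bj}
= \left( \frac{\partial f}{\partial \mathbf{D}} \, \mathbf{X}^{T} \right)_{ab}, \\
\frac{\partial f}{\partial X_{ab}} &= \sum_{i,j} \frac{\partial f}{\partial D_{ij}} \, W_{ia} \delta_{jb}
= \sum_{i} W_{ia} \, \frac{\partial f}{\partial D_{ib}}
= \left( \mathbf{W}^{T} \frac{\partial f}{\partial \mathbf{D}} \right)_{ab}.
\end{aligned}
$$

The shapes give a quick sanity check: $(m \times n)(n \times q) = m \times q$ matches $\mathbf{W}$, and $(q \times m)(m \times n) = q \times n$ matches $\mathbf{X}$. No other ordering of these factors produces the right shapes.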

wrek
  • This is a little hard to answer without more context. Computing the (Fréchet) derivative is straightforward in terms of the (Fréchet) derivative of $f$, to wit: Let $w(X) = WX$, then $Dw(X)H = WH$. Let $g = f \circ w$, then $Dg(X)(H) = Df(w(X)) (Dw(X)(H)) = Df(WX) (WH)$. However, you can't in general just write the derivative as a matrix multiply, for example, the derivative of $f(D) =AD + A^TD$ (which is $f$ itself, since $f$ is linear) cannot be written as a matrix multiply. – copper.hat Feb 01 '24 at 05:07
  • This related post should answer your questions. – greg Feb 01 '24 at 05:19
  • @greg Nice answer. – copper.hat Feb 01 '24 at 06:17
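
The two identities from the question can also be checked numerically. The following is a minimal sketch, assuming a scalar-valued test function $f(\mathbf{D}) = \sum_{ij} \sin(D_{ij})$ chosen purely for illustration (it does not come from the class notes). The helper names `f`, `grad_f_wrt_D`, and `numerical_grad` are hypothetical. It compares $\frac{\partial f}{\partial \mathbf{D}} \mathbf{X}^{T}$ and $\mathbf{W}^{T} \frac{\partial f}{\partial \mathbf{D}}$ against central finite differences.

```python
import numpy as np


def f(D):
    # Illustrative scalar test function (an assumption, not from the notes).
    return np.sum(np.sin(D))


def grad_f_wrt_D(D):
    # Analytic gradient of the test function: d/dD_ij sum(sin(D)) = cos(D_ij).
    return np.cos(D)


def numerical_grad(func, A, eps=1e-6):
    # Central finite differences, perturbing one entry of A at a time.
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        A_plus = A.copy()
        A_minus = A.copy()
        A_plus[idx] += eps
        A_minus[idx] -= eps
        G[idx] = (func(A_plus) - func(A_minus)) / (2 * eps)
    return G


rng = np.random.default_rng(0)
m, q, n = 3, 4, 5
W = rng.standard_normal((m, q))
X = rng.standard_normal((q, n))

G = grad_f_wrt_D(W @ X)   # df/dD, shape (m, n)
grad_W = G @ X.T          # claimed df/dW, shape (m, q)
grad_X = W.T @ G          # claimed df/dX, shape (q, n)

num_W = numerical_grad(lambda W_: f(W_ @ X), W)
num_X = numerical_grad(lambda X_: f(W @ X_), X)

print("df/dW matches finite differences:", np.allclose(grad_W, num_W, atol=1e-6))
print("df/dX matches finite differences:", np.allclose(grad_X, num_X, atol=1e-6))
```

Using three distinct sizes for `m`, `q`, and `n` is deliberate: any misplaced transpose or swapped factor then fails with a shape error instead of silently giving a wrong answer.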

0 Answers