I have seen lots of people asking this question: what is $dF/dW$ when $F = WX$? Here $W$ is an $m \times n$ matrix and $X$ is an $n \times p$ matrix.

The simple answer usually given is $X^{T}$. Where does this come from?

I googled this question, and Stanford's CS231N course gives an explanation. If you derive it properly, the result is a higher-order tensor (4 free indices), something like a matrix whose elements are themselves matrices.

In case you are wondering whether I searched this site before asking (and are thinking of closing this question), here are some of my findings from this site and from other resources I came across.

  1. This question attempted to demystify the answer. The answer given there is elaborate, but it says the result can be obtained using the Kronecker product. Isn't that a roundabout way? What if we want to derive it from the basic rules, i.e. multiply the two matrices and then differentiate each of the $mp$ entries of $F$ with respect to every element of $W$?

  2. The resources mentioned in CS231N. Yes, I checked those. I understand the material on matrix derivatives, but I can't find the connection between the two.

What am I missing? How does one derive this kind of expression from the basics?

I want to make sure that I understand this. Thanks.


  1. The CS231N resource I mentioned: Vector, Matrix, and Tensor Derivatives, by Erik Learned-Miller
  2. Another resource from the same CS231N course: Derivatives, Backpropagation, and Vectorization, by Justin Johnson
  • There are so many awesome people here. I just want to understand this clearly. Please, if any of you know just let me know. – mathbeginner Nov 26 '19 at 00:51

1 Answer

In index notation, the function can be written as $$F_{ik} = W_{ij} X_{jk}$$ The indices $\{i,k\}$ are not repeated and are called "free" indices,
but $\{j\}$ is a repeated "dummy" index and is implicitly summed over.

Now calculate the derivative with respect to the component $W_{qr}$. The entries of $W$ are independent variables, so $\partial W_{ij}/\partial W_{qr}$ equals $1$ when $(i,j)=(q,r)$ and $0$ otherwise, i.e. it equals $\delta_{iq}\delta_{rj}$. $$\eqalign{ \frac{\partial F_{ik}}{\partial W_{qr}} &= \frac{\partial W_{ij}}{\partial W_{qr}}\;X_{jk} \\ &= \delta_{iq}\delta_{rj}\;X_{jk} \\ &= \delta_{iq}\;X_{rk} \\ }$$ The symbol $\delta_{iq}$ is called a Kronecker delta: it equals $1$ when $i=q$ and $0$ otherwise.

Since the derivative has 4 free indices $(i,k,q,r)$, it is a 4th-order tensor whose dimensions are $(m\times p\times m\times n)$.
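To see the "from the basics" route concretely, here is a minimal sketch in plain Python (no libraries) that builds the tensor $\delta_{iq}X_{rk}$ and compares it with a finite-difference derivative obtained by perturbing one entry $W_{qr}$ at a time. The helper names (`matmul`, `analytic_derivative`, `numeric_derivative`), the sizes, and the step `h` are illustrative choices, not anything from the references above.

```python
import random

def matmul(A, B):
    """Plain matrix product: (A B)_{ik} = sum_j A_{ij} B_{jk}."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def analytic_derivative(X, m):
    """D[i][k][q][r] = delta_{iq} * X_{rk}, with shape (m, p, m, n)."""
    n, p = len(X), len(X[0])
    return [[[[(1.0 if i == q else 0.0) * X[r][k] for r in range(n)]
              for q in range(m)] for k in range(p)] for i in range(m)]

def numeric_derivative(W, X, h=1e-6):
    """Perturb one W_{qr} at a time and difference F = W X."""
    m, n, p = len(W), len(X), len(X[0])
    F0 = matmul(W, X)
    D = [[[[0.0] * n for _ in range(m)] for _ in range(p)] for _ in range(m)]
    for q in range(m):
        for r in range(n):
            Wh = [row[:] for row in W]   # copy W, then nudge a single entry
            Wh[q][r] += h
            Fh = matmul(Wh, X)
            for i in range(m):
                for k in range(p):
                    D[i][k][q][r] = (Fh[i][k] - F0[i][k]) / h
    return D

random.seed(0)
m, n, p = 2, 3, 4   # arbitrary small sizes chosen for illustration
W = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
X = [[random.uniform(-1, 1) for _ in range(p)] for _ in range(n)]

D_exact = analytic_derivative(X, m)
D_num = numeric_derivative(W, X)
max_err = max(abs(D_exact[i][k][q][r] - D_num[i][k][q][r])
              for i in range(m) for k in range(p)
              for q in range(m) for r in range(n))

shape = (len(D_exact), len(D_exact[0]), len(D_exact[0][0]), len(D_exact[0][0][0]))
print("tensor shape (m, p, m, n):", shape)
print("max abs difference vs finite differences:", max_err)
```

Because $F$ is linear in $W$, the finite-difference quotient should agree with $\delta_{iq}X_{rk}$ up to floating-point rounding, and every entry with $i\ne q$ is zero.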

Since higher-order tensors are awkward to work with, most texts flatten the matrices $(F,W)$ into the vectors $(f,w)$ by stacking their columns, and then calculate the derivative using ordinary matrix notation. $$\eqalign{ {\rm vec}(F) &= {\rm vec}(IWX) = (X^T\otimes I)\,{\rm vec}(W) \\ f &= (X^T\otimes I)\,w \\ df &= (X^T\otimes I)\,dw \\ \frac{\partial f}{\partial w} &= (X^T\otimes I) \\ }$$ This result is a matrix, not a tensor; the symbol $\otimes$ represents the Kronecker product and $I$ is the $m\times m$ identity matrix.
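The vec/Kronecker identity can also be checked numerically. The following is a small sketch in plain Python, assuming column-stacking for ${\rm vec}$ and small integer matrices so the comparison is exact; the helpers (`vec`, `kron`, `transpose`, `identity`) and the example values are hypothetical and written only for this illustration.

```python
def matmul(A, B):
    """Plain matrix product: (A B)_{ik} = sum_j A_{ij} B_{jk}."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def vec(A):
    """Stack the columns of A into one long list (column-major order)."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    rb, cb = len(B), len(B[0])
    return [[A[a // rb][b // cb] * B[a % rb][b % cb]
             for b in range(len(A[0]) * cb)]
            for a in range(len(A) * rb)]

m, n, p = 2, 3, 4
W = [[1, 2, 3],
     [4, 5, 6]]            # m x n
X = [[1, 0, 2, -1],
     [3, 1, 0, 2],
     [-2, 4, 1, 0]]        # n x p

J = kron(transpose(X), identity(m))   # candidate for d vec(F) / d vec(W)
w = vec(W)
f = [sum(J[a][b] * w[b] for b in range(len(w))) for a in range(len(J))]

j_shape = (len(J), len(J[0]))
identity_holds = (f == vec(matmul(W, X)))

# Entry (k*m + i, r*m + q) of J should be delta_{iq} * X_{rk},
# i.e. the same numbers as the 4th-order tensor, just laid out in 2-D.
same_entries = all(
    J[k * m + i][r * m + q] == (X[r][k] if i == q else 0)
    for i in range(m) for k in range(p)
    for q in range(m) for r in range(n))

print("J shape (pm, mn):", j_shape)
print("J vec(W) == vec(WX):", identity_holds)
print("J holds the entries delta_iq * X_rk:", same_entries)
```

The last check is the link between the two forms: the $(pm\times mn)$ matrix contains exactly the entries $\delta_{iq}X_{rk}$ of the tensor, rearranged so that the row index encodes $(i,k)$ and the column index encodes $(q,r)$.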

greg
  • Thanks for the reply. Is there a proof of the formula you wrote, ${\rm vec}(F)={\rm vec}(IWX)$? So basically we are transforming it into a vector or matrix (given the correct basis)? – mathbeginner Nov 26 '19 at 09:17
  • Got the proof here: https://www.ime.unicamp.br/~cnaber/Kronecker.pdf But wait a sec, what is the dimension of the Kronecker product $X^T\otimes I$? If $X$ is $2\times 2$ and $I$ is $2\times 2$, then it would be $4\times 4$, right? (So it can't be equal to $X$ if $X$ is $2\times 2$; at least the dimensions don't match.) Following from here: http://mathworld.wolfram.com/KroneckerProduct.html – mathbeginner Nov 26 '19 at 09:27
  • Sir can you please clarify the answer a bit more and comment on my last comment? I would be grateful. – mathbeginner Nov 26 '19 at 15:59
  • Choose whichever form you're most comfortable working with: the $(m\times p\times m\times n)$ tensor or the $(pm\times mn)$ matrix. Note, however, that the derivative is definitely not equal to $X^T$, as stated in the second sentence of your question, unless the dimension $m=1$. – greg Nov 26 '19 at 17:18
  • Thanks. You can edit this into the answer. I will accept the answer anyway. – mathbeginner Nov 26 '19 at 19:00