I am trying to learn how to replicate the matrix calculus done in the following paper: https://arxiv.org/pdf/1811.11433.pdf. To learn how to do this, I am using the following book I found (https://www.mobt3ath.com/uplode/book/book-33765.pdf), by Karim Abadir and Jan Magnus.
I attempted to start by finding the differential of the function $H$ given below. However, it does not look like I am on the right track. Can someone tell me if my calculations below are correct so far? Or at least whether I am using the correct book to be able to understand the paper I listed? I noticed that the book uses the 'vec' operator to treat the Hessian of a matrix function as a matrix, while the paper uses an order-4 tensor, so I am not sure if I am using the right approach. Thanks for the help.
My work so far:
Let $H(B)=\log\det BCB^T$ where $B$ and $C$ are square matrices of dimension $n$ and $C$ is symmetric. I assume that $B$ and $C$ are invertible, so that the inverses used below exist. Let $F(B)=BCB^T$ and $G(R)=\log\det R$ so that $H(B)=G(F(B))$.
\begin{align*}
dF &= d(B)\,CB^T + BC\,d(B^T), \qquad dG(R) = \operatorname{Tr}[R^{-1}\,dR] \\
\\
dH &= \operatorname{Tr}[(BCB^T)^{-1}(d(B)\,CB^T + BC\,d(B^T))] \quad \text{(take transpose)} \\
&= \operatorname{Tr}[(BC\,d(B)^T + d(B)\,CB^T)(BCB^T)^{-1}] \\
&= \operatorname{Tr}[BC\,d(B)^T(BCB^T)^{-1}] + \operatorname{Tr}[d(B)\,CB^T(BCB^T)^{-1}] \\
&= \operatorname{Tr}[BC\,d(B)^T(B^T)^{-1}C^{-1}B^{-1}] + \operatorname{Tr}[d(B)\,CB^T(B^T)^{-1}C^{-1}B^{-1}] \quad \text{(use cyclic property)} \\
&= \operatorname{Tr}[(B^T)^{-1}\,d(B)^T] + \operatorname{Tr}[B^{-1}\,d(B)] = 2\operatorname{Tr}[B^{-1}\,d(B)]
\end{align*}
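One way to sanity-check the final line is a finite-difference test. The sketch below is only an illustration under my own assumptions: NumPy, a randomly generated invertible $B$, a symmetric positive definite $C$ (so the log-determinant is real), and a random direction `dB`. None of these choices come from the paper. It compares the directional derivative of $H$ along `dB` with $2\,\mathrm{Tr}[B^{-1}\,d(B)]$.

```python
import numpy as np

def H(B, C):
    """H(B) = log det(B C B^T), computed with slogdet for numerical stability."""
    _, logdet = np.linalg.slogdet(B @ C @ B.T)
    return logdet

rng = np.random.default_rng(0)
n = 4

# Assumed test data: B is a small perturbation of a scaled identity (invertible),
# C is symmetric positive definite by construction.
B = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)
dB = rng.standard_normal((n, n))  # arbitrary perturbation direction

# Central finite difference of H along the direction dB.
eps = 1e-6
fd = (H(B + eps * dB, C) - H(B - eps * dB, C)) / (2 * eps)

# Analytic differential from the derivation: 2 Tr[B^{-1} dB].
analytic = 2 * np.trace(np.linalg.solve(B, dB))

print(fd, analytic)
```

If the two printed numbers agree to several digits for a few random seeds, that is reasonable evidence the differential is right. It is not a proof.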
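Since $\operatorname{Tr}[A^T d(B)] = (\operatorname{vec} A)^T \, d\operatorname{vec} B$, the corresponding total derivative is then $DH = 2\,(\operatorname{vec}((B^{-1})^T))^T$ in the book's notation. I assume I would then just 'unvectorize' this to get the derivative in the paper's notation, i.e. $\partial H/\partial B = 2\,(B^{-1})^T$? Is this a good start to calculating the gradient of the loss function in the paper I listed? Thanks.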
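The vectorization step can also be checked numerically. The following is a rough sketch, assuming NumPy and a column-stacking `vec` (which is how I read the book's convention). The helper names `numerical_gradient`, `vec` and `unvec` are my own, not from the book or the paper. It builds the entrywise gradient $\partial H/\partial B_{ij}$ by finite differences and compares it with $2\,(B^{-1})^T$; the transpose on the inverse comes from $\operatorname{Tr}[A^T d(B)] = (\operatorname{vec} A)^T d\operatorname{vec} B$.

```python
import numpy as np

def H(B, C):
    """H(B) = log det(B C B^T)."""
    _, logdet = np.linalg.slogdet(B @ C @ B.T)
    return logdet

def numerical_gradient(B, C, eps=1e-6):
    """Entrywise central-difference gradient: G[i, j] approximates dH/dB[i, j]."""
    G = np.zeros_like(B)
    for i in range(B.shape[0]):
        for j in range(B.shape[1]):
            E = np.zeros_like(B)
            E[i, j] = eps
            G[i, j] = (H(B + E, C) - H(B - E, C)) / (2 * eps)
    return G

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")

def unvec(v, n):
    """Inverse of vec for an n x n matrix."""
    return v.reshape((n, n), order="F")

rng = np.random.default_rng(1)
n = 3

# Assumed test data: invertible, non-symmetric B and symmetric positive definite C.
B = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)

grad_fd = numerical_gradient(B, C)       # finite-difference gradient as a matrix
grad_matrix = 2 * np.linalg.inv(B).T     # candidate matrix gradient 2 (B^{-1})^T
DH = 2 * vec(np.linalg.inv(B).T)         # row-vector derivative in vec form

print(np.allclose(grad_fd, grad_matrix, atol=1e-6))
print(np.allclose(unvec(DH, n), grad_matrix))
```

Because `B` is deliberately non-symmetric here, swapping `np.linalg.inv(B).T` for `np.linalg.inv(B)` in the comparison is a quick way to see whether the transpose matters.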