
Suppose some function takes the transpose of a matrix, such as $g(x) = x^t$, where $x$ is a square matrix.

What would then be the derivative of the function, $\frac{dg}{dx}$?

1 Answer


$ \def\d{\delta} \def\o{{\tt1}}\def\p{\partial} \def\H{{\cal H}} \def\LR#1{\left(#1\right)} \def\BR#1{\Big(#1\Big)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\qif{\quad\iff\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\cas#1{\begin{cases} #1\end{cases}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Define a fourth-order tensor $\H$ whose components are given by Kronecker deltas $$\eqalign{ \H_{ijkl} &= \d_{il}\d_{jk} = \cas{ \o \qquad {\rm if}\:\LR{i=l}\:{\rm and}\:\LR{j=k} \\ 0 \qquad {\rm otherwise}\\ } \\ }$$ and consider its double contraction product with an arbitrary $n\times n$ matrix $X$ $$\eqalign{ \sum_{k=1}^n\sum_{l=1}^n \H_{ijkl} X_{kl} \;=\; \sum_{k=1}^n\sum_{l=1}^n \d_{il} \d_{jk} X_{kl} \;=\; X_{ji} \\ }$$ This is often written without the summation signs, using a double-dot product $$\eqalign{ \H:X = X^T \\ }$$ This is one way of writing your $g$ function: $\;\;G=g(X)=\H:X$
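As a quick numerical sanity check of the double contraction above, here is a minimal NumPy sketch. The size `n = 3`, the random test matrix, and the variable names are my own choices for illustration, not part of the derivation.

```python
import numpy as np

n = 3  # illustrative size; any n works
I = np.eye(n)

# H[i, j, k, l] = delta_{il} * delta_{jk}
H = np.einsum("il,jk->ijkl", I, I)

rng = np.random.default_rng(0)
X = rng.standard_normal((n, n))

# Double-dot product: (H:X)_{ij} = sum_{k,l} H_{ijkl} X_{kl}
G = np.einsum("ijkl,kl->ij", H, X)

transpose_matches = np.allclose(G, X.T)
print(transpose_matches)
```

The `einsum` subscripts mirror the index expressions term by term, so the check is only as general as the random sample it is run on.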

Since $\H$ is constant, the differential and gradient are easy to calculate $$\eqalign{ dG &= \H:dX \qif \c{\grad GX = \H} \\ }$$ The gradient is tensor-valued, which is expected for a matrix-by-matrix gradient.
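If you want to convince yourself that the gradient really is this constant tensor, one option is a central finite-difference approximation of the Jacobian of the transpose map. The sketch below assumes NumPy; the helper `g`, the step `eps`, and the array layout `J[i, j, k, l]` are illustrative choices of mine.

```python
import numpy as np

def g(X):
    """The function from the question: matrix transpose."""
    return X.T

n = 3
rng = np.random.default_rng(1)
X = rng.standard_normal((n, n))
eps = 1e-6  # finite-difference step (assumed; g is linear, so the value is not critical)

# J[i, j, k, l] approximates dG_{ij} / dX_{kl}
J = np.zeros((n, n, n, n))
for k in range(n):
    for l in range(n):
        dX = np.zeros((n, n))
        dX[k, l] = eps
        J[:, :, k, l] = (g(X + dX) - g(X - dX)) / (2 * eps)

# Reference tensor H_{ijkl} = delta_{il} * delta_{jk}
I = np.eye(n)
H = np.einsum("il,jk->ijkl", I, I)

jacobian_matches = np.allclose(J, H)
print(jacobian_matches)
```

Because $g$ is linear, the finite-difference Jacobian agrees with the tensor up to floating-point rounding and does not depend on the point $X$.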

This result can also be written using index notation $$\eqalign{ \grad{G_{ij}}{X_{kl}} = \H_{ijkl} \qif \grad{X_{ji}}{X_{kl}} = \d_{jk}\d_{il} \\ }$$ An alternative to dealing with tensors is to compute a matrix-valued gradient with respect to a single component of $X$ $$\eqalign{ \grad G{X_{kl}} &= \H:\gradLR{X}{X_{kl}} = \H:\BR{E_{kl}} = E_{kl}^T = E_{lk} \\ }$$ where $E_{lk}$ is the matrix whose elements are all zero except for the $(l,k)$ element, which equals $\o$.
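The single-component form can be checked the same way. This sketch (NumPy assumed; `single_entry` is a hypothetical helper I introduce for the matrices $E_{kl}$) contracts the tensor with every $E_{kl}$ and compares the result with $E_{lk}$.

```python
import numpy as np

def single_entry(n, r, c):
    """Hypothetical helper: n x n matrix with a 1 at (r, c) and zeros elsewhere."""
    E = np.zeros((n, n))
    E[r, c] = 1.0
    return E

n = 3
I = np.eye(n)
# H_{ijkl} = delta_{il} * delta_{jk}
H = np.einsum("il,jk->ijkl", I, I)

# Check H : E_{kl} == E_{lk} for every index pair (k, l)
all_match = all(
    np.array_equal(
        np.einsum("ijkl,kl->ij", H, single_entry(n, k, l)),
        single_entry(n, l, k),
    )
    for k in range(n)
    for l in range(n)
)
print(all_match)
```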

greg