1

I have run into a problem where I need to calculate: $$ \frac{\partial}{\partial\boldsymbol{X}}\,\operatorname{tr}\left(\left( \boldsymbol{X X}^\top \right) ^{\frac{1}{2}}\right) $$ Peterson gave a very thorough discussion of different types of matrix differentiation, including ones involving a quadratic form inside a trace. Nevertheless, I am at a loss when a fractional power is involved. I tried the following: $$ \begin{align} \dfrac{\partial}{\partial\boldsymbol{X}}\,\operatorname{tr}\left(\left( \boldsymbol{X X} ^\top\right) ^{\frac{1}{2}}\right) &= \left \{ \dfrac{\partial}{\partial\boldsymbol{X}^{1/2}}\,\operatorname{tr}\left(\left( \boldsymbol{X X} ^\top \right)^{1/2}\right) \right\}^\top\dfrac{\partial\boldsymbol{X}^{1/2}}{\partial\boldsymbol{X}} \end{align} $$ However, it seems the chain rule cannot be applied this way, since $\boldsymbol{X}^{1/2}$ may not exist if $\boldsymbol{X}$ is not square.

Thanks in advance.

2 Answers

1

Let $Y = X X^\top$. Then $Y$ is symmetric positive semidefinite, so its eigenvalues are real and nonnegative, and it has a unique positive semidefinite square root $Y^{1/2}$. If $Y$ is invertible, the eigenvalues are strictly positive and $Y^{1/2}$ is differentiable in $Y$. Applying the chain rule, you just need to differentiate $Y^{1/2}$ with respect to $Y$ and then $X X^\top$ with respect to $X$.
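
One way this chain rule can be carried out is with differentials. The following is only a sketch, assuming $Y$ is invertible; the symbol $R = Y^{1/2}$ is introduced here purely for illustration. Differentiating $R^2 = Y$ gives $dR\,R + R\,dR = dY$; multiplying on the left by $R^{-1}$ and taking the trace gives $2\operatorname{tr}(dR) = \operatorname{tr}(R^{-1}\,dY)$. Then, with $dY = dX\,X^\top + X\,dX^\top$ and using the symmetry of $Y^{-1/2}$: $$ \begin{align} d\operatorname{tr}\left(Y^{1/2}\right) &= \tfrac{1}{2}\operatorname{tr}\left(Y^{-1/2}\,dY\right) \\ &= \tfrac{1}{2}\operatorname{tr}\left(Y^{-1/2}\left(dX\,X^\top + X\,dX^\top\right)\right) \\ &= \operatorname{tr}\left(X^\top Y^{-1/2}\,dX\right) \\ &= \operatorname{tr}\left(\left(Y^{-1/2}X\right)^\top dX\right) \end{align} $$ Reading off the gradient from the last line suggests $\dfrac{\partial}{\partial X}\operatorname{tr}\left(Y^{1/2}\right) = \left(XX^\top\right)^{-1/2}X$.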

1

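The function $$N = {\rm tr}\Big(\sqrt{XX^T}\Big)$$ is known as the nuclear norm of $X$ (equivalently of $X^T$, since both have the same singular values); it equals the sum of the singular values.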

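The gradient is given by either of the following expressions, the first when $XX^T$ is invertible and the second when $X^TX$ is invertible: $$\eqalign{ \frac{\partial N}{\partial X} &= (XX^T)^{-1/2}\,X \cr &= X\,(X^TX)^{-1/2} \cr }$$ If the thin SVD of $X$ is available and $X$ has full rank, then $$\eqalign{ X &= USV^T \cr \frac{\partial N}{\partial X} &= UV^T \cr }$$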
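
These formulas can be sanity-checked numerically. The snippet below is a minimal sketch using NumPy, not a definitive implementation: the helper names (`nuclear_norm`, `grad_svd`, `grad_inv_sqrt`, `grad_fd`) are made up for this illustration, and the test matrix is a random wide matrix, assumed to have full row rank so that $XX^T$ is invertible.

```python
import numpy as np

def nuclear_norm(X):
    """tr((X X^T)^{1/2}), computed as the sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

def grad_svd(X):
    """Gradient U V^T from the thin SVD X = U S V^T (assumes full rank)."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

def grad_inv_sqrt(X):
    """Gradient (X X^T)^{-1/2} X (assumes X X^T is invertible)."""
    # Symmetric eigendecomposition: X X^T = Q diag(w) Q^T with w > 0.
    w, Q = np.linalg.eigh(X @ X.T)
    # Q / sqrt(w) scales column j of Q by 1/sqrt(w[j]).
    return (Q / np.sqrt(w)) @ Q.T @ X

def grad_fd(X, h=1e-6):
    """Central finite-difference approximation of dN/dX, entry by entry."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (nuclear_norm(X + E) - nuclear_norm(X - E)) / (2 * h)
    return G

# Hypothetical test case: a random 3x5 matrix (full row rank almost surely).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))

G_svd, G_inv, G_fd = grad_svd(X), grad_inv_sqrt(X), grad_fd(X)
print("max |UV^T - (XX^T)^{-1/2} X| =", np.max(np.abs(G_svd - G_inv)))
print("max |UV^T - finite diff|     =", np.max(np.abs(G_svd - G_fd)))
```

Both printed differences should be small (the first near machine precision, the second limited by the finite-difference step). For a rank-deficient $X$ the inverse square root does not exist, so this check would not apply as written.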

greg