
The nuclear norm (also known as the trace norm) is defined as

\begin{equation} \|M\|_* = \operatorname{tr}\left(\sqrt{M^T M}\right) = \sum_{i=1}^{\min\{m,n\}} \sigma_i(M) \end{equation} where $\sigma_i(M)$ denotes the $i$-th singular value of $M$.
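
As a quick numerical illustration, here is a minimal sketch (assuming NumPy and SciPy; the matrix size is arbitrary) checking that the trace form and the singular-value form agree:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))

# tr( sqrt(M^T M) ): sqrtm can return a tiny imaginary part, hence .real
trace_form = np.trace(sqrtm(M.T @ M)).real
# sum of singular values of M
sv_form = np.linalg.svd(M, compute_uv=False).sum()
print(trace_form, sv_form)   # the two expressions agree
```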

My question is how to compute the derivative of $\|XA\|_*$ with respect to $X$, i.e., \begin{equation} \frac{\partial \|XA\|_*}{\partial X} \end{equation} In fact, I want to use it in a gradient-descent optimization algorithm.

Note that there is a similar question, according to which a subgradient of $\|X\|_*$ is $UV^T$, where $X = U\Sigma V^T$ is the SVD of $X$; perhaps that is helpful here. Thanks a lot for your help.

Jack
  • Did you read Michael Grant's answer? – Rodrigo de Azevedo Jun 07 '18 at 11:41
  • @RodrigodeAzevedo Thanks for your suggestion; I have just read Michael Grant's answer. Although I didn't make it clear, I actually want to use $\|XA\|_*$ as a loss function in a deep neural network (DNN). As is well known, optimization algorithms for DNNs are easy to implement once the gradient is available (see the sketch below). There is a recent paper that optimizes $\|X\|_*$ using its subgradient $UV^T$, and I think it may be appropriate to follow this work. – Jack Jun 07 '18 at 12:26
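
A minimal sketch of that DNN use case (assuming a recent PyTorch; the shapes are illustrative): autograd differentiates the nuclear norm through the SVD, so no hand-coded subgradient is needed.

```python
import torch

X = torch.randn(5, 4, requires_grad=True)
A = torch.randn(4, 3)

# Nuclear norm of XA as a loss; backward() uses the SVD-based gradient.
loss = torch.linalg.matrix_norm(X @ A, ord='nuc')
loss.backward()
print(X.grad)   # equals U V^T A^T for Y = XA (distinct singular values assumed)
```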

1 Answer


Let $$Y=XA$$ Write the norm in terms of this new variable, then find the differential and change variables from $Y$ back to $X$ to obtain the desired gradient. Here the colon denotes the Frobenius inner product, $A:B = \operatorname{tr}(A^TB)$. $$\eqalign{ \phi&=\|Y\|_* \cr d\phi &= (YY^T)^{-\tfrac{1}{2}}Y:dY \cr &= (YY^T)^{-\tfrac{1}{2}}Y:dX\,A \cr &= (YY^T)^{-\tfrac{1}{2}}YA^T:dX \cr \frac{\partial\phi}{\partial X} &= (YY^T)^{-\tfrac{1}{2}}YA^T \cr &= Y(Y^TY)^{-\tfrac{1}{2}}A^T \cr }$$ There are two equivalent ways of writing the inverse square root, but only one makes sense when $Y$ is rectangular: $(YY^T)^{-\tfrac{1}{2}}Y$ requires $Y$ to have full row rank, while $Y(Y^TY)^{-\tfrac{1}{2}}$ requires full column rank.
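
A minimal numerical sanity check of this formula (assuming NumPy; the shapes and tolerance are illustrative), comparing the closed-form gradient against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 5, 4, 3                 # X is m x n, A is n x p, so Y = XA is tall
X = rng.standard_normal((m, n))
A = rng.standard_normal((n, p))

def phi(X):
    """Nuclear norm of XA, i.e. the sum of singular values of Y = XA."""
    return np.linalg.svd(X @ A, compute_uv=False).sum()

# Closed-form gradient Y (Y^T Y)^{-1/2} A^T (the full-column-rank form).
Y = X @ A
w, V = np.linalg.eigh(Y.T @ Y)    # Y^T Y is symmetric positive definite here
inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
G = Y @ inv_sqrt @ A.T

# Central finite differences, one entry of X at a time.
eps = 1e-6
G_fd = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        G_fd[i, j] = (phi(X + E) - phi(X - E)) / (2 * eps)

print(np.max(np.abs(G - G_fd)))   # small (~1e-9): the formula checks out
```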

greg
  • Dear greg, thank you so much for your answer. Although I am not able to verify it myself, it seems to be correct. Besides, I have an additional question: is it true that $Y(Y^TY)^{-\tfrac{1}{2}}A^T = \tilde U \tilde V^T A^T$, where $Y = \tilde U \tilde\Sigma \tilde V^T$, as in the answer to the other similar question? – Jack Jun 07 '18 at 13:52
  • @Jack Yes, if you know the SVD $Y = U\Sigma V^T$, then the solution simplifies to $$ \frac{\partial\phi}{\partial X} = UV^TA^T $$ (a numerical check of this form appears after these comments). – greg Jun 07 '18 at 16:41
  • Thank you so much for your answer; it means a lot to me. – Jack Jun 07 '18 at 17:30
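
Continuing the NumPy sketch from the answer above (same assumed setup), the SVD form of the gradient agrees with the inverse-square-root form:

```python
# Y = X @ A from the earlier sketch, with thin SVD Y = U S V^T.
U, S, Vt = np.linalg.svd(Y, full_matrices=False)
G_svd = U @ Vt @ A.T
print(np.max(np.abs(G_svd - G)))  # agrees with Y (Y^T Y)^{-1/2} A^T to machine precision
```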