1

Let $X\in \mathbb{R}^{m\times n}$. How to show that the trace norm $\|X\|_\text{tr}$, which is defined as the sum of the singular values of $X$, is equivalent to the following optimization problem?

\begin{array}{ll} \mathop{\text{maximize}}\limits_{Y\in\mathbb{R}^{m\times n}} & \text{tr}(X^TY)\\ \text{subject to} & Y^TY \preceq I_n\end{array}

david
  • 73
  • See section 4.2.3.2 of MOSEK Modeling Cookbook https://docs.mosek.com/modeling-cookbook/sdo.html . – Mark L. Stone Apr 01 '18 at 19:42
  • Thanks! This is what I want. But under equation (19), Is $\sup_{|Z|2\leq 1} \mathrm{tr}(X^T Z) = \sup{|X|2\leq 1} \mathrm{tr}(\Sigma^T U^T ZV)$ a typo? I think it should be $\sup{|Z|_2\leq 1} \mathrm{tr}(\Sigma^T U^T ZV)$. – david Apr 02 '18 at 01:37
  • I believe you are correct. The next line then follows because $U^T$ and $V$ both have 2-norm = 1, so "unitary invariance" applies. – Mark L. Stone Apr 02 '18 at 01:58
  • Michael Grant wrote about it here. Note that $|Q|_2 \leq 1$ is equivalent to $Q^\top Q \preceq I$. – Rodrigo de Azevedo Apr 02 '18 at 09:40

1 Answers1

1

I don't know if it is OK in stackexchange but I am trying to answer my own question to explain more about the derivation of the equivalence of trace norm. The reference is mostly from section 4.2.3.2 of MOSEK Modeling Cookbook.

We first prove that $||A||_{2} \le 1$ is equivalent to $A^{T} A\preceq I$. \begin{align*} & ||A||_{2} \le 1\\ \Leftrightarrow \quad & \max_{||x||_{2} =1} ||Ax||_{2} \le 1\\ \Leftrightarrow \quad & \max_{||x||_{2} =1} x^{T} A^{T} Ax\le 1\\ \Leftrightarrow \quad & \max_{||x||_{2} =1} x^{T}\left( A^{T} A-I\right) x\le 0\\ \Leftrightarrow \quad & A^{T} A\preceq I \end{align*} Then our problem \begin{array}{ll} \mathop{\text{maximize}}\limits_{Y\in\mathbb{R}^{m\times n}} & \text{tr}(X^TY)\\ \text{subject to} & Y^TY \preceq I_n\end{array} becomes $$\sup _{||Y||_{2} \leq 1}\mathrm{tr} (X^{T} Y)$$ Then we do SVD decomposition to $X$, $X=U\Sigma V^T$, \begin{equation*} \begin{aligned} \sup _{||Z||_{2} \leq 1}\mathrm{tr} (X^{T} Z) & =\sup _{||Z\| _{2} \leq 1}\mathrm{tr} (V\Sigma ^{T} U^{T} Z)\\ & =\sup _{||Z\| _{2} \leq 1}\mathrm{tr} (\Sigma ^{T} U^{T} ZV)\\ & =\sup _{\| U^{T} ZV\| _{2} \leq 1}\mathrm{tr} (\Sigma ^{T} U^{T} ZV)\\ & =\sup _{\| Y\| _{2} \leq 1}\mathrm{tr} (\Sigma ^{T} Y) \end{aligned} \end{equation*} $\mathrm{tr} (V\Sigma ^{T} U^{T} Z) = \mathrm{tr} (\Sigma ^{T} U^{T} ZV)$ follows from the invariant under cyclic permutations property of trace. And the equivalence of constraint $||Z||_2 \le 1 \Leftrightarrow ||U^T Z V ||_2 \le 1$ follows from the definition of $\ell_2$ norm of matrix. And using the unitary invariance of the norm $||\cdot||_2$ again. We can consider $Y=\mathbf{diag}(y_1, \dots, y_p)$. $$\sup_{\|Z\|_2\leq 1} \mathrm{tr}(X^T Z) = \sup_{|y_i| \leq 1} \sum_{i=1}^p \sigma_i y_i = \sum_{i=1}^p \sigma_i$$

david
  • 73