6

\begin{equation} \arg\min_{X} \frac{1}{2}\|X-Y\|_{F}^2 + \tau\|X\|_{*} \end{equation} where $\tau\geq 0$, $Y\in \mathbb{C}^{n\times n}$, and $\|\cdot\|_{*}$ is the nuclear norm. What is the solution of this convex optimization problem?

Some literature shows that the solution of this optimization problem in the real case (where $Y\in \mathbb{R}^{n\times n}$) is $\mathcal{D}_{\tau}(Y)$, where $\mathcal{D}_{\tau}$ is the soft-thresholding operator applied to the singular values. But I wonder what the solution is in the complex case (where $Y\in \mathbb{C}^{n\times n}$). Is it exactly the same, namely $\mathcal{D}_{\tau}(Y)$?

Chenfl
  • 81
  • Add some context by showing anything you have tried. – StubbornAtom Nov 11 '16 at 13:39
  • I would guess that if $Y$ is diagonal with non-negative entries, then the solution $X$ should be diagonal with non-negative entries too.
    $\DeclareMathOperator{\diag}{diag}$

    With that, it suffices to solve this problem: suppose that $Y$ is diagonal with $Y = \diag(y_1,\dots,y_n)$ and $y_i \geq 0$. Similarly, take $X = \diag(x_1,\dots,x_n)$ with $x_i \geq 0$. The problem now becomes $$ \arg \min_{x_1,\dots,x_n} \frac 12 \sum_{i}((x_i - y_i)^2 + 2\tau x_i) $$ From there, extend the result using the SVD. (The scalar step is worked out just after these comments.)

    – Ben Grossmann Nov 11 '16 at 13:39
  • @Chenfl the solution in the "complex condition" is exactly the same – Ben Grossmann Nov 11 '16 at 14:12
  • Show me a reference for real matrices, and I'll explain how every step in the proof still works over complex matrices. – Ben Grossmann Nov 11 '16 at 14:33
  • It would be much easier for me to answer your concerns, however, if you explained specifically why you expect something to change for the problem over complex matrices. – Ben Grossmann Nov 11 '16 at 14:36
  • In Theorem 2.1 of the paper "A singular value thresholding algorithm for matrix completion", this is proved for real matrices. Can you help me check whether every step in the proof still works over complex matrices? You can look through this proof and tell me the answer. Thanks! – Chenfl Nov 11 '16 at 14:45
  • Finally, I want to check again whether the solution in the "complex condition" is $\mathcal{D}_{\tau}(Y)=U\max(S-\tau I,0)V^H$, where $Y=USV^H$ is the SVD of $Y$ and the max is taken entry-wise. I'm afraid I may have misunderstood your answer. Thanks! – Chenfl Nov 11 '16 at 14:54
  • @Chenfl you've understood correctly. And of course, it's important that we use $H$ as opposed to the entry-wise transpose. – Ben Grossmann Nov 11 '16 at 16:41
  • Yep, it's the soft-thresholding function applied to the singular values even in the complex case. – Michael Grant Nov 11 '16 at 16:42
  • @Chenfl Looking at the paper, it's not the proof I was expecting. This needs a more careful treatment than I thought – Ben Grossmann Nov 11 '16 at 16:43
  • @Omnomnomnom, Thanks! So I think the only difference between the real and complex cases is $V^T$ versus $V^H$: one is the entry-wise transpose, the other the conjugate transpose. – Chenfl Nov 12 '16 at 00:10
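
Working out the scalar step sketched in the comments: each coordinate of the diagonal problem decouples, and for $x_i \geq 0$,

$$ \min_{x_i \geq 0} \frac{1}{2}\left((x_i - y_i)^2 + 2\tau x_i\right) \quad \Longrightarrow \quad x_i = \max(y_i - \tau,\, 0), $$

which is exactly soft thresholding. Nothing in this step changes over $\mathbb{C}$: the singular values of a complex matrix are still real and non-negative, so the same reduction applies once the SVD is written with $V^H$.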

1 Answer

5

Basically, for any Schatten Norm the algorithm is pretty simple.

If we use a Capital Letter $ A $ for a Matrix and a Small Letter for a Vector, then:

$$ {\operatorname*{Prox}}_{\lambda \left\| \cdot \right\|_{p}} \left( A \right) = \arg \min_{X} \frac{1}{2} \left\| X - A \right\|_{F}^{2} + \lambda \left\| X \right\|_{p} $$

Where $ \left\| X \right\|_{p} $ is the Schatten $ p $ Norm of $ X $.

Define $ \boldsymbol{\sigma} \left( X \right) $ as the vector of the Singular Values of $ X $ (see the Singular Value Decomposition).

Then the Proximal Operator calculation is as follows:

  1. Apply the SVD on $ A $: $ A \rightarrow U \operatorname*{diag} \left( \boldsymbol{\sigma} \left( A \right) \right) {V}^{H} $ (for real $ A $, $ {V}^{H} = {V}^{T} $).
  2. Extract the vector of Singular Values $ \boldsymbol{\sigma} \left( A \right) $.
  3. Calculate the Proximal Operator of the extracted vector using Vector Norm $ p $: $ \hat{\boldsymbol{\sigma}} \left( A \right) = {\operatorname*{Prox}}_{\lambda \left\| \cdot \right\|_{p}} \left( \boldsymbol{\sigma} \left( A \right) \right) = \arg \min_{x} \frac{1}{2} \left\| x - \boldsymbol{\sigma} \left( A \right) \right\|_{2}^{2} + \lambda \left\| x \right\|_{p} $.
  4. Return the Proximal of the Matrix Norm: $ \hat{A} = {\operatorname*{Prox}}_{\lambda \left\| \cdot \right\|_{p}} \left( A \right) = U \operatorname*{diag} \left( \hat{\boldsymbol{\sigma}} \left( A \right) \right) {V}^{H} $.
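
For concreteness, here is a minimal NumPy sketch of these four steps for the Nuclear Norm case ($ p = 1 $); the function name `prox_nuclear` is my own, not from any library. NumPy's `np.linalg.svd` returns $ {V}^{H} $ directly, so the same code handles real and complex input:

```python
import numpy as np

def prox_nuclear(A, lam):
    """Prox of lam * nuclear norm at A, for real or complex A."""
    # Steps 1-2: thin SVD, A = U @ diag(s) @ Vh; s is always real
    # and non-negative, and Vh is the conjugate transpose V^H.
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    # Step 3: the prox of lam * L1 on s reduces to soft thresholding
    # (no sign term needed since s >= 0).
    s_hat = np.maximum(s - lam, 0.0)
    # Step 4: rebuild with the thresholded singular values.
    return U @ np.diag(s_hat) @ Vh

# Usage on a complex matrix:
rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
X = prox_nuclear(Y, 0.5)
```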

The mapping of Matrix Norms to Schatten Norms:

  • Frobenius Norm - Given by $ p = 2 $ in Schatten Norm.
  • Nuclear Norm - Given by $ p = 1 $ in Schatten Norm.
  • Spectral Norm (The $ {L}_{2} $ Induced Norm of a Matrix) - Given by $ p = \infty $ in Schatten Norm.

So in your case use the Schatten Norm with $ p = 1 $.
The Proximal Operator of the $ {L}_{1} $ Vector Norm is the Soft Thresholding Operator, as shown below.
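
Explicitly, since the entries of $ \boldsymbol{\sigma} \left( A \right) $ are non-negative, the Soft Thresholding step reduces to

$$ {\hat{\sigma}}_{i} = \max \left( {\sigma}_{i} - \lambda, 0 \right), $$

which matches the $ \mathcal{D}_{\tau} \left( Y \right) = U \max \left( S - \tau I, 0 \right) {V}^{H} $ form from the comments, for real and complex $ Y $ alike.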

Royi
  • 8,711
  • Thank you for this answer. Does it make any sense to define an $l_0$ Schatten norm, whose proximal mapping involves hard thresholding of singular values (instead of soft thresholding)? – User32563 Mar 21 '20 at 14:59
  • I guess it has the same sense as applying it to a vector. Namely, you are trying to eliminate the effect of some directions (as opposed to weakening them with the $ {L}_{1} $). – Royi Mar 21 '20 at 15:13
  • Hello @Chenfl, I was wondering if you got the solution for the objective function mentioned in the question? If yes, kindly spare some time to educate me on the solution. I am working on the exact same optimization problem (same constraints and regularization) – Upendra01 Aug 05 '20 at 18:17