
I have a sparse matrix factorization problem, where I want to decompose a matrix $X\in \mathbb{R}^{n\times m}$ into $A\in \mathbb{R}^{n\times p}$ and $B\in \mathbb{R}^{p\times m}$, such that $X\approx AB$.

My matrix $X$ may contain outlier values and missing values (which are replaced by zero). While I am aware of robust PCA and low-rank-plus-sparse decomposition (e.g., Candès et al., 2009–2016), I do not want to address the outliers directly.

I want to solve the following optimization program instead:

$$\text{minimize}_{A,B\geq 0} ~~\beta\|X-AB\|_F^2+(1-\beta)\|X-AB\|_1+\gamma_1\|A\|_1+\gamma_2\|B\|_1+\lambda\|A\|_*$$

$\|A\|_*$ is the nuclear norm of $A$; it encourages $A$ to be low-rank. Since this program is not jointly convex in $A$ and $B$, I want to solve it by alternating between the two, so let's assume I just want to solve it for $A$:

$$\text{minimize}_{A\geq 0} ~~\beta\|X-AB\|_F^2+(1-\beta)\|X-AB\|_1+\gamma_1\|A\|_1+\lambda\|A\|_*$$
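
Concretely, the outer loop I have in mind is just the following rough sketch, where `solve_for_A` and `solve_for_B` are placeholders for whatever subproblem solver I end up with:

```python
import numpy as np

def alternating_factorization(X, p, beta, gamma1, gamma2, lam, n_outer=50, seed=0):
    """Outer alternating scheme (sketch). `solve_for_A` / `solve_for_B`
    are placeholders for whichever convex-subproblem solver is chosen."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = np.abs(rng.standard_normal((n, p)))  # nonnegative initialization
    B = np.abs(rng.standard_normal((p, m)))
    for _ in range(n_outer):
        A = solve_for_A(X, A, B, beta, gamma1, lam)  # fix B, solve the convex subproblem in A
        B = solve_for_B(X, A, B, beta, gamma2)       # fix A, solve the convex subproblem in B
    return A, B
```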

I have the following questions:

  1. Can I use a sequence of proximal operators to solve this problem iteratively? If so, in what order should the operators be applied?

  2. If not, what is the best approach? I have tried subgradient descent, and it is not giving me a satisfactory convergence rate.

  3. I would like to avoid off-the-shelf solvers, but if implementing this myself (e.g., in Python's SciPy) is too much hassle, are there any free solvers out there that scale to $n\approx10^9$ and $m\approx 10^4$?

Thanks!


2 Answers


I ended up doing the following, so I thought maybe I should post it here too.

We can also formulate this problem as a cone program, something like:

$\text{minimize}_{A\geq 0} ~~\beta\|X-AB\|_F^2+(1-\beta)t_1+\gamma_1 t_2+\lambda \|A\|_*$

$\text{subject to}$

$\quad\|X-AB\|_1\leq t_1$

$\quad\|A\|_1\leq t_2$

Since the constraints $\|X-AB\|_1\leq t_1$ and $\|A\|_1\leq t_2$ can be replaced by affine inequalities, and since the nuclear norm admits the following semidefinite characterization, we can solve the problem using an interior-point method: $$\begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \mathcal{A}(X)=b \end{array} \quad\Longleftrightarrow\quad \begin{array}{ll} \text{minimize} & \tfrac{1}{2}\left( \mathop{\textrm{Tr}}W_1 + \mathop{\textrm{Tr}}W_2 \right) \\ \text{subject to} & \begin{bmatrix} W_1 & X \\ X^T & W_2 \end{bmatrix} \succeq 0 \\ & \mathcal{A}(X)=b \end{array} $$
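
For small instances, this formulation can be sanity-checked with a modeling layer. Below is a minimal CVXPY sketch of the $A$-subproblem (toy sizes and made-up parameter values; CVXPY performs the epigraph/SDP reformulation internally, so the explicit $t_1, t_2$ variables are not needed). This is only viable at small scale, not at the problem sizes in the question:

```python
import cvxpy as cp
import numpy as np

# Toy sizes for a sanity check; the real problem sizes won't fit here.
n, m, p = 50, 40, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))
B = np.abs(rng.standard_normal((p, m)))  # B fixed from the previous alternation step
beta, gamma1, lam = 0.5, 0.1, 0.1        # illustrative values

A = cp.Variable((n, p), nonneg=True)
residual = X - A @ B
objective = (beta * cp.sum_squares(residual)          # Frobenius term
             + (1 - beta) * cp.sum(cp.abs(residual))  # entrywise L1 term
             + gamma1 * cp.sum(cp.abs(A))             # L1 sparsity on A
             + lam * cp.normNuc(A))                   # nuclear norm on A
prob = cp.Problem(cp.Minimize(objective))
prob.solve()  # requires an SDP-capable solver because of the nuclear norm
```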

Ideas are welcome!

  • The term "$\beta\|X-AB\|_{\text{F}}^2 + (1-\beta)\|X-AB\|_1$" really looks like bad modelling. What are you trying to capture with it? Aren't you better off with just the term $\|X-AB\|_{\text{F}}^2$?
  • A sequence of proximal operators applied in an arbitrary order will certainly diverge.
  • Subgradient descent has very bad complexity: $\mathcal O(1/\epsilon^2)$. You don't want to be doing this, since this is only a subproblem to be solved at every step of the alternation...
  • There are proximal forward-backward algorithms that can deal with multiple proximable functions. For example, there is the Vũ-Condat algorithm (see section 5); a sketch of one such iteration for your $A$-subproblem appears after this list.
  • Finally, although your original problem is nonconvex and nonsmooth, it is semi-algebraic and can be attacked directly using recent work by Jérôme Bolte, Hedy Attouch, and Marc Teboulle. For example, see page 12 of these slides.
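
For concreteness, here is a minimal sketch of a Condat-Vũ-type primal-dual iteration for the $A$-subproblem with $B$ fixed. It treats $\beta\|X-AB\|_F^2$ as the smooth term, handles $\gamma_1\|A\|_1+\iota_{A\ge 0}$ directly through its prox, and handles the remaining nonsmooth terms $(1-\beta)\|X-AB\|_1$ and $\lambda\|A\|_*$ through dual variables. The splitting, step-size rule, and iteration count are illustrative choices, not tuned recommendations:

```python
import numpy as np

def soft_threshold(V, t):
    """Entrywise soft-thresholding: prox of t * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def svt(V, t):
    """Singular value thresholding: prox of t * ||.||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return U @ (np.maximum(s - t, 0.0)[:, None] * Vt)

def solve_for_A(X, A0, B, beta, gamma1, lam, n_iters=500):
    """Condat-Vu-style iteration for the A-subproblem (B fixed).

    Smooth term:      f(A)  = beta * ||X - A B||_F^2
    Prox'd directly:  g(A)  = gamma1 * ||A||_1 + indicator(A >= 0)
    Via duals:        h1(Z) = (1 - beta) * ||X - Z||_1, with Z = A B
                      h2(A) = lam * ||A||_*
    """
    A = A0.copy()
    Y1 = np.zeros_like(X)  # dual variable for h1 (n x m)
    Y2 = np.zeros_like(A)  # dual variable for h2 (n x p)
    Bn2 = np.linalg.norm(B, 2) ** 2               # spectral norm squared
    Lf = 2.0 * beta * Bn2                         # Lipschitz constant of grad f
    sigma = 1.0
    tau = 0.9 / (Lf / 2.0 + sigma * (Bn2 + 1.0))  # rough step-size rule
    for _ in range(n_iters):
        grad = -2.0 * beta * (X - A @ B) @ B.T
        # Primal step: gradient plus adjoints of the duals, then prox of g.
        A_new = np.maximum(A - tau * (grad + Y1 @ B.T + Y2) - tau * gamma1, 0.0)
        Abar = 2.0 * A_new - A  # extrapolated primal point
        # Dual steps via the Moreau identity: prox_{s h*}(y) = y - s prox_{h/s}(y/s).
        V1 = Y1 + sigma * (Abar @ B)
        prox_h1 = X + soft_threshold(V1 / sigma - X, (1.0 - beta) / sigma)
        Y1 = V1 - sigma * prox_h1
        V2 = Y2 + sigma * Abar
        Y2 = V2 - sigma * svt(V2 / sigma, lam / sigma)
        A = A_new
    return A
```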
  • Thanks for your comments. "The term "$\beta\|X-AB\|_{\text{F}}^2 + (1-\beta)\|X-AB\|_1$" really looks like bad ..." Matrix $X$ is expected to be low-rank in general, but we expect some large outlier values, so I'll probably end up using a very small $\beta$. The ideal cost function would be a convex M-estimator loss like the Huber loss, although I thought $L_1$ could be a good start. – Alt Feb 01 '17 at 17:27