
I have a sparse matrix factorization problem, where I want to decompose a matrix $X\in \mathbb{R}^{n\times m}$ into $A\in \mathbb{R}^{n\times p}$ and $B\in \mathbb{R}^{p\times m}$, such that $X\approx AB$.

My matrix $X$ may contain outlier values and missing values (which are replaced by zero). While I am aware of robust PCA and low-rank-plus-sparse decomposition (e.g., Candès et al., 2009–2016), I do not want to address the outliers directly.

I want to solve the following optimization program instead:

$$\text{minimize}_{A,B\geq 0} ~~\beta\|X-AB\|_F^2+(1-\beta)\|X-AB\|_1+\gamma_1\|A\|_1+\gamma_2\|B\|_1+\lambda\|A\|_*$$

$\|A\|_*$ is the nuclear norm of $A$; it encourages $A$ to be low-rank. Since this program is not jointly convex in $A$ and $B$, I want to solve it by alternating between the two, so let's assume I just want to solve it for $A$:

$$\text{minimize}_{A\geq 0} ~~\beta\|X-AB\|_F^2+(1-\beta)\|X-AB\|_1+\gamma_1\|A\|_1+\lambda\|A\|_*$$
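
Concretely, the outer loop I have in mind is just the following rough sketch, where `solve_for_A` and `solve_for_B` are placeholders for whatever subproblem solver I end up with:

```python
import numpy as np

def alternating_factorization(X, p, beta, gamma1, gamma2, lam, n_outer=50, seed=0):
    """Outer alternating scheme (sketch). `solve_for_A` / `solve_for_B`
    are placeholders for whichever convex-subproblem solver is chosen."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = np.abs(rng.standard_normal((n, p)))  # nonnegative initialization
    B = np.abs(rng.standard_normal((p, m)))
    for _ in range(n_outer):
        A = solve_for_A(X, A, B, beta, gamma1, lam)  # fix B, solve the convex subproblem in A
        B = solve_for_B(X, A, B, beta, gamma2)       # fix A, solve the convex subproblem in B
    return A, B
```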

I have the following questions:

  1. Can I use a sequence of proximal operators to solve this problem iteratively? If so, in what order should the operators be applied?

  2. If not, what is the best approach? I have tried subgradient descent, and it is not giving me a satisfactory convergence rate.

  3. I would like to avoid off-the-shelf solvers, but if implementing this myself (e.g., in Python's SciPy) is too much hassle, are there any free solvers out there that scale to $n\approx10^9$ and $m\approx 10^4$?

Thanks!


2 Answers


I ended up doing the following, so I thought maybe I should post it here too.

We can also formulate this problem as a cone program, something like:

$\text{minimize}_{A\geq 0} ~~\beta\|X-AB\|_F^2+(1-\beta)t_1+\gamma_1 t_2+\lambda \|A\|_*$

$\text{subject to}$

$\quad\|X-AB\|_1\leq t_1$

$\quad\|A\|_1\leq t_2$

Since the constraints $\|X-AB\|_1\leq t_1$ and $\|A\|_1\leq t_2$ can be replaced by affine inequalities, and since the nuclear norm admits the following semidefinite characterization, we can solve the problem using an interior-point method: $$\begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \mathcal{A}(X)=b \end{array} \quad\Longleftrightarrow\quad \begin{array}{ll} \text{minimize} & \tfrac{1}{2}\left( \mathop{\textrm{Tr}}W_1 + \mathop{\textrm{Tr}}W_2 \right) \\ \text{subject to} & \begin{bmatrix} W_1 & X \\ X^T & W_2 \end{bmatrix} \succeq 0 \\ & \mathcal{A}(X)=b \end{array} $$
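
For small instances, this formulation can be sanity-checked with a modeling layer. Below is a minimal CVXPY sketch of the $A$-subproblem (toy sizes and made-up parameter values; CVXPY performs the epigraph/SDP reformulation internally, so the explicit $t_1, t_2$ variables are not needed). This is only viable at small scale, not at the problem sizes in the question:

```python
import cvxpy as cp
import numpy as np

# Toy sizes for a sanity check; the real problem sizes won't fit here.
n, m, p = 50, 40, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))
B = np.abs(rng.standard_normal((p, m)))  # B fixed from the previous alternation step
beta, gamma1, lam = 0.5, 0.1, 0.1        # illustrative values

A = cp.Variable((n, p), nonneg=True)
residual = X - A @ B
objective = (beta * cp.sum_squares(residual)          # Frobenius term
             + (1 - beta) * cp.sum(cp.abs(residual))  # entrywise L1 term
             + gamma1 * cp.sum(cp.abs(A))             # L1 sparsity on A
             + lam * cp.normNuc(A))                   # nuclear norm on A
prob = cp.Problem(cp.Minimize(objective))
prob.solve()  # requires an SDP-capable solver because of the nuclear norm
```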

Ideas are welcome!

  • The term "$\beta\|X-AB\|_{\text{F}}^2 + (1-\beta)\|X-AB\|_1$" really looks like bad modelling. What are you trying to capture with it? Aren't you better off with just the term $\|X-AB\|_{\text{F}}^2$?
  • A sequence of proximal operators applied in an arbitrary order will certainly diverge.
  • Subgradient descent has very bad complexity: $\mathcal O(1/\epsilon^2)$. You don't want to be doing this, since this is only a subproblem to be solved at every step of the alternation...
  • There are proximal forward-backward algorithms that can deal with multiple proximable functions. For example, there is the Vũ-Condat algorithm (see section 5); a sketch of one such iteration for your $A$-subproblem appears after this list.
  • Finally, although your original problem is nonconvex and nonsmooth, it is semi-algebraic and can be attacked directly using recent work by Jérôme Bolte, Hedy Attouch, and Marc Teboulle. For example, see page 12 of these slides.
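
For concreteness, here is a minimal sketch of a Condat-Vũ-type primal-dual iteration for the $A$-subproblem with $B$ fixed. It treats $\beta\|X-AB\|_F^2$ as the smooth term, handles $\gamma_1\|A\|_1+\iota_{A\ge 0}$ directly through its prox, and handles the remaining nonsmooth terms $(1-\beta)\|X-AB\|_1$ and $\lambda\|A\|_*$ through dual variables. The splitting, step-size rule, and iteration count are illustrative choices, not tuned recommendations:

```python
import numpy as np

def soft_threshold(V, t):
    """Entrywise soft-thresholding: prox of t * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def svt(V, t):
    """Singular value thresholding: prox of t * ||.||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return U @ (np.maximum(s - t, 0.0)[:, None] * Vt)

def solve_for_A(X, A0, B, beta, gamma1, lam, n_iters=500):
    """Condat-Vu-style iteration for the A-subproblem (B fixed).

    Smooth term:      f(A)  = beta * ||X - A B||_F^2
    Prox'd directly:  g(A)  = gamma1 * ||A||_1 + indicator(A >= 0)
    Via duals:        h1(Z) = (1 - beta) * ||X - Z||_1, with Z = A B
                      h2(A) = lam * ||A||_*
    """
    A = A0.copy()
    Y1 = np.zeros_like(X)  # dual variable for h1 (n x m)
    Y2 = np.zeros_like(A)  # dual variable for h2 (n x p)
    Bn2 = np.linalg.norm(B, 2) ** 2               # spectral norm squared
    Lf = 2.0 * beta * Bn2                         # Lipschitz constant of grad f
    sigma = 1.0
    tau = 0.9 / (Lf / 2.0 + sigma * (Bn2 + 1.0))  # rough step-size rule
    for _ in range(n_iters):
        grad = -2.0 * beta * (X - A @ B) @ B.T
        # Primal step: gradient plus adjoints of the duals, then prox of g.
        A_new = np.maximum(A - tau * (grad + Y1 @ B.T + Y2) - tau * gamma1, 0.0)
        Abar = 2.0 * A_new - A  # extrapolated primal point
        # Dual steps via the Moreau identity: prox_{s h*}(y) = y - s prox_{h/s}(y/s).
        V1 = Y1 + sigma * (Abar @ B)
        prox_h1 = X + soft_threshold(V1 / sigma - X, (1.0 - beta) / sigma)
        Y1 = V1 - sigma * prox_h1
        V2 = Y2 + sigma * Abar
        Y2 = V2 - sigma * svt(V2 / sigma, lam / sigma)
        A = A_new
    return A
```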
  • Thanks for your comments. "The term "$\beta\|X-AB\|_{\text{F}}^2 + (1-\beta)\|X-AB\|_1$" really looks like bad ..." Matrix $X$ is expected to be low-rank in general, but we expect some large outlier values, so I'll probably end up using a very small $\beta$. The ideal cost function would be a convex M-estimator loss like the Huber loss, although I thought $L_1$ could be a good start. – Alt Feb 01 '17 at 17:27