
I am wondering if there is a simple closed form solution to the constrained proximal mapping problem:

$$ \operatorname*{argmin}_{\beta: \|\beta\|_2 \leq 1} \frac{1}{2\mu }\|X - \beta\|_2^2 + \|\beta\|_1,$$ where $\|a\|_1 = \sum_{i=1}^p |a_i|$. Intuitively, I would think that the solution is the projection of the unconstrained solution (i.e., the soft-thresholded solution) onto the unit ball, but I am having difficulty proving this. Perhaps my intuition is wrong here, or I am overlooking a simple property of proximal operators.

Any tips on a direction for proof or papers for reference would be appreciated.

user23658
  • If you write {\rm argmin} instead of \operatorname{argmin}, then you don't get proper spacing in things like $a\operatorname{argmin} b$ and $a\operatorname{argmin}(b).$ I mention both examples so that you can see the context-dependent nature of the spacing (less space to the right in the second example). Also with \operatorname*{argmin}_\beta (with the asterisk) you see $$ \operatorname*{argmin}_\beta, $$ with the subscript directly below $\operatorname{argmin}$ (when that is in a displayed, as opposed to inline, setting). $\qquad$ – Michael Hardy Dec 13 '16 at 19:13
  • Thank you Michael. That is very useful. – user23658 Dec 13 '16 at 20:28
  • just a wild idea: write down the KKT conditions for both problems and see if the projected solution to one is a solution to the other. – LinAlg Dec 14 '16 at 00:00
  • If you can, put the constraint in variational form (by adding a quadratic term $\frac{1}{2}\alpha \|\beta\|_2^2$). The solution is then just a soft-thresholding. – dohmatob Dec 14 '16 at 20:47
  • I know that the elastic net ($\ell_1$ plus non-squared $\ell_2$) admits a simple two-step approach, but I'm not so sure about this one. – Michael Grant Dec 14 '16 at 22:37
  • OK, I've done some rough work that shows you can do a soft threshold by $\mu$, then scale if necessary to achieve $|\beta|_2\leq 1$. I don't consider my scratchings answer-worthy yet. – Michael Grant Dec 14 '16 at 23:10
  • Thank you for the input Michael. You are correct; I have proven the solution is $ {\rm soft}(X, \mu) / \max \left( 1, \left\| {\rm soft}(X, \mu) \right\|_2 \right) $. I will write up the proof and post it sometime tomorrow morning. – user23658 Dec 15 '16 at 01:36
  • @user23658, Could you please share your solution? – Royi Mar 17 '20 at 20:39
  • Related to - https://math.stackexchange.com/questions/2595199. – Royi Mar 17 '20 at 20:41

1 Answer


The problem is given by:

$$\begin{aligned} \arg \min_{x} \quad & \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} + \mu {\left\| x \right\|}_{1} \\ \text{subject to} \quad & {\left\| x \right\|}_{2} \leq 1 \end{aligned}$$

Which is equivalent to

$$\begin{aligned} \arg \min_{x} \quad & \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} + \mu {\left\| x \right\|}_{1} \\ \text{subject to} \quad & {\left\| x \right\|}_{2}^{2} \leq 1 \end{aligned}$$

The Lagrangian is given by:

$$ L \left( x, \lambda \right) = \frac{1}{2} {\left\| x - y \right\|}_{2}^{2} + \mu {\left\| x \right\|}_{1} + \lambda \left( {x}^{T} x - 1 \right) $$

Reasoning as in the solutions for the Orthogonal Projection onto the $ {L}_{2} $ Unit Ball and the derivation of the Proximal Operator of the $ {L}_{1} $ Norm, and examining the cases $ \lambda = 0 $ and $ \lambda \neq 0 $ separately, one concludes:

  1. If the $ {L}_{2} $ norm of $ y $ after soft thresholding is larger than 1, apply the projection onto the $ {L}_{2} $ Unit Ball (i.e., divide by that norm).
  2. If the $ {L}_{2} $ norm of $ y $ after soft thresholding is smaller than or equal to 1, the soft thresholding alone gives the solution.

Hence the solution is given by:

$$ x = \frac{ \operatorname{sign} \left( y \right) {\left( \left| y \right| - \mu \right)}_{+} }{ \max \left( 1, {\left\| \operatorname{sign} \left( y \right) {\left( \left| y \right| - \mu \right)}_{+} \right\|}_{2} \right) } = \frac{ \mathcal{S}_{\mu} \left( y \right) }{ \max \left( 1, {\left\| \mathcal{S}_{\mu} \left( y \right) \right\|}_{2} \right) } $$

Where $ \mathcal{S}_{\mu} \left( \cdot \right) $ is the elementwise Soft Threshold operator with parameter $ \mu $ and $ {\left( \cdot \right)}_{+} $ denotes the positive part.
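The closed form above is easy to sanity-check numerically. Here is a minimal NumPy sketch (this is not the author's MATLAB code; the helper name `prox_l1_ball` is my own):

```python
import numpy as np

def prox_l1_ball(y, mu):
    """Prox of mu*||x||_1 restricted to the L2 unit ball.

    Soft-threshold y by mu, then project onto the unit ball
    by rescaling only when the thresholded vector has norm > 1.
    """
    s = np.sign(y) * np.maximum(np.abs(y) - mu, 0.0)  # soft threshold S_mu(y)
    return s / max(1.0, np.linalg.norm(s))            # project if ||s||_2 > 1

# Example: the soft-thresholded vector leaves the unit ball, so it is rescaled
y = np.array([3.0, -2.0, 0.5])
x = prox_l1_ball(y, mu=1.0)
print(np.linalg.norm(x))  # ≈ 1.0 after projection
```

When the soft-thresholded vector already lies inside the ball, the function returns it unchanged, matching case 2 above.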

To verify the result, I implemented it in MATLAB and validated it against CVX.
The code is available at my StackExchange Mathematics Q2057347 GitHub Repository.
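Following LinAlg's comment, the KKT conditions can also be checked numerically. Below is a Python sketch (independent of the MATLAB/CVX verification): stationarity gives $x_i (1 + 2\lambda) = \mathcal{S}_{\mu}(y)_i$ on the support, so the multiplier implied by the scaling is $\lambda = (c - 1)/2$ with $c = \max(1, \|\mathcal{S}_{\mu}(y)\|_2)$.

```python
import numpy as np

def prox_l1_ball(y, mu):
    # Soft threshold, then rescale onto the unit ball if needed
    s = np.sign(y) * np.maximum(np.abs(y) - mu, 0.0)
    return s / max(1.0, np.linalg.norm(s))

def kkt_residual(x, y, mu, lam, tol=1e-10):
    """Max violation of stationarity: 0 in x - y + mu*d||x||_1 + 2*lam*x."""
    g = x - y + 2.0 * lam * x
    # On the support: g_i + mu*sign(x_i) must vanish; off it: |g_i| <= mu
    r_nz = np.abs(g + mu * np.sign(x))[np.abs(x) > tol]
    r_z = np.maximum(np.abs(g) - mu, 0.0)[np.abs(x) <= tol]
    return max(np.max(r_nz, initial=0.0), np.max(r_z, initial=0.0))

y = np.array([3.0, -2.0, 0.5, -0.2])
mu = 1.0
x = prox_l1_ball(y, mu)
c = max(1.0, np.linalg.norm(np.sign(y) * np.maximum(np.abs(y) - mu, 0.0)))
lam = (c - 1.0) / 2.0                  # multiplier implied by the scaling, lam >= 0
print(kkt_residual(x, y, mu, lam))     # ≈ 0: stationarity holds
```

Complementary slackness $\lambda (\|x\|_2^2 - 1) = 0$ also holds: $\lambda > 0$ only occurs when the soft-thresholded vector was rescaled onto the sphere, so $\|x\|_2 = 1$ in that case.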

Royi