
What is the orthogonal projection onto the $ {L}_{1} $ unit ball?
Namely, given $ x \in {\mathbb{R}}^{n} $,

$$ {\mathcal{P}}_{ { \left\| \cdot \right\| }_{1} \leq 1 } \left( x \right) = \arg \min_{{ \left\| y \right\| }_{1} \leq 1} \left\{ {\left\| y - x \right\|}_{2}^{2} \right\} $$


Royi
  • 8,711
  • Given $x=(x_1,x_2,\ldots,x_n)$ we want to find $y=(y_1,y_2,\ldots,y_n)$ such that $|y_1|+|y_2|+\ldots+|y_n|\leq 1$ and $(x_1-y_1)^2+(x_2-y_2)^2+\ldots+(x_n-y_n)^2$ is minimal.
    Since the first set is a convex polytope and the second function is a convex function, this problem can be solved through convex optimization techniques and the same holds if the $L^1$ norm is replaced by the $L^{\infty}$ norm. There isn't, in general, a simple closed-form solution.
    – Jack D'Aurizio Jun 18 '17 at 17:52
  • @Royi: of course, and a natural point for starting an iterative method is just $\frac{x}{| x|_1}$. A hybrid between the conjugate gradient method and the simplex method should be very efficient. – Jack D'Aurizio Jun 18 '17 at 18:35
  • This is explained in ch. 8 ("the proximal mapping") of Vandenberghe's 236c notes. See slide 8-15 here. – littleO Jun 18 '17 at 20:40
  • Also see the function proj_l1.m in TFOCS for an $O(n\log n)$ implementation (with the big-O complexity limited by the sorting of $|x_i|$). While no closed-form solution is possible, iteration is not necessary. – Michael Grant Jun 19 '17 at 05:04
  • @littleO, I'm not sure I got all the steps in your link. But how come he has $ \lambda $ twice on the Lagrangian? Shouldn't it be 1? See here - http://i.imgur.com/6d9oMsU.png. – Royi Jun 19 '17 at 20:29
  • @Royi I think you are correct, the first expression for the Lagrangian is an error. I believe the second expression is correct, so the error is not serious. It's actually pretty unusual to find an error like that in Vandenberghe's 236c notes. The typo in the word "obtained" is further evidence that this slide was not proofread as carefully as normal. I think the idea is that you simply pick $\lambda$ so that the KKT conditions are satisfied, which is a pretty clear idea. – littleO Jun 20 '17 at 00:23
  • That's right. And that $\lambda$ value can be found in $O(n)$ steps once the values of $x$ are sorted. – Michael Grant Jun 20 '17 at 01:25
  • @MichaelGrant, Yep, it seems the second one is right (Error on the error is correct :-)). I get the derivation of the Soft Thresholding. What I don't get is the derivation of this line - http://i.imgur.com/Ywsd6Lw.png. For the case of $ \lambda > 0$. Thank You. – Royi Jun 20 '17 at 04:52
  • Connection to Soft Max function - https://math.stackexchange.com/questions/2372487 through Mirror Descent. – Royi Aug 22 '17 at 12:44

3 Answers


$$ \DeclareMathOperator{\sign}{sign} $$

The Lagrangian of the problem (here $ y $ denotes the given point and $ x $ the optimization variable) can be written as:

$$ \begin{align} L \left( x, \lambda \right) & = \frac{1}{2} {\left\| x - y \right\|}^{2} + \lambda \left( {\left\| x \right\|}_{1} - 1 \right) && \text{} \\ & = \sum_{i = 1}^{n} \left( \frac{1}{2} { \left( {x}_{i} - {y}_{i} \right) }^{2} + \lambda \left| {x}_{i} \right| \right) - \lambda && \text{Component wise form} \end{align} $$

The Dual Function is given by:

$$ \begin{align} g \left( \lambda \right) = \inf_{x} L \left( x, \lambda \right) \end{align} $$

The infimum can be taken component wise; each term $ \frac{1}{2} { \left( {x}_{i} - {y}_{i} \right) }^{2} + \lambda \left| {x}_{i} \right| $ is minimized by the Soft Thresholding Operator:

$$ \begin{align} {x}_{i}^{\ast} = \sign \left( {y}_{i} \right) { \left( \left| {y}_{i} \right| - \lambda \right) }_{+} \end{align} $$

Where $ {\left( t \right)}_{+} = \max \left( t, 0 \right) $.

Now, all that is needed is to find the optimal $ \lambda \geq 0 $. If $ {\left\| y \right\|}_{1} \leq 1 $ the given point is already feasible, hence $ \lambda = 0 $ and $ {x}^{\ast} = y $; otherwise $ \lambda > 0 $ is the root of the following function (which is the constraint of the KKT system):

$$ \begin{align} h \left( \lambda \right) & = \sum_{i = 1}^{n} \left| {x}_{i}^{\ast} \left( \lambda \right) \right| - 1 \\ & = \sum_{i = 1}^{n} { \left( \left| {y}_{i} \right| - \lambda \right) }_{+} - 1 \end{align} $$

The above is a piecewise linear function of $ \lambda $ and its derivative is given by:

$$ \begin{align} \frac{\mathrm{d} }{\mathrm{d} \lambda} h \left( \lambda \right) & = \frac{\mathrm{d} }{\mathrm{d} \lambda} \sum_{i = 1}^{n} { \left( \left| {y}_{i} \right| - \lambda \right) }_{+} \\ & = \sum_{i = 1}^{n} -{ \mathbf{1} }_{\left\{ \left| {y}_{i} \right| - \lambda > 0 \right\}} \end{align} $$

Hence the root can be found using Newton Iteration.
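A minimal MATLAB sketch of the procedure above (the function and variable names are illustrative only; a fuller, tested implementation is in the GitHub repository linked at the end of this answer):

```matlab
function [ vX ] = ProjectL1Ball( vY, ballRadius )
% Projects the vector vY onto the L1 ball of radius ballRadius (use
% ballRadius = 1 for the unit ball of the question) using the Soft
% Thresholding step and the Newton iteration on lambda described above.

if (norm(vY, 1) <= ballRadius)
    vX = vY; % The point is already feasible, hence lambda = 0.
    return;
end

paramLambda = 0;
for ii = 1:100
    hVal = sum(max(abs(vY) - paramLambda, 0)) - ballRadius; % h(lambda)
    if (abs(hVal) < 1e-9)
        break;
    end
    hGrad = -sum(abs(vY) > paramLambda); % Piecewise constant derivative of h.
    paramLambda = paramLambda - (hVal / hGrad); % Newton step.
end

vX = sign(vY) .* max(abs(vY) - paramLambda, 0); % Soft Thresholding.

end
```

Since $ h $ is convex and piecewise linear, the Newton iteration above never overshoots the root and terminates after finitely many steps.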

In a similar manner, the projection onto the Simplex (see @Ashkan's answer below) can be calculated.
The Lagrangian in that case is given by:

$$ \begin{align} L \left( x, \mu \right) & = \frac{1}{2} {\left\| x - y \right\|}^{2} + \mu \left( \boldsymbol{1}^{T} x - 1 \right) && \text{} \\ \end{align} $$

The trick is to leave the non negativity constraint implicit.
Hence the Dual Function is given by:

$$ \begin{align} g \left( \mu \right) & = \inf_{x \succeq 0} L \left( x, \mu \right) && \text{} \\ & = \inf_{x \succeq 0} \sum_{i = 1}^{n} \left( \frac{1}{2} { \left( {x}_{i} - {y}_{i} \right) }^{2} + \mu {x}_{i} \right) - \mu && \text{Component wise form} \end{align} $$

Again, taking advantage of the component wise form, the solution is given by:

$$ \begin{align} {x}_{i}^{\ast} = { \left( {y}_{i} - \mu \right) }_{+} \end{align} $$

Where the solution enforces the non negativity constraint by projecting onto $ {\mathbb{R}}_{+} $.

Again, the solution is given by finding the $ \mu $ which satisfies the constraint (pay attention: since the above is an equality constraint, $ \mu $ may take any sign and is not limited to non negative values as $ \lambda $ above).

The objective function (from the KKT conditions) is given by:

$$ \begin{align} h \left( \mu \right) = \sum_{i = 1}^{n} {x}_{i}^{\ast} - 1 & = \sum_{i = 1}^{n} { \left( {y}_{i} - \mu \right) }_{+} - 1 \end{align} $$

The above is a piecewise linear function of $ \mu $ and its derivative is given by:

$$ \begin{align} \frac{\mathrm{d} }{\mathrm{d} \mu} h \left( \mu \right) & = \frac{\mathrm{d} }{\mathrm{d} \mu} \sum_{i = 1}^{n} { \left( {y}_{i} - \mu \right) }_{+} \\ & = \sum_{i = 1}^{n} -{ \mathbf{1} }_{\left\{ {y}_{i} - \mu > 0 \right\}} \end{align} $$

Hence the root can be found using Newton Iteration.
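The corresponding MATLAB sketch for the Simplex case (again, names are illustrative only):

```matlab
function [ vX ] = ProjectSimplex( vY, simplexRadius )
% Projects the vector vY onto the set { x : x >= 0, sum(x) = simplexRadius }
% (simplexRadius = 1 for the Unit Simplex) using the Newton iteration on mu
% described above.

paramMu = min(vY) - simplexRadius; % Starting point with h(mu) >= 0.
for ii = 1:100
    hVal = sum(max(vY - paramMu, 0)) - simplexRadius; % h(mu)
    if (abs(hVal) < 1e-9)
        break;
    end
    hGrad = -sum(vY > paramMu); % Piecewise constant derivative of h.
    paramMu = paramMu - (hVal / hGrad); % Newton step.
end

vX = max(vY - paramMu, 0);

end
```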

I wrote MATLAB code which implements them both at Mathematics StackExchange Question 2327504 - GitHub.
There is a test which compares the result to a reference calculated by CVX.

Royi
  • 8,711

Hint: Because of the symmetry of the problem you may assume $x$ lies in the first orthant, i.e., $x \ge 0$, and that $x$ is outside of the $\ell_1$ unit ball (otherwise the answer is trivially $y=x$). Therefore, under these assumptions we have $ 0 \leq y^{*} \leq x$, where $y^{*}$ is the unique optimal solution. To find $y^{*}$ you need to solve the following quadratic program:
$$ \begin{aligned} & \text{Min} & & \sum_{i=1}^{n} (x_i - y_i)^2 \\ & \text{subject to} & & y \geq 0, \\ & & & \sum_{i=1}^{n} y_i = 1, \end{aligned} $$

Note that this is a smooth convex optimization problem with linear constraints, so it is easy to solve! To find a closed form solution, set up the KKT system.
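For instance, a sketch of this quadratic program in CVX (the MATLAB modelling package mentioned elsewhere in this thread); the data vector below is just an illustrative example:

```matlab
% Illustrative CVX sketch of the quadratic program above; vX is assumed
% nonnegative and outside the L1 unit ball, as in the hint.
vX = [1; 2; 0; 0; 3];
n  = numel(vX);

cvx_begin quiet
    variable vY(n)
    minimize( sum_square(vY - vX) )
    subject to
        vY >= 0;
        sum(vY) == 1;
cvx_end
```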

Note that once you have the solution of the problem above, you can characterize the solutions for all cases depending on the position of $x$ in space. For example, let $x = (-1, 2, 0, 0, 3)$; you know the solution of the above problem for $\bar{x}=(1,2,0,0,3)$, call it $\bar{y} =(y_1,y_2,\ldots, y_n)$; then the solution corresponding to $x$ is $y=(-y_1,y_2,\ldots,y_n)$.
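A sketch of this sign handling (reusing the illustrative ProjectSimplex function from the previous answer; any solver for the quadratic program above would serve the same role):

```matlab
% Project abs(x) onto the Unit Simplex, then restore the original signs.
vX = [-1; 2; 0; 0; 3];

if (norm(vX, 1) <= 1)
    vY = vX; % Inside the ball: the projection is trivially x itself.
else
    vYBar = ProjectSimplex(abs(vX), 1); % Solution for xBar = abs(x).
    vY    = sign(vX) .* vYBar;          % Flip the signs back.
end
```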

Red shoes
  • 6,948
  • See @littleO's comment above. It's actually possible to solve this with an $O(n\log n)$ sort and $O(n)$ additional computations. – Michael Grant Jun 19 '17 at 05:06
  • @Ashkan, First +1 this lovely idea. I tried sketching the KKT. It doesn't look easy to get a closed form solution from them (Or any solution). Any idea? – Royi Jun 19 '17 at 17:58
  • OK, I gave it more thought and basically this is Projection onto the Unit Simplex / Probability Simplex which is as hard as projection onto the L1 Unit Ball. Unless you have a really nice way to solve the KKT of this. Thank You. – Royi Jun 19 '17 at 20:04

An alternative approach is to use the closed formula for the infinity-norm ball and perform a 45° rotation. Additionally, there's a scaling factor to consider. This approach seems to give correct results for the 1-norm unit ball centered at the origin. The idea goes like this:

Expressed in mathematical notation, the procedure involves the following steps:

  1. Rotation by 45 degrees: Rotate a vector $\mathbf{x} = (x, y)$ by 45 degrees using a rotation matrix. The rotation matrix for a 45-degree (or $\pi/4$ radians) rotation is: $$ R = \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix} $$ The rotated vector $\mathbf{x'}$ is given by: $$ \mathbf{x'} = R \mathbf{x} $$

  2. Scaling Up by $\sqrt{2}$: Scale up the rotated vector by a factor of $\sqrt{2}$. This is necessary because the diagonals of the unit square (representing the infinity-norm ball) become the sides of the diamond (representing the 1-norm ball). The scaled vector $\mathbf{x''}$ is: $$ \mathbf{x''} = \mathbf{x'} \times \sqrt{2} $$

  3. Applying the Infinity-Norm Projection: Project the scaled vector onto the infinity-norm unit ball. This projection, denoted as $\text{Proj}_{\infty}(\cdot)$, clips each component of the vector to be within the range $[-1, 1]$. The projected vector $\mathbf{x'''}$ is: $$ \mathbf{x'''} = \text{Proj}_{\infty}(\mathbf{x''}) $$

  4. Scaling Down by $\sqrt{2}$: Scale down the projected vector by a factor of $\sqrt{2}$. The scaled-down vector $\mathbf{x''''}$ is: $$ \mathbf{x''''} = \mathbf{x'''} \div \sqrt{2} $$

  5. Rotating Back by -45 degrees: Finally, rotate the vector back by -45 degrees using the inverse of the original rotation matrix. The inverse of the rotation matrix $R$ is its transpose, given by: $$ R^{-1} = R^T = \begin{pmatrix} \cos(\pi/4) & \sin(\pi/4) \\ -\sin(\pi/4) & \cos(\pi/4) \end{pmatrix} $$ The final projected vector $\mathbf{y}$ is: $$ \mathbf{y} = R^{-1} \mathbf{x''''} $$

Combining these steps, the overall 1-norm projection of a vector $\mathbf{x}$ can be written as: $$ \mathbf{y} = R^{T} \left( \frac{1}{\sqrt{2}} \, \text{Proj}_{\infty} \left( \sqrt{2} \, R \mathbf{x} \right) \right) $$

This formula represents the entire process of projecting a vector onto the 1-norm unit ball using rotation and scaling in conjunction with the infinity-norm projection.
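A minimal MATLAB sketch of these steps in $\mathbb{R}^2$ (the example point is arbitrary):

```matlab
% Illustrative sketch of the rotate / scale / clip / unscale / rotate back
% procedure described above.
vX = [0.9; 1.7]; % Example point outside the 1-norm unit ball.

mR = [cos(pi / 4), -sin(pi / 4); sin(pi / 4), cos(pi / 4)]; % 45 degree rotation.

vXr = sqrt(2) * (mR * vX);    % Steps 1-2: rotate and scale up.
vXc = min(max(vXr, -1), 1);   % Step 3: projection onto the unit infinity-norm ball (clipping).
vY  = mR.' * (vXc / sqrt(2)); % Steps 4-5: scale down and rotate back.

% vY is approximately [0.1; 0.9], which agrees with the soft thresholding
% approach from the first answer.
```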

I've tried this in $\mathbb{R}^2$ and it is correct. I would like to see this approach generalized to many dimensions and also to non-unit balls, but I do not have the skills.

root
  • 153