6

Let $\Delta^n$ denote the $n$-dimensional probability simplex. Projection Onto A Simplex by Yunmei Chen and Xiaojing Ye gives an algorithm for the $\ell_2$-projection of a vector onto $\Delta^n$, i.e., given $y \in \mathbb R^n$, they give a non-iterative method to find a minimizer $x^\ast(y)$ of $$ \min_{x \in \Delta^n} \|x-y\|_p $$ where $p = 2$.

I am interested in the case where $p = 1$. Is there a method that computes an optimizer $x^\ast(y)$ for $p = 1$ which does not use a convex solver?

Royi
  • 8,711
beeflavor
  • 881

2 Answers2

3

This is a real nice question. So I solved it even though there is an answer.

The Problem

The problem is given by:

$$\begin{aligned} \arg \min_{x} \quad & {\left\| x - y \right\|}_{1} \\ \text{subject to} \quad & \boldsymbol{1}^{T} x = 1 \\ & {x}_{i} \geq 0 \; \forall i \end{aligned}$$

Solution 001 - Linear Programming

The objective function is:

$$ {\left\| x - y \right\|}_{1} = \sum_{i = 1}^{n} \left| {x}_{i} - {y}_{i} \right| = \sum_{i = 1}^{n} {t}_{i} $$

Now, by setting $ \left| {x}_{i} - {y}_{i} \right| \leq {t}_{i} $ we implicitly suggests that $ {t}_{i} \geq 0 $. Then we can rewrite the problem as:

$$\begin{aligned} \arg \min_{t, x} \quad & \sum_{i = 1}^{n} {t}_{i} \\ \text{subject to} \quad & \left| {x}_{i} - {y}_{i} \right| \leq {t}_{i} \; \forall i \\ & \boldsymbol{1}^{T} x = 1 \\ & {x}_{i} \geq 0 \; \forall i \end{aligned}$$

Now, since $ \left| {x}_{i} - {y}_{i} \right| \leq {t}_{i} \Leftrightarrow {x}_{i} - {t}_{i} \leq {y}_{i} \wedge -{x}_{i} - {t}_{i} \leq - {y}_{i} $ we can rewrite the problem as:

$$\begin{aligned} \arg \min_{t, x} \quad & \sum_{i = 1}^{n} {t}_{i} \\ \text{subject to} \quad & - {t}_{i} + {x}_{i} \leq {y}_{i} \; & \forall i \\ & - {t}_{i} -{x}_{i} \leq {y}_{i} \; & \forall i \\ & \boldsymbol{1}^{T} x = 1 \\ & {x}_{i} \geq 0 \; & \forall i \end{aligned}$$

Which is a Linear Programming Problem which can be easily formed and be solved.

Solution 002 - Direct Solution

Remark: This is closely similar to the solution of @Maziar Sanjabi just wrote it in a clear way to me.

The objective function is given by $ \sum_{i = 1}^{n} \left| {x}_{i} - {y}_{i} \right| $ which is optimized by each term.

For any term where $ {y}_{i} \leq 0 $ the best, due to constraints, one can do is set $ {x}_{i} = 0 $ to minimize the cost of this term.

By defining $ \mathcal{S} = \left\{ i \mid {y}_{i} > 0 \right\} $ and $ s = \sum_{i \in \mathcal{S}} {y}_{i} $ we can analyze 3 cases:

Case I: $ s = 1 $

In this case we set $ {x}_{j} = {y}_{j}, \; \forall j \in \mathcal{S} $. Since $ x $ then obeys all constraints this is the optimal solution.

This is the only case with single solution.

Case II: $ s < 1 $

First we set $ {x}_{j} = {y}_{j}, \; \forall j \in \mathcal{S} $. Then, in this case we have a budget $ r = 1 - s $ to spend. In this case spending means adding value to elements of $ x $.
Since the objective function is sum of absolute value, we can spread this budget any way we want as long as we keep $ {x}_{i} \geq {y}_{i}, \; \forall i $. One way is to add to each element of $ x $ the same amount, we can spread the amount on $ \mathcal{S} $ or even add it to a single element of $ x $. Since in any combination its contribution will be the same (Element wise absolute function).
Performance wise it is probably the best to add it to a single element.

Clearly in this case there are many options for a solution.

Case III: $ s > 1 $

First we set $ {x}_{j} = {y}_{j}, \; \forall j \in \mathcal{S} $. Then, in this case we have a budget $ r = s - 1 $ to spend. In this case spending means subtracting value from elements of $ x $.
Since we can't subtract any value from $ i \notin \mathcal{S} $ we must do that only for elements in $ \mathcal{S} $. One simple strategy is to subtract from the first element as much as we can but without making it negative. Then, if still the budget is positive we can move to the next element and so on.

Clearly in this case there are many options for a solution.

Summary

I implemented both methods in MATLAB and verified the code vs. CVX.
The MATLAB Code is accessible in my StackExchange Mathematics Q2477400 GitHub Repository.

Royi
  • 8,711
2

In the case of p=1, unlike the Euclidean norm, the optimizer is not unique. So it does not have good properties and if you think about it a bit you can see that in some cases it will be non-intuitive. But the following procedure should give one of the many possible optimizers: If $y_i\leq0$ set $x_i=0$. Now denote the rest of indices i such that $y_i>0$ as $S$. If $s=\sum_{i\in S} y_i\geq 1$ then simply choose $x_i\leq y_i, ~i\in S$, such that $\sum_{i\in S} x_i=1$ (It is easy to choose a point like that). Any such point would be an optimizer. Now for the case where $s=\sum_{i\in S} y_i <1$, simply choose $x_i\geq y_i, ~i\in S$, such that $\sum_{i\in S} x_i=1$, e.g. $x_i = y_i + \frac{1-s}{|S|}$.

In the case where all the indices in y have non-positive value, then any point on the simplex is an optimizer.