There are plenty of ways to define the proximity operator, but loosely speaking:
The proximity operator is a tool from modern optimization that lets you minimize a function algorithmically.
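More precisely, for a proper, closed, convex function $f$, the standard definition is $$\textrm{prox}_f(x)=\arg\min_{y}\,\Big(f(y)+\tfrac{1}{2}\|y-x\|^2\Big),$$ and convexity of $f$ guarantees that this minimizer exists and is unique.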
One nice thing about proximity operators is that they remain perfectly well-defined for convex functions that are nonsmooth. This means that prox-based algorithms (e.g. proximal-point, Douglas-Rachford, forward-backward) can solve problems which gradient-based algorithms (e.g. gradient descent) cannot.
So, for your problem, the gradient of the objective is not defined everywhere, due to the nonsmooth $\ell_1$ penalty; this is where a prox operator comes to the rescue.
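Happily, the prox of the $\ell_1$ penalty itself is known in closed form: it is the componentwise soft-thresholding operator, $$\big(\textrm{prox}_{\lambda\|\cdot\|_1}(x)\big)_i=\operatorname{sign}(x_i)\,\max\{|x_i|-\lambda,\,0\}.$$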
However, another hurdle remains: no one knows how to compute the prox operator of the full objective function in general.
This is where the forward-backward algorithm comes in -- instead of requiring the gradient of the entire objective (which may not exist) or the proximity operator of the entire objective (which is rarely computable), we can minimize this sum of functions using only the gradient of the squared term and the prox of the $\ell_1$ penalty. This is an example of "splitting" -- minimizing an objective function using only operators (e.g. gradient or prox) that concern the individual summands of the objective.
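To make this concrete, here is a minimal NumPy sketch of forward-backward splitting, assuming your problem takes the usual LASSO form $\min_x \tfrac12\|Ax-b\|^2+\lambda\|x\|_1$ (the matrix $A$, vector $b$, and weight $\lambda$ below are placeholders standing in for your data):

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(A, b, lam, n_iter=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by forward-backward splitting."""
    x = np.zeros(A.shape[1])
    # The gradient of the squared term is L-Lipschitz with L = ||A||_2^2,
    # so a fixed step of 1/L guarantees convergence.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                          # forward (gradient) step
        x = soft_threshold(x - step * grad, step * lam)   # backward (prox) step
    return x

# Tiny demo on synthetic data: recover a sparse vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true
x_hat = forward_backward(A, b, lam=0.1)
```

In this form the method is often called ISTA; note that the only prox ever evaluated is the cheap soft-thresholding step, which is exactly the splitting described above.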
The proximity operator also captures the projection operator as a special case: if $C$ is a nonempty closed convex set, and $\iota_{C}$ is its $0$-$\infty$ indicator function, then the proximity operator of $\iota_{C}$ turns out to be the projection operator onto $C$, i.e. $$\textrm{prox}_{\iota_C}=\textrm{proj}_C.$$
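To see why, plug $\iota_C$ into the definition above: the indicator is $0$ on $C$ and $+\infty$ off it, so the minimization collapses onto $C$, $$\textrm{prox}_{\iota_C}(x)=\arg\min_{y}\Big(\iota_C(y)+\tfrac12\|y-x\|^2\Big)=\arg\min_{y\in C}\tfrac12\|y-x\|^2=\textrm{proj}_C(x).$$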