I'm studying proximal operators, but there is something that bothers me quite a lot and I can't find an answer to it.
If $h$ is a closed convex function, then the proximal operator of $h$ (with parameter $t > 0$) is defined by
$$\text{prox}_{th}(\hat x) = \arg \min_x \, h(x) + \frac{1}{2t} \|x - \hat x \|_2^2$$
This question explains the motivation: What Is the Motivation of Proximal Mapping / Proximal Operator?
"A natural strategy is to first reduce the value of $g$ by taking a step in the negative gradient direction, then reduce the value of $h$ by applying the prox-operator of $h$, and repeat." This strategy yields the following iteration:
$$x^{k+1} = \text{prox}_{th}(x^k - t \nabla g(x^k))$$
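In case it helps to see what I mean concretely, here is a small sketch of that iteration as I understand it (this is my own example, taking $g(x) = \tfrac{1}{2}\|Ax - b\|_2^2$ and $h(x) = \lambda \|x\|_1$, so the prox is soft-thresholding; `A`, `b`, `lam`, `t` are made-up names):

```python
import numpy as np

def soft_threshold(v, tau):
    """prox of tau * ||.||_1 : shrink each coordinate toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam, t, n_iter=500):
    """x^{k+1} = prox_{t h}(x^k - t * grad g(x^k)) with g = 0.5*||Ax-b||^2, h = lam*||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                    # gradient step on the smooth part g
        x = soft_threshold(x - t * grad, t * lam)   # prox step on the nonsmooth part h
    return x

# made-up usage, step size 1/L with L the Lipschitz constant of grad g
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
b = rng.standard_normal(50)
x_hat = proximal_gradient(A, b, lam=0.1, t=1.0 / np.linalg.norm(A, 2) ** 2)
```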
I don't understand how this actually minimizes $h$: to me, evaluating the prox still means optimizing a non-differentiable function.
My only partial answer/intuition is that once we have the proximal-operator form, we can use the subgradient optimality condition on $h$ to derive a closed-form solution of the prox problem. But then a second question arises: why can't we use the subgradient optimality condition right away (much like in subgradient methods)?
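To make that partial intuition concrete, here is the kind of derivation I have in mind (my own example, with $h(x) = \lambda \|x\|_1$). The subgradient optimality condition for the prox problem is

$$0 \in \partial h(x) + \frac{1}{t}(x - \hat x),$$

which for the $\ell_1$ norm separates over coordinates and gives the closed-form soft-thresholding solution

$$\big[\text{prox}_{th}(\hat x)\big]_i = \operatorname{sign}(\hat x_i)\,\max\!\big(|\hat x_i| - t\lambda,\ 0\big).$$

So the prox subproblem is solved exactly, not iteratively; my question is why this is preferable to applying the subgradient condition to the original problem directly.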