The previous answer contained a crucial mistake (thanks to the users in the comments for pointing it out) and became a mess of edits, so here's a new, correct one. Denote $\|x\|_{2,w}^2 = \sum_{i=1}^n w_ix_i^2$, where the weights $w_i$ are positive. Define
$$f(x) = \lambda\sqrt{ \sum_{i = 1}^{n} {w}_{i} {x}_{i}^{2} } + \frac{1}{2} {\left\| x - y \right\|}_{2}^{2}.$$
This is a convex function, being the sum of a (scaled) norm and a translate of the squared $\ell_2$ norm. It is not differentiable everywhere, but it is continuous, so we can essentially replace the gradient by the subgradient (subdifferential), which is defined as
$$\partial f(x) = \{v\in\mathbb{R}^n \ | \ f(z) \geq f(x) + \langle v, z-x\rangle \text{ for all $z$}\}.$$
Then standard facts from convex analysis tell us that:
- $x$ is a minimizer of $f$ if and only if $0\in \partial f(x)$;
- if $f$ is differentiable at $x$, the subgradient $\partial f(x)$ is a singleton containing the gradient of $f$ at $x$;
- given continuous convex functions $f,g$, we have $\partial (f+g)(x) = \partial f(x) + \partial g(x)$, the Minkowski sum of the two subdifferentials (which is already a convex set).
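As a quick illustration of these facts (a standard one-dimensional example, not needed for the computation below): for $g(t) = |t|$ one has $\partial g(t) = \{\operatorname{sign}(t)\}$ when $t\neq 0$ and $\partial g(0) = [-1,1]$, so the optimality condition $0\in\partial g(0)$ recovers the fact that $0$ minimizes $|t|$.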
Now we compute the subgradient of $f$. When $x\neq0$, both summands are differentiable and the subdifferential is the singleton
$$ \partial f(x) = \left\{\frac{\lambda}{\|x\|_{2,w}}Wx + (x-y)\right\},$$
where $W$ is the diagonal matrix with the $w_i$'s on its diagonal.
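If one wants to double-check this formula numerically, here is a minimal finite-difference sketch in Python (my own illustration; the dimensions, weights and test point are arbitrary):

```python
import numpy as np

# Sanity check of grad f(x) = lam * W x / ||x||_{2,w} + (x - y) at a point x != 0.
rng = np.random.default_rng(0)
n, lam = 5, 0.7
w = rng.uniform(0.5, 2.0, size=n)   # positive weights w_i
x = rng.normal(size=n)              # a generic point x != 0
y = rng.normal(size=n)              # the data vector y

f = lambda z: lam * np.sqrt(np.sum(w * z**2)) + 0.5 * np.sum((z - y)**2)
grad = lam * w * x / np.sqrt(np.sum(w * x**2)) + (x - y)

# Central differences should match `grad` to roughly 1e-6.
eps = 1e-6
e = np.eye(n)
fd = np.array([(f(x + eps * e[i]) - f(x - eps * e[i])) / (2 * eps) for i in range(n)])
print(np.max(np.abs(fd - grad)))
```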
The case $x=0$ is more interesting. The subdifferential of $\lambda\|\cdot\|_{2,w}$ at $0$ is by definition
$$\{v \ | \ \langle v, z\rangle \leq \lambda \|z\|_{2,w}\text{ for all $z$}\},$$
which exactly means $\|v\|_{2,w}^*\leq \lambda$, where $\|\cdot\|_{2,w}^*$ is the dual norm of the weighted $\ell_2$ norm; by Cauchy-Schwarz, $\|v\|_{2,w}^* = \sqrt{v^T W^{-1}v}$. Adding the gradient of $\frac{1}{2}\|x-y\|_2^2$ at $x=0$, which is $-y$, the total subdifferential at zero is
$$\partial f(0) = \{ v - y \ | \ \|v\|_{2,w}^*\leq \lambda\}.$$
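As an aside, the dual-norm identity above can be checked numerically; the following is a small sketch (my own illustration, with arbitrary data), using the fact that Cauchy-Schwarz is attained at $z^\star = W^{-1}v/\|W^{-1}v\|_{2,w}$:

```python
import numpy as np

# Numeric check of the dual-norm identity: the maximizer of <v, z> over
# ||z||_{2,w} <= 1 is z* = W^{-1} v / ||W^{-1} v||_{2,w}, and <v, z*>
# equals sqrt(v^T W^{-1} v).
rng = np.random.default_rng(1)
w = rng.uniform(0.5, 2.0, size=5)       # positive weights
v = rng.normal(size=5)

dual = np.sqrt(np.sum(v**2 / w))        # sqrt(v^T W^{-1} v)
z_star = (v / w) / np.sqrt(np.sum(w * (v / w)**2))
print(np.dot(v, z_star), dual)          # the two printed values coincide
```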
Now we must see when $0\in\partial f(x)$.
- The condition $0\in\partial f(0)$ means that $y = v$ for some $v$ with $\|v\|_{2,w}^*\leq \lambda$, i.e. $\|y\|_{2,w}^* \leq \lambda$. So, when $\|y\|_{2,w}^* \leq \lambda$, $0$ is a global minimizer of $f$.
- The condition $0\in\partial f(x)$ for $x\neq 0$ is that
$$ y = \left(I + \frac{\lambda}{\|x\|_{2,w}}W\right) x$$
One can see, by taking the $\|\cdot\|_{2,w}^*$ norm of the right-hand side, that it is $>\lambda$ whenever $x\neq 0$ and all the weights $w_i$ are positive; thus the two cases for a global minimizer are mutually exclusive. Furthermore, writing the above equation componentwise, $y_i = \left(1 + \frac{\lambda w_i}{t}\right)x_i$ with $t = \|x\|_{2,w}$, one can solve for the minimizer: $x_i = \frac{t\, y_i}{t + \lambda w_i}$, where $t>0$ is determined by the scalar equation $\sum_{i=1}^n \frac{w_i y_i^2}{(t+\lambda w_i)^2} = 1$.
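For completeness, here is a minimal Python sketch of the resulting proximal operator (the function name `prox_weighted_l2` and the use of `scipy.optimize.brentq` for the scalar equation are my own choices, not part of the derivation above):

```python
import numpy as np
from scipy.optimize import brentq

def prox_weighted_l2(y, w, lam):
    """Minimize lam * sqrt(sum_i w_i x_i^2) + 0.5 * ||x - y||_2^2 (w_i > 0, lam > 0)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Case 1: if ||y||_{2,w}^* = sqrt(y^T W^{-1} y) <= lam, the minimizer is x = 0.
    if np.sqrt(np.sum(y**2 / w)) <= lam:
        return np.zeros_like(y)

    # Case 2: x_i = t*y_i/(t + lam*w_i), where t = ||x||_{2,w} solves h(t) = 0.
    def h(t):
        return np.sum(w * y**2 / (t + lam * w)**2) - 1.0

    # h is decreasing, h(0) > 0 in this branch and h(||y||_{2,w}) < 0,
    # so the root is bracketed by [0, ||y||_{2,w}].
    t = brentq(h, 0.0, np.sqrt(np.sum(w * y**2)))
    return t * y / (t + lam * w)
```

As a sanity check, for equal weights $w_i \equiv w$ the scalar equation has the closed-form root $t = \sqrt{w}\,\|y\|_2 - \lambda w$, and the formula reduces to the familiar block soft-thresholding operator $x = \left(1 - \frac{\lambda\sqrt{w}}{\|y\|_2}\right)_+ y$.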