The previous answer contained a crucial mistake (thanks to the users in the comments for pointing it out) and became a mess of edits, so here's a new, correct one. Denote $\|x\|_{2,w}^2 = \sum_{i=1}^n w_ix_i^2$, where the weights $w_i$ are positive. Define
$$f(x) = \lambda\sqrt{ \sum_{i = 1}^{n} {w}_{i} {x}_{i}^{2} } + \frac{1}{2} {\left\| x - y \right\|}_{2}^{2}.$$
This is a convex function, being the sum of a (scaled) norm and a translate of the squared $\ell_2$ norm. It is not differentiable everywhere, but it is continuous, so we can essentially replace the gradient by the subgradient (subdifferential), which is defined as
$$\partial f(x) = \{v\in\mathbb{R}^n \ | \ f(z) \geq f(x) + \langle v, z-x\rangle \text{ for all $z$}\}.$$
Then standard facts from convex analysis tell us that:
- $x$ is a minimizer of $f$ if and only if $0\in \partial f(x)$;
- if $f$ is differentiable at $x$, the subgradient $\partial f(x)$ is a singleton containing the gradient of $f$ at $x$;
- given continuous convex functions $f,g$, we have $\partial (f+g)(x) = \partial f(x) + \partial g(x)$, the Minkowski sum of the two subdifferentials (which is already a convex set).
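As a quick illustration of these facts (a standard one-dimensional example, not needed for the computation below): for $g(t) = |t|$ one has $\partial g(t) = \{\operatorname{sign}(t)\}$ when $t\neq 0$ and $\partial g(0) = [-1,1]$, so the optimality condition $0\in\partial g(0)$ recovers the fact that $0$ minimizes $|t|$.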
Now we compute the subgradient of $f$. When $x\neq0$, both summands are differentiable and the subdifferential is the singleton
$$ \partial f(x) = \left\{\frac{\lambda}{\|x\|_{2,w}}Wx + (x-y)\right\},$$
where $W$ is the diagonal matrix with the $w_i$'s on its diagonal.
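If one wants to double-check this formula numerically, here is a minimal finite-difference sketch in Python (my own illustration; the dimensions, weights and test point are arbitrary):

```python
import numpy as np

# Sanity check of grad f(x) = lam * W x / ||x||_{2,w} + (x - y) at a point x != 0.
rng = np.random.default_rng(0)
n, lam = 5, 0.7
w = rng.uniform(0.5, 2.0, size=n)   # positive weights w_i
x = rng.normal(size=n)              # a generic point x != 0
y = rng.normal(size=n)              # the data vector y

f = lambda z: lam * np.sqrt(np.sum(w * z**2)) + 0.5 * np.sum((z - y)**2)
grad = lam * w * x / np.sqrt(np.sum(w * x**2)) + (x - y)

# Central differences should match `grad` to roughly 1e-6.
eps = 1e-6
e = np.eye(n)
fd = np.array([(f(x + eps * e[i]) - f(x - eps * e[i])) / (2 * eps) for i in range(n)])
print(np.max(np.abs(fd - grad)))
```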
The case $x=0$ is more interesting. The subdifferential of $\lambda\|\cdot\|_{2,w}$ at $0$ is by definition
$$\{v \ | \ \langle v, z\rangle \leq \lambda \|z\|_{2,w}\text{ for all $z$}\},$$
which exactly means $\|v\|_{2,w}^*\leq \lambda$, where $\|\cdot\|_{2,w}^*$ is the dual norm of the weighted $\ell_2$ norm; by Cauchy-Schwarz, $\|v\|_{2,w}^* = \sqrt{v^T W^{-1}v}$. Adding the gradient of $\frac{1}{2}\|x-y\|_2^2$ at $x=0$, which is $-y$, the total subdifferential at zero is
$$\partial f(0) = \{ v - y \ | \ \|v\|_{2,w}^*\leq \lambda\}.$$
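As an aside, the dual-norm identity above can be checked numerically; the following is a small sketch (my own illustration, with arbitrary data), using the fact that Cauchy-Schwarz is attained at $z^\star = W^{-1}v/\|W^{-1}v\|_{2,w}$:

```python
import numpy as np

# Numeric check of the dual-norm identity: the maximizer of <v, z> over
# ||z||_{2,w} <= 1 is z* = W^{-1} v / ||W^{-1} v||_{2,w}, and <v, z*>
# equals sqrt(v^T W^{-1} v).
rng = np.random.default_rng(1)
w = rng.uniform(0.5, 2.0, size=5)       # positive weights
v = rng.normal(size=5)

dual = np.sqrt(np.sum(v**2 / w))        # sqrt(v^T W^{-1} v)
z_star = (v / w) / np.sqrt(np.sum(w * (v / w)**2))
print(np.dot(v, z_star), dual)          # the two printed values coincide
```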
Now we must see when $0\in\partial f(x)$.
- The condition $0\in\partial f(0)$ means that $y = v$ for some $v$ with $\|v\|_{2,w}^*\leq \lambda$, i.e. $\|y\|_{2,w}^* \leq \lambda$. So, when $\|y\|_{2,w}^* \leq \lambda$, $0$ is a global minimizer of $f$.
- The condition $0\in\partial f(x)$ for $x\neq 0$ is that
$$ y = \left(I + \frac{\lambda}{\|x\|_{2,w}}W\right) x$$
One can see, by taking the $\|\cdot\|_{2,w}^*$ norm of the right-hand side, that it is $>\lambda$ whenever $x\neq 0$ and all the weights $w_i$ are positive; thus the two cases for a global minimizer are mutually exclusive. Furthermore, writing the above equation componentwise, $y_i = \left(1 + \frac{\lambda w_i}{t}\right)x_i$ with $t = \|x\|_{2,w}$, one can solve for the minimizer: $x_i = \frac{t\, y_i}{t + \lambda w_i}$, where $t>0$ is determined by the scalar equation $\sum_{i=1}^n \frac{w_i y_i^2}{(t+\lambda w_i)^2} = 1$.
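For completeness, here is a minimal Python sketch of the resulting proximal operator (the function name `prox_weighted_l2` and the use of `scipy.optimize.brentq` for the scalar equation are my own choices, not part of the derivation above):

```python
import numpy as np
from scipy.optimize import brentq

def prox_weighted_l2(y, w, lam):
    """Minimize lam * sqrt(sum_i w_i x_i^2) + 0.5 * ||x - y||_2^2 (w_i > 0, lam > 0)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Case 1: if ||y||_{2,w}^* = sqrt(y^T W^{-1} y) <= lam, the minimizer is x = 0.
    if np.sqrt(np.sum(y**2 / w)) <= lam:
        return np.zeros_like(y)

    # Case 2: x_i = t*y_i/(t + lam*w_i), where t = ||x||_{2,w} solves h(t) = 0.
    def h(t):
        return np.sum(w * y**2 / (t + lam * w)**2) - 1.0

    # h is decreasing, h(0) > 0 in this branch and h(||y||_{2,w}) < 0,
    # so the root is bracketed by [0, ||y||_{2,w}].
    t = brentq(h, 0.0, np.sqrt(np.sum(w * y**2)))
    return t * y / (t + lam * w)
```

As a sanity check, for equal weights $w_i \equiv w$ the scalar equation has the closed-form root $t = \sqrt{w}\,\|y\|_2 - \lambda w$, and the formula reduces to the familiar block soft-thresholding operator $x = \left(1 - \frac{\lambda\sqrt{w}}{\|y\|_2}\right)_+ y$.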