
I asked how to solve this optimization problem here. I found this approach by combining @Royi's idea from his answer with the KKT conditions. Personally, I feel my formulation is clearer and easier to understand.

Could you please verify whether my proof is correct or contains a logical mistake? Thank you so much!

Let $[\![ p ]\!]:=\{1,\ldots,p\}$ and let $(x_1,\ldots,x_p) \in \mathbb R^p$ be given. Solve the constrained optimization problem in the variable $y = (y_1,\ldots,y_p) \in \mathbb R^p$: $$\begin{align*} \min_y &\quad \frac{1}{2}\sum_{i=1}^p (y_i-x_i)^2 \\ \text{s.t.} &\quad \sum_{i=1}^p y_i - 1 = 0\\ &\quad\forall i \in [\![ p ]\!]: -y_i \le 0 \end{align*}$$

$\textbf{My attempt}$ Define $$\begin{aligned} f(y) &= \frac{1}{2}\sum_{i=1}^p (y_i-x_i)^2 \\ h(y) &= \sum_{i=1}^p y_i - 1 \\ \forall i \in [\![ p ]\!]: g_i(y) &= -y_i \end{aligned}$$ Here $f$ and each $g_i$ are convex and $h$ is affine. Let $a =(1/p, \ldots, 1/p)$. Then $h(a)=0$ and $g_i(a) <0$ for all $i \in [\![ p ]\!]$. It follows that Slater's condition is satisfied. By the Karush-Kuhn-Tucker conditions, we have $$\begin{aligned} \begin{cases} \forall i \in [\![ p ]\!]:\mu_i &\ge 0 \\ \forall i \in [\![ p ]\!]: g_i(y) &\le 0\\ h(y) &=0 \\ \forall i \in [\![ p ]\!]:\mu_i g_i(y)&=0 \\ \nabla f (y)- \lambda\nabla h (y)+ \sum_{i=1}^p \mu_i \nabla g_i (y) &=0 \end{cases} &\iff \begin{cases} \forall i \in [\![ p ]\!]:\mu_i &\ge 0 \\ \forall i \in [\![ p ]\!]:-y_i &\le 0\\ \sum_{i=1}^p y_i - 1&=0 \\ \forall i \in [\![ p ]\!]: -\mu_i y_i &=0 \\ \forall i \in [\![ p ]\!]: (y_i - x_i) -\lambda - \mu_i &= 0 \end{cases} \end{aligned}$$ where the last line of the right-hand system follows from $\partial f/\partial y_i = y_i - x_i$, $\nabla h(y) = (1,\ldots,1)$, and $\nabla g_i(y) = -e_i$.
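For reference, the stationarity condition above can be packaged as the gradient of a Lagrangian (with the sign convention on $\lambda$ that matches the system above): $$\mathcal L(y,\lambda,\mu) = \frac{1}{2}\sum_{i=1}^p (y_i-x_i)^2 - \lambda\Big(\sum_{i=1}^p y_i - 1\Big) - \sum_{i=1}^p \mu_i y_i, \qquad \frac{\partial \mathcal L}{\partial y_i} = (y_i - x_i) - \lambda - \mu_i.$$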

Stationarity gives $y_i = x_i + \lambda + \mu_i$. If $x_i+\lambda = 0$, then $y_i = \mu_i$, and complementary slackness $\mu_i y_i = 0$ forces $y_i = \mu_i = 0$, so $y_i = (x_i+\lambda)_+$. If $x_i+\lambda > 0$, then $y_i = x_i + \lambda + \mu_i \ge x_i + \lambda > 0$, hence $\mu_i = 0$ and $y_i = x_i + \lambda = (x_i+\lambda)_+$. If $x_i+\lambda < 0$, then $\mu_i = 0$ would give $y_i = x_i + \lambda < 0$, contradicting $y_i \ge 0$; hence $\mu_i > 0$, and complementary slackness gives $y_i = 0 = (x_i+\lambda)_+$. In all cases, $y_i = (x_i+\lambda)_+$.

Then $\sum_{i=1}^p y_i - 1=0 \iff \sum_{i=1}^p (x_i+\lambda)_+ - 1=0$. Notice that $(x_i+\lambda)_+ = \max \{x_i+\lambda,0\}$ is continuous in $\lambda$ for each $i \in [\![ p ]\!]$, so $\psi(\lambda) = \sum_{i=1}^p (x_i+\lambda)_+ - 1$ is continuous in $\lambda$. Let $\alpha = -\max_{i \in [\![ p ]\!]}|x_i|$ and $\beta =1+ \max_{i \in [\![ p ]\!]}|x_i|$. Then every term $(x_i+\alpha)_+$ vanishes, so $\psi(\alpha)=-1<0\le\psi(\beta)$. By the Intermediate Value Theorem, the equation $\psi(\lambda)=0$ has a solution in $[\alpha,\beta]$. Moreover, $\psi$ is nondecreasing, and at any root $\sum_{i=1}^p (x_i+\lambda)_+ = 1 > 0$, so at least one term is active there and increases strictly; hence the root is unique. Numerically, we can find it by bisection on the interval $[\alpha , \beta]$.
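To make the last step concrete, here is a minimal numerical sketch of this bisection in Python (the function name and tolerance are my own choices, not part of the proof):

```python
import numpy as np

def project_simplex_bisection(x, tol=1e-10):
    """Project x onto the probability simplex by solving
    psi(lam) = sum_i (x_i + lam)_+ - 1 = 0 with bisection,
    following the proof above."""
    x = np.asarray(x, dtype=float)
    psi = lambda lam: np.maximum(x + lam, 0.0).sum() - 1.0
    alpha = -np.abs(x).max()        # psi(alpha) = -1 < 0
    beta = 1.0 + np.abs(x).max()    # psi(beta) >= 0
    while beta - alpha > tol:
        mid = 0.5 * (alpha + beta)
        if psi(mid) < 0.0:
            alpha = mid             # root lies to the right of mid
        else:
            beta = mid              # root lies at or to the left of mid
    lam = 0.5 * (alpha + beta)
    return np.maximum(x + lam, 0.0)  # y_i = (x_i + lambda)_+

x = np.array([0.5, -1.0, 2.0])
y = project_simplex_bisection(x)
print(y, y.sum())  # expect [0, 0, 1] and sum ~ 1
```

For example, with $x = (0.5, -1, 2)$ the root is $\lambda = -1$, giving $y = (0, 0, 1)$.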

Akira
  • This optimisation problem can be solved in at most $n$ steps. You don't need Newton's method. The key is that projection onto a convex set $C$ is the same as projecting onto the affine hull first and then projecting onto $C$. When you project onto the affine hull you can drop inactive constraints (at least one, if not optimal) and repeat. (A sketch of this idea appears after these comments.) – copper.hat Feb 28 '20 at 17:57
  • Thank you @copper.hat! I got your point about the complexity of the algorithm. Do you feel my proof is fine? – Akira Feb 28 '20 at 18:02
  • I am not exactly sure what you are proving? – copper.hat Feb 28 '20 at 18:08
  • Ah @copper.hat I meant my solution/approach to solve the system of equations from KKT. – Akira Feb 28 '20 at 18:14
  • Sorry, I might be slow this morning, but I am not seeing a method above. The existence of a solution is known because the cost is continuous and the feasible set is compact. – copper.hat Feb 28 '20 at 18:26
  • @copper.hat I meant am I correct that we solve the minimization problem by solving $\sum_{i=1}^p (x_i+\lambda)_+ - 1=0$ ;) – Akira Feb 28 '20 at 18:34
  • Hi @Royi, in this question, I ask for proof verification. It's not the same as the other one. – Akira Feb 28 '20 at 19:06
  • But your solution is exactly what I did in https://math.stackexchange.com/questions/2402504. So I am not sure what you're doing. – Royi Feb 28 '20 at 19:07
  • Honestly, I'm unable to follow your logic in that thread @Royi. I cannot understand "The trick is to leave the non-negativity constraint implicit" in my understanding of the KKT theorem. I meant your solution does not match how I understand KKT. – Akira Feb 28 '20 at 19:12
  • But this is exactly what you do here :-). – Royi Feb 28 '20 at 19:13
  • @Royi but I don't leave any constraint implicit ^^. I just wrote down the KKT system of equations and solved it. Of course, how I solved it is inspired by your expression $(x_i+\lambda)_+$ ;) – Akira Feb 28 '20 at 19:20
  • 1
    I need more time to look at it, your equation may be correct. However it is not differentiable, so you cannot use Newton's method. Also, there is a simpler way, but I need some time to resurrect some old memories after I finish work. – copper.hat Feb 28 '20 at 19:31
  • Ah @copper.hat, I meant to use the Intermediate Value Theorem (bisection) to solve the equation $\psi(\lambda)=0$ on the interval $[\alpha , \beta]$. – Akira Feb 28 '20 at 19:40
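copper.hat's comments point to a finite, Newton-free procedure. A minimal sketch of one standard realization of that idea, the sorting-based projection onto the probability simplex (this is my reading of the comment, not code from this thread; the function name is mine):

```python
import numpy as np

def project_simplex_sort(x):
    """Exact projection of x onto the probability simplex in O(p log p).
    Sort descending, find the largest k whose shifted top-k entries stay
    positive, then shift everything by lambda and clip at zero. This is
    one concrete version of 'project onto the affine hull, then drop the
    coordinates that come out negative, and repeat'."""
    u = np.sort(np.asarray(x, dtype=float))[::-1]   # descending order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = k[u + (1.0 - css) / k > 0.0][-1]          # largest valid k
    lam = (1.0 - css[rho - 1]) / rho                # the multiplier lambda
    return np.maximum(x + lam, 0.0)                 # y_i = (x_i + lambda)_+
```

Both routes recover the same $\lambda$; this one finds it exactly in finitely many steps instead of by bisection.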
