
How do I minimize the following optimization problem?

$$\begin{array}{ll} \text{minimize} & \| \mathbf{A}\mathbf{x} - \mathbf{b} \|^2\\ \text{subject to} & \mathbf{x}^{T} \mathbb{1} = 1\\ & \mathbf{x} \geq 0\end{array}$$

where $\mathbf{x} \in \mathbb{R}^{n}$ and $\mathbf{b} \in \mathbb{R}^{m}$ are vectors and $\mathbf{A}\in \mathbb{R}^{m \times n}$ is a matrix.

The constraints mean that all elements of the vector $\mathbf{x}$ are non-negative and that they sum to $1$.

  • Forward backward, also known as proximal gradient descent, can be used to solve this problem. – Jürgen Sukumaran May 09 '18 at 16:19
  • @TonyS.F. Can you give more details or some reference? Thanks – jason May 09 '18 at 16:24
  • You forgot to mention nonnegativity in the title, that is quite important here. – LinAlg May 09 '18 at 17:18
  • The number of possible methods is huge. If you provide us with some more detail, e.g. how large your typical $n$ and $m$ are, whether you value simplicity over speed, or whether you have a way to invert $A^T A$ efficiently, we would be able to recommend the most suitable method. – Alex Shtoff May 09 '18 at 19:18
  • @Alex, This is a good point: $m$ and $n$ are large. Also, $A$ is orthogonal, $AA^T = I$; I don't show this in the problem since my goal is to solve for $x$. – jason May 09 '18 at 19:29
  • If they are large enough so that inverting $A^T A$ might take up a huge amount of resources, then the methods suggested below by Tony are excellent. – Alex Shtoff May 09 '18 at 19:36
  • Related - https://math.stackexchange.com/questions/2935650. – Royi Mar 18 '20 at 21:06
  • Related - https://math.stackexchange.com/questions/3350835. – Royi Mar 18 '20 at 21:07
  • Related - https://math.stackexchange.com/questions/953740. – Royi Mar 18 '20 at 21:09

1 Answer


I don't know why you write your vector before your matrix multiplication, so I will do it slightly differently. We can write the problem with the constraint expressed as an indicator function,

$$\min\limits_{x\in\mathbb{R}^n} \|Ax-b\|^2 + i_{\Delta_n} (x)$$

where $\Delta_n = \{x\in \mathbb{R}^n: x_i\geq 0, \sum\limits_{i=1}^n x_i =1\}$ is the unit simplex in $\mathbb{R}^n$ and $i_C(x)$ is the indicator function of the set $C$, equal to $0$ for $x\in C$ and $+\infty$ otherwise.

Our objective function is the sum of a smooth function $f(x) = \|Ax-b\|^2$ and a nonsmooth function $g(x) = i_{\Delta_n} (x)$. Forward backward splitting says we should do gradient descent on the smooth part and then apply the proximal operator of the nonsmooth part,

$$x^{k+1} = \mbox{prox}_{\gamma g}(x^{k} - \gamma \nabla f (x^{k}))$$

where $\gamma$ is a step size determined by the regularity of $f$ (i.e. the Lipschitz constant of $\nabla f$). Here $\nabla f(x) = 2A^{T}(Ax-b)$, whose Lipschitz constant is $L = 2\|A\|_2^2$, so any $\gamma \leq 1/L$ works. The prox of $\gamma g$ is the projection onto $\Delta_n$.

We can interpret this approach as taking a step in the direction given by gradient descent on $f$ and then projecting back onto the constraint set $\Delta_n$.
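For concreteness, here is a minimal NumPy sketch of this iteration, assuming the constant step size $\gamma = 1/L$ from above and the standard sort-based Euclidean projection onto the simplex; the function names are only illustrative.

```python
import numpy as np

def project_onto_simplex(v):
    """Euclidean projection of v onto the unit simplex
    {x : x_i >= 0, sum_i x_i = 1}, via the standard sort-based method."""
    n = v.size
    u = np.sort(v)[::-1]                                 # sort in decreasing order
    css = np.cumsum(u)
    idx = np.arange(1, n + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]   # last index with a positive gap
    theta = (1.0 - css[rho]) / (rho + 1.0)               # shift so the result sums to 1
    return np.maximum(v + theta, 0.0)

def forward_backward(A, b, n_iter=1000):
    """Projected gradient (forward-backward) for min ||Ax - b||^2 over the simplex."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n)                              # feasible starting point
    L = 2.0 * np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of grad f
    gamma = 1.0 / L                                      # safe constant step size
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - b)                   # forward (gradient) step
        x = project_onto_simplex(x - gamma * grad)       # backward (prox/projection) step
    return x
```

Since you mention in the comments that $AA^{T} = I$, the largest singular value of $A$ is $1$, so the step size simplifies to $\gamma = 1/2$.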


You can also solve this problem using Frank-Wolfe, or conditional gradient, methods. This is a class of methods for minimizing a $C^{1,1}$ function (i.e. differentiable with a Lipschitz continuous gradient) over a compact, convex set. Since the constraint set $\Delta_n$ is compact, we are free to try conditional gradient here in the following way,

$$s^{k} = \arg\min\limits_{s\in \Delta_n} \langle \nabla f(x^{k}), s\rangle\\ x^{k+1} = x^{k} - \gamma_k(x^{k}-s^{k})$$

where $\gamma_k\in[0,1]$ is the step size, usually required to lie in $\ell^2\setminus\ell^1$ (square summable but not summable); a common concrete choice is $\gamma_k = 2/(k+2)$.

This method can be seen as linearizing our objective $f$, finding a step direction using this linear minimization oracle (the inner product we minimize), and then taking a convex combination of the step direction and our current iterate. This ensures that we will remain in the constraint set because it is convex.

We can compute the linear minimization oracle easily if the set we are looking at can be written as the convex hull of a finite set of atoms, which happens in our case: $\Delta_n = \mbox{conv}\{e_1, \ldots, e_n\}$. A linear function attains its minimum over $\Delta_n$ at a vertex, so $s^k$ is simply the standard basis vector $e_{i^*}$ with $i^* = \arg\min_i [\nabla f(x^k)]_i$.
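Again as a rough NumPy sketch, assuming the $\gamma_k = 2/(k+2)$ schedule mentioned above (the function name is illustrative):

```python
import numpy as np

def frank_wolfe(A, b, n_iter=1000):
    """Frank-Wolfe (conditional gradient) for min ||Ax - b||^2 over the simplex."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n)                   # feasible starting point
    for k in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - b)        # gradient of the smooth objective
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0              # LMO: vertex e_{i*} with smallest gradient entry
        gamma = 2.0 / (k + 2.0)               # standard step-size schedule
        x = (1.0 - gamma) * x + gamma * s     # convex combination stays in the simplex
    return x
```

Each iterate is a convex combination of the previous iterate and a vertex of $\Delta_n$, so feasibility is maintained without any projection.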

  • I added a solution with code for this case which implements the Frank-Wolfe Algorithm in https://math.stackexchange.com/a/3586172. – Royi Mar 18 '20 at 23:53