0

I need to solve a rather simple optimization problem; however, the structure of the problem makes out-of-the-box commercial solvers fail. The problem is:

$$\begin{aligned} &\min_{x_i} \left\| \sum_{i=1}^{10} x_i \mathbf{a}_i - \mathbf{b} \right\|_2^2 \\ &\text{s.t.} \quad \sum_{i=1}^{10} x_i = 1, \; x_i \geq 0 \end{aligned}$$

The decision variables are the scalars $x_i$, and typically the problem will have 10 of them. The complexity comes from the fact that $\mathbf{a}_i$ and $\mathbf{b}$ are VERY LONG constant vectors, ranging from 10,000 to 100,000 elements and possibly more. Without going too much into details, black-box solvers fail because they expand the decision variables over these long vectors, converting a 10-dimensional problem into a 100K-variable one.

Since the problem has a simple structure (only complicated by the constant vectors), I think interior point algorithms can perform well here. Also, the Hessian is very simple and easy to compute: its $ij$-th entry is simply $\langle \mathbf{a}_i, \mathbf{a}_j \rangle$, and it has size $10 \times 10$. I'm very familiar with optimization, but my experience is mostly with first-order algorithms; I'm almost clueless when it comes to IPMs and am not sure how to implement an appropriate algorithm. Does anyone have a ready script I can use? Or any other tips that might help?
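
(For reference, the Gram-matrix Hessian I mean is cheap to form once in numpy; a minimal sketch with placeholder random data, where $A$ stacks the $\mathbf{a}_i$ as columns:)

import numpy as np

# Placeholder data: A stacks the constant vectors a_1..a_10 as columns.
n, k = 100_000, 10
A = np.random.uniform(size=(n, k))

# Hessian of ||A x - b||^2: H[i, j] = 2 * <a_i, a_j>, a constant 10 x 10
# matrix, so it only needs to be computed once.
H = 2 * A.T @ A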

iarbel84
  • 1,513
  • Do you really have 10 different $b$ vectors, or is the subscript on $b$ a typo? – prubin Jan 07 '21 at 23:53
  • @prubin it's a typo, thanks for pointing that out. Do you know of an algorithm to solve the problem? – iarbel84 Jan 08 '21 at 07:57
  • 1
    For commercial SOCP or QP solvers a simple regression problem like this with 100K variables is a piece of cake unless you have extremely bad scaling or similar numerical issues caused by your data. You should contact the support of the solver for help if you experience such issues. – Michal Adamaszek Jan 08 '21 at 07:59
  • Also, somehow I missed it at first reading, this is just a tiny QP with 10 variables. Why don't you input it like that? You can compute the coefficients in the objective yourself. – Michal Adamaszek Jan 08 '21 at 08:07
  • @MichalAdamaszek believe me, I tried. When feeding the problem to CVXPY and trying to solve with CVXOPT and Gurobi, it failed. I also tried solving it directly through Gurobi Python API, and failed. I'm frustrated as this is really a simple quadratic problem, as you pointed out. Something about the structure of this problem isn't working for out-of-box solvers, it's like too many redundant arithmetic operations are made. That's why I'm trying to code a solution – iarbel84 Jan 08 '21 at 08:17
  • I would suspect the objective value becomes humongous and that causes the issue then. Hard to say without the data and code. – Michal Adamaszek Jan 08 '21 at 09:49
  • Just to close the loop, I would have suggested a QP solver, as Michal Adamaszek did. If none were available, I might replace $x_{10}$ with $1 - x_1 - \dots - x_9$ (to make all constraints inequalities), then add a barrier function to the objective and try gradient descent. – prubin Jan 08 '21 at 18:00
  • My answer is the only one that addresses the scale of the problem; could you mark it as the answer? – LinAlg Feb 23 '21 at 15:04

3 Answers

3

The problem is given by:

$$\begin{aligned} &\min_{x_i} \left\| \sum_{i=1}^{N} x_i \mathbf{a}_i - \mathbf{b} \right\|_2^2 \\ &\text{s.t.} \quad \sum_{i=1}^{N} x_i = 1, \; x_i \geq 0 \end{aligned}$$

The problem is equivalent to:

$$ \begin{alignat*}{3} \arg \min_{x} & \quad & \left\| A \boldsymbol{x} - \hat{\boldsymbol{b}} \right\|_{2}^{2} \\ \text{subject to} & \quad & {x}_{i} \geq 0 \\ & \quad & \boldsymbol{x}^{T} \boldsymbol{1} = 1 \end{alignat*} $$

Where $ A = \left[ \boldsymbol{a}_{1}, \boldsymbol{a}_{2}, \ldots, \boldsymbol{a}_{N} \right] $ stacks the vectors as columns and $ \hat{\boldsymbol{b}} = \boldsymbol{b} $.

This is basically non-negative least squares (NNLS) with a linear equality constraint.
You could easily solve it with many least-squares solvers.

You could also utilize orthogonal projection onto the unit simplex, with some acceleration (FISTA-like), in the projected gradient descent framework, and get a low-memory, pretty fast solver even for large problems.

When using projected gradient descent you should pre-calculate $ {A}^{T} A $ and $ {A}^{T} \hat{\boldsymbol{b}} $; then the calculation per iteration is pretty small (see the sketch below).
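
A minimal sketch of that approach in numpy, with placeholder random data (plain projected gradient without the FISTA acceleration; the projection is the standard sort-based Euclidean projection onto the unit simplex):

import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto {x : x >= 0, sum(x) = 1}, sort-based.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

n, k = 100_000, 10
A = np.random.uniform(size=(n, k))    # columns are the a_i (placeholder data)
b = np.random.uniform(size=n)

G = A.T @ A                           # 10 x 10, computed once
c = A.T @ b                           # length 10, computed once
L = 2.0 * np.linalg.eigvalsh(G)[-1]   # Lipschitz constant of the gradient

x = np.full(k, 1.0 / k)               # feasible starting point
for _ in range(1000):
    grad = 2.0 * (G @ x - c)          # gradient of ||A x - b||^2 via G and c
    x = project_simplex(x - grad / L)

After the one-time $O(n k^2)$ precomputation, each iteration only touches the $10 \times 10$ data.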

Royi
  • 8,711
  • Are you sure it's equivalent? Notice that in the problem I posted, the summation is inside the norm, not summing over norms. Also, there is only one $\mathbf{b}$ vector, there was a typo before with the subscript in it – iarbel84 Jan 08 '21 at 08:18
  • BTW I'm currently solving it with FISTA. However, the derivative is $\frac{\partial}{\partial x_j} = \sum_{k=1}^n \left( \mathbf{a}_j \ast \sum_{i=1}^{10} x_i \mathbf{a}_i \right)_k$, meaning A LOT of arithmetic operations have to take place at each step. In contrast, the Hessian does not involve the problem variables, so it can be calculated only once; this is a significant improvement, plus an interior point method will take fewer iterations. That's why I would like to explore that option – iarbel84 Jan 08 '21 at 08:38
  • Yes. If the norm were inside, then the equivalent vector would be the mean instead of the sum. You can choose to use 2nd-order methods, which will require the Hessian, which is easy. Again, usually you'd use least-squares solvers for this. Accelerated projected gradient descent will also be fairly decent in my opinion. – Royi Jan 08 '21 at 11:36
  • OK. By removing the subscript altogether, the answer isn't valid any more. I will update it. – Royi Jan 08 '21 at 11:50
  • I updated the answer to remove the subscript. – Royi Jan 08 '21 at 11:56
  • THANK YOU! I totally missed that option. That was very helpful, as now I was able to properly feed the data into a solver – iarbel84 Jan 08 '21 at 13:54
  • @iarbel84, why did you mark the other answer as the answer? My answer is the one which solved the problem; the other solution explicitly uses my form. – Royi Jan 08 '21 at 22:46
2

Since you mention CVXPY, I wrote some quick code and it works without any problem. $A, b$ are as in the other answer.

import cvxpy as cp
import numpy as np

# Problem sizes from the question (placeholder random data).
N = 500000
K = 10

A = np.random.uniform(size=(N, K))
b = np.random.uniform(size=(N))

x = cp.Variable(K)
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b)), [cp.sum(x) == 1, x >= 0])

prob.solve(solver=cp.MOSEK, verbose=True)

  • Definitely did the trick. I was very caught up in an idea I had in my mind, so I missed the option to formulate the problem this way. It's working well now, and for sure I can still optimize it a bit – iarbel84 Jan 08 '21 at 13:55
  • I have another question now, maybe you could help? I found that ECOS is the best option to solve the problem. I don't require a very high precision for the solution, $1e-3$ is enough. I tried setting "abstol" (through cvxpy) to that level but it just ignores that option. Also "reltol" didn't work. Do you have experience dealing with this kind of problem? – iarbel84 Jan 08 '21 at 14:36
1

The dimensions of $A$ and $b$ are not relevant if you do some preprocessing:

$$\|Ax - b\|_2^2 = x^T(A^TA)x - 2b^TAx + b^Tb$$

You can compute $A^TA$ and $A^Tb$, which are $10 \times 10$ and $10 \times 1$, and specify the objective function with the formula on the right-hand side, so the optimization algorithm never sees the 100k data points. A sketch of this preprocessing is below.
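
A minimal sketch of that preprocessing with CVXPY and placeholder random data (cp.quad_form expresses the $x^T(A^TA)x$ term; the constant $b^Tb$ is kept only so the optimal value matches the original objective):

import cvxpy as cp
import numpy as np

N, K = 100000, 10
A = np.random.uniform(size=(N, K))   # placeholder data
b = np.random.uniform(size=N)

# One-time preprocessing: the solver only ever sees 10 x 10 / 10 x 1 data.
Q = A.T @ A      # Gram matrix, positive semidefinite by construction
c = A.T @ b
offset = b @ b   # scalar constant; does not affect the minimizer

x = cp.Variable(K)
objective = cp.quad_form(x, Q) - 2 * c @ x + offset
prob = cp.Problem(cp.Minimize(objective), [cp.sum(x) == 1, x >= 0])
prob.solve()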

LinAlg
  • 19,822