
Note: The following question is a generalization of a question asked earlier today.


Given vectors ${\bf a}, {\bf b} \in \mathbb R^n$, can one solve the following minimization problem without using linear programming? If so, how?

$$\begin{array}{ll} \underset{x \in {\Bbb R}}{\text{minimize}} & \| x {\bf a} - {\bf b} \|_1 \end{array}$$

If ${\bf a} = {\bf 1}_n$, one can use the median. If ${\bf a} = \begin{bmatrix} 1 & 2 & \cdots & n\end{bmatrix}^\top$, Siong showed that one can also use the median. What can one do in the general case?

  • Generate a list of candidate values $b_i/a_i$, sort them, and use convexity to do a binary search for the norm-minimizing one? I'm not sure whether this counts as "without using linear programming," but unless I'm missing something it runs in time $O(n \log n)$... – Micah Apr 06 '19 at 16:50
  • Added/edited an answer outlining how to use a proximal method. It should be noted subgradient selection algorithms can also be used here; might add if I've got time. – Zim Aug 04 '20 at 17:49
  • @Zim, any chance you could add the "subgradient selection algorithms" option? Or do you mean classic subgradient with regard to the cost function? – Royi Jan 20 '24 at 08:58
  • @Royi Sure, I added a short note which should hopefully be enough to get folks in the right direction :-) – Zim Jan 22 '24 at 11:06

2 Answers


I'm not sure if this counts as "without using linear programming", but it's at least relatively fast (it has runtime $O(n \log n)$).

Let $f$ be the objective function. Notice that $f(x)=\sum_{i=1}^n |a_i x - b_i|$ is piecewise linear, and also (non-strictly) convex, and so the slope of $f$ is a (non-strictly) increasing function. The minimum of $f$ will occur either on an interval where the slope is zero, or at a point where it switches from positive to negative. We can proceed as follows.
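Concretely, away from the points of nonlinearity the slope is
$$f'(x)=\sum_{i=1}^n a_i\operatorname{sign}(a_i x - b_i)=\sum_{i:\,a_i\neq 0}|a_i|\operatorname{sign}\left(x-\tfrac{b_i}{a_i}\right),$$
which is exactly the quantity step 2 below evaluates in $O(n)$ time.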

  1. Compute all the points of nonlinearity $b_i/a_i$ (assuming for simplicity that every $a_i \neq 0$; terms with $a_i = 0$ are constant in $x$ and can be dropped) ($O(n)$) and sort them ($O(n \log n)$). Call the sorted values $x_1,x_2,\dots,x_n$.

  2. Let $k=\left\lfloor\frac{n}{2}\right\rfloor$ and compute the slope of $f$ on the interval of linearity $[x_k,x_{k+1}]$ ($O(n)$). If this slope is positive, we're to the right of the minimum; if it's negative, we're to the left of the minimum.

  3. Perform a binary search, doing step 2) $\log n$ more times with different values of $k$ ($O(n\log n)$). Eventually you will find some $x_\ell$ such that either $f$ has slope zero on $[x_\ell,x_{\ell+1}]$, or the slope is negative on $[x_{\ell-1},x_\ell]$ but positive on $[x_\ell,x_{\ell+1}]$. Then $f(x_\ell)$ is your minimum value.

If you walked through adjacent values of $x_k$ instead of doing a binary search, you would essentially be minimizing $f$ via the simplex method, which is why I'm not totally sure this isn't linear programming. But it does seem like the binary search essentially exploits the one-dimensionality of the problem.
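
For concreteness, here is a short Python sketch of the three steps above (the function name and the handling of indices with $a_i = 0$ are my own choices):

```python
import numpy as np

def minimize_l1_line(a, b):
    """Minimize f(x) = sum_i |a_i x - b_i| over scalar x via sorted breakpoints."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    mask = a != 0.0                      # terms with a_i = 0 are constant in x
    if not mask.any():
        return 0.0                       # f is constant, so every x is a minimizer
    r, w = b[mask] / a[mask], np.abs(a[mask])
    pts = np.sort(r)                     # step 1: sorted points of nonlinearity

    def slope_right(x):
        # step 2: slope of f on the interval of linearity just to the right of x
        return float(np.sum(np.where(x >= r, w, -w)))

    lo, hi = 0, len(pts) - 1             # step 3: binary search for the sign change
    while lo < hi:
        mid = (lo + hi) // 2
        if slope_right(pts[mid]) >= 0:
            hi = mid                     # the minimum is at pts[mid] or to its left
        else:
            lo = mid + 1                 # the minimum is strictly to the right
    return pts[lo]
```

The sort dominates, so the total cost is $O(n \log n)$, matching the estimate above.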

Micah
  • What if the data is complex (the vectors are complex, while the parameter is still real)? Will it still work? – Royi Jan 20 '24 at 17:06

EDIT: Here are two methods.

Your function is always convex, proper, and continuous (hence lower-semicontinuous). Under mild conditions on your problem -- e.g. when $a\neq\mathbf{0}$ -- it is also coercive, which guarantees the existence of a minimizer (if $a=\mathbf{0}$, the problem is trivial and every real number is a minimizer). Since a minimizer is guaranteed to exist, let us proceed.

Method 1: The function $g\colon x\mapsto \|ax-b\|_1$ can be viewed as a linear operator applied to $x$ followed by a translated $1$-norm, i.e., $g= f\circ L$, where $L\colon\mathbb{R}\to\mathbb{R}^n\colon x\mapsto ax$ and $f=\|\cdot-b\|_1$.

The goal here is to use a product-space reformulation (e.g., as described in Section 3.1 here) together with a splitting algorithm (e.g., Douglas-Rachford / ADMM). The reformulation replaces minimizing $f\circ L$ over $\mathbb{R}$ with minimizing $f(x) + \iota_G(y) + \iota_D(x,y)$ over a suitable product space, where $\iota_C$ is the $0$-$\infty$ indicator function of a set $C$, $G$ is the graph of $L$, and $D$ is the diagonal subspace (see the product-space reformulation literature). To use these algorithms, you need $\textrm{prox}_f$ and the projection onto $G$.

Regarding the prox of $f$, note that for any $\lambda\in\left]0,+\infty\right[$, the proximity operator of $\lambda \|\cdot\|_1$ is the component-wise soft thresholder with parameter $\lambda$, which I'll call $\textrm{soft}_\lambda$. The following results can be found in Bauschke & Combettes' book, 2nd edition: it follows from Proposition 24.8(ii) that $\textrm{prox}_{\lambda f}(x)=b+\textrm{soft}_{\lambda}(x-b)$, and several formulae for the projection onto $G$ are available in the same book, page 540.
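
To make this concrete, here is a minimal Python sketch of a Douglas-Rachford iteration applied directly to the two-term splitting $f(y)+\iota_G(x,y)$ (so it skips the diagonal-subspace bookkeeping of the full product-space reformulation); the function names, the fixed parameter $\gamma$, and the iteration count are my own choices rather than anything prescribed by the references:

```python
import numpy as np

def soft(v, t):
    # component-wise soft thresholding: the prox of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dr_l1_line(a, b, gamma=1.0, iters=5000):
    """Douglas-Rachford for min over (x, y) of ||y - b||_1 + i_G(x, y),
    where G = {(t, t*a) : t real} is the graph of L: t -> t*a."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    zx, zy = 0.0, np.zeros_like(b)       # governing sequence z = (zx, zy)
    denom = 1.0 + a @ a
    for _ in range(iters):
        # prox of gamma*f: identity in the scalar slot, translated soft threshold in y
        px, py = zx, b + soft(zy - b, gamma)
        # project the reflection 2*p - z onto G = span{(1, a)}
        rx, ry = 2.0 * px - zx, 2.0 * py - zy
        t = (rx + a @ ry) / denom
        # Douglas-Rachford update of the governing sequence
        zx += t - px
        zy += t * a - py
    return zx                            # converges to a minimizer of ||x*a - b||_1
```

For this one-dimensional problem the breakpoint search in the other answer is of course much cheaper; the sketch is mainly a template for variants where $x$ is itself a vector.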

Method 2: The subgradient descent algorithm (using subgradients of the objective function) is also guaranteed to work if we manage the stepsizes properly. The relevant literature vocabulary here is "subgradient projectors", and a good reference is Section 29.6 of Bauschke & Combettes' book (2nd edition).
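
A minimal Python sketch of this route, assuming a diminishing step size $c/\sqrt{k+1}$ and tracking the best iterate seen so far (both standard choices, not prescribed by the reference):

```python
import numpy as np

def subgrad_l1_line(a, b, x0=0.0, c=1.0, iters=5000):
    """Subgradient method for f(x) = ||x*a - b||_1 with diminishing step sizes."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    x = x0
    best_x, best_f = x, float(np.sum(np.abs(x * a - b)))
    for k in range(iters):
        g = float(np.sum(a * np.sign(x * a - b)))   # a subgradient of f at x
        x -= c / np.sqrt(k + 1.0) * g               # diminishing step size
        f = float(np.sum(np.abs(x * a - b)))
        if f < best_f:                              # subgradient steps are not monotone,
            best_x, best_f = x, f                   # so keep the best iterate
    return best_x
```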

Zim
  • If I have not yet upvoted, it is not because of indifference, but because I am not qualified to assess the quality of your answer. – Rodrigo de Azevedo Aug 05 '20 at 22:40
  • @RodrigodeAzevedo I'm happy to answer any questions you may have! The theory side of this algorithm is heavily rooted in convex analysis. – Zim Aug 06 '20 at 14:52
  • @Zim, have you verified it in code? I think the requirement for doing so is $\boldsymbol{a} \boldsymbol{a}^{T} = \mu \boldsymbol{I}$, which does not hold in this case. Am I missing something? – Royi Jan 28 '24 at 07:03
  • @Royi wow good catch, yes you're right! I had misread the requirement as $a^\top a=|a|^2=\mu I$ which would have been satisfied for any nonzero $a$, but I was wrong. I will edit my answer to reflect this. – Zim Jan 28 '24 at 16:34
  • @Royi if you have to solve a problem which doesn't satisfy that assumption, I've provided a few references in the "method 1.5" section for a more general (albeit more memory-intensive) prox-based method. – Zim Jan 28 '24 at 16:55
  • @Zim, I'd remove the whole write-up of the first original method. It won't work, as $\boldsymbol{a}$ cannot satisfy the assumption. The derivation is wrong. Indeed, with ADMM-like methods it is easy to solve. Computation-wise, I'm pretty sure a 1D search, either on the gradient or on the function itself, will be more efficient. – Royi Jan 28 '24 at 16:59
  • @Royi Ah, good point, the assumption cannot hold since $aa^\top$ is rank one. Now my answer just includes the subgradient one and a splitting method. – Zim Jan 28 '24 at 17:21