
I want to implement robust line fitting over a set of $n$ points $(x_i,y_i)$ by means of the Least Absolute Deviation method, which minimizes the sum

$$\sum_{i=1}^n |y_i-a-bx_i|.$$

As described, for instance, in Numerical Recipes, one can find the optimal slope by locating a zero of the function

$$s(b):=\sum_{i=1}^n x_i\text{ sign}(y_i-bx_i-\text{med}_k(y_k-bx_k))$$

where $b$ is the unknown. To find the change of sign, a dichotomic search is recommended, which is appropriate. However, the starting search interval as well as the termination accuracy are based on purely statistical arguments (using a $\chi^2$ test).

I was wondering whether there is an analytical way to derive safe bounds for the slope (i.e. values of $b$ such that $s(b)$ is guaranteed positive or negative), as well as a termination criterion (minimum attainable $|s(b)|$). (In fact, I don't even know whether the function is monotonic.)


Update:

I have gained some insight into the problem.

If you sort the points on $x$ and set $b$ larger than the slope of every segment between two points, the $y_k-bx_k$ appear in decreasing order, and the median sits at index $n/2$. The positive signs then go with the points of smallest $x$, so $s(b)$ is the sum of the $n/2$ smallest $x_i$ minus the sum of the $n/2$ largest $x_i$, hence it is certainly negative. Larger values of $b$ change nothing.

Similarly, setting $b$ smaller than the smallest slope ensures $s(b)$ positive, and this answers my first question. This works fine when the $x_i$ are regularly spaced, or nearly so. (Note also that $-s(b)$ is, where defined, the derivative of the convex function $b\mapsto\min_a\sum_i|y_i-a-bx_i|$, so $s(b)$ is nonincreasing and the sign change is unique.)

However, when some of the $x_i$ are equal or very close, the slopes can take huge values and a dichotomic process based on the full $[b_{min},b_{max}]$ interval might be inefficient. The ordering of the $y_k-bx_k$ only changes when $b$ crosses one of these $n(n-1)/2$ slopes, hence the dichotomic process can be based on the (sorted) slopes rather than on continuous values of $b$.
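To make this concrete, here is a minimal sketch (Python/NumPy is my choice of language; the function names are hypothetical) of evaluating $s(b)$ and bisecting over the sorted pairwise slopes:

```python
import numpy as np

def s(b, x, y):
    """The sign-sum function from the question:
    s(b) = sum_i x_i * sign(y_i - b*x_i - med_k(y_k - b*x_k))."""
    r = y - b * x
    return np.sum(x * np.sign(r - np.median(r)))

def lad_slope(x, y):
    """Dichotomic search over the sorted pairwise slopes rather than over a
    continuous interval, as suggested above; s(b) is nonincreasing in b."""
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0                        # pairs with equal abscissae have no slope
    slopes = np.sort((y[j] - y[i])[keep] / dx[keep])
    lo, hi = 0, len(slopes) - 1           # assumes s(slopes[lo]) >= 0 >= s(slopes[hi])
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if s(slopes[mid], x, y) > 0:      # still on the positive side: move up
            lo = mid
        else:
            hi = mid
    # the sign change is bracketed between two consecutive candidate slopes
    return 0.5 * (slopes[lo] + slopes[hi])
```

In practice the optimal LAD line passes through two of the data points, so the optimum is attained at one of the pairwise slopes; the bracketing pair returned here can be refined by comparing the objective at its two endpoints.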


1 Answer


Let's formulate the Sum of Absolute Residuals / Least Absolute Deviation (LAD) / $ {L}_{1} $ Regression / Robust Regression problem (note that here $x$ denotes the vector of unknowns and $b$ the data vector, not the slope of the question):

$$ \arg \min_{x} {\left\| A x - b \right\|}_{1} $$

There are a few options to solve this.

Iteratively Reweighted Least Squares (IRLS)

The main trick in the IRLS approach is that:

$$ {\left\| \boldsymbol{x} \right\|}_{p}^{p} = \sum_{i} {\left| {x}_{i} \right|}^{p - 2} {\left| {x}_{i} \right|}^{2} = \sum_{i} {w}_{i} {\left| {x}_{i} \right|}^{2}, \quad {w}_{i} = {\left| {x}_{i} \right|}^{p - 2} $$

On the other hand, it is known (Weighted Least Squares) that:

$$ \sum_{i = 1}^{m} {w}_{i} {\left| {A}_{i, :} x - {b}_{i} \right|}^{2} = {\left\| {W}^{\frac{1}{2}} \left( A x - b \right) \right\|}_{2}^{2} $$

The solution is given by:

$$ \arg \min_{x} {\left\| {W}^{\frac{1}{2}} \left( A x - b \right) \right\|}_{2}^{2} = {\left( {A}^{T} W A \right)}^{-1} {A}^{T} W b $$

In the case above, where $ p = 1 $, one could set $ {w}_{i} = {\left| {A}_{i, :} x - {b}_{i} \right|}^{-1} $.
Since the weights depend on $ x $, one can set up an iterative procedure:

$$ {x}^{k + 1} = {\left( {A}^{T} {W}^{k} A \right)}^{-1} {A}^{T} {W}^{k} b $$

where $ {W}_{ii}^{k} = {w}_{i}^{k} = {\left| {A}_{i, :} {x}^{k} - {b}_{i} \right|}^{p - 2} = {\left| {A}_{i, :} {x}^{k} - {b}_{i} \right|}^{-1} $.
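For illustration, a minimal IRLS sketch in Python/NumPy (the function name, the least-squares initialization, and the `eps` safeguard are my assumptions; the safeguard addresses the near-zero residual issue raised in the comments below):

```python
import numpy as np

def irls_lad(A, b, num_iter=50, eps=1e-8):
    """Sketch of IRLS for argmin_x ||A x - b||_1 (assumes A has full column rank)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]       # ordinary LS as initial guess
    for _ in range(num_iter):
        r = A @ x - b
        w = 1.0 / np.maximum(np.abs(r), eps)       # w_i = |r_i|^(p-2) with p = 1, guarded
        AtW = A.T * w                              # A^T W, with W diagonal
        x = np.linalg.solve(AtW @ A, AtW @ b)      # x = (A^T W A)^{-1} A^T W b
    return x
```

For the line fit of the question one would take $ A = [\boldsymbol{1} \;\; \boldsymbol{x}] $, so the solution vector holds the intercept $a$ and the slope $b$.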

Sub Gradient Method

The Sub Gradient is given by:

$$ \frac{\partial {\left\| A x - b \right\|}_{1}}{\partial x} = {A}^{T} \operatorname{sgn} \left( A x - b \right) $$

So the solution is given by iterating (with a suitable, typically diminishing, step size $ {\alpha}_{k} $):

$$ {x}^{k + 1} = {x}^{k} - {\alpha}_{k} \left[ {A}^{T} \operatorname{sgn} \left( A {x}^{k} - b \right) \right] $$
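A matching sketch in Python/NumPy; the step schedule $ {\alpha}_{k} = {\alpha}_{0} / \sqrt{k} $ and the tracking of the best iterate are my (standard) choices, since subgradient steps do not decrease the objective monotonically:

```python
import numpy as np

def subgrad_lad(A, b, num_iter=5000, alpha0=0.1):
    """Sketch of the sub-gradient method for argmin_x ||A x - b||_1."""
    x = np.zeros(A.shape[1])
    best_x, best_f = x.copy(), np.sum(np.abs(A @ x - b))
    for k in range(1, num_iter + 1):
        g = A.T @ np.sign(A @ x - b)           # a subgradient of ||A x - b||_1
        x = x - (alpha0 / np.sqrt(k)) * g      # diminishing step size
        f = np.sum(np.abs(A @ x - b))
        if f < best_f:                         # keep the best iterate seen so far
            best_x, best_f = x.copy(), f
    return best_x
```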

Linear Programming Conversion

This approach converts the problem into a Linear Programming problem.

By introducing variables $ {t}_{i} \geq \left| {A}_{i, :} x - {b}_{i} \right| $ the optimization problem becomes:

$$\begin{align*} \text{minimize} \quad & \boldsymbol{1}^{T} \boldsymbol{t} \\ \text{subject to} \quad & -\boldsymbol{t} \preceq A x -b \preceq \boldsymbol{t} \\ & \boldsymbol{0} \preceq \boldsymbol{t} \end{align*}$$

This is a Linear Programming form of the problem.
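A sketch with `scipy.optimize.linprog`, stacking the variables as $ z = [x; t] $ (the `highs` method name assumes SciPy ≥ 1.6):

```python
import numpy as np
from scipy.optimize import linprog

def lp_lad(A, b):
    """Sketch: minimize 1^T t subject to -t <= A x - b <= t, t >= 0."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])   # cost only on t
    A_ub = np.block([[ A, -np.eye(m)],              #  A x - t <= b
                     [-A, -np.eye(m)]])             # -A x - t <= -b
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m   # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]
```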

MATLAB Code

I implemented and validated the approaches by comparing them to the CVX solution. The MATLAB code is available in my Mathematics StackExchange Question 2603548 repository.

Remark: Pay attention that the MATLAB code is a vanilla implementation. A real-world implementation should take numerical issues into account (like the inverse of a small number).

  • It doesn't look like any of the to-be-written sections in your post would answer the OP's specific questions, "I was wondering if there is an analytical way to derive safe bounds for the slopes (i.e. values such that $s(b)$ is guaranteed positive or negative), as well as the termination criterion (minimum of $|s(b)|$)." –  Aug 11 '19 at 11:49
  • The way I see it, @YvesDaoust tries to solve the problem in general. He took a specific numerical method and wants to optimize its iterative steps, yet I offer different approaches to the same end: solving the problem. – Royi Aug 11 '19 at 12:28
  • The weight should be `wi = max(eps, 1/abs(ri))`; otherwise a near-zero residual (by chance) will create a near-infinite weight. – Felipe G. Nievinski Jul 17 '22 at 03:03