While explaining the concept of linear regression to one of my peers, I got stuck answering this question: why don't we use Manhattan distance instead of Euclidean distance in linear regression? Can anyone give the intuition behind this?
- Least squares is easier to minimize than least absolute deviations (LAD) because the latter is non-differentiable. Nowadays, LAD, a.k.a. median regression, is also quite frequently used. – Michael M Dec 25 '18 at 15:04
4 Answers
Linear regression does not typically use Euclidean distance. The most common loss for linear regression is the least-squares error. It might be useful to examine this idea visually.
Here is least-squares error:
The orange lines show examples of residuals: the difference between the predicted and observed values of the target variable. Each residual is squared, the squares are summed, and the sum is divided by the count (giving the mean squared error).
Here is Euclidean distance:
Euclidean distance is different: the orange lines show it as the perpendicular distance from each point to the nearest point on the line.
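To make the geometric distinction concrete, here is a minimal sketch (my own, with made-up numbers for the line and the points) that computes both quantities for a line y = a + b*x:

```python
# A small illustrative sketch (not from the original answer): vertical squared
# residuals (what least squares uses) vs. perpendicular Euclidean distance to the line.
import numpy as np

a, b = 1.0, 2.0                        # hypothetical intercept and slope
x = np.array([0.0, 1.0, 2.0, 3.0])     # made-up inputs
y = np.array([1.5, 2.0, 6.0, 6.5])     # made-up observed targets

y_hat = a + b * x                      # predictions on the line

# Least-squares view: vertical residuals, squared and summed
# (divide by len(x) for the mean squared error).
residuals = y - y_hat
sse = np.sum(residuals ** 2)

# Euclidean view: perpendicular distance from each point to the line b*x - y + a = 0.
perp = np.abs(b * x - y + a) / np.sqrt(b ** 2 + 1)

print(sse)    # quantity minimized by least squares
print(perp)   # geometric "distance to the line" for each point
```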

- This is an interesting take on the question, and I have linked it in my answer for that reason. However, I think this is a matter of minimizing square loss vs minimizing absolute loss. – Dave Sep 19 '23 at 03:46
Another answer discusses why “Euclidean distance” might not be the best way to describe ordinary least squares regression. However, it is reasonable to view ordinary least squares as minimizing the Euclidean distance between the observed and fitted values, even though this is not equivalent to minimizing the Euclidean distance between the observations and the regression line.
So why not minimize the Manhattan distance between the true and fitted values? This is actually done with some frequency. This is the minimization of the sum of absolute errors/deviations/residuals, and it is equivalent to quantile regression at the median.
There are a number of reasons why minimizing this Manhattan distance is less popular than minimizing the Euclidean distance, some of which are better than others.
- A big one is that minimizing the sum of squared residuals is equivalent to maximum likelihood estimation for Gaussian errors.
- Another big one is that the Gauss-Markov theorem gives the conditions under which the ordinary least squares solution (minimization of Euclidean distance) is the minimum-variance linear unbiased estimator of the regression coefficients, and this does not even require a Gaussian assumption about the error term.
- There is a closed-form solution in the linear case, while there is no general closed-form solution for minimizing the sum of absolute deviations (see the sketch below).
- There is an especially large penalty for bad misses. While this need not be desirable behavior, it might be.
- It's tradition. Almost everyone does it that way, and who wants to rock the boat?
- While it is possible to break this, especially for large sample sizes, minimizing square loss behaves as expected under many violations of the standard assumptions (that is, OLS is "robust" to many (but not all!) deviations from ideal circumstances). With this in mind, again, why rock the boat when so many practitioners know OLS and it tends to work quite well?
(As I said, some reasons are better than others.)
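To illustrate the closed-form point from the list above, here is a rough sketch (mine, using simulated data and SciPy's general-purpose optimizer rather than a dedicated LAD/quantile-regression routine): the OLS coefficients drop straight out of the normal equations, while the Manhattan-distance (LAD) fit has to be found numerically.

```python
# Sketch: closed-form OLS via the normal equations vs. numerically minimizing
# the sum of absolute residuals (LAD), on simulated data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# OLS: solve (X'X) beta = X'y directly.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# LAD / median regression: no closed form, so search numerically.
lad_loss = lambda beta: np.abs(y - X @ beta).sum()
beta_lad = minimize(lad_loss, x0=beta_ols, method="Nelder-Mead").x

print(beta_ols)  # should be close to [1, 2]
print(beta_lad)  # should also be close to [1, 2]
```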

The Euclidean distance is the measure used in least squares regression because it is the distance induced by the Euclidean (L2) norm of the residual vector: the least squares criterion chooses the model parameters that minimize the sum of squared errors (SSE) between the observed and predicted values.
The use of this criterion is usually motivated by the assumption that the errors between the observed and predicted values are normally distributed. Under that assumption, minimizing the SSE is equivalent to maximum likelihood estimation, so the SSE is a natural measure of the goodness of fit of the model.
The Manhattan distance, on the other hand, is the distance induced by the L1 norm: the sum of the absolute differences between the coordinates of two points. Minimizing it corresponds to least absolute deviations (LAD) regression, which is used far less often than least squares.
One reason the Manhattan distance is less common in linear regression is that it lacks some of the convenient mathematical properties of the squared Euclidean distance. The squared error loss is differentiable everywhere and yields a closed-form solution, while the absolute error loss is not differentiable at zero, so it is less amenable to simple analysis and generally requires linear programming or iterative methods.
In addition, the two criteria have different statistical properties: least squares targets the conditional mean and corresponds to Gaussian errors, while least absolute deviations targets the conditional median and is more robust to outliers.
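To spell out the likelihood connection mentioned above (a standard derivation, not taken from the original answer): if $y_i = x_i^\top \beta + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, the log-likelihood of the data is
$$\ell(\beta, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^\top \beta\right)^2,$$
so maximizing it over $\beta$ is exactly the same as minimizing the sum of squared errors. If the errors are instead assumed to follow a Laplace distribution, the squared terms are replaced by absolute values, and maximum likelihood becomes minimization of the sum of absolute deviations, i.e. the Manhattan-distance criterion.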

- Yes, Manhattan distance, also known as the "L1 norm" and "taxicab distance". – Pluviophile Sep 19 '23 at 11:46
- Then your claim about Manhattan distance not being induced by a norm appears not to be correct. – Dave Sep 19 '23 at 12:00
The main reason may be that we typically use the Euclidean metric, and Manhattan distance may be more appropriate when the different dimensions are not comparable. With regression-based methods you usually have real-valued features, and you normally normalise them before feeding them to your model; normalising the features effectively makes them comparable. When you have categorical features, you may want to use decision trees instead. I have rarely seen people take an interest in Manhattan distance in this setting, but based on answers [2, 3] there are some use cases for it too.
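As a small illustration of the normalisation step described above (my own sketch, with made-up feature scales), z-scoring each column puts features measured in very different units on a comparable scale before fitting:

```python
# A minimal sketch (not from the answer): standardise each feature column so the
# columns are on comparable scales before fitting a regression model.
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([
    rng.normal(170, 10, size=100),   # hypothetical feature measured in cm
    rng.normal(70, 15, size=100),    # hypothetical feature measured in kg
])

# Z-score: subtract the column mean and divide by the column standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0).round(3))  # ~0 for each column
print(X_std.std(axis=0).round(3))   # ~1 for each column
```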

- I am struggling to understand how this answers the question. Could you please clarify? – Dave Sep 18 '23 at 19:56