3

Say I have n points on a plane which are not collinear. What's the most efficient way to translate each point so that all points are collinear? I want to minimize the sum of the distances traveled.

I'm guessing that fitting a line by least absolute deviations (LAD, https://en.wikipedia.org/wiki/Least_absolute_deviations) and then moving each point to the line along a perpendicular path is at least a close approximation. Is that the optimal solution though? I'm worried that it may be inaccurate, since LAD minimizes vertical (y-axis) deviations rather than my "distance traveled" metric.

I feel like this must be a solved problem, but maybe I don't have a good enough vocabulary to search for it. No urgency here, I'm just curious.

Edit: an interesting variation is minimizing the max distance, i.e. how does a crowd of people line up ASAP if they're all equally speedy.

Tyler
  • 133
  • @Tyler: If you fix the slope of the line, you can rotate the problem so that the distance traveled is always vertical. My instincts are you'll have to do something less elegant to optimize the slope, though. –  Apr 09 '18 at 21:44
  • @Hurkyl Good point. – amd Apr 09 '18 at 21:46
  • @Hurkyl interesting, maybe I should experiment to see if a LAD regression line remains fixed with respect to the cloud as it rotates. – Tyler Apr 09 '18 at 21:50
  • On the other hand maybe I should abandon LAD because I think I found a counter-example: (0, 0), (0, 1), (1, 0) has a lot of LAD solutions, e.g. y=0, which are clearly not solutions to my problem. – Tyler Apr 09 '18 at 21:58
  • If you fix the direction, you know the line has to leave the same number of points on each side. Now the optimum is easy: a sum of projections (reduce to the $1$-dim case). But optimizing this over all directions seems hard. There could be numerical algorithms. – orangeskid Apr 10 '18 at 02:09
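
The fixed-direction reduction described in the comments can be sketched in a few lines. This is only a rough illustration, not an exact algorithm: for each direction the offset is chosen as a median of the points' signed offsets along the normal (the 1-D case), and the direction itself is found by a coarse grid search. The function names and the `steps` parameter are made up for this sketch.

```python
import math
from statistics import median

def total_distance_for_angle(points, theta):
    """Best total perpendicular distance to any line with direction angle theta."""
    # Unit normal to the line direction (cos theta, sin theta).
    nx, ny = -math.sin(theta), math.cos(theta)
    # Signed offset of each point along the normal: a 1-D problem.
    offsets = [x * nx + y * ny for x, y in points]
    c = median(offsets)  # a median minimizes the sum of absolute deviations
    return sum(abs(o - c) for o in offsets), c

def best_line_by_sweep(points, steps=3600):
    """Coarse grid search over directions in [0, pi); returns (cost, theta, offset)."""
    best = None
    for k in range(steps):
        theta = math.pi * k / steps
        cost, c = total_distance_for_angle(points, theta)
        if best is None or cost < best[0]:
            best = (cost, theta, c)
    return best

# The three points from the counter-example in the comments.
points = [(0, 0), (0, 1), (1, 0)]
cost, theta, offset = best_line_by_sweep(points)
print(cost, theta, offset)
```

The grid only approximates the best direction, so the result is as good as the grid is fine; a local refinement around the best angle would be a natural next step.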

2 Answers

1

ADDED: Oops, I solved the sum-of-squared-distances problem by mistake; it is much simpler. The sum of distances seems much harder.

I think it has to do with least squares and factor analysis, but we can try an ad hoc approach. Below are some hints.

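  1. First of all, the line must pass through the center of mass of the points. If it does not, consider the parallel line through the center of mass: the sum of the squares of the distances decreases. (There is an elementary geometry formula for this: if the original line is at distance $d$ from the center of mass, its sum of squared distances exceeds that of the parallel line through the center of mass by exactly $nd^2$.)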

  2. Now consider lines through the center of mass, which we may assume is the origin. The squared distance from a vector $v$ to the line with unit direction vector $u$ is $$|v|^2 - |\langle v,u\rangle|^2$$

Since $\sum_{i=1}^n |v_i|^2$ does not depend on $u$, we need to find a unit vector $u$ such that $$\sum_{i=1}^n |\langle v_i,u\rangle|^2$$ is maximum.

Expand the brackets to get a quadratic form $u^T S u$ in the components $(u_1, u_2)$ of $u$, where $S=\sum_{i=1}^n v_i v_i^T$. This turns into the problem of maximizing a quadratic form on a sphere ($S^1$ in this case), so the maximizer is a unit eigenvector of $S$ for its largest eigenvalue.
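
Here is a minimal sketch of the two hints above for the sum-of-squares version. It assumes plain 2-D tuples as input and uses the closed-form angle of the leading eigenvector of a symmetric $2\times 2$ matrix instead of a general eigensolver; the function name `tls_line` is just for illustration.

```python
import math

def tls_line(points):
    """Line minimizing the sum of squared perpendicular distances.

    Returns (centroid, unit_direction).
    """
    n = len(points)
    # Hint 1: the line passes through the center of mass.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the 2x2 matrix of the quadratic form for the centered points.
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Hint 2: u = (cos t, sin t) maximizes the quadratic form when
    # 2t = atan2(2*sxy, sxx - syy), i.e. u is the leading eigenvector.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))

centroid, direction = tls_line([(0, 1), (1, 3), (2, 5)])
print(centroid, direction)
```

For points that already lie on a line, such as the three above on $y=2x+1$, the returned direction is parallel to that line.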

orangeskid
  • 53,909
1

For any fixed direction of line, the maximum distance is minimized by placing the line midway between the two antipodal points on the convex hull of the point set. The corresponding distance is half the width of the convex hull in the direction perpendicular to the line. Therefore, the problem is equivalent to finding the direction in which the convex hull has minimum width, and can be solved using the rotating calipers algorithm.
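
As a rough sketch of the min-max case: the minimum width of a convex polygon is attained with one supporting line flush with a hull edge, so it is enough to check each edge. The code below does that with a simple quadratic scan over hull vertices rather than true rotating calipers (which would advance the opposite vertex incrementally); the function names are made up for this sketch.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_max_distance(points):
    """Smallest achievable maximum distance (half the minimum hull width)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0  # the points are already collinear
    best = math.inf
    for i in range(len(hull)):
        (ax, ay), (bx, by) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(bx - ax, by - ay)
        # Width perpendicular to this edge: farthest hull vertex from the edge line.
        width = max(abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length
                    for px, py in hull)
        best = min(best, width)
    return best / 2

print(min_max_distance([(0, 0), (0, 1), (1, 0)]))
```

The optimal line is then parallel to the minimizing edge and sits midway between that edge and the farthest vertex.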

Minimizing the total distance is likely to be harder: if you replace the unknown line with an unknown point, this is the problem of finding the geometric median, which does not have an analytical solution or a direct algorithm. However, an iteratively reweighted least-squares approach may work. Using the notation of the linked article, let $\beta$ be the unknown line and $f_i(\beta)$ the projection of the $i$th point $y_i$ onto it, so that you want to minimize $\sum\|y_i-f_i(\beta)\|$. Iteratively reweighted least-squares will require you to solve subproblems of the form $$\beta^{(t+1)}={\arg\min}_\beta\sum w_i^{(t)}\|y_i-f_i(\beta)\|^2$$ instead, where $w_i^{(t)}=\|y_i-f_i(\beta^{(t)})\|^{-1}$. This you can do using an eigendecomposition, as described in orangeskid's answer (albeit slightly modified to account for the weights $w_i^{(t)}$: use the weighted center of mass and weight each term of the quadratic form).
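
A minimal sketch of that iteration, under some assumptions of mine: 2-D tuples as input, a fixed iteration count instead of a convergence test, and a clamp `eps` on the distances so the weights stay finite when a point lies on the current line. The weighted subproblem is solved with the closed-form eigenvector angle for a $2\times 2$ matrix. Nothing here guarantees the global optimum; the iteration may stop at a local one.

```python
import math

def weighted_tls(points, weights):
    """Line minimizing sum(w_i * dist_i**2); returns (cx, cy, theta)."""
    total = sum(weights)
    # The weighted subproblem's line passes through the weighted center of mass.
    cx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    cy = sum(w * y for w, (_, y) in zip(weights, points)) / total
    sxx = sum(w * (x - cx) ** 2 for w, (x, _) in zip(weights, points))
    syy = sum(w * (y - cy) ** 2 for w, (_, y) in zip(weights, points))
    sxy = sum(w * (x - cx) * (y - cy) for w, (x, y) in zip(weights, points))
    # Angle of the leading eigenvector of the weighted 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return cx, cy, theta

def distances(points, cx, cy, theta):
    """Perpendicular distances to the line through (cx, cy) with angle theta."""
    nx, ny = -math.sin(theta), math.cos(theta)
    return [abs((x - cx) * nx + (y - cy) * ny) for x, y in points]

def irls_line(points, iterations=100, eps=1e-9):
    """Iteratively reweighted least squares; returns (cx, cy, theta, total_distance)."""
    weights = [1.0] * len(points)  # first pass is the unweighted sum-of-squares fit
    for _ in range(iterations):
        cx, cy, theta = weighted_tls(points, weights)
        d = distances(points, cx, cy, theta)
        # w_i = 1 / distance, clamped so points on the line do not divide by zero.
        weights = [1.0 / max(di, eps) for di in d]
    return cx, cy, theta, sum(d)

pts = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (2, 10)]
print(irls_line(pts))
```

Since the first pass is the plain sum-of-squares fit, comparing its total distance with the final one gives a quick sanity check that the reweighting helped.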

  • This doesn't seem so easy after all. This paper seems to deal with this problem: https://arxiv.org/pdf/1405.6785.pdf – Jap88 Apr 11 '18 at 02:04