
I have a set of points $\boldsymbol{p}_i \in \mathbb{R}^d$ ($i=1,\dots, n$) and I need to perform line fitting. However, some points matter more in this process. More precisely, if we denote by $distance\left(\boldsymbol{p}_i, \ell\right)$ the distance from the $i$-th point to a line $\ell$, I want to find the line that minimizes the cost

$$ f(\ell) = \sum_{i=1}^n w_i \left[distance\left(\boldsymbol{p}_i, \ell\right)\right]^2 $$

where $w_i \ge 0$ is the weight associated with the $i$-th point.

My intuition (plus some tinkering with "trivial" cases) told me that the solution can be found using a modified version of the "standard" SVD-based approach. The algorithm would be:

  1. Calculate the barycenter of the points $$\boldsymbol{g} = \frac{\sum w_i\boldsymbol{p}_i}{\sum{w_i}}$$

  2. Build an $n\times d$ matrix $\boldsymbol{A}$ as $$ \boldsymbol{A} = \begin{bmatrix} \sqrt{w_1}\left(\boldsymbol{p}_1 - \boldsymbol{g}\right)^T \\ \vdots \\ \sqrt{w_n}\left(\boldsymbol{p}_n - \boldsymbol{g}\right)^T \end{bmatrix} $$ i.e., the $i$-th row of $\boldsymbol{A}$ is the (transposed) vector $\sqrt{w_i}\left(\boldsymbol{p}_i - \boldsymbol{g}\right)$

  3. Perform an SVD, $\boldsymbol{A} = \boldsymbol{U}\boldsymbol{S}\boldsymbol{V}^*$

  4. The best-fitting line is the one passing through $\boldsymbol{g}$ and having as direction $\boldsymbol{v}_1$, the first column of $\boldsymbol{V}$ (i.e., the first right singular vector, the one associated with the largest singular value)
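For reference, here is a minimal NumPy sketch of the four steps above (the function name and array layout are my own choices, not part of any standard API):

```python
import numpy as np

def weighted_line_fit(points, weights):
    """Weighted line fit: returns the weighted barycenter g and a unit direction u.

    points  : (n, d) array, one point per row
    weights : (n,) array of non-negative weights
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Step 1: weighted barycenter
    g = weights @ points / weights.sum()

    # Step 2: the i-th row of A is sqrt(w_i) * (p_i - g)
    A = np.sqrt(weights)[:, None] * (points - g)

    # Steps 3-4: the direction is the first right singular vector of A
    _, _, Vh = np.linalg.svd(A, full_matrices=False)
    u = Vh[0]

    return g, u
```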

Can someone confirm that the result is true in all dimensions and possibly provide a proof for it?


For those interested, the analytical solution in the 2D case can be obtained by considering the generic 2D line given by the equation $$ x \sin\theta - y \cos\theta + \rho = 0 $$ for which the cost function becomes $$ f(\theta, \rho) = \sum w_i\left(x_i \sin\theta - y_i \cos\theta + \rho\right)^2 $$

After solving $\frac{\partial f}{\partial \theta} = 0$ and $\frac{\partial f}{\partial \rho} = 0$ one finds the four candidates: $$ \begin{cases} \theta_{j} = \frac{1}{2}\arctan\left(\frac{2\mu_{11}}{\mu_{20}-\mu_{02}}\right) + \frac{j\pi}{2} \\ \rho_j = - x_g\sin\theta_j + y_g\cos\theta_j \end{cases} $$

for $j=0,1,2,3$, with $(x_g, y_g)$ the (weighted) barycenter of the points and $\mu_{pq} = \sum_{i=1}^n w_i (x_i-x_g)^p (y_i-y_g)^q$ the weighted centered moments. Of these solutions, the pairs $(0, 2)$ and $(1,3)$ correspond to the same two lines. Of these two lines, one is the best fit we are looking for, while the other is the worst-fitting line through the barycenter.
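A small NumPy sketch of this closed-form solution (the helper name is mine; `arctan2` is used only to avoid dividing by $\mu_{20}-\mu_{02}$, which is harmless here because all four candidates are evaluated anyway):

```python
import numpy as np

def analytic_line_fit_2d(x, y, w):
    """2D closed-form fit; returns (theta, rho) of x*sin(theta) - y*cos(theta) + rho = 0."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    xg, yg = w @ x / w.sum(), w @ y / w.sum()

    # weighted centered moments
    mu11 = w @ ((x - xg) * (y - yg))
    mu20 = w @ ((x - xg) ** 2)
    mu02 = w @ ((y - yg) ** 2)

    # evaluate the four stationary candidates and keep the one with the lowest cost
    best = None
    for j in range(4):
        theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02) + j * np.pi / 2.0
        rho = -xg * np.sin(theta) + yg * np.cos(theta)
        cost = w @ (x * np.sin(theta) - y * np.cos(theta) + rho) ** 2
        if best is None or cost < best[0]:
            best = (cost, theta, rho)
    return best[1], best[2]
```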

I verified numerically that the line obtained by the "weighted SVD" matches the analytical solution for many sets of random points and weights. Not a proof, but enough to convince me that the result is likely true.
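For what it's worth, the check I ran was essentially the following (it relies on the two sketches above):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 2))        # random 2D points
w = rng.uniform(0.1, 2.0, size=50)    # random positive weights

g, u = weighted_line_fit(pts, w)
theta, rho = analytic_line_fit_2d(pts[:, 0], pts[:, 1], w)

# The analytical line has direction (cos(theta), sin(theta)); up to sign it
# should coincide with the SVD direction, so this prints a value close to 1.
u_analytic = np.array([np.cos(theta), np.sin(theta)])
print(abs(u @ u_analytic))
```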


1 Answer


Well... As often happens, shortly after posting the question I gave it yet another try, and I think I found the proof. Here it goes:

A line can be written as the parametric curve $\boldsymbol{p}_0 + t \boldsymbol{u}$, with $\boldsymbol{p}_0$ being a point that belongs to the line and $\boldsymbol{u}$ a unit vector giving the direction of the line.

The distance of a generic point $\boldsymbol{p}$ from the line can be obtained by taking the difference $\boldsymbol{p} - \boldsymbol{p}_0$ and removing its projection along $\boldsymbol{u}$. Writing vectors as column matrices (matrices with a single column), the distance can be written as

$$ \left\|\left(\boldsymbol{p} - \boldsymbol{p}_0\right) - \boldsymbol{u}\boldsymbol{u}^T\left(\boldsymbol{p} - \boldsymbol{p}_0\right) \right\| = \left\|\left(\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T\right)\left(\boldsymbol{p} - \boldsymbol{p}_0\right) \right\| $$
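Numerically, this is just removing the projection along the direction; a minimal sketch (assuming $\boldsymbol{u}$ is already a unit vector):

```python
import numpy as np

def point_line_distance(p, p0, u):
    """Distance from point p to the line p0 + t*u, with u a unit vector."""
    p, p0, u = (np.asarray(a, dtype=float) for a in (p, p0, u))
    d = p - p0
    return np.linalg.norm(d - (u @ d) * u)   # subtract the component along u
```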

The cost function thus becomes: $$ f(\boldsymbol{p}_0, \boldsymbol{u}) = \sum_i w_i \left\|\left(\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T\right)\left(\boldsymbol{p}_i - \boldsymbol{p}_0\right) \right\|^2 = \sum_i w_i \left(\boldsymbol{p}_i - \boldsymbol{p}_0\right)^T\left(\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T\right)^2\left(\boldsymbol{p}_i - \boldsymbol{p}_0\right) = \sum_i w_i \left(\boldsymbol{p}_i - \boldsymbol{p}_0\right)^T\left(\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T\right)\left(\boldsymbol{p}_i - \boldsymbol{p}_0\right) $$ where the last equality holds because $\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T$ is symmetric and idempotent (it is the orthogonal projector onto the complement of $\boldsymbol{u}$, since $\boldsymbol{u}^T\boldsymbol{u} = 1$).

We can now take the derivative of $f$ with respect to $\boldsymbol{p}_0$:

$$ \frac{\partial f}{\partial \boldsymbol{p}_0} = - 2 \sum_i w_i \left(\boldsymbol{p}_i - \boldsymbol{p}_0\right)^T\left(\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T\right) $$

For the function to have a minimum, the expression above must equal the zero vector, i.e.,

$$ \sum_i w_i \left(\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T\right) \left(\boldsymbol{p}_i - \boldsymbol{p}_0\right) = 0 $$

which is equivalent to

$$ \left(\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T\right) \sum_i w_i \left(\boldsymbol{p}_i - \boldsymbol{p}_0\right) = 0 $$

For this to hold, the result of the summation must belong to the kernel of the matrix $\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T$, which is the space spanned by $\boldsymbol{u}$ (indeed, $\left(\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T\right)\boldsymbol{x} = \boldsymbol{0}$ forces $\boldsymbol{x} = \left(\boldsymbol{u}^T\boldsymbol{x}\right)\boldsymbol{u}$, a multiple of $\boldsymbol{u}$). This means that all solutions to the equation above are given by

$$ \sum_i w_i \left(\boldsymbol{p}_i - \boldsymbol{p}_0\right) = t \boldsymbol{u} $$

for $t\in\mathbb{R}$. By solving for $\boldsymbol{p}_0$ we obtain:

$$ \boldsymbol{p}_0 = \frac{\sum_i w_i \boldsymbol{p}_i}{\sum_i w_i} - \frac{t}{\sum_i w_i} \boldsymbol{u} = \boldsymbol{g} + t^\prime \boldsymbol{u} $$

It should be noted that the line (and hence the cost) is invariant to translations of the origin point $\boldsymbol{p}_0$ along the direction $\boldsymbol{u}$, that is, $f(\boldsymbol{p}_0 + t\boldsymbol{u}, \boldsymbol{u}) = f(\boldsymbol{p}_0, \boldsymbol{u})$ for all $t\in\mathbb{R}$, so we can choose $\boldsymbol{p}_0=\boldsymbol{g}$ without loss of generality. Substituting this value into the objective function leads to:

$$ f(\boldsymbol{g}, \boldsymbol{u}) = \sum_i w_i \left(\boldsymbol{p}_i - \boldsymbol{g}\right)^T\left(\boldsymbol{I} - \boldsymbol{u}\boldsymbol{u}^T\right)\left(\boldsymbol{p}_i - \boldsymbol{g}\right) = \sum_i w_i \left(\boldsymbol{p}_i - \boldsymbol{g}\right)^T\left(\boldsymbol{p}_i - \boldsymbol{g}\right) - \sum_i w_i \left(\boldsymbol{p}_i - \boldsymbol{g}\right)^T\boldsymbol{u}\boldsymbol{u}^T\left(\boldsymbol{p}_i - \boldsymbol{g}\right) $$

The first term does not depend on $\boldsymbol{u}$, so the optimal unit vector is the one that minimizes just the second term. Using the identity $\boldsymbol{a}^T\boldsymbol{u}\,\boldsymbol{u}^T\boldsymbol{a} = \boldsymbol{u}^T\boldsymbol{a}\,\boldsymbol{a}^T\boldsymbol{u}$ (both equal the scalar $(\boldsymbol{u}^T\boldsymbol{a})^2$), that term can be rewritten as:

$$ -\boldsymbol{u}^T \left(\sum_i w_i \left(\boldsymbol{p}_i - \boldsymbol{g}\right)\left(\boldsymbol{p}_i - \boldsymbol{g}\right)^T\right)\boldsymbol{u} $$

which in turn can be written as

$$ -\boldsymbol{u}^T \boldsymbol{A}^T\boldsymbol{A}\boldsymbol{u} = -\left\|\boldsymbol{A}\boldsymbol{u}\right\|^2, \quad \boldsymbol{A}\doteq \begin{bmatrix} \sqrt{w_1}\left(\boldsymbol{p}_1 - \boldsymbol{g}\right)^T \\ \vdots \\ \sqrt{w_n}\left(\boldsymbol{p}_n - \boldsymbol{g}\right)^T \end{bmatrix} $$

Note that, since $\boldsymbol{u}$ is a unit vector, minimizing $-\left\|\boldsymbol{A}\boldsymbol{u}\right\|^2$ amounts to maximizing $\left\|\boldsymbol{A}\boldsymbol{u}\right\|$ over all unit vectors, i.e., to computing the spectral norm (or 2-norm) of $\boldsymbol{A}$. This maximum equals the largest singular value of the matrix, and it is attained when $\boldsymbol{u}$ is the corresponding first right singular vector of $\boldsymbol{A}$. QED.
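To make the last step fully explicit: writing $\boldsymbol{u} = \sum_k c_k \boldsymbol{v}_k$ in the orthonormal basis of right singular vectors of $\boldsymbol{A}$, with $\sum_k c_k^2 = 1$, one has

$$ \left\|\boldsymbol{A}\boldsymbol{u}\right\|^2 = \sum_k s_k^2 c_k^2 \le s_1^2 $$

with equality exactly when $c_1 = \pm 1$, i.e., $\boldsymbol{u} = \pm\boldsymbol{v}_1$ (assuming $s_1 > s_2$; if the top singular value is repeated, any unit vector in the corresponding span attains the maximum).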
