Consider a polygon in the plane with unknown vertices $V_1, \dots, V_n \in \mathbb{R}^2$. Given a random set of points $p_1,\dots, p_m \in \mathbb{R}^2$ lying approximately on the perimeter of such a polygon, is it possible to estimate $V_1,\dots,V_n$?
To simplify the problem I assume that:
- $m\ggg n$.
- the given points $p_1, \dots, p_m$ are ordered along the perimeter, i.e., if we walk along the perimeter, we will find the points in the "circular" order $p_1, p_2, \dots, p_m, p_1, p_2, \dots$
- each observed point is obtained as \begin{equation*}p_i = \bar{p}_i + v_i, \qquad v_i \sim \mathcal{N}(0_{2\times 1}, \sigma^2\,I_2),\end{equation*} where $\bar{p}_i$ is a point lying exactly on the perimeter and $v_i$ is zero-mean isotropic Gaussian noise.
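To make these assumptions concrete, here is a minimal sketch of a test-data generator consistent with them (NumPy-based; the function name `sample_noisy_perimeter` and its interface are illustrative choices, not part of my actual code):

```python
import numpy as np

def sample_noisy_perimeter(vertices, m, sigma, seed=None):
    """Sample m noisy points in circular order along a closed polygon.

    `vertices` is an (n, 2) array listing the polygon's vertices in order.
    Each returned point is an exact perimeter point plus N(0, sigma^2 I_2)
    noise, matching the model stated above.
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(vertices, dtype=float)
    closed = np.vstack([V, V[:1]])                   # close the polygon
    seg = np.diff(closed, axis=0)                    # edge vectors
    seg_len = np.linalg.norm(seg, axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])
    # Sorted arc-length positions give points ordered along the perimeter.
    s = np.sort(rng.uniform(0.0, cum[-1], size=m))
    idx = np.searchsorted(cum, s, side="right") - 1  # edge of each point
    t = (s - cum[idx]) / seg_len[idx]                # position within edge
    exact = closed[idx] + t[:, None] * seg[idx]      # \bar{p}_i on the perimeter
    return exact + rng.normal(0.0, sigma, size=(m, 2))
```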
My approach to the problem:
Starting from $p_1$, I search for the line that best fits the points $p_1,\dots, p_k$, where $k$ is a design parameter. By "best" I mean best in the least-squares sense. The parameter $k$ should not be too small, because I need enough points to average out the noise and run a meaningful regression, but it should not be too large either, because I want the regression to use only points lying on the same edge of the polygon: as $k$ increases, so does the risk of mixing points from different edges.
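For concreteness, one way to implement this fitting step is total least squares (orthogonal regression) via the SVD of the centered points; this minimizes perpendicular distances, which matches the isotropic noise model and handles near-vertical edges without special cases (the helper names below are illustrative):

```python
import numpy as np

def fit_line_tls(points):
    """Fit a line to a (k, 2) array of points by total least squares.

    Returns (centroid, direction): the line passes through the centroid,
    and its unit direction is the top right singular vector of the
    centered data, i.e. the direction of maximum variance.
    """
    P = np.asarray(points, dtype=float)
    c = P.mean(axis=0)
    _, _, Vt = np.linalg.svd(P - c, full_matrices=False)
    return c, Vt[0]

def dist_to_line(points, centroid, direction):
    """Orthogonal distances from points to the line (direction is unit-norm)."""
    d = np.asarray(points, dtype=float) - centroid
    # 2D cross product of each displacement with the line direction.
    return np.abs(d[:, 0] * direction[1] - d[:, 1] * direction[0])
```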
Given this first fitted line, I search for the first vertex. I do this by looking at the distances between the line and the points $p_{k+1}, \dots, p_m$. If the distance from the line to a point $p_i$ (with $i$ ranging from $k+1$ to $m$) is comparable to the distances from the line to the points $p_1, \dots, p_k$, then I treat $p_i$ as a point on the first edge (by construction of the regression, I consider $p_1, \dots, p_k$ as belonging to the first edge). When I spot the first "big" deviation, say at point $p_i$, I define the first vertex $V_1$ as the projection of $p_{i-1}$ onto the fitted line.
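A sketch of this detection step, reusing the helpers above; the threshold (a multiple of the fitting window's RMS residual) is one possible way to formalize a "big" deviation, not necessarily the one in my implementation:

```python
def find_break(points, k, centroid, direction, n_sigmas=3.0):
    """Return the index of the first point whose orthogonal distance to
    the line is large compared to the residuals of the first k points,
    or len(points) if no such break is found."""
    r = dist_to_line(points[:k], centroid, direction)
    # RMS residual of the fitting window serves as a noise scale;
    # the factor n_sigmas is a tunable design choice.
    thresh = n_sigmas * max(np.sqrt((r ** 2).mean()), 1e-12)
    for i in range(k, len(points)):
        if dist_to_line(points[i:i + 1], centroid, direction)[0] > thresh:
            return i
    return len(points)

def project_onto_line(p, centroid, direction):
    """Orthogonal projection of a single point onto the line; this is
    the vertex estimate described above."""
    return centroid + np.dot(p - centroid, direction) * direction
```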
I then delete $p_1, \dots, p_{i-1}$ from the dataset. Starting from $p_i$, I repeat the previous procedure: I fit a line to $p_i,\dots, p_{i+k-1}$, identify the second vertex $V_2$, and finally remove the points belonging to the new edge.
I repeat the entire process until the dataset becomes empty.
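Putting the pieces together, the whole sweep looks roughly like this (a single pass over the ordered points, with no special handling of the wrap-around where the last edge meets the first; again a sketch, not my exact implementation):

```python
def estimate_vertices(points, k, n_sigmas=3.0):
    """One pass of the fit / detect-break / delete procedure."""
    pts = np.asarray(points, dtype=float)
    vertices, start = [], 0
    while len(pts) - start > k:
        c, d = fit_line_tls(pts[start:start + k])
        i = start + find_break(pts[start:], k, c, d, n_sigmas)
        if i >= len(pts):
            break                      # no further break: last edge
        # Vertex estimate: projection of the last inlier onto the line.
        vertices.append(project_onto_line(pts[i - 1], c, d))
        start = i                      # "delete" the consumed points
    return np.array(vertices)

# Example run on a noisy unit square:
# square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
# pts = sample_noisy_perimeter(square, m=500, sigma=0.01, seed=0)
# print(estimate_vertices(pts, k=30))
```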
I've implemented this method, and unfortunately it seems to work only for negligible noise levels $\sigma$. My questions are the following:
- Has my problem already been solved? If yes, with which algorithms?
- Do you have any suggestions to improve the robustness of my method with respect to the noise?