1

I am trying to implement a modified version of sift tracking. I found a matlab implementation of sift to track objects in which, basically, they take the indices of the matches and compute the following transform:

pts = [x;y]'; % pts contains the points x and their transform y 

pt1 = [point,ones(size(pts,1),2),ones(size(pts,1),2)]; 
pt2 = [ones(size(pts,1),2),pts,ones(size(pts,1),2)]; 
m1= [1,0,0,0,0,0;0,1,0,0,0,0;0,0,0,0,0,0;0,0,0,0,0,0;0,0,0,0,1,0;0,0,0,0,0,0];
m2= [0,0,0,0,0,0;0,0,0,0,0,0;0,0,1,0,0,0;0,0,0,1,0,0;0,0,0,0,0,0;0,0,0,0,0,1];

t1=pt1*m1; 
t2=pt2*m2; 

A = reshape([t1(:) t2(:)]',size(t1,1)+size(t2,1), []); 
b = reshape(npoint(:),2*size(npoint,2),1); % npoints are the indices of the points

transformation = inv(A'*A)*A'*b

I don't understand how the transform is computed nor what is represents.

Has anyone an idea on this?

MvG
  • 42,596
bigTree
  • 429

1 Answers1

2

inv(A'*A)*A'*b looks like a least squares approximation. You can approximate

$$ A\cdot x \approx b $$

by solving

$$ \left(A^T\cdot A\right)\cdot x = A^T\cdot b $$

which can be solved for $x$ as

$$ x = \left(A^T\cdot A\right)^{-1}\cdot A^T\cdot b $$

You can also fiond this forumla in the section Model verification by linear least squares in the Wikipedia article on SIFT.

Let's also translate these matrices into math notation:

$$ M_1= \begin{bmatrix} 1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&0 \end{bmatrix} \qquad M_2= \begin{bmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&1 \end{bmatrix} $$

The matrices pt1 and pt2 seem to look something like this:

$$ \textit{pt}_1 = \begin{bmatrix} x_1 & y_1 & 1 & 1 & 1 & 1 \\ x_2 & y_2 & 1 & 1 & 1 & 1 \\ \vdots & & & & & \vdots \\ x_n & y_n & 1 & 1 & 1 & 1 \end{bmatrix} \qquad \textit{pt}_2 = \begin{bmatrix} 1 & 1 & x_1 & y_1 & 1 & 1 \\ 1 & 1 & x_2 & y_2 & 1 & 1 \\ \vdots & & & & & \vdots \\ 1 & 1 & x_n & y_n & 1 & 1 \end{bmatrix} $$

The above formulation disregards the difference between pointand pts. Multiplying those $M$ matrices from the right sets certain columns to zero, so you end up with

$$ t_1 = \begin{bmatrix} x_1 & y_1 & 0 & 0 & 1 & 0 \\ x_2 & y_2 & 0 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \\ x_n & y_n & 0 & 0 & 1 & 0 \end{bmatrix} \qquad t_2 = \begin{bmatrix} 0 & 0 & x_1 & y_1 & 0 & 1 \\ 0 & 0 & x_2 & y_2 & 0 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & x_n & y_n & 0 & 1 \end{bmatrix} $$

The reshape then stacks these matrices.

$$ A = \begin{bmatrix} x_1 & y_1 & 0 & 0 & 1 & 0 \\ x_2 & y_2 & 0 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \\ x_n & y_n & 0 & 0 & 1 & 0 \\ 0 & 0 & x_1 & y_1 & 0 & 1 \\ 0 & 0 & x_2 & y_2 & 0 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & x_n & y_n & 0 & 1 \end{bmatrix} $$

These are the same lines you also find in the Wikipedia article, although Wikipedia has them interleaved, so they are in a different order.

Now the right side is built from npoint. Despite the comment, I assume that this is not point indices, but instead the new positions of the points, after the transformation was applied. Wikipedia calls this $u$ and $v$. So you have

$$ b = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \\ v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} $$

So what you are doing is you are solving the following least squares approximation problem:

$$ \begin{bmatrix} x_1 & y_1 & 0 & 0 & 1 & 0 \\ x_2 & y_2 & 0 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \\ x_n & y_n & 0 & 0 & 1 & 0 \\ 0 & 0 & x_1 & y_1 & 0 & 1 \\ 0 & 0 & x_2 & y_2 & 0 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & x_n & y_n & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} \approx \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \\ v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} $$

which is just another way of writing

$$ \begin{bmatrix}m_1 & m_2 \\ m_3 & m_4\end{bmatrix} \cdot\begin{bmatrix}x_i \\ y_i\end{bmatrix} +\begin{bmatrix}t_x \\ t_y\end{bmatrix} \approx\begin{bmatrix}u_i \\ v_i\end{bmatrix} $$

So in the end, your transformation will be a vector of six elements, namely $(m_1, m_2, m_3, m_4, t_x, t_y)$, which define the affine transformation

$$ \begin{bmatrix}x \\ y\end{bmatrix} \mapsto \begin{bmatrix}m_1 & m_2 \\ m_3 & m_4\end{bmatrix} \cdot\begin{bmatrix}x \\ y\end{bmatrix} +\begin{bmatrix}t_x \\ t_y\end{bmatrix} $$

Note that I wouldn't program things that way. I'd rather treat this as two systems of equations, since rows related to $u$ don't influence those related to $v$. So I'd write something like

$$ \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & & \vdots \\ x_n & y_n & 1 \end{bmatrix} \cdot \begin{bmatrix} m_1 & m_3 \\ m_2 & m_4 \\ t_x & t_y \end{bmatrix} \approx \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \\ \vdots & \vdots \\ u_n & v_n \end{bmatrix} $$

This should translate into matlab like this:

pts = [x;y]';
A = [pts, ones(size(pts,1),1)];
transformation = (inv(A'*A)*A'*npoint)'
MvG
  • 42,596
  • Great answer! thank you. I have one more question though: It is possible to decompose an affine transformation into shear, scaling and rotation. I need to find the right order of this decomposition for the affine transformation I have. Do you know how I should proceed? – bigTree Dec 20 '13 at 15:24
  • @bigTree: What is your application? You will have to include translations in the set of operations, and you should fix an order up front and can then see how to deduce its parameters from the transformation matrix. – MvG Dec 20 '13 at 21:45