How to find an all-in-one 2D to 3D Transformation Matrix for perspective projection, rotation, and translation?

Question

I have read Finding a 3D transformation matrix based on the 2D coordinates but I think my situation is different because I think I need a 4x3 matrix, not a 3x3 matrix. I'm not sure but this might be because I have rotation and translation in addition to just the perspective transformation.

Here is the setup:

suppose you have several 2D points in an image: (x1,y1) (x2,y2) (x3,y3) (x4,y4)

suppose you also have several corresponding 3D points on an arbitrary plane: (X1,Y1,Z1) (X2,Y2,Z2) (X3,Y3,Z3) (X4,Y4,Z4)

to transform from 2D to 3D using homogeneous coordinates, we can use

(X,Y,Z,W) = M*(x,y,1). Here M must be a 4x3 matrix

So a 2D point in homogeneous coordinates gets transformed into a 3D point in homogeneous coordinates.

Then, I could divide (X,Y,Z,W) by W to get (X,Y,Z,1), which is a form that I can read out the true X,Y,Z values in "regular" coordinates.

Now, here is the problem: I don't know W for any of my (X,Y,Z) points. (If I did know each point's W, I think there is a standard linear algebra way of finding M.)

So to find M, I multiply things out like the following:

X = M11*x + M12*y + M13*1

Y = M21*x + M22*y + M23*1

Z = M31*x + M32*y + M33*1

W = M41*x + M42*y + M43*1

but these X,Y,Z,W are the homogeneous coords, so to get the "real" X,Y,Z coords:

X = (M11*x + M12*y + M13*1) / (M41*x + M42*y + M43*1)

Y = (M21*x + M22*y + M23*1) / (M41*x + M42*y + M43*1)

Z = (M31*x + M32*y + M33*1) / (M41*x + M42*y + M43*1)

also, I can get rid of one parameter from each equation by multiplying each equation by (1/M43)/(1/M43). Then I can also rename the ratio of parameters. I'm left with:

X = (a1*x + a2*y + a3*1) / (a10*x + a11*y + 1)

Y = (a4*x + a5*y + a6*1) / (a10*x + a11*y + 1)

Z = (a7*x + a8*y + a9*1) / (a10*x + a11*y + 1)

finally I plug all the (X,Y,Z) and (x,y) values that I have into multiple instances of these equations and algebraically rearrange everything to get the classic A=Bx form, where x is the vector of unknown a's (a1 ... a11).

Once I have a1 through a11, I could go back and work out what the original components of M were. Either way I can now project points from 2D to 3D using perspective transformation even if there is rotation or translation.
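For illustration, the rearrangement described above can be sketched in Python with NumPy. This is only a sketch under stated assumptions: M43 is nonzero (so the division by M43 is allowed), the 3D points lie on one plane, and at least four correspondences with no three image points collinear are supplied. The function names `fit_2d_to_3d` and `apply_2d_to_3d` and the sample numbers are hypothetical, not from any library. Each correspondence contributes three rows, e.g. a1*x + a2*y + a3 - X*x*a10 - X*y*a11 = X.

```python
import numpy as np

def fit_2d_to_3d(pts2d, pts3d):
    """Least-squares estimate of a1..a11; returns the 4x3 matrix M with M43 = 1."""
    pts2d = np.asarray(pts2d, dtype=float)
    pts3d = np.asarray(pts3d, dtype=float)
    rows, rhs = [], []
    for (x, y), target in zip(pts2d, pts3d):
        for k, coord in enumerate(target):          # k = 0, 1, 2 for X, Y, Z
            row = np.zeros(11)
            row[3 * k:3 * k + 3] = (x, y, 1.0)      # numerator coefficients
            row[9:11] = (-coord * x, -coord * y)    # a10, a11 moved to the left side
            rows.append(row)
            rhs.append(coord)
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(a, 1.0).reshape(4, 3)          # rows: X, Y, Z, W

def apply_2d_to_3d(M, pts2d):
    """Map 2D points through M and divide by W."""
    pts2d = np.asarray(pts2d, dtype=float)
    h = np.column_stack([pts2d, np.ones(len(pts2d))]) @ M.T
    return h[:, :3] / h[:, 3:4]

# Synthetic check: made-up ground truth (assumption, for illustration only).
M_true = np.array([[2.0, 0.1, 1.0],
                   [0.3, 1.5, -2.0],
                   [0.5, -0.2, 4.0],
                   [0.01, 0.02, 1.0]])
pts2d = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]], dtype=float)
pts3d = apply_2d_to_3d(M_true, pts2d)
M_est = fit_2d_to_3d(pts2d, pts3d)
```

With noisy measurements the least-squares solve minimizes an algebraic error rather than a geometric one, so supplying more than four points is usually worthwhile.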

My question is whether this is the best way to find this kind of general 2D to 3D perspective transformation?

See here: http://mathematica.stackexchange.com/questions/9244/solve-system-of-equations-related-to-perspective-projection — user7530, Jul 29 '13 at 20:43
And here: http://stackoverflow.com/questions/8925569/perspective-projection-4-points — user7530, Jul 29 '13 at 20:47
@user7530 Those links are very helpful. It seems that finding the perspective projection comes down to solving a system of linear equations. I am still wondering exactly what are the steps to go from the original matrix equation to the system of equations. — kdaquila, Jul 29 '13 at 21:14
The trick is to move the division over to the other side, which then turns the nonlinear system into a linear system in the unknown variables (the matrix coefficients). See the mathematica question for the worked solution. — user7530, Jul 29 '13 at 21:18
True and I get that. However, how exactly does one get the nonlinear system of equations in the first place? In homogeneous coordinates, the matrix equation starts off linear: (3D pt in homogeneous coords) = Matrix * (2D pt in homogeneous coords). I have tried to derive it in my post, but I'm not sure if it's right — kdaquila, Jul 29 '13 at 21:26
This might help: http://math.stackexchange.com/questions/441597/how-to-calculate-what-matrix-will-transform-specified-points-to-other-specified/442002#442002 — bubba, Jul 30 '13 at 00:32

score 3 · Accepted Answer · answered Jul 29 '13 at 22:52

It looks like you are trying to solve for a map from 2D points to 3D points, so I'm a bit confused... a projection transformation would map the 3D points to the 2D points (and the inverse is, of course, impossible since each point on the projection plane could lie anywhere on a ray from the camera through the plane.)

Next, notice there is no difference between first transforming an object in 3D, and then projecting through a fixed camera, versus leaving the object in place and projecting through a camera of unknown position and orientation. Here I'll take the former approach.

We have some points in 3D and apply an affine transformation to them, then project through a camera at the origin looking down the $z$ axis, with the projection plane passing through $z=1$. This makes the projection matrix $P: (x,y,z) \to (u,v,w)$ easy: it is just the identity.

Before we project we apply some affine transformation $Mq + t$ to the 3D points $q$. Notice that we do not constrain $M$ to only rotate and scale here: to do so we would need to add additional (nonlinear) constraints on the coefficients of $M$. The short of it is that you will need to supply more than the theoretical minimum of four corresponding points to determine the map (and you will get shear if your corresponding points did not come from a bona fide Euclidean motion + projection.)

So now the total map can be written as

$$\left[\begin{array}{c}u\\v\\w\end{array}\right] = \left[\begin{array}{cccc}m_{11} & m_{12} & m_{13} & t_x\\m_{21} & m_{22} & m_{23} & t_y\\m_{31} & m_{32} & m_{33} & t_z\end{array}\right]\left[\begin{array}{c}x\\y\\z\\1\end{array}\right].$$

Since $(u,v,w) \sim (u/w,v/w,1)$, this map is scale-invariant, so we might as well set $m_{33} = 1$. We can also write it in block form (which will prove useful) as

$$\left[\begin{array}{c}u\\v\\w\end{array}\right] = \left[\begin{array}{c}N_{uv}\\N_w\end{array}\right]\left[\begin{array}{c}x\\y\\z\\1\end{array}\right].$$

Like you say, we only know $u/w$ and $v/w$ for the corresponding points, not $u,v,w$. Well, $$\left[\begin{array}{c}u/w\\v/w\end{array}\right] = \frac{N_{uv}\left[\begin{array}{c}x\\y\\z\\1\end{array}\right]}{N_w \left[\begin{array}{c}x\\y\\z\\1\end{array}\right]},$$ or $$\left(N_w \left[\begin{array}{c}x\\y\\z\\1\end{array}\right]\right)\left[\begin{array}{c}u/w\\v/w\end{array}\right] = N_{uv}\left[\begin{array}{c}x\\y\\z\\1\end{array}\right]$$

which is a system of two linear equations in 11 unknowns for each corresponding point. Plugging in $5\frac{1}{2}$ corresponding points (in practice, at least six) will let you solve for $N$.
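As a rough sketch of how the system above might be assembled and solved numerically, here is a Python/NumPy version. It assumes the normalization $m_{33} = 1$ used above, at least six correspondences, and 3D points that are not all coplanar (the linear system is degenerate for coplanar points). The names `fit_camera` and `project` and all the sample numbers are hypothetical. Writing $U = u/w$ and $V = v/w$, each point gives the rows $m_{11}x + m_{12}y + m_{13}z + t_x - U(m_{31}x + m_{32}y + t_z) = Uz$ and the analogous one for $V$.

```python
import numpy as np

def fit_camera(pts3d, pts2d):
    """Least-squares estimate of the 3x4 matrix N with m33 fixed to 1."""
    rows, rhs = [], []
    for (x, y, z), (U, V) in zip(np.asarray(pts3d, float), np.asarray(pts2d, float)):
        # Unknown order: m11 m12 m13 tx | m21 m22 m23 ty | m31 m32 tz
        rows.append([x, y, z, 1, 0, 0, 0, 0, -U * x, -U * y, -U])
        rhs.append(U * z)
        rows.append([0, 0, 0, 0, x, y, z, 1, -V * x, -V * y, -V])
        rhs.append(V * z)
    n, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    N = np.empty((3, 4))
    N[0] = n[0:4]
    N[1] = n[4:8]
    N[2] = (n[8], n[9], 1.0, n[10])   # m33 = 1 by the scale normalization
    return N

def project(N, pts3d):
    """Apply N to 3D points and divide by w to get (u/w, v/w)."""
    pts3d = np.asarray(pts3d, dtype=float)
    uvw = np.column_stack([pts3d, np.ones(len(pts3d))]) @ N.T
    return uvw[:, :2] / uvw[:, 2:3]

# Synthetic check with a made-up N (assumption, for illustration only).
N_true = np.array([[1.0, 0.2, 0.1, 0.5],
                   [-0.1, 0.9, 0.3, -0.2],
                   [0.05, -0.02, 1.0, 4.0]])
pts3d = np.array([[0, 0, 0], [1, 0, 0.5], [0, 1, 1], [1, 1, 2],
                  [2, 0, 1.5], [0, 2, 3], [1.5, 0.5, 0.2], [0.3, 1.7, 2.5]])
pts2d = project(N_true, pts3d)
N_est = fit_camera(pts3d, pts2d)
```

Fixing $m_{33} = 1$ fails if the true $m_{33}$ is zero; a common alternative is to solve the homogeneous system for all 12 entries with an SVD and normalize afterwards.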

This answer is very helpful and I think I can follow the same logic for the case that I'm interested in which is 2D to 3D. On that point, I don't see why projecting from 2D to 3D is impossible as long as I project onto a plane in 3D. The context of this problem is in computer vision. I want to take a picture of a wall that has fiducials. I know the 3D location of the fiducials from measuring them, and I find the 2D location of the fiducials in the image. By finding the 2D to 3D mapping, I can determine the 3D projected position of any point in the image. — kdaquila, Jul 30 '13 at 02:16
Yes, that should work the same way, the easiest is probably to parameterize space using (u, v, n) where u and v are two orthogonal tangent vectors of your plane and n is the plane normal, then all of the above applies almost unchanged. — user7530, Jul 30 '13 at 02:20
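
One possible reading of the plane parameterization suggested in the comment above, sketched in Python/NumPy: fit an orthonormal frame to the measured fiducials, express them in 2D plane coordinates, and fit a 3x3 image-to-plane map. Everything here is an assumption for illustration: the names `plane_frame`, `fit_homography` and `image_to_plane_3d` are hypothetical, `e1` and `e2` stand for the two tangent vectors, the bottom-right entry of the 3x3 matrix is assumed nonzero, and the synthetic camera is an ideal pinhole at the origin.

```python
import numpy as np

def plane_frame(pts3d):
    """Centroid o and orthonormal tangent vectors e1, e2 of the best-fit plane."""
    pts3d = np.asarray(pts3d, dtype=float)
    o = pts3d.mean(axis=0)
    _, _, vt = np.linalg.svd(pts3d - o)
    return o, vt[0], vt[1]            # vt[2] would be the plane normal

def fit_homography(src, dst):
    """3x3 H with (dst, 1) ~ H @ (src, 1), bottom-right entry fixed to 1."""
    rows, rhs = [], []
    for (x, y), (s, t) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -s * x, -s * y])
        rhs.append(s)
        rows.append([0, 0, 0, x, y, 1, -t * x, -t * y])
        rhs.append(t)
    h, *_ = np.linalg.lstsq(np.array(rows, dtype=float),
                            np.array(rhs, dtype=float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def image_to_plane_3d(H, o, e1, e2, pts2d):
    """Map image points to plane coordinates, then back to 3D."""
    pts2d = np.asarray(pts2d, dtype=float)
    q = np.column_stack([pts2d, np.ones(len(pts2d))]) @ H.T
    st = q[:, :2] / q[:, 2:3]
    return o + st[:, :1] * e1 + st[:, 1:2] * e2

# Synthetic wall and pinhole camera (made-up numbers, for illustration only).
p0, a, b = np.array([-0.5, -0.5, 5.0]), np.array([1.0, 0.0, 0.3]), np.array([0.0, 1.0, 0.2])
st_true = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.2]], dtype=float)
wall = p0 + st_true[:, :1] * a + st_true[:, 1:2] * b     # measured fiducials
img = wall[:, :2] / wall[:, 2:3]                         # their image positions

o, e1, e2 = plane_frame(wall)
plane_coords = np.column_stack([(wall - o) @ e1, (wall - o) @ e2])
H = fit_homography(img, plane_coords)

held_out = p0 + 0.3 * a + 0.8 * b                        # a wall point not used in the fit
recovered = image_to_plane_3d(H, o, e1, e2, [held_out[:2] / held_out[2]])[0]
```

This reduces the problem to 8 unknowns, so four fiducials with no three collinear are enough in the noise-free case.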

score 0 · Answer 2 · answered Feb 08 '20 at 03:45

I guess what you need can be found from this arXiv.org article here: A New Way to Factorize Linear Cameras.

The introduction of Definitions 2.1 and 2.2 in this article makes it possible to convert 3D coordinates into 2D ones and vice versa.

They are introduced mainly for factorizing $3\times 4$ camera matrices in computer vision. For the definitions of the left and central/parallel matrix factors, you had better look at another arXiv.org article here for more interpretation.
