Linear transformations
With a linear transformation you have
$$\begin{pmatrix}x'\\y'\end{pmatrix}=
\begin{pmatrix}a&b\\c&d\end{pmatrix}\cdot
\begin{pmatrix}x\\y\end{pmatrix}$$
and you want to find $a,b,c,d$. You can set up a bunch of equations for this:
$$\begin{pmatrix}
x_1 & y_1 & 0 & 0 \\
0 & 0 & x_1 & y_1 \\
x_2 & y_2 & 0 & 0 \\
0 & 0 & x_2 & y_2 \\
\vdots & \vdots & \vdots & \vdots \\
x_n & y_n & 0 & 0 \\
0 & 0 & x_n & y_n
\end{pmatrix}\cdot
\begin{pmatrix}a\\b\\c\\d\end{pmatrix}\approx
\begin{pmatrix}x_1'\\y_1'\\x_2'\\y_2'\\\vdots\\x_n'\\y_n'\end{pmatrix}$$
Use linear least squares approximation to determine $a$ through $d$ and you are done.
Actually you could as well determine the parameters for $a,b$ and $c,d$ separately since the corresponding equations are independent. The combined view above is of benefit in cases where you want additional restrictions. So for example you might want $a=d$ and $b=-c$ to enforce angle-preserving similarity transformations. That way you have a single system of equations in two variables.
Affine transformations
Here your map is
$$\begin{pmatrix}x'\\y'\\1\end{pmatrix}=
\begin{pmatrix}a&b&c\\d&e&f\\0&0&1\end{pmatrix}\cdot
\begin{pmatrix}x\\y\\1\end{pmatrix}$$
so you have two more parameters to tune, and two more columns in your matrix. Just fill them with $1$, as $c$ and $f$ will be added to the result as they are.
Projective transformations
I honestly don't have a simple and generic solution here. The problem is that you get something like
$$\begin{pmatrix}x'\\y'\\1\end{pmatrix}=
\lambda\begin{pmatrix}a&b&c\\d&e&f\\g&h&k\end{pmatrix}\cdot
\begin{pmatrix}x\\y\\1\end{pmatrix}$$
and you don't know that $\lambda$ either. You can use the last row to eliminate the $\lambda$, but then you get equations like these:
\begin{align*}
(gx+hy+k)x' &= ax+by+c \\
(gx+hy+k)y' &= dx+ey+f
\end{align*}
Expressed as a matrix you could write this as
$$\begin{pmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' & -x_1' \\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' & -y_1' \\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -y_2x_2' & -x_2' \\
0 & 0 & 0 & x_2 & y_2 & 1 & -x_2y_2' & -y_2y_2' & -y_2' \\
\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\
x_n & y_n & 1 & 0 & 0 & 0 & -x_nx_n' & -y_nx_n' & -x_n' \\
0 & 0 & 0 & x_n & y_n & 1 & -x_ny_n' & -y_ny_n' & -y_n'
\end{pmatrix}\cdot
\begin{pmatrix}a\\b\\c\\d\\e\\f\\g\\h\\k\end{pmatrix}\approx
\begin{pmatrix}0\\0\\0\\0\\\vdots\\0\\0\end{pmatrix}$$
But if you apply standard least squares optimization to this, you will end up setting all your variables to zero, as using the zero matrix as transformation matrix will eliminate all errors. To avoid this, you may have to add something to enforce a non-zero matrix. For example, in most situations your transformation matrix will have a nonzero $k$ entry, and therefore you can choose to have that entry equal $1$.
$$\begin{pmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' \\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' \\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -y_2x_2' \\
0 & 0 & 0 & x_2 & y_2 & 1 & -x_2y_2' & -y_2y_2' \\
\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\
x_n & y_n & 1 & 0 & 0 & 0 & -x_nx_n' & -y_nx_n' \\
0 & 0 & 0 & x_n & y_n & 1 & -x_ny_n' & -y_ny_n'
\end{pmatrix}\cdot
\begin{pmatrix}a\\b\\c\\d\\e\\f\\g\\h\end{pmatrix}\approx
\begin{pmatrix}x_1'\\y_1'\\x_2'\\y_2'\\\vdots\\x_n'\\y_n'\end{pmatrix}$$
Now you can solve this. If you have four points you would get exactly the projective transformation from that post you referenced. If you have more and they do fit a common projective transformation, you should get this transformation. If they don't fit a projective transformation exactly, you have to be careful about what errors you are actually minimizing. You are no longer minimizing the sum of squared distances between predicted and actual points. That's because the way the system of equations is set up, you are using the actual transformed positions on the left hand side of the equation as well, but if you'd compute squared errors you'd use the predicted coordinates there. If your projective transformation is almost affine, i.e. if $\lvert g\rvert,\lvert h\rvert\ll1$ then this probably doesn't matter too much.