It’s a good idea that unfortunately doesn’t really do what you want: you’re computing a best-fit affinity instead of a proper projectivity. The way you’ve set it up, the matrix $A$ must preserve the last coordinate of every input vector, which means that its last row must be $(0,0,1)$. This is the hallmark of an affine transformation.
Given pairings between the vertices of two nondegenerate quadrilaterals, there’s a unique projectivity that maps one onto the other. Let’s see what happens when we try to compute this using your method. Let the source vertices be, say, $(-2,1)$, $(1,-3)$, $(3,2)$, $(2,5)$ and the destination be the unit square with vertices enumerated counterclockwise from the origin. Then $$A=\begin{bmatrix} 0.204734 & -0.144379 & 0.47574 \\ 0.168639 & 0.106509 & 0.198225 \\ 0&0&1 \end{bmatrix}$$ and the images of the four vertices are $(-0.0781065, -0.0325444)$, $(1.11361, 0.0473373)$, $(0.801183, 0.91716)$, $(0.163314, 1.06805)$, not the vertices of the unit square. If the four points formed a paralellogram, on the other hand, the mapping would’ve been exact.
What’s going wrong is that the way that you’ve formulated the mappings gives up two of the eight degrees of freedom of a projectivity. In general, the raw output of the product of a projective matrix and a point in the form $(x,y,1)^T$ is not going to be of that form. Its last coordinate will be some other value (think of it as a scale factor) that depends on the values of $x$ and $y$. So, by forcing the last coordinate of every output vector to be $1$—replacing $\sim$ by $=$, as you put it—you’ve lost two degrees of freedom. Not coincidentally, affinities have only six degrees of freedom.
To fix this, you have to allow the result of $A\mathbf x_i$ to be a scalar multiple of its desired image $\mathbf x_i'$. In $\mathbb P^2$, this can be expressed without introducing another variable via a cross product: $\mathbf x_i'\times A\mathbf x_i=0$. This results in a system of homogeneous linear equations in the elements of $A$, but this is pretty much the DLT method.
If you have exactly four point correspondences, then there is a method described here by MvG that’s similar to yours which allows you to compute the transformation matrix more-or-less directly. For the above example, the resulting matrix after normalization is $$H = \begin{bmatrix} 0.127072 & -0.127072 & 0.381215 \\ 0.0883978 & 0.0662983 & 0.110497 \\ -0.154194 & -0.0145655 & 1. \end{bmatrix}.$$ You can verify that the images of the source data points are in fact the corners of the unit square.