Possibly not an answer, but certainly too long for a comment.
If I understand the notation, your "true" image is modeled by the coordinate system $(x', y')$ and the image is captured in the coordinate system $(x, y)$, but the captured image is distorted by the apparatus. The distortion is modeled by a mapping
$$
(x, y) = (X(x', y'), Y(x', y')).
\tag{*}
$$
The practical goal is to "undistort" the measured pixels into the "true" image. Mathematically, you want to solve, at least approximately, for $(x', y')$ as functions of $(x, y)$; that is, to invert the mapping (*). Applying the resulting inverse to the measured data then undoes, approximately, the distortion of the X-ray machine.
Be that as it may, it's reasonable to assume the functions $X$ and $Y$ are smooth (infinitely differentiable). Consequently, on a bounded image region they can be approximated to arbitrary accuracy by polynomials (Weierstrass approximation), and near any given point by their Taylor polynomials. That's my best guess of the rationale for assuming
\begin{align*}
x &= \sum_{i=0}^{d} \sum_{j=0}^{d-i} u_{i,j}(y')^{j} (x')^{i}
= \sum_{k=0}^{d} \sum_{j=0}^{k} u_{k-j,j}(x')^{k-j} (y')^{j}, \\
y &= \sum_{i=0}^{d} \sum_{j=0}^{d-i} v_{i,j}(y')^{j} (x')^{i}
= \sum_{k=0}^{d} \sum_{j=0}^{k} v_{k-j,j}(x')^{k-j} (y')^{j}.
\end{align*}
The right-hand expressions are more common mathematical notation, in which the inner sum groups terms of fixed degree $k$.
Here's what this looks like concretely for $d = 3$, with successive lines showing the constant, linear, quadratic, and cubic terms:
\begin{align*}
x &= u_{0,0} && \sum_{j=0}^{0} u_{0,j} (x')^{0} (y')^{j} && (k = 0) \\
&\quad+ u_{1,0} (x') + u_{0,1} (y') && \sum_{j=0}^{1} u_{1-j,j} (x')^{1-j} (y')^{j} && (k = 1) \\
&\quad+ u_{2,0} (x')^{2} + u_{1,1} (x') (y') + u_{0,2} (y')^{2} && \sum_{j=0}^{2} u_{2-j,j} (x')^{2-j} (y')^{j} && (k = 2) \\
&\quad+ u_{3,0} (x')^{3} + u_{2,1} (x')^{2} (y') + u_{1,2} (x') (y')^{2} + u_{0,3} (y')^{3}. && \sum_{j=0}^{3} u_{3-j,j} (x')^{3-j} (y')^{j} && (k = 3)
\end{align*}
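If it helps to see the two index conventions side by side, here is a minimal Python sketch that evaluates the polynomial both ways and checks that they agree. The function names, the storage of $u_{i,j}$ as a $(d+1) \times (d+1)$ NumPy array, and the random coefficients are my own assumptions for illustration, not anything taken from your setup.

```python
import numpy as np

def eval_by_index(coef, xp, yp):
    """Left-hand form: sum over i, then j, with i + j <= d."""
    d = coef.shape[0] - 1
    return sum(coef[i, j] * xp**i * yp**j
               for i in range(d + 1)
               for j in range(d - i + 1))

def eval_by_degree(coef, xp, yp):
    """Right-hand form: outer sum over total degree k, inner sum over j."""
    d = coef.shape[0] - 1
    return sum(coef[k - j, j] * xp**(k - j) * yp**j
               for k in range(d + 1)
               for j in range(k + 1))

def num_coefficients(d):
    """Monomials of degree <= d in two variables: 1 + 2 + ... + (d + 1)."""
    return (d + 1) * (d + 2) // 2

# Hypothetical coefficients for d = 3; entries with i + j > 3 are never read.
rng = np.random.default_rng(0)
u = rng.normal(size=(4, 4))

# Both orderings visit exactly the same monomials, so the values agree.
print(np.isclose(eval_by_index(u, 0.3, -0.4), eval_by_degree(u, 0.3, -0.4)))
print(num_coefficients(3))
```

The count confirms what the display above shows: for $d = 3$ each of $x$ and $y$ carries $1 + 2 + 3 + 4 = 10$ coefficients, so the pair $(u, v)$ has $20$.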
In general, the constant term $(u_{0,0}, v_{0,0})$ is "image drift", a constant vector by which the output is shifted.
The linear terms are presumably close to $u_{1,0} \approx v_{0, 1} \approx 1$, $u_{0, 1} \approx v_{1, 0} \approx 0$; that is, $(x, y) \approx (x', y')$ after drift correction.
The higher-order coefficients are presumably of small absolute value (i.e., the distortion is not "highly non-linear").
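Under those assumptions (near-identity linear part, small higher-order terms), one simple way to invert (*) numerically is fixed-point iteration: start from the guess $(x', y') = (x, y)$ and repeatedly subtract the current mismatch. This is only a sketch of one possible approach, not necessarily what your reference does; the function names and all coefficient values below are made up for illustration.

```python
import numpy as np

def distort(u, v, xp, yp):
    """Apply (*): map true coordinates (x', y') to measured (x, y)."""
    d = u.shape[0] - 1
    x = y = 0.0
    for i in range(d + 1):
        for j in range(d - i + 1):
            m = xp**i * yp**j  # monomial (x')^i (y')^j
            x += u[i, j] * m
            y += v[i, j] * m
    return x, y

def undistort(u, v, x, y, iterations=50):
    """Approximate (x', y') from (x, y) by fixed-point iteration."""
    xp, yp = x, y  # initial guess: no distortion at all
    for _ in range(iterations):
        fx, fy = distort(u, v, xp, yp)
        # Shift the guess by the residual between predicted and measured.
        xp, yp = xp - (fx - x), yp - (fy - y)
    return xp, yp

# Made-up coefficients for d = 3: small drift, near-identity linear part,
# small higher-order terms.
u = np.zeros((4, 4))
v = np.zeros((4, 4))
u[0, 0], v[0, 0] = 0.02, -0.01
u[1, 0], v[0, 1] = 1.01, 0.99
u[0, 1], v[1, 0] = 0.005, -0.004
u[2, 0], v[1, 1] = 0.01, 0.02
u[1, 2], v[0, 3] = -0.02, 0.015

x, y = distort(u, v, 0.3, -0.4)         # simulate a measured point
xp_rec, yp_rec = undistort(u, v, x, y)  # try to recover (0.3, -0.4)
print(xp_rec, yp_rec)
```

The iteration converges when the Jacobian of $(X, Y)$ stays close to the identity on the region of interest, which is exactly the "not highly non-linear" assumption. If that fails, Newton's method on the same residual would be the natural next thing to try.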