I try to describe the passage between $GF(16)$ and $GF(4^2)$. The same principles can be applied in going from $GF(256)$ to $GF(16^2)$ and back.
Let us begin with $GF(4)$. It's elements are $0,1$, an element called $\alpha$ (aka the generator of the field) and $\alpha+1$. The arithmetic follows from the rule that $\alpha$ is a zero of the polynomial $x^2+x+1$, in other words $\alpha^2=\alpha+1$. In a program we represent the element
$a_1\alpha+a_0$ with the pair of bits $(a_1,a_0)$. So the individual bits are viewed as coefficients of a (at most linear) polynomial that we imagine to be evaluated at $\alpha$.
Similarly elements of $GF(16)$ are represented as sequences of bits $(a_3,a_2,a_1,a_0)$. This sequence represents the element $a_3\beta^3+a_2\beta^2+a_1\beta+a_0$, where $\beta$ is the (possibly, but hopefully not mysterious) generator. This time the arithmetic follows from the fact that $\beta$ is a zero of the polynomial $x^4+x+1$, in other words $\beta^4=\beta+1$. For example,
we then see that
$$
\beta^5=\beta\cdot\beta^4=\beta(\beta+1)=\beta^2+\beta,
$$
and
$$
\beta^{10}=(\beta^5)^2=(\beta^2+\beta)^2=\beta^4+2\beta^2\beta+\beta^2
=\beta^4+\beta^2=\beta^2+\beta+1,
$$
using the fact that in all fields of characteristic two we have $2=1+1=0$. This piece of information is systematically used in a lot of software and hardware in the sense that field addition is implemented as bitwise XOR of the sequences representing the elements of the field.
We make the important observation that $\beta^2+\beta$ is a root of the same equation
$x^2=x+1$ as $\alpha$. This allows us to identify $GF(4)$ as a subset of $GF(16)$. We can choose to equate $\alpha=\beta^2+\beta$.
Transformation from $GF(16)$ to $GF(4^2)$: Consider an arbitrary element
$z=a_3\beta^3+a_2\beta^2+a_1\beta+a_0$ of $GF(16)$. We can get rid of the high powers of $\beta$ here using the equation $\beta^2+\beta=\alpha$, or rather the equivalent form
$\beta^2=\beta+\alpha$, and its consequence
$\beta^3=\beta\cdot\beta^2=\beta(\beta+\alpha)=\beta^2+\alpha\beta=\alpha\beta+\beta+\alpha.$ Substituting these allows us to rewrite
$$
\begin{aligned}
z&=a_3(\alpha\beta+\beta+\alpha)+a_2(\beta+\alpha)+a_1\beta+a_0
=(a_3\alpha+[a_3+a_2+a_1])\beta+([a_3+a_2]\alpha+a_0)\\
&=a_h\beta+a_\ell,
\end{aligned}
$$
where $a_h=(a_3\alpha+[a_3+a_2+a_1])$ and $a_\ell=([a_3+a_2]\alpha+a_0)$ are elements of $GF(4)$. Internally we represent elements of $GF(4)$ with the sequence of coefficient of $\alpha^j,j=0,1$, e.g. $a_h=a_{h1}\alpha+a_{h0}$, so here
$$
a_{h1}=a_3,\quad a_{h0}=a_3+a_2+a_1,\quad a_{\ell1}=a_3+a_2,\quad a_{\ell0}=a_0.
$$
Transformation from $GF(4^2)$ to $GF(16)$: The other direction is also straightforward with the identification $\alpha=\beta^2+\beta$ in place. If we are given an element of
$GF(4^2)$ in the form
$$
z=a_h\beta+a_\ell,
$$
where $a_h=a_{h1}\alpha+a_{h0}$ and $a_\ell=a_{\ell1}\alpha+a_{\ell0}$, then
$$
\begin{aligned}
z&=(a_{h1}[\beta^2+\beta]+a_{h0})\beta+(a_{\ell1}[\beta^2+\beta]+a_{\ell0})\\
&=a_{h1}\beta^3+(a_{\ell1}+a_{h1})\beta^2+(a_{h0}+a_{\ell1})\beta+a_{\ell0}\\
&=z_3\beta^3+z_2\beta^2+z_1\beta+z_0,
\end{aligned}
$$
where $z_3=z_{h1}$, $z_2=(a_{\ell1}+a_{h1})$, $z_1=(a_{h0}+a_{\ell1})$ and
$z_0=a_{\ell0}$.
Inversion in $GF(4^2)$: Here the idea is to take advantage of the fact that inversion is easier in $GF(4)$ than it would be in $GF(16)$. Namely $1^{-1}, \alpha^{-1}=\alpha+1$
and $(\alpha+1)^{-1}=\alpha$. Equivalently $x^{-1}=x^2$ for any non-zero $x\in GF(4)$.
The idea is similar to inverting a complex number $z=x+yi$ where $x,y$ are real, at least one non-zero. There we make the observation that
$$
(x+yi)(x-yi)=x^2+y^2=Q(x,y),
$$
where we can deduce that the real number $Q(x,y)\neq0$. As we know how to compute the reciprocal of $Q(x,y)$ we can take advantage and compute
$$
\frac1{x+yi}=\frac1{Q(x,y)}(x-yi).
$$
The key to the success of such calculation is (this is bread and butter in field theory, but does
require a more extensive background in abstract algebra) that $i$ and $-i$
are roots of the same polynomial $x^2+1$ with real coefficients.
We can do something similar here. Remember that $\beta$ is a root
of the polynomial $x^2+x+\alpha$ with coefficients in $GF(4)$. The other root
of that polynomial is $\beta'=\beta+1$, because
$$
\beta'^2+\beta'=(\beta+1)^2+(\beta+1)=\beta^2+1+\beta+1=\beta^2+\beta=\alpha.
$$
So given an element $z=a_h\beta+a_\ell$ of $GF(4^2)$ let's calculate
$$
z(a_h\beta'+a_\ell)=a_h^2\beta\beta'+a_ha_\ell(\beta+\beta')+a_\ell^2.
$$
Here $\beta\beta'=\beta(\beta+1)=\beta^2+\beta=\alpha$ and $\beta+\beta'=1$, so we get
$$
z(a_h\beta'+a_\ell)=Q(a_h,a_\ll),
$$
where $Q(x,y)=\alpha x^2+xy+y^2$. Here we can prove that $Q(x,y)\neq0$
if at least one of $x,y$ is non-zero. If $x=0$, then $Q(x,y)=Q(0,y)=y^2\neq0$,
because then $y\neq0$. OTOH, if $x\neq0$, then
$$
Q(x,y)=\frac1{x^2}\left(\alpha+\frac{y}{x}+(\frac{y}{x})^2\right).
$$
Here the factor in round parens is $P(y/x)$, where $P(T)=T^2+T+\alpha$.
We have seen that $P(T)=0$, when $T=\beta$ or $T=\beta'$. Because $P(T)$ is
a quadratic polynomial it can have at most two zeros in the field $GF(16)$.
Because neither of those zeros was in the smaller field $GF(4)$ we can deduce that here
$P(y/x)\neq0$, because $y/x\in GF(4)$.
Putting all this together gives us, at long last,
$$
z^{-1}=\frac{1}{Q(a_h,a_\ell)}(a_h\beta'+a_\ell)
=\frac{1}{Q(a_h,a_\ell)}(a_h\beta+[a_h+a_\ell]).
$$
In other words, using the representation of $GF(16)$ as $GF(4^2)$
allows us to calculate the inverse in $GF(16)$ as a single inversion,
a single addition, and two multiplications in $GF(4)$. I guess that may be
worth the engineers' while in this setting.
What is needed to extend this to conversion between $GF(256)$ and $GF(16^2)$:
AES specifies that the field $GF(256)$ be given by using a generator $\theta$
that is a zero of the polynomial $x^8+x^4+x^3+x+1$ [EDIT Earlier I had another polynomial here. Corrected.]. IOW the arithmetic of $GF(256)$ follows
from the rule
$$
\theta^8=\theta^4+\theta^3+\theta+1.
$$
We need to identify a copy of $GF(16)$ inside that field. A little bit of tinkering
allowed me to find that
$$
(\theta^{16}+\theta)=\theta^2+\theta^3+\theta^4+\theta^6
$$
is a solution of $x^4+x+1=0$. So let us declare
$$\beta=\theta^2+\theta^3+\theta^4+\theta^6.$$
Then we can calculate that
$${\bf\{e\}}=(1,1,1,0)_2=\beta^3+\beta^2+\beta=\theta^2+\theta^3+\theta^5+\theta^6+\theta^7.$$
It is possible to prove that
the polynomial $q(x)=x^2+x+{\bf\{e\}}$ has no solutions in $GF(16)$, and hence two in $GF(256)$.
The simplest proof of this relies on the properties of the trace mapping.
A little bit of searching reveals that the roots of $q(x)$ in $GF(256)$ are
$$
\gamma=\theta+\theta^2+\theta^3+\theta^4+\theta^5+\theta^6+\theta^7
$$
and
$$\gamma+1=\gamma^{16}.$$
We can then represent
elements $z\in GF(256)$ in "$GF(16^2)$" format like
$$
z=a_h\gamma+a_\ell,
$$
where this time $a_h,a_\ell\in GF(16)$. The formula
for converting the 8-tuple of bits $(z_0,z_1,\ldots,z_7)$ to
the pair $(a_h,a_\ell)\in GF(16)^2$ such that
$$
z_0+z_1\theta+z_2\theta^2+\cdots+z_7\theta^7=z=a_h\gamma+a_\ell
$$
is quite messy, and I haven't calculated it.