We need the following essential lemma, i.e.,
Knott-Smith optimality: Let $X = Y = \mathbb R^d$ and assume that $\mu, \nu$ both have finite second moment. Then $\pi^\dagger$ minimizes $\mathbb K$ over $\Pi(\mu, \nu)$ with cost $c(x,y) := \frac{1}{2} |x-y|^2$ if and only if there is $\varphi \in L_1 (\mu)$ convex l.s.c. such that $y \in \partial f (x)$ for $\pi$-a.e. $(x, y) \in X \times Y$. Moreover, the pair $(\varphi, \varphi^*)$ minimizes $\mathbb J$ over $\Phi$ where $\varphi^*$ is the convex conjugate and
$$
\Phi := \{(\varphi', \psi') \in L_1(\mu) \times L_1 (\nu) \mid \varphi' (x) + \psi' (y) \ge \langle x, y \rangle\}.
$$
- Let $\pi^\dagger$ minimize $\mathbb K$ over $\Pi(\mu, \nu)$.
By Knott-Smith optimality, there is a convex proper l.s.c. function $\varphi:X \to \mathbb R \cup \{+\infty\}$ such that $\varphi \in L_1 (\mu)$ and $y \in \partial \varphi (x)$ for $\pi^\dagger$-a.e. $(x, y) \in X \times Y$. Then $\varphi$ is differentiable $\mu$-a.e., and its gradient $\nabla \varphi$ is $\mu$-a.e. defined and Borel measurable. It follows that $y = \nabla \varphi (x)$ for $\pi$-a.e. $(x, y) \in X \times Y$. For $f \in L_1(\nu)$, we have
$$
\begin{align}
\int_Y f (y) \mathrm d \nu (y) &= \int_{X \times Y} f (y) \mathrm d \pi^\dagger (x,y) \\
&= \int_{X \times Y} f (\nabla \varphi (x)) \mathrm d \pi^\dagger (x,y) \\
&= \int_{X} f (\nabla \varphi (x)) \mathrm d \mu (x) \\
&= \int_{Y} f ( y) \mathrm d [(\nabla \varphi)_\sharp \mu] (y) \quad \text{because} \quad X=Y.
\end{align}
$$
It follows that $\nu = (\nabla \varphi)_\sharp \mu$. For $f \in L_1(\pi)$, we have
$$
\begin{align}
\int_{X \times Y} f (x, y) \mathrm d \pi^\dagger (x, y) &= \int_{X \times Y} f (x, \nabla \varphi (x)) \mathrm d \pi^\dagger (x,y) \\
&= \int_{X} f (x, \nabla \varphi (x)) \mathrm d \mu (x) \\
&= \int_{X \times Y} f (x, y) \mathrm d [(\operatorname{Id, \nabla \varphi})_\sharp \mu] (x).
\end{align}
$$
It follows that $\pi^\dagger = (\operatorname{Id}, \nabla \varphi)_\sharp \mu$.
2.
- $\implies$ Notice that 1. implies
$$
\inf_{\pi \in \Pi(\mu, \nu)} \mathbb K(\pi) = \inf_{T \in \mathcal T} \mathbb M(T).
$$
We assume $T$ minimizes $\mathbb M$ over $\mathcal T$. Then $\pi' := (\operatorname{Id, T})_\sharp \mu$ minimizes $\mathbb K$ over $\Pi(\mu, \nu)$. By Knott-Smith optimality, there is a proper l.s.c. convex $\psi \in L_1 (\mu)$ such that $y = \nabla \psi (x)$ for $\pi'$-a.e. $(x, y) \in X \times Y$. Then
$$
\begin{align}
0 &= \int_{X \times Y} |y - \nabla \psi (x)|^2 \mathrm d \pi'(x, y) \\
&= \int_{X \times Y} |y - \nabla \psi (x)|^2 \mathrm d [(\operatorname{Id}, T)_\sharp \mu] (x, y) \\
&= \int_{X} |T(x) - \nabla \psi (x)|^2 \mathrm d \mu (x).
\end{align}
$$
This implies $T = \nabla \psi$ $\mu$-a.e.
- $\impliedby$Now we assume $T \in \mathcal T$ such that $T = \nabla \psi$ $\mu$-a.e. for some proper l.s.c. convex $\psi \in L_1 (\mu)$. Let $\pi' := (\operatorname{Id}, T)_\sharp \mu$. Then
$$
\begin{align}
&\int_{X \times Y} |y - \nabla \psi (x)|^2 \mathrm d \pi'(x, y) \\
= &\int_{X \times Y} |y - \nabla \psi (x)|^2 \mathrm d [(\operatorname{Id}, T)_\sharp \mu] (x, y) \\
= &\int_{X} |T(x) - \nabla \psi (x)|^2 \mathrm d \mu (x) = 0.
\end{align}
$$
Then $y = \nabla \psi (x) \in \partial \psi (x)$ for $\pi'$-a.e. $(x, y) \in X \times Y$. Then $\pi'$ minimize $\mathbb K$ over $\Pi(\mu, \nu)$ by Knott-Smith optimality.
- Let $T_1, T_2$ minimize $\mathbb M$ over $\mathcal T$.
By 2., $T_1 =\nabla \varphi_1$ and $T_2 =\nabla \varphi_2$ for some proper convex l.s.c. $\varphi_1, \varphi_2 \in L_1 (\mu)$. Then $\pi_1^\dagger := (\operatorname{Id}, \nabla \varphi_1)_\sharp \mu$ and $\pi_2^\dagger := (\operatorname{Id}, \nabla \varphi_2)_\sharp \mu$ minimize $\mathbb K$ over $\Pi(\mu, \nu)$. By Knott-Smith optimality, $y = \nabla \varphi_1 (x)$ for $\pi_1^\dagger$-a.e. $(x, y) \in X \times Y$ and $y = \nabla \varphi_2 (x)$ for $\pi_2^\dagger$-a.e. $(x, y) \in X \times Y$. It follows that
$$
\begin{align}
& \int_X \varphi_1 \mathrm d \mu + \int_Y \varphi^*_1 \mathrm d \nu \\
=& \int_{X \times Y} [\varphi_1 (x) + \varphi^*_1 (y)] \mathrm d \pi_2^\dagger (x, y)\\
=& \int_X [\varphi_1 (x) + \varphi^*_1 (\nabla \varphi_2 (x))] \mathrm d \mu (x)
\end{align}.
$$
and
$$
\begin{align}
& \int_X \varphi_2 \mathrm d \mu + \int_Y \varphi^*_2 \mathrm d \nu \\
=& \int_{X \times Y} [\varphi_2 (x) + \varphi^*_2(y)] \mathrm d \pi_2^\dagger (x, y) \\
=& \int_{X} [\varphi_2 (x) + \varphi^*_2( \nabla \varphi_2 (x))] \mathrm d \mu (x) \\
=& \int_{X} \langle x, \nabla \varphi_2 (x)\rangle \mathrm d \mu (x) \quad (\star)
\end{align}
$$
Here $(\star)$ follows from this result about subdifferential and convex conjugate. Also by Knott-Smith optimality,
$$
\int_X \varphi_1 \mathrm d \mu + \int_Y \varphi^*_1 \mathrm d \nu = \int_X \varphi_2 \mathrm d \mu + \int_Y \varphi^*_2 \mathrm d \nu.
$$
So
$$
\int_X [\varphi_1 (x) + \varphi^*_1 (\nabla \varphi_2 (x))] \mathrm d \mu (x) = \int_{X} \langle x, \nabla \varphi_2 (x)\rangle \mathrm d \mu (x).
$$
On the other hand, we always have $\varphi_1 (x) + \varphi^*_1 (\nabla \varphi_2 (x)) \ge \langle x, \nabla \varphi_2 (x)\rangle$. It follows that $\varphi_1 (x) + \varphi^*_1 (\nabla \varphi_2 (x)) = \langle x, \nabla \varphi_2 (x)\rangle$ for $\mu$-a.e. $x\in X$, so $\nabla \varphi_2 (x) \in \partial \varphi_1 (x)$ for $\mu$-a.e. $x\in X$. Hence $\nabla \varphi_2 (x) = \nabla \varphi_1 (x)$ $\mu$-a.e. This completes the proof.