I am given two metric measure (MM) spaces $(X, \mu, d_{X})$ and $(Y, \nu, d_{Y})$, where $X, Y \subseteq \mathbb{E}^{2}$ are Euclidean. I would like to minimize the Kullback-Leibler divergence between $\mu$ and $\nu \circ \phi^{-1}$ by optimizing over diffeomorphisms $\phi : Y \rightarrow X$. The problem is the following:

\begin{equation} J[\phi] = KL(\mu \,\|\, \nu \circ \phi^{-1}) = \int \log{\left( \frac{d \mu}{d (\nu \circ \phi^{-1})} \right)} \, d\mu \rightarrow \min_{\phi} \end{equation}

To do so, I am planning to compute the functional derivative of $J[\phi]$; however, I end up with an expression that seems hard to vary. In the following, $\delta \phi = \varepsilon \cdot h$ is the variation, with $\varepsilon \in \mathbb{R}$ and $h \in L^{1}$. Based on my calculations and the change of variables for a pushforward by a diffeomorphism,

\begin{equation} \left[ \frac{d}{d \varepsilon} J[\phi + \varepsilon \cdot h] \right]_{\varepsilon = 0} = \left[ \frac{d}{d \varepsilon} \left( \int \log \left( \frac{d \mu}{d \nu} \right) d \mu + \int \log(|\det(J_{\phi + \varepsilon \cdot h})|) \, d \mu \right) \right]_{\varepsilon = 0}, \tag{1} \end{equation}

where $J_{\phi}$ is the Jacobian of the diffeomorphism $\phi$. Differentiating $(1)$ further, one arrives at

\begin{equation} \frac{\delta J[\phi]}{\delta \phi} = (J_{\phi}^{-1})^{T} = (J_{\phi}^{T})^{-1} \end{equation}
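The pointwise matrix step behind the last expression can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the $2 \times 2$ matrices `A` and `H` are made-up stand-ins for $J_{\phi}$ and the Jacobian of $h$ at a single point, and all function names are hypothetical. It compares a finite-difference derivative of $\varepsilon \mapsto \log|\det(A + \varepsilon H)|$ at $\varepsilon = 0$ with the pairing $\langle (A^{-1})^{T}, H \rangle = \operatorname{tr}(A^{-1} H)$. It says nothing about the integral against $\mu$ or about the first term in $(1)$.

```python
import math

def det2(a):
    """Determinant of a 2x2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv_transpose2(a):
    """(A^{-1})^T for an invertible 2x2 matrix."""
    d = det2(a)
    # A^{-1} = (1/d) [[a11, -a01], [-a10, a00]]; transposing swaps the off-diagonals.
    return [[a[1][1] / d, -a[1][0] / d],
            [-a[0][1] / d, a[0][0] / d]]

def log_abs_det_derivative(a, h, eps=1e-6):
    """Central finite difference of t -> log|det(A + t*H)| at t = 0."""
    def f(t):
        m = [[a[i][j] + t * h[i][j] for j in range(2)] for i in range(2)]
        return math.log(abs(det2(m)))
    return (f(eps) - f(-eps)) / (2 * eps)

def frobenius(a, b):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(a[i][j] * b[i][j] for i in range(2) for j in range(2))

# Made-up example values (assumptions, not taken from the problem).
A = [[2.0, 0.5], [-0.3, 1.5]]   # stand-in for J_phi at one point
H = [[0.1, -0.4], [0.7, 0.2]]   # stand-in for the Jacobian of h at that point

numeric = log_abs_det_derivative(A, H)
analytic = frobenius(inv_transpose2(A), H)  # equals tr(A^{-1} H)
print(numeric, analytic)
```

The two printed numbers should agree up to finite-difference error, which is consistent with $(J_{\phi}^{-1})^{T}$ being the gradient of $\log|\det(J_{\phi})|$ with respect to the entries of $J_{\phi}$.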
The problem is that I can't see how this would minimize the KL divergence between $\mu$ and $\nu$. Rather, it seems to show that the KL divergence is unchanged when one of the measures is transformed by a diffeomorphism, since, if $\phi$ is a diffeomorphism, it will be a volume-preserving measurable map, giving $\det(J_{\phi}) = 1$.
I have also been thinking about an alternative route via the convex-conjugate (variational) representation of the KL divergence, \begin{equation} KL(\mu \,\|\, \nu \circ \phi^{-1}) = 1 + \sup_{g}\left\{\int g \, d\mu - \int e^{g} \, d (\nu \circ \phi^{-1}) \right\}, \end{equation} where the supremum is taken over Borel measurable functions $g$.
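The representation above can be checked on a toy case. The following is a minimal sketch, assuming both measures live on a three-point set: `p` plays the role of $\mu$ and `q` the role of the already pushed-forward measure $\nu \circ \phi^{-1}$. The values and function names are made up for illustration. It evaluates the dual objective at $g^{*} = \log(d\mu / d(\nu \circ \phi^{-1}))$, where the supremum is attained, and at a perturbed $g$.

```python
import math

def kl(p, q):
    """KL(p || q) for probability vectors on a finite set (q > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dual_objective(g, p, q):
    """Discrete analogue of  int g dmu - int e^g d(nu o phi^{-1})."""
    return (sum(gi * pi for gi, pi in zip(g, p))
            - sum(math.exp(gi) * qi for gi, qi in zip(g, q)))

# Made-up probability vectors (assumptions for illustration only).
p = [0.2, 0.5, 0.3]   # stands in for mu
q = [0.4, 0.4, 0.2]   # stands in for the pushforward of nu

# The maximizer is the log density ratio g* = log(p / q).
g_star = [math.log(pi / qi) for pi, qi in zip(p, q)]
dual_at_optimum = 1 + dual_objective(g_star, p, q)

# Any other g gives a smaller value, since the objective is concave in g.
g_other = [gi + 0.1 * s for gi, s in zip(g_star, [1.0, -1.0, 0.5])]
dual_at_other = 1 + dual_objective(g_other, p, q)

print(kl(p, q), dual_at_optimum, dual_at_other)
```

At $g^{*}$ the dual value coincides with the KL divergence, and the perturbed $g$ gives a strictly smaller value, so any fixed $g$ yields a lower bound on the KL divergence.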
Does anyone have an idea or an approach for minimizing the KL divergence between two measures by a pushforward of one of the measures?