
Assume a non-linear relation between the random variables $\mathbf{Y} = f(\mathbf{X})$, where $\mathbf{Y}\sim p_Y$ takes values $\mathbf{y} \in \mathbb{R}^M$ and $\mathbf{X}\sim p_X$ takes values $\mathbf{x} \in \mathbb{R}^N$, with $M\leq N$. My question is about the "inverse" problem described below.

Direct problem - If we know the PDF $p_X$, then the PDF $p_Y$ is formally given by

$$p_Y(\mathbf{y}) = \int \delta^M(f(\mathbf{x})-\mathbf{y} ) \, p_X(\mathbf{x}) \, d^Nx$$

In general, this expression cannot be handled analytically. However, we can sample some values $\mathbf{x}_i\sim p_X$: the scatter of the mapped values $f(\mathbf{x}_i)$ already allows us to probe $p_Y$.
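For instance, a minimal Monte Carlo sketch in Python (the particular $f$ and $p_X$ below are just illustrative assumptions, not part of the question):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative assumption: N = 2, M = 1, f(x) = x1**2 + x2, p_X = standard normal on R^2.
def f(x):
    return x[:, 0]**2 + x[:, 1]

rng = np.random.default_rng(0)
x = rng.standard_normal((100_000, 2))   # x_i ~ p_X
y = f(x)                                # f(x_i): the scatter of these values probes p_Y

# Estimate p_Y from the mapped samples, e.g. with a kernel density estimate.
p_Y_hat = gaussian_kde(y)
print(p_Y_hat(np.array([0.0, 1.0, 2.0])))  # approximate values of p_Y at a few points
```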

Inverse problem - The PDF $p_Y$ and the map $f$ are given. We want to estimate $p_X$. Unfortunately, the formal expression

$$ p_X(\mathbf{x}) = \int \delta^N(f^{-1}(\mathbf{y})-\mathbf{x} ) \, p_Y(\mathbf{y}) \, d^My $$

is useless: contrary to the previous case, it does not suggest a practical strategy (i.e. a "sampling" strategy like the one above). The expression does not even make sense since $M\leq N$, and we do not know the, potentially multivalued, map $f^{-1}$. Only if $f$ is a bijective, differentiable function may we use the change-of-variables formula, but, again, we face the practical problem that $f^{-1}$ is not analytically known (see e.g. this, this, this and this questions).
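For reference, in the bijective, differentiable case (so $M=N$) the change-of-variables formula can be written purely in terms of the forward map,

$$ p_X(\mathbf{x}) = p_Y(f(\mathbf{x}))\,\bigl|\det J_f(\mathbf{x})\bigr|, $$

where $J_f$ is the Jacobian of $f$: evaluating the density only needs $f$ and its Jacobian, while drawing samples from $p_X$ by mapping samples $\mathbf{y}_i\sim p_Y$ back to $\mathbf{x}_i = f^{-1}(\mathbf{y}_i)$ still requires the inverse map.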

  • Does this inverse problem have a "name"? Of course, it is not always a well-posed problem, e.g. $f(\mathbf{x})=\mathbf{y}_0$, where $\mathbf{y}_0$ is a constant vector so that $p_Y= \delta^M(\mathbf{y}-\mathbf{y}_0 ) $ regardless of $p_X$: knowing $f$ and $p_Y$ tells us nothing about $p_X$. However, in most cases, the knowledge of $f$ and $p_Y$ should allow us to "know something" about $p_X$.

  • Is there any practical strategy/approach to tackle it? Maybe a Bayesian inference approach where some $\mathbf{y}_i$ distributed according to the known $p_Y$ is treated as the "data" and we infer $p_X$? Or maybe a maximum entropy approach where we try to maximize our ignorance on $p_X$ while accounting for constraints coming from knowledge of $p_Y$ and $f$?

Reference: A few days after posting the question, I found a promising reference: D. Sanz-Alonso et al., Inverse Problems and Data Assimilation, available on arXiv.

Quillo
    I think Bayesian estimation can be used successfully in this situation. I recommend the book "Bayesian Filtering and Smoothing" by Simo Sarkka. It is very easy to follow. – obareey Mar 15 '24 at 15:10

1 Answer


In Measure Theory (advanced probability theory), your direct problem is known as the push forward and the inverse could probably be considered the pullback.

As highlighted here (https://mathoverflow.net/questions/122704/pullback-measures), “To define pullbacks of measures we need some additional data, because otherwise one would be able to obtain a canonical measure on an arbitrary measurable space M by pulling back the canonical measure on the point along the unique map M→pt.”

In other words, you need some kind of way of highlighting the relative importance of various points within $f^{-1}(y)$ for any $y$.

In probability theory terms, if we have some prior distribution $p_X$ and some known final distribution $p_Y$, then we want a distribution $q_X$ that pushes forward to $p_Y$ and that, for any given value of $f(x)$, has the same relative density within the fiber $f^{-1}(f(x))$ as $p_X$. The simplest way to do this is by rescaling $p_X$. We can define a weight function $w(y)$ as the ratio of $p_Y$ to the push forward of $p_X$ (technically the Radon-Nikodym derivative), so that

$$w(y) \int \delta^M(f(x)-y)\,p_X(x)\, d^Nx = p_Y(y)$$

Then we can just rescale $p_X$:

$$q_X(x)= p_X(x)\, w(f(x))$$
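As a numerical illustration of this reweighting (not part of the original answer; the specific $f$, $p_X$, $p_Y$ and the kernel-density estimate of the push forward are my own assumptions), a minimal sketch in Python:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Toy setup (assumed): N = 2, M = 1, f(x) = x1 + x2,
# prior p_X = standard normal on R^2, known target p_Y = N(1, 2).
def f(x):
    return x[:, 0] + x[:, 1]

rng = np.random.default_rng(0)
x = rng.standard_normal((100_000, 2))      # samples from the prior p_X
y = f(x)                                   # their images under f

# Push forward of p_X, estimated here with a kernel density estimate on f(x_i).
pushforward = gaussian_kde(y)

# Weight w(y) = p_Y(y) / (push forward of p_X)(y), i.e. the Radon-Nikodym ratio.
p_Y = norm(loc=1.0, scale=np.sqrt(2.0))
w = p_Y.pdf(y) / pushforward(y)
w /= w.mean()                              # self-normalize the importance weights

# q_X(x) = p_X(x) w(f(x)) is represented by the weighted samples (x_i, w_i).
# Check: the weighted samples of f(x) should reproduce the moments of p_Y.
m = np.average(y, weights=w)
v = np.average((y - m)**2, weights=w)
print(m, v)                                # should be close to 1.0 and 2.0
```

Here the weighted sample $(x_i, w_i)$ plays the role of $q_X$: reweighting only changes the relative mass between fibers $f^{-1}(y)$, exactly as in the formula above.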

Using capital letters for the corresponding measures, and writing the derivative as a Radon-Nikodym derivative, this is fairly natural in measure-theoretic notation:

$$Q_X(A)=\int_A \frac{d P_Y}{d(P_X \circ f^{-1})}(f(x))\,dP_X(x)$$

This is nice because $\frac{dQ_X}{dP_X}(x)=\frac{d P_Y}{d(P_X \circ f^{-1})}(f(x))$, i.e. the relative density of $Q_X$ with respect to $P_X$ is fixed whenever $f(x)$ is fixed. Also, changing variables and cancelling terms gives $Q_X(f^{-1}(B))=P_Y(B)$, so $Q_X$ has the correct push-forward distribution on $Y$.
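One way to spell out that last claim, using the change-of-variables property of the push-forward measure:

$$Q_X(f^{-1}(B)) = \int_{f^{-1}(B)} \frac{d P_Y}{d(P_X \circ f^{-1})}(f(x))\, dP_X(x) = \int_{B} \frac{d P_Y}{d(P_X \circ f^{-1})}(y)\, d(P_X \circ f^{-1})(y) = P_Y(B).$$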

Eric
  • Thank you! This does not fully resolve my doubts, but the re-weighting procedure is interesting, hence I will award the bounty to this answer. I will keep this post updated in case I find out more. – Quillo Mar 21 '24 at 15:14