Assume a non-linear relation $\mathbf{Y} = f(\mathbf{X})$ between two random variables, where $\mathbf{Y}\sim p_Y$ takes values $\mathbf{y} \in \mathbb{R}^M$ and $\mathbf{X}\sim p_X$ takes values $\mathbf{x} \in \mathbb{R}^N$, with $M\leq N$. My question is about the "inverse" problem described below.
Direct problem - If we know the PDF $p_X$, then the PDF $p_Y$ is formally given by
$$p_Y(\mathbf{y}) = \int \delta^M(f(\mathbf{x})-\mathbf{y} ) \, p_X(\mathbf{x}) \, d^Nx$$
In general, this expression cannot be handled analytically. However, we can sample values $\mathbf{x}_i\sim p_X$: the scatter of the $f(\mathbf{x}_i)$ values already allows us to probe $p_Y$.
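A minimal sketch of this forward "sampling" strategy, assuming for illustration a specific non-linear $f:\mathbb{R}^2\to\mathbb{R}$ and a standard Gaussian $p_X$ (both choices are mine, just for the example):

```python
import numpy as np

# Forward Monte Carlo: draw x_i ~ p_X, push them through f, and use the
# cloud of f(x_i) values (e.g. via a histogram or KDE) as an estimate of p_Y.
rng = np.random.default_rng(0)
N, n_samples = 2, 100_000

def f(x):
    # illustrative non-linear map R^2 -> R^1 (M = 1 <= N = 2)
    return np.array([x[0] ** 2 + np.sin(x[1])])

x_samples = rng.standard_normal((n_samples, N))   # x_i ~ p_X (standard Gaussian here)
y_samples = np.array([f(x) for x in x_samples])   # the f(x_i) scatter probes p_Y

# crude density estimate of p_Y from the pushed-forward samples
hist, edges = np.histogram(y_samples[:, 0], bins=100, density=True)
```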
Inverse problem - The PDF $p_Y$ and the map $f$ are given. We want to estimate $p_X$. Unfortunately, the formal expression
$$ p_X(\mathbf{x}) = \int \delta^N(f^{-1}(\mathbf{y})-\mathbf{x} ) \, p_Y(\mathbf{y}) \, d^My $$
is useless: contrary to the previous case, it does not suggest a practical strategy (i.e. the "sampling" strategy above). The expression does not even make sense since $M\leq N$; moreover, we do not know the, potentially multivalued, map $f^{-1}$. Only if $f$ is a bijective, differentiable function may we use the change of variables formula but, again, we have the practical problem that $f^{-1}$ is not analytically known (see e.g. this, this, this and this question).
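For reference, in that bijective, differentiable case the change of variables formula can be written entirely in terms of $f$ (here $J_f$ denotes the Jacobian of $f$), so that evaluating $p_X$ at a given $\mathbf{x}$ only requires evaluating $f$ and its Jacobian there:

$$ p_X(\mathbf{x}) = p_Y\bigl(f(\mathbf{x})\bigr)\,\bigl|\det J_f(\mathbf{x})\bigr| $$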
Does this inverse problem have a "name"? Of course, it is not always a well-posed problem: e.g. if $f(\mathbf{x})=\mathbf{y}_0$, where $\mathbf{y}_0$ is a constant vector, then $p_Y(\mathbf{y}) = \delta^M(\mathbf{y}-\mathbf{y}_0)$ regardless of $p_X$, so knowing $f$ and $p_Y$ tells us nothing about $p_X$. However, in most cases, the knowledge of $f$ and $p_Y$ should allow us to "know something" about $p_X$.
Is there any practical strategy/approach to tackle it? Maybe a Bayesian inference approach where some $\mathbf{y}_i$ distributed according to the known $p_Y$ are treated as the "data" and we infer $p_X$? Or maybe a maximum entropy approach where we try to maximize our ignorance about $p_X$ while accounting for the constraints coming from the knowledge of $p_Y$ and $f$? A toy sketch of the first idea is given below.
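To make the "treat $\mathbf{y}_i\sim p_Y$ as data" idea concrete, here is a toy sketch that restricts $p_X$ to a parametric family (a diagonal Gaussian with unknown means and standard deviations) and fits the parameters by matching low-order moments of the push-forward $f(\mathbf{x}_i)$ to those of the $\mathbf{y}_i$. The family, the map $f$, the moment-matching loss and all names in the code are illustrative assumptions, not a recommended method:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, n = 2, 20_000

def f(x):
    # batched version of the example non-linear map R^2 -> R^1
    return x[:, 0] ** 2 + np.sin(x[:, 1])

# Stand-in for the known p_Y: samples y_i, generated here from a hidden
# "true" p_X only so that the example is self-contained.
x_true = rng.normal(loc=[1.0, -0.5], scale=[0.7, 1.2], size=(n, N))
y_obs = f(x_true)

# Fixed base noise (common random numbers) so the loss is deterministic in theta.
eps = rng.standard_normal((n, N))

def loss(theta):
    mu, log_sigma = theta[:N], theta[N:]
    x = mu + np.exp(log_sigma) * eps     # x_i ~ N(mu, diag(sigma^2))
    y = f(x)
    # discrepancy between the push-forward of p_X(.; theta) and the p_Y "data"
    return (y.mean() - y_obs.mean()) ** 2 + (y.var() - y_obs.var()) ** 2

result = minimize(loss, x0=np.zeros(2 * N), method="Nelder-Mead")
mu_hat, sigma_hat = result.x[:N], np.exp(result.x[N:])
print("estimated means:", mu_hat, "estimated stds:", sigma_hat)
```

Note that, exactly because of the ill-posedness mentioned above, such a fit is generally not unique: with this particular $f$, for instance, flipping the sign of the first mean leaves the matched moments essentially unchanged.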
Reference: A few days after posting the question, I found this interesting and very promising reference: D. Sanz-Alonso et al., Inverse Problems and Data Assimilation, available on arXiv.