Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and two $\mathcal{F}$-measurable real-valued random variables $X, Y$, the joint random variable $(X, Y)$ can be defined on a product space $(\Omega^{2}, \sigma(\mathcal{F}^{2}), \mathbb{P}\times\mathbb{P})$, where $\mathbb{P}\times\mathbb{P}$ is the product measure of $\mathbb{P}$ with itself. Let $f(x,y)$, $f_{X}(x)$, $f_{Y}(y)$ be the density functions (Radon-Nikodym derivatives) of $(X,Y)$, $X$, $Y$ respectively, and let $f_{X|Y}(x|y)$ be the density function of $X$ conditioned on $Y$.
Can anyone help with a construction, a proof, or related material on the Bayes rule $f_{X|Y}(x|y) = \frac{f(x,y)}{f_{Y}(y)}$? We may instead consider the other version $f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)\,f_{X}(x)}{f_{Y}(y)}$, which does not involve the joint random variable. I do not understand how this Bayes rule is formulated in measure theory. It is a widely used formula, yet I cannot find any construction or proof of it in my probability books.
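In the discrete case, where densities are taken with respect to counting measure, both versions reduce to elementary identities. As a hedged illustration only, the sketch below checks that the two forms agree on a made-up $2\times 2$ joint pmf; the table values and the helper names (`marginal`, `cond_x_given_y`, `bayes_x_given_y`) are assumptions for illustration, not part of any standard construction.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); these are densities w.r.t. counting measure.
JOINT = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(4, 10),
}

def marginal(joint, axis):
    """Marginal pmf: sum the joint pmf over the other coordinate."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], Fraction(0)) + p
    return out

P_X = marginal(JOINT, 0)
P_Y = marginal(JOINT, 1)

def cond_x_given_y(x, y):
    """First form: p(x, y) / p_Y(y)."""
    return JOINT[(x, y)] / P_Y[y]

def cond_y_given_x(y, x):
    """p_{Y|X}(y | x) = p(x, y) / p_X(x)."""
    return JOINT[(x, y)] / P_X[x]

def bayes_x_given_y(x, y):
    """Second form: p_{Y|X}(y | x) p_X(x) / p_Y(y)."""
    return cond_y_given_x(y, x) * P_X[x] / P_Y[y]

# Both forms agree exactly at every point of the support.
for (x, y) in JOINT:
    assert cond_x_given_y(x, y) == bayes_x_given_y(x, y)
print(cond_x_given_y(1, 1))  # 2/3
```

The check is exact because `Fraction` avoids floating-point rounding. It says nothing about the continuous case, where conditioning on the null event $\{Y = y\}$ is precisely what needs a measure-theoretic definition.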
I found the following definition of "conditional density"; there may be other definitions.
For simplicity, we denote integration with respect to the distribution $\mathbb{P}\circ X^{-1}$ of a RV $X$ by $\int_{B} dX := \int_{B} d(\mathbb{P}\circ X^{-1})$. Define the conditional probability measures $\mathbb{P}_{y}$, $y \in Y(\Omega)$, as a family of probability measures on $(\Omega, \mathcal{F})$ such that two axioms hold: 1) for every fixed $A \in \mathcal{F}$, the map $y \mapsto \mathbb{P}_{y}(A)$ into $[0,1]$ is $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$-measurable as a function of the index $y$; and 2) the general version of the law of total probability holds:
$$\int_{B}\mathbb{P}_{y}(A)\,dY = \mathbb{P}\left(A\cap Y^{-1}(B)\right),\quad \forall A \in \mathcal{F},\ B \in \mathcal{B}(\mathbb{R})$$
We then write $\mathbb{P}(A \mid Y = y) := \mathbb{P}_{y}(A)$, $A \in \mathcal{F}$, for the conditional probability measure given the event $Y = y$. Then, for any RV $X$, the conditional probability density function $f_{X|Y}(x|y)$ is the Radon-Nikodym derivative of the distribution $\mathbb{P}_{y}\circ X^{-1}$ with respect to Lebesgue measure.
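When $Y$ takes finitely many values, each with positive probability, the elementary choice $\mathbb{P}_{y}(A) = \mathbb{P}(A\cap\{Y=y\})/\mathbb{P}(Y=y)$ satisfies both axioms, and the integral in axiom 2 becomes a sum over $y \in B$ weighted by $\mathbb{P}(Y=y)$. The sketch below checks axiom 2 on a hypothetical example: six equally likely outcomes $\omega \in \{0,\dots,5\}$, $Y$ the parity of $\omega$, and $A = \{\omega \ge 3\}$. The sample space, `Y`, `A`, and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: six equally likely outcomes 0..5.
OMEGA = range(6)
P = {w: Fraction(1, 6) for w in OMEGA}

def Y(w):
    """Hypothetical random variable: parity of the outcome."""
    return w % 2

def prob(event):
    """P(event) for a set of outcomes."""
    return sum((P[w] for w in event), Fraction(0))

def preimage(B):
    """Y^{-1}(B) as a set of outcomes."""
    return {w for w in OMEGA if Y(w) in B}

def P_y(A, y):
    """Elementary conditional probability P(A | Y = y); needs P(Y = y) > 0."""
    level = preimage({y})
    return prob(A & level) / prob(level)

def check_total_probability(A):
    """Check axiom 2 for every B contained in Y(Omega).

    P o Y^{-1} puts no mass outside Y(Omega), so these B suffice."""
    values = sorted({Y(w) for w in OMEGA})
    subsets = chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))
    for B in subsets:
        # The integral over B w.r.t. P o Y^{-1} becomes a sum weighted by P(Y = y).
        lhs = sum((P_y(A, y) * prob(preimage({y})) for y in B), Fraction(0))
        rhs = prob(A & preimage(set(B)))
        assert lhs == rhs, (B, lhs, rhs)
    return True

A = {w for w in OMEGA if w >= 3}
print(P_y(A, 0), P_y(A, 1), check_total_probability(A))  # 1/3 2/3 True
```

Axiom 1 is automatic here because any function on a finite set of values is measurable; the difficulty in general is that $\mathbb{P}(Y = y)$ is typically zero, so this quotient is unavailable.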
Based on the above definition, these are all the relations I can come up with:
$$\int_{B}\mathbb{P}_{y}(A)\,dY = \mathbb{P}\left(A\cap Y^{-1}(B)\right),\quad \forall A \in \mathcal{F},\ B \in \mathcal{B}(\mathbb{R})$$
$$\int_{B}\mathbb{P}_{x}(A)\,dX = \mathbb{P}\left(A\cap X^{-1}(B)\right),\quad \forall A \in \mathcal{F},\ B \in \mathcal{B}(\mathbb{R})$$
$$\int_{B} f_{X|Y}(x|y)\,dx = \mathbb{P}_{y}\left(X^{-1}(B)\right),\quad \forall B \in \mathcal{B}(\mathbb{R})$$
$$\int_{B} f_{Y|X}(y|x)\,dy = \mathbb{P}_{x}\left(Y^{-1}(B)\right),\quad \forall B \in \mathcal{B}(\mathbb{R})$$
$$\int_{B} f_{Y}(y)\,dy = \mathbb{P}\left(Y^{-1}(B)\right),\quad \forall B \in \mathcal{B}(\mathbb{R})$$
$$\int_{B} f_{X}(x)\,dx = \mathbb{P}\left(X^{-1}(B)\right),\quad \forall B \in \mathcal{B}(\mathbb{R})$$
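For a concrete continuous case, the sketch below takes a standard bivariate normal $(X, Y)$ with correlation $\rho$ (an assumed example; `RHO = 0.6` is arbitrary). Here $f(x,y)/f_Y(y)$ is the $N(\rho y, 1-\rho^2)$ density in $x$, and the code checks numerically that this candidate satisfies the first relation above with $A = \{X \le c\}$ and $B = [a, b]$, i.e. $\int_B \mathbb{P}_y(X \le c)\, f_Y(y)\,dy \approx \int_B\int_{-\infty}^{c} f(x,y)\,dx\,dy$. It uses a plain midpoint rule with the $x$-range truncated at `x_lo`, so it is a sanity check on one instance, not a proof; all function names are hypothetical.

```python
import math

RHO = 0.6  # hypothetical correlation of a standard bivariate normal (X, Y)

def joint_pdf(x, y, rho=RHO):
    """Density f(x, y) of a standard bivariate normal with correlation rho."""
    s = 1.0 - rho * rho
    q = (x * x - 2.0 * rho * x * y + y * y) / s
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(s))

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def cond_pdf(x, y, rho=RHO):
    """Candidate f_{X|Y}(x|y) = f(x, y) / f_Y(y); here f_Y is the N(0, 1) density."""
    return joint_pdf(x, y, rho) / std_normal_pdf(y)

def P_y_X_le(c, y, rho=RHO):
    """P_y(X <= c) in closed form: under the candidate, X ~ N(rho*y, 1 - rho^2)."""
    return std_normal_cdf((c - rho * y) / math.sqrt(1.0 - rho * rho))

def midpoints(lo, hi, n):
    """Midpoints and step of a uniform n-cell partition of [lo, hi]."""
    h = (hi - lo) / n
    return [lo + (i + 0.5) * h for i in range(n)], h

def check_total_probability(c=0.3, a=-0.5, b=1.0, n=400, x_lo=-8.0):
    """Both sides of the first relation with A = {X <= c}, B = [a, b]."""
    ys, hy = midpoints(a, b, n)
    xs, hx = midpoints(x_lo, c, n)  # truncation at x_lo: tail mass is negligible
    lhs = sum(P_y_X_le(c, y) * std_normal_pdf(y) for y in ys) * hy
    rhs = sum(joint_pdf(x, y) for y in ys for x in xs) * hx * hy
    return lhs, rhs

lhs, rhs = check_total_probability()
print(f"{lhs:.6f} {rhs:.6f}")
```

Increasing `n` should shrink `abs(lhs - rhs)` further; the residual difference comes from the quadrature, not from the candidate density.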