Do we get a product regular conditional probability for conditionally independent random variables in Polish spaces?

Question

Theorem 8.37 in Klenke's Probability Theory ensures the existence (albeit not uniqueness) of a regular conditional probability for Borel spaces (in particular Polish spaces).

Now, say we have two random variables $X,Y$ living in Polish spaces equipped with the Borel algebra. Can we then find a regular conditional distribution $\kappa$ of $(X,Y)$ given $\mathcal F$ that is a product measure (almost) everywhere (i.e. $\kappa_\omega=\kappa_{1,\omega}\otimes\kappa_{2,\omega}$ is the product of its marginals), whenever $X$ and $Y$ are conditionally independent given $\mathcal F$?
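To make the question concrete, here is a minimal finite sketch in Python. Everything in it is an illustrative assumption rather than part of the question: a hypothetical binary variable $Z$ with $\mathcal F=\sigma(Z)$, made-up probability tables, and the helper names `kernel`, `marginals` and `is_product`. In the finite setting there are no null-set issues, so it only illustrates what "the kernel is the product of its marginals" means, and that the unconditional law of $(X,Y)$ need not be a product.

```python
from math import isclose

# Hypothetical toy model: Z takes values in {0, 1}; given Z = z, X and Y are
# drawn independently from px_given_z[z] and py_given_z[z].
pz = {0: 0.3, 1: 0.7}
px_given_z = {0: [0.2, 0.8], 1: [0.6, 0.4]}
py_given_z = {0: [0.5, 0.25, 0.25], 1: [0.1, 0.1, 0.8]}
XS, YS = range(2), range(3)

# Joint pmf p(z, x, y) = p(z) p(x | z) p(y | z).
joint = {(z, x, y): pz[z] * px_given_z[z][x] * py_given_z[z][y]
         for z in pz for x in XS for y in YS}

def kernel(z):
    """Conditional pmf of (X, Y) given Z = z, keyed by (x, y)."""
    mass = sum(p for (zz, _, _), p in joint.items() if zz == z)
    return {(x, y): joint[(z, x, y)] / mass for x in XS for y in YS}

def marginals(pmf):
    """Marginal pmfs of the two coordinates of a pmf on XS x YS."""
    mx = [sum(pmf[(x, y)] for y in YS) for x in XS]
    my = [sum(pmf[(x, y)] for x in XS) for y in YS]
    return mx, my

def is_product(pmf):
    """Check whether pmf equals the product of its own marginals."""
    mx, my = marginals(pmf)
    return all(isclose(pmf[(x, y)], mx[x] * my[y], abs_tol=1e-12)
               for x in XS for y in YS)

# The kernel is a product measure for every value of Z ...
kernel_is_product = all(is_product(kernel(z)) for z in pz)
# ... while the unconditional law of (X, Y) is not.
pxy = {(x, y): sum(joint[(z, x, y)] for z in pz) for x in XS for y in YS}
law_is_product = is_product(pxy)
print(kernel_is_product, law_is_product)
```

The question asks whether the first property can always be arranged, up to a null set, for general Polish spaces and a general $\mathcal F$.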

I asked around, I went through all of my favorite books and online resources. I have not tried to prove or disprove it myself. My not too educated guess would be that this should hold, for similar reasons that ensure the existence of a regular conditional distribution.

BONUS QUESTION: Does this also hold for countable products?

If the conditional distribution $\kappa_{\mathcal{F}, (X, Y)}(\omega, \cdot)$ of $(X, Y)$ given $\mathcal{F}$ is a product measure for almost all $\omega \in \Omega$, then wouldn't this imply that $X$ and $Y$ are independent given $\mathcal{F}$? This means such a kernel definitely does not always exist; there are counter examples when $\mathcal{F}$ is the sigma algebra generated by a random variable $Z$. — Mason, Oct 12 '22 at 20:10
Thanks for the comment, I forgot to add the important part. The question is if conditional independence implies the existence of a version of the conditional probabilities that is a product measure. — Matija, Oct 12 '22 at 20:13

score 2 · Answer 1 · answered Oct 12 '22 at 20:30

Some thoughts: I think it is true when $X$ and $Y$ take values in a Polish space $E$ (or in different Polish spaces). Basically, the conditional independence assumption means that for Borel sets $A, B \subset E$, we have $\kappa_{\mathcal{F}, (X, Y)}(\omega, A \times B) = \kappa_{\mathcal{F}, X}(\omega, A)\kappa_{\mathcal{F}, Y}(\omega, B)$ for $\omega \in \Omega_{A, B}$, where $P(\Omega_{A, B}) = 1$. But what we want is to find one $\Omega_{0}$ that works for all $A, B$. Taking $\Omega_0 = \bigcap_{A, B \subset E}\Omega_{A, B}$ fails because this is an uncountable intersection. Here I think you can prove $\Omega_0$ exists using second countability and the $\pi$-$\lambda$ theorem. My thought is that you take a countable base $\{A_1, A_2, \dots\}$ for the topology of $E$ and let $\Omega_0 = \cap_{i, j \geq 1}\Omega_{A_i, A_j}$, but I'll have to work this out some time.
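One way the $\pi$-$\lambda$ step might be carried out, as a sketch only (the symbols $\mathcal C$, $\mu_\omega$, $\nu_\omega$ and $\mathcal D_\omega$ are introduced here for illustration): a countable base need not be closed under intersections, so let $\mathcal C$ be the still countable family of all finite intersections of base sets, and put $\Omega_0=\bigcap_{A,B\in\mathcal C}\Omega_{A,B}$, so that $P(\Omega_0)=1$. For $\omega\in\Omega_0$ write $\mu_\omega=\kappa_{\mathcal F,(X,Y)}(\omega,\cdot)$ and $\nu_\omega=\kappa_{\mathcal F,X}(\omega,\cdot)\otimes\kappa_{\mathcal F,Y}(\omega,\cdot)$. The boxes $A\times B$ with $A,B\in\mathcal C$ form a $\pi$-system, because $(A\times B)\cap(A'\times B')=(A\cap A')\times(B\cap B')$, and they generate $\mathcal B(E\times E)$. Since $\mu_\omega$ and $\nu_\omega$ are probability measures, the sets on which they agree form a $\lambda$-system, hence

$$\mathcal D_\omega=\{D\in\mathcal B(E\times E):\mu_\omega(D)=\nu_\omega(D)\}\supseteq\{A\times B:A,B\in\mathcal C\}\quad\Longrightarrow\quad\mathcal D_\omega\supseteq\sigma(\{A\times B:A,B\in\mathcal C\})=\mathcal B(E\times E),$$

that is, $\mu_\omega=\nu_\omega$ for every $\omega\in\Omega_0$.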

I worked it out, hopefully correctly, staying fairly close to your strategy. Thanks! — Matija, Oct 14 '22 at 00:00

Matija · Accepted Answer · 2022-10-15T10:21:57.010

The following references are from Klenke's Probability Theory (2nd Edition, 2014) unless mentioned otherwise.

The result holds by Theorem 17.43 in Probability and Conditional Expectation (Nagel & Steyer, 2017), which establishes the claim for arbitrary countably generated measurable spaces. Corollary 3.3.8 together with Theorem 3.3.7 in Borel Spaces by Berberian (1988; the terminology differs heavily from other sources) states that a measurable space (a Borel space in Berberian's terminology) is countably generated if and only if it is a Borel space (in the terminology of Klenke's book). This means that Theorem 17.43 above is stated exactly for Borel spaces. By Theorem 8.36, Polish spaces with their Borel algebra are Borel spaces.

In the following we present the proof.

Let $(\Omega,\Sigma,\mathbb P)$ be the underlying, sufficiently rich, probability space. Let $(\mathcal X_i,\Sigma_i)$ be Borel spaces (using Definition 8.35) for $i=1,2$. Further, fix random variables $X_i\in\mathcal X_i$ and a $\sigma$-algebra $\mathcal F\subseteq\Sigma$ such that $X_1$ and $X_2$ are conditionally independent given $\mathcal F$. Then there exist regular conditional distributions $\kappa_i:\Omega\times\Sigma_i\rightarrow[0,1]$, $(\omega,\mathcal E)\mapsto\kappa_{i,\omega}(\mathcal E)$, of $X_i$ given $\mathcal F$ which are unique almost surely, and $\kappa_1\otimes\kappa_2:\Omega\times(\Sigma_1\otimes\Sigma_2)\rightarrow[0,1]$, $(\omega,\mathcal E)\mapsto(\kappa_{1,\omega}\otimes\kappa_{2,\omega})(\mathcal E)$, is a regular conditional distribution of $(X_1,X_2)$ given $\mathcal F$ which is unique almost surely.

Short proof:

Let $\varphi_i$ be the isomorphism for $\mathcal X_i$ from the definition. Notice that we can reduce the discussion to random variables $X_i\in B_i\in\mathcal B(\mathbb R)$ since the complete structure carries over, including $\mathcal X_1\times\mathcal X_2$ using $\varphi_1\times\varphi_2$.
Use the vanilla $\pi$-$\lambda$-theorem for $\mathcal B(\mathbb R^2)$, and conditional independence.

Long proof: Let $B_i\in\mathcal B(\mathbb R)$ and let $\varphi_i:(\mathcal X_i,\Sigma_i)\rightarrow(B_i,\mathcal B(B_i))$ be the isomorphism from Definition 8.35. Notice that $\varphi_i:2^{\mathcal X_i}\rightarrow 2^{B_i}$, $S\mapsto\varphi_i(S)$, is a bijection of the power sets, where $\varphi_i(S)=\{\varphi_i(x):x\in S\}$. This implies that also $\varphi_i:2^{2^{\mathcal X_i}}\rightarrow 2^{2^{B_i}}$ is a bijection. Let $\mathcal T^*\subseteq\mathcal B(\mathbb R)$ be the standard topology of $\mathbb R$, and let $\mathcal T^\circ_i=\mathcal T^*\cap B_i\subseteq\mathcal B(\mathbb R)\cap B_i=\mathcal B(B_i)$ be the subspace topology of $B_i$. Using that $\mathcal T^\circ_i\subseteq 2^{B_i}$ and hence $\mathcal T^\circ_i\in 2^{2^{B_i}}$, we observe that $\mathcal T_i=\varphi_i^{-1}(\mathcal T^\circ_i)$ is a topology on $\mathcal X_i$ (recall that preimages commute with complements and arbitrary unions and intersections) and that $\Sigma_i=\mathcal B(\mathcal X_i)$ is the Borel algebra with respect to $\mathcal T_i$. This shows that $\mathcal X_i$ and $B_i$ are equivalent in the strongest possible sense (using $\varphi_i$ we can carry the entire remaining structure of $B_i$ over to $\mathcal X_i$ and consider $\mathcal X_i$ as a copy of $B_i$), and $\varphi_i$ is now also an isomorphism of $\mathcal X_i$ and $B_i$ as topological spaces (i.e. a continuous open bijection). To be overly precise, notice that also $\varphi_1\times\varphi_2:\mathcal X_1\times\mathcal X_2\rightarrow B_1\times B_2$, $(x_1,x_2)\mapsto(\varphi_1(x_1),\varphi_2(x_2))$, is a bijection.
Let $\mathcal T_1\otimes\mathcal T_2$ be the product topology and notice that the projections $\pi^\circ_i:B_1\times B_2\rightarrow B_i$ are continuous with respect to the topology $(\varphi_1\times\varphi_2)(\mathcal T_1\otimes\mathcal T_2)$ (write $\pi^\circ_i=\varphi_i\circ\pi_i\circ(\varphi_1\times\varphi_2)^{-1}$) and $\pi_i:\mathcal X_1\times\mathcal X_2\rightarrow\mathcal X_i$ is continuous with respect to the topology $(\varphi_1\times\varphi_2)^{-1}(\mathcal T^\circ_1\otimes\mathcal T^\circ_2)$, so $(\varphi_1\times\varphi_2)(\mathcal T_1\otimes\mathcal T_2)=\mathcal T^\circ_1\otimes\mathcal T^\circ_2$. Using that $(\varphi_1\times\varphi_2)(A\times B)=\varphi_1(A)\times\varphi_2(B)$ are exactly the boxes, and since the generator is included in the RHS, we know that $\Sigma_1\otimes\Sigma_2\subseteq(\varphi_1\times\varphi_2)^{-1}(\mathcal B(B_1)\otimes\mathcal B(B_2))$, and vice versa, which shows that $\Sigma_1\otimes\Sigma_2=(\varphi_1\times\varphi_2)^{-1}(\mathcal B(B_1)\otimes\mathcal B(B_2))=(\varphi_1\times\varphi_2)^{-1}(\mathcal B(\mathcal T^\circ_1\otimes\mathcal T^\circ_2))=\mathcal B(\mathcal T_1\otimes\mathcal T_2)$, where we additionally used Lemma 1.2 in Kallenberg's Probability Theory, 3rd Edition, 2021, and that the equivalence of the product topologies implies the equivalence of the Borel algebras, which is shown analogously to the product algebras. This means that any assertion for random variables $X_i\in\mathcal X_i$ and $(X_1,X_2)\in\mathcal X_1\times\mathcal X_2$ holds if and only if it holds for $\varphi_i\circ X_i\in B_i$ and $(\varphi_1\circ X_1,\varphi_2\circ X_2)\in B_1\times B_2$. To be precise, this statement is true if the assertion only relies on the topologies and $\sigma$-algebras above, or if the remaining required structure is inherited from $B_1$, $B_2$, $B_1\times B_2$ via $\varphi_i$ as demonstrated above.

Thus, we may assume that $X_i\in B_i$. We start with the special case $B_i=\mathbb R$. Recall that there exists a regular conditional distribution $\kappa_i:\Omega\times\mathcal B(\mathbb R)\rightarrow[0,1]$ of $X_i$ given $\mathcal F$. Consider $\mathbb R^{\mathcal B(\mathbb R)}$ equipped with the product algebra $\mathcal B(\mathbb R)^{\otimes\mathcal B(\mathbb R)}$ (from Definition 14.4), then $P_i:\Omega\rightarrow\mathbb R^{\mathcal B(\mathbb R)}$, $\omega\mapsto(\kappa_i(\omega,\mathcal E))_{\mathcal E\in\mathcal B(\mathbb R)}$, is $\mathcal F$-measurable (Lemma 1.9 in Kallenberg's Probability Theory, 3rd Edition, 2021, where we may consider $\Omega$ equipped with $\mathcal F$) because $\kappa_i$ is a stochastic kernel. We use the countably infinite generator $\mathcal G=\mathcal E_{10}=\{(-\infty,a]:a\in\mathbb Q\}$ from Theorem 1.23 to obtain the $\mathcal F$-measurable (Lemma 14.7) restriction $P_i^{\mathcal G}:\Omega\rightarrow\mathbb R^{\mathcal G}$, $\omega\mapsto(\kappa_i(\omega,\mathcal E))_{\mathcal E\in\mathcal G}$. Now, let $\kappa':\Omega\times\mathcal B(\mathbb R)\rightarrow[0,1]$ be a second regular conditional distribution of $X_i$, and define $P':\Omega\rightarrow\mathbb R^{\mathcal B(\mathbb R)}$ accordingly. Notice that $P_i$ and $P'$ are random variables, and that by definition we have $\mathbb P(P_i(\mathcal E)=\mathbb P(X_i\in\mathcal E|\mathcal F)=P'(\mathcal E))=1$ for all $\mathcal E\in\mathcal B(\mathbb R)$ (recall that the diagonal in $\mathbb R^3$ is measurable). But this implies that $\mathbb P(P_i^{\mathcal G}=P'^{\mathcal G})=1$, since $\{P_i^{\mathcal G}=P'^{\mathcal G}\}$ is a countable intersection of almost sure events. For $\omega\in\{\omega:P^{\mathcal G}_i(\omega)=P'^{\mathcal G}(\omega)\}$ we thus have $P_i(\omega)=P'(\omega)$ by Theorem 1.23, Remark 1.24 and Lemma 1.42, and thereby we have $\kappa_i(\omega,\mathcal E)=\kappa'(\omega,\mathcal E)$ for all $\mathcal E\in\mathcal B(\mathbb R)$ unless $\omega\in\mathcal N$, where $\mathcal N=\{\omega:P^{\mathcal G}_i(\omega)\neq P'^{\mathcal G}(\omega)\}$ is a null set.
The result for $(X_1,X_2)$ is shown analogously, where we take $\mathcal G^2=\{(-\infty,a]:a\in\mathbb Q^2\}$ (which is $\mathcal E_{10}$ for $n=2$) and the restriction $P^{\mathcal G^2}$.

Hence, we are left to show that $\kappa_1\otimes\kappa_2$ is a regular conditional distribution of $(X_1,X_2)$ given $\mathcal F$, where $\kappa_1\otimes\kappa_2:\Omega\times\mathcal B(\mathbb R^2)\rightarrow[0,1]$ is given by $(\kappa_1\otimes\kappa_2)(\omega,\mathcal E)=(P_\times(\omega))(\mathcal E)$ and $P_\times(\omega)$ is the unique product measure of $P_1(\omega)$ and $P_2(\omega)$ (with $P_i$ from above). Theorem 14.22 suggests that $\kappa_1\otimes\kappa_2$ is a kernel and in particular $\mathcal F$-measurable. Let $P$ be defined as above from a regular conditional distribution of $(X_1,X_2)$ given $\mathcal F$. In this notation and using these versions of the conditional probabilities, conditional independence (Definition 12.20) asserts that $\mathbb P(P(\mathcal E_1\times\mathcal E_2)=P_1(\mathcal E_1)P_2(\mathcal E_2))=1$ for all $(\mathcal E_1,\mathcal E_2)\in\mathcal B(\mathbb R)^2$. Since $P_\times$ is the product measure, i.e. $(P_\times(\omega))(\mathcal E_1\times\mathcal E_2)=(P_1(\omega))(\mathcal E_1)(P_2(\omega))(\mathcal E_2)$ for all $\omega$ and $\mathcal E_{1,2}$, the same argument as above yields $\mathbb P(P^{\mathcal G^2}=P_\times^{\mathcal G^2})=1$, where $P_\times^{\mathcal G^2}$ is the restriction of $P_\times$ to $\mathcal G^2$, analogous to the above. As we have seen above, we have $P(\omega)=P_\times(\omega)$ on $\{P^{\mathcal G^2}=P_\times^{\mathcal G^2}\}$, i.e. with probability $1$.

Now, consider general $B_i\in\mathcal B(\mathbb R)$. Let $X^*_i=\varphi_i\circ X_i$ with $\varphi_i:B_i\rightarrow\mathbb R$, $x\mapsto x$, which is measurable because $\mathcal B(B_i)=\mathcal B(\mathbb R)\cap B_i$ is the trace (Corollary 1.84). Also, notice that $(X^*_1,X^*_2)=\varphi\circ(X_1,X_2)$ with $\varphi:B_1\times B_2\rightarrow\mathbb R^2$, $x\mapsto x$, and that $\mathcal B(B_1)\otimes\mathcal B(B_2)=\mathcal B(B_1\times B_2)=\mathcal B(\mathbb R^2)\cap(B_1\times B_2)$ from above. Now, for a regular conditional distribution $\kappa^*_i$ of $X^*_i$ given $\mathcal F$, we let $\kappa_i(\omega,\mathcal E)=\kappa_i^*(\omega,\mathcal E)$ for all $\mathcal E\in\mathcal B(B_i)$ (which are in $\mathcal B(\mathbb R)$ because $B_i\in\mathcal B(\mathbb R)$), whenever $\kappa_1^*(\omega,B_1)=\kappa_2^*(\omega,B_2)=1$ (an $\mathcal F$-measurable condition because $\kappa_i^*$ is a kernel), and otherwise we let $\kappa_i(\omega,\mathcal E)=\unicode{120793}\{x_i\in\mathcal E\}$ for some fixed $x_i\in B_i$. We can easily check that this is a kernel, and also a regular conditional distribution of $X_i$ given $\mathcal F$ because $\kappa_i^*(\omega,B_i)$ is a version of $\mathbb P(X^*_i\in B_i|\mathcal F)(\omega)$ and so is $\omega\mapsto 1$, since we have $X^*_i(\Omega)\subseteq B_i$ by definition. Analogously, we see that $\mathbb P(X^*_i\in\mathcal E|\mathcal F)$ is a version of $\mathbb P(X_i\in\mathcal E|\mathcal F)$ for all $\mathcal E\in\mathcal B(B_i)$, hence this holds for $\kappa^*_i(\cdot,\mathcal E)$ and by that $\kappa_i(\cdot,\mathcal E)$ (which agrees with $\kappa^*_i$ on all but a null set) is a version of $\mathbb P(X_i\in\mathcal E|\mathcal F)$, establishing that $\kappa_i$ is a regular conditional distribution of $X_i$ given $\mathcal F$. For uniqueness let $\kappa':\Omega\times\mathcal B(B_i)\rightarrow[0,1]$ be a regular conditional distribution of $X_i$ given $\mathcal F$.
Let $\kappa'':\Omega\times\mathcal B(\mathbb R)\rightarrow[0,1]$, $(\omega,\mathcal E)\mapsto\kappa'(\omega,\mathcal E\cap B_i)$ (which is well-defined since $\mathcal B(B_i)=\mathcal B(\mathbb R)\cap B_i$ is the trace). Notice that $\kappa''$ is a kernel by checking directly, and a regular conditional distribution of $X^*_i$, since $\mathbb P(X_i\in\mathcal E\cap B_i|\mathcal F)$ is a version of $\mathbb P(X^*_i\in\mathcal E|\mathcal F)$. Now, uniqueness implies that $\kappa''$ and $\kappa^*_i$ only differ on a null set, thus $\kappa'$ and $\kappa_i$ only differ on a null set (since $\kappa'(\omega,\mathcal E)=\kappa''(\omega,\mathcal E)$ always). Clearly, the result for $(X_1,X_2)$ follows. Of course, here we don't want to take any regular conditional distribution for $(X^*_1,X^*_2)$, we take $\kappa^*_1\otimes\kappa^*_2$, and we also do not want to take any point $x\in B_1\times B_2$, we take $x=(x_1,x_2)$. For $\kappa^*_1(\omega,B_1)=\kappa^*_2(\omega,B_2)=1$ we have $(\kappa^*_1\otimes\kappa^*_2)(B_1\times B_2)=1$, thus the projection $\kappa$ is a probability measure for each $\omega$, hence a kernel by the above, and finally a product measure on the null set since $\unicode{120793}\{x_1\in\mathcal E_1,x_2\in\mathcal E_2\}=\unicode{120793}\{x_1\in\mathcal E_1\}\unicode{120793}\{x_2\in\mathcal E_2\}$ and otherwise the property is inherited from $\kappa^*$, so $\kappa=\kappa_1\otimes\kappa_2$.

BONUS QUESTION: The proof for $n$-fold products is completely analogous (strictly speaking, induction does not apply because we assume factorization for all measurable subsets in the definition of conditional independence for two rv's, but we only get it for products when looking at $(n-1)+1$ rv's).

For the countable case we recall that conditional independence is defined as conditional independence for any finite selection, i.e. $X_{i_1},\dots,X_{i_k}$ are conditionally independent for all distinct $i_1,\dots,i_k\in\mathbb N$, $k\in\mathbb N_{\ge 2}$. We understand the product $\bigotimes_{i=1}^\infty\kappa_i$ to be given by the unique product measure for each $\omega\in\Omega$, i.e. $(\bigotimes_{i=1}^\infty\kappa_i)(\omega,\mathcal E)= (\bigotimes_{i=1}^\infty\kappa_i(\omega))(\mathcal E)$.

First, recall that the product $\sigma$-algebra of spaces with Borel algebra equals the Borel algebra of the product (with the topological product, Kallenberg's Lemma 1.2), in this sense Borel spaces are closed with respect to taking countable products and we can safely restrict to Borel sets $B_i\in\mathcal B(\mathbb R)$, $i\in\mathbb Z_{>0}$. For the special case $B_i=\mathbb R$ for all $i$, since these and $\mathbb R^{\mathbb N}$ are still Borel spaces, they are still countably generated and we recover uniqueness of the regular conditional distribution, also for the product.

Next, we verify that the $n$-dimensional marginals of the (by the above existing and unique) regular conditional distribution $\kappa:\Omega\times\mathcal B(\mathbb R^{\mathbb N})\rightarrow[0,1]$ of $(X_i)_i$ given $\mathcal F$ coincide with the product $\bigotimes_{i=1}^n\kappa_i:\Omega\times\mathcal B(\mathbb R^n)\rightarrow[0,1]$ of the kernels $\kappa_i$ of $X_i$ given $\mathcal F$, $i=1,\dots,n$. To be precise, let $P:\Omega\rightarrow\mathbb R^{\mathcal B(\mathbb R^{\mathbb N})}$, $\omega\mapsto(\kappa(\omega,\mathcal E))_{\mathcal E\in\mathcal B(\mathbb R^{\mathbb N})}$, which is still measurable by the universal property. We obtain $P_n:\Omega\rightarrow\mathbb R^{\mathcal B(\mathbb R^n)}$, $\omega\mapsto((P(\omega))(\mathcal E\times\mathbb R^{\mathbb N\setminus\{1,\dots,n\}}))_{\mathcal E\in\mathcal B(\mathbb R^n)}$, as a projection (the relabeling map to move from sets in $\mathcal B(\mathbb R^{\mathbb N})$ to $\mathcal B(\mathbb R^n)$ is also measurable by the universal property). On the other hand, we have $P'_n:\Omega\rightarrow\mathbb R^{\mathcal B(\mathbb R^n)}$, $\omega\mapsto((\bigotimes_{i=1}^n\kappa_i)(\omega,\mathcal E))_{\mathcal E\in\mathcal B(\mathbb R^n)}$. Now, we get that $P_n=P'_n$ almost surely as in the finite case, using the countable generator. Since this amounts to a countable number of null sets, the event $\mathcal P=\{\forall n: P_n=P'_n\}$ has probability $1$, so $\mathcal N=\Omega\setminus\mathcal P$ is a null set. On $\mathcal P$ the Ionescu-Tulcea Theorem 14.32 ensures that $P(\omega)=P'(\omega)$, where $P'(\omega)=\bigotimes_{i=1}^\infty P_i(\omega)$ and the measures $P_i(\omega)$ on $(\mathbb R,\mathcal B(\mathbb R))$ are given by $(P_i(\omega))(\mathcal E)=\kappa_i(\omega,\mathcal E)$.
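Spelled out on cylinder sets, as a sketch in the notation above: for $\omega\in\mathcal P$, $n\in\mathbb N$ and $\mathcal E_1,\dots,\mathcal E_n\in\mathcal B(\mathbb R)$ we get

$$(P(\omega))\big(\mathcal E_1\times\dots\times\mathcal E_n\times\mathbb R^{\mathbb N\setminus\{1,\dots,n\}}\big)=(P_n(\omega))(\mathcal E_1\times\dots\times\mathcal E_n)=(P'_n(\omega))(\mathcal E_1\times\dots\times\mathcal E_n)=\prod_{i=1}^n\kappa_i(\omega,\mathcal E_i),$$

and the right-hand side is exactly the value that the product measure $P'(\omega)$ assigns to the same cylinder. The cylinders with measurable rectangular base form a $\pi$-system that generates $\mathcal B(\mathbb R^{\mathbb N})$, so the two probability measures $P(\omega)$ and $P'(\omega)$ coincide.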

For general $B_i$ we proceed as before, in particular we set $\kappa_i(\omega,\mathcal E)=\unicode{120793}\{x_i\in\mathcal E\}$ on $\mathcal N$ for some fixed $x_i\in B_i$, and $\kappa(\omega,\mathcal E)=\unicode{120793}\{x\in\mathcal E\}$, where $x=(x_i)_i$. The remainder is completely analogous. There is one remarkable difference compared to the previous proof. We have not shown (nor needed) that $\bigotimes_{i=1}^\infty\kappa_i$ is measurable for $B_i=\mathbb R$. Instead, we have chosen versions $\kappa$ and $\kappa_i$ that are consistent one-point masses on $\mathcal N$, to ensure measurability (using only that one-point masses are measurable, which also holds for $\mathbb R^{\mathbb N}$ because it is a Borel space, i.e. we can reduce this to $\mathbb R$).
