
For continuous random variables $X, Y$ the conditional expectation $\mathbb{E}[X | Y]$ is itself a random variable. I understood this in the sense that for a realisation of $Y$ we can say

$$ \mathbb{E}[X | Y=y] = \int_{-\infty}^{\infty}xf_{X|Y}(x|y)dx = \frac{1}{f_Y(y)}\int_{-\infty}^{\infty}xf_{X,Y}(x,y)dx. $$

So I imagine a random elementary event occurring in $\Omega$ that determines the value of $Y$ and thereby allows conditioning $X$ as above.

The signature would then be something like this

$$\mathbb{E}[X | Y]: \Omega \to \mathbb{R}, \quad \omega \mapsto \mathbb{E}[X | Y(\omega)]$$

The conditional expectation here is random because $Y$ is a random variable that assigns an output to the random occurrences on the "magical" space $\Omega$.
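This pointwise picture can be checked numerically. Below is a minimal Python sketch; the six-point sample space, the parity map `Y`, and the identity map `X` are hypothetical choices for illustration, not part of the question. It computes $\mathbb{E}[X \mid Y]$ as a function $\Omega \to \mathbb{R}$ whose value at $\omega$ depends only on $Y(\omega)$:

```python
# Hypothetical toy model: Omega = {0,...,5} with uniform probability.
omega_space = range(6)
Y = lambda w: w % 2        # the conditioning random variable (parity)
X = lambda w: float(w)     # the random variable we want to estimate

def cond_exp_X_given_Y(w):
    """E[X|Y](omega): average X over the Y-level set containing omega."""
    level_set = [v for v in omega_space if Y(v) == Y(w)]
    return sum(X(v) for v in level_set) / len(level_set)

# Omega -> R: every omega gets a value, but it depends only on Y(omega):
values = {w: cond_exp_X_given_Y(w) for w in omega_space}
# even omegas {0,2,4} all map to 2.0, odd omegas {1,3,5} all map to 3.0
```

Note how `values` is constant on each level set of `Y`: the "randomness" of the output comes entirely from which $\omega$ occurs.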

However, in the more advanced courses and textbooks I studied on the matter, the conditional expectation is often introduced via sub-$\sigma$-algebras and then defined by a characterisation like this:

Let $X \in L_1(\Omega, \mathcal{A}, \mathbb{P})$ and $Y \in L_1(\Omega, \mathcal{F}, \mathbb{P})$ where $\mathcal{F}$ is a sub-$\sigma$-algebra of $\mathcal{A}$. Then

$$ Y = \mathbb{E}[X | \mathcal{F}] \quad \iff \quad \forall F \in \mathcal{F}: \mathbb{E}[\mathbb{1}_F X] = \mathbb{E}[\mathbb{1}_F Y]. $$

These concepts supposedly coincide as $\mathbb{E}[X | Y] = \mathbb{E}[X | \sigma(Y)]$ and $\mathbb{E}[X | \mathcal{F}]$ is understood to be a random variable as well.
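This defining property can be verified by brute force on a finite example. The following sketch (the six-point space and the parity map are assumptions for illustration) builds $\sigma(Y)$ as all unions of the atoms of $Y$ and checks $\mathbb{E}[\mathbb{1}_F X] = \mathbb{E}[\mathbb{1}_F \, \mathbb{E}[X \mid \sigma(Y)]]$ for every $F \in \sigma(Y)$:

```python
from itertools import combinations

omega_space = list(range(6))
prob = {w: 1 / 6 for w in omega_space}   # uniform probability measure
Y = lambda w: w % 2
X = lambda w: float(w)

# sigma(Y): all unions of the level sets (atoms) of Y, including the
# empty union (the empty set) and the union of all atoms (Omega itself)
atoms = [frozenset(w for w in omega_space if Y(w) == y) for y in (0, 1)]
sigma_Y = [frozenset().union(*combo)
           for r in range(len(atoms) + 1)
           for combo in combinations(atoms, r)]

def cexp(w):
    """Candidate for E[X|sigma(Y)]: average of X on the atom containing w."""
    atom = next(a for a in atoms if w in a)
    return sum(X(v) * prob[v] for v in atom) / sum(prob[v] for v in atom)

def E(f, F):
    """E[1_F * f] with respect to the probability measure prob."""
    return sum(f(w) * prob[w] for w in F)

# the defining property holds on every F in sigma(Y)
assert all(abs(E(X, F) - E(cexp, F)) < 1e-12 for F in sigma_Y)
```

Here $\sigma(Y)$ has four elements ($\emptyset$, the evens, the odds, $\Omega$), and the atom-wise average is the unique $\sigma(Y)$-measurable function satisfying the property.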

My concern is with that last fact. $\mathcal{F}$ is a set of subsets of $\Omega$, so I'd say it is in principle deterministic.

Where does the randomness come in now?

What happens if the experiment that underlies $\Omega$ produces a random event? I.e. how does that influence the conditional expectation $\mathbb{E}[X | \mathcal{F}]$?

My issue here is not to doubt the usefulness of the mathematical theory, but it feels as if the original interpretation is no longer coherent with the abstraction.

lpnorm
  • "My concern is with that last fact. ${\cal F}$ is a set of subsets of $\Omega$ so I'd say it is in principle deterministic." Interesting view. A nondeterministic random variable was, as far as I knew, an ${\cal F}$-measurable function on $\Omega$. Or to put it this way: the elements of ${\cal F}$ (subsets of $\Omega$) are random events that are by no means deterministic. – Kurt G. Aug 11 '22 at 11:17
  • This question and similar ones appear often here: the answer is always that measure-theoretic probability equates 'random' with 'measurable'. In this framework, the intuitive layman's concept of randomness is purely interpretative. – Snoop Aug 11 '22 at 15:44
  • @KurtG. Can you maybe elaborate in an answer? – lpnorm Aug 12 '22 at 08:17
  • @Snoop And yet, if that measure theory is of any use, it should be able to bridge the gap back to the layman's concept of randomness. All probability courses start with an abstraction of real random events that gives the intuitive foundation and the justification for why we should care about this theory in the first place. If you don't like 'random' and prefer 'measurable', then how does it translate to the reality where we all live? – lpnorm Aug 12 '22 at 08:23
  • @Snoop Sorry, but this is not helpful. – lpnorm Aug 12 '22 at 09:36
  • @lpnorm if this question doesn't clear something up then I don't get your post – Snoop Aug 12 '22 at 09:46
  • @Snoop It is not completely true that measure-theoretic probability equates 'random' and 'measurable'. Constant maps $\Omega \to E$ are measurable but are considered 'deterministic'. There is a little bit more at play in the interpretation. – unwissen Aug 12 '22 at 14:39
  • @unwissen ever heard of constant random variables? They also have distributions! – Snoop Aug 12 '22 at 14:45
  • Of course they do, but this doesn't explain how randomness is modeled here at all. I suggest you leave this question alone if all you have to add are platitudes and unkind behavior. – unwissen Aug 12 '22 at 14:49
  • Thanks for the kind comment, I guess @unwissen – Snoop Aug 12 '22 at 14:55
  • @Snoop's comments are completely correct; the other comments are by people who are confused. "Randomness" is a purely intuitive idea; in a probability space $(\Omega, \mathscr{F}, \mathbf{P})$ randomness "enters through" intuition, since there is nothing "random" about that triplet. Yet it has been proven empirically over and again that it is very useful to say that the elements of $\mathscr{F}$ are "random events" and that the measure assigned by $\mathbf{P}$ is their "theoretical frequency"... – William M. Aug 12 '22 at 15:05
  • In this context, "randomness" of $\mathbf{E}(X \mid \mathscr{F}_0)$ "enters through" the fact that the events in $\mathscr{F}_0$ are themselves random! Again, purely interpretation. – William M. Aug 12 '22 at 15:06
  • In fact, there is a geometric viewpoint that is often disregarded but that has proven to be much more useful. The conditional expectation is the orthogonal projection operator from the Hilbert space $\mathbf{L}^2(\mathscr{F})$ onto the Hilbert subspace $\mathbf{L}^2(\mathscr{F}_0).$ This has proven useful because projections and convexity are at the core of convex optimisation, which in turn is the mathematical heart of "machine learning." This geometric viewpoint is not an interpretation, by the way: the orthogonal projection really is the conditional expectation. – William M. Aug 12 '22 at 15:11
  • @WilliamM Did you read my answer? I actually wrote that and even mentioned the 'geometric viewpoint'. – unwissen Aug 12 '22 at 15:16
  • @unwissen Your answer below is fine. That is the third interpretation I know of, that of "information", and it is mainly used in mathematical finance as far as I know. – William M. Aug 12 '22 at 15:18
  • @WilliamM The whole concepts of e.g. filtrations, martingales and Markov processes are based on this interpretation. Concepts which do appear in mathematical finance, but in no way "mainly" there. – unwissen Aug 12 '22 at 15:21
  • @WilliamM. Of course we are talking about interpretation, but models and theory only have purpose if they allow for meaningful and coherent interpretation. The issue of my question is that it is easy to see how we can interpret elements of $\Omega$ as random outcomes of some real random experiment. Everyone likes to explain via dice or urn experiments here. But it is not at all clear how this interpreted randomness extends to the $\sigma$-algebra. – lpnorm Aug 20 '22 at 13:08
  • @lpnorm the $\sigma$-algebra is, by arbitrariness, the random events you want to consider. In fact, when $\Omega$ is countable, you take the $\sigma$-field to be its power set because you want everything to be a possible event. The fact of the matter is that we want to have as many events as possible but the measure does not extend to the power set, so we have to restrict. The $\sigma$-algebra is fundamentally a mathematical restriction, not an interpretation. (The interpretation is perhaps just those events where we can define the probability in a meaningful way.) – William M. Aug 20 '22 at 23:42

1 Answer


I think this is a legitimate question, as intuition, as opposed to mere manipulation of symbols, is very important in mathematics in general and especially in probability theory.

That said, it may be helpful to first understand that a $\sigma$-algebra often has to be interpreted as a body of information in the following sense: a set $A$ is contained in the $\sigma$-algebra $\mathcal{F}$ if we can answer the question "Is (the outcome of "the" probability experiment) $\omega \in A$?" (this also motivates the definition/closure properties of a $\sigma$-algebra!).

In the same vein, the conditional expectation $\mathbb{E}\left[X \,\middle\vert\, \mathcal{F}\right](\omega)$ of a random variable $X$ given the $\sigma$-algebra (or "the information") $\mathcal{F}$ is "the best" estimate for $X(\omega)$ we can give if we have the information necessary to answer the question "Is $\omega \in A$?" for every $A \in \mathcal{F}$.

This property is connected with the formal requirements that $$ \mathbb{E}\left[X \,\middle\vert\, \mathcal{F}\right] $$ is an $\mathcal{F}$-measurable random variable, i.e. we can actually determine it from the information $\mathcal{F}$, and the projection property $$ \mathbb{E}\left[\mathbb{E}\left[X \,\middle\vert\, \mathcal{F}\right] \cdot \mathbb{1}_{A} \right] = \mathbb{E}\left[X \cdot \mathbb{1}_{A} \right] $$ for all $A \in \mathcal{F}$, or, more generally (but equivalently), $$ \mathbb{E}\left[\mathbb{E}\left[X \,\middle\vert\, \mathcal{F}\right] \cdot Z \right] = \mathbb{E}\left[X \cdot Z\right] $$ for all $\mathbb{R}_+$-valued, $\mathcal{F}$-measurable random variables $Z$. The latter expresses the idea that $\mathbb{E}\left[X \,\middle\vert\, \mathcal{F}\right]$ is the "best possible estimate of $X$ given $\mathcal{F}$".

If $A \in \mathcal{F}$ with $\mathbb{P}(A) > 0$, we can write $$ \frac{1}{\mathbb{P}(A)} \int_{A} \, \mathbb{E}\left[X \,\middle\vert\, \mathcal{F}\right](\omega) \, \mathbb{P}(\mathrm{d}\omega) \\ = \frac{1}{\mathbb{P}(A)} \cdot \mathbb{E}\left[\mathbb{E}\left[X \,\middle\vert\, \mathcal{F}\right] \cdot \mathbb{1}_A \right] = \frac{1}{\mathbb{P}(A)} \mathbb{E}\left[X \cdot \mathbb{1}_A \right] \\ = \frac{1}{\mathbb{P}(A)} \int_{A} \, X(\omega) \, \mathbb{P}(\mathrm{d}\omega), $$ i.e. $\mathbb{E}\left[X \,\middle\vert\, \mathcal{F}\right]$ and $X$ have the same average over every such $A$.

Another useful viewpoint (which can also be used to prove the existence of conditional expectations) is that for square-integrable $X \in L^2(\Omega, \mathcal{A}, \mathbb{P})$, the conditional expectation given $\mathcal{F}$ is the $\mathcal{F}$-measurable (i.e. determined by the information $\mathcal{F}$) and square-integrable random variable which minimizes $$ \mathbb{E}\left[(X-Z)^2\right] $$ among all $\mathcal{F}$-measurable, square-integrable random variables $Z$.
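This minimization property can be seen directly in a toy example. A hedged sketch (the six-point space and $\mathcal{F} = \sigma(\text{parity})$ are assumptions for illustration, not from the answer): any $\mathcal{F}$-measurable $Z$ is constant on the even and odd atoms, so we can parametrize it by two numbers and observe that the MSE $\mathbb{E}[(X-Z)^2]$ is smallest at the per-atom means, which is exactly the conditional expectation:

```python
# Hypothetical toy model: Omega = {0,...,5} uniform, F = sigma(parity),
# so an F-measurable Z takes one value on {0,2,4} and one on {1,3,5}.
omega_space = range(6)
X = lambda w: float(w)

def mse(z_even, z_odd):
    """E[(X - Z)^2] for Z equal to z_even on evens, z_odd on odds."""
    z = lambda w: z_even if w % 2 == 0 else z_odd
    return sum((X(w) - z(w)) ** 2 for w in omega_space) / 6

# search a small grid of candidate (z_even, z_odd) pairs
best = min(((a, b) for a in (1.5, 2.0, 2.5, 3.0) for b in (2.5, 3.0, 3.5)),
           key=lambda p: mse(*p))
# the minimizer is (2.0, 3.0): the atom means, i.e. the values of E[X|F]
```

The grid search is only a finite stand-in for the minimization over all $\mathcal{F}$-measurable $Z$; calculus on each atom gives the same answer, the conditional mean.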

unwissen
  • Thank you so much for that well-posed answer. And although I find the information interpretation of $\sigma$-algebras to be quite hand-wavy (in general, not in your reply), I appreciate your thoughts on the matter. – lpnorm Aug 20 '22 at 13:11