
I've done courses on measure theory, advanced stochastic processes, etc.--years and years of using the notion of 'measurableness'. I've now come to the conclusion that, in fact, I do not understand this notion.

Yes, I 'understand' the "mathematical" meaning (i.e., the definition), it's very simple: The inverse images of (Borel-)measurable sets need to be in the sigma algebra. In fact mathematically there isn't anything to understand, it's just a definition. Yet there's a difference between 'understanding' a definition mathematically, and really understanding the semantics on the deepest level (which I'd call the 'true' meaning).

Concretely, let's focus on probability spaces. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\mathcal G \subseteq \mathcal F$ be a sub-sigma-algebra.

(1) What is the meaning of "the information contained in $\mathcal G$"?

Let $X \colon \Omega \to \mathbb{R}$ be a function. Then $X$ is called a random variable iff $X$ is $\mathcal F$-measurable. This means that, for any Borel set $B$, the event $(X \in B)$ lies in $\mathcal F$. The reason behind this definition is that, for any such Borel set $B$, we want to be able to compute the probability that $X \in B$ (therefore we must demand that all such events belong to the sigma-algebra). Let's suppose that $\sigma(X) \subseteq \mathcal G$, so that $X$ is even measurable w.r.t. the smaller sigma-algebra.
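Just so it's clear that my problem is not the mechanics: here is a minimal sketch (in Python, on a made-up three-point space; the space, the map and the sub-sigma-algebra are purely illustrative) of what the preimage condition checks.

```python
# Purely illustrative: checking the preimage condition on a toy finite space.
from itertools import chain, combinations

omega = ("a", "b", "c")

# The full sigma-algebra (power set) and a sub-sigma-algebra that cannot separate b from c.
F = {frozenset(s) for s in chain.from_iterable(combinations(omega, r) for r in range(4))}
G = {frozenset(), frozenset({"a"}), frozenset({"b", "c"}), frozenset(omega)}

X = {"a": 1.0, "b": 2.0, "c": 2.0}  # a map Omega -> R

def is_measurable(X, sigma_algebra):
    # On a finite space it is enough to check the preimage of each attained value:
    # these generate sigma(X), and the sigma-algebra is closed under their unions.
    return all(frozenset(w for w in omega if X[w] == v) in sigma_algebra
               for v in set(X.values()))

print(is_measurable(X, F))  # True
print(is_measurable(X, G))  # True: X is constant on {b, c}, so sigma(X) is contained in G
```

So the definition itself I can verify mechanically; what I'm missing is the interpretation.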

(2) How can the "information" in $\mathcal G$ determine the value of $X$?

$\mathcal G$ is some smaller sigma algebra. In which sense does the information in $\mathcal G$ determine the value of $X$? I don't see that at all. In fact, in which sense does the information in $\mathcal F$ determine the value of $X$!?

Despite years of thinking and searching for answers online (including numerous MSE posts), the following question remains:

(3) What is the link between sigma algebras and information?

(NB: I know that e.g. What does it mean by saying 'a random variable $\mathit X$ is $\mathcal G$-measurable'? and Problem with intuition regarding sigma algebras and information and many others are similar to/duplicates of this, but they didn't help me solve my problem--my questions above were left completely unanswered there.)

TL;DR: I do not understand the link between sigma algebras and "information". I would appreciate any help. Right now I'm totally stuck. If I can't clear this up I basically have to drop out of the course and find a different career path.

  • “In fact mathematically there isn’t anything to understand, it’s just a definition.” Hard disagree. Most importantly, not every set has a well-defined length/area/volume/probability, and in order to define those notions so that they satisfy the desired intuitions (e.g. rotational and translation invariance and additivity) we have to delimit which sets are measurable and which are non-measurable; otherwise we obtain counterintuitive results. Perhaps you know this, but then I find it difficult to believe you would write the quoted sentence. – Nap D. Lover May 19 '22 at 13:46

2 Answers


Here is an intuitive answer which may address your concern about $\sigma$-algebras and information.

Assume that only $3$ mutually exclusive events may happen at time $T$. Let these be denoted by $\omega_1$, $\omega_2$, $\omega_3$. The probabilities of these events are estimated to be $p_1, p_2, p_3 \in [0, 1]$ with $p_1+p_2+p_3=1$. At time $T$, based on which event has occurred, John, Jack and Jane will take actions $X$, $Y$, $Z$ from the set of possible actions $A=\{1, 2, 3\}$.

Now consider the task of modelling $X$, $Y$, $Z$ mathematically, so that you can speak of the probability of a particular action being taken. The right model for $X$, $Y$, $Z$ will depend on the restrictions that John, Jack and Jane face at time $T$. Assume that at time $T$ their individual circumstances are as follows:

At time $T$, John will know exactly which of the mutually exclusive events $\omega_1, \omega_2, \omega_3$ has occurred and will take the action $1$, $2$, $3$, respectively.

At time $T$, Jack will only be able to tell whether $\omega_1$ has occurred or not. So if $\omega_1$ has occurred, he will take action $1$. If $\omega_1$ has not occurred, Jack will know that either $\omega_2$ or $\omega_3$ has occurred, but he will not know which one exactly, and, in either case, he will take action $2$.

At time $T$, Jane will only be able to tell whether $\omega_2$ has occurred or not. So if $\omega_2$ has occurred, she will take action $1$. If $\omega_2$ has not occurred, Jane will know that either $\omega_1$ or $\omega_3$ has occurred, but she will not know which one exactly, and, in either case, she will take action $3$.

Now, let us come up with a suitable mathematical model for the taken actions $X$, $Y$, $Z$. This will be accomplished by designing an individual probability space $(\Omega, \mathcal{F}, \mathbb{P})$ for each of the random variables $X$, $Y$, $Z$. In all three cases, $\Omega$ will be given by $\{\omega_1, \omega_2, \omega_3\}$. However, the $\sigma$-algebra $\mathcal{F}$ should model the information accessible at time $T$ to the person in question. By information, we mean the events that are observable at time $T$ to that person.

John. According to the description, the action $X$ taken by John is defined as follows: $X(\omega_1)=1$, $X(\omega_2)=2$, $X(\omega_3)=3$. Since John has complete information at time $T$, i.e., he is able to distinguish which of the mutually exclusive events $\omega_1$, $\omega_2$ and $\omega_3$ has occurred, the corresponding $\sigma$-algebra $\mathcal{F}_1$ should reflect this fact. Therefore $\mathcal{F}_1$ should contain all the individual events $\{\omega_1\}$, $\{\omega_2\}$, $\{\omega_3\}$. Of course, John is also able to observe the event "either $\omega_1$ or $\omega_2$ has occurred", which is modelled by including the union of $\{\omega_1\}$ and $\{\omega_2\}$, given by $\{\omega_1, \omega_2\}$, in $\mathcal{F}_1$. Through a similar line of thought, we see that $\mathcal{F}_1$ has to be the power set of $\Omega$: $$ \mathcal{F}_1 = \mathscr{P}(\Omega) = \sigma(X). $$ The mutually exclusive events $\{\omega_1\}$, $\{\omega_2\}$, $\{\omega_3\}$ generate $\mathcal{F}_1$, and the values of $X$ are determined by its values on these generating events.

Jack. According to the description, the action $Y$ taken by Jack is defined as follows: $Y(\omega_1)=1$, $Y(\omega_2)=2$, $Y(\omega_3)=2$. Since at time $T$ Jack is able to distinguish the events "$\omega_1$ has occurred" and "either $\omega_2$ or $\omega_3$ has occurred", we include $\{\omega_1\}$ and $\{\omega_2, \omega_3\}$ in $\mathcal{F}_2$. Of course Jack is also able to tell whether either of the two aforementioned events has occurred, which is reflected by including $\{\omega_1, \omega_2, \omega_3\}$ in $\mathcal{F}_2$. Hence, $$ \mathcal{F}_2 = \{ \{\omega_1, \omega_2, \omega_3\}, \{\omega_2, \omega_3\}, \{\omega_1\}, \emptyset \} = \sigma(Y). $$ The mutually exclusive events $\{\omega_1\}$, $\{\omega_2, \omega_3\}$ generate $\mathcal{F}_2$, and the values of $Y$ are determined by its values on these generating events.

Jane. According to the description, the action $Z$ taken by Jane is defined as follows: $Z(\omega_1)=3$, $Z(\omega_2)=1$, $Z(\omega_3)=3$. Since at time $T$ Jane is able to distinguish the events "$\omega_2$ has occurred" and "either $\omega_1$ or $\omega_3$ has occurred", we include $\{\omega_2\}$ and $\{\omega_1, \omega_3\}$ in $\mathcal{F}_3$. Of course Jane is also able to tell whether either of the two aforementioned events has occurred, which is reflected by including $\{\omega_1, \omega_2, \omega_3\}$ in $\mathcal{F}_3$. Hence, $$ \mathcal{F}_3 = \{ \{\omega_1, \omega_2, \omega_3\}, \{\omega_1, \omega_3\}, \{\omega_2\}, \emptyset \} = \sigma(Z). $$ The mutually exclusive events $\{\omega_2\}$, $\{\omega_1, \omega_3\}$ generate $\mathcal{F}_3$, and the values of $Z$ are determined by its values on these generating events.

It is easy to see that $X, Y, Z$ are all $\mathcal{F}_1$-measurable, since $\mathcal{F}_1$ is the $\sigma$-algebra with complete information. But $X$ is not $\mathcal{F}_2$-measurable, i.e., not all the events associated with $X$ are observable in $\mathcal{F}_2$. For example, $\mathcal{F}_2$ does not contain $\{\omega_2\}$, and one cannot speak of the probability of $X=2$ on the probability space $(\Omega, \mathcal{F}_2, \mathbb{P})$.
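If it helps, here is a small Python sketch of exactly this example (nothing beyond the sets and maps defined above is assumed): it encodes $\mathcal{F}_1$, $\mathcal{F}_2$, $\mathcal{F}_3$ and checks, via preimages of values, which of $X$, $Y$, $Z$ is measurable with respect to which $\sigma$-algebra.

```python
# A small sketch of the John/Jack/Jane example above.
# On a finite space, a map is measurable w.r.t. a sigma-algebra
# iff the preimage of each of its values is an event in that sigma-algebra.
from itertools import chain, combinations

omega = ("w1", "w2", "w3")

# F1: the power set (John's full information)
F1 = {frozenset(c) for c in chain.from_iterable(combinations(omega, r) for r in range(4))}
# F2: Jack can only tell whether w1 occurred
F2 = {frozenset(), frozenset({"w1"}), frozenset({"w2", "w3"}), frozenset(omega)}
# F3: Jane can only tell whether w2 occurred
F3 = {frozenset(), frozenset({"w2"}), frozenset({"w1", "w3"}), frozenset(omega)}

X = {"w1": 1, "w2": 2, "w3": 3}   # John's action
Y = {"w1": 1, "w2": 2, "w3": 2}   # Jack's action
Z = {"w1": 3, "w2": 1, "w3": 3}   # Jane's action

def is_measurable(V, sigma_algebra):
    return all(frozenset(w for w in omega if V[w] == v) in sigma_algebra
               for v in set(V.values()))

for name, V in [("X", X), ("Y", Y), ("Z", Z)]:
    print(name,
          is_measurable(V, F1),   # always True: F1 has complete information
          is_measurable(V, F2),   # True only for Y
          is_measurable(V, F3))   # True only for Z
```

The pattern is the general one on finite spaces: a map is measurable with respect to a $\sigma$-algebra exactly when it is constant on the atoms of that $\sigma$-algebra.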

Holden
  • Thanks a lot. It is for sure a very enlightening explanation. I am not (yet) sure if I fully grasp it now, but it seems at the very least this answer will put me on the way to getting there. If I still encounter a problem I'll ask it later. – herbhofsterd May 19 '22 at 16:46
  • The fact that "one cannot speak of the probability of $X=2$" (which, mathematically, I agree with), what does it then intuitively mean? I agree, $X$ is not $\mathcal{F}_2$-measurable. Say we have a random variable $T$ that is $\mathcal{F}_3$-measurable. Mathematically I see that this exactly means that $T$ is constant on $\{\omega_1, \omega_3\}$. I also understand that basically this holds because Jane cannot distinguish $\omega_1$ and $\omega_3$. And $X$ clearly isn't constant on $\{\omega_1, \omega_3\}$, so it's not $\mathcal{F}_3$-measurable. Yet I still feel that I don't understand it. – herbhofsterd May 19 '22 at 18:27
  • So, intuitively, when we say a random variable, say $X$, is "measurable" with respect to some sigma algebra, say $\mathcal F$, what should I think about? Mathematically it means that we can compute the probability of $(X\in B)$ for every Borel set $B$ (i.e., each $(X\in B)$ is in $\mathcal F$), and for a map that is not measurable, we can't. Why can we say that we "know" $X$ given the 'information' $\mathcal F$ (if $X$ is $\mathcal F$-measurable)? Again, I know this relates to $E[X \mid \mathcal F] = X$ etc., but this does not solve my conceptual issues. – herbhofsterd May 19 '22 at 18:34
  • In other words, my problem is the following sentence: "If $X$ is $\mathcal F$-measurable, then the information in $\mathcal F$ determines the value of $X$." In which sense is this true? – herbhofsterd May 19 '22 at 19:01
  • You could interpret that sentence the following way. If $\omega_0 \in \Omega$ has occurred, then any event $E \in \mathcal{F}$ with $\omega_0 \in E$ has also occurred. Consider such an event $E$. If $X$ is $\mathcal{F}$-measurable, then either $X(\omega)=X(\omega_0)$ for all $\omega \in E$, or there exists $E' \subset E$, $E' \in \mathcal{F}$, such that $\omega_0 \in E'$ and $X(\omega)=X(\omega_0)$ for all $\omega \in E'$; we can take $E'=E \cap X^{-1}(\{X(\omega_0)\})$. (A small sketch of this idea follows these comments.) – Holden May 19 '22 at 22:45
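To make the last comment concrete, here is a small sketch reusing the finite example from this answer (nothing new is assumed): the "information in $\mathcal{F}_2$" is which events of $\mathcal{F}_2$ contain the drawn outcome, i.e. the atom the outcome falls into. An $\mathcal{F}_2$-measurable map is constant on that atom, so this information determines its value; a map that is not $\mathcal{F}_2$-measurable can still take several values on it.

```python
# Sketch of "the information in F determines the value of X":
# the observer learns, for every event G in F, whether it occurred.
# That pins down the atom containing the drawn omega; an F-measurable map
# is constant on each atom, so its value is determined by this information.
omega = ("w1", "w2", "w3")

F2 = {frozenset(), frozenset({"w1"}), frozenset({"w2", "w3"}), frozenset(omega)}

Y = {"w1": 1, "w2": 2, "w3": 2}   # F2-measurable (Jack's action above)
X = {"w1": 1, "w2": 2, "w3": 3}   # not F2-measurable (John's action above)

def atom(w0, sigma_algebra):
    """Intersection of all events containing w0: exactly what an observer
    of the sigma-algebra can learn about the drawn outcome."""
    a = set(omega)
    for G in sigma_algebra:
        if w0 in G:
            a &= G
    return a

drawn = "w3"                       # the outcome, hidden from the observer
known = atom(drawn, F2)            # the observer only learns {'w2', 'w3'}

print(known)
print({Y[w] for w in known})       # {2}: a single value, so the information determines Y
print({X[w] for w in known})       # {2, 3}: two values, so F2 does not determine X
```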

A sub-$\sigma$-algebra $\mathcal{G}$ represents "partial information" in the following sense: for each $G\in\mathcal{G}$, an observer (who does not know which $\omega$ has been drawn) knows whether $\omega\in G$ or $\omega\in G^{c}$. This is also related to conditional expectation, i.e., a finer $\mathcal{G}$ gives a better prediction of an unknown quantity $Y$ living on $(\Omega,\mathcal{F},\mathsf{P})$. Assuming that $Y\in L^2$ and $\mathcal{G}'\subset\mathcal{G}''$, \begin{align} \mathsf{E}[(Y-\mathsf{E}[Y\mid \mathcal{G}'])^2\mid \mathcal{G}'']&=\mathsf{E}[(Y-\mathsf{E}[Y\mid \mathcal{G}'']+\mathsf{E}[Y\mid \mathcal{G}'']-\mathsf{E}[Y\mid \mathcal{G}'])^2\mid \mathcal{G}''] \\ &=\mathsf{E}[(Y-\mathsf{E}[Y\mid \mathcal{G}''])^2\mid \mathcal{G}''] +(\mathsf{E}[Y\mid \mathcal{G}'']-\mathsf{E}[Y\mid \mathcal{G}'])^2, \end{align} where the cross term vanishes because $\mathsf{E}[Y\mid \mathcal{G}'']-\mathsf{E}[Y\mid \mathcal{G}']$ is $\mathcal{G}''$-measurable and $\mathsf{E}[Y-\mathsf{E}[Y\mid \mathcal{G}'']\mid \mathcal{G}'']=0$. Taking expectations, $\mathsf{E}[(Y-\mathsf{E}[Y\mid \mathcal{G}'])^2]\ge \mathsf{E}[(Y-\mathsf{E}[Y\mid \mathcal{G}''])^2]$.
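As a quick illustration of this inequality (the finite space, the equal weights and the two partitions below are just an example made up for the purpose, not anything canonical), one can compute the conditional expectations as atom-wise averages and compare the mean-squared errors:

```python
# Illustrative only: finer information gives a (weakly) smaller mean-squared error.
# Finite space with equal weights; conditional expectation = average over the atom.
omega = [1, 2, 3, 4]                 # equally likely outcomes
Y = {w: float(w) for w in omega}     # the quantity to predict

partition_G1 = [{1, 2}, {3, 4}]      # coarse information G'
partition_G2 = [{1}, {2}, {3, 4}]    # finer information G'' (refines G')

def cond_exp(Y, partition):
    """E[Y | G] as a function on omega: on each atom, the average of Y there."""
    out = {}
    for atom in partition:
        avg = sum(Y[w] for w in atom) / len(atom)
        for w in atom:
            out[w] = avg
    return out

def mse(Y, Yhat):
    return sum((Y[w] - Yhat[w]) ** 2 for w in omega) / len(omega)

E1 = cond_exp(Y, partition_G1)       # E[Y | G']
E2 = cond_exp(Y, partition_G2)       # E[Y | G'']

print(mse(Y, E1))   # 0.25
print(mse(Y, E2))   # 0.125 <= 0.25, as the inequality above predicts
```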

On the other hand, the interpretation of $\sigma$-algebras as information is purely informal, as pointed out by Billingsley (Example 4.9 in his Probability and Measure, 1986). He first defines the partition induced by $\mathcal{G}$: $\omega$ and $\omega'$ are $\mathcal{G}$-equivalent if $1_G(\omega)=1_G(\omega')$ for every $G\in \mathcal{G}$, and the sets of $\mathcal{G}$-equivalent points form the partition induced by $\mathcal{G}$. Our observer knows the equivalence class to which $\omega$ belongs (which is what we interpret as information).

Now, let $([0,1],\mathcal{B}_{[0,1]},\mathsf{P})$ be the probability space where $\mathsf{P}$ is Lebesgue measure, and let $\mathcal{G}=\sigma\{\{\omega\}:\omega\in [0,1]\}$, i.e., $\mathcal{G}$ is the countable/co-countable $\sigma$-algebra. Notice that $\mathcal{G}$ is independent of $\mathcal{B}_{[0,1]}$ because all the events in $\mathcal{G}$ are trivial (they have probability $0$ or $1$). Thus, it does not contain any information about the sets in $\mathcal{B}_{[0,1]}$, in the sense that $\mathsf{P}(B\mid \mathcal{G})=\mathsf{P}(B)$ a.s. On the other hand, the $\mathcal{G}$-partition consists of the singletons, i.e., it contains "all the information".

  • Thanks a lot. Actually I'd myself come up with the concept of looking at "points separable by a $\mathcal G$-measurable set", and defining two points to be equivalent iff they can't be separated by a $\mathcal G$-set. If my interpretation is right, this is exactly what you're saying in the second paragraph. Under the interpretation that for each $G \in \mathcal G$ the observer (though he doesn't know which $\omega$ has been drawn) knows whether $\omega$ is in $G$ or $G^c$, it makes sense that this observer will "know (strictly) more" if it's instead $\mathcal F$ with $\mathcal G \subset \mathcal F$. – herbhofsterd May 19 '22 at 16:50