15

Typically, in the applied probabilistic or statistical literature, we work with random variables whose domain we don't specify; we care only about the set in which the random variable takes values.

For example, the number of aces in one's hand in a certain card game, the height of a population, or the income of a company in a certain year are all random variables (the last two examples come from statistics). But in all of these examples, the domain is never given.

While we could always construct any number of artificial probability spaces that would serve as the domain, I'm interested in what a ~~"true"~~ compelling probability space domain could be that really models (the underlying experiment of) these three examples.

EDIT: To prevent unclarity about what I mean by "compelling", let me be more precise by giving an example: Consider the random variable that counts the number of heads when flipping a coin $n$ times. Thus it takes values in $\{0,1,\ldots,n\}$. But which experiment would most likely be performed in order to produce these values?
The most compelling space would be $\Omega=\{H,T\}^n$, the space of sequences of $n$ coin flips, since this is what actually happens.
But one could just as well define this random variable on the set $\{0,1,\ldots,n\}$, in which case the random variable would be the identity function. This space I would call artificial, not "compelling", because it no longer gives an accurate representation of the underlying experiment.
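To make the distinction concrete, here is a minimal Python sketch comparing the two spaces (the variable names are my own illustration); it checks that the counting map on the "compelling" space and the identity on the "artificial" space induce the same distribution:

```python
from itertools import product
from math import comb

n = 4

# "Compelling" space: all sequences of n fair-coin flips, each with probability 2^-n.
Omega = list(product("HT", repeat=n))
X = lambda w: w.count("H")  # the random variable: number of heads

law_compelling = {k: sum(1 for w in Omega if X(w) == k) / 2**n for k in range(n + 1)}

# "Artificial" space: {0, ..., n} with binomial weights; the random variable is the identity.
law_artificial = {k: comb(n, k) / 2**n for k in range(n + 1)}

# Same induced distribution, but only the first space records the actual experiment.
assert law_compelling == law_artificial
```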
In particular, I'm interested in the underlying space for the statistical examples.

P.S. See also this other question of mine, which also has a bounty running.

temo
  • 5,237
  • I think your example is getting at the idea that a "true" space should contain the full entropy of the experiment used to generate our random variable, even if our measurement destroys some of that entropy. I wonder whether it should also not contain any more entropy than the experiment. For example, your experiment is easy to implement on $[0,1]^n$ with the Lebesgue measure; is this not a suitable "true" space because it contains "excess entropy"? – Ian Apr 14 '17 at 15:47
  • Relevant: https://math.stackexchange.com/questions/712734/domain-of-a-random-variable-sample-space-or-probability-space, https://math.stackexchange.com/questions/298971/random-variable, https://math.stackexchange.com/questions/1939369/domain-of-a-random-variable. – Martín-Blas Pérez Pinilla Apr 14 '17 at 17:37
  • @Ian I wouldn't consider any set that contains more events than necessary to formally record the outcome of a real-world experiment as the "true" space. For example, to the set $\Omega=\{H,T\}^n$ (where $H=0$ and $T=1$, so as to give a "mathematical encoding" for $H$ and $T$) I could add any number of elements to obtain $\Omega'$, and $\Omega'$ doesn't represent an immediate translation of my experiment any more (even more so if the probabilities of events $A\subset \Omega'\cap \Omega$ have to be adjusted in $\Omega'$, due to having added more randomness). – temo Apr 14 '17 at 18:45
  • (Though this doesn't mean that there must always be just one "true" space, for one and the same experiment there may be multiple, equally natural, ways to translate it into a formal, "true" space.) – temo Apr 14 '17 at 18:55
  • The point is that there are sets which have a "clean" representation of your experiment but can also encode far more experiments. $[0,1]^n$ with the coin flipping experiment is an example. Certainly arbitrary extensions couldn't be the "true" space (if this notion is to make sense). – Ian Apr 14 '17 at 19:24
  • @Ian Ok, you made your point. I will reword my question from "true" to "compelling". – temo Apr 16 '17 at 00:32
  • You can safely assume that the "true" underlying sample space exists, but given joint probability distribution of the random variables of interest only, you can only make a guess about the "true" underlying sample space. Sometimes a simple "compelling" guess may exist, generally not. – kludg Apr 21 '17 at 12:09
  • @kludg I saw your comment only today; because of the many answers that appeared and the different bounties that were open, I lost the overview a bit. Your remark sounds very intriguing. Could you elaborate on it a bit, please? I've started a new question just for this; if you want, you could post there. Here's the link: https://math.stackexchange.com/questions/2257506/compelling-probability-theory-spaces – temo Apr 29 '17 at 12:08

5 Answers

5

Here is a paper that discusses your question (see page 3), and this Stack Overflow question: what are the sample spaces when talking about continuous random variables?

As you'll see, random variables are not required in order to use probability theory; they are just convenient ways to capture the aspects of the underlying sample space we are interested in. We could choose to work directly with the underlying sample space if we knew it (as a running example I will use $\Omega = \{H,T\}^N$ for an $N$-coin-toss experiment).

Basically, the decision to model the outcomes of an experiment as random variables or to treat them as direct observations of the sample space is mostly a matter of perspective. The random-variables view separates the object itself (possibly an abstract object) $\omega \in \Omega$ from the questions we can ask about it (e.g., the outcome "HH" vs. "number of tails", "number of heads", "at least one tail", "no more than 2 heads", etc.).

If you only care about one question, then the two views are isomorphic. However, if you want to ask multiple questions about the same observational unit, then the random-variables view is more consistent with what you are trying to do. For example, say you record the height and weight of 100 randomly chosen people -- in this case, a random-variables view makes more sense, as "height" and "weight" are not independent objects in the real world that "just happen" to be correlated; they are linked through people ($\omega \in \Omega$).

So, let's say I gave you the underlying sample space $\Omega$ for a problem. Now what? You will want to start asking questions about the probability of various events, defined as measurable sets of elements from $\Omega$ (e.g., all outcomes where we toss at least three heads). There are two ways to do this:

  1. Create the set of all $\omega \in \Omega$ that have three heads and then calculate the probability of this set.
  2. Define an integer-valued random variable $X(\omega)$ that returns the number of heads in $\omega$. This creates a new sample space, the image of $X$, along with an induced probability measure $P'$ defined over the integers $0$ to $N$. This induced measure is called a pushforward measure (or image measure). Now you can re-cast your question as $P'(X=3)$, as opposed to $P(\{\omega \in \Omega: \#\text{Heads}(\omega) = 3\})$ using the original space (see the worked identity after this list).
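Spelled out for this example (with $N$ tosses of a fair coin, so that every $\omega$ has probability $2^{-N}$), the pushforward measure is $P'(B) := P(X^{-1}(B))$, and the two computations above agree by definition: $$P'(\{3\}) = P\big(X^{-1}(\{3\})\big) = P\big(\{\omega \in \{H,T\}^N : \#\text{Heads}(\omega) = 3\}\big) = \binom{N}{3}\,2^{-N}.$$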

You are probably familiar with this stuff; however, you want to know why we bother with it. In the case of analyzing a single random variable, we can very well re-define our sample space to be the induced sample space (or simply define a sample space to match the properties of the random variable).

This changes when we move to jointly distributed random variables. Without $\Omega$ (at least implicitly), we'd have no way to index joint observations. Here's an example:

Let's say you sample 5 values from each of two random variables, $X$ and $Y$:

  • Observed X's = $1,1,2,5,3$
  • Observed Y's = $0,1,1,0,1$

Now, you want to develop a joint distribution that describes these observations as random variables (i.e., as different aspects of some common object). How will you do this? Most importantly, you first need to associate an observation of $X$ with an observation of $Y$. Implicit in this association is the assumption that there is some common sample space $\Omega_J$ that justifies associating, say, the first observation of $X$ with the first observation of $Y$ to form the joint observation $(1,0)$ (in this example).

So, in my example, we are assuming there is some underlying event $\omega'\in \Omega_J$ such that $X(\omega')=1$ and $Y(\omega')=0$ and that there is a valid underlying probability space $(\Omega_J,\mathcal{F}_J,P_J)$ whose image will produce the observed joint distribution of $(X,Y)$.
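Here is a small Python sketch of this indexing role (the sample points are just labels, and all names are my own illustration): the joint observations exist only because $X$ and $Y$ are functions on the same space.

```python
# Five sample points: each omega is an experimental unit (e.g., one person, one trial).
omegas = ["w1", "w2", "w3", "w4", "w5"]

# X and Y are functions on the SAME space; the pairing comes from the common omega.
X = {"w1": 1, "w2": 1, "w3": 2, "w4": 5, "w5": 3}
Y = {"w1": 0, "w2": 1, "w3": 1, "w4": 0, "w5": 1}

joint = [(X[w], Y[w]) for w in omegas]
print(joint)  # [(1, 0), (1, 1), (2, 1), (5, 0), (3, 1)]

# Without the common index, any re-pairing of the two columns would be equally
# admissible -- and would generally define a different joint distribution.
```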

However, we could dispense with all of this if we chose to model $X,Y$ not as random variables but as direct observations (the integers are our experimental units, or foundational data).

At this point, you may still be unconvinced of the usefulness of the sample space view...

So, let's say you develop your distribution of $X,Y$ directly (no sample space, i.e., domain-less in your terminology), and then you want to add a new quantity $Z$. How do you do this? Without an underlying sample space, you need to develop the new joint distribution manually, from first principles (i.e., ad hoc), whereas invoking the idea of an underlying sample space makes extending joint distributions a natural consequence of defining a new function over the same (usually implicit) underlying probability space. The fact that this can be assumed to be true is a major theoretical elegance of modern probability theory (see the continuation of the sketch below).
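Continuing the toy sketch from above (again purely illustrative), extending the joint distribution is nothing more than defining one more function on the same labels:

```python
# Same toy space as before; Z is just one more function on the same sample points.
omegas = ["w1", "w2", "w3", "w4", "w5"]
X = {"w1": 1, "w2": 1, "w3": 2, "w4": 5, "w5": 3}
Y = {"w1": 0, "w2": 1, "w3": 1, "w4": 0, "w5": 1}
Z = {w: X[w] + Y[w] for w in omegas}  # any function of omega would do

# The joint law of (X, Y, Z) comes for free -- no ad hoc re-derivation needed.
print([(X[w], Y[w], Z[w]) for w in omegas])
# [(1, 0, 1), (1, 1, 2), (2, 1, 3), (5, 0, 5), (3, 1, 4)]
```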

Again, it's a matter of perspective, but the random-variables view, at least to me, has a philosophical/conceptual elegance to it when you consider joint observations and stochastic processes.

Here is a nice post on MathOverflow that discusses something similar.

  • This answer contains so many useful things in addition to Jason's answer that I just can't let it go unrewarded! Since I can't split the bounty, I wanted to award you 50 rep for it, but I just discovered that the system allows me to award only at least double the previous bounty, meaning 300 rep, which is too much. I will think of a way to get you at least some rep points as my way of saying "thanks" for such a great answer; just give me a few days. – temo Apr 27 '17 at 14:32
  • (P.S. May I ask why on your profile you say "delete me"?) – temo Apr 27 '17 at 14:32
  • @temo glad you found my post helpful. Don't worry about the bounty, just glad it helped. Thanks for pointing out my profile... I had originally thought to delete my account some time ago and I forgot to revise the wording. –  Apr 27 '17 at 16:24
3

If you are only interested in a specific collection of random variables, then I would argue that the most "natural" setting is to look at their joint law, as you have done for the coin tosses. For example, if $(Z_n)$ is an iid sequence of $\mathcal N(0,1)$ random variables, we would have $\Omega=\mathbb R^{\mathbb N}$, $\mathcal F$ the (Borel) product $\sigma$-field, and $\mathbb P$ the probability measure such that $$\mathbb P\left\{\omega\in\mathbb R^{\mathbb N}\,:\,\omega(n)\in A_n\text{ for }n=1,\ldots,N\right\}=\prod_{n=1}^N\frac1{\sqrt{2\pi}}\int_{A_n}e^{\frac{-x_n^2}{2}}dx_n$$ for all Borel sets $A_1,\ldots,A_N$ (recall this uniquely defines a probability measure by Kolmogorov's theorem). In this case we have $Z_n(\omega):=\omega(n)$.

Of course, having the random variables be independent is a trivial example, but the basic idea remains valid in far greater generality: suppose for each $i\in I$ we have a measurable space $(E_i,\mathcal A_i)$ and a random variable $X_i$ taking values in $E_i$, such that $$\mathbb P(X_i\in A_i\text{ for }i\in F)=:p\bigg(F,\prod_{i\in F}A_i\bigg)$$ is known for every finite $F\subset I$ and every collection of measurable sets $A_i\in\mathcal A_i$. Consider $\Omega:=\prod_{i\in I}E_i$, $\mathcal F$ the product $\sigma$-field of the $\mathcal A_i$'s, and the unique probability measure $\mu$ on $\mathcal F$ such that $$\mu\{\omega\in\Omega\,:\,\omega(i)\in A_i\text{ for }i\in F\}=p\bigg(F,\prod_{i\in F}A_i\bigg).$$ As far as the collection of random variables $(X_i)$ is concerned, $(\Omega,\mathcal F,\mu)$ knows as much as whatever our original probability space does, so we may as well assume our original space was $(\Omega,\mathcal F)$ with $\mathbb P=\mu$. Again, in this case we have $X_i(\omega)=\omega(i)$.
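A minimal numpy sketch of this canonical ("coordinate") construction for the iid $\mathcal N(0,1)$ example (a computer can of course only realize finitely many coordinates of $\omega\in\mathbb R^{\mathbb N}$; the names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A sample point omega is (a finite window of) a sequence of reals; drawing its
# coordinates iid N(0,1) realizes the product measure of the canonical space.
N = 10
omega = rng.standard_normal(N)

def Z(n, omega):
    """On the canonical space, the random variables are coordinate maps:
    Z_n(omega) := omega(n), for n = 1, ..., N."""
    return omega[n - 1]

print(Z(1, omega), Z(2, omega))
```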

This set-up is not perfect, of course. For a simple example, take Brownian motion $(B_t)$. We would like to say $\mathbb P(t\mapsto B_t\text{ is continuous})=1$, but under the product $\sigma$-field this event is not even measurable. There are ways to work around such problems (in this specific case, one uses Kolmogorov's continuity theorem), but they are usually handled on a case-by-case basis.
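For reference, the continuity theorem invoked here states (in its usual form on $[0,T]$; I am quoting it from memory, so consult a textbook for the precise statement): if a process $(X_t)_{t\in[0,T]}$ satisfies $$\mathbb E\,|X_t-X_s|^{\alpha}\le C\,|t-s|^{1+\beta}\qquad\text{for some constants }\alpha,\beta,C>0,$$ then $X$ admits a modification whose paths are a.s. Hölder continuous of every order $\gamma<\beta/\alpha$ (in particular, continuous).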

Another issue arises when you are looking at a sequence of spaces. Consider for instance particles on the discrete $N$-torus performing symmetric simple exclusion. Explicitly, each particle independently performs a (continuous-time) simple random walk on the torus $\mathbb T_N:=\mathbb Z/N\mathbb Z$, but if a particle attempts to jump to a position which is already occupied, no jump occurs. It is interesting to consider the asymptotics of such a process, i.e. what happens as $N$ becomes large. But for distinct $N$, we necessarily require different probability spaces. How does it make sense to consider different $N$ simultaneously? What would be the probability of an event of the form $$\{\text{the system on the $N$-torus is at state $A$}\}\cap\{\text{the system on the $M$-torus is at state $B$}\}?$$

This is why we don't usually bother with the probability space too much. We know it exists -- Kolmogorov's theorem guarantees that in many cases, and when there are delicate points like continuity, there are theorems that get around the problem. So we ignore it. And why would we care about the space? It's not that it's something so abstract we couldn't possibly begin to understand it; rather, we are almost without exception interested in some collection of random variables, and knowing everything we can about their joint law is enough.

EDIT: To address your questions below.

$1)$ The Kac-Rice formula provides a method for computing the (expected) number of zeroes of a smooth Gaussian field. The source I have linked deals exclusively with the case where the field depends only on finitely many iid $\mathcal N(0,1)$ random variables, in which case existence of the appropriate probability space is dealt with via our earlier example. However, the Kac-Rice formula still holds for a more general smooth Gaussian field $F:U\rightarrow\mathbb R$ for some open $U\subset\mathbb R^N$, only we now need to be careful about what conditions we place on the correlations; without a certain degree of correlation, we cannot hope to have a smooth field (e.g. if $\{F(x)\}_{x\in U}$ are iid then obviously $F$ is not smooth, or even continuous). Once we have appropriate conditions, the approach is similar to the Brownian motion case: first we construct a field in the standard (i.e. Kolmogorov consistency theorem) way, and then we show there is a smooth version.
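For orientation, the formula in its simplest one-dimensional form reads (for a suitably nondegenerate Gaussian process $F$ with $C^1$ paths on an interval $[a,b]$; see the linked source for the precise hypotheses): $$\mathbb E\big[\#\{t\in[a,b]\,:\,F(t)=0\}\big]=\int_a^b\mathbb E\big[\,|F'(t)|\;\big|\;F(t)=0\,\big]\,p_{F(t)}(0)\,dt,$$ where $p_{F(t)}$ denotes the density of $F(t)$.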

$2)$ I don't believe there is a sensible answer to this question. This is common when we model particle systems using probability theory: we assume there are $N$ particles and construct our model, then we see what happens if $N$ is large (which seems sensible since if our system is macroscopic, we would expect something on the order of $10^{20}$ particles). We are not assuming that, for instance, we have some global space and then we keep adding more particles; for each $N$, the models are distinct. As you may imagine, one needs to think about what it means for such a system to "converge" -- typically, we will identify the state of the system with some empirical measure and then consider weak convergence of measures.

$3)$ One must be careful to remember that random variables, and probability theory in general, are a model for statistics; they are NOT the same thing. So strictly speaking, there isn't some abstract probability space underlying people's heights; height is deterministic, and it is simply a matter of whom you choose to survey. Of course, it may be extremely useful to model such an experiment as drawing a sample of size $N$ from a particular distribution. I would argue that the most "natural" probability space for this model would be the joint law of an infinite sequence of independent random variables with the given distribution. So, for example, if our distribution were the standard normal (which is obviously absurd for height, but you get the idea), then the natural space is the very first example I gave. As we saw there, this probability measure is more than capable of dealing with only a finite number of random variables, and has the advantage that it does not matter what your sample size $N$ is -- you could keep surveying more people if you wanted. Again, this is getting into how you choose to model a specific experiment or problem, and it is important not to assert that there is a "true" probability space.
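A small numpy sketch of that last advantage (the function name and the standard-normal stand-in are my own illustration): with one fixed sample point $\omega$ of the infinite-sequence space, surveying more people only reveals more coordinates; it never changes the ones already observed.

```python
import numpy as np

def first_n_observations(n, seed=42):
    """First n coordinates of one fixed sample point omega of the
    iid-N(0,1) product space (a stand-in for a height distribution)."""
    return np.random.default_rng(seed).standard_normal(n)

sample_100 = first_n_observations(100)
sample_150 = first_n_observations(150)  # "survey 50 more people"

# Same omega, more coordinates revealed: the first 100 observations are unchanged.
assert np.allclose(sample_100, sample_150[:100])
```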

Jason
  • 15,438
  • This seems to be a very interesting answer, but there are still a few points that are open before I accept it, in particular 3) below (sorry for responding so late, I was travelling): – temo Apr 26 '17 at 10:41
  • Can you give me one more reference (article/book/etc.) please, besides the continuity of Brownian motion that is resolved by Kolmogorov's theorem, for when you say "There are ways to work around such problems but are usually handled on a case-by-case basis."? – temo Apr 26 '17 at 10:42
  • I'd be interested in what answers you have for the rhetorical questions that you ask: "How does it make sense to consider different N simultaneously? What would be the probability of an event of the form ... ?" Did you leave them unanswered because there is no answer to them? – temo Apr 26 '17 at 10:42
  • For statistical experiments, e.g. measuring the height of a population and retaining those values in a random vector, what would be an interesting, compelling probability space? Your answer made it clear to me beyond doubt that we are capable of finding some suitable probability space, given a sequence of random variables. [...] – temo Apr 26 '17 at 10:42