15

Typically, in the applied probabilistic or statistical literature, we work with random variables whose domain we don't specify; we care only about the set in which the random variable takes values.

For example, the number of aces in one's hand in a certain card game, the height of a population, or the income of a company in a certain year are all random variables (the last two examples come from statistics). But in all of these examples, the domain is never given.

While we could always construct any number of artificial probability spaces that would serve as the domain, I'm interested in what a ~~"true"~~ compelling probability space domain could be that really models (the underlying experiment of) these three examples.

EDIT: To prevent unclarity about what I mean by "compelling", let me be more precise by giving an example: Consider the random variable that counts the number of heads when flipping a coin $n$ times. Thus it takes values in $\{0,1,\ldots,n\}$. But which experiment would most likely be performed in order to produce these values?
The most compelling space would be $\Omega=\{H,T\}^n$, the space of sequences of $n$ coin flips, since this is what actually happens.
But one could just as well define this random variable on the set $\{0,1,\ldots,n\}$, in which case the random variable would be the identity function. This space I would call artificial, not "compelling", because it no longer gives an accurate representation of the underlying experiment.
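To make the distinction concrete, here is a minimal Python sketch comparing the two spaces (the variable names are my own illustration); it checks that the counting map on the "compelling" space and the identity on the "artificial" space induce the same distribution:

```python
from itertools import product
from math import comb

n = 4

# "Compelling" space: all sequences of n fair-coin flips, each with probability 2^-n.
Omega = list(product("HT", repeat=n))
X = lambda w: w.count("H")  # the random variable: number of heads

law_compelling = {k: sum(1 for w in Omega if X(w) == k) / 2**n for k in range(n + 1)}

# "Artificial" space: {0, ..., n} with binomial weights; the random variable is the identity.
law_artificial = {k: comb(n, k) / 2**n for k in range(n + 1)}

# Same induced distribution, but only the first space records the actual experiment.
assert law_compelling == law_artificial
```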
In particular, I'm interested in the underlying space for the statistical examples.

P.S. See also this other question of mine, which also has a bounty running.

temo
  • 5,237
  • I think your example is getting at the idea that a "true" space should contain the full entropy of the experiment used to generate our random variable, even if our measurement destroys some of that entropy. I wonder whether it should also not contain any more entropy than the experiment. For example, your experiment is easy to implement on $[0,1]^n$ with the Lebesgue measure; is this not a suitable "true" space because it contains "excess entropy"? – Ian Apr 14 '17 at 15:47
  • Relevant: https://math.stackexchange.com/questions/712734/domain-of-a-random-variable-sample-space-or-probability-space, https://math.stackexchange.com/questions/298971/random-variable, https://math.stackexchange.com/questions/1939369/domain-of-a-random-variable. – Martín-Blas Pérez Pinilla Apr 14 '17 at 17:37
  • @Ian I wouldn't consider any set that contains more events than necessary to formally record the outcome of a real-world experiment as the "true" space. For example, to the set $\Omega=\{H,T\}^n$ (where $H=0$ and $T=1$, so as to give a "mathematical encoding" for $H$ and $T$) I could add any number of elements to obtain $\Omega'$, and $\Omega'$ doesn't represent an immediate translation of my experiment any more (even more so if the probabilities of events $A\subset \Omega'\cap \Omega$ have to be adjusted in $\Omega'$, due to having added more randomness). – temo Apr 14 '17 at 18:45
  • (Though this doesn't mean that there must always be just one "true" space, for one and the same experiment there may be multiple, equally natural, ways to translate it into a formal, "true" space.) – temo Apr 14 '17 at 18:55
  • The point is that there are sets which have a "clean" representation of your experiment but can also encode far more experiments. $[0,1]^n$ with the coin flipping experiment is an example. Certainly arbitrary extensions couldn't be the "true" space (if this notion is to make sense). – Ian Apr 14 '17 at 19:24
  • @Ian Ok, you made your point. I will reword my question from "true" to "compelling". – temo Apr 16 '17 at 00:32
  • You can safely assume that the "true" underlying sample space exists, but given joint probability distribution of the random variables of interest only, you can only make a guess about the "true" underlying sample space. Sometimes a simple "compelling" guess may exist, generally not. – kludg Apr 21 '17 at 12:09
  • @kludg I saw your comment only today; because of the many answers that appeared and the different bounties that were open, I lost the overview a bit. Your remark sounds very intriguing. Could you elaborate on it a bit, please? I've started a new question just for this; if you want, you could post there. Here's the link: https://math.stackexchange.com/questions/2257506/compelling-probability-theory-spaces – temo Apr 29 '17 at 12:08

5 Answers

5

Here is a paper that discusses your question (see page 3), and this Stack Overflow question: what are the sample spaces when talking about continuous random variables?

As you'll see, random variables are not required in order to use probability theory; they are just convenient ways to capture the aspects of the underlying sample space we are interested in. We could choose to work directly with the underlying sample space if we knew it (as a running example I will use $\Omega = \{H,T\}^N$ for an $N$-coin-toss experiment).

Basically, the decision to model the outcomes of an experiment as random variables or to treat them as direct observations of the sample space is mostly a matter of perspective. The random-variables view separates the object itself (possibly an abstract object) $\omega \in \Omega$ from the questions we can ask about it (e.g., the outcome "HH" vs. "number of tails", "number of heads", "at least one tail", "no more than 2 heads", etc.).

If you only care about one question, then the two views are isomorphic. However, if you want to ask multiple questions about the same observational unit, then the random-variables view is more consistent with what you are trying to do. For example, say you record the height and weight of 100 randomly chosen people -- in this case, a random-variables view makes more sense, as "height" and "weight" are not independent objects in the real world that "just happen" to be correlated; they are linked through people ($\omega \in \Omega$).

So, let's say I gave you the underlying sample space $\Omega$ for a problem. Now what? You will want to start asking questions about the probability of various events, defined as measurable sets of elements from $\Omega$ (e.g., all outcomes where we toss at least three heads). There are two ways to do this:

  1. Create the set of all $\omega \in \Omega$ that have three heads and then calculate the probability of this set.
  2. Define an integer-valued random variable $X(\omega)$ that returns the number of heads in $\omega$. This creates a new sample space, the image of $X$, along with an induced probability measure $P'$ defined over the integers $0$ to $N$. This induced measure is called a pushforward measure (or image measure). Now you can re-cast your question as $P'(X=3)$, as opposed to $P(\{\omega \in \Omega: \#\text{Heads}(\omega) = 3\})$ using the original space (see the worked identity after this list).
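Spelled out for this example (with $N$ tosses of a fair coin, so that every $\omega$ has probability $2^{-N}$), the pushforward measure is $P'(B) := P(X^{-1}(B))$, and the two computations above agree by definition: $$P'(\{3\}) = P\big(X^{-1}(\{3\})\big) = P\big(\{\omega \in \{H,T\}^N : \#\text{Heads}(\omega) = 3\}\big) = \binom{N}{3}\,2^{-N}.$$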

You are probably familiar with this stuff; however, you want to know why we bother with it. In the case of analyzing a single random variable, we can very well re-define our sample space to be the induced sample space (or simply define a sample space to match the properties of the random variable).

This changes when we move to jointly distributed random variables. Without $\Omega$ (at least implicitly), we'd have no way to index joint observations. Here's an example:

Let's say you sample 5 values from each of two random variables, $X$ and $Y$:

  • Observed X's = $1,1,2,5,3$
  • Observed Y's = $0,1,1,0,1$

Now, you want to develop a joint distribution that describes these observations as random variables (i.e., as different aspects of some common object). How will you do this? Most importantly, you first need to associate an observation of $X$ with an observation of $Y$. Implicit in this association is the assumption that there is some common sample space $\Omega_J$ that justifies associating, say, the first observation of $X$ with the first observation of $Y$ to form the joint observation $(1,0)$ (in this example).

So, in my example, we are assuming there is some underlying event $\omega'\in \Omega_J$ such that $X(\omega')=1$ and $Y(\omega')=0$ and that there is a valid underlying probability space $(\Omega_J,\mathcal{F}_J,P_J)$ whose image will produce the observed joint distribution of $(X,Y)$.
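Here is a small Python sketch of this indexing role (the sample points are just labels, and all names are my own illustration): the joint observations exist only because $X$ and $Y$ are functions on the same space.

```python
# Five sample points: each omega is an experimental unit (e.g., one person, one trial).
omegas = ["w1", "w2", "w3", "w4", "w5"]

# X and Y are functions on the SAME space; the pairing comes from the common omega.
X = {"w1": 1, "w2": 1, "w3": 2, "w4": 5, "w5": 3}
Y = {"w1": 0, "w2": 1, "w3": 1, "w4": 0, "w5": 1}

joint = [(X[w], Y[w]) for w in omegas]
print(joint)  # [(1, 0), (1, 1), (2, 1), (5, 0), (3, 1)]

# Without the common index, any re-pairing of the two columns would be equally
# admissible -- and would generally define a different joint distribution.
```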

However, we could dispense with all of this if we chose to model $X,Y$ not as random variables but as direct observations (the integers are our experimental units, or foundational data).

At this point, you may still be unconvinced of the usefulness of the sample space view...

So, let's say you develop your distribution of $X,Y$ directly (no sample space, i.e., domain-less in your terminology), and then you want to add a new quantity $Z$. How do you do this? Without an underlying sample space, you need to develop the new joint distribution manually, from first principles (i.e., ad hoc), whereas invoking the idea of an underlying sample space makes extending joint distributions a natural consequence of defining a new function over the same (usually implicit) underlying probability space. The fact that this can be assumed to be true is a major theoretical elegance of modern probability theory (see the continuation of the sketch below).
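Continuing the toy sketch from above (again purely illustrative), extending the joint distribution is nothing more than defining one more function on the same labels:

```python
# Same toy space as before; Z is just one more function on the same sample points.
omegas = ["w1", "w2", "w3", "w4", "w5"]
X = {"w1": 1, "w2": 1, "w3": 2, "w4": 5, "w5": 3}
Y = {"w1": 0, "w2": 1, "w3": 1, "w4": 0, "w5": 1}
Z = {w: X[w] + Y[w] for w in omegas}  # any function of omega would do

# The joint law of (X, Y, Z) comes for free -- no ad hoc re-derivation needed.
print([(X[w], Y[w], Z[w]) for w in omegas])
# [(1, 0, 1), (1, 1, 2), (2, 1, 3), (5, 0, 5), (3, 1, 4)]
```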

Again, it's a matter of perspective, but the random-variables view, at least to me, has a philosophical/conceptual elegance to it when you consider joint observations and stochastic processes.

Here is a nice post on MathOverflow that discusses something similar.

  • This answer contains so many useful things in addition to Jason's answer that I just can't let it go unrewarded! Since I can't split the bounty, I wanted to award you 50 rep for it, but I just discovered that the system allows me to award only at least double the previous bounty, meaning 300 rep, which is too much. I will think of a way to get you at least some rep points as my way of saying "thanks" for such a great answer; just give me a few days. – temo Apr 27 '17 at 14:32
  • (P.S. May I ask why on your profile you say "delete me"?) – temo Apr 27 '17 at 14:32
  • @temo glad you found my post helpful. Don't worry about the bounty, just glad it helped. Thanks for pointing out my profile... I had originally thought to delete my account some time ago and I forgot to revise the wording. –  Apr 27 '17 at 16:24
3

If you are only interested in a specific collection of random variables, then I would argue that the most "natural" setting is to look at their joint law, as you have done for the coin tosses. For example, if $(Z_n)$ is an iid sequence of $\mathcal N(0,1)$ random variables, we would have $\Omega=\mathbb R^{\mathbb N}$, $\mathcal F$ the (Borel) product $\sigma$-field, and $\mathbb P$ the probability measure such that $$\mathbb P\left\{\omega\in\mathbb R^{\mathbb N}\,:\,\omega(n)\in A_n\text{ for }n=1,\ldots,N\right\}=\prod_{n=1}^N\frac1{\sqrt{2\pi}}\int_{A_n}e^{\frac{-x_n^2}{2}}dx_n$$ for all Borel sets $A_1,\ldots,A_N$ (recall this uniquely defines a probability measure by Kolmogorov's theorem). In this case we have $Z_n(\omega):=\omega(n)$.

Of course, having the random variables be independent is a trivial example, but the basic idea remains valid in far greater generality: suppose for each $i\in I$ we have a measurable space $(E_i,\mathcal A_i)$ and a random variable $X_i$ taking values in $E_i$, such that $$\mathbb P(X_i\in A_i\text{ for }i\in F)=:p\bigg(F,\prod_{i\in F}A_i\bigg)$$ is known for every finite $F\subset I$ and every collection of measurable sets $A_i\in\mathcal A_i$. Consider $\Omega:=\prod_{i\in I}E_i$, $\mathcal F$ the product $\sigma$-field of the $\mathcal A_i$'s, and the unique probability measure $\mu$ on $\mathcal F$ such that $$\mu\{\omega\in\Omega\,:\,\omega(i)\in A_i\text{ for }i\in F\}=p\bigg(F,\prod_{i\in F}A_i\bigg).$$ As far as the collection of random variables $(X_i)$ is concerned, $(\Omega,\mathcal F,\mu)$ knows as much as whatever our original probability space does, so we may as well assume our original space was $(\Omega,\mathcal F)$ with $\mathbb P=\mu$. Again, in this case we have $X_i(\omega)=\omega(i)$.
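A minimal numpy sketch of this canonical ("coordinate") construction for the iid $\mathcal N(0,1)$ example (a computer can of course only realize finitely many coordinates of $\omega\in\mathbb R^{\mathbb N}$; the names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A sample point omega is (a finite window of) a sequence of reals; drawing its
# coordinates iid N(0,1) realizes the product measure of the canonical space.
N = 10
omega = rng.standard_normal(N)

def Z(n, omega):
    """On the canonical space, the random variables are coordinate maps:
    Z_n(omega) := omega(n), for n = 1, ..., N."""
    return omega[n - 1]

print(Z(1, omega), Z(2, omega))
```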

This set-up is not perfect, of course. For a simple example, take Brownian motion $(B_t)$. We would like to say $\mathbb P(t\mapsto B_t\text{ is continuous})=1$, but under the product $\sigma$-field this event is not even measurable. There are ways to work around such problems (in this specific case, one uses Kolmogorov's continuity theorem), but they are usually handled on a case-by-case basis.
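For reference, the continuity theorem invoked here states (in its usual form on $[0,T]$; I am quoting it from memory, so consult a textbook for the precise statement): if a process $(X_t)_{t\in[0,T]}$ satisfies $$\mathbb E\,|X_t-X_s|^{\alpha}\le C\,|t-s|^{1+\beta}\qquad\text{for some constants }\alpha,\beta,C>0,$$ then $X$ admits a modification whose paths are a.s. Hölder continuous of every order $\gamma<\beta/\alpha$ (in particular, continuous).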

Another issue arises when you are looking at a sequence of spaces. Consider for instance particles on the discrete $N$-torus performing symmetric simple exclusion. Explicitly, each particle independently performs a (continuous-time) simple random walk on the torus $\mathbb T_N:=\mathbb Z/N\mathbb Z$, but if a particle attempts to jump to a position which is already occupied, no jump occurs. It is interesting to consider the asymptotics of such a process, i.e. what happens as $N$ becomes large. But for distinct $N$, we necessarily require different probability spaces. How does it make sense to consider different $N$ simultaneously? What would be the probability of an event of the form $$\{\text{the system on the $N$-torus is at state $A$}\}\cap\{\text{the system on the $M$-torus is at state $B$}\}?$$

This is why we don't usually bother with the probability space too much. We know it exists -- Kolmogorov's theorem guarantees that in many cases, and when there are delicate points like continuity, there are theorems that get around the problem. So we ignore it. And why would we care about the space? It's not that it's something so abstract we couldn't possibly begin to understand it; rather, we are almost without exception interested in some collection of random variables, and knowing everything we can about their joint law is enough.

EDIT: To address your questions below.

$1)$ The Kac-Rice formula provides a method for computing the (expected) number of zeroes of a smooth Gaussian field. The source I have linked deals exclusively with the case where the field depends only on finitely many iid $\mathcal N(0,1)$ random variables, in which case existence of the appropriate probability space is dealt with via our earlier example. However, the Kac-Rice formula still holds for a more general smooth Gaussian field $F:U\rightarrow\mathbb R$ for some open $U\subset\mathbb R^N$, only we now need to be careful about what conditions we place on the correlations; without a certain degree of correlation, we cannot hope to have a smooth field (e.g. if $\{F(x)\}_{x\in U}$ are iid then obviously $F$ is not smooth, or even continuous). Once we have appropriate conditions, the approach is similar to the Brownian motion case: first we construct a field in the standard (i.e. Kolmogorov consistency theorem) way, and then we show there is a smooth version.
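For orientation, the formula in its simplest one-dimensional form reads (for a suitably nondegenerate Gaussian process $F$ with $C^1$ paths on an interval $[a,b]$; see the linked source for the precise hypotheses): $$\mathbb E\big[\#\{t\in[a,b]\,:\,F(t)=0\}\big]=\int_a^b\mathbb E\big[\,|F'(t)|\;\big|\;F(t)=0\,\big]\,p_{F(t)}(0)\,dt,$$ where $p_{F(t)}$ denotes the density of $F(t)$.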

$2)$ I don't believe there is a sensible answer to this question. This is common when we model particle systems using probability theory: we assume there are $N$ particles and construct our model, then we see what happens if $N$ is large (which seems sensible since if our system is macroscopic, we would expect something on the order of $10^{20}$ particles). We are not assuming that, for instance, we have some global space and then we keep adding more particles; for each $N$, the models are distinct. As you may imagine, one needs to think about what it means for such a system to "converge" -- typically, we will identify the state of the system with some empirical measure and then consider weak convergence of measures.

$3)$ One must be careful to remember that random variables, and probability theory in general, are a model for statistics; they are NOT the same thing. So strictly speaking, there isn't some abstract probability space underlying people's heights; height is deterministic, and it is simply a matter of whom you choose to survey. Of course, it may be extremely useful to model such an experiment as drawing a sample of size $N$ from a particular distribution. I would argue that the most "natural" probability space for this model would be the joint law of an infinite sequence of independent random variables with the given distribution. So, for example, if our distribution were the standard normal (which is obviously absurd for height, but you get the idea), then the natural space is the very first example I gave. As we saw there, this probability measure is more than capable of dealing with only a finite number of random variables, and has the advantage that it does not matter what your sample size $N$ is -- you could keep surveying more people if you wanted. Again, this is getting into how you choose to model a specific experiment or problem, and it is important not to assert that there is a "true" probability space.
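A small numpy sketch of that last advantage (the function name and the standard-normal stand-in are my own illustration): with one fixed sample point $\omega$ of the infinite-sequence space, surveying more people only reveals more coordinates; it never changes the ones already observed.

```python
import numpy as np

def first_n_observations(n, seed=42):
    """First n coordinates of one fixed sample point omega of the
    iid-N(0,1) product space (a stand-in for a height distribution)."""
    return np.random.default_rng(seed).standard_normal(n)

sample_100 = first_n_observations(100)
sample_150 = first_n_observations(150)  # "survey 50 more people"

# Same omega, more coordinates revealed: the first 100 observations are unchanged.
assert np.allclose(sample_100, sample_150[:100])
```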

Jason
  • 15,438
  • This seems to be a very interesting answer, but there are still a few points that are open before I accept it, in particular 3) below (sorry for responding so late, I was travelling): – temo Apr 26 '17 at 10:41
  • Can you give me one more reference (article/book/etc.) please, besides the continuity of Brownian motion that is resolved by Kolmogorov's theorem, for when you say "There are ways to work around such problems but are usually handled on a case-by-case basis."? – temo Apr 26 '17 at 10:42
  • I'd be interested in what answers you have for the rhetorical questions that you ask: "How does it make sense to consider different N simultaneously? What would be the probability of an event of the form ... ?" Did you leave them unanswered because there is no answer to them? – temo Apr 26 '17 at 10:42
  • For statistical experiments, e.g. measuring the height of a population and retaining those values in a random vector, what would be an interesting, compelling probability space? Your answer made it clear to me beyond doubt that we are capable of finding some suitable probability space, given a sequence of random variables. [...] – temo Apr 26 '17 at 10:42