
First of all, I have read What is a function and I basically understood it: it is clear to me that in order to calculate statistics, "things" have to be transformed or mapped to numbers.

I have read that a random variable $X$ is (or can be thought of as) a function $X:\Omega\rightarrow\mathbb{R}$, and then $X(\omega) = \dots$ Say we have a coin with $\Omega = \{H,T\}$; then we could define $X:\{H,T\}\rightarrow\{0,1\}$.
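
Written out explicitly, I think this just means
$$X(\omega) = \begin{cases} 0, & \omega = H,\\ 1, & \omega = T,\end{cases}$$
i.e. $X(H)=0$ and $X(T)=1$.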

My question here is about the meaning of $X$, or how to "pronounce" it. I would say $X$ is just a placeholder, or short for "map (or transform) the character H to 0 and T to 1". Or, if we want to count the number of times we get tails, then $X$ is "if tails is facing upwards, increase the counter by 1", and $X$ is just short for that if-sentence. Is this right?
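
To make my reading concrete, here is a small sketch in Python (the names are my own and purely illustrative) of what I have in mind:

```python
# Sketch: a random variable read literally as a function on the sample space {H, T}.
def X(omega: str) -> int:
    """Map a coin outcome to a number: H -> 0, T -> 1."""
    if omega == "T":          # tails facing upwards
        return 1
    return 0                  # heads

outcomes = ["H", "H", "T"]                 # three observed tosses
values = [X(omega) for omega in outcomes]  # [0, 0, 1]
print(sum(values))                         # counting tails = summing X over the tosses
```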

Second, say I have a data set like this: $$\begin{array}{|c|c|c|} \hline \text{id} & \text{coin} & \text{value} \\ \hline 1 & H & 0\\ \hline 2 & H & 0\\ \hline 3 & T & 1\\ \hline \end{array}$$ Then "coin" is not a random variable because it isn't a number, and only "value" is. Is this true?

mmw
  • Random variable: "A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events" – Mauro ALLEGRANZA May 20 '22 at 09:01
  • It is like using the symbol $v$ as name for the mathematical function velocity that represents a physical fact. – Mauro ALLEGRANZA May 20 '22 at 09:02
  • If $X: \{H,T\} \to \{0,1\}$ then remember that $\Omega=\{H,T\}$ is the sample space of a probability space. There are potentially $|\{0,1\}|^{|\{H,T\}|}=4$ such random variables $X$: one with $\mathbb P(X=1)=\mathbb P(\omega =H)$, another with $\mathbb P(X=1)=\mathbb P(\omega =T)$, plus the trivial $\mathbb P(X=1)=1$ and $\mathbb P(X=1)=0$. – Henry May 20 '22 at 09:19
  • I was once in the room with a well-known statistician who asked the group "What is a random variable?" I was all set to reply with the "function from a probability space to the real line" definition, but he surprised me by answering his question thus: "A random variable is a number you don't know." Just something to think about. ;-) – awkward May 20 '22 at 14:59

2 Answers


In statistics, we want some way to go from the space of random outcomes to the real numbers, which we know how to work with. A random variable is exactly how we get from an outcome $\omega \in \Omega$ (a "random event" being a particular set of such outcomes) to a real number.

Your example with the coin is very instructive. We want to find some "statistic" that can tell us, with some level of specificity, what happened.

For example, the random variable we use depends on what we want it for. If we want to figure out the probability of getting heads from the sample space of $n$ tosses of the coin, then a reasonable choice of random variable is $X(\omega) = \text{# of heads in }\omega$.

If we let $n = 2$, then $\Omega = \{HH, HT, TH, TT\}$ and $X$ is the mapping $$\begin{array}{|c|c|} \hline \omega & X(\omega) \\ \hline HH & 2\\ \hline HT & 1\\ \hline TH & 1\\ \hline TT & 0\\ \hline \end{array}$$
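
For instance, if we additionally assume the coin is fair, so that each $\omega$ has probability $1/4$, the distribution of $X$ can be read off the table: $$\mathbb P(X=0)=\mathbb P(\{TT\})=\tfrac14,\qquad \mathbb P(X=1)=\mathbb P(\{HT,TH\})=\tfrac12,\qquad \mathbb P(X=2)=\mathbb P(\{HH\})=\tfrac14.$$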

So $X$ tells us important information about which $\omega$ actually happened over the course of our two tosses!

$X$ is a mapping from random outcomes, and we can make statements about the properties of $X$ based on how frequently each outcome $\omega$ occurs.

  • Here if $\Omega=\{H,T\}^2=\{HH,HT,TH,TT\}$ then there are $3^4= 81$ potential random variables $X: \Omega \to \{0,1,2\}$, of which one is your "number of heads in two tosses" counting function – Henry May 20 '22 at 09:25
  • Exactly! But if we are dropping the "number of heads in two tosses" restriction, why keep the $\{0, 1, 2\}$ codomain? There are uncountably many potential random variables $X$! I think I was trying to make clear that we try to pick $X$ so that it gives us information about the outcome $\omega$. – PhysicsKid May 20 '22 at 09:32
  • Most random variables lose information from the sample space; a random variable would have to be injective not to lose information. Your counting-heads example is not injective, as it treats $HT$ and $TH$ as the same – Henry May 20 '22 at 09:38
  • I think "not losing information" is a different question from "giving information". I specifically mentioned in the example that we want to model the "probability of getting heads", and thus a sufficient statistic is the number of heads. We don't NEED/WANT the information from any of the other possible random variables you could define in order to get the best possible estimate of the "probability of getting heads". – PhysicsKid May 20 '22 at 09:53

By definition, a random variable is just a measurable function on the sample space $\Omega$.

Sometimes a random variable is used only to encode different random outcomes as numbers. For example, in information theory we can define the entropy of a random variable, where the particular values of the random variable don't matter; all we need to know is whether two outcomes are the same or not. In that sense, the expectation or average value of the random variable is not a meaningful quantity, just as the average of heads and tails doesn't make much sense.
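
Concretely, the entropy $$H(X) = -\sum_x \mathbb P(X=x)\log \mathbb P(X=x)$$ depends only on the probabilities $\mathbb P(X=x)$ and not on the particular numbers $x$ used as labels, so relabelling the outcomes with different numbers leaves it unchanged.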

But the values of a random variable often do carry meaning. For example, if you win a coin flip you get one million dollars, otherwise you get nothing. Here $0$ and $10^6$ are specific numbers tied to the problem you are studying and should not be replaced by two other distinct numbers. In this case, $X(H)=10^6, X(T)=0$ is not just a placeholder.
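
For instance, if the coin is fair, the expected prize is $$\mathbb E[X] = 10^6\cdot\mathbb P(H) + 0\cdot\mathbb P(T) = 10^6\cdot\tfrac12 = 5\times 10^5$$ dollars, and this number changes as soon as the two values are replaced by different ones.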

When we say "a random variable", the second point of view is usually implicitly assumed, since we often want to know its expectation, variance, and so on. Since there is no harm in using the values as placeholders, there is no need to keep the "coin" column instead of the "value" column in your case. One could argue that there is a difference between how random variables are treated in information theory and in statistics/data science, but it's not worth the effort to emphasize the difference.

Just a user