
The Deep Learning Book (Goodfellow et al., 2016) defines a random variable as follows (see section 3.2):

A random variable is a variable that can take on different values randomly. We typically denote the random variable itself with a lower case letter in plain typeface, and the values it can take on with lower case script letters. For example, $x_1$ and $x_2$ are both possible values that the random variable $\mathrm{x}$ can take on. For vector-valued variables, we would write the random variable as $\mathbf{x}$ and one of its values as $\boldsymbol{x}$. On its own, a random variable is just a description of the states that are possible; it must be coupled with a probability distribution that specifies how likely each of these states are.

Random variables may be discrete or continuous. A discrete random variable is one that has a finite or countably infinite number of states. Note that these states are not necessarily the integers; they can also just be named states that are not considered to have any numerical value. A continuous random variable is associated with a real value.

This definition seems to differ from the standard definition of a random variable, which, from my understanding, requires the value to be numerical (more concretely, the random variable defines a mapping from the sample space to a measurable space).

This becomes apparent when you compare the regular definition of the expected value (discrete case):

$$ \mathbb{E}[X] = \sum_x xP(x) $$

with the definition in the Deep Learning Book (see section 3.8):

$$ \mathbb{E}_{x \sim P}[f(x)] = \sum_x P(x)f(x) $$

Since their random variable is not necessarily numerical, we need this mapping function $f$, which is basically what the random variable is actually supposed to be defined as. That is, their definition is more of a synonym for the outcome of an experiment (i.e., a variable that takes on a value from the sample space).
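To make the role of $f$ concrete, here is a minimal sketch. The state names, the probabilities, and the helper `expectation` are my own hypothetical choices for illustration, not anything taken from the book.

```python
from fractions import Fraction

# Hypothetical distribution over named (non-numeric) states.
P = {"red": Fraction(1, 4), "white": Fraction(3, 4)}


def expectation(dist, f):
    """Return sum_x dist(x) * f(x) for a discrete distribution given as a dict."""
    return sum(p * f(x) for x, p in dist.items())


def is_red(x):
    """Map a named state to a number: 1 for "red", 0 otherwise."""
    return 1 if x == "red" else 0


# The states themselves cannot be averaged, but f(x) can.
e_red = expectation(P, is_red)

# When the states are already numbers, f can be the identity.
Q = {1: Fraction(1, 2), 2: Fraction(1, 2)}
e_q = expectation(Q, lambda x: x)
```

Here `is_red` plays the role of $f$: the sum $\sum_x P(x) f(x)$ is only defined once the named states have been mapped to numbers.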

Is my understanding here correct? If so, is there a good reason for deviating from the standard definition of a random variable? It seems quite confusing that they use a definition that differs from the standard one.

Jay Mody
  • Seems natural to me. Suppose you have a hat filled with white and red balls. – user619894 Aug 31 '23 at 13:56
  • This definition is no deviation from the standard. – Kurt G. Aug 31 '23 at 14:17
  • Related:
    https://math.stackexchange.com/questions/3456658/expected-value-of-discrete-random-variable-taking-non-numeric-values
    https://math.stackexchange.com/questions/240673/what-exactly-is-a-random-variable
    https://stats.stackexchange.com/questions/236765/does-a-random-variable-needs-to-be-numeric
    – leonbloy Aug 31 '23 at 15:35
  • The book is wrong. A random variable, by definition, is real valued. A random element can take values in more general spaces, but not random variables. – Andrew Aug 31 '23 at 18:28
  • I once met a well-known statistician, who asked (rhetorically): "What is a random variable?" His answer: "A random variable is a number you don't know." I think the moral is that there are different definitions for different purposes. – awkward Sep 01 '23 at 13:36
  • @Andrew: You are not wrong, but in practice the notion "random variable" is very frequently used as a synonym for "random quantity" or "random element", at least among researchers and in my experience. – Matija Sep 01 '23 at 20:35
  • @Jay Mody: Strictly speaking, you are right, in a primer on probability a random variable is defined as a measurable map from the sample space to $\mathbb R$ equipped with the canonical $\sigma$-algebra (which is not canonical at all, see here). As pointed out in the answers, they were deliberately informal in the book. – Matija Sep 01 '23 at 20:51

2 Answers


The textbook does not use

$$ \mathbb{E}_{x \sim P}[f(x)] = \sum_x P(x)f(x) $$

as the definition of the expected value of $x$. They use it as the definition of the expected value of $f(x)$. Quote from section 3.8: "The expectation, or expected value, of some function $f(x)$ with respect to a probability distribution $P(x)$ is the average, or mean value, that $f$ takes on when $x$ is drawn from $P$."

Indeed, if you take $f(x)=x$, then you recover the expression for the expected value of $x$:

$$ \mathbb{E}[X] = \sum_x xP(x) $$
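As a worked instance, assume $x$ is numeric, say the face of a fair six-sided die (my own illustrative choice, not an example from the book), so that $P(x) = \frac{1}{6}$ for $x = 1, \dots, 6$. Taking $f(x) = x$ then gives:

$$ \mathbb{E}_{x \sim P}[x] = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{21}{6} = \frac{7}{2} $$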

Scott Hahn
  • Re last sentence: You actually don't because, as the book writes, $X$ need not be a number. In particular, it need not take values in a vector space, according to the book at least. – Andrew Sep 01 '23 at 21:42
  • Ah, yes, you're right. Either way, though, I wanted to illustrate that the book added a layer of indirection by using $f(x)$. – Scott Hahn Sep 01 '23 at 21:45

The book is using the standard definition of random variable. They are just trying to explain it in a self-contained, intuitive, less formal way, one that can be understood by a reader who is less sophisticated with mathematics. They are not trying to define a different notion of random variable, and indeed, everything they do is consistent with the standard mathematical definition of random variables.

D.W.