
A renowned professor of statistics (whose name I will not reveal here) told me that the notation $p(x)$ makes perfect sense when $p$ is a pdf and $x$ is a *random* variable (i.e. a function). I was a bit surprised, because I had never thought of a pdf as accepting functions as input. But, actually, $p(x)$ then means the composition of the pdf with the r.v. $x$ (a function), i.e. the composition of two functions, $p \circ x$.
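
To make this concrete, here is a minimal numerical sketch (my own illustration, not the professor's): drawing realizations of a standard normal $x$ and evaluating its pdf $p$ at them yields realizations of the composed function $p \circ x$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Realizations x(omega_1), ..., x(omega_5) of a standard normal r.v. x.
samples = rng.standard_normal(5)

# Evaluating the pdf p at each realization gives realizations of the
# composed function (p o x): omega |-> p(x(omega)).
p_of_x = norm.pdf(samples)

# Different outcomes omega give different values, i.e. p(x) is itself random.
print(samples)
print(p_of_x)
```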

This remark changed my view of statistics and the way I read expressions like $p(x)$ in many formulas. I used to think that $p(x)$ was an output (a number) of the function $p$ (e.g. a pdf) evaluated at the point $x$ of its domain, even though, in certain cases, $p(x)$ seemed to need to be a function (and I assumed whoever had written it was just careless and wrote $p(x)$ instead of just $p$). Now I see that what those people wrote, i.e. $p(x)$, probably did make sense: $p(x)$ is a function, and, in fact, a random variable, because $x$ is a random variable.

So, formally, why does it make sense to compose random variables and pdfs? An r.v. $x$ is typically defined as a function $x \colon \Omega \to E$, where $\Omega$ is the sample space and $E$ is a measurable space (e.g. $\mathbb{R}$ with the Borel $\sigma$-algebra). What are the domain and codomain of the pdf? The domain should be $E$, because otherwise we could not compose $p$ (the pdf) with $x$ (the random variable).

Moreover, in many cases, we define what is apparently a pdf and then use it in places that require "probability distributions" or "random variables". For example, on page 13 of these notes, the multivariate Gaussian pdf is defined as follows

$$ p(x)=\frac{1}{(2 \pi)^{n / 2} \operatorname{det}(\Sigma)^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1}(x-\mu)\right) $$
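
As a sanity check on how I read this formula (a sketch of my own, not from the notes), it can be evaluated directly at a point $x \in \mathbb{R}^n$ and compared against SciPy's `multivariate_normal`, which reads $x$ the same way, i.e. as a point of the domain:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate Gaussian pdf at a point x in R^n."""
    n = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm_const

mu = np.array([0.0, 1.0])
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])

# Both treat x as a point of the domain R^n, not as a random variable.
print(gaussian_pdf(x, mu, sigma))
print(multivariate_normal(mean=mu, cov=sigma).pdf(x))
```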

I thought the $x$ in the formula above was the dummy variable of the Gaussian pdf (at least, that's how I used to read it), i.e. an element of its domain. But, after that definition, the author derives the analytic expression for the KL divergence using $x$ as a random variable: at some point he takes the expectation of (a function of) $x$, and, as far as I know, we can only take expectations of random variables (with respect to distributions), so $x$ must be a random variable there. So, is the $x$ in the definition of the Gaussian pdf above also a random variable, and does that mean that the pdf (denoted by $p(x)$) is also a random variable?
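
To illustrate the two readings for myself (again a sketch of my own; the notes derive the closed form analytically), the KL divergence $D(P_1 \| P_2) = \mathbb{E}_{P_1}[\log p_1(x) - \log p_2(x)]$ can be estimated by treating $x$ as a random variable: draw samples from $P_1$ and average the random variable $\log p_1(x) - \log p_2(x)$. In one dimension, for brevity:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Two univariate Gaussians P1 = N(0, 1) and P2 = N(1, 2^2).
mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0

# Monte Carlo: here x IS a random variable -- we draw realizations from P1
# and average the random variable log p1(x) - log p2(x).
x = rng.normal(mu1, s1, size=1_000_000)
kl_mc = np.mean(norm.logpdf(x, mu1, s1) - norm.logpdf(x, mu2, s2))

# Closed form for the KL divergence between two univariate Gaussians.
kl_exact = np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5
print(kl_mc, kl_exact)
```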

1 Answer


If $X$ is a random variable defined on $(\Omega, \mathcal F, P)$ and $f \colon \mathbb R \to \mathbb R$ is any Borel measurable function, then $Y = f(X)$ is defined as the random variable on $(\Omega, \mathcal F, P)$ such that $Y(\omega) = f(X(\omega))$. This is indeed a random variable (in the sense that it is a real-valued measurable function on $(\Omega, \mathcal F, P)$). In particular, any pdf is a Borel measurable function $\mathbb R \to \mathbb R$, so $p(X)$ makes perfect sense.
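
To see this definition concretely, here is a minimal numerical sketch (the discretization of $\Omega$ as a grid on $[0,1]$ is my own illustration): take $\Omega = [0,1]$ with the uniform measure, realize a standard normal $X$ via the inverse CDF, and build $Y = p(X)$ pointwise as $Y(\omega) = p(X(\omega))$.

```python
import numpy as np
from scipy.stats import norm

# Model Omega = [0, 1] with the uniform (Lebesgue) probability measure,
# discretized on a fine grid of sample points omega.
omega = np.linspace(1e-6, 1 - 1e-6, 200_000)

# X: Omega -> R, a standard normal r.v. realized via the inverse CDF.
X = norm.ppf(omega)

# Y = p(X) is the composition: Y(omega) = p(X(omega)), another r.v. on Omega.
Y = norm.pdf(X)

# E[Y] as an average over Omega; for a standard normal, E[p(X)] = 1/(2*sqrt(pi)).
print(Y.mean(), 1 / (2 * np.sqrt(np.pi)))
```

Here $Y$ is just another measurable function on $\Omega$, which is exactly why $p(X)$ qualifies as a random variable.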

  • Ok, thanks for answering! But, on page 13 of these notes, can you tell me which $x$s are random variables and which are dummy variables, and why? I don't know how to tell whether something is a pdf or a random variable. The $x$s in the derivation of the KL divergence must be random variables, right? Because expectations can only be taken of random variables, i.e. expectations receive random variables (functions) as inputs (expectations are linear operators, functions of functions). –  Jul 26 '20 at 00:02
  • @nbro I think $x$ is just a real number on that page. Probabilists do not use small letters for random variables at all, but many statisticians do. That causes confusion. It is better to use $X, Y, Z, \dots$ for random variables and $x, y, z, \dots$ for real numbers. – Kavi Rama Murthy Jul 26 '20 at 00:08
  • I agree with you about the notation, i.e. we should always clearly differentiate between random variables and dummy variables. –  Jul 26 '20 at 00:09
  • But what about the $x$s that appear in the derivation of $D(P_1 \| P_2)$? They must be random variables because, at some point, we take expectations of them and obtain the mean of the associated distribution. So, I don't understand the relationship between the $x$s in the derivation of the KL divergence and the $x$ in $p(x)$. You say that the $x$ in $p(x)$ is a dummy variable, so how come the distribution $P_1$ is essentially the pdf but with random variables instead of dummy variables? –  Jul 26 '20 at 00:14
  • @nbro You are right. I think when they write $E_{P_1} f(x)$ they are thinking of $x$ as a random variable with distribution $P_1$. Like you, I also find these notations confusing! – Kavi Rama Murthy Jul 26 '20 at 00:18
  • In your example, $p(X)$ would then be a function $\Omega \rightarrow \mathbb{R}$? By the way, are all pdfs defined as functions of the form $\mathbb{R} \rightarrow \mathbb{R}$? –  Jul 26 '20 at 00:48
  • @nbro The answer is YES to both. – Kavi Rama Murthy Jul 26 '20 at 04:39