
Suppose I have a probability space $(\Omega, \mathcal{A}, \mathbb{P})$. Traditionally, the expectation is defined as $$\mathbb{E}[X] = \int xf(x)\,dx \tag{1}$$ where $X$ is a random variable and $f$ is the probability density function (PDF).

In a measure theoretic setting we have $$\mathbb{E}[X] = \int X \,d\mathbb{P}. \tag{2}$$

I am new to probability, though I am comfortable working with measure theory, so it could be that some of these questions are relatively elementary. Basically, I am unclear on the equivalence of these two integrals: (2) makes no reference to the PDF, whereas (1) makes no reference to the random variable $X$. My best guess is that the PDF is somehow encoded within the random variable $X$, and that this explains everything, but I am not sure whether this is true and I could not find many sources that explain it directly.
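
For a concrete instance of my confusion, take $X$ to be, say, uniformly distributed on $[0,1]$, so $f(x) = 1$ on $[0,1]$. Then (1) gives $$\mathbb{E}[X] = \int_0^1 x\cdot 1\,dx = \tfrac{1}{2},$$ while (2) is the abstract integral $\int_\Omega X\,d\mathbb{P}$ over the sample space, and it is not clear to me why that integral should produce the same value $\tfrac{1}{2}$.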

CBBAM
  • A way to think of it is

    $$\int_\Omega X\,d\mathbb{P} = \int_{\mathbb{R}} x\,dF_X(x) = \int_{\mathbb{R}} xf(x)\,dx,$$ where $F_X$ is the CDF of the distribution of $X$, and we have $F_X' = f$. The middle integral $\int x\,dF_X(x)$ is a Riemann–Stieltjes integral. The form $\int_\Omega X\,d\mathbb{P}$ is also more general, because the distribution $X$ comes from may have no density function (see the small example after these comments).

    – oliverjones Aug 31 '22 at 17:21
  • @oliverjones I see, so each $X$ might have a different PDF. And if the PDF is known then we can use $\int xf(x) dx$, but if we do not know the PDF then we must resort to the measure-theoretic definition? – CBBAM Aug 31 '22 at 17:27
  • 1
    I made an answer here if it helps. I go into the full gory details using Williamson's Probability with Martingales notation. – FakeAnalyst56 Aug 31 '22 at 19:58
  • Also https://math.stackexchange.com/q/236077/321264 – StubbornAtom Sep 01 '22 at 14:11
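
To illustrate the point about distributions with no density (a small example, taking $X$ to be Bernoulli with parameter $p$): here $F_X(x) = 0$ for $x < 0$, $F_X(x) = 1-p$ for $0 \le x < 1$, and $F_X(x) = 1$ for $x \ge 1$, so $F_X$ is a step function and no PDF exists. The Riemann–Stieltjes integral still makes sense and picks up the jumps of $F_X$: $$\int_{\mathbb{R}} x\,dF_X(x) = 0\cdot(1-p) + 1\cdot p = p = \mathbb{E}[X].$$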

2 Answers


First you change variables to get \begin{align} E(X):=\int_{\Omega}X\,dP=\int_{\Bbb{R}}x\, dP_X(x), \end{align} where $P_X:=X_*P=P(X^{-1}(\cdot))$ is the push-forward measure/law/distribution of $X$ under $P$. This is the general setting.
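
For concreteness, here is a small worked instance of this change of variables (taking, as an illustration, $\Omega=[0,1]$ with $P$ Lebesgue measure and $X(\omega)=\omega^2$): the left-hand side is $\int_0^1 \omega^2\,d\omega = \tfrac13$, while the law of $X$ satisfies $P_X([0,t]) = P(\{\omega : \omega^2 \le t\}) = \sqrt{t}$ for $t\in[0,1]$, so $$\int_{\Bbb{R}} x\,dP_X(x) = \int_0^1 x\cdot\frac{1}{2\sqrt{x}}\,dx = \int_0^1 \frac{\sqrt{x}}{2}\,dx = \frac13,$$ in agreement with the direct computation.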

Now specialize to the case where the pdf of $X$ exists. By definition this means the Radon-Nikodym derivative of the measure $P_X$ with respect to Lebesgue measure $\lambda$ exists; this is what we call the pdf, i.e., \begin{align} f_X:=\frac{dP_X}{d\lambda}. \end{align} In this case we can write the expectation integral as \begin{align} E(X)&=\int_{\Bbb{R}}x\, dP_X(x)\\ &=\int_{\Bbb{R}}x\cdot\frac{dP_X}{d\lambda}(x) \, d\lambda(x)\\ &\equiv \int_{\Bbb{R}}x f(x) \, d\lambda(x), \end{align} where the second equality is a standard measure theory exercise.
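
As an example of this specialization (assuming, say, that $X$ is standard exponential): then $P_X(B) = \int_B e^{-x}\mathbf{1}_{[0,\infty)}(x)\,d\lambda(x)$, so the Radon–Nikodym derivative is $f_X(x) = e^{-x}\mathbf{1}_{[0,\infty)}(x)$, and $$E(X) = \int_{\Bbb{R}} x f_X(x)\,d\lambda(x) = \int_0^\infty x e^{-x}\,dx = 1.$$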

peek-a-boo

Consider a measure space $(X, \mathscr{X}, \mu)$ and a function $f:X \to \mathbf{R}$ which is Borel measurable (relative to $\mathscr{X}$). Consider the following function $\nu$ defined on the Borel sets of $\mathbf{R}$ $$ \nu (B) = \mu(f^{-1}(B)), $$ where $f^{-1}(B)$ is the preimage of $B$ by $f.$ This $\nu$ can be shown to be a measure on the Borel sets of $\mathbf{R}$ which is therefore called the image measure of $\mu$ by $f$ and denoted $f(\mu).$
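
As a quick example of an image measure (taking, say, $X=[0,1]$, $\mu$ Lebesgue measure, and $f(t)=2t$): then $\nu(B) = \mu(\{t\in[0,1] : 2t\in B\})$, so for instance $\nu([0,b]) = \min(b/2, 1)$ for $b \ge 0$; that is, $f(\mu)$ is the uniform distribution on $[0,2]$.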

Suppose now that we have a probability space $(\Omega, \mathscr{F}, \mathbf{P})$ and a random variable $X$ (which by definition is a Borel measurable real-valued function on $\Omega$). The image measure of $\mathbf{P}$ by $X$ is known as the distribution law of $X$ and, by measure theoretic arguments, this measure is uniquely identified with a function $F:\mathbf{R} \to [0, 1]$ that is non-decreasing, right-continuous, has left limits, and satisfies $F(-\infty) = 0$ and $F(\infty) = 1$; we call this $F$ a distribution function. By definition, then, $$ \mathbf{E}(\mathbf{1}_B(X)) = \mathbf{P}(X \in B) = \int_B dF(x) = \int_\mathbf{R} \mathbf{1}_B(x) dF(x), $$ where $dF$ now denotes the distribution law of $X$ (defined on the Borel sets of $\mathbf{R}$). Using measure theoretic arguments (linearity and monotone classes of functions), we can show that $$ \mathbf{E}(u(X)) = \int_\mathbf{R} u(x) dF(x) $$ for any Borel measurable function $u:\mathbf{R} \to \mathbf{R}$ (in the sense that either both integrals exist and are equal, or neither of them exists).
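
The first step of that extension is worth spelling out: for a simple Borel function $u=\sum_{i=1}^n c_i\mathbf{1}_{B_i}$, linearity of the expectation and of the integral gives $$ \mathbf{E}(u(X)) = \sum_{i=1}^n c_i\,\mathbf{P}(X\in B_i) = \sum_{i=1}^n c_i\int_\mathbf{R}\mathbf{1}_{B_i}(x)\,dF(x) = \int_\mathbf{R} u(x)\,dF(x), $$ and monotone convergence carries this from simple functions to non-negative Borel $u$, and then to general $u$ by taking positive and negative parts.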

Sometimes $dF$ has a density relative to Lebesgue measure; this means that there exists a Borel function $f:\mathbf{R} \to \mathbf{R}_+$ such that $$ dF(B) = \int_B f(x) dx. $$ In this case, it can be shown that $$ \int_\mathbf{R} u(x) dF(x) = \int_\mathbf{R} u(x) f(x) dx. $$
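
For instance (taking $X$ uniform on $[0,1]$, so that $f=\mathbf{1}_{[0,1]}$), the displayed identity reads $\int_\mathbf{R} u(x)\,dF(x) = \int_0^1 u(x)\,dx$; with $u(x)=x^2$ this gives $\mathbf{E}(X^2) = \int_0^1 x^2\,dx = \tfrac13$.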

Putting all of this together, you get what you were asking for: the expected value of $X$ is $$ \mathbf{E}(X) = \int_\mathbf{R} x f(x) dx. $$
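
Continuing the uniform example above, this reads $\mathbf{E}(X) = \int_0^1 x\,dx = \tfrac12$, which is exactly formula (1) from the question.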

William M.
  • Thank you! So is the PDF nothing but the Radon-Nikodym derivative of the distribution law measure with respect to the Lebesgue measure? The only point I am unclear of is given $\mathbf{E}(u(X)) = \int_\mathbf{R} u(x) dF(x) = \int_\mathbf{R} u(x) f(x) dx$, how come your last line is $\mathbf{E}(X)$ and not $\mathbf{E}(x)$, taking $u(x) = x$, or is this simply notational? – CBBAM Aug 31 '22 at 17:35
  • 2
    Yes, it is simply notational. In theoretical probability, it is quite common to write random variables with capital letters; however, in statistics (roughly, applied probability), the use of capital letter is less common because one need the flexibility to pretend that observed values (which aren't random) and the possible values (which are assumed random) can be interchangeable in the arguments. – William M. Aug 31 '22 at 17:37
  • 1
    Note that I wrote $E(u(X)),$ so if $u(t) = t,$ then $u(X) = X.$ Yes, the density of $dF$ is the Radon-Nikodym derivative of $dF$ which is none other than the usual derivative of $F.$ (This already shows that densities do no exist always as $dF$ needs to be regular enough to make $F$ smooth function.) – William M. Aug 31 '22 at 17:39
  • If we take $u(X) = X$ in your formula for $E[u(x)]$, by your last formula wouldn't this be, in the most general case, $\int X f(x) dx$? Sorry, I am still a little unclear on this. – CBBAM Aug 31 '22 at 17:49
  • Note that $X$ is a symbol that denotes a function $\Omega \to \mathbf{R},$ therefore, $\int X f(x) dx$ makes no sense ($X$ needs an $\omega \in \Omega$ to be evaluated, while $f$ needs $x \in \mathbf{R}$). On the other hand, I never wrote $E(u(x))$ but $E(u(X)).$ – William M. Aug 31 '22 at 17:55
  • My apologies I must have misread, thanks for all the help! – CBBAM Aug 31 '22 at 18:10