
I thought about this question recently because we accidentally stated this lemma as an exercise with convergence in probability in place of the usual almost sure convergence. The usual proof with Fatou's lemma does not work in this case.

It turns out that you can in fact generalize Scheffé's lemma to the following:

Generalized Scheffé's Lemma

Assume that $(X_n)_{n\in\mathbb{N}}\subset L^1$ converges in probability to $X_\infty\in L^1$. Then the following statements are equivalent:

  1. $\mathbb{E}[|X_{n}|]\to \mathbb{E}[|X_{\infty}|]<\infty$, as $n\to \infty$.
  2. For all $\epsilon>0$ we have $\limsup_{n\to\infty} \mathbb{E}[|X_n|1_{|X_\infty-X_n|> \epsilon}] \le \epsilon$.
  3. The family $\{X_\infty, X_1, X_2,\dots\}$ is uniformly integrable.
  4. $X_n\to X_{\infty}$ in $L^1$, as $n\to \infty$.

I am not sure whether this result is new (probably not), but I wanted it to be easier to find. So here is a proof.
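To see what the lemma rules out, here is a minimal Monte Carlo sketch (an illustration only, not part of the argument) of the classical sequence $X_n = n\,1_{U \le 1/n}$ with $U\sim\mathrm{Unif}[0,1]$: it converges to $0$ in probability, yet $\mathbb{E}[|X_n|] = 1$ for every $n$, so conditions 1–4 all fail together.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=1_000_000)  # one uniform sample per outcome omega

for n in [10, 100, 1000]:
    X_n = n * (U <= 1 / n)                 # X_n = n * 1_{U <= 1/n}
    prob_far = np.mean(np.abs(X_n) > 0.5)  # estimates P(|X_n - 0| > 1/2)
    mean_abs = np.mean(np.abs(X_n))        # estimates E|X_n|
    print(f"n={n:4d}  P(|X_n|>1/2) ~ {prob_far:.4f}  E|X_n| ~ {mean_abs:.3f}")

# P(|X_n|>1/2) -> 0, i.e. X_n -> 0 =: X_inf in probability, but E|X_n|
# stays near 1 != 0 = E|X_inf|, so there is no L^1 convergence either.
```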

Felix B.
  • The convergence in probability version is easily deduced from the convergence a.s. version using the "Urysohn subsequence principle" and the fact that convergence in probability implies a.s. convergence along a subsequence. Also note that the equivalence of 1 and 4 is a special case of the generalized DCT. – Mason Mar 12 '22 at 20:25
  • @Mason Sure, you get an a.s. convergent subsequence from convergence in probability, which then also converges in $L^1$, but it is not obvious how you can deduce that the entire sequence converges in $L^1$ from that. For the DCT to work you need almost sure convergence, because otherwise the pointwise limit inside the integral is not properly defined, as far as I know. – Felix B. Mar 12 '22 at 20:29
  • I've added an answer to illustrate the use of Urysohn's subsequence principle. As for the generalized DCT, it also works when the sequence converges in measure. You are already given $X_{\infty}$ as the limit in measure; the generalized DCT allows you to conclude convergence in $L^1$. – Mason Mar 12 '22 at 21:15
  • To add a couple of references: In Schilling's book Measures, Integrals and Martingales, 2nd ed (2017), this is Thm 22.7. In Kallenberg's book Foundations of Modern Probability, 2nd ed (2002), this is Thm 5.12. Schilling states the result for a $\sigma$-finite measure space (not just a probability space) and in $L^p$ for any $p \geq 1$ and calls it Vitali's Theorem. Kallenberg states the result for probability spaces and in $L^p$ for any $p > 0$. – NRH Dec 21 '23 at 10:42

2 Answers


Proof

(1) $\Rightarrow$ (2): Fix $\epsilon>0$ and choose any $\eta>0$. By (1) there exists $n_0\in\mathbb{N}$ such that

\begin{align*} \mathbb{E}[|X_n|] - \mathbb{E}[|X_\infty|] \le \tfrac{\eta}2 \quad \forall n\ge n_0. \end{align*}

Since the single random variable $X_\infty\in L^1$ is uniformly integrable, there exists $\delta(\eta)>0$ such that \begin{align*} \mathbb{E}[|X_\infty| 1_{A}] \le \tfrac\eta2 \quad \forall A\in\mathcal{A} : \mathbb{P}(A) < \delta(\eta). \end{align*} Since $X_n\to X_\infty$ in probability, there exists $n_1\in\mathbb{N}$ such that for all $n\ge n_1$ \begin{align*} \mathbb{P}(|X_\infty - X_n| > \epsilon) \le \delta(\eta) \quad\text{and therefore}\quad \mathbb{E}[|X_\infty| 1_{|X_\infty - X_n| > \epsilon}] \le \tfrac\eta2. \end{align*}

Now we plug things together. For $n\ge\max\{n_0, n_1\}$ we have \begin{align*} \mathbb{E}[|X_n| 1_{|X_\infty-X_n|>\epsilon}] &= \mathbb{E}[|X_n|] - \mathbb{E}[|X_n| 1_{|X_\infty-X_n|\le\epsilon}]\\ &= \underbrace{\mathbb{E}[|X_n|] - \mathbb{E}[|X_\infty|]}_{ \le \tfrac\eta2 } + \underbrace{ \mathbb{E}[(|X_\infty| - |X_n|)1_{|X_\infty-X_n|\le\epsilon}] }_{ \begin{aligned} &\le \mathbb{E}[|X_\infty - X_n|1_{|X_\infty-X_n|\le\epsilon}]\\ &\le \epsilon \end{aligned} } + \underbrace{\mathbb{E}[|X_\infty|1_{|X_\infty-X_n|>\epsilon}]}_{ \le \tfrac\eta2 } \end{align*} so $\limsup_{n\to\infty}\mathbb{E}[|X_n| 1_{|X_\infty-X_n|>\epsilon}] \le \epsilon + \eta$. Letting $\eta\to 0$ yields the claim.

The bound on the first term exploits the fact that the total masses of $X_n$ and $X_\infty$ have to be similar, so $X_n$ cannot concentrate mass on a small event such as $\{|X_\infty - X_n| > \epsilon\}$; this is precisely what the standard examples of convergence in probability without $L^1$ convergence exploit.
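For concreteness, here is how the standard counterexample breaks this first term: on $([0,1],\mathcal{B},\lambda)$ take $X_n = n\,1_{[0,1/n]}$ and $X_\infty = 0$. Then $X_n\to 0$ in probability, but for every $\epsilon\in(0,1)$ \begin{align*} \mathbb{E}\big[|X_n| 1_{|X_\infty - X_n| > \epsilon}\big] = \mathbb{E}[X_n] = n\cdot\tfrac1n = 1, \end{align*} matching the fact that $\mathbb{E}[|X_n|] - \mathbb{E}[|X_\infty|] = 1$ for all $n$, so (1) fails exactly where the decomposition needs it.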

(2) $\Rightarrow$ (3): We want to show \begin{align*} \lim_{M\to\infty} \sup_n \mathbb{E}[|X_n|1_{|X_n|> M}] = 0. \end{align*} So fix some $\delta>0$. We select $\epsilon := \tfrac\delta4$ and then, using (2), select $n_0\in\mathbb{N}$ such that for all $n\ge n_0$ we have \begin{align*} \mathbb{E}[|X_n|1_{|X_\infty - X_n| > \epsilon}] \le \epsilon + \tfrac\delta4 = \tfrac\delta2. \end{align*}

As $\{X_1, \dots, X_{n_0}\}$ is a finite subset of $L^1$, it is uniformly integrable, so there exists $M_0$ such that \begin{align*} \sup_{1\le n\le n_0} \mathbb{E}[|X_n| 1_{|X_n| > M}] \le \delta \quad \forall M\ge M_0. \end{align*} Similarly there exists $M_1$ such that \begin{align*} \mathbb{E}[|X_\infty| 1_{|X_\infty|>M-\epsilon}] \le \tfrac\delta4 \quad \forall M\ge M_1. \end{align*}

To put things together, note that we always have \begin{align*} 1_{|X_n| > M} \le 1_{|X_\infty - X_n| > \epsilon} + 1_{|X_\infty|>M-\epsilon}1_{|X_\infty-X_n|\le \epsilon}, \end{align*} because $|X_n| > M$ together with $|X_\infty - X_n| \le \epsilon$ forces $|X_\infty| \ge |X_n| - \epsilon > M-\epsilon$. This implies for all $M\ge \max\{M_0, M_1\}$ \begin{align*} &\sup_{n\in\mathbb{N}} \mathbb{E}[|X_n| 1_{|X_n| > M}]\\ &\le \max\Big\{ \underbrace{\sup_{1\le n\le n_0} \mathbb{E}[|X_n| 1_{|X_n| > M}]}_{\le \delta}, \sup_{n\ge n_0} \underbrace{\mathbb{E}[|X_n| 1_{|X_\infty - X_n|>\epsilon}]}_{ \le \tfrac\delta2 } + \underbrace{\mathbb{E}[|X_n| 1_{|X_\infty|>M-\epsilon}1_{|X_\infty - X_n|\le\epsilon}]}_{ \begin{aligned} &\le \mathbb{E}[|X_\infty| 1_{|X_\infty|>M-\epsilon}] + \epsilon\\ &\le \tfrac\delta2 \end{aligned} } \Big\} \le \delta. \end{align*} Since $\delta>0$ was arbitrary, this proves that $\{X_1, X_2, \dots\}$ is uniformly integrable; as the single random variable $X_\infty\in L^1$ is uniformly integrable as well, so is the whole family $\{X_\infty, X_1, X_2,\dots\}$.

(3) $\Rightarrow$ (4): Fix some $\epsilon >0$. By uniform integrability there exists some $\delta >0$ such that $\mathbb{P}(A)<\delta$ implies for all $n$ \begin{align*} \mathbb{E}[|X_n| 1_{A}] \le \epsilon \quad \text{and}\quad \mathbb{E}[|X_\infty|1_{A}] \le \epsilon. \end{align*} Since $X_n\to X_\infty$ in probability, we can choose $n_0\in \mathbb{N}$ such that \begin{align*} \mathbb{P}(|X_\infty - X_n| > \epsilon) \le \delta \quad \forall n\ge n_0. \end{align*}
With the previous result, this implies for all $n\ge n_0$ \begin{align*} \mathbb{E}[|X_\infty - X_n|] \le \underbrace{\mathbb{E}[|X_\infty - X_n|1_{|X_\infty - X_n|\le \epsilon}]}_{\le \epsilon} + \underbrace{ \mathbb{E}[|X_\infty| 1_{|X_\infty - X_n| > \epsilon}] }_{\le \epsilon} + \underbrace{ \mathbb{E}[|X_n| 1_{|X_\infty - X_n| > \epsilon}] }_{\le \epsilon} \le 3\epsilon, \end{align*} which proves $X_n\to X_\infty$ in $L^1$.

(4) $\Rightarrow$ (1): This finally follows from the reverse triangle inequality $\big\lvert|x|-|y|\big\rvert\le |x-y|$ and Jensen's inequality: \begin{align*} \big\lvert \mathbb{E}\big[\lvert X_n \rvert\big] - \mathbb{E}\big[\lvert X_{\infty}\rvert\big] \big\rvert &= \big\lvert \mathbb{E}\big[\lvert X_n \rvert- \lvert X_{\infty}\rvert\big] \big\rvert\\ &\le \mathbb{E}\big[ \big\lvert \lvert X_n\rvert - \lvert X_{\infty}\rvert \big\rvert\big]\\ &\le \mathbb{E}\big[\lvert X_n - X_{\infty}\rvert\big] \to 0 \quad (n \to \infty). \end{align*}

Felix B.

I'll demonstrate how to show 1 implies 4 using only the a.s. version of this theorem. Suppose $E(|X_n|) \to E(|X_\infty|)$. Let $(X_{n_k})$ be an arbitrary subsequence of $(X_n)$. Since $X_{n_k} \to X_{\infty}$ in probability, there is a further subsequence $(X_{n_{k_j}})$ such that $X_{n_{k_j}} \to X_{\infty}$ a.s. Since $E(|X_{n_{k_j}}|) \to E(|X_{\infty}|)$, the a.s. version of the theorem implies that $X_{n_{k_j}} \to X_{\infty}$ in $L^1$. Since $(X_{n_k})$ was an arbitrary subsequence, the Urysohn subsequence principle yields $X_n \to X_{\infty}$ in $L^1$.
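To see why the detour through subsequences is genuinely necessary, it may help to look at the classical "typewriter" sequence of indicator functions sweeping across $[0,1]$: it converges to $0$ in probability but at no single point, while along the subsequence $n_j = 2^j$ it converges everywhere on $(0,1]$. Below is a minimal numerical sketch of this (the indexing convention and the sample point are arbitrary choices made here for illustration):

```python
import numpy as np

def typewriter(n, omega):
    """X_n(omega) for the typewriter sequence: writing n = 2**j + k with
    0 <= k < 2**j, X_n is the indicator of [k/2**j, (k+1)/2**j)."""
    j = int(np.floor(np.log2(n)))
    k = n - 2**j
    return float(k / 2**j <= omega < (k + 1) / 2**j)

omega = 0.6366  # one fixed sample point in (0, 1)

# The full sequence keeps returning to 1 (once per dyadic level j), so
# X_n(omega) does not converge; convergence in probability still holds,
# since P(X_n != 0) = 2**-j -> 0.
full_hits = sum(typewriter(n, omega) for n in range(1, 2**12))
# Along the subsequence n_j = 2**j the terms are indicators of [0, 2**-j),
# which eventually vanish at omega: pointwise (indeed a.s.) convergence.
sub = [typewriter(2**j, omega) for j in range(12)]

print("hits of 1 among n < 2**12:", int(full_hits))  # 12, one per level
print("subsequence X_{2^j}(omega):", sub)            # 1.0, then all 0.0
```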

Note that the implication 1 $\implies$ 4 is a special case of the generalized DCT, which holds on an arbitrary measure space. The generalized DCT is often stated for sequences converging a.e., but by subsequence arguments similar to the one above it can also be shown to hold for sequences converging in measure.

Mason
  • What is the Urysohn subsequence principle? I kind of assumed that it was the fact that you can select an a.s. convergent subsequence from a sequence converging in probability, but that does not appear to be it. Google does not yield anything useful. Same for the generalized DCT: no idea what you are referring to, so I cannot judge whether this answer makes sense. – Felix B. Mar 12 '22 at 21:28
  • @FelixB. The subsequence principle says that if $(a_n)$ is a sequence in a topological space $X$ and $a \in X$, then $a_n \to a$ if and only if every subsequence of $(a_n)$ has a further subsequence that converges to $a$. The proof is easy: If $a_n \not\to a$, then there is some neighborhood $U$ of $a$ such that $(a_n)$ is not eventually in $U$. Thus we can construct a subsequence $(a_{n_k})$ with $a_{n_k} \notin U$ for all $k$. – Mason Mar 12 '22 at 22:06
  • @FelixB. Generalized DCT says that if $f_n \in L^1$ and $f_n \to f$ a.e. and $|f_n| \leq g_n \in L^1$ and $g_n \to g \in L^1$ a.e. and $\int g_n \to \int g$, then $f_n \to f$ in $L^1$. In my answer I am saying that we can use the subsequence principle to prove the generalized DCT for $f_n \to f$ in measure rather than a.e. – Mason Mar 12 '22 at 22:16
  • okay I see, thank you – Felix B. Mar 13 '22 at 21:35