If $X_n \to X $ in probability then $f(X_n) \to f(X)$ in probability for a Borel function $f$

Question

$(X_n)_n$ is a sequence of identically distributed random variables, $f:\mathbb{R} \to \mathbb{R}$ a Borel function.

Prove that if $X_n$ converges in probability to $X,$ then $f(X_n)$ converges in probability to $f(X).$

@Kurt.W.X What was the counter-example you came up with for the a.s. convergence part? — Mike Earnest, Feb 25 '21 at 21:30
If $f$ is continuous, it is trivial. Finite measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ are automatically Radon. I am thinking about Lusin's Theorem. Recall that a Borel function is a continuous function except on a set of small measure. Unfortunately, measure less than epsilon and measure zero are different... — Danny Pak-Keung Chan, Feb 26 '21 at 02:52
I think it is still difficult to show $\lim_k P(|f_k(X_n)-f_k(X)|>\epsilon/3)=0$. That is, assuming $f$ is bounded and has compact support does not help prove $f(X_n)\to f(X)$ in probability. — Mike Earnest, Feb 26 '21 at 17:31
You could show that the set of Borel functions $f$ for which the statement is true contains the continuous functions and is stable under passing to the limits. That shows (from a non-trivial theorem) that it contains all Borel functions. — Stéphane Laurent, Feb 27 '21 at 11:50
Any proof needs to make use of the "identically distributed" assumption, because the conclusion does not follow without it (look at $X_n=1/n,X=0,f(x)={\bf 1}(x=0)$). You should look at your proof and figure out where you apply this assumption, @Kurt.W.X. — Mike Earnest, Feb 27 '21 at 16:05
@Kurt.W.X, the latest edit doesn't help. The claims made there aren't particularly obvious (e.g. showing $K$ is a $\sigma$-algebra...) nor is it clear that it offers a simpler solution than the accepted one. — , Mar 21 '21 at 03:14
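
Mike Earnest's counterexample in the comments above ($X_n = 1/n$, $X = 0$, $f(x) = {\bf 1}(x=0)$) can be checked directly. Here is a minimal sketch in Python; the function names `f` and `gap` are illustrative and do not come from the thread. It shows that $|X_n - X| \to 0$ while $|f(X_n) - f(X)| = 1$ for every $n$, so the conclusion fails when the $X_n$ are not identically distributed.

```python
from fractions import Fraction

def f(x):
    """Indicator of the single point 0: Borel measurable but discontinuous."""
    return 1 if x == 0 else 0

def gap(n):
    """Return (|X_n - X|, |f(X_n) - f(X)|) for the deterministic X_n = 1/n, X = 0."""
    x_n, x = Fraction(1, n), Fraction(0)
    return abs(x_n - x), abs(f(x_n) - f(x))

for n in (1, 10, 1000):
    print(n, gap(n))
```

The first component shrinks with $n$, while the second stays equal to $1$.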

Nate Eldredge · Accepted Answer · 2021-02-28T02:43:17.600


Following up on a line of reasoning suggested by MikeG and Danny Pak-Keung Chan:

Let $\mu$ be the law of the $X_n$ (which is also the law of $X$). Fix $\epsilon, \eta > 0$. By Lusin's theorem applied to the measure $\mu$, there is a compact set $K \subset \mathbb{R}$ with $\mu(K) > 1-\eta$ on which $f|_K$ is continuous. Since $K$ is compact, $f|_K$ is even uniformly continuous, so there exists $\delta > 0$ such that for every $x,y \in K$ with $|x-y| < \delta$, we have $|f(x)-f(y)| < \epsilon$.

As such, for any given $n$, if $|f(X_n) - f(X)| \ge \epsilon$, then at least one of $X_n \notin K$, $X \notin K$, or $|X_n - X| \ge \delta$ must hold. Since $X_n$ and $X$ both have law $\mu$, the union bound gives $$\begin{align*}P(|f(X_n) - f(X)| \ge \epsilon) &\le P(X_n \notin K) + P(X \notin K) + P(|X_n - X| \ge \delta)\\ &\le 2 \eta + P(|X_n - X| \ge \delta).\end{align*}$$ Since $X_n \to X$ in probability, letting $n \to \infty$ we conclude $$\limsup_{n \to \infty} P(|f(X_n) - f(X)| \ge \epsilon) \le 2\eta,$$ and since $\eta > 0$ was arbitrary, the limit is $0$.
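
The conclusion can be illustrated numerically. The following is a minimal sketch under assumptions that are not part of the answer: $X$ is uniform on $[0,1)$, $X_n = X + 1/n \bmod 1$ (so every $X_n$ has the same law as $X$), and $f = 1_{[0.3,\,0.7)}$, a discontinuous Borel function. The probability is approximated by averaging over a midpoint grid instead of random sampling, and the names `f` and `prob_gap` are hypothetical.

```python
def f(x):
    """Indicator of [0.3, 0.7): Borel measurable, discontinuous at 0.3 and 0.7."""
    return 1.0 if 0.3 <= x < 0.7 else 0.0

def prob_gap(n, eps=0.5, grid=100_000):
    """Approximate P(|f(X_n) - f(X)| >= eps) for X uniform on [0, 1)."""
    shift = 1.0 / n
    hits = 0
    for i in range(grid):
        x = (i + 0.5) / grid       # midpoint of the i-th grid cell
        x_n = (x + shift) % 1.0    # X_n = X + 1/n mod 1, same law as X
        if abs(f(x_n) - f(x)) >= eps:
            hits += 1
    return hits / grid

for n in (5, 50, 500):
    print(n, prob_gap(n))
```

For this particular $f$ the exact probability is $2/n$ when $n \ge 3$, because $f(X_n) \ne f(X)$ only when $X$ lies within $1/n$ to the left of $0.3$ or $0.7$. The grid estimate should therefore shrink roughly like $2/n$.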

A previous version of the question asked whether, if $X_n \to X$ almost surely, we can conclude $f(X_n) \to f(X)$ almost surely. The answer is no. For simplicity, work on the group $(S^1,+)$. You can take it back to $\mathbb{R}$ if you like by identifying $S^1$ with $[0,1)$ and $+$ with addition mod 1.

Let $E \subset S^1$ be a closed nowhere dense set of positive Lebesgue measure, e.g. a generalized Cantor set, and $f = 1_E$. I claim there exists a sequence $t_n \in S^1$ with $t_n \to e$, the identity element, for which $\liminf_{n \to \infty} 1_E(x+t_n) = 0$ for every $x$. Once this is shown, we can let $X$ be uniformly distributed on $S^1$ and let $X_n = X + t_n$. Clearly every $X_n$ is uniformly distributed on $S^1$ and $X_n \to X$ a.s., but $\liminf f(X_n) = 0$ a.s., so $f(X_n) \not\to f(X)$ on the event $\{X \in E\}$, which has positive probability.

To prove the claim, note that $U=E^c$ is open and dense. So for any integer $m>0$, the sets $U-s$ where $d(s,e) < 1/m$ form an open cover of $S^1$. By compactness, there is a finite subcover $U-s_{m,1}, \dots, U-s_{m, k_m}$. This means that for every $x$ and every $m$, we have $x + s_{m,i} \in U$ for some $i$. Let $t_n$ be the sequence $$(s_{1,1}, \dots, s_{1, k_1}, s_{2,1},\dots, s_{2, k_2}, \dots).$$
Since $d(s_{m,i}, e) < 1/m$, we have $t_n \to e$. And by construction, for every $x \in S^1$, there are infinitely many $n$ such that $x + t_n \in U$. This means that $\liminf_{n \to \infty} 1_E(x+t_n) = 0$.

edited Feb 28 '21 at 02:43

answered Feb 27 '21 at 18:55

Nate Eldredge


@Kurt.W.X: Yes, since Lusin's theorem is valid in that setting. – Nate Eldredge Feb 28 '21 at 00:15
What are your thoughts on the a.s convergence? Is it preserved? – Kurt.W.X Feb 28 '21 at 00:17
@Kurt.W.X: I suspect it's false but I don't have a counterexample. My idea is to take a Borel function $f$ on the unit circle. We know that if, say, $f \in L^1$ and $a_n \to 0$ then $f(\cdot + a_n) \to f$ in $L^1$. If the claim were true then we would also have $f(\cdot + a_n) \to f$ almost everywhere. I've never heard such a theorem and if it were true I think I would know it. – Nate Eldredge Feb 28 '21 at 01:22
In particular I'd be suspicious of something like $f = 1_A$ where $A$ is positive measure and nowhere dense, like a generalized Cantor set. – Nate Eldredge Feb 28 '21 at 01:24
@Kurt.W.X: https://math.stackexchange.com/questions/1920408/convergence-in-norm-but-not-almost-everywhere-of-f-t-n-f-cdot-t-n-to-f?rq=1 seems like it can be modified to produce such a counterexample. It would produce an unbounded $f$ though, a bounded example would be more interesting. – Nate Eldredge Feb 28 '21 at 02:15
@Kurt.W.X: I think I have a counterexample now, see edit. – Nate Eldredge Feb 28 '21 at 02:43
@Kurt.W.X The way I propose in my comment to your question works under the general assumption of a separable metric space. And FYI this theorem is known as Slutsky's lemma in the Russian literature. – Stéphane Laurent Feb 28 '21 at 15:59
@Kurt.W.X No. The theorem I'm referring to is sometimes called "Hausdorff-Banach-Lebesgue's theorem". I think it is proved in Kechris's book. It is far more difficult than the monotone class theorem, but is well-known I think. – Stéphane Laurent Mar 20 '21 at 19:20
@StéphaneLaurent: I've usually heard this called the "functional monotone class theorem". It's sort of a combination of the monotone class theorem and an argument similar to Stone-Weierstrass. – Nate Eldredge Mar 20 '21 at 19:41
@NateEldredge This is a different theorem. It is proved in the context of descriptive set theory. – Stéphane Laurent Mar 20 '21 at 21:14
@StéphaneLaurent: Well in this case, with the domain being $\mathbb{R}$, I am pretty sure the statement you've given is a consequence of the functional monotone class and/or multiplicative system (functional $\pi$-$\lambda$) theorems, which are pure measure theory. I guess I am using the fact that the set of functions for which the statement is true is clearly a vector space. – Nate Eldredge Mar 20 '21 at 21:20
@Kurt.W.X: The best way to post an alternative proof would be in another answer. – Nate Eldredge Mar 21 '21 at 02:06

Kurt.W.X · Answer 2 · 2021-03-25T00:06:38.617

This is an alternative proof for the problem, where we suppose that $X,X_n$ take values in a separable metric space $(U,d)$ and $f:U \to \mathbb{R}$ is a measurable function.

Consider $\mathcal{K}=\{E \in \mathcal{B}(U) : 1_E(X_n) \to^P 1_E(X)\}.$ We will prove that $\mathcal{K}$ is a $\sigma$-algebra containing the closed sets, so that $\mathcal{K}=\mathcal{B}(U).$

Clearly $U \in \mathcal{K}$, and $\mathcal{K}$ is a $\pi$-system because $1_{E \cap F}=1_E 1_F.$ Let $E,F \in \mathcal{K}$ with $E \subset F.$ Writing $1_{F-E}=1_F-1_E$, we conclude that $F-E \in \mathcal{K}.$ Now take an increasing sequence $(E_k)_k$ in $\mathcal{K}$ and let $E=\bigcup_k E_k.$

Then $\lim_k 1_{E_k}=1_E$ pointwise. Notice that for all $n,k \in \mathbb{N}$ $$P(|1_E(X_n)-1_E(X)|>\epsilon) \leq P(|1_E(X_n)-1_{E_k}(X_n)|>\epsilon/3)+P(|1_{E_k}(X_n)-1_{E_k}(X)|>\epsilon/3)+P(|1_{E_k}(X)-1_E(X)|> \epsilon /3)$$ In other words, using the fact that the $X_n$ are identically distributed (and hence have the same law as $X$), $$P(|1_E(X_n)-1_E(X)|>\epsilon) \leq P(|1_{E_k}(X_n)-1_{E_k}(X)|>\epsilon/3)+2P(|1_{E_k}(X)-1_E(X)|> \epsilon /3)$$ Letting $n \to \infty$ removes the first term because $E_k \in \mathcal{K}$, and letting $k \to \infty$ removes the second because $1_{E_k}(X) \to 1_E(X)$ pointwise. So $\limsup_n P(|1_E(X_n)-1_E(X)|>\epsilon)=0$ and $E\in \mathcal{K}.$

So $\mathcal{K}$ is both a $\pi$-system and a $\lambda$-system, hence a $\sigma$-algebra.

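If $E$ is a closed subset of $U,$ consider $f_k(x)=\frac{1}{(1+d(x,E))^{k}}$, which is continuous on $U$ and satisfies $\lim_k f_k=1_E$ pointwise. Then $$P(|1_E(X_n)-1_E(X)|>\epsilon) \leq P(|1_E(X_n)-f_k(X_n)|>\epsilon/3)+P(|f_k(X_n)-f_k(X)|>\epsilon/3)+P(|f_k(X)-1_E(X)|> \epsilon /3)$$

Since $f_k$ is continuous, the middle term tends to $0$ as $n \to \infty$. By identical distribution the first term equals the third, which tends to $0$ as $k \to \infty$. So $\limsup_nP(|1_E(X_n)-1_E(X)|>\epsilon)=0$, which means that $E \in \mathcal{K}.$ As $\mathcal{K}$ is a $\sigma$-algebra containing every closed set, $\mathcal{B}(U)=\mathcal{K}.$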
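
The pointwise convergence $f_k \to 1_E$ used for closed sets can be seen in a small example. This is a minimal sketch assuming $U = \mathbb{R}$ with the usual distance and $E = [0,1]$; both choices, and the helper names `dist_to_E` and `f_k`, are illustrative and not taken from the answer.

```python
def dist_to_E(x, a=0.0, b=1.0):
    """Distance from x to the closed interval E = [a, b]."""
    return max(a - x, x - b, 0.0)

def f_k(x, k):
    """Continuous approximation 1 / (1 + d(x, E))**k of the indicator 1_E."""
    return 1.0 / (1.0 + dist_to_E(x)) ** k

for x in (0.5, 1.0, 1.1, 2.0):
    print(x, [f_k(x, k) for k in (1, 10, 100)])
```

Points of $E$ (including the boundary point $1$) give the value $1$ for every $k$, while points outside $E$ give values that decay geometrically in $k$. Closedness of $E$ matters here: $d(x,E)=0$ exactly when $x \in E$.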

Now if $f:U \to \mathbb{R}$ is a measurable function, then it is the pointwise limit of a sequence of simple functions $\phi_k$, and from the above, for all $k \in \mathbb{N}$, $\phi_k(X_n) \to^P \phi_k(X).$ Again, $$P(|f(X_n)-f(X)|>\epsilon) \leq P(|f(X_n)-\phi_k(X_n)|>\epsilon/3)+P(|\phi_k(X_n)-\phi_k(X)|>\epsilon/3)+P(|\phi_k(X)-f(X)|>\epsilon/3).$$ By identical distribution the first and third terms are equal and tend to $0$ as $k \to \infty$, while the middle term tends to $0$ as $n \to \infty$ for each fixed $k$, so $\limsup_nP(|f(X_n)-f(X)|>\epsilon)=0.$
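
The answer does not say which simple functions $\phi_k$ are meant. One standard choice, assumed here purely for illustration, is the dyadic approximation: round $f(x)$ down to a multiple of $2^{-k}$ and truncate to $[-k,k]$. The sketch below uses hypothetical names `phi_k` and `f`, and the example function $f(x)=x^2-1$ is likewise an arbitrary choice.

```python
import math

def phi_k(y, k):
    """Dyadic simple approximation of a value y = f(x): round down to a
    multiple of 2**-k, then truncate to the interval [-k, k]."""
    rounded = math.floor(y * 2**k) / 2**k
    return max(-k, min(k, rounded))

def f(x):
    """Example measurable function (hypothetical choice)."""
    return x * x - 1.0

x = 1.3
for k in (1, 4, 8):
    approx = phi_k(f(x), k)
    print(k, approx, abs(f(x) - approx))
```

Each $\phi_k \circ f$ takes only finitely many values, and the error is at most $2^{-k}$ wherever $|f(x)| \le k$, which gives pointwise convergence.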

In this proof we didn't need to use continuity on a compact set (uniform continuity).

Separability is assumed only to ensure that the pair $(X,X_n)$ is indeed a random variable, since then $\mathcal{B}(U^2)=\mathcal{B}(U) \otimes \mathcal{B}(U).$

If $f(X_1) \in L^p$ and $X_n \to ^{L^p} X$ then $f(X_n) \to ^{L^p} f(X).$

The statement is not true for almost sure convergence.

As I mentioned in a comment to your post, any proof of this problem must make use of the fact that $X_n$ are identically distributed, since the result is not true otherwise. I do not think you used that hypothesis anywhere, so something must be wrong. — Mike Earnest, Mar 24 '21 at 00:28
Thanks for clarifying, and I apologize for any rudeness in my last comment. — Mike Earnest, Mar 25 '21 at 00:10
No problem! The identical distribution (where we use it) should be mentioned earlier before — Kurt.W.X, Mar 25 '21 at 00:12
