
In this paper, for example, the authors write:

The central limit theorem provides an estimate of the probability \begin{align} P\left( \frac{\sum_{i=1}^n X_i - n\mu}{\sigma \sqrt{n}} > x \right) \end{align} ... the CLT estimates the probability of $O(\sqrt{n})$ deviations from the mean of the sum of random variables ... On the other hand, large deviations of the order of the mean itself, i.e., $O(n)$ deviations, is the subject of this section [Cramer-Chernoff Theorem].

It is not clear to me why the CLT can't be used to calculate large deviations. Following the answer to my previous question, for large $n$ the CLT tells me that the mean is approximately normally distributed, so that $$P\left(\left|\sum_{i=1}^n X_i - n\mu\right| \geq x\right) \approx 2\Phi\left(-\frac{x \sqrt{n}}{\sigma}\right)$$

Why (and in which cases) should Cramér's theorem be used, and not the CLT, when $x$ is large?

  • My simplest explanation is that the Central Limit Theorem gives a small absolute error in calculating the probability, and this error tends to get smaller with larger $n$ (notably when $x/\sqrt{n}$ is held constant, so the approximation does not change). But it does not provide any assurance of small relative error when the probability is tiny, as it is with large deviations. – Henry Oct 29 '14 at 08:21
  • @Manuel The $x$ in the LHS of your last formula should be $nx$. But then the asymptotics given by the RHS is quite wrong, since the CLT and the large deviations principle apply to deviations of $S_n$ from its mean of different magnitudes, respectively $\sqrt{n}$ and $n$. Applying one in the regime of the other may result in wrong asymptotics; for example, the one at the end of your post is probably only true when the $X_n$ are normal. – Did Oct 29 '14 at 17:57
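
To make the two regimes in these comments concrete, here is a minimal worked sketch, assuming the $X_i$ are i.i.d. exponential with rate $1$ (so $\mu = \sigma = 1$) and taking deviations of size $x = n$. Naively extrapolating the CLT gives $$P\left(\sum_{i=1}^n X_i - n > n\right) \approx \Phi(-\sqrt{n}) = e^{-n/2 + O(\log n)},$$ i.e. an exponential rate $1/2$, whereas the Cramér–Chernoff theorem, with rate function $\Lambda^*(x) = x - 1 - \log x$, gives $$P\left(\sum_{i=1}^n X_i > 2n\right) = e^{-n(1-\log 2) + o(n)},$$ i.e. a rate $1 - \log 2 \approx 0.31$. Both probabilities tend to $0$, so the absolute error of the CLT extrapolation vanishes (Henry's point), but the two estimates differ by a factor of order $e^{(\log 2 - 1/2)n}$, so the relative error blows up; the rate $1/2$ would be the correct one if the $X_i$ were normal, as Did notes.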

1 Answer


Large deviations theory gives you an estimate of the probabilities in the non-typical regime, whereas the CLT gives you an estimate of the probabilities in the typical regime. Suppose the $X_i$ have finite variance and are a.s. positive. Then the CLT gives $$ \left|P\left(\frac{\sum_i X_i - n\mu}{\sigma \sqrt{n}} \ge x\right) - \Phi(-x)\right| \to 0. $$ But the CLT doesn't say anything if you let $x$ grow with $n$ as well.

Say you want to know $P\left(\sum_i X_i - n\mu > \mu n\right)$. Replacing $x$ by $\sqrt{n}\,\mu/\sigma$ above would give you an approximation of this probability of the form $e^{-cn}$ for some $c>0$. But this need not be correct. Suppose, say, that the $X_i$ have a heavy (polynomial) tail, $P(X_i > t) \ge t^{-4}$ for $t \ge 1$. Since the $X_i$ are positive, $X_1 > 2n\mu$ implies $\sum_{i \le n} X_i > 2n\mu$, and hence $$ P\left(\left|\sum_{i \le n} X_i - n\mu\right| > n\mu\right) \ge P\left(\sum_{i \le n} X_i > 2n\mu\right) \ge P(X_1 > 2n\mu) \ge (2n\mu)^{-4}, $$ so you do not get an exponential bound, but only a polynomial one.
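
As a quick numerical illustration (a minimal simulation sketch, assuming Pareto-distributed $X_i$ with tail index $4$ and scale $1$, so $P(X_i > t) = t^{-4}$ for $t \ge 1$; the choices $n = 10$ and $10^6$ replications are arbitrary), one can compare the empirical tail $P(\sum_i X_i > 2n\mu)$ with what the naive CLT extrapolation would predict:

```python
# Sketch: heavy-tailed (Pareto, tail index 4) summands, finite mean and variance.
# Compare the empirical tail P(S_n > 2*n*mu) with the value obtained by blindly
# plugging x = sqrt(n)*mu/sigma into the CLT approximation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

alpha, xm = 4.0, 1.0                                            # tail index and scale (assumptions)
mu = alpha * xm / (alpha - 1)                                   # mean = 4/3
sigma = xm * np.sqrt(alpha / ((alpha - 1) ** 2 * (alpha - 2)))  # standard deviation

n = 10                 # number of summands (arbitrary choice)
reps = 1_000_000       # Monte Carlo replications (arbitrary choice)

# Classical Pareto(alpha, xm) samples: (Lomax sample + 1) * xm
samples = (rng.pareto(alpha, size=(reps, n)) + 1.0) * xm
sums = samples.sum(axis=1)

mc_tail = np.mean(sums > 2 * n * mu)          # empirical P(S_n > 2*n*mu)
clt_tail = norm.sf(np.sqrt(n) * mu / sigma)   # Phi(-sqrt(n)*mu/sigma)

print(f"Monte Carlo estimate of the tail: {mc_tail:.2e}")   # polynomially small (order 1e-4 here)
print(f"Naive CLT extrapolation:          {clt_tail:.2e}")  # exponentially small (about 2e-19)
```

The two outputs differ by many orders of magnitude: the simulated probability is only polynomially small, while the CLT extrapolation is exponentially small, so the absolute error is tiny (as Henry's comment notes) but the relative error is astronomical.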

gmath
    "Large deviation is more precise" No. "Then we know via CLT..." Sorry but we know nothing of the sort of what you write afterwards. As a matter of fact, if we "knew" what you write, every action of every large deviation principle would be quadratic. – Did Oct 29 '14 at 17:46
  • Sorry, I was careless with my writing. I have edited my answer. – gmath Oct 29 '14 at 21:22