
I'm currently messing around with confidence intervals and I can't really understand how a $t$ distribution converges to a normal distribution for large $n$.

For example, suppose we want to construct a $95\%$ confidence interval when we have a sample mean $\bar{X} = 74.8$ and sample standard deviation $S = 1.23$ with $n = 143$.

I would construct the confidence interval using,

$$\left(\bar{X} - z_{1 - \frac{\alpha}{2}} \frac{S}{\sqrt{n}},\; \bar{X} + z_{1 - \frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right)$$

since $n>30$; if $n \le 30$, I would have used a $t$-distribution.
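
For concreteness, here is a quick check in Python (using scipy, which is my own choice here, not part of the problem) comparing the two critical values and the resulting intervals for these numbers:

```python
from scipy import stats

n = 143
xbar, s = 74.8, 1.23                      # sample mean and sample standard deviation
alpha = 0.05

z = stats.norm.ppf(1 - alpha / 2)         # normal critical value, about 1.960
t = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t critical value with 142 df, about 1.977

half_z = z * s / n ** 0.5
half_t = t * s / n ** 0.5
print(f"z-interval: ({xbar - half_z:.3f}, {xbar + half_z:.3f})")
print(f"t-interval: ({xbar - half_t:.3f}, {xbar + half_t:.3f})")
```

The two intervals differ only in the third decimal place, which is what prompted my question.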

My question is: why does the $t$-distribution approach a normal distribution for relatively large $n$?

R.Evet
  • by googling: https://stats.stackexchange.com/questions/110359/why-does-the-t-distribution-become-more-normal-as-sample-size-increases – J Tg Sep 05 '17 at 01:45
  • To say a $t$-distribution is practically the same as a normal distribution if $n>30$ is a confusion. I might go with $n\ge 100.$ I think you're confusing this with something else: the distribution of a sample mean is approximately normal if $n\ge 30,$ and with distributions that aren't very skewed, $n\ge 10$ may be plenty. But these are TWO DIFFERENT THINGS. – Michael Hardy Sep 08 '17 at 03:43
  • https://math.stackexchange.com/q/3240536/321264, https://math.stackexchange.com/q/2246154/321264 – StubbornAtom Feb 25 '20 at 17:24

2 Answers


The $t$ distribution arises because you estimate the population standard deviation $\sigma$ by the sample standard deviation $S$. For smaller $n$, there is a significant chance that $S$ is quite a bit smaller than $\sigma$; thus for fixed $c$ and $n$, there is significant probability that $\overline{X}$ is within $c \sigma$ of $\mu$ but not within $cS$ of $\mu$. (The reverse is possible too, but less likely.) But for large $n$, $S$ is essentially guaranteed to be very close to $\sigma$, because it is a consistent estimator of $\sigma$. And of course, if $S$ is close to $\sigma$, then $\overline{X}$ being within $c\sigma$ of $\mu$ and being within $cS$ of $\mu$ are nearly equivalent events.
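
A minimal simulation sketch of this point (Python with numpy; the sample sizes and the 20% threshold are arbitrary choices of mine) shows how often $S$ falls well below $\sigma$ at each $n$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0  # true population standard deviation

for n in (5, 30, 143, 1000):
    # draw 100,000 samples of size n and compute each sample's S
    samples = rng.normal(0.0, sigma, size=(100_000, n))
    s = samples.std(axis=1, ddof=1)  # ddof=1 gives the n-1 denominator
    # how often does S underestimate sigma by more than 20%?
    print(f"n={n:5d}  sd(S)={s.std():.3f}  P(S < 0.8*sigma)={np.mean(s < 0.8 * sigma):.3f}")
```

For small $n$ the event $S < 0.8\,\sigma$ happens a noticeable fraction of the time, while for $n$ in the hundreds it essentially never does.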

Note that, strictly speaking, you should always use the $t$ distribution for confidence intervals when sampling from a normally distributed population with unknown standard deviation. The resulting interval is just negligibly different from the one constructed with the normal distribution when $n$ is large. How large $n$ needs to be depends on how small a difference you are willing to treat as negligible.
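
To make "negligibly different" concrete, here is a short comparison (Python with scipy assumed) of the $97.5\%$ critical values as $n$ grows:

```python
from scipy import stats

z = stats.norm.ppf(0.975)  # about 1.960
for n in (5, 10, 30, 100, 143, 1000):
    t = stats.t.ppf(0.975, df=n - 1)
    print(f"n={n:5d}  t={t:.4f}  relative excess width={(t - z) / z:.2%}")
```

At $n = 30$ the $t$ interval is still about $4\%$ wider than the $z$ interval; by $n = 143$ the difference is under $1\%$.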

Ian

Suppose $X_1,\ldots,X_n \sim \operatorname{i.i.d. N}(\mu,\sigma^2).$

Let $\overline X = \dfrac{X_1+\cdots+X_n} n.$

Let $S^2 = \dfrac 1 {n-1} \left( (X_1-\overline X)^2+\cdots+(X_n-\overline X)^2 \right). $

Then $\dfrac{\overline X - \mu}{\sigma/\sqrt n} \sim N(0,1)$ and $\dfrac{\overline X - \mu}{S/\sqrt n} \sim t_{n-1}.$ The second one has $S$ where the first has $\sigma.$ If $n$ is large, then the probability that $S$ is close to $\sigma$ is large, so these two random variables are nearly the same.
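
A simulation sketch along these lines (Python with numpy/scipy, my own construction) compares the empirical $97.5\%$ quantile of $\dfrac{\overline X - \mu}{S/\sqrt n}$ with the $t_{n-1}$ and $N(0,1)$ quantiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0

for n in (5, 143):
    x = rng.normal(mu, sigma, size=(200_000, n))
    xbar = x.mean(axis=1)
    s = x.std(axis=1, ddof=1)
    stat = (xbar - mu) / (s / np.sqrt(n))  # the studentized mean
    q = np.quantile(stat, 0.975)           # empirical 97.5% quantile
    print(f"n={n:4d}  empirical={q:.3f}  "
          f"t(df={n - 1})={stats.t.ppf(0.975, n - 1):.3f}  "
          f"normal={stats.norm.ppf(0.975):.3f}")
```

At $n=5$ the empirical quantile tracks $t_4 \approx 2.78$ rather than $1.96$; at $n=143$ all three values nearly coincide.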

  • When $n$ is small, how do I plot the "unstandardized" raw sampling distribution of $\overline{X}$, which we transform to the "standardized" $t$ distribution with mean $0$ and SD $1$? – Parthiban Rajendran Sep 29 '18 at 12:59
  • @Paari : The standard $t$-distribution does not have standard deviation $1$. – Michael Hardy Sep 29 '18 at 15:38
  • I have explained my doubt in detail here. Can you kindly check and enlighten me? I am confused about the original sampling distribution which, when transformed, gives the standard $t$ distribution. – Parthiban Rajendran Sep 29 '18 at 15:45