
I encountered this problem in the book "Introduction to the Theory of Statistics" (by Mood, Graybill and Boes) and I have not been able to solve part (c):

"A bowl contains five chips numbered from 1 to 5. A sample of two drawn without replacement from this finite population is said to be random if all possible pairs of the five chips have an equal chance to be drawn.
(a) What is the expected value of the sample mean? What is the variance of the sample mean?
(b) Suppose that the two chips of part (a) were drawn with replacement. What would be the variance of the sample mean? Why might one guess that this variance would be larger than the one obtained before?
(c) Generalize part (a) by considering N chips and samples of size n. Show that the variance of the sample mean is $$\frac{N-n}{N-1}\frac{\sigma^{2}}{n}$$ where $\sigma^{2}$ is the population variance, that is $$\sigma^{2}=\frac{1}{N}\sum_{i=1}^{N}\Big(i-\frac{N+1}{2}\Big)^{2}$$"
* To solve part (a) I explicitly wrote the set of possible pairs with equal probability: $\Omega=\{(1,2),(1,3),(1,4),(1,5),(2,3),(2,4),(2,5),(3,4),(3,5),(4,5)\}$
From this it is easy to see that $Im\bar{X}=\{1.5,2,2.5,3,3.5,4,4.5\}$ where $\bar{X}=\frac{1}{n}\sum_{i=1}^{n} X_{i}$
Correspondingly, the probabilities of these values are $(0.1,0.1,0.2,0.2,0.2,0.1,0.1)$.
Hence, by definition, the expected value and the variance are: $E[\bar{X}]=3$ and $V[\bar{X}]=\frac{3}{4}$.
* For part (b) the same idea applies, except that with replacement the $25$ ordered pairs are the equally likely outcomes (treating the $15$ unordered pairs as equally likely would wrongly give $\frac{7}{6}$); this yields $E[\bar{X}]=3$ and $V[\bar{X}]=1$. A brute-force check of parts (a) and (b) appears after this list.
* Finally, for part (c) I tried to generalize what I did, noticing that the least possible value of $\sum X_{i}$ is $\frac{n(n+1)}{2}$ and its greatest possible value is $\frac{n(2N-n+1)}{2}$.
Hence $Im\bar{X}=\{\frac{n+1}{2},\frac{n+1}{2}+\frac{1}{n},\frac{n+1}{2}+\frac{2}{n},\dots,\frac{n+1}{2}+(N-n)\}$. Clearly the probability of the first and last values is $\frac{1}{\binom{N}{n}}=\frac{n!(N-n)!}{N!}$, but I haven't come up with a way to find the other probabilities. How can I get the rest of them?
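
For anyone who wants to double-check the small cases above, here is a minimal brute-force enumeration in Python (my own sketch, not part of the original question; `moments` is just a helper name): it lists the equally likely outcomes of each sampling scheme and computes the mean and variance of $\bar X$ directly.

```python
from itertools import combinations, product
from statistics import mean

chips = [1, 2, 3, 4, 5]

def moments(samples):
    """Mean and variance of the sample mean over equally likely samples."""
    means = [mean(s) for s in samples]
    e = sum(means) / len(means)
    v = sum((m - e) ** 2 for m in means) / len(means)
    return e, v

# (a) without replacement: the 10 unordered pairs are equally likely
print(moments(combinations(chips, 2)))    # (3.0, 0.75)

# (b) with replacement: the 25 *ordered* pairs are equally likely
print(moments(product(chips, repeat=2)))  # (3.0, 1.0)
```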

  • This is the variance of the mean of a simple random sample in survey sampling. You can find the proof in an introductory book on sampling. For example, http://home.iitk.ac.in/~shalab/chapter2-simple-random-sampling.pdf – JACKY88 Dec 26 '14 at 13:55

2 Answers


The device that @Did uses will apply to any situation where you are sampling without replacement from a finite population, regardless of what's written on the "chips".

Suppose that $\sigma^2$ is the population variance; that is, if the random variable $X$ is the result of a single draw from the population, then $\operatorname{Var}(X)=\sigma^2$. Now consider drawing a sample of $n$ items $X_1,\ldots,X_n$ without replacement from the population. Since every pair $(X_i,X_j)$ with $i\ne j$ has the same joint distribution, the variance of the sum $S_n:=X_1+\cdots+X_n$ is $$ \operatorname{Var}(S_n)= n\operatorname{Var}(X_1) + (n^2-n)\operatorname{Cov}(X_1,X_2) = n\sigma^2 + n(n-1)c\tag1 $$ where $c$ denotes the covariance between the results of two distinct draws. Formula (1) applies in the case $n=N$ as well, with the extra bonus that $S_N$ is a constant (equal to the sum of all $N$ values in the population). It follows that $$ 0=\operatorname{Var}(S_N)=N\sigma^2+N(N-1)c.\tag2 $$ Solve equation (2) for $$c=-\frac{\sigma^2}{N-1}\tag3$$ and plug back into (1) to obtain $$ \operatorname{Var}(S_n)=n\sigma^2\left(1-\frac{n-1}{N-1}\right)=\frac{N-n}{N-1}\cdot n\sigma^2\tag4 $$ and $$ \operatorname{Var}(\bar X_n)=\frac{N-n}{N-1}\cdot \frac{\sigma^2}n.\tag5 $$ Notice that the only difference between formulas (4) and (5) and the corresponding formulas for sampling with replacement is the factor $\displaystyle\frac{N-n}{N-1}$, the famous correction factor for sampling without replacement.
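
To see the "regardless of what's written on the chips" point concretely, here is a small Python sketch (my addition, not part of the answer; the populations below are arbitrary examples) that checks (5) by enumerating all $\binom{N}{n}$ equally likely samples:

```python
from itertools import combinations

def check(pop, n):
    """Compare Var(sample mean) with (N-n)/(N-1) * sigma^2/n from (5)."""
    pop = list(pop)
    N = len(pop)
    mu = sum(pop) / N
    sigma2 = sum((x - mu) ** 2 for x in pop) / N           # population variance
    means = [sum(s) / n for s in combinations(pop, n)]     # all C(N, n) samples
    e = sum(means) / len(means)
    var = sum((m - e) ** 2 for m in means) / len(means)
    return var, (N - n) / (N - 1) * sigma2 / n

print(check(range(1, 6), 2))            # chips 1..5, n=2 -> (0.75, 0.75)
print(check([2, 3, 5, 7, 11, 13], 4))   # any finite population works
```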

grand_chat

It is time to upgrade the methods you rely on: listing every possibility yields the result for small values of $n$ and $N$, but this approach (1) leads to a dead end for general values (as you realized) and (2) provides no insight.

So... let us attack (c) directly, considering $N$ chips numbered from $1$ to $N$ and samples of size $n$. Then the sample mean $\bar X$ is such that $n\bar X=\sum\limits_{k=1}^nY_k$ where $Y_k$ is the $k$th chip drawn. By hypothesis, each $Y_k$ is uniform on $\{1,2,\ldots,N\}$, hence $E(Y_k)=\frac1N\sum\limits_{x=1}^Nx=\frac12(N+1)$ for every $k$ and, by linearity of the expectation, $$E(\bar X)=\frac{N+1}2.$$ Likewise, $n^2\bar X^2=\sum\limits_{k=1}^nY_k^2+\sum\limits_{k\ne\ell}Y_kY_\ell$, and since $E(Y_k^2)$ does not depend on $k$ and $E(Y_kY_\ell)$ does not depend on $k\ne\ell$, one gets $nE(\bar X^2)=E(Y_1^2)+(n-1)E(Y_1Y_2)$, so it suffices to compute $E(Y_1^2)$ and $E(Y_1Y_2)$. By the same argument as before, $E(Y_1^2)=\frac1N\sum\limits_{x=1}^Nx^2=\frac16(N+1)(2N+1)$. To compute $E(Y_1Y_2)$, one can note that, conditionally on $Y_1=x$, $Y_2$ is uniform on $\{1,2,\ldots,N\}\setminus\{x\}$, and proceed.
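
For completeness, carrying out that conditional computation (this fills in the step left to the reader): given $Y_1=x$, $Y_2$ is uniform on the remaining $N-1$ values, so $E(Y_2\mid Y_1=x)=\frac1{N-1}\Big(\frac{N(N+1)}2-x\Big)$, hence $$E(Y_1Y_2)=E\big(Y_1\,E(Y_2\mid Y_1)\big)=\frac1{N-1}\left(\frac{N(N+1)}2\cdot\frac{N+1}2-\frac{(N+1)(2N+1)}6\right)=\frac{(N+1)(3N+2)}{12},$$ using the factorization $3N^2-N-2=(3N+2)(N-1)$ in the last step. This matches the value obtained by the $n=N$ trick below.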

Or, one can use the specific case $n=N$: then $N\bar X=\sum\limits_{x=1}^Nx$ with full probability, hence $E\big((N\bar X)^2\big)=\frac14N^2(N+1)^2$, that is, $NE(Y_1^2)+N(N-1)E(Y_1Y_2)=\frac14N^2(N+1)^2$, which leads to the same value $E(Y_1Y_2)=\frac1{12}(N+1)(3N+2)$. Finally, $$E(\bar X^2)=\frac{2(N+1)(2N+1)+(N+1)(3N+2)(n-1)}{12n},$$ and the variance follows.
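
A quick numerical cross-check of the two moments (my own sketch; the choice $N=7$, $n=3$ is arbitrary, and any $1\le n\le N$ works):

```python
from itertools import combinations, permutations

N, n = 7, 3  # any 1 <= n <= N

# E(Y1*Y2) over the equally likely ordered pairs of *distinct* chips
pairs = list(permutations(range(1, N + 1), 2))
e12 = sum(a * b for a, b in pairs) / len(pairs)
print(e12, (N + 1) * (3 * N + 2) / 12)   # both are 15.333...

# E(Xbar^2): brute force over all samples vs. the displayed formula
sq = [(sum(s) / n) ** 2 for s in combinations(range(1, N + 1), n)]
brute = sum(sq) / len(sq)
formula = (2 * (N + 1) * (2 * N + 1) + (N + 1) * (3 * N + 2) * (n - 1)) / (12 * n)
print(brute, formula)                    # both are 16.888...
```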

Sanity check: If $n=N$, then $E(\bar X^2)=E(\bar X)^2$ (do you see why?).

Did
  • Hi, thank you very much!! I followed every step and saw that it works. The "sanity check" was very useful for the conclusion (when $n=N$, $\bar{X}$ will always be the same, hence its variance is zero and the second moment equals the squared first moment). Initially I considered using uniform distributions for the values of the chips but hesitated after reading the "without replacement" part. I still do not feel completely sure why this can be done. Do you think you can further illustrate this assumption? (i.e., I thought $X_2$ ranges over $\{1,2,\ldots,N\}\setminus\{X_1\}$) – Jonathan Julián Huerta Dec 30 '14 at 01:30
  • Indeed $X_2$ ranges over $\{1,2,\ldots,N\}\setminus\{X_1\}$, but that describes the conditional distribution of $X_2$ given $X_1$. For the unconditional (marginal) distribution, one computes $P(X_2=x)$ as the sum over every $y$ of $P(X_2=x,X_1=y)$. If $y=x$, this term is zero; for every other $y$ (thus, $N-1$ cases), it equals $\frac1{N(N-1)}$. Summing over $y$ yields $P(X_2=x)=\frac1N$. – Did Dec 30 '14 at 08:35
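
To make that comment concrete, a tiny enumeration sketch in Python (my addition; it just tabulates the $N(N-1)$ equally likely ordered pairs):

```python
from itertools import permutations
from collections import Counter

N = 5
pairs = list(permutations(range(1, N + 1), 2))   # ordered pairs of distinct chips,
                                                 # each with probability 1/(N*(N-1))
marginal = Counter(x2 for _, x2 in pairs)
print({x: count / len(pairs) for x, count in marginal.items()})
# {1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2} -- every chip has P(X2 = x) = 1/N
```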