
From an urn with numbers $1,...,n$ we draw $k < n$ numbers without replacement.

Let $X_i$ be the $i$-th draw. The random variable is their sum $X=\sum_{i=1}^kX_i$.

I have already calculated the expected value of the sum, which is

$$\Bbb{E}[X]=\sum_{i=1}^k\Bbb{E}[X_i]=k\frac{n+1}{2}$$ because each $\Bbb{E}[X_i]=\frac{1}{n}\sum_{j=1}^n j=\frac{n+1}{2}$.
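The expectation can be checked by brute force for small cases. The sketch below enumerates every $k$-subset of $\{1,\dots,n\}$, all of which are equally likely when drawing without replacement, and averages the sums exactly with `fractions.Fraction`. The function name `expected_sum` is hypothetical and used only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def expected_sum(n: int, k: int) -> Fraction:
    """Exact E[X] for the sum of k numbers drawn without replacement from 1..n."""
    # Every k-subset is equally likely, and the order of the draws does not affect the sum.
    subsets = list(combinations(range(1, n + 1), k))
    return Fraction(sum(sum(s) for s in subsets), len(subsets))

for n, k in [(5, 2), (6, 3), (7, 4)]:
    # Compare the enumerated value with the closed form k(n+1)/2.
    print(n, k, expected_sum(n, k), Fraction(k * (n + 1), 2))
```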

Now the variance of the sum would be $$Var[X]=\Bbb{E}[X^2]-\Bbb{E}[X]^2$$

I have read that the variance of a sum is the sum of the variances if the random variables are independent. That does not seem to be the case here, since earlier draws affect later ones.

Is there an elegant way to determine the first summand of the variance?


Edit: I am trying it the ugly way.

$\Bbb{E}[X^2]=\Bbb{E}[(\sum_{i=1}^kX_i)^2]=\Bbb{E}[\sum_{i=1}^k \sum_{j=1}^k X_iX_j]=\sum_{i=1}^k \sum_{j=1}^k \Bbb{E}[X_iX_j]$

To know $\Bbb{E}[X_iX_j]$ we would have to know $\Bbb{P}(X_iX_j=m)$, which means knowing the number of ways to write a number $m$ as the product of two factors $1\leq X_i, X_j \leq n$... I am pretty sure I am off track here, as I do not see a way to do it for a general $n$.
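The distribution of the product is not actually needed. For $i\neq j$, the pair $(X_i,X_j)$ is uniform over the $n(n-1)$ ordered pairs $(a,b)$ with $a\neq b$, so $\mathbb{E}[X_iX_j]$ is simply the average of $ab$ over those pairs. The sketch below computes that average exactly and compares it with $\mathbb{E}[X_i]\mathbb{E}[X_j]=\left(\frac{n+1}{2}\right)^2$. The two differ, which confirms that the draws are dependent. The difference is the covariance of two draws, which works out to $-(n+1)/12$. The helper names `mixed_moment` and `product_of_means` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def mixed_moment(n: int) -> Fraction:
    """Exact E[X_i X_j] for i != j: average of a*b over ordered pairs (a, b) with a != b."""
    pairs = list(permutations(range(1, n + 1), 2))  # the n(n-1) ordered pairs
    return Fraction(sum(a * b for a, b in pairs), len(pairs))

def product_of_means(n: int) -> Fraction:
    """E[X_i] * E[X_j], which would equal E[X_i X_j] if the draws were independent."""
    return Fraction(n + 1, 2) ** 2

n = 5
print(mixed_moment(n), product_of_means(n))   # prints: 17/2 9
print(mixed_moment(n) - product_of_means(n))  # prints: -1/2, i.e. -(n+1)/12
```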


Am I wrong to work with the sums $X$ instead of the individual $X_i$? Two separate draws of $k$ balls would be independent, so then $\Bbb{E}[X^2]=\Bbb{E}[X]\Bbb{E}[X]$.

StubbornAtom
B.Swan
  • Your formula for the expectation is true only if the drawing is done with replacement – Robin Nicole Jul 07 '19 at 20:28
    Linearity of expectation holds even for dependent random variables, and each draw is uniformly distributed. Are you sure? – B.Swan Jul 07 '19 at 20:36
  • You are right – Robin Nicole Jul 07 '19 at 20:53
    Same question: https://math.stackexchange.com/questions/2813390/if-m-tickets-are-drawn-out-of-n-tickets-numbered-1-to-n-find-vx-whe. – StubbornAtom Jul 08 '19 at 17:54
  • Hello! Can you please explain to me why $\mathbb{E}[X_i]=\sum_{j=1}^n j \cdot \frac{1}{n}$? I mean, if we take just one number (I agree that each particular number is taken with probability $\frac{1}{n}$), then the probability of taking any other number changes: it becomes $\frac{1}{n-1}$. I know we can select the first element in many ways, and we can say that 'by symmetry' the probability for each draw is $\frac{1}{n}$, but to me this explanation is not satisfying. Can you please explain why it is so, or provide a link or book where it is explained? – Levon Minasian Dec 02 '20 at 05:19
  • @LevonMinasian $\mathbb{E}[X_i]$ is the expected value for one draw. What you mean is the expected value for the sum, where dependency could play a role, but that is just the sum of the expected values for each draw. As for why that is, you can look at a proof of the expected value of a sum of random variables (e.g. http://www.milefoot.com/math/stat/rv-sums.htm). The main argument is that you can switch the order of summation or integration, giving you the "symmetry". – B.Swan Dec 02 '20 at 08:59
  • @B.Swan thanks for answering. I'll try to explain more precisely what I misunderstood. It is pretty straightforward that $\mathbb{E}[X_1]=\sum_{j=1}^n j \cdot \frac 1 n$. But after the first number is taken from the set, it is not that easy to show that $\mathbb{E}[X_2]=\sum_{j=1}^n j \cdot \frac 1 n$ (although I eventually succeeded :)). It is even harder to show the same for $X_3$. As I see it, you take an arbitrary $i$ and say that $\mathbb{E}[X_i]=\sum_{j=1}^n j \cdot \frac 1 n$, but I don't understand why we can say this for arbitrary $i$. Is it obvious? This part is the scariest. – Levon Minasian Dec 02 '20 at 11:57
  • @B.Swan I've used the following to show this for $X_2$: $\mathbb{E}[X_2]=\sum_{i=1}^n \frac{1}{n} \cdot \sum_{j \ne i}^n j \cdot \frac{1}{n-1}=\frac{1}{n(n-1)} \sum_{i=1}^n \sum_{j \ne i} j=...=\frac {n+1} 2$ (I bet the missing part is obvious to you). As you can see, doing this even for $X_3$ is very hard. This is why I think you used a different argument to explain the identity for arbitrary $X_i$. What was it? – Levon Minasian Dec 02 '20 at 12:06
  • @LevonMinasian I see what you mean. If my argument is correct, then it is clear (intuitively), again by a symmetry argument. Before you start drawing, every number is equally likely to be drawn in any draw. Thus the expected value of the draw should not depend on whether it is the first, second, third, ... draw. The expected values are NOT calculated AFTER you know what the first draw delivered, that would indeed change the expected value, they are calculated BEFORE you even start drawing. – B.Swan Dec 02 '20 at 18:46
  • @B.Swan this one was much better. Thanks! – Levon Minasian Dec 02 '20 at 19:15
    https://math.stackexchange.com/q/972267/321264 – StubbornAtom Apr 03 '21 at 18:48
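B.Swan's symmetry argument in the comments above can be checked by brute force. The sketch below lists every ordered sequence of $k$ distinct draws; all $n!/(n-k)!$ of them are equally likely. It then averages the value found at each position separately. Every position has mean $\frac{n+1}{2}$, which means the unconditional distribution of $X_i$ does not depend on $i$. The helper name `positional_means` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def positional_means(n: int, k: int) -> list[Fraction]:
    """Exact E[X_1], ..., E[X_k] for k ordered draws without replacement from 1..n."""
    sequences = list(permutations(range(1, n + 1), k))  # all equally likely
    return [
        Fraction(sum(seq[pos] for seq in sequences), len(sequences))
        for pos in range(k)
    ]

print(positional_means(6, 4))  # every entry equals (n+1)/2 = 7/2
```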

3 Answers


Let's do it the ugly way. If any of the steps is confusing, let me know in the comments, I'll elaborate.

You have $$\mathbb{E}[X^2] = \sum_{i=1}^k \sum_{j=1}^k \mathbb{E}[X_iX_j] = \sum_{i=1}^k \mathbb{E}[X_i^2]+2\sum_{1\leq i < j\leq k} \mathbb{E}[X_iX_j]$$

The first term is easy to compute: $$ \sum_{i=1}^k \mathbb{E}[X_i^2] = k\cdot \frac{1}{n}\sum_{i=1}^n i^2 = \frac{k(n+1)(2n+1)}{6}\,. $$ The second... is similar. $$\begin{align*} 2\sum_{1\leq i < j\leq k} \mathbb{E}[X_iX_j] &= \binom{k}{2}\cdot \frac{1}{\binom{n}{2}} \sum_{\substack{1\leq i,j\leq n\\ i\neq j}} ij\\ &= \frac{k(k-1)}{n(n-1)}\left( \sum_{1\leq i,j\leq n} ij-\sum_{1\leq i\leq n} i^2 \right) \tag{Can you see why?}\\ &= \frac{k(k-1)}{n(n-1)}\left( \left(\sum_{i=1}^n i\right)^2-\sum_{i=1}^n i^2 \right) \tag{Can you see why?}\\ &= \frac{k(k-1)}{n(n-1)}\left( \left(\frac{n(n+1)}{2}\right)^2-\frac{n(n+1)(2n+1)}{6} \right) \\ &= \frac{k(k-1)}{n(n-1)}\left( \frac{n(n+1)(3n^2-n-2)}{12} \right) \end{align*}$$ so $$\begin{align} \mathbb{E}[X^2] - \mathbb{E}[X]^2 &= \frac{k(n+1)(2n+1)}{6} + \frac{k(k-1)(n+1)(3n^2-n-2)}{12(n-1)} - \frac{k^2(n+1)^2}{4}\\ &= \boxed{\frac{k(n-k)(n+1)}{12}} \end{align}$$
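The final simplification, which the answer leaves implicit, goes through as follows. Since $3n^2-n-2=(3n+2)(n-1)$, the factor $n-1$ cancels from the denominator. After that, every term shares the common factor $\frac{k(n+1)}{12}$:

$$\begin{align*} \mathbb{E}[X^2]-\mathbb{E}[X]^2 &= \frac{k(n+1)(2n+1)}{6} + \frac{k(k-1)(n+1)(3n+2)}{12} - \frac{k^2(n+1)^2}{4}\\ &= \frac{k(n+1)}{12}\Bigl(2(2n+1) + (k-1)(3n+2) - 3k(n+1)\Bigr)\\ &= \frac{k(n+1)}{12}\bigl(4n+2 + 3kn + 2k - 3n - 2 - 3kn - 3k\bigr)\\ &= \frac{k(n-k)(n+1)}{12}. \end{align*}$$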

Sanity checks: the obtained expression is non-negative (good: it's a variance), and equal to $0$ for $k=n$ (good, this makes sense: if we decide to draw all the numbers, the sum is fixed). Moreover, for $k=1$, we do get $(n^2-1)/12$, which is indeed the variance of a uniform r.v. on $\{1,2,\dots,n\}$.
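The boxed formula can also be checked exactly for all small cases, going beyond the three sanity checks. The sketch below computes $\operatorname{Var}[X]$ over all $\binom{n}{k}$ equally likely subsets and compares it with $\frac{k(n-k)(n+1)}{12}$. The case $k=n$ (variance $0$) and the case $k=1$ (variance $(n^2-1)/12$) are both included. The helper names `exact_variance` and `closed_form` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def exact_variance(n: int, k: int) -> Fraction:
    """Exact Var[X] of the sum of k draws without replacement from 1..n."""
    sums = [sum(s) for s in combinations(range(1, n + 1), k)]
    m = len(sums)
    mean = Fraction(sum(sums), m)
    second_moment = Fraction(sum(x * x for x in sums), m)
    return second_moment - mean ** 2

def closed_form(n: int, k: int) -> Fraction:
    """The boxed formula k(n-k)(n+1)/12."""
    return Fraction(k * (n - k) * (n + 1), 12)

for n in range(2, 9):
    for k in range(1, n + 1):
        assert exact_variance(n, k) == closed_form(n, k), (n, k)
print("formula matches for all 2 <= n <= 8, 1 <= k <= n")
```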

Clement C.
  • Thanks a lot! I think that is where I failed, so I better ask: $\frac{1}{{n}\choose{2}}$ is the probability that we pick the pair of factors? And we have ${k}\choose{2}$ pairs of factors at disposal? – B.Swan Jul 07 '19 at 23:03
    @B.Swan Yes, indeed. – Clement C. Jul 07 '19 at 23:10
  • I am not able to understand why $2\sum\limits_{1\leq i < j\leq k} \mathbb{E}[X_iX_j] = \binom{k}{2}\cdot \frac{1}{\binom{n}{2}} \sum\limits_{\substack{1\leq i,j\leq n\\ i\neq j}} ij$. Can you explain where the factors $\binom{k}{2}$ and $\frac1{\binom{n}{2}}$ come from? – Soham Chatterjee Apr 08 '22 at 07:12
    The $\binom{k}{2}$ comes from the fact that, if we take $k$ numbers without replacement, we have $\binom{k}{2}$ pairs of draws $1\leq i<j\leq k$. The rest is the expectation $\mathbb{E}[X_iX_j]$ for one of these pairs. Here the unordered pair $\{X_i,X_j\}$ takes each of the $\binom{n}{2}$ possible values $\{a,b\}$ with $1\leq a,b\leq n$, $a\neq b$, with the same probability $1/\binom{n}{2}$. The factor $2$ is absorbed by summing $ab$ over ordered pairs $a\neq b$, which counts each unordered pair twice. @SohamChatterjee – Clement C. Apr 09 '22 at 08:50
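The counting in these comments can be seen concretely. The sketch below fixes two draw positions $i<j$ and tabulates how often each unordered value pair $\{a,b\}$ appears across all equally likely ordered draws. It finds exactly $\binom{n}{2}$ distinct pairs, each with probability $1/\binom{n}{2}$. The helper name `pair_distribution` is hypothetical, and positions are 0-based in the code.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb

def pair_distribution(n: int, k: int, i: int, j: int) -> dict[frozenset, Fraction]:
    """Distribution of the unordered pair {X_i, X_j} (0-based positions, i != j)."""
    sequences = list(permutations(range(1, n + 1), k))  # all ordered draws, equally likely
    counts = Counter(frozenset((seq[i], seq[j])) for seq in sequences)
    return {pair: Fraction(c, len(sequences)) for pair, c in counts.items()}

n, k = 5, 3
dist = pair_distribution(n, k, 0, 2)
print(len(dist), comb(n, 2))  # prints: 10 10
print(set(dist.values()))     # a single probability, 1/C(n, 2) = 1/10
```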

There is a more elegant way of proving this (this answer writes $N$ for the $n$ of the question). We are trying to evaluate the expectation $$ \mathbb{E}[X^2]=\sum_{i=1}^{k} \sum_{j=1}^k \mathbb{E}[X_i X_j]. $$ We know that $\mathbb{E}[X_i]=(N+1)/2$. For $i\neq j$, given the number $X_j$ drawn on the $j$th draw, the conditional expectation of $X_i$ is $$ \mathbb{E}[X_i \mid X_j]=\frac{1}{N-1}\left( \sum_{m=1}^{N} m - X_j \right)=\frac{1}{N-1}\left( \frac{N(N+1)}{2} - X_j \right), $$ since after the number $X_j$ is drawn there are $N-1$ numbers left, each equally likely to be $X_i$.

By the tower property, $\mathbb{E}[X_iX_j]=\mathbb{E}\bigl[\mathbb{E}[X_i \mid X_j]X_j\bigr]$, so taking the outer expectation with respect to $X_j$ gives $$ \mathbb{E}[\mathbb{E}[X_i \mid X_j]X_j]=\frac{1}{N-1}\left( \frac{N(N+1)}{2}\mathbb{E}[X_j] - \mathbb{E}[X_j^2] \right) $$ with $$ \mathbb{E}[X_j^2]=\frac{1}{N} \sum_{m=1}^{N} m^2 = \frac{(N+1)(2N+1)}{6}. $$ So when $i\neq j$, $$ \mathbb{E}[X_i X_j]=\frac{1}{N-1}\left( \frac{N(N+1)^2}{4} - \frac{(N+1)(2N+1)}{6} \right)=\frac{(N+1)(3N^2-N-2)}{12(N-1)}. $$ When $i=j$, $$ \mathbb{E}[X_i^2]= \frac{(N+1)(2N+1)}{6}. $$

For a fixed $i$, the sum $\sum_{j=1}^k X_i X_j$ has one term with $j=i$ and $k-1$ terms with $j \neq i$. Therefore $$ \sum_{j=1}^{k} \mathbb{E}[X_i X_j] =\frac{(N+1)(2N+1)}{6}+\frac{(k-1)(N+1)(3N^2-N-2)}{12(N-1)}. $$ This does not depend on $i$, so summing over the first index $i$ gives $$ \sum_{i=1}^{k}\sum_{j=1}^{k} \mathbb{E}[X_i X_j]=k \sum_{j=1}^{k} \mathbb{E}[X_i X_j]=\frac{k(N+1)(2N+1)}{6}+\frac{k(k-1)(N+1)(3N^2-N-2)}{12(N-1)}, $$ which is the same expression for $\mathbb{E}[X^2]$ as derived in the answer above.
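The key step here is the conditional mean $\mathbb{E}[X_i\mid X_j=b]=\frac{1}{N-1}\left(\frac{N(N+1)}{2}-b\right)$ for $i\neq j$. The sketch below checks it for two draws by enumerating all ordered pairs and conditioning on the value of the second draw. The helper name `conditional_mean` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def conditional_mean(N: int, b: int) -> Fraction:
    """Exact E[X_1 | X_2 = b] for two draws without replacement from 1..N."""
    # All N(N-1) ordered pairs are equally likely; keep those whose second draw is b.
    firsts = [a for a, second in permutations(range(1, N + 1), 2) if second == b]
    return Fraction(sum(firsts), len(firsts))

N = 6
for b in range(1, N + 1):
    # Formula from the answer: (N(N+1)/2 - b) / (N - 1).
    predicted = Fraction(N * (N + 1), 2 * (N - 1)) - Fraction(b, N - 1)
    print(b, conditional_mean(N, b), predicted)
```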

Ted Black

It's enough to compute the variance of a single draw. Then apply the following general formula, proved here:

Let $X_1, X_2,\ldots,X_k$ be drawn at random without replacement from a finite population of $n$ items, and let $S_k:=X_1+\cdots+X_k$ be their sum. Then $$\operatorname{Var}(S_k)=k\left(\frac{n-k}{n-1}\right)\sigma^2,$$ where $\sigma^2$ is the variance of a single draw.
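The general formula applies to any finite population, including one with repeated values, provided $\sigma^2$ is the population variance (dividing by $n$). The sketch below checks it by exact enumeration. The population used is an arbitrary made-up example, and the helper names `var_sum_without_replacement` and `formula` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def var_sum_without_replacement(population: list[int], k: int) -> Fraction:
    """Exact Var(S_k) when k items are drawn without replacement from the population."""
    # combinations() works on positions, so repeated values are treated as distinct items.
    sums = [sum(s) for s in combinations(population, k)]
    mean = Fraction(sum(sums), len(sums))
    return Fraction(sum(x * x for x in sums), len(sums)) - mean ** 2

def formula(population: list[int], k: int) -> Fraction:
    """k * (n-k)/(n-1) * sigma^2, with sigma^2 the variance of a single draw."""
    n = len(population)
    mu = Fraction(sum(population), n)
    sigma2 = Fraction(sum(x * x for x in population), n) - mu ** 2
    return k * Fraction(n - k, n - 1) * sigma2

population = [2, 3, 3, 7, 10, 10]  # arbitrary example population of n = 6 items
for k in range(1, len(population) + 1):
    print(k, var_sum_without_replacement(population, k), formula(population, k))
```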

For your situation, the population is the list $1,2,\ldots,n$ and a single draw $X$ from this population has distribution $P(X=i)=\frac1n$ for $i=1,\ldots,n$. So calculate $$E(X)=\sum_{i=1}^n i\,P(X=i)\stackrel{(1)}=\frac1n\frac{n(n+1)}2=\frac{n+1}2$$ $$E(X^2)=\sum_{i=1}^ni^2P(X=i)\stackrel{(2)}=\frac1n\frac{n(n+1)(2n+1)}6=\frac{(n+1)(2n+1)}6$$ where step (1) uses the identity for $\sum_{i=1}^ni$ and step (2) uses the identity for $\sum_{i=1}^n i^2$. Finally, after some algebra, conclude the variance of a single draw is $$\operatorname{Var}(X)=E(X^2)-[E(X)]^2=\frac{n^2-1}{12}.$$
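Written out, the "some algebra" step and the final substitution into the general formula are:

$$\operatorname{Var}(X)=\frac{(n+1)(2n+1)}{6}-\frac{(n+1)^2}{4}=\frac{(n+1)\bigl(2(2n+1)-3(n+1)\bigr)}{12}=\frac{(n+1)(n-1)}{12}=\frac{n^2-1}{12},$$

$$\operatorname{Var}(S_k)=k\left(\frac{n-k}{n-1}\right)\frac{(n-1)(n+1)}{12}=\frac{k(n-k)(n+1)}{12},$$

which agrees with the boxed result in the first answer.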

grand_chat