3

How could I find the expected value of the sum of the elements of a subset of $\{1,2,\ldots,n\}$ given that the elements must be distinct and the subset must be of size $k$, selected at random with $k<n$, with all integers in $\{1,2,\ldots,n\}$ having equal probability of being chosen.

4 Answers4

3

A "different" why to see that: we can call the chosen of some number of the set as some random variable $X_i$, and we can define the random variable of the sum as

$$X=\sum_{i=1}^{k}X_i$$

Then we have that

$$\Bbb E[X]=\Bbb E\left[\sum_{i=1}^{k}X_i\right]=\sum_{i=1}^{k}\Bbb E[X_i]$$

But we have that the $X_i$ are not independent but anyway they expected value is the same because

$$\Bbb E[X_i]=\sum_{x_1,x_2,...,x_i}x_i\Pr[X_i=x_i|X_1=x_1,X_2=x_2,...,X_{i-1}=x_{i-1}]\cdot\Pr[X_1=x_1,X_2=x_2,...,X_{i-1}=x_{i-1}]$$

where

$$\Pr[X_i=x_i|X_1=x_1,X_2=x_2,...,X_{i-1}=x_{i-1}]=\begin{cases}\frac1{n-i+1}\quad&\text{if }x_i\notin\{x_1,...,x_{i-1}\}\\0\quad&\text{if }x_i\in\{x_1,...,x_{i-1}\}\end{cases}$$

and

$$\Pr[X_1=x_1,X_2=x_2,...,X_{i-1}=x_{i-1}]=\frac1{(n)_{i-1}}$$

where $(a)_b$ is a falling factorial. So for any $x_i$ value we will have a limited number of (i-1)-tuplas $(x_1,x_2,...,x_{i-1})$ where probability is not zero. The number of these (i-1)-tuplas is $(n-1)_{i-1}$.

Then we have

$$\Bbb E[X_i]=\sum_{k=1}^{n}\frac{(n-1)_{i-1}}{(n)_{i-1}}k\frac{1}{n-i+1}=\frac1n\sum_{k=1}^{n}k=\frac{n+1}{2}$$

Then finally

$$\Bbb E[X]=\sum_{j=1}^{k}\Bbb E[X_j]=k\frac{n+1}{2}$$

Masacroso
  • 30,417
2

If each integer in $\{1,2,\dots,n\}$ has an equal probability of being selected, then the expected value of the sum of $k$ of them is $$ \frac{k(n+1)}2 $$ This is because the expected value of one of them is $\frac1n\frac{n(n+1)}2$ and the expected value of the sum of $k$ of them is $k$ times the expected value of one (Linearity of Expectation).

robjohn
  • 345,667
1

So, it seems that we are selecting $k$ items from $n$ without repetition or bias.

$$\mathsf E(X) = ... = \sum_{i=1}^n \dfrac{i\cdot k}{n}= \tfrac 12 k(n+1)$$

Can you figure out why?

Graham Kemp
  • 129,094
1

Comment: Couldn't resist simulation (correct to two or three significant digits).

 m = 10^6;  n = 20;  k = 10;  x = numeric(m)
 for (i in 1:m) { x[i] = sum(sample(1:n, k)) }
 mean(x);  sd(x)
 ## 105.0115
 ## 13.22128
 k*(n+1)/2
 ## 105

Now, can @GrahamKemp and @robjohn's method be used to get $E(X^2)$ and hence $V(X)?$

Added in view of Comment from @r.e.s. (for which, thanks):

 v = var(1:20)*(n-1)/n  # adj for pop. var; 'var' is sample var
 sqrt(v*k*(n-k)/(n-1))
 ##  13.22876
BruceET
  • 51,500