How can I find the expected value of the sum of the elements of a subset of $\{1,2,\ldots,n\}$ of size $k$ (with $k<n$), where the elements must be distinct and the subset is selected at random, with every integer in $\{1,2,\ldots,n\}$ having an equal probability of being chosen?
-
You need to specify the distribution, i.e., how you pick the elements. – iiivooo Mar 25 '16 at 03:11
-
How are these subsets selected? – Graham Kemp Mar 25 '16 at 03:11
-
If any other clarifications are needed, please do mention them. – Daniel Lee Mar 25 '16 at 03:12
-
"at random" is ambiguous. Does each integer in $\{1,2,\dots,n\}$ have an equal probability of being chosen? – robjohn Mar 25 '16 at 03:15
4 Answers
A "different" way to see this: let $X_i$ be the random variable giving the $i$-th number chosen, and define the random variable of the sum as
$$X=\sum_{i=1}^{k}X_i$$
Then we have that
$$\Bbb E[X]=\Bbb E\left[\sum_{i=1}^{k}X_i\right]=\sum_{i=1}^{k}\Bbb E[X_i]$$
The $X_i$ are not independent, but they all have the same expected value, because
$$\Bbb E[X_i]=\sum_{x_1,x_2,...,x_i}x_i\Pr[X_i=x_i|X_1=x_1,X_2=x_2,...,X_{i-1}=x_{i-1}]\cdot\Pr[X_1=x_1,X_2=x_2,...,X_{i-1}=x_{i-1}]$$
where
$$\Pr[X_i=x_i|X_1=x_1,X_2=x_2,...,X_{i-1}=x_{i-1}]=\begin{cases}\frac1{n-i+1}\quad&\text{if }x_i\notin\{x_1,...,x_{i-1}\}\\0\quad&\text{if }x_i\in\{x_1,...,x_{i-1}\}\end{cases}$$
and
$$\Pr[X_1=x_1,X_2=x_2,...,X_{i-1}=x_{i-1}]=\frac1{(n)_{i-1}}$$
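where $(a)_b$ is the falling factorial; this holds for distinct $x_1,\ldots,x_{i-1}$ (the probability is zero otherwise). So for each value $x_i$, only a limited number of $(i-1)$-tuples $(x_1,x_2,\ldots,x_{i-1})$ give a nonzero probability: those with distinct entries that do not include $x_i$. There are $(n-1)_{i-1}$ of them.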
Then we have
$$\Bbb E[X_i]=\sum_{m=1}^{n}\frac{(n-1)_{i-1}}{(n)_{i-1}}\,m\,\frac{1}{n-i+1}=\sum_{m=1}^{n}\frac{n-i+1}{n}\cdot\frac{m}{n-i+1}=\frac1n\sum_{m=1}^{n}m=\frac{n+1}{2}$$
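The cancellation $\frac{(n-1)_{i-1}}{(n)_{i-1}}\cdot\frac{1}{n-i+1}=\frac1n$ can be spot-checked numerically. A minimal R sketch, where `falling()` is a hypothetical helper for $(a)_b$ and $n=6$ is an arbitrary illustrative choice:

```r
# Hypothetical helper: falling factorial (a)_b = a (a-1) ... (a-b+1).
# seq_len(0) is empty and prod() of an empty vector is 1, so (a)_0 = 1.
falling <- function(a, b) prod(a - seq_len(b) + 1)

n <- 6  # illustrative size, not taken from the answer

# Weight that multiplies each value m in the sum for E[X_i], for i = 1..n:
# (n-1)_{i-1} / (n)_{i-1} * 1 / (n - i + 1)
weights <- sapply(1:n, function(i) falling(n - 1, i - 1) / falling(n, i - 1) / (n - i + 1))
print(weights)  # every entry is 1/n, up to floating-point rounding
```

Since the weight does not depend on $i$, every position has the same mean.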
Then finally
$$\Bbb E[X]=\sum_{j=1}^{k}\Bbb E[X_j]=k\frac{n+1}{2}$$
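As a sanity check of this argument, one can enumerate every ordered draw without replacement for small illustrative values ($n=5$, $k=3$, chosen here, not in the answer) and average each position exactly. A brute-force R sketch:

```r
n <- 5  # illustrative values
k <- 3

# All ordered k-tuples from 1:n, then keep only rows with distinct entries.
# Each surviving row is one equally likely ordered draw; there are (n)_k of them.
draws <- as.matrix(do.call(expand.grid, rep(list(1:n), k)))
distinct <- apply(draws, 1, function(r) length(unique(r)) == k)
draws <- draws[distinct, , drop = FALSE]

nrow(draws)           # (n)_k = 5 * 4 * 3 = 60 ordered draws
colMeans(draws)       # E[X_1], ..., E[X_k]: each is (n + 1) / 2 = 3
mean(rowSums(draws))  # E[X] = k (n + 1) / 2 = 9
```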

If each integer in $\{1,2,\dots,n\}$ has an equal probability of being selected, then the expected value of the sum of $k$ of them is $$ \frac{k(n+1)}2 $$ This is because the expected value of one of them is $\frac1n\frac{n(n+1)}2$ and the expected value of the sum of $k$ of them is $k$ times the expected value of one (Linearity of Expectation).
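The formula can also be confirmed exactly, without simulation, by averaging the sum over all $\binom nk$ equally likely subsets. A short R sketch using `combn()`; the $(n,k)$ pairs are illustrative choices:

```r
# Exact mean of the subset sum over all choose(n, k) subsets.
# combn(n, k, FUN = sum) applies sum() to every k-subset of 1:n.
exact_mean <- function(n, k) mean(combn(n, k, FUN = sum))

cases <- data.frame(n = c(5, 10, 12), k = c(2, 4, 7))  # illustrative pairs
cases$exact   <- mapply(exact_mean, cases$n, cases$k)
cases$formula <- cases$k * (cases$n + 1) / 2
print(cases)  # the exact and formula columns agree
```

Enumeration grows like $\binom nk$, so this is only practical for small $n$; the simulation further down handles larger cases.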

So, it seems that we are selecting $k$ items from $n$ without repetition or bias.
$$\mathsf E(X) = ... = \sum_{i=1}^n \dfrac{i\cdot k}{n}= \tfrac 12 k(n+1)$$
Can you figure out why?
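The factor $k/n$ in each term is the probability that a given integer $i$ ends up in the subset. A small R check of that probability by enumeration, assuming illustrative values $n=10$, $k=4$:

```r
n <- 10; k <- 4          # illustrative values
subsets <- combn(n, k)   # one k-subset per column, choose(n, k) columns

# For each i, the fraction of subsets that contain i (its inclusion probability)
incl <- sapply(1:n, function(i) mean(colSums(subsets == i)))
print(incl)              # each entry is k/n = 0.4
sum((1:n) * incl)        # sum of i * k/n, i.e. k (n + 1) / 2 = 22
```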

-
Ah yes, thank you. I was just unsure of whether my answer was correct. – Daniel Lee Mar 25 '16 at 03:18
-
Tip: If you have a solution you wish verified, post it with the (solution-verification) tag. – Graham Kemp Mar 25 '16 at 03:20
Comment: Couldn't resist a simulation (accurate to two or three significant digits).
```r
m = 10^6; n = 20; k = 10; x = numeric(m)
for (i in 1:m) { x[i] = sum(sample(1:n, k)) }
mean(x); sd(x)
## 105.0115
## 13.22128
k*(n+1)/2
## 105
```
Now, can @GrahamKemp's and @robjohn's method be used to get $E(X^2)$ and hence $V(X)$?
Added in view of the comment from @r.e.s. (thanks):
```r
v = var(1:n)*(n-1)/n   # population variance of 1:n ('var' gives the sample variance)
sqrt(v*k*(n-k)/(n-1))
## 13.22876
```

-
Getting the variance is a special case of the answer posted to Finding variance of the sample mean of a random sample of size n without replacement from finite population of size N. In the present notation, it turns out that $V(X) = k \frac{n-k}{n-1}\sigma^2$, where $\sigma^2$ is the variance of the population $\{1,\ldots,n\}$. – r.e.s. Mar 25 '16 at 14:26
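The variance formula can be checked exactly for small cases by enumerating all subsets. A minimal R sketch, assuming illustrative values $n=10$, $k=4$; it uses population variances (mean squared deviation) throughout, since R's `var()` returns the sample variance:

```r
n <- 10; k <- 4                            # illustrative values
sums <- combn(n, k, FUN = sum)             # sum of every k-subset, all equally likely
exact_var <- mean((sums - mean(sums))^2)   # population variance of X over all subsets
sigma2 <- mean(((1:n) - mean(1:n))^2)      # population variance of {1, ..., n}: (n^2 - 1) / 12
formula_var <- k * (n - k) / (n - 1) * sigma2
c(exact = exact_var, formula = formula_var)  # both equal 22
```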