There are $2^n$ tickets in a jar. The number of tickets bearing the number $i$ is ${n \choose i}$, where $i=0,1,2,\ldots,n$.

$m$ tickets are drawn randomly without replacement.

Let $S$ be the sum of the numbers drawn. Find $E(S)$ and the variance of $S$.

I can't find a way to approach this problem.

Furrane

4 Answers

This problem is a type of coupon collector without replacement where there are ${n\choose j}$ tickets of type $j$ and we ask about the expectation of the sum of the ticket values after $m$ tickets have been drawn. Using the methodology from the following two MSE links we find that the EGF by multiplicity of a set of coupons of type $j$ is given by

$$\sum_{k=0}^{n\choose j} {n\choose j}^{\underline{k}} \frac{z^k}{k!} = (1+z)^{n\choose j}.$$

Distributing all $n+1$ types of coupons (the values $j=0,1,\ldots,n$) we get

$$\prod_{j=0}^n (1+z)^{n\choose j} = (1+z)^{2^n}$$

for a total count according to multiplicity of

$$m! [z^m] (1+z)^{2^n} = m! \times {2^n\choose m}.$$

Marking the contribution of a ticket of type $j$ with $u^j$ we obtain the mixed generating function

$$G(z, u) = \prod_{j=0}^n (1+u^j z)^{n\choose j}.$$

Differentiate and evaluate at $u=1$ to obtain

$$\left.\frac{\partial}{\partial u} G(z, u)\right|_{u=1} \\ = \left. \prod_{j=0}^n (1+u^j z)^{n\choose j} \sum_{j=0}^n (1+u^j z)^{-{n\choose j}} {n\choose j} (1+u^j z)^{{n\choose j}-1} j u^{j-1} z \right|_{u=1} \\ = (1+z)^{2^n} \sum_{j=1}^n {n\choose j} \frac{jz}{1+z} = z (1+z)^{2^n-1} \sum_{j=1}^n j {n\choose j} \\ = z (1+z)^{2^n-1} \sum_{j=1}^n n {n-1\choose j-1} = n 2^{n-1} z (1+z)^{2^n-1}.$$

Extracting coefficients we thus obtain for the expectation of the sum

$$\mathrm{E}[S] = {2^n\choose m}^{-1} n 2^{n-1} {2^n-1\choose m-1} = n 2^{n-1} \frac{m}{2^n} = \frac{1}{2} nm.$$

Continuing with the variance we evidently require the second factorial moment. Differentiating twice we get three components, the first is

$$\left. \prod_{j=0}^n (1+u^j z)^{n\choose j} \sum_{j=0}^n (1+u^j z)^{-{n\choose j}} {n\choose j} (1+u^j z)^{{n\choose j}-1} j(j-1) u^{j-2} z \right|_{u=1} \\ = z (1+z)^{2^n-1} \sum_{j=2}^n j(j-1) {n\choose j} \\ = z (1+z)^{2^n-1} \sum_{j=2}^n n(n-1) {n-2\choose j-2} = n(n-1) 2^{n-2} z (1+z)^{2^n-1}.$$

The second is

$$\left. \prod_{j=0}^n (1+u^j z)^{n\choose j} \sum_{j=0}^n (1+u^j z)^{-{n\choose j}} {n\choose j} \left({n\choose j}-1\right) (1+u^j z)^{{n\choose j}-2} j^2 u^{2j-2} z^2 \right|_{u=1} \\ = z^2 (1+z)^{2^n-2} \sum_{j=1}^n j^2 {n\choose j} \left({n\choose j}-1\right).$$

The third is

$$\left. 2\prod_{j=0}^n (1+u^j z)^{n\choose j} \sum_{j=0}^n (1+u^j z)^{-{n\choose j}} {n\choose j} (1+u^j z)^{{n\choose j}-1} j u^{j-1} z \\ \times \sum_{k=j+1}^n (1+u^k z)^{-{n\choose k}} {n\choose k} (1+u^k z)^{{n\choose k}-1} k u^{k-1} z \right|_{u=1} \\ = 2 z^2 (1+z)^{2^n-2} \sum_{j=0}^n {n\choose j} j \sum_{k=j+1}^n {n\choose k} k.$$

The sums multiplying $z^2 (1+z)^{2^n-2}$ in these last two components may be combined, and we get

$$-\sum_{j=1}^n j^2 {n\choose j} + \left(\sum_{j=1}^n j {n\choose j} \right)^2 \\ = -\sum_{j=1}^n j(j-1) {n\choose j} - \sum_{j=1}^n j {n\choose j} + \left(n \sum_{j=1}^n {n-1\choose j-1} \right)^2 \\ = - n(n-1)\sum_{j=2}^n {n-2\choose j-2} - n \sum_{j=1}^n {n-1\choose j-1} + n^2 2^{2n-2} \\ = - n(n-1) 2^{n-2} - n 2^{n-1} + n^2 2^{2n-2} = n^2 2^{2n-2} - n(n+1) 2^{n-2}.$$
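As a gloss on the first line of this computation, write $c_j = {n\choose j}$ (a shorthand introduced here only for brevity). The sum from the second component supplies the squared terms and the sum from the third supplies the cross terms, so together they form a perfect square minus a correction:

$$\sum_{j=1}^n j^2 c_j (c_j - 1) + 2\sum_{0\le j<k\le n} jk \, c_j c_k = \left(\sum_{j=1}^n j c_j\right)^2 - \sum_{j=1}^n j^2 c_j.$$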

Extracting coefficients we get for the second factorial moment

$$\frac{1}{4} n(n-1)m + (n^2 2^{2n-2} - n(n+1) 2^{n-2}) \frac{m(m-1)}{2^n(2^n-1)}$$

or alternatively

$$\mathrm{E}[S(S-1)] = \frac{1}{4} n(n-1)m + \frac{1}{4} \frac{m(m-1)}{2^n-1} (n^2 2^{n} - n(n+1)).$$

Finally recall that

$$\mathrm{Var}[S] = \mathrm{E}[S(S-1)] + \mathrm{E}[S] - \mathrm{E}[S]^2$$

so the answer to the problem posed by the OP is

$$\bbox[5px,border:2px solid #00A000]{ \mathrm{E}[S] = \frac{1}{2} nm}$$

and

$$\bbox[5px,border:2px solid #00A000]{ \mathrm{Var}[S] = \frac{1}{4} n(n+1)m + \frac{1}{4} \frac{m(m-1)}{2^n-1} (n^2 2^{n} - n(n+1)) - \frac{1}{4} n^2 m^2.}$$

As a sanity check, when $m=2^n$ all coupons have been drawn and the sum is deterministic, so that

$$\mathrm{E}[S] = \sum_{j=0}^n j {n\choose j} = n \sum_{j=1}^n {n-1\choose j-1} = n 2^{n-1} = \frac{1}{2} n m$$

and the check goes through.
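The same kind of check can be applied to the variance. A short computation, using nothing beyond the boxed formula, suggests that it collects into a single fraction:

$$\mathrm{Var}[S] = \frac{mn}{4} \cdot \frac{(n+1)(2^n-1) + (m-1)(n 2^n - n - 1) - nm(2^n-1)}{2^n-1} = \frac{mn}{4} \cdot \frac{2^n-m}{2^n-1}.$$

In this form the variance vanishes at $m=2^n$, as it must when the sum is deterministic, and reduces to $n/4$ at $m=1$, which is the variance of a single ticket.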

Since this problem requires careful algebra, I also coded a simulation of the coupon collector featured here; it was in excellent agreement with the formulas on all values that were tested (it outputs the first and second factorial moments). Some optimizations are still possible and are left as an exercise to the reader.

#define _XOPEN_SOURCE 600 /* exposes srand48/drand48 under strict -std= modes */
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <time.h>
#include <string.h>

int main(int argc, char **argv)
{
  int n = 4 , m = 2, trials = 1000; 

  if(argc >= 2){
    n = atoi(argv[1]);
  }

  if(argc >= 3){
    m = atoi(argv[2]);
  }

  if(argc >= 4){
    trials = atoi(argv[3]);
  }

  assert(1 <= n);
  assert(1 <= m && m <= 1 << n);
  assert(1 <= trials);

  int all = 1 << n;
  int bincfs[n+1];

  bincfs[0] = 1;
  for(int k = 1; k <= n; k++)
    bincfs[k] = bincfs[k-1]*(n+1-k)/k;

  srand48(time(NULL));
  long long data = 0, dataV = 0;

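  for(int tind = 0; tind < trials; tind++){
    /* Fill the jar: bincfs[k] tickets carry the number k. */
    int src[1 << n];

    int srcpos = 0;
    for(int k = 0; k <= n; k++)
      for(int r = 0; r < bincfs[k]; r++)
        src[srcpos++] = k;

    int steps = 0; int sum = 0;

    /* Draw m tickets without replacement. */
    while(steps < m){
      /* Uniform index among the all-steps tickets still in the jar. */
      int cpidx = drand48() * (double)(all-steps);
      int coupon = src[cpidx];

      /* Remove the drawn ticket by shifting the tail down one slot. */
      for(int cind=cpidx; cind < all-steps-1; cind++)
        src[cind] = src[cind+1];

      steps++;
      sum += coupon;
    }

    data += sum;
    /* Widen before multiplying so sum*(sum-1) cannot overflow int. */
    dataV += (long long)sum*(sum-1);
  }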

  long double
    fm1 = (long double)data/(long double)trials,
    fm2 = (long double)dataV/(long double)trials;

  printf("[n = %d, m = %d, trials = %d]: %Le, %Le\n", 
         n, m, trials, fm1, fm2);

  exit(0);
}
Marko Riedel

Can you find the expectation of the value of one ticket? Now use the linearity of expectation. Can you find the variance of the value of one ticket? Look up the definition. What is the variance of a sum?

Ross Millikan

You can easily deduce that, with $X$ denoting the number on a drawn ticket: $$\forall i \in \{0,1,\ldots,n\}, \quad P(X = i) = {{n\choose i}\over 2^n}$$

From there, you can apply the formula for expectation:

$$E(X) = \sum_{i=0}^n P(X=i)\cdot i=\sum_{i=0}^n {{n\choose i}\over 2^n}\cdot i$$

Then the formula for variance:

$$V(X) = E[(X-E(X))^2]$$

Furrane
  • Since tickets are not replaced, is this the expectation for the value of the first ticket only? – dEmigOd Jul 18 '17 at 05:02
  • You're absolutely right @dEmigOd, and since the draws aren't equiprobable we can't just say $E(S)=m\cdot E(X)$.

    Looks like I missed the point of the author's post :/

    – Furrane Jul 18 '17 at 05:58
  • If $X_i$ denotes the number appearing on the $i$th ticket $(i=1,2,\ldots,m)$, then the sum of the numbers on the tickets drawn is $X=\sum_{i=1}^mX_i$. The probability distribution of $X_i$ is precisely what you have stated, i.e., $P(X_i=k)=\frac{1}{2^n}{\binom{n}{k}}$ for $k=0,1,\ldots,n$. So the problem can definitely be solved using this. – StubbornAtom Jul 22 '17 at 12:10
  • My computation is right for the first draw, but since it's without replacement and the tickets are not equiprobable I don't think we can conclude. – Furrane Jul 22 '17 at 12:21

Without resorting to generating functions we can do the following:

Let $X_i$ denote the number on the $i$th ticket for $i=1,2,...,m$, so that $S=\sum\limits_{i=1}^mX_i$.

The pmf of $X_i$ for all $i=1,2,\ldots,m$ is then $$P(X_i=k)=\begin{cases}\frac{1}{2^n}\binom{n}{k}, & \text{if } k=0,1,\ldots,n\\ 0, & \text{otherwise}\end{cases}$$

Since $\operatorname{E}(X_i)=\frac{n}{2}$, linearity of expectation (which holds even though the draws are dependent) gives $$\operatorname{E}(S)=\frac{mn}{2}$$

Again $\operatorname{E}(X_i^2)=\frac{n(n+1)}{4}$, giving $$\operatorname{Var}(X_i)=\operatorname{E}(X_i^2)-(\operatorname{E}(X_i))^2=\frac{n}{4}$$
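For completeness, the second moment quoted above can be obtained from the standard binomial identities $\sum_k k\binom{n}{k}=n2^{n-1}$ and $\sum_k k(k-1)\binom{n}{k}=n(n-1)2^{n-2}$:

$$\operatorname{E}(X_i^2)=\frac{1}{2^n}\sum_{k=0}^n\left(k(k-1)+k\right)\binom{n}{k}=\frac{n(n-1)2^{n-2}+n2^{n-1}}{2^n}=\frac{n(n-1)}{4}+\frac{n}{2}=\frac{n(n+1)}{4}.$$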

The calculation of variance of $S$ is a bit more involved.

\begin{align} \operatorname{Var}(S)&=\sum_{i=1}^m\operatorname{Var}(X_i)+2\sum_{i<j}\operatorname{Cov}(X_i,X_j) \\&=m\cdot\frac{n}{4}+2\binom{m}{2}\rho\cdot\frac{n}{4} \\&=\frac{mn}{4}\left(1+(m-1)\rho\right)\tag{1} \end{align}

Here $\rho$ is the correlation between $X_i$ and $X_j$, which by symmetry is the same for every pair $i\ne j$.

Now observe that $$\operatorname{Var}\left(\sum_{i=1}^{\color{green}{2^n}}X_i\right)=0\tag{2}$$ since the sum within parentheses is a constant: every ticket in the jar has been drawn.

Moreover the joint distribution of $(X_i,X_j)$ for all $i\ne j$ is independent of $m$.

So we can replace $m$ by $2^n$ in $(1)$ to get from $(2)$:

$$\frac{n2^n}{4}(1+(2^n-1)\rho)=0$$

That is, $$\rho=\frac{1}{1-2^n}$$

Substituting the value of $\rho$ in $(1)$, we get $$\operatorname{Var}(S)=\frac{mn}{4}\left(1+\frac{m-1}{1-2^n}\right)$$
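For small $n$ the result can be checked exactly instead of by simulation. The following is a minimal sketch, not part of the original answer: it labels the $2^n$ tickets by the integers $0,\ldots,2^n-1$ and gives ticket $t$ the value `popcount(t)`, which yields exactly $\binom{n}{k}$ tickets of value $k$. The function name `check_moments` and the bound $n\le 4$ are assumptions made for this illustration. It enumerates every $m$-subset of the jar and compares the exact moments, in integer arithmetic, with $\operatorname{E}(S)=\frac{mn}{2}$ and $\operatorname{Var}(S)=\frac{mn}{4}\cdot\frac{2^n-m}{2^n-1}$, which is the formula above written over a common denominator.

```c
#include <assert.h>

/* Number of set bits in x. */
static int popcount(unsigned long x)
{
  int c = 0;
  for(; x; x &= x - 1) c++;
  return c;
}

/*
 * Exhaustive check for one pair (n, m); returns 1 if both
 *   E(S)   = mn/2
 *   Var(S) = mn(2^n - m) / (4(2^n - 1))
 * hold exactly, and 0 otherwise.
 * Hypothetical helper: restricted to n <= 4 so that all 2^(2^n)
 * subsets can be enumerated and every product fits in long long.
 */
int check_moments(int n, int m)
{
  assert(1 <= n && n <= 4);
  assert(1 <= m && m <= (1 << n));

  int all = 1 << n;                       /* number of tickets, 2^n */
  long long count = 0, s1 = 0, s2 = 0;    /* #subsets, sum S, sum S^2 */

  for(unsigned long mask = 0; mask < (1UL << all); mask++){
    if(popcount(mask) != m) continue;     /* keep only m-subsets */

    int sum = 0;
    for(int t = 0; t < all; t++)
      if((mask >> t) & 1UL)
        sum += popcount((unsigned long)t); /* ticket t has value popcount(t) */

    count++;
    s1 += sum;
    s2 += (long long)sum * sum;
  }

  /* E(S) = s1/count, so E(S) = mn/2 iff 2*s1 == n*m*count. */
  int mean_ok = (2 * s1 == (long long)n * m * count);

  /* Var(S) = (count*s2 - s1^2)/count^2; cross-multiply to stay in integers. */
  long long lhs = 4LL * (all - 1) * (count * s2 - s1 * s1);
  long long rhs = (long long)m * n * (all - m) * count * count;

  return mean_ok && lhs == rhs;
}
```

Calling `check_moments(n, m)` from a small `main` for each admissible pair gives an exact confirmation for $n\le 4$. For example, with $n=2$, $m=2$ the six possible pairs have sums $1,1,2,2,3,3$, so $\operatorname{E}(S)=2$ and $\operatorname{Var}(S)=\frac{2}{3}$, in agreement with the formula.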

StubbornAtom