
Consider the usual generalization of the coupon collector problem where $m$ copies of each coupon need to be collected. Let $T_m$ be the first time $m$ copies of each coupon are collected. Donald J. Newman and Lawrence Shepp showed that $$\mathbf{E} (T_{m})=n\log n+(m-1)n\log \log n+O(n),\ {\text{as}}\ n\to \infty .$$

Then Erdős and Rényi showed that:

$$\displaystyle \operatorname {P} {\bigl (}T_{m}<n\log n+(m-1)n\log \log n+cn{\bigr )}\to e^{-e^{-c}/(m-1)!},\ \ {\text{as}}\ n\to \infty .$$

But these statements are asymptotic. Are there known finite-sample (upper and lower) bounds on $T_m$?

  • Relevant: https://math.stackexchange.com/questions/151267/coupon-collector-problem-for-collecting-set-k-times – Clement C. Sep 12 '17 at 12:27
  • (and e.g., Section 4 of http://www.brynmawr.edu/math/people/anmyers/PAPERS/SIGEST_Coupons.pdf for the pgf) – Clement C. Sep 12 '17 at 12:29
  • The above link has gone dead; here's an archived version: http://web.archive.org/web/20110307091459/https://www.brynmawr.edu/math/people/anmyers/PAPERS/SIGEST_Coupons.pdf. – joriki Jan 04 '24 at 14:35

1 Answer


What follows is a minor contribution where we compute a formula for the expectation for the case where $j$ instances of each of $n$ types of coupons must be seen. Using the notation from the following MSE link we have from first principles that

$$P[T = m] = \frac{1}{n^m}\times {n\choose 1}\times (m-1)! [z^{m-1}] \left(\exp(z) - \sum_{q=0}^{j-1} \frac{z^q}{q!}\right)^{n-1} \frac{z^{j-1}}{(j-1)!} \\ = \frac{n}{n^m}\times {m-1\choose j-1} (m-j)! [z^{m-j}] \left(\exp(z) - \sum_{q=0}^{j-1} \frac{z^q}{q!}\right)^{n-1}.$$

We verify that this is a probability distribution. The goal here is to find a closed form for the infinite series in $m$ so that its value may be calculated rather than approximated. Expanding the power we find

$$\sum_{m\ge j} P[T=m] = \frac{n}{n^j} \sum_{m\ge 0} \frac{1}{n^m} \times {m+j-1\choose j-1} m! [z^m] \sum_{k=0}^{n-1} {n-1\choose k} \exp(kz) \\ \times (-1)^{n-1-k} \left(\sum_{q=0}^{j-1} \frac{z^q}{q!}\right)^{n-1-k} \\ = \frac{n}{n^j} \sum_{m\ge 0} \frac{1}{n^m} \times {m+j-1\choose j-1} m! \sum_{k=0}^{n-1} {n-1\choose k} \sum_{p=0}^m \frac{k^{m-p}}{(m-p)!} \\ \times (-1)^{n-1-k} [z^p] \left(\sum_{q=0}^{j-1} \frac{z^q}{q!}\right)^{n-1-k} \\ = \frac{n}{n^j} \sum_{k=0}^{n-1} {n-1\choose k} (-1)^{n-1-k} \sum_{p\ge 0} [z^p] \left(\sum_{q=0}^{j-1} \frac{z^q}{q!}\right)^{n-1-k} \\ \times \sum_{m\ge p}\frac{1}{n^m} \times {m+j-1\choose j-1} m! \times \frac{k^{m-p}}{(m-p)!} \\ = \frac{n}{n^j} \sum_{k=0}^{n-1} {n-1\choose k} (-1)^{n-1-k} \sum_{p\ge 0} n^{-p} [z^p] \left(\sum_{q=0}^{j-1} \frac{z^q}{q!}\right)^{n-1-k} \\ \times \sum_{m\ge 0}\frac{1}{n^m} \times {m+p+j-1\choose j-1} (m+p)! \times \frac{k^m}{m!}$$

The inner sum is

$$\frac{1}{(j-1)!} \sum_{m\ge 0} (k/n)^m \frac{(m+p+j-1)!}{m!} = \frac{(p+j-1)!}{(j-1)!} \frac{1}{(1-k/n)^{p+j}}$$

and with $P=(n-1-k)(j-1)$ we obtain

$$\bbox[5px,border:2px solid #00A000]{ n \sum_{k=0}^{n-1} {n-1\choose k} \frac{(-1)^{n-1-k}}{(n-k)^j} \times \sum_{p=0}^P \frac{1}{(n-k)^p} \frac{(p+j-1)!}{(j-1)!} [z^p] \left(\sum_{q=0}^{j-1} \frac{z^q}{q!}\right)^{n-1-k}.}$$

We claim that the inner sum is $(n-k)^{j-1}$; a proof for $j=2$ appears at the end of this answer. With this the sum reduces to

$$n \sum_{k=0}^{n-1} {n-1\choose k} \frac{(-1)^{n-1-k}}{n-k} \\ = - \sum_{k=0}^{n-1} {n\choose k} (-1)^{n-k} = 1 - \sum_{k=0}^{n} {n\choose k} (-1)^{n-k} = 1$$

and we see that we indeed have a probability distribution.

Continuing with the expectation and re-capitulating the earlier computation we find

$$E[T] = \sum_{m\ge j} m P[T = m] = \frac{n}{n^j} \sum_{k=0}^{n-1} {n-1\choose k} (-1)^{n-1-k} \sum_{p\ge 0} n^{-p} [z^p] \left(\sum_{q=0}^{j-1} \frac{z^q}{q!}\right)^{n-1-k} \\ \times \sum_{m\ge 0}\frac{1}{n^m} \times {m+p+j-1\choose j-1} (m+p)! (m+p+j) \times \frac{k^m}{m!}$$

The inner sum has two pieces, obtained by writing $m+p+j = m + (p+j)$; the first is

$$\frac{1}{(j-1)!} \sum_{m\ge 1} (k/n)^m \frac{(m+p+j-1)!}{(m-1)!} = \frac{1}{(j-1)!} \frac{k}{n} \sum_{m\ge 0} (k/n)^m \frac{(m+p+j)!}{m!} \\ = \frac{k}{n} \frac{(p+j)!}{(j-1)!} \frac{1}{(1-k/n)^{p+j+1}} = \frac{k}{n-k} \frac{(p+j)!}{(j-1)!} \frac{1}{(1-k/n)^{p+j}}$$

and the second has been evaluated when we summed the probabilities to give

$$(p+j) \frac{(p+j-1)!}{(j-1)!} \frac{1}{(1-k/n)^{p+j}} = \frac{(p+j)!}{(j-1)!} \frac{1}{(1-k/n)^{p+j}}.$$

Substituting these into the outer sum we thus obtain

$$\bbox[5px,border:2px solid #00A000]{E[T] = n^2 \sum_{k=0}^{n-1} {n-1\choose k} \frac{(-1)^{n-1-k}}{(n-k)^{j+1}} \times \sum_{p=0}^P \frac{1}{(n-k)^p} \frac{(p+j)!}{(j-1)!} [z^p] \left(\sum_{q=0}^{j-1} \frac{z^q}{q!}\right)^{n-1-k}.}$$

A very basic C program confirms this formula to several digits of precision by simulation; it goes as follows.

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <time.h>

int main(int argc, char **argv)
{
  int n = 6, j = 3, trials = 1000;

  if(argc >= 2){
    n = atoi(argv[1]);
  }

  if(argc >= 3){
    j = atoi(argv[2]);
  }

  if(argc >= 4){
    trials = atoi(argv[3]);
  }

  assert(1 <= n);
  assert(1 <= j);
  assert(1 <= trials);

  srand48(time(NULL));
  long long data = 0;            /* total steps over all trials */

  for(int tind = 0; tind < trials; tind++){
    int seen = 0; int steps = 0; /* seen = types collected j times */
    int dist[n];                 /* dist[c] = copies of type c drawn */

    for(int cind = 0; cind < n; cind++){
      dist[cind] = 0;
    }

    while(seen < n){
      int coupon = drand48() * (double)n;  /* uniform type in 0..n-1 */

      steps++;

      /* this draw completes the j-th copy of its type */
      if(dist[coupon] == j-1)
        seen++;
      dist[coupon]++;
    }

    data += steps;
  }

  long double expt = (long double)data/(long double)trials;
  printf("[n = %d, j = %d, trials = %d]: %Le\n", 
         n, j, trials, expt);

  exit(0);
}

Proof of the inner sum claim for $j=2.$ Setting $j=2$ we have to show that

$$n-k = \sum_{p=0}^{n-1-k} {n-1-k\choose p} \frac{1}{(n-k)^p} (p+1)!.$$

Dividing both sides by $(n-k)\,(n-1-k)!$ this is

$$\frac{1}{(n-1-k)!} = \sum_{p=0}^{n-1-k} \frac{1}{(n-1-k-p)!} \frac{p+1}{(n-k)^{p+1}}.$$

Re-writing with $m = n-1-k$ (so that $n-k = m+1$) this becomes

$$\frac{1}{m!} = \sum_{p=0}^{m} \frac{1}{(m-p)!} \frac{p+1}{(m+1)^{p+1}}.$$

and introducing

$$\frac{1}{(m-p)!} = \frac{1}{2\pi i} \int_{|w|= \gamma} \frac{1}{w^{m-p+1}} \exp(w) \; dw$$

we obtain for the sum (the integral vanishes nicely when $p\gt m$ so we may extend $p$ to infinity)

$$\frac{1}{2\pi i} \int_{|w|= \gamma} \frac{1}{w^{m+1}} \exp(w) \frac{1}{m+1} \sum_{p\ge 0} \frac{p+1}{(m+1)^p} w^p \; dw \\ = \frac{1}{m+1} \frac{1}{2\pi i} \int_{|w|= \gamma} \frac{1}{w^{m+1}} \exp(w) \frac{1}{(1-w/(m+1))^2} \; dw.$$

We now use the fact that residues sum to zero and the poles are at $w=0$, $w=m+1$ and $w=\infty.$ We get for the residue at infinity

$$-\frac{1}{m+1} \mathrm{Res}_{w=0} \frac{1}{w^2} w^{m+1} \exp(1/w) \frac{1}{(1-1/w/(m+1))^2} \\ = - \frac{1}{m+1} \mathrm{Res}_{w=0} w^{m+1} \exp(1/w) \frac{1}{(w-1/(m+1))^2} \\ = - (m+1) \mathrm{Res}_{w=0} w^{m+1} \exp(1/w) \frac{1}{(1-w(m+1))^2} \\ = -(m+1) [w^{-(m+2)}] \exp(1/w) \frac{1}{(1-w(m+1))^2}.$$

Extracting coefficients we find

$$-(m+1)\sum_{q\ge 0} \frac{1}{(q+m+2)!} (q+1) (m+1)^q.$$

This is

$$-(m+1) \left(\sum_{q\ge 0} \frac{1}{(q+m+1)!} (m+1)^q - \sum_{q\ge 0} \frac{1}{(q+m+2)!} (m+1)^{q+1}\right) \\ = -(m+1) \left(\sum_{q\ge 0} \frac{1}{(q+m+1)!} (m+1)^q - \sum_{q\ge 1} \frac{1}{(q+m+1)!} (m+1)^{q}\right) \\ = -(m+1) \frac{(m+1)^0}{(m+1)!} = - \frac{1}{m!}.$$

We thus have the claim if we can show the residue at $w=m+1$ is zero. We use

$$(m+1)\frac{1}{2\pi i} \int_{|w|= \gamma} \frac{1}{w^{m+1}} \exp(w) \frac{1}{(w-(m+1))^2} \; dw.$$

and observe that

$$\left.\left(\frac{1}{w^{m+1}} \exp(w) \right)'\right|_{w=m+1} \\= \left.\left( -(m+1)\frac{1}{w^{m+2}} \exp(w) + \frac{1}{w^{m+1}} \exp(w) \right)\right|_{w=m+1} \\ = \exp(m+1) \left(-(m+1)\frac{1}{(m+1)^{m+2}}+\frac{1}{(m+1)^{m+1}}\right) = 0$$

as required. This concludes the computation.

Marko Riedel