How deep in a stack of $n$ exams should I expect to look to find $k$ specific exams?

Question

The other day I was grading final exams for a course I was teaching. At one point we had all $n$ exams stacked up and needed to search the stack of exams for $k$ specific students’ exams. I therefore started at the top of the pile and started flipping through the exams looking for them. Through quite a coincidence I found $k-1$ of the exams pretty close to the top, but the very last exam was fourth from the bottom. It therefore took quite a lot of time to find them all!

That got me thinking about how many exams, in expectation, I would have to look at to find all $k$ of the exams I was looking for. For example, if I’m looking for a single exam $(k = 1)$, then in expectation I’d look at about $\frac{n}{2}$ exams, because that one exam is equally likely to be anywhere in the pile. At the other extreme, if I’m looking for all $n$ exams $(k = n)$, then I’ll have to look at all $n$ exams to find them all.

There’s clearly some sort of “interpolation” between these two extreme cases, but I’m not sure how to work out the math on this.

My current progress is as follows: I believe (?) the probability that all $k$ of the exams in question occur in the first $r$ exams in the stack is given by counting all permutations formed by picking $k$ slots from the first $r$ positions, and for each one permuting the $k$ exams in question in those slots and the $n - k$ other exams outside those slots:

$$\frac{\binom{r}{k}k!(n - k)!}{n!} = \frac{(n - k)!r!}{n!(r - k)!}$$
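For small stacks, the counting argument above can be sanity-checked by brute force. The sketch below is only an illustration, not part of the original question: the function names are made up, and the exams being searched for are assumed to be the ones labeled `0` to `k - 1`. It enumerates every ordering of the stack and compares the observed fraction with the formula.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def prob_all_in_first_r_bruteforce(n, k, r):
    """Fraction of the n! orderings in which exams 0..k-1 all lie in the top r positions."""
    wanted = set(range(k))
    hits = sum(1 for stack in permutations(range(n)) if wanted <= set(stack[:r]))
    return Fraction(hits, factorial(n))


def prob_all_in_first_r_formula(n, k, r):
    """Closed form from the question: C(r, k) * k! * (n - k)! / n!."""
    return Fraction(comb(r, k) * factorial(k) * factorial(n - k), factorial(n))


# Compare both computations for every prefix length r of a small stack.
n, k = 6, 2
for r in range(n + 1):
    assert prob_all_in_first_r_bruteforce(n, k, r) == prob_all_in_first_r_formula(n, k, r)
```

Exact `Fraction` arithmetic avoids floating-point noise, so the comparison is an equality check rather than a tolerance check. Enumeration costs $n!$ orderings, so this is only practical for very small $n$.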

I imagine that there’s some way to go from this expression to a proper expected value, and ideally from there to some nice asymptotic approximation that works for large $n$ and small $k$, but I’m not sure how to do either of those.

Any advice about how to proceed?

For the asymptotic approximation, you can treat the distance of the last paper from the bottom of the stack as the minimum of i.i.d. uniform random variables: https://math.stackexchange.com/questions/786392 – Karl, Dec 17 '22 at 20:07
@Karl I believe this is better computed directly as a Hypergeometric random variable. If $n$ is large, it can be approximated by a binomial distribution. – Alborz, Dec 17 '22 at 20:09
Take random variables $X_1, \dots, X_k$ that are uniformly distributed on $\{1, \dots, n\}$. Then the value you're looking for is $E[\max(X_1, \dots, X_k)]$. Based on the answer that @Karl linked, I would guess that the answer is approximately $n \cdot \frac{k}{k+1}$, but I'd need someone to double-check that. EDIT: I realized that's not quite right, because it doesn't account for the fact that the $X_i$ must be distinct, but if $n$ is much larger than $k$ it might not make much of a difference. – Sambo, Dec 17 '22 at 20:36
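The guess of $n \cdot \frac{k}{k+1}$ in the comments can be reproduced with a short calculation under the continuous approximation. This is a sketch under an assumption, not an exact result: each exam's relative position is modeled as an independent $\text{Uniform}(0,1)$ variable $U_i$ (notation introduced here), which ignores the fact that the positions are distinct integers. Writing $M = \max(U_1, \dots, U_k)$:

$$\mathbb P[M \le x] = x^k \quad (0 \le x \le 1), \qquad \mathbb E[M] = \int_0^1 x \cdot k x^{k-1} \, dx = \frac{k}{k+1}$$

Scaling by the stack height gives roughly $n \cdot \frac{k}{k+1}$. For example, $n = 100$ and $k = 4$ gives about $80$.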

score 3 · Accepted Answer · answered Dec 17 '22 at 20:38

Notice that your expression for the probability that all $k$ exams are in the first $r$ simplifies to

$$\frac{\binom rk k! (n-k)!}{n!} = \frac{\binom rk}{\binom nk}$$

Let's try starting from here. Let $R$ be the number of exams you have to look at to find all $k$, so that $\mathbb P[R \le r] = \binom rk / \binom nk$. Then

$$\mathbb E[R] = \sum_{r=1}^\infty \mathbb P[R \geq r] = \sum_{r=0}^n \mathbb P[R > r] = \sum_{r=0}^n \left( 1 - \frac{\binom rk}{\binom nk} \right) = (n+1)-\frac1{\binom nk} \sum_{r=k}^n \binom rk$$

where the last step uses $\binom rk = 0$ for $r < k$. Now we use the hockey stick identity to find that

$$\sum_{r=k}^n \binom rk = \binom{n+1}{k+1}$$

so

$$\mathbb E[R] = (n+1) - \frac{\binom{n+1}{k+1}}{\binom nk} = n+1 - \frac{n+1}{k+1} = \frac{k(n+1)}{k+1}$$

We see that if we set $k=1$, we get $\mathbb E[R] = \frac{n+1}2$ (the average of the numbers from $1$ to $n$). If we set $k = n$, we get $\mathbb E[R]=n$. Additionally, our expectation is an increasing function of $k$, which is what we'd expect.
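The closed form can be checked numerically. The sketch below is illustrative only, with made-up function names. It evaluates the tail sum exactly with rational arithmetic, compares it with $\frac{k(n+1)}{k+1}$, and also runs a small Monte Carlo simulation. The simulation assumes that the positions of the $k$ wanted exams form a uniformly random $k$-subset of $\{1, \dots, n\}$, which is what a uniformly shuffled stack gives.

```python
import random
from fractions import Fraction
from math import comb


def expected_depth_exact(n, k):
    """E[R] from the tail sum: sum over r = 0..n of 1 - C(r, k) / C(n, k)."""
    return sum(1 - Fraction(comb(r, k), comb(n, k)) for r in range(n + 1))


def expected_depth_closed_form(n, k):
    """The closed form k (n + 1) / (k + 1) derived above."""
    return Fraction(k * (n + 1), k + 1)


def expected_depth_simulated(n, k, trials=20_000, seed=0):
    """Monte Carlo estimate: average depth of the deepest of the k wanted exams."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Positions (1 = top of the stack) of the k wanted exams in a shuffled stack.
        positions = rng.sample(range(1, n + 1), k)
        total += max(positions)
    return total / trials


# The tail sum and the closed form agree exactly on a grid of small cases.
for n in range(1, 15):
    for k in range(1, n + 1):
        assert expected_depth_exact(n, k) == expected_depth_closed_form(n, k)

print(float(expected_depth_closed_form(100, 4)), expected_depth_simulated(100, 4))
```

The exact check confirms the algebra, while the simulation independently tests the modeling step, namely that the depth you must reach equals the position of the deepest wanted exam.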

score 1 · Answer 2 · edited Dec 17 '22 at 20:29


This can be modeled by a Hypergeometric distribution with parameters $r,n,k$ where $n$ is the number of tests (the population size), $k$ is the number of members of the population satisfying the feature of being a student-of-interest's test, $r$ is the number of samples from the population (tests actually pulled from the pile and examined). Thus, letting $X$ be the number of successful finds from the pile, $X\sim\operatorname{Hypergeometric}(r,n,k)$, and it is well known that $$E[X]=\frac{rk}{n}$$
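The mean quoted above can be confirmed by summing over the hypergeometric probability mass function directly. This is a minimal sketch with a made-up function name, using the same roles for $n$, $k$ and $r$ as in this answer. Note that it computes the expected number of wanted exams found in the first $r$ draws, not the number of draws needed to find all of them.

```python
from fractions import Fraction
from math import comb


def hypergeometric_mean(n, k, r):
    """E[X] for X = number of wanted exams among the first r of n, with k wanted in total."""
    total = Fraction(0)
    # x ranges over the feasible counts of wanted exams among the r drawn.
    for x in range(max(0, r - (n - k)), min(r, k) + 1):
        pmf = Fraction(comb(k, x) * comb(n - k, r - x), comb(n, r))
        total += x * pmf
    return total


# Compare with r * k / n on a grid of small cases.
for n in range(1, 12):
    for k in range(n + 1):
        for r in range(n + 1):
            assert hypergeometric_mean(n, k, r) == Fraction(r * k, n)
```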

answered Dec 17 '22 at 20:16 by Alborz · edited Dec 17 '22 at 20:29 by K.defaoite

I think OP wants the expected number of draws required to see all $k$ of the desired exams. – Karl, Dec 17 '22 at 20:29
