
I'm working on a variant of the birthday problem that I haven't found discussed on this site.

Suppose the sequence $(X_n)$ of independent random variables takes values uniformly in $\{ 1,\dots,N \}$. Let $F_{N} = \min\{ m: X_m = X_k \text{ for some } k<m \}$ be the first time that a match is observed.

I want to know what can be said about $E(F_N)$ as $N \to \infty$.

It's easy to see that $$P(F_N = k) = \frac{N}{N} \frac{N-1}{N} \cdots \frac{N - (k-2)}{N} \cdot \frac{k-1}{N}.$$

Hence, $$E(F_N) = \sum_{k=2}^{N+1} k \Big[\frac{N}{N} \frac{N-1}{N} \cdots \frac{N - (k-2)}{N} \cdot \frac{k-1}{N} \Big]. $$
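As a numerical sanity check (my own sketch, not part of the original question; the helper name `expected_first_match` is mine), the exact sum above can be evaluated directly with rational arithmetic:

```python
from fractions import Fraction

def expected_first_match(N):
    """Exact E(F_N) = sum_{k=2}^{N+1} k * P(F_N = k) via the formula above."""
    total = Fraction(0)
    no_match = Fraction(1)  # P(first k-1 draws are all distinct)
    for k in range(2, N + 2):
        p_k = no_match * Fraction(k - 1, N)   # P(F_N = k)
        total += k * p_k
        no_match *= Fraction(N - (k - 1), N)  # extend to k distinct draws
    return total

print(float(expected_first_match(365)))  # ≈ 24.617, the classic birthday value
```

For $N=365$ this recovers the familiar expected number of people needed for a birthday coincidence, about $24.6$.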

Any suggestions about where to go from here?

aduh
  • Note that $$P(F_N \geqslant k+2) = \prod_{i=1}^{k}\left(1-\frac{i}N\right)$$ and that the RHS is of order $$e^{-k^2/(2N)}$$ hence, if $k_N^2\ll N\ll j_N^2$, then $$P(F_N\geqslant k_N)\to1\qquad P(F_N\geqslant j_N)\to0$$ In this sense, $$F_N=\Theta(\sqrt{N})$$ and one can guess that the same asymptotics holds for $E(F_N)$. – Did Sep 25 '16 at 22:43
  • Remark: to get the first asymptotic Did stated, note that $\log \prod_{i=1}^k (1-i/N) = \sum_{i=1}^k \log(1-i/N) \approx \sum_{i=1}^k -i/N = -\frac{k(k+1)}{2N} \approx \frac{-k^2}{2N}$. Then exponentiate everywhere. These approximations are reasonable in the region $1 \ll k \ll N$. – Ian Sep 25 '16 at 22:58
  • Maybe one can follow @Did's idea to obtain $E(F_N)\sim c \sqrt N$ with some explicit positive constant $c$. – Sungjin Kim Sep 25 '16 at 23:12
  • Conjecture: $$\lim_{N\to\infty}\frac{E(F_N)}{\sqrt{N}}=\sqrt{\frac\pi2}$$ – Did Sep 25 '16 at 23:15
  • @Did An empirical calculation and bold extrapolation suggest $1.2533$ might be close, and this is indeed $\sqrt{\frac\pi2}$ rounded – Henry Sep 25 '16 at 23:22
  • Sketch of a possible proof of Did's conjecture: $E[F_N]=\sum_{k=1}^{N+1} P(F_N \geq k)=2+\sum_{k=1}^{N-1} P(F_N \geq k+2) \sim 2+\sum_{k=1}^{N-1} e^{-k^2/2N}$. (The last step requires proof.) Finally if we look at $\frac{1}{\sqrt{N}} \sum_{k=1}^N e^{-k^2/2N}$, we may consider dividing the interval $[0,\sqrt{N}]$ into $N$ subintervals of length $\frac{1}{\sqrt{N}}$. The desired sum is then a rectangle rule where $x_k=k/\sqrt{N}$ so that $x_k^2/2=k^2/2N$. – Ian Sep 25 '16 at 23:34
  • So we would hope that this sum would behave like $\int_0^{\sqrt{N}} e^{-x^2/2} dx$ which of course converges to $\frac{\sqrt{2 \pi}}{2} = \sqrt{\pi/2}$ as Did conjectured. This rectangle rule step still requires proof, because of the growth of the domain of integration, but I suspect that proof is not really so difficult: instead of trying to argue that you are approximating $\int_0^{\sqrt{N}}$, instead throw in an additional term so that it "looks like" you are approximating $\int_0^\infty$ and control the tail using standard techniques. – Ian Sep 25 '16 at 23:34
  • The expectation is the ratio of OEIS A063170 and OEIS A000312, and in the "Formula" section of the former N-E. Fahssi gives the same asymptotic as @Did – Henry Sep 25 '16 at 23:34
  • @Did Would a similar result hold for, say, the time of the second match $F_N^{(2)}$, the third match $F_N^{(3)}$, etc.? – Sungjin Kim Sep 26 '16 at 20:59
  • @i707107 It seems that, for every fixed $n$, setting $F_N^{(0)}=0$, the random vector $$\left(\frac{F_N^{(k)}-F_N^{(k-1)}}{\sqrt{N}}\right)_{1\leqslant k\leqslant n}$$ converges in distribution to a continuous nonnegative random vector with joint PDF $$x_1x_2\cdots x_n\,e^{-(x_1+x_2+\cdots+x_n)^2/2}$$ This suggests that each $(F_N^{(n)}-F_N^{(n-1)})/\sqrt{N}$ converges in distribution to a random variable with PDF $$xe^{-x^2/2}$$ and that, for every fixed $n$, $$\lim_{N\to\infty}\frac{E(F_N^{(n)})}{\sqrt{N}}=n\,\sqrt{\frac\pi2}$$ – Did Sep 27 '16 at 05:53
  • @Did I calculated the joint PDF as $N\rightarrow\infty$. I am not sure if the expression you have, and that I have are equivalent. – Sungjin Kim Sep 28 '16 at 04:40
  • @Did It seems that $x_2$ needs to be replaced by $x_1+x_2$, $\ldots$ , $x_n$ needs to be replaced by $x_1+x_2+\cdots+x_n$. – Sungjin Kim Sep 28 '16 at 05:07
  • @i707107 Indeed, I stand corrected. Then $$\lim_{N\to\infty}\frac{E(F_N^{(k)})}{\sqrt{N}}=m_k$$ with $$m_k=\frac1{2^{k-1}(k-1)!}\int_0^\infty x^{2k}e^{-x^2/2}dx=\frac{(2k-1)!!}{2^{k-1}(k-1)!}\sqrt{\frac\pi2}=\frac{k}{2^{2k-1}}{2k\choose k}\sqrt{\frac\pi2}$$ – Did Sep 28 '16 at 06:23
  • ...and, again unless I am mistaken, $$\lim_{k\to\infty}\frac{m_k}{\sqrt{k}}=2\sqrt2$$ – Did Sep 28 '16 at 06:30
  • @Did It is interesting that $m_k$ grows like $c\sqrt k$, which makes sense: as $k$ increases there are more previous values of $X_i$ available to match, so later matches occur more quickly. By the way, when I computed it, I got $\sqrt 2$. – Sungjin Kim Sep 28 '16 at 15:23
  • @i707107 Again a mistake? I must be tired... :-) – Did Sep 28 '16 at 15:33
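Did's conjecture $E(F_N)/\sqrt N\to\sqrt{\pi/2}\approx1.2533$ is easy to probe by simulation (a rough Monte Carlo sketch of mine; the seed and sample sizes are arbitrary choices):

```python
import math
import random

def first_match_time(N, rng):
    """Draw uniformly from an N-element set until a repeat; return the draw index."""
    seen = set()
    m = 0
    while True:
        m += 1
        x = rng.randrange(N)
        if x in seen:
            return m
        seen.add(x)

rng = random.Random(0)
N, trials = 10_000, 20_000
mean = sum(first_match_time(N, rng) for _ in range(trials)) / trials
# The ratio should be close to sqrt(pi/2) = 1.2533..., up to Monte Carlo noise
# and an O(1/sqrt(N)) finite-size correction.
print(mean / math.sqrt(N), math.sqrt(math.pi / 2))
```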

2 Answers


The probability that the first match occurs on the $k^\text{th}$ trial is
$$
\begin{align}
&\overbrace{\frac nn\frac{n-1}n\cdots\frac{n-k+2}n}^{\text{no match in $k-1$ trials}}-\overbrace{\frac nn\frac{n-1}n\cdots\frac{n-k+1}n}^{\text{no match in $k$ trials}}\\
&=\frac{n!}{n^{k-1}(n-k+1)!}-\frac{n!}{n^k(n-k)!}\\
&=\frac{n!}{n^{k-1}(n-k+1)!}-\frac{n!}{n^{k-1}(n-k+1)!}\frac{n-k+1}n\\
&=\frac{n!\,(k-1)}{n^k(n-k+1)!}\tag{1}
\end{align}
$$
Therefore, since the first match must occur by trial $n+1$, the expected value is
$$
\begin{align}
E(F_n)
&=\sum_{k=0}^{n+1}\frac{n!\,k(k-1)}{n^k(n-k+1)!}\\
&=\frac{n!}{n^{n+1}}\sum_{k=0}^{n+1}\frac{k(k-1)}{(n-k+1)!}n^{n-k+1}\\
&=\frac{n!}{n^{n+1}}\sum_{k=0}^{n-1}\frac{(n-k+1)(n-k)}{k!}n^k\\
&=\frac{n!}{n^{n+1}}\sum_{k=0}^{n-1}\frac{n(n+1)-2kn+k(k-1)}{k!}n^k\\
&=\frac{(n+1)!}{n^n}\sum_{k=0}^{n-1}\frac{n^k}{k!}-\frac{2\,n!}{n^{n-1}}\sum_{k=0}^{n-2}\frac{n^k}{k!}+\frac{n!}{n^{n-1}}\sum_{k=0}^{n-3}\frac{n^k}{k!}\\
&=\frac{n!}{n^n}\sum_{k=0}^n\frac{n^k}{k!}\tag{2}
\end{align}
$$
(the third line substitutes $k\mapsto n-k+1$ and drops the vanishing terms; in the last step, writing each truncated sum in terms of $\sum_{k=0}^n\frac{n^k}{k!}$, the boundary terms cancel). Applying equation $(11)$ from this answer and Stirling's Approximation gives the expected value as
$$
\bbox[5px,border:2px solid #C0A000]{E(F_n)=\frac12\sqrt{2\pi n}+\frac23+O\left(\frac1{\sqrt{n}}\right)}\tag{3}
$$
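As a numerical cross-check (my own sketch, not part of the original answer): the exact expectation equals $\frac{n!}{n^n}\sum_{k=0}^n\frac{n^k}{k!}$, i.e. the ratio of OEIS A063170 to $n^n$ (A000312) mentioned in the comments, and the asymptotic $(3)$ closes in as $n$ grows:

```python
import math
from fractions import Fraction

def closed_form(n):
    """E(F_n) = (n!/n^n) * sum_{k=0}^n n^k/k!, computed exactly."""
    s = sum(Fraction(n**k, math.factorial(k)) for k in range(n + 1))
    return Fraction(math.factorial(n), n**n) * s

for n in (2, 10, 100):
    exact = float(closed_form(n))
    approx = 0.5 * math.sqrt(2 * math.pi * n) + 2 / 3
    print(n, exact, approx)  # the two columns converge as n grows
```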


Extended Asymptotics

Extending the computation we did for $(3)$, we get $$ E(F_n) =\sqrt{2\pi n}\left(\frac12+\frac1{24n}+\frac1{576n^2}\right) +\left(\frac23-\frac4{135n}+\frac8{2835n^2}\right) +O\left(\frac1{n^{5/2}}\right)\tag{4} $$

robjohn

This is an elaboration of @Did's comments on the $n$-th match.

Fix $K>0$. Denote by $F_N^{(n)}$ the time of the $n$-th match and set $F_N^{(0)}=0$. For any fixed $n\geq 1$, we first find the joint distribution: for positive integers $k_1, \ldots , k_n $ with $0<k_1<k_2<\cdots <k_n\leq N$ and $k_n \leq K\sqrt N $, $$ \begin{align} P &(F_N^{(1)} =k_1, \ldots , F_N^{(n)} =k_n ) \\ &=\frac{k_1-1}N\prod_{i=1}^{k_1-2}\left(1-\frac{i}N\right) \frac{k_2-2}N\prod_{i=k_1-1}^{k_2-3}\left(1-\frac iN\right)\cdots \frac{k_n-n}N\prod_{i=k_{n-1}-(n-1) }^{k_n-(n+1)}\left(1-\frac iN\right)\\ &=\frac{k_1-1}N \cdots \frac{k_n-n}N \prod_{i=1}^{k_n-(n+1)}\left(1-\frac iN\right)\\ &=\frac{k_1-1}N \cdots \frac{k_n-n}N \exp\left( -\frac{(k_n-(n+1))^2}{2N}+O(N^{-1/2})\right). \end{align} $$ This gives $$ P (F_N^{(1)} =k_1, \ldots , F_N^{(n)} =k_n ) = \frac{k_1-1}{\sqrt N } \cdots \frac{k_n- n}{\sqrt N}\exp\left( -\frac12 \left(\frac{ k_n-(n+1) }{\sqrt N}\right)^2+O(N^{-1/2})\right) \frac 1{(\sqrt N)^n} . $$ Now fix $0\leq x_1, \ldots , x_n\leq K$ and sum this over $k_i\leq x_i \sqrt N$; as $N\rightarrow\infty$, $$ P\left( \frac{F_N^{(1)}}{\sqrt N}\leq x_1, \ldots , \frac{F_N^{(n)}}{\sqrt N} \leq x_n\right) \rightarrow \int_{0\leq t_1\leq \cdots \leq t_n, \ \forall i, t_i\leq x_i} t_1 \cdots t_n \exp \left(-\frac12 t_n^2\right) dV. $$ (Think of this as summing the probabilities over boxes with side length $1/\sqrt N$; the Dominated Convergence Theorem suffices to justify this limit.)

Thus, the random vector $\left(\frac{F_N^{(1)}}{\sqrt N}, \ldots , \frac{F_N^{(n)}}{\sqrt N}\right)$ converges in distribution to a continuous random vector with PDF $$ f(t_1,\ldots , t_n) = t_1 \cdots t_n \exp\left(-\frac 12 t_n^2\right) \mathbf{1}_{0\leq t_1 \leq \cdots \leq t_n}. $$

The question was originally about the expectation in the case $n=1$. The above calculation suggests that, as $N\rightarrow\infty$, $$ \mathbf{E}\left(\frac{F_N^{(1)}}{\sqrt N} \right)\rightarrow \int_0^{\infty} t_1^2 \exp\left(-\frac12 t_1^2\right) dt_1 = \sqrt{\frac{\pi}2}. $$

But we still need to control the contribution from $k_n > K\sqrt N$ (for $n=1$, from $k_1 > K\sqrt N$). To do this, we use $$ \log(1-x) \leq -x. $$ Then for $K\sqrt N < k_n$, $$ \frac{k_n}{\sqrt N}P (F_N^{(1)} =k_1, \ldots , F_N^{(n)} =k_n ) \leq \frac{k_1\cdots k_{n-1}k_n^2}{(\sqrt N)^{n+1}} \exp\left( -\frac12 \left(\frac{ k_n-n-1 }{\sqrt N}\right)^2\right)\frac1{(\sqrt N)^n}. $$ Again by the Dominated Convergence Theorem, the right side, summed over $0\leq k_1\leq \cdots \leq k_n$ with $k_n > K\sqrt N$, converges as $N\rightarrow\infty$ to $$ \int_K^{\infty} \int_0^{t_n} \cdots \int_0^{t_2} t_1\cdots t_{n-1}t_n^2 \exp\left(-\frac12 t_n^2 \right) dt_1\cdots dt_n. $$ This can be made arbitrarily small by taking $K$ sufficiently large, which shows that the suggested calculation is valid. We now have $$ \mathbf{E}\left(\frac{F_N^{(n)}}{\sqrt N}\right) \rightarrow \int_0^{\infty} \int_0^{t_n} \cdots \int_0^{t_2} t_1 \cdots t_{n-1}t_n^2 \exp\left(-\frac 12 t_n^2 \right) dt_1\cdots dt_n. $$ This integral is exactly what @Did computed in the last comments to the question: $$ \frac1{2^{n-1}(n-1)!} \int_0^{\infty} t_n^{2n} \exp\left(-\frac 12 t_n^2 \right) dt_n=\frac{(2n-1)!!}{2^{n-1}(n-1)!} \sqrt{\frac{\pi}2}\sim \sqrt{2n}. $$
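The limits $m_n=\frac{(2n-1)!!}{2^{n-1}(n-1)!}\sqrt{\pi/2}$ can also be probed by simulation (a rough sketch of mine; `match_times`, the seed, and the sample sizes are my own choices):

```python
import math
import random

def match_times(N, n, rng):
    """Times of the first n matches in an i.i.d. uniform sequence on N values."""
    seen, times, m = set(), [], 0
    while len(times) < n:
        m += 1
        x = rng.randrange(N)
        if x in seen:
            times.append(m)  # draw m equals some earlier draw: a match
        else:
            seen.add(x)
    return times

rng = random.Random(1)
N, n, trials = 10_000, 3, 5_000
means = [0.0] * n
for _ in range(trials):
    for i, t in enumerate(match_times(N, n, rng)):
        means[i] += t / (trials * math.sqrt(N))
# m_k = (2k-1)!!/(2^{k-1}(k-1)!) * sqrt(pi/2) = k/2^{2k-1} * C(2k,k) * sqrt(pi/2)
limits = [k * math.comb(2 * k, k) / 2 ** (2 * k - 1) * math.sqrt(math.pi / 2)
          for k in range(1, n + 1)]
print(means)   # empirical E(F_N^{(k)}) / sqrt(N)
print(limits)  # 1.2533..., 1.8800..., 2.3500...
```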

Sungjin Kim
  • Thank you! This is helpful, but yes, the question is about expectation and it seems there are still some details to fill in there. Also, I appreciate the generalization but would accept a detailed answer to the case $F_N$. – aduh Sep 28 '16 at 13:54
  • I'm sure I should know this but can you tell me where $$\prod_{i=1}^{k_n - (n+1)} \left(1- \frac iN\right) = \exp \left(- \frac{ (k_n - (n+1))^2}{2N} + O(N^{-2})\right)$$ comes from? – aduh Sep 28 '16 at 13:57
  • I am editing the answer to include the details. About the product: there is a $-$ sign inside the exponential, and the idea is discussed in the comments above; it uses the Maclaurin series of $\log (1-x)$. – Sungjin Kim Sep 28 '16 at 13:58
  • Edited, thanks. I must have missed that in the comments. I'll look again more carefully. Thanks again! – aduh Sep 28 '16 at 13:59