
The identity I want help proving is the following (given $m$ probabilities, $p_j$ such that $\sum_j p_j = 1$): $$ \int\limits_0^\infty t \sum\limits_j \left(\prod\limits_{k \neq j}(1-e^{-p_k t}) \right)e^{-p_jt}p_j dt = \sum\limits_j\frac 1 p_j - \sum\limits_{i<j}\frac {1}{p_i+p_j} + \dots +(-1)^{m-1} \frac{1}{p_1+\dots+p_m}$$

For background and motivation, see below.


In Example 5.17 of the book *Introduction to Probability Models* by Sheldon Ross, the coupon collector's problem is tackled for the general case where the probability of drawing coupon $j$ is given by $p_j$ and, of course, $\sum\limits_j p_j = 1$. Now, he defines $X_j$ as the first time a coupon of type $j$ is observed, where coupons of type $j$ arrive according to a Poisson process with rate $p_j$. We're interested in the time it takes to collect all coupons, $X$. So we get:

$$X = \max_{1\leq j \leq m}X_j$$

Further, since the $X_j$ are independent (discussion on that here), we get:

$$F_X(t) = P(X<t) = P(X_j<t \; \forall \; j) = \prod\limits_{j=1}^{m}(1-e^{-p_j t})\tag{1}$$

Now, Ross uses the expression $E(X) = \int\limits_0^\infty S_X(t)\,dt$, where $S_X(t)$ is the survival function, to get:

$$E(X) = \int\limits_{0}^{\infty}\left(1-\prod\limits_{j=1}^{m}(1-e^{-p_j t})\right) dt = \sum\limits_j\frac 1 p_j - \sum\limits_{i<j}\frac {1}{p_i+p_j} + \dots +(-1)^{m-1} \frac{1}{p_1+\dots+p_m}\tag{2}$$
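(For reference, (2) comes from expanding the product by inclusion-exclusion over non-empty subsets $S \subseteq \{1,\dots,m\}$ and integrating term by term:

$$1-\prod\limits_{j=1}^{m}\left(1-e^{-p_j t}\right) = \sum\limits_{\emptyset\neq S\subseteq\{1,\dots,m\}} (-1)^{|S|+1} e^{-t\sum_{j\in S}p_j}, \qquad \int\limits_0^\infty e^{-t\sum_{j\in S}p_j}\,dt = \frac{1}{\sum_{j\in S}p_j}.)$$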

Now, I want to get this same result using the old-fashioned definition of the expected value. For this, I differentiate equation (1) to get the PDF of $X$. First, let's take the logarithm of both sides.

$$\log(F_X(t)) = \sum\limits_j \log(1-e^{-p_j t})$$

Now differentiate with respect to $t$.

$$\frac{f_X(t)}{F_X(t)} = \sum\limits_j \frac{p_j e^{-p_j t}}{1-e^{-p_j t}}$$

Finally yielding:

$$f_X(t) = \sum\limits_j \left(\prod\limits_{k \neq j}(1-e^{-p_k t}) \right)e^{-p_jt}p_j$$
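(A small symbolic sanity check of this differentiation for $m=3$, assuming SymPy is available; the symbols are purely illustrative:)

```python
# Symbolic check (m = 3, illustrative symbols) that differentiating
# F_X(t) = prod_j (1 - e^{-p_j t}) gives the claimed density. Assumes SymPy.
import sympy as sp

t, p1, p2, p3 = sp.symbols('t p1 p2 p3', positive=True)
ps = [p1, p2, p3]

F = sp.Mul(*[1 - sp.exp(-p * t) for p in ps])                       # F_X(t)
f_claimed = sum(p * sp.exp(-p * t) *
                sp.Mul(*[1 - sp.exp(-q * t) for q in ps if q != p])
                for p in ps)                                        # claimed f_X(t)

print(sp.simplify(sp.diff(F, t) - f_claimed))                       # prints 0
```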

Using this, we get an alternate expression for the expectation:

$$E(X) = \int\limits_0^\infty t f_X(t) dt = \int\limits_0^\infty t \sum\limits_j \left(\prod\limits_{k \neq j}(1-e^{-p_k t}) \right)e^{-p_jt}p_j dt$$

This should lead to the same expression as in equation (2). However, I don't know where to start. Why do I want to go through this alternate route? Because I hope to find an expression for the variance as well, and for that I need $E(X^2)$. I thought I'd first tackle the easier quantity, $E(X)$, for which we know there is a nice expression.
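Here is a quick numerical sanity check of the identity (not a proof), assuming NumPy and SciPy are available; the probabilities in `p` are an arbitrary illustrative choice:

```python
# Numerical sanity check (not a proof): the density-based integral for E(X)
# should match the inclusion-exclusion sum in (2). Assumes NumPy/SciPy;
# the probabilities in p are an arbitrary illustrative choice.
from itertools import combinations

import numpy as np
from scipy.integrate import quad

p = np.array([0.5, 0.3, 0.2])
m = len(p)

def f_X(t):
    """Density of X: sum_j p_j e^{-p_j t} prod_{k != j} (1 - e^{-p_k t})."""
    return sum(p[j] * np.exp(-p[j] * t) *
               np.prod([1 - np.exp(-p[k] * t) for k in range(m) if k != j])
               for j in range(m))

lhs, _ = quad(lambda t: t * f_X(t), 0, np.inf)      # E(X) via the density

rhs = sum((-1) ** (len(S) + 1) / p[list(S)].sum()   # inclusion-exclusion sum (2)
          for r in range(1, m + 1)
          for S in combinations(range(m), r))

print(lhs, rhs)  # both are about 6.6548 for this choice of p
```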

Rohit Pandey
  • Your question a day ago explained that rather than $1$ coupon per step, you have the arrivals of each of the coupon types as independent Poisson processes with rates $p_j$ – Henry Nov 06 '19 at 05:37
  • Right, and this turns out to be equivalent to one coupon per step. Which is why I linked the earlier question and the book for background, but please let me know if I should add more of it in the question itself. – Rohit Pandey Nov 06 '19 at 05:39
  • If you are only interested in $E[X^2]$, you may consider $E[X^2] = \int_0^{+\infty} x^2f(x)dx = \int_0^{+\infty} \int_0^{x} 2udu f(x)dx = \int_0^{+\infty} \int_u^{+\infty} f(x)dx 2u du = \int_0^{+\infty}2u[1-F(u)]du$ – BGM Nov 11 '19 at 16:40
  • Thanks! That might actually work fine for what I was interested in. Let me see if I can solve it with this much simplified expression. This could have been an answer. – Rohit Pandey Nov 11 '19 at 17:48
  • The correct expression for the expectation value should contain factors $(-1)^{||S||}$, where $||S||$ are the cardinalities of the corresponding subsets. – user Nov 14 '19 at 21:06
  • @user - good catch, fixed. – Rohit Pandey Nov 17 '19 at 02:32
  • In the beginning of your question the signs remain unchanged. Besides you introduced new undefined variable $n $. – user Nov 17 '19 at 08:35
  • @BGM - I tried your approach to get $E(X^2)$ and the expression gets very close to the correct value, but misses one term. Haven't been able to figure out where it goes wrong in a long time. Asked another question about it here: https://math.stackexchange.com/questions/3439096/coupon-collectors-problem-variance-calculation-missing-a-term – Rohit Pandey Nov 17 '19 at 20:34

2 Answers


For brevity let $F = F_X$. For $L>0$ let $$I_L = \int_{0}^{L}tf_X(t)\,dt.$$ Using integration by parts (and adding and subtracting $L$), it follows that \begin{align*} I_L &= \int_{0}^{L}t F'(t)\, dt \\ &= tF(t)\Big|_{0}^{L} - \int_{0}^{L} F(t)\, dt \\ &= L(F(L)-1) + \int_{0}^{L} \big(1-F(t)\big)\, dt \\ &= L(F(L)-1) + J_L, \end{align*} where, expanding $1-F(t)$ by inclusion-exclusion and integrating term by term, $$J_L = \sum_{i=1}^{m} (-1)^{i-1} \sum_{1\le j_1<\dots<j_i\le m} \frac{1 - e^{-(p_{j_1}+\dots+p_{j_i})L}}{p_{j_1}+\dots+p_{j_i}}.$$ Show that $$\lim_{L\to\infty} L(F(L)-1) = 0.$$ Then it follows that $$\lim_{L\to\infty} I_L = \lim_{L\to\infty} J_L = \sum_{i=1}^{m} (-1)^{i-1} \sum_{1\le j_1<\dots<j_i\le m} \frac{1}{p_{j_1}+\dots+p_{j_i}}.$$
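One way to see the limit, using the elementary bound $1-\prod_{j}(1-a_j)\le\sum_j a_j$ for $a_j\in[0,1]$:

$$0 \le L\big(1-F(L)\big) \le L\sum_{j=1}^{m} e^{-p_j L} \to 0 \quad \text{as } L\to\infty,$$

which is exactly $\lim_{L\to\infty} L(F(L)-1)=0$.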

For $E(X^2)$, you might consider doing what I did here but applying integration by parts twice.

DAS

I have an attempt at calculating the variance using the technique @BGM pointed out. The attempt is so far unsuccessful, but I wanted to post it for my own reference, and an answer seemed the best place given how long the question already is. As pointed out by @BGM,

$$E(X^2) = \int\limits_0^\infty 2u\left(1-F(u)\right)du = \int\limits_0^\infty 2u\left(1-\prod\limits_{j=1}^m (1-e^{-p_j u})\right)du$$

$$ = \int\limits_0^\infty 2u\left(\sum e^{-{p_ju}} - \sum_{i<j} e^{-{(p_j+p_i)u}}+\dots+(-1)^{m+1}e^{-{(p_1+p_2+\dots+p_m)u}}\right)du$$

Now we know,

$$I = \int\limits_0^\infty u e^{-pu}du = \frac{1}{p^2}$$

$$\implies E(X^2) = 2\left(\sum \frac{1}{p_j^2} - \sum_{i<j} \frac{1}{(p_i+p_j)^2} +\dots+(-1)^{m+1}\frac{1}{(p_1+\dots+p_m)^2}\right)$$

Trying to validate it for the case $p_j = \frac 1 m$ leads to some trouble. See here.
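For what it's worth, here is a quick numerical check of the alternating-sign sum against the integral $2\int_0^\infty u\,(1-F(u))\,du$ (assuming NumPy/SciPy; the probabilities in `p` are an arbitrary illustrative choice, so this is a sanity check rather than a derivation):

```python
# Numerical check (not a derivation), assuming NumPy/SciPy: compare
# 2 * int_0^inf u (1 - F(u)) du with the alternating inclusion-exclusion sum.
from itertools import combinations

import numpy as np
from scipy.integrate import quad

p = np.array([0.5, 0.3, 0.2])   # illustrative probabilities summing to 1
m = len(p)

def survival(u):
    """S_X(u) = 1 - F_X(u) = 1 - prod_j (1 - e^{-p_j u})."""
    return 1 - np.prod(1 - np.exp(-p * u))

ex2_integral, _ = quad(lambda u: 2 * u * survival(u), 0, np.inf)

ex2_sum = 2 * sum((-1) ** (len(S) + 1) / p[list(S)].sum() ** 2
                  for r in range(1, m + 1)
                  for S in combinations(range(m), r))

print(ex2_integral, ex2_sum)  # the two values agree numerically
```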

Rohit Pandey