
Let $X_r$ ($r \geq 1$) be independent random variables, each uniformly distributed over $[0,1]$, and let $x$ be a number between $0$ and $1$. Define $N$ as follows:

$$N = \min\{n \geq 1 \mid X_1 + X_2 + \cdots + X_n > x\}.$$

Prove that for any such $x$, $P(N > n) = \frac{x^n}{n!}$.

And calculate the mean and the variance of $N$.

I am self-studying and I found this problem in my book. I have tried various methods but have gotten nowhere so far. I would really appreciate some help.


1 Answer


Wow, this is a hard question! Apologies in advance if I make any careless mistakes.

Note that $P(N > k)$ is asking: "after $k$ tries, what is the probability that the sum of the r.v.'s $X_1, X_2, \ldots, X_k$ is still less than $x$?" In that case you need more than $k$ r.v.'s to go beyond $x$. So $P(N > k) = P(X_1 + X_2 + \cdots + X_k < x)$.
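(Before any calculus, a quick Monte Carlo sanity check of the target formula is easy to run; the helper name `sample_N` and the choice $x = 0.7$ below are just for illustration.)

```python
import math
import random

def sample_N(x):
    """Draw Uniform(0,1) variables until their running sum exceeds x; return the count."""
    total, n = 0.0, 0
    while total <= x:
        total += random.random()
        n += 1
    return n

x, trials = 0.7, 200_000
samples = [sample_N(x) for _ in range(trials)]
for k in range(1, 5):
    empirical = sum(1 for n in samples if n > k) / trials
    print(k, empirical, x ** k / math.factorial(k))  # the two columns should roughly agree
```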

I'm going to cheat and work out the small cases first. When $k=1$, $P(N > k) = P(X_1 < x) = x$, so the formula works. When $k=2$, $P(N > k) = P(X_1 + X_2 < x)$. We condition on the first variable. That is, \begin{align} P(X_1 + X_2 < x) &= \int_0^x P(X_2 < x - x_1 \mid X_1 = x_1) f_{X_1}(x_1) \, dx_1 \end{align} where $f_{X_1}$ is the PDF of the random variable $X_1$. However, since $X_1$ is uniformly distributed, $f_{X_1}(x_1) = 1$ for all $x_1 \in [0,1]$. Hence, \begin{align} P(X_1 + X_2 < x) &= \int_0^x P(X_2 < x - x_1 \mid X_1 = x_1) \, dx_1. \end{align} By independence, $P(X_2 < x - x_1 \mid X_1 = x_1) = P(X_2 < x - x_1)$, so \begin{align} P(X_1 + X_2 < x) &= \int_0^x P(X_2 < x - x_1) \, dx_1 \\ &= \int_0^x \int_0^{x - x_1} 1 \, dx_2 \, dx_1. \end{align} Now, $\int_0^{x - x_1} 1 \, dx_2$ gives $(x - x_1)$, and $\int_0^x (x - x_1) \, dx_1 = \frac{x^2}{2}$.
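(If you'd rather see the $k = 2$ integral numerically than take my calculus on faith, a crude midpoint Riemann sum does it; $x = 0.7$ and the grid size are arbitrary choices.)

```python
# Approximate the outer integral of the k = 2 case: the inner integral of 1 over
# [0, x - x1] is just (x - x1), so we Riemann-sum (x - x1) over [0, x].
x, m = 0.7, 100_000
h = x / m  # width of each subinterval
approx = sum(x - (i + 0.5) * h for i in range(m)) * h  # midpoint rule
print(approx, x ** 2 / 2)  # both ≈ 0.245
```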

For $k=3$, you can verify by a similar procedure that the integral you need is \begin{align} P(X_1 + X_2 + X_3 < x) = \int^x_0 \int^{x-x_1}_0 \int^{x-x_1-x_2}_0 1 \, dx_3 \, dx_2 \, dx_1. \end{align} This gets hard to evaluate real quick, but luckily for us, we have a secret weapon! Notice that $x_1$ is held constant throughout the inner two integrals (they integrate with respect to $x_2$ and $x_3$), so you may substitute $u = x - x_1$ (with $du = -dx_1$, which just flips the outer limits back to $0$ and $x$) and get \begin{align} P(X_1 + X_2 + X_3 < x) = \int^x_0 \int^{u}_0 \int^{u-x_2}_0 1 \, dx_3 \, dx_2 \, du. \end{align} But by the case $k=2$, you know the inner two integrals work out to \begin{align} \int^{u}_0 \int^{u-x_2}_0 1 \, dx_3 \, dx_2 = \frac{u^2}{2!} = \frac{(x-x_1)^2}{2!}. \end{align} Unsurprisingly, $$ \int^x_0 \frac{(x-x_1)^2}{2!} \, dx_1 = \frac{x^3}{3!}. $$ By now you should see a pattern. For $k=4$, you will eventually end up with $$ \int^x_0 \frac{(x-x_1)^3}{3!} \, dx_1, $$ which you want to show works out to $\frac{x^4}{4!}$. For $k=5$, you will end up with $$ \int^x_0 \frac{(x-x_1)^4}{4!} \, dx_1, $$ which you want to show works out to $\frac{x^5}{5!}$. What you need to prove now is the following proposition in general.

Proposition: $$\int^x_0 \frac{(x-x_1)^n}{n!} dx_1 = \frac{x^{n+1}}{(n+1)!}.$$

There are many ways to do this (brute-forcing the binomial expansion of $(x-x_1)^n$ is one option), so I will not prove it in full here.
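The fastest route is probably the substitution $t = x - x_1$ (so $dt = -dx_1$, flipping the limits back to $0$ and $x$): $$\int^x_0 \frac{(x - x_1)^n}{n!} \, dx_1 = \int^x_0 \frac{t^n}{n!} \, dt = \frac{x^{n+1}}{(n+1)!}.$$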

Once you prove this, the fact that $P(N > k) = \frac{x^k}{k!}$ follows by induction on $k$.
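(If you'd like a machine to grind out the nested integrals symbolically, here's a sketch using sympy; the recursion $F_k(x) = \int_0^x F_{k-1}(x - t)\,dt$ is just the conditioning step from above, starting from $F_0 \equiv 1$.)

```python
import sympy as sp

x, t = sp.symbols('x t', positive=True)
F = sp.Integer(1)  # F_0 ≡ 1: with zero variables drawn, P(N > 0) = 1
for k in range(1, 6):
    # One more layer of conditioning: F_k(x) = ∫_0^x F_{k-1}(x - t) dt
    F = sp.integrate(F.subs(x, x - t), (t, 0, x))
    print(k, sp.simplify(F))  # prints x, x**2/2, x**3/6, x**4/24, x**5/120
```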


Another way to compute the expectation (valid for random variables taking values in the nonnegative integers) is this tail-sum formula: $$ E(X) = \sum_{n = 1}^{\infty} P(X \geq n) $$ (Why is this true?) Then, using $P(N \geq n) = P(N > n - 1)$ and reindexing with $u = n - 1$, \begin{align} E(N) &= \sum_{n = 1}^{\infty} P(N \geq n) \\ &= \sum_{u = 0}^{\infty} P(N > u) \\ &= \sum_{u = 0}^{\infty} \frac{x^u}{u!} \\ &= e^x. \end{align}
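(A quick simulation check of $E(N) = e^x$; same hypothetical `sample_N` helper as above, with $x = 0.5$ chosen arbitrarily.)

```python
import math
import random

def sample_N(x):
    """Draw Uniform(0,1) variables until their running sum exceeds x; return the count."""
    total, n = 0.0, 0
    while total <= x:
        total += random.random()
        n += 1
    return n

x, trials = 0.5, 200_000
mean = sum(sample_N(x) for _ in range(trials)) / trials
print(mean, math.exp(x))  # both ≈ 1.6487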


Variance is a little trickier. We know $E(N)$ already, so the $[E(N)]^2$ term isn't the problem. What is problematic is the $E(N^2)$ term! \begin{align} E(N^2) &= \sum_{n=1}^{\infty} P(N^2 \geq n) \\ &= P(N^2 \geq 1) + P(N^2 \geq 2) + P(N^2 \geq 3) + P(N^2 \geq 4) + \cdots \end{align} Since $N$ is a positive integer, $N^2 \geq n$ exactly when $N \geq \lceil \sqrt{n} \rceil$, so \begin{align} & P(N^2 \geq 1) = P(N > 0) \\ & P(N^2 \geq 2) = P(N > 1) \\ & P(N^2 \geq 3) = P(N > 1) \\ & P(N^2 \geq 4) = P(N > 1). \end{align} So $n = 1$ contributes one $P(N > 0)$ term, $n = 2$ to $4$ contribute three $P(N > 1)$ terms, and extrapolating, $n = 5$ to $9$ contribute five $P(N > 2)$ terms, $n = 10$ to $16$ contribute seven $P(N > 3)$ terms, and so on. We hence have the following sum: \begin{align} E(N^2) &= P(N > 0) + 3 P(N > 1) + 5P(N > 2) + 7P(N > 3) + \cdots \\ &= 1 + 3x + 5\frac{x^2}{2!} + 7\frac{x^3}{3!} + \cdots \\ &= e^x + 2x + 4\frac{x^2}{2!} + 6\frac{x^3}{3!} + 8\frac{x^4}{4!} + \cdots \\ &= e^x + 2x\left(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots\right) \\ &= e^x + 2xe^x. \end{align} Thus, $\text{Var}(N) = e^x + 2xe^x - e^{2x}$, which is somehow (magically) positive for $x \in [0,1]$.
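(And the same kind of sanity check for the variance formula; again a sketch, with `sample_N` as before.)

```python
import math
import random

def sample_N(x):
    """Draw Uniform(0,1) variables until their running sum exceeds x; return the count."""
    total, n = 0.0, 0
    while total <= x:
        total += random.random()
        n += 1
    return n

x, trials = 0.5, 400_000
samples = [sample_N(x) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((v - mean) ** 2 for v in samples) / trials
print(var, math.exp(x) + 2 * x * math.exp(x) - math.exp(2 * x))  # both ≈ 0.579
```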

koifish
  • Great answer. But variance is always positive, why should that be magical? – Adam Dec 18 '20 at 17:01
  • That's true, when I was doing the question, I thought $e^x + 2xe^x - e^{2x}$ would be negative, since the $e^{2x}$ term would outweigh the other terms, but then I remembered $x \in [0,1]$. I said magically as a joke tbh, because looking at the variance formula, it almost seems coincidental that it is positive from 0 to 1. – koifish Dec 18 '20 at 23:52