3

Suppose you draw $n$ independent samples from the uniform distribution on $(-100,100)$.

Call your samples $X_1, X_2, \ldots, X_n$.

Next you draw one more sample; call it $X_0$. Then you compare $X_0$ to each of $X_1, X_2, \ldots, X_n$ in turn, stopping at the first index $i$ (with $1 \leq i \leq n$) for which $X_0 > X_i$.

The expected value of $i$, as $n \rightarrow \infty$, is apparently $\infty$. I can't wrap my head around this counterintuitive result, and I would appreciate an explanation.
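The process is easy to simulate. The following is a minimal Monte Carlo sketch (Python, hypothetical helper names), assuming the convention that $i = n$ when $X_0$ never exceeds any sample; the simulated average stopping index keeps growing as $n$ grows:

```python
import random

def stopping_index(n, rng):
    """Run the process once: draw X_0, then compare it against fresh
    uniform draws X_1..X_n, returning the first i with X_0 > X_i
    (or n, by convention, if X_0 exceeds none of them)."""
    x0 = rng.uniform(-100, 100)
    for i in range(1, n + 1):
        if x0 > rng.uniform(-100, 100):
            return i
    return n  # X_0 never won a comparison

rng = random.Random(0)  # fixed seed for reproducibility
trials = 20000
for n in (10, 100, 1000):
    avg = sum(stopping_index(n, rng) for _ in range(trials)) / trials
    harmonic = sum(1 / k for k in range(1, n + 1))
    print(f"n={n:5d}  simulated E(i)={avg:6.3f}  H_n={harmonic:6.3f}")
```

The simulated averages track the partial sums $H_n = 1 + \frac12 + \cdots + \frac1n$, which is exactly what the answer below derives.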

Sam
  • 1,013
  • Interesting problem... where is it from by the way? – Nap D. Lover Oct 28 '19 at 00:18
  • Yes it is...It was brought up in probability theory class – Sam Oct 28 '19 at 00:49
  • 1
    If $X_0 \le \min\{X_1, \ldots, X_n\}$, what is the value of the random variable $i$? – angryavian Oct 28 '19 at 00:58
  • 3
    We may instead compare the normalized values $$Y_k=\frac{100+X_k}{200}$$ without affecting the algorithm. Then given $Y_0=p\in(0,1)$, the expected value of $i$ is $1/p$, the reciprocal of $p$. This is simply because the stopping criterion $Y_0>Y_k$ has probability $p$ of occurring for each $k\geq 1$. Now there is a delicate competition between $p$ and $1/p$: it is harder to observe small values of $Y_0$, but small values of $Y_0$ make the stopping happen much later. It turns out that the latter effect is slightly stronger, giving an infinite expectation. – Sangchul Lee Oct 28 '19 at 01:21

1 Answer

4

I will suppose that $i=n$ if $X_0$ is not greater than any of the $X_k.$

Then $$ \mathrm E(i) = \mathrm P(i \geq 1) + \mathrm P(i \geq 2) + \cdots + \mathrm P(i \geq n). $$

Obviously $\mathrm P(i \geq 1)=1$, but $i \geq 2$ only if we did not stop at the first comparison, that is, only if $X_0$ is the smaller of the values $X_0,X_1$; either of these is equally likely to be smaller, so $\mathrm P(i \geq 2)=\frac12.$ Similarly, $i \geq 3$ only if $X_0$ is the smallest of the three values $X_0,X_1,X_2$; any of these is equally likely to be the smallest in the list, so $\mathrm P(i \geq 3)=\frac13.$ In general, for $k \leq n,$ $i \geq k$ only if $X_0$ is the smallest of the $k$ values $X_0,X_1,\ldots,X_{k-1}$; any of these is equally likely to be the smallest in the list, so $\mathrm P(i \geq k)=\frac1k.$


In summary, $\mathrm E(i) = 1 + \frac12 + \frac13 + \cdots + \frac1n.$ Now let $n \to \infty.$
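The divergence is real but extremely slow, which is part of what makes the result feel counterintuitive. A small Python sketch (an illustration, not part of the proof) compares the partial sums $H_n$ with the standard approximation $\ln n + \gamma$:

```python
import math

# Partial sums of the harmonic series: H_n = 1 + 1/2 + ... + 1/n.
# They diverge, but only logarithmically: H_n ~ ln(n) + gamma,
# where gamma = 0.5772... is the Euler-Mascheroni constant.
def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 1000, 100000):
    approx = math.log(n) + 0.5772
    print(f"H_{n} = {harmonic(n):.4f}   ln(n) + gamma = {approx:.4f}")
```

So $\mathrm E(i)$ does go to infinity, but reaching an expected stopping index of even $15$ would require $n$ on the order of a million samples.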


Addendum

To prove that $\mathrm E(i) = \mathrm P(i \geq 1) + \mathrm P(i \geq 2) + \cdots + \mathrm P(i \geq n),$ one possible approach is to observe that $$ \mathrm P(i \geq k) = \mathrm P(i = k) + \mathrm P(i = k+1) + \cdots + \mathrm P(i = n), $$ break up all the terms of the sum this way, and recombine them (each $\mathrm P(i = j)$ appears in exactly the $j$ sums with $k \leq j$) to get $$\mathrm P(i = 1) + 2\,\mathrm P(i = 2) + \cdots + n\,\mathrm P(i = n) = \mathrm E(i).$$

Another way is to define $Z_k = 1$ if $i \geq k,$ $Z_k = 0$ otherwise, that is, $Z_k$ is an indicator variable saying whether we compared $X_0$ to $X_k$ (as opposed to stopping earlier). But since $i$ is just the number of comparisons we made, $i = Z_1 + Z_2 + \cdots + Z_n$ (adding $1$ for each $X_k$ that actually got compared and $0$ for each of the others), and by linearity of expectation, $$\mathrm E(i) = \mathrm E(Z_1) + \mathrm E(Z_2) + \cdots + \mathrm E(Z_n).$$

But $Z_k = 1$ if and only if $i \geq k,$ so $\mathrm E(Z_k) = \mathrm P(i \geq k),$ and the conclusion follows immediately.
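For a small $n$ the identity $\mathrm E(i) = H_n$ can even be checked exactly: the stopping index depends only on the relative order of $X_0, X_1, \ldots, X_n$, and for i.i.d. continuous draws all $(n+1)!$ orderings are equally likely. A short Python sketch (an independent check, not part of the answer's argument) enumerates every ordering for $n=5$ using integer ranks in place of the actual values:

```python
from itertools import permutations
from fractions import Fraction

def stopping_index(values):
    """First index i (1-based) with values[0] > values[i],
    or n by convention if values[0] exceeds none of them."""
    x0, rest = values[0], values[1:]
    for i, xk in enumerate(rest, start=1):
        if x0 > xk:
            return i
    return len(rest)

n = 5
# Enumerate all (n+1)! equally likely relative orderings of X_0..X_n.
indices = [stopping_index(p) for p in permutations(range(n + 1))]
expected = Fraction(sum(indices), len(indices))  # exact E(i)
harmonic = sum(Fraction(1, k) for k in range(1, n + 1))  # H_5 = 137/60
print(expected, harmonic)
```

Both quantities come out to the same exact fraction, confirming $\mathrm E(i) = H_n$ for this $n$.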

David K
  • 98,388
  • It may be a trivial question, but how is the expected value merely the sum of the probabilities? – Chaos Oct 28 '19 at 12:35
  • 1
    @RScrlli $E = \sum n \cdot \mathrm{pmf}(n) = \sum [1-\mathrm{cmf}(n)]$, using summation by parts. – eyeballfrog Oct 28 '19 at 12:55
  • 1
    Right, it's not the sum of "the" probabilities $\mathrm P(i = k).$ It's a sum of probabilities $\mathrm P(i \color{red}{\geq} k),$ which are probabilities of overlapping events and can have a sum greater than $1.$ And it only works out to be this simple because the support of the distribution is a subset of the positive integers. – David K Oct 28 '19 at 13:21
  • Very nice, thanks for the clarification! – Chaos Oct 28 '19 at 15:20
  • This is great, thank you! – Sam Oct 30 '19 at 18:48