I am given a sequence $X_1, X_2, \dots X_n$ of independent, geometrically distributed (with $p = 0.5$) random variables, for some $n$. Since there seem to be multiple definitions around: think of each $X_i$ as the number of coin flips up to and including the first "heads".
Let $\hat{X} = \max_i(X_i)$. What I want is an upper bound (in terms of $n$) on the expected number of geometrically distributed random variables (again with $p = 0.5$) $X'_1, X'_2, \dots X'_m$ that I need to look at before I see a value of at least $\hat{X} + 1$. In other words: I draw $n$ such random variables and determine their maximum. Then, in a second step, I keep drawing such random variables until I see a value strictly larger than that maximum. How many variables do I expect to draw in the second step?
My conjecture is that $m \leq 2n$ or $m \le 3n$ in expectation. I have two approaches that I feel should show this, but for neither am I sure whether it holds or how to correctly bound $m$.
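For a quick sanity check (not a proof), here is a minimal simulation sketch in Python. The helper names `geom` and `second_step` are mine, and the empirical mean of $m$ fluctuates heavily from run to run, since a large $\hat{X}$ can make the second step very long:

```python
import random

def geom(p=0.5):
    """Number of fair-coin flips up to and including the first heads."""
    flips = 1
    while random.random() >= p:
        flips += 1
    return flips

def second_step(n):
    """Draw n variables, take their maximum, then count further draws
    until a strictly larger value appears."""
    x_hat = max(geom() for _ in range(n))
    m = 1
    while geom() <= x_hat:
        m += 1
    return m

trials = 1_000
for n in (1, 10, 100):
    avg = sum(second_step(n) for _ in range(trials)) / trials
    print(f"n = {n:4d}: empirical mean of m = {avg:10.1f} (ratio m/n = {avg / n:.2f})")
```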
Approach 1: Symmetry and Handwaving
I feel like there should be symmetry at play here. Suppose I weren't looking for a value strictly larger than $\hat{X}$, but for one at least as large as $\hat{X}$. Concatenate both sequences to form $X_1, X_2, \dots X_n, X'_1, X'_2, \dots X'_m$. Now, the values of the individual $X_i$ (resp. $X'_i$) are independent of my choice of $n$, so why should the distance "to the left" from $X_n$ to $\hat{X}$ be any larger than the distance "to the right" from $X_n$ to the next element of at least the same value as $\hat{X}$? For symmetry reasons, we should expect $m = n$ here.
Now I'm looking not for an element of the same value, but for one of strictly greater value. Since $P[X_i = \hat{X}] = 2 \cdot P[X_i = \hat{X} + 1]$, the values I'm looking for should be half as densely distributed… thus we should expect $m = 2n$… right?
At this point I'm waving my hands really hard and hoping for the best. Is this approach valid? Would you (as a reviewer or reader) accept it in a paper if I used it to prove some property of a data structure?
Approach 2: Actually Do the Maths
A more rigorous approach I thought of is this: from this answer I know that the expected maximum of $n$ such $X_i$ (let's call it $M(\{X_i\})$) is
$$E(M(\{X_1, \dots X_n\})) = \sum_{k \geq 0}\left( 1 - \left(1 - \frac{1}{2^k}\right)^n\right)$$
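Truncating this series at some cutoff $K$ makes it easy to evaluate numerically. A small sketch (the choice $K = 200$ is mine; the neglected tail is at most $n \cdot 2^{-K}$, since $1 - (1 - 2^{-k})^n \le n \cdot 2^{-k}$):

```python
def expected_max(n, K=200):
    """Truncated series for E[max of n geometric(0.5) variables];
    the neglected tail is at most n * 2**(-K)."""
    return sum(1 - (1 - 0.5 ** k) ** n for k in range(K + 1))

for n in (1, 2, 10, 100, 1000):
    print(f"n = {n:5d}: E[max] ≈ {expected_max(n):.4f}")
```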
I feel like if I find an $m$ such that $E(M(\{X'_1, \dots X'_m\})) - E(M(\{X_1, \dots X_n\})) \ge 1$, then I should have proven my point - right?
I chose $m = 3n$. Subtracting the two series termwise (the constant $1$s cancel, and the sign flips), the condition becomes:
$$\sum_{k \ge 0}\left( \left(1 - \frac{1}{2^k} \right)^n - \left(1 - \frac{1}{2^k} \right)^{3n} \right) \ge 1$$
However, at this point I seem not to have paid attention in my calculus classes, or I just don't see how to bound this series. Does anybody see a way of showing the above? I would be happy with any $m = cn$ for a constant $c$. I plotted the left-hand side for $m = 3n$ (truncating the sum at $0 \le k \le 1000$), and it is well above $1$ for all $n \geq 1$.
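For reference, here is a sketch of that numerical check in Python (same truncation at $k \le 1000$):

```python
def gap(n, c=3, K=1000):
    """Truncated value of sum_k ((1 - 2^-k)^n - (1 - 2^-k)^(c*n))."""
    return sum((1 - 0.5 ** k) ** n - (1 - 0.5 ** k) ** (c * n)
               for k in range(K + 1))

for n in (1, 2, 5, 10, 100, 1000):
    print(f"n = {n:5d}: gap = {gap(n):.4f}")
```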
Even if I could show that: is my approach ("if I choose $m$ such that the expected maxima differ by at least $1$, then I have found the upper bound I'm looking for") valid?
Thanks a lot!