
We have $n$ guys walking down the street, and each can find $1, 2, \ldots,$ or $n$ dollars in the street. (The same $n$ is both the number of guys and the maximum number of dollars in the problem.)

Each of them finds $1$ dollar with probability $\dfrac{1}{2}$, $2$ dollars with probability $\dfrac{1}{4}$, $3$ dollars with probability $\dfrac{1}{8}$, and so on. That is, he finds $x$ dollars with probability $\dfrac{1}{2^x}$. (These finds are independent across guys; think of them as walking down different streets. The remaining probability $1/2^n$ can be assigned to finding $n$ (or $0$) dollars, whichever is more convenient.)

In expectation, every guy finds less than $2$ dollars. But what is the expected number of dollars that the luckiest guy gets? That is, if $x_1, \ldots, x_n$ are the numbers of dollars the guys find, what is $E[\max(x_1, \ldots, x_n)]$?

Edit: I am interested in obtaining a simple expression for the expectation OR a good upper bound.

Idea: Simulations suggest that the answer is around $\log(n)$.
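A minimal Monte Carlo sketch of such a simulation (Python with numpy assumed; the leftover probability mass is pushed into the tail rather than truncated at $n$, i.e. each amount is drawn as an unbounded geometric with $P(x=k)=1/2^k$):

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_expected_max(n, trials=20000):
    # Each entry is drawn with P(x = k) = 1/2**k (geometric, p = 1/2);
    # we average the row-wise maximum over many independent trials.
    draws = rng.geometric(0.5, size=(trials, n))
    return draws.max(axis=1).mean()

for n in (10, 100, 1000):
    print(n, sim_expected_max(n), np.log2(n))
```

The estimates track $\log_2(n)$ plus a small additive constant, which is consistent with the $\log(n)$ guess.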

Edit: I tried solving it using Lulu's suggestion. Let $M=\max(x_1, \ldots, x_n)$. Then

$E[M]=\sum_{i=1}^n Pr(M=i)\cdot i$.

$Pr(M=1)=\frac{1}{2^n}$.

$Pr(M=2)=\frac{3^n}{4^n}-\frac{1}{2^n}$.

$Pr(M=3)=\frac{7^n}{8^n}-\frac{3^n}{4^n}$.

$Pr(M=4)=\frac{15^n}{16^n}-\frac{7^n}{8^n}$.

(In general, $Pr(M=k)=Pr(M\leq k)-Pr(M\leq k-1)$ consists of just two terms.) This implies that $E[M]=1 \cdot \frac{1}{2^n} + 2 \cdot \left(\frac{3^n}{4^n}-\frac{1}{2^n}\right) + 3 \cdot \left(\frac{7^n}{8^n}-\frac{3^n}{4^n}\right) + 4 \cdot \left(\frac{15^n}{16^n}-\frac{7^n}{8^n}\right) + \ldots$.

But I do not see how that expression converges to something easy to work with - or close to $\log (n)$.

Edit 2: A different approach, using $E[M]= \sum_{k=1}^\infty Pr(M\geq k)$, gives

$E[M]=\sum_{k=1}^\infty \left[1- \left( \frac{2^{k-1}-1}{2^{k-1}}\right)^n\right]$, but I am not sure how to expand this term to make it approximately equal to $\log(n)$.
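The tail sum is at least easy to evaluate numerically. A sketch in Python (taking the sum from $k=1$, where the term is $Pr(M\geq 1)=1$, and truncating once the terms are negligible; function name is my own):

```python
import math

def tail_sum_E(n, tol=1e-15):
    # E[M] = sum over k >= 1 of Pr(M >= k), with
    # Pr(M >= k) = 1 - (1 - 2**(1 - k))**n; stop once terms are tiny.
    total, k = 0.0, 1
    while True:
        term = 1.0 - (1.0 - 2.0 ** (1 - k)) ** n
        total += term
        if term < tol:
            return total
        k += 1

for n in (10, 100, 1000):
    print(n, tail_sum_E(n), tail_sum_E(n) - math.log2(n))
```

Numerically, the difference $E[M]-\log_2(n)$ appears to settle near a constant around $1.33$, supporting $\log_2(n)$ growth up to an additive constant.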

  • I think the "$n$ dollars" at the end of the first sentence should be removed in order to not confuse with the "$n$ guys" at the beginning of the sentence. Given the probability distribution, each guy can pick up any positive number of dollars. – angryavian Mar 29 '22 at 19:09
  • Thanks. $n$ is the same thing in the question, both the number of dollars and the number of agents. – fox Mar 29 '22 at 19:11
  • Since $\frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^n} = 1 - \frac{1}{2^n} < 1$, there is something missing in your probability distribution. – angryavian Mar 29 '22 at 19:12
  • Think of it as the limit distribution when $n \rightarrow \infty$. – fox Mar 29 '22 at 19:15
  • How exactly did you implement the finite-$n$ probability distribution? Are you assuming that in the remaining event with probability $1/2^n$ a person finds zero dollars? Or are you allowing them to find more than $n$ dollars? – angryavian Mar 29 '22 at 19:20
  • As the comments suggest, your question is not clear. That said, once you get the details straight, it is easy to compute $P(M≤k)$ for any $k$ (where, of course, $M$ denotes the max). After all, that just requires each person to find at most $k$. But then you can easily compute $P(M=k)$ by subtraction. – lulu Mar 29 '22 at 19:25
  • @lulu thanks for this. would you mind being more specific? – fox Mar 29 '22 at 19:26
  • As I say, your problem is incomplete as stated. But once you get the details straight, saying that $M≤k$ is just saying that each individual independently finds $≤k$. But then $P(M=k)=P(M≤k)-P(M≤k-1)$. – lulu Mar 29 '22 at 19:28
  • @lulu I followed your suggestion, without much luck I am afraid. I edited the question to show my reasoning. – fox Mar 31 '22 at 14:33
  • Note this expression for the max. I don't know if there is a simple way to address the sum or not. – lulu Mar 31 '22 at 14:45
  • I have posted a bounty trying to encourage an answer. – fox Mar 31 '22 at 20:59
  • Related https://math.stackexchange.com/a/26214/312 – leonbloy Apr 05 '22 at 15:28

2 Answers


I find it easier to see this problem as "$1+$ the expected length of the longest run of consecutive heads" over $n$ series of flips. By "consecutive heads" I mean that we stop at the first tail and write down the number of heads we had. The "$1+$" is there because the distribution is shifted by $1$ (you get $0$ heads with probability $1/2$, $1$ with probability $1/4$, $2$ with probability $1/8$, etc.).

We end up with a longest-success-run problem. We denote the longest run by the random variable $L_n = \max(x_1,\dots,x_n)$. Each run length $x_i$ follows a geometric distribution $\mathbb{P}(x_i=k) = pq^{k} = 1/2^{k+1}$, with $p=q=1/2$ here and $k \in \mathbb{N}$.

Based on that, we may argue that a qualitative answer for the expected longest run is the $l$ such that $np^lq = 1$ (the longest run is expected to occur at least once). Solving for $l$, we have $$l = -\dfrac{\log (n(1-p))}{\log p} = \dfrac{\log (n/2)}{\log 2}.$$

This article establishes that the spread between this result and the true $\mathbb{E}[L_n]$ follows a Gumbel distribution with $\mu=0$ and $\beta = 1/\log(1/p)$, and thus allows one to draw confidence intervals. No closed form, though.

Do not forget to add $1$ to get back to the initial problem. That gives $\mathbb{E}[L_n] \approx \log_2(n)$.

This problem was already tackled here.
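The reduction can be sanity-checked by simulating it directly: draw each person's amount as $1$ plus the number of consecutive heads before the first tail, which reproduces $\mathbb{P}(\text{amount}=x)=1/2^x$, and compare the empirical maximum with $\log_2(n)$. A minimal pure-Python sketch (seeded for reproducibility; names are my own):

```python
import math
import random

random.seed(1)

def dollars():
    # 1 + number of consecutive heads before the first tail,
    # so P(amount = x) = 1/2**x for x = 1, 2, 3, ...
    amount = 1
    while random.random() < 0.5:
        amount += 1
    return amount

def sim_max(n, trials=10000):
    # Empirical mean of the maximum amount among n people.
    return sum(max(dollars() for _ in range(n)) for _ in range(trials)) / trials

print(sim_max(100), math.log2(100))
```

The empirical mean sits close to $\log_2(n)$, matching the heuristic $l+1 = \log_2(n/2) + 1 = \log_2(n)$.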

  • This answer looks interesting, but I am afraid I do not completely follow how my problem is equivalent to the one you are solving. – fox Apr 04 '22 at 08:35
  • The distribution is just shifted by $1$: the support for the dollar problem is $\mathbb{N}^{*}$ whereas for the longest heads run it is $\mathbb{N}$.

    Each of the $n$ people is given an attempt to win a certain amount of dollars. We want the expected value of the one that gets the highest number, thus it is a longest run problem.

    – vtisserand Apr 04 '22 at 11:30
  • If you could explain the $p$ and $q$, that would be great; also, why do you choose that $l$ as the solution? – fox Apr 04 '22 at 15:40
  • Ok I got the $p, q$ thing ($q$ is simply $1-p$), but why do we assume that form for $l$? – fox Apr 04 '22 at 15:47
  • $l$ is the proxy for the longest success run. We expect one such run to appear at least once. We might say that we are looking for the smallest $l$ such that $n\times qp^l \ge 1$. – vtisserand Apr 04 '22 at 17:23
  • Ok, I got most of it now, but where does the $\log_2$ in $\mathbb{E}[L_n] \approx \log_2(n)$ come from? – fox Apr 05 '22 at 11:16
  • Back to the dollar problem, we want $\mathbb{E}[L_n] \approx l + 1 = \dfrac{\log n - \log 2}{\log 2} + 1 = \dfrac{\log n}{\log 2} = \log_2 {n}$ by changing the base of the logarithm. – vtisserand Apr 05 '22 at 11:33
  • What happened to the minus? – fox Apr 05 '22 at 13:00
  • Vanishes with the $+1$. – vtisserand Apr 05 '22 at 16:36
  • No, I meant the minus in $$l = -\dfrac{\log n(1-p)}{\log p} = \dfrac{\log n/2}{\log 2}.$$ – fox Apr 12 '22 at 12:28
  • $p=1/2$, thus $-\log(p)= -\log(1/2)= \log(2).$ – vtisserand Apr 12 '22 at 12:48

Each $X_i$ (the amount of dollars) has a truncated geometric distribution. For these the same result holds as for the usual geometric distribution, namely that $\min(X_i, X_j)\sim Geo(1-(1-p)^2)$, and in general for $k$ of those variables we get $$\min(X_{i_1},X_{i_2},\ldots,X_{i_k})\sim Geo(1-(1-p)^k)$$

Now we can use the min-max formula for expectations (yet another instance of the inclusion-exclusion principle): $$\mathbb E[\max(X_1,\ldots,X_n)]=n\cdot\mathbb E[X_1]+\sum_{m=2}^n(-1)^{m+1}\binom nm\cdot \mathbb E[ \min(X_1,\ldots,X_m)] \\=\sum_{k=1}^n\binom nk \frac{(-1)^{k+1}}{1-(1-p)^k}=\sum_{k=1}^n\binom nk \frac{(-1)^{k+1}}{1-1/2^k}$$
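Since the binomial coefficients grow fast and the terms alternate in sign, a floating-point evaluation of this sum suffers catastrophic cancellation for moderate $n$; exact rational arithmetic sidesteps that. A sketch (Python; function name is my own):

```python
from fractions import Fraction
from math import comb, log2

def E_max_exact(n):
    # Exact evaluation of sum_{k=1}^{n} C(n,k) * (-1)**(k+1) / (1 - 1/2**k)
    # using rationals, avoiding cancellation in the alternating sum.
    return sum(Fraction((-1) ** (k + 1) * comb(n, k), 1) / (1 - Fraction(1, 2 ** k))
               for k in range(1, n + 1))

for n in (1, 2, 100):
    print(n, float(E_max_exact(n)), log2(n))
```

For $n=1$ this gives $2$ (the mean of a single $Geo(1/2)$), and for $n=2$ it gives $2\cdot 2 - 4/3 = 8/3$, matching a direct computation via $\min(X_1,X_2)\sim Geo(3/4)$.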

Edit: The Fisher–Tippett–Gnedenko extreme value theorem indeed implies that $T_n:=\max(X_1,\ldots,X_n)$, after centering and scaling, converges in distribution to a standard Gumbel distribution, i.e.

$$\lim_{n\rightarrow \infty}\mathbb P\left((1-p)\cdot\left(T_n+\frac{\ln(n)}{\ln(1-p)}\right)\leq x\right)=F_G(x)$$ where $G\sim Gumbel(0,1)$.

So indeed, $$\mathbb E[T_n]\approx \frac{\gamma_E}{1-p}-\frac{\ln(n)}{\ln(1-p)}$$ for large $n$. The convergence is, however, not very fast, so $n$ needs to be quite large. I could only test numerically in R, where the exact value went haywire for $n>50$.

You can find the theorem on Wikipedia: https://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%93Gnedenko_theorem. The second case is the one applicable here; one "only" needs to check all the assumptions.

It is possible, though, that I messed up the computation of $a_n=1-p$. Also, $\ln(n+1)$ makes the approximation a little better, but naturally it makes no difference for the convergence.
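For what it's worth, plugging $p=1/2$ into the approximation gives $\mathbb E[T_n]\approx 2\gamma_E + \log_2(n)$, and one can compare that against the exact value computed via the tail sum $\sum_{k\geq 1}\mathbb P(T_n\geq k)$. In a quick check the two stay within a few tenths of each other, consistent with the caveat about $a_n$. A rough sketch (my own naming):

```python
import math

GAMMA_E = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_approx(n, p=0.5):
    # The answer's approximation: gamma_E/(1-p) - ln(n)/ln(1-p).
    return GAMMA_E / (1 - p) - math.log(n) / math.log(1 - p)

def exact_E(n):
    # Exact expectation via the tail sum, truncated where terms vanish.
    return sum(1.0 - (1.0 - 2.0 ** (1 - k)) ** n for k in range(1, 400))

for n in (100, 1000, 10000):
    print(n, round(exact_E(n), 3), round(gumbel_approx(n), 3))
```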

  • This formula looks very similar to the one I derived, or to the one suggested in the PS by @DinosaurEgg. The thing is how to find a short expression for it as a function of $n$. – fox Apr 04 '22 at 08:37
  • Actually, I am not sure they are even equivalent, seems like you got a missing term in the numerator. – fox Apr 04 '22 at 08:38
  • @fox, getting the $\log(n)$ expression from this is really difficult. There should be a connection to the harmonic series, but I haven't dug into that yet. – Peter Strouvelle Apr 04 '22 at 09:16
  • Thanks, that is what I am looking for. I also got that expression in the question (without expanding it). – fox Apr 04 '22 at 09:20
  • @fox, the formula from DinosaurEgg differs if you evaluate them numerically. However, my result is the exact result for the geometric distribution, not the truncated one, although I don't see why there would be a difference after $m\rightarrow\infty$. – Peter Strouvelle Apr 04 '22 at 09:33
  • The reference for the formula used above is Sheldon Ross, "A First Course in Probability". I just used the fact that the random variables are exchangeable and that the minimum of geometric distributions is again geometric. – Peter Strouvelle Apr 04 '22 at 09:45
  • I am interested in getting a simple expression. It does not need to be equal, it could be a simple upper bound. – fox Apr 04 '22 at 10:20