
Suppose you throw $n$ balls into $n$ bins uniformly and independently at random, and let $X$ be the number of balls in the fullest bin.

Is there an elementary way to compute $\mathbb{E}(X)$?

This problem comes up in computer science, for example in hashing or randomized load balancing.

EDIT. Having seen the current answer: if there is a simpler way to prove that $\mathbb{E}(X) =\Theta(\log{n}/\log{\log{n}})$, rather than an exact formula, I would be happy with that.
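A small Monte Carlo sketch can make the quantity concrete: it throws $n$ balls into $n$ bins repeatedly, averages the load of the fullest bin, and prints $\ln n / \ln \ln n$ alongside. The names `max_load` and `mean_max_load` and the default `trials`/`seed` values are arbitrary illustrative choices. Since $\Theta$ hides constants and lower-order terms, the two columns are not expected to agree closely at these sizes.

```python
import math
import random


def max_load(n_balls: int, n_bins: int, rng: random.Random) -> int:
    """Throw n_balls uniformly at random into n_bins; return the fullest bin's count."""
    counts = [0] * n_bins
    for _ in range(n_balls):
        counts[rng.randrange(n_bins)] += 1
    return max(counts)


def mean_max_load(n: int, trials: int = 200, seed: int = 1) -> float:
    """Monte Carlo estimate of E(X) for n balls thrown into n bins."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return sum(max_load(n, n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4):
        estimate = mean_max_load(n, trials=100)
        print(n, estimate, math.log(n) / math.log(math.log(n)))
```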

  • Very similar to http://math.stackexchange.com/questions/28930/another-balls-and-bins-question?rq=1 I think... – abiessu Oct 02 '13 at 18:40
  • @abiessu I don't think it's the same, as that person is looking for the number of bins with $0$ or $1$ balls in them. In my problem you have to worry about the fullest bin, which seems harder to me. –  Oct 02 '13 at 18:45
  • where did you come across this problem, or did you just make it up? If we knew where it came from, we might be able to locate a solution or a hint. I suspect this has been done before. – Stefan Smith Oct 02 '13 at 18:57
  • I still think it is similar, because the question "how many balls are in the fullest bin" comes down to "how many bins have $k$ balls in them, for each $k \in \{0,1,\ldots,n\}$, and with what probability?" Assigning a $0$ or $1$ for whether a bin has one ball, another $0$ or $1$ for whether it has two balls, and so on up to $k$ balls, would be a way to break down the probabilities. – abiessu Oct 02 '13 at 19:26
  • Related: http://mathoverflow.net/questions/104948/distribution-of-maximum-of-a-uniform-multinomial-distribution – leonbloy Oct 06 '13 at 14:39
  • An alternative approach to the one in the answer so far is to apply the second moment method: estimate the expectation and variance of the number of bins with at least $c \log n / \log \log n$ balls, use these to show that if $c$ is large there are probably no such bins while if $c$ is small there are probably many, and then use those bounds to get the expectation. For full details, see the first four sections of http://www14.informatik.tu-muenchen.de/personen/raab/publ/balls.pdf – Kevin P. Costello Oct 07 '13 at 18:53
  • @KevinCostello That looks a lot better. Thank you. –  Oct 08 '13 at 09:26
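The first-moment half of the approach in Kevin P. Costello's comment can be checked numerically. The expected number of bins holding at least $k$ balls is $n \cdot P(\mathrm{Bin}(n, 1/n) \ge k)$, and by the union bound $P(X \ge k)$ is at most this quantity; the sketch below finds the smallest $k$ for which it drops below $1$. This covers only the upper-bound half: the matching lower bound needs the variance (second-moment) step, which is not computed here. The function names `binom_tail` and `first_moment_threshold` are hypothetical.

```python
import math


def binom_tail(n: int, k: int) -> float:
    """P(Bin(n, 1/n) >= k), summed upward from k (assumes n >= 2)."""
    if k > n:
        return 0.0
    p = 1.0 / n
    # log of the first term C(n, k) p^k (1-p)^(n-k), computed in log space
    log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    term = math.exp(log_term)
    total = 0.0
    j = k
    while j <= n and term > 0.0:
        total += term
        if term < total * 1e-17:
            break  # later terms shrink at least geometrically; the rest is negligible
        term *= (n - j) / (j + 1) * p / (1 - p)  # ratio P(j+1) / P(j)
        j += 1
    return total


def first_moment_threshold(n: int) -> int:
    """Smallest k with n * P(Bin(n, 1/n) >= k) < 1, i.e. E[#bins with >= k balls] < 1."""
    k = 1
    while n * binom_tail(n, k) >= 1.0:
        k += 1
    return k


if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4, 10**5, 10**6):
        print(n, first_moment_threshold(n), math.log(n) / math.log(math.log(n)))
```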

4 Answers


More generally, suppose we have $N$ balls and $M$ bins. Section 9.4 of An Introduction to the Analysis of Algorithms, Second Edition by Robert Sedgewick and Philippe Flajolet shows that the average maximum occupancy is given by

$$\frac{N!}{M^N} [z^N] \sum_{k \ge 0} \left( e^{Mz} - \left( \sum_{0 \le j \le k} \frac{z^j}{j!} \right)^M \right) $$

where $[z^N]$ denotes the coefficient of $z^N$ when the expression following is expanded. The book also quotes an asymptotic approximation due to Gonnet:

$$\sim \frac{\ln N}{\ln \ln N} \text{ as } N, M \to \infty$$ in such a way that $N/M = \alpha$ with $\alpha$ constant.
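For small $N$ and $M$ the exact formula can be evaluated with rational arithmetic. The sketch relies on the observation that the $k$-th summand equals $P(\text{max} > k)$, because $\frac{N!}{M^N}[z^N]\left(\sum_{0 \le j \le k} z^j/j!\right)^M$ is the probability that every bin holds at most $k$ balls; so the summand vanishes for $k \ge N$ and the sum can stop at $k = N-1$. Polynomials are truncated at degree $N$ since only $[z^N]$ matters. The helper names are hypothetical, and the cost grows quickly, so this is only practical for small inputs.

```python
from fractions import Fraction
from math import factorial


def truncated_exp(k: int, n: int) -> list:
    """Coefficients of sum_{0 <= j <= k} z^j / j!, padded or truncated to degree n."""
    return [Fraction(1, factorial(j)) if j <= k else Fraction(0) for j in range(n + 1)]


def mul_trunc(a: list, b: list, n: int) -> list:
    """Product of two coefficient lists, keeping only terms up to z^n."""
    out = [Fraction(0)] * (n + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j in range(n + 1 - i):
            out[i + j] += ai * b[j]
    return out


def pow_trunc(a: list, m: int, n: int) -> list:
    """a**m truncated at degree n, by repeated squaring."""
    result = [Fraction(1)] + [Fraction(0)] * n
    while m:
        if m & 1:
            result = mul_trunc(result, a, n)
        m >>= 1
        if m:
            a = mul_trunc(a, a, n)
    return result


def expected_max_occupancy(n_balls: int, m_bins: int) -> Fraction:
    """(N!/M^N) [z^N] sum_k (e^{Mz} - (sum_{j<=k} z^j/j!)^M), evaluated exactly."""
    n, m = n_balls, m_bins
    scale = Fraction(factorial(n), m**n)  # N!/M^N, and N!/M^N * [z^N] e^{Mz} = 1
    total = Fraction(0)
    for k in range(n):  # summands with k >= N are zero
        prob_all_at_most_k = scale * pow_trunc(truncated_exp(k, n), m, n)[n]
        total += 1 - prob_all_at_most_k
    return total


if __name__ == "__main__":
    print(expected_max_occupancy(3, 3))  # 17/9
    print(float(expected_max_occupancy(10, 10)))
```

As a sanity check, 3 balls in 3 bins give max 3 in 3 of the 27 outcomes, max 1 in 6, and max 2 in the other 18, so the mean is 51/27 = 17/9.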

awkward
  • Thank you, although that looks much more complicated than I was hoping. –  Oct 03 '13 at 18:05

The discussion in Section 4 of "Balls into Bins" - A Simple and Tight Analysis by Raab and Steger (found here) seems simple enough, as long as you're comfortable using moment method inequalities to bound probabilities of events.

eda
  • Thank you. It does seem the simplest way I have seen so far. I suppose, however, that it proves more than is strictly needed to obtain the mean. –  Oct 11 '13 at 09:27

Would you like to know the equation(s) for the maximum load given a fixed number of bins and any number of balls? You are in the right place. I found these results by examining the data and working back from there; they are fairly straightforward, except for the not-so-obvious unscalable adjustment factor (UAF).

Definitions: T = number of balls tossed, U = number of urns (bins), EM(T,U) = expected maximum load for T random tosses into U urns.

For a fixed number of tosses and urns, the maximum load has a single exact expected value. For example, EM(200,2), the expected maximum load for 200 tosses into 2 urns, is 105.6348479009 (rounded).

A nice equation can be seen for the expected maximum load when U=2 and T is sufficiently large:

EM(T,2)= T/2 + sqrt( T / (2*π) ) as T → ∞
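For U=2 the exact value is easy to compute, because the two urn counts are B and T−B with B ~ Binomial(T, 1/2), so EM(T,2) = E[max(B, T−B)]. The sketch below (function names are hypothetical) sums this exactly with big integers and compares it with the large-T approximation above; it reproduces the EM(200,2) figure quoted earlier.

```python
import math


def em2_exact(t: int) -> float:
    """Exact EM(t, 2) = E[max(B, t - B)] with B ~ Binomial(t, 1/2)."""
    weighted = sum(math.comb(t, b) * max(b, t - b) for b in range(t + 1))
    return weighted / 2**t  # exact integer sum, then one correctly rounded division


def em2_asymptotic(t: int) -> float:
    """Large-T approximation T/2 + sqrt(T / (2*pi))."""
    return t / 2 + math.sqrt(t / (2 * math.pi))


if __name__ == "__main__":
    print(em2_exact(200))       # about 105.6348479009
    print(em2_asymptotic(200))  # 100 + 10/sqrt(pi), about 105.6419
```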

What if there are 3 urns?

Because the maximum urn count can be as high as T, but the minimum urn count can only be as low as zero, there is an unscalable adjustment factor (UAF) of (0.0918881…) / 2 for U=3.

EM(T,3)= T/3 + (3/2) * sqrt( T / (3*π) ) + 0.04594407 as T → ∞

EM(1800,3) = 620.7748847701196, which compares favorably with the equation's output of 620.77559304. (The exact extra load above T/U is 99.996% of the equation's value at T/U = 600.)
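For U=3 the exact value can still be computed directly by summing max(a, b, c) times the multinomial probability T!/(a! b! c!)/3^T over all a + b + c = T. The sketch below (hypothetical function names) does this in log space to avoid overflow; the double loop is O(T²), so `em3_exact(1800)` takes a few seconds but can be used to check the exact figure quoted above against the equation.

```python
import math


def em3_exact(t: int) -> float:
    """Exact EM(t, 3): sum of max(a, b, c) * P(a, b, c) over all a + b + c = t."""
    log_fact = [0.0] * (t + 1)
    for i in range(1, t + 1):
        log_fact[i] = log_fact[i - 1] + math.log(i)
    base = log_fact[t] - t * math.log(3)  # log(t! / 3**t)
    total = 0.0
    for a in range(t + 1):
        for b in range(t - a + 1):
            c = t - a - b
            log_p = base - log_fact[a] - log_fact[b] - log_fact[c]
            total += max(a, b, c) * math.exp(log_p)
    return total


def em3_formula(t: int) -> float:
    """The large-T equation T/3 + (3/2) sqrt(T / (3 pi)) + 0.04594407."""
    return t / 3 + 1.5 * math.sqrt(t / (3 * math.pi)) + 0.04594407


if __name__ == "__main__":
    print(em3_exact(1800), em3_formula(1800))
```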

To determine EM(T,U), there is a definite constant multiplier for every U, and it increases with U. Wonderful. Since EM(T,U) describes only the mean of the maximum load, my guess is that there may also be exact multipliers for its percentiles.

For U > 3, I estimated a few of these constants by simulation using a high-quality PRNG. Notice that the excess EM(T,U) − T/U scales with the square root of the expected urn count, T/U.
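A simulation of this kind might look like the sketch below. It is a hypothetical re-creation, not the code used for the constants that follow: it estimates (EM(T,U) − T/U) / sqrt(T/U) by throwing T balls into U urns many times with Python's standard `random` module. The estimate includes the small additive offset divided by sqrt(T/U), so T/U should be large.

```python
import math
import random


def estimate_multiplier(t: int, u: int, trials: int = 400, seed: int = 7) -> float:
    """Monte Carlo estimate of (EM(t, u) - t/u) / sqrt(t/u)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total_max = 0
    for _ in range(trials):
        counts = [0] * u
        for _ in range(t):
            counts[rng.randrange(u)] += 1
        total_max += max(counts)
    mean_max = total_max / trials
    return (mean_max - t / u) / math.sqrt(t / u)


if __name__ == "__main__":
    print(estimate_multiplier(4000, 4, trials=300))
```

Statistical error shrinks like 1/sqrt(trials), so pinning down three decimals needs far more trials than this default.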

For 4 urns, where T/U is sufficiently large, the equation is approximately EM(T,4)= T/4 + 1.029 * sqrt( T/4 ) + 0.09

For 13 urns, where T/U is sufficiently large, the equation is approximately EM(T,13)= T/13 + 1.667 * sqrt( T/13 ) + 0.35

For 100 urns, where T/U is sufficiently large, the equation is approximately EM(T,100)= T/100 + 2.51 * sqrt( T/100 ) + 0.9

For 1000 urns, where T/U is sufficiently large, the equation is approximately EM(T,1000)= T/1000 + 3.24 * sqrt( T/1000 ) + 1.6
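One possible explanation for these constants, offered as an observation rather than something established above: for large T/U the urn counts are approximately T/U + sqrt(T/U)(Z_i − Z̄) with Z_1, …, Z_U independent standard normals, so the multiplier should approach E[max(Z_1, …, Z_U)]. For U=2 and U=3 that gives 1/sqrt(π) and 3/(2 sqrt(π)), which match the exact coefficients above, since sqrt(T/(2π)) = sqrt(T/2)/sqrt(π) and (3/2) sqrt(T/(3π)) = (3/(2 sqrt(π))) sqrt(T/3). The sketch computes E[max] = ∫ x · U φ(x) Φ(x)^(U−1) dx by Simpson's rule; the integration range and step count are arbitrary choices.

```python
import math


def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_max_of_normals(u: int, lo: float = -10.0, hi: float = 10.0,
                            steps: int = 20000) -> float:
    """E[max of u iid N(0,1)] = integral of x * u * pdf(x) * cdf(x)**(u-1), by Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        f = x * u * normal_pdf(x) * normal_cdf(x) ** (u - 1)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3.0


if __name__ == "__main__":
    for u in (2, 3, 4, 13, 100, 1000):
        print(u, round(expected_max_of_normals(u), 4))
```

The printed values should come out close to the multipliers listed above for 4, 13, 100 and 1000 urns.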

How many urns / bins are you working with?

  • It's a very nice concrete example. I have been looking for an answer to this in: http://math.stackexchange.com/questions/1512644/balls-and-bins-hash-table-a-concrete-example – user153465 Nov 13 '15 at 15:07
  • My question is how to calculate the probability that the maximum load takes a specific value. In particular, I have $n=10^6$ balls and $k=13\cdot 10^3$ bins. As I'm writing a paper, I need a reference for the formula. It would be great if I could have the formula and the probability. – user153465 Nov 13 '15 at 15:11

For the maximum load the answer is $\Theta (\log n/ \log \log n)$.

Question: is the answer known for the second-highest load? What about the third, and so on?
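This does not answer the question, but a small simulation makes it easy to look at the second- and third-highest loads empirically for $n$ balls in $n$ bins. The function name `mean_top_loads` and its defaults are hypothetical; it averages the `top` largest bin counts over independent trials.

```python
import random


def mean_top_loads(n: int, top: int = 3, trials: int = 200, seed: int = 3) -> list:
    """Average of the largest, second-largest, ... bin loads for n balls into n bins."""
    rng = random.Random(seed)
    sums = [0] * top
    for _ in range(trials):
        counts = [0] * n
        for _ in range(n):
            counts[rng.randrange(n)] += 1
        largest = sorted(counts, reverse=True)[:top]
        for i, load in enumerate(largest):
            sums[i] += load
    return [s / trials for s in sums]


if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4):
        print(n, mean_top_loads(n, trials=50))
```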

fox