
Suppose that I have $N$ balls and $m$ bins, where $N > m$. Each ball is randomly assigned to a bin (with equal probability of being assigned to any bin $i = 1, \dots, m$). What is the probability of there being $k$ balls in any bin?

My naive first guess was that, since the probability of any ball being assigned to a given bin is $1/m$, the probability would be ${N \choose k} (1/m)^k(1-1/m)^{N-k}$. However, this is incorrect, since if (to take an extreme case) all $N$ balls are assigned to bin $j$, none can be assigned to any bin $i \neq j$. An independent binomial distribution for each bin $i$ would allow impossible scenarios in which the other bins also receive balls.

My next thought was that this could be described as a generalized ($m$-state rather than 2-state) hypergeometric distribution, but that doesn't seem valid either. Specifically, a multivariate hypergeometric distribution would correspond to a case where I draw a sample of $n = x_1 + x_2 + \dots + x_m$ marbles of types $1, \dots, m$ out of a pool of $N = X_1 + \dots + X_m$ marbles of types $1$ through $m$. Here, I have $N$ unlabeled (indistinguishable) balls being assigned to $m$ indistinguishable urns, and I want the probability that a given urn contains exactly $k$ balls.

Max
  • I did some reformatting for improved legibility; you should check to make sure I didn't derail your meaning. – StumpyLeg Jun 05 '14 at 17:28
  • Yes hypergeometric is the way to go. – PA6OTA Jun 05 '14 at 17:33
  • Look up hypergeometric on Wikipedia... – afedder Jun 05 '14 at 17:45
  • Presumably this would be a multivariate hypergeometric? – Max Jun 05 '14 at 18:18
  • That being said, I'm not convinced that a multivariate hypergeometric is applicable here. The multivariate hypergeometric would be for a case where there are (x1,x2,...xm) labeled balls of classes 1...m, and I draw a sample N <= x1+x2+...xm. This doesn't seem equivalent to drawing a sample of N and assigning each of them to 1....m different bins. – Max Jun 05 '14 at 18:30
  • Multivariable hypergeometric would be a case where I draw a sample of n<=N balls from a pool where there are m colors x1+x2+...xm = N. Here, I have N identical balls being assigned to m urns (with equal probability), and I need to know the probability that there are precisely k balls in any given urn. – Max Jun 05 '14 at 18:38
  • @Max Because the bins are labeled do you agree they are distinguishable? – bobbym Jun 05 '14 at 18:53
  • @bobbym: The context in which this problem came up is as follows: suppose you have N dollars that you distribute randomly among m people. The question that I'm interested in is what's the probability that a randomly selected person gets exactly k dollars - so in the context of this problem the bins/urns are indistinguishable – Max Jun 05 '14 at 19:29
  • In your m-people context the original answer you rejected is correct, because having randomly selected a person, this is equivalent to asking for the probability that person 1 gets exactly $k$ dollars. – Mark Fischler Jun 05 '14 at 20:50
  • @Mark Fischler: What I basically want is $P(k)$, the probability that an individual has $k$ dollars (in other words, what is the distribution of the $N$ dollars among the $m$ individuals). Obviously, the joint distribution is not just the product of binomials, since if one individual has $N$ dollars (with binomial probability $1/m^N$), the remaining $m-1$ individuals by necessity have 0, rather than a $k$ determined by an independent binomial random variable. – Max Jun 05 '14 at 21:29
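
The point raised in the comments about a single, fixed person (or bin) can be checked by brute force for small cases. The following is only a minimal sketch, not part of the original thread: the function names `marginal_by_enumeration` and `binomial_pmf` are made up for illustration, and it assumes labeled balls each assigned independently and uniformly to one of the bins. It lists every assignment, counts those in which one chosen bin holds exactly $k$ balls, and compares the result with ${N \choose k} (1/m)^k(1-1/m)^{N-k}$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def marginal_by_enumeration(n_balls, n_bins, k, bin_index=0):
    """Exact P(bin `bin_index` holds exactly k balls).

    Enumerates all n_bins**n_balls equally likely assignments of
    labeled balls to labeled bins (feasible only for small inputs).
    """
    hits = sum(
        1
        for assignment in product(range(n_bins), repeat=n_balls)
        if assignment.count(bin_index) == k
    )
    return Fraction(hits, n_bins ** n_balls)


def binomial_pmf(n_balls, n_bins, k):
    """Binomial(n_balls, 1/n_bins) probability of exactly k successes."""
    p = Fraction(1, n_bins)
    return comb(n_balls, k) * p ** k * (1 - p) ** (n_balls - k)


if __name__ == "__main__":
    # Illustrative small case: 5 balls, 3 bins.
    for k in range(6):
        print(k, marginal_by_enumeration(5, 3, k), binomial_pmf(5, 3, k))
```

This only addresses the marginal distribution of one bin; it says nothing about the joint distribution over all bins, which is not a product of independent binomials.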

1 Answer


All possible outcomes of the distribution are states of the form $0^{m_0} 1^{m_1}\cdots$, meaning in multiset notation that $m_i$ bins have $i$ balls, subject to the conditions $\sum_{i\ge 0} m_i=m$ and $\sum_{i\ge 0} im_i=N$. Now suppose that the $m_i$ form a fixed vector of nonnegative integers. The number of ways of distributing $N$ labeled balls across $m$ labeled bins with those frequencies is $m!\,N!/[(0!)^{m_0}m_0!\,(1!)^{m_1}m_1!\cdots]$. Divide this by $m^N$, the number of ways of assigning balls to bins, to get the probability. If one wants the probability that there is exactly one bin with $k$ balls and the others are unconstrained, one needs to sum this probability over all the remaining free solutions to $\sum_i im_i=\sum_{i\neq k}im_i+k=N$.
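
The counting formula above can be sanity-checked numerically on small cases. The sketch below is illustrative only: the helper names (`count_assignments`, `type_probability`, `brute_force_type_counts`) are hypothetical, and a frequency vector is represented as a dict mapping $i$ to $m_i$ with zero entries omitted. It evaluates $m!\,N!/\prod_i (i!)^{m_i} m_i!$ and compares it with a direct enumeration of all $m^N$ assignments.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def count_assignments(type_counts, n_balls, n_bins):
    """Number of assignments of labeled balls to labeled bins in which
    type_counts[i] bins hold exactly i balls (formula from the answer)."""
    assert sum(type_counts.values()) == n_bins
    assert sum(i * m_i for i, m_i in type_counts.items()) == n_balls
    denom = 1
    for i, m_i in type_counts.items():
        denom *= factorial(i) ** m_i * factorial(m_i)
    return factorial(n_bins) * factorial(n_balls) // denom


def type_probability(type_counts, n_balls, n_bins):
    """Probability of a given frequency vector under uniform assignment."""
    return Fraction(count_assignments(type_counts, n_balls, n_bins),
                    n_bins ** n_balls)


def brute_force_type_counts(n_balls, n_bins):
    """Tally frequency vectors by listing all n_bins**n_balls assignments."""
    tally = Counter()
    for assignment in product(range(n_bins), repeat=n_balls):
        loads = [assignment.count(b) for b in range(n_bins)]
        # Key: sorted (load, number of bins with that load) pairs.
        tally[tuple(sorted(Counter(loads).items()))] += 1
    return tally


if __name__ == "__main__":
    n_balls, n_bins = 4, 3  # illustrative small case
    for key, count in sorted(brute_force_type_counts(n_balls, n_bins).items()):
        formula = count_assignments(dict(key), n_balls, n_bins)
        print(dict(key), count, formula)
```

Summing `type_probability` over all frequency vectors with $m_k = 1$ would then give the "exactly one bin with $k$ balls" probability described above; enumeration is only practical for small $N$ and $m$.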

R. J. Mathar