
Assume we have $n$ objects, and we select $n$ times from these $n$ objects with replacement, where on each draw every object is chosen with probability $\frac{1}{n}$.

For each $k$ from $1$ to $n$, what is the probability that we chose exactly $k$ distinct objects?

For $k = 1$, this is clearly $(\frac{1}{n})^n$.

For $k = 2$, this is $(\frac{1}{n})^{n-1}(1-\frac{1}{n})\binom{n}{1}$.

But I can't seem to generalize this to $2 < k \leq n$. I see that there is a similar question about the expectation, "Expected number of unique items when drawing with replacement", but I am trying to calculate the individual probabilities.

  • As an aside, for $k=1$ it is actually $\left(\frac{1}{n}\right)^{n-1}=n\cdot \left(\frac{1}{n}\right)^{n}$. Letting the $n$ objects be $\{1,2,3,\dots,n\}$, the expression you gave is the probability that every selection was $1$, but if all of the selections were $2$ that would also have satisfied the condition that we had selected a single unique object. Similarly for $3$, $4$, etc. – JMoravitz May 31 '22 at 17:59
  • Your attempted expression for $k=2$ was more incorrect. That would have been the probability that "$1$" was selected $n-1$ times and anything else was selected a single time. But you could just as well have had, say, $3$ selected five times and $7$ selected the remaining $n-5$ times, etc. That is to say, we neither required that one of the distinct selections was "$1$" nor did we require that the selections were grouped into $n-1$ iterations of one value and only $1$ iteration of the other. – JMoravitz May 31 '22 at 18:02
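The points made in these comments can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original thread; the function name `distinct_count_distribution` is made up for illustration. It enumerates all $n^n$ ordered selections and tallies how many distinct objects each one contains.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distinct_count_distribution(n):
    """Exact distribution of the number of distinct objects seen in n draws
    (with replacement) from n objects, found by enumerating all n**n draws."""
    counts = Counter(len(set(draws)) for draws in product(range(n), repeat=n))
    total = n ** n
    return {k: Fraction(counts[k], total) for k in range(1, n + 1)}


dist = distinct_count_distribution(3)
# For k = 1 the probability agrees with n * (1/n)**n, as noted in the comment.
print(dist[1] == 3 * Fraction(1, 3) ** 3)
```

Enumeration is only feasible for very small $n$, since the number of selections grows as $n^n$.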

2 Answers


The Stirling Number of the Second Kind ${n\brace k}$ counts the number of ways to partition an $n$ element set into $k$ non-labeled non-empty subsets.

The falling factorial $n^{\underline{k}}$ is the number of ways to select an ordered sequence of $k$ elements out of $n$ with no repeats.

So, to count the ways to have exactly $k$ distinct objects selected in your $n$ selections, first choose a way to partition the sequence of selections (first time you picked, second time you picked, etc.) into $k$ non-labeled non-empty subsets (e.g. the first subset being the first selection, third selection, and fifth selection), so that all selections in the same part of the partition produced the same object. Then, choose which object was selected for each of your groups of selections; the groups must receive distinct objects, so this can be done in $n^{\underline{k}}$ ways.

$$\Pr(X=k)=\frac{n^{\underline{k}}{n\brace k}}{n^n}$$
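As a minimal sketch of how the formula above could be evaluated exactly, the snippet below computes Stirling numbers of the second kind with the standard recurrence ${n\brace k} = k{n-1\brace k} + {n-1\brace k-1}$. The function names (`stirling2`, `falling_factorial`, `prob_distinct`) are illustrative choices, not anything from the answer itself.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind via the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def falling_factorial(n, k):
    """n * (n - 1) * ... * (n - k + 1), the number of ordered k-selections without repeats."""
    result = 1
    for i in range(k):
        result *= n - i
    return result


def prob_distinct(n, k):
    """Probability of seeing exactly k distinct objects in n draws from n objects."""
    return Fraction(falling_factorial(n, k) * stirling2(n, k), n ** n)


# The probabilities over k = 1..n should sum to 1.
print(sum(prob_distinct(6, k) for k in range(1, 7)))
```

Using `Fraction` keeps the result exact, which makes it easy to confirm that the probabilities sum to $1$.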

JMoravitz
  • This problem was much more involved than I first realized. Is there a way to solve this problem using just binomial coefficients? – user1231231hd May 31 '22 at 18:06
  • Falling factorials are taught before binomial coefficients, so there should be no complaint there. I just prefer the $n^{\underline{k}}$ notation over $~nP_k$ or $P(n,k)$ or whatever other notation you may have seen first. $n^n$ is also taught before binomial coefficients, so there should be no complaint there. As for Stirling Numbers of the Second Kind... read the linked article to find the equation ${n\brace k} = \frac{1}{k!}\sum\limits_{i=0}^k(-1)^i\binom{k}{i}(k-i)^n$, which can be found using more elementary counting methods and inclusion-exclusion. – JMoravitz May 31 '22 at 18:18
  • I far prefer looking at the answer as $\frac{n^{\underline{k}}{n\brace k}}{n^n}$ than as $\frac{\binom{n}{k}\left(\sum\limits_{i=0}^k(-1)^i\binom{k}{i}(k-i)^n\right)}{n^n}$ – JMoravitz May 31 '22 at 18:23
  • You're right, only the Stirling Numbers is a bit more involved (and using the Stirling Notation is more concise). The rest is quite straightforward. – user1231231hd May 31 '22 at 18:38
  • @user1231231hd I do recommend saying the whole phrase so as to not confuse them with Stirling Numbers of the First Kind which counts something else entirely. – JMoravitz May 31 '22 at 18:41
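The explicit inclusion-exclusion expression for Stirling Numbers of the Second Kind quoted in the comments, and the equivalence of the two forms of the numerator, can be spot-checked numerically. This is a rough sketch under the assumption of Python 3.8+ (for `math.comb` and `math.perm`); the helper names are hypothetical.

```python
from math import comb, factorial, perm


def surjection_count(n, k):
    """Inclusion-exclusion sum: sum_{i=0}^{k} (-1)^i * C(k, i) * (k - i)^n."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1))


def stirling2_explicit(n, k):
    """Stirling number of the second kind from the explicit formula."""
    return surjection_count(n, k) // factorial(k)


n, k = 5, 3
# Falling factorial times Stirling number versus binomial coefficient times the sum.
lhs = perm(n, k) * stirling2_explicit(n, k)
rhs = comb(n, k) * surjection_count(n, k)
print(lhs == rhs)
```

The two numerators agree because $n^{\underline{k}} = k!\binom{n}{k}$, so the $\frac{1}{k!}$ in the explicit formula cancels.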

Alternative approach:

The math in the answer of JMoravitz can be derived analytically, using Inclusion-Exclusion. See this article for an introduction to Inclusion-Exclusion. Then, see this answer for an explanation of and justification for the Inclusion-Exclusion formula.

For any set $E$ with a finite number of elements, let $|E|$ denote the number of elements in the set $E$.

Assume that $N \in \Bbb{Z}_{\geq 2}$ and that $K \in \{1,2,\cdots,(N-1)\}$, where $N$ and $K$ are fixed positive integers.

Let $A$ denote the set of all possible ordered $N$-tuples $\left(a_1, a_2, \cdots, a_N\right)$, where each component $a_i$ is an element in $\{1,2,\cdots,N\}$.

Then, each element in $A$ represents a distinct way that $N$ items can be selected from $\{1,2,\cdots,N\}$, sampling with replacement, where the order of the selection is deemed important.

Then $|A| = N^N.$

Let $B$ denote the subset of $A$, where each ordered $N$-tuple $\left(a_1, a_2, \cdots, a_N\right) \in B$ satisfies the following constraints:

  • Each component $a_i$ is an element in $\{1,2,\cdots,K\}$.

  • For each element $m$ in the set $\{1,2,\cdots,K\}$ at least one of the components $a_1, a_2, \cdots, a_N$ is equal to $m$.

Then, the desired computation of the probability is

$$ \frac{\binom{N}{K} \times |B|}{|A|} = \frac{\binom{N}{K} \times |B|}{N^N}. \tag1 $$

When examining whether order of selection is to be regarded as important, the numerator and denominator in (1) above must be computed in a consistent manner. Further, it is very convenient to regard order of selection as important, when (for example) enumerating $A$. This convenience drives my strategy.


In (1) above, the factor of $\binom{N}{K}$ in the numerator reflects that any $K$ items from $\{1,2,\cdots,N\}$ could be chosen to be the $K$ items that will be selected. Note that this approach takes advantage of the fact that $B$ represents that each of the items in $\{1,2,\cdots,K\}$ will be selected at least once.

Therefore, you have $\binom{N}{K}$ mutually exclusive subsets of ordered $N$-tuples, where each subset represents that $K$ specific elements from $\{1,2,\cdots,N\}$ will be selected.

So, based on (1) above, the problem has been reduced to computing $|B|$.


Let $S$ denote the subset of $A$, where each ordered $N$-tuple $\left(a_1, a_2, \cdots, a_N\right) \in S$ satisfies the following constraint:

  • Each component $a_i$ is an element in $\{1,2,\cdots,K\}$.

Notice that the set $S$ is a superset of the set $B$, and that the set $S$ will (also) include ordered $N$-tuples whose components do not cover all of $\{1,2,\cdots,K\}$.

For $j \in \{1,2,\cdots,K\}$ let $S_j$ denote the subset of ordered $N$-tuples from $S$ that each satisfy the following constraint:

  • None of the components of the ordered $N$-tuple is equal to $j$.

Then

$$|B| = |S| - |S_1 \cup S_2 \cup \cdots \cup S_K|. \tag2 $$
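Equation (2) can be sanity-checked by building the sets explicitly for small $N$ and $K$. The sketch below is only an illustration; the function name `check_equation_2` is made up here. It constructs $S$, each $S_j$, and $B$ directly from their definitions.

```python
from itertools import product


def check_equation_2(N, K):
    """Return (|B|, |S| - |S_1 u ... u S_K|), built directly from the definitions."""
    values = range(1, K + 1)
    # S: all N-tuples with components in {1, ..., K}.
    S = set(product(values, repeat=N))
    # S_j: tuples in S in which the value j never appears.
    S_j = {j: {t for t in S if j not in t} for j in values}
    union = set().union(*S_j.values())
    # B: tuples in S in which every value 1..K appears at least once.
    B = {t for t in S if set(t) == set(values)}
    return len(B), len(S) - len(union)


print(check_equation_2(4, 3))
```

Both entries of the returned pair should agree, since a tuple of $S$ fails to be in $B$ exactly when it misses at least one value $j$.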

Let $T_0$ denote $|S|$.

Let $T_1$ denote $~\displaystyle \sum_{1 \leq i_1 \leq K} |S_{i_1}|.$
Thus, $T_1$ denotes the summation of $~\displaystyle \binom{K}{1}$ terms.

Let $T_2$ denote $~\displaystyle \sum_{1 \leq i_1 < i_2 \leq K} |S_{i_1} \cap S_{i_2}|.$
Thus, $T_2$ denotes the summation of $~\displaystyle \binom{K}{2}$ terms.

Similarly, for $r \in \{3,4,\cdots,(K-1)\}$
let $T_r$ denote $~\displaystyle \sum_{1 \leq i_1 < i_2 < \cdots < i_r \leq K} |S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_r}|.$
Thus, $T_r$ denotes the summation of $~\displaystyle \binom{K}{r}$ terms.

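Then, in accordance with Inclusion-Exclusion theory, and noting that the $r = K$ term $|S_1 \cap S_2 \cap \cdots \cap S_K|$ is $0$ because no $N$-tuple in $S$ can avoid every element of $\{1,2,\cdots,K\}$,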

$$|B| = \sum_{r=0}^{K-1} (-1)^r T_r.$$

So, the problem is reduced to computing each of
$T_0, T_1, \cdots, T_{K-1}.$


$\underline{\text{Computation of} ~T_0}$

There are $K$ choices for each component of the ordered $N$-tuple in $S$. Therefore,

$$T_0 = |S| = K^N.$$


$\underline{\text{Computation of} ~T_1}$

Similar to the analysis in the previous section, when enumerating $S_1$, there are $(K-1)$ choices for each component of the ordered $N$-tuple in $S_1$. Therefore,

$\displaystyle |S_1| = (K-1)^N.$

Further, by symmetry, $|S_1| = |S_2| = \cdots = |S_K|.$

Therefore,

$$T_1 = \binom{K}{1} \left(K-1\right)^N.$$


$\underline{\text{Computation of} ~T_2}$

Similar to the analysis in the previous section, when enumerating $\left(S_1 \cap S_2\right)$, there are $(K-2)$ choices for each component of the ordered $N$-tuple in $\left(S_1 \cap S_2\right)$. Therefore,

$\displaystyle |S_1 \cap S_2| = (K-2)^N.$

Further, by symmetry, for each $1 \leq i_1 < i_2 \leq K,$ you have that $|S_{i_1} \cap S_{i_2}| = |S_1 \cap S_2|.$

Therefore,

$$T_2 = \binom{K}{2} \left(K-2\right)^N.$$


$\underline{\text{Computation of} ~T_r ~: 3 \leq r \leq (K-1)}$

Similar to the analysis in the previous section, when enumerating $\left(S_1 \cap S_2 \cap \cdots \cap S_r\right)$, there are $(K-r)$ choices for each component of the ordered $N$-tuple in $\left(S_1 \cap S_2 \cap \cdots \cap S_r\right)$. Therefore,

$\displaystyle |S_1 \cap S_2 \cap \cdots \cap S_r| = (K-r)^N.$

Further, by symmetry, for each $1 \leq i_1 < i_2 < \cdots < i_r \leq K,$ you have that $|S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_r}| = |S_1 \cap S_2 \cap \cdots \cap S_r|.$

Therefore,

$$T_r = \binom{K}{r} \left(K-r\right)^N.$$


Final computation:

$$|B| = \sum_{r=0}^{K-1} (-1)^r \times T_r = \sum_{r=0}^{K-1} \left[(-1)^r \times \binom{K}{r} \left(K-r\right)^N\right]. \tag3 $$

Combining (3) and (1), the desired computation of the probability is

$$ \frac{\binom{N}{K} \times |B|}{N^N}, $$

where $|B|$ is computed in (3) above.
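A minimal sketch of the final computation, assuming Python 3.8+ for `math.comb`; the function names `size_of_B` and `prob_exactly_K_distinct` are illustrative only. It evaluates (3) and then substitutes the result into (1).

```python
from fractions import Fraction
from math import comb


def size_of_B(N, K):
    """|B| from equation (3): sum over r = 0..K-1 of (-1)^r * C(K, r) * (K - r)^N."""
    return sum((-1) ** r * comb(K, r) * (K - r) ** N for r in range(K))


def prob_exactly_K_distinct(N, K):
    """Probability from equation (1): C(N, K) * |B| / N^N."""
    return Fraction(comb(N, K) * size_of_B(N, K), N ** N)


for K in range(1, 5):
    print(K, prob_exactly_K_distinct(5, K))
```

The derivation above restricts $K$ to $\{1,\cdots,N-1\}$; as far as I can tell the same expression also gives the right value at $K = N$ (namely $N!/N^N$), so the probabilities over $K = 1, \cdots, N$ sum to $1$.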

user2661923