
I am interested in the probability distribution of "drawing a sample with $d$ distinct elements, when the sample is of size $k$ and is drawn (with replacement) from a set containing $N$ distinct elements". Here is an example (just to avoid confusion):

--EXAMPLE--

The set is $\{A,B,C\}$, so $N=3$; $k=2$

The number of possible (ordered) samples of length $2$ drawn with replacement from this set is $3^{2}=9$. More specifically, these samples are:

$\{A,A\}$,$\{B,B\}$,$\{C,C\}$ (1 distinct element in each sample)

$\{A,B\}$,$\{B,A\}$,$\{A,C\}$,$\{C,A\}$,$\{B,C\}$,$\{C,B\}$ (2 distinct elements in each sample)

So the probability of drawing 1 distinct element is 3/9, and the probability of drawing 2 distinct elements is 6/9.

--END OF EXAMPLE--

In general the probability is given by:

$Prob(d|k,N)=(\frac{1}{N^{k}})\frac{N!}{(N-d)!}S(k,d)$

where $S(k,d)$ are Stirling numbers of the second kind.
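As a sanity check, the formula can be compared against brute-force enumeration of all $N^{k}$ ordered samples. The following is a minimal sketch, not a definitive implementation; the function names `stirling2`, `prob_distinct` and `prob_by_enumeration` are hypothetical helpers introduced only for this illustration, and the set is assumed to be relabelled as `0, ..., N-1`.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import factorial


@lru_cache(maxsize=None)
def stirling2(k, d):
    """Stirling number of the second kind S(k, d) via S(k,d) = d*S(k-1,d) + S(k-1,d-1)."""
    if k == d:
        return 1
    if d == 0 or d > k:
        return 0
    return d * stirling2(k - 1, d) + stirling2(k - 1, d - 1)


def prob_distinct(d, k, N):
    """Prob(d | k, N) = N! / ((N - d)! * N^k) * S(k, d), as an exact fraction."""
    if d < 0 or d > min(k, N):
        return Fraction(0)
    return Fraction(factorial(N) * stirling2(k, d), factorial(N - d) * N ** k)


def prob_by_enumeration(d, k, N):
    """Share of the N^k ordered samples that contain exactly d distinct values."""
    hits = sum(1 for s in product(range(N), repeat=k) if len(set(s)) == d)
    return Fraction(hits, N ** k)


if __name__ == "__main__":
    # The example above: N = 3, k = 2.
    for d in (1, 2):
        print(d, prob_distinct(d, 2, 3), prob_by_enumeration(d, 2, 3))
```

For the example with $N=3$ and $k=2$, both routes give $1/3$ for $d=1$ and $2/3$ for $d=2$, i.e. 3/9 and 6/9.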

I would now like to make a statement about what happens if $k$ or $N$ changes. $Intuitively$, if $N$ or $k$ increases, it becomes $less$ likely that you draw a sample with a $low$ number of distinct elements and $more$ likely that you draw a sample with a $high$ number of distinct elements (probability mass shifts towards high values of $d$).

If we therefore consider the sign of the change in probability (that is, the sign of $Prob(d|k+1,N)-Prob(d|k,N)$), there should be some critical value $d^*$ such that if $d<d^*$ then the sign is negative, whereas if $d>d^*$ then the sign is positive. There would therefore be only $one$ sign change (this is what I ultimately want to prove).

$\textbf{Question}$: How do I go about proving this $formally$? One thing I cannot seem to get my head around is how to deal with the Stirling numbers.

Clearly if $d=1$, then $S(k,1)=1$, so $Prob(1|k,N)=N^{1-k}$. For this case it is easy to see what happens if $N$ or $k$ changes, but for the subsequent cases it is less obvious. One thing I have been trying is to exploit the recurrence relation of the Stirling numbers, $S(k+1,d)=d\,S(k,d)+S(k,d-1)$, but without any success. Using this approach I do already know that the expected number of distinct items is equal to:

$E[d]=N-\frac{(N-1)^{k}}{N^{k-1}}$

which is increasing in $k$ and $N$. However, this does not suffice to show the above statement.
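The Stirling recurrence translates into a one-step update of the distribution, $Prob(d|k+1,N)=\frac{d}{N}Prob(d|k,N)+\frac{N-d+1}{N}Prob(d-1|k,N)$, which makes the expectation easy to check numerically. The sketch below assumes $k\geq 1$ for the closed form; the names `distinct_distribution`, `mean_distinct` and `mean_closed_form` are hypothetical and only serve this illustration.

```python
from fractions import Fraction


def distinct_distribution(k, N):
    """Return [Prob(0|k,N), ..., Prob(N|k,N)], built by adding one draw at a time."""
    dist = [Fraction(1)] + [Fraction(0)] * N  # zero draws: zero distinct elements
    for _ in range(k):
        new = [Fraction(0)] * (N + 1)
        for d in range(N + 1):
            # The next draw repeats one of the d values already seen ...
            stay = Fraction(d, N) * dist[d]
            # ... or hits one of the N - (d - 1) values not seen so far.
            grow = Fraction(N - d + 1, N) * dist[d - 1] if d >= 1 else Fraction(0)
            new[d] = stay + grow
        dist = new
    return dist


def mean_distinct(k, N):
    """Expected number of distinct elements, computed from the distribution."""
    return sum(d * p for d, p in enumerate(distinct_distribution(k, N)))


def mean_closed_form(k, N):
    """E[d] = N - (N - 1)^k / N^(k - 1), assuming k >= 1."""
    return N - Fraction((N - 1) ** k, N ** (k - 1))


if __name__ == "__main__":
    for k, N in [(2, 3), (5, 4), (10, 6)]:
        print(k, N, mean_distinct(k, N), mean_closed_form(k, N))
```

Working with `Fraction` keeps the comparison exact, so agreement between the two means is not an artefact of rounding.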

Abe Doe
  • I think you're trying to show the probabilities are unimodal, and maybe using that word as part of a websearch will turn up something useful for you. – Gerry Myerson Aug 09 '12 at 07:22
  • Dear Gerry, thanks for the suggestion. $Prob(d|k,N)$ can indeed be shown to be (strongly) unimodal in $d$. This essentially follows from the log-concavity in $d$ of $S(k,d)$ and $\frac{1}{(N-d)!}$. However, what I want to show is what happens if $k$ or $N$ change. This new probability distribution will again be (strongly) unimodal. I now want to show that the difference in probability of these two distributions only has one sign change. – Abe Doe Aug 09 '12 at 13:26
  • EDIT: Graphically the difference distribution (i.e. $Prob(d|k+1,N)-Prob(d|k,N)$) does not appear to be unimodal in $d$. – Abe Doe Aug 09 '12 at 13:39

1 Answer

An increase in $N$

Given that: $$Prob(d|k,N)=(\frac{1}{N^{k}})\frac{N!}{(N-d)!}S(k,d)$$

We have that $$Prob(d|k,N+1)\geq Prob(d|k,N)$$ if and only if $$\frac{\left( N+1\right) !}{\left( N+1\right) ^{k}\left( N+1-d\right) !}\geq\frac{N!}{N^{k}\left( N-d\right) !}$$

Solving for $d$ yields

$$d \geq d_{N}^{\ast}\left( k\right) =\left( N+1\right) \left( 1-\left( \frac{N}{N+1}\right) ^{k}\right) ,\text{ where }1\leq d_{N}^{\ast}\left( k\right) <N+1$$

(with $d_{N}^{\ast}\left( k\right) =1$ only if $k=1$).
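One way to fill in the algebra behind this threshold, assuming $1\leq d\leq N$ so that every factor involved is positive:

$$\begin{aligned}
\frac{\left( N+1\right) !}{\left( N+1\right) ^{k}\left( N+1-d\right) !}\geq\frac{N!}{N^{k}\left( N-d\right) !}
&\iff \frac{N+1}{N+1-d}\geq\left( \frac{N+1}{N}\right) ^{k}\\
&\iff N+1-d\leq\left( N+1\right) \left( \frac{N}{N+1}\right) ^{k}\\
&\iff d\geq\left( N+1\right) \left( 1-\left( \frac{N}{N+1}\right) ^{k}\right)
\end{aligned}$$

The first step divides both sides by $\frac{N!}{\left( N-d\right) !}$ and multiplies by $\left( N+1\right) ^{k}$; the second inverts both sides, which flips the inequality.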

So there would be only one sign change (i.e. at $d_{N}^{\ast}$)
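The threshold can also be checked numerically on small cases. The sketch below is only an illustration under stated assumptions: it computes $\frac{N!}{(N-d)!}S(k,d)$ as $\binom{N}{d}$ times the number of surjections from $k$ draws onto $d$ values (obtained by inclusion-exclusion, which equals $d!\,S(k,d)$), and the names `prob_distinct`, `threshold_N` and `signs_when_N_grows` are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_distinct(d, k, N):
    """Prob(d | k, N) = C(N, d) * d! * S(k, d) / N^k.

    d! * S(k, d) is the number of surjections from k draws onto d values,
    computed here by inclusion-exclusion.
    """
    surjections = sum((-1) ** j * comb(d, j) * (d - j) ** k for j in range(d + 1))
    return Fraction(comb(N, d) * surjections, N ** k)


def threshold_N(k, N):
    """d_N^*(k) = (N + 1) * (1 - (N / (N + 1))^k), kept as an exact fraction."""
    return (N + 1) * (1 - Fraction(N, N + 1) ** k)


def signs_when_N_grows(k, N):
    """Signs (-1, 0 or 1) of Prob(d|k,N+1) - Prob(d|k,N) for d = 1..min(k, N+1)."""
    signs = []
    for d in range(1, min(k, N + 1) + 1):
        diff = prob_distinct(d, k, N + 1) - prob_distinct(d, k, N)
        signs.append((diff > 0) - (diff < 0))
    return signs


if __name__ == "__main__":
    k, N = 5, 4
    print("threshold:", threshold_N(k, N))
    print("signs:", signs_when_N_grows(k, N))
```

In every small case tried this way, the signs run from negative to non-negative exactly once, with the switch at the first integer $d\geq d_{N}^{\ast}(k)$.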

An increase in $k$

Similarly we have that $$Prob(d|k+1,N)\geq Prob(d|k,N)$$ if and only if

$$\frac{1}{N^{k+1}}S\left( k+1,d\right)\geq\frac{1}{N^{k}}S\left(k,d\right)$$ which reduces to

$$\frac{1}{N} \geq R\left( k,d\right) =\frac{S\left( k,d\right) }{S\left( k+1,d\right) }$$

Now note that the LHS of the inequality is positive, at most $1$, and independent of $d$. The RHS is strictly decreasing in $d$ [see Theorem 3.2 in Sibuya (1988)]; it also holds that $R\left( k,1\right) =1$ and $R\left( k,k+1\right) =0$. Consequently there exists a unique $1\leq d_{k}^{\ast}\left( N\right) \leq k+1$ such that $\frac{1}{N}\geq R\left( k,d\right) $ if and only if $d\geq d_{k}^{\ast}\left( N\right) $.

So there would be only one sign change (i.e. at $d_{k}^{\ast}$)
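The same kind of numerical check works for an increase in $k$. The sketch below builds rows of Stirling numbers with the recurrence $S(n+1,d)=d\,S(n,d)+S(n,d-1)$, forms $R(k,d)$, and locates the critical value; it is an illustration rather than a proof, and the names `stirling2_row`, `ratio_R`, `critical_d` and `prob_distinct` are hypothetical.

```python
from fractions import Fraction
from math import perm


def stirling2_row(k):
    """Row [S(k,0), ..., S(k,k)] built with S(n+1,d) = d*S(n,d) + S(n,d-1)."""
    row = [1]  # S(0, 0) = 1
    for n in range(k):
        prev = row + [0]  # append S(n, n+1) = 0
        row = [d * prev[d] + (prev[d - 1] if d >= 1 else 0) for d in range(n + 2)]
    return row


def ratio_R(k):
    """[R(k,1), ..., R(k,k+1)] with R(k,d) = S(k,d) / S(k+1,d)."""
    s_k = stirling2_row(k) + [0]  # S(k, k+1) = 0
    s_k1 = stirling2_row(k + 1)
    return [Fraction(s_k[d], s_k1[d]) for d in range(1, k + 2)]


def critical_d(k, N):
    """Smallest d in 1..k+1 with 1/N >= R(k,d); exists because R(k,k+1) = 0."""
    return next(d for d, r in enumerate(ratio_R(k), start=1) if Fraction(1, N) >= r)


def prob_distinct(d, k, N):
    """Prob(d | k, N) = N! / ((N - d)! * N^k) * S(k, d), as an exact fraction."""
    s = stirling2_row(k)[d] if d <= k else 0
    return Fraction(perm(N, d) * s, N ** k)


if __name__ == "__main__":
    k, N = 3, 3
    print("R(k, d):", ratio_R(k))
    print("critical d:", critical_d(k, N))
```

For $k=3$ the ratios are $1, \frac{3}{7}, \frac{1}{6}, 0$, so with $N=3$ the probability decreases for $d=1,2$ and increases for $d=3$.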

Reference:

Sibuya, M. "Log-Concavity of Stirling Numbers and Unimodality of Stirling Distributions", Ann. Inst. Statist. Math., Vol. 40, No. 4 (1988), pp. 693-713.

Abe Doe