
Here is a problem that bothers me. Could someone give me some help?

There is a sequence of $N$ random integers $\{X_1,X_2,\dots,X_N\}$. Each $X_i$ is chosen independently and uniformly at random from the integer set $\{1,\dots,M\}$.

For each specific realization of the sequence, $\{x_1,x_2,\dots,x_N\}$, if there are repeated values, we put all the duplicated variables (every $x_i$ whose value appears more than once) in a bag. For example, when $N = 7$, $M = 7$, if the sequence is $\{x_1=1,x_2=1,x_3=2,x_4=2,x_5=3,x_6=6,x_7=3\}$, we put $x_1,x_2,x_3,x_4,x_5,x_7$ in the bag, and the bag size is 6, since there are 6 duplicated variables.
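To make the bag rule concrete, here is a minimal sketch, assuming "duplicated" means every entry whose value occurs at least twice (which matches the example). The helper name `bag_size` is hypothetical, not something from the question.

```python
from collections import Counter

def bag_size(seq):
    """Count the entries of seq whose value occurs more than once."""
    counts = Counter(seq)
    # Every copy of a repeated value goes into the bag.
    return sum(c for c in counts.values() if c > 1)

# The example from the question (N = 7, M = 7).
print(bag_size([1, 1, 2, 2, 3, 6, 3]))  # → 6
```

Values 1, 2 and 3 each appear twice, while 6 appears once, so six entries land in the bag.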

My question is: what is the average (expected) bag size?

I know this can be estimated by writing a simple program, but I am wondering whether there exists a closed-form relationship between $N$, $M$, and the average bag size.
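The "simple program" could look like the following Monte Carlo sketch. It assumes the $X_i$ are drawn independently and uniformly from $\{1,\dots,M\}$. The function names `bag_size` and `simulate_average_bag_size`, along with the default trial count and seed, are illustrative choices.

```python
import random
from collections import Counter

def bag_size(seq):
    """Count the entries of seq whose value occurs more than once."""
    counts = Counter(seq)
    return sum(c for c in counts.values() if c > 1)

def simulate_average_bag_size(n, m, trials=100_000, seed=0):
    """Estimate the mean bag size by sampling `trials` random sequences."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = 0
    for _ in range(trials):
        # randint is inclusive on both ends, so values range over 1..m.
        seq = [rng.randint(1, m) for _ in range(n)]
        total += bag_size(seq)
    return total / trials

print(simulate_average_bag_size(7, 7))
```

This only gives an estimate whose error shrinks roughly like $1/\sqrt{\text{trials}}$. It does not by itself reveal a formula in $N$ and $M$.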

Thank you!

Willie Wong
Eric

1 Answer


Each $x_i$ is left out of the bag if and only if the $N-1$ other $x_j$ are all different from $x_i$. Since the $x_j$ are independent and uniform on $\{1,\dots,M\}$, this happens with probability $\left(1-\frac1M\right)^{N-1}$. This probability does not depend on $i$, hence, by linearity of expectation, the mean number of variables put in the bag is $$ \sum_{i=1}^N P[x_i\ \text{is put in the bag}]=N\cdot\left(1-\left(1-\frac1M\right)^{N-1}\right). $$
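As a sanity check, the formula can be compared exactly against a brute-force average over all $M^N$ equally likely sequences for small $N$ and $M$. This is a sketch using exact rational arithmetic. The function names are hypothetical, and the enumeration is only feasible for tiny inputs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def expected_bag_size_formula(n, m):
    """N * (1 - (1 - 1/M)^(N-1)), computed exactly as a Fraction."""
    return n * (1 - (1 - Fraction(1, m)) ** (n - 1))

def expected_bag_size_brute_force(n, m):
    """Average bag size over all M^N equally likely sequences."""
    total = 0
    for seq in product(range(1, m + 1), repeat=n):
        counts = Counter(seq)
        total += sum(c for c in counts.values() if c > 1)
    return Fraction(total, m ** n)

# The two agree exactly on a few small cases.
for n, m in [(2, 3), (4, 3), (5, 4)]:
    assert expected_bag_size_formula(n, m) == expected_bag_size_brute_force(n, m)

print(expected_bag_size_formula(7, 7))  # → 70993/16807
```

For the question's example size ($N = M = 7$) the mean bag size is $7\left(1-(6/7)^6\right) = 70993/16807 \approx 4.22$.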

Did