
I would like to develop a simple test for the uniformity of a discrete random variable, but I could not find the relevant information on Wikipedia or here, so I am hoping that someone will be able to help me.

Let us assume that an experiment has $n$ possible outcomes, $\{1,2,\ldots,n\}$, all with the same probability.
After performing $n^3$ independent experiments, let $M$ denote the number of experiments yielding the most frequent outcome and $L$ the number of experiments yielding the least frequent outcome.

  1. What is the average value of $M-L$?
  2. What is the distribution of $M-L$?

I would guess that a typical outcome has frequency $n^2\pm cn$, so that $M-L$ is expected to be about $2cn$ for some explicit constant $c$.
I am much less sure about the second point; my guess is a Beta distribution of some sort.
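
A heuristic sketch of where such a guess leads, under the assumption that the $n$ counts $X_1,\ldots,X_n$ can be treated as roughly independent Gaussians (they are in fact weakly negatively correlated, since they sum to $n^3$). Each count is $\mathrm{Bin}(n^3, 1/n)$, so

$$\mathbb{E}[X_i] = n^2, \qquad \operatorname{Var}(X_i) = n^3\cdot\frac{1}{n}\left(1-\frac{1}{n}\right) = n^2 - n, \qquad \sigma \approx n.$$

To leading order, the maximum of $n$ independent standard Gaussians is about $\sqrt{2\ln n}$, and the minimum is about $-\sqrt{2\ln n}$, which would suggest

$$\mathbb{E}[M-L] \approx 2n\sqrt{2\ln n}.$$

If this heuristic is right, the "constant" $c$ is not constant but grows slowly, like $\sqrt{2\ln n}$. The leading-order term is known to converge slowly, so it is better read as a growth rate than as an accurate value for small $n$.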

Jack D'Aurizio
  • 353,855

1 Answer


I suspect that the distribution of $M-L$ is closely approximated by a suitable beta-binomial distribution with parameters $(a, b, n^3)$ for suitably large $n$. The quality of this approximation is poor when $n \le 4$ but improves substantially thereafter.

I did some simulation to empirically determine values of $a$ and $b$ as a function of $n$, but with limited success in discerning a pattern. The estimates are summarized in the following table:

$$\begin{array}{c|cc}
n & a & b \\ \hline
2 & 1.5071 & 4.2193 \\
3 & 6.66791 & 28.872 \\
4 & 11.1539 & 75.7145 \\
5 & 14.9201 & 145.694 \\
6 & 18.1219 & 239.632 \\
7 & 20.8874 & 357.923 \\
8 & 23.26 & 500.029 \\
9 & 25.3165 & 665.664 \\
10 & 27.2136 & 857.58 \\
11 & 28.9557 & 1075.75 \\
12 & 30.5506 & 1320.17 \\
13 & 31.9877 & 1589.23 \\
14 & 33.4234 & 1890.14 \\
15 & 34.7004 & 2214.84 \\
16 & 35.892 & 2566.21 \\
17 & 37.1326 & 2954.86 \\
18 & 38.2098 & 3363.76 \\
19 & 39.3065 & 3808.03 \\
20 & 40.338 & 4280.45 \\
\end{array}$$

I post this as an invitation for others to see if they can tease out a pattern in these parameter estimates, keeping in mind that these are empirically derived.
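
For anyone who wants to reproduce this kind of experiment, here is a minimal sketch in Python using only the standard library. It is not the code used to produce the estimates above: the function names `simulate_range` and `beta_binomial_moments` are hypothetical, and the method-of-moments fit is an assumption about one reasonable way to estimate $a$ and $b$, so its output need not match those estimates.

```python
import random


def simulate_range(n, reps=1000, seed=0):
    """Simulate M - L for n equally likely outcomes and n**3 trials.

    Returns a list of `reps` simulated values of M - L.
    """
    rng = random.Random(seed)
    trials = n ** 3
    diffs = []
    for _ in range(reps):
        counts = [0] * n
        for _ in range(trials):
            counts[rng.randrange(n)] += 1  # uniform outcome in 0..n-1
        diffs.append(max(counts) - min(counts))
    return diffs


def beta_binomial_moments(samples, trials):
    """Method-of-moments estimates (a, b) for a beta-binomial(a, b, trials).

    Assumption: matching the first two sample moments is an acceptable
    fitting criterion. The estimates can be negative or undefined if the
    sample is less dispersed than a binomial with the same mean.
    """
    m1 = sum(samples) / len(samples)                 # first raw moment
    m2 = sum(x * x for x in samples) / len(samples)  # second raw moment
    denom = trials * (m2 / m1 - m1 - 1) + m1
    a = (trials * m1 - m2) / denom
    b = (trials - m1) * (trials - m2 / m1) / denom
    return a, b
```

A typical call would be `samples = simulate_range(5)` followed by `a, b = beta_binomial_moments(samples, 5 ** 3)`. The pure-Python loop is slow for large $n$, since each replicate draws $n^3$ outcomes; a vectorized multinomial sampler would be the natural replacement there.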

heropup
  • 135,869
  • (+1) Precious data indeed. The turning point at $n=5$ is not surprising either, since it is also the point at which a binomial distribution starts to be closely approximated by a Gaussian distribution. – Jack D'Aurizio May 04 '20 at 22:10