
So suppose I have a uniform distribution $\mathcal{U}_{[0,1]}$. I want to know the distribution (PDF) of how many samples it will take to draw a value $x\leq0.01$.

To be clear, let's say I sample the distribution 149 times without success, and on the 150th sample I get $x=0.007$. Then it took 150 samples to fulfill this criterion. Now, I can repeat this over and over again and histogram the results. I expect that as the number of repetitions gets very large, my distribution will begin to look like a normal distribution with $\mu=100$. Is this correct?

If so, what is the standard deviation $\sigma$ of this distribution as the number of samples $N$ gets large? And what about in the case of a general uniform distribution $\mathcal{U}_{[a,b]}$? Ideally, I'm looking for an explicit equation for $\sigma(a, b, N)$ of the normal distribution that approximates this PDF.

This may be related to other questions on here but I don't totally understand how they would answer my question (in particular, I think this question is almost helpful to me!).

  • This is so not-normal that for a given independent probability of success, no matter how small, the most likely first occurrence of a success is the first draw – Henry Sep 09 '22 at 09:26

1 Answer


I think you are asking for the distribution of waiting times for an event with probability $0.01$: this is a geometric distribution $\mathrm{Geom}(p)$ with $p=0.01$. For this distribution, $E(X) = \frac{1}{p}$ and $\mathrm{Var}(X) = \frac{1-p}{p^2}$. In your case this gives mean $100$, as you say, and variance $9900$, so the standard deviation is $\sqrt{9900} \approx 99.5$. The distribution will not be approximately Normal, no matter how many times you repeat the experiment: the probability mass function is $P(X=n) = (1-p)^{n-1}p$ for $n = 1, 2, \dots$, which is largest at $n=1$ and decreases from there, so you would expect the histogram to look like a negative exponential (though discrete).
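As a rough check of the numbers above, here is a minimal simulation sketch using only the Python standard library. The function names `waiting_time` and `simulate`, the number of trials, and the seed are illustrative choices, not part of the question.

```python
import random
import statistics


def waiting_time(p, rng):
    """Count draws from U[0,1) until the first value <= p (inclusive of that draw)."""
    n = 1
    while rng.random() > p:
        n += 1
    return n


def simulate(p=0.01, trials=20_000, seed=0):
    """Repeat the waiting-time experiment `trials` times (seed fixed for repeatability)."""
    rng = random.Random(seed)
    return [waiting_time(p, rng) for _ in range(trials)]


samples = simulate()
sample_mean = statistics.fmean(samples)
sample_sd = statistics.pstdev(samples)
# Fraction of experiments that finished within the first 100 draws;
# the geometric model predicts 1 - 0.99**100, roughly 0.63.
frac_within_100 = sum(s <= 100 for s in samples) / len(samples)

print(f"mean ~ {sample_mean:.1f}, sd ~ {sample_sd:.1f}, P(X <= 100) ~ {frac_within_100:.2f}")
```

If the geometric model is right, the sample mean and standard deviation should both land near $100$, and well over half of the runs should finish before the mean, which is the skew a Normal curve would not show.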

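For the general case, with a uniform distribution on $[a,b]$, and the question "when do I get the first value lying in a given range of length $L$ inside $[a,b]$?", the results are the same, with $p=\frac{L}{b-a}$. The standard deviation is then $\sigma = \frac{\sqrt{1-p}}{p}$; it depends on $a$, $b$ and $L$, but not on how many times you repeat the experiment.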
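The general case can be wrapped in a small helper. This is only a sketch: `first_hit_stats` is a hypothetical name, and it assumes the target range of length $L$ lies entirely inside $[a,b]$. It applies the geometric mean $1/p$ and standard deviation $\sqrt{1-p}/p$ (the square root of the variance formula above).

```python
import math


def first_hit_stats(a, b, L):
    """Mean and standard deviation of the number of draws from U[a,b]
    needed to land in a sub-range of length L (assumed to lie inside [a,b])."""
    if not (b > a and 0 < L <= b - a):
        raise ValueError("need a < b and 0 < L <= b - a")
    p = L / (b - a)            # success probability per draw
    mean = 1 / p               # E(X) for Geom(p)
    sd = math.sqrt(1 - p) / p  # sqrt of Var(X) = (1 - p) / p^2
    return p, mean, sd


# The case from the question: U[0,1] with target range [0, 0.01].
p, mean, sd = first_hit_stats(0.0, 1.0, 0.01)
print(f"p = {p}, mean = {mean:.1f}, sd = {sd:.2f}")
```

Note that no $N$ appears anywhere: repeating the experiment more often sharpens the histogram but does not change the shape or spread of the underlying distribution.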

mcd