
How can I calculate the following:

Suppose I have an event (in my case, a computer bug) that can happen with unknown probability $p$. I run $n$ experiments and don't see the bug. What is the probability that $p$ is below some fixed value $p_0$?

(I am not well versed in this type of statistical analysis. From a bit of research I gather this is a Bernoulli process, and I saw some formulas for determining things like the probability of $k$ successes in $n$ runs, but not exactly what I am interested in.

This question occurred to me as I was considering how often one should replicate a bug to make sure it is a consistent bug and not a sporadic one, or how sure you can be that a bug is fixed if you no longer see it.)



There is no such thing as "the" probability without more input. You can calculate a number with Bayesian methods by making the following additional assumptions:

  • First assuming a prior probability distribution over the possible values of $p$. The simplest option is the uniform distribution on $[0, 1]$. If $n$ is large enough, it doesn't matter too much which prior you pick.
  • Then assuming that each of your experiments is independent. This is a strong assumption, and in my opinion it's quite debatable whether it holds in this case, at least without putting in a certain amount of work to make sure that your experiments are "sufficiently different" from each other.
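The two assumptions above define a generative model that can be simulated directly, which is a useful sanity check on the closed form derived below. A minimal Monte Carlo sketch, assuming Python and a hypothetical helper `simulate_posterior_cdf`: draw $p$ from the uniform prior, run $n$ independent experiments that each show the bug with probability $p$, and keep $p$ only when no bug appeared. The kept values follow the posterior, so the fraction of them with $p \le x$ estimates $\mathbb{P}(p \le x)$.

```python
import random

def simulate_posterior_cdf(n: int, x: float, draws: int = 100_000, seed: int = 0) -> float:
    """Estimate P(p <= x | n bug-free runs) by rejection sampling.

    Assumes (hypothetically, as in the answer) a uniform prior on p and
    independent runs, each showing the bug with probability p.
    """
    rng = random.Random(seed)
    accepted = 0  # draws of p for which all n runs were bug-free
    below_x = 0   # accepted draws that also satisfy p <= x
    for _ in range(draws):
        p = rng.random()  # draw p from the uniform prior on [0, 1)
        # Run n independent experiments; a bug occurs when random() < p.
        if all(rng.random() >= p for _ in range(n)):
            accepted += 1
            if p <= x:
                below_x += 1
    return below_x / accepted

print(simulate_posterior_cdf(n=10, x=0.1))
```

Rejection sampling is chosen here only because it mirrors the two assumptions literally; it becomes slow for large $n$, since only about $\frac{1}{n+1}$ of the draws survive.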

With those additional assumptions we can use Bayes' theorem to update on the observation that $n$ independent experiments don't produce the bug. The posterior probability distribution over the possible values of $p$ is proportional to the prior density times the likelihood $(1 - p)^n$; with the uniform prior this is just proportional to $(1 - p)^n$, which gives

$$\mathbb{P}(p \le x) = \frac{\int_0^x (1 - t)^n \, dt}{\int_0^1 (1 - t)^n \, dt} = \boxed{ 1 - (1 - x)^{n+1} }.$$
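The boxed formula can also be inverted to answer the replication question directly: requiring $1 - (1 - x)^{n+1} \ge c$ for a chosen confidence $c$ gives $n + 1 \ge \frac{\log(1 - c)}{\log(1 - x)}$. A small sketch under the same two assumptions; `prob_p_below` and `runs_needed` are hypothetical names used only for illustration.

```python
import math

def prob_p_below(x: float, n: int) -> float:
    """Posterior P(p <= x) after n bug-free runs, uniform prior: 1 - (1 - x)^(n+1)."""
    return 1.0 - (1.0 - x) ** (n + 1)

def runs_needed(x: float, confidence: float) -> int:
    """Smallest n with prob_p_below(x, n) >= confidence, under the same assumptions.

    Solving 1 - (1 - x)^(n+1) >= c gives n + 1 >= log(1 - c) / log(1 - x).
    """
    n = max(math.ceil(math.log(1.0 - confidence) / math.log(1.0 - x)) - 1, 0)
    # Guard against floating-point rounding just below the boundary.
    while prob_p_below(x, n) < confidence:
        n += 1
    return n

print(prob_p_below(0.01, 100))  # about 0.638
print(runs_needed(0.01, 0.95))  # 298
```

For example, to be 95% sure (in this Bayesian sense) that the bug rate is at most 1%, about 300 bug-free runs are needed.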

The median of this distribution can be computed by setting this probability to $\frac{1}{2}$, which gives $(1 - x_{\text{median}})^{n+1} = \frac{1}{2}$ or

$$x_{\text{median}} = 1 - \sqrt[n+1]{\frac{1}{2}} = 1 - \exp \left( - \frac{\log 2}{n+1} \right) \approx \frac{\log 2}{n+1}.$$
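A quick numerical comparison of the exact median with the $\frac{\log 2}{n+1}$ approximation (a sketch; `posterior_median` is a hypothetical name). It uses `math.expm1` to evaluate $1 - \exp(-a)$ without cancellation when $a$ is small.

```python
import math

def posterior_median(n: int) -> float:
    """Exact median 1 - 2^(-1/(n+1)) of the posterior after n bug-free runs."""
    # 1 - exp(-a) computed as -expm1(-a) to keep precision for large n.
    return -math.expm1(-math.log(2) / (n + 1))

for n in (1, 10, 100, 1000):
    approx = math.log(2) / (n + 1)
    print(n, posterior_median(n), approx)
```

The relative gap between the two shrinks roughly like $\frac{\log 2}{2(n+1)}$, so the approximation is already good for moderate $n$.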

So this distribution is concentrated on values of order $O \left( \frac{1}{n} \right)$, which should be reasonably intuitive. You can also compute the posterior mean using Beta function integrals, which gives

$$\mathbb{E}(p) = \frac{\int_0^1 t(1 - t)^n \, dt}{\int_0^1 (1 - t)^n \, dt} = \frac{1}{n+2}.$$

This is a special case of Laplace's rule of succession.
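The mean can be checked exactly with rational arithmetic, using $\int_0^1 t^{a-1}(1-t)^{b-1}\,dt = B(a, b) = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}$ for positive integers. The sketch below (hypothetical helper names) also accepts $k$ observed bugs: the general rule of succession, a standard result not derived in the answer, gives a posterior mean of $\frac{k+1}{n+2}$ under the same uniform prior, and $k = 0$ is the case above.

```python
from fractions import Fraction
from math import factorial

def beta_int(a: int, b: int) -> Fraction:
    """Exact Beta function B(a, b) = (a-1)!(b-1)!/(a+b-1)! for positive integers."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def posterior_mean(n: int, k: int = 0) -> Fraction:
    """Posterior mean of p after k bugs in n runs, uniform prior."""
    # numerator:   integral of t^(k+1) (1-t)^(n-k) dt = B(k+2, n-k+1)
    # denominator: integral of t^k     (1-t)^(n-k) dt = B(k+1, n-k+1)
    return beta_int(k + 2, n - k + 1) / beta_int(k + 1, n - k + 1)

print(posterior_mean(10))  # 1/12
```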

Qiaochu Yuan
  • Cool, exactly what I wanted, including the two assumptions. I agree the second one is dubious, but this is still a good starting point to think about the problem. – Herman Tulleken Jan 25 '23 at 19:59