
I have observed a coin being tossed $n$ times. I do not know whether the coin is fair or not, but in every single toss I observed, the coin came up heads.

What should my belief about $p$ (the probability that the coin shows heads) be now? I cannot even say with certainty that $p>0$, since even an event with $p=0$ can occur. The frequency of heads is most compatible with $p=1$, but I doubt that is the best guess, especially if $n$ is low (it would be ridiculous to assume that $p=1$ after seeing a single heads only).

How can this be handled in a Bayesian framework? What is my best guess for the true value of $p$?

  • An estimate only makes sense after observing a reasonable number of tosses, let's say 200. – Peter Feb 05 '14 at 17:25
  • I don't think you meant to write "since even an event with p=0 can occur". An event with p=0 can't occur. – TooTone Feb 05 '14 at 17:30
  • The best guess actually is the relative frequency – Peter Feb 05 '14 at 17:31
  • @TooTone Of course it can, take, e.g. a continuous random variable. – Michael Hoppe Feb 05 '14 at 17:38
  • @limulus Check the minimax estimator on Wikipedia; Example 1 is relevant to your question. Briefly, the uncertainty about the success rate enters the problem as the prior distribution. The example uses the symmetric case, but you can change that if you want (see the sketch after these comments). http://en.wikipedia.org/wiki/Minimax_estimator – lowtech Feb 05 '14 at 17:46
  • @MichaelHoppe true for a continuous random variable, but each coin toss is a discrete Bernoulli random variable taking 1=heads or 0=tails. If $p=0$ for heads you will never get heads (and a binomial random variable for the $n$ tosses). I think the OP can say with certainty that $p>0$ because if $p=0$ then no heads would have been seen. – TooTone Feb 05 '14 at 17:49
  • @TooTone You're right, but I wanted to point out that: “An event with $p=0$ *can't* occur.” is false in general. Just nitpicking. – Michael Hoppe Feb 05 '14 at 17:57
  • @MichaelHoppe Yes you're right it is false in general, and come to think of it I imagine that I am being thought of as a nitpicker in one or two areas I am studying at the moment! :) – TooTone Feb 05 '14 at 23:19
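A small sketch of the minimax estimator mentioned in the comments, assuming the standard Bernoulli result from the linked Wikipedia example: under squared-error loss, $\hat p = \frac{x + \sqrt n/2}{n + \sqrt n}$, which is the Bayes estimator under a Beta$(\sqrt n/2, \sqrt n/2)$ prior. After seeing only heads it stays well below $1$ for small $n$:

```python
# Minimax estimate of p (squared-error loss) after x heads in n tosses:
# the Bayes estimator under a Beta(sqrt(n)/2, sqrt(n)/2) prior.
from math import sqrt

def minimax_estimate(x: int, n: int) -> float:
    return (x + sqrt(n) / 2) / (n + sqrt(n))

for n in [1, 5, 20, 200]:
    print(f"n={n:4d}, all heads: p_hat = {minimax_estimate(n, n):.4f}")
```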

2 Answers


This depends on the a priori assumption about $p$. If it is uniformly distributed a priori (i.e. $P(p<a)= a$ for $0\le a\le1$), then the probability of seeing $n$ heads in a row is $$\int_0^1 p^n \,\mathrm dp=\frac1{n+1}.$$ The probability of $n$ heads and $p<a$ is $$\int_0^a p^n \,\mathrm dp=\frac1{n+1}a^{n+1}.$$ Then the probability of $p<a$ given that we observe $n$ heads is $$ P(p<a\mid n\text{ heads})=\frac{\frac1{n+1}a^{n+1}}{\frac1{n+1}}=a^{n+1}.$$ In other words: after $n$ heads the cdf of $p$ is $a^{n+1}$, so with every head we observe it picks up another power of $a$ and shifts further and further towards $1$. We see that there is a 50% chance that $p>\frac1{\sqrt[n+1]2}$ and the most likely $p$ is indeed $1$ - even after a single head! If you find that counterintuitive, it is because in the back of your head you don't start with all values of $p$ equally likely but rather with a huge bias towards "more or less" fair coins.
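A minimal numerical sketch of this posterior, assuming SciPy is available: under the uniform prior, the posterior after $n$ heads is the Beta$(n+1,1)$ distribution, whose cdf is exactly $a^{n+1}$.

```python
# Posterior for p after n heads in n tosses, starting from a uniform
# (= Beta(1, 1)) prior: Beta(n + 1, 1), with cdf P(p < a) = a^(n + 1).
from scipy.stats import beta

for n in [1, 5, 20, 200]:
    post = beta(n + 1, 1)
    # the median solves a^(n+1) = 1/2, i.e. a = 2^(-1/(n+1))
    print(f"n={n:4d}  median={post.median():.4f}  "
          f"mean={post.mean():.4f}  P(p > 0.9)={post.sf(0.9):.4f}")
```

Even for $n=1$ the median is already $2^{-1/2}\approx0.71$, matching the answer's point that the mass shifts towards $1$ with every observed head.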

  • Usually statisticians refer to "prior" and "posterior" distributions, and it seems others refer to "a priori" and "a posteriori" distributions (and notice the spelling). "A priori" is almost always wrong if construed literally to refer to the concept in epistemology, so I think statisticians' conventional usage is arguably better. – Michael Hardy Feb 05 '14 at 18:13

Bayesianism is adherence to a degree-of-belief interpretation of probability rather than to a frequency interpretation.

Hagen von Eitzen's answer is correct if the prior degree of belief about $p$ is expressed by a uniform distribution.

The physicist Edwin Jaynes once argued in a paper that if one has never suspected either outcome of existing until one of them is observed, then that epistemic situation should be modeled by using $$ \frac{dp}{p(1-p)} \tag 1 $$ as the prior distribution. That is NOT a probability distribution since it assigns infinite measure to the whole space. If you observed heads ten times, the posterior would then be $$ \frac{p^9\,dp}{1-p}, $$ which is still not a probability distribution. At this point one is in the epistemic state of never having even suspected that the black swan --- the tails outcome --- is a possibility. But if one has tried twice and observed heads once and tails once, then one knows that both possible outcomes exist, and application of Bayes' formula to the prior $(1)$ yields the uniform distribution as the posterior.
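Spelling out the two updates, with the prior $(1)$, the likelihoods just described, and constants dropped: $$ \frac{dp}{p(1-p)}\times p^{10}=\frac{p^9\,dp}{1-p}, \qquad \frac{dp}{p(1-p)}\times p(1-p)=dp, $$ the second of which is precisely the uniform distribution.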

If your epistemic state is like that --- knowing ONLY that those two outcomes are possible --- then Jaynes' argument would lead to the conclusion that the uniform distribution is the right prior.

Historically, in Thomas Bayes' famous posthumous paper that appeared in 1763, two years after his death, the uniform prior and the Beta posterior resulting from just this kind of experiment were the only problem considered. It was in that paper that Bayes derived the result that $$ \int_0^1 \binom n k x^k(1-x)^{n-k}\,dx = \frac{1}{n+1} $$ by the method that I described here.
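A quick sanity check of that integral (a sketch assuming SciPy; `comb` is the binomial coefficient): the value should come out to $\frac1{n+1}$ regardless of $k$.

```python
# Numerical check of Bayes' integral: for any 0 <= k <= n,
# int_0^1 C(n, k) x^k (1 - x)^(n - k) dx = 1 / (n + 1).
from scipy.special import comb
from scipy.integrate import quad

for n, k in [(5, 2), (10, 0), (10, 10)]:
    val, _ = quad(lambda x: comb(n, k) * x**k * (1 - x)**(n - k), 0, 1)
    print(f"n={n}, k={k}: integral={val:.6f}, 1/(n+1)={1/(n+1):.6f}")
```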