
Suppose we toss a biased coin $N$ times with $p(\text{head}) = \pi$ and observe $k$ heads (this could be any number; say $k=4$ for concreteness). We are interested in finding the most likely $N$ as a function of $\pi$.

The likelihood can be written (for $k=4$) as $$p(x = 4 | N,\pi) = {N\choose 4} \pi^4 (1-\pi)^{N-4}.$$

I aim to calculate $$N^* = \text{argmax}_N\, p(x=4|N,\pi),$$ which, it turns out, is pretty hard to solve analytically for $N$ (you can try it yourself). Although $N$ is a discrete variable, I differentiated the log-likelihood with respect to $N$ (since log is monotone, the maximizer stays the same) and tried to solve for $N$, which left me with equations I could not solve.
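For reference, the discrete maximization is easy to do by brute force; here is a minimal numeric sketch (assuming NumPy and SciPy are available; $\pi = 0.3$ is just an illustrative value):

```python
import numpy as np
from scipy.stats import binom

k, pi = 4, 0.3
Ns = np.arange(k, 200)          # candidate N values (N >= k)
lik = binom.pmf(k, Ns, pi)      # p(x = k | N, pi) for each candidate
N_star = Ns[np.argmax(lik)]
print(N_star, k / pi)           # prints 13 and 13.33..., suspiciously close
```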

So far so good. What makes this interesting for me is that solving the problem for $\pi$ instead, i.e. finding the most likely $\pi$ as a function of $N$ and then solving that relation for $N$, seems to give the correct result. If you differentiate the likelihood (not the log-likelihood) with respect to $\pi$, set it to zero, and solve for $\pi$, you find $\pi = 4/N$.
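Explicitly, since the binomial coefficient does not depend on $\pi$, $$\frac{\partial}{\partial \pi}\left[\pi^4 (1-\pi)^{N-4}\right] = \pi^3 (1-\pi)^{N-5}\bigl[4(1-\pi) - (N-4)\pi\bigr] = 0 \implies \pi = \frac{4}{N}.$$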

Choosing $N = 4/\pi$ is consistent with my empirical results, so it seems to be true, even though I couldn't obtain it by maximizing over $N$ directly. Now see the figure.

The blue line shows the numerically computed maximizing $N$ for each $\pi$, and the red line is $4/\pi$.
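For reproducibility, a sketch of how such a comparison can be generated (assuming NumPy, SciPy, and Matplotlib; the grids are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

k = 4
pis = np.linspace(0.05, 0.95, 91)
Ns = np.arange(k, 500)
# For each pi, pick the N with the highest likelihood p(x = k | N, pi).
N_star = [Ns[np.argmax(binom.pmf(k, Ns, p))] for p in pis]

plt.step(pis, N_star, where="mid", label=r"numeric $\mathrm{argmax}_N$")  # blue
plt.plot(pis, k / pis, "r", label=r"$4/\pi$")                             # red
plt.xlabel(r"$\pi$"); plt.ylabel(r"$N^*$"); plt.legend(); plt.show()
```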

I wonder why solving for $\pi$ instead of $N$ works here. Is there a general property of this likelihood that I am missing?


2 Answers


This is an old thread, but there are some wrong answers that need clarification, as this is a common misconception. The MLE of $N$, assuming the sampling probability $\pi$ is known, is generally not equal to $\frac{k}{\pi}$.

Let's assume that $N$ is a continuous parameter. The log-likelihood of the Binomial, ignoring terms that do not contain $N$, is equal to $$\ln{N \choose k} + (N-k)\ln(1-\pi).$$ Setting the derivative w.r.t. $N$ equal to zero yields $$H_{N}-H_{N-k} + \ln(1-\pi)=0,$$ where $H_n$ denotes the $n$-th harmonic number (extended to real arguments via the digamma function). This is also mentioned in Eq. (2.5) in this paper.
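This can be checked numerically; a small sketch (assuming SciPy, with illustrative $k=4$, $\pi=0.3$), using $H_N = \psi(N+1) + \gamma$, so that $H_N - H_{N-k} = \psi(N+1) - \psi(N-k+1)$:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

k, pi = 4, 0.3

def stationarity(N):
    # H_N - H_{N-k} + ln(1 - pi), with H extended to real N via digamma
    return digamma(N + 1) - digamma(N - k + 1) + np.log(1 - pi)

N_cont = brentq(stationarity, k, 1e6)   # continuous maximizer of the likelihood
print(N_cont, (k / pi - 1, k / pi))     # the root lies inside [k/pi - 1, k/pi]
```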

This equation is only implicitly solvable for $N$, but the difference $H_N-H_{N-k}$ can be bounded by $$\ln\left(\frac{N+1}{N+1-k}\right) \leq H_{N}-H_{N-k} \leq \ln\left(\frac{N}{N-k}\right).$$ Substituting $H_N - H_{N-k} = -\ln(1-\pi)$ from the derivative equation and rearranging each side gives $$\frac{k}{\pi}-1 \leq N \leq \frac{k}{\pi}.$$

In fact the upper inequality is strict, $H_N-H_{N-k} < \ln\left(\frac{N}{N-k}\right)$, hence also $N < \frac{k}{\pi}$, although not by much.

If $N$ is required to be an integer, then the MLE is $\lfloor\frac{k}{\pi}\rfloor$ if $\frac{k}{\pi}$ is not an integer, and there are two maximizers, $\frac{k}{\pi}$ and $\frac{k}{\pi}-1$, when $\frac{k}{\pi}$ is an integer. Note that these are exactly the integer values lying in the interval above.
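The integer case can also be seen directly from the ratio of consecutive likelihoods (a standard argument, included here for completeness): $$\frac{p(k | N, \pi)}{p(k | N-1, \pi)} = \frac{N(1-\pi)}{N-k} \geq 1 \iff N \leq \frac{k}{\pi},$$ so the likelihood increases up to $\lfloor\frac{k}{\pi}\rfloor$ and decreases afterwards, with an exact tie between $\frac{k}{\pi}$ and $\frac{k}{\pi}-1$ when $\frac{k}{\pi}$ is an integer.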

fawadria
  • great answer! can your answer be applied to my question? https://math.stackexchange.com/questions/4757261/estimating-the-number-of-trials-in-a-binomial-distribution – stats_noob Dec 31 '23 at 21:03

If you've tossed $N$ coins and received $X$ heads, then the MLE for $\pi$ is $\hat \pi = \frac{X}{N}$, which you are aware of.

We can write this more abstractly as $\pi^*= \text{argmax}_{\pi}\, p(X|N,\pi) \implies N\pi - X=0$. This is the general maximum-likelihood condition for the Binomial distribution: if $X$, $N$, and $\pi$ satisfy this relation, then the associated binomial probability is maximized, in the sense that you cannot increase the probability by adjusting one of the variables while holding the other two fixed.

Normally you are given $N$ and $X$ and must find $\pi$, but given $X$ and $\pi$ we see that there is exactly one real-valued $N$ satisfying this relation. Hence, there is a one-to-one mapping between any two of these quantities and the third.

Conversely, imagine that $N^*\neq X/\pi$; this would imply that the mode of the binomial distribution with parameters $N^*$ and $\pi$ is not at $X$, but by definition it must be. Hence, there would be a contradiction.
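A quick numeric sanity check of this claim (assuming SciPy; the values are illustrative):

```python
import numpy as np
from scipy.stats import binom

X, N = 4, 16
pi = X / N                         # chosen so that N*pi - X = 0
base = binom.pmf(X, N, pi)
for eps in (-0.05, 0.05):          # perturb pi, holding N and X fixed
    assert binom.pmf(X, N, pi + eps) <= base
for dN in (-1, 1):                 # perturb N, holding pi and X fixed
    assert binom.pmf(X, N + dN, pi) <= base + 1e-12  # tolerance: N-1 ties exactly
```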

  • For fixed $x$ ($x=4$), we can write $p(x=4 | \pi,N) = \mathcal{L}(N,\pi)$. So $\pi^*$ is a maximiser for a fixed $N$; I still cannot see why this is true. Shouldn't we show that $N$ is unique in some way? Maybe if I plot $\mathcal{L}$ w.r.t. $(N,\pi)$, that would shed some light. –  Oct 15 '14 at 04:40
  • @Deniz don't get caught up in the weeds of the likelihood function; it's not a nice, differentiable function in $N$. Just note that the maximum likelihood occurs where $\pi = \frac{X}{N}$, regardless of which variable you know. If you don't know $N$, the most likely value is the one that ensures that $X$ occurred with the actual success probability (or gets as close as possible). The condition I gave is simply a re-arrangement of the MLE for $\pi$, since it defines where the maximum occurs. –  Oct 15 '14 at 04:47