Suppose we throw a biased coin $N$ times with $p(\text{head}) = \pi$, and we observe $k$ heads (it could be any number; say $k=4$ for simplicity). We are interested in finding the most likely $N$ as a function of $\pi$.
The likelihood can be written as (for $k=4$), $$p(x = 4 | N,\pi) = {N\choose 4} \pi^4 (1-\pi)^{N-4}$$
I aim to calculate $$N^* = \text{argmax}_N \, p(x=4 \mid N,\pi),$$ which, it turns out, is pretty hard to solve analytically for $N$ (you can try it yourself). Although $N$ is a discrete variable, I tried differentiating the log-likelihood with respect to $N$ (since log is monotone, the maximizer stays the same) and solving for $N$, which led to equations I could not solve.
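For concreteness, here is where the direct approach gets stuck. If we treat $N$ as continuous and write the binomial coefficient with gamma functions (an assumption on my part, since $N$ is really discrete), the derivative of the log-likelihood involves digamma functions:
$$\frac{\partial}{\partial N}\log p(x=4 \mid N,\pi) = \psi(N+1) - \psi(N-3) + \log(1-\pi),$$
and setting this to zero gives $\psi(N+1) - \psi(N-3) = -\log(1-\pi)$, a transcendental equation with no closed-form solution for $N$.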
So far so good. What makes this interesting to me is that solving the problem for $\pi$ instead, that is, finding the most likely $\pi$ as a function of $N$ and then inverting, seems to give the correct result. If you differentiate the likelihood (not the log-likelihood) with respect to $\pi$, set it to zero, and solve for $\pi$, you find $\pi = 4/N$.
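Spelled out, the stationarity condition in $\pi$ is
$$\frac{\partial}{\partial \pi}\, {N\choose 4}\pi^4(1-\pi)^{N-4} = {N\choose 4}\,\pi^3(1-\pi)^{N-5}\bigl[4(1-\pi) - (N-4)\pi\bigr] = 0,$$
and the bracket vanishes exactly when $4 = N\pi$, i.e. $\pi = 4/N$.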
Choosing $N = 4/\pi$ is consistent with my empirical results, so it seems to be true; I just could not derive it by maximizing over $N$ directly. See the figure below.
The blue line shows the numerically computed maximizing $N$ for each $\pi$, and the red line is $4/\pi$.
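For reference, here is a minimal sketch of how such a numerical check can be done (the grid sizes and the `scipy` dependency are my own choices, not necessarily what produced the figure):

```python
import numpy as np
from scipy.stats import binom

k = 4
N_grid = np.arange(k, 5001)            # candidate values of N (need N >= k)
pi_grid = np.linspace(0.05, 1.0, 20)   # a few values of pi to check

for pi in pi_grid:
    # likelihood of observing exactly k heads in N tosses, for every candidate N
    lik = binom.pmf(k, N_grid, pi)
    N_star = N_grid[np.argmax(lik)]    # N maximizing the likelihood
    print(f"pi = {pi:.2f}:  argmax N = {N_star},  4/pi = {k / pi:.2f}")
```

The printed maximizer agrees with $4/\pi$ (up to rounding to an integer), which is what the blue and red curves show.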
I wonder why solving for $\pi$ instead of $N$ yields the correct answer. Is there a general property of this likelihood that I am missing?