
A railroad numbers its locomotives in order 1..N. One day you see a locomotive with the number 60. Estimate how many locomotives the railroad has.

Using the Likelihood Approach:

Assume the prior is uniform on {1..1000}, i.e. the railroad is equally likely to have any number of locomotives from 1 to 1000.

$$ P(N=60 \mid \text{see } 60)=\frac{1/60}{\sum_{n=60}^{1000} 1/n}$$

To get the mean of posterior:

Weight each candidate number of trains from 60 to 1000 by its posterior probability, computed as in the equation above, and add these up. This gives a posterior mean of ~333 trains.

Suppose that in addition to train 60 we also see trains 30 and 90. What is the mean of the posterior?

  • Related but without the prior: http://math.stackexchange.com/questions/111374/solution-to-locomotive-problem-mosteller-fifty-challenging-problems-in-probabi – Henry Nov 09 '14 at 23:07
  • @Henry there is a lot of great content for locomotive using MLE. Was looking for Bayesian explanation. Specifically posterior mean. Above link does not address. – user3357381 Nov 09 '14 at 23:15

1 Answer


I would start by saying that I do not like your prior. What would happen if you saw train $6000$?

But let's do the calculation as you have set the problem: if this is a uniform distribution then the prior is $\pi(n) = \frac{1}{1000}$ for $1 \le n \le 1000$ and the likelihood of seeing train $60$ is $\Pr(60|n)= \frac{1}{n}\mathbb{1}_{n\ge 60}$ so the posterior is $$\pi(n|60)=\frac{\frac{1}{1000n}}{ \sum_{m=60}^{1000}\frac{1}{1000m}} = \frac{1}{n (H_{1000}-H_{59})} \approx \frac{ 0.3543251}{n}$$ for $60 \le n \le 1000$ and so the expected value is about $0.3543251 \times 941 \approx 333.4$ as you say.
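As a quick numerical check of the calculation above, here is a minimal sketch in Python. The function name `posterior_mean_single` and its parameters are illustrative choices, not part of the original question; it assumes the uniform prior on $1 \le n \le 1000$ and the $\frac{1}{n}$ likelihood stated above.

```python
from math import fsum


def posterior_mean_single(observed, upper):
    """Posterior mean of N after seeing one train number.

    Assumes a uniform prior on 1..upper and likelihood 1/n for n >= observed
    (hypothetical helper, written only to illustrate the calculation).
    """
    candidates = range(observed, upper + 1)
    # The constant prior 1/upper cancels, so the unnormalised weight is 1/n.
    weights = [1.0 / n for n in candidates]
    normaliser = fsum(weights)  # equals H_upper - H_(observed - 1)
    # Each term n * (1/n) is 1, so the numerator is just the number of candidates.
    return fsum(n * w for n, w in zip(candidates, weights)) / normaliser


if __name__ == "__main__":
    print(round(posterior_mean_single(60, 1000), 1))
```

The normaliser here is the same harmonic-number difference $H_{1000}-H_{59}$ that appears in the posterior, so the result should agree with the value of about $333.4$.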

If your prior had been uniform in $[1,100]$ you would have had an expected value of about $78.2$; if it had been uniform in $[1,10000]$ you would have had an expected value of about $1939.9$. So the sensitivity to your prior is clear.
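The sensitivity to the prior can be reproduced with a short loop over the three upper bounds. This is only a sketch under the same assumptions (uniform prior, single observation of train $60$); the helper name `posterior_mean_single` is hypothetical.

```python
from math import fsum


def posterior_mean_single(observed, upper):
    """Posterior mean of N for one sighting under a uniform prior on 1..upper."""
    candidates = range(observed, upper + 1)
    weights = [1.0 / n for n in candidates]  # likelihood 1/n; flat prior cancels
    return fsum(n * w for n, w in zip(candidates, weights)) / fsum(weights)


if __name__ == "__main__":
    # Compare the three prior ranges discussed in the text.
    for upper in (100, 1000, 10000):
        mean = posterior_mean_single(60, upper)
        print(f"uniform prior on [1, {upper}]: posterior mean = {mean:.1f}")
```

Because the posterior is proportional to $\frac{1}{n}$, its tail is heavy and the mean keeps growing with the upper bound of the prior.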

Doing more trainspotting can be treated in the same way: let's assume it is without replacement to avoid the question of whether seeing a train once makes it more likely to be seen again. The prior is the same. The likelihood of seeing trains $60, 30, 90$ is $\Pr(60, 30, 90|n)= \frac{1}{n(n-1)(n-2)}\mathbb{1}_{n\ge 90}$ so the posterior is $$\pi(n|60, 30, 90)=\frac{\frac{1}{1000n(n-1)(n-2)}}{ \sum_{m=90}^{1000}\frac{1}{1000m(m-1)(m-2)}} \approx \frac{15787.77}{n(n-1)(n-2)}$$ for $90 \le n \le 1000$ (the factors of $\frac{1}{1000}$ cancel) and the expected value is about $163.6$ (or about $94.7$ or $176.4$ with the other priors; it would have been about $164.3$ if I had used sampling with replacement, showing that that assumption was less critical than the prior).
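The three-train case can be checked numerically in the same way. The sketch below is an illustration only: the function `posterior_mean` and its `replacement` flag are hypothetical names, and it assumes distinct observed numbers, a uniform prior on $1..\text{upper}$, and the two likelihoods described above ($\frac{1}{n(n-1)(n-2)}$ without replacement, $\frac{1}{n^3}$ with replacement).

```python
from math import fsum, prod


def posterior_mean(observations, upper, replacement=False):
    """Posterior mean of N given several distinct observed train numbers.

    Assumes a uniform prior on 1..upper (hypothetical helper for illustration).
    """
    k = len(observations)
    largest = max(observations)  # the posterior is zero below the largest number seen

    def likelihood(n):
        if replacement:
            return 1.0 / n ** k
        # Without replacement: 1 / (n (n-1) ... (n-k+1))
        return 1.0 / prod(n - j for j in range(k))

    candidates = range(largest, upper + 1)
    weights = [likelihood(n) for n in candidates]  # flat prior cancels
    return fsum(n * w for n, w in zip(candidates, weights)) / fsum(weights)


if __name__ == "__main__":
    seen = [60, 30, 90]
    for upper in (100, 1000, 10000):
        print(upper, round(posterior_mean(seen, upper), 1))
    print("with replacement:", round(posterior_mean(seen, 1000, replacement=True), 1))
```

With three observations the posterior falls off like $\frac{1}{n^3}$, which is why the mean is far less sensitive to the upper bound of the prior than in the single-train case.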

Henry