
In a city there are $N > 1$ cabs with the clearly visible numbers $1, \ldots, N$. A tourist stands at one of the central places of the city and observes the traffic. He sees $n$ cabs and notes the cab numbers as $X_1, \ldots, X_n$. Small note: The noted cabs appear partly twice in the list (that is, some cab numbers are recorded more than once). The tourist wants to estimate the number of cabs, using the information that the cabs are uniformly distributed over the city. For this he has thought of the following estimators:

$U_n = \max(X_1, \ldots, X_n)$ and $\hat{U}_n = \frac{2}{n} \cdot \sum_{i=1}^n X_i$

a) Calculate $E(X_i)$

Because of the interval $[1, n]$, I thought of $E(X_i) = \frac{1+n}{2}$.

b) Calculate $E(\hat{U_n})$

$E(\hat{U_n}) = E(\frac{2}{n} \cdot \sum_{i=1}^n X_i) = \frac{2}{n} \cdot \sum_{i=1}^n E(X_i) = \frac{2}{n} \cdot n \cdot \frac{1+n}{2} = n+1$. I don't think this is correct, because it would be a bad estimator for $N$, given that the same cab can be observed multiple times.

c) Determine $P(U_n \le k)$ for $k \in \{1, \ldots, N\}$ and derive $P(U_n = k)$

I have no clue how to solve this.

I appreciate any ideas or solutions to my problem.
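
A quick numerical check of (a) and (b) can help here. The sketch below is only an illustration under an assumption the problem statement suggests but does not spell out: the $n$ observations are independent and uniform on $\{1, \ldots, N\}$, so the same cab can be seen more than once. The function name `simulate` and the values `N = 20`, `n = 5` are hypothetical choices, not part of the exercise.

```python
import random


def simulate(N, n, trials, seed=0):
    """Draw `trials` samples of n cab numbers, uniformly with replacement from 1..N.

    Returns the average observed cab number, the average of the
    "double the average" estimate, and the fraction of samples in which
    that estimate is smaller than the largest number actually seen.
    """
    rng = random.Random(seed)
    sum_x = 0        # running sum of every observed cab number
    sum_hat = 0.0    # running sum of the "double the average" estimates
    below_max = 0    # samples where the estimate falls below the sample maximum
    for _ in range(trials):
        sample = [rng.randint(1, N) for _ in range(n)]  # randint includes both ends
        hat_u = 2.0 * sum(sample) / n
        sum_x += sum(sample)
        sum_hat += hat_u
        if hat_u < max(sample):
            below_max += 1
    return sum_x / (trials * n), sum_hat / trials, below_max / trials


# Hypothetical values chosen only for illustration.
N, n = 20, 5
mean_x, mean_hat, frac_below = simulate(N, n, trials=50_000)

print(f"average X_i:                 {mean_x:.3f}   (compare (N+1)/2 = {(N + 1) / 2})")
print(f"average of 2 * sample mean:  {mean_hat:.3f}   (compare N+1 = {N + 1})")
print(f"share with estimate < max:   {frac_below:.3f}")
```

Under these assumptions the first two averages should land near $\frac{N+1}{2}$ and $N+1$, which depend on $N$ and not on the sample size $n$. The third number shows how often doubling the average gives a value below a cab number that was actually observed.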

  • Your (a) should be $\frac{1+N}{2}$ according to your argument and that affects your answer to (b). You could start by answering $(c)$ by saying $P(U_n \le u)$ is the probability $X_1\le u, X_2\le u,\ldots, X_n\le u$. – Henry Sep 16 '22 at 02:15
  • This is "The German Tank Problem", see https://en.wikipedia.org/wiki/German_tank_problem See also https://math.stackexchange.com/questions/65398/why-does-this-expected-value-simplify-as-shown and https://math.stackexchange.com/questions/455840/why-are-these-estimates-to-the-german-tank-problem-different and https://math.stackexchange.com/questions/1903456/what-are-some-problems-in-which-bayesians-and-frequentists-get-different-results and lots of other earlier appearances of the problem on this website. – Gerry Myerson Sep 16 '22 at 03:03
  • @Henry So b) would be N+1 which seems to be a good estimator for N, right? – Dieter Brow Sep 16 '22 at 11:08
  • @GerryMyerson Thanks for your help, I had not found this topic here before with my searches. My problem is that my text says "The noted cabs appear partly twice in the list". And the "German Tank Problem" seems to be sampling without replacement. – Dieter Brow Sep 16 '22 at 11:11
  • I would guess that if the numbers are large enough it wouldn't make much difference. Or, that if you understand the argument for sampling without replacement, you can make the modifications to handle sampling with replacement. – Gerry Myerson Sep 16 '22 at 11:28
  • @DieterBrow "double the average" is not a bad estimator but, as well as being slightly biased upwards, it sometimes might be implausibly low, in particular when $\hat U_n < \max x_i$ – Henry Sep 16 '22 at 11:34
  • @Henry Isn't "double the average" wrong because we don't want to estimate the expected value of X, but the end of the interval, i.e. N? Then N+1 would be good, wouldn't it? – Dieter Brow Sep 16 '22 at 12:49
  • @DieterBrow You defined $\hat U_n$ as double the average i.e. $2 \cdot \frac{1}{n} \sum_{i=1}^nx_i$ – Henry Sep 16 '22 at 14:06
  • Perhaps you can benefit from the observation of repeated cab numbers in the sample and use this information in the estimation of the population size... have a look at this question about using the Good-Turing estimate for the number of books in a library given observation of the number of repeated books sampled from that library https://math.stackexchange.com/questions/615464/how-many-books-are-in-a-library – bluemaster Sep 17 '22 at 01:57
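
Henry's hint for (c) can be written out as follows. This is a sketch that assumes the observations $X_1, \ldots, X_n$ are independent and each uniform on $\{1, \ldots, N\}$ (sampling with replacement, as the remark about repeated cab numbers suggests). The maximum is at most $k$ exactly when every single observation is at most $k$, so for $k \in \{1, \ldots, N\}$:

$$P(U_n \le k) = P(X_1 \le k, \ldots, X_n \le k) = \prod_{i=1}^n P(X_i \le k) = \left(\frac{k}{N}\right)^n$$

Since $U_n$ takes integer values, the probability of hitting $k$ exactly is the difference of two consecutive values of this distribution function:

$$P(U_n = k) = P(U_n \le k) - P(U_n \le k-1) = \frac{k^n - (k-1)^n}{N^n}$$

As a sanity check, summing the second expression over $k = 1, \ldots, N$ telescopes to $\frac{N^n - 0^n}{N^n} = 1$. Without replacement the product step no longer holds, and the counts become binomial coefficients as in the German tank problem.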
