
I have been trying to understand the derivation of the UMVUE for the German Tank Problem. We have $n$ values sampled without replacement from a population $\{1, 2, \cdots, N\}$ of unknown size $N$, and we wish to estimate $N$. Say the samples we observe are $X_1, \cdots, X_n$, and the order statistics are $X_{(1)}, \cdots, X_{(n)}$.

Johnson 1994 states that the probability that the maximum of our observations equals $j$ is:

$$Pr[X_{(n)} = j] = \frac{{j-1}\choose{n-1}}{{N}\choose{n}}$$

However, I keep thinking that there is a missing factor of $n$.


My thought process:

Numerator: We have $n$ slots for our sample. We have $1\choose 1$ ways of pulling $j$ from the population, and we have ${n \choose 1} = n$ ways to place it in our $n$ slots. The remaining $n-1$ slots must be filled with one of $\{1, \cdots, j-1\}$. All items are distinguishable, and there are ${n-1 \choose n-1} = 1$ ways to choose the remaining slots. So, omitting the factors of 1,

$$\text{numerator} = n \times {{j-1}\choose{n-1}}$$

Denominator: We have $n$ slots and can choose from any of our $N$ in the population. We have ${n \choose n} = 1$ choices for which slots we take. All items are distinguishable. So

$$\text{denominator} = {N \choose n}$$


What might I be misunderstanding? (I suspect it's something to do with "sequences vs sets" and/or the hypergeometric distribution, whose intuitions seem to often elude me...) Thank you in advance!

DGK

1 Answer


You’re sometimes distinguishing by order and sometimes not. You need to consistently distinguish by order either in both the numerator and the denominator or in neither.

Your denominator is correct if you don’t distinguish by order: There are $\binom Nn$ equiprobable unordered tuples of observations you could have made. Your numerator is wrong because while you don’t care about the order of the $n-1$ elements, you do introduce a factor of $n$ for the $n$ positions in the sequence of observations at which $j$ could have been observed. That partially distinguishes by order, so that factor of $n$ shouldn’t be there.
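
As a sanity check on the unordered count, here is a minimal brute-force sketch in Python. The function names and the small values $N = 7$, $n = 3$ are chosen purely for illustration and are not from Johnson. It lists every $n$-element subset of $\{1, \cdots, N\}$, tallies the maximum, and compares the result with $\binom{j-1}{n-1} / \binom{N}{n}$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def max_pmf_by_enumeration(N, n):
    """Return {j: Pr[max = j]} by listing every n-element subset of {1..N}."""
    counts = {}
    for subset in combinations(range(1, N + 1), n):
        j = max(subset)
        counts[j] = counts.get(j, 0) + 1
    total = comb(N, n)  # number of equiprobable unordered samples
    return {j: Fraction(c, total) for j, c in counts.items()}


def max_pmf_formula(N, n, j):
    """The formula from the question, without the extra factor of n."""
    return Fraction(comb(j - 1, n - 1), comb(N, n))


# Illustrative values only (an assumption, not taken from the source).
N, n = 7, 3
pmf = max_pmf_by_enumeration(N, n)

# The enumeration agrees with the formula for every possible maximum j.
assert all(pmf[j] == max_pmf_formula(N, n, j) for j in range(n, N + 1))

# The probabilities sum to 1; with an extra factor of n they would sum to n.
assert sum(pmf.values()) == 1
assert sum(n * p for p in pmf.values()) == n
```

The last assertion is another way to see that the factor of $n$ cannot belong there: the resulting "probabilities" would add up to $n$ instead of $1$.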

joriki
  • To see if I understand: even though I say "first I choose $j$ [Event A], then I pick the rest [Event B]", my expression without $n$ builds an unordered tuple of all the items chosen in A and B, rather than an ordered tuple of "first the item in A, then the items chosen in B"?

    If the above is accurate, what would be the expression for (number of/probability of) the ordered tuple (A,B) = "first the item in A, and then the items chosen in B"?

    – DGK Mar 29 '20 at 21:19
  • @DGK I'm afraid I don't quite understand what you're saying, so this may not answer your question. If you do distinguish by order, you'd have $N(N-1)\cdots(N-n+1)=\frac{N!}{(N-n)!}$ in the denominator, since the first slot can be filled with $N$ different numbers, the next with $N-1$, and so on down to $N-n+1$. In the numerator, you'd have $n$ slots to put the $j$ in and then $(j-1)(j-2)\cdots(j-n+1)=\frac{(j-1)!}{(j-n)!}$ choices for the remaining $n-1$ slots. The ordered counts differ from the unordered ones by a factor of (perhaps not surprisingly) $n!$ in both the denominator and the numerator. – joriki Mar 29 '20 at 21:30
  • Thank you, I think I understand. To hopefully better describe what I meant: I was wondering how to count the number of sequences where $j$ appeared first, followed by $n-1$ items smaller than $j$. I was forgetting that all sequences with the same elements have the same probability (i.e. $\Pr[X = (1,5,2)] = \Pr[X = (5,2,1)]$). For my comment, I could first calculate the probability of getting a satisfactory unordered tuple, then calculate the fraction of ordered tuples from that subset which satisfy my constraint. (I now see how that differs from the original question.) – DGK Mar 29 '20 at 22:11
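
The ordered counts discussed in the comments can be checked the same way. The following is a small sketch under the same illustrative assumptions (hypothetical function name, $N = 7$, $n = 3$, $j = 5$). It enumerates every ordered sample, counts those whose maximum is $j$, and separately counts those in which $j$ comes first.

```python
from itertools import permutations
from math import comb, factorial, perm


def ordered_counts(N, n, j):
    """Count ordered samples of size n from {1..N}.

    Returns (all sequences, sequences with maximum j, sequences with j first
    and every other entry smaller than j).
    """
    total = 0
    with_max_j = 0
    j_first = 0
    for seq in permutations(range(1, N + 1), n):
        total += 1
        if max(seq) == j:
            with_max_j += 1
            if seq[0] == j:
                j_first += 1
    return total, with_max_j, j_first


# Illustrative values only (an assumption, not taken from the source).
N, n, j = 7, 3, 5
total, with_max_j, j_first = ordered_counts(N, n, j)

# Denominator: N!/(N-n)!, which is n! times the unordered count C(N, n).
assert total == perm(N, n) == factorial(n) * comb(N, n)

# Numerator: n slots for j times (j-1)!/(j-n)!, which is n! * C(j-1, n-1).
assert with_max_j == n * perm(j - 1, n - 1) == factorial(n) * comb(j - 1, n - 1)

# Exactly 1/n of the sequences with maximum j have j in the first slot.
assert j_first * n == with_max_j
```

Because the factor $n!$ appears in both the numerator and the denominator, the ratio `with_max_j / total` equals the unordered probability $\binom{j-1}{n-1} / \binom{N}{n}$, and the probability that $j$ is the maximum and also appears first is that value divided by $n$.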