I'm working through the Coursera NLP course by Jurafsky & Manning, and the lecture on Good-Turing smoothing struck me as odd.
The example given was:
You are fishing (a scenario from Josh Goodman), and caught:
10 carp, 3 perch, 2 whitefish, 1 trout, 1 salmon, 1 eel = 18 fish
...
How likely is it that the next species is new (i.e., catfish or bass)?
Let's use our estimate of things-we-saw-once to estimate the new things.
3/18 (because N_1=3)
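Just to make that step concrete, here's a tiny Python sketch (my own, not from the course) reproducing the 3/18:

```python
from collections import Counter

# Observed catch from the lecture example
catch = {"carp": 10, "perch": 3, "whitefish": 2, "trout": 1, "salmon": 1, "eel": 1}

N = sum(catch.values())                  # 18 fish in total
freq_of_freqs = Counter(catch.values())  # N_c: how many species were seen exactly c times
N1 = freq_of_freqs[1]                    # 3 species seen exactly once

p_unseen = N1 / N                        # Good-Turing estimate of the mass reserved for unseen species
print(p_unseen)                          # 0.1666... = 3/18
```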
I get the intuition of using the count of items seen exactly once (N_1 = 3) to estimate the probability mass of unseen item types, but the next steps seem counterintuitive.
Why is the denominator left at 18 instead of being incremented by the estimate of the unseen item types (18 + 3 = 21)? I.e., I would expect the probabilities to become (quick check below the list):
Carp : 10 / 21
Perch : 3 / 21
Whitefish : 2 / 21
Trout : 1 / 21
Salmon : 1 / 21
Eel : 1 / 21
Something new : 3 / 21
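As a sanity check on my own proposal (naming is mine, this is not anything from the lecture), those numbers do form a proper distribution:

```python
catch = {"carp": 10, "perch": 3, "whitefish": 2, "trout": 1, "salmon": 1, "eel": 1}

N1 = sum(1 for c in catch.values() if c == 1)   # 3 singleton species
denom = sum(catch.values()) + N1                # 18 + 3 = 21

probs = {species: c / denom for species, c in catch.items()}
probs["<new species>"] = N1 / denom             # 3/21 reserved for anything unseen

print(sum(probs.values()))                      # 1.0 (up to float rounding), so it normalizes
```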
It also seems like the Good-Turing adjusted count c* = (c + 1) * N_{c+1} / N_c penalizes seen items too much (trout, salmon, and eel each drop from 1/18 to 1/27); coupled with the need to patch the formula when there are gaps in the counts (e.g., perch and carp would be zeroed out otherwise, since N_4 = N_11 = 0), it just feels like a bad hack.
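To spell out where the 1/27 and the zeros come from, here's a sketch of the unsmoothed adjusted-count formula as I understand it (the function name and printout are mine):

```python
from collections import Counter

catch = {"carp": 10, "perch": 3, "whitefish": 2, "trout": 1, "salmon": 1, "eel": 1}

N = sum(catch.values())        # 18
Nc = Counter(catch.values())   # frequency of frequencies: N_1=3, N_2=1, N_3=1, N_10=1

def adjusted_count(c):
    # Naive Good-Turing: c* = (c + 1) * N_{c+1} / N_c  (no smoothing of the N_c curve)
    return (c + 1) * Nc[c + 1] / Nc[c]

for species, c in catch.items():
    c_star = adjusted_count(c)
    print(f"{species}: c={c}, c*={c_star:.3f}, p={c_star / N:.4f}")

# trout/salmon/eel: c* = 2 * N_2/N_1 = 2/3, so p = (2/3)/18 = 1/27 ≈ 0.037
# perch: c* = 4 * N_4/N_3 = 0 because N_4 = 0  -> zeroed out unless the N_c values are smoothed
# carp:  c* = 11 * N_11/N_10 = 0 because N_11 = 0
```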