Consider the following code:
```python
from keras.preprocessing.text import Tokenizer

# texts is my corpus: a list of document strings
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(texts)
print('Found %d unique words.' % len(tokenizer.word_index))
```
When I run this, it prints:
```
Found 88582 unique words.
```
My question is: isn't `num_words` the parameter that controls how many words end up in the mapping dictionary `tokenizer.word_index`? If so, why does it still hold 88582 words when I explicitly asked it to keep only 5000?
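
For reference, here is a minimal, self-contained sketch that reproduces the same behavior; the toy `texts` and `num_words=3` are just stand-ins for my real corpus and setting:

```python
from keras.preprocessing.text import Tokenizer

# Toy corpus standing in for my real texts (10 unique words total)
texts = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat chased the dog",
]

tokenizer = Tokenizer(num_words=3)  # ask to keep only the top 3 words
tokenizer.fit_on_texts(texts)

# word_index still contains every unique word, not just 3
print('Found %d unique words.' % len(tokenizer.word_index))
print(tokenizer.word_index)
```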