
Consider the following code:

from keras.preprocessing.text import Tokenizer

# texts is my corpus: a list of raw text strings
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(texts)
print('Found %d unique words.' % len(tokenizer.word_index))

When I run this, it prints:

Found 88582 unique words.

My question is: isn't num_words the parameter that controls the size of the mapping dictionary tokenizer.word_index? Then why does it still hold 88582 words when I explicitly asked it to keep only 5000?

Mehran

1 Answer


The problem is with the way this is documented: num_words does not trim word_index at all. fit_on_texts always records every word it sees, so word_index holds the full vocabulary. The num_words limit is only applied later, when you call texts_to_sequences or texts_to_matrix, which keep just the num_words - 1 most frequent words. See this question for details: https://stackoverflow.com/questions/46202519/keras-tokenizer-num-words-doesnt-seem-to-work
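
A minimal sketch of that behavior, using a made-up three-sentence corpus (the texts list below is just for illustration):

from keras.preprocessing.text import Tokenizer

# a toy corpus, purely for illustration
texts = ['the cat sat', 'the dog sat', 'the cat ran']

tokenizer = Tokenizer(num_words=3)
tokenizer.fit_on_texts(texts)

# word_index records every unique word, regardless of num_words
print(len(tokenizer.word_index))  # 5

# texts_to_sequences keeps only the num_words - 1 most frequent
# words ('the' and 'cat'); everything else is dropped
print(tokenizer.texts_to_sequences(texts))
# [[1, 2], [1], [1, 2]]

So the 5000 limit in your code will take effect once you actually convert your texts, even though word_index still reports 88582 entries.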

Prince