Who teaches English?
After tokenizing and stemming, it gives me:
Who, teach, English
In my list of words, I have the word:
teacher
Lemmatizing or stemming teacher gives teacher, while lemmatizing or stemming teaches gives teach, so the two never match.
Even calculating edit_distance will not solve this, since the edit_distance between teach and teacher comes out to be 2.
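To make the edit-distance numbers concrete, here is a minimal pure-Python sketch of Levenshtein distance. The function name edit_distance mirrors the one mentioned above, but this is my own stand-in implementation written for illustration, not the library function itself.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions to turn a into b.
    (Illustrative stand-in, not a library call.)"""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("teach", "teacher"))         # stemmed query word vs. list word
print(edit_distance("instruct", "instructor"))   # the other example, after stemming
```

Both pairs come out at distance 2, so a threshold tight enough to be useful on short words would reject them.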
Now, what do I do to have teacher and teach treated as similar? Similarly, there may be other cases with an extra 's' at the end. Is there some stemmer that solves this problem? Is there any other solution?
Another similar example: instructor and instructs.