Let's say that in a particular field, the word "the" has a specific meaning and is not just a determiner. The common "the" and the domain-specific "the" are mixed together in the corpus. Is there a way to handle this, or is manually tagging the specific occurrences before pre-processing the only option?
I've looked at the question "Detect if word is «common English» word or slang word", but it doesn't seem to answer this.
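Here is a minimal Python sketch of the manual-tagging option. It assumes each document is already tokenized and that a person has recorded the token positions where "the" carries the domain-specific meaning. The replacement token `the_spec`, the helper names, and the tiny stop-word list are hypothetical and only serve as an illustration.

```python
# Hypothetical convention: the domain-specific sense of "the" is rewritten to a
# distinct token so later steps (stop-word removal, vectorization) keep it apart
# from the ordinary article.
SPECIAL_TOKEN = "the_spec"

# Illustrative stop-word list; a real pipeline would use its own.
STOP_WORDS = {"the", "a", "an", "of"}


def apply_manual_tags(tokens, tagged_positions):
    """Replace tokens at hand-annotated positions with SPECIAL_TOKEN.

    tokens: list of str for one document
    tagged_positions: set of int indices marked by a human as the specific sense
    """
    return [SPECIAL_TOKEN if i in tagged_positions else tok
            for i, tok in enumerate(tokens)]


def preprocess(tokens):
    """Lowercase tokens and drop ordinary stop words.

    SPECIAL_TOKEN is not in STOP_WORDS, so the tagged sense survives.
    """
    lowered = [tok.lower() for tok in tokens]
    return [tok for tok in lowered if tok not in STOP_WORDS]


if __name__ == "__main__":
    doc = ["The", "model", "uses", "the", "index"]
    tagged = apply_manual_tags(doc, {3})  # position 3 marked as the specific sense
    print(preprocess(tagged))
```

Because the tag is applied before pre-processing, the stop-word filter removes only the ordinary article and keeps the specialized sense as its own vocabulary item.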