
I want to find the similarity between a phrase and the possible combinations of tokens that may form the phrase.

For example, phrase = 'sea surface water'

Possible tokens = ['sea', 'surface land', 'surface water']

My approach: first, I generate combinations of the tokens, c = ['sea', 'surface land', 'surface water', 'sea surface land', 'surface land surface water', 'sea surface water']
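A minimal sketch of the combination step, using the standard library's `itertools.combinations` (the helper name `token_combinations` is illustrative, not from the post). Generating every non-empty combination of three tokens yields seven strings, so this version also produces the three-token combination 'sea surface land surface water', which the list above does not include.

```python
from itertools import combinations


def token_combinations(tokens):
    """Return every non-empty combination of tokens, each joined with spaces.

    itertools.combinations preserves the input order, so 'sea' always
    appears before 'surface water' in the generated strings.
    """
    result = []
    for size in range(1, len(tokens) + 1):
        for combo in combinations(tokens, size):
            result.append(" ".join(combo))
    return result


tokens = ["sea", "surface land", "surface water"]
c = token_combinations(tokens)
print(c)
# ['sea', 'surface land', 'surface water', 'sea surface land',
#  'sea surface water', 'surface land surface water',
#  'sea surface land surface water']
```

The number of candidates is 2^n - 1 for n tokens, so this brute-force enumeration only stays practical for short token lists.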

Then I compute the cosine similarity between the phrase 'sea surface water' and each element of c.
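The post doesn't say how each string is turned into a vector. The sketch below assumes a simple bag-of-words representation (whitespace-split word counts); `cosine_similarity` is a hypothetical helper, and TF-IDF or embedding vectors could be swapped in without changing the rest of the approach.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two strings using word-count vectors.

    Assumption: each string is a bag of whitespace-separated words.
    """
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    # sqrt(|a|^2 * |b|^2) avoids rounding error from multiplying two roots.
    norms = math.sqrt(sum(n * n for n in va.values()) * sum(n * n for n in vb.values()))
    return dot / norms if norms else 0.0


phrase = "sea surface water"
c = ['sea', 'surface land', 'surface water', 'sea surface land',
     'surface land surface water', 'sea surface water']
scores = {candidate: cosine_similarity(phrase, candidate) for candidate in c}
# sea 0.577, surface land 0.408, surface water 0.816,
# sea surface land 0.667, surface land surface water 0.707,
# sea surface water 1.000
```

Under this representation the repeated word in 'surface land surface water' raises its score, since "surface" is counted twice.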

Finally, I use the maximum cosine similarity to find the combination of tokens that best forms the phrase. In this case it is 'sea surface water', which is made of the two tokens 'sea' and 'surface water'.
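The selection step can be sketched by keeping each candidate's token tuple next to its score, so the winning candidate also tells you which tokens formed it. This again assumes bag-of-words cosine similarity, and `best_combination` is a hypothetical name.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    # Bag-of-words cosine similarity (assumed vectorization).
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norms = math.sqrt(sum(n * n for n in va.values()) * sum(n * n for n in vb.values()))
    return dot / norms if norms else 0.0


def best_combination(phrase, tokens):
    """Return (score, token_tuple) for the combination most similar to phrase."""
    candidates = (
        combo
        for size in range(1, len(tokens) + 1)
        for combo in combinations(tokens, size)
    )
    # Score each token tuple by the similarity of its joined string.
    return max(
        ((cosine_similarity(phrase, " ".join(combo)), combo) for combo in candidates),
        key=lambda pair: pair[0],
    )


score, combo = best_combination("sea surface water", ["sea", "surface land", "surface water"])
print(score, combo)  # 1.0 ('sea', 'surface water')
```

`max` returns the first candidate among ties. Because bag-of-words cosine ignores word order, a candidate containing the right words in a different order would score just as high.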

My question: is there an existing algorithm that addresses this problem?
