1

I am loading the model with the gensim package this way:

from gensim.models import FastText
model = FastText.load_fasttext_format('wiki-news-300d-1M-subword.bin')

as stated here, but it raises the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 57: unexpected end of data

The .bin file was downloaded from this source.

How do I load the model correctly?

rishi_07

2 Answers

0

The other answer is correct but may be confusing: its two suggestions should be applied together, that is, use the fasttext package and call its load_model function.

See this link: https://github.com/RaRe-Technologies/gensim/issues/2378#issuecomment-791999124

However, the fasttext package does not provide a 'similarity' method, so you can compute the cosine similarity with scipy if you need it.

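import fasttext
from scipy import spatial

# Load the binary model with the fasttext package instead of gensim
model = fasttext.load_model('wiki-news-300d-1M-subword.bin')

# Indexing the model by a word returns that word's vector
print(model['teacher'])
print(model['teaches'])

# Cosine similarity = 1 - cosine distance
result = 1 - spatial.distance.cosine(model['teacher'], model['headteacher'])
print(result)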
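If you would rather not depend on scipy for this one calculation, the same quantity can be computed by hand. The following is a minimal sketch of cosine similarity in plain Python. The helper name cosine_similarity is hypothetical, not part of fasttext or scipy, and it assumes both inputs are equal-length sequences of numbers (the arrays returned by the model qualify).

```python
import math


def cosine_similarity(u, v):
    """Return the cosine similarity of two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Dot product of the two vectors
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    # Euclidean norms of each vector
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Calling cosine_similarity(model['teacher'], model['headteacher']) should agree with the scipy-based result up to floating-point rounding.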

0

You could use the load_model method instead of the load_fasttext_format method. Also, you could use the fastText library directly, without having to install gensim, to do the same.
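If you prefer to stay with gensim, newer gensim releases expose load_facebook_model in gensim.models.fasttext for Facebook-format .bin files. The following is only a sketch under that assumption: the wrapper name load_facebook_bin is hypothetical, and it has not been verified that this call avoids the UnicodeDecodeError for this particular file.

```python
import os


def load_facebook_bin(path):
    """Load a Facebook-format fastText .bin file through gensim.

    Hypothetical helper: assumes a gensim release that provides
    gensim.models.fasttext.load_facebook_model.
    """
    # Fail early with a clear message if the path is wrong
    if not os.path.isfile(path):
        raise FileNotFoundError(f"model file not found: {path}")
    # Imported here so the path check works even without gensim installed
    from gensim.models.fasttext import load_facebook_model
    return load_facebook_model(path)
```

With the file in place, model = load_facebook_bin('wiki-news-300d-1M-subword.bin') would return a gensim FastText model, and its word vectors are then available through model.wv.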

Nischal Hp