Which languages does Llama 2 support? I looked at the docs and at Hugging Face but couldn't find a list. They only say that use in languages other than English is out of scope.
hi @heyula, welcome to the forum. If you find the answers to your question useful, please consider upvoting them, and accepting one by checking the tick mark next to it. If you don't find them helpful, please clarify in a comment why. – brewmaster321 Mar 04 '24 at 07:23
1 Answer
From the Llama 2 paper: "Language Identification. While our pretraining data is mostly English, it also includes text from a small number of other languages. Table 10 shows the distribution of languages in our corpus, subsetted to those found in more than 0.005% of the documents. Our analysis uses the fastText (Bojanowski et al., 2016) language identification tool and a threshold of 0.5 for the language detection. A training corpus with a majority in English means that the model may not be suitable for use in other languages."
It's probably better at Python than French.
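To make the quoted procedure concrete, here is a rough sketch of how such a language table could be computed. It is not the paper's code. The `detect` callable, the `toy_detect` stand-in, and the choice to label low-confidence documents as "unknown" are all my assumptions. In practice you could wrap a real detector (for example fastText's `model.predict`) so that it returns a `(language, confidence)` pair.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical detector signature: text -> (language code, confidence in [0, 1]).
Detector = Callable[[str], Tuple[str, float]]


def language_distribution(
    docs: Iterable[str],
    detect: Detector,
    threshold: float = 0.5,      # confidence cutoff quoted from the paper
    min_share: float = 0.00005,  # 0.005% of documents, as quoted
) -> Dict[str, float]:
    """Return the share of documents per language, dropping rare languages."""
    counts: Counter = Counter()
    total = 0
    for text in docs:
        lang, confidence = detect(text)
        # Assumption: detections below the threshold are counted as "unknown".
        counts[lang if confidence >= threshold else "unknown"] += 1
        total += 1
    if total == 0:
        return {}
    # Keep only languages found in more than min_share of the documents.
    return {
        lang: n / total
        for lang, n in counts.most_common()
        if n / total > min_share
    }


def toy_detect(text: str) -> Tuple[str, float]:
    """Illustrative stand-in for a real language-ID model; not fastText."""
    if "def " in text:
        return ("en", 0.2)  # code tends to get a low-confidence guess
    if "bonjour" in text.lower():
        return ("fr", 0.9)
    return ("en", 0.95)


docs = ["Hello world", "Bonjour le monde", "def f(): pass", "Good morning"]
shares = language_distribution(docs, toy_detect)
print(shares)
```

With a real detector and a real corpus, the resulting shares are what a table like the paper's Table 10 reports. An English-dominated result is the reason for the "may not be suitable for use in other languages" caveat.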

brewmaster321