Langid Language Codes and Confidence | Generated by AI
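In langid, the classify(text) function returns a tuple (lang, confidence) where:
lang is a two-letter ISO 639-1 language code such as en or zh. langid.py does not append region suffixes; codes like zh-cn and zh-tw are produced by other libraries such as langdetect.
confidence is a float score. By default it is an unnormalized log-probability (typically a large negative number), not a value between 0 and 1. To get a probability in the 0 to 1 range, create a LanguageIdentifier with norm_probs=True.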
The set of possible values for lang comes from the languages supported by langid.py. Out of the box, it includes 97 languages. Some common examples:
en → English
zh → Chinese (a single code; langid.py does not distinguish Simplified from Traditional)
ja → Japanese
ko → Korean
fr → French
de → German
es → Spanish
hi → Hindi
ar → Arabic
ru → Russian
The full list also covers many other major world languages, including Afrikaans, Bengali, Greek, Hebrew, Italian, Portuguese, Swahili, Thai, Turkish, and Vietnamese.