Langid Language Codes and Confidence | Generated by AI
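In langid, the classify(text) function returns a tuple (lang, confidence) where:
- lang is a two-letter ISO 639-1 language code. langid.py does not append region suffixes: Chinese is always reported as plain zh. Codes such as zh-cn and zh-tw come from other libraries, for example langdetect.
- confidence is a float score for the predicted language. By default, langid.classify returns an unnormalized log-probability, which is usually a negative number rather than a value between 0 and 1. To get a probability between 0 and 1, create a LanguageIdentifier with norm_probs=True and call its classify method.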
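As a minimal sketch of working with the returned tuple, the snippet below builds an identifier with normalized probabilities (the `LanguageIdentifier.from_modelstring(model, norm_probs=True)` pattern shown in the langid.py README) and filters results by a cutoff. The helper names `make_normalized_identifier` and `accept`, and the 0.9 threshold, are hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple


def make_normalized_identifier():
    """Build a langid identifier whose scores are probabilities in [0, 1]."""
    # Imported lazily so that accept() below can be used without langid installed.
    from langid.langid import LanguageIdentifier, model

    return LanguageIdentifier.from_modelstring(model, norm_probs=True)


def accept(result: Tuple[str, float], min_prob: float = 0.9) -> Optional[str]:
    """Return the language code if its probability reaches min_prob, else None.

    Expects a (lang, probability) tuple from a *normalized* identifier;
    min_prob is an illustrative threshold, not a langid default.
    """
    lang, prob = result
    return lang if prob >= min_prob else None
```

With langid installed, `accept(make_normalized_identifier().classify("This is a test"))` would return the detected code only when the model is reasonably sure. Applying the same cutoff to the default `langid.classify` output would not be meaningful, since those scores are not probabilities.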
The set of possible values for lang comes from the languages supported by langid.py. Out of the box, it includes 97 languages. Some common examples:
- en → English
- zh → Chinese (one code for both Simplified and Traditional)
- ja → Japanese
- ko → Korean
- fr → French
- de → German
- es → Spanish
- hi → Hindi
- ar → Arabic
- ru → Russian
The full list covers most major world languages, including Afrikaans, Bengali, Greek, Hebrew, Italian, Portuguese, Swahili, Thai, Turkish, and Vietnamese.
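To inspect the supported codes directly, one option is to read them from the loaded model. The sketch below assumes the identifier exposes its codes through the `nb_classes` attribute, as it does in the langid.py source; treat that as an implementation detail rather than a documented API. `COMMON_NAMES`, `language_name`, and `supported_codes` are hypothetical names used for illustration.

```python
from typing import List

# Display names for a few of the codes listed above; deliberately not the full set.
COMMON_NAMES = {
    "en": "English",
    "zh": "Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "hi": "Hindi",
    "ar": "Arabic",
    "ru": "Russian",
}


def language_name(code: str) -> str:
    """Return a readable name for a code, falling back to the code itself."""
    return COMMON_NAMES.get(code, code)


def supported_codes() -> List[str]:
    """Return the sorted language codes of the bundled langid model."""
    # Imported lazily so language_name() works without langid installed.
    from langid.langid import LanguageIdentifier, model

    identifier = LanguageIdentifier.from_modelstring(model)
    # Assumption: nb_classes holds the model's language codes.
    return sorted(identifier.nb_classes)
```

If only a few languages are expected in the input, `langid.set_languages(["en", "zh", "ja"])` restricts classification to those candidates, which tends to help on short texts.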