Korean Misclassified as Chinese Validator Fix
Got it: in this debug output:

```
Debug: raw langdetect output: [('ko', 0.9999995211930662)]
...
Detected languages: [('ko', 0.9999995211930662)]
```
The code `ko` is the ISO 639-1 language code for Korean. 🇰🇷

So `langdetect` is saying:

- It thinks the text is Korean (`ko`)
- With probability ≈ 0.9999995 (essentially 100%)
That explains the error: your validator expected `zh-tw` (Traditional Chinese), but the detector decided the input looks like Korean instead.
👉 Quick mapping reminder from `langdetect` / ISO 639-1:

- `zh` → Chinese (unspecified)
- `zh-cn` → Simplified Chinese (China)
- `zh-tw` (or `hant` in your mapping) → Traditional Chinese (Taiwan)
- `ja` → Japanese
- `ko` → Korean
- `ne` → Nepali
- `hi` → Hindi
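The mapping above can be expressed as a small normalization table. This is a sketch under assumptions: the `hant` label comes from your own mapping (per the list above), and the function name `normalize_code` is hypothetical, not part of `langdetect`.

```python
# Hypothetical normalization table: langdetect code -> validator label.
# Only the zh-tw -> hant mapping is taken from your setup; everything
# else passes through unchanged.
LANGDETECT_TO_VALIDATOR = {
    "zh-tw": "hant",  # Traditional Chinese, per your mapping
}

def normalize_code(code: str) -> str:
    """Map a langdetect code to the validator's label, defaulting to itself."""
    code = code.lower()
    return LANGDETECT_TO_VALIDATOR.get(code, code)
```

With this in place, comparisons happen in one namespace: `normalize_code("zh-tw")` yields `"hant"`, while `"ko"`, `"ja"`, etc. pass through untouched.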
Do you want me to suggest how you could harden your validator so it doesn't fail when `langdetect` misclassifies (e.g. allowing a fallback if the target script characters are clearly Chinese but the code comes out as `ko`)?
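One way that fallback could look, as a minimal stdlib-only sketch: count code points by Unicode block and override the detector when the text is overwhelmingly Han. The function names (`dominant_script`, `accept_language`) and the acceptance policy are assumptions for illustration, not part of `langdetect`.

```python
def dominant_script(text: str) -> str:
    """Classify text as 'hangul', 'han', or 'other' by counting code points."""
    hangul = han = 0
    for ch in text:
        cp = ord(ch)
        # Hangul syllables and jamo
        if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:
            hangul += 1
        # CJK Unified Ideographs (basic block + Extension A)
        elif 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
            han += 1
    if han > hangul:
        return "han"
    if hangul > han:
        return "hangul"
    return "other"

def accept_language(detected: str, expected: str, text: str) -> bool:
    """Accept langdetect's verdict, but fall back to the script check
    when the expected code is Chinese and the text is clearly Han."""
    if detected == expected:
        return True
    if expected in {"zh", "zh-cn", "zh-tw"} and dominant_script(text) == "han":
        return True
    return False
```

So `accept_language("ko", "zh-tw", text)` would still pass when `text` is actually written in Han characters, shielding the validator from this particular misclassification.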