Translation Instruction Layers and Improvements | Generated by AI
Here’s what the code reveals:
Current State: Two Competing Mechanisms
1. prompt front matter field (the “proper” way)
Used in ~18 files. Example from 2024-11-24-review-2024-en.md:
prompt: Do not translate specific items such as Yin, Yin Wang, Zhiwei, etc.
This gets picked up in scripts/translation/markdown_translate_client.py:27:
front_matter_prompt = front_matter_dict_copy.get("prompt", None)
Then injected into the LLM system prompt via translate_client.py:127-128:
if front_matter:
    head += f"{front_matter}\n"
This is reliable — the LLM sees it as an instruction, not content.
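To make the injection step concrete, here is a minimal sketch of how a per-file prompt could end up in the system prompt. The function name build_system_prompt and the base instruction text are assumptions for illustration only; the real logic lives in translate_client.py and may be structured differently.

```python
from typing import Optional


def build_system_prompt(base_instruction: str, front_matter_prompt: Optional[str]) -> str:
    """Hypothetical helper: append the per-file prompt to the system prompt header."""
    head = base_instruction.rstrip("\n") + "\n"
    # Mirrors the quoted snippet: only add the line when a prompt was provided.
    if front_matter_prompt:
        head += f"{front_matter_prompt}\n"
    return head


if __name__ == "__main__":
    print(build_system_prompt(
        "Translate the following markdown to Chinese.",
        "Do not translate specific items such as Yin, Yin Wang, Zhiwei, etc.",
    ))
```

Because the hint sits in the system prompt rather than in the article text, it cannot be translated or echoed back as content by accident.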
2. **AI Translation tips:** inline in body (the niyao way)
Only in 2026-05-28-niyao-en.md. This is just markdown body content — the LLM sees it as article text, not a structured instruction. It works because LLMs are smart enough to infer intent, but it’s fragile and non-standard.
3. Hardcoded global term maps (code-level)
In translate_client.py:56-68 and translate_lang.py:22-34:
zh_term_map = {
    "Zhiwei": "智维",
    "Zhiwei Li": "李智维",
    "Yin Wang": "王垠",
    "Fun Live": "趣直播",
    # ... 11 terms total
}
Plus validation in translate_validate_utils.py: check_prohibited_zh_terms() catches “志炜”, and check_yin_wang_mistranslation() catches “阴王” and “王寅”.
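The validators' actual signatures are not shown here, so the following is only a sketch of the general idea: scan the translated text for known-bad renderings and report the preferred form. The helper name find_prohibited_terms and the pairing of each bad term with a preferred term are assumptions inferred from the term map above.

```python
from typing import Dict, List, Tuple

# Assumed pairing of bad rendering -> preferred rendering (illustrative only).
PROHIBITED_ZH_TERMS: Dict[str, str] = {
    "志炜": "智维",
    "阴王": "王垠",
    "王寅": "王垠",
}


def find_prohibited_terms(translated_text: str) -> List[Tuple[str, str]]:
    """Return (bad, preferred) pairs for every prohibited term found in the text."""
    return [
        (bad, preferred)
        for bad, preferred in PROHIBITED_ZH_TERMS.items()
        if bad in translated_text
    ]


if __name__ == "__main__":
    for bad, preferred in find_prohibited_terms("我和志炜聊了很久"):
        print(f"found {bad}, expected {preferred}")
```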
The Problem
You have 3 layers doing the same thing differently:
- Global hardcoded maps in Python code (requires code change to update)
- Per-file prompt: field in front matter (freeform string, no structure)
- Inline **AI Translation tips:** in body (fragile, non-standard)
The niyao post’s approach is clever — it keeps name mappings co-located with the content. But it should be in prompt: front matter, not inline in the body.
What Could Be Better
Option A: Migrate inline tips to prompt: field (minimal change)
Move the niyao tips to front matter:
prompt: "AI Translation tips: Niyao (倪瑶), Zhiwei Li (李智维), Xiaohe Meditation (小荷冥想), Zhuoyebang (作业帮), Tang Qiao (唐巧), Wang Wei (王巍)"
This is already supported: markdown_translate_client.py reads prompt from the front matter, and translate_client.py injects it into the system prompt. Just delete the inline line from the body.
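If more posts ever pick up the inline pattern, the move could be scripted. The sketch below assumes the tips sit on a single body line starting with **AI Translation tips:** and that front matter is delimited by --- lines; migrate_inline_tips is a hypothetical helper, not part of the existing scripts.

```python
import re

# Matches a whole body line such as "**AI Translation tips:** Niyao (倪瑶), ..."
TIP_LINE = re.compile(r"^\*\*AI Translation tips:\*\*[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def migrate_inline_tips(markdown_text: str) -> str:
    """Move an inline tips line from the body into a prompt: front matter field."""
    if not markdown_text.startswith("---\n"):
        return markdown_text  # no front matter, leave the file untouched
    end = markdown_text.find("\n---\n", 4)
    if end == -1:
        return markdown_text
    front = markdown_text[4:end]
    body = markdown_text[end + 5:]
    match = TIP_LINE.search(body)
    # Skip files without tips, or files that already define prompt:.
    if match is None or re.search(r"^prompt:", front, re.MULTILINE):
        return markdown_text
    tips = match.group(1).replace('"', '\\"')
    body = body[:match.start()] + body[match.end():].lstrip("\n")
    front += f'\nprompt: "AI Translation tips: {tips}"'
    return f"---\n{front}\n---\n{body}"


if __name__ == "__main__":
    sample = "---\ntitle: Niyao\n---\n**AI Translation tips:** Niyao (倪瑶), Tang Qiao (唐巧)\n\nFirst paragraph.\n"
    print(migrate_inline_tips(sample))
```

The function is idempotent: once the inline line is gone, running it again returns the text unchanged.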
Option B: Structured translation_hints: front matter (cleaner)
Replace the freeform prompt: with a structured field:
translation_hints:
  zh:
    Niyao: 倪瑶
    Xiaohe Meditation: 小荷冥想
    Zhuoyebang: 作业帮
Then update translate_client.py to read this field and build term-specific rules automatically, similar to how the hardcoded zh_term_map works but per-file. This would:
- Be machine-parseable (no regex parsing of prose)
- Allow language-specific hints (different mappings for zh vs ja)
- Merge cleanly with the global map (per-file overrides global)
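A rough sketch of how the merge could work, assuming the front matter has already been parsed into a dict. GLOBAL_TERM_MAPS holds only the four terms quoted earlier, and merge_term_maps and build_term_rules are hypothetical names, not existing functions in translate_client.py.

```python
from typing import Any, Dict, Mapping

# Stand-in for the global map; only the terms quoted in this post are listed.
GLOBAL_TERM_MAPS: Dict[str, Dict[str, str]] = {
    "zh": {
        "Zhiwei": "智维",
        "Zhiwei Li": "李智维",
        "Yin Wang": "王垠",
        "Fun Live": "趣直播",
    },
}


def merge_term_maps(front_matter: Mapping[str, Any], target_lang: str) -> Dict[str, str]:
    """Combine global terms with per-file translation_hints; per-file entries win."""
    merged = dict(GLOBAL_TERM_MAPS.get(target_lang, {}))
    hints = front_matter.get("translation_hints") or {}
    merged.update(hints.get(target_lang) or {})
    return merged


def build_term_rules(term_map: Mapping[str, str]) -> str:
    """Render the merged map as one instruction line per term."""
    return "\n".join(f'- Translate "{src}" as "{dst}".' for src, dst in term_map.items())


if __name__ == "__main__":
    front_matter = {"translation_hints": {"zh": {"Niyao": "倪瑶", "Zhuoyebang": "作业帮"}}}
    print(build_term_rules(merge_term_maps(front_matter, "zh")))
```

Languages without hints fall back to the global map alone, and a file with no translation_hints: field behaves exactly as it does today.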
Option C: Move global maps to a YAML config file (best long-term)
Extract the hardcoded zh_term_map from code into scripts/translation/term_map.yaml:
zh:
  Zhiwei: 智维
  "Zhiwei Li": 李智维
  "Yin Wang": 王垠
  # ...
Per-file translation_hints: in front matter overrides entries from this global map. No code change needed to add new terms.
My Recommendation
Do A now (30 seconds), then C when you have 15 minutes:
- Now: Move the niyao inline tips to prompt: front matter, delete the inline line. Already works with existing code.
- Soon: Extract zh_term_map to a YAML config, update get_language_specific_preamble() to load from it. Then per-file translation_hints: can extend/override without code changes.
Want me to do step 1 now?