Misapplied Chat Template Causes Errors | Generated by AI


🦞 Here’s the full breakdown:

Root Cause

The GGUF model file DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf has a Qwen3 chat template baked into its metadata — because the base architecture is Qwen3 (note the name: “DeepSeek-R1-0528-Qwen3-8B”). But I started the server with --chat-template deepseek, which forced the DeepSeek LLM template format instead.

What went wrong mechanically

Chat templates define how messages are serialized into the prompt string the model actually sees. For example, Qwen3 wraps each turn in ChatML-style <|im_start|>/<|im_end|> markers, while the DeepSeek LLM template uses plain “User:” / “Assistant:” turns.
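To make the difference concrete, here is a minimal sketch (with hypothetical helper names, not llama.cpp code) of how the same messages serialize under the two formats — the Qwen3 one is the ChatML style Qwen models use, and the DeepSeek one approximates the DeepSeek LLM template:

```python
# Hypothetical renderers illustrating why the two templates are incompatible.

messages = [{"role": "user", "content": "Hello"}]

def render_qwen3(msgs):
    # ChatML style: each turn wrapped in <|im_start|>/<|im_end|> markers
    out = ""
    for m in msgs:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    out += "<|im_start|>assistant\n"  # generation prompt
    return out

def render_deepseek(msgs):
    # DeepSeek LLM style: plain "User:" / "Assistant:" turns
    out = ""
    for m in msgs:
        role = "User" if m["role"] == "user" else "Assistant"
        out += f"{role}: {m['content']}\n\n"
    out += "Assistant:"
    return out

print(render_qwen3(messages))
print(render_deepseek(messages))
```

A Qwen3 model was trained to expect the first format; feeding it the second means its special tokens never appear in the prompt, so the input carries almost no usable structure.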

When the DeepSeek template was applied to a Qwen3 model:

  1. The prompt tokens dropped to 2 (essentially garbage) instead of 14 (properly formatted)
  2. The model received an almost empty, nonsensical prompt
  3. It fell back to whatever patterns it had memorized — which is why it started talking about “DeepSeek-Coder-7B-Instruct” completely unprompted
  4. The web UI’s input wasn’t being formatted at all, so every message was garbled

Why I made the mistake

The model name starts with “DeepSeek-R1” so I reflexively used --chat-template deepseek. But the architecture is Qwen3 (a Qwen model fine-tuned by DeepSeek), not a native DeepSeek model. The name is misleading — the template should match the architecture, not the fine-tuner.
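One way to avoid guessing from the name is to inspect the GGUF metadata directly. A sketch using the dump script from the `gguf` Python package (assuming it is installed; the embedded template is stored under the `tokenizer.chat_template` key):

```shell
# Install the gguf utilities, then dump the model's metadata and look
# for the embedded chat template. Exact output formatting may vary by
# gguf version, so treat the grep as a rough filter.
pip install gguf
gguf-dump DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf | grep -A 2 chat_template
```

For this file the dumped template shows ChatML markers, confirming the Qwen3 lineage regardless of the DeepSeek branding.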

The fix

Removed the --chat-template deepseek flag entirely. When no explicit template is specified, llama-server reads the chat template embedded in the GGUF file itself — which is the correct Qwen3 template. This is always the safest approach unless you have a reason to override.
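The corrected launch, as a sketch (the model path and port are assumptions for illustration):

```shell
# No --chat-template flag: llama-server falls back to the template
# embedded in the GGUF metadata, which is the correct Qwen3 template.
llama-server \
  -m DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf \
  --port 8080
```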

Lesson

Match the chat template to the model's architecture, not its name. A fine-tune keeps its base model's template — “DeepSeek-R1-0528-Qwen3-8B” needs the Qwen3 template despite the DeepSeek branding. Unless you have a specific reason to override it, leave --chat-template unset and let llama-server use the template embedded in the GGUF.
