Fixing OpenClaw's Gemma Server Issues | Generated by AI

2026.04

What happened

OpenClaw has a llama-server provider configured in openclaw.json pointing to http://192.168.1.36:8081/v1 with the model gemma-4-26B-A4B-it-UD-IQ3_S.gguf. When you sent “hi” via OpenClaw, it failed because of two issues:

Issue 1: llama-server wasn’t running

The process simply wasn’t started. The old ~/llama.sh script launches a different model (DeepSeek-R1-Distill-Llama-8B) on port 8000, not Gemma on 8081. So the config had been updated to point at Gemma, but nobody had started the matching server.
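Before blaming the config, it helps to confirm what is actually listening. A minimal probe, assuming llama-server’s standard /health endpoint and the host and ports above:

```shell
#!/bin/sh
# Probe which llama-server (if any) is listening where.
probe() {
  # -f fails on HTTP errors; --max-time bounds the wait if nothing answers
  if curl -fsS --max-time 2 "http://$1/health" >/dev/null 2>&1; then
    echo "$1: up"
  else
    echo "$1: down"
  fi
}
probe 192.168.1.36:8081   # where openclaw.json points (Gemma)
probe 192.168.1.36:8000   # where the old ~/llama.sh starts DeepSeek
```

If 8081 reports down while 8000 reports up, the old script is running but the server OpenClaw expects is not.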

Issue 2: Gemma 4’s thinking mode

Even after llama-server was started, Gemma 4 has a built-in “thinking” mode. By default (--reasoning-format auto), llama-server detects this from the model’s chat template and splits the output into two fields: reasoning_content (the thinking tokens) and content (the final answer).

With the default auto setting, the content field came back empty and all of the text went to reasoning_content. OpenClaw reads content, so the model appeared to return nothing.

With --reasoning-format none, the thinking tokens leaked into content as raw <|channel>thought markup, so the output was still broken.
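The two failure modes can be sketched against sample response bodies. These JSON snippets are illustrative assumptions, not captured server logs, and the markup placeholder is hypothetical:

```shell
#!/bin/sh
# With the default reasoning split, the answer lands in reasoning_content
# and content comes back empty; with the split disabled badly, raw thinking
# markup leaks into content instead.
auto_resp='{"message":{"content":"","reasoning_content":"Hi! How can I help?"}}'
none_resp='{"message":{"content":"<thought>...</thought>Hi!"}}'

extract() {  # print message.content from a JSON body
  python3 -c 'import json,sys; print(json.loads(sys.argv[1])["message"]["content"])' "$1"
}
echo "auto: [$(extract "$auto_resp")]"   # prints "auto: []" -- OpenClaw sees nothing
echo "none: [$(extract "$none_resp")]"   # markup leaks through verbatim
```

Either way, a client that only reads message.content gets garbage or nothing.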

The fix

Started llama-server with --reasoning-budget 0, which disables thinking entirely:

nohup /home/lzw/projects/llama.cpp/build/bin/llama-server \
  -m /home/lzw/projects/llama.cpp/models/gemma-4-26B-A4B-it-UD-IQ3_S.gguf \
  --port 8081 -c 32768 --reasoning-budget 0 \
  >> /home/lzw/llama-server.log 2>&1 &

This produces clean responses in content with no thinking artifacts.
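To confirm the fix end to end, a one-shot request through the same OpenAI-compatible endpoint should now return a non-empty content field. This smoke test assumes the host, port, and model name from the command above and is only meaningful on a machine that can reach that server:

```shell
#!/bin/sh
# Send "hi" the way OpenClaw would and show the raw response.
curl -sS --max-time 5 http://192.168.1.36:8081/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gemma-4-26B-A4B-it-UD-IQ3_S.gguf","messages":[{"role":"user","content":"hi"}]}' \
  || echo "server not reachable"
```

A healthy response carries the greeting in choices[0].message.content, with no reasoning_content field and no thinking markup.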

What to watch out for

The old ~/llama.sh script still launches DeepSeek-R1-Distill-Llama-8B on port 8000; it does not start the Gemma server that openclaw.json expects on 8081, so running it will not fix OpenClaw.

The nohup’d llama-server will not survive a reboot. After the machine comes back up it has to be restarted by hand (or wrapped in a service) before OpenClaw will work again.
