Troubleshooting Qwen 3.5 Server Setup | Generated by AI


Got it: you want a comprehensive review of the whole process. Here it is:


🦞 Full Review: Getting Qwen 3.5 9B Running on This Server

The Goal

Download and run aravhawk/qwen3.5-opus-4.6:9b locally on this server (RTX 4070, 62GB RAM).

What Happened

1. Model Download (~90 min)

2. Ollama Couldn’t Load It

3. Pivoted to llama.cpp

4. First Launch — CPU Only

5. Driver Upgrade (nvidia-driver-535 → 570)
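Since Ollama refused to load the file while llama.cpp could, one quick sanity check is confirming the downloaded blob really is a valid GGUF file and not a truncated or corrupt download. A minimal sketch (the helper names are illustrative, but the magic bytes and little-endian version field are part of the GGUF format):

```python
import struct

def is_gguf(path: str) -> bool:
    """Check the 4-byte magic b"GGUF" at the start of the file."""
    with open(path, "rb") as f:
        magic = f.read(4)
    return magic == b"GGUF"

def gguf_version(path: str) -> int:
    """Read the little-endian uint32 version that follows the magic."""
    with open(path, "rb") as f:
        f.read(4)  # skip the magic bytes
        return struct.unpack("<I", f.read(4))[0]
```

Run it against the blob under /usr/share/ollama/.ollama/models/blobs/ before pointing llama-server at it; a wrong magic means the download itself is bad, not the loader.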

Current State

Component       Status
--------------  ----------------------------------------
Model file      ✅ Downloaded (5.23 GB GGUF)
llama.cpp       ✅ Updated & rebuilt (latest master)
llama-server    ⏸️ Killed before driver upgrade
NVIDIA driver   ✅ Installed 570.211.01 (needs reboot)
GPU offloading  🔜 After reboot
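After rebooting, it's worth confirming that the new driver is actually active and that the rebuilt llama.cpp can see the GPU before launching the server. A sketch (exact version strings will vary; `--list-devices` is available in recent llama.cpp builds):

```shell
# Confirm the kernel picked up the 570 driver after reboot
nvidia-smi --query-gpu=driver_version,name,memory.total --format=csv

# Check that the rebuilt binary was compiled with CUDA and can see the 4070
~/projects/llama.cpp/build/bin/llama-server --list-devices
```

If nvidia-smi still reports 535, the reboot didn't take the new module and GPU offload will silently stay on CPU.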

After Reboot

Start the server with:

LD_LIBRARY_PATH=~/projects/llama.cpp/build/bin \
  ~/projects/llama.cpp/build/bin/llama-server \
  -m /usr/share/ollama/.ollama/models/blobs/sha256-d0ecd80b0e45b0d9e49c8cd1527b7f7d52d8d3bde2c569ab36aac59bb78f53f7 \
  -ngl 99 -c 8192 --host 0.0.0.0 --port 11435

Expected: ~60-100+ tok/s with all 32 layers on the RTX 4070.
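Once the server is up, it exposes an OpenAI-compatible HTTP API, so a smoke test needs nothing beyond the standard library. A minimal sketch against the host/port from the launch command above (the function names are illustrative):

```python
import json
import urllib.request

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Payload for llama-server's OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str, host: str = "http://localhost:11435") -> str:
    """POST a chat request and return the first choice's text."""
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

A single `chat("Say hello in one word.")` round trip also gives a first feel for the tok/s number, since llama-server logs timing per request.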

Lessons Learned

- Ollama's bundled runtime can lag behind new model architectures; building llama.cpp from latest master is a workable fallback for the same GGUF blob.
- If llama.cpp can't initialize CUDA (here, because driver 535 was too old), it falls back to CPU-only inference, so check the startup log for how many layers were actually offloaded.
- A driver upgrade doesn't take effect until reboot; GPU offloading had to wait for it.

Now reboot and let’s see it fly 🦞

