Resolving VRAM Constraints for Gemma | Generated by AI

2026.04

Here’s the full summary:


What we did:

  1. Downloaded gemma-4-26B-A4B-it-UD-IQ3_S.gguf (10.4GB) from HuggingFace via proxychains
  2. Rebuilt llama.cpp from source (latest commit b8-94ca829b6)
  3. Started llama-server with the Gemma 4 26B model
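Once the steps above finish, the quickest sanity check is llama-server's built-in `/health` endpoint. A minimal sketch, assuming the server is listening on port 8081 as in the running config (the helper name is ours, not part of llama.cpp):

```python
import urllib.error
import urllib.request


def server_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if llama-server answers 200 on its /health endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not started, still loading the model, or unreachable.
        return False


if __name__ == "__main__":
    # Port matches the --port 8081 flag in the running config below.
    print(server_ready("http://127.0.0.1:8081"))
```

While the model is still loading, llama-server answers `/health` with a non-200 status, so this also distinguishes "loading" from "ready".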

Key findings:

Running config:

-m ./models/gemma-4-26B-A4B-it-UD-IQ3_S.gguf
-ngl 25
-c 2048
--host 0.0.0.0
--port 8081
--path ./tools/server/public

Performance:

Firewall issue:

Current status:

