Segfault troubleshooting for gemma4 MoE model | Generated by AI


Question: Should I rebuild llama.cpp first, or attempt a run with Flash Attention disabled using the --no-flash-attn flag to troubleshoot the consistent segfaults and OOM issues with the gemma4 MoE model?

Answer:

Given that the segfault lands at the same memory offset (+231000) every time, this points to a deterministic code-path failure rather than a random hardware glitch. Here is the recommended path forward:

1. Try --no-flash-attn first

Running with Flash Attention disabled is the fastest way to isolate the variable. Since you suspect the Sliding Window Attention (SWA) and MoE indexing logic, bypassing the FA kernels will tell you immediately if the issue lies in the optimized attention implementation or the underlying KV cache management.
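A minimal dry-run sketch of the reduced invocation is below. The binary name and model path are placeholders, and the exact flag spelling varies between llama.cpp builds (some accept --no-flash-attn, others --flash-attn off), so confirm against your build's --help output first:

```shell
# Sketch: assemble the troubleshooting command and print it (dry run only).
# NOTE: binary name and model path are hypothetical; the flash-attention
# flag spelling depends on your llama.cpp version.
MODEL="${MODEL:-./gemma4-moe.gguf}"   # hypothetical model path
ARGS="-m $MODEL --no-flash-attn"
CMD="./llama-cli $ARGS"
echo "Would run: $CMD"
```

If the crash disappears with this flag, the FA kernels (or their SWA handling) are the prime suspect; if it persists, look at the KV cache management shared by both paths.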

2. The Rebuild (with a twist)

If you decide to rebuild, don’t just git pull and rebuild incrementally; clear the build directory first so stale object files aren’t linked against updated headers. And since the segfault is consistent on CPU 31, it is worth checking whether your current build used aggressive compiler optimizations (like -march=native) that produce unstable code for that specific core’s instruction set.
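Before rebuilding, you can inspect the existing CMake cache for native-ISA tuning. A sketch, assuming a build directory named "build" (the GGML_NATIVE option name in the comments may differ between llama.cpp versions):

```shell
# Check the existing CMake cache for native-ISA tuning (path hypothetical).
BUILD_DIR="${BUILD_DIR:-build}"
if [ -f "$BUILD_DIR/CMakeCache.txt" ]; then
  if grep -qi 'march=native' "$BUILD_DIR/CMakeCache.txt"; then
    msg="native ISA tuning detected; consider rebuilding without it"
  else
    msg="no march=native found in cache"
  fi
else
  msg="no CMakeCache.txt under $BUILD_DIR"
fi
echo "$msg"
# A clean rebuild without native tuning might look like (option name varies):
#   rm -rf "$BUILD_DIR"
#   cmake -B "$BUILD_DIR" -DGGML_NATIVE=OFF
#   cmake --build "$BUILD_DIR" -j
```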

3. Monitoring the V-cache Padding

The fact that your V-cache is padding to 2048 without FA is a strong hint. If the segfault happens during the transition between SWA and non-SWA layers, the “padding” logic might be calculating an out-of-bounds pointer. Keep an eye on the syslog for any “address alignment” or “illegal instruction” errors that might accompany the next crash.
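To catch those kernel messages the next time it crashes, a grep over the kernel log can be left running. The journalctl invocation and the sample syslog line below are illustrative, not taken from your system:

```shell
# Kernel-side segfault reports look roughly like the fabricated sample below.
PATTERN='segfault at|general protection|alignment|illegal'
# Live monitoring (requires journalctl; run this separately):
#   journalctl -k -f | grep --line-buffered -E "$PATTERN"
# Demo of the pattern against a representative (fabricated) syslog line:
sample='kernel: llama-cli[3121]: segfault at 38666 ip 000055c1 sp 00007ffd error 4'
echo "$sample" | grep -E "$PATTERN"
```

The "at" address in a real segfault line, compared against the mapped regions of the process, would tell you whether the bad pointer is just past a valid KV cache allocation (an off-by-padding bug) or wildly out of range (a corrupted index).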


Next Step: Would you like me to help you draft a specific gdb command to trace exactly what is happening at that +231000 offset during the next run?
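As a concrete starting point, a gdb command file along these lines would stop at the fault and dump the context. gdb halts on SIGSEGV by default, so the commands after "run" execute at the crash site; the binary name, model path, and flags in the usage comment are placeholders:

```shell
# Write a small gdb command file that dumps state when the run crashes.
cat > segv.gdb <<'EOF'
set pagination off
run
bt full
info registers
info proc mappings
EOF
# Then attach it to the crashing run (paths and flags hypothetical):
#   gdb -x segv.gdb --args ./llama-cli -m ./gemma4-moe.gguf --no-flash-attn
echo "wrote segv.gdb"
```

The backtrace plus "info proc mappings" lets you see which mapped region the faulting address at +231000 falls near, which is the key clue for an out-of-bounds padding calculation.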

