RTX 4070 Sliding Window Attention Incompatibility | Generated by AI
Question: Why is the training run failing with the message about window_pattern="SSSL" and RTX 4070 not supporting sliding window attention?
Answer:
The issue arises from a mismatch between the attention pattern used in your script and what your GPU supports. Here’s a breakdown:
- Attention Patterns:
  - `L` = full context attention (standard attention over the entire sequence).
  - `S` = sliding window attention (a memory-efficient approximation that only looks at a small “window” of tokens).
- Hardware Limitation:
  - On your RTX 4070, the SDPA (Scaled Dot-Product Attention) implementation in the framework does not support sliding window attention.
  - That’s why using `window_pattern="SSSL"` (which contains sliding windows, `S`) causes the error.
- Previous Run Worked:
  - The previous run used `"L"` (full attention), which your GPU supports, so it worked fine.
- Solution:
  - You need to explicitly set `--window-pattern L` in your script so that the model uses full context attention, which is compatible with your RTX 4070.
  - Essentially, the script defaulted to a sliding window configuration, which your GPU cannot handle.
- Next Steps:
  - Kill the current run (it’s only at step 10k and hasn’t done any real training yet).
  - Edit your `run_rtx4070_chinchilla.sh` script to add `--window-pattern L` in the command that launches training.
  - Restart the run.
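The fix described above boils down to one rule: if the pattern contains `S` and the backend cannot run sliding windows, fall back to `L`. Here is a minimal Python sketch of that rule. The helper name `resolve_window_pattern` and the `supports_sliding_window` flag are hypothetical, chosen for illustration; they are not taken from your training script, which may handle this differently.

```python
def resolve_window_pattern(pattern: str, supports_sliding_window: bool) -> str:
    """Pick a window pattern the attention backend can actually run.

    pattern: string of 'S' (sliding window) and 'L' (full context) letters.
    supports_sliding_window: assumed to be determined elsewhere, e.g. by
        checking which attention backend is available on the GPU.
    """
    pattern = pattern.upper()
    if not pattern or set(pattern) - {"S", "L"}:
        raise ValueError(f"window pattern must contain only 'S' and 'L', got {pattern!r}")
    if "S" in pattern and not supports_sliding_window:
        # Same effect as passing --window-pattern L by hand.
        return "L"
    return pattern


# Hypothetical usage: a backend without sliding window support.
print(resolve_window_pattern("SSSL", supports_sliding_window=False))  # L
print(resolve_window_pattern("SSSL", supports_sliding_window=True))   # SSSL
```

Setting the flag by hand in the launch script is still the simpler option; the sketch only shows what that choice means.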
This is not a bug in your code, just a compatibility issue between the attention pattern and the attention implementation available on your GPU.
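To make the difference between the two pattern letters concrete, the sketch below builds the attention masks for `L` and `S` in plain Python. It assumes causal attention (each token sees only itself and earlier tokens) and uses a hypothetical window size of 2; the actual window size in your model is not stated here and is likely much larger.

```python
def causal_mask(seq_len, window=None):
    """Return mask[i][j] == True when query token i may attend to key token j.

    window=None  -> 'L': full causal attention over all earlier tokens.
    window=w     -> 'S': only the most recent w tokens (including token i).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look ahead
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


def show(mask):
    """Render a mask as text: 'x' = attended, '.' = masked out."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


print("L (full context):")
print(show(causal_mask(5)))
print("S (sliding window, hypothetical window=2):")
print(show(causal_mask(5, window=2)))
```

In the `L` mask, every row fills up to the diagonal. In the `S` mask, each row keeps only a narrow band next to the diagonal, which is where the memory savings come from and also what the attention backend has to support explicitly.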
If you want, I can show you the exact snippet to patch in your script so it runs without errors.