VRAM-Optimized FineWeb Training Setup | Generated by AI

2026.05

It's OK to leave this running for hours. Summary of what the smoke run showed and what I changed:

Findings

Time budget for the real run (max_iters=19073, ~10B tokens seen)
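The ~10B-token figure is consistent with max_iters=19073 if each optimizer step consumes about 2**19 = 524,288 tokens (a common nanoGPT-style effective batch of batch_size × block_size × gradient-accumulation steps; the per-step token count here is an assumption, not read from the config):

```python
# Sanity-check the token budget: iterations x tokens-per-iteration.
# tokens_per_iter = 2**19 is an assumed effective batch size.
tokens_per_iter = 524_288
max_iters = 19_073
total_tokens = max_iters * tokens_per_iter
print(f"{total_tokens / 1e9:.2f}B tokens")  # 10.00B tokens
```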

Run this in another terminal

cd /mnt/data/nanoGPT
tmux new -s nano                 # so it survives disconnects
source venv/bin/activate

PYTORCH_ALLOC_CONF=expandable_segments:True \
python -u train.py config/train_fineweb_gpt3.py 2>&1 \
  | tee -a out-fineweb-gpt3/train.log

# detach:   Ctrl-b d
# reattach: tmux attach -t nano
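To check progress without reattaching to tmux, you can pull the last loss line out of train.log. A small sketch, assuming nanoGPT-style `iter N: loss X, ...` log lines (verify the pattern against your actual log output):

```python
# Parse the most recent "iter N: loss X" line from a training log.
# The log line format is an assumption (nanoGPT-style output).
import re

LINE_RE = re.compile(r"iter (\d+): loss ([\d.]+)")

def last_loss(log_text):
    """Return (iter_num, loss) from the last matching line, or None."""
    matches = LINE_RE.findall(log_text)
    if not matches:
        return None
    it, loss = matches[-1]
    return int(it), float(loss)

sample = "iter 0: loss 10.98, time 310ms\niter 10: loss 9.41, time 305ms\n"
print(last_loss(sample))  # (10, 9.41)
```

Run it against the real file with `last_loss(open("out-fineweb-gpt3/train.log").read())`.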

If compile=True errors on your torch/CUDA combo, just re-launch with --compile=False appended.

Resume after interruption: add --init_from=resume.
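Conceptually, resume just reloads the latest checkpoint from the output directory and continues from the saved iteration. A minimal stand-in sketch of that pattern, using plain pickle instead of torch.save so it is self-contained; the state keys (iter_num, best_val_loss) are illustrative assumptions, not nanoGPT's exact checkpoint schema:

```python
# Sketch of resume-style checkpointing: persist training state, then
# reload it on restart. Key names and pickle storage are assumptions.
import os
import pickle
import tempfile

def save_checkpoint(out_dir, state):
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "ckpt.pkl"), "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(out_dir):
    with open(os.path.join(out_dir, "ckpt.pkl"), "rb") as f:
        return pickle.load(f)

# usage: save once per eval interval, reload on --init_from=resume
out_dir = tempfile.mkdtemp()
save_checkpoint(out_dir, {"iter_num": 1200, "best_val_loss": 3.1})
state = load_checkpoint(out_dir)
print(state["iter_num"])  # 1200
```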

