Fineweb GPT-3 Training Setup | Generated by AI

2026.05

Loader works: 1407 train shards + 1 val shard, uint16 tokens, and the first token is 50256 (EOT, GPT-2's <|endoftext|>) as expected.
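The same check can be repeated by hand on any single shard. The sketch below is a minimal, standard-library version. It assumes each shard is a flat dump of little-endian uint16 token ids, optionally preceded by a fixed-size header. The function name `check_shard`, the `header_bytes` parameter, and the example path are hypothetical, not taken from the loader itself.

```python
import os
import struct

EOT = 50256  # GPT-2 <|endoftext|> token id


def check_shard(path, header_bytes=0):
    """Return (num_tokens, first_token) for one shard file.

    Assumption: the file holds little-endian uint16 token ids after an
    optional header of `header_bytes` bytes (0 means a raw token dump).
    """
    payload_bytes = os.path.getsize(path) - header_bytes
    with open(path, "rb") as f:
        f.seek(header_bytes)
        (first_token,) = struct.unpack("<H", f.read(2))
    return payload_bytes // 2, first_token


# Example usage (hypothetical path):
# n, first = check_shard("path/to/shard.bin")
# print(n, first, first == EOT)
```

If the shards do carry a header, pass its size in bytes as `header_bytes`; a first token other than 50256 is a hint that the offset or the dtype assumption is wrong.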

4) What to do — commands to run

cd /mnt/data/nanoGPT
source venv/bin/activate

# (optional) confirm the GPU is free
nvidia-smi

# dry run: 5 iters at small batch to verify nothing crashes before committing
python train.py config/train_fineweb_gpt3.py \
    --compile=False --batch_size=2 --gradient_accumulation_steps=1 \
    --max_iters=5 --eval_interval=10 --log_interval=1

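# real run (single 4070, 12GB). Use a tmux/screen session, since this runs for many hours.
tmux new -s nano
# inside the tmux session: re-run the cd/activate lines above if the venv is not active there
mkdir -p out-fineweb-gpt3   # tee opens the log file immediately, so the directory must already exist
python train.py config/train_fineweb_gpt3.py 2>&1 | tee out-fineweb-gpt3/train.log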
# detach: Ctrl-b d   |   reattach: tmux attach -t nano
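
While the run is going, the log written by tee can be summarized from another shell. This is a rough sketch that assumes train.py prints lines in the upstream nanoGPT format (`iter N: loss X, time ...` for training steps and `step N: train loss X, val loss Y` for evals); if this setup logs differently, the regexes need adjusting. The name `parse_log` is hypothetical.

```python
import re

# Assumed log line formats (upstream nanoGPT style):
#   iter 10: loss 9.1234, time 850.12ms, mfu 12.34%
#   step 0: train loss 10.9876, val loss 10.9901
ITER_RE = re.compile(r"^iter (\d+): loss ([\d.]+)")
STEP_RE = re.compile(r"^step (\d+): train loss ([\d.]+), val loss ([\d.]+)")


def parse_log(lines):
    """Split log lines into training points and eval points.

    Returns (train, evals) where train is a list of (iter, loss) and
    evals is a list of (iter, train_loss, val_loss). Lines that match
    neither pattern (including a non-numeric loss such as nan) are skipped.
    """
    train, evals = [], []
    for line in lines:
        m = ITER_RE.match(line)
        if m:
            train.append((int(m.group(1)), float(m.group(2))))
            continue
        m = STEP_RE.match(line)
        if m:
            evals.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return train, evals


# Example usage:
# with open("out-fineweb-gpt3/train.log") as f:
#     train, evals = parse_log(f)
# print(train[-1], evals[-1] if evals else None)
```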

Notes / knobs to tune later

