Fineweb GPT-3 Training Setup
Loader works: 1407 train shards + 1 val shard, dtype uint16, and the first token is 50256 (the GPT-2 EOT token), as expected.
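As a spot check, a minimal sanity test along these lines (the shard path and filename are assumptions; nanoGPT-style `.bin` shards are flat uint16 token arrays):

```python
import numpy as np

# hypothetical shard path -- point this at wherever prepare.py wrote the shards
shard = np.memmap("data/fineweb/train_000000.bin", dtype=np.uint16, mode="r")

print(f"{len(shard):,} tokens in shard")
# 50256 is the GPT-2 tokenizer's <|endoftext|> token
assert shard[0] == 50256, "expected EOT (50256) as the first token"
```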
4) What to do — commands to run
```bash
cd /mnt/data/nanoGPT
source venv/bin/activate

# (optional) confirm the GPU is free
nvidia-smi

# dry run: 5 iters at a tiny batch to verify nothing crashes before committing
python train.py config/train_fineweb_gpt3.py \
  --compile=False --batch_size=2 --gradient_accumulation_steps=1 \
  --max_iters=5 --eval_interval=10 --log_interval=1

# real run (single 4070, 12 GB). Use a tmux/screen session; this is many hours.
tmux new -s nano
python train.py config/train_fineweb_gpt3.py 2>&1 | tee out-fineweb-gpt3/train.log
# detach: Ctrl-b d | reattach: tmux attach -t nano
```
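For context, here is a plausible reconstruction of `config/train_fineweb_gpt3.py`, back-solved from the numbers in the notes below (524,288 tokens/step = 8 * 64 * 1024). Every value is an inference, not a copy of the real file, and the model size is assumed to be GPT-2-small scale since an actual GPT-3 would never fit on a 12 GB card:

```python
# sketch of config/train_fineweb_gpt3.py -- reconstructed, not the actual file
out_dir = 'out-fineweb-gpt3'

# data: shard_dir is the key added by the train.py patch (name assumed)
dataset = 'fineweb'
shard_dir = 'data/fineweb'

# 8 * 64 * 1024 = 524,288 tokens per optimizer step
batch_size = 8
gradient_accumulation_steps = 64
block_size = 1024

# assumed GPT-2-small sized model (~124M params)
n_layer = 12
n_head = 12
n_embd = 768

# 19,073 steps * 524,288 tokens/step ~= 10B tokens seen
max_iters = 19073
lr_decay_iters = 19073
```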
Notes / knobs to tune later
- `max_iters=19073` ≈ 10B tokens seen at 524k tokens/step. To actually use the full ~140B you tokenized, bump it to ~267000 (and set `lr_decay_iters` to match). That's a multi-day run on a 4070.
- If the dry run OOMs, drop `batch_size` to 4 and double `gradient_accumulation_steps` to 128 (keeps tokens/step constant).
- If `compile=True` errors on your PyTorch/CUDA combo, override with `--compile=False`.
- Resume after interruption: `python train.py config/train_fineweb_gpt3.py --init_from=resume`.
- The patch I made to `train.py` is backward-compatible: old configs without `shard_dir` still read `train.bin`/`val.bin` (a sketch of that fallback follows below).
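A minimal sketch of what that backward-compatible data path could look like (the actual patch isn't shown here; the shard naming pattern and the `shard_dir` handling are assumptions):

```python
import glob
import os
import random

import numpy as np

def open_split(data_dir, split, shard_dir=None):
    """Return a uint16 memmap for `split` ('train' or 'val').

    If shard_dir is set, sample one shard at random; otherwise fall back to
    the classic single-file train.bin / val.bin layout.
    """
    if shard_dir:
        pattern = os.path.join(shard_dir, f"{split}_*.bin")  # assumed naming
        shards = glob.glob(pattern)
        assert shards, f"no shards match {pattern}"
        path = random.choice(shards)
    else:
        path = os.path.join(data_dir, f"{split}.bin")
    # re-memmap on every call, as nanoGPT's get_batch does, to avoid the
    # memory leak that long-lived memmaps can cause over a multi-day run
    return np.memmap(path, dtype=np.uint16, mode="r")
```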