Resume Training at Lower LR | Generated by AI

2026.06

Here’s the corrected command matching your original style:

cd /mnt/data/nanochat && PYTORCH_ALLOC_CONF=expandable_segments:True .venv/bin/python -m scripts.base_train --run=rtx4070-d12-300k --depth=12 --aspect-ratio=64 --head-dim=128 --max-seq-len=2048 --window-pattern=L --device-batch-size=8 --total-batch-size=65536 --num-iterations=300000 --resume-from-step=200000 --warmdown-ratio=0.0 --final-lr-frac=0.05 --embedding-lr=0.018 --unembedding-lr=0.00048 --matrix-lr=0.0012 --scalar-lr=0.03 --weight-decay=0.28 --warmup-steps=0 --eval-every=5000 --core-metric-every=10000 --sample-every=10000 --save-every=10000 --tracker=none
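The flags `--warmup-steps=0`, `--warmdown-ratio=0.0` and `--final-lr-frac=0.05` together decide how the learning rate evolves over steps 200,000 to 300,000. The sketch below assumes the script uses a warmup, then constant, then linear-warmdown multiplier applied to each base LR (for example `--matrix-lr=0.0012`). That schedule shape and the function name `lr_multiplier` are assumptions for illustration, not taken from nanochat, so check `scripts/base_train.py` in your checkout. Under this assumption, a warmdown ratio of 0.0 keeps the multiplier at 1.0 for every remaining step, which means `--final-lr-frac` would never take effect.

```python
def lr_multiplier(step: int, num_iterations: int, warmup_steps: int,
                  warmdown_ratio: float, final_lr_frac: float) -> float:
    """Hypothetical warmup -> constant -> linear-warmdown LR multiplier.

    Assumed schedule shape; not copied from nanochat's base_train.py.
    """
    warmdown_steps = round(warmdown_ratio * num_iterations)
    # Linear warmup (skipped entirely when warmup_steps == 0)
    if warmup_steps > 0 and step < warmup_steps:
        return (step + 1) / warmup_steps
    # Constant phase; with no warmdown this covers every step
    if warmdown_steps == 0 or step <= num_iterations - warmdown_steps:
        return 1.0
    # Linear decay from 1.0 down to final_lr_frac over the warmdown window
    progress = (num_iterations - step) / warmdown_steps  # 1.0 -> 0.0
    return progress + (1.0 - progress) * final_lr_frac


# Values from the resume command
resume_cfg = dict(num_iterations=300_000, warmup_steps=0,
                  warmdown_ratio=0.0, final_lr_frac=0.05)
for step in (200_000, 250_000, 299_999):
    mult = lr_multiplier(step, **resume_cfg)
    print(step, mult, "matrix lr:", 0.0012 * mult)

# Hypothetical contrast: the same run with a 20% warmdown
alt_cfg = dict(resume_cfg, warmdown_ratio=0.2)
for step in (200_000, 270_000, 300_000):
    print(step, lr_multiplier(step, **alt_cfg))
```

If you do want the LR to decay toward 5% of its base value by step 300,000, a non-zero warmdown ratio (such as the hypothetical 0.2 above) would be needed under this assumed schedule.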

Key changes from original:

ETA: ~33 hours
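The ~33 hour estimate can be sanity-checked against the flags. This sketch assumes `--total-batch-size` counts tokens per optimizer step, that gradient accumulation fills it with micro-batches of `device_batch_size * max_seq_len` tokens, and that the run uses a single GPU (`world_size=1`). These are assumptions, and `resume_budget` is a hypothetical helper, not part of nanochat.

```python
def resume_budget(num_iterations: int, resume_from_step: int,
                  total_batch_size: int, device_batch_size: int,
                  max_seq_len: int, world_size: int, eta_hours: float) -> dict:
    """Hypothetical helper: what a given ETA implies for the remaining run."""
    remaining_steps = num_iterations - resume_from_step
    # Tokens processed by one forward/backward pass across all GPUs (assumed)
    tokens_per_micro_batch = device_batch_size * max_seq_len * world_size
    if total_batch_size % tokens_per_micro_batch != 0:
        raise ValueError("total batch size must be a multiple of the micro-batch size")
    eta_seconds = eta_hours * 3600
    remaining_tokens = remaining_steps * total_batch_size
    return {
        "remaining_steps": remaining_steps,
        "grad_accum_steps": total_batch_size // tokens_per_micro_batch,
        "remaining_tokens": remaining_tokens,
        "sec_per_step": eta_seconds / remaining_steps,
        "tokens_per_sec": remaining_tokens / eta_seconds,
    }


# Flags from the command above; world_size=1 assumes a single RTX 4070
b = resume_budget(num_iterations=300_000, resume_from_step=200_000,
                  total_batch_size=65_536, device_batch_size=8,
                  max_seq_len=2_048, world_size=1, eta_hours=33)
for key, value in b.items():
    print(f"{key}: {value:,.3f}" if isinstance(value, float) else f"{key}: {value:,}")
```

Under these assumptions, the remaining 100,000 steps cover about 6.55B tokens with 4 micro-batches per step, so ~33 hours corresponds to roughly 1.19 s per step, or about 55k tokens/s. If your measured throughput differs, dividing the remaining tokens by it gives a revised ETA.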

