326k Steps Training Journey | Generated by AI

2026.06

Here’s the full story of how we got to ~326k steps:

=== THE ZZ TRAINING JOURNEY ===

PHASE 1: DATA COLLECTION & TOOLING (May-June 2026)

The repo started as a dataset processing pipeline. You built scripts to:

PHASE 2: FIRST SMALL RUNS (FineWeb, 124M model)

Early training on FineWeb data with a GPT-2 124M model.

PHASE 3: NANOCHAT RUNS (286M model, June 4-6)

Bigger model (286M params, depth=12, n_embd=768, n_head=6) on fineweb-edu.

PHASE 4: 760M RUN (MI300X, briefly)

Attempted a 760M model run (logged in nanogpt_760m_train logs), likely on a rented MI300X GPU.

PHASE 5: GITHUB-CODE 124M (CURRENT, running now)

Switched the dataset to 282 .npy shards of GitHub code (27GB tokenized). Config: GPT-2 124M, batch=4×1024×8 accum = 32,768 tok/step, lr=6e-4, 427k max iters → ~14B tokens total.
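The token budget follows directly from the config numbers. A minimal sketch, assuming "4×1024×8" means a micro-batch of 4 sequences, 1024 tokens per sequence, and 8 gradient-accumulation steps; the variable names mirror nanoGPT-style config files and are illustrative, since the actual config file is not shown here:

```python
# Assumed reading of "batch=4×1024×8 accum"; names are illustrative.
batch_size = 4                    # sequences per micro-batch (assumption)
block_size = 1024                 # tokens per sequence (assumption)
gradient_accumulation_steps = 8   # micro-batches per optimizer step (assumption)
learning_rate = 6e-4
max_iters = 427_000

# Tokens consumed by one optimizer step, and by the whole run.
tokens_per_step = batch_size * block_size * gradient_accumulation_steps
total_tokens = tokens_per_step * max_iters

print(f"tokens per step: {tokens_per_step:,}")
print(f"total tokens:    {total_tokens / 1e9:.2f}B")
```

Under these assumptions the run covers just under 14B tokens, matching the figure quoted above.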

Progress:
Step 0: loss 10.87 (random init)
Step 25k: loss ~3.6 (learning fast)
Step 325k: val loss 2.7892 (just evaluated)
Current: ~326,960 / 427,000 = 76.6% done
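As a sanity check on these numbers: a model that spreads probability evenly over the vocabulary scores a cross-entropy of ln(vocab size), which is why a freshly initialized model starts near 10.8. The sketch below assumes the standard GPT-2 vocabulary of 50,257 tokens (the tokenizer is not stated here) and also converts the validation loss to perplexity; the helper names are made up for illustration.

```python
import math

GPT2_VOCAB_SIZE = 50257  # assumption: standard GPT-2 BPE vocabulary


def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy (in nats) of a uniform guess over the vocabulary."""
    return math.log(vocab_size)


def perplexity(loss: float) -> float:
    """Perplexity corresponding to a cross-entropy loss given in nats."""
    return math.exp(loss)


def progress(step: int, max_iters: int) -> float:
    """Fraction of the run completed."""
    return step / max_iters


print(f"uniform-guess loss:  {uniform_loss(GPT2_VOCAB_SIZE):.2f}")
print(f"val perplexity:      {perplexity(2.7892):.1f}")
print(f"run progress:        {progress(326_960, 427_000):.1%}")
```

The step-0 loss of 10.87 sits slightly above the uniform baseline, which is typical for a random initialization.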

Step time started at ~3.9s but dropped to ~621ms after torch.compile warmed up. MFU plateaued at 14.44%. At the current pace, the ~100k steps remaining (427,000 - 326,960 = 100,040) × 621ms ≈ 62,100 seconds, or about 17.3 hours left.
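Both figures can be recomputed from the step counts and step time. The sketch below derives the remaining time from 427,000 - 326,960 steps at 621 ms each, and estimates MFU with the common approximation of 6N + 12·L·H·Q·T FLOPs per token. The model shape (12 layers, 12 heads, head dimension 64), the 124M parameter count, and the 312 TFLOPS reference peak (A100 bf16) are assumptions; the hardware for this run is not stated.

```python
def eta_hours(current_step: int, max_iters: int, step_time_s: float) -> float:
    """Hours left if every remaining step takes step_time_s seconds."""
    remaining_steps = max_iters - current_step
    return remaining_steps * step_time_s / 3600


def estimate_mfu(n_params, n_layer, n_head, head_dim, block_size,
                 tokens_per_step, step_time_s, peak_flops):
    """Model FLOPs utilization: achieved FLOPs/s over the assumed peak."""
    # Forward + backward cost per token: 6N for the weights,
    # plus 12*L*H*Q*T for attention over a context of T tokens.
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * block_size
    achieved_flops_per_s = flops_per_token * tokens_per_step / step_time_s
    return achieved_flops_per_s / peak_flops


eta = eta_hours(current_step=326_960, max_iters=427_000, step_time_s=0.621)
mfu = estimate_mfu(
    n_params=124_000_000,   # assumption: GPT-2 124M
    n_layer=12, n_head=12, head_dim=64, block_size=1024,  # assumed shape
    tokens_per_step=32_768,
    step_time_s=0.621,
    peak_flops=312e12,      # assumption: A100 bf16 peak as the reference
)
print(f"ETA: {eta:.1f} hours")
print(f"MFU: {mfu:.1%}")
```

With these assumptions the estimate lands at roughly 14.5%, close to the logged 14.44%, which suggests the logged MFU is measured against a similar reference peak.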

The checkpoint saves to /mnt/data/nanoGPT/out-github-code-124m/ (1.4GB).
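The 1.4GB size is consistent with one plausible accounting: fp32 weights plus two fp32 AdamW moment buffers per parameter. This is a rough sketch under that assumption; the actual checkpoint contents are not described here, and the helper name is made up for illustration.

```python
def checkpoint_bytes(n_params: int,
                     bytes_per_value: int = 4,
                     optimizer_states_per_param: int = 2) -> int:
    """Approximate checkpoint size: weights plus optimizer state buffers."""
    weights = n_params * bytes_per_value
    optimizer = n_params * optimizer_states_per_param * bytes_per_value
    return weights + optimizer


size = checkpoint_bytes(124_000_000)  # assumption: 124M parameters in fp32
print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```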

