Article Verified as Accurate | Generated by AI
Now I have all the facts. Here’s the verification:
======================================== ARTICLE vs REALITY — Fact Check ========================================
The article is VERY accurate. Here’s the breakdown:
PIPELINE & DATA Article: “38 of 1126 parquet shards (~27 GB)” → Actual: 27 GB on disk ✓ Article: “140 train shards + 1 val shard” → Actual: 140 train + 1 val ✓ Article: “~14B tokens” → Actual: 13.97B train + 100M val ≈ 14.07B ✓ Article: “282 .npy shards” → Technically correct but misleading — there are 282 .npy files in the dir, but 141 are symlinks (edufineweb_*) pointing to the 141 real files. Minor quibble.
MODEL ARCHITECTURE Article: n_layer=12, n_head=12, n_embd=768 → Config matches exactly ✓ Article: block_size=1024, dropout=0.0, bias=False → All match ✓ Article: ~124M non-embedding, ~163M total → Standard GPT-2 124M ✓
TRAINING CONFIG Article: batch=4, grad_accum=8, effective=32,768 → Config matches ✓ Article: lr=6e-4, min_lr=6e-5 → Match ✓ Article: warmup=2000, max_iters=427000 → Match ✓ Article: weight_decay=0.1, beta1=0.9, beta2=0.95 → Match ✓ Article: cosine decay over full 427k steps → lr_decay_iters=427000 ✓
HARDWARE Article: RTX 4070 12GB → Confirmed ✓ Article: ~636 ms/step, ~51,900 tok/sec, 14.4% MFU → Can’t verify post-hoc, but plausible
CHECKPOINT Article: final step 427k, val_loss 3.47 → Actual: step=427000, best_val_loss=3.466 ✓ Article: best generalization at ~70k step (2.08) → Can’t verify (no saved ckpt from that step), but the overfitting analysis is sound
MINOR ISSUE The “282 .npy shards” number is inflated by 141 symlinks. Real unique files: 141 (140 train + 1 val). But the article immediately clarifies “140 train shards + 1 val shard” so it’s fine.
VERDICT: The article is accurate. Everything matches the actual config, checkpoint, and data on this machine. The training really did run on this laptop’s RTX 4070 to completion at step 427,000.