Training Progress and ETA | Generated by AI


Here’s an analysis of the in-flight training run and its ETA.

Run state

Config in use (config/train_fineweb.py)

dataset: fineweb
model: 12L / 12H / 768d, ~123.6M params
block_size: 1024
batch / grad_accum: 8 × 64 → 512 effective seqs → 524,288 tokens/iter
max_iters / lr_decay_iters: 12000
warmup: 500
lr: 3e-4 → 3e-5
eval_interval: 500 (always_save_checkpoint=True)
compile: True

So total tokens at completion ≈ 12000 × 524,288 ≈ 6.29 B tokens (the comment’s “~3B” is stale).
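The arithmetic behind that figure is just the product of the config values above (a quick sketch; the variable names mirror the config keys, not necessarily the actual file):

```python
# Effective batch and total training tokens for this run.
batch_size = 8          # sequences per micro-batch
grad_accum = 64         # gradient accumulation steps
block_size = 1024       # tokens per sequence
max_iters = 12_000

tokens_per_iter = batch_size * grad_accum * block_size  # 512 seqs × 1024 = 524,288
total_tokens = tokens_per_iter * max_iters              # ≈ 6.29e9
print(tokens_per_iter, total_tokens)
```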

Loss trajectory

step    train    val
0       10.997   10.991
500      5.451    5.508
1000     4.379    4.391
1500     4.063    4.010
2000     3.906    3.852
2500     3.813    3.774
3000     3.744    3.691

Loss is still falling ~0.07–0.08 per 500 steps, and val tracks at or slightly below train — healthy, no sign of overfitting. Plenty of headroom remains: a ~125M-param GPT-2 on FineWeb-scale data should plateau roughly around 3.0–3.1 val loss.
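The per-interval improvement can be checked directly from the logged values (a quick sketch using the val column of the table above):

```python
# Validation losses from the eval log, keyed by step.
val_loss = {0: 10.991, 500: 5.508, 1000: 4.391, 1500: 4.010,
            2000: 3.852, 2500: 3.774, 3000: 3.691}

steps = sorted(val_loss)
# Improvement over each 500-step interval (positive = loss went down).
deltas = [val_loss[a] - val_loss[b] for a, b in zip(steps, steps[1:])]
print(deltas[-1])  # most recent 500-step improvement
```

Every delta is positive and the most recent one is still ~0.08, i.e. the curve has not flattened yet.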

When does it end?

The next eval + checkpoint save is due at step 3500. The run is currently at iter 3290, so ~210 iters remain → roughly 33 min from now (around 22:53). At that pace (~9.4 s/iter), the remaining ~8,700 iters put full completion at roughly 23 hours away.
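The ETA math, spelled out from the numbers above (210 iters estimated to take ~33 min):

```python
# Throughput implied by the stated checkpoint ETA.
sec_per_iter = 33 * 60 / 210       # ≈ 9.4 s/iter
current_iter = 3290
max_iters = 12_000

# Minutes until the step-3500 eval/checkpoint, and hours until the run ends.
eta_checkpoint_min = (3500 - current_iter) * sec_per_iter / 60
eta_full_hours = (max_iters - current_iter) * sec_per_iter / 3600
print(round(eta_checkpoint_min), round(eta_full_hours, 1))
```

So at the current throughput the full 12,000-iter run finishes in about 23 hours of additional wall-clock time.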

Next steps

Want me to keep monitoring and ping you at the next checkpoint, or sample from the current ckpt to sanity-check generations?

