Healthy Training Run in Progress | Generated by AI

Training is healthy and running. Here’s the status:

Process: train.py config/train_gpt2_200m.py --compile=False (PID 1681991), GPU at 100%, 211W.

Progress: iter 1840 / 8000 (23%), last val loss 3.6331 at step 1750 (down from 10.99 at start).

Speed: ~18.53s/iter (one 4070, no compile, batch_size=4, grad_accum=128 → 524K tokens/step). Eval+checkpoint every 250 iters costs ~28s extra.
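The tokens-per-step figure can be reproduced with a short sketch. The status does not state the context length, so block_size=1024 below is an assumption (it is the value that makes 4 × 128 sequences come out to about 524K tokens); the function name is hypothetical and not part of train.py.

```python
def tokens_per_step(batch_size: int, grad_accum: int, block_size: int) -> int:
    """Tokens consumed by one optimizer step on a single GPU."""
    return batch_size * grad_accum * block_size


# batch_size, grad_accum and seconds per iteration come from the status above.
BATCH_SIZE = 4
GRAD_ACCUM = 128
SEC_PER_ITER = 18.53

# Assumption: context length is not stated in the status; 1024 matches ~524K.
BLOCK_SIZE = 1024

tokens = tokens_per_step(BATCH_SIZE, GRAD_ACCUM, BLOCK_SIZE)
throughput = tokens / SEC_PER_ITER  # tokens processed per second

print(f"{tokens} tokens/step, about {throughput / 1000:.1f}K tokens/s")
```

Under that assumption the run processes roughly 28K tokens per second, which is a handy baseline when comparing against any speed-up.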

Time remaining: 6160 iters × 18.53s + 25 eval/checkpoint stalls × ~28s ≈ 31.9 hours from now, ETA roughly Saturday 2026-05-16, ~05:30 local.
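The remaining-time arithmetic can be checked with a minimal sketch. It assumes an eval+checkpoint runs whenever the iteration number is a multiple of 250, including the final iteration 8000, and that the per-iteration speed stays constant; the function names are hypothetical.

```python
def evals_remaining(current_iter: int, max_iters: int, eval_interval: int) -> int:
    """Count eval steps after current_iter, up to and including max_iters."""
    return max_iters // eval_interval - current_iter // eval_interval


def remaining_hours(current_iter: int, max_iters: int, sec_per_iter: float,
                    eval_interval: int, eval_cost_sec: float) -> float:
    """Estimate wall-clock hours left, assuming constant speed."""
    iters_left = max_iters - current_iter
    stalls = evals_remaining(current_iter, max_iters, eval_interval)
    total_sec = iters_left * sec_per_iter + stalls * eval_cost_sec
    return total_sec / 3600


# Figures taken from the status above.
stalls = evals_remaining(1840, 8000, 250)
hours = remaining_hours(1840, 8000, 18.53, 250, 28.0)
print(f"{stalls} eval stalls, about {hours:.1f} hours remaining")
```

The eval stalls add only about 12 minutes in total, so the estimate is dominated by the per-iteration time.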

Elapsed so far: 9h 34m.

Suggestions if you want it faster:

