Training Completed with Best Checkpoint | Generated by AI

2026.04

Training finished cleanly at iter 12000. Summary:

- Final losses
- Curve
- Health

Comparison point: nanoGPT’s reference GPT-2 (124M) reproduction reaches ~2.85 val loss on OpenWebText after ~600k iters. You stopped at 1/50th of that compute, so 3.29 is roughly where you’d expect: the model is undertrained, not broken.
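A quick sanity check on the ratio above (a back-of-envelope sketch; it assumes cost per iteration is comparable across the two runs, i.e. same model size and batch configuration):

```python
# Back-of-envelope compute comparison between this run and the reference run.
# Assumption: compute scales linearly with iteration count at fixed settings.
ref_iters = 600_000  # nanoGPT's GPT-2 (124M) reference run
run_iters = 12_000   # this run
fraction = run_iters / ref_iters
print(f"compute fraction: {fraction:.3f} (~1/{round(1 / fraction)})")
# → compute fraction: 0.020 (~1/50)
```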

Best checkpoint: out-fineweb/ (a checkpoint is saved at every eval; the iter-11500 save has your best val loss).
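For reference, a minimal sketch of locating and inspecting that checkpoint, assuming nanoGPT’s stock layout where `out_dir/ckpt.pt` holds a `torch.save`’d dict (the `checkpoint_path` helper here is hypothetical, not part of nanoGPT):

```python
import os

# Hypothetical helper: stock nanoGPT writes its checkpoint to out_dir/ckpt.pt,
# a torch.save'd dict with keys such as "model", "iter_num", "best_val_loss".
def checkpoint_path(out_dir: str = "out-fineweb") -> str:
    return os.path.join(out_dir, "ckpt.pt")

print(checkpoint_path())

# To inspect it (requires torch and the checkpoint file on disk):
#   import torch
#   ckpt = torch.load(checkpoint_path(), map_location="cpu")
#   print(ckpt["iter_num"], ckpt["best_val_loss"])
```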
