Projected GPT Training Time Estimate | Generated by AI
Can’t pin it down precisely yet: only iter 0 has printed (the next log line is at iter 20, due any moment). But here’s the math using the projected ~11 s/iter from the smoke test:
| Phase | Time |
|---|---|
| 19,073 training iters × ~11 s | ~58.3 hrs |
| Initial warmup (iter 0 overhead) | ~17 s |
| 38 evals × 9 s (every 500 iters) | ~6 min |
| 38 checkpoint writes × ~8 s | ~5 min |
| Total | ~58.5 hrs (~2.4 days) |
That’s for the full configured run ending at iter 19,073 (~10 B tokens seen).
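To make that arithmetic easy to re-run, here’s a minimal sketch of the estimate; the per-iter, per-eval, and per-checkpoint figures are the rough numbers from the table above, not measured constants:

```python
# Back-of-the-envelope total-time estimate for the full run.
SEC_PER_ITER = 11        # projected from the smoke test
SEC_PER_EVAL = 9         # one eval pass (assumed)
SEC_PER_CKPT = 8         # one checkpoint write (assumed)
WARMUP_SEC = 17          # extra overhead observed on iter 0
EVAL_INTERVAL = 500      # eval + checkpoint every 500 iters
MAX_ITERS = 19_073

n_evals = MAX_ITERS // EVAL_INTERVAL            # 38
total_sec = (MAX_ITERS * SEC_PER_ITER
             + WARMUP_SEC
             + n_evals * (SEC_PER_EVAL + SEC_PER_CKPT))
print(f"~{total_sec / 3600:.1f} hrs (~{total_sec / 86400:.1f} days)")
# -> ~58.5 hrs (~2.4 days)
```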
If you don’t want to wait that long:
| Stop at iter | Tokens seen | Time | Expected loss |
|---|---|---|---|
| 5,000 | ~2.6 B | ~15.5 hrs | ~3.5–3.8 |
| 10,000 | ~5.2 B | ~31 hrs | ~3.2–3.4 |
| 19,073 (configured) | ~10 B | ~58.5 hrs | ~2.9–3.1 |
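The tokens-seen column assumes roughly 524,288 (~0.5 M) tokens per iteration, which is what makes 19,073 iters come out to ~10 B tokens; that figure is inferred from the configured batch, not confirmed from the logs. A quick sketch of the per-stop-point math under that assumption:

```python
# Rough per-checkpoint projection for stopping early.
# Assumes ~0.5 M tokens/iter and ~11 s/iter; both are estimates from above.
TOKENS_PER_ITER = 524_288
SEC_PER_ITER = 11

for stop_iter in (5_000, 10_000, 19_073):
    tokens = stop_iter * TOKENS_PER_ITER
    hours = stop_iter * SEC_PER_ITER / 3600
    print(f"iter {stop_iter:>6}: ~{tokens / 1e9:.1f} B tokens, ~{hours:.1f} hrs")
# iter   5000: ~2.6 B tokens, ~15.3 hrs
# iter  10000: ~5.2 B tokens, ~30.6 hrs
# iter  19073: ~10.0 B tokens, ~58.3 hrs
```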
You can stop the run at any 500-iter checkpoint boundary and the model is usable; `sample.py --out_dir=out-fineweb-gpt3` will load the latest checkpoint.
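If you do stop early and want to confirm what you ended up with, here’s a minimal sketch for inspecting the saved checkpoint, assuming the nanoGPT-style `out_dir/ckpt.pt` layout with `iter_num` and `best_val_loss` keys (adjust paths/keys if your fork differs):

```python
# Peek at the checkpoint sample.py would pick up, without touching the GPU.
import torch

ckpt = torch.load("out-fineweb-gpt3/ckpt.pt", map_location="cpu")
print("saved at iter:", ckpt.get("iter_num"))
print("best val loss:", ckpt.get("best_val_loss"))
```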
I’ll refine this estimate once the first `iter 20:` log line lands (within ~1 min from now). Ping me back then if you want a tighter number.