Estimated Training Time Calculation
Good — with the `iter 100` timing you showed, I can give a concrete estimate.
Raw calculation (based on your observed pace)
- reported time per iteration at `iter 100`: 3995.11 ms = 3.99511 s/iter
- remaining iterations: 6000 − 100 = 5900
- remaining time = 5900 × 3.99511 s = 23,571.149 s ≈ 6 hours 32 minutes 51 seconds
So at the current pace you’ll need ~6 hours 32 minutes 51 seconds to finish the remaining 5900 iterations.
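For reference, here is the same arithmetic as a small, dependency-free Python snippet (the numbers are the ones from your log):

```python
# ETA from the observed pace at iter 100 (pure Python, no dependencies).
ms_per_iter = 3995.11          # reported time per iteration at iter 100
total_iters = 6000
done_iters = 100

remaining = total_iters - done_iters       # 5900
eta_s = remaining * ms_per_iter / 1000.0   # 23571.149 s

h, rem = divmod(int(eta_s), 3600)
m, s = divmod(rem, 60)
print(f"ETA: {h}h {m}m {s}s")              # ETA: 6h 32m 51s
```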
Important assumptions & caveats
- This assumes the 3.995 s/iter remains constant for the rest of the run.
- In practice, timings usually change:
- they often improve after warmup/compilation finishes, or when caching/warmup I/O completes;
- they can worsen if periodic evaluations, checkpoints, or data pipeline stalls happen.
- Your `iter 0` was much slower (compile + warmup). `iter 100` is a more realistic measure, but still early; for a steadier number, see the timing sketch below.
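To get a steady-state per-iteration time that excludes warmup and compilation, you can time a short measured window with explicit CUDA synchronization. This is a minimal sketch: `step_fn` is a placeholder for one full training step (forward, backward, optimizer step) that you would adapt to your own loop.

```python
import time
import torch

def time_iters(step_fn, n_warmup=10, n_measure=50):
    """Rough steady-state seconds/iteration, skipping warmup iterations."""
    for _ in range(n_warmup):
        step_fn()                      # let compile/caching/warmup finish
    if torch.cuda.is_available():
        torch.cuda.synchronize()       # CUDA runs async; sync before timing
    t0 = time.perf_counter()
    for _ in range(n_measure):
        step_fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / n_measure
```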
Practical ranges (what you might actually see)
- If it stays at 3.995 s/iter → ~6.5 hours (as above).
- If it improves to 0.2 s/iter → ~20 minutes.
- If it improves to 0.15 s/iter (optimistic for the 40M model on a 4070) → ~15 minutes.
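Those ETAs all come from the same formula; a short loop reproduces them:

```python
# ETAs for the three scenarios above (5900 remaining iterations).
remaining = 5900
for s_per_iter in (3.995, 0.2, 0.15):
    eta_min = remaining * s_per_iter / 60
    print(f"{s_per_iter:>6.3f} s/iter -> {eta_min:7.1f} min ({eta_min / 60:.1f} h)")
```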
Quick tips to reduce time if you want it faster
- Check data loading: increase `num_workers`, use `pin_memory`, and ensure the dataset is on a fast local SSD and preprocessed (parquet → memmap). I/O stalls frequently cause multi-second iterations (see the `DataLoader` sketch after this list).
- If you have headroom, increase `batch_size` (or reduce `gradient_accumulation_steps`) to cut the number of optimizer steps, but only if it fits in memory.
- Try a run with `compile=False` as a quick experiment; `torch.compile` can sometimes be slower on certain setups/Inductor configs.
- Disable or reduce heavy logging/eval/checkpoint frequency while profiling throughput.
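If your pipeline uses a `torch.utils.data.DataLoader` (nanoGPT-style memmap loops don't), the settings above look roughly like this. The dataset and the specific values here are illustrative placeholders; tune them to your hardware.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in dataset; swap in your real preprocessed dataset.
train_ds = TensorDataset(torch.randn(1024, 256), torch.randint(0, 10, (1024,)))

loader = DataLoader(
    train_ds,
    batch_size=64,            # raise only while it still fits in VRAM
    num_workers=4,            # parallel workers overlap data prep with GPU work
    pin_memory=True,          # page-locked host memory speeds host-to-GPU copies
    persistent_workers=True,  # keep workers alive instead of re-forking each epoch
    prefetch_factor=2,        # batches each worker preloads ahead of time
)
```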
If you want, paste the `iter 101..110` logs (or the next few) and I'll compute the running average and a tighter ETA, and point out whether the time is trending down.
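In the meantime, here is a sketch of that computation you can run yourself. It assumes nanoGPT-style log lines like `iter 105: loss 2.31, time 3990.12ms`; adjust the regex if your logger prints a different format.

```python
import re

def eta_from_logs(log_text, total_iters=6000):
    """Running-average seconds/iter and ETA (hours) from training log lines."""
    matches = re.findall(r"iter (\d+):.*?time ([\d.]+)ms", log_text)
    if not matches:
        raise ValueError("no matching log lines found")
    last_iter = int(matches[-1][0])
    avg_s = sum(float(t) for _, t in matches) / len(matches) / 1000.0
    return avg_s, (total_iters - last_iter) * avg_s / 3600.0

# Example with made-up numbers:
logs = "iter 101: loss 2.31, time 3990.12ms\niter 102: loss 2.30, time 3987.45ms"
avg, eta_h = eta_from_logs(logs)
print(f"avg {avg:.3f} s/iter, ETA {eta_h:.2f} h")
```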