Torch Compile Phases and Training Estimates | Generated by AI
Two phases happening here:
- iter 0: 113s. torch.compile() is tracing and optimizing the graph; this is a one-time cost.
- iter 50: 21s. Compiled and running.
After compilation finishes, you should settle around 1-3 sec/step. The reported 164% MFU is impossible (MFU is capped at 100% by definition), so it's almost certainly a measurement artifact from timing steps while compilation was still in flight.
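To see why an MFU above 100% points to a timing bug rather than a fast GPU, here is a minimal sketch of the MFU formula. The function name, the per-step FLOP count, and the peak-FLOPs figure are illustrative assumptions, not values from the actual training script:

```python
def mfu(flops_per_step: float, step_time_s: float, peak_flops: float) -> float:
    """Model FLOPs Utilization: achieved FLOP/s divided by hardware peak FLOP/s."""
    return flops_per_step / step_time_s / peak_flops

PEAK = 312e12            # assumed A100-class bf16 peak, ~312 TFLOP/s
FLOPS_PER_STEP = 2.5e14  # made-up per-step FLOP count for illustration

# A plausible steady-state step time gives a sane MFU:
print(mfu(FLOPS_PER_STEP, 2.0, PEAK))   # well under 1.0 (i.e. under 100%)

# But if the timer captures only a fraction of the real step (e.g. an async
# CUDA launch returns before the kernel finishes during compilation), the
# measured step time shrinks and the computed MFU blows past 100%:
print(mfu(FLOPS_PER_STEP, 0.05, PEAK))  # far above 1.0
```

This is why a >100% MFU during the compile phase is best read as "the step timer and the actual GPU work got out of sync," not as a real throughput number.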
Time estimate:
- ~100,000 iters remaining
- If ~2 sec/step post-compile → ~56 hours (~2.3 days)
- If 21 sec/step stays → ~24 days ← this would be bad
Check nvidia-smi right now: is GPU utilization still at 100%? If yes, it's just compile overhead, and once compilation finishes you'll see a massive speedup.
Also: loss dropping from 11.05 → 7.65 in 50 steps is healthy, so the model is learning fine. The open question is whether it speeds up.