MI300X Defies Theoretical Training Estimates | Generated by AI

2026.06

The MFU number is misleading: nanoGPT hardcodes the A100 bf16 peak (312 TFLOPS) in its MFU calculation, so the reported figure is not measured against the MI300X's peak. The real signal is observed throughput.
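To make the reported MFU meaningful, it can be rescaled to the right peak. The sketch below assumes the MI300X's dense bf16 peak is 1307.4 TFLOPS (AMD's published figure; check it for your part and clocks), and the 50% input is a hypothetical value, not one from the smoke test. `rescale_mfu` is a helper name made up for this example, not part of nanoGPT.

```python
# Peak dense bf16 throughput in FLOP/s.
A100_PEAK_FLOPS = 312e12       # the constant nanoGPT's estimate_mfu() divides by
MI300X_PEAK_FLOPS = 1307.4e12  # ASSUMPTION: AMD's published dense bf16 peak


def rescale_mfu(reported_mfu, assumed_peak=A100_PEAK_FLOPS,
                actual_peak=MI300X_PEAK_FLOPS):
    """Convert an MFU computed against assumed_peak into one against actual_peak."""
    achieved_flops = reported_mfu * assumed_peak  # FLOP/s actually sustained
    return achieved_flops / actual_peak


if __name__ == "__main__":
    reported = 0.50  # hypothetical MFU printed by train.py, for illustration only
    print(f"MFU against MI300X peak: {rescale_mfu(reported):.1%}")
```

Under these assumptions, an MFU printed by nanoGPT overstates real MI300X utilization by a factor of about 4.2 (1307.4 / 312), which is why throughput in tok/s is the safer number to compare.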

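Smoke test results (only the first figure is measured; the other two are projections):

Observed:    50,568 tok/s (batch=4, no compile)
Batch=32:    ~404,543 tok/s (projected, assuming linear scaling with batch size)
+ compile:   ~525,906 tok/s (projected, assuming +30% from torch.compile)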
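The batch=32 and compile figures follow from the batch=4 measurement by simple scaling. A minimal sketch of that arithmetic is below. The linear batch scaling and the 1.3x compile factor are the post's assumptions, not measured behavior, and `project_throughput` is a hypothetical helper.

```python
def project_throughput(base_tok_s, base_batch, target_batch, compile_speedup=1.0):
    """Project tokens/s assuming throughput scales linearly with batch size,
    then apply an assumed multiplicative speedup from torch.compile."""
    return base_tok_s * (target_batch / base_batch) * compile_speedup


if __name__ == "__main__":
    observed = 50_568  # tok/s measured at batch=4, no compile

    batch32 = project_throughput(observed, base_batch=4, target_batch=32)
    compiled = project_throughput(observed, 4, 32, compile_speedup=1.3)

    print(f"batch=32:  {batch32:,.0f} tok/s")
    print(f"+ compile: {compiled:,.0f} tok/s")
```

The results land within a token or two per second of the figures above, presumably because the measured value was rounded before being reported. Linear scaling is optimistic once the GPU saturates, so it is worth rerunning the smoke test at the real batch size.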

Estimated training time: 8-10 hours
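The post does not state the token budget behind this estimate, but the range it implies can be backed out from the projected throughput. The sketch below assumes the 8-10 hours were computed at the compiled rate of ~525,906 tok/s. That is an inference rather than something the post confirms, and both helper names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def tokens_in(hours, tok_per_s):
    """Tokens processed in the given wall-clock hours at a steady rate."""
    return hours * SECONDS_PER_HOUR * tok_per_s


def training_hours(total_tokens, tok_per_s):
    """Wall-clock hours needed to process total_tokens at a steady rate."""
    return total_tokens / tok_per_s / SECONDS_PER_HOUR


if __name__ == "__main__":
    rate = 525_906  # projected tok/s at batch=32 with compile (ASSUMED basis)

    low, high = tokens_in(8, rate), tokens_in(10, rate)
    print(f"Implied budget: {low / 1e9:.1f}B to {high / 1e9:.1f}B tokens")

    # The same budget at the measured batch=4 rate, for comparison.
    print(f"At 50,568 tok/s: {training_hours(low, 50_568):.0f} hours or more")
```

Because the estimate scales inversely with throughput, any shortfall in the projected rate stretches the run proportionally.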

That’s much better than my theoretical 24-35 hour estimate. The MI300X is fast.

Your turn. Run in tmux:

cd /root/nanoGPT
python3 train.py config/train_fineweb_760m.py

Config: /root/nanoGPT/config/train_fineweb_760m.py

