Understanding Model FLOPs Utilization | Generated by AI


In nanoGPT, MFU stands for Model FLOPs Utilization.

It tells you what fraction of your GPU’s theoretical peak compute (FLOPs per second) the model is actually sustaining during training. Think of it as a rough “GPU efficiency” indicator.
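To make that concrete, here is a sketch of how nanoGPT estimates MFU, using the per-token FLOPs approximation from the PaLM paper (6N for the dense matmuls plus an attention term). The example numbers and the 312e12 default (A100 bf16 peak) are illustrative; plug in your own config and your GPU's spec-sheet peak.

```python
def estimate_mfu(n_params, n_layer, n_head, head_dim, seq_len,
                 fwdbwd_per_iter, dt, flops_promised=312e12):
    """Estimate model FLOPs utilization.

    n_params        : total model parameters N
    fwdbwd_per_iter : forward+backward passes per iteration
                      (batch size * gradient accumulation steps)
    dt              : measured wall-clock seconds per iteration
    flops_promised  : GPU peak FLOPs/s (312e12 = A100 bf16; use your GPU's number)
    """
    # FLOPs per token: 6N for the matmuls (forward + backward), plus
    # 12 * layers * heads * head_dim * seq_len for attention
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * seq_len
    flops_per_fwdbwd = flops_per_token * seq_len   # one full sequence
    flops_per_iter = flops_per_fwdbwd * fwdbwd_per_iter
    flops_achieved = flops_per_iter / dt           # FLOPs/s actually sustained
    return flops_achieved / flops_promised

# Example: a GPT-2-sized (~124M parameter) config -- numbers are illustrative
mfu = estimate_mfu(n_params=124e6, n_layer=12, n_head=12, head_dim=64,
                   seq_len=1024, fwdbwd_per_iter=12, dt=0.5)
print(f"mfu {mfu * 100:.2f}%")
```

Note that the denominator is the GPU's *peak* rate, so anything that leaves the tensor cores idle (small kernels, memory stalls, data loading) pulls the ratio down.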

Here’s what it means in practice:

• 0–20% MFU: Typical for small models, small batch sizes, or when the GPU is under-utilized.

• 30–50% MFU: Quite good for a modest model on consumer GPUs.

• 50–70% MFU: Very strong utilization — usually large batch sizes, well-tuned kernels.

• 80%+ MFU: Near the limit; only happens for large models with optimized kernels (FlashAttention, fused ops, bf16, etc.).

Your number:

mfu 10.69%

This is normal for a 125M-parameter model on an RTX 4070. Small models don’t fully utilize the GPU because:

• The individual matrix multiplications are small, so the tensor cores are rarely saturated.

• Kernel launch overhead and memory bandwidth dominate over raw compute at this scale.

• Attention and other memory-bound ops spend more time moving data than doing math.

If you want to push MFU higher, you can try:

• Increasing the batch size (or gradient accumulation steps) so each kernel does more work.

• Increasing the context length (block_size), which makes the matmuls larger.

• Using bf16 mixed precision, FlashAttention, and torch.compile to get fused, tensor-core-friendly kernels.

But for small models, MFU ~10–20% is expected and totally fine.
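As a back-of-envelope check, here is what an MFU of 10.69% implies in tokens per second. The GPU peak below is an assumed round number for illustration (substitute your card's bf16 spec), and the attention term is ignored, keeping only the 6N-per-token approximation.

```python
# Back-of-envelope: translate MFU into training throughput.
n_params = 125e6
flops_per_token = 6 * n_params      # fwd+bwd matmul FLOPs per token (approximation)
gpu_peak = 100e12                   # ASSUMED bf16 peak FLOPs/s, for illustration only
mfu = 0.1069                        # the measured 10.69%

flops_achieved = mfu * gpu_peak     # FLOPs/s actually sustained
tokens_per_sec = flops_achieved / flops_per_token
print(f"~{tokens_per_sec:,.0f} tokens/sec")
```

Run the same arithmetic in reverse (measured tokens/sec × FLOPs per token ÷ peak) and you recover your logged MFU, which is exactly what nanoGPT’s training loop does each iteration.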

If you want, I can help you read your config and show exactly how your MFU relates to your batch size and parameter count.


openai/gpt-5
