Understanding Model FLOPs Utilization (MFU)


In the context of nanoGPT (a minimal implementation for training GPT models), mfu stands for Model FLOPs Utilization. It is a key training-efficiency metric: the ratio of the floating-point operations per second your run actually achieves to the theoretical peak FLOPS of the hardware (e.g., the GPU), expressed as a percentage.

Quick Breakdown:

- MFU = achieved FLOPs per second ÷ theoretical peak FLOPs per second of the device.
- nanoGPT estimates achieved FLOPs using the approximation from the PaLM paper: about 6N FLOPs per token for a model with N parameters (forward + backward), plus an attention term of 12 · L · H · Q · T (layers × heads × head dimension × sequence length).
- The achieved rate is then divided by the device's advertised peak; nanoGPT hardcodes the A100's bfloat16 peak of 312 TFLOPS.
- Worked example: a run sustaining 109 TFLOPS on an A100 has an MFU of 109 / 312 ≈ 35%. Well-tuned GPT training typically lands in the 30–50% range; 100% is unattainable in practice.
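
Here is a minimal, self-contained sketch of that computation, closely following nanoGPT's estimate_mfu method in model.py (the 312 TFLOPS default assumes an A100 running bfloat16; adjust peak_flops for other hardware):

```python
def estimate_mfu(n_params, n_layer, n_head, head_dim, seq_len,
                 fwdbwd_per_iter, dt, peak_flops=312e12):
    """Estimate model FLOPs utilization (MFU).

    Mirrors nanoGPT's GPT.estimate_mfu (model.py), which uses the
    PaLM-paper FLOPs approximation. peak_flops defaults to the
    A100 bfloat16 peak of 312 TFLOPS.
    """
    # ~6 FLOPs per parameter per token (fwd + bwd), plus the attention term
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * seq_len
    flops_per_fwdbwd = flops_per_token * seq_len          # one full sequence
    flops_per_iter = flops_per_fwdbwd * fwdbwd_per_iter   # sequences per step
    flops_achieved = flops_per_iter / dt                  # dt: seconds per step
    return flops_achieved / peak_flops

# Example: GPT-2 124M config, 12 sequences per step, 0.5 s per step
mfu = estimate_mfu(n_params=124e6, n_layer=12, n_head=12, head_dim=64,
                   seq_len=1024, fwdbwd_per_iter=12, dt=0.5)
print(f"mfu {mfu * 100:.2f}%")
```

In nanoGPT itself this lives on the GPT class and pulls L, H, Q, and T from the model config; the standalone version above just makes the arithmetic explicit.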

For example, nanoGPT's training loop reports MFU on every log line; a line such as `iter 100: loss 4.2147, time 152.31ms, mfu 35.21%` (values illustrative) means the run is achieving about 35% of the GPU's peak throughput.

If you're tuning nanoGPT for better MFU, the usual levers are a larger batch size (or more gradient accumulation steps), mixed precision (bfloat16), and torch.compile, as in the config sketch below.
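
The option names below match the defaults defined in nanoGPT's train.py; the specific values are illustrative, not recommendations, since the right settings depend on your GPU memory and model size:

```python
# Illustrative overrides for nanoGPT's train.py configuration
# (e.g., saved as config/my_run.py and run via: python train.py config/my_run.py)
batch_size = 16                    # micro-batch size per forward/backward pass
gradient_accumulation_steps = 8    # simulate a larger effective batch
block_size = 1024                  # sequence length (the T in the FLOPs formula)
dtype = 'bfloat16'                 # mixed precision; use 'float16' on older GPUs
compile = True                     # torch.compile the model for fused kernels
```

Larger effective batches amortize kernel-launch and optimizer overhead across more tokens, which is why they tend to raise MFU until you hit memory or communication limits.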
