NVIDIA GPU SM Explained | Generated by AI

Home 2026.06

Question: What does “SM” stand for in NVIDIA GPUs?

Answer:

SM = Streaming Multiprocessor.

It’s the fundamental compute unit of an NVIDIA GPU. A GPU is essentially a collection of SMs (e.g., H100 has 132 SMs, RTX 4070 has 46 SMs). Each SM contains:

The execution model maps directly to this hierarchy: when you launch a CUDA kernel, thread blocks are assigned to SMs. The block stays resident on one SM for its lifetime, and the SM executes its threads in warps of 32. Occupancy — a number you’ll see constantly in profiling — is the ratio of active warps to the SM’s maximum, bounded by register and shared-memory pressure.

So in practice: more SMs + higher clocks + better Tensor Cores per SM = more throughput. When you tune a kernel (tile sizes, shared mem usage, register count), you’re really tuning how well work packs into each SM.

References:


Back Donate