NVIDIA GPU SM Explained

NVIDIA GPU SM Explained | Generated by AI

2026.06

Question: What does “SM” stand for in NVIDIA GPUs?

Answer:

SM = Streaming Multiprocessor.

It’s the fundamental compute unit of an NVIDIA GPU. A GPU is essentially a collection of SMs sharing a common memory system (e.g., the H100 SXM has 132 SMs; the RTX 4070 has 46). Each SM contains:

CUDA cores (FP32/INT32 ALUs): the scalar lanes
Tensor Cores: matrix-multiply-accumulate units (the workhorse for GEMMs in deep learning)
Warp schedulers: dispatch warps (groups of 32 threads) to execution units
Register file: typically 64K 32-bit registers per SM (256 KB)
Shared memory / L1 cache: fast on-chip scratchpad with a configurable split (e.g. 256 KB combined on Hopper, of which up to 228 KB can be configured as shared memory)
Load/store units and SFUs (special function units for transcendentals)
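
As a rough illustration of how these per-SM resources relate to each other, the sketch below does the arithmetic for the register file. The `SMSpec` class and its field names are hypothetical (not an NVIDIA API), and the limit of 2048 resident threads per SM is an assumption that holds for some architectures, such as Hopper, but not all. On real hardware these values would come from the device properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMSpec:
    """Hypothetical container for a few per-SM limits (illustrative only)."""
    registers: int = 64 * 1024         # 32-bit registers per SM, as stated above
    max_resident_threads: int = 2048   # assumption: 64 warps x 32 threads
    warp_size: int = 32

    def register_file_kib(self) -> int:
        # Each register is 32 bits = 4 bytes.
        return self.registers * 4 // 1024

    def max_resident_warps(self) -> int:
        return self.max_resident_threads // self.warp_size

    def registers_per_thread_at_full_residency(self) -> int:
        # Register budget per thread if every thread slot on the SM is filled.
        return self.registers // self.max_resident_threads


sm = SMSpec()
print(sm.register_file_kib())                       # 256
print(sm.max_resident_warps())                      # 64
print(sm.registers_per_thread_at_full_residency())  # 32
```

Under these assumptions, a kernel that needs more than 32 registers per thread cannot keep every thread slot on the SM filled.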

The execution model maps directly to this hierarchy: when you launch a CUDA kernel, each thread block is assigned to an SM and stays resident there for its lifetime. An SM can host several blocks at once if its resources allow, and it executes each block’s threads in warps of 32. Occupancy, a number you’ll see constantly in profiling, is the ratio of active warps to the SM’s maximum. It is bounded by each block’s register and shared-memory usage, along with the block size and the per-SM limit on resident blocks.
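
The occupancy ratio described here can be sketched as a small calculation. This is a simplified model, not the method used by NVIDIA's tools: it ignores register and shared-memory allocation granularity, and the default limits (64 warps, 32 blocks, 64K registers, and 228 KB of shared memory per SM) are assumed Hopper-like values. The function name `theoretical_occupancy` and its parameters are hypothetical.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          *, max_warps=64, max_blocks=32,
                          regs_per_sm=64 * 1024, smem_per_sm=228 * 1024,
                          warp_size=32):
    """Simplified occupancy estimate: resident warps / max warps per SM.

    All default limits are assumptions; allocation granularity is ignored.
    """
    # A partially filled warp still occupies a full warp slot.
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division

    # How many blocks fit under each independent limit.
    by_warps = max_warps // warps_per_block
    by_regs = regs_per_sm // (regs_per_thread * warps_per_block * warp_size)
    by_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks

    # The tightest limit decides how many blocks are resident.
    resident_blocks = min(max_blocks, by_warps, by_regs, by_smem)
    return resident_blocks * warps_per_block / max_warps


print(theoretical_occupancy(256, 32, 0))          # 1.0
print(theoretical_occupancy(256, 64, 0))          # 0.5 (register-limited)
print(theoretical_occupancy(256, 32, 48 * 1024))  # 0.5 (shared-memory-limited)
```

In this model, doubling the register count per thread or requesting 48 KB of shared memory per block each halves occupancy for a 256-thread block, which is the kind of pressure the paragraph above refers to.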

So in practice, for compute-bound work: more SMs + higher clocks + better Tensor Cores per SM = more throughput. When you tune a kernel (tile sizes, shared-memory usage, register count), you’re really tuning how well work packs into each SM.
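
To make the tile-size point concrete, here is a minimal sketch of how a shared-memory tile choice changes the number of blocks that fit on one SM. It assumes a GEMM-style kernel in which each block stages one square tile of A and one of B in shared memory, a 228 KB shared-memory budget, and a cap of 32 resident blocks per SM. These are illustrative assumptions, and the helper `blocks_per_sm_for_tile` is hypothetical. Any per-block shared memory reserved by the system is ignored.

```python
SMEM_PER_SM = 228 * 1024   # assumption: Hopper-like shared-memory budget
MAX_BLOCKS_PER_SM = 32     # assumption: per-SM cap on resident blocks


def blocks_per_sm_for_tile(tile, bytes_per_elem=4):
    """Blocks that fit on one SM when each block holds two tile x tile buffers."""
    smem_per_block = 2 * tile * tile * bytes_per_elem  # one tile of A, one of B
    return min(MAX_BLOCKS_PER_SM, SMEM_PER_SM // smem_per_block)


fits = {tile: blocks_per_sm_for_tile(tile) for tile in (16, 32, 64, 128)}
for tile, blocks in fits.items():
    print(f"tile {tile:>3}: {blocks} block(s) per SM")
```

Larger tiles reuse more data per load from global memory but leave fewer blocks resident on each SM, so the best tile size is a trade-off rather than simply the largest one that fits.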
