Memory Usage for 1.5B Model Training
Approximate breakdown of the ~77GB used when training a 1.5B-parameter model:
| Component | Memory |
|---|---|
| Weights (bf16) | ~3GB |
| Adam states (m and v, fp32) | ~12GB |
| Gradients (fp32) | ~6GB |
| Activations (batch=16, seq=1024) | ~56GB |
| Total | ~77GB |
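
The parameter-tied pieces (weights, optimizer state, gradients) can be checked with a few lines of arithmetic. This is a minimal sketch using the assumptions from the table: 1.5e9 parameters, bf16 weights, fp32 Adam state, fp32 gradients.

```python
# Back-of-envelope check of the parameter-tied memory in the table above.
# Assumes 1.5e9 parameters, bf16 weights, fp32 Adam state (m and v), fp32 gradients.
GB = 1e9

n_params = 1.5e9

weights_bf16 = n_params * 2          # 2 bytes per bf16 weight
adam_states  = n_params * 4 * 2      # m and v, 4 bytes each (fp32)
grads_fp32   = n_params * 4          # 4 bytes per fp32 gradient

for name, nbytes in [("weights (bf16)", weights_bf16),
                     ("Adam states (m, v)", adam_states),
                     ("gradients (fp32)", grads_fp32)]:
    print(f"{name:20s} ~{nbytes / GB:.0f} GB")

total = weights_bf16 + adam_states + grads_fp32
print(f"{'parameter-tied total':20s} ~{total / GB:.0f} GB")
```

That comes to ~21GB of parameter-tied state; everything above that in the 77GB total is activations.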
Activations dominate, and that's normal: each layer's intermediate outputs have to be kept for every token in every sequence until the backward pass, so they add up fast. Around 56GB of activations is expected for a 1.5B model with seq=1024 and batch=16.
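
To make that scale concrete, here is a rough per-sequence and per-token cost derived by simply dividing the reported 56GB by the batch and sequence dimensions from the table:

```python
# Derive per-sequence and per-token activation cost from the reported ~56GB figure.
# Assumes batch=16 and seq=1024 as stated above.
GB = 1e9
MB = 1e6

activations_total = 56 * GB
batch, seq = 16, 1024

per_sequence = activations_total / batch   # activation memory per sequence
per_token = per_sequence / seq             # activation memory per token

print(f"per sequence: ~{per_sequence / GB:.1f} GB")  # ~3.5 GB
print(f"per token:    ~{per_token / MB:.1f} MB")     # ~3.4 MB
```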
You could technically fit a bigger batch since you have ~66GB of headroom, but activation memory scales roughly linearly with batch size, so every additional 16 sequences costs about another 56GB, and temporary buffers and allocator fragmentation eat into whatever margin is left. The current ~77GB is a solid sweet spot for this GPU.
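
A quick sanity check of that headroom claim, assuming activations scale linearly with batch size and everything else stays fixed. The GPU capacity below is an assumption inferred from the numbers above (77GB in use plus ~66GB headroom, i.e. ~143GB); treat it as illustrative.

```python
# Estimate total memory at larger batch sizes, assuming activations scale
# linearly with batch size while parameter-tied state stays fixed.
GB = 1e9

static = 21 * GB                 # weights + Adam states + gradients (from the table)
activations_per_seq = 3.5 * GB   # 56GB / batch of 16
gpu_capacity = 143 * GB          # assumed: 77GB in use + ~66GB reported headroom

for batch in (16, 24, 32, 40):
    total = static + batch * activations_per_seq
    status = "fits" if total < gpu_capacity else "OOM"
    print(f"batch={batch:3d}  ~{total / GB:.0f} GB  ({status})")
```

Under these assumptions, batch 32 nominally fits at ~133GB, but that leaves only ~10GB for temporary buffers and fragmentation, which is why pushing the batch size up is risky in practice.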