Memory Math for GPT-2 Medium on GPU | Generated by AI


Yes, GPT-2 medium (350M parameters) fits on a 12 GB RTX 4070, but it’s tight and slow.

Memory math (bf16 mixed precision, AdamW), roughly:

bf16 weights: 350M × 2 bytes ≈ 0.7 GB
bf16 gradients: 350M × 2 bytes ≈ 0.7 GB
fp32 AdamW master weights: 350M × 4 bytes ≈ 1.4 GB
fp32 AdamW moments (m and v): 350M × 8 bytes ≈ 2.8 GB
activations at batch_size = 1, block_size = 1024: ~1–2 GB without checkpointing
CUDA context + allocator overhead: ~0.5–1 GB

Total: roughly 7–9 GB of the 12 GB, so it fits, but with little headroom.
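As a sanity check, here is a minimal back-of-the-envelope script for the numbers above. The per-parameter byte counts follow standard bf16 mixed precision with fp32 AdamW state; the activation and overhead figures are rough assumptions, not measurements:

# Back-of-the-envelope VRAM budget for GPT-2 medium (~350M params).
# Byte counts: bf16 weights/grads (2 B each), fp32 master copy (4 B),
# fp32 AdamW moments m and v (8 B). Activation/overhead GB are guesses.
n_params = 350e6
GB = 1e9

weights = n_params * 2 / GB   # ~0.7 GB
grads = n_params * 2 / GB     # ~0.7 GB
master = n_params * 4 / GB    # ~1.4 GB
moments = n_params * 8 / GB   # ~2.8 GB
activations = 1.5             # assumed: batch 1, block 1024, no checkpointing
overhead = 0.75               # assumed: CUDA context + allocator slack

total = weights + grads + master + moments + activations + overhead
print(f"estimated peak: {total:.1f} GB of 12 GB")  # prints roughly 8 GB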

Required config tweaks (vs your current):

n_layer = 24
n_head = 16
n_embd = 1024

batch_size = 1                  # was 4
gradient_accumulation_steps = 512  # was 128 → keeps ~524K tokens/step
learning_rate = 3e-4            # GPT-3 medium uses lower LR than small
min_lr = 3e-5
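A quick check of the tokens-per-step claim; block_size = 1024 is an assumption here (nanoGPT’s GPT-2 default), so adjust if your config differs:

# Effective tokens per optimizer step = batch_size * grad_accum * block_size.
batch_size = 1
gradient_accumulation_steps = 512
block_size = 1024  # assumed; matches GPT-2's context length
tokens_per_step = batch_size * gradient_accumulation_steps * block_size
print(tokens_per_step)  # 524288, i.e. ~524K, same as the old 4 * 128 * 1024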

Caveats:

Speed: with batch_size = 1 and gradient_accumulation_steps = 512, every optimizer step runs 512 sequential forward/backward passes, so wall-clock training on a 4070 will be slow.
Headroom: the estimate leaves only a few GB spare, and a fragmentation spike can still OOM. Gradient checkpointing or a smaller block_size buys margin at some speed cost.
Trust measurements over estimates: verify the actual peak on your machine, as in the sketch below.
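A minimal sketch for that last point, using PyTorch’s standard memory counters; step_fn is a hypothetical zero-argument callable that runs one full training step:

import torch

def report_peak_vram(step_fn):
    # step_fn is assumed to run one complete training step: forward and
    # backward over every grad-accumulation micro-batch, then optimizer.step().
    torch.cuda.reset_peak_memory_stats()
    step_fn()
    torch.cuda.synchronize()  # make sure all kernels have finished
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"peak allocated: {peak_gb:.2f} GB")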

If you want, I can write a ready-to-run config/train_fineweb_gpt2_medium.py. Want me to?

