Storage Needs for GPT-3 Training | Generated by AI

2026.06

Question: For training GPT-3 scale from scratch, how large a volume do I need for everything — raw text, tokenized data, and all checkpoints?

Answer:

GPT-3 Model Size First

GPT-3: 175B parameters
float32:  175B × 4 bytes = 700GB per checkpoint
float16:  175B × 2 bytes = 350GB per checkpoint
bfloat16: 175B × 2 bytes = 350GB per checkpoint

With AdamW optimizer states (float32 master copy + moments):

float32 master weights:  700GB
gradients:               700GB
momentum (m):            700GB
variance (v):            700GB
total:                  ~2.8TB per training checkpoint
                        (~2.1TB if the checkpoint omits gradients, which usually are not saved)
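The arithmetic above can be reproduced with a short sketch. It assumes 4 bytes per parameter for each of the four fp32 components listed and uses decimal gigabytes (1 GB = 1e9 bytes), as the figures above do. The function name `training_state_gb` is made up for illustration.

```python
# Bytes per parameter for each piece of the fp32 AdamW training state,
# mirroring the breakdown above (decimal GB, 1 GB = 1e9 bytes).
BYTES_PER_PARAM = {
    "master_weights": 4,
    "gradients": 4,
    "momentum": 4,
    "variance": 4,
}


def training_state_gb(n_params: float) -> dict:
    """Return the size in GB of each component, plus their sum under 'total'."""
    sizes = {name: n_params * nbytes / 1e9 for name, nbytes in BYTES_PER_PARAM.items()}
    sizes["total"] = sum(sizes.values())
    return sizes


if __name__ == "__main__":
    # 175B parameters, as in the GPT-3 figures above.
    for name, gb in training_state_gb(175e9).items():
        print(f"{name:>15}: {gb:,.0f} GB")
```

Passing a different parameter count (for example `6.7e9`) gives the same breakdown for the smaller models discussed later.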

Reality Check — You Cannot Train True GPT-3 on MI300X

MI300X HBM:     192GB
GPT-3 fp16:     350GB minimum (weights alone)
                ──────────────────────────────
                doesn't fit in a single GPU

Even with model parallelism across multiple MI300X GPUs, a single droplet won’t cut it. True GPT-3 training requires a multi-node cluster.

What You Probably Mean — GPT-3 Small/Medium Scale

OpenAI’s GPT-3 paper defines several sizes:

Model         Params   Fits MI300X?
GPT-3 Small   125M     ✅ trivially
GPT-3 Medium  350M     ✅
GPT-3 Large   760M     ✅
GPT-3 XL      1.3B     ✅
GPT-3 2.7B    2.7B     ✅
GPT-3 6.7B    6.7B     ✅
GPT-3 13B     13B      ✅ tight (full fp32 AdamW state is ~208GB, so it needs offloading or a leaner optimizer)
GPT-3 175B    175B     ❌ needs ~50x MI300X
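As a rough cross-check of the table, the sketch below compares each model's fp32 AdamW training state against the 192GB of HBM. It assumes 16 bytes per parameter (weights, gradients, momentum, variance at 4 bytes each) and ignores activations and framework overhead, so positive headroom is necessary but not sufficient. Under this assumption 13B comes out at about 208GB, which is why it only fits with a leaner optimizer state or offloading. The helper names are illustrative.

```python
HBM_GB = 192          # MI300X memory, as quoted in the text
BYTES_PER_PARAM = 16  # assumption: fp32 weights + grads + AdamW m and v, 4 bytes each

# Parameter counts for the GPT-3 family listed in the table.
GPT3_FAMILY = {
    "GPT-3 Small": 125e6,
    "GPT-3 Medium": 350e6,
    "GPT-3 Large": 760e6,
    "GPT-3 XL": 1.3e9,
    "GPT-3 2.7B": 2.7e9,
    "GPT-3 6.7B": 6.7e9,
    "GPT-3 13B": 13e9,
    "GPT-3 175B": 175e9,
}


def state_gb(n_params: float) -> float:
    """Training state in decimal GB, excluding activations."""
    return n_params * BYTES_PER_PARAM / 1e9


def headroom_gb(n_params: float, hbm_gb: float = HBM_GB) -> float:
    """HBM left over for activations; negative means the state alone does not fit."""
    return hbm_gb - state_gb(n_params)


if __name__ == "__main__":
    for name, n_params in GPT3_FAMILY.items():
        print(f"{name:<13} state {state_gb(n_params):8.1f} GB  "
              f"headroom {headroom_gb(n_params):9.1f} GB")
```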

Realistic Target: GPT-3 6.7B on MI300X

6.7B × 4 bytes (fp32) = 26.8GB weights
AdamW full states:
  weights:    26.8GB
  grads:      26.8GB
  momentum:   26.8GB
  variance:   26.8GB
  total:     ~107GB  ← fits in 192GB HBM3 ✅

Volume Size Breakdown for 6.7B

Item                      Size     Notes
Raw text (Pile-scale)     800GB    The Pile is 825GB
Tokenized .bin            400GB    ~half after BPE tokenization
Checkpoint fp32+AdamW     107GB    full training state, per checkpoint
Keep last 3 checkpoints   321GB    3 × 107GB
Best checkpoint fp16      13GB     inference only
Logs, misc                5GB
Total                     ~1.5TB   800 + 400 + 321 + 13 + 5 = 1539GB
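The breakdown above can be turned into a small estimator. This is a minimal sketch that reuses the table's assumptions: tokenized data is about half the raw text, a full checkpoint is 16 bytes per parameter, the best checkpoint is kept in fp16 at 2 bytes per parameter, and the last three full checkpoints stay on disk. The function name and its defaults are illustrative, not part of any tool.

```python
def estimate_volume_gb(
    n_params: float,
    raw_text_gb: float = 800.0,
    keep_last: int = 3,
    misc_gb: float = 5.0,
) -> dict:
    """Estimate volume usage in decimal GB from the table's assumptions."""
    full_ckpt_gb = n_params * 16 / 1e9  # fp32 weights + grads + AdamW moments
    items = {
        "raw_text": raw_text_gb,
        "tokenized": raw_text_gb / 2,         # ~half after BPE tokenization
        "kept_checkpoints": keep_last * full_ckpt_gb,
        "best_fp16": n_params * 2 / 1e9,      # inference-only copy
        "misc": misc_gb,
    }
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    for name, gb in estimate_volume_gb(6.7e9).items():
        print(f"{name:>17}: {gb:8.1f} GB")
```

For 6.7B this lands at roughly 1.5TB, in line with the table. Setting `raw_text_gb` lower shows how much of the budget is dataset rather than checkpoints.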

Volume Recommendation by Target Model

Target              Volume Size   Cost at rest/month
GPT-3 Small (125M)  200GB         $21/mo
GPT-3 XL (1.3B)     300GB         $32/mo
GPT-3 6.7B          1.5TB         $162/mo
GPT-3 13B           2.5TB         $270/mo
GPT-3 175B          forget it     need cluster

My Recommendation for You

Given your MI300X (192GB HBM3), target GPT-3 6.7B — it’s the largest model that fits comfortably with full AdamW states, and it’s a meaningful scale jump from GPT-2 XL (1.5B).

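Volume: 1TB initially
        (the ~1.5TB breakdown includes 800GB of raw text; the remaining ~740GB fits in 1TB
        if the raw text is deleted after tokenization)
Cost:   $0.00015/GB/hr × 1000GB = $0.15/hr at rest
        $0.15 × 24 × 30         = ~$108/month idle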

If budget is tight, start with GPT-3 2.7B on a 500GB volume (~$54/month idle) — still a serious model, half the storage cost.
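The idle-cost figures follow from a single rate. The sketch below assumes the $0.00015 per GB per hour rate quoted above and a 30-day month; the constant and function names are illustrative, and actual provider pricing may differ.

```python
RATE_PER_GB_HOUR = 0.00015  # USD per GB per hour, the rate used in the text
HOURS_PER_MONTH = 24 * 30   # 30-day month, as in the calculation above


def monthly_idle_cost(volume_gb: float) -> float:
    """Cost in USD of keeping a volume of the given size for one month."""
    return volume_gb * RATE_PER_GB_HOUR * HOURS_PER_MONTH


if __name__ == "__main__":
    for size_gb in (500, 1000, 1500, 2500):
        print(f"{size_gb:>5} GB: ${monthly_idle_cost(size_gb):,.0f}/month")
```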

Practical Config

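# nanoGPT config for 6.7B (GPT-3 6.7B shape: 32 layers, 32 heads, d_model 4096)
n_layer = 32
n_head = 32
n_embd = 4096
block_size = 2048  # context length in tokens

out_dir = '/mnt/data/checkpoints'
# with MI300X 192GB, can do decent batch size
batch_size = 16  # sequences per micro-batch
gradient_accumulation_steps = 40  # effective batch = 16 × 40 = 640 sequences (~1.3M tokens)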
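A quick sanity check that this shape really is about 6.7B parameters, and how many tokens one optimizer step consumes. The sketch uses the common 12 × n_layer × n_embd² approximation for the transformer blocks (biases and LayerNorm ignored) and assumes the GPT-2 BPE vocabulary of 50257 tokens with tied input and output embeddings. Those two assumptions are not stated in the config above.

```python
def gpt_param_count(n_layer: int, n_embd: int,
                    vocab_size: int = 50257, block_size: int = 2048) -> int:
    """Approximate parameter count of a GPT-style decoder.

    Each block has ~4*d^2 attention weights and ~8*d^2 MLP weights.
    Token and position embeddings are added once (tied output head assumed).
    """
    blocks = 12 * n_layer * n_embd ** 2
    embeddings = (vocab_size + block_size) * n_embd
    return blocks + embeddings


def tokens_per_step(batch_size: int, grad_accum_steps: int, block_size: int) -> int:
    """Tokens consumed per optimizer step on a single GPU."""
    return batch_size * grad_accum_steps * block_size


if __name__ == "__main__":
    n = gpt_param_count(n_layer=32, n_embd=4096)
    print(f"parameters: {n / 1e9:.2f}B")
    print(f"tokens per optimizer step: {tokens_per_step(16, 40, 2048):,}")
```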

Bottom Line

“GPT-3” as OpenAI built it = not feasible solo. But GPT-3 6.7B on your MI300X is totally legit, meaningful research-grade training, and that needs a 1TB volume at ~$108/month idle — or spin it down and snapshot just the checkpoints when not training.
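The "keep last 3 checkpoints" budget only holds if older checkpoints are actually deleted. A minimal sketch of that rotation follows, assuming each checkpoint is written to its own file matching `ckpt_*.pt` in `out_dir`. That naming is hypothetical: a stock training script may overwrite a single file instead, in which case no pruning is needed.

```python
from pathlib import Path


def prune_checkpoints(out_dir, keep_last: int = 3, pattern: str = "ckpt_*.pt"):
    """Delete all but the `keep_last` most recently modified checkpoints.

    Returns the list of deleted paths. The filename pattern is an assumption.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    # Newest first, by modification time.
    ckpts = sorted(Path(out_dir).glob(pattern),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    stale = ckpts[keep_last:]
    for path in stale:
        path.unlink()
    return stale
```

Calling `prune_checkpoints('/mnt/data/checkpoints')` after each save would cap checkpoint usage at three full training states.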

