DigitalOcean Storage for GPT-3 XL Training | Generated by AI

2026.06

Question: For training GPT-3 XL (1.3B) on 30B tokens, how large a DigitalOcean block storage volume do I need?

Answer:

Model Checkpoint Size

1.3B parameters
fp32 weights:     1.3B × 4 bytes = 5.2GB
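Training state (same size as the weights, each):
  grads:          5.2GB  (usually not saved in a checkpoint; counted here as margin)
  momentum m:     5.2GB  (AdamW)
  variance v:     5.2GB  (AdamW)
total per ckpt:   ~21GB  (upper bound; ~15.6GB without grads)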

Keep last 3 checkpoints:

21GB × 3 = 63GB
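
The arithmetic above can be reproduced with a short sketch. The function name `checkpoint_gb` and its parameters are illustrative and do not come from any training framework. It assumes fp32 (4 bytes) for every tensor, decimal gigabytes, and four equally sized copies (weights, grads, m, v) as in the breakdown above, so it gives an upper bound.

```python
def checkpoint_gb(n_params: float, bytes_per_value: int = 4, copies: int = 4) -> float:
    """Approximate checkpoint size in decimal GB (1 GB = 1e9 bytes).

    copies=4 mirrors the breakdown above: weights, grads, momentum m, variance v.
    Pass copies=3 to leave the gradients out.
    """
    return n_params * bytes_per_value * copies / 1e9


per_ckpt = checkpoint_gb(1.3e9)          # one full checkpoint
last_three = 3 * per_ckpt                # retention policy: keep the last 3

print(f"per checkpoint: {per_ckpt:.1f} GB")
print(f"last 3 kept:    {last_three:.1f} GB")
```

The unrounded figures come out slightly below the rounded 21GB and 63GB used in the text, which is fine for sizing a volume.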

Data Size

30B tokens, GPT-2 BPE, stored as uint16:

30B tokens × 2 bytes = 60GB
+ raw text (before tokenization): ~80GB

If you delete raw text after tokenizing (recommended):

tokens only: 60GB
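
A minimal sketch of why uint16 is enough and how the 60GB figure follows. The helper name `token_storage_gb` is hypothetical; the only external fact it relies on is the GPT-2 BPE vocabulary size of 50,257, which fits below the uint16 limit of 65,535.

```python
import struct

GPT2_VOCAB_SIZE = 50257          # GPT-2 BPE vocabulary size
UINT16_MAX = 2**16 - 1           # largest token id a uint16 can hold


def token_storage_gb(n_tokens: float, vocab_size: int = GPT2_VOCAB_SIZE) -> float:
    """Size in decimal GB of n_tokens token ids stored as uint16."""
    if vocab_size - 1 > UINT16_MAX:
        raise ValueError("vocabulary does not fit in uint16; use a wider dtype")
    bytes_per_token = struct.calcsize("<H")   # unsigned 16-bit integer = 2 bytes
    return n_tokens * bytes_per_token / 1e9


tokens_gb = token_storage_gb(30e9)
print(f"tokenized data: {tokens_gb:.0f} GB")
```

A tokenizer with a vocabulary above 65,536 entries would need 4 bytes per token, doubling this figure.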

Full Breakdown

Item                          Size     Keep?
Raw text (download)           80GB     delete after tokenizing
Tokenized train.bin           57GB     keep
Tokenized val.bin             3GB      keep
Checkpoints (last 3)          63GB     keep
Best checkpoint fp16          2.6GB    keep
Logs / wandb / misc           3GB      keep
Total (after deleting raw)    ~129GB
Total (keeping raw)           ~209GB
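
The two totals can be checked by summing the rows. This is only a sketch of the table's arithmetic; the dictionary keys are labels copied from the table, not file names you must use.

```python
# Sizes in GB, taken from the breakdown table above.
kept_items_gb = {
    "Tokenized train.bin": 57.0,
    "Tokenized val.bin": 3.0,
    "Checkpoints (last 3)": 63.0,
    "Best checkpoint fp16": 2.6,
    "Logs / wandb / misc": 3.0,
}
RAW_TEXT_GB = 80.0  # deleted after tokenizing in the recommended setup

total_without_raw = sum(kept_items_gb.values())
total_with_raw = total_without_raw + RAW_TEXT_GB

print(f"after deleting raw: ~{round(total_without_raw)} GB")
print(f"keeping raw:        ~{round(total_with_raw)} GB")
```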

Recommendation

200GB volume

Cost at rest:
$0.00015/GB/hr × 200GB = $0.03/hr
$0.03 × 24 × 30  = $21.60/month idle

During training (5 days):
Volume cost ($0.03/hr) is negligible vs $2/hr GPU, about 1.5% of the GPU rate
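
The cost figures can be recomputed with a few lines. This sketch uses only the rates quoted in this answer ($0.00015 per GB per hour for the volume, $2/hr for the GPU); treat them as assumptions and check current pricing before budgeting. The function name `volume_cost` is illustrative.

```python
VOLUME_RATE_PER_GB_HOUR = 0.00015   # USD, rate quoted above (assumption)
GPU_RATE_PER_HOUR = 2.0             # USD, rate quoted above (assumption)


def volume_cost(size_gb: float, hours: float) -> float:
    """Cost in USD of keeping a volume of size_gb attached for the given hours."""
    return size_gb * VOLUME_RATE_PER_GB_HOUR * hours


idle_month = volume_cost(200, 24 * 30)       # 30-day month, volume only
training_hours = 24 * 5                      # 5-day training run
training_volume = volume_cost(200, training_hours)
training_gpu = GPU_RATE_PER_HOUR * training_hours

print(f"idle month:       ${idle_month:.2f}")
print(f"training, volume: ${training_volume:.2f}")
print(f"training, GPU:    ${training_gpu:.2f}")
```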

Why Not 150GB?

129GB used + headroom:
  mid-training temp files: ~5GB
  pip packages, conda env: ~8GB
  unexpected data growth:  ~10GB
  ──────────────────────────────
  safe floor: ~152GB  ← already over 150GB

200GB gives ~70GB breathing room ✅
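
To keep an eye on that headroom during a run, free space can be checked from Python before each checkpoint is written. This is a sketch using the standard library's `shutil.disk_usage`; the helper names, the 21GB checkpoint size, and the 10GB margin are assumptions drawn from the numbers above, and the path `"."` stands in for wherever the volume is mounted.

```python
import shutil


def free_gb(path: str = ".") -> float:
    """Free space in decimal GB on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path: str, needed_gb: float, margin_gb: float = 10.0) -> bool:
    """True if path has at least needed_gb plus a safety margin available."""
    return free_gb(path) >= needed_gb + margin_gb


# Example: check before writing one ~21GB checkpoint.
if has_room(".", needed_gb=21.0):
    print("enough space for the next checkpoint")
else:
    print("low on space: delete an old checkpoint or resize the volume")
```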

Bottom Line

200GB volume
$21.60/month idle
$3.60 for 5-day training period (volume portion: $0.03/hr × 120 hr)
Total training cost: ~$240 GPU + $3.60 volume = ~$244

Coincidentally, this is the same recommendation as for GPT-2 XL: 200GB is the sweet spot for this scale of experiment.

