DigitalOcean Storage for GPT-3 XL Training

Generated by AI

2026.06

Question: For training GPT-3 XL (1.3B) on 30B tokens, how large a DigitalOcean block storage volume do I need?

Answer:

Model Checkpoint Size

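1.3B parameters
fp32 weights:     1.3B × 4 bytes = 5.2GB
gradients:        5.2GB  (not AdamW state and usually not saved; counted here as a conservative margin)
AdamW states:
  momentum m:     5.2GB
  variance v:     5.2GB
total per ckpt:   4 × 5.2GB = 20.8GB ≈ 21GB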

Keep last 3 checkpoints:

21GB × 3 = 63GB
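
The checkpoint arithmetic above can be sketched in a few lines of Python. This is only an estimate under the same assumptions as the text: decimal gigabytes, fp32 values, and four full-size copies (weights, gradients, and the two AdamW moment buffers). The function name and its parameters are hypothetical, not part of any library.

```python
GB = 1e9  # decimal gigabytes, matching the figures above


def checkpoint_size_gb(n_params, bytes_per_value=4, copies=4):
    """Estimate the size of one checkpoint in GB.

    copies=4 mirrors the estimate above: weights, gradients,
    AdamW momentum, and AdamW variance, all stored in fp32.
    """
    return n_params * bytes_per_value * copies / GB


per_ckpt = checkpoint_size_gb(1.3e9)
print(f"per checkpoint:  {per_ckpt:.1f} GB")
print(f"last 3 kept:     {3 * per_ckpt:.1f} GB")

# If gradients are not saved, only three copies remain (assumption).
print(f"without grads:   {checkpoint_size_gb(1.3e9, copies=3):.1f} GB")
```

The unrounded figure for three checkpoints is 62.4GB; rounding each checkpoint up to 21GB first gives the 63GB used here.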

Data Size

30B tokens, GPT-2 BPE, stored as uint16:

30B tokens × 2 bytes = 60GB
+ raw text (before tokenization): ~80GB

If you delete raw text after tokenizing (recommended):

tokens only: 60GB
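
A small sketch of why uint16 is enough and how the 60GB figure falls out. The GPT-2 BPE vocabulary has 50,257 entries, so every token id fits in 16 bits. The function name is hypothetical and the sizes are in decimal gigabytes, as in the text.

```python
GPT2_VOCAB_SIZE = 50257   # GPT-2 BPE vocabulary size
UINT16_MAX = 2**16 - 1    # largest value a uint16 can hold (65535)


def token_file_gb(n_tokens, bytes_per_token=2):
    """Estimate on-disk size in GB for tokens stored as fixed-width integers."""
    return n_tokens * bytes_per_token / 1e9


# The largest token id is vocab_size - 1, which must fit in uint16.
assert GPT2_VOCAB_SIZE - 1 <= UINT16_MAX

print(f"30B tokens as uint16: {token_file_gb(30e9):.0f} GB")
```

A tokenizer with more than 65,536 entries would need uint32, which doubles the token storage.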

Full Breakdown

Item                          Size      Keep?
Raw text (download)           80GB      delete after tokenizing
Tokenized train.bin           57GB      ✅
Tokenized val.bin             3GB       ✅
Checkpoints (last 3)          63GB      ✅
Best checkpoint fp16          2.6GB     ✅
Logs / wandb / misc           3GB       ✅
Total (after deleting raw)    ~129GB
Total (keeping raw)           ~209GB
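
The two totals can be checked by summing the rows. The sketch below simply copies the sizes from the breakdown; the variable names are illustrative.

```python
# Sizes in GB, copied from the breakdown above.
kept_items_gb = {
    "tokenized train.bin": 57,
    "tokenized val.bin": 3,
    "checkpoints (last 3)": 63,
    "best checkpoint fp16": 2.6,
    "logs / wandb / misc": 3,
}
RAW_TEXT_GB = 80  # only counted if the raw download is kept

total_without_raw = sum(kept_items_gb.values())
total_with_raw = total_without_raw + RAW_TEXT_GB

print(f"after deleting raw: {total_without_raw:.1f} GB")
print(f"keeping raw:        {total_with_raw:.1f} GB")
```

The sums come to 128.6GB and 208.6GB, which round to the ~129GB and ~209GB shown.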

Recommendation

200GB volume

Cost at rest:
$0.00015/GB/hr × 200GB = $0.03/hr
$0.03/hr × 24 hr × 30 days = $21.60/month idle

During training (5 days):
Volume cost negligible vs $2/hr GPU
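
A rough cost sketch using the rates quoted above. Both the $0.00015 per GB per hour volume rate and the $2/hr GPU rate are taken from this answer as assumptions, so check current pricing before relying on them. The function name is hypothetical.

```python
PRICE_PER_GB_HOUR = 0.00015  # USD, volume rate used above (assumption)
GPU_PRICE_PER_HOUR = 2.0     # USD, GPU rate used above (assumption)


def volume_cost(size_gb, hours, price_per_gb_hour=PRICE_PER_GB_HOUR):
    """Cost in USD of keeping a volume of size_gb provisioned for `hours`."""
    return size_gb * price_per_gb_hour * hours


hourly = volume_cost(200, 1)
monthly_idle = volume_cost(200, 24 * 30)
training_volume = volume_cost(200, 24 * 5)   # 5-day run
training_gpu = GPU_PRICE_PER_HOUR * 24 * 5   # same 5 days of GPU time

print(f"volume per hour:       ${hourly:.2f}")
print(f"volume per idle month: ${monthly_idle:.2f}")
print(f"volume for 5 days:     ${training_volume:.2f}")
print(f"GPU for 5 days:        ${training_gpu:.2f}")
```

Over the 5-day run the volume costs a few dollars against a few hundred for the GPU, which is why the volume size barely affects the training budget.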

Why Not 150GB?

129GB used + headroom:
  mid-training temp files: ~5GB
  pip packages, conda env: ~8GB
  unexpected data growth:  ~10GB
  ──────────────────────────────
  safe floor: ~152GB  ← already over 150GB

200GB leaves ~48GB above that floor (~71GB above the 129GB baseline) ✅
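
The headroom comparison can be written out as a quick check. The overhead figures are the estimates listed above, not measurements, and the variable names are illustrative.

```python
BASELINE_GB = 129  # total after deleting raw text, from the breakdown

# Estimated extra usage in GB, copied from the list above.
overheads_gb = {
    "mid-training temp files": 5,
    "pip packages, conda env": 8,
    "unexpected data growth": 10,
}

safe_floor = BASELINE_GB + sum(overheads_gb.values())

# Margin left on each candidate volume once the safe floor is used.
margins = {size: size - safe_floor for size in (150, 200)}

print(f"safe floor: {safe_floor} GB")
for size, margin in margins.items():
    verdict = "ok" if margin > 0 else "too small"
    print(f"{size} GB volume: {margin:+d} GB margin ({verdict})")
```

A 150GB volume ends up below the floor, while 200GB leaves room for the estimates to be wrong.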

Bottom Line

200GB volume
$21.60/month idle
$3.60 for 5-day training period (volume portion: $0.03/hr × 120 hr)
Total training cost: ~$240 GPU ($2/hr × 120 hr) + $3.60 volume = ~$244

Coincidentally, this is the same recommendation as for GPT-2 XL: 200GB is the sweet spot for this scale of experiment.
