Training GPT-3 on One MI300X | Generated by AI

2026.06

Question: How long would it take to train GPT-3 6.7B on a single MI300X?

Answer:

The Math from First Principles

GPT-3 6.7B was trained on 300B tokens (from the paper).

FLOPs per token ≈ 6 × N
where N = number of parameters
(roughly 2N for the forward pass plus 4N for the backward pass)

6 × 6.7B = 40.2 GFLOPs per token
300B tokens × 40.2 GFLOPs = 1.206 × 10^22 FLOPs total
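The arithmetic above can be reproduced in a few lines. This is a minimal sketch of the 6 × N rule of thumb; the function name `training_flops` is made up for illustration and is not from any library.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute via the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


N_PARAMS = 6.7e9   # GPT-3 6.7B
N_TOKENS = 300e9   # token count quoted above

flops_per_token = 6.0 * N_PARAMS
total_flops = training_flops(N_PARAMS, N_TOKENS)

print(f"{flops_per_token / 1e9:.1f} GFLOPs per token")
print(f"{total_flops:.3e} FLOPs total")
```

The rule ignores attention FLOPs, so treat the result as an estimate rather than an exact count.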

MI300X Real Performance

MI300X peak bf16:     ~1,300 TFLOPS (theoretical)
Realistic MFU:        30-45% for training
Effective throughput: ~400-600 TFLOPS

Let’s use 40% MFU = 520 TFLOPS:

1.206 × 10^22 FLOPs / (520 × 10^12 FLOPs/sec)
≈ 23,192,308 seconds
≈ 268 days
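The same division, written as a small helper so the peak throughput and MFU can be varied. This is a sketch under the numbers used above (about 1,300 TFLOPS peak, 40% MFU); `training_days` is an illustrative name, not an existing API.

```python
SECONDS_PER_DAY = 86_400


def training_days(total_flops: float, peak_tflops: float, mfu: float) -> float:
    """Wall-clock days = total FLOPs / (peak FLOP/s * MFU), converted to days."""
    effective_flops_per_sec = peak_tflops * 1e12 * mfu
    return total_flops / effective_flops_per_sec / SECONDS_PER_DAY


days_at_40 = training_days(1.206e22, peak_tflops=1300, mfu=0.40)
print(f"{days_at_40:.1f} days")
```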

By MFU Scenario

MFU    Effective TFLOPS    Days        Cost @ $2/hr
20%    260 TFLOPS          537 days    ~$25,776
30%    390 TFLOPS          358 days    ~$17,184
40%    520 TFLOPS          268 days    ~$12,864
50%    650 TFLOPS          215 days    ~$10,320
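The scenario table can be regenerated with a short loop, which makes it easy to plug in a different hourly price or peak figure. This is a sketch; the $2/hr rate and the 1,300 TFLOPS peak are the values assumed in this post, and the cost is computed from whole days to match the table.

```python
TOTAL_FLOPS = 1.206e22      # 6 * 6.7e9 params * 300e9 tokens
PEAK_TFLOPS = 1300          # approximate MI300X bf16 peak used above
USD_PER_HOUR = 2.0          # rental price assumed in the table


def scenario(mfu: float) -> tuple[float, int, float]:
    """Return (effective TFLOPS, whole days, cost in USD) for one MFU value."""
    effective_tflops = PEAK_TFLOPS * mfu
    days = round(TOTAL_FLOPS / (effective_tflops * 1e12) / 86_400)
    cost = days * 24 * USD_PER_HOUR   # priced from whole days, as in the table
    return effective_tflops, days, cost


rows = {mfu: scenario(mfu) for mfu in (0.20, 0.30, 0.40, 0.50)}
for mfu, (tflops, days, cost) in rows.items():
    print(f"{mfu:.0%}  {tflops:.0f} TFLOPS  {days} days  ${cost:,.0f}")
```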

Reality Check — What OpenAI Actually Used

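GPT-3 175B: 3.14 × 10^23 FLOPs, ~10,000 V100s, ~14 days (GPU count and duration are outside estimates, not figures from the paper)
GPT-3 6.7B: ~1/26th of that compute (same 300B tokens, ~26× fewer parameters)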

On 10,000 V100s: ~13 hours for 6.7B
On 1 MI300X:     ~268 days

So on a single MI300X you would be doing alone what took OpenAI a cluster of roughly 10,000 GPUs.
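As a sanity check on those cluster figures, one can back out the utilization they imply. This sketch assumes a V100 tensor-core peak of 125 TFLOPS, which does not appear in the text above; the other inputs are the numbers quoted in this section.

```python
GPT3_175B_FLOPS = 3.14e23    # training compute quoted above
N_GPUS = 10_000              # cluster size quoted above
TRAIN_DAYS = 14              # duration quoted above
V100_PEAK_TFLOPS = 125       # assumption: V100 fp16 tensor-core peak

cluster_peak = N_GPUS * V100_PEAK_TFLOPS * 1e12   # FLOP/s for the whole cluster
implied_mfu = GPT3_175B_FLOPS / (cluster_peak * TRAIN_DAYS * 86_400)

# Same cluster and same token count: time scales with parameter count.
hours_6_7b = TRAIN_DAYS * 24 * (6.7 / 175)

print(f"implied utilization: {implied_mfu:.0%}")
print(f"6.7B on the same cluster: {hours_6_7b:.1f} hours")
```

An implied utilization of roughly one fifth is plausible for a large 2020-era cluster, so the 14-day figure is at least self-consistent.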

Practical Options

Option A: Reduce Tokens (Most Practical)

Chinchilla optimal for 6.7B = ~134B tokens (20× params)
vs GPT-3's 300B tokens

134B tokens on MI300X @ 40% MFU:
= 1.206e22 × (134/300) / 520e12
= 120 days, ~$5,760

Still long. Go smaller:

10B tokens:  ~9 days,  ~$432   ← proof of concept
30B tokens:  ~27 days, ~$1,296 ← meaningful run
100B tokens: ~89 days, ~$4,272 ← serious run
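The token budgets above all follow from one formula, so they are easy to tabulate for any budget you have in mind. This is a sketch under the 40% MFU assumption; the 20 tokens per parameter ratio is the Chinchilla heuristic cited above, and `days_for_tokens` is an illustrative name.

```python
PARAMS = 6.7e9
EFFECTIVE_FLOPS_PER_SEC = 520e12   # 40% of ~1,300 TFLOPS, as above
USD_PER_HOUR = 2.0


def days_for_tokens(n_tokens: float) -> float:
    """Days to train the 6.7B model on n_tokens at the assumed throughput."""
    return 6.0 * PARAMS * n_tokens / EFFECTIVE_FLOPS_PER_SEC / 86_400


chinchilla_tokens = 20 * PARAMS    # ~20 tokens per parameter heuristic

budgets = {"10B": 10e9, "30B": 30e9, "100B": 100e9,
           "Chinchilla": chinchilla_tokens, "GPT-3": 300e9}
estimates = {name: days_for_tokens(tokens) for name, tokens in budgets.items()}

for name, days in estimates.items():
    print(f"{name:>10}: {days:6.1f} days, ~${days * 24 * USD_PER_HOUR:,.0f}")
```

The printed costs use unrounded days, so they can differ by a few dollars from figures computed from whole days.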

Option B: Smaller Model, Same Architecture

Model           Params    Time for 100B tokens    Cost @ $2/hr
GPT-3 Small     125M      ~2 days                 ~$96
GPT-3 Medium    350M      ~5 days                 ~$240
GPT-3 XL        1.3B      ~17 days                ~$816
GPT-3 6.7B      6.7B      ~89 days                ~$4,272
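Holding the token count at 100B and varying the model size gives the unrounded times behind this table. This is a sketch that applies a flat 40% MFU to every size, which is an assumption; smaller models often reach lower utilization in practice.

```python
EFFECTIVE_FLOPS_PER_SEC = 520e12   # 40% MFU on one MI300X, as assumed above
N_TOKENS = 100e9

MODELS = {
    "GPT-3 Small": 125e6,
    "GPT-3 Medium": 350e6,
    "GPT-3 XL": 1.3e9,
    "GPT-3 6.7B": 6.7e9,
}

# Days = 6 * N * D / effective FLOP/s, converted from seconds.
days_by_model = {
    name: 6.0 * n_params * N_TOKENS / EFFECTIVE_FLOPS_PER_SEC / 86_400
    for name, n_params in MODELS.items()
}

for name, days in days_by_model.items():
    print(f"{name:<13} {days:5.1f} days")
```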

Option C: Multiple MI300X Droplets

DigitalOcean doesn’t support multi-GPU droplets easily, but AMD Dev Cloud might have multi-GPU nodes. Assuming near-linear scaling, 8× MI300X would cut the full 300B-token run to ~34 days.
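The ~34-day figure assumes the eight GPUs scale perfectly. A quick sketch of how the estimate moves when they do not; `scaling_efficiency` is a hypothetical knob for illustration, and the 0.9 value is a placeholder rather than a measured number.

```python
SINGLE_GPU_DAYS = 268.4   # full 300B-token run on one MI300X at 40% MFU


def multi_gpu_days(n_gpus: int, scaling_efficiency: float = 1.0) -> float:
    """Days on n_gpus, where scaling_efficiency is the fraction of
    ideal linear speedup that survives communication overhead."""
    return SINGLE_GPU_DAYS / (n_gpus * scaling_efficiency)


ideal = multi_gpu_days(8)
lossy = multi_gpu_days(8, scaling_efficiency=0.9)   # placeholder efficiency

print(f"8 GPUs, ideal scaling: {ideal:.1f} days")
print(f"8 GPUs, 90% scaling:   {lossy:.1f} days")
```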

My Recommendation for You

Phase 1: GPT-3 Small (125M), 10B tokens
         ~4 hours, ~$8
         Goal: validate pipeline, data, checkpointing

Phase 2: GPT-3 XL (1.3B), 30B tokens  
         ~5 days, ~$240
         Goal: confirm loss curves, MFU, wandb tracking

Phase 3: GPT-3 6.7B, 30B tokens
         ~27 days, ~$1,296
         Goal: serious training run, publishable loss

Don’t jump straight to 6.7B + 300B tokens. Karpathy’s own advice: run small experiments first, confirm everything is working, then scale. A bug at day 25 of a 27-day run is a $1,200 mistake.
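The three phases can be priced with the same formula, which also shows how little the first two cost next to the third. This is a sketch under the 40% MFU and $2/hr assumptions used throughout; the totals use unrounded hours, so they differ slightly from the rounded per-phase figures.

```python
EFFECTIVE_FLOPS_PER_SEC = 520e12   # 40% MFU on one MI300X
USD_PER_HOUR = 2.0

# (label, parameters, training tokens) for each phase of the plan
PHASES = [
    ("Phase 1", 125e6, 10e9),
    ("Phase 2", 1.3e9, 30e9),
    ("Phase 3", 6.7e9, 30e9),
]


def phase_hours(n_params: float, n_tokens: float) -> float:
    """Hours of training via the 6 * N * D estimate at the assumed throughput."""
    return 6.0 * n_params * n_tokens / EFFECTIVE_FLOPS_PER_SEC / 3600


plan = [(name, phase_hours(p, t)) for name, p, t in PHASES]
total_cost = sum(hours * USD_PER_HOUR for _, hours in plan)

for name, hours in plan:
    print(f"{name}: {hours:7.1f} h, ~${hours * USD_PER_HOUR:,.0f}")
print(f"Total: ~${total_cost:,.0f}")

# Spend already sunk if a bug surfaces on day 25 of the last phase.
lost_at_day_25 = 25 * 24 * USD_PER_HOUR
```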

Bottom Line

Full GPT-3 6.7B (300B tokens) on 1× MI300X:
  ~268 days, ~$12,864  ← not practical alone

Practical target:
  GPT-3 1.3B, 30B tokens
  ~5 days, ~$240       ← this is your sweet spot

The goal isn’t to replicate OpenAI’s run. It’s to learn the full pipeline at meaningful scale — and 1.3B at 30B tokens does that at a price you can actually pay.

