FineWeb NanoGPT GPU Memory Guide


You’re training a small GPT (nanoGPT-style) on the FineWeb dataset (Hugging Face’s cleaned Common Crawl corpus; the sample-100BT subset used here is ~100B tokens), but you want to do it with limited GPU memory (probably 16–24 GB, like an RTX 3090 or 4090).

Your current config is already quite conservative and good for a single consumer GPU, but here’s a detailed guide on how to choose the best parameters for nanoGPT on FineWeb, ranked by importance and memory impact.

| GPU VRAM | Model size (params) | n_layer | n_head | n_embd | batch_size | grad_acc | effective BS | block_size | Notes |
|---|---|---|---|---|---|---|---|---|---|
| 80 GB (A100/H100) | ~124M | 12 | 12 | 768 | 64 | 8 | 512 | 1024 | GPT-2 small dimensions, trains well |
| 48 GB (A6000) | ~100M | 12 | 12 | 672 | 32 | 16 | 512 | 1024 | Very good compromise |
| 24 GB (4090/3090) | ~82M | 10 | 10 | 640 | 32 | 12–16 | 384–512 | 1024 | Your sweet spot |
| 16 GB (4080) | ~51M | 8 | 8 | 512 | 32 | 12 | 384 | 1024 | Stable |
| <16 GB (4070 Ti, etc.) | ~30M | 6 | 6 | 384 | 16–32 | 16–24 | 256–512 | 512–1024 | Your current config |

Your current config (6 layers, 6 heads, n_embd = 384; ~30M params) is safe but small. You can go larger.
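
The parameter counts in the table come from the standard back-of-the-envelope formula for GPT-2-style models: roughly 12·n_layer·n_embd² for the transformer blocks, plus embeddings. Here is a minimal sketch (estimate_params is an illustrative helper, and vocab_size=50304 assumes nanoGPT's padded GPT-2 vocabulary):

# Rough parameter count for a GPT-2-style model:
# each block is ~12·n_embd² params (4·d² attention + 8·d² MLP), plus embeddings.
def estimate_params(n_layer, n_embd, vocab_size=50304, block_size=1024):
    blocks = 12 * n_layer * n_embd ** 2
    token_emb = vocab_size * n_embd       # tied with the output head in nanoGPT
    pos_emb = block_size * n_embd
    return blocks + token_emb + pos_emb

print(f"{estimate_params(10, 640) / 1e6:.0f}M")   # ~82M  (24 GB config)
print(f"{estimate_params(12, 768) / 1e6:.0f}M")   # ~124M (GPT-2 small)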

Best Config for 24 GB GPU (RTX 4090 / 3090) on FineWeb

This is a solid, commonly used setup for this hardware:

out_dir = 'out-fineweb-82M'
eval_interval = 1000
eval_iters = 200
log_interval = 100
always_save_checkpoint = True

wandb_log = True
wandb_project = 'fineweb'
wandb_run_name = '82M-fineweb'

dataset = 'fineweb'
gradient_accumulation_steps = 16   # 32 * 16 = 512 effective batch size
batch_size = 32
block_size = 1024                  # important: keep the GPT-2-style 1024 context (see rule 1 below)

n_layer = 10
n_head = 10
n_embd = 640
dropout = 0.0                      # can try 0.1 later
learning_rate = 6e-4               # slightly higher for smaller models
max_iters = 50000                  # ≈26B tokens at this effective batch size
warmup_iters = 2000
lr_decay_iters = 50000
min_lr = 6e-5
beta2 = 0.99

→ This is ~82M parameters and trains comfortably on a 4090, with VRAM usage well below the 24 GB ceiling.
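
The warmup_iters, lr_decay_iters, and min_lr settings drive a linear-warmup-plus-cosine-decay schedule. The sketch below mirrors the get_lr logic in nanoGPT's train.py (details can differ slightly between versions):

import math

learning_rate, min_lr = 6e-4, 6e-5         # values from the config above
warmup_iters, lr_decay_iters = 2000, 50000

def get_lr(it):
    # 1) linear warmup from 0 up to learning_rate
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    # 2) past the decay horizon, hold at the floor
    if it > lr_decay_iters:
        return min_lr
    # 3) cosine decay from learning_rate down to min_lr
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return min_lr + coeff * (learning_rate - min_lr)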

Even Better: GPT-2-Small-Sized Model (~124M, if you have 24 GB+)

n_layer = 12
n_head = 12
n_embd = 768    # → ~124M params (original GPT-2 small size)
batch_size = 32
gradient_accumulation_steps = 16   # effective BS 512
block_size = 1024
learning_rate = 5e-4
max_iters = 60000

Many people train this successfully on a single 4090.

Key Rules of Thumb for FineWeb + nanoGPT

  1. block_size = 1024 is strongly recommended
    FineWeb documents are long-form web text, and 1024 matches the GPT-2 context length. Dropping to 512 hurts perplexity more than you might expect.

  2. Effective batch size ≈ 512 is the sweet spot
    LLaMA used ~4M tokens per batch; with nanoGPT, 512 sequences × 1024 tokens ≈ 0.5M tokens per step, a reasonable scale for models this size (see the gradient-accumulation sketch after this list).

  3. Learning rate
    • ~100M params: 6e-4 to 8e-4
    • ~350M params: 5e-4 to 6e-4
    • ~770M params: 3e-4
  4. Train for at least 50B tokens (better 100B+)
    With the ~82M model above:
    • 50k iters × 32 × 16 × 1024 tokens ≈ 26B tokens
      A good start; raise max_iters toward 100k–200k to pass 50B.
  5. Use the official FineWeb dataset (not OpenWebText)
    Your download script below is already correct.
    FineWeb is substantially higher quality than OpenWebText.
    Use the sample-100BT subset, or the full fineweb config truncated to fit your disk.

  6. Optional but helpful
    compile = True          # torch.compile (PyTorch ≥2.0), often ~20–30% faster
    bias = False            # no bias in Linear/LayerNorm layers (like LLaMA)
    weight_decay = 0.1
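
As promised after rule 2, here is a minimal sketch of one optimizer step with gradient accumulation. The names get_batch, model, and optimizer are assumed nanoGPT-style objects, not exact code from train.py:

import torch

# One effective batch: 16 micro-batches of 32 sequences = 512 sequences,
# i.e. 512 * 1024 ≈ 0.5M tokens per optimizer step.
gradient_accumulation_steps = 16
for micro_step in range(gradient_accumulation_steps):
    X, Y = get_batch('train')                        # assumed: returns (32, 1024) token tensors
    logits, loss = model(X, Y)                       # assumed: GPT forward returning (logits, loss)
    (loss / gradient_accumulation_steps).backward()  # scale so accumulated grads average, not sum
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # standard clipping at norm 1.0
optimizer.step()
optimizer.zero_grad(set_to_none=True)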
    

Quick Command to Download FineWeb (100B version)

pip install datasets
python -c "
from datasets import load_dataset
ds = load_dataset('HuggingFaceFW/fineweb', name='sample-100BT', split='train')
ds.save_to_disk('data/fineweb100B')
"

Then tokenize it into nanoGPT’s binary format before training: train.py memmaps flat train.bin / val.bin files of uint16 GPT-2 token IDs, so pointing it at the raw Arrow folder is not enough.
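
This is a minimal tokenization sketch modeled on nanoGPT's data/openwebtext/prepare.py; the real script shards and parallelizes the work, and the 10,000-document slice here is only for illustration:

import numpy as np
import tiktoken
from datasets import load_from_disk

enc = tiktoken.get_encoding('gpt2')
ds = load_from_disk('data/fineweb100B')

ids = []
for doc in ds.select(range(10_000)):      # tiny slice; tokenize everything for a real run
    toks = enc.encode_ordinary(doc['text'])
    toks.append(enc.eot_token)            # <|endoftext|> delimiter between documents
    ids.extend(toks)

# GPT-2 token IDs (< 50257) fit in uint16, which is what nanoGPT's train.py expects
np.array(ids, dtype=np.uint16).tofile('data/fineweb100B/train.bin')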

Summary: Upgrade Your Current Config

Replace your current model part with this (for 24 GB GPU):

n_layer = 10
n_head = 10
n_embd = 640
batch_size = 32
gradient_accumulation_steps = 16
block_size = 1024
learning_rate = 6e-4
max_iters = 50000

You’ll get a much stronger model with almost no extra cost.
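
To launch training, save the overrides above as a config file (config/train_fineweb.py is a hypothetical name) and pass it to nanoGPT's train.py:

python train.py config/train_fineweb.py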

People are getting very good 80M–124M models on FineWeb with these settings, often matching or beating the original GPT-2 checkpoints of the same size.

Let me know your exact GPU and I can give you the optimal config! 🚀

