GPT-2 Training Progress Report


Here’s the full picture:

When trained:

Training progress:

Model — GPT-2 (124M):

Param            Value
---------------  ----------------------------------
Architecture     Transformer (GPT-2)
Layers           12
Heads            12
Embedding dim    768
Context length   1024 tokens
Vocab size       50,304 (BPE)
Parameters       ~124M
Checkpoint size  1.4 GB (includes optimizer state)
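As a sanity check on the table, here is a minimal Python sketch (not the report's actual code) that reproduces the parameter count and checkpoint size from the hyperparameters above. The GPTConfig and param_count names are illustrative; the count assumes the output head shares weights with the token embedding, as in the original GPT-2, and that the 50,304 vocabulary is GPT-2's 50,257 BPE tokens padded up for efficiency.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    n_layer: int = 12          # Layers
    n_head: int = 12           # Heads
    n_embd: int = 768          # Embedding dim
    block_size: int = 1024     # Context length (tokens)
    vocab_size: int = 50_304   # BPE vocab, assumed padded up from GPT-2's 50,257

def param_count(c: GPTConfig) -> int:
    """Approximate GPT-2 parameter count, assuming the output head
    is weight-tied to the token embedding (as in the original GPT-2)."""
    d = c.n_embd
    embeddings = c.vocab_size * d + c.block_size * d      # wte + wpe
    # Per transformer block: 2 LayerNorms, fused qkv projection,
    # attention output projection, and a 4x-wide MLP (two matmuls).
    per_block = (2 * 2 * d) \
              + (3 * d * d + 3 * d) \
              + (d * d + d) \
              + (4 * d * d + 4 * d) \
              + (4 * d * d + d)
    return embeddings + c.n_layer * per_block + 2 * d     # + final LayerNorm

n = param_count(GPTConfig())
print(f"parameters: {n / 1e6:.1f}M")            # ~124.5M, i.e. the ~124M above
# fp32 weights (4 bytes) plus two AdamW moment buffers (8 bytes) = 12 B/param
print(f"checkpoint: {n * 12 / 2**30:.2f} GiB")  # ~1.39 GiB, i.e. the 1.4 GB above
```

The 12-bytes-per-parameter estimate matches the table's note that the checkpoint includes optimizer state: weights alone would be roughly 0.5 GB.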

Training config:

