NanoGPT Speedrun: Fast GPT-2 Training


This is the NanoGPT Speedrun repo, a collaborative optimization challenge to train GPT-2 as fast as possible on 8x H100 GPUs. The goal: reach a cross-entropy loss of 3.28 on the FineWeb validation set. The current record is under 90 seconds, down from 45 minutes for the original baseline.

The optimizations are extensive: rotary embeddings, QK-Norm, ReLU² activations, FP8 matmul, Flash Attention 3 with sliding window, skip connections, sparse attention gates, bigram hash embeddings, batch size/sequence length scheduling, and more. Each records/ entry documents what changed and the resulting speedup.
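To give a flavor of two of the simpler tricks above, here is a minimal sketch in plain Python (the actual repo implements these as fused PyTorch operations; the function names and shapes here are illustrative, not the repo's API). ReLU² replaces GELU in the MLP, and QK-Norm normalizes query/key vectors before the attention dot product to stabilize logit scale.

```python
import math

def relu_squared(x):
    """ReLU^2 activation: max(0, x)^2, applied elementwise to an MLP hidden vector."""
    return [max(0.0, v) ** 2 for v in x]

def qk_norm(v, eps=1e-6):
    """QK-Norm: L2-normalize a query or key vector before computing attention scores."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]

# Example: a normalized query has (approximately) unit length, so the
# attention logit q.k is bounded by the learned scale rather than head dim.
q = qk_norm([3.0, 4.0])        # ≈ [0.6, 0.8]
h = relu_squared([-1.0, 2.0])  # [0.0, 4.0]
```

In the repo these run on full tensors per attention head, but the math per vector is exactly this.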
