Modded NanoGPT Speedrun Benchmark | Generated by AI

2026.04

Question: Comprehensive introduction to the modded-nanogpt project

Answer:

modded-nanogpt is an open-source GitHub repository that serves as a high-performance “speedrun” benchmark for training a small GPT-style language model. The goal is to reach a specific performance target, a cross-entropy validation loss of 3.28 on the first 10,485,760 tokens of the FineWeb validation data, as quickly as possible on 8 NVIDIA H100 GPUs.
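Since the benchmark is defined by a single cross-entropy number (in nats), it can help to see what that target means in more familiar terms. A minimal stdlib sketch of the loss-to-perplexity conversion:

```python
import math

def perplexity(xent: float) -> float:
    """Convert a cross-entropy loss (in nats per token) to perplexity."""
    return math.exp(xent)

# The speedrun target of 3.28 nats corresponds to a perplexity of
# roughly e**3.28 on the FineWeb validation tokens.
print(round(perplexity(3.28), 1))  # -> 26.6
```

In other words, reaching the target means the model is, on average, about as uncertain as a uniform choice over ~27 tokens at each position.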

Origins and Lineage

The project builds directly on Andrej Karpathy’s educational work: it began as a variant of the PyTorch GPT-2 trainer from his llm.c repository, which in turn descends from his earlier nanoGPT.

The name “modded-nanogpt” reflects its evolution: heavy modifications (“modded”) to the original nanoGPT baseline for extreme wall-clock speed on modern hardware. It is maintained primarily by Keller Jordan, with contributions from a collaborative/competitive community.

Core Objective: The NanoGPT Speedrun

This is not a general-purpose training framework but a speedrunning challenge: the validation target, evaluation data, and hardware (8 H100s) are held fixed, and contributors compete to reach the target in the least wall-clock time. Each new record, along with the change that produced it, is tracked in the repository, so the codebase doubles as a running history of what actually made training faster.

It emphasizes wall-clock time over other metrics such as data efficiency or generalization, though many optimizations incidentally improve those as well.
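The rules above reduce to a simple loop: train until the fixed validation target is reached and report only the elapsed wall-clock time. A minimal sketch of that harness (the trainer, loss schedule, and function names here are hypothetical stand-ins, not the repo's actual timing code):

```python
import time

def speedrun(train_step, val_loss, target=3.28, max_steps=10_000):
    """Run training steps until the validation loss reaches the target,
    returning (steps_taken, wall_clock_seconds) -- the only metric the
    speedrun ranks on. A real run checks val loss periodically, not
    every step; this toy version checks after each step for clarity."""
    start = time.perf_counter()
    for step in range(1, max_steps + 1):
        train_step()
        if val_loss() <= target:
            return step, time.perf_counter() - start
    return max_steps, time.perf_counter() - start

# Toy stand-ins for a real trainer: the loss decays toward the target.
losses = iter([4.1, 3.7, 3.4, 3.27, 3.2])
steps, secs = speedrun(lambda: None, lambda: next(losses))
print(steps)  # stops on the first step whose val loss <= 3.28
```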

Key Innovations and Optimizations

The dramatic speed gains come from a combination of architectural, algorithmic, systems, and numerical improvements. Notable techniques include:

  1. The Muon optimizer, which orthogonalizes momentum updates for the hidden-layer weight matrices via a Newton-Schulz iteration.
  2. Architectural changes such as rotary positional embeddings, the ReLU² MLP activation, and QK-normalization in attention.
  3. Attention efficiency work, including FlexAttention with sliding-window patterns.
  4. Numerical tricks such as FP8 matrix multiplications and logit soft-capping.

These changes are aggressively tuned for the specific hardware and target. Not all generalize perfectly to larger models or different setups, but many have influenced downstream projects.
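As one illustration, the core of the Muon optimizer can be sketched in NumPy. The quintic coefficients below are the Newton-Schulz constants used in modded-nanogpt's implementation, but the function itself is a simplified standalone version, not the repo's code:

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a gradient matrix via the quintic
    Newton-Schulz iteration at the heart of the Muon optimizer."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + 1e-7)  # scale so singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T                          # iterate on the wide orientation
    for _ in range(steps):
        A = x @ x.T
        # In singular values this computes a*s + b*s**3 + c*s**5,
        # driving every singular value toward 1.
        x = a * x + (b * A + c * A @ A) @ x
    return x.T if transposed else x

rng = np.random.default_rng(0)
g = rng.standard_normal((16, 32))
o = newton_schulz_orthogonalize(g)
# After 5 iterations the singular values land roughly in [0.7, 1.2];
# exact orthogonality is not required for the optimizer to work well.
```

The appeal of this scheme is that it uses only matrix multiplications, which are fast and stable in low precision on GPUs, rather than an explicit (and much slower) SVD.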

How to Run It

The repo provides straightforward reproduction:

  1. Clone the repository: git clone https://github.com/KellerJordan/modded-nanogpt.git
  2. Build and run via Docker for the current record: sudo docker build -t modded-nanogpt . followed by running the container.
  3. Or execute scripts directly (e.g., speedrun.sh or train_gpt.py).

Extras such as wandb logging, benchmarking utilities, and text generation have been added or discussed in community forks and related threads.

Impact and Community Influence

The record history and the techniques behind it are tracked publicly, and several innovations first demonstrated here, most prominently the Muon optimizer, have been adopted well beyond the speedrun itself.

Educational Value

While optimized for speed rather than readability, the core train_gpt.py (and related files like Triton kernels) offers deep insights into modern LLM training. Walkthroughs exist online covering setup, FP8 ops, parallelism, attention mechanisms, and the optimizer. It is more complex than the original nanoGPT but remains far more accessible than production frameworks like Hugging Face Transformers.

In summary, modded-nanogpt transforms a simple educational GPT trainer into a cutting-edge optimization playground. It highlights the rapid progress possible in LLM training through open collaboration and low-level engineering, turning what was once a 45-minute task into a sub-3-minute feat on high-end hardware.
