GPT From Scratch in Pure Python | Generated by AI

2026.02

Question: Explain the provided Python script by Andrej Karpathy that implements a GPT model from scratch without dependencies.

Answer: This script is a “pedagogical masterpiece” that strips a Large Language Model (LLM) down to its barest essentials. It implements a Generative Pre-trained Transformer (GPT) using only the Python Standard Library—no PyTorch, no NumPy, and no TensorFlow.

Here is a breakdown of the core components that make this work:


1. The “Micro-Autograd” Engine (class Value)

At the heart of any deep learning model is backpropagation. The script implements it with a tiny autograd engine: each Value object wraps a single scalar, remembers the operation and operands that produced it, and a backward() call walks the resulting computation graph in reverse topological order, applying the chain rule to accumulate gradients.
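To make that concrete, here is a minimal sketch of such a Value class in the spirit of Karpathy's micrograd (only addition and multiplication shown; the actual script supports more operations):

```python
class Value:
    """A scalar that tracks its own gradient (micrograd-style sketch)."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to route gradient to children
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

Calling backward() on the loss then fills in .grad for every parameter, which is all the optimizer needs.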

2. The GPT Architecture

The model follows the GPT-2 design but swaps some components for simpler versions (e.g., RMSNorm in place of LayerNorm, and ReLU in place of GELU).
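RMSNorm is a good example of why these swaps simplify the code: it only rescales activations by their root-mean-square, with no mean subtraction and no bias term. A plain-Python sketch (the eps value here is an illustrative choice, not taken from the script):

```python
import math

def rmsnorm(xs, eps=1e-5):
    """Scale a list of floats so their root-mean-square is ~1."""
    ms = sum(x * x for x in xs) / len(xs)   # mean of squares
    scale = 1.0 / math.sqrt(ms + eps)       # eps guards against division by zero
    return [x * scale for x in xs]
```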

3. Training & Optimization

The script trains the model on a dataset of names (names.txt), learning to predict the next character of each name and updating the parameters by gradient descent on a cross-entropy loss.
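The data preparation for character-level training can be sketched as follows (using an inline list of names in place of reading names.txt; reserving token id 0 for the BOS/terminator token is an assumption for illustration):

```python
# Build a character-level vocabulary and (context, target) training pairs.
names = ["emma", "olivia", "ava"]   # stand-in for the contents of names.txt

chars = sorted(set("".join(names)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}  # id 0 reserved for BOS
itos = {i: ch for ch, i in stoi.items()}
BOS = 0

def encode(name):
    """BOS, then the character ids of the name, then BOS as terminator."""
    return [BOS] + [stoi[ch] for ch in name] + [BOS]

pairs = []
for name in names:
    ids = encode(name)
    for i in range(len(ids) - 1):
        pairs.append((ids[i], ids[i + 1]))  # predict the next character
```

Each name thus yields one training example per transition, including "start → first letter" and "last letter → stop".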

4. Inference (The “Babbling”)

Once trained, the model generates new text. It starts with a special BOS (Beginning of Sequence) token and asks the model for a probability distribution over what comes next. It then samples from that distribution, with a temperature setting controlling how adventurous the samples are, to create names that never existed in the original list.
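Temperature-scaled sampling can be sketched like this (the function name and signature are illustrative, not taken from the script): dividing the logits by the temperature before the softmax sharpens the distribution when temperature < 1 and flattens it when temperature > 1.

```python
import math
import random

def sample_next(logits, temperature=1.0):
    """Softmax the logits at a given temperature and sample one token id."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

Generation then loops: feed the tokens so far, sample the next one, and stop when the model emits the terminator token again.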


Summary Table: This Script vs. Industry Practice

Feature   This Script's Implementation     Industry Standard (e.g., GPT-4)
Math      Pure Python loops (slow)         GPU kernels / CUDA (fast)
Data      Single characters                Sub-word "tokens"
Scale     ~7,000 parameters                Trillions of parameters
Memory    Python lists                     Tensors (contiguous, hardware-friendly memory)

Would you like me to explain how the Attention mechanism specifically calculates the relationship between characters in this code?

