NanoGPT Checkpoint Format and Contents | Generated by AI

2026.04

Question: What is the format of ckpt.pt in nanoGPT and what does it contain (e.g., model parameters)?

Answer:

In nanoGPT, a file like:

out-fineweb/ckpt.pt

is a PyTorch checkpoint file saved using torch.save().

It is neither a plain-text file nor a weights-only file: it is a serialized Python dictionary that typically holds everything needed to resume training, and enough to rebuild the model for inference.


1. What format is ckpt.pt?

It is a binary file in PyTorch's pickle-based serialization format (recent PyTorch versions wrap the pickled data in a zip archive), created like:

torch.save(state, "ckpt.pt")

So internally it is a serialized Python dict, not a standardized model format like ONNX.

Because it is binary, cat will not show anything useful. Unpickling can execute arbitrary code, so only load checkpoints from sources you trust. Load it with PyTorch:

import torch

ckpt = torch.load("ckpt.pt", map_location="cpu")

2. What does nanoGPT store inside ckpt.pt?

In nanoGPT (Andrej Karpathy's implementation), a checkpoint typically contains a dictionary with this core structure:

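{
    "model": model_state_dict,          # parameter name -> tensor
    "optimizer": optimizer_state_dict,  # optimizer state and hyperparameters
    "model_args": {...},                # architecture hyperparameters
    "iter_num": int,                    # training iteration at save time
    "best_val_loss": float,             # lowest validation loss seen so far
    "config": {...},                    # training configuration values
}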
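To make the structure concrete, here is a minimal round-trip sketch. It does not use nanoGPT itself: a tiny nn.Linear stands in for the GPT model, and the values under "model_args" and "config" are hypothetical placeholders. It only illustrates that a dictionary of this shape survives torch.save() and torch.load(), assuming a reasonably recent PyTorch.

```python
import os
import tempfile
import zipfile

import torch
import torch.nn as nn

# Stand-in for the GPT model: any nn.Module exposes state_dict().
model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Take one optimizer step so that AdamW has per-parameter state to save.
model(torch.randn(3, 4)).sum().backward()
optimizer.step()

checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "model_args": {"in_features": 4, "out_features": 2},  # hypothetical placeholder
    "iter_num": 1,
    "best_val_loss": 1e9,
    "config": {"learning_rate": 1e-3},  # hypothetical placeholder
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ckpt.pt")
    torch.save(checkpoint, path)
    is_zip = zipfile.is_zipfile(path)  # default torch.save output is a zip archive
    loaded = torch.load(path, map_location="cpu")

top_level_keys = sorted(loaded.keys())
weight_names = sorted(loaded["model"].keys())
print(is_zip, top_level_keys, weight_names)
```

Running the same two inspection lines on a real ckpt.pt is a quick way to see which keys and weight names it actually holds.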

3. Main components explained

(1) Model parameters (weights)

"model": model.state_dict()

This is the most important part:

👉 These are the trained neural network parameters: a mapping from parameter names (for example transformer.wte.weight) to tensors.


(2) Optimizer state

"optimizer": optimizer.state_dict()

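Contains the AdamW optimizer's internal state: for each parameter, running averages of the gradient and of the squared gradient plus a step count, along with the hyperparameters of each parameter group (learning rate, betas, weight decay).

👉 Needed only if you want to resume training exactly where it stopped.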


(3) Training metadata

"iter_num": 12345
"best_val_loss": 2.31

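Used for restoring the iteration counter (and with it the position in the learning-rate schedule) when training resumes, and for deciding whether a later evaluation has improved on the best validation loss so far.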
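The role of these two values on resume can be sketched in plain Python. This is a simplified paraphrase of the kind of check a training loop performs, not nanoGPT's exact code; the helper name update_best and its always_save_checkpoint parameter are illustrative assumptions.

```python
def update_best(val_loss, best_val_loss, iter_num, always_save_checkpoint=False):
    """Return (new_best_val_loss, should_write_checkpoint).

    Hypothetical helper: a checkpoint is written when validation loss improves
    (or when saving is forced), but never at iteration 0.
    """
    if val_loss < best_val_loss or always_save_checkpoint:
        return val_loss, iter_num > 0
    return best_val_loss, False


# Metadata as it might come back from a loaded checkpoint.
checkpoint = {"iter_num": 12345, "best_val_loss": 2.31}
iter_num = checkpoint["iter_num"]            # training continues from here
best_val_loss = checkpoint["best_val_loss"]  # later evaluations compare against this

# A later evaluation reports a lower validation loss, so a new checkpoint is due.
best_val_loss, write = update_best(2.25, best_val_loss, iter_num)
print(best_val_loss, write)
```

Without the stored best_val_loss, a resumed run would have no baseline and could overwrite a good checkpoint with a worse one.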


(4) Model configuration

"model_args": {
    "n_layer": 12,
    "n_head": 12,
    "n_embd": 768,
    ...
}

Defines the architecture so that the model can be rebuilt exactly before the weights are loaded.
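As an illustration of how much these few numbers determine, the sketch below estimates the parameter count of a GPT-2 style model from its arguments. The function name count_gpt_params is hypothetical, and vocab_size, block_size, and bias are assumed values (the standard GPT-2 small settings), since the listing above elides the remaining fields. It also assumes the output head shares its weights with the token embedding.

```python
def count_gpt_params(n_layer, n_embd, vocab_size, block_size, bias=True):
    """Estimate the parameter count of a GPT-2 style transformer (hypothetical helper)."""
    d = n_embd
    b = 1 if bias else 0  # 1 when bias vectors are present, else 0

    embeddings = vocab_size * d + block_size * d           # token + position tables
    layer_norm = d + b * d                                 # scale, plus optional shift
    attention = (d * 3 * d + b * 3 * d) + (d * d + b * d)  # qkv projection + output projection
    mlp = (d * 4 * d + b * 4 * d) + (4 * d * d + b * d)    # expand to 4d, contract back to d
    per_block = 2 * layer_norm + attention + mlp

    # The output head is assumed to reuse the token embedding matrix (weight tying),
    # so it is not counted again. The trailing layer_norm is the final norm.
    return embeddings + n_layer * per_block + layer_norm


model_args = {
    "n_layer": 12,
    "n_head": 12,         # affects how attention is split, not the parameter count
    "n_embd": 768,
    "vocab_size": 50257,  # assumption: GPT-2 vocabulary size
    "block_size": 1024,   # assumption: GPT-2 context length
    "bias": True,         # assumption
}
n_params = count_gpt_params(
    n_layer=model_args["n_layer"],
    n_embd=model_args["n_embd"],
    vocab_size=model_args["vocab_size"],
    block_size=model_args["block_size"],
    bias=model_args["bias"],
)
print(f"{n_params:,}")
```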


4. What is NOT inside

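A nanoGPT checkpoint usually does NOT contain the training data, the tokenizer or vocabulary files, or the Python source code that defines the model class.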


5. Is it just “params”?

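It includes the parameters, but more precisely:

👉 It contains the model weights, the optimizer state, the model arguments, the iteration counter, the best validation loss, and the training config.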

So it is a full training snapshot, not just a model.


6. Typical size

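Depends on model size. As a rough guide, assuming float32 storage, the weights take 4 bytes per parameter and the two AdamW moment tensors add about 8 more, so a full checkpoint is roughly 12 bytes per parameter. A GPT-2 small sized model (about 124M parameters) therefore comes to roughly 1.5 GB.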
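A back-of-the-envelope estimate can be written down directly. The sketch below assumes float32 values and an AdamW-style optimizer that keeps two extra tensors per parameter; the function name estimate_ckpt_bytes and the 124,439,808 parameter count (a GPT-2 small sized model) are illustrative assumptions, and small items such as metadata are ignored.

```python
def estimate_ckpt_bytes(n_params, bytes_per_value=4, optimizer_slots=2):
    """Rough checkpoint size: weights plus per-parameter optimizer tensors."""
    weights = n_params * bytes_per_value
    optimizer = n_params * bytes_per_value * optimizer_slots
    return weights + optimizer


n_params = 124_439_808  # assumption: GPT-2 small sized model
total = estimate_ckpt_bytes(n_params)
weights_only = estimate_ckpt_bytes(n_params, optimizer_slots=0)

print(f"full checkpoint: {total / 1e9:.2f} GB")
print(f"weights only:    {weights_only / 1e9:.2f} GB")
```

The gap between the two figures is why a weights-only export is about a third of the size of a full training checkpoint.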


7. How it is used

Resume training:

python train.py --init_from=resume

Inference:

Load only the "model" weights into a model rebuilt from "model_args", then run generation.
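One practical detail when loading weights for inference: if the model was wrapped with torch.compile during training, the saved parameter names can carry an "_orig_mod." prefix, which has to be removed before load_state_dict() will accept them. A minimal sketch of that cleanup, using plain strings in place of tensors and a hypothetical helper name:

```python
def strip_compile_prefix(state_dict, prefix="_orig_mod."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Strings stand in for tensors so the sketch runs without PyTorch.
saved = {
    "_orig_mod.transformer.wte.weight": "tensor-0",
    "_orig_mod.lm_head.weight": "tensor-1",
}
cleaned = strip_compile_prefix(saved)
print(sorted(cleaned))
```

With a real checkpoint, the cleaned dictionary would be built from ckpt["model"] and passed to the model's load_state_dict().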


8. Key intuition

Think of ckpt.pt as:

“A frozen snapshot of the entire training system at one moment in time”



