SFT in Nanochat Explained | Generated by AI

Home 2026.06

Question: What’s SFT in nanochat?

Answer:

SFT (Supervised Fine-Tuning) in nanochat is the stage that turns the base pretrained model into a conversational assistant by training it on full conversations formatted with the chat special tokens. It sits between base pretraining/midtraining and RL in Karpathy’s pipeline (scripts/chat_sft.py).

The key idea: same next-token cross-entropy loss as pretraining, but now over rendered conversations, with the loss masked to only the assistant tokens — you don’t train the model to predict user turns or system prompts, only to generate the assistant’s replies.

What’s distinctive about nanochat’s SFT vs midtraining:

Minimal sketch of the masked loss, which is the whole conceptual core:

# logits: (B, T, V), targets: (B, T), mask: (B, T) 1 for assistant tokens
loss = F.cross_entropy(
    logits.view(-1, logits.size(-1)),
    targets.view(-1),
    reduction="none",
).view_as(targets)
loss = (loss * mask).sum() / mask.sum().clamp(min=1)

After SFT you get chatsft checkpoints, which is what the web UI / CLI serve by default — and what RL (GRPO on GSM8K in scripts/chat_rl.py) builds on top of.

References:


Back Donate