Three Endoftext Tokens as Standard | Generated by AI

2025.11

You can technically use just one <|endoftext|> token (id 50256) as a document separator; it would still give you clear boundaries between documents, and the model would learn not to continue across them most of the time.

But in practice, almost everyone who trains GPT-2-style models on web text (OpenWebText, FineWeb, RedPajama, C4-derived datasets, etc.) uses exactly three, a convention that became the de facto standard early on for good reasons:
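As a concrete sketch, joining pre-tokenized documents with a triple separator is a few lines of Python. The id 50256 is GPT-2's real <|endoftext|>; the helper name and the toy token ids are illustrative, not from any particular codebase:

```python
# Minimal sketch: join pre-tokenized documents with three
# <|endoftext|> separators. 50256 is the GPT-2 BPE id for
# <|endoftext|>; the document token ids below are made up.
EOT = 50256

def join_documents(token_docs):
    """Concatenate token-id lists, appending three EOT tokens
    after every document so boundaries survive packing."""
    stream = []
    for doc in token_docs:
        stream.extend(doc)
        stream.extend([EOT] * 3)
    return stream

docs = [[15496, 995], [1212, 318, 257, 1492]]  # two toy documents
stream = join_documents(docs)
# Each document in the stream is now followed by a triple separator.
```

The resulting flat stream is what the packing step below consumes.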

1. It matches (or very closely mimics) what OpenAI did with the original GPT-2 WebText

Using three makes your data preparation consistent with the datasets that produced the published GPT-2 perplexity numbers and the downstream results everyone compares against.

2. It makes accidental merging almost impossible during packing

Modern training pipelines (nanoGPT, llm.c, Hugging Face, Megatron, etc.) pack multiple documents into the same training sequence to fill the full context length (1024 tokens for GPT-2; 2048, 4096, or even 128k in newer models).

If you only use one <|endoftext|>, there is a tiny chance that a bug, an off-by-one error, or a weird edge case causes two documents to be concatenated without any separator. That creates a horrible training signal (the model suddenly learns to continue one random webpage into another).
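The packing step itself can be sketched as follows. The block size of 8 is only for the toy example (GPT-2 training uses 1024), and dropping the ragged tail is one common choice, not the only one:

```python
# Sketch of sequence packing: cut one long separator-joined token
# stream into fixed-length training blocks, dropping the ragged
# tail (a common nanoGPT-style simplification).
def pack(stream, block_size):
    n_full = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, n_full, block_size)]

blocks = pack(list(range(20)), 8)  # toy stream of 20 "tokens"
# -> two full blocks of 8; the last 4 tokens are discarded
```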

With three in a row, even if something goes wrong and one token gets dropped or misread, you still have two left, so the boundary is preserved. It's a cheap form of error-detecting redundancy.
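To make the redundancy argument concrete, here is a hypothetical boundary scanner that needs only two of the three separator tokens to survive (the function and its parameters are illustrative, not from any real trainer):

```python
EOT = 50256  # GPT-2 <|endoftext|>

def find_boundaries(stream, min_run=2):
    """Return the start index of every run of at least `min_run`
    consecutive EOT tokens. With min_run=2, a boundary is still
    detected even if one of the three separators was dropped."""
    boundaries, run = [], 0
    for i, tok in enumerate(stream):
        if tok == EOT:
            run += 1
            if run == min_run:
                boundaries.append(i - min_run + 1)
        else:
            run = 0
    return boundaries

# One separator lost in transit: two EOTs remain, boundary still found.
damaged = [10, 11, EOT, EOT, 20, 21]
```

With a single-token separator, the same dropped token would silently fuse two documents; here the boundary at index 2 is still recovered.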

3. It makes the separator uniquely identifiable and unconfusable

A single separator token could in principle be produced by tooling for other reasons, but a run of three is a distinctive pattern that is trivial to search for and essentially never occurs by accident, so document boundaries can always be recovered from the packed token stream.

4. It has zero downside

The cost is two extra tokens per document, which is negligible relative to typical document lengths, and no training code has to change to accommodate them.
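The overhead is easy to bound with back-of-the-envelope arithmetic. The average document length used here is an assumed illustrative figure, not a measured statistic of any dataset:

```python
# Rough cost of three separators vs one, per document.
avg_doc_tokens = 1000       # assumed average document length (illustrative)
extra_tokens = 2            # three separators instead of one
overhead = extra_tokens / (avg_doc_tokens + 3)
# well under 1% of the token budget at this document size
```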

Summary

One would work fine for your own experiments, but three is the established convention for the reasons above.

That’s why every public script you’ll find (Karpathy’s fineweb.py, the OpenWebText repo, HazyResearch flash-attention training code, etc.) uses exactly three, and why I recommended it to you.

If you ever want to compare your model’s perplexity or samples directly to GPT-2 / nanoGPT / FineWeb baselines, use three and you’re golden.



x-ai/grok-4.1-fast
