Evaluating Model Bits per Byte | Generated by AI

Home 2026.06

Question: What does “val bpb” mean for your nanochat model, and is 0.8658 bpb good?

Answer:

bpb = bits per byte — a language-agnostic evaluation metric for language models.

It measures how many bits you need to encode one byte of validation data on average:

bpb = cross_entropy_loss / ln(2)

Is 0.8658 bpb good?

Yes, that’s solid. Context:

Related metric — perplexity:

perplexity = 2^bpb = 2^0.8658 ≈ 1.84

This is exceptionally low perplexity for a 286M model with just 5.7B tokens. For comparison:

Practical takeaway: Your model is memorizing/fitting the validation set very well. This is normal and good during training. The real test is generalization—how it performs on held-out data outside the training distribution. If your downstream tasks (next-token prediction on novel text, instruction-following, etc.) perform well, this bpb is a strong signal.

Quick check: What’s your test set bpb? If test bpb » val bpb, you have distribution shift or overfitting to watch.


Back Donate