1.5B Model Training Time Estimate | Generated by AI

2026.06

Question: For training a 1.5B model on nanochat, how many tokens should we prepare and how much time will it take?

Answer:

Training a 1.5B-parameter model requires planning both the total token budget and the wall-clock time. Both depend on your goals (research-quality vs production-quality). Here’s a structured estimate:


1. Token budget

A good rule of thumb for transformer LLMs:

Calculation for 1.5B

Recommendation for MI300X: start with 200–250B tokens, which is enough to see meaningful training without overcommitting compute.
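To put these budgets in proportion to model size, the following minimal sketch computes tokens per parameter for the token counts mentioned in this answer. The helper name `tokens_per_param` is illustrative only and is not part of nanochat.

```python
# Tokens-per-parameter ratio for each token budget mentioned in this answer.
PARAMS = 1.5e9  # 1.5B-parameter model


def tokens_per_param(total_tokens: float, params: float = PARAMS) -> float:
    """Return the average number of training tokens per model parameter."""
    return total_tokens / params


# Budgets taken from the recommendation and the summary table.
budgets = {"50B": 50e9, "200B": 200e9, "250B": 250e9, "300B": 300e9}
ratios = {name: tokens_per_param(tokens) for name, tokens in budgets.items()}

for name, ratio in ratios.items():
    print(f"{name} tokens -> about {ratio:.0f} tokens per parameter")
```

The recommended 200–250B range works out to roughly 133–167 tokens per parameter.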


2. Training time estimation

Use your current MI300X throughput as reference:

Wall-clock time

Note: gradient accumulation, eval steps, logging, and pauses can add 10–20% overhead. For the 200B-token run at ~55k tokens/sec (about 42 days of pure compute), a realistic total is ~45–50 days.
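The arithmetic behind this estimate can be sketched as follows. It assumes the ~55k tokens/sec throughput and 200B-token target listed in the summary table, and treats overhead as a simple multiplier on compute time. The function name `wall_clock_days` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def wall_clock_days(total_tokens: float, tokens_per_sec: float,
                    overhead: float = 0.0) -> float:
    """Estimate training days; `overhead` is a fraction (0.10 means +10%)."""
    compute_seconds = total_tokens / tokens_per_sec
    return compute_seconds * (1.0 + overhead) / SECONDS_PER_DAY


TOKENS = 200e9       # 200B-token target
THROUGHPUT = 55_000  # tokens/sec, as used in the summary table

base = wall_clock_days(TOKENS, THROUGHPUT)
low = wall_clock_days(TOKENS, THROUGHPUT, overhead=0.10)
high = wall_clock_days(TOKENS, THROUGHPUT, overhead=0.20)

print(f"pure compute: {base:.1f} days")
print(f"with 10-20% overhead: {low:.1f} to {high:.1f} days")
```

Under these assumptions the result lands in the same range as the estimate above: about 42 days of pure compute, and roughly 46 to 50 days once overhead is included.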


3. Practical adjustments


4. Summary Table

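| Model | Target Tokens | Estimated Tok/sec | Time (hours) | Notes |
|-------|---------------|-------------------|--------------|-------|
| 1.5B | 50B | 55k | 250 | exploratory |
| 1.5B | 200B | 55k | 1010 (~42d) | research-quality |
| 1.5B | 300B | 55k | 1515 (~63d) | higher quality |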

Real throughput will fluctuate depending on exact batch/seq/config.
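To see how sensitive the schedule is to throughput, a rough sketch like the one below recomputes the table's hours for a few throughput values. Only 55k tokens/sec comes from the table. The 45k and 65k values are hypothetical bounds chosen for illustration, not measurements.

```python
def training_hours(total_tokens: float, tokens_per_sec: float) -> float:
    """Pure compute time in hours, ignoring eval/logging overhead."""
    return total_tokens / tokens_per_sec / 3600


targets = {"50B": 50e9, "200B": 200e9, "300B": 300e9}
# 55k is the table's value; 45k and 65k are assumed bounds for illustration.
throughputs = [45_000, 55_000, 65_000]

hours = {
    (name, tps): training_hours(tokens, tps)
    for name, tokens in targets.items()
    for tps in throughputs
}

for (name, tps), h in hours.items():
    print(f"{name} tokens at {tps // 1000}k tok/s: {h:7.0f} h (~{h / 24:.0f} d)")
```

With these assumed bounds, the 200B-token run ranges from about 36 days at 65k tokens/sec to about 51 days at 45k tokens/sec, so measuring sustained throughput on the actual config before committing to a schedule is worthwhile.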


If you want, I can draft a ready-to-run nanochat config for 1.5B optimized for MI300X, including batch, grad accumulation, sequence length, and checkpoint schedule to finish 200B tokens efficiently.

