1.5B Model Training on MI300X | Generated by AI

2026.06

Question: What about training 1.5B parameter models on the same MI300X setup?

Answer:

Training a 1.5B model on your MI300X setup is not only feasible; it is actually a better fit for hardware utilization. There are trade-offs, however, depending on what you optimize for: throughput, MFU, or total training cost.


1. What changes when you go to 1.5B

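Compared to your current ~760M model, a 1.5B model has roughly twice the parameters. That means roughly twice the compute per training token, and roughly twice the memory for weights, gradients, and optimizer states.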
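To make the size jump concrete, here is a rough sketch assuming GPT-2-style decoder-only shapes. The layer counts, widths, and 50,257-token vocabulary below are hypothetical stand-ins, not your actual configs. It uses two common approximations: about 12·L·d² parameters for the transformer blocks plus V·d for the embedding, and about 6·N FLOPs per trained token (forward plus backward, ignoring the attention-score term).

```python
# Rough parameter and training-FLOP estimates for decoder-only transformers.
# Assumes GPT-2-style shapes; both configs below are hypothetical examples.

def approx_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """~12*L*d^2 for attention + MLP weights, plus V*d for tied embeddings.
    Ignores biases, LayerNorm weights, and positional embeddings."""
    return 12 * n_layers * d_model**2 + vocab_size * d_model

def train_flops_per_token(n_params: float) -> float:
    """Common 6N approximation: ~2N forward + ~4N backward, no attention-score term."""
    return 6 * n_params

VOCAB = 50_257  # hypothetical: GPT-2 BPE vocabulary size

small = approx_params(n_layers=36, d_model=1280, vocab_size=VOCAB)  # ~0.77B
large = approx_params(n_layers=48, d_model=1600, vocab_size=VOCAB)  # ~1.55B

ratio = train_flops_per_token(large) / train_flops_per_token(small)
print(f"small: {small / 1e9:.2f}B params, large: {large / 1e9:.2f}B params")
print(f"compute per training token: {ratio:.2f}x")
```

Swap in your real layer count, hidden size, and vocabulary to get your own ratio; for any pair of models near 760M and 1.5B it lands close to 2×.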


2. What happens to MFU on MI300X

This is the interesting part:

You will likely see higher MFU than with the 760M model.

Why?

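Because larger models have wider layers, their matrix multiplications are larger and run closer to peak, while fixed overheads (kernel launches, normalization, elementwise ops) make up a smaller share of each step.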

So paradoxically:

Bigger model → better MFU efficiency (up to a point)
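One contributing factor can be sketched numerically: the arithmetic intensity (FLOPs per byte moved) of a matrix multiply grows with its dimensions, so wider layers make it easier to keep the GPU compute-bound. The sketch assumes bf16 operands (2 bytes per element) and an idealized kernel that reads each input and writes the output exactly once; real kernels and caches behave differently, and the token count and hidden sizes are hypothetical.

```python
# Arithmetic intensity of C = A @ B with A: (m, k), B: (k, n), bf16 everywhere.
# Idealized model: each matrix is read from or written to HBM exactly once.

BYTES_PER_ELEM = 2  # bf16

def gemm_intensity(m: int, k: int, n: int) -> float:
    flops = 2 * m * k * n  # one multiply + one add per (i, l, j) triple
    bytes_moved = BYTES_PER_ELEM * (m * k + k * n + m * n)
    return flops / bytes_moved

tokens = 8192  # hypothetical tokens per micro-batch (batch * seq_len)
for d_model in (1280, 1600, 2048):  # hypothetical hidden sizes
    # MLP up-projection: (tokens, d) @ (d, 4d)
    ai = gemm_intensity(tokens, d_model, 4 * d_model)
    print(f"d_model={d_model}: ~{ai:.0f} FLOPs/byte")
```

The gain flattens once the GEMMs are comfortably compute-bound, which is consistent with the "up to a point" caveat.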


3. Throughput vs reality

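Even though MFU improves, each token now costs roughly twice as many FLOPs, so raw throughput drops.

Rough expectation: somewhat more than half of your current tokens/sec, since the MFU gain only partly offsets the ~2× cost per token.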

So:

Lower raw tokens/sec, but better scaling efficiency
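The trade-off can be put in numbers: on fixed hardware, tokens/sec is proportional to MFU divided by FLOPs per token (about 6·N), so the hardware peak cancels when you compare two runs. A minimal sketch, assuming that approximation; the two MFU values are hypothetical placeholders, not measurements, so substitute your own.

```python
# Project tokens/sec when changing model size on the same hardware.
# tokens/sec = MFU * peak_flops / (6 * N); the peak term cancels in a ratio.

def projected_tokens_per_sec(
    base_tps: float, base_params: float, base_mfu: float,
    new_params: float, new_mfu: float,
) -> float:
    # Fewer tokens/sec because each token costs more FLOPs (6N)...
    size_factor = base_params / new_params
    # ...partly offset by better hardware utilization.
    mfu_factor = new_mfu / base_mfu
    return base_tps * size_factor * mfu_factor

tps = projected_tokens_per_sec(
    base_tps=100_000,  # ~100k tok/s at 760M, from your current run
    base_params=760e6,
    base_mfu=0.35,     # hypothetical
    new_params=1.5e9,
    new_mfu=0.42,      # hypothetical improvement
)
print(f"projected 1.5B throughput: ~{tps:,.0f} tok/s")
```

With these placeholder numbers the projection is about 61k tok/s: lower than today, but well above the 50k you would get if MFU stayed flat.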


4. Memory considerations (MI300X is key here)

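MI300X is actually well-suited: each GPU has 192 GB of HBM3, so the weights, gradients, and optimizer states of a 1.5B model use only a small fraction of it.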

You may even be able to put that headroom toward larger micro-batches or longer sequences.
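A back-of-the-envelope check of why memory is not the constraint here. It assumes mixed-precision Adam without sharding, using the common accounting of about 16 bytes per parameter (bf16 weights and gradients, plus fp32 master weights and two fp32 Adam moments). That optimizer setup is an assumption about your run, and the estimate excludes activations, temporary buffers, and framework overhead.

```python
# Model-state memory for mixed-precision Adam (per GPU, no ZeRO/FSDP sharding).
# Assumed byte counts; excludes activations, temp buffers, and allocator overhead.

BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}
MI300X_HBM_GB = 192  # HBM3 capacity per MI300X GPU

def model_state_gb(n_params: float) -> float:
    return n_params * sum(BYTES_PER_PARAM.values()) / 1e9

for n_params in (760e6, 1.5e9):
    gb = model_state_gb(n_params)
    print(f"{n_params / 1e9:.2f}B params: ~{gb:.1f} GB of model state "
          f"({gb / MI300X_HBM_GB:.0%} of {MI300X_HBM_GB} GB)")
```

Whatever is left after model state goes to activations, which is where batch size and sequence length spend their memory.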


5. Training stability improves

Larger models often benefit from your current setup.


6. The real trade-off

If your goal is

🚀 Maximum tokens trained per hour

→ 760M is better (higher tok/sec)

🧠 Better model quality per run

→ 1.5B is significantly better

⚖️ Balanced “research + production”

→ 1.5B is usually the sweet spot on MI300X
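To compare the options on total training cost, it helps to convert throughput into wall-clock time for a fixed token budget. In this sketch only the ~100k tok/s baseline comes from your current run; the 10B-token budget and the 1.5B throughput are hypothetical.

```python
# Wall-clock hours needed to train on a fixed number of tokens.

def hours_for_budget(token_budget: float, tokens_per_sec: float) -> float:
    return token_budget / tokens_per_sec / 3600

BUDGET = 10e9  # hypothetical: 10B training tokens

for name, tps in (("760M", 100_000), ("1.5B", 60_000)):  # 1.5B rate is hypothetical
    print(f"{name}: ~{hours_for_budget(BUDGET, tps):.1f} h for {BUDGET / 1e9:.0f}B tokens")
```

Multiply the hours by your GPU count to get GPU-hours when comparing cost across runs.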


7. Practical recommendation for your setup

Given your current config (~100k tok/s at 760M):

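A strong next experiment would be a 1.5B run with the rest of your setup unchanged, so its throughput and MFU compare directly against the 760M baseline.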

Expected result: lower tokens/sec than at 760M, but higher MFU.
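For that comparison it is worth computing MFU the same way for both runs. A minimal sketch, assuming the 6·N FLOPs-per-token approximation and an MI300X dense BF16 peak of about 1307 TFLOPS per GPU (AMD's spec-sheet figure; check it against the datatype you actually train in). The GPU count is a placeholder for your own setup.

```python
# Model FLOPs Utilization (MFU) from measured training throughput.

MI300X_PEAK_BF16_FLOPS = 1307.4e12  # assumed dense BF16 peak per GPU

def mfu(tokens_per_sec: float, n_params: float, n_gpus: int,
        peak_flops_per_gpu: float = MI300X_PEAK_BF16_FLOPS) -> float:
    achieved = 6 * n_params * tokens_per_sec  # 6N FLOPs per token, no attention term
    return achieved / (n_gpus * peak_flops_per_gpu)

N_GPUS = 1  # placeholder: number of GPUs producing the measured tok/s
print(f"760M baseline MFU: {mfu(100_000, 760e6, N_GPUS):.1%}")
```

By convention, MFU does not count FLOPs recomputed by activation checkpointing (including them gives hardware FLOPs utilization instead), so the formula stays the same whether or not you checkpoint.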


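Bottom line

A 1.5B model gives up some raw tokens/sec in exchange for better MFU and significantly better model quality per run, which usually makes it the sweet spot on MI300X.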

If you want, I can help you design the exact best 1.5B configuration (batch/seq/accumulation) to push MI300X closer to its real efficiency ceiling (~50%+ MFU territory).

