Training Times for Pangu v1 v2 v3 | Generated by AI

2026.03

Question: Typically how much time is needed to train Pangu v1, v2, or v3 according to the “True-Story-of-Pangu” repository?

Answer:

Short answer: The repo does NOT give explicit training times (like “3 weeks” or “2 months”) for v1 / v2 / v3. It gives only relative descriptions such as “few hundred billion tokens”, “very long training”, and “from-scratch with many restarts”, so the approximate scale has to be inferred from context.

Here’s what the document implies:


1. Pangu v1 (early 13B → 38B → 71B → 135B)

The whistleblower text explicitly says the 135B vocabulary replacement succeeded only after the model was

“continually trained on at least 1T of data” (Reddit)

So the v1 generation meant long, iterative training across checkpoints, not a single run.
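As a rough sanity check (my own back-of-envelope, not a figure from the repo): 1T tokens of continued training corresponds to a very large number of optimizer steps. The global batch size below is a hypothetical placeholder chosen for illustration.

```python
# Back-of-envelope step count for ~1T tokens of continued training.
# The 4M-token global batch is an assumed illustrative value,
# NOT a number stated in the repo.
tokens_total = 1_000_000_000_000   # "at least 1T of data"
global_batch_tokens = 4_000_000    # hypothetical tokens per optimizer step

steps = tokens_total // global_batch_tokens
print(f"~{steps:,} optimizer steps")  # prints "~250,000 optimizer steps"
```

Hundreds of thousands of steps at any realistic step time is weeks of wall-clock time per run, which is consistent with the “long iterative training” reading.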


2. Pangu v2 (alleged “shell-wrap” version)

The repo claims v2 was not trained from scratch but continued from an existing model:

“by training on just a few hundred billion tokens, they improved metrics…” (Reddit)

The implication is that v2 needed only a fraction of the compute of a from-scratch run, so v2 trained much faster than v1.


3. Pangu v3 (first real from-scratch training)

This is described as:

Quote summary:

“To ensure stable training… rollbacks and restarts whenever gradients showed anomalies… not a single loss spike throughout training.” (Reddit)

This implies:

So v3 ≈ months-scale training.


Typical Timeline (inferred)

Based on token counts + typical hardware:

version   training type              tokens          estimated time
v1        from scratch + retries     ~1T             weeks–months
v2        continued training         few hundred B   days–weeks
v3        full scratch, stable       ~1T+            months
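One way to turn these token counts into rough wall-clock estimates is the standard ~6·N·D FLOPs rule of thumb for dense transformer training (compute ≈ 6 × parameters × tokens). All hardware numbers below (chip count, per-chip throughput, utilization) are hypothetical placeholders, not figures from the repo.

```python
def train_days(params, tokens, n_chips, chip_flops, mfu):
    """Estimate wall-clock training days via the ~6*N*D FLOPs rule of thumb."""
    total_flops = 6 * params * tokens          # total training compute
    effective = n_chips * chip_flops * mfu     # sustained cluster FLOP/s
    return total_flops / effective / 86_400    # seconds per day

# Hypothetical example: 135B params, 1T tokens, 4096 accelerators
# at 300 TFLOP/s peak with 30% utilization (all assumed numbers).
days = train_days(135e9, 1e12, 4096, 300e12, 0.30)
print(f"~{days:.0f} days")  # prints "~25 days" under these assumptions
```

With these assumed numbers a single clean 1T-token run lands in the weeks range; restarts, rollbacks, and smaller clusters push the v1 and v3 estimates toward months, matching the table.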

Why no exact time?

Because training time depends on factors the repo never states: cluster size, accelerator type, achieved throughput, and how many rollbacks and restarts were needed.

The repo intentionally describes engineering struggle, not metrics.


The key insight

The important takeaway from the repo is that v2 allegedly reused an existing model (continued training on a few hundred billion tokens), while v1 and v3 were full from-scratch efforts.

So the speed difference is huge:

v2 « v1 ≈ v3

This is actually the main point the whistleblower was making.



