Overview of Huawei Pangu Ultra 135B | Generated by AI


Question: What is the Pangu 135B V3 model from Huawei?

Answer:

Huawei Pangu 135B V3 (“Pangu Ultra”)

135B V3 refers to the third version of Huawei's 135-billion-parameter dense large language model, developed by Noah's Ark Lab as part of the Pangu AI program. It is officially branded as Pangu Ultra.


Technical Specifications

Pangu Ultra is a dense LLM with 135 billion parameters and a 128K context window, trained on 13.2 trillion tokens using 8,192 Ascend 910B NPUs (Huawei’s custom AI chips).
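To put these figures in perspective, the standard back-of-the-envelope rule of ~6 FLOPs per parameter per token can estimate the total training compute. This is a generic approximation (from scaling-law literature), not a number Huawei reports:

```python
# Estimate total training compute with the common ~6*N*D approximation,
# where N = parameter count and D = training tokens. The 6*N*D rule is
# a rough heuristic, not Huawei's published figure.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs as 6 * N * D."""
    return 6.0 * n_params * n_tokens

total = training_flops(135e9, 13.2e12)
print(f"{total:.2e}")  # roughly 1.07e+25 FLOPs
```

Roughly 1.1e25 FLOPs, i.e. on the order of ten yottaFLOPs spread across the 8,192-chip cluster.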

The model is 94 transformer layers deep and uses a three-phase training regimen: Phase 1 covers 12 trillion tokens of general knowledge, Phase 2 adds 0.8 trillion tokens of reasoning data (math and code), and Phase 3 uses curriculum learning over complex Q&A pairs.
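The phased schedule above can be sketched as a simple token-budget lookup. The phase names and the `phase_at` helper are hypothetical; the Phase 1 and 2 budgets come from the text, and Phase 3's budget is not stated, so the sketch assigns it the arithmetic remainder of the 13.2T total:

```python
# Illustrative sketch of the three-phase token schedule. Phase names and
# this helper are hypothetical; phase 3's 0.4T budget is inferred as the
# remainder of the stated 13.2T total, not a published figure.

PHASES = [
    ("general", 12.0e12),       # Phase 1: general knowledge
    ("reasoning", 0.8e12),      # Phase 2: math and code
    ("curriculum_qa", 0.4e12),  # Phase 3: remainder of 13.2T (assumed)
]

def phase_at(tokens_seen: float) -> str:
    """Return which training phase a cumulative token count falls in."""
    cumulative = 0.0
    for name, budget in PHASES:
        cumulative += budget
        if tokens_seen < cumulative:
            return name
    return PHASES[-1][0]
```

For example, `phase_at(12.5e12)` falls in the reasoning phase, since it lies between the 12T and 12.8T cumulative boundaries.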

To address training instability at such depth, Huawei introduced Depth-Scaled Sandwich Normalization (DSSN) and TinyInit, which reduced training loss spikes by 78% compared to Meta’s Llama 3 approach.
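The general idea behind both techniques is to shrink initialization and normalization scales as depth grows, so gradients stay stable through 94 layers. The exact DSSN and TinyInit formulas are defined in Huawei's Pangu Ultra paper; the scaling rule below is an illustrative assumption in the spirit of width-aware "small init" schemes with an added depth factor, not the published formula:

```python
import math

# Illustrative sketch only: a depth- and width-aware initialization
# scale that shrinks as the network gets deeper or wider. The
# sqrt(2 / (5 * d * L)) rule here is an assumption for illustration,
# not the exact TinyInit formula from the Pangu Ultra paper.

def tiny_init_scale(hidden_dim: int, num_layers: int) -> float:
    """Standard deviation for weight init, shrinking with depth/width."""
    return math.sqrt(2.0 / (5.0 * hidden_dim * num_layers))

# Deeper networks get smaller initial weights:
shallow = tiny_init_scale(hidden_dim=12288, num_layers=1)
deep = tiny_init_scale(hidden_dim=12288, num_layers=94)
print(shallow > deep)  # True
```

The design intuition is that without such scaling, residual-stream variance compounds layer by layer, which is what produces the loss spikes DSSN and TinyInit were introduced to suppress.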

It achieved strong benchmark results, surpassing models like LLaMA-3 (405B dense) and Qwen2.5 (72B), and was competitive with DeepSeek-R1 (671B MoE), despite being a smaller, purely dense model.


The “V3” Significance — Context from the Whistleblower Scandal

The V3 label is especially significant due to a major internal controversy that came to light in mid-2025. A whistleblower from Huawei's Noah's Ark Lab singled out the 135B V3 as a model genuinely trained from scratch, in contrast to allegations leveled at earlier Pangu releases.


Why It Matters

The 135B V3 was seen internally as Huawei's first truly competitive, honest effort. It demonstrated that Huawei's Ascend chips could genuinely train a large-scale model from scratch without relying on NVIDIA hardware, a significant geopolitical and technical statement given ongoing US sanctions.

Huawei also credits DSSN and TinyInit with helping the model achieve over 52% MFU (Model FLOPs Utilization) on Ascend chips, a notable hardware-efficiency result.
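MFU has a standard definition: the model FLOPs actually delivered per second (tokens/s times ~6N FLOPs per token) divided by the cluster's peak FLOPs per second. A minimal sketch, where the throughput and per-chip peak figures are placeholders rather than Huawei's reported numbers:

```python
# Sketch of the standard MFU calculation: achieved model FLOPs/s over
# peak cluster FLOPs/s. The 2.1M tokens/s throughput and 400 TFLOPS
# per-chip peak below are placeholder assumptions, not reported figures.

def mfu(tokens_per_sec: float, n_params: float,
        num_chips: int, peak_flops_per_chip: float) -> float:
    """Model FLOPs Utilization via the ~6N-FLOPs-per-token estimate."""
    achieved = 6.0 * n_params * tokens_per_sec
    peak = num_chips * peak_flops_per_chip
    return achieved / peak

# What throughput would ~52% MFU imply on 8,192 chips at a
# hypothetical 400 TFLOPS peak each?
print(mfu(2.1e6, 135e9, 8192, 400e12))  # ~0.52 under these assumptions
```

The point of the 52% figure is that roughly half the cluster's theoretical compute went into useful model math, which is strong for a dense model at this scale.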


Summary Table

Attribute        Detail
---------------  --------------------------------------------------
Model Name       Pangu Ultra (135B V3)
Parameters       135 billion (dense)
Context Window   128K tokens
Training Tokens  13.2 trillion
Hardware         8,192 Ascend 910B NPUs
Key Techniques   DSSN, TinyInit, NPU Fusion Attention
Status           Genuinely trained from scratch (per whistleblower)
