Huawei's Pangu Team's Dark Secrets | Generated by AI
https://github.com/HW-whistleblower/True-Story-of-Pangu
Editor’s Note: The following is a translation of a lengthy post by an anonymous individual claiming to be a Huawei Noah’s Ark Lab employee. This individual alleges significant issues within the Pangu large model development team, including alleged intellectual property infringement, internal power struggles, and a detrimental work environment. The original post was written in Chinese and shared online on July 6, 2025. This translation aims to accurately convey the content and tone of the original.
The Tragedy of Pangu: The Heartbreak and Darkness of Huawei Noah’s Ark Lab’s Pangu Large Model Development Journey
Hello everyone,
I am an employee of the Pangu large model team at Huawei Noah’s Ark Lab.
First, to verify my identity, I will list some details:
The current head of Noah is Wang Yunhe, previously head of the Algorithm Application Department, which was later renamed the Small Model Lab with him as its Director. The former head of Noah was Yao Jun (whom everyone calls Teacher Yao). Several lab directors: Tang Ruiming (Brother Ming, Team Ming, already resigned), Shang Lifeng, Zhang Wei (Brother Wei), Hao Jianye (Teacher Hao), Liu Wulong (referred to as "Wulong Suo"), and others. Many other core members and experts have since left one after another.
We belong to the “Sìyě” (Four Fields) organization. Sìyě has many brigades, and the foundational language large model is the Fourth Brigade. Wang Yunhe’s small model is the Sixteenth Brigade. We participated in gatherings in Suzhou, with various monthly milestones. During the Suzhou “attack meeting,” tasks were assigned, and goals needed to be achieved before the deadline. The Suzhou gathering would bring together personnel from various locations to the Suzhou Research Institute, where they usually stayed in hotels, such as those in Luzhi, separated from their families and children.
During the Suzhou gathering, Saturdays were default workdays, which was very hard, but there was afternoon tea on Saturdays, and once there were even crayfish. The workstations at the Suzhou Research Institute were relocated once, from one building to another. The buildings at the Suzhou Research Institute have European-style decorations, with large slopes at the entrance and beautiful scenery inside. Going to the Suzhou gathering usually lasted at least a week, or even longer, with some people unable to return home for one or two months.
Noah was once rumored to be research-oriented, but after I joined, working on large model projects within Sìyě turned project members into a purely delivery-oriented group, their days filled with regular meetings, reviews, and reports. Many times, even running an experiment required approval. The team had to interface with various business lines such as Terminal Xiaoyi, Huawei Cloud, and ICT, which created considerable delivery pressure.
The early internal code name for the Pangu model developed by Noah was “Pangu Zhizi.” Initially, it was only a web-based version that required internal application for trial use. Later, due to pressure, it was integrated into Welink for public beta testing.
These days, the uproar over the accusation that the Pangu large model plagiarized Qwen has been intense. As a member of the Pangu team, I have been tossing and turning at night, unable to sleep. The Pangu brand has been badly damaged. On one hand, I selfishly worry about my own career development and feel that my past hard work was in vain. On the other hand, I feel a great sense of relief that someone has finally started to expose these matters. For countless days and nights, we gnashed our teeth at the actions of certain individuals within the company who repeatedly gained benefits through fraud, yet we were powerless to stop them. This oppression and humiliation have gradually eroded my affection for Huawei, leaving my time here increasingly muddled and lost, often doubting my life and self-worth.
I admit I am a coward. As a small worker, I dare not oppose powerful individuals like Wang Yunhe within the company, nor do I dare to oppose a behemoth like Huawei. I am afraid of losing my job, as I also have family and children, so I truly admire the whistleblowers from the bottom of my heart. However, seeing the internal attempts to cover up the facts and deceive the public, I simply cannot tolerate it anymore. I also wish to be brave once and follow my true self. Even if I harm myself by eight hundred, I hope to harm the enemy by a thousand. I have decided to disclose what I have seen and heard here (partially from colleagues’ verbal accounts) regarding the “legendary story” of the Pangu large model:
Huawei indeed primarily trains large models on Ascend cards (the Small Model Lab has many Nvidia cards, which they used for training before, and later transferred to Ascend). I was once impressed by Huawei’s determination to “build a world’s second choice,” and I myself once had deep affection for Huawei. We accompanied Ascend step by step, from being full of bugs to being able to train models, paying a huge amount of effort and cost.
Initially, our computing power was very limited, and we trained models on the 910A. At that time, it only supported FP16, and the training stability was far inferior to BF16. Pangu’s MoE started very early; in 2023, it was mainly focused on training 38B MoE models and subsequent 71B dense models. The 71B dense model was expanded into the first generation 135B dense model, and later the main models were gradually trained on 910B.
Both the 71B and 135B models had a major flaw: the tokenizer. The tokenizer used at that time had extremely low encoding efficiency; every symbol, digit, space, and even every Chinese character occupied its own token. It is easy to see that this greatly wasted computing power and resulted in very poor model performance. At this time, the Small Model Lab happened to have a self-trained vocabulary. Teacher Yao suspected that the model's tokenizer might be the problem (and in hindsight, his suspicion was undoubtedly correct). So he decided to switch the tokenizers for the 71B and 135B, since the Small Model Lab had tried this before. The team stitched two tokenizers together and began the replacement. The 71B replacement failed; the 135B, using a more refined embedding initialization strategy, successfully swapped in the new vocabulary after continued training on at least 1T of data, but, predictably, its performance did not improve.
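For readers unfamiliar with what such a tokenizer swap involves, here is a minimal sketch, in PyTorch/Transformers style, of one common "refined embedding initialization" approach: each new token's embedding is initialized from the mean of the old sub-token embeddings covering the same text. The model and vocabulary names are placeholders, and the post does not specify which strategy the team actually used.

```python
# Hypothetical sketch of swapping a model's tokenizer and re-initializing its
# input embeddings. Model/vocab names are placeholders, not the actual ones.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("old-model")   # hypothetical paths
new_tok = AutoTokenizer.from_pretrained("new-vocab")
model = AutoModelForCausalLM.from_pretrained("old-model")

old_emb = model.get_input_embeddings().weight.data      # shape [V_old, d]
d_model = old_emb.size(1)
new_emb = torch.empty(len(new_tok), d_model)

for new_id in range(len(new_tok)):
    text = new_tok.decode([new_id])
    old_ids = old_tok.encode(text, add_special_tokens=False)
    if old_ids:
        # Mean of the old sub-token embeddings that cover the same surface text.
        new_emb[new_id] = old_emb[old_ids].mean(dim=0)
    else:
        # Token has no counterpart in the old vocab: fall back to random init.
        new_emb[new_id].normal_(mean=0.0, std=0.02)

model.resize_token_embeddings(len(new_tok))
model.get_input_embeddings().weight.data.copy_(new_emb)
# The output head (if untied) would need the same treatment before the
# large continued-training run described above.
```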
Concurrently, other domestic companies like Alibaba and Zhipu were training on GPUs and had already found the right recipes, and the gap between Pangu and its competitors grew wider and wider. An internal 230B dense model trained from scratch also failed for various reasons, pushing the project to the brink of desperation. Facing the pressure of several milestones and strong internal doubts about Pangu, the team's morale plummeted to an all-time low. The team made many efforts and struggled hard while computing power was extremely limited. For example, the team accidentally discovered that the 38B MoE of that period was not delivering the expected MoE benefits. So they removed the MoE parameters and restored it to a 13B dense model. Since that 38B MoE originated from a very early Pangu Alpha 13B with a relatively outdated architecture, the team performed a series of modernizations, such as switching from absolute positional encoding to RoPE, removing biases, and switching to RMSNorm. At the same time, given the earlier tokenizer failures and the experience gained from changing vocabularies, this model's vocabulary was also replaced with the one used by Wang Yunhe's Small Model Lab for its 7B model. Later, this 13B model was expanded in parameters and further trained, becoming the second-generation 38B dense model (for several months, this was the main mid-range Pangu model), which for a time had some competitiveness. However, because the larger 135B retained its outdated architecture and had been badly damaged by the vocabulary change (later analysis found that the stitched-together vocabulary used in that swap contained serious bugs), even after continued training it still lagged far behind leading domestic models such as Qwen. At this point, internal doubts and leadership pressure grew even stronger, and the team was close to despair.
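The post does not say how the MoE parameters were "removed." As a minimal sketch under that uncertainty, assuming a plain state dict whose expert weights live under keys like `...experts.{i}...` and whose router weights contain `.router.` (both hypothetical names), one simple possibility is to average the experts back into a single dense FFN and drop the router; the RoPE, bias-removal, and RMSNorm changes mentioned above would then be architecture-level modifications followed by continued training.

```python
# Hypothetical sketch of collapsing an MoE checkpoint back into a dense model.
# Key names are illustrative only; the actual 38B MoE -> 13B dense procedure
# is not documented in the post.
import re
from collections import defaultdict
import torch

def collapse_experts(state_dict: dict) -> dict:
    """Average per-expert FFN weights into one dense FFN per layer."""
    dense, buckets = {}, defaultdict(list)
    expert_pat = re.compile(r"(.*)\.experts\.\d+\.(.*)")
    for name, tensor in state_dict.items():
        m = expert_pat.match(name)
        if m:
            # Group expert copies of the same projection, e.g.
            # ("layers.3.mlp", "up_proj.weight").
            buckets[(m.group(1), m.group(2))].append(tensor)
        elif ".router." not in name:
            dense[name] = tensor  # keep non-MoE weights, drop router weights
    for (prefix, suffix), tensors in buckets.items():
        dense[f"{prefix}.{suffix}"] = torch.stack(tensors).mean(dim=0)
    return dense
```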
Under these circumstances, Wang Yunhe and his Small Model Lab stepped in. They claimed that their model was inherited and modified from the old 135B parameters, and that after training on only a few hundred billion tokens of data, all metrics improved by roughly ten points on average. In reality, this was their first masterpiece of "skinning" (套壳, i.e., secretly re-branding another's work as one's own) applied to a large model. Huawei's leaders, laymen managing experts, had no inkling of such absurdities; they simply assumed there must be some algorithmic innovation. Internal analysis showed that they had actually taken Qwen 1.5 110B and continued training it, adding layers, expanding the FFN dimensions, and incorporating some mechanisms from the Pangu-Pi paper to reach roughly 135B parameters. In fact, the old 135B had 107 layers, while this model had only 82, and various other configurations differed as well. After training, many parameter distributions of this new 135B model of murky origin were almost identical to those of Qwen 110B. Even the class name in the model code at the time was still Qwen; they were too lazy to change it. This model subsequently became the so-called 135B V2, and it was then provided to many downstream users, including even external customers.
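To illustrate the kind of "upcycling" being alleged (growing an existing checkpoint by adding layers), here is a minimal, hypothetical sketch of depth expansion by duplicating transformer blocks. The post gives no details of the actual procedure, and FFN widening would require additional handling of the MLP projection matrices.

```python
# Hypothetical sketch of depth expansion: repeat existing transformer blocks
# (copying their weights) so the enlarged model starts close to the source
# model's behavior. This is an illustration, not the team's actual recipe.
import copy
import torch.nn as nn

def expand_depth(layers: nn.ModuleList, target_depth: int) -> nn.ModuleList:
    """Interleave copies of existing blocks until reaching target_depth."""
    assert target_depth >= len(layers)
    new_layers = []
    for i in range(target_depth):
        # Map each new position back to a source block as evenly as possible.
        src = layers[i * len(layers) // target_depth]
        new_layers.append(copy.deepcopy(src))
    return nn.ModuleList(new_layers)
```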
This incident had a huge impact on our honest and hardworking colleagues. Many insiders, including those in Terminal and Huawei Cloud, actually knew about this. We all jokingly called it “Qiangu” (Thousand Ancient) model instead of Pangu model. At that time, team members wanted to report it to BCG, as this was already a major business fraud. However, it was later said that the leaders stopped it, because higher-level leaders (such as Teacher Yao, and possibly Mr. Xiong and Mr. Cha) also knew about it later but did not intervene, as getting good results through “skinning” was also beneficial to them. This incident caused several of the strongest colleagues in the team to become disheartened, and resignation became a frequent topic of conversation.
At this point, Pangu seemed to reach a turning point. Since the Pangu models mentioned earlier were basically fine-tuned or modified from other checkpoints, Noah had not yet mastered training from scratch at that time, let alone training from scratch on Ascend NPUs. Through the strenuous efforts of the team's core members, Pangu began training its third-generation model. After enormous effort, it gradually caught up with industry practice in data, architecture, and training algorithms, and none of that hard-won progress had anything to do with the Small Model Lab.
Initially, team members had no confidence and only started training a 13B model. However, they later found that its performance was not bad, so this model's parameters were expanded once again, and it became the third-generation 38B, code-named 38B V3. Many brothers on the product lines must be very familiar with this model. Its tokenizer at the time was an extension of the LLaMA vocabulary (a common practice in the industry). Meanwhile, Wang Yunhe's lab had developed yet another vocabulary (which became the vocabulary of the subsequent Pangu series). The two vocabularies were even forced into a head-to-head comparison, which produced no clear conclusion about which was better. The leadership then immediately decided that the vocabulary should be unified and that Wang Yunhe's should be used. Therefore, the 135B V3 trained from scratch afterward (known externally as Pangu Ultra) adopted this tokenizer. This also explains something that confused many brothers who used our models: why two models of different tiers within the same V3 generation used different tokenizers.
We genuinely feel that 135B V3 was the pride of our Fourth Brigade at that time. It was the first hundred-billion-parameter-scale model that was truly Huawei full-stack self-developed and genuinely trained from scratch, and its performance in 2024 was comparable to competitors'. As I write this, tears well up in my eyes; it was so incredibly difficult. To ensure stable training, the team ran a large number of comparative experiments and promptly rolled back and restarted whenever the model's gradients showed abnormalities. This model truly achieved what the later technical report stated: no loss spikes throughout the entire training run. We overcame countless difficulties; we did it. We are willing to stake our lives and honor on the authenticity of this model's training. How many sleepless nights did we spend on it? When we were dismissed as worthless on the internal "Xinsheng" (Employee Voice) forum, how much indignation and grievance did we swallow? We endured.
We, this group of people, are truly burning our youth to polish our domestic computing power foundation… Living in a foreign land, we gave up our families, our holidays, our health, our entertainment, shedding blood and sweat. The hardships and difficulties cannot be summarized in a few strokes. In various mobilization meetings, when the slogans “Pangu will win, Huawei will win” were shouted, we were truly deeply moved in our hearts.
However, the fruits of our hard work were often effortlessly taken by the Small Model Lab. Data: taken directly. Code: taken directly, and they even demanded that we adapt it so it could be run with one click. We jokingly called the Small Model Lab the "Click-the-Mouse Lab." We put in the hard work; they reaped the glory. It truly embodies the saying, "You carry the heavy load so that someone else can live in comfort." Under such circumstances, more and more comrades could no longer persevere and chose to leave. Seeing those excellent colleagues depart one by one, I felt both moved and saddened. In that combat-like environment, we were more comrades-in-arms than colleagues. They also had countless technical strengths worth learning from, truly excellent mentors. Seeing them go to outstanding teams such as ByteDance Seed, DeepSeek, Moonshot AI, Tencent, and Kuaishou, I am genuinely happy for them and wish them well, escaping this arduous yet dirty place. I still vividly remember what a departing colleague said: "Coming here was a disgrace to my technical career; staying here one more day is a waste of life." Harsh words, but they left me speechless. Worries about my insufficient technical accumulation and my fear of being unable to adapt to the high-turnover environment of internet companies meant that, although I repeatedly wanted to resign, I never took that step.
In addition to the dense models, Pangu subsequently began exploring MoE. Initially, a 224B MoE model was trained. In parallel, the Small Model Lab launched its second major "skinning" operation (with minor interludes possibly involving other models, such as the math model): the now widely discussed Pangu Pro MoE 72B. Internally, this model was claimed to be expanded from the Small Model Lab's 7B model (even that claim contradicts the technical report, to say nothing of the fact that it was actually a "skinned" Qwen 2.5 14B that was then fine-tuned). I remember that they had been training for only a few days when their internal evaluation scores immediately caught up with the then-current 38B V3. Many brothers in the AI System Lab knew about this "skinning" operation because they had to adapt the model, but for various reasons they could not stand up and call it out. In fact, given how long this model was subsequently fine-tuned, I am genuinely surprised that HonestAGI could still detect that level of similarity: the computing power spent fine-tuning and "washing" its parameters would have been enough to train a model of the same tier from scratch. I heard from colleagues that they used many methods to wash out the Qwen watermark, even including intentionally training on dirty data. This also offers the academic community an unprecedented and special case for studying model lineage; future lineage-detection methods can be demonstrated on it.
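As a rough illustration of why such similarity can survive long fine-tuning, here is a hypothetical sketch of a simple lineage check: comparing per-layer weight statistics across two checkpoints. The key format and layer count are placeholders, and HonestAGI's actual methodology is more involved than this.

```python
# Hypothetical sketch of a simple model-lineage fingerprint: the per-layer
# standard deviation of one projection matrix, compared across two models.
# Key names and layer counts are placeholders, not real checkpoints.
import torch

def layer_std_fingerprint(state_dict: dict, key_fmt: str, num_layers: int):
    """Std-dev of one projection matrix for each layer, as a 1-D tensor."""
    return torch.tensor([
        state_dict[key_fmt.format(i)].float().std().item()
        for i in range(num_layers)
    ])

def fingerprint_similarity(sd_a: dict, sd_b: dict, key_fmt: str, n: int) -> float:
    """Pearson correlation of the two models' per-layer std curves."""
    a = layer_std_fingerprint(sd_a, key_fmt, n)
    b = layer_std_fingerprint(sd_b, key_fmt, n)
    a, b = a - a.mean(), b - b.mean()
    return float((a @ b) / (a.norm() * b.norm() + 1e-12))

# Example usage with a hypothetical key format:
# sim = fingerprint_similarity(sd_model_x, sd_model_y,
#                              "model.layers.{}.self_attn.q_proj.weight", 40)
```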
At the end of 2024 and in early 2025, after the release of DeepSeek V3 and R1, their astonishing technical level dealt the team a huge blow and brought even greater scrutiny. To keep up with the trend, Pangu imitated DeepSeek's model size and began training a 718B MoE. At this point, the Small Model Lab made another move: they chose to "skin" DeepSeek V3 and continue training it, freezing the loaded DeepSeek parameters during training. Even the directory for loading checkpoints was named "deepseekv3"; they did not bother to change it. How arrogant! In contrast, some colleagues with genuine technical convictions were training another 718B MoE from scratch, running into problem after problem. But obviously, how could that model ever look better than one that was simply "skinned"? If not for the team leader's insistence, it would have been shut down long ago.
Huawei's heavy process management (version control, model lineage, assorted procedural requirements, traceability) severely dragged down the pace of large model development. Ironically, the Small Model Lab's models seemed exempt from all these constraints: they could "skin" whenever they wanted, fine-tune whenever they wanted, and take computing power without question. This stark, almost surreal contrast captures the current state of process management: "The magistrates may set fires, but the common people may not even light lamps." How ridiculous! How tragic! How hateful! How shameful!
After the HonestAGI incident broke, internal discussions and analyses were held constantly on how to do public relations and "respond." Indeed, the original analysis may not have been rigorous enough, which gave Wang Yunhe and the Small Model Lab room to quibble and distort the facts. Because of this, I have felt sick to my stomach these past two days, constantly doubting the meaning of my life and lamenting the injustice of it all. I am not playing along anymore; I am resigning. I am also applying to have my name removed from the author lists of some Pangu technical reports. Being named on those reports is a stain I will never be able to erase. At the time, I did not expect them to be so brazen as to open-source the model, or to dare to deceive the world and promote it so widely. Perhaps I was harboring a fluke mentality, hoping to get away with it, and so I did not refuse to be listed. I believe many hardworking comrades were likewise forced onto this "pirate ship" or simply unaware. But what is done cannot be undone. I hope that in the rest of my life I can persevere in doing truly meaningful work and atone for my weakness and indecision back then.
Writing this late at night, I am already in tears, sobbing uncontrollably. I still remember when some excellent colleagues resigned, I forced a smile and asked them if they wanted to write a long “Xinsheng” post, exposing the current situation. They said: “No, it’s a waste of time, and I’m also afraid that exposing it will make your lives even worse.” At that moment, I was suddenly disheartened because comrades who once fought together for ideals had completely lost hope in Huawei. At that time, everyone joked that we were using the Communist Party’s “millet plus rifles” of yesteryear, but the organization had a style comparable to the Kuomintang of those days.
Once upon a time, I was proud that we defeated foreign guns with millet and rifles. Now, I’m tired. I want to surrender.
In fact, even today, I still sincerely hope that Huawei can learn from its lessons, do a good job with Pangu, make Pangu world-class, and bring Ascend to the level of Nvidia. The internal “bad money driving out good money” has caused Noah and even Huawei to rapidly lose a large number of excellent large model talents in a short period. I believe they are now shining in various teams such as DeepSeek, fulfilling their ambitions and talents, and contributing to the fierce AI competition between China and the US. I often sigh that Huawei is not without talent, but simply does not know how to retain talent. If these people are given the right environment, the right resources, fewer shackles, and less political struggle, why would Pangu not succeed?
Finally: I swear on my life, character, and honor that all the content I have written above is true (at least to the best of my limited knowledge). I do not have the high level of technical expertise or the opportunity to conduct detailed and thorough analysis, nor do I dare to directly use internal records as evidence, fearing that I might be caught due to information security. However, I believe that many of my former comrades will testify for me. Brothers within Huawei, including the product line brothers we once served, I believe that the countless details in this article will match your impressions and confirm my statements. You may have also been deceived, but these cruel truths will not be buried. The traces of our struggle should not be distorted or buried.
Having written so much, certain people will definitely want to find me and silence me. The company might also want to silence me or hold me accountable. If that truly happens, my personal safety, and even the safety of my family, might be threatened. For self-protection, I will report my safety to everyone daily in the near future.
If I disappear, consider it a sacrifice for truth and ideals, for the better development of computing power and AI in Huawei and even China. I am willing to be buried in the place where I once fought.
Goodbye, Noah. Written in Shenzhen on July 6, 2025, in the early morning.