Qwen3.5-397B Model Technical and Cost Overview | Generated by AI

2026.04

Question: What are the training details, architecture, and estimated costs for Qwen3.5-397B-A17B?

Answer: Released in February 2026, Qwen3.5-397B-A17B is a flagship native vision-language model from Alibaba’s Qwen team. It pairs very large total capacity with efficient inference: going by the model name, it has roughly 397B total parameters, of which only about 17B are activated per token thanks to its sparse architecture.

1. Model Architecture

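The model combines a hybrid Mixture-of-Experts (MoE) design with Gated DeltaNet layers (a form of linear attention). Linear-attention layers keep a fixed-size recurrent state rather than a key-value cache that grows with sequence length, which lets the model handle very long contexts more efficiently than a standard Transformer.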
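
To make the two ideas above concrete, here is a minimal pure-Python sketch. It is not Qwen's implementation: `top_k_route` and `gated_delta_step` are hypothetical names, the expert count and `k` are placeholders, and the update rule is the gated delta rule as commonly written for Gated DeltaNet, assuming an L2-normalized key. The point to notice is that the state `S` has a fixed size no matter how many tokens have been processed.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Only these k experts run for the token, which is why an MoE model
    activates a small fraction of its total parameters per token.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of a gated delta rule (illustrative only).

    S     : d_v x d_k state matrix, stored as a list of rows
    k     : key vector, assumed L2-normalized
    alpha : decay gate in (0, 1], shrinks the old memory
    beta  : write strength in [0, 1]

    Update: S_new = alpha * S (I - beta k k^T) + beta v k^T
    Output: o = S_new q
    """
    new_S = []
    for row, v_i in zip(S, v):
        # What the memory currently returns for this key (row . k).
        pred = sum(s * kj for s, kj in zip(row, k))
        new_S.append([alpha * (s - beta * pred * kj) + beta * v_i * kj
                      for s, kj in zip(row, k)])
    out = [sum(s * qj for s, qj in zip(row, q)) for row in new_S]
    return new_S, out


if __name__ == "__main__":
    # Hypothetical router scores for 4 experts; route to the best 2.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))

    # Write a value under a key, then read it back with the same key.
    S = [[0.0, 0.0], [0.0, 0.0]]
    S, out = gated_delta_step(S, q=[1.0, 0.0], k=[1.0, 0.0],
                              v=[3.0, 4.0], alpha=1.0, beta=1.0)
    print(out)
```

With `alpha = 1` and `beta = 1`, writing a new value under an existing key replaces the old one instead of accumulating, which is the "delta" behavior; a smaller `alpha` lets the memory forget older content.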

2. Training Details

While Alibaba does not disclose the specific datasets, the training involved several advanced techniques:

3. Hardware & Estimated Costs

The model is optimized for NVIDIA GPU-accelerated systems (H100/H200/B200 clusters).
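
For a rough sense of how a training-cost estimate is usually assembled, the sketch below applies the common rule of thumb that training takes about 6 FLOPs per active parameter per token. Every input is an assumption: the token count, GPU peak throughput, utilization, and hourly price are hypothetical placeholders, not figures disclosed by Alibaba, and the function names are illustrative.

```python
def training_flops(active_params, tokens):
    """Approximate training compute: ~6 FLOPs per active parameter per token."""
    return 6.0 * active_params * tokens


def gpu_hours(total_flops, peak_flops_per_gpu, utilization):
    """GPU-hours needed at a given peak throughput and achieved utilization."""
    effective_flops_per_second = peak_flops_per_gpu * utilization
    return total_flops / effective_flops_per_second / 3600.0


def estimate_cost_usd(active_params, tokens, peak_flops_per_gpu,
                      utilization, usd_per_gpu_hour):
    """Combine the two steps above into a dollar figure."""
    hours = gpu_hours(training_flops(active_params, tokens),
                      peak_flops_per_gpu, utilization)
    return hours * usd_per_gpu_hour


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration only.
    cost = estimate_cost_usd(
        active_params=17e9,        # ~17B active, read off the model name
        tokens=1e12,               # hypothetical; the real count is not given here
        peak_flops_per_gpu=1e15,   # hypothetical round number per GPU
        utilization=0.4,           # hypothetical achieved fraction of peak
        usd_per_gpu_hour=2.0,      # hypothetical rental price
    )
    print(f"Estimated cost under these assumptions: ${cost:,.0f}")
```

Because a sparse model trains only the activated parameters for each token, the estimate scales with the active count rather than the full 397B, which is the main reason MoE training is cheaper than a dense model of the same total size. The result is only as good as the inputs and ignores data preparation, failed runs, and post-training.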
