DeepSeek-V4-Pro Architecture and Code Overview | Generated by AI

2026.05

Here’s the full breakdown:

DeepSeek-V4-Pro Inference Code

Location: ~/projects/deepseek-v4-inference/

Architecture (from config.json)

DeepSeek-V4-Pro is a massive Mixture-of-Experts model:
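The values themselves can be read straight from config.json. The following is a minimal sketch, not part of the repo: it assumes nothing about field names and lists every top-level key as found. The function names `summarize_config` and `print_config` are illustrative, and the path you pass should point to wherever config.json lives in your checkout.

```python
import json
from pathlib import Path


def summarize_config(path):
    """Return the top-level config entries as (key, value) pairs sorted by key."""
    with Path(path).expanduser().open(encoding="utf-8") as f:
        config = json.load(f)
    # No field names are assumed; every top-level key is reported as found.
    return sorted(config.items())


def print_config(path):
    """Print one 'key: value' line per top-level config entry."""
    for key, value in summarize_config(path):
        print(f"{key}: {value}")
```

Calling `print_config` on the file gives a quick view of the expert count, layer count, and quantization settings under whatever names the config actually uses.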

File-by-File

model.py (38K, 827 lines) — the core:

kernel.py (22K, 536 lines) — tilelang JIT kernels:

generate.py (6.1K, 155 lines) — entry point:

convert.py (6.9K, 168 lines) — weight conversion:

requirements.txt:

What’s Missing

The ../encoding/ directory, which contains encoding_dsv4.py, is not included. That module handles chat template encoding and parsing, so you’d need it from the full repo for interactive mode to work.
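A quick check before trying interactive mode can save a confusing import error. This is a sketch under one assumption taken from the text above: that encoding_dsv4.py is expected in an encoding/ directory next to the inference directory. The helper name `find_encoding_module` is hypothetical.

```python
from pathlib import Path


def find_encoding_module(inference_dir):
    """Return the path to ../encoding/encoding_dsv4.py if it exists, else None."""
    base = Path(inference_dir).expanduser().resolve()
    # "../encoding/" relative to the inference directory is a sibling directory.
    candidate = base.parent / "encoding" / "encoding_dsv4.py"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_encoding_module("~/projects/deepseek-v4-inference")
    if found is None:
        print("encoding_dsv4.py not found; interactive mode will likely fail")
    else:
        print(f"found {found}")
```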

Hardware Requirements

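With FP8+FP4 quantization and 384 experts, this model is enormous. The default config expects 8-way model parallelism (MP=8), so the weights are split into eight shards, one per rank. Your RTX 4070 (12 GB) won’t fit even one shard. The MI300X (192 GB HBM3) on AMD Dev Cloud could handle it. As a rough guess it needs 1-2 MI300X cards depending on batch size, but treat that as an estimate: running on fewer than eight devices means using a smaller MP than the default, and the real count depends on the total checkpoint size.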
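The fit question comes down to simple arithmetic, sketched below. Everything numeric except the two device capacities is an assumption: `EXAMPLE_TOTAL_GB` is a hypothetical placeholder (the real checkpoint size is not stated here), `reserve_fraction` is an arbitrary allowance for activations and KV cache, and the weights are assumed to split evenly across ranks.

```python
import math


def per_shard_gb(total_weight_gb, mp):
    """Weight memory per rank, assuming an even split across MP ranks."""
    if mp < 1:
        raise ValueError("mp must be >= 1")
    return total_weight_gb / mp


def shard_fits(total_weight_gb, mp, device_gb, reserve_fraction=0.2):
    """True if one shard fits after reserving part of the device memory.

    reserve_fraction is a placeholder for activations and KV cache.
    """
    usable_gb = device_gb * (1.0 - reserve_fraction)
    return per_shard_gb(total_weight_gb, mp) <= usable_gb


def min_devices(total_weight_gb, device_gb, reserve_fraction=0.2):
    """Smallest device count whose combined usable memory holds all weights."""
    usable_gb = device_gb * (1.0 - reserve_fraction)
    return math.ceil(total_weight_gb / usable_gb)


# Hypothetical total size; substitute the size of your converted weights.
EXAMPLE_TOTAL_GB = 800.0

for name, device_gb in [("RTX 4070", 12), ("MI300X", 192)]:
    fits = shard_fits(EXAMPLE_TOTAL_GB, mp=8, device_gb=device_gb)
    print(f"{name}: one MP=8 shard fits -> {fits}")
```

Replacing `EXAMPLE_TOTAL_GB` with the on-disk size of the converted checkpoint and calling `min_devices` gives a lower bound on the card count, which is a better basis than the 1-2 card guess.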

