Who Built PyTorch Distributed | Generated by AI

2026.06

Question: Is Andrew Tulloch the person behind PyTorch’s distributed training module (torch.distributed), or was it built by others?

Answer:

Short version: not Tulloch. He’s a PyTorch ecosystem legend, but torch.distributed / DDP was built by a different group at Meta.

Who actually built torch.distributed / DDP

The canonical reference is the VLDB 2020 paper PyTorch Distributed: Experiences on Accelerating Data Parallel Training (arXiv:2006.15704). The author list is Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, and Soumith Chintala. Tulloch is not on it. The key names, and the components each is most associated with, are laid out in the mental model below.

So what is Tulloch known for?

Adjacent but different territory: performance and inference infrastructure (kernels, quantization, FBGEMM) rather than the distributed training APIs.

Mental model for you

Since you’re studying nanoGPT-scale training and DeepSeek-style MoE, map the names to the layers:

torch.distributed (c10d, ProcessGroup, NCCL/Gloo)  → Pieter Noordhuis, Teng Li
DDP (Reducer, bucketing, comm/compute overlap)      → Shen Li
FSDP (param sharding, ZeRO-3-style)                 → Yanli Zhao, Rohan Varma, Min Xu (fairscale)
DTensor / torchtitan (TP/PP/FSDP2 composition)      → Wanchao Liang
Kernels / quantization / inference perf (FBGEMM)    → Andrew Tulloch
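To make the FSDP row concrete, here is a toy, single-process sketch of ZeRO-3-style parameter sharding, assuming two simulated ranks that are just list indices. The helpers `shard`, `all_gather`, and `reduce_scatter` are hypothetical stand-ins written for illustration; they mimic what the collectives of the same name do, but they are not torch.distributed calls. Each rank stores only its slice of the parameter, rebuilds the full tensor with an all-gather when it is needed for compute, and receives only its slice of the averaged gradient via a reduce-scatter.

```python
from typing import List


def shard(flat: List[float], world_size: int) -> List[List[float]]:
    """Split a flat parameter into equal per-rank shards, zero-padding the tail."""
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Concatenate every rank's shard to rebuild the full parameter, dropping padding."""
    full = [x for s in shards for x in s]
    return full[:numel]


def reduce_scatter(grads_per_rank: List[List[float]], world_size: int) -> List[List[float]]:
    """Average full gradients across ranks, then hand each rank only its own shard."""
    numel = len(grads_per_rank[0])
    averaged = [sum(g[i] for g in grads_per_rank) / world_size for i in range(numel)]
    return shard(averaged, world_size)


# Hypothetical setup: 2 simulated ranks sharing one 5-element parameter.
world_size = 2
param = [1.0, 2.0, 3.0, 4.0, 5.0]
shards = shard(param, world_size)        # rank 0 holds 3 values, rank 1 holds 2 + padding
full = all_gather(shards, len(param))    # materialized only for forward/backward

# Each simulated rank computed a different full gradient from its own micro-batch.
local_grads = [[1.0] * 5, [3.0] * 5]
grad_shards = reduce_scatter(local_grads, world_size)

# SGD step applied shard-locally: no rank ever updates the full parameter.
lr = 0.5
shards = [[p - lr * g for p, g in zip(ps, gs)] for ps, gs in zip(shards, grad_shards)]
```

The trade-off the sketch illustrates: per-rank memory for parameters, gradients, and optimizer state shrinks roughly with 1/world_size, at the cost of extra all-gather communication every time the full parameter has to be materialized.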

If you want to read code, start with torch/csrc/distributed/c10d/reducer.cpp. That's the heart of Shen Li's DDP design (gradient buckets + autograd hooks + async AllReduce), and it's surprisingly readable.
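The bucketing idea can be previewed with a toy, single-process simulation before opening the C++. This is not the real Reducer API: `build_buckets` and `ToyReducer` are hypothetical names, the bucket cap is counted in elements rather than bytes (real DDP exposes it as `bucket_cap_mb`), and each parameter's gradient is a single number per simulated rank. Parameters are grouped in reverse registration order because backward tends to produce gradients from the last layer first; a per-parameter callback, playing the role of an autograd hook, marks a gradient ready, and a bucket is averaged across ranks as soon as its last gradient arrives.

```python
from typing import Dict, List


def build_buckets(param_sizes: Dict[str, int], cap: int) -> List[List[str]]:
    """Group parameter names into buckets, walking them in reverse registration order."""
    buckets: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name in reversed(list(param_sizes)):
        size = param_sizes[name]
        # Close the current bucket if adding this parameter would exceed the cap.
        if current and current_size + size > cap:
            buckets.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        buckets.append(current)
    return buckets


class ToyReducer:
    """Tracks gradient readiness per bucket and averages a bucket once it is complete."""

    def __init__(self, buckets: List[List[str]], world_size: int) -> None:
        self.buckets = buckets
        self.world_size = world_size
        self.bucket_of = {name: i for i, b in enumerate(buckets) for name in b}
        self.pending = [len(b) for b in buckets]  # grads still missing per bucket
        self.grads: Dict[str, List[float]] = {}   # name -> one gradient value per rank
        self.launch_order: List[int] = []         # bucket indices, in firing order

    def on_grad_ready(self, name: str, per_rank_grads: List[float]) -> None:
        # Stands in for the per-parameter autograd hook.
        self.grads[name] = per_rank_grads
        i = self.bucket_of[name]
        self.pending[i] -= 1
        if self.pending[i] == 0:
            self._allreduce_bucket(i)

    def _allreduce_bucket(self, i: int) -> None:
        # Real DDP would launch an asynchronous AllReduce on the flattened bucket
        # here; this toy just averages each gradient across simulated ranks.
        self.launch_order.append(i)
        for name in self.buckets[i]:
            avg = sum(self.grads[name]) / self.world_size
            self.grads[name] = [avg] * self.world_size


# Registration order (dict insertion order): embed, layer1, layer2, head.
param_sizes = {"embed": 4, "layer1": 3, "layer2": 3, "head": 2}
buckets = build_buckets(param_sizes, cap=5)
# buckets == [["head", "layer2"], ["layer1"], ["embed"]]

reducer = ToyReducer(buckets, world_size=2)
# Backward produces gradients last layer first; the two ranks saw different data.
reducer.on_grad_ready("head", [1.0, 3.0])
reducer.on_grad_ready("layer2", [2.0, 4.0])   # bucket 0 is complete and fires now
fired_before_layer1 = list(reducer.launch_order)
reducer.on_grad_ready("layer1", [0.0, 2.0])
reducer.on_grad_ready("embed", [5.0, 7.0])
```

Bucket 0 fires before `layer1` or `embed` have gradients at all. In the real implementation that is the window where communication overlaps with the remaining backward computation; the toy only records the launch order.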
