Usage of vLLM by AI Providers

vLLM is an open-source inference engine commonly recommended for serving open-weight models efficiently. However, its adoption in production services varies. Below is a breakdown based on publicly available information; two short usage sketches follow the table.

Provider/Model | Uses vLLM in Production? | Details
Claude (Anthropic) | No | Anthropic serves Claude models on proprietary infrastructure. vLLM can back local or third-party deployments that mimic Anthropic's API, but there is no evidence of internal use.
OpenAI (GPT models) | No | OpenAI runs custom, in-house serving systems optimized for its scale. vLLM exposes an OpenAI-compatible API for local inference, but OpenAI's production serving is not based on vLLM.
MiniMax AI | No | MiniMax open-sources models such as MiniMax-M1 and MiniMax-M2 and recommends vLLM for user deployments because of its performance. There is no confirmation that vLLM powers their core production API; they may use forks or custom setups.
Kimi (Moonshot AI) | No | Moonshot recommends vLLM for self-hosting Kimi-K2, and their Checkpoint-Engine tool integrates with it for weight updates. Their production inference, however, likely relies on proprietary optimizations alongside engines such as SGLang or TensorRT-LLM.
DeepSeek AI | Yes | DeepSeek's custom inference engine is built directly on vLLM, incorporating optimizations from their research. It serves models such as DeepSeek-V3 and R1, and DeepSeek contributes optimizations back to the vLLM project.
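To make the self-hosting path that MiniMax and Moonshot recommend concrete, here is a minimal sketch using vLLM's offline LLM API. The checkpoint name is an arbitrary open-weight example chosen for illustration, not one of the providers' models, and the sketch assumes a machine with a GPU that vLLM supports.

```python
# Minimal self-hosting sketch with vLLM's offline API.
# The checkpoint below is an arbitrary open-weight example; substitute any
# model supported by vLLM (e.g. a MiniMax or Kimi release) as hardware allows.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")             # load the weights onto the GPU
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(
    ["Explain in one sentence why vLLM is widely used to serve open-weight models."],
    params,
)
print(outputs[0].outputs[0].text)
```

The same engine can also be exposed as a long-running HTTP service, which is the mode the next sketch uses.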
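Several rows above mention that vLLM exposes an OpenAI-compatible API for local inference. Assuming a server has been started with something like `vllm serve <model> --port 8000`, the standard `openai` Python client can point its `base_url` at the local endpoint; the model name below is again just an illustrative open-weight checkpoint.

```python
# Query a locally running vLLM server through its OpenAI-compatible endpoint.
# Assumes a server such as `vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000`
# is already up; the model name is an arbitrary open-weight example.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible route
    api_key="EMPTY",                      # no real key needed unless the server requires one
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```

This compatibility is about reusing OpenAI-style client tooling against self-hosted models; it does not imply that OpenAI or Anthropic run vLLM in their own production stacks.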
