Usage of vLLM by AI Providers
vLLM is an open-source inference engine widely recommended for serving open-weight models efficiently, but its adoption inside providers' own production services varies. Below is a breakdown based on publicly available information:
| Provider/Model | Uses vLLM in Production? | Details |
|---|---|---|
| Claude (Anthropic) | No | Anthropic serves Claude on proprietary infrastructure; there is no evidence of internal vLLM use. On the vLLM side, an Anthropic-compatible `/v1/messages` endpoint has been requested and tracked as issue #21313, which would let local deployments mimic Anthropic's API. |
| OpenAI (GPT models) | No | OpenAI runs custom, in-house serving systems optimized for its scale. vLLM exposes an OpenAI-compatible API for local inference (see the example after the table), but OpenAI does not base its production stack on vLLM. |
| MiniMax | No | MiniMax open-sources models such as MiniMax-M1 and M2 and recommends vLLM for self-hosted deployments because of its performance. There is no confirmation that vLLM powers their hosted production API; they may use forks or custom stacks. |
| Kimi (Moonshot AI) | No | Moonshot recommends vLLM for self-hosting Kimi-K2, and their Checkpoint-Engine tool integrates with it for weight updates. However, their production inference likely uses proprietary optimizations alongside engines like SGLang or TensorRT-LLM. |
| DeepSeek AI | Yes (vLLM fork) | DeepSeek's internal inference engine began as a customized fork of vLLM and incorporates optimizations from their research; it serves models such as DeepSeek-V3 and R1. Rather than maintaining a separate open-source engine, DeepSeek has committed to contributing these optimizations back to vLLM and related projects. |
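Several rows above note that vLLM exposes an OpenAI-compatible HTTP API, which is the usual way these open-weight models are self-hosted. The sketch below shows that pattern, assuming a vLLM server has already been started locally (for example with `vllm serve <model>`). The model name, port, and prompt are placeholders for illustration, not any provider's actual configuration.

```python
# Minimal sketch: query a locally served open-weight model through vLLM's
# OpenAI-compatible endpoint. Assumes a server was started beforehand, e.g.:
#   vllm serve <model-name>   # any open-weight checkpoint vLLM supports
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default address of vLLM's OpenAI-compatible server
    api_key="EMPTY",                      # vLLM ignores the key unless the server sets --api-key
)

response = client.chat.completions.create(
    model="<served-model-name>",          # placeholder: must match the model the server is running
    messages=[{"role": "user", "content": "Summarize what vLLM does in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

Because the endpoint follows the OpenAI schema, existing OpenAI client code can be pointed at a self-hosted model by changing only the `base_url` and `model` values.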
References
- DeepSeek Open-Sources Custom Inference Engine Built on vLLM
- The path to open-sourcing the DeepSeek inference engine
- MiniMax-M1 Hybrid Architecture Meets vLLM
- Kimi K2: Open Agentic Intelligence
- OpenAI-Compatible Server - vLLM
- Support Anthropic API /v1/messages endpoint · Issue #21313 · vllm-project/vllm