sglang Serving Qwen2.5-1.5B on RTX 4070 | Generated by AI

Home 2026.06

Done. sglang is running and serving Qwen2.5-1.5B-Instruct successfully on the RTX 4070.

Here’s the summary:

WHAT WAS SET UP: Fresh venv: ~/sglang-env torch: 2.7.1+cu126 (CUDA 12.6 runtime) triton: 3.3.1 (not the segfaulting 3.5/3.6) sglang: 0.4.10.post2 sgl-kernel: 0.2.8 flashinfer: 0.2.9rc2

KEY DETAILS:

LAUNCH SCRIPT: /tmp/start-sglang.sh Sets proxy (127.0.0.1:7890) and PATH, then runs sglang on port 30000

TO RESTART LATER: ssh lzw@192.168.1.36 “nohup /tmp/start-sglang.sh > /tmp/sglang-server.log 2>&1 &”


Back Donate