Mid-Process Failure Cascade | Generated by AI
Several things went wrong mid-process:
- sglang 0.4.x hard-pins `torch==2.7.1`: Our original plan was torch 2.5.1+cu124 with triton 3.1.0, but the `[srt]` extra of sglang 0.4.10 has `torch==2.7.1` as an exact pin, and the compiled kernels (sgl-kernel 0.2.8, flashinfer 0.2.9rc2) are built against 2.7.1. So torch 2.5.1 was impossible with sglang 0.4.x.
- cu124 wheels stopped at torch 2.6.0: PyTorch stopped publishing cu124 wheels after 2.6.0, so torch 2.7.1 has no cu124 build; its CUDA 12 wheels are cu126 and cu128. We had to pivot from the original “cu124” plan to cu126.
- sglang dependency resolver fighting our torch pin: When we ran `pip install sglang[all]`, pip’s resolver saw `torch==2.7.1` in sglang’s deps and tried to install it, which would have overwritten our carefully pinned cu124 torch. We had to install sglang with `--no-deps` first, then install its dependencies separately.
- pip timed out downloading torch (2x): The torch+cu124 and torch+cu126 wheels are ~800MB each, and the SSH connections timed out at 300s before the downloads finished. We had to switch to `nohup` background installs.
- HuggingFace unreachable from workstation: The proxy (127.0.0.1:7890) is configured in `.bashrc` via `start_proxy`, but that doesn’t activate in non-interactive SSH sessions. We had to export HTTP_PROXY/HTTPS_PROXY explicitly, then wrote a launch script that does so.
- ninja not in PATH: sglang spawns `ninja` as a subprocess for JIT compilation. The pip ninja package installs to the venv bin directory, but sglang’s subprocess couldn’t find it there. We had to add the venv bin directory to PATH.
- OOM from zombie process: The first successful launch crashed on port bind (30000 already in use). The second launch OOM’d because the first sglang process was still alive and holding 8.65 GB of GPU memory. We had to `pkill -9` everything and wait for the GPU memory to free.
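The launch-time failures above (proxy not set, ninja not found, port 30000 already taken) can all be checked before starting the server. The following is a minimal Python sketch of such a preflight, not the launch script that was actually written. The helper names `build_launch_env`, `find_ninja`, and `port_is_free` are hypothetical, as are the venv path in the usage note and the `http://` scheme on the proxy URL.

```python
import os
import shutil
import socket


def build_launch_env(venv_bin, proxy_url, base_env=None):
    """Return a copy of the environment with the venv bin directory first
    on PATH and the proxy variables set explicitly."""
    env = dict(os.environ if base_env is None else base_env)
    old_path = env.get("PATH", "")
    env["PATH"] = venv_bin + (os.pathsep + old_path if old_path else "")
    # Set both spellings, since tools differ in which one they read.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = proxy_url
    return env


def find_ninja(env):
    """Return the path a subprocess using this env would resolve for ninja,
    or None if it is not on that PATH."""
    return shutil.which("ninja", path=env.get("PATH", ""))


def port_is_free(port, host="127.0.0.1"):
    """True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) != 0
```

A launcher could call `build_launch_env("/path/to/venv/bin", "http://127.0.0.1:7890")`, refuse to start if `find_ninja(env)` returns `None` or `port_is_free(30000)` is false, and otherwise pass `env=env` to `subprocess.Popen`. A port that is still bound is also a hint that an earlier server process may still be holding GPU memory.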
The biggest conceptual hurdle was that the original plan (torch 2.5.1+cu124 with triton 3.1.0) was incompatible with sglang 0.4.x’s exact torch pin. The combination that actually worked was torch 2.7.1+cu126 with triton 3.3.1: a newer triton than planned, but still old enough to avoid the 3.5/3.6 segfault.
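To keep that combination from drifting, the constraints can be written down as a small check. This is a sketch under the assumptions stated in this post only: torch must be exactly 2.7.1 (any CUDA suffix), and triton must be below 3.5 because of the segfault mentioned above. The function names are hypothetical, and the version parser is deliberately simplistic rather than a full PEP 440 implementation.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.7.1+cu126' into (2, 7, 1), ignoring local and pre-release suffixes."""
    public = text.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_stack(versions):
    """Return a list of problems for a {'package': 'version or None'} mapping."""
    problems = []
    torch = versions.get("torch")
    if torch is None or parse_version(torch) != (2, 7, 1):
        problems.append(f"torch must be 2.7.1 for sglang 0.4.10, found {torch}")
    triton = versions.get("triton")
    # Assumption taken from this post: triton 3.5/3.6 segfaults here.
    if triton is None or parse_version(triton) >= (3, 5):
        problems.append(f"triton must be below 3.5, found {triton}")
    return problems


def installed_versions(names=("torch", "triton")):
    """Read installed versions from package metadata without importing the packages."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found
```

Running `check_stack(installed_versions())` inside the venv returns an empty list when the environment matches the working combination, and one message per violated constraint otherwise.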