Remote CUDA Driver Installation Guide | Generated by AI
📋 Review: CUDA Driver Installation on Remote Server
Step 1 — Connect & discover hardware
SSH’d to root@47.107.70.224, ran lscpu, free, lspci, nvidia-smi. Found 2× RTX 3090 but no driver loaded. Key insight: lspci showed the GPUs at PCIe level even when nvidia-smi couldn’t see them (driver-independent check).
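The discovery sequence can be sketched as one script; on a box with no driver loaded (or no GPU at all) each check degrades gracefully rather than erroring out:

```shell
# Sketch of the Step 1 checks; nvidia-smi fails cleanly when no driver is loaded.
lscpu | head -n 5                         # CPU model, core count
free -h | head -n 2                       # RAM
# Driver-independent check: lspci sees the PCIe device even with no driver.
if command -v lspci >/dev/null 2>&1; then
    lspci | grep -i nvidia || echo "no NVIDIA device at PCIe level"
fi
nvidia-smi 2>/dev/null || echo "nvidia-smi unavailable: driver not loaded"
```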
Step 2 — Set up passwordless SSH
Used sshpass to copy the local ed25519 public key to ~/.ssh/authorized_keys on the remote. This let me chain multiple commands without re-authenticating. Verified with ssh -o BatchMode=yes.
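What ssh-copy-id does on the remote is essentially a one-line append; a local sketch with a throwaway key (nothing here touches a real host, and the real verification command is shown as a comment):

```shell
# Local sketch of the key-copy step: generate a throwaway ed25519 key and
# append its public half to authorized_keys, as ssh-copy-id does remotely.
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$tmp/id_ed25519"
cat "$tmp/id_ed25519.pub" >> "$tmp/authorized_keys"   # ssh-copy-id's core action
grep -c 'ssh-ed25519' "$tmp/authorized_keys"          # one key installed
# Real verification:  ssh -o BatchMode=yes root@HOST true
rm -rf "$tmp"
```

`BatchMode=yes` makes ssh fail instead of prompting for a password, so a clean exit proves key auth works.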
Step 3 — Check available drivers
Ran apt update && ubuntu-drivers devices on the remote. This queries Ubuntu’s driver database and lists compatible NVIDIA drivers. The recommended one was nvidia-driver-580-open. Chose nvidia-driver-580 (non-open, proprietary — better CUDA compatibility).
Step 4 — Install NVIDIA driver
DEBIAN_FRONTEND=noninteractive apt install -y nvidia-driver-580 nvidia-utils-580
- `DEBIAN_FRONTEND=noninteractive` — suppresses all interactive prompts (critical for remote/headless installs)
- `nvidia-driver-580` — the kernel module + userspace libraries
- `nvidia-utils-580` — CLI tools including `nvidia-smi`

Verified with `nvidia-smi` → both GPUs visible, driver 580.126.09, CUDA 13.0 runtime.
Step 5 — Install CUDA toolkit (the messy part)
First attempt: apt install cuda-toolkit-12-8 — failed, package not in default Ubuntu repos.
Fixed by adding NVIDIA’s official repo:
wget cuda-keyring_1.1-1_all.deb # Sets up NVIDIA's apt source + GPG key
dpkg -i cuda-keyring_1.1-1_all.deb
apt update
apt install cuda-toolkit-12-8
Second attempt: SSH connection dropped mid-install (exit code 255 is ssh's own error status, here a killed connection). Long-running apt over a bare SSH session is risky.
Third attempt: Used nohup to background the install on the remote:
nohup apt install -y cuda-toolkit-12-8 > /tmp/cuda-install.log 2>&1 &
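The pattern generalizes to any long-running remote command; a minimal stand-in (echo replaces apt) shows why the log file matters — output survives even if the launching shell dies:

```shell
# Minimal sketch of the nohup pattern with a harmless stand-in for apt.
log=$(mktemp)
nohup sh -c 'sleep 1; echo install-done' > "$log" 2>&1 &
pid=$!
# The SSH session could drop here and the job keeps running; we just wait:
wait "$pid"
cat "$log"
rm -f "$log"
```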
But hit a lock conflict — the earlier apt process (PID 23704) was still alive from the dropped SSH session. The duplicate waited on /var/lib/dpkg/lock-frontend forever.
Fix: Killed the duplicate, waited for the original apt to finish, confirmed with dpkg -l cuda-toolkit-12-8.
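One way to avoid the duplicate-waits-forever failure is `flock -n`, which fails fast instead of queueing; a local sketch, with a temp file standing in for `/var/lib/dpkg/lock-frontend`:

```shell
# Sketch: fail fast on a held lock instead of blocking behind it.
LOCK=$(mktemp)                      # stands in for /var/lib/dpkg/lock-frontend
exec 9>"$LOCK"
flock -n 9 && echo "first install holds the lock"
# A duplicate launched now aborts immediately instead of waiting forever:
flock -n "$LOCK" -c true || echo "duplicate: lock busy, aborting"
exec 9>&-
rm -f "$LOCK"
```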
Step 6 — Configure PATH
CUDA installs to /usr/local/cuda-12.8/bin which isn’t in PATH by default. Added to ~/.bashrc:
export PATH=/usr/local/cuda-12.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH
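A quick sanity check of the PATH edit (nvcc itself only exists once the toolkit is installed, so the real check is left as a comment):

```shell
# Sketch: confirm the cuda bin dir actually landed on PATH after the edit.
export PATH=/usr/local/cuda-12.8/bin:$PATH
case ":$PATH:" in
    *":/usr/local/cuda-12.8/bin:"*) echo "cuda bin on PATH" ;;
    *)                              echo "cuda bin missing" ;;
esac
# With the toolkit installed:  nvcc --version   (should report release 12.8)
```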
🤔 What went wrong / what I’d do differently
| Issue | Root cause | Better approach |
|---|---|---|
| CUDA package not found | Tried installing from Ubuntu repos, not NVIDIA’s | Should’ve added cuda-keyring first |
| SSH dropped during install | Long-running apt over SSH killed the connection | Use nohup or tmux/screen from the start for any install > 30s |
| Lock conflict | Two apt instances running simultaneously | Kill stale processes before retrying, or use flock |
| nvcc not found in non-interactive shell | .bashrc isn’t sourced in non-interactive SSH sessions | Use explicit export PATH=... in scripts, or add to /etc/environment |
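The nvcc-not-found row is easy to reproduce locally: an interactive bash sources an rc file, a non-interactive one does not (throwaway rc file; assumes bash is available):

```shell
# Sketch: exports in an rc file apply only to interactive shells, which is
# why `ssh host nvcc` can fail while a login shell finds it.
rc=$(mktemp)
echo 'export MARKER=from_rcfile' > "$rc"
bash --rcfile "$rc" -i -c 'echo "interactive: ${MARKER:-unset}"' 2>/dev/null
bash -c 'echo "non-interactive: ${MARKER:-unset}"'
rm -f "$rc"
```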
✅ What went well
- `sshpass` for scripted password auth — fast, no manual interaction
- `nohup` trick saved the install after SSH dropped — good recovery
- Single SSH key setup at the start paid off for every subsequent command
- `lspci` as fallback when `nvidia-smi` failed — caught the GPUs early
TL;DR: Should’ve used nohup/tmux from the start for the CUDA install, and should’ve added NVIDIA’s repo before trying to install. The rest was smooth. Lesson learned: never run long apt installs over raw SSH without a session wrapper.