Sandboxing Executable Code Agents | Generated by AI
Question: How do we properly sandbox agents if they can execute Python/shell scripts, since scripts can easily bypass file-access restrictions?
Answer:
You’ve identified the core escape hatch — guardrails at the agent orchestration layer are security theater if the underlying executor can run arbitrary code. Here’s the reality:
The Problem
# Agent tries to respect "no rm -rf /" guardrail
# But this always works:
agent.execute("python3 << 'EOF'\nimport subprocess\nsubprocess.run(['rm', '-rf', '/'])\nEOF")
The agent can’t “prevent” the script from doing anything — it’s just an interpreter. Real sandboxing requires execution environment isolation, not orchestration-layer rules.
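To make the gap concrete, here is a minimal sketch of a string-matching guardrail. The function name `naive_guardrail` and the denylist are hypothetical and not taken from any real agent framework; the sketch only shows that the wrapped command from the example above passes the check untouched. Nothing is executed.

```python
# Hypothetical denylist guardrail; names and patterns are illustrative only.
DENYLIST = ("rm -rf", "mkfs", "dd if=")


def naive_guardrail(command: str) -> bool:
    """Return True if the command is allowed by a substring denylist."""
    return not any(bad in command for bad in DENYLIST)


direct = "rm -rf /"
wrapped = (
    "python3 << 'EOF'\n"
    "import subprocess\n"
    "subprocess.run(['rm', '-rf', '/'])\n"
    "EOF"
)

print(naive_guardrail(direct))   # rejected: the literal string matches the denylist
print(naive_guardrail(wrapped))  # allowed: same effect, but no matching substring
```

Tightening the patterns only moves the problem, since the script can build the command in many other ways.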
Container Isolation (Required Baseline)
Yes, containers are the actual answer. But they’re not magic — they just give you:
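- Filesystem isolation: rm in the container only nukes the container’s root filesystem (plus anything you bind-mount read-write)
- Network isolation: can restrict to allowlist-only
- Resource limits: CPU, memory, and process-count limits via cgroups (disk quotas need tmpfs size caps or storage-driver support)
- UID isolation: code can run as an unprivileged user, but Docker defaults to root unless you pass --user or set USER in the image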
Implementation reality:
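# Sketch of an agent executor
# The script is bind-mounted read-only so that /script.py actually exists in the container
docker run --rm \
  --cpus 2 \
  --memory 2g \
  --read-only \
  --tmpfs /tmp:size=1g \
  --network none \
  --user 1000:1000 \
  -v "$PWD/script.py:/script.py:ro" \
  python:3.11 python /script.py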
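An orchestrator would usually build this command in code, not in a shell string. The following is a minimal sketch of that step. The helper names `build_docker_cmd` and `run_in_container` are hypothetical, and it assumes the Docker CLI is installed on the host.

```python
import subprocess
from pathlib import Path


def build_docker_cmd(script: Path, image: str = "python:3.11") -> list[str]:
    """Build the docker argv as a list so no shell ever parses it."""
    script = script.resolve()  # bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",
        "--cpus", "2",
        "--memory", "2g",
        "--read-only",
        "--tmpfs", "/tmp:size=1g",
        "--network", "none",
        "--user", "1000:1000",
        "-v", f"{script}:/script.py:ro",
        image, "python", "/script.py",
    ]


def run_in_container(script: Path, timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Run the script in a container (requires Docker on the host)."""
    # Note: this timeout stops the docker client process; the container
    # itself may need separate cleanup if the limit is hit.
    return subprocess.run(
        build_docker_cmd(script), capture_output=True, text=True, timeout=timeout_s
    )
```

Passing the argv as a list avoids a second injection surface: the script path is never interpreted by a shell on the host.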
What Containers DON’T Prevent
- Side-channel attacks — can leak data via timing, cache behavior, covert channels
- Kernel exploits — container escape via CVE in kernel
- Resource exhaustion — fork bomb, infinite loops (though cgroups help)
- Privilege escalation — if you run as root inside the container
Real-World Approaches
Sandboxed setups for coding agents such as Claude Code typically use:
- Container per session (isolated code environment)
- Allowlist-only network (can’t exfil to arbitrary URLs)
- Resource limits + timeout kill
- Filesystem quotas
- User-mode execution (no privileges)
Hosted code-execution features from LLM API providers (OpenAI, Anthropic, etc.) typically use:
- Ephemeral VMs or sandboxes (even stronger isolation than containers)
- Disable root/sudo
- Network restricted to specific services
- Execution timeout (kills runaway processes)
- Output sanitization (strip secrets before returning to agent)
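The timeout and resource-limit items above can be sketched at the process level as follows. This is a POSIX-only illustration using the standard library, not how any particular provider implements it. The helper name `run_limited` and the 5-second CPU cap are assumptions chosen for the example.

```python
import resource
import subprocess
import sys


def _apply_limits() -> None:
    # Runs in the child just before exec (POSIX only):
    # cap CPU time so a busy loop is killed by the kernel.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))


def run_limited(argv: list[str], timeout_s: float) -> dict:
    """Run a child process with a wall-clock timeout and a CPU-time cap."""
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,       # wall-clock limit; the child is killed on expiry
            preexec_fn=_apply_limits,
        )
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "returncode": None, "stdout": ""}
    return {"timed_out": False, "returncode": proc.returncode, "stdout": proc.stdout}


if __name__ == "__main__":
    print(run_limited([sys.executable, "-c", "print('ok')"], timeout_s=10.0))
```

The wall-clock timeout catches sleeping or blocked processes, while the CPU cap catches spinning ones. Neither replaces container or VM isolation; they only bound how long a runaway job can last.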
Beyond Containers: Capability-Based Security
For stricter control, move from denylists (“allow by default”) to explicit, allowlisted capabilities:
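import ast

class SecurityError(Exception):
    """Raised when code imports a module outside the allowlist."""

class SandboxedPython:
    def __init__(self, allowed_modules=None):
        self.allowed = allowed_modules or {'numpy', 'pandas', 'json'}

    def execute(self, code):
        # Static analysis: parse AST, reject imports not in allowed
        tree = ast.parse(code)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                names = [node.module or '']
            else:
                continue
            for name in names:
                if name.split('.')[0] not in self.allowed:
                    raise SecurityError(f"Module {name} not allowed")
        # Execute with a minimal builtins table. import statements need
        # __import__, which is itself a bypass route, so this is not a real boundary.
        safe_builtins = {'print': print, 'len': len, 'range': range, '__import__': __import__}
        exec(code, {'__builtins__': safe_builtins})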
But this is fragile — you’re always one bypass away from failure (reflection, pickle, etc.).
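One such bypass is easy to show. The sketch below uses a hypothetical helper, `imported_modules`, that collects module names the same way an AST import check would. A dynamic import never appears as an import node, so the checker sees nothing to reject. The snippets are only parsed, never run.

```python
import ast


def imported_modules(code: str) -> set[str]:
    """Collect top-level module names from import statements only."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


honest = "import os\nos.listdir('.')"
sneaky = "mod = __import__('o' + 's')\nmod.listdir('.')"

print(imported_modules(honest))  # the import statement is visible to the checker
print(imported_modules(sneaky))  # nothing found: the module name is built at runtime
```

Blocking the `__import__` name in the AST does not close the gap either, since the same function can be reached through attribute lookups on ordinary objects.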
The Hard Truth
There’s no perfect sandbox in a single process. Your options sit on a spectrum:
| Approach | Isolation | Overhead | Escape Risk |
|---|---|---|---|
| Orchestration rules | Weak | None | 100% |
| Restricted Python builtins | Medium | Low | High |
| Container | Strong | Medium | Low (but possible) |
| VM/ephemeral process | Very strong | High | Very low |
| Formal verification | Strong, but only for the properties proven | Extreme | N/A |
For Your Use Case
If you’re building an agent system:
- Container baseline: non-negotiable for untrusted code
- Network isolation: block all but allowlisted endpoints
- Resource limits: CPU/memory/process quotas prevent DoS
- Timeout enforcement: kill anything taking >N seconds
- Output sanitization: regex/LLM-scan for leaked secrets before returning to user
- Logging: audit all executed commands (for forensics if a compromise happens)
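# Real baseline for agent executor
# Assumes a pre-created Docker network named restricted-net that only routes to allowlisted endpoints.
# docker run has no --timeout flag, so the time limit is enforced by coreutils timeout inside the container.
# --storage-opt size= only works with some storage drivers; --read-only plus the tmpfs size cap bounds disk use here.
docker run --rm \
  --cpus 1 --memory 512m --pids-limit 128 \
  --network restricted-net \
  --tmpfs /tmp:noexec,size=500m \
  --cap-drop ALL --security-opt no-new-privileges \
  --read-only --workdir /tmp \
  --user 1000:1000 \
  -v "$PWD/code.py:/code.py:ro" \
  python:3.11-slim timeout -k 5 30s python /code.py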
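The output sanitization step from the checklist can be sketched with a couple of regular expressions. The patterns and the `redact_secrets` name are illustrative assumptions; a real deployment would need a broader, tested pattern set and would still miss secrets that do not match a known shape.

```python
import re

# Illustrative patterns only, not a complete secret scanner.
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")  # shape of an AWS access key ID
KEY_VALUE = re.compile(
    r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[:=]\s*)\S+"
)


def redact_secrets(text: str) -> str:
    """Replace likely secrets in sandbox output before it leaves the executor."""
    text = AWS_KEY_ID.sub("[REDACTED]", text)
    # Keep the key name and separator, drop only the value.
    return KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)


sample = "key AKIAABCDEFGHIJKLMNOP and password = hunter2"
print(redact_secrets(sample))
```

Redaction like this is a last line of defense. Keeping secrets out of the sandbox environment in the first place is the more reliable control.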
The key insight: you’re not trying to stop a determined attacker; you’re trying to prevent accidents and limit blast radius. A sufficiently skilled adversary will escape most sandboxes, but containers plus resource limits stop most accidental damage and many real-world attacks.
Check your notes on system security / container escapes if you’ve written about this — curious if you’ve already explored the DeepSeek agent sandbox or similar.