docs: add docs/development/code-sandbox.md

2026-04-21 07:51:16 +00:00 · 9db66d7753
commit 9db66d7753
parent 5d3531e380
1 changed files with 96 additions and 0 deletions
--- a/content/docs/development/code-sandbox.md
+++ b/content/docs/development/code-sandbox.md
@@ -0,0 +1,96 @@
+# Code Sandbox
+
+The code sandbox isolates the execution of agent-generated code, specifically the `run_code` tool, which executes dynamically generated scripts. It does not cover user-submitted code, because Molecule AI has no user code submission; the agent's own generated code is what needs sandboxing.
+
+## What Gets Sandboxed
+
+| Operation | Runs in | Why |
+|---|---------|-----|
+| Agent-generated code execution | Sandbox | Dynamically generated code, e.g. "write and run this script" |
+| pip installs from skill requirements | Sandbox | Untrusted package code |
+| Filesystem writes outside `/memory` and `/configs` | Sandbox | Prevent container escape |
+| `SKILL.md` loading | Workspace container | Just file reads |
+| LangChain `@tool` functions | Workspace container | Just Python function calls |
+| A2A HTTP calls to peers | Workspace container | Network calls to known endpoints |
+| Platform heartbeat/registry calls | Workspace container | Known endpoints |
+
+The sandbox activates only when the agent calls a `run_code` tool that executes dynamic code. Regular skill tools (API calls, file reads, data processing) run directly in the workspace container without sandbox overhead.
+
+## Configuration
+
+```yaml
+# config.yaml
+tier: 3
+sandbox:
+  backend: docker    # docker | firecracker | e2b | none
+  memory_limit: 256m
+  cpu_limit: 0.5
+  network: false
+  timeout: 30s
+```
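+
+A loader might validate these values before handing them to a backend. The sketch below is one way to do that, not the runtime's actual loader: `SandboxConfig`, `parse_memory`, and `parse_timeout` are hypothetical names, and the accepted unit suffixes and default values are assumptions taken from the example above.
+
+```python
+from dataclasses import dataclass
+
+BACKENDS = {"docker", "firecracker", "e2b", "none"}
+_MEMORY_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}  # assumed suffixes
+
+
+def parse_memory(value: str) -> int:
+    """Convert a size such as '256m' to bytes."""
+    value = value.strip().lower()
+    if value and value[-1] in _MEMORY_UNITS:
+        return int(value[:-1]) * _MEMORY_UNITS[value[-1]]
+    return int(value)
+
+
+def parse_timeout(value: str) -> float:
+    """Convert a duration such as '30s' to seconds (only an 's' suffix is assumed)."""
+    return float(value.strip().lower().removesuffix("s"))
+
+
+@dataclass(frozen=True)
+class SandboxConfig:
+    backend: str
+    memory_bytes: int
+    cpu_limit: float
+    network: bool
+    timeout_s: float
+
+    @classmethod
+    def from_dict(cls, raw: dict) -> "SandboxConfig":
+        """Build a config from the parsed `sandbox:` mapping."""
+        backend = raw.get("backend", "none")
+        if backend not in BACKENDS:
+            raise ValueError(f"unknown sandbox backend: {backend!r}")
+        return cls(
+            backend=backend,
+            memory_bytes=parse_memory(str(raw.get("memory_limit", "256m"))),
+            cpu_limit=float(raw.get("cpu_limit", 0.5)),
+            network=bool(raw.get("network", False)),
+            timeout_s=parse_timeout(str(raw.get("timeout", "30s"))),
+        )
+```
+
+Rejecting an unknown backend at load time surfaces a typo in `config.yaml` before the agent first calls `run_code`.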
+
+## Sandbox by Tier
+
+| Tier | `sandbox.backend` | Reason |
+|------|--------------------|--------|
+| 1, 2 | `none` | No `run_code` tool available — tools are just API calls |
+| 3 | `docker` (MVP), `firecracker` or `e2b` (production) | Agent can generate and run code |
+| 4 | `none` | Full-host access tier — no extra sandbox boundary is added by default |
+
+Tier 4 doesn't add a second sandbox by default because the workspace already runs with host-level privileges. If you need isolated code execution at that tier, treat it as an explicit defense-in-depth decision rather than an assumption baked into the current provisioner.
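+
+The tier table can be read as a small lookup. In this minimal sketch, `default_backend` and its `production` flag are hypothetical, and choosing `firecracker` over `e2b` for production is an arbitrary assumption, since the table lists both.
+
+```python
+def default_backend(tier: int, production: bool = False) -> str:
+    """Return the sandbox backend the tier table suggests."""
+    if tier in (1, 2):
+        return "none"  # no run_code tool at these tiers
+    if tier == 3:
+        # The table allows firecracker or e2b in production; firecracker is
+        # used here purely as an example.
+        return "firecracker" if production else "docker"
+    if tier == 4:
+        return "none"  # full-host access tier, no extra boundary by default
+    raise ValueError(f"unknown tier: {tier}")
+```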
+
+## How It Works (Tier 3)
+
+Each code execution spawns a throwaway container:
+
+1. Agent calls `run_code(code="import pandas as pd; ...")`
+2. Sandbox creates a temporary Docker container (Docker-in-Docker)
+3. Container runs with: network disabled, memory capped, read-only filesystem, CPU limited
+4. Code executes inside the throwaway container
+5. Output (stdout, stderr, return value) is captured
+6. Throwaway container is destroyed immediately after
+
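+```python
+import docker
+from langchain_core.tools import tool
+
+client = docker.from_env()  # Docker daemon reachable from the workspace
+
+@tool
+async def run_code(code: str) -> dict:
+    """Execute code safely in a throwaway container."""
+    # containers.run() blocks until the container exits and raises
+    # docker.errors.ContainerError on a non-zero exit code.
+    output = client.containers.run(
+        image="python:3.11-slim",
+        command=["python", "-c", code],
+        remove=True,             # destroy the container after the run
+        network_disabled=True,
+        mem_limit="256m",
+        nano_cpus=500_000_000,   # 0.5 CPU
+        read_only=True,
+        stderr=True,             # capture stderr along with stdout
+    )
+    return {"output": output.decode()}
+```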
+
+The workspace container itself is not exposed to the generated code, which runs only inside the throwaway container.
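+
+The container limits from step 3 can also be expressed as `docker run` flags, which makes it easy to add the configured timeout. This is a sketch under assumptions rather than the shipped implementation: `build_docker_argv` and `run_in_sandbox` are hypothetical helpers, and the defaults mirror the example `config.yaml`.
+
+```python
+import subprocess
+
+
+def build_docker_argv(code: str, memory: str = "256m", cpus: float = 0.5,
+                      network: bool = False,
+                      image: str = "python:3.11-slim") -> list[str]:
+    """Translate sandbox limits into a `docker run` command line."""
+    argv = ["docker", "run", "--rm", "--read-only",
+            "--memory", memory, "--cpus", str(cpus)]
+    if not network:
+        argv += ["--network", "none"]
+    argv += [image, "python", "-c", code]
+    return argv
+
+
+def run_in_sandbox(code: str, timeout_s: float = 30.0) -> dict:
+    """Run code in a throwaway container and capture its output."""
+    try:
+        proc = subprocess.run(build_docker_argv(code), capture_output=True,
+                              text=True, timeout=timeout_s)
+    except subprocess.TimeoutExpired:
+        # Killing the CLI client may leave the container running; a real
+        # implementation would also need to stop it explicitly.
+        return {"stdout": "", "stderr": "timed out", "exit_code": None}
+    return {"stdout": proc.stdout, "stderr": proc.stderr,
+            "exit_code": proc.returncode}
+```
+
+Keeping the argument builder separate from the process call lets the flag mapping be unit-tested without a Docker daemon.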
+
+## Backends
+
+### docker (MVP)
+
+Docker-in-Docker. The workspace container runs Docker and spawns child containers for code execution. Simple, works everywhere Docker is available.
+
+### firecracker
+
+MicroVM-based isolation. Faster cold starts than Docker, with a stronger boundary than standard containers. Better for production workloads with many concurrent code executions.
+
+### e2b
+
+Cloud-hosted sandboxes via [E2B](https://e2b.dev). No local Docker needed. The workspace sends code to E2B's API and gets results back. Good for hosted deployments where you don't want to manage Docker-in-Docker.
+
+## Key Properties
+
+- Skill code never changes when you switch backends; only the backend config does
+- Each execution is isolated — no shared state between runs
+- Containers are destroyed after every run
+- Network is disabled by default (can be enabled per-sandbox if needed)
+- Memory is capped to prevent resource exhaustion
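+
+One way to keep skill code unchanged across backends is to dispatch on the configured `sandbox.backend` value. The registry below is a hypothetical illustration: the `register` decorator and the placeholder runners are assumptions, not part of the runtime, and real runners would start a container or call a hosted API.
+
+```python
+from typing import Callable
+
+# Maps a `sandbox.backend` value to the function that runs code on it.
+Runner = Callable[[str], dict]
+_RUNNERS: dict[str, Runner] = {}
+
+
+def register(name: str) -> Callable[[Runner], Runner]:
+    """Register a runner under a backend name."""
+    def decorator(fn: Runner) -> Runner:
+        _RUNNERS[name] = fn
+        return fn
+    return decorator
+
+
+@register("docker")
+def _run_docker(code: str) -> dict:
+    # Placeholder: a real runner would start a throwaway container here.
+    return {"backend": "docker", "code": code}
+
+
+@register("e2b")
+def _run_e2b(code: str) -> dict:
+    # Placeholder: a real runner would call the hosted sandbox API here.
+    return {"backend": "e2b", "code": code}
+
+
+def run_code(code: str, backend: str) -> dict:
+    """Dispatch to the configured backend; callers never change."""
+    try:
+        runner = _RUNNERS[backend]
+    except KeyError:
+        raise ValueError(f"no sandbox runner for backend {backend!r}") from None
+    return runner(code)
+```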
+
+## Related Docs
+
+- [Workspace Tiers](../architecture/workspace-tiers.md) — Which tiers need sandboxing
+- [Config Format](../agent-runtime/config-format.md) — Sandbox configuration in `config.yaml`
+- [Provisioner](../architecture/provisioner.md) — Container deployment details
+- [Skills](../agent-runtime/skills.md) — Skill tools that may use the sandbox