
# Code Sandbox

The code sandbox isolates execution of agent-generated code — specifically the `run_code` tool, which executes dynamically generated scripts. It does not sandbox user-submitted code (there is no user code submission in Molecule AI); it is the agent's own generated code that needs sandboxing.

## What Gets Sandboxed

| What | Runs in | Why |
|------|---------|-----|
| Agent-generated code execution | Sandbox | e.g. "write and run this script" |
| `pip` installs from skill requirements | Sandbox | Untrusted package code |
| Filesystem writes outside `/memory` and `/configs` | Sandbox | Prevent container escape |
| `SKILL.md` loading | Workspace container | Just file reads |
| LangChain `@tool` functions | Workspace container | Just Python function calls |
| A2A HTTP calls to peers | Workspace container | Network calls to known endpoints |
| Platform heartbeat/registry calls | Workspace container | Known endpoints |

The sandbox activates only when the agent calls the `run_code` tool to execute dynamic code. Regular skill tools — API calls, file reads, data processing — run directly in the workspace container without sandbox overhead.
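The routing rule above can be sketched as a simple predicate. The names here (`needs_sandbox`, the tool-kind strings) are illustrative, not the actual Molecule AI API:

```python
# Tool kinds that execute dynamic, untrusted code (per the table above).
# The kind names are hypothetical labels, not real Molecule AI identifiers.
SANDBOXED_KINDS = {"run_code", "pip_install"}

def needs_sandbox(tool_kind: str) -> bool:
    """True if the call must be routed through the throwaway sandbox."""
    return tool_kind in SANDBOXED_KINDS

print(needs_sandbox("run_code"))       # True
print(needs_sandbox("a2a_http_call"))  # False: runs in the workspace container
```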

## Configuration

```yaml
# config.yaml
tier: 3
sandbox:
  backend: docker    # docker | firecracker | e2b | none
  memory_limit: 256m
  cpu_limit: 0.5
  network: false
  timeout: 30s
```
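As a rough illustration, the `sandbox` block could map onto a validated settings object. The dataclass below is a sketch, not the actual Molecule AI config loader; its field names simply mirror the YAML keys:

```python
from dataclasses import dataclass

VALID_BACKENDS = {"docker", "firecracker", "e2b", "none"}

@dataclass
class SandboxConfig:
    backend: str = "docker"      # docker | firecracker | e2b | none
    memory_limit: str = "256m"
    cpu_limit: float = 0.5
    network: bool = False        # network stays off unless explicitly enabled
    timeout: str = "30s"

    def __post_init__(self) -> None:
        # Reject unknown backends early, before any container is provisioned.
        if self.backend not in VALID_BACKENDS:
            raise ValueError(f"unknown sandbox backend: {self.backend!r}")

cfg = SandboxConfig()
print(cfg.backend, cfg.network)  # docker False
```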

## Sandbox by Tier

| Tier | `sandbox.backend` | Reason |
|------|-------------------|--------|
| 1, 2 | `none` | No `run_code` tool available; tools are just API calls |
| 3 | `docker` (MVP), `firecracker` or `e2b` (production) | Agent can generate and run code |
| 4 | `none` | Full-host access tier; no extra sandbox boundary is added by default |

Tier 4 doesn't add a second sandbox by default because the workspace already runs with host-level privileges. If you need isolated code execution at that tier, treat it as an explicit defense-in-depth decision rather than an assumption baked into the current provisioner.
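The tier table reads as a small selection rule. A hypothetical helper (the real provisioner takes the backend from `config.yaml` rather than computing it; this sketch picks `firecracker` as the production default, though `e2b` is equally valid per the table):

```python
def default_backend(tier: int, production: bool = False) -> str:
    """Default sandbox backend per tier, mirroring the table above."""
    if tier in (1, 2):
        return "none"            # no run_code tool at these tiers
    if tier == 3:
        return "firecracker" if production else "docker"
    if tier == 4:
        return "none"            # host-level tier: no extra boundary by default
    raise ValueError(f"unknown tier: {tier}")

print(default_backend(3))                   # docker
print(default_backend(3, production=True))  # firecracker
```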

## How It Works (Tier 3)

Each code execution spawns a throwaway container:

1. Agent calls `run_code(code="import pandas as pd; ...")`
2. The sandbox creates a temporary Docker container (Docker-in-Docker)
3. The container runs with network disabled, memory capped, a read-only filesystem, and a CPU limit
4. The code executes inside the throwaway container
5. Output (stdout, stderr, return value) is captured
6. The throwaway container is destroyed immediately afterwards
A minimal sketch of the tool, assuming the Docker SDK for Python (docker-py) as the client; the original snippet left the Docker client unspecified, and error handling is omitted:

```python
import docker
from langchain_core.tools import tool

client = docker.from_env()

@tool
async def run_code(code: str) -> dict:
    """Execute code safely in a throwaway container."""
    # containers.run blocks until the container exits and returns its stdout.
    output = client.containers.run(
        image="python:3.11-slim",
        command=["python", "-c", code],
        remove=True,              # destroy the container after the run
        network_disabled=True,    # no network inside the sandbox
        mem_limit="256m",         # memory cap from config
        read_only=True,           # read-only root filesystem
    )
    return {"output": output.decode()}
```

The workspace container itself is never at risk — the generated code can't escape the sandbox.

## Backends

### docker (MVP)

Docker-in-Docker. The workspace container runs Docker and spawns child containers for code execution. Simple, works everywhere Docker is available.

### firecracker

MicroVM-based isolation. Faster cold starts than Docker, with a stronger boundary than standard containers. Better for production workloads with many concurrent code executions.

### e2b

Cloud-hosted sandboxes via E2B. No local Docker needed. The workspace sends code to E2B's API and gets results back. Good for hosted deployments where you don't want to manage Docker-in-Docker.

## Key Properties

- Skill code never changes — only the backend config
- Each execution is isolated — no shared state between runs
- Containers are destroyed after every run
- Network is disabled by default (can be enabled per-sandbox if needed)
- Memory is capped to prevent resource exhaustion
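"No shared state between runs" can be demonstrated at the process level: every execution starts from a fresh interpreter, so a variable set in one run does not exist in the next. A stand-in using a plain subprocess (no Docker needed here; the real backends give the same property with a much stronger isolation boundary):

```python
import subprocess
import sys

def run_once(code: str, timeout_s: int = 30) -> str:
    # A fresh interpreter per call mirrors the one-container-per-run model.
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return proc.stdout.strip()

print(run_once("x = 41; print(x + 1)"))  # 42
print(run_once("print('x' in dir())"))   # False: the first run's x is gone
```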