chore: sync main → staging to restore staging-as-superset invariant

CEO directive 2026-04-24: staging must not be behind main. Some hotfixes landed directly on main and were never backported. Bringing them into staging so auto-promote can ff-only forward-promote from staging again.
2026-04-24 08:22:50 -07:00 · 2026-04-24 08:22:50 -07:00 · 80ddd05f20
commit 80ddd05f20
parent 7bcc2a65aa 6d5c936165
8 changed files with 1057 additions and 3 deletions
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -6,7 +6,12 @@ on:
    branches: [main]
 jobs:
  build:
-    runs-on: ubuntu-latest
+    # Self-hosted Mac mini — this repo is private and the org's
+    # GitHub-hosted minute budget is exhausted (every ubuntu-latest job
+    # dies in 2s with no step output). Per the 2026-04-22 carve-out:
+    # private repos run on self-hosted; public repos use ubuntu-latest
+    # (still free).
+    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
--- a/content/docs/api-reference.mdx
+++ b/content/docs/api-reference.mdx
@@ -295,7 +295,7 @@ Workspace file management. Files are stored in the workspace's config directory.
 |--------|------|------|-------------|
 | GET | `/workspaces/:id/files` | WorkspaceAuth | List files in the workspace config directory. |
 | GET | `/workspaces/:id/files/*path` | WorkspaceAuth | Read a specific file. |
-| PUT | `/workspaces/:id/files/*path` | WorkspaceAuth | Write a file. Creates parent directories as needed. |
+| PUT | `/workspaces/:id/files/*path` | WorkspaceAuth | Write a file. Creates parent directories as needed. On SaaS workspaces (EC2, no Docker), routes via EC2 Instance Connect endpoint using an ephemeral SSH key pair — the key is scoped to the file-write operation and deleted within 30 seconds. Max payload ~10 MiB. Self-hosted Docker workspaces write via `docker cp` as before. |
 | DELETE | `/workspaces/:id/files/*path` | WorkspaceAuth | Delete a file. |
 | GET | `/workspaces/:id/shared-context` | WorkspaceAuth | Get the shared context files for a workspace (aggregated from parent hierarchy). |

--- a/content/docs/changelog.mdx
+++ b/content/docs/changelog.mdx
@@ -9,11 +9,15 @@ Entries are published daily at 23:50 UTC.
 ---
 ## 2026-04-23

-A quiet day — most activity was internal tooling and security hardening. The SSRF fix below resolves a regression that blocked chat for SaaS deployments.
+### ✨ New features
+
+- **SaaS Federation v2 tutorial**: a clean, self-contained walkthrough for platform operators who want to run multi-tenant workspaces from a single control plane. Covers org onboarding via `POST /cp/orgs`, workspace provisioning per tenant, fleet inspection, quota controls, and suspension/teardown. (`molecule-core` [#1700](https://github.com/Molecule-AI/molecule-core/pull/1700))
+- **External workspace quickstart**: a 5-minute guide to running any HTTP-speaking agent (Python, Node, Go, Rust) on your own machine and having it appear on the canvas alongside platform-provisioned agents. Covers tunnel setup, `POST /workspaces` registration, and a working echo agent. (`molecule-core` [#1760](https://github.com/Molecule-AI/molecule-core/pull/1760))

 ### 🔧 Fixes

 - **SSRF guard in SaaS mode**: previously the SSRF protection was blocking all RFC-1918 private IP ranges (`10/8`, `172.16/12`, `192.168/16`) even in SaaS mode — this was a regression from the earlier SaaS-mode work. The fix wires up the `saasMode` flag correctly so private IPs are allowed in SaaS deployments (for internal service calls), while metadata ranges (`169.254/16`), CGNAT, loopback, and link-local remain blocked in every mode. IPv6 ULA (`fd00::/8`) handling is also now correct. (`molecule-core` [#1692](https://github.com/Molecule-AI/molecule-core/pull/1692))
+- **PUT `/workspaces/:id/files/*path` on SaaS (EC2) workspaces**: fixed a 500 error (`docker not available`) that occurred when saving files from Canvas on SaaS workspaces. The handler now detects non-Docker workspaces via `workspaces.instance_id` and routes writes via EC2 Instance Connect (SSH-backed write with an ephemeral key pair) instead of trying to `docker cp`. (`molecule-core` [#1702](https://github.com/Molecule-AI/molecule-core/pull/1702))
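
The SSRF guard behavior described in the Fixes list above can be sketched as a small allow/deny check. This is an illustration only, not the `molecule-core` implementation: the function name `is_blocked`, the `saas_mode` parameter, and the choice to treat IPv6 ULA (`fd00::/8`) the same way as the RFC-1918 ranges are all assumptions.

```python
import ipaddress

# Ranges the changelog says stay blocked in every mode.
ALWAYS_BLOCKED = [ipaddress.ip_network(n) for n in (
    "169.254.0.0/16",  # IPv4 link-local, includes cloud metadata endpoints
    "100.64.0.0/10",   # CGNAT
    "127.0.0.0/8",     # IPv4 loopback
    "::1/128",         # IPv6 loopback
    "fe80::/10",       # IPv6 link-local
)]

# Private ranges: blocked by default, allowed in SaaS mode.
# Treating fd00::/8 like RFC-1918 is an assumption made for this sketch.
PRIVATE = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fd00::/8",
)]


def is_blocked(ip: str, saas_mode: bool) -> bool:
    """Return True when an outbound call to `ip` should be refused."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in ALWAYS_BLOCKED):
        return True
    if any(addr in net for net in PRIVATE):
        return not saas_mode
    return False
```

The always-blocked list is checked first so that a SaaS deployment can never reach a metadata address, whatever the private-range rule decides.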

 ### 📚 Docs

@@ -22,8 +26,52 @@ A quiet day — most activity was internal tooling and security hardening. The S

 ### 🧹 Internal

+- SaaS Federation v2 tutorial published — clean rewrite of #1613, now with correct HTTP status codes, fleet metrics endpoint, and security model table (`molecule-core` [#1700](https://github.com/Molecule-AI/molecule-core/pull/1700)); Files API SSH-backed write path for SaaS EC2 workspaces — fixes 500 on PUT `/workspaces/:id/files/*path` for SaaS users (`molecule-core` [#1702](https://github.com/Molecule-AI/molecule-core/pull/1702)); Canvas create-workspace dialog now requires hermes runtime model (`molecule-core` [#1714](https://github.com/Molecule-AI/molecule-core/pull/1714)).
 - EC2 Instance Connect SSH tutorial published (`molecule-core` [#1617](https://github.com/Molecule-AI/molecule-core/pull/1617)); AI agent org-scoped key credential model blog published (`molecule-core` [#1614](https://github.com/Molecule-AI/molecule-core/pull/1614)); Phase 30 Day 2 social package ready (`molecule-core` [#1662](https://github.com/Molecule-AI/molecule-core/pull/1662)).

+### 🌅 Late-day updates (17:30–23:50 UTC)
+
+#### 🔒 Security
+
+- **Cross-tenant memory poisoning fix** (`molecule-core` [#1791](https://github.com/Molecule-AI/molecule-core/pull/1791)): fixes a bug where `commit_memory` with `scope=TEAM` could write to a sibling workspace's memory store under high concurrency. `commit_memory` now validates `target_workspace_id` against the caller's known peer set before any write.
+- **CWE-78 shell injection hardening** (`molecule-core` [#1885](https://github.com/Molecule-AI/molecule-core/pull/1885)): `shellQuote` now uses `strconv.Quote` for all shell-delimited paths in the EC2 Instance Connect and bastion SSH paths. Defense-in-depth layer hardened; primary protection remains path-validation logic upstream.
+
+#### ✨ New features
+
+- **A2A priority queue — Phase 1** (`molecule-core` [#1892](https://github.com/Molecule-AI/molecule-core/pull/1892)): task dispatch now supports a `priority` field (`low` / `normal` / `high` / `urgent`). High/urgent tasks bypass the normal FIFO queue and are dispatched immediately. Phase 2 (priority inversion deadlock prevention) on the roadmap.
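
One way to picture the Phase 1 dispatch rule above: `high` and `urgent` tasks skip the FIFO queue, everything else waits its turn. The `Dispatcher` class, its method names, and the `normal` default are hypothetical and only illustrate the rule; they are not taken from `molecule-core`.

```python
from collections import deque

VALID = {"low", "normal", "high", "urgent"}
BYPASS = {"high", "urgent"}  # priorities that skip the FIFO queue


class Dispatcher:
    """Toy dispatcher: records dispatch order instead of sending anything."""

    def __init__(self):
        self.fifo = deque()
        self.dispatched = []

    def submit(self, task_id, priority="normal"):
        if priority not in VALID:
            raise ValueError(f"unknown priority: {priority}")
        if priority in BYPASS:
            self.dispatched.append(task_id)  # dispatched immediately
        else:
            self.fifo.append(task_id)        # waits in arrival order

    def drain_one(self):
        """Dispatch the oldest queued task; return None when the queue is empty."""
        if not self.fifo:
            return None
        task_id = self.fifo.popleft()
        self.dispatched.append(task_id)
        return task_id
```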
+
+#### 🔧 Fixes
+
+- **A2A queue nil-safe drain** (`molecule-core` [#1893](https://github.com/Molecule-AI/molecule-core/pull/1893), [#1896](https://github.com/Molecule-AI/molecule-core/pull/1896)): `DequeueTask` no longer panics when the in-memory queue map is uninitialized — graceful empty-result returned instead.
+- **Workspaces stuck in `provisioning` after failure** (`molecule-core` [#1794](https://github.com/Molecule-AI/molecule-core/pull/1794)): provisioner now transitions workspaces to `failed` state with a descriptive error message instead of leaving them orphaned in `provisioning`.
+- **Dedup settings hooks double-fire** (`molecule-core` [#1797](https://github.com/Molecule-AI/molecule-core/pull/1797)): the `dedup_settings_hooks` registry now correctly unsubscribes after one fire — eliminates the 3–4× duplicate hook execution observed in CI.
+- **Semantic memory search returning stale results** (`molecule-core` [#1778](https://github.com/Molecule-AI/molecule-core/pull/1778)): pgvector index now refreshes synchronously on `commit_memory` write instead of on a 5-minute background cycle.
+- **pgvector migration race in E2E CI** (`molecule-core` [#1777](https://github.com/Molecule-AI/molecule-core/pull/1777)): `CREATE EXTENSION` wrapped in `IF NOT EXISTS` inside a `DO` block — eliminates E2E CI flakiness on fresh DB spin-up.
+- **EC2 Instance Connect endpoint not found in us-west-2** (`molecule-core` [#1779](https://github.com/Molecule-AI/molecule-core/pull/1779)): Instance Connect endpoint SDK call now falls back gracefully to direct SSM session when the EIC endpoint is unavailable in a region.
+- **Canvas topology overlay edge labels clipped** (`molecule-core` [#1802](https://github.com/Molecule-AI/molecule-core/pull/1802)): SVG edge labels now respect viewport bounds; labels that would render off-screen are repositioned.
+- **Audit trail panel not loading for large workspaces** (`molecule-core` [#1854](https://github.com/Molecule-AI/molecule-core/pull/1854)): audit log fetch now uses cursor-based pagination (100 events per page) instead of returning all events at once.
+- **Hermes `response_format` not forwarded to MiniMax** (`molecule-core` [#1861](https://github.com/Molecule-AI/molecule-core/pull/1861)): `response_format=json_schema` now propagates through the model config passthrough for hermes/MiniMax-M2.7-highspeed workspaces.
+- **Memory Inspector panel memory leak** (`molecule-core` [#1871](https://github.com/Molecule-AI/molecule-core/pull/1871)): `useMemoryStore` hook now correctly cancels the SSE subscription on panel unmount.
+- **Token revocation cache stale-read window** (`molecule-core` [#1888](https://github.com/Molecule-AI/molecule-core/pull/1888)): revoked-token invalidation now propagates within 5 s (down from 60 s) — closes the window where a revoked token could still authenticate.
+- **TenantGuard same-origin bypass (regression)** (`molecule-core` [#1898](https://github.com/Molecule-AI/molecule-core/pull/1898)): fixes a regression introduced in the Phase 33 cloudflare-removal change that re-opened the TenantGuard same-origin bypass for EC2 tenant Canvas deployments.
+
+#### 📚 Docs
+
+- **Chrome DevTools MCP tutorial** (`docs` [#1798](https://github.com/Molecule-AI/docs/pull/1798)): hands-on guide for debugging Molecule AI agents in-browser using Chrome's built-in MCP inspector.
+- **Phase 34 launch page** (`docs` [#1799](https://github.com/Molecule-AI/docs/pull/1799)): public-facing launch collateral for GA scheduled 2026-04-30.
+- **Tool Trace demo environment** (`docs` [#1844](https://github.com/Molecule-AI/docs/pull/1844)): interactive demo showing the tool trace inspector in action, with sample run data.
+- **Enterprise battlecard** (`docs` [#1864](https://github.com/Molecule-AI/docs/pull/1864)): competitive positioning doc for sales and enterprise evaluation teams.
+
+#### 🧹 Internal
+
+- `a2a-sdk` hot-pinned to `0.3.x` across all workspace template repos (`molecule-core` [#1890](https://github.com/Molecule-AI/molecule-core/pull/1890)); SDK upgrade path documented in `KI-009` (`internal` [#1631](https://github.com/Molecule-AI/internal/issues/1631)).
+- Phase 34 CI matrix expanded to cover Node 22 and Go 1.24 (`molecule-ci`).
+
+#### 🔧 Runtime fixes
+
+- **Heartbeat 401 retry** (`molecule-ai-workspace-runtime` [#40](https://github.com/Molecule-AI/molecule-ai-workspace-runtime/pull/40)): heartbeat worker now retries with fresh token on 401 before declaring the workspace unreachable — eliminates false `disconnected` status during token rotation.
+- **LLM token auto-detect** (`molecule-ai-workspace-runtime` [#38](https://github.com/Molecule-AI/molecule-ai-workspace-runtime/pull/38)): hermes runtime now auto-detects `max_tokens` from model context window and request timeout when not explicitly configured.
+
 ---


--- a/content/docs/guides/external-workspace-quickstart.md
+++ b/content/docs/guides/external-workspace-quickstart.md
@@ -0,0 +1,270 @@
+---
+title: "External Workspace — 5-Minute Quickstart"
+description: "Get any HTTP-speaking agent running on your own machine (laptop, home server, cloud VM) to appear on the Molecule AI canvas alongside platform-provisioned agents."
+---
+
+# External Workspace — 5-Minute Quickstart
+
+Run an agent on your laptop, a home server, a cloud VM, or any machine with internet — and have it show up on a Molecule AI canvas alongside platform-provisioned agents. This guide gets you from zero to a working agent in under 5 minutes.
+
+> **Looking for the operator-focused reference?** See [External Agent Registration](/docs/guides/external-agent-registration) for full capability + auth details, or [Remote Workspaces FAQ](/docs/guides/remote-workspaces-faq) for hardening + production notes. This doc is the fast path.
+
+---
+
+## What is an "external workspace"?
+
+A workspace whose agent code lives outside Molecule's infrastructure. The platform treats it as a first-class participant — canvas node, A2A routing, delegation, memory, channels — but doesn't manage its lifecycle (no Docker, no EC2 launched for you).
+
+You're responsible for:
+1. Running an HTTP server that speaks A2A JSON-RPC
+2. Exposing it at a URL the platform can reach
+3. Registering it with your tenant
+
+Everything else — message routing, canvas rendering, peer discovery, memory access — works the same as a platform-native agent.
+
+---
+
+## Prerequisites
+
+| You need | Notes |
+|---|---|
+| A Molecule AI tenant | Your own hosted instance (e.g. `you.moleculesai.app`) or self-hosted |
+| Tenant admin token | Available in the admin UI, or via `molecli ws list` |
+| Outbound HTTPS | No inbound ports needed if you use a tunnel (next step) |
+| Any language with an HTTP server | Python / Node.js / Go / Rust — anything that can POST+GET JSON |
+
+---
+
+## Step 1 — Write the agent (Python example, ~40 lines)
+
+```python
+# agent.py
+import time
+from fastapi import FastAPI, Request
+
+app = FastAPI()
+
+@app.get("/health")
+def health():
+    return {"status": "ok"}
+
+@app.post("/")
+async def a2a(request: Request):
+    body = await request.json()
+
+    # Extract user text from A2A JSON-RPC message/send
+    user_text = ""
+    try:
+        for part in body["params"]["message"]["parts"]:
+            if part.get("kind") == "text":
+                user_text = part["text"]
+                break
+    except (KeyError, TypeError):
+        pass
+
+    # Your logic goes here — echo for now
+    reply = f"You said: {user_text}"
+
+    return {
+        "jsonrpc": "2.0",
+        "id": body.get("id"),
+        "result": {
+            "kind": "message",
+            "messageId": f"agent-{int(time.time() * 1000)}",
+            "role": "agent",
+            "parts": [{"kind": "text", "text": reply}],
+        },
+    }
+```
+
+```bash
+pip install fastapi uvicorn
+uvicorn agent:app --host 127.0.0.1 --port 9876
+```
+
+Test locally:
+```bash
+curl -X POST http://127.0.0.1:9876/ \
+  -H "Content-Type: application/json" \
+  -d '{"jsonrpc":"2.0","method":"message/send","id":"1","params":{"message":{"role":"user","messageId":"m1","parts":[{"kind":"text","text":"hello"}]}}}'
+```
+
+The response should be a JSON body containing `"text":"You said: hello"`.
+
+---
+
+## Step 2 — Expose it to the internet
+
+Pick one:
+
+### Option A — Cloudflare quick tunnel (no account, ephemeral)
+```bash
+cloudflared tunnel --url http://127.0.0.1:9876
+```
+Copy the printed `https://*.trycloudflare.com` URL. Regenerates on every restart; fine for demos.
+
+### Option B — ngrok (account, persistent during session)
+```bash
+ngrok http 9876
+```
+
+### Option C — Real server with TLS
+Deploy the same Python script to a VM (Fly, Railway, DigitalOcean, anywhere) behind a TLS terminator (Caddy, nginx, or the platform's native TLS).
+
+---
+
+## Step 3 — Register the workspace
+
+Replace `<TENANT>`, `<ADMIN_TOKEN>`, `<ORG_ID>`, and `<YOUR_URL>` with your values.
+
+```bash
+curl -X POST https://<TENANT>/workspaces \
+  -H "Authorization: Bearer <ADMIN_TOKEN>" \
+  -H "X-Molecule-Org-Id: <ORG_ID>" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "My Laptop Agent",
+    "runtime": "external",
+    "external": true,
+    "url": "<YOUR_URL>",
+    "tier": 2
+  }'
+```
+
+Response:
+```json
+{"external":true,"id":"abc-123-...","status":"online"}
+```
+
+The `id` field is your workspace ID — remember it.
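
If you would rather register from a script, the same request can be built with the Python standard library. This is a sketch that mirrors the `curl` call above; the helper names `build_register_request` and `register` are made up for illustration, and the response is assumed to be the JSON object shown above.

```python
import json
import urllib.request


def build_register_request(tenant, admin_token, org_id, agent_url,
                           name="My Laptop Agent"):
    """Build (but do not send) the POST /workspaces registration request."""
    payload = {
        "name": name,
        "runtime": "external",
        "external": True,
        "url": agent_url,
        "tier": 2,
    }
    return urllib.request.Request(
        f"https://{tenant}/workspaces",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "X-Molecule-Org-Id": org_id,
            "Content-Type": "application/json",
        },
    )


def register(request):
    """Send the request and return the new workspace ID."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["id"]
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body before anything goes over the network.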
+
+---
+
+## Step 4 — Chat with it
+
+1. Open your Molecule canvas at `https://<TENANT>`
+2. You'll see a new workspace node named "My Laptop Agent" with status `online`
+3. Click it → Chat tab → type "hello"
+4. Watch your terminal's uvicorn log — you'll see the incoming POST
+5. The reply appears in the canvas chat
+
+🎉 **You have an external agent running on Molecule.** Everything from here is iteration on that agent's handler code.
+
+---
+
+## Common gotchas
+
+| Problem | Fix |
+|---|---|
+| "Failed to send message — agent may be unreachable" | The tenant couldn't POST to your URL. Verify `curl https://<your-tunnel>/health` returns 200 from another machine. |
+| Response takes > 30s | Canvas times out around 30s. Keep initial implementations simple. For long-running work, return a placeholder and use [polling mode](#next-step-polling-mode-preview) (once available). |
+| Agent duplicated in chat | Known canvas bug where WebSocket + HTTP responses both render. Fixed in [molecule-core #1517](https://github.com/Molecule-AI/molecule-core/pull/1517). |
+| Agent replies but canvas shows "Agent unreachable" | Check the tenant can reach your URL. Cloudflare quick tunnels rotate — the URL in your canvas may point at a dead tunnel after restart. |
+| Getting 404 when POSTing to tenant | Add `X-Molecule-Org-Id` header. The tenant's security layer 404s unmatched origin requests by design. |
+
+---
+
+## What you can do from the agent
+
+Your agent has the same capability surface as a platform-native one. From inside your handler you can make outbound calls to the tenant API:
+
+```python
+import httpx
+
+TENANT = "https://you.moleculesai.app"
+TOKEN = "..."  # your workspace_auth_token from registration
+
+def call_peer(workspace_id: str, text: str) -> str:
+    """Message another agent (parent, child, sibling)."""
+    resp = httpx.post(
+        f"{TENANT}/workspaces/{workspace_id}/a2a",
+        headers={"Authorization": f"Bearer {TOKEN}"},
+        json={
+            "jsonrpc": "2.0",
+            "method": "message/send",
+            "id": "1",
+            "params": {"message": {
+                "role": "user", "messageId": "1",
+                "parts": [{"kind": "text", "text": text}]
+            }}
+        },
+        timeout=30,
+    )
+    resp.raise_for_status()  # surface HTTP errors instead of failing on a missing key
+    return resp.json()["result"]["parts"][0]["text"]
+```
+
+Similarly available: `delegate_to_workspace`, `commit_memory`, `search_memory`, `request_approval`, `peers`, `discover`. See the [A2A protocol reference](/docs/api-protocol/communication-rules) for the full endpoint list.
+
+---
+
+## Production upgrade path
+
+The quickstart leaves you with an ephemeral demo. For real use:
+
+1. **Deploy to a real host**: Fly Machine / Railway / anywhere with a stable URL + TLS.
+2. **Use a named Cloudflare tunnel**: survives restarts, gets you a consistent subdomain.
+3. **Authenticate outbound calls correctly**: store the `workspace_auth_token` (returned when you register via `/registry/register`; see the [full registration doc](/docs/guides/external-agent-registration)) and send it as `Authorization: Bearer ...` on every outbound call to the tenant.
+4. **Add an LLM**: swap the echo handler for `anthropic` / `openai` / `ollama` / your model of choice.
+5. **Handle long-running work**: use the (upcoming) polling mode transport so you don't need a publicly reachable URL at all.
+
+---
+
+## Next step: polling mode (preview)
+
+Push mode (this guide) works today but requires an inbound-reachable URL — which forces tunnels or public IPs. A polling-mode transport is in design:
+
+```
+[Canvas] --A2A--> [Platform] <--polls-- [Your laptop]
+                  [inbox queue]     -->replies
+```
+
+Your agent makes only outbound HTTPS calls to the platform, pulling messages from an inbox queue and posting replies back. Works behind any NAT/firewall, tolerates offline laptops, no tunnel needed.
+
+See the [design doc](https://github.com/Molecule-AI/internal/blob/main/product/external-workspaces-polling.md) (internal) and [implementation tracking issue](https://github.com/Molecule-AI/molecule-core/issues?q=polling+mode) once opened.
+
+---
+
+## Examples
+
+- **This quickstart's code**: [gist](https://gist.github.com/molecule-ai/external-workspace-quickstart) (fork it for your language of choice)
+- **LLM-backed example**: `molecule-ai/examples/external-claude-agent` — a working agent that proxies to Anthropic's API
+- **Scheduled cron example**: `molecule-ai/examples/external-cron-agent` — fires timed outbound messages without needing inbound
+
+---
+
+## Troubleshooting
+
+Run this diagnostic checklist before filing an issue:
+
+```bash
+# 1. Is your agent serving locally?
+curl http://127.0.0.1:9876/health
+
+# 2. Is the tunnel up?
+curl https://<your-tunnel-url>/health
+
+# 3. Can the tenant reach you? (from tenant shell or your laptop)
+curl -X POST https://<your-tunnel-url>/ \
+  -H "Content-Type: application/json" \
+  -d '{"jsonrpc":"2.0","method":"message/send","id":"x","params":{"message":{"role":"user","messageId":"m","parts":[{"kind":"text","text":"hi"}]}}}'
+
+# 4. Is the workspace registered correctly?
+curl -H "Authorization: Bearer <ADMIN_TOKEN>" -H "X-Molecule-Org-Id: <ORG_ID>" \
+     https://<TENANT>/workspaces/<WS_ID>
+```
+
+If all four pass and canvas still shows your agent as unreachable, see the [remote workspaces FAQ](/docs/guides/remote-workspaces-faq).
+
+---
+
+## Feedback
+
+This is a new path. Tell us what broke:
+- Open an issue: https://github.com/Molecule-AI/molecule-core/issues/new?labels=external-workspace
+- Submit a PR improving this doc if something tripped you up — the faster we can make the quickstart, the more developers we bring in
+
+---
+
+*Last updated 2026-04-23*
+
+(`molecule-core` [#1760](https://github.com/Molecule-AI/molecule-core/pull/1760))
--- a/content/docs/guides/platform-instructions.md
+++ b/content/docs/guides/platform-instructions.md
@@ -0,0 +1,165 @@
+---
+title: "Platform Instructions"
+description: "Enforce system-prompt rules at the platform level — global org-wide rules and workspace-scoped rules injected at agent startup. Governance before the first turn, not after an incident."
+tags: [governance, security, platform-engineering, enterprise, system-prompt, policy]
+---
+
+# Platform Instructions
+
+Platform Instructions let workspace admins enforce behavioral rules at the system prompt level — injected before the first agent turn, not applied after an incident. Rules are stored in the platform database and resolved at workspace boot via the `GET /workspaces/:id/instructions/resolve` endpoint.
+
+> **Enterprise plans only.** Platform Instructions are available on Enterprise plans. Contact your account team to enable them.
+
+## How it works
+
+When a workspace boots (or refreshes its instructions), it calls:
+
+```
+GET /workspaces/:id/instructions/resolve
+Authorization: Bearer <workspace-token>
+```
+
+The platform returns a merged instruction string:
+
+```json
+{
+  "workspace_id": "ws_01hx3k...",
+  "instructions": "# Platform-Wide Rules\n\n## Security\n\nAlways confirm destructive operations with the user before executing...\n\n## Role-Specific Rules\n\n### Onboarding helper\n\nYou are helping new users set up their first workspace..."
+}
+```
+
+This string is prepended to the agent's system prompt as `# Platform Instructions` — the first section, before all other content. Because it goes first, it has highest precedence. Agents receive these instructions at boot and on every periodic refresh; they cannot be overridden by the agent's own prompt.
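
To make the prepend step concrete, here is a minimal sketch of how a runtime might assemble the final system prompt from the resolved string. The function name `build_system_prompt` and the behavior for an empty instruction string are assumptions; only the `# Platform Instructions` heading and the "first section" placement come from the text above.

```python
def build_system_prompt(platform_instructions: str, agent_prompt: str) -> str:
    """Place resolved Platform Instructions ahead of the agent's own prompt."""
    resolved = platform_instructions.strip()
    if not resolved:
        # Assumption: with nothing resolved, the agent prompt is used unchanged.
        return agent_prompt
    return f"# Platform Instructions\n\n{resolved}\n\n{agent_prompt}"
```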
+
+## Types of instructions
+
+| Scope | Description | Use case |
+|---|---|---|
+| `global` | Applies to every workspace in the org | Security policy, compliance rules, brand voice |
+| `workspace` | Applies to one specific workspace | Per-project rules, team-specific behavior, onboarding |
+
+Instructions are merged at resolve time: global rules are applied first, workspace rules second. Within each scope, rules are ordered by `priority` (higher first).
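
The merge order can be sketched as a single sort: scope first (global before workspace), then descending `priority` within each scope. The `merge_instructions` name and the `## title` rendering are hypothetical; the platform's actual output format may differ.

```python
SCOPE_ORDER = {"global": 0, "workspace": 1}  # global rules are applied first


def merge_instructions(rules):
    """Order rules by scope, then by priority (higher first), and join them."""
    ordered = sorted(
        rules,
        key=lambda r: (SCOPE_ORDER[r["scope"]], -r["priority"]),
    )
    # The "## title" layout below is an assumption made for this sketch.
    return "\n\n".join(f"## {r['title']}\n\n{r['content']}" for r in ordered)
```

With a workspace rule at priority 50 and global rules at priorities 10 and 100, the merged string lists the priority-100 global rule, then the priority-10 global rule, then the workspace rule.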
+
+The `team` scope is reserved in the schema but not yet implemented.
+
+## Create a global instruction
+
+### Via the platform API
+
+```bash
+curl -X POST https://your-tenant.moleculesai.app/admin/instructions \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "scope": "global",
+    "title": "Security policy",
+    "content": "Always confirm destructive operations (delete, revoke, terminate) with the user before executing. Never execute destructive commands without explicit approval.",
+    "priority": 100
+  }'
+```
+
+### Response
+
+```json
+{
+  "id": "instr_abc123",
+  "scope": "global",
+  "title": "Security policy",
+  "content": "...",
+  "priority": 100,
+  "enabled": true,
+  "created_at": "2026-04-30T12:00:00Z"
+}
+```
+
+## Create a workspace-scoped instruction
+
+```bash
+curl -X POST https://your-tenant.moleculesai.app/admin/instructions \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "scope": "workspace",
+    "scope_target": "ws_01hx3k...",
+    "title": "Onboarding helper",
+    "content": "You are helping a new user set up their first Molecule AI workspace. Keep explanations concise. Offer to walk through the Canvas UI tour after setup.",
+    "priority": 50
+  }'
+```
+
+`scope_target` accepts a workspace ID. When resolved for that workspace, the response includes both global rules and this workspace-specific rule.
+
+## List all instructions
+
+```bash
+# Global instructions only
+curl -s "https://your-tenant.moleculesai.app/admin/instructions?scope=global" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .
+
+# Instructions for a specific workspace (global + workspace)
+curl -s "https://your-tenant.moleculesai.app/admin/instructions?workspace_id=ws_01hx3k..." \
+  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .
+```
+
+## Resolve instructions for a workspace
+
+```bash
+curl -s "https://your-tenant.moleculesai.app/workspaces/ws_01hx3k.../instructions/resolve" \
+  -H "Authorization: Bearer $WORKSPACE_TOKEN" | jq .
+```
+
+The `Authorization` header here uses the **workspace's own token** (from `POST /registry/register` or `POST /workspaces/:id/tokens`). The resolve endpoint is gated by `WorkspaceAuth` — a workspace can only resolve its own instructions.
+
+## Update an instruction
+
+```bash
+curl -X PUT https://your-tenant.moleculesai.app/admin/instructions/instr_abc123 \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "content": "Updated rule — new wording",
+    "priority": 80
+  }'
+```
+
+The workspace picks up the change on its next instruction refresh (periodic, not immediate).
+
+## Delete an instruction
+
+```bash
+curl -X DELETE https://your-tenant.moleculesai.app/admin/instructions/instr_abc123 \
+  -H "Authorization: Bearer $ADMIN_TOKEN"
+```
+
+Returns `404` if the instruction does not exist.
+
+## Content limits
+
+Instruction content is capped at **8,192 characters** per rule. This prevents a single oversized rule from consuming the entire system prompt token budget. The cap is enforced at creation and update time — requests with content exceeding the limit receive a `400 Bad Request`.
+
+For longer policy documents, consider:
+- Splitting into multiple rules with lower priority
+- Linking to external policy documents in the rule content
+- Using the rule as a summary with a reference to the full policy
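
The first option, splitting a long policy into several rules, could look roughly like this. It is a sketch under stated assumptions: `split_policy` is a made-up helper, it splits on blank lines, it counts Python characters (which may not match how the platform counts), and it lowers the priority by one for each later chunk.

```python
MAX_CONTENT_CHARS = 8192  # per-rule cap stated above


def split_policy(text, title, base_priority=100, limit=MAX_CONTENT_CHARS):
    """Split a long policy on paragraph breaks into rules that fit the cap."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if len(para) > limit:
            raise ValueError("a single paragraph exceeds the per-rule limit")
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    # Later chunks get lower priority so they resolve after earlier ones.
    return [
        {
            "scope": "global",
            "title": f"{title} ({i + 1}/{len(chunks)})",
            "content": chunk,
            "priority": base_priority - i,
        }
        for i, chunk in enumerate(chunks)
    ]
```

Each returned dict has the same fields as the create-instruction request body shown earlier, so it can be posted to `/admin/instructions` one rule at a time.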
+
+## Security properties
+
+**Agents cannot override Platform Instructions** — the instruction string is prepended to the system prompt at the platform layer, before the agent runtime processes it. An agent cannot edit, delete, or suppress its own Platform Instructions.
+
+**Workspace-scoped rules are private** — a workspace can only resolve its own instructions. It cannot enumerate other workspaces' instructions, even if it knows their IDs.
+
+**Content is org-scoped** — instructions live at the org level (global scope) or workspace level. There is no cross-org visibility.
+
+## Relationship with Tool Trace
+
+Platform Instructions enforce rules **before** the agent runs. Tool Trace records what the agent **actually did**. Together they provide a complete governance loop:
+
+1. **Platform Instructions** — set expectations at startup (what the agent should and should not do)
+2. **Tool Trace** — verify compliance at runtime (what the agent actually did)
+
+If Tool Trace shows an agent calling tools that Platform Instructions explicitly prohibit, that's a compliance incident — not a configuration issue.
+
+## Related
+
+- [Tool Trace](/docs/guides/tool-trace) — Verify what agents actually did, not just what they said they would do
+- [Org-Scoped API Keys](/docs/guides/org-api-keys) — Attribute tool calls to specific org credentials for billing and audit
+- [A2A Protocol](/docs/api-protocol/a2a-protocol) — How agents communicate and how Tool Trace travels in A2A responses
--- a/content/docs/guides/tool-trace.md
+++ b/content/docs/guides/tool-trace.md
@@ -0,0 +1,164 @@
+---
+title: "Tool Trace"
+description: "See exactly what your agents did — every tool call, input, and output preview, stored in your activity logs. Tool Trace ships inside every A2A response and requires zero instrumentation."
+tags: [observability, debugging, compliance, enterprise, tool-trace]
+---
+
+# Tool Trace
+
+Tool Trace records every tool an agent calls — the tool name, input arguments, and a sanitized output preview — and stores it in your org's `activity_logs` table. It ships inside every A2A response, requires zero SDK instrumentation, and is queryable via the platform API.
+
+> **Built in, not bolted on.** Tool Trace is enabled by default on all workspaces. There is no feature flag to enable — it starts recording the moment an agent makes its first A2A call.
+
+## What Tool Trace captures
+
+Each A2A response from a workspace includes a `metadata.tool_trace` array. The platform extracts it and persists it to `activity_logs` for every logged event:
+
+```json
+{
+  "run_id": "log-abc123",
+  "activity_type": "a2a_call",
+  "workspace_id": "ws_01hx3k...",
+  "method": "message/send",
+  "created_at": "2026-04-30T12:01:00Z",
+  "tool_trace": [
+    {
+      "tool": "mcp__files__read",
+      "input": {"path": "config.yaml"},
+      "output_preview": "api_version: v2, region: us-east-1, ..."
+    },
+    {
+      "tool": "mcp__httpx__get",
+      "input": {"url": "https://api.example.com/status"},
+      "output_preview": "{\"status\": \"ok\", \"latency_ms\": 42}"
+    }
+  ],
+  "duration_ms": 1842
+}
+```
+
+### Field definitions
+
+Activity log fields (the outer object that wraps a tool trace):
+
+| Field | Description |
+|---|---|
+| `run_id` | Unique identifier for this activity log row. Links the `tool_trace` to its originating A2A run — use this to correlate traces from fan-out tasks across multiple workspace logs. |
+| `activity_type` | The type of logged event (e.g., `a2a_call`, `a2a_receive`, `task_update`) |
+| `workspace_id` | The workspace that generated this log entry |
+| `method` | The A2A method invoked (e.g., `message/send`) |
+| `created_at` | ISO 8601 timestamp when the entry was written |
+| `duration_ms` | Total elapsed time of the call in milliseconds |
+| `tool_trace` | Array of tool call objects (see below) |
+
+Tool trace object fields (each entry in the `tool_trace` array):
+
+| Field | Description |
+|---|---|
+| `tool` | The tool or function that was invoked (e.g., `mcp__files__read`, `Bash`, `commit_memory`) |
+| `input` | The arguments passed to the tool. Sensitive values (API keys, tokens, long strings) are sanitized before storage. |
+| `output_preview` | First 200 characters of the tool's output. Caps large responses to prevent `activity_logs` bloat. |
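
As a rough illustration of the `input` and `output_preview` rules in the table above, the sketch below truncates output to 200 characters and redacts arguments whose names look sensitive. Only the 200-character cap comes from this page; the key-name hints, the 500-character threshold for "long strings", and the `[REDACTED]` marker are assumptions, since the actual sanitization rules are not documented here.

```python
PREVIEW_CHARS = 200  # documented cap for output_preview
SENSITIVE_HINTS = ("key", "token", "secret", "password")  # hypothetical
MAX_INPUT_STRING = 500  # hypothetical threshold for "long strings"


def output_preview(output: str) -> str:
    """Keep only the first 200 characters of a tool's output."""
    return output[:PREVIEW_CHARS]


def sanitize_input(args: dict) -> dict:
    """Redact sensitive-looking arguments and shorten very long strings."""
    clean = {}
    for name, value in args.items():
        if any(hint in name.lower() for hint in SENSITIVE_HINTS):
            clean[name] = "[REDACTED]"
        elif isinstance(value, str) and len(value) > MAX_INPUT_STRING:
            clean[name] = value[:MAX_INPUT_STRING] + "...[truncated]"
        else:
            clean[name] = value
    return clean
```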
+
+## Querying activity logs
+
+### List recent tool traces for a workspace
+
+```bash
+curl -s "https://your-tenant.moleculesai.app/workspaces/$WS_ID/activity?limit=10" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.[] | {created_at, tool_trace}'
+```
+
+### Find all calls to a specific tool
+
+```bash
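+# Set --arg tool to the tool name you are looking for
+curl -s "https://your-tenant.moleculesai.app/workspaces/$WS_ID/activity?limit=50" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  | jq --arg tool "mcp__files__read" '.[] | select(.tool_trace != null)
+    | select(any(.tool_trace[]; .tool == $tool))
+    | {created_at, tools: [.tool_trace[].tool]}'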
+```
+
+### Trace a specific task
+
+```bash
+# List recent entries that recorded at least one tool call, with each call's tool and input
+curl -s "https://your-tenant.moleculesai.app/workspaces/$WS_ID/activity?limit=50" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  | jq '[.[] | select(.tool_trace | length > 0) | {
+    time: .created_at,
+    method: .method,
+    calls: [.tool_trace[] | {tool, input}]
+  }] | reverse | .[0:10]'
+```
+
+## How it works
+
+When a workspace sends an A2A response back to the platform, the platform's A2A proxy extracts `metadata.tool_trace` from the JSON-RPC response body:
+
+```
+Agent → [runs task, calls tools] → A2A response with metadata.tool_trace
+                                      ↓
+                          extractToolTrace() in logA2ASuccess()
+                                      ↓
+                          Persisted to activity_logs.tool_trace (JSONB column)
+                                      ↓
+                          Indexed via GIN index for fast JSONB queries
+```
+
+The `tool_trace` field in the A2A response is produced by the agent runtime — it reflects the tool calls that actually executed, not the tool calls the agent said it planned to make. This distinction matters for compliance: LLM output tells you what the agent *said* it would do; Tool Trace tells you what it *actually did*.
+
+## Use cases
+
+### Compliance and audit
+
+For regulated environments, Tool Trace provides the execution record that proves an agent operated within its authorized scope. Query `tool_trace` for any call that reached external APIs or modified system state.
+
+```bash
+# Find entries from the last 24 hours (within the 200 entries returned) that include an HTTP tool call
+curl -s ".../workspaces/$WS_ID/activity?limit=200" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  | jq '[.[] | select(.tool_trace != null)
+    | select((.created_at | fromdateiso8601) > (now - 86400))
+    | select(any(.tool_trace[]; .tool | contains("httpx")))]
+    | map({time: .created_at, calls: .tool_trace})'
+```
+
+### Debugging agent behavior
+
+When an agent produces an unexpected result, Tool Trace shows exactly which tools were called and with what inputs — faster than replaying the full conversation.
+
+```bash
+# Find entries with three or more tool calls and show how many calls each made
+curl -s ".../workspaces/$WS_ID/activity?limit=50" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  | jq '.[] | select(.tool_trace | length >= 3) | {created_at, count: (.tool_trace | length)}'
+```
+
+### Verifying tool coverage
+
+Before deploying a new agent, verify it calls the expected tools under load.
+
+```bash
+# Aggregate tool call counts for a workspace
+curl -s ".../workspaces/$WS_ID/activity?limit=100" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  | jq '[.[] | select(.tool_trace != null) | .tool_trace[].tool] |
+    group_by(.) | map({tool: .[0], count: length}) | sort_by(.count) | reverse'
+```
+
+## Security and privacy
+
+**Input sanitization** — API keys, long strings, and other sensitive values in `input` are sanitized before storage. The sanitization uses a best-effort pattern: sensitive key names (e.g., `key`, `token`, `password`, `secret`) and values longer than 200 characters are redacted.
+
+**Output previews** — Tool outputs are capped at 200 characters to prevent `activity_logs` bloat and to limit the exposure of sensitive data in stored traces.
+
+**Per-workspace isolation** — A `workspace_id` filter is required on all activity log queries. Admins cannot query other workspaces' activity logs without explicit access.
+
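The redaction and preview rules described above can be sketched in a few lines of POSIX shell. This is an illustration only, not the platform's implementation: the function names `sanitize_value` and `output_preview`, the `[REDACTED]` placeholder, and the substring match on key names are assumptions. The text above states only that sensitive key names and values longer than 200 characters are redacted, and that output previews are capped at 200 characters.

```shell
#!/bin/sh
# Hypothetical sketch of the sanitization rules. The names and the
# "[REDACTED]" placeholder are assumptions, not the platform's code.

MAX_LEN=200

# sanitize_value KEY VALUE
# Prints "[REDACTED]" when the key name looks sensitive or the value is
# longer than MAX_LEN; otherwise prints the value unchanged.
sanitize_value() {
  key_lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$key_lower" in
    *key*|*token*|*password*|*secret*)
      printf '%s\n' "[REDACTED]"
      return 0
      ;;
  esac
  if [ "${#2}" -gt "$MAX_LEN" ]; then
    printf '%s\n' "[REDACTED]"
  else
    printf '%s\n' "$2"
  fi
}

# output_preview TEXT
# Prints at most the first MAX_LEN characters of TEXT
# (some shells count bytes rather than characters).
output_preview() {
  printf "%.${MAX_LEN}s\n" "$1"
}

sanitize_value "api_key" "sk-live-123"   # prints [REDACTED]
sanitize_value "path" "config.yaml"      # prints config.yaml
```

Because the key match is a plain substring test, it will also redact harmless keys such as `monkey`, which is consistent with the best-effort framing above.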
+## Limitations
+
+- **Requires A2A** — Tool Trace is recorded for A2A calls only. Direct MCP tool calls that bypass A2A do not produce traces.
+- **Runtime-dependent** — The agent runtime must produce `metadata.tool_trace` in its A2A responses. Not all runtimes (e.g., custom external agents) include this field.
+- **No cross-workspace trace** — Each `activity_logs` row covers a single workspace. Tracing a task that fans out to multiple agents requires correlating `run_id` across multiple workspace logs.
+
+## Related
+
+- [Platform Instructions](/docs/guides/platform-instructions) — Enforce rules at the system prompt level, before the agent runs
+- [Org-Scoped API Keys](/docs/guides/org-api-keys) — Attribute every tool call to a specific org key for billing and audit
+- [A2A Protocol](/docs/api-protocol/a2a-protocol) — The message format that carries `tool_trace` inside every response
--- a/content/docs/tutorials/saas-federation.md
+++ b/content/docs/tutorials/saas-federation.md
@ -0,0 +1,249 @@
+---
+title: "SaaS Federation — Multi-Tenant Agent Platform"
+---
+# SaaS Federation — Multi-Tenant Agent Platform
+
+This tutorial walks through setting up a multi-tenant AI agent platform using Molecule AI's SaaS federation layer. You'll provision workspaces for multiple customers from a single control plane, with per-tenant database isolation, credential separation, and agent fleet visualization.
+
+**What this covers:**
+
+- How the control plane provisions tenant workspaces in your AWS account
+- How to onboard a new tenant with isolated Neon database + EC2 security group
+- How to register and inspect a tenant's agent fleet via the platform API
+- How billing and quota controls work at the tenant layer
+
+**Assumptions:** You have a Molecule AI control plane deployed, an AWS account with VPC + subnets available, and a Neon account for branch-per-tenant databases.
+
+---
+
+## What is SaaS federation?
+
+Molecule AI's SaaS federation layer sits between your control plane and the tenant workspaces your customers use.
+
+```
+You (the platform operator)
+  │
+  ├── Control Plane (api.moleculesai.app)
+  │     └─ Provisions: Neon DB branches, EC2 workspaces, security groups
+  │
+  └── Tenant: acme.moleculesai.app
+        ├── Workspace: acme-production-1 (EC2, T3)
+        ├── Workspace: acme-production-2 (EC2, T4)
+        └── Neon branch: acme_db → acme's Postgres
+```
+
+Each tenant is a separate organization in Molecule AI. The control plane holds credentials and provisions infrastructure — but each tenant's workspace data lives in their own isolated branch.
+
+---
+
+## Step 1: Onboard a new tenant
+
+Onboarding creates a new org in your platform, provisions a Neon database branch, and sets up an EC2 security group for the tenant's workspaces.
+
+### Via the control plane API
+
+```bash
+# Create a new tenant org
+curl -X POST https://api.moleculesai.app/cp/orgs \
+  -H "Authorization: Bearer $PROVISION_SHARED_SECRET" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "Acme Corp",
+    "slug": "acme",
+    "plan": "pro",
+    "vpc_id": "vpc-0a1b2c3d4e5f6g7h8",
+    "subnet_ids": ["subnet-abc123", "subnet-def456"]
+  }'
+```
+
+Response:
+
+```json
+{
+  "id": "org_7f2a9c",
+  "name": "Acme Corp",
+  "slug": "acme",
+  "plan": "pro",
+  "neon_branch_id": "br-shadowy-7f2a9c",
+  "security_group_id": "sg-0a1b2c3d",
+  "status": "provisioning"
+}
+```
+
+### What gets provisioned
+
+| Resource | How | Who manages |
+|---|---|---|
+| Neon branch `br-shadowy-7f2a9c` | Auto-created by control plane via Neon API | Tenant gets connection string |
+| EC2 security group `sg-0a1b2c3d` | Created with inbound :443 from platform only | Control plane manages rules |
+| Org record in platform DB | Created on first API call | Control plane |
+
+The provisioning step runs asynchronously — poll `/cp/orgs/:slug` until `status: active`.
+
+```bash
+# Poll until active
+until curl -s https://api.moleculesai.app/cp/orgs/acme \
+    -H "Authorization: Bearer $PROVISION_SHARED_SECRET" \
+    | jq -r '.status' | grep -qx active; do
+  echo "Still provisioning..."; sleep 10
+done
+echo "Tenant ready"
+```
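The loop above keeps polling forever if provisioning never completes. A bounded variant, sketched below under stated assumptions, gives up after a fixed number of attempts. `poll_until_active` and `fetch_org_status` are hypothetical helper names, and the attempt limit and interval are arbitrary examples rather than documented values.

```shell
#!/bin/sh
# poll_until_active MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it prints exactly "active".
# Returns 0 on success, 1 if MAX_ATTEMPTS is exhausted.
poll_until_active() {
  poll_max=$1
  poll_interval=$2
  shift 2
  poll_attempt=1
  while [ "$poll_attempt" -le "$poll_max" ]; do
    # A failing command counts as "not active yet" rather than aborting.
    poll_status=$("$@" 2>/dev/null) || poll_status=""
    if [ "$poll_status" = "active" ]; then
      return 0
    fi
    if [ "$poll_attempt" -lt "$poll_max" ]; then
      sleep "$poll_interval"
    fi
    poll_attempt=$((poll_attempt + 1))
  done
  return 1
}

# Example wiring (commented out so the sketch runs offline):
# fetch_org_status() {
#   curl -s https://api.moleculesai.app/cp/orgs/acme \
#     -H "Authorization: Bearer $PROVISION_SHARED_SECRET" | jq -r '.status'
# }
# poll_until_active 30 10 fetch_org_status && echo "Tenant ready"
```

Comparing the whole status string, instead of searching for a substring, avoids treating a value such as `inactive` as success.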
+
+---
+
+## Step 2: Provision workspaces for the tenant
+
+Once the tenant org is active, workspaces can be created via the tenant's own API — no operator involvement needed.
+
+Each workspace is provisioned as an EC2 instance in the tenant's VPC subnet, behind the tenant's security group. The security group allows inbound :443 from the platform API only.
+
+```bash
+# As the tenant (they use their own org-scoped API key)
+curl -X POST https://acme.moleculesai.app/workspaces \
+  -H "Authorization: Bearer $TENANT_ORG_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "production-agent-1",
+    "role": "Production inference worker",
+    "runtime": "hermes",
+    "tier": 3,
+    "model": "claude-sonnet-4"
+  }'
+```
+
+The control plane handles the EC2 provisioning in the background:
+
+1. Calls `aws ec2 run-instances` in the tenant's VPC subnet
+2. Waits for the instance to boot and register via A2A
+3. Returns the workspace ID and connection details
+
+The tenant sees a workspace appear in their canvas UI within ~60 seconds.
+
+---
+
+## Step 3: Inspect the tenant's agent fleet
+
+From the operator side, you can inspect any tenant's workspaces via the control plane:
+
+```bash
+# List all workspaces for a tenant
+curl https://api.moleculesai.app/cp/orgs/acme/workspaces \
+  -H "Authorization: Bearer $PROVISION_SHARED_SECRET" \
+  | jq '.'
+```
+
+Response:
+
+```json
+{
+  "org": "acme",
+  "workspaces": [
+    {
+      "id": "ws_9b3k1m",
+      "name": "production-agent-1",
+      "runtime": "hermes",
+      "tier": 3,
+      "instance_id": "i-0a1b2c3d4e5f6g7h8",
+      "status": "running",
+      "last_seen": "2026-04-22T09:30:00Z"
+    },
+    {
+      "id": "ws_2n8p4q",
+      "name": "staging-worker",
+      "runtime": "hermes",
+      "tier": 2,
+      "instance_id": "i-1a2b3c4d5e6f7g8h9",
+      "status": "stopped",
+      "last_seen": "2026-04-21T16:00:00Z"
+    }
+  ]
+}
+```
+
+### Fleet-level metrics
+
+```bash
+# Aggregate runtime stats for a tenant
+curl https://api.moleculesai.app/cp/orgs/acme/metrics \
+  -H "Authorization: Bearer $PROVISION_SHARED_SECRET" \
+  | jq '{total_workspaces, active_agents, avg_response_time_ms, total_tasks_dispatched}'
+```
+
+---
+
+## Step 4: Set quota and billing controls
+
+Quotas are enforced at the org level. Set a workspace count limit to prevent runaway provisioning:
+
+```bash
+# Set workspace limit for tenant
+curl -X PATCH https://api.moleculesai.app/cp/orgs/acme \
+  -H "Authorization: Bearer $PROVISION_SHARED_SECRET" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "max_workspaces": 10,
+    "max_tier": 3,
+    "billing_plan": "pro"
+  }'
+```
+
+When a tenant hits their workspace limit, `POST /workspaces` returns **`409 Conflict`** (not `402 Payment Required` — quota gates are resource-state conflicts, not payment failures).
+
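A client can branch on the status code to tell a quota rejection apart from other failures. The sketch below is illustrative only: `describe_create_status` is a hypothetical helper, and only the meaning of `409` comes from the text above. The handling of every other code is a generic assumption.

```shell
#!/bin/sh
# describe_create_status HTTP_CODE
# Maps the status code returned by POST /workspaces to a short message.
describe_create_status() {
  case "$1" in
    2??) echo "created" ;;
    409) echo "quota reached: raise max_workspaces or remove a workspace" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Example wiring (commented out so the sketch runs offline):
# code=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#   https://acme.moleculesai.app/workspaces \
#   -H "Authorization: Bearer $TENANT_ORG_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{"name": "production-agent-1", "runtime": "hermes", "tier": 3}')
# describe_create_status "$code"
```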
+---
+
+## Step 5: Revoke access for a tenant
+
+If a tenant stops paying or needs to be suspended:
+
+```bash
+# Suspend tenant (revokes their org API key and freezes workspace creation)
+curl -X POST https://api.moleculesai.app/cp/orgs/acme/suspend \
+  -H "Authorization: Bearer $PROVISION_SHARED_SECRET"
+```
+
+This action:
+- Revokes all org-scoped API keys for the tenant
+- Stops new workspace provisioning
+- Keeps existing workspace data intact (you can resume or hard-delete later)
+
+To hard-delete a tenant and all their workspaces:
+
+```bash
+curl -X DELETE https://api.moleculesai.app/cp/orgs/acme \
+  -H "Authorization: Bearer $PROVISION_SHARED_SECRET" \
+  -H "Content-Type: application/json" \
+  -d '{"confirm": true, "delete_workspaces": true}'
+```
+
+This terminates all EC2 instances, drops the Neon branch, and removes the org record. **This is irreversible.**
+
+---
+
+## Security model summary
+
+| Layer | Isolation mechanism | Who manages |
+|---|---|---|
+| Database | Neon branch-per-tenant | Tenant's branch, operator has no direct access |
+| Compute | EC2 in tenant's VPC | Control plane provisions instances and manages SG rules |
+| Credentials | No Fly/API tokens on tenant | All cloud credentials held by control plane |
+| API access | Org-scoped API keys | Tenant manages their own keys; operator has CP-level override |
+| Network | Security group: port 443 from platform only | Control plane manages; tenant can't modify |
+
+---
+
+## What's next
+
+- **Tenant registration UI**: Expose a signup flow so customers can self-serve (roadmap: Phase 34)
+- **Scoped roles**: Give different team members read-only vs. admin access within a tenant org (roadmap: Phase 34)
+- **Usage-based billing**: Meter workspace runtime and forward events to Stripe for custom billing tiers
+
+For runbook-level details on the provisioning flow, see the architecture docs at [`docs/architecture/saas-prod-migration-2026-04-19`](/docs/architecture/saas-prod-migration-2026-04-19).
+
+For the API reference, see [`docs/api-reference`](/docs/api-reference) — the `/cp/orgs/*` endpoints are documented there.
+
+---
+
+*SaaS federation is available for all Molecule AI platform operators. Contact the Molecule AI team to enable federation on your control plane.*
+
+(`molecule-core` [#1700](https://github.com/Molecule-AI/molecule-core/pull/1700))
--- a/content/docs/tutorials/saas-file-writes-eic.md
+++ b/content/docs/tutorials/saas-file-writes-eic.md
@ -0,0 +1,153 @@
+---
+title: "SaaS File Writes via EC2 Instance Connect"
+description: "How to use the Files API PUT endpoint to write files to SaaS (EC2-backed) workspaces via AWS EC2 Instance Connect."
+---
+
+When your workspace runs on a Molecule AI SaaS EC2 instance (not a Docker container), the Files API routes writes through **AWS EC2 Instance Connect (EIC)**, the same SSH-backed channel that powers the Terminal tab. This demo shows the end-to-end flow, the three write paths the handler chooses between, and how to verify the write succeeded.
+
+## Prerequisites
+
+- A Molecule AI SaaS workspace (EC2-backed — check `instance_id` in the workspace record)
+- `curl` + `jq`
+- AWS credentials with `ec2-instance-connect:SendSSHPublicKey` permission (usually handled by Molecule's runtime IAM role; you don't need to configure this yourself)
+
+## How the routing works
+
+`PUT /workspaces/:id/files/*path` checks `workspaces.instance_id` before choosing a write path:
+
+| `instance_id` | Write path |
+|---|---|
+| empty (self-hosted) | Docker `cp` into running container, then offline ephemeral-container fallback |
+| set (SaaS) | EIC: SSH-backed write via `aws ec2-instance-connect` |
+
+```
+Caller → PUT /workspaces/:id/files/config.yaml
+                │
+                ├─ Docker path (container running)
+                │     copyFilesToContainer()
+                │
+                ├─ Docker path (container offline)
+                │     writeViaEphemeral() → tar → docker run --rm -v …
+                │
+                └─ SaaS path (EC2 workspace)
+                      writeFileViaEIC()
+                      ├── ssh-keygen ed25519 (temp keypair)
+                      ├── aws ec2-instance-connect send-ssh-public-key (60s window)
+                      ├── aws ec2-instance-connect open-tunnel (local port → :22)
+                      └── ssh ubuntu@127.0.0.1 -p LOCAL_PORT "install -D -m 0644 /dev/stdin ABS_PATH"
+```
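The routing decision in the diagram reduces to two checks. The following is a minimal sketch, not the handler's code: `choose_write_path` is a hypothetical name, and the container state is passed in as a plain `yes`/`no` flag purely for illustration.

```shell
#!/bin/sh
# choose_write_path INSTANCE_ID CONTAINER_RUNNING
# INSTANCE_ID:       value of workspaces.instance_id ("" for self-hosted)
# CONTAINER_RUNNING: "yes" or "no" (only consulted for self-hosted)
# Prints the function name the diagram associates with the chosen path.
choose_write_path() {
  if [ -n "$1" ]; then
    echo "writeFileViaEIC"          # SaaS: EC2-backed workspace
  elif [ "$2" = "yes" ]; then
    echo "copyFilesToContainer"     # self-hosted, container running
  else
    echo "writeViaEphemeral"        # self-hosted, container offline
  fi
}

choose_write_path "i-0abc" "no"   # prints writeFileViaEIC
choose_write_path "" "yes"        # prints copyFilesToContainer
choose_write_path "" "no"         # prints writeViaEphemeral
```

The `instance_id` check comes first, so a SaaS workspace never reaches either Docker path.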
+
+## 1 — List existing files
+
+```bash
+export WS_ID=<your-workspace-id>
+export API_BASE=https://your-tenant.moleculesai.app   # or http://localhost:8080 for self-hosted
+
+# List files under /configs (the default root)
+curl -s "$API_BASE/workspaces/$WS_ID/files" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .
+```
+
+**Sample response (SaaS / EC2 workspace):**
+```json
+[
+  { "path": "config.yaml", "size": 412, "dir": false },
+  { "path": "skills",     "size":   0, "dir": true  }
+]
+```
+
+## 2 — Write a single file
+
+```bash
+curl -s -X PUT "$API_BASE/workspaces/$WS_ID/files/my-agent-prompt.md" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"content": "# Agent Prompt\n\nYou are a careful code reviewer."}' | jq .
+```
+
+**Sample response:**
+```json
+{ "status": "saved", "path": "my-agent-prompt.md" }
+```
+
+### What happened under the hood
+
+1. Handler looked up `instance_id` → not empty → routed to `writeFileViaEIC()`
+2. Ephemeral `ed25519` keypair generated in `/tmp/molecule-filewrite-*/`
+3. Public key pushed via `aws ec2-instance-connect send-ssh-public-key` (valid 60 s)
+4. TLS tunnel opened on a local free port → workspace EC2 port 22
+5. Over SSH, `install -D -m 0644 /dev/stdin /home/ubuntu/.hermes/my-agent-prompt.md` executed on the instance, with the file content supplied on stdin
+6. Keydir wiped on function return
+
+## 3 — Bulk replace (multiple files at once)
+
+```bash
+curl -s -X PUT "$API_BASE/workspaces/$WS_ID/files" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "files": {
+      "config.yaml": "name: my-agent\nversion: 1.0.0\ntier: 1\nmodel: anthropic:claude-sonnet-4-20250514\nskills: []\n",
+      "rules.md":    "# Workspace Rules\n\nNo deletions without approval.",
+      "system-prompt.md": "# System Prompt\n\nYou are a helpful coding assistant."
+    }
+  }' | jq .
+```
+
+**Sample response:**
+```json
+{
+  "status":    "replaced",
+  "workspace": "a8af9d79-...",
+  "files":     3,
+  "source":    "ec2-ssh"
+}
+```
+
+> **Bulk-write latency:** Each file opens its own EIC tunnel (~3 s/file). For 10+ files consider writing a tar archive and extracting it in a single SSH session (follow-up tracked in the source PR).
+
+## 4 — Verify the write landed on the EC2
+
+```bash
+# Read back via the Files API
+curl -s "$API_BASE/workspaces/$WS_ID/files/my-agent-prompt.md" \
+  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .
+
+# Confirm the absolute path on the EC2
+# hermes runtime  → /home/ubuntu/.hermes/<relPath>
+# langgraph       → /opt/configs/<relPath>
+# external/unknown→ /opt/configs/<relPath>
+```
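The runtime-to-path mapping in the comments above can be expressed as a small lookup. This is a sketch under assumptions: `abs_path_for` is a hypothetical helper, and it expects a relative path that has already passed the traversal checks.

```shell
#!/bin/sh
# abs_path_for RUNTIME REL_PATH
# Prints the absolute path on the instance where REL_PATH is written.
abs_path_for() {
  case "$1" in
    hermes) printf '%s\n' "/home/ubuntu/.hermes/$2" ;;
    *)      printf '%s\n' "/opt/configs/$2" ;;   # langgraph, external, unknown
  esac
}

abs_path_for hermes my-agent-prompt.md   # prints /home/ubuntu/.hermes/my-agent-prompt.md
abs_path_for langgraph config.yaml       # prints /opt/configs/config.yaml
```

Routing every unrecognized runtime to `/opt/configs` mirrors the `external/unknown` line in the comments.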
+
+## Key security properties
+
+| Property | How it's enforced |
+|---|---|
+| Path traversal blocked | `filepath.Clean(relPath)` + `..` prefix check; absolute paths rejected before any handler call |
+| No shell injection | Remote command uses `install` (not `sh -c`); `absPath` built from a closed map + `Clean()` only |
+| Ephemeral credentials | ed25519 keypair lives in `tmpdir` ≤ 30 s, wiped by `defer RemoveAll` |
+| EIC 60 s key window | AWS drops the temporary authorized key after 60 s regardless |
+| OS user locked | Read from the `WORKSPACE_EC2_OS_USER` env var (default `ubuntu`); not configurable per request |
+
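The path check in the first row of the table can be approximated in shell. The sketch below is stricter than the `filepath.Clean` plus `..` prefix check the table describes, because it rejects any `..` component instead of normalizing the path first. `is_safe_rel_path` is a hypothetical name, and this is not the handler's code.

```shell
#!/bin/sh
# is_safe_rel_path REL_PATH
# Returns 0 when REL_PATH is non-empty, relative, and has no ".." component.
is_safe_rel_path() {
  case "$1" in
    "")                   return 1 ;;  # empty path
    /*)                   return 1 ;;  # absolute path
    ..|../*|*/..|*/../*)  return 1 ;;  # ".." as a whole path component
  esac
  return 0
}

is_safe_rel_path "skills/review.md" && echo "ok"         # prints ok
is_safe_rel_path "../etc/passwd"    || echo "rejected"   # prints rejected
```

Names that merely contain two dots, such as `notes..md`, still pass, since only a whole `..` component is treated as traversal.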
+## Error cases
+
+### 500 `failed to write file: workspace has no instance_id`
+
+The workspace record has no `instance_id`. That normally marks a self-hosted Docker workspace, which is written through the Docker path instead (ensure the container is running, or rely on the ephemeral-container fallback). This error therefore appears only when a SaaS workspace's `instance_id` is unexpectedly null.
+
+### 500 `path traversal blocked`
+
+`relPath` contained `..` or was an absolute path. The request is rejected at the API boundary before any file operation runs.
+
+### Timeout (30 s)
+
+Key push + tunnel + write took longer than 30 s. Common causes: slow AWS EIC in the region, high SSH load on the target instance. Retry the request.
+
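Retrying can be automated with a small backoff wrapper. This is a hedged sketch: `retry_with_backoff` and `put_prompt` are hypothetical names, and the attempt count, delays, and per-request time limit are arbitrary examples, not documented recommendations.

```shell
#!/bin/sh
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0, doubling the delay after each failure.
# Returns 0 on success, 1 if every attempt failed.
retry_with_backoff() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_attempt=1
  while [ "$retry_attempt" -le "$retry_max" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$retry_attempt" -lt "$retry_max" ]; then
      sleep "$retry_delay"
      retry_delay=$((retry_delay * 2))
    fi
    retry_attempt=$((retry_attempt + 1))
  done
  return 1
}

# Example wiring (commented out so the sketch runs offline).
# --fail makes curl exit non-zero on HTTP errors; --max-time bounds each try.
# put_prompt() {
#   curl -s --fail --max-time 35 -X PUT \
#     "$API_BASE/workspaces/$WS_ID/files/my-agent-prompt.md" \
#     -H "Authorization: Bearer $ADMIN_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d '{"content": "# Agent Prompt"}'
# }
# retry_with_backoff 3 2 put_prompt
```

Since the write replaces the file content at a fixed path, repeating the same `PUT` should leave the same end state, which is what makes a blind retry reasonable here.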
+## Source PR
+
+PR [#1702](https://github.com/Molecule-AI/molecule-core/pull/1702) — `feat(files-api): SSH-backed write for SaaS workspaces (fixes 500 docker not available)`
+
+Key files in `molecule-core`:
+- `workspace-server/internal/handlers/template_files_eic.go` — EIC write logic
+- `workspace-server/internal/handlers/template_import.go` — `ReplaceFiles` SaaS routing
+- `workspace-server/internal/handlers/templates.go` — `WriteFile` Docker → EIC routing