# Workspace Backend Parity Matrix
**Status:** living document — update when you ship a feature that touches one backend. **Owner:** workspace-server + controlplane teams. **Last audit:** 2026-05-02 (Claude agent, PR #TBD).
## Why this exists
Molecule AI ships workspaces on two backends:
- **Docker** — the self-hosted / local-dev path. `provisioner.Docker` in `workspace-server/internal/provisioner/`. Each workspace is a container on the same daemon as the platform.
- **EC2 (SaaS)** — the control-plane path. `provisioner.CPProvisioner` in the same directory, which calls the control plane at `POST /cp/workspaces/provision`. Each workspace is its own EC2 instance.
Every user-visible workspace feature should work on both backends unless it is fundamentally tied to one substrate (e.g. the `docker logs` command, the AWS serial console). When the two diverge silently — a handler works on Docker but quietly 500s on EC2, or vice versa — users hit dead ends that look like bugs but are actually architectural gaps.
This document is the canonical matrix. If you are landing a workspace-facing feature, update the row before you merge.
## The matrix
| Feature | File(s) | Docker | EC2 | Verdict |
|---|---|---|---|---|
| **Lifecycle** | | | | |
| Create | `workspace_provision.go:19-214` | `provisionWorkspace()` → `provisioner.Start()` | `provisionWorkspaceCP()` → `cpProv.Start()` | ✅ parity |
| Start | `provisioner.go:140-325` | container create + image pull | EC2 RunInstance via CP | ✅ parity |
| Stop | `provisioner.go:772-785` | `ContainerRemove(force=true)` + optional volume rm | `DELETE /cp/workspaces/:id` | ✅ parity |
| Restart | `workspace_restart.go:45-210` | reads runtime from live container before stop | reads runtime from DB only | ⚠️ divergent — config-change + crash window can boot old runtime on EC2 |
| Delete | `workspace_crud.go` | stop + volume rm | stop only (stateless) | ✅ parity (expected divergence on volume cleanup) |
| **Secrets** | | | | |
| Create / update | `secrets.go` | DB insert, injected at container start | DB insert, injected via user-data at boot | ✅ parity |
| Redaction | `workspace_provision.go:251` | applied at memory-seed time | applied at agent runtime | ⚠️ divergent — timing differs |
| **Files API** | | | | |
| List / Read / Write / Replace / Delete | `container_files.go`, `template_import.go` | docker exec + tar `CopyToContainer` | SSH via EIC tunnel (PR #1702) | ✅ parity as of 2026-04-22 (previously docker-only) |
| **Plugins** | | | | |
| Install / uninstall / list | `plugins_install.go` | `deliverToContainer()` + volume rm | gap — no live plugin delivery | 🔴 docker-only |
| **Terminal (WebSocket)** | | | | |
| Dispatch | `terminal.go:90-105` | `instance_id=""` → `handleLocalConnect` → docker attach | `instance_id` set → `handleRemoteConnect` → EIC SSH + docker exec | ✅ parity (different implementations, same UX) |
| **A2A proxy** | | | | |
| Forward | `a2a_proxy.go` | `127.0.0.1:<port>` | EC2 private IP inside tenant VPC | ✅ parity |
| Liveness | `a2a_proxy_helpers.go` | `provisioner.IsRunning()` | `cpProv.IsRunning()` (DB-backed) | ✅ parity |
| Channel envelope enrichment (`peer_name` / `peer_role` / `agent_card_url`) | `a2a_proxy.go` + workspace-runtime channel emitter (PR #2471) | inbox row carries enriched fields | inbox row carries enriched fields | ✅ parity as of 2026-05-02 |
| **MCP tools (a2a)** | | | | |
| `chat_history` — fetch prior turns with a peer | `mcp_server.go` + workspace-runtime `a2a_mcp` (PR #2474) | runtime-served, backend-agnostic | runtime-served, backend-agnostic | ✅ parity as of 2026-05-02 |
| **Activity API** | | | | |
| `before_ts` paging on `/workspaces/:id/activity` | `activity.go` (PR #2476) | DB-driven | DB-driven | ✅ parity as of 2026-05-02 |
| `peer_id` filter on `/workspaces/:id/activity` | `activity.go` (PR #2472) | DB-driven | DB-driven | ✅ parity as of 2026-05-02 |
| **Config / template injection** | | | | |
| Template copy at provision | `provisioner.go:553-648` | host walk → tar → `CopyToContainer(/configs)` | CP user-data bakes template into bootstrap script | ⚠️ divergent — sync (docker) vs async (EC2) |
| Runtime config hot-reload | `templates.go` + handlers | no hot-reload — restart required | no hot-reload — restart required | ✅ parity (both require restart; acceptable) |
| **Memory (HMA)** | | | | |
| Seed initial memories | `workspace_provision.go:226-260` | DB insert at provision time | DB insert at provision time | ✅ parity |
| **Bootstrap signals** | | | | |
| Ready detection | registry `/registry/register` | container heartbeat | tenant heartbeat + boot-event phone-home (CP `bootevents` table + `wait_platform_health=ok`) | ✅ parity as of molecule-controlplane#235 |
| Console / log output | `workspace_bootstrap.go` | `docker logs` | `ec2:GetConsoleOutput` via CP proxy | 🟡 ec2-only (docker has `docker logs` directly; no unified API) |
| `runtime_wedge` post-`execute()` smoke gate | workspace-runtime `smoke_mode.py` (PRs #2473 + #2475) | runtime-served, surfaces SDK-init wedges to wheel-smoke + container start | runtime-served, surfaces SDK-init wedges to wheel-smoke + container start | ✅ parity as of 2026-05-02 |
| **Test infrastructure** | | | | |
| Canvas-E2E `.playwright-staging-state.json` written before any CP call | `tools/e2e-staging-setup` (PR #2327, 2026-04-30) | n/a — staging-only safety net | required so the workflow safety-net can find the slug; pattern-sweeping by date prefix poisons concurrent runs | ✅ enforced (staging E2E) |
| **Orphan cleanup** | | | | |
| Detect + terminate stale | `healthsweep.go` + CP `DeprovisionInstance` | Docker daemon scan | CP OrgID-tag cascade (molecule-controlplane#234) | ✅ parity as of 2026-04-23 |
| **Health / budget / schedules** | | | | |
| Budget enforcement | `budget.go` | DB-driven | DB-driven | ✅ parity |
| Schedule execution | `workspace_restart.go:235-280` | `provisioner.Stop()` + re-provision | `cpProv.Stop()` + CP auto-restart | ✅ parity |
| Liveness probe | `healthsweep.go` | `provisioner.IsRunning()` | `cpProv.IsRunning()` | ✅ parity |
| **Template recipes (per-template user-data)** | | | | |
| Hermes `install.sh` (bare-host) / `start.sh` (Docker) | `molecule-ai-workspace-template-hermes/` | `start.sh` entrypoint | `install.sh` called by CP user-data hook | ⚠️ structurally divergent — two scripts maintained separately; parity enforced by CI lint, see `tools/check-template-parity.sh` |
## Top drift risks (ordered by production impact)
1. **Plugin install is docker-only.** The hot-install UX (`POST /plugins`) calls `deliverToContainer()`, which requires a live Docker daemon. On EC2 there is no equivalent — plugins must be baked into user-data at boot, so SaaS users who want to iterate on plugins without restarting today cannot. Fix path: add a CP-side plugin-manager endpoint that the tenant workspace-server proxies to, or document "restart required" on SaaS.
2. **Template config injection is sync on Docker and async on EC2.** Docker writes config files right before `ContainerStart`; EC2 embeds them in user-data, and they materialize whenever cloud-init runs. A workspace that starts serving before cloud-init completes can see stale config. Fix path: make the canvas wait for the `wait_platform_health=ok` boot-event before flipping to `online`, the same mechanism the provisioning path uses.
3. **Restart divergence on runtime changes.** Docker re-reads `/configs/config.yaml` from the container before stop, so a changed `runtime:` survives a restart even if the DB isn't synced. EC2 trusts the DB only. If you change the runtime via the Config tab and the handler races the restart, Docker will land on the new runtime and EC2 on the old one. Fix path: make the Config-tab save explicitly flush to the DB before kicking off a restart, not deferred.
4. **Console-output asymmetry.** Users debugging a stuck workspace on Docker see `docker logs`; on EC2 they see `GetConsoleOutput`. The two outputs look nothing alike. Fix path: expose a unified `GET /workspaces/:id/boot-log` that proxies to whichever backend serves the data. Already partly there via `cp_provisioner.Console`.
5. **Template script drift.** `install.sh` and `start.sh` in each template repo do the same high-level work (install hermes-agent, write `.env`, write `config.yaml`, start the gateway) but must be kept byte-level consistent in the provider-key forwarding block. Easy to forget. Enforced now by `tools/check-template-parity.sh` (see below) — run it in each template repo's CI.
6. **Both backends panic when the underlying client is nil.** Discovered by the contract-test scaffold landing in this PR: `Provisioner.{Stop,IsRunning}` nil-dereferences the Docker client, and `CPProvisioner.{Stop,IsRunning}` nil-dereferences `httpClient`. The real code always sets these, so this is theoretical in prod — but it means the contract runner can't execute scenarios against zero-value backends. Fix path: guard each method with `if p.docker == nil { return false, errNoBackend }` (and the equivalent for CP), then flip the `t.Skip` in the contract tests to `t.Run`.
## Enforcement
- `tools/check-template-parity.sh` (this repo) — ensures `install.sh` and `start.sh` in a template repo forward identical sets of provider keys. Wire it into each template repo's CI as `bash $MONOREPO/tools/check-template-parity.sh install.sh start.sh`.
- Contract tests (stub) — `workspace-server/internal/provisioner/backend_contract_test.go` defines the behaviors every `provisioner.Provisioner` implementation must satisfy. It fails to compile when a method drifts between `Docker` and `CPProvisioner`. Scenario-level runs are `t.Skip`'d today pending drift risk #6 above — compile-time assertions still catch method drift.
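The "fails to compile when a method drifts" property is presumably the standard Go compile-time interface assertion. A self-contained sketch with stub method sets — the interface and signatures below are illustrative, not the real contract:

```go
package main

// Provisioner pins the shared contract; this method set is illustrative.
type Provisioner interface {
	Stop(workspaceID string) error
	IsRunning(workspaceID string) (bool, error)
}

// Docker and CPProvisioner are stubs for the two real backends.
type Docker struct{}

func (*Docker) Stop(id string) error              { return nil }
func (*Docker) IsRunning(id string) (bool, error) { return false, nil }

type CPProvisioner struct{}

func (*CPProvisioner) Stop(id string) error              { return nil }
func (*CPProvisioner) IsRunning(id string) (bool, error) { return false, nil }

// The drift check: if either backend's method set stops satisfying the
// interface, these declarations fail to compile — no test run needed.
var (
	_ Provisioner = (*Docker)(nil)
	_ Provisioner = (*CPProvisioner)(nil)
)

func main() {}
```

The `var _ Provisioner = (*T)(nil)` idiom costs nothing at runtime, which is why it keeps working even while the scenario-level tests are skipped.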
## How to update this doc
When you land a feature that touches a handler dispatching on `h.cpProv != nil`, add or update the matching row. If you can't implement both backends in the same PR, mark the row 🔴 docker-only or 🟡 ec2-only and file an issue tracking the gap.
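As a sketch of that dispatch — every name below other than the `cpProv` field is hypothetical:

```go
package main

import "fmt"

// backend abstracts the two provisioning paths for this sketch.
type backend interface{ Name() string }

type dockerBackend struct{}

func (dockerBackend) Name() string { return "docker" }

type cpBackend struct{}

func (cpBackend) Name() string { return "ec2-via-cp" }

// handler mirrors the dispatch shape: cpProv is non-nil only on SaaS.
type handler struct {
	prov   backend // Docker path, always wired
	cpProv backend // control-plane path, nil when self-hosted
}

// pick is the h.cpProv != nil branch every workspace-facing handler
// goes through; a feature exercising only one branch needs a matrix row.
func (h *handler) pick() backend {
	if h.cpProv != nil {
		return h.cpProv
	}
	return h.prov
}

func main() {
	selfHosted := &handler{prov: dockerBackend{}}
	saas := &handler{prov: dockerBackend{}, cpProv: cpBackend{}}
	fmt.Println(selfHosted.pick().Name(), saas.pick().Name()) // docker ec2-via-cp
}
```

Grepping handlers for that `cpProv != nil` branch is a quick way to audit whether a new feature touched both paths.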