PLAN.md — Molecule AI Build Plan
Completed phases (1–11, 13–14) are documented in `/docs` and removed from here. This file tracks only in-progress and upcoming work.
Completed Phases (see /docs for details)
| Phase | Name | Docs |
|---|---|---|
| 1 | Core Loop | docs/architecture/architecture.md, CLAUDE.md |
| 2 | E2E Validation | CLAUDE.md (build/test commands) |
| 3 | Hierarchy & Communication | docs/api-protocol/communication-rules.md |
| 4 | Provisioner | docs/architecture/provisioner.md |
| 5 | Agent Management | CLAUDE.md (API routes) |
| 6 | Bundle Export/Import | docs/agent-runtime/bundle-system.md |
| 7 | Team Expansion | docs/agent-runtime/team-expansion.md |
| 8 | Human-in-the-Loop Approvals | docs/agent-runtime/system-prompt-structure.md |
| 9 | Hierarchical Memory | docs/architecture/memory.md |
| 10 | Observability (Langfuse) | docs/development/observability.md |
| 11 | Canvas Polish & UX | docs/frontend/canvas.md |
| 13 | Runtime Enhancements | docs/agent-runtime/workspace-runtime.md |
| 14 | Production Hardening | docs/architecture/provisioner.md, CLAUDE.md |
| 15 | Per-Workspace Dir | PR #38 — workspace_dir per workspace |
| 16 | Plugin System | PR #39 — per-workspace plugins with registry |
| 17 | Agent GitHub Access | PR #40 — git/gh in images, GITHUB_TOKEN env |
| 18 | File Browser Lazy Loading | PR #37 — depth=1, path traversal protection |
| 19 | MCP Full Coverage | PR #40 — 52→54 tools (plugins, global secrets, pause/resume, org, delegation) |
| 20 | Canvas UX Sprint | PRs #4, #21, #39 — Settings Panel, Onboarding, Plugins UI, Pause/Resume |
| 21 | Claude Agent SDK Migration | PR #48 — ClaudeSDKExecutor replaces CLI subprocess |
| 22 | Cron Scheduling | PR #49 — recurring tasks via cron expressions, Canvas Schedule tab |
| 23 | Code Quality & Multi-Provider | PR #50 — model fallback, DeepAgents full SDK, 7 LLM providers, 100% test coverage |
| 24 | Async Delegation | PR #41 — non-blocking delegation with status polling, check_delegation_status tool |
| 25 | Social Channels | PR #54 — adapter-based Telegram integration, Canvas Channels tab, 7 MCP tools, hot reload, multi-chat IDs, auto-detect, /start auto-reply, full Telegram Bot API audit fixes |
| 26 | Auth Env Vars | PR #55 — required_env config replaces .auth-token files, env-var only path; reno-stars 15-agent org template |
| 27 | Channel Polish & Org Auto-link | PR #56 — poller lifetime fix (bgCtx), Restart Pending button (only when needed), org template channels: field auto-links Telegram on import |
Phase 12: Code Sandbox — DONE
Three-backend sandbox for the `run_code` tool, selectable per-workspace via `SANDBOX_BACKEND` env (set from `config.yaml → sandbox.backend`).
- `run_code` tool: `workspace-template/builtin_tools/sandbox.py`
- `subprocess` backend (default): asyncio subprocess with hard timeout
- `docker` backend: throwaway container with resource limits (MVP)
- `e2b` backend (cloud): E2B microVMs via `e2b-code-interpreter`, reads `E2B_API_KEY`
- Sandbox config: `SandboxConfig` dataclass in `workspace-template/config.py`
Firecracker-as-a-backend is intentionally skipped: each tenant platform now
runs on a Fly Machine (which IS a Firecracker microVM — see Phase 32
Phase B), so the entire workspace process is already Firecracker-isolated
from other tenants. Running Firecracker inside Firecracker would double-
nest for no additional security. For stronger per-call isolation within
one tenant, use the e2b backend.
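A minimal sketch of how a `subprocess`-style backend with a hard timeout can work, assuming the snippet is Python and is run with the current interpreter. The function name `run_code_subprocess` and the result shape are hypothetical and are not taken from `sandbox.py`.

```python
import asyncio
import sys


async def run_code_subprocess(code: str, timeout_s: float = 5.0) -> dict:
    """Run a Python snippet in a child process and kill it after timeout_s."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()        # hard stop: no graceful shutdown window
        await proc.wait()  # reap the child so it does not linger as a zombie
        return {"ok": False, "timed_out": True, "stdout": "", "stderr": ""}
    return {
        "ok": proc.returncode == 0,
        "timed_out": False,
        "stdout": stdout.decode(),
        "stderr": stderr.decode(),
    }
```

A call such as `asyncio.run(run_code_subprocess("print(1 + 1)"))` returns the captured output. A subprocess alone gives no filesystem or network isolation, which is the gap the `docker` and `e2b` backends are meant to cover.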
Phase 20: Canvas UX Sprint — MOSTLY COMPLETE
UX specs created by UIUX Designer agent. See `docs/ux-specs/` for full specs.
20.1 Settings Panel (Global Secrets UI) — DONE
Spec: docs/ux-specs/ux-spec-settings-panel.md
- Gear icon in canvas top bar (Cmd+, shortcut)
- Slide-over drawer (480px, right-anchored)
- Service groups (GitHub, Anthropic, OpenRouter, Custom)
- CRUD: add, view (masked), edit, delete secrets
- Empty state with guided setup
- Unsaved changes guard on close
20.2 Onboarding / Deploy Interception — DONE
Spec: docs/ux-specs/ux-spec-onboarding-interception.md
- Pre-deploy secret check — detect missing API keys per runtime
- Missing Keys Modal — inline form, only asks for what's needed
- Provisioning timeout → named error state with recovery actions
- No dead ends — every error has a fix action
20.3 Canvas UI Improvements — PARTIAL
Spec: docs/ux-specs/ux-spec-canvas-improvements.md
- Plugins install/uninstall in Skills tab (PR #39)
- Pause/resume from context menu
- Org template import from canvas (PR: `OrgTemplatesSection` in TemplatePalette)
- Workspace search (Cmd+K)
- Batch operations
Phase 30: SaaS — Remote Workspaces & Cross-Network Federation — IN PROGRESS
Goal: let a Python agent running on a laptop in another city boot, register, authenticate, accept A2A from its parent PM on the platform, and appear on the canvas as a first-class workspace.
Why now: the self-hostable single-box model has landed; the next meaningful expansion is letting orgs span machines and networks. This is the step that turns Molecule AI from "Docker-compose on one box" into a multi-tenant SaaS-shaped product.
Design thesis: ride the existing runtime='external' escape hatch.
Every Docker-touching handler already short-circuits when a workspace
is external. We don't need a parallel subsystem — we need to close
four small gaps and add per-workspace auth. See
docs/remote-workspaces-readiness.md
for the full code audit.
Shipping order (eight bounded steps, ~2 weeks to GA)
- 30.1 Workspace auth tokens: foundation; prevents spoofing. New `workspace_auth_tokens` table; `POST /registry/register` issues a token; middleware validates `Authorization: Bearer <token>` on `/registry/heartbeat`, `/registry/update-card`. Lazy bootstrap so in-flight workspaces upgrade gracefully. Transparent to local containers: provisioner carries the token through the existing env-var pattern. No feature flag.
- 30.2 Secrets pull endpoint: `GET /workspaces/:id/secrets/values` returns decrypted secrets JSON, gated by the 30.1 token. Local agents can use it too (removes env-at-create coupling for rotating secrets).
- 30.3 Plugin tarball download: `GET /plugins/:name/download` returns a tarball; agent unpacks locally. Replaces Docker-exec plugin install for remote agents. Behind `REMOTE_PLUGIN_DOWNLOAD_ENABLED`.
- 30.4 Workspace state polling: `GET /workspaces/:id/state` returns `{status, paused, deleted_at, pending_events[]}` as a drop-in for the WebSocket feed remote agents can't reach. Behind `REMOTE_STATE_POLLING_ENABLED`.
- 30.5 A2A proxy token validation: the proxy enforces the caller's auth token on `POST /workspaces/:id/a2a`. Mutual auth between agents.
- 30.6 Direct sibling discovery + URL caching: agents call `GET /registry/{parent_id}/peers` once, cache sibling URLs, call them directly for A2A. Resilient to brief platform outages.
- 30.7 Poll-liveness for external runtime: `LivenessChecker` interface in `registry/`; `PollLiveness` marks offline if no heartbeat in 90s. Docker checker becomes one implementation, poll-liveness another. Health sweep routes by runtime. Behind `REMOTE_LIVENESS_POLLING_ENABLED`.
- 30.8 Remote-agent SDK + docs: `sdk/python/molecule_agent/` thin client: register → pull secrets → run A2A loop → poll state → heartbeat. Working `sdk/python/examples/remote-agent/` a new user can run on a laptop. Remove the three feature flags. Remote workspaces become GA.
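The poll-liveness idea in step 30.7 can be sketched as follows. This is an illustrative Python model, not the platform's Go code: the class shapes, the injectable `clock`, and the `checker_for` routing helper are assumptions. Only the 90-second window and the "route by runtime" rule come from the plan.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


class LivenessChecker(Protocol):
    def is_alive(self, workspace_id: str) -> bool: ...


@dataclass
class PollLiveness:
    """Treats a workspace as offline once its last heartbeat is older than the window."""
    window_s: float = 90.0
    clock: Callable[[], float] = time.monotonic
    _last_seen: Dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, workspace_id: str) -> None:
        self._last_seen[workspace_id] = self.clock()

    def is_alive(self, workspace_id: str) -> bool:
        seen = self._last_seen.get(workspace_id)
        return seen is not None and (self.clock() - seen) <= self.window_s


def checker_for(runtime: str, docker: LivenessChecker, poll: LivenessChecker) -> LivenessChecker:
    """Health-sweep routing: external workspaces use polling, everything else uses Docker."""
    return poll if runtime == "external" else docker
```

Injecting the clock keeps the 90-second rule testable without sleeping.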
Out of scope for Phase 30
- Mutual TLS / platform-identity verification from the agent side. Agent trusts any platform URL in its env. Defer until real multi-tenant deployment forces the question.
- Agent-to-agent mesh across NATs. Direct sibling calls only work when siblings are reachable from each other. Behind-NAT ↔ behind-NAT needs a relay — defer to Phase 31.
- Platform-managed persistent state for remote agents. Remote agents own their filesystem; platform never mounts.
Success criteria
- `sdk/python/examples/remote-agent/` boots on a laptop disconnected from the platform's LAN, registers, receives a task from parent PM via A2A, returns a result, appears on the canvas.
- `tests/e2e/test_federation.sh` spawns a second platform instance + remote agent pointing at the first; both platforms see the agent as a workspace in the right state.
- Spoofing test: attempt to impersonate a workspace with a guessed ID but no token → 401.
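The spoofing criterion can be illustrated with a small token check. This is a sketch under assumptions: the in-memory dict stands in for the `workspace_auth_tokens` table, storing only a SHA-256 digest is one reasonable design rather than a confirmed detail, and `issue_token` / `authorize` are hypothetical names.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the workspace_auth_tokens table:
# workspace_id -> SHA-256 hex digest of the issued token.
_token_hashes: Dict[str, str] = {}


def issue_token(workspace_id: str) -> str:
    """Mint a random token at registration time and keep only its digest."""
    token = secrets.token_urlsafe(32)
    _token_hashes[workspace_id] = hashlib.sha256(token.encode()).hexdigest()
    return token


def authorize(workspace_id: str, authorization: Optional[str]) -> Tuple[int, str]:
    """Return (status, reason) for an Authorization header value."""
    if not authorization or not authorization.startswith("Bearer "):
        return 401, "missing bearer token"
    presented = hashlib.sha256(authorization[len("Bearer "):].encode()).hexdigest()
    expected = _token_hashes.get(workspace_id, "")
    # compare_digest avoids leaking how many leading characters matched
    if not expected or not hmac.compare_digest(presented, expected):
        return 401, "invalid token"
    return 200, "ok"
```

A caller that knows a workspace ID but presents no token, or a token issued to another workspace, gets 401. The lazy-bootstrap path for workspaces that predate tokens is not modelled here.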
Phase 31 — Quality + Infra Pass (Q2 2026) — SHIPPED 2026-04-13
Completed in PRs #1–#8 and documented in docs/edit-history/2026-04-13.md:
- Brand migration cleanup: LICENSE "Agent Molecule" → "Molecule AI"; new icon assets (PR #1).
- Repo structural cleanup: moved `examples/remote-agent/` → `sdk/python/examples/`, `docs/superpowers/plans/` → `plugins/superpowers/plans/`; deleted empty `platform/plugins/`; gitignored `.agents/`, `platform/workspace-configs-templates/`, `backups/`, `logs/`, `test-results/`; added READMEs under `tests/` and `docs/` (PR #3).
- MCP per-domain split: `mcp-server/src/index.ts` 1697 → 89 lines; 12 per-domain modules in `src/tools/`; shared `src/api.ts`; startup log now reports 87 tools (PRs #2, #4, #7).
- Canvas dialog unification: native `confirm()`/`alert()` replaced with `ConfirmDialog` in 7 sites; new `singleButton` prop + 5 tests (vitest 352 → 357).
- Platform handler decomposition: 4 oversize functions (`proxyA2ARequest`, `Delegate`, `Discover`, `SessionSearch`) split into testable helpers; +47 Go tests; `handlers` coverage 56.1% → 57.6%.
- Env-var documentation: `.env.example` gained 11 previously-undocumented vars; all 21 distinct `os.Getenv`/`envx.*` keys now documented.
- E2E hardening + CI: Phase 30.1 bearer auth + Phase 30.6 `X-Workspace-ID` requirements baked into `test_api.sh` (62/62) and `test_comprehensive_e2e.sh` (67/67); shared `_lib.sh` + `_extract_token.py`; new CI jobs `e2e-api` and `shellcheck`; `setup-go` gains module cache (PRs #5, #7, #8).
PR Workflow Rules
All PRs must follow this checklist:
- Branch: Never push to main. Always create a feature/fix branch.
- Code Review: Run `/code-review` skill and fix all issues before requesting merge.
- Tests: All existing tests must pass. New features require new tests.
- Documentation: Run `/update-docs` skill. Every PR must update:
  - `docs/edit-history/` session log
  - Relevant docs in `docs/` (API, architecture, frontend, etc.)
  - `CLAUDE.md` if routes, env vars, or commands changed
  - `PLAN.md` if the work completes a phase or adds new items
- E2E Test: Rebuild, restart service, and manually verify before reporting done.
- QA Review: QA Engineer reviews for edge cases, plan compliance, and documentation completeness before CEO merge approval.
- CEO Approval: Only the CEO approves merges. Never merge without explicit approval.
Ecosystem Awareness
Adjacent projects worth tracking (Holaboss, Hermes, gstack, …) are catalogued
in docs/ecosystem-watch.md. Skim quarterly,
add entries liberally, and when one of those projects ships something we
should react to, file a "Signals to react to" line in that doc and create a
Backlog entry below pointing at it. Agents doing research or strategy work
should read docs/ecosystem-watch.md first — it's the canonical starting
point for "what else is out there."
Backlog (prioritized)
1. Canvas: Org template import, Phase 20.3 (deploy org from canvas UI)
2. Canvas: Workspace search (Cmd+K), Phase 20.3 (quick find)
3. Canvas: Batch operations, Phase 20.3 (multi-select delete/restart)
4. Sandbox: Firecracker/E2B backends, Phase 12 (production isolation)
5. NemoClaw adapter: stub exists at `adapters/nemoclaw/`, no implementation yet
6. Remote plugin registry: install plugins from npm/git (currently local only)
7. Agent git worktrees: per-agent branches without full clone
8. SDK follow-ups: live tool-call visibility, cost telemetry, cancel UX, governance hooks
9. Real webhook mode for channels: Phase 27 candidate. Currently polling-only; webhook needs:
   - `mode: "webhook"|"polling"` config field
   - `PUBLIC_URL` env var
   - Platform calls `setWebhook` on channel create (with random `webhook_secret`), `deleteWebhook` on delete
   - Canvas toggle to enable webhook mode (only when PUBLIC_URL is set)
   - Polling works fine for ≤hundreds of bots; webhook needed at thousands+ scale or for serverless
10. More channel adapters: Slack (OAuth + Events API), Discord (Bot + Gateway), WhatsApp (Cloud API)
11. Delegations list endpoint mismatch: `GET /workspaces/:id/delegations` returns `[]` while the agent's internal `check_delegation_status` shows active/completed delegations. One source of truth.
12. YAML-configurable per-agent repo access: new `workspace_access: none|read_only|read_write` field in `org.yaml` + `:ro` bind-mount for research agents; eliminates the "PM couriers documents to reports" workaround.
13. SDK executor swallows subprocess stderr: `workspace-template/claude_sdk_executor.py` surfaces only "Command failed with exit code 1 / Check stderr output for details" when the `claude` CLI crashes, making every failure opaque. Capture stderr, log at ERROR, include first ~1 KB in the A2A error response. High priority: blocked real debugging during PLAN.md coordination on 2026-04-12.
14. Agent MCP client defaults to `localhost:8080`: inside a workspace container, `localhost` is the container itself, not the platform, so `mcp__molecule__*` tools fail with "platform unreachable." Inject `MOLECULE_URL=${PLATFORM_URL}` into every container at provision time and change the MCP client default to `http://host.docker.internal:8080`. High priority: blocks agents from calling platform tools (e.g. PM couldn't restart its own reports).

Note: items 11–14 previously carried sequential refs `#64`–`#67`. Those refs were placeholder enumeration, not GitHub issues. They now collide with actual merged PRs and issues with different scopes, so the refs were removed in 2026-04-14 tick-5. If/when these items get prioritized, file real GitHub issues for them.

15. Workspace `restart_prompt`: user-defined restart context (#19 Layer 2). GitHub issue #66 (new 2026-04-14 tick-4 follow-up to PR #65 which shipped Layer 1). Let `config.yaml` / `org.yaml` declare a user-authored `restart_prompt` that is delivered alongside the platform-generated restart-context system message, e.g. "re-read your CLAUDE.md, re-hydrate TODOs from memory, resume the active delegation." Layer 1 (platform state snapshot) already ships; Layer 2 adds the user-defined side.
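The fix proposed in the "SDK executor swallows subprocess stderr" backlog item (capture stderr, log at ERROR, return the first ~1 KB) might look roughly like this. It is a sketch only: `run_cli`, the logger name, and the result dictionary are hypothetical, and the real executor goes through the Claude Agent SDK rather than a bare `subprocess.run`.

```python
import logging
import subprocess
import sys
from typing import List

log = logging.getLogger("sdk_executor_sketch")
STDERR_SNIPPET_BYTES = 1024  # "first ~1 KB" from the backlog item


def run_cli(argv: List[str]) -> dict:
    """Run a command; on failure, log full stderr and return a bounded snippet."""
    proc = subprocess.run(argv, capture_output=True)
    if proc.returncode == 0:
        return {"ok": True, "output": proc.stdout.decode(errors="replace")}
    stderr_text = proc.stderr.decode(errors="replace")
    log.error("command %r exited %d: %s", argv, proc.returncode, stderr_text)
    # Truncate on bytes before decoding so the response size stays bounded.
    snippet = proc.stderr[:STDERR_SNIPPET_BYTES].decode(errors="replace")
    return {"ok": False, "exit_code": proc.returncode, "stderr": snippet}
```

The full stderr goes to the log while the error response carries only the bounded snippet, so a noisy crash cannot inflate the A2A payload.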
Recently launched (2026-04-14 tick-4)
- GitHub issue #15 (Provisioner: auto-refresh `CLAUDE_CODE_OAUTH_TOKEN` from `global_secrets` on workspace restart) → DONE via PR #64 (`SetGlobal` / `DeleteGlobal` now fan out `RestartByID` to every affected workspace).
- GitHub issue #19 Layer 1 (platform-generated restart context) → DONE via PR #65 (synthetic A2A `message/send` with `metadata.kind=restart_context`, `system:restart-context` caller prefix, 30s re-register wait). Layer 2 deferred to issue #66 (see Backlog item 15 above).
Recently launched (2026-04-15 overnight sweep — ticks 17–30+, ~27 PRs)
Security hardening cluster. Roughly half the sweep was closing auth gaps surfaced by the Security Auditor's hourly audit cron:
- `#94` RFC-1918 + link-local in registry URL validator
- `#99` AdminAuth gate on `GET /workspaces` (topology leak / #104)
- `#106` path-sanitize + admin-gate `POST /org/import` (#103 HIGH)
- `#110` revoke `workspace_auth_tokens` on workspace delete
- `#119` IPv6 SSRF blocklist (fe80::/10, ::1/128, fc00::/7) + scheduler unit tests
- `#162` field-level authz on `PATCH /workspaces/:id` (#138, cosmetic vs sensitive split)
- `#155` wire existing `SecurityHeaders` middleware into router
- `#167` gate 6 previously-unauth routes behind `AdminAuth` (#164 CRITICAL anon bundles/import; #165 HIGH events+bundles/export topology leak; #166 MED viewport+liveness)
- `#185` `AdminAuth` on `GET /approvals/pending` (#180)
- `#200` `AdminAuth` on `POST /templates/import` (#190 HIGH)
- `#203` `CanvasOrBearer` middleware: route-split for #168 canvas regression, only `PUT /canvas/viewport`; rejected PR #194's broader Origin-fallback approach because it would have re-opened #164
- `#209` source_id spoof defense in `activity.Report` (cherry-picked from the rejected #169 batch)
- `#233` `resolveInsideRoot` on `POST /workspaces template/runtime` (#226 MED)
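The address checks behind #94 and #119 can be sketched with Python's standard `ipaddress` module. The function name `is_blocked_registry_url` is hypothetical, and the list holds only the ranges named above (RFC 1918, IPv4 link-local, and the three IPv6 prefixes). The platform's validator is Go code and may cover more.

```python
import ipaddress
from urllib.parse import urlsplit

BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
    "169.254.0.0/16",                                   # IPv4 link-local
    "fe80::/10", "::1/128", "fc00::/7",                 # IPv6 ranges from #119
)]


def is_blocked_registry_url(url: str) -> bool:
    """True when the URL's host is a literal IP inside a blocked range."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no host at all: reject
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; resolving and re-checking is out of scope here
    return any(addr.version == net.version and addr in net for net in BLOCKED_NETWORKS)
```

A hostname that resolves to a private address would pass this check, so a production validator also needs to check resolved addresses at connect time.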
Data integrity. Three bugs that would have silently corrupted state:
- `#212` CRITICAL migration-runner bug: `RunMigrations` globbed `*.sql` and alphabetically ran `.down.sql` BEFORE `.up.sql` on every boot, wiping `workspace_auth_tokens` (and 018/019 pairs). Filter fix + unit test in `postgres_migrate_test.go`.
- `#224` YAML injection in `generateDefaultConfig`: body.Name now emitted as a double-quoted YAML scalar with all control chars escaped. Structural test (parse + verify key count).
- `#236` log-injection in the #209 security-event log line: attacker-controlled source_id echoed via `%s` allowed fake log entries; switched to `%q`.
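The ordering problem behind #212 is easy to reproduce: in an alphabetical sort, `.down.sql` precedes `.up.sql` for the same migration number. The sketch below models both behaviours in Python. The function names and the file names in the usage note are hypothetical, and the real runner is Go.

```python
from typing import List


def naive_order(filenames: List[str]) -> List[str]:
    """The buggy behaviour: run every *.sql file in alphabetical order."""
    return sorted(f for f in filenames if f.endswith(".sql"))


def up_only_order(filenames: List[str]) -> List[str]:
    """The fix in spirit: never pick up *.down.sql when migrating on boot."""
    return sorted(
        f for f in filenames
        if f.endswith(".sql") and not f.endswith(".down.sql")
    )
```

With a directory holding `018_a.up.sql` and `018_a.down.sql`, `naive_order` schedules the down migration first, which drops and then recreates the table on every boot. `up_only_order` never selects the down file.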
CI / infra.
- `#186` + controlplane `#28`: every CI job migrated from `ubuntu-latest` to `[self-hosted, macos, arm64]` (Mac mini `self-hosted-runner`). Non-trivial: `services:` replaced with inline `docker run` containers (ports 15432/16379), `actions/setup-python` bypassed via Homebrew python3.11 on `$GITHUB_PATH`, `docker/setup-qemu-action` added for cross-arch builds. Workaround for GH Actions billing cap on private repos.
- `#149` independent heartbeat pulse goroutine so long cron fires don't look stale on `/admin/liveness` (#140)
- `#211` migration runner regression (see #212 above; PR #212 is the fix)
- Fly registry `FLY_API_TOKEN` rotated to a deploy token scoped to `molecule-tenant` (previously a personal token; rotated during the security incident remediation)
Platform / Scheduler reliability.
- `#95` panic-recover in scheduler `tick()` + per-fire goroutines (closes #85)
- `#207` concurrency-aware skip: `scheduler.fireSchedule` reads `workspaces.active_tasks` and advances `next_run_at` + records a `cron_run` row with `status='skipped'` instead of colliding with a busy agent (#115)
- `#206` surface `error_detail` in schedule history API (#152 problem B)
Workspace runtime features.
- `#205` idle-loop reflection pattern: opt-in `idle_prompt` + `idle_interval_seconds` in `config.yaml`; self-sends when `heartbeat.active_tasks == 0`. Hermes/Letta shape.
- `#208` Hermes Phase 1 multi-provider registry: 15 providers via `adapters/hermes/providers.py` (Nous, OpenRouter, OpenAI, Anthropic, xAI, Gemini, Qwen, GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral). 26 tests.
- `#198` A2A protocol compliance batch (#173/#174/#175): `cancel()` emits `TaskStatusUpdateEvent(canceled, final=True)`, `stateTransitionHistory=True` in AgentCapabilities. Regression: `push_sender=PushNotificationSender()` crashed on startup because PushNotificationSender is abstract; reverted in #210.
- `#216` idle-loop pilot enabled on Technical Researcher workspace.
- `#225` + `#235` `auth_headers()` on `/registry/register` + initial_prompt + idle loop self-posts (#215/#220)
- `#231` Claude SDK stderr probe for proper rate-limit error attribution (#160 diagnostics)
Controlplane (molecule-controlplane).
- `#19` + `#20` Grafana Cloud remote-write counter registry (`cp_requests_total`), push loop to `prometheus-prod-32-prod-ca-east-0.grafana.net`, Basic auth with user 3116422
- `#21` AWS KMS envelope encryption: per-secret DEK via `GenerateDataKey`, dual-mode (v2 blobs via KMS, legacy via static key, auto-routes by leading byte)
- `#24` `/cp/status` deep probe for Betterstack
- `#26` + `#27` public `/legal/{terms,privacy,dpa,acceptable}` pages from embedded markdown + smoke coverage
- Isolation red-team test suite + observability runbooks (Grafana dashboard, Betterstack, Stripe Atlas)
Self code-review follow-ups (#228 + #232). Ran /code-review on the batch merges, surfaced 8 🟡 issues, split into Go (#228) and Python/docs (#232):
- `CanvasOrBearer` invalid-bearer fall-through fix
- `short()` helper replacing unsafe `[:N]` slices in `scheduler.go`
- 6 new tests (`TestShort_helper`, `TestRecordSkipped_*`, `TestActivityHandler_Report_*`, `TestHistory_IncludesErrorDetail`)
- idle-loop hardening (`asyncio.get_running_loop()`, `IDLE_FIRE_TIMEOUT_SECONDS` clamp, typed exception handling, `add_done_callback` for fire-and-forget error logging)
- `idle_prompt` / `idle_interval_seconds` documented in `org.yaml` defaults
- New `docs/runbooks/admin-auth.md`: the three middleware variants + three-question test for adding to `CanvasOrBearer`
Test counts post-sweep: +70 Go (816 total), +40 Python (1180 total), +0 Canvas vitest (453 unchanged — UI/a11y patches only).
Outstanding (user action): #126 Slack adapter (Phase-H product decision), #160 Claude Max OAuth quota (wait for 2026-04-17 23:00Z reset OR upgrade OR switch to ANTHROPIC_API_KEY), #191 runner persistent-state docs (P3), #199 Fly registry token (resolved this session but publish-platform-image re-run pending runner), Stripe Atlas application (launch blocker, 2-week lead).
Recently launched (2026-04-15 tick-9)
- Phase 32 Phase B.2 (image pipeline): PR #80 adds `.github/workflows/publish-platform-image.yml`. On every main-merge touching `platform/**`, it builds `platform/Dockerfile` and pushes `ghcr.io/molecule-ai/platform:latest` + `:sha-<commit>` to GHCR. Paired with the private `molecule-controlplane` Fly + Neon provisioner (PR #3 there) that reads `TENANT_IMAGE` env and boots tenant Fly Machines from this image. Tick-8 docs-sync PR #79 also landed.
Recently launched (2026-04-14 tick-8)
- Phase 32 PR #1: `TenantGuard` middleware (PR #78). Public repo's only SaaS hook: when `MOLECULE_ORG_ID` env is set, non-allowlisted requests require matching `X-Molecule-Org-Id` header or 404. Unset → passthrough (self-hosted unchanged). Allowlist is exact-match: `/health` + `/metrics`. Paired with the private `Molecule-AI/molecule-controlplane` repo scaffolded this tick (Fly Machines provisioner stub, `/cp/orgs` CRUD, subdomain→fly-replay router, migrations 001-003 for `organizations` / `org_instances` / `org_members`). +6 `TestTenantGuard_*` tests. Phase 32 plan: follow-up PRs wire real Fly provisioner, WorkOS AuthKit, Stripe, Cloudflare, signup UX; all in the private repo except the single public middleware.
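The decision rule described for `TenantGuard` fits in a small pure function. This is a Python sketch of the rule as stated above, not the Go middleware. The function signature is an assumption, and real HTTP header lookup is case-insensitive, which a plain mapping does not model.

```python
from typing import Mapping, Optional

ALLOWLIST = frozenset({"/health", "/metrics"})  # exact-match, per the description above


def tenant_guard(org_id_env: Optional[str], path: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return None to let the request through, or an HTTP status to short-circuit with."""
    if not org_id_env:
        return None  # MOLECULE_ORG_ID unset: self-hosted passthrough
    if path in ALLOWLIST:
        return None  # exact match only; "/health/deep" does not qualify
    if headers.get("X-Molecule-Org-Id") == org_id_env:
        return None
    return 404  # hide the tenant's existence rather than returning 401/403
```

Returning 404 instead of 401 keeps a mis-routed request from learning that a tenant exists at that address.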
Recently launched (2026-04-14 tick-7)
- GitHub issue #24 (runtime-added workspace_schedules drift on org re-import) → DONE via PR #76 (new `source` column on `workspace_schedules` via migration `022`; org/import now upserts with `ON CONFLICT (workspace_id, name) DO UPDATE ... WHERE source='template'`, so runtime-added rows survive re-imports; legacy rows backfilled to `'template'`; +3 tests).
- GitHub issue #51 (PM hardcoded audit-category routing) → DONE via PR #75 (generic `category_routing:` block in `org-templates/<name>/org.yaml` `defaults` + per-workspace override; rendered into each workspace's `config.yaml` via `renderCategoryRoutingYAML` using `yaml.Node` + `yaml.Marshal` for safe escaping; PM prompt replaced with generic config-lookup; +6 tests).
- PR #74: `org-templates/molecule-dev/org.yaml` role overrides shrunk to just the deltas now that UNION semantics (PR #71) are in effect; removes verbose re-listing of defaults across PM, Research Lead, Research sub-roles, Security Auditor, UIUX Designer.
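The conditional upsert from PR #76 can be tried out with SQLite, which accepts the same `ON CONFLICT ... DO UPDATE ... WHERE` form (SQLite 3.24 or newer). This is a stand-in for the Postgres migration, not a copy of it: the `cron` column, the `'runtime'` source value, and the helper name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE workspace_schedules (
        workspace_id TEXT NOT NULL,
        name         TEXT NOT NULL,
        cron         TEXT NOT NULL,
        source       TEXT NOT NULL DEFAULT 'template',
        UNIQUE (workspace_id, name)
    )
""")

# Only rows that originally came from a template may be overwritten on re-import.
UPSERT = """
    INSERT INTO workspace_schedules (workspace_id, name, cron, source)
    VALUES (?, ?, ?, 'template')
    ON CONFLICT (workspace_id, name) DO UPDATE
        SET cron = excluded.cron
        WHERE workspace_schedules.source = 'template'
"""


def import_template_schedule(workspace_id: str, name: str, cron: str) -> None:
    """Apply one schedule from an org template without clobbering runtime rows."""
    conn.execute(UPSERT, (workspace_id, name, cron))
```

When the conflicting row has a non-template source, the `WHERE` clause turns the update into a no-op, so a schedule added at runtime keeps its cron expression across re-imports.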
Recently launched (2026-04-14 tick-6)
- GitHub issue #68 (per-workspace `plugins:` REPLACE semantics caveat) → DONE via PR #71 (`mergePlugins` helper in `platform/internal/handlers/org.go` now UNIONs per-workspace with `defaults.plugins`; `!plugin` or `-plugin` prefix on a per-workspace entry opts a default out; +5 `TestPlugins_*` tests). Role overrides in `org-templates/*/org.yaml` can now declare just the delta instead of restating every default.
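The UNION-with-opt-out rule can be sketched as a small merge function. This models the behaviour described for `mergePlugins`, not the Go implementation. Result ordering (defaults first, then additions) is an assumption, and most plugin names in the usage note are placeholders.

```python
from typing import Iterable, List


def merge_plugins(defaults: Iterable[str], workspace: Iterable[str]) -> List[str]:
    """UNION defaults with per-workspace entries; '!name' or '-name' opts a default out."""
    merged: List[str] = list(dict.fromkeys(defaults))  # dedupe while keeping order
    for entry in workspace:
        if entry[:1] in ("!", "-"):
            name = entry[1:]
            if name in merged:
                merged.remove(name)
        elif entry not in merged:
            merged.append(entry)
    return merged
```

For example, `merge_plugins(["safety-hooks", "memory"], ["-memory", "molecule-workflow-triage"])` yields `["safety-hooks", "molecule-workflow-triage"]`, so a role override only has to state its delta.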
Recently launched (2026-04-14 tick-5)
- PR #70: Wired the 12 modular plugins from PR #63 (tick-4) into the default `molecule-dev` org template. `defaults.plugins` expands from 3 → 9 (safety hooks + operational-memory skills become universal); PM role gains `molecule-workflow-triage` + `molecule-workflow-retro`, Security Auditor gains `molecule-skill-code-review` + `molecule-skill-cross-vendor-review` + `molecule-skill-llm-judge`. Verbose per-role re-listing is a consequence of REPLACE (not UNION) semantics in `platform/internal/handlers/org.go`; union-semantics proposal tracked as issue #68.
- PR #69: Backlog items 11–14 stripped of stale sequential refs `#64`–`#67` (see footnote near item 15 above).
Test Coverage
| Stack | Tests | Framework |
|---|---|---|
| Go (platform) | 726 | go test -race (raw PASS lines incl. subtests; +6 top-level Test* this tick: #64 secrets auto-restart x2, #65 restart-context x4) |
| Python (workspace) | 1,140 | pytest |
| Canvas (frontend) | 357 | Vitest |
| SDK (python) | 132 | pytest |
| MCP server | 97 | Jest |
| Total | 2,452 | |
E2E: 67/67 comprehensive checks passing, 62/62 API tests (also gated in CI e2e-api job), shellcheck-clean across all 5 E2E scripts.
Team Assignments
| Agent | Current Focus |
|---|---|
| PM | Sprint coordination, backlog prioritization |
| Dev Lead | Engineering planning, PR review |
| UIUX Designer | UX specs for Phase 20 (DONE — 5 specs delivered) |
| Frontend Engineer | Phase 20.3 remaining items (org import, search, batch) |
| Backend Engineer | Sandbox production backends, API completeness |
| QA Engineer | Review every PR for docs + plan compliance |
| DevOps Engineer | CI/CD, Docker image optimization |
| Security Auditor | API key handling, path traversal, auth review |
Next Steps
- Frontend Engineer implements remaining Phase 20.3 items (org import from canvas, Cmd+K search)
- Backend Engineer scopes Firecracker/E2B sandbox backends (Phase 12)
- QA Engineer reviews PR #52 for docs compliance before merge
- All agents use `GITHUB_TOKEN` env var to clone repo, branch, and create PRs
Plugin Adaptor System — shipped; deferred follow-ups only
The system is done. Landed (see feat/plugin-adaptor-registry and feat/agentskills-compliance):
per-runtime plugin adaptors, hybrid resolver (registry > plugin-shipped >
raw-drop), AgentskillsAdaptor covering rule+skill plugins for all
runtimes, /plugins?runtime= filter, /workspaces/:id/plugins/available
endpoint, molecule-plugin SDK, gemini org parity with molecule-dev,
and full agentskills.io spec compliance for all first-party skills
(installable in Claude Code, Cursor, Codex, and ~35 other skill-compatible
tools — see docs/plugins/agentskills-compat.md).
Deferred, not blocking:
- Upstream `runtime-adapters/` extension to agentskills.io spec: once we've lived with our own per-runtime adapter model for about a month, propose it as a spec extension to `agentskills/agentskills` so other tools can share Molecule AI-authored adaptors.
- Install-from-GitHub-URL flow: `POST /plugins/install {git_url}` that clones a repo into the registry, validates the manifest, and runs the adaptor through a sandbox. Needs signature/version pinning and a review of the adaptor-execution threat model before shipping.
- Promote-to-default UI: today, promoting a community plugin to "curated" means manually copying its `adapters/<runtime>.py` into `workspace-template/plugins_registry/<plugin>/`. Later add a canvas button + PR template that opens an upstream PR automatically.
- Plugin packs: manifest that lists other plugins to bundle (`superpowers-pack` → install `superpowers-tdd` + `superpowers-debug` + …). Skip until a real user asks; first-party plugins are small enough to install individually today.
- Hot-reload on DeepAgents: upstream docs say skills/sub-agents are startup-only; would need platform-level container restart on plugin file change. Defer until users complain.
- Atomic split of first-party plugins: `superpowers` and `ecc` still ship as multi-skill bundles. Pipeline already supports splitting but non-urgent.
- Sub-agent plugins for non-DeepAgents runtimes: Claude Code / LangGraph don't have a native sub-agent feature; emulating via tool-routing is possible but invasive. Defer.
- Workspace install tracking table: a `workspace_plugin_installs` table would let uninstall call the adaptor's `uninstall()` path reliably. Today uninstall is a `rm -rf /configs/plugins/<name>` which leaves copied skill dirs behind. Low user impact.
- Shared org-template `system-prompt.md` via `_shared/`: DRY molecule-dev and molecule-worker-gemini. Drift risk; revisit at 3+ orgs.
Phase 32 — Cloud SaaS launch (2026-Q2/Q3)
Goal: ship Molecule AI as a multi-tenant cloud SaaS (not just self-hosted per-customer). Ordered by dependency + ROI.
Current state (2026-04-15)
Live infrastructure:
- Control plane deployed: https://molecule-cp.fly.dev (Fly app `molecule-cp`, 2 machines, Neon project `molecule-cp` / `cool-sea-89357706`)
- Tenant app: Fly app `molecule-tenant` (Neon parent project `molecule-tenants` / `dawn-bar-08311714`, tenants get a branch per org)
- Shared Redis: Upstash `grateful-prawn-89393.upstash.io` (key-prefix isolation, Phase H moves to per-tenant)
- Container registry: `registry.fly.io/molecule-tenant:latest` (mirrored from `ghcr.io/molecule-ai/platform:latest` via GH Actions on every main push)
- First real tenant provisioned: org `acme` → Fly machine + Neon branch + encrypted URLs in `org_instances`
- WorkOS AuthKit live at `/cp/auth/{signup,login,callback,signout,me}`: hosted signup redirects correctly; see https://molecule-cp.fly.dev/cp/auth/signup
- Stripe billing scaffold deployed in orgs-only mode (no Stripe creds configured yet; webhook handler + signature verification code ready)
- Domain: `moleculesai.app` (DNS not yet wired; subdomain routing works via `X-Molecule-Org-Slug` header pending Cloudflare)
Phase status (post 2026-04-15 overnight sweep):
- A. Foundation (accounts, tokens, domain): ✅ done
- B. Fly provisioner + Neon branching: ✅ done
- C. WorkOS AuthKit scaffold + RequireSession + org-ownership check: ✅ done
- D. Stripe billing scaffold + auth-scoped checkout + plan quotas: ✅ code done; live keys pending Stripe Atlas
- E. Cloudflare + DNS `*.moleculesai.app` + per-tenant Vercel canvas: ✅ done
- F. Sign-up UX + onboarding: ✅ basic flow done (signup / org create / canvas redirect); polish + email pending
- G. Observability + quotas + admin: ✅ Sentry + Grafana remote-write + `/cp/status` Betterstack probe + per-org rate limiter; admin panel `/cp/admin/*` pending
- H. Hardening: ⏳ partial. AWS KMS envelope encryption ✅ (controlplane PR #21), tenant-isolation red-team CI gate ✅ (`isolation_test.go`), legal pages ✅ (`/legal/*` from controlplane PR #26); load test + Stripe Atlas application + status page custom domain pending
- I. Launch: pending Stripe Atlas (~2 week lead)
Live infrastructure deltas (post-sweep):
- Migration runner safety fix landed (#212): `*.down.sql` filter; was wiping `workspace_auth_tokens` on every restart
- Workspace auth tokens now revoked on workspace delete (#110)
- All known unauth admin routes gated; #138 canvas regression resolved via field-level authz + `CanvasOrBearer` middleware
- Self-hosted Mac mini CI runner replaced GH-hosted Linux to bypass private-repo Actions billing cap; `FLY_API_TOKEN` rotated to a deploy token scoped to `molecule-tenant` during the security incident remediation
- `/legal/{terms,privacy,dpa,acceptable}` live at `https://app.moleculesai.app/legal/*`
Known open issues on the live system:
- Tenant `/workspaces` returns Neon pooler warnings (unnamed prepared statement does not exist): lib/pq + Neon pooler incompatibility, tracked for lib/pq → pgx migration in a later phase
- `#160` Claude Max OAuth quota exhausted on the agent-fleet token until 2026-04-17 23:00 UTC; mitigations: wait, upgrade plan, OR switch workspace containers to `ANTHROPIC_API_KEY` env var
- `#191` self-hosted runner persistent-state docs (P3, low urgency)
- `#199` Fly registry token: resolved in the 2026-04-15 sweep but `publish-platform-image` re-run pending runner availability
Companion repo: Molecule-AI/molecule-controlplane (private). n8n-style open-core split: this public repo stays OSS (tenant binary + plugins + channels, contributable surface); control plane (orgs / signup / billing / provisioner / routing) is private. See molecule-controlplane/PLAN.md for its roadmap.
Tier 1 — blocks multi-tenant launch
- Multi-tenancy:
organizationstable,org_idFK +WHERE org_id = $caller_orgfilter on every row-returning handler (workspaces,workspace_secrets,global_secrets,activity_logs,structure_events,agent_memories,workspace_schedules,workspace_channels). Middleware resolves caller's org from session token → ctx. Full security audit of tenant isolation before first external user. - Human auth + orgs: WorkOS AuthKit (NOT build-yourself, NOT Clerk — WorkOS treats per-org SSO as first-class; Clerk treats it as an upsell). Keep Phase 30.1 bearer tokens for machine-to-machine (agents). Stripe integration via WorkOS hooks.
- Container isolation: replace raw-Docker-socket provisioner with Fly Machines API (Firecracker microVMs, per-workspace isolation, sub-second boot, pay-per-second). Today's shared `/var/run/docker.sock` is an RCE-to-host footgun that cannot ship multi-tenant. The `provisioner` interface stays; only the backend swaps. Docker path remains for local dev.
- Stripe billing: subscriptions + usage metering (workspace-hours, LLM-token pass-through, storage), trial flow, dunning, invoices.
- Per-org resource quotas: tier memory/CPU is configurable (PR #58) but unenforced at provision time. Add per-org ceilings: max workspaces, max concurrent-running, max total memory.
- Managed Postgres + Redis: move off `docker-compose` for prod. Neon (serverless, branch-per-PR) for Postgres; Upstash for Redis. Alternative: drop Redis entirely; `LISTEN/NOTIFY` + advisory locks cover heartbeat TTL + URL cache.
- Secrets at rest via KMS: current `SECRETS_ENCRYPTION_KEY` is a single static AES-256 key. Move to AWS/GCP KMS-backed envelope encryption; the `secrets_encryption_version` table slot is already reserved for rotation.
- Migration runner out of app boot: a bad migration currently crashes platform boot with no rollback. Extract to goose as a release step / init container. Auto-discovery runner stays for dev mode only.
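The per-org ceilings named in the quota item above (max workspaces, max concurrent-running, max total memory) could be enforced with a check like the following at provision time. This is a minimal sketch, not the provisioner's actual code: the `OrgQuota` and `OrgUsage` types, their field names, and `check_provision` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgQuota:
    """Per-org ceilings (hypothetical field names)."""
    max_workspaces: int
    max_running: int
    max_total_memory_mb: int


@dataclass(frozen=True)
class OrgUsage:
    """Current usage for one org, as a provisioner might count it."""
    workspaces: int
    running: int
    total_memory_mb: int


def check_provision(quota: OrgQuota, usage: OrgUsage, requested_memory_mb: int) -> list:
    """Return the names of the ceilings a new running workspace would exceed.

    An empty list means the provision request may proceed.
    """
    violations = []
    if usage.workspaces + 1 > quota.max_workspaces:
        violations.append("max_workspaces")
    if usage.running + 1 > quota.max_running:
        violations.append("max_running")
    if usage.total_memory_mb + requested_memory_mb > quota.max_total_memory_mb:
        violations.append("max_total_memory_mb")
    return violations
```

Returning every violated ceiling, rather than failing on the first, lets the API tell a customer exactly which limits to raise. In a multi-node deployment the usage read and the provision would need to happen in one transaction to avoid two concurrent requests both passing the check.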
Tier 1 follow-ups (before customer #1)
- Observability: wire `/metrics` to a scraper (Grafana Cloud or self-hosted). Add Sentry for Go + Next.js error tracking. Langfuse stays for LLM traces.
- Rate limiting per-org: global `RATE_LIMIT=600/min` is a shared bucket today. Needs per-org + per-endpoint buckets.
- Cloudflare in front: WAF + CDN + DDoS. Free tier covers pre-revenue.
- Sign-up / onboarding flow: landing → signup → first workspace in 60 seconds. No such flow today.
- Transactional email: Resend or Postmark.
- Admin panel: view orgs, suspend accounts, see usage, issue refunds. SQL-only at first; UI by ~50 orgs.
- Privacy policy + ToS + DPA: real ones, vetted. GDPR / CCPA data-export + deletion endpoints (workspace-export already exists; need org-level).
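To illustrate the per-org + per-endpoint buckets from the rate-limiting item above, here is a minimal in-memory token-bucket sketch keyed by `(org_id, endpoint)`. The class name, its parameters, and the injectable clock are assumptions for illustration; the platform's real limiter is Go and may be structured differently.

```python
import time
from typing import Callable


class KeyedRateLimiter:
    """Token-bucket limiter with one bucket per (org_id, endpoint) pair."""

    def __init__(self, limit_per_min: int, clock: Callable[[], float] = time.monotonic):
        self.capacity = float(limit_per_min)
        self.refill_per_sec = limit_per_min / 60.0
        self.clock = clock
        self.buckets = {}  # (org_id, endpoint) -> (tokens, last_seen)

    def allow(self, org_id: str, endpoint: str) -> bool:
        """Consume one token from the caller's bucket if one is available."""
        now = self.clock()
        key = (org_id, endpoint)
        # A bucket that has never been seen starts full.
        tokens, last = self.buckets.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[key] = (tokens, now)
        return allowed
```

Because each key has its own bucket, one org exhausting its budget on one endpoint does not affect other orgs or other endpoints. State here is per-process; sharing it across nodes would need an external store.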
Tier 2 — tech-stack upgrades (high ROI, non-blocking)
- Go platform: migrate `lib/pq` → pgx/v5 (1–2 days; `lib/pq` in maintenance since ~2021). Then sqlc incrementally for new queries; keeps the no-ORM philosophy + typed Go.
- Platform async: River (Postgres-backed, Go-native job queue). Delegation dispatch, `workspace_schedules` cron, future billing events + webhook fan-out all migrate cleanly. NOT Temporal: Temporal already ships in workspace-template as an agent tool; keep the separation.
- Frontend: TanStack Query for server state. Zustand keeps pure UI state. Stops reimplementing cache / refetch / dedup. WS updates flow via `qc.setQueryData`. Single highest-ROI frontend refactor.
- Turbopack for `next build`: one flag, 2–5× cold-build speedup.
- Python workspace runtime → uv: `uv pip install` in `entrypoint.sh` cuts workspace cold-start 10–100×. User-visible latency win.
- Python MCP client inside runtime: today `mcp-server/` exposes the platform as an MCP server; agents inside workspaces can't yet consume external MCP servers. Closing the gap joins the winning 2026 ecosystem.
- shadcn/ui CLI convention: already Radix + Tailwind; adopt `npx shadcn add …` passively for new components. No rewrite.
Tier 3 — explicitly NOT doing
- Kubernetes: company-of-one cannot run K8s. Fly Machines covers isolation without the ops tax.
- ORM (GORM / ent / bun): raw-SQL + sqlc covers every case.
- Framework swap (Next → Vite / TanStack Start): 2-week rewrite buys nothing users see.
- Auth-from-scratch: every hour on auth is an hour not on product.
- Canvas library swap (xyflow → tldraw): xyflow is still the correct tool for typed node graphs.
Tier 4 — compliance / enterprise (when revenue lands)
- SOC 2 via Drata / Vanta
- Status page (Betterstack or Instatus)
- Staging environment that mirrors prod
- Blue-green / canary deploy pipeline
- Per-org backup + point-in-time restore
- Load testing (`hey` / `vegeta`): current per-node ceiling unknown
Success criteria for Phase 32
- Customer can sign up at moleculesai.app, create an org, deploy their first workspace, send their first message in < 5 minutes.
- Two orgs on the same cluster cannot observe each other's workspaces, secrets, memory, or activity — verified by automated tenant-isolation test + manual red-team.
- Fly Machines cost per active workspace-hour documented and reproducible.
- Stripe-backed subscription + usage-based add-ons working end-to-end in sandbox.
- One paying design partner on the cluster, paying a real invoice.
Phase 34: Partner API Keys — Programmatic Org Management
Goal: Enable partner platforms, CI/CD pipelines, and automation tools to create and manage orgs via API without a browser session. Critical for partner integrations, marketplace resellers, and internal testing.
Docs: `docs/architecture/partner-api-keys.md`
Phase 34.1 — Core infrastructure
- Migration: `partner_api_keys` table (key_hash, scopes, org_id, rate_limit)
- `internal/auth/partner_keys.go`: key validation, SHA-256 hashing, scope check
- Update `auth.Middleware`: check `Bearer mol_pk_*` before WorkOS session
- Scope enforcement helpers: `RequireScope("orgs:create")` per handler
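The key flow described in the list above (a `mol_pk_` prefix, SHA-256 hashing, a bearer check, and a scope check) can be sketched as follows. This is an illustrative Python sketch, not the contents of `internal/auth/partner_keys.go`: the function names, the 32-byte random suffix, and the `orgs:read` scope in the usage note are assumptions. A real implementation would look the hash up in `partner_api_keys` rather than receive it as an argument.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Set, Tuple

KEY_PREFIX = "mol_pk_"


def hash_key(plaintext: str) -> str:
    """SHA-256 hex digest of the full key, prefix included."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def generate_key() -> Tuple[str, str]:
    """Return (plaintext, key_hash). Only the hash would be stored."""
    plaintext = KEY_PREFIX + secrets.token_urlsafe(32)
    return plaintext, hash_key(plaintext)


def parse_bearer(header: str) -> Optional[str]:
    """Extract a partner key from an Authorization header, else None."""
    scheme, _, token = header.partition(" ")
    if scheme != "Bearer" or not token.startswith(KEY_PREFIX):
        return None  # not a partner key; caller falls through to session auth
    return token


def verify(header: str, stored_hash: str, granted: Set[str], required: str) -> bool:
    """True only if the key matches the stored hash and holds the scope."""
    token = parse_bearer(header)
    if token is None:
        return False
    # Constant-time comparison avoids leaking hash prefixes via timing.
    if not hmac.compare_digest(hash_key(token), stored_hash):
        return False
    return required in granted
```

For example, a key granted only a hypothetical `orgs:read` scope would fail `verify(..., required="orgs:create")` even though its hash matches. Since the plaintext is returned only by `generate_key`, this also fits the "returns plaintext once" behavior of the create endpoint.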
Phase 34.2 — Admin endpoints
- `POST /cp/admin/partner-keys`: create key (returns plaintext once)
- `GET /cp/admin/partner-keys`: list keys (prefix + metadata only)
- `DELETE /cp/admin/partner-keys/:id`: revoke key
Phase 34.3 — Rate limiting + audit
- Per-key rate limiter (separate from session rate limit)
- `last_used_at` tracking on each request
- Add `mol_pk_` to pre-commit secret scanner
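A pre-commit check for leaked partner keys could be as simple as a regex pass over staged text. The sketch below assumes a key shape (the `mol_pk_` prefix followed by at least 20 URL-safe characters) that the plan does not specify, and `find_partner_keys` is a hypothetical helper rather than part of any existing scanner.

```python
import re

# Assumed key shape: the mol_pk_ prefix followed by 20+ URL-safe characters.
PARTNER_KEY_RE = re.compile(r"mol_pk_[A-Za-z0-9_\-]{20,}")


def find_partner_keys(text: str) -> list:
    """Return (line_number, redacted_match) for each suspected key."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PARTNER_KEY_RE.finditer(line):
            # Redact so the scanner's own output never leaks the key:
            # keep the 7-character prefix plus 4 characters of the body.
            hits.append((lineno, match.group()[:11] + "..."))
    return hits
```

Requiring a minimum body length keeps documentation that mentions the bare `mol_pk_` prefix from tripping the scanner.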
Phase 34.4 — Partner onboarding
- Partner onboarding guide (docs)
- Example: create org → poll status → redirect user to tenant
- Example: CI/CD test org lifecycle (create → test → delete)
Success criteria for Phase 34
- Partner can `POST /cp/orgs` with an API key and get a provisioned org
- Org-scoped keys cannot access other orgs
- Revoked keys immediately return 401
- Rate limiting prevents abuse
- Full audit trail: who created which key, when last used
Phase 36: Full Staging Environment — GATES ALL INFRA CHANGES
Goal: Stop merging untested infra changes to production. Every change ships to staging first, gets verified, then promotes to production.
Why now: The 2026-04-17 session broke CI twice and caused hours of edge cache issues because there was no staging to catch regressions. This gates Phase 33 (Tunnel migration) and Phase 35 (security hardening).
Docs: `docs/architecture/staging-environment.md`
Phase 36.1 — Railway + Neon staging
- Create Railway `staging` environment with staging-specific vars
- Create Neon staging branch from main
- Add `staging.api.moleculesai.app` CNAME to Railway staging
- Verify CP deploys and boots on staging
Phase 36.2 — Image + deploy pipeline
- Publish workflow pushes `:staging` tag (not `:latest`) on main merge
- Add `promote-to-production.yml` workflow (manual trigger)
- Promotion: retag `:staging` → `:latest`, deploy CP to production
- Production tenants auto-update via Option B cron
Phase 36.3 — Staging DNS + Vercel
- `*.staging.moleculesai.app` for staging tenant subdomains
- `staging.app.moleculesai.app` for Vercel staging preview
- Staging Cloudflare Tunnel (or Worker) for tenant routing
Phase 36.4 — Automated verification
- Post-deploy staging smoke test (run `test_saas_tenant.sh`)
- Block promotion if smoke test fails
- Slack/GitHub notification on staging deploy + promotion
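The gate described in Phases 36.2 and 36.4 (run the smoke test, block on failure, otherwise retag `:staging` as `:latest`) might look like the sketch below. It assumes the retag is done with the Docker CLI (`docker pull` / `docker tag` / `docker push`), which the plan does not state; the `promote` function, the `Runner` type, and the image name in the usage note are hypothetical.

```python
from typing import Callable, Sequence

# A Runner executes one command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def promote(image: str, run: Runner, smoke_test: Sequence[str]) -> bool:
    """Retag image:staging as image:latest only if the smoke test passes."""
    if run(smoke_test) != 0:
        return False  # smoke test failed: block promotion, touch nothing
    steps = [
        ["docker", "pull", f"{image}:staging"],
        ["docker", "tag", f"{image}:staging", f"{image}:latest"],
        ["docker", "push", f"{image}:latest"],
    ]
    for step in steps:
        if run(step) != 0:
            raise RuntimeError("promotion step failed: " + " ".join(step))
    return True
```

In a real workflow the runner could be `lambda cmd: subprocess.run(cmd).returncode`, called as `promote("registry.example/platform", runner, ["./test_saas_tenant.sh"])`. Injecting the runner keeps the gate logic testable without a registry.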
Success criteria for Phase 36
- No infra change reaches production without passing staging first
- Staging mirrors production (same services, same auth, separate data)
- Promotion is a single manual action (button click or CLI command)
- Staging cleanup is automated (terminate test EC2s after verification)
Phase 33: Tenant Subdomain Routing — MIGRATING TO CLOUDFLARE TUNNEL
Original: Wildcard DNS + Cloudflare Worker (implemented 2026-04-17).
Replacing with: Cloudflare Tunnel per tenant (issue #933). Worker approach caused edge cache poisoning + security gaps (ADMIN_TOKEN in plaintext, unencrypted HTTP). Tunnel eliminates all of these.
Docs: `docs/architecture/wildcard-dns-proxy.md` (original), issue #933 (tunnel migration plan).
Prerequisite: Phase 36 (staging); test tunnel on staging first.
Phase 33.1 — Worker + wildcard DNS (no tenant changes)
- Create Cloudflare Worker that extracts slug from hostname, looks up backend IP from CP API, proxies request to EC2
- Add `GET /cp/orgs/:slug/instance` endpoint to CP (public, rate-limited)
- Add `*.moleculesai.app` wildcard DNS record (proxied, orange cloud)
- Worker serves static "provisioning" splash page when tenant not ready
- Deploy Worker via `wrangler deploy` + GitHub Actions
- Verify Worker routing works for existing tenants alongside old A records
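The first step of the Worker routing above, extracting the tenant slug from the hostname, can be sketched as follows. The real Worker runs on Cloudflare and is not written in Python, so this only illustrates the parsing logic. The slug pattern and the reserved-label set are assumptions, not taken from the Worker source.

```python
import re
from typing import Optional

BASE_DOMAIN = "moleculesai.app"
# Assumed slug shape: one DNS label of lowercase letters, digits, hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")
# Assumed reserved labels that must never be treated as tenants.
RESERVED = {"app", "api", "www", "staging"}


def slug_from_host(host: str) -> Optional[str]:
    """Return the tenant slug for <slug>.moleculesai.app, else None."""
    # Drop any port, normalize case, and strip a trailing root dot.
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    # Reject nested subdomains, reserved names, and malformed labels.
    if "." in label or label in RESERVED or not SLUG_RE.match(label):
        return None
    return label
```

Matching on the suffix `.moleculesai.app` (with the leading dot) prevents a lookalike host such as `evilmoleculesai.app` from being parsed as a tenant.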
Phase 33.2 — Stop per-tenant DNS records
- Remove Cloudflare A record creation from `ec2.go` provisioner
- Remove Cloudflare DNS cleanup from deprovision/purge cascade
- Existing A records coexist harmlessly (explicit wins over wildcard)
Phase 33.3 — Remove Caddy from EC2
- Worker handles TLS termination — EC2 runs plain HTTP only
- Remove Caddy install + Caddyfile from EC2 user-data script
- EC2 security group: allow inbound HTTP from Cloudflare IPs only
- ~30s faster cold start (no apt-get caddy, no Let's Encrypt)
Phase 33.4 — Cleanup
- Delete old per-tenant A records from Cloudflare
- Remove `cloudflareapi/` package from CP (Worker replaces it)
- Update `docs/runbooks/saas-secrets.md` with Worker secrets
Success criteria for Phase 33
- New org subdomain resolves instantly (zero DNS wait)
- No NXDOMAIN caching — user never sees "site can't be reached"
- Provisioning splash page shown while EC2 boots (auto-refreshes)
- Cold start ~30s faster (no Caddy/Let's Encrypt)
- Cost: Cloudflare Worker free tier or $5/mo
Phase 35: SaaS Production Hardening (post-2026-04-17 retrospective)
Goal: Address the security gaps, leftover debug code, broken workspace registration, and slow boot times identified during the SaaS buildout session. See `docs/retrospectives/2026-04-17-saas-buildout.md` for full context.
Phase 35.1 — Security (CRITICAL, before any public launch)
- Fix #756: X-Workspace-ID header forge bypasses CanCommunicate (derive callerID from authenticated token, not raw header)
- Fix #757: GLOBAL memory poisoning mitigations (content delimiters + audit log at minimum)
- Remove ADMIN_TOKEN from public `/cp/orgs/:slug/instance` endpoint; store in Worker KV at provision time instead
- Encrypt ADMIN_TOKEN in `org_instances` table (use envelope key)
- Remove debug HTTP server (:9999) from workspace boot script
- Remove `set -ex` from boot scripts (leaks env vars to EC2 console)
- Restrict workspace EC2 security group (Cloudflare IPs + tenant IP only)
- Add HTTPS between Worker and EC2 (or Cloudflare Tunnel)
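The fix for #756 listed above comes down to one rule: the caller's identity comes from the authenticated token, never from the `X-Workspace-ID` header. A minimal sketch of that rule follows. `resolve_caller_id` and the `token_owner` mapping (a stand-in for the real token store) are hypothetical, and rejecting a header that disagrees with the token is a design choice of this sketch rather than something the plan specifies.

```python
from typing import Mapping, Optional


def resolve_caller_id(headers: Mapping[str, str], token_owner: Mapping[str, str]) -> Optional[str]:
    """Return the workspace ID bound to the bearer token, or None.

    token_owner maps token -> workspace ID and stands in for the real
    token store. The X-Workspace-ID header is never trusted as identity.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not token:
        return None
    caller_id = token_owner.get(token)
    if caller_id is None:
        return None  # unknown or revoked token
    claimed = headers.get("X-Workspace-ID")
    if claimed is not None and claimed != caller_id:
        return None  # header disagrees with the token: reject the request
    return caller_id
```

With this shape, a request carrying a valid token for `ws-a` but `X-Workspace-ID: ws-b` is rejected instead of being treated as `ws-b`, so a CanCommunicate-style check only ever sees an ID the token actually proves. The sketch assumes header names arrive in canonical case; a real handler would use a case-insensitive lookup and compare token hashes rather than plaintext.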
Phase 35.2 — Workspace registration fix
- Pass workspace auth token in EC2 boot script env so runtime can register with `POST /registry/register`
- Or: have runtime request a token at startup via `GET /admin/workspaces/:id/test-token`
- Verify workspace status flips to "online" on Canvas after boot
- Test full Canvas flow: deploy → STARTING → online → chat works
Phase 35.3 — Boot time optimization
- Pre-baked AMI per runtime (Packer or EC2 Image Builder):
  - `ami-hermes`: Python + openai + anthropic + molecule-runtime + hermes adapter
  - `ami-claude-code`: Node + claude-code SDK + molecule-runtime
  - `ami-langgraph`: Python + langchain + langgraph + molecule-runtime
- Runtime switch = launch from different AMI. Boot ~30s vs current ~9 min
- Remove apt-get + pip install from boot script (only config + secrets + start)
Phase 35.4 — Stability + CI
- Fix go.mod replace directive (PR #900) — unblocks all CI
- Use stable origin IP for wildcard DNS (dedicated proxy or Tunnel)
- Add workspace boot integration test to CI
- Add SaaS tenant smoke test (`tests/e2e/test_saas_tenant.sh`) to CI
- Clean up Cloudflare edge cache poisoning from session (or wait ~24h for natural expiry)
Infra footnote — Temporal
docker-compose.infra.yml now includes Temporal (:7233 gRPC, :8233 Web
UI) backing workspace-template/builtin_tools/temporal_workflow.py for
durable long-running agent workflows. All infra services share the
molecule-monorepo-net Docker network, which infra/scripts/setup.sh
creates idempotently. Temporal currently runs with no auth on
0.0.0.0:7233 — dev-only; any production deployment must front it with
mTLS, API keys, or a reverse proxy before exposing the cluster.