PLAN.md — Molecule AI Build Plan
Completed phases (1–11, 13–14) are documented in
/docs and removed from here. This file tracks only in-progress and upcoming work.
Completed Phases (see /docs for details)
| Phase | Name | Docs |
|---|---|---|
| 1 | Core Loop | docs/architecture/architecture.md, CLAUDE.md |
| 2 | E2E Validation | CLAUDE.md (build/test commands) |
| 3 | Hierarchy & Communication | docs/api-protocol/communication-rules.md |
| 4 | Provisioner | docs/architecture/provisioner.md |
| 5 | Agent Management | CLAUDE.md (API routes) |
| 6 | Bundle Export/Import | docs/agent-runtime/bundle-system.md |
| 7 | Team Expansion | docs/agent-runtime/team-expansion.md |
| 8 | Human-in-the-Loop Approvals | docs/agent-runtime/system-prompt-structure.md |
| 9 | Hierarchical Memory | docs/architecture/memory.md |
| 10 | Observability (Langfuse) | docs/development/observability.md |
| 11 | Canvas Polish & UX | docs/frontend/canvas.md |
| 13 | Runtime Enhancements | docs/agent-runtime/workspace-runtime.md |
| 14 | Production Hardening | docs/architecture/provisioner.md, CLAUDE.md |
| 15 | Per-Workspace Dir | PR #38 — workspace_dir per workspace |
| 16 | Plugin System | PR #39 — per-workspace plugins with registry |
| 17 | Agent GitHub Access | PR #40 — git/gh in images, GITHUB_TOKEN env |
| 18 | File Browser Lazy Loading | PR #37 — depth=1, path traversal protection |
| 19 | MCP Full Coverage | PR #40 — 52→54 tools (plugins, global secrets, pause/resume, org, delegation) |
| 20 | Canvas UX Sprint | PRs #4, #21, #39 — Settings Panel, Onboarding, Plugins UI, Pause/Resume |
| 21 | Claude Agent SDK Migration | PR #48 — ClaudeSDKExecutor replaces CLI subprocess |
| 22 | Cron Scheduling | PR #49 — recurring tasks via cron expressions, Canvas Schedule tab |
| 23 | Code Quality & Multi-Provider | PR #50 — model fallback, DeepAgents full SDK, 7 LLM providers, 100% test coverage |
| 24 | Async Delegation | PR #41 — non-blocking delegation with status polling, check_delegation_status tool |
| 25 | Social Channels | PR #54 — adapter-based Telegram integration, Canvas Channels tab, 7 MCP tools, hot reload, multi-chat IDs, auto-detect, /start auto-reply, full Telegram Bot API audit fixes |
| 26 | Auth Env Vars | PR #55 — required_env config replaces .auth-token files, env-var only path; reno-stars 15-agent org template |
| 27 | Channel Polish & Org Auto-link | PR #56 — poller lifetime fix (bgCtx), Restart Pending button (only when needed), org template channels: field auto-links Telegram on import |
Phase 12: Code Sandbox — DONE
Three-backend sandbox for the run_code tool, selectable per-workspace via the SANDBOX_BACKEND env (set from config.yaml → sandbox.backend).
- run_code tool — workspace-template/builtin_tools/sandbox.py
- subprocess backend (default) — asyncio subprocess with hard timeout
- docker backend — throwaway container with resource limits (MVP)
- e2b backend (cloud) — E2B microVMs via e2b-code-interpreter, reads E2B_API_KEY
- Sandbox config — SandboxConfig dataclass in workspace-template/config.py
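As a rough illustration of the default backend's "asyncio subprocess with hard timeout" idea, here is a minimal sketch. The function name run_code_subprocess, its return shape, and the 5-second default are assumptions made for this example and are not taken from sandbox.py.

```python
import asyncio
import sys


async def run_code_subprocess(code: str, timeout_s: float = 5.0) -> dict:
    """Run `code` in a fresh Python subprocess and kill it after timeout_s.

    Hypothetical helper: the real run_code tool may differ in name and shape.
    """
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()        # hard stop: no graceful shutdown window
        await proc.wait()  # reap the child so it does not linger
        return {"ok": False, "timed_out": True, "stdout": "", "stderr": ""}
    return {
        "ok": proc.returncode == 0,
        "timed_out": False,
        "stdout": out.decode(),
        "stderr": err.decode(),
    }
```

A caller would typically wrap this with asyncio.run(...) or await it from the agent's event loop. A subprocess only bounds run time; it does not isolate the filesystem or network, which is the gap the docker and e2b backends address.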
Firecracker-as-a-backend is intentionally skipped: each tenant platform now
runs on a Fly Machine (which IS a Firecracker microVM — see Phase 32
Phase B), so the entire workspace process is already Firecracker-isolated
from other tenants. Running Firecracker inside Firecracker would double-
nest for no additional security. For stronger per-call isolation within
one tenant, use the e2b backend.
Phase 20: Canvas UX Sprint — MOSTLY COMPLETE
UX specs created by UIUX Designer agent. See
docs/ux-specs/ for full specs.
20.1 Settings Panel (Global Secrets UI) — DONE
Spec: docs/ux-specs/ux-spec-settings-panel.md
- Gear icon in canvas top bar (Cmd+, shortcut)
- Slide-over drawer (480px, right-anchored)
- Service groups (GitHub, Anthropic, OpenRouter, Custom)
- CRUD: add, view (masked), edit, delete secrets
- Empty state with guided setup
- Unsaved changes guard on close
20.2 Onboarding / Deploy Interception — DONE
Spec: docs/ux-specs/ux-spec-onboarding-interception.md
- Pre-deploy secret check — detect missing API keys per runtime
- Missing Keys Modal — inline form, only asks for what's needed
- Provisioning timeout → named error state with recovery actions
- No dead ends — every error has a fix action
20.3 Canvas UI Improvements — PARTIAL
Spec: docs/ux-specs/ux-spec-canvas-improvements.md
- Plugins install/uninstall in Skills tab (PR #39)
- Pause/resume from context menu
- Org template import from canvas (PR — OrgTemplatesSection in TemplatePalette)
- Workspace search (Cmd+K)
- Batch operations
Phase 30: SaaS — Remote Workspaces & Cross-Network Federation — IN PROGRESS
Goal: let a Python agent running on a laptop in another city boot, register, authenticate, accept A2A from its parent PM on the platform, and appear on the canvas as a first-class workspace.
Why now: the self-hostable single-box model has landed; the next meaningful expansion is letting orgs span machines and networks. This is the step that turns Molecule AI from "Docker-compose on one box" into a multi-tenant SaaS-shaped product.
Design thesis: ride the existing runtime='external' escape hatch.
Every Docker-touching handler already short-circuits when a workspace
is external. We don't need a parallel subsystem — we need to close
four small gaps and add per-workspace auth. See
docs/remote-workspaces-readiness.md
for the full code audit.
Shipping order (eight bounded steps, ~2 weeks to GA)
- 30.1 Workspace auth tokens — foundation; prevents spoofing. New workspace_auth_tokens table; POST /registry/register issues a token; middleware validates Authorization: Bearer <token> on /registry/heartbeat, /registry/update-card. Lazy bootstrap so in-flight workspaces upgrade gracefully. Transparent to local containers — provisioner carries the token through the existing env-var pattern. No feature flag.
- 30.2 Secrets pull endpoint — GET /workspaces/:id/secrets/values returns decrypted secrets JSON, gated by the 30.1 token. Local agents can use it too (removes env-at-create coupling for rotating secrets).
- 30.3 Plugin tarball download — GET /plugins/:name/download returns a tarball; agent unpacks locally. Replaces Docker-exec plugin install for remote agents. Behind REMOTE_PLUGIN_DOWNLOAD_ENABLED.
- 30.4 Workspace state polling — GET /workspaces/:id/state returns {status, paused, deleted_at, pending_events[]} as a drop-in for the WebSocket feed remote agents can't reach. Behind REMOTE_STATE_POLLING_ENABLED.
- 30.5 A2A proxy token validation — the proxy enforces the caller's auth token on POST /workspaces/:id/a2a. Mutual auth between agents.
- 30.6 Direct sibling discovery + URL caching — agents call GET /registry/{parent_id}/peers once, cache sibling URLs, call them directly for A2A. Resilient to brief platform outages.
- 30.7 Poll-liveness for external runtime — LivenessChecker interface in registry/; PollLiveness marks offline if no heartbeat in 90s. Docker checker becomes one implementation, poll-liveness another. Health sweep routes by runtime. Behind REMOTE_LIVENESS_POLLING_ENABLED.
- 30.8 Remote-agent SDK + docs — sdk/python/molecule_agent/ thin client: register → pull secrets → run A2A loop → poll state → heartbeat. Working sdk/python/examples/remote-agent/ a new user can run on a laptop. Remove the three feature flags. Remote workspaces become GA.
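The poll-liveness rule in step 30.7 (offline if no heartbeat in 90s) can be sketched as below. The platform's LivenessChecker is described as an interface in registry/, so this Python class is only an illustration of the rule; the class layout, method names, and the injectable clock are assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

HEARTBEAT_TIMEOUT_S = 90.0  # from the plan: offline if no heartbeat in 90s


@dataclass
class PollLiveness:
    """Tracks the last heartbeat per workspace (hypothetical in-memory shape)."""

    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, workspace_id: str, now: Optional[float] = None) -> None:
        # `now` is injectable so the rule can be tested without sleeping.
        self.last_seen[workspace_id] = time.monotonic() if now is None else now

    def is_online(self, workspace_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(workspace_id)
        # A workspace that never sent a heartbeat is treated as offline.
        return seen is not None and (now - seen) <= HEARTBEAT_TIMEOUT_S
```

A health sweep that routes by runtime could then call is_online for external workspaces and keep the Docker checker for local ones.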
Out of scope for Phase 30
- Mutual TLS / platform-identity verification from the agent side. Agent trusts any platform URL in its env. Defer until real multi-tenant deployment forces the question.
- Agent-to-agent mesh across NATs. Direct sibling calls only work when siblings are reachable from each other. Behind-NAT ↔ behind-NAT needs a relay — defer to Phase 31.
- Platform-managed persistent state for remote agents. Remote agents own their filesystem; platform never mounts.
Success criteria
- sdk/python/examples/remote-agent/ boots on a laptop disconnected from the platform's LAN, registers, receives a task from parent PM via A2A, returns a result, appears on the canvas.
- tests/e2e/test_federation.sh spawns a second platform instance + remote agent pointing at the first; both platforms see the agent as a workspace in the right state.
- Spoofing test: attempt to impersonate a workspace with a guessed ID but no token → 401.
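The spoofing criterion above amounts to a per-workspace bearer check. The sketch below shows one way such a check could behave; the TokenStore class, storing a SHA-256 hash rather than the raw token, and the status_for helper are all assumptions for illustration, not the platform's actual schema or middleware.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class TokenStore:
    """In-memory stand-in for the workspace_auth_tokens table (hypothetical)."""

    def __init__(self) -> None:
        self._hashes: Dict[str, str] = {}

    def issue(self, workspace_id: str) -> str:
        # Issued at registration time; only a hash is kept (an assumption).
        token = secrets.token_urlsafe(32)
        self._hashes[workspace_id] = hashlib.sha256(token.encode()).hexdigest()
        return token

    def status_for(self, workspace_id: str, authorization: Optional[str]) -> int:
        """Return the HTTP status a guarded route would answer with."""
        expected = self._hashes.get(workspace_id)
        if expected is None or not authorization:
            return 401
        if not authorization.startswith("Bearer "):
            return 401
        presented = hashlib.sha256(authorization[len("Bearer "):].encode()).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        return 200 if hmac.compare_digest(presented, expected) else 401
```

Knowing a workspace ID alone is not enough under this scheme: a request with no header, a guessed token, or another workspace's token all resolve to 401.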
Phase 31 — Quality + Infra Pass (Q2 2026) — SHIPPED 2026-04-13
Completed in PRs #1–#8 and documented in docs/edit-history/2026-04-13.md:
- Brand migration cleanup — LICENSE "Agent Molecule" → "Molecule AI"; new icon assets (PR #1).
- Repo structural cleanup — moved examples/remote-agent/ → sdk/python/examples/, docs/superpowers/plans/ → plugins/superpowers/plans/; deleted empty platform/plugins/; gitignored .agents/, platform/workspace-configs-templates/, backups/, logs/, test-results/; added READMEs under tests/ and docs/ (PR #3).
- MCP per-domain split — mcp-server/src/index.ts 1697 → 89 lines; 12 per-domain modules in src/tools/; shared src/api.ts; startup log now reports 87 tools (PRs #2, #4, #7).
- Canvas dialog unification — native confirm() / alert() replaced with ConfirmDialog in 7 sites; new singleButton prop + 5 tests (vitest 352 → 357).
- Platform handler decomposition — 4 oversize functions (proxyA2ARequest, Delegate, Discover, SessionSearch) split into testable helpers; +47 Go tests; handlers coverage 56.1% → 57.6%.
- Env-var documentation — .env.example gained 11 previously-undocumented vars; all 21 distinct os.Getenv / envx.* keys now documented.
- E2E hardening + CI — Phase 30.1 bearer auth + Phase 30.6 X-Workspace-ID requirements baked into test_api.sh (62/62) and test_comprehensive_e2e.sh (67/67); shared _lib.sh + _extract_token.py; new CI jobs e2e-api and shellcheck; setup-go gains module cache (PRs #5, #7, #8).
PR Workflow Rules
All PRs must follow this checklist:
- Branch: Never push to main. Always create a feature/fix branch.
- Code Review: Run /code-review skill and fix all issues before requesting merge.
- Tests: All existing tests must pass. New features require new tests.
- Documentation: Run /update-docs skill. Every PR must update:
  - docs/edit-history/ session log
  - Relevant docs in docs/ (API, architecture, frontend, etc.)
  - CLAUDE.md if routes, env vars, or commands changed
  - PLAN.md if the work completes a phase or adds new items
- E2E Test: Rebuild, restart service, and manually verify before reporting done.
- QA Review: QA Engineer reviews for edge cases, plan compliance, and documentation completeness before CEO merge approval.
- CEO Approval: Only the CEO approves merges. Never merge without explicit approval.
Ecosystem Awareness
Adjacent projects worth tracking (Holaboss, Hermes, gstack, …) are catalogued
in docs/ecosystem-watch.md. Skim quarterly,
add entries liberally, and when one of those projects ships something we
should react to, file a "Signals to react to" line in that doc and create a
Backlog entry below pointing at it. Agents doing research or strategy work
should read docs/ecosystem-watch.md first — it's the canonical starting
point for "what else is out there."
Backlog (prioritized)
1. Canvas: Org template import — Phase 20.3 (deploy org from canvas UI)
2. Canvas: Workspace search (Cmd+K) — Phase 20.3 (quick find)
3. Canvas: Batch operations — Phase 20.3 (multi-select delete/restart)
4. Sandbox: Firecracker/E2B backends — Phase 12 (production isolation)
5. NemoClaw adapter — stub exists at adapters/nemoclaw/, no implementation yet
6. Remote plugin registry — install plugins from npm/git (currently local only)
7. Agent git worktrees — per-agent branches without full clone
8. SDK follow-ups — live tool-call visibility, cost telemetry, cancel UX, governance hooks
9. Real webhook mode for channels — Phase 27 candidate. Currently polling-only; webhook needs:
   - mode: "webhook"|"polling" config field
   - PUBLIC_URL env var
   - Platform calls setWebhook on channel create (with random webhook_secret), deleteWebhook on delete
   - Canvas toggle to enable webhook mode (only when PUBLIC_URL is set)
   - Polling works fine for ≤hundreds of bots; webhook needed at thousands+ scale or for serverless
10. More channel adapters — Slack (OAuth + Events API), Discord (Bot + Gateway), WhatsApp (Cloud API)
11. Delegations list endpoint mismatch — GET /workspaces/:id/delegations returns [] while the agent's internal check_delegation_status shows active/completed delegations. One source of truth.
12. YAML-configurable per-agent repo access — new workspace_access: none|read_only|read_write field in org.yaml + :ro bind-mount for research agents; eliminates the "PM couriers documents to reports" workaround.
13. SDK executor swallows subprocess stderr — workspace-template/claude_sdk_executor.py surfaces only "Command failed with exit code 1 / Check stderr output for details" when the claude CLI crashes, making every failure opaque. Capture stderr, log at ERROR, include first ~1 KB in the A2A error response. High priority — blocked real debugging during PLAN.md coordination on 2026-04-12.
14. Agent MCP client defaults to localhost:8080 — inside a workspace container, localhost is the container itself, not the platform — so mcp__molecule__* tools fail with "platform unreachable." Inject MOLECULE_URL=${PLATFORM_URL} into every container at provision time and change the MCP client default to http://host.docker.internal:8080. High priority — blocks agents from calling platform tools (e.g. PM couldn't restart its own reports).
Note: items 11–14 previously carried sequential refs
#64–#67. Those refs were placeholder enumeration, not GitHub issues. They now collide with actual merged PRs and issues with different scopes, so the refs were removed in 2026-04-14 tick-5. If/when these items get prioritized, file real GitHub issues for them.
15. Workspace restart_prompt — user-defined restart context (#19 Layer 2) — GitHub issue #66 (new 2026-04-14 tick-4 follow-up to PR #65 which shipped Layer 1). Let config.yaml / org.yaml declare a user-authored restart_prompt that is delivered alongside the platform-generated restart-context system message — e.g. "re-read your CLAUDE.md, re-hydrate TODOs from memory, resume the active delegation." Layer 1 (platform state snapshot) already ships; Layer 2 adds the user-defined side.
Recently launched (2026-04-14 tick-4)
- GitHub issue #15 — Provisioner: auto-refresh CLAUDE_CODE_OAUTH_TOKEN from global_secrets on workspace restart → DONE via PR #64 (SetGlobal / DeleteGlobal now fan out RestartByID to every affected workspace).
- GitHub issue #19 Layer 1 — Platform-generated restart context → DONE via PR #65 (synthetic A2A message/send with metadata.kind=restart_context, system:restart-context caller prefix, 30s re-register wait). Layer 2 deferred to issue #66 (see Backlog item 15 above).
Recently launched (2026-04-15 overnight sweep — ticks 17–30+, ~27 PRs)
Security hardening cluster. Roughly half the sweep was closing auth gaps surfaced by the Security Auditor's hourly audit cron:
- #94 RFC-1918 + link-local in registry URL validator
- #99 AdminAuth gate on GET /workspaces (topology leak / #104)
- #106 path-sanitize + admin-gate POST /org/import (#103 HIGH)
- #110 revoke workspace_auth_tokens on workspace delete
- #119 IPv6 SSRF blocklist (fe80::/10, ::1/128, fc00::/7) + scheduler unit tests
- #162 field-level authz on PATCH /workspaces/:id (#138 — cosmetic vs sensitive split)
- #155 wire existing SecurityHeaders middleware into router
- #167 gate 6 previously-unauth routes behind AdminAuth (#164 CRITICAL anon bundles/import; #165 HIGH events+bundles/export topology leak; #166 MED viewport+liveness)
- #185 AdminAuth on GET /approvals/pending (#180)
- #200 AdminAuth on POST /templates/import (#190 HIGH)
- #203 CanvasOrBearer middleware — route-split for #168 canvas regression, only PUT /canvas/viewport; rejected PR #194's broader Origin-fallback approach because it would have re-opened #164
- #209 source_id spoof defense in activity.Report (cherry-picked from the rejected #169 batch)
- #233 resolveInsideRoot on POST /workspaces template/runtime (#226 MED)
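The address blocklists from #94 and #119 can be illustrated with Python's standard ipaddress module. This is a sketch of the idea only: the function name is hypothetical, the list contains just the ranges this plan names (RFC 1918, link-local, and the three IPv6 prefixes), and the real validator may cover more.

```python
import ipaddress

# Ranges named in the plan; the actual validator may block additional ones.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
    "169.254.0.0/16",                                  # IPv4 link-local
    "fe80::/10", "::1/128", "fc00::/7",                # IPv6 ranges from #119
)]


def is_blocked_address(host: str) -> bool:
    """Return True when `host` is an IP literal inside a blocked range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Resolving hostnames is out of scope for this sketch.
        return False
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_NETWORKS
    )
```

A production validator would also need to check the addresses a hostname resolves to, since a public DNS name can point at a private address.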
Data integrity. Three bugs that would have silently corrupted state:
- #212 CRITICAL migration-runner bug — RunMigrations globbed *.sql and alphabetically ran .down.sql BEFORE .up.sql on every boot, wiping workspace_auth_tokens (and 018/019 pairs). Filter fix + unit test in postgres_migrate_test.go.
- #224 YAML injection in generateDefaultConfig — body.Name now emitted as a double-quoted YAML scalar with all control chars escaped. Structural test (parse + verify key count).
- #236 log-injection in the #209 security-event log line — attacker-controlled source_id echoed via %s allowed fake log entries; switched to %q.
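The ordering problem behind #212 is easy to reproduce: in an alphabetical sort, ".down.sql" comes before ".up.sql" for the same migration number. The sketch below shows the naive ordering and a filter in the spirit of the fix. The file names are made up for the example, and the real runner is Go, not Python.

```python
from typing import Iterable, List


def select_up_migrations(filenames: Iterable[str]) -> List[str]:
    """Keep only forward migrations, in sorted order."""
    return sorted(f for f in filenames if f.endswith(".up.sql"))


# Hypothetical file names that mimic an up/down pair layout.
files = [
    "018_tokens.up.sql",
    "018_tokens.down.sql",
    "019_example.up.sql",
    "019_example.down.sql",
]

# What a plain *.sql glob plus alphabetical sort yields: each pair runs
# its .down.sql first, so the table is dropped and recreated empty.
naive_order = sorted(f for f in files if f.endswith(".sql"))
```

With the filter in place, only the two .up.sql files remain, which is the behaviour the unit test in postgres_migrate_test.go is described as pinning down.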
CI / infra.
- #186 + controlplane #28 — every CI job migrated from ubuntu-latest to [self-hosted, macos, arm64] (Mac mini hongming-m1-mini). Non-trivial: services: replaced with inline docker run containers (ports 15432/16379), actions/setup-python bypassed via Homebrew python3.11 on $GITHUB_PATH, docker/setup-qemu-action added for cross-arch builds. Workaround for GH Actions billing cap on private repos.
- #149 independent heartbeat pulse goroutine so long cron fires don't look stale on /admin/liveness (#140)
- #211 migration runner regression (see #212 above — PR #212 is the fix)
- Fly registry FLY_API_TOKEN rotated to a deploy token scoped to molecule-tenant (previously personal token, invalidated by flyctl auth login during the malware cleanup)
Platform / Scheduler reliability.
- #95 panic-recover in scheduler tick() + per-fire goroutines (closes #85)
- #207 concurrency-aware skip — scheduler.fireSchedule reads workspaces.active_tasks and advances next_run_at + records a cron_run row with status='skipped' instead of colliding with a busy agent (#115)
- #206 surface error_detail in schedule history API (#152 problem B)
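The concurrency-aware skip from #207 can be sketched as follows. This models only the decision described above (busy agent: record a skipped run and still advance next_run_at). The Schedule dataclass, the fixed interval standing in for a cron expression, and the in-memory run log are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Schedule:
    workspace_id: str
    interval: timedelta    # stand-in for a parsed cron expression
    next_run_at: datetime


def fire_schedule(
    schedule: Schedule,
    active_tasks: int,
    now: datetime,
    runs: List[Tuple[str, str]],
) -> str:
    """Fire or skip one schedule; `runs` plays the role of the cron_run table."""
    status = "skipped" if active_tasks > 0 else "fired"
    runs.append((schedule.workspace_id, status))
    # Advance in both cases so a busy agent is not retried on every tick.
    schedule.next_run_at = now + schedule.interval
    return status
```

Recording the skip rather than dropping it silently keeps the schedule history honest about why a run did not happen.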
Workspace runtime features.
- #205 idle-loop reflection pattern — opt-in idle_prompt + idle_interval_seconds in config.yaml; self-sends when heartbeat.active_tasks == 0. Hermes/Letta shape.
- #208 Hermes Phase 1 multi-provider registry — 15 providers via adapters/hermes/providers.py (Nous, OpenRouter, OpenAI, Anthropic, xAI, Gemini, Qwen, GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral). 26 tests.
- #198 A2A protocol compliance batch (#173/#174/#175): cancel() emits TaskStatusUpdateEvent(canceled, final=True), stateTransitionHistory=True in AgentCapabilities. Regression: push_sender=PushNotificationSender() crashed on startup because PushNotificationSender is abstract — reverted in #210.
- #216 idle-loop pilot enabled on Technical Researcher workspace.
- #225 + #235 auth_headers() on /registry/register + initial_prompt + idle loop self-posts (#215/#220)
- #231 Claude SDK stderr probe for proper rate-limit error attribution (#160 diagnostics)
Controlplane (molecule-controlplane).
- #19 + #20 Grafana Cloud remote-write counter registry (cp_requests_total), push loop to prometheus-prod-32-prod-ca-east-0.grafana.net, Basic auth with user 3116422
- #21 AWS KMS envelope encryption — per-secret DEK via GenerateDataKey, dual-mode (v2 blobs via KMS, legacy via static key, auto-routes by leading byte)
- #24 /cp/status deep probe for Betterstack
- #26 + #27 public /legal/{terms,privacy,dpa,acceptable} pages from embedded markdown + smoke coverage
- Isolation red-team test suite + observability runbooks (Grafana dashboard, Betterstack, Stripe Atlas)
Self code-review follow-ups (#228 + #232). Ran /code-review on the batch merges, surfaced 8 🟡 issues, split into Go (#228) and Python/docs (#232):
- CanvasOrBearer invalid-bearer fall-through fix
- short() helper replacing unsafe [:N] slices in scheduler.go
- 6 new tests (TestShort_helper, TestRecordSkipped_*, TestActivityHandler_Report_*, TestHistory_IncludesErrorDetail)
- idle-loop hardening (asyncio.get_running_loop(), IDLE_FIRE_TIMEOUT_SECONDS clamp, typed exception handling, add_done_callback for fire-and-forget error logging)
- idle_prompt / idle_interval_seconds documented in org.yaml defaults
- New docs/runbooks/admin-auth.md — the three middleware variants + three-question test for adding to CanvasOrBearer
Test counts post-sweep: +70 Go (816 total), +40 Python (1180 total), +0 Canvas vitest (453 unchanged — UI/a11y patches only).
Outstanding (user action): #126 Slack adapter (Phase-H product decision), #160 Claude Max OAuth quota (wait for 2026-04-17 23:00Z reset OR upgrade OR switch to ANTHROPIC_API_KEY), #191 runner persistent-state docs (P3), #199 Fly registry token (resolved this session but publish-platform-image re-run pending runner), Stripe Atlas application (launch blocker, 2-week lead).
Recently launched (2026-04-15 tick-9)
- Phase 32 Phase B.2 (image pipeline) — PR #80 (merged c3cc8e87) adds .github/workflows/publish-platform-image.yml: on every main-merge touching platform/**, builds platform/Dockerfile and pushes ghcr.io/molecule-ai/platform:latest + :sha-<commit> to GHCR. Paired with the private molecule-controlplane Fly + Neon provisioner (PR #3 there, merged 2e85d5ad) that reads TENANT_IMAGE env and boots tenant Fly Machines from this image. Tick-8 docs-sync PR #79 (merged d53a1287) also landed.
Recently launched (2026-04-14 tick-8)
- Phase 32 PR #1 — TenantGuard middleware (PR #78, merged 57a05686). Public repo's only SaaS hook: when MOLECULE_ORG_ID env is set, non-allowlisted requests require matching X-Molecule-Org-Id header or 404. Unset → passthrough (self-hosted unchanged). Allowlist is exact-match: /health + /metrics. Paired with the private Molecule-AI/molecule-controlplane repo scaffolded this tick (Fly Machines provisioner stub, /cp/orgs CRUD, subdomain→fly-replay router, migrations 001-003 for organizations / org_instances / org_members). +6 TestTenantGuard_* tests. Phase 32 plan: follow-up PRs wire real Fly provisioner, WorkOS AuthKit, Stripe, Cloudflare, signup UX — all in the private repo except the single public middleware.
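The TenantGuard rules described above reduce to a small decision function. The sketch below mirrors them in Python for illustration; the real middleware is Go, and the function name, the None-means-pass convention, and the plain-mapping header lookup are assumptions.

```python
from typing import Mapping, Optional

ALLOWLIST = frozenset({"/health", "/metrics"})  # exact-match paths from the plan


def tenant_guard_status(
    path: str,
    headers: Mapping[str, str],
    org_id: Optional[str],
) -> Optional[int]:
    """Return 404 to reject, or None to hand the request to the next handler.

    `org_id` stands in for the MOLECULE_ORG_ID env value.
    """
    if not org_id:
        return None  # env unset: self-hosted passthrough
    if path in ALLOWLIST:
        return None  # exact match only, so /health/extra is still guarded
    if headers.get("X-Molecule-Org-Id") == org_id:
        return None
    return 404
```

Real HTTP header names are case-insensitive, which a plain mapping lookup does not model. Answering 404 rather than 401 or 403 avoids confirming to a caller that a tenant exists at that address.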
Recently launched (2026-04-14 tick-7)
- GitHub issue #24 — Runtime-added workspace_schedules drift on org re-import → DONE via PR #76 (new source column on workspace_schedules via migration 022; org/import now upserts with ON CONFLICT (workspace_id, name) DO UPDATE ... WHERE source='template', so runtime-added rows survive re-imports; legacy rows backfilled to 'template'; +3 tests).
- GitHub issue #51 — PM hardcoded audit-category routing → DONE via PR #75 (generic category_routing: block in org-templates/<name>/org.yaml defaults + per-workspace override; rendered into each workspace's config.yaml via renderCategoryRoutingYAML using yaml.Node + yaml.Marshal for safe escaping; PM prompt replaced with generic config-lookup; +6 tests).
- PR #74 — org-templates/molecule-dev/org.yaml role overrides shrunk to just the deltas now that UNION semantics (PR #71) are in effect — removes verbose re-listing of defaults across PM, Research Lead, Research sub-roles, Security Auditor, UIUX Designer.
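The conditional upsert from PR #76 can be tried out with SQLite, which accepts the same ON CONFLICT ... DO UPDATE ... WHERE shape (SQLite 3.24 or newer). This is a sketch under assumptions: the platform uses Postgres, and the cron column, the 'runtime' source value, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE workspace_schedules (
        workspace_id TEXT NOT NULL,
        name         TEXT NOT NULL,
        cron         TEXT NOT NULL,            -- hypothetical column name
        source       TEXT NOT NULL DEFAULT 'template',
        UNIQUE (workspace_id, name)
    )
""")

# Re-import path: only overwrite rows that originally came from a template.
UPSERT = """
    INSERT INTO workspace_schedules (workspace_id, name, cron, source)
    VALUES (?, ?, ?, 'template')
    ON CONFLICT (workspace_id, name) DO UPDATE
        SET cron = excluded.cron
        WHERE workspace_schedules.source = 'template'
"""

conn.execute("INSERT INTO workspace_schedules VALUES ('ws-1', 'daily', '0 9 * * *', 'template')")
conn.execute("INSERT INTO workspace_schedules VALUES ('ws-1', 'adhoc', '*/5 * * * *', 'runtime')")

# Simulate an org re-import that mentions both schedule names.
conn.execute(UPSERT, ("ws-1", "daily", "0 10 * * *"))
conn.execute(UPSERT, ("ws-1", "adhoc", "0 0 * * *"))

rows = dict(conn.execute("SELECT name, cron FROM workspace_schedules"))
```

After the simulated re-import, the template-sourced row picks up the new cron value while the runtime-added row keeps its own, which is the drift fix the issue describes.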
Recently launched (2026-04-14 tick-6)
- GitHub issue #68 — Per-workspace plugins: REPLACE semantics caveat → DONE via PR #71 (mergePlugins helper in platform/internal/handlers/org.go now UNIONs per-workspace with defaults.plugins; !plugin or -plugin prefix on a per-workspace entry opts a default out; +5 TestPlugins_* tests). Role overrides in org-templates/*/org.yaml can now declare just the delta instead of restating every default.
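The UNION-with-opt-out behaviour of mergePlugins can be sketched in a few lines. The real helper is Go and lives in org.go; this Python version only illustrates the semantics stated above, and the ordering rule (defaults first, then new per-workspace entries) is an assumption.

```python
from typing import List

OPT_OUT_PREFIXES = ("!", "-")


def merge_plugins(defaults: List[str], workspace: List[str]) -> List[str]:
    """Union defaults with per-workspace plugins, honouring !name / -name opt-outs."""
    opted_out = {p[1:] for p in workspace if p.startswith(OPT_OUT_PREFIXES)}
    merged = [p for p in defaults if p not in opted_out]
    for p in workspace:
        if p.startswith(OPT_OUT_PREFIXES):
            continue  # opt-out markers are directives, not plugins
        if p not in merged:
            merged.append(p)
    return merged
```

For example, merge_plugins(["a", "b", "c"], ["-b", "d"]) keeps a and c from the defaults, drops b, and appends d, so a role override only has to state its delta.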
Recently launched (2026-04-14 tick-5)
- PR #70 — Wired the 12 modular plugins from PR #63 (tick-4) into the default molecule-dev org template. defaults.plugins expands from 3 → 9 (safety hooks + operational-memory skills become universal); PM role gains molecule-workflow-triage + molecule-workflow-retro, Security Auditor gains molecule-skill-code-review + molecule-skill-cross-vendor-review + molecule-skill-llm-judge. Verbose per-role re-listing is a consequence of REPLACE (not UNION) semantics in platform/internal/handlers/org.go; union-semantics proposal tracked as issue #68.
- PR #69 — Backlog items 11–14 stripped of stale sequential refs #64–#67 (see footnote near item 15 above).
Test Coverage
| Stack | Tests | Framework |
|---|---|---|
| Go (platform) | 726 | go test -race (raw PASS lines incl. subtests; +6 top-level Test* this tick: #64 secrets auto-restart x2, #65 restart-context x4) |
| Python (workspace) | 1,140 | pytest |
| Canvas (frontend) | 357 | Vitest |
| SDK (python) | 132 | pytest |
| MCP server | 97 | Jest |
| Total | 2,452 | |
E2E: 67/67 comprehensive checks passing, 62/62 API tests (also gated in CI e2e-api job), shellcheck-clean across all 5 E2E scripts.
Team Assignments
| Agent | Current Focus |
|---|---|
| PM | Sprint coordination, backlog prioritization |
| Dev Lead | Engineering planning, PR review |
| UIUX Designer | UX specs for Phase 20 (DONE — 5 specs delivered) |
| Frontend Engineer | Phase 20.3 remaining items (org import, search, batch) |
| Backend Engineer | Sandbox production backends, API completeness |
| QA Engineer | Review every PR for docs + plan compliance |
| DevOps Engineer | CI/CD, Docker image optimization |
| Security Auditor | API key handling, path traversal, auth review |
Next Steps
- Frontend Engineer implements remaining Phase 20.3 items (org import from canvas, Cmd+K search)
- Backend Engineer scopes Firecracker/E2B sandbox backends (Phase 12)
- QA Engineer reviews PR #52 for docs compliance before merge
- All agents use GITHUB_TOKEN env var to clone repo, branch, and create PRs
Plugin Adaptor System — shipped; deferred follow-ups only
The system is done. Landed (see feat/plugin-adaptor-registry and feat/agentskills-compliance):
per-runtime plugin adaptors, hybrid resolver (registry > plugin-shipped >
raw-drop), AgentskillsAdaptor covering rule+skill plugins for all
runtimes, /plugins?runtime= filter, /workspaces/:id/plugins/available
endpoint, molecule-plugin SDK, gemini org parity with molecule-dev,
and full agentskills.io spec compliance for all first-party skills
(installable in Claude Code, Cursor, Codex, and ~35 other skill-compatible
tools — see docs/plugins/agentskills-compat.md).
Deferred, not blocking:
- Upstream runtime-adapters/ extension to agentskills.io spec — once we've lived with our own per-runtime adapter model for about a month, propose it as a spec extension to agentskills/agentskills so other tools can share Molecule AI-authored adaptors.
- Install-from-GitHub-URL flow — POST /plugins/install {git_url} that clones a repo into the registry, validates the manifest, and runs the adaptor through a sandbox. Needs signature/version pinning and a review of the adaptor-execution threat model before shipping.
- Promote-to-default UI — today, promoting a community plugin to "curated" means manually copying its adapters/<runtime>.py into workspace-template/plugins_registry/<plugin>/. Later add a canvas button + PR template that opens an upstream PR automatically.
- Plugin packs — manifest that lists other plugins to bundle (superpowers-pack → install superpowers-tdd + superpowers-debug + …). Skip until a real user asks; first-party plugins are small enough to install individually today.
- Hot-reload on DeepAgents — upstream docs say skills/sub-agents are startup-only; would need platform-level container restart on plugin file change. Defer until users complain.
- Atomic split of first-party plugins — superpowers and ecc still ship as multi-skill bundles. Pipeline already supports splitting but non-urgent.
- Sub-agent plugins for non-DeepAgents runtimes — Claude Code / LangGraph don't have a native sub-agent feature; emulating via tool-routing is possible but invasive. Defer.
- Workspace install tracking table — a workspace_plugin_installs table would let uninstall call the adaptor's uninstall() path reliably. Today uninstall is a rm -rf /configs/plugins/<name>, which leaves copied skill dirs behind. Low user impact.
- Shared org-template system-prompt.md via _shared/ — DRY molecule-dev and molecule-worker-gemini. Drift risk; revisit at 3+ orgs.
Phase 32 — Cloud SaaS launch (2026-Q2/Q3)
Goal: ship Molecule AI as a multi-tenant cloud SaaS (not just self-hosted per-customer). Ordered by dependency + ROI.
Current state (2026-04-15)
Live infrastructure:
- Control plane deployed: https://molecule-cp.fly.dev (Fly app molecule-cp, 2 machines, Neon project molecule-cp/cool-sea-89357706)
- Tenant app: Fly app molecule-tenant (Neon parent project molecule-tenants/dawn-bar-08311714, tenants get a branch per org)
- Shared Redis: Upstash grateful-prawn-89393.upstash.io (key-prefix isolation, Phase H moves to per-tenant)
- Container registry: registry.fly.io/molecule-tenant:latest (mirrored from ghcr.io/molecule-ai/platform:latest via GH Actions on every main push)
- First real tenant provisioned: org acme → Fly machine + Neon branch + encrypted URLs in org_instances
- WorkOS AuthKit live at /cp/auth/{signup,login,callback,signout,me} — hosted signup redirects correctly; see https://molecule-cp.fly.dev/cp/auth/signup
- Stripe billing scaffold deployed in orgs-only mode (no Stripe creds configured yet; webhook handler + signature verification code ready)
- Domain: moleculesai.app (DNS not yet wired — subdomain routing works via X-Molecule-Org-Slug header pending Cloudflare)
Phase status (post 2026-04-15 overnight sweep):
- A — Foundation (accounts, tokens, domain): ✅ done
- B — Fly provisioner + Neon branching: ✅ done
- C — WorkOS AuthKit scaffold + RequireSession + org-ownership check: ✅ done
- D — Stripe billing scaffold + auth-scoped checkout + plan quotas: ✅ code done; live keys pending Stripe Atlas
- E — Cloudflare + DNS *.moleculesai.app + per-tenant Vercel canvas: ✅ done
- F — Sign-up UX + onboarding: ✅ basic flow done (signup / org create / canvas redirect); polish + email pending
- G — Observability + quotas + admin: ✅ Sentry + Grafana remote-write + /cp/status Betterstack probe + per-org rate limiter; admin panel /cp/admin/* pending
- H — Hardening: ⏳ partial — AWS KMS envelope encryption ✅ (controlplane PR #21), tenant-isolation red-team CI gate ✅ (isolation_test.go), legal pages ✅ (/legal/* from controlplane PR #26); load test + Stripe Atlas application + status page custom domain pending
- I — Launch: pending Stripe Atlas (~2 week lead)
Live infrastructure deltas (post-sweep):
- Migration runner safety fix landed (#212) — *.down.sql filter; was wiping workspace_auth_tokens on every restart
- Workspace auth tokens now revoked on workspace delete (#110)
- All known unauth admin routes gated; #138 canvas regression resolved via field-level authz + CanvasOrBearer middleware
- Self-hosted Mac mini CI runner replaced GH-hosted Linux to bypass private-repo Actions billing cap; FLY_API_TOKEN rotated to a deploy token scoped to molecule-tenant after the personal token was invalidated by flyctl auth login during the 2025-12-06 cryptominer cleanup
- /legal/{terms,privacy,dpa,acceptable} live at https://app.moleculesai.app/legal/*
Known open issues on the live system:
- Tenant /workspaces returns Neon pooler warnings (unnamed prepared statement does not exist) — lib/pq + Neon pooler incompatibility, tracked for lib/pq → pgx migration in a later phase
- #160 Claude Max OAuth quota exhausted on the agent-fleet token until 2026-04-17 23:00 UTC; mitigations: wait, upgrade plan, OR switch workspace containers to ANTHROPIC_API_KEY env var
- #191 self-hosted runner persistent-state docs (P3, low urgency)
- #199 Fly registry token — resolved in the 2026-04-15 sweep but publish-platform-image re-run pending runner availability
Companion repo: Molecule-AI/molecule-controlplane (private). n8n-style open-core split: this public repo stays OSS (tenant binary + plugins + channels, contributable surface); control plane (orgs / signup / billing / provisioner / routing) is private. See molecule-controlplane/PLAN.md for its roadmap.
Tier 1 — blocks multi-tenant launch
- Multi-tenancy:
organizationstable,org_idFK +WHERE org_id = $caller_orgfilter on every row-returning handler (workspaces,workspace_secrets,global_secrets,activity_logs,structure_events,agent_memories,workspace_schedules,workspace_channels). Middleware resolves caller's org from session token → ctx. Full security audit of tenant isolation before first external user. - Human auth + orgs: WorkOS AuthKit (NOT build-yourself, NOT Clerk — WorkOS treats per-org SSO as first-class; Clerk treats it as an upsell). Keep Phase 30.1 bearer tokens for machine-to-machine (agents). Stripe integration via WorkOS hooks.
- Container isolation: replace raw-Docker-socket provisioner with Fly Machines API (Firecracker microVMs, per-workspace isolation, sub-second boot, pay-per-second). Today's shared `/var/run/docker.sock` is an RCE-to-host footgun that cannot ship multi-tenant. `provisioner` interface stays; only the backend swaps. Docker path remains for local dev.
- Stripe billing: subscriptions + usage metering (workspace-hours, LLM-token pass-through, storage), trial flow, dunning, invoices.
- Per-org resource quotas: tier memory/CPU is configurable (PR #58) but unenforced at provision time. Add per-org ceilings: max workspaces, max concurrent-running, max total memory.
- Managed Postgres + Redis: move off `docker-compose` for prod. Neon (serverless, branch-per-PR) for Postgres; Upstash for Redis. Alternative: drop Redis entirely; `LISTEN/NOTIFY` + advisory locks cover heartbeat TTL + URL cache.
- Secrets at rest via KMS: current `SECRETS_ENCRYPTION_KEY` is a single static AES-256 key. Move to AWS/GCP KMS-backed envelope encryption; the `secrets_encryption_version` table slot is already reserved for rotation.
- Migration runner out of app boot: a bad migration currently crashes platform boot with no rollback. Extract to goose as a release step / init container. Auto-discovery runner stays for dev mode only.
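The `org_id` filter described in the multi-tenancy item can be sketched as follows. This is a minimal illustration only: it is written in Python with an in-memory SQLite database standing in for Postgres (the platform itself is Go), and `list_workspaces`, `demo_db`, and every column other than `org_id` are hypothetical names chosen for the example.

```python
import sqlite3


def list_workspaces(conn, caller_org):
    """Return only the workspaces owned by the caller's org."""
    # The org filter is bound as a query parameter, never string-interpolated.
    rows = conn.execute(
        "SELECT id, name FROM workspaces WHERE org_id = ? ORDER BY id",
        (caller_org,),
    )
    return [{"id": r[0], "name": r[1]} for r in rows]


def demo_db():
    """Build an in-memory stand-in for the real schema (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE organizations (id TEXT PRIMARY KEY);
        CREATE TABLE workspaces (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            org_id TEXT NOT NULL REFERENCES organizations(id)
        );
        INSERT INTO organizations VALUES ('org_a'), ('org_b');
        INSERT INTO workspaces (name, org_id) VALUES
            ('alpha', 'org_a'), ('beta', 'org_b'), ('gamma', 'org_a');
        """
    )
    return conn
```

In this sketch the handler never sees rows from another org, because the caller's org (resolved by middleware in the plan above) is the only value that reaches the `WHERE` clause. The same shape would apply to each row-returning handler, and an automated tenant-isolation test can assert that a second org gets an empty or disjoint result.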
Tier 1 follow-ups (before customer #1)
- Observability: wire `/metrics` to a scraper (Grafana Cloud or self-hosted). Add Sentry for Go + Next.js error tracking. Langfuse stays for LLM traces.
- Rate limiting per-org: global `RATE_LIMIT=600/min` is a shared bucket today. Needs per-org + per-endpoint buckets.
- Cloudflare in front: WAF + CDN + DDoS. Free tier covers pre-revenue.
- Sign-up / onboarding flow: landing → signup → first workspace in 60 seconds. No such flow today.
- Transactional email: Resend or Postmark.
- Admin panel: view orgs, suspend accounts, see usage, issue refunds. SQL-only at first; UI by ~50 orgs.
- Privacy policy + ToS + DPA: real ones, vetted. GDPR / CCPA data-export + deletion endpoints (workspace-export already exists; need org-level).
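The per-org + per-endpoint buckets from the rate-limiting item could look roughly like the sketch below. It assumes a fixed one-minute window counter keyed by `(org_id, endpoint)`; the class name `OrgRateLimiter`, the in-process dictionary, and the injectable clock are all hypothetical, and a production version would likely need shared state across platform instances.

```python
import time


class OrgRateLimiter:
    """Fixed-window request counter keyed by (org_id, endpoint)."""

    def __init__(self, limit_per_min, clock=time.monotonic):
        self.limit = limit_per_min
        self.clock = clock  # injectable so tests can control time
        self.windows = {}   # (org_id, endpoint) -> (window_start, count)

    def allow(self, org_id, endpoint):
        """Return True if this request fits in the caller's current window."""
        now = self.clock()
        key = (org_id, endpoint)
        start, count = self.windows.get(key, (now, 0))
        if now - start >= 60:
            start, count = now, 0  # a new one-minute window begins
        if count >= self.limit:
            self.windows[key] = (start, count)
            return False
        self.windows[key] = (start, count + 1)
        return True
```

Because each key has its own counter, one org exhausting its budget on one endpoint does not consume another org's budget, which is the property the shared `RATE_LIMIT=600/min` bucket lacks today.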
Tier 2 — tech-stack upgrades (high ROI, non-blocking)
- Go platform: migrate `lib/pq` → pgx/v5 (1–2 days; `lib/pq` in maintenance since ~2021). Then sqlc incrementally for new queries; keeps the no-ORM philosophy + typed Go.
- Platform async: River (Postgres-backed, Go-native job queue). Delegation dispatch, `workspace_schedules` cron, future billing events + webhook fan-out all migrate cleanly. NOT Temporal: Temporal already ships in workspace-template as an agent tool; keep the separation.
- Frontend: TanStack Query for server state. Zustand keeps pure UI state. Stops reimplementing cache / refetch / dedup. WS updates flow via `qc.setQueryData`. Single highest-ROI frontend refactor.
- Turbopack for `next build`: one flag, 2–5× cold-build speedup.
- Python workspace runtime → uv: `uv pip install` in `entrypoint.sh` cuts workspace cold-start 10–100×. User-visible latency win.
- Python MCP client inside runtime: today `mcp-server/` exposes the platform as an MCP server; agents inside workspaces can't yet consume external MCP servers. Closing the gap joins the winning 2026 ecosystem.
- shadcn/ui CLI convention: already Radix + Tailwind; adopt `npx shadcn add …` passively for new components. No rewrite.
Tier 3 — explicitly NOT doing
- Kubernetes: company-of-one cannot run K8s. Fly Machines covers isolation without the ops tax.
- ORM (GORM / ent / bun): raw-SQL + sqlc covers every case.
- Framework swap (Next → Vite / TanStack Start): 2-week rewrite buys nothing users see.
- Auth-from-scratch: every hour on auth is an hour not on product.
- Canvas library swap (xyflow → tldraw): xyflow is still the correct tool for typed node graphs.
Tier 4 — compliance / enterprise (when revenue lands)
- SOC 2 via Drata / Vanta
- Status page (Betterstack or Instatus)
- Staging environment that mirrors prod
- Blue-green / canary deploy pipeline
- Per-org backup + point-in-time restore
- Load testing (`hey` / `vegeta`); current per-node ceiling unknown
Success criteria for Phase 32
- Customer can sign up at moleculesai.app, create an org, deploy their first workspace, send their first message in < 5 minutes.
- Two orgs on the same cluster cannot observe each other's workspaces, secrets, memory, or activity — verified by automated tenant-isolation test + manual red-team.
- Fly Machines cost per active workspace-hour documented and reproducible.
- Stripe-backed subscription + usage-based add-ons working end-to-end in sandbox.
- One paying design partner on the cluster, paying a real invoice.
Phase 33: Wildcard DNS + Cloudflare Worker Proxy
Goal: Eliminate DNS propagation delays and NXDOMAIN caching for tenant subdomains. Multi-tenant platforms such as Vercel, Railway, and Fly.io use this pattern: wildcard DNS plus an edge proxy that routes by hostname.
Docs: `docs/architecture/wildcard-dns-proxy.md`
Phase 33.1 — Worker + wildcard DNS (no tenant changes)
- Create Cloudflare Worker that extracts slug from hostname, looks up backend IP from CP API, proxies request to EC2
- Add `GET /cp/orgs/:slug/instance` endpoint to CP (public, rate-limited)
- Add `*.moleculesai.app` wildcard DNS record (proxied, orange cloud)
- Worker serves static "provisioning" splash page when tenant not ready
- Deploy Worker via `wrangler deploy` + GitHub Actions
- Verify Worker routing works for existing tenants alongside old A records
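The routing decision the Worker has to make (extract slug, look up backend, proxy or show the splash page) can be sketched as below. The real Worker would run on Cloudflare's runtime rather than in Python; this is only a language-neutral illustration of the logic. `tenant_slug`, `route`, the `RESERVED` set, and the single-label restriction are assumptions made for the example, and `lookup_instance` stands in for a call to `GET /cp/orgs/:slug/instance`.

```python
BASE_DOMAIN = "moleculesai.app"
# Assumption: subdomains that serve the product itself, not a tenant.
RESERVED = {"app", "www", "api"}


def tenant_slug(hostname):
    """Return the tenant slug for <slug>.moleculesai.app, else None."""
    host = hostname.split(":")[0].rstrip(".").lower()  # drop port, trailing dot
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    slug = host[: -len(suffix)]
    # Design assumption: only single-label subdomains are tenant slugs.
    if not slug or "." in slug or slug in RESERVED:
        return None
    return slug


def route(hostname, lookup_instance):
    """Decide what the edge proxy should do with a request."""
    slug = tenant_slug(hostname)
    if slug is None:
        return ("not_found", None)
    backend = lookup_instance(slug)  # hypothetical CP lookup, None if not ready
    if backend is None:
        return ("splash", None)      # tenant exists in DNS but is still booting
    return ("proxy", backend)
```

Matching on the `"." + BASE_DOMAIN` suffix rather than on the bare domain string avoids treating a lookalike host such as `evilmoleculesai.app` as a tenant.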
Phase 33.2 — Stop per-tenant DNS records
- Remove Cloudflare A record creation from `ec2.go` provisioner
- Remove Cloudflare DNS cleanup from deprovision/purge cascade
- Existing A records coexist harmlessly (explicit wins over wildcard)
Phase 33.3 — Remove Caddy from EC2
- Worker handles TLS termination — EC2 runs plain HTTP only
- Remove Caddy install + Caddyfile from EC2 user-data script
- EC2 security group: allow inbound HTTP from Cloudflare IPs only
- ~30s faster cold start (no apt-get caddy, no Let's Encrypt)
Phase 33.4 — Cleanup
- Delete old per-tenant A records from Cloudflare
- Remove `cloudflareapi/` package from CP (Worker replaces it)
- Update `docs/runbooks/saas-secrets.md` with Worker secrets
Success criteria for Phase 33
- New org subdomain resolves instantly (zero DNS wait)
- No NXDOMAIN caching — user never sees "site can't be reached"
- Provisioning splash page shown while EC2 boots (auto-refreshes)
- Cold start ~30s faster (no Caddy/Let's Encrypt)
- Cost: Cloudflare Worker free tier or $5/mo
Phase 35: SaaS Production Hardening (post-2026-04-17 retrospective)
Goal: Address security gaps, remove debug code, fix workspace registration, and reduce boot time identified during the SaaS buildout session. See `docs/retrospectives/2026-04-17-saas-buildout.md` for full context.
Phase 35.1 — Security (CRITICAL, before any public launch)
- Fix #756 — X-Workspace-ID header forge bypasses CanCommunicate (derive callerID from authenticated token, not raw header)
- Fix #757 — GLOBAL memory poisoning mitigations (content delimiters + audit log at minimum)
- Remove ADMIN_TOKEN from public `/cp/orgs/:slug/instance` endpoint; store it in Worker KV at provision time instead
- Encrypt ADMIN_TOKEN in `org_instances` table (use envelope key)
- Remove debug HTTP server (:9999) from workspace boot script
- Remove `set -ex` from boot scripts (leaks env vars to EC2 console)
- Restrict workspace EC2 security group (Cloudflare IPs + tenant IP only)
- Add HTTPS between Worker and EC2 (or Cloudflare Tunnel)
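The fix direction named for #756 (derive the caller ID from the authenticated token, not from the raw header) can be illustrated with the sketch below. It is not the platform's implementation: `authenticated_caller`, `token_digest`, and the digest-to-workspace mapping are hypothetical, and the example is in Python for brevity although the platform is Go.

```python
import hashlib


def token_digest(token):
    """Hash a bearer token so the store never holds it in plaintext."""
    return hashlib.sha256(token.encode()).hexdigest()


def authenticated_caller(headers, token_store):
    """Resolve the caller workspace from the bearer token only.

    Any client-supplied X-Workspace-ID header is deliberately ignored:
    it is attacker-controlled and must not feed an authorization check.
    """
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return None
    token = auth[len("Bearer "):]
    # token_store maps sha256(token) -> workspace id (hypothetical layout).
    return token_store.get(token_digest(token))
```

With this shape, a request that presents workspace A's token alongside a forged `X-Workspace-ID` for workspace B is still treated as workspace A, so a check like CanCommunicate runs against the identity the token actually proves.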
Phase 35.2 — Workspace registration fix
- Pass workspace auth token in EC2 boot script env so runtime can register with `POST /registry/register`
- Or: have runtime request a token at startup via `GET /admin/workspaces/:id/test-token`
- Verify workspace status flips to "online" on Canvas after boot
- Test full Canvas flow: deploy → STARTING → online → chat works
Phase 35.3 — Boot time optimization
- Pre-baked AMI per runtime (Packer or EC2 Image Builder):
  - `ami-hermes`: Python + openai + anthropic + molecule-runtime + hermes adapter
  - `ami-claude-code`: Node + claude-code SDK + molecule-runtime
  - `ami-langgraph`: Python + langchain + langgraph + molecule-runtime
- Runtime switch = launch from different AMI. Boot ~30s vs current ~9 min
- Remove apt-get + pip install from boot script (only config + secrets + start)
Phase 35.4 — Stability + CI
- Fix go.mod replace directive (PR #900) — unblocks all CI
- Use stable origin IP for wildcard DNS (dedicated proxy or Tunnel)
- Add workspace boot integration test to CI
- Add SaaS tenant smoke test (`tests/e2e/test_saas_tenant.sh`) to CI
- Clean up Cloudflare edge cache poisoning from session (or wait ~24h for natural expiry)
Infra footnote — Temporal
docker-compose.infra.yml now includes Temporal (:7233 gRPC, :8233 Web
UI) backing workspace-template/builtin_tools/temporal_workflow.py for
durable long-running agent workflows. All infra services share the
molecule-monorepo-net Docker network, which infra/scripts/setup.sh
creates idempotently. Temporal currently runs with no auth on
0.0.0.0:7233 — dev-only; any production deployment must front it with
mTLS, API keys, or a reverse proxy before exposing the cluster.