Hongming Wang 76d3b32ab9 fix: resolve PLAN.md merge conflict — keep both Phase 34 and Phase 36

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-17 21:41:32 -07:00

47 KiB

Raw Blame History

PLAN.md — Molecule AI Build Plan

Completed phases (1–11, 13–14) are documented in /docs and removed from here. This file tracks only in-progress and upcoming work.

Completed Phases (see /docs for details)

Phase	Name	Docs
1	Core Loop	`docs/architecture/architecture.md`, `CLAUDE.md`
2	E2E Validation	`CLAUDE.md` (build/test commands)
3	Hierarchy & Communication	`docs/api-protocol/communication-rules.md`
4	Provisioner	`docs/architecture/provisioner.md`
5	Agent Management	`CLAUDE.md` (API routes)
6	Bundle Export/Import	`docs/agent-runtime/bundle-system.md`
7	Team Expansion	`docs/agent-runtime/team-expansion.md`
8	Human-in-the-Loop Approvals	`docs/agent-runtime/system-prompt-structure.md`
9	Hierarchical Memory	`docs/architecture/memory.md`
10	Observability (Langfuse)	`docs/development/observability.md`
11	Canvas Polish & UX	`docs/frontend/canvas.md`
13	Runtime Enhancements	`docs/agent-runtime/workspace-runtime.md`
14	Production Hardening	`docs/architecture/provisioner.md`, `CLAUDE.md`
15	Per-Workspace Dir	PR #38 — `workspace_dir` per workspace
16	Plugin System	PR #39 — per-workspace plugins with registry
17	Agent GitHub Access	PR #40 — git/gh in images, GITHUB_TOKEN env
18	File Browser Lazy Loading	PR #37 — depth=1, path traversal protection
19	MCP Full Coverage	PR #40 — 52→54 tools (plugins, global secrets, pause/resume, org, delegation)
20	Canvas UX Sprint	PRs #4, #21, #39 — Settings Panel, Onboarding, Plugins UI, Pause/Resume
21	Claude Agent SDK Migration	PR #48 — `ClaudeSDKExecutor` replaces CLI subprocess
22	Cron Scheduling	PR #49 — recurring tasks via cron expressions, Canvas Schedule tab
23	Code Quality & Multi-Provider	PR #50 — model fallback, DeepAgents full SDK, 7 LLM providers, 100% test coverage
24	Async Delegation	PR #41 — non-blocking delegation with status polling, `check_delegation_status` tool
25	Social Channels	PR #54 — adapter-based Telegram integration, Canvas Channels tab, 7 MCP tools, hot reload, multi-chat IDs, auto-detect, /start auto-reply, full Telegram Bot API audit fixes
26	Auth Env Vars	PR #55 — `required_env` config replaces `.auth-token` files, env-var only path; reno-stars 15-agent org template
27	Channel Polish & Org Auto-link	PR #56 — poller lifetime fix (bgCtx), Restart Pending button (only when needed), org template `channels:` field auto-links Telegram on import

Phase 12: Code Sandbox — DONE

Three-backend sandbox for the run_code tool, selectable per-workspace via SANDBOX_BACKEND env (set from config.yaml → sandbox.backend).

run_code tool — workspace-template/builtin_tools/sandbox.py
subprocess backend (default) — asyncio subprocess with hard timeout
docker backend — throwaway container with resource limits (MVP)
e2b backend (cloud) — E2B microVMs via e2b-code-interpreter, reads E2B_API_KEY
Sandbox config — SandboxConfig dataclass in workspace-template/config.py

Firecracker-as-a-backend is intentionally skipped: each tenant platform now runs on a Fly Machine (which IS a Firecracker microVM — see Phase 32 Phase B), so the entire workspace process is already Firecracker-isolated from other tenants. Running Firecracker inside Firecracker would double- nest for no additional security. For stronger per-call isolation within one tenant, use the e2b backend.

Phase 20: Canvas UX Sprint — MOSTLY COMPLETE

UX specs created by UIUX Designer agent. See docs/ux-specs/ for full specs.

20.1 Settings Panel (Global Secrets UI) — DONE

Spec: docs/ux-specs/ux-spec-settings-panel.md

Gear icon in canvas top bar (Cmd+, shortcut)
Slide-over drawer (480px, right-anchored)
Service groups (GitHub, Anthropic, OpenRouter, Custom)
CRUD: add, view (masked), edit, delete secrets
Empty state with guided setup
Unsaved changes guard on close

20.2 Onboarding / Deploy Interception — DONE

Spec: docs/ux-specs/ux-spec-onboarding-interception.md

Pre-deploy secret check — detect missing API keys per runtime
Missing Keys Modal — inline form, only asks for what's needed
Provisioning timeout → named error state with recovery actions
No dead ends — every error has a fix action

20.3 Canvas UI Improvements — PARTIAL

Spec: docs/ux-specs/ux-spec-canvas-improvements.md

Plugins install/uninstall in Skills tab (PR #39)
Pause/resume from context menu
Org template import from canvas (PR — OrgTemplatesSection in TemplatePalette)
Workspace search (Cmd+K)
Batch operations

Phase 30: SaaS — Remote Workspaces & Cross-Network Federation — IN PROGRESS

Goal: let a Python agent running on a laptop in another city boot, register, authenticate, accept A2A from its parent PM on the platform, and appear on the canvas as a first-class workspace.

Why now: the self-hostable single-box model has landed; the next meaningful expansion is letting orgs span machines and networks. This is the step that turns Molecule AI from "Docker-compose on one box" into a multi-tenant SaaS-shaped product.

Design thesis: ride the existing runtime='external' escape hatch. Every Docker-touching handler already short-circuits when a workspace is external. We don't need a parallel subsystem — we need to close four small gaps and add per-workspace auth. See docs/remote-workspaces-readiness.md for the full code audit.

Shipping order (eight bounded steps, ~2 weeks to GA)

30.1 Workspace auth tokens — foundation; prevents spoofing. New workspace_auth_tokens table; POST /registry/register issues a token; middleware validates Authorization: Bearer <token> on /registry/heartbeat, /registry/update-card. Lazy bootstrap so in-flight workspaces upgrade gracefully. Transparent to local containers — provisioner carries the token through the existing env-var pattern. No feature flag.
30.2 Secrets pull endpoint — GET /workspaces/:id/secrets/values returns decrypted secrets JSON, gated by the 30.1 token. Local agents can use it too (removes env-at-create coupling for rotating secrets).
30.3 Plugin tarball download — GET /plugins/:name/download returns a tarball; agent unpacks locally. Replaces Docker-exec plugin install for remote agents. Behind REMOTE_PLUGIN_DOWNLOAD_ENABLED.
30.4 Workspace state polling — GET /workspaces/:id/state returns {status, paused, deleted_at, pending_events[]} as a drop-in for the WebSocket feed remote agents can't reach. Behind REMOTE_STATE_POLLING_ENABLED.
30.5 A2A proxy token validation — the proxy enforces the caller's auth token on POST /workspaces/:id/a2a. Mutual auth between agents.
30.6 Direct sibling discovery + URL caching — agents call GET /registry/{parent_id}/peers once, cache sibling URLs, call them directly for A2A. Resilient to brief platform outages.
30.7 Poll-liveness for external runtime — LivenessChecker interface in registry/; PollLiveness marks offline if no heartbeat in 90s. Docker checker becomes one implementation, poll-liveness another. Health sweep routes by runtime. Behind REMOTE_LIVENESS_POLLING_ENABLED.
30.8 Remote-agent SDK + docs — sdk/python/molecule_agent/ thin client: register → pull secrets → run A2A loop → poll state → heartbeat. Working sdk/python/examples/remote-agent/ a new user can run on a laptop. Remove the three feature flags. Remote workspaces become GA.

Out of scope for Phase 30

Mutual TLS / platform-identity verification from the agent side. Agent trusts any platform URL in its env. Defer until real multi- tenant deployment forces the question.
Agent-to-agent mesh across NATs. Direct sibling calls only work when siblings are reachable from each other. Behind-NAT ↔ behind-NAT needs a relay — defer to Phase 31.
Platform-managed persistent state for remote agents. Remote agents own their filesystem; platform never mounts.

Success criteria

sdk/python/examples/remote-agent/ boots on a laptop disconnected from the platform's LAN, registers, receives a task from parent PM via A2A, returns a result, appears on the canvas.
tests/e2e/test_federation.sh spawns a second platform instance + remote agent pointing at the first; both platforms see the agent as a workspace in the right state.
Spoofing test: attempt to impersonate a workspace with a guessed ID but no token → 401.

Phase 31 — Quality + Infra Pass (Q2 2026) — SHIPPED 2026-04-13

Completed in PRs #1–#8 and documented in docs/edit-history/2026-04-13.md:

Brand migration cleanup — LICENSE "Agent Molecule" → "Molecule AI"; new icon assets (PR #1).
Repo structural cleanup — moved examples/remote-agent/ → sdk/python/examples/, docs/superpowers/plans/ → plugins/superpowers/plans/; deleted empty platform/plugins/; gitignored .agents/, platform/workspace-configs-templates/, backups/, logs/, test-results/; added READMEs under tests/ and docs/ (PR #3).
MCP per-domain split — mcp-server/src/index.ts 1697 → 89 lines; 12 per-domain modules in src/tools/; shared src/api.ts; startup log now reports 87 tools (PRs #2, #4, #7).
Canvas dialog unification — native confirm()/alert() replaced with ConfirmDialog in 7 sites; new singleButton prop + 5 tests (vitest 352 → 357).
Platform handler decomposition — 4 oversize functions (proxyA2ARequest, Delegate, Discover, SessionSearch) split into testable helpers; +47 Go tests; handlers coverage 56.1% → 57.6%.
Env-var documentation — .env.example gained 11 previously-undocumented vars; all 21 distinct os.Getenv/envx.* keys now documented.
E2E hardening + CI — Phase 30.1 bearer auth + Phase 30.6 X-Workspace-ID requirements baked into test_api.sh (62/62) and test_comprehensive_e2e.sh (67/67); shared _lib.sh + _extract_token.py; new CI jobs e2e-api and shellcheck; setup-go gains module cache (PRs #5, #7, #8).

PR Workflow Rules

All PRs must follow this checklist:

Branch: Never push to main. Always create a feature/fix branch.
Code Review: Run /code-review skill and fix all issues before requesting merge.
Tests: All existing tests must pass. New features require new tests.
Documentation: Run /update-docs skill. Every PR must update:
- docs/edit-history/ session log
- Relevant docs in docs/ (API, architecture, frontend, etc.)
- CLAUDE.md if routes, env vars, or commands changed
- PLAN.md if the work completes a phase or adds new items
E2E Test: Rebuild, restart service, and manually verify before reporting done.
QA Review: QA Engineer reviews for edge cases, plan compliance, and documentation completeness before CEO merge approval.
CEO Approval: Only the CEO approves merges. Never merge without explicit approval.

Ecosystem Awareness

Adjacent projects worth tracking (Holaboss, Hermes, gstack, …) are catalogued in docs/ecosystem-watch.md. Skim quarterly, add entries liberally, and when one of those projects ships something we should react to, file a "Signals to react to" line in that doc and create a Backlog entry below pointing at it. Agents doing research or strategy work should read docs/ecosystem-watch.md first — it's the canonical starting point for "what else is out there."

Backlog (prioritized)

Canvas: Org template import — Phase 20.3 (deploy org from canvas UI)
Canvas: Workspace search (Cmd+K) — Phase 20.3 (quick find)
Canvas: Batch operations — Phase 20.3 (multi-select delete/restart)
Sandbox: Firecracker/E2B backends — Phase 12 (production isolation)
NemoClaw adapter — stub exists at adapters/nemoclaw/, no implementation yet
Remote plugin registry — install plugins from npm/git (currently local only)
Agent git worktrees — per-agent branches without full clone
SDK follow-ups — live tool-call visibility, cost telemetry, cancel UX, governance hooks
Real webhook mode for channels — Phase 27 candidate. Currently polling-only; webhook needs:
- mode: "webhook"|"polling" config field
- PUBLIC_URL env var
- Platform calls setWebhook on channel create (with random webhook_secret), deleteWebhook on delete
- Canvas toggle to enable webhook mode (only when PUBLIC_URL is set)
- Polling works fine for ≤hundreds of bots; webhook needed at thousands+ scale or for serverless
More channel adapters — Slack (OAuth + Events API), Discord (Bot + Gateway), WhatsApp (Cloud API)
Delegations list endpoint mismatch — GET /workspaces/:id/delegations returns [] while the agent's internal check_delegation_status shows active/completed delegations. One source of truth.
YAML-configurable per-agent repo access — new workspace_access: none|read_only|read_write field in org.yaml + :ro bind-mount for research agents; eliminates the "PM couriers documents to reports" workaround.
SDK executor swallows subprocess stderr — workspace-template/claude_sdk_executor.py surfaces only "Command failed with exit code 1 / Check stderr output for details" when the claude CLI crashes, making every failure opaque. Capture stderr, log at ERROR, include first ~1 KB in the A2A error response. High priority — blocked real debugging during PLAN.md coordination on 2026-04-12.
Agent MCP client defaults to localhost:8080 — inside a workspace container, localhost is the container itself, not the platform — so mcp__molecule__* tools fail with "platform unreachable." Inject MOLECULE_URL=${PLATFORM_URL} into every container at provision time and change the MCP client default to http://host.docker.internal:8080. High priority — blocks agents from calling platform tools (e.g. PM couldn't restart its own reports).

Note: items 11–14 previously carried sequential refs #64–#67. Those refs were placeholder enumeration, not GitHub issues. They now collide with actual merged PRs and issues with different scopes, so the refs were removed in 2026-04-14 tick-5. If/when these items get prioritized, file real GitHub issues for them.

Workspace restart_prompt — user-defined restart context (#19 Layer 2) — GitHub issue #66 (new 2026-04-14 tick-4 follow-up to PR #65 which shipped Layer 1). Let config.yaml / org.yaml declare a user-authored restart_prompt that is delivered alongside the platform-generated restart-context system message — e.g. "re-read your CLAUDE.md, re-hydrate TODOs from memory, resume the active delegation." Layer 1 (platform state snapshot) already ships; Layer 2 adds the user-defined side.

Recently launched (2026-04-14 tick-4)

GitHub issue #15 — Provisioner: auto-refresh CLAUDE_CODE_OAUTH_TOKEN from global_secrets on workspace restart → DONE via PR #64 (SetGlobal / DeleteGlobal now fan out RestartByID to every affected workspace).
GitHub issue #19 Layer 1 — Platform-generated restart context → DONE via PR #65 (synthetic A2A message/send with metadata.kind=restart_context, system:restart-context caller prefix, 30s re-register wait). Layer 2 deferred to issue #66 (see Backlog item 15 above).

Recently launched (2026-04-15 overnight sweep — ticks 17–30+, ~27 PRs)

Security hardening cluster. Roughly half the sweep was closing auth gaps surfaced by the Security Auditor's hourly audit cron:

#94 RFC-1918 + link-local in registry URL validator
#99 AdminAuth gate on GET /workspaces (topology leak / #104)
#106 path-sanitize + admin-gate POST /org/import (#103 HIGH)
#110 revoke workspace_auth_tokens on workspace delete
#119 IPv6 SSRF blocklist (fe80::/10, ::1/128, fc00::/7) + scheduler unit tests
#162 field-level authz on PATCH /workspaces/:id (#138 — cosmetic vs sensitive split)
#155 wire existing SecurityHeaders middleware into router
#167 gate 6 previously-unauth routes behind AdminAuth (#164 CRITICAL anon bundles/import; #165 HIGH events+bundles/export topology leak; #166 MED viewport+liveness)
#185 AdminAuth on GET /approvals/pending (#180)
#200 AdminAuth on POST /templates/import (#190 HIGH)
#203 CanvasOrBearer middleware — route-split for #168 canvas regression, only PUT /canvas/viewport; rejected PR #194's broader Origin-fallback approach because it would have re-opened #164
#209 source_id spoof defense in activity.Report (cherry-picked from the rejected #169 batch)
#233 resolveInsideRoot on POST /workspaces template/runtime (#226 MED)

Data integrity. Three bugs that would have silently corrupted state:

#212 CRITICAL migration-runner bug — RunMigrations globbed *.sql and alphabetically ran .down.sql BEFORE .up.sql on every boot, wiping workspace_auth_tokens (and 018/019 pairs). Filter fix + unit test in postgres_migrate_test.go.
#224 YAML injection in generateDefaultConfig — body.Name now emitted as a double-quoted YAML scalar with all control chars escaped. Structural test (parse + verify key count).
#236 log-injection in the #209 security-event log line — attacker-controlled source_id echoed via %s allowed fake log entries; switched to %q.

CI / infra.

#186 + controlplane #28 — every CI job migrated from ubuntu-latest to [self-hosted, macos, arm64] (Mac mini hongming-m1-mini). Non-trivial: services: replaced with inline docker run containers (ports 15432/16379), actions/setup-python bypassed via Homebrew python3.11 on $GITHUB_PATH, docker/setup-qemu-action added for cross-arch builds. Workaround for GH Actions billing cap on private repos.
#149 independent heartbeat pulse goroutine so long cron fires don't look stale on /admin/liveness (#140)
#211 migration runner regression (see #212 above — PR #212 is the fix)
Fly registry FLY_API_TOKEN rotated to a deploy token scoped to molecule-tenant (previously personal token, invalidated by flyctl auth login during the malware cleanup)

Platform / Scheduler reliability.

#95 panic-recover in scheduler tick() + per-fire goroutines (closes #85)
#207 concurrency-aware skip — scheduler.fireSchedule reads workspaces.active_tasks and advances next_run_at + records a cron_run row with status='skipped' instead of colliding with a busy agent (#115)
#206 surface error_detail in schedule history API (#152 problem B)

Workspace runtime features.

#205 idle-loop reflection pattern — opt-in idle_prompt + idle_interval_seconds in config.yaml; self-sends when heartbeat.active_tasks == 0. Hermes/Letta shape.
#208 Hermes Phase 1 multi-provider registry — 15 providers via adapters/hermes/providers.py (Nous, OpenRouter, OpenAI, Anthropic, xAI, Gemini, Qwen, GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral). 26 tests.
#198 A2A protocol compliance batch (#173/#174/#175): cancel() emits TaskStatusUpdateEvent(canceled, final=True), stateTransitionHistory=True in AgentCapabilities. Regression: push_sender=PushNotificationSender() crashed on startup because PushNotificationSender is abstract — reverted in #210.
#216 idle-loop pilot enabled on Technical Researcher workspace.
#225 + #235 auth_headers() on /registry/register + initial_prompt + idle loop self-posts (#215/#220)
#231 Claude SDK stderr probe for proper rate-limit error attribution (#160 diagnostics)

Controlplane (molecule-controlplane).

#19+#20 Grafana Cloud remote-write counter registry (cp_requests_total), push loop to prometheus-prod-32-prod-ca-east-0.grafana.net, Basic auth with user 3116422
#21 AWS KMS envelope encryption — per-secret DEK via GenerateDataKey, dual-mode (v2 blobs via KMS, legacy via static key, auto-routes by leading byte)
#24 /cp/status deep probe for Betterstack
#26+#27 public /legal/{terms,privacy,dpa,acceptable} pages from embedded markdown + smoke coverage
Isolation red-team test suite + observability runbooks (Grafana dashboard, Betterstack, Stripe Atlas)

Self code-review follow-ups (#228 + #232). Ran /code-review on the batch merges, surfaced 8 🟡 issues, split into Go (#228) and Python/docs (#232):

CanvasOrBearer invalid-bearer fall-through fix
short() helper replacing unsafe [:N] slices in scheduler.go
6 new tests (TestShort_helper, TestRecordSkipped_*, TestActivityHandler_Report_*, TestHistory_IncludesErrorDetail)
idle-loop hardening (asyncio.get_running_loop(), IDLE_FIRE_TIMEOUT_SECONDS clamp, typed exception handling, add_done_callback for fire-and-forget error logging)
idle_prompt / idle_interval_seconds documented in org.yaml defaults
New docs/runbooks/admin-auth.md — the three middleware variants + three-question test for adding to CanvasOrBearer

Test counts post-sweep: +70 Go (816 total), +40 Python (1180 total), +0 Canvas vitest (453 unchanged — UI/a11y patches only).

Outstanding (user action): #126 Slack adapter (Phase-H product decision), #160 Claude Max OAuth quota (wait for 2026-04-17 23:00Z reset OR upgrade OR switch to ANTHROPIC_API_KEY), #191 runner persistent-state docs (P3), #199 Fly registry token (resolved this session but publish-platform-image re-run pending runner), Stripe Atlas application (launch blocker, 2-week lead).

Recently launched (2026-04-15 tick-9)

Phase 32 Phase B.2 (image pipeline) — PR #80 (merged c3cc8e87) adds .github/workflows/publish-platform-image.yml: on every main-merge touching platform/**, builds platform/Dockerfile and pushes ghcr.io/molecule-ai/platform:latest + :sha-<commit> to GHCR. Paired with the private molecule-controlplane Fly + Neon provisioner (PR #3 there, merged 2e85d5ad) that reads TENANT_IMAGE env and boots tenant Fly Machines from this image. Tick-8 docs-sync PR #79 (merged d53a1287) also landed.

Recently launched (2026-04-14 tick-8)

Phase 32 PR #1 — TenantGuard middleware (PR #78, merged 57a05686). Public repo's only SaaS hook: when MOLECULE_ORG_ID env is set, non-allowlisted requests require matching X-Molecule-Org-Id header or 404. Unset → passthrough (self-hosted unchanged). Allowlist is exact-match: /health + /metrics. Paired with the private Molecule-AI/molecule-controlplane repo scaffolded this tick (Fly Machines provisioner stub, /cp/orgs CRUD, subdomain→fly-replay router, migrations 001-003 for organizations/org_instances/org_members). +6 TestTenantGuard_* tests. Phase 32 plan: follow-up PRs wire real Fly provisioner, WorkOS AuthKit, Stripe, Cloudflare, signup UX — all in the private repo except the single public middleware.

Recently launched (2026-04-14 tick-7)

GitHub issue #24 — Runtime-added workspace_schedules drift on org re-import → DONE via PR #76 (new source column on workspace_schedules via migration 022; org/import now upserts with ON CONFLICT (workspace_id, name) DO UPDATE ... WHERE source='template', so runtime-added rows survive re-imports; legacy rows backfilled to 'template'; +3 tests).
GitHub issue #51 — PM hardcoded audit-category routing → DONE via PR #75 (generic category_routing: block in org-templates/<name>/org.yaml defaults + per-workspace override; rendered into each workspace's config.yaml via renderCategoryRoutingYAML using yaml.Node + yaml.Marshal for safe escaping; PM prompt replaced with generic config-lookup; +6 tests).
PR #74 — org-templates/molecule-dev/org.yaml role overrides shrunk to just the deltas now that UNION semantics (PR #71) are in effect — removes verbose re-listing of defaults across PM, Research Lead, Research sub-roles, Security Auditor, UIUX Designer.

Recently launched (2026-04-14 tick-6)

GitHub issue #68 — Per-workspace plugins: REPLACE semantics caveat → DONE via PR #71 (mergePlugins helper in platform/internal/handlers/org.go now UNIONs per-workspace with defaults.plugins; !plugin or -plugin prefix on a per-workspace entry opts a default out; +5 TestPlugins_* tests). Role overrides in org-templates/*/org.yaml can now declare just the delta instead of restating every default.

Recently launched (2026-04-14 tick-5)

PR #70 — Wired the 12 modular plugins from PR #63 (tick-4) into the default molecule-dev org template. defaults.plugins expands from 3 → 9 (safety hooks + operational-memory skills become universal); PM role gains molecule-workflow-triage + molecule-workflow-retro, Security Auditor gains molecule-skill-code-review + molecule-skill-cross-vendor-review + molecule-skill-llm-judge. Verbose per-role re-listing is a consequence of REPLACE (not UNION) semantics in platform/internal/handlers/org.go; union-semantics proposal tracked as issue #68.
PR #69 — Backlog items 11–14 stripped of stale sequential refs #64–#67 (see footnote near item 15 above).

Test Coverage

Stack	Tests	Framework
Go (platform)	726	`go test -race` (raw PASS lines incl. subtests; +6 top-level `Test*` this tick: #64 secrets auto-restart x2, #65 restart-context x4)
Python (workspace)	1,140	pytest
Canvas (frontend)	357	Vitest
SDK (python)	132	pytest
MCP server	97	Jest
Total	2,452

E2E: 67/67 comprehensive checks passing, 62/62 API tests (also gated in CI e2e-api job), shellcheck-clean across all 5 E2E scripts.

Team Assignments

Agent	Current Focus
PM	Sprint coordination, backlog prioritization
Dev Lead	Engineering planning, PR review
UIUX Designer	UX specs for Phase 20 (DONE — 5 specs delivered)
Frontend Engineer	Phase 20.3 remaining items (org import, search, batch)
Backend Engineer	Sandbox production backends, API completeness
QA Engineer	Review every PR for docs + plan compliance
DevOps Engineer	CI/CD, Docker image optimization
Security Auditor	API key handling, path traversal, auth review

Next Steps

Frontend Engineer implements remaining Phase 20.3 items (org import from canvas, Cmd+K search)
Backend Engineer scopes Firecracker/E2B sandbox backends (Phase 12)
QA Engineer reviews PR #52 for docs compliance before merge
All agents use GITHUB_TOKEN env var to clone repo, branch, and create PRs

Plugin Adaptor System — shipped; deferred follow-ups only

The system is done. Landed (see feat/plugin-adaptor-registry and feat/agentskills-compliance): per-runtime plugin adaptors, hybrid resolver (registry > plugin-shipped > raw-drop), AgentskillsAdaptor covering rule+skill plugins for all runtimes, /plugins?runtime= filter, /workspaces/:id/plugins/available endpoint, molecule-plugin SDK, gemini org parity with molecule-dev, and full agentskills.io spec compliance for all first-party skills (installable in Claude Code, Cursor, Codex, and ~35 other skill-compatible tools — see docs/plugins/agentskills-compat.md).

Deferred, not blocking:

Upstream runtime-adapters/ extension to agentskills.io spec — once we've lived with our own per-runtime adapter model for ~month, propose it as a spec extension to agentskills/agentskills so other tools can share Molecule AI-authored adaptors.
Install-from-GitHub-URL flow — POST /plugins/install {git_url} that clones a repo into the registry, validates the manifest, and runs the adaptor through a sandbox. Needs signature/version pinning and a review of the adaptor-execution threat model before shipping.
Promote-to-default UI — today, promoting a community plugin to "curated" means manually copying its adapters/<runtime>.py into workspace-template/plugins_registry/<plugin>/. Later add a canvas button + PR template that opens an upstream PR automatically.
Plugin packs — manifest that lists other plugins to bundle (superpowers-pack → install superpowers-tdd + superpowers-debug + …). Skip until a real user asks; first-party plugins are small enough to install individually today.
Hot-reload on DeepAgents — upstream docs say skills/sub-agents are startup-only; would need platform-level container restart on plugin file change. Defer until users complain.
Atomic split of first-party plugins — superpowers and ecc still ship as multi-skill bundles. Pipeline already supports splitting but non-urgent.
Sub-agent plugins for non-DeepAgents runtimes — Claude Code / LangGraph don't have a native sub-agent feature; emulating via tool-routing is possible but invasive. Defer.
Workspace install tracking table — a workspace_plugin_installs table would let uninstall call the adaptor's uninstall() path reliably. Today uninstall is a rm -rf /configs/plugins/<name> which leaves copied skill dirs behind. Low user impact.
Shared org-template system-prompt.md via _shared/ — DRY molecule-dev and molecule-worker-gemini. Drift risk; revisit at 3+ orgs.

Phase 32 — Cloud SaaS launch (2026-Q2/Q3)

Goal: ship Molecule AI as a multi-tenant cloud SaaS (not just self-hosted per-customer). Ordered by dependency + ROI.

Current state (2026-04-15)

Live infrastructure:

Control plane deployed: https://molecule-cp.fly.dev (Fly app molecule-cp, 2 machines, Neon project molecule-cp / cool-sea-89357706)
Tenant app: Fly app molecule-tenant (Neon parent project molecule-tenants / dawn-bar-08311714, tenants get a branch per org)
Shared Redis: Upstash grateful-prawn-89393.upstash.io (key-prefix isolation, Phase H moves to per-tenant)
Container registry: registry.fly.io/molecule-tenant:latest (mirrored from ghcr.io/molecule-ai/platform:latest via GH Actions on every main push)
First real tenant provisioned: org acme → Fly machine + Neon branch + encrypted URLs in org_instances
WorkOS AuthKit live at /cp/auth/{signup,login,callback,signout,me} — hosted signup redirects correctly; see https://molecule-cp.fly.dev/cp/auth/signup
Stripe billing scaffold deployed in orgs-only mode (no Stripe creds configured yet; webhook handler + signature verification code ready)
Domain: moleculesai.app (DNS not yet wired — subdomain routing works via X-Molecule-Org-Slug header pending Cloudflare)

Phase status (post 2026-04-15 overnight sweep):

A — Foundation (accounts, tokens, domain): ✅ done
B — Fly provisioner + Neon branching: ✅ done
C — WorkOS AuthKit scaffold + RequireSession + org-ownership check: ✅ done
D — Stripe billing scaffold + auth-scoped checkout + plan quotas: ✅ code done; live keys pending Stripe Atlas
E — Cloudflare + DNS *.moleculesai.app + per-tenant Vercel canvas: ✅ done
F — Sign-up UX + onboarding: ✅ basic flow done (signup / org create / canvas redirect); polish + email pending
G — Observability + quotas + admin: ✅ Sentry + Grafana remote-write + /cp/status Betterstack probe + per-org rate limiter; admin panel /cp/admin/* pending
H — Hardening: ⏳ partial — AWS KMS envelope encryption ✅ (controlplane PR #21), tenant-isolation red-team CI gate ✅ (isolation_test.go), legal pages ✅ (/legal/* from controlplane PR #26); load test + Stripe Atlas application + status page custom domain pending
I — Launch: pending Stripe Atlas (~2 week lead)

Live infrastructure deltas (post-sweep):

Migration runner safety fix landed (#212) — *.down.sql filter; was wiping workspace_auth_tokens on every restart
Workspace auth tokens now revoked on workspace delete (#110)
All known unauth admin routes gated; #138 canvas regression resolved via field-level authz + CanvasOrBearer middleware
Self-hosted Mac mini CI runner replaced GH-hosted Linux to bypass private-repo Actions billing cap; FLY_API_TOKEN rotated to a deploy token scoped to molecule-tenant after the personal token was invalidated by flyctl auth login during the 2025-12-06 cryptominer cleanup
/legal/{terms,privacy,dpa,acceptable} live at https://app.moleculesai.app/legal/*

Known open issues on the live system:

Tenant /workspaces returns Neon pooler warnings (unnamed prepared statement does not exist) — lib/pq + Neon pooler incompatibility, tracked for lib/pq → pgx migration in a later phase
#160 Claude Max OAuth quota exhausted on the agent-fleet token until 2026-04-17 23:00 UTC; mitigations: wait, upgrade plan, OR switch workspace containers to ANTHROPIC_API_KEY env var
#191 self-hosted runner persistent-state docs (P3, low urgency)
#199 Fly registry token — resolved in the 2026-04-15 sweep but publish-platform-image re-run pending runner availability

Companion repo: Molecule-AI/molecule-controlplane (private). n8n-style open-core split: this public repo stays OSS (tenant binary + plugins + channels, contributable surface); control plane (orgs / signup / billing / provisioner / routing) is private. See molecule-controlplane/PLAN.md for its roadmap.

Tier 1 — blocks multi-tenant launch

Multi-tenancy: organizations table, org_id FK + WHERE org_id = $caller_org filter on every row-returning handler (workspaces, workspace_secrets, global_secrets, activity_logs, structure_events, agent_memories, workspace_schedules, workspace_channels). Middleware resolves caller's org from session token → ctx. Full security audit of tenant isolation before first external user.
Human auth + orgs: WorkOS AuthKit (NOT build-yourself, NOT Clerk — WorkOS treats per-org SSO as first-class; Clerk treats it as an upsell). Keep Phase 30.1 bearer tokens for machine-to-machine (agents). Stripe integration via WorkOS hooks.
Container isolation: replace raw-Docker-socket provisioner with Fly Machines API (Firecracker microVMs, per-workspace isolation, sub-second boot, pay-per-second). Today's shared /var/run/docker.sock is an RCE-to-host footgun that cannot ship multi-tenant. provisioner interface stays — only backend swaps. Docker path remains for local dev.
Stripe billing: subscriptions + usage metering (workspace-hours, LLM-token pass-through, storage), trial flow, dunning, invoices.
Per-org resource quotas: tier memory/CPU is configurable (PR #58) but unenforced at provision time. Add per-org ceilings: max workspaces, max concurrent-running, max total memory.
Managed Postgres + Redis: move off docker-compose for prod. Neon (serverless, branch-per-PR) for Postgres; Upstash for Redis. Alternative: drop Redis entirely — LISTEN/NOTIFY
- advisory locks cover heartbeat TTL + URL cache.
Secrets at rest via KMS: current SECRETS_ENCRYPTION_KEY is a single static AES-256 key. Move to AWS/GCP KMS-backed envelope encryption; the secrets_encryption_version table slot is already reserved for rotation.
Migration runner out of app boot: a bad migration currently crashes platform boot with no rollback. Extract to goose as a release step / init container. Auto-discovery runner stays for dev mode only.

Tier 1 follow-ups (before customer #1)

Observability: wire /metrics to a scraper (Grafana Cloud or self-hosted). Add Sentry for Go + Next.js error tracking. Langfuse stays for LLM traces.
Rate limiting per-org: global RATE_LIMIT=600/min is a shared bucket today. Needs per-org + per-endpoint buckets.
Cloudflare in front: WAF + CDN + DDoS. Free tier covers pre-revenue.
Sign-up / onboarding flow: landing → signup → first workspace in 60 seconds. No such flow today.
Transactional email: Resend or Postmark.
Admin panel: view orgs, suspend accounts, see usage, issue refunds. SQL-only at first; UI by ~50 orgs.
Privacy policy + ToS + DPA: real ones, vetted. GDPR / CCPA data-export + deletion endpoints (workspace-export already exists; need org-level).

Tier 2 — tech-stack upgrades (high ROI, non-blocking)

Go platform: migrate lib/pq → pgx/v5 (1–2 days; lib/pq in maintenance since ~2021). Then sqlc incrementally for new queries — keeps the no-ORM philosophy + typed Go.
Platform async: River (Postgres-backed, Go-native job queue). Delegation dispatch, workspace_schedules cron, future billing events + webhook fan-out all migrate cleanly. NOT Temporal — Temporal already ships in workspace-template as an agent tool; keep the separation.
Frontend: TanStack Query for server state. Zustand keeps pure UI state. Stops reimplementing cache / refetch / dedup. WS updates flow via qc.setQueryData. Single highest-ROI frontend refactor.
Turbopack for next build: one flag, 2–5× cold-build speedup.
Python workspace runtime → uv: uv pip install in entrypoint.sh cuts workspace cold-start 10–100×. User-visible latency win.
Python MCP client inside runtime: today mcp-server/ exposes the platform as an MCP server; agents inside workspaces can't yet consume external MCP servers. Closing the gap joins the winning 2026 ecosystem.
shadcn/ui CLI convention: already Radix + Tailwind; adopt npx shadcn add … passively for new components. No rewrite.

Tier 3 — explicitly NOT doing

Kubernetes: company-of-one cannot run K8s. Fly Machines covers isolation without the ops tax.
ORM (GORM / ent / bun): raw-SQL + sqlc covers every case.
Framework swap (Next → Vite / TanStack Start): 2-week rewrite buys nothing users see.
Auth-from-scratch: every hour on auth is an hour not on product.
Canvas library swap (xyflow → tldraw): xyflow is still the correct tool for typed node graphs.

Tier 4 — compliance / enterprise (when revenue lands)

SOC 2 via Drata / Vanta
Status page (Betterstack or Instatus)
Staging environment that mirrors prod
Blue-green / canary deploy pipeline
Per-org backup + point-in-time restore
Load testing (hey / vegeta) — current per-node ceiling unknown

Success criteria for Phase 32

Customer can sign up at moleculesai.app, create an org, deploy their first workspace, send their first message in < 5 minutes.
Two orgs on the same cluster cannot observe each other's workspaces, secrets, memory, or activity — verified by automated tenant-isolation test + manual red-team.
Fly Machines cost per active workspace-hour documented and reproducible.
Stripe-backed subscription + usage-based add-ons working end-to- end in sandbox.
One paying design partner on the cluster, paying a real invoice.

Phase 34: Partner API Keys — Programmatic Org Management

Goal: Enable partner platforms, CI/CD pipelines, and automation tools to create and manage orgs via API without a browser session. Critical for partner integrations, marketplace resellers, and internal testing.

Docs: docs/architecture/partner-api-keys.md

Phase 34.1 — Core infrastructure

Migration: partner_api_keys table (key_hash, scopes, org_id, rate_limit)
internal/auth/partner_keys.go — key validation, SHA-256 hashing, scope check
Update auth.Middleware — check Bearer mol_pk_* before WorkOS session
Scope enforcement helpers — RequireScope("orgs:create") per handler

Phase 34.2 — Admin endpoints

POST /cp/admin/partner-keys — create key (returns plaintext once)
GET /cp/admin/partner-keys — list keys (prefix + metadata only)
DELETE /cp/admin/partner-keys/:id — revoke key

Phase 34.3 — Rate limiting + audit

Per-key rate limiter (separate from session rate limit)
last_used_at tracking on each request
Add mol_pk_ to pre-commit secret scanner

Phase 34.4 — Partner onboarding

Partner onboarding guide (docs)
Example: create org → poll status → redirect user to tenant
Example: CI/CD test org lifecycle (create → test → delete)

Success criteria for Phase 34

Partner can POST /cp/orgs with an API key and get a provisioned org
Org-scoped keys cannot access other orgs
Revoked keys immediately return 401
Rate limiting prevents abuse
Full audit trail: who created which key, when last used

Phase 36: Full Staging Environment — GATES ALL INFRA CHANGES

Goal: Stop merging untested infra changes to production. Every change ships to staging first, gets verified, then promotes to production.

Why now: The 2026-04-17 session broke CI twice and caused hours of edge cache issues because there was no staging to catch regressions. This gates Phase 33 (Tunnel migration) and Phase 35 (security hardening).

Docs: docs/architecture/staging-environment.md

Phase 36.1 — Railway + Neon staging

Create Railway staging environment with staging-specific vars
Create Neon staging branch from main
Add staging.api.moleculesai.app CNAME to Railway staging
Verify CP deploys and boots on staging

Phase 36.2 — Image + deploy pipeline

Publish workflow pushes :staging tag (not :latest) on main merge
Add promote-to-production.yml workflow (manual trigger)
Promotion: retag :staging → :latest, deploy CP to production
Production tenants auto-update via Option B cron

Phase 36.3 — Staging DNS + Vercel

*.staging.moleculesai.app for staging tenant subdomains
staging.app.moleculesai.app for Vercel staging preview
Staging Cloudflare Tunnel (or Worker) for tenant routing

Phase 36.4 — Automated verification

Post-deploy staging smoke test (run test_saas_tenant.sh)
Block promotion if smoke test fails
Slack/GitHub notification on staging deploy + promotion

Success criteria for Phase 36

No infra change reaches production without passing staging first
Staging mirrors production (same services, same auth, separate data)
Promotion is a single manual action (button click or CLI command)
Staging cleanup is automated (terminate test EC2s after verification)

Phase 33: Tenant Subdomain Routing — MIGRATING TO CLOUDFLARE TUNNEL

Original: Wildcard DNS + Cloudflare Worker (implemented 2026-04-17). Replacing with: Cloudflare Tunnel per tenant (issue #933). Worker approach caused edge cache poisoning + security gaps (ADMIN_TOKEN in plaintext, unencrypted HTTP). Tunnel eliminates all of these. Docs: docs/architecture/wildcard-dns-proxy.md (original), issue #933 (tunnel migration plan). Prerequisite: Phase 36 (staging) — test tunnel on staging first.

Phase 33.1 — Worker + wildcard DNS (no tenant changes)

Create Cloudflare Worker that extracts slug from hostname, looks up backend IP from CP API, proxies request to EC2
Add GET /cp/orgs/:slug/instance endpoint to CP (public, rate-limited)
Add *.moleculesai.app wildcard DNS record (proxied, orange cloud)
Worker serves static "provisioning" splash page when tenant not ready
Deploy Worker via wrangler deploy + GitHub Actions
Verify Worker routing works for existing tenants alongside old A records

Phase 33.2 — Stop per-tenant DNS records

Remove Cloudflare A record creation from ec2.go provisioner
Remove Cloudflare DNS cleanup from deprovision/purge cascade
Existing A records coexist harmlessly (explicit wins over wildcard)

Phase 33.3 — Remove Caddy from EC2

Worker handles TLS termination — EC2 runs plain HTTP only
Remove Caddy install + Caddyfile from EC2 user-data script
EC2 security group: allow inbound HTTP from Cloudflare IPs only
~30s faster cold start (no apt-get caddy, no Let's Encrypt)

Phase 33.4 — Cleanup

Delete old per-tenant A records from Cloudflare
Remove cloudflareapi/ package from CP (Worker replaces it)
Update docs/runbooks/saas-secrets.md with Worker secrets

Success criteria for Phase 33

New org subdomain resolves instantly (zero DNS wait)
No NXDOMAIN caching — user never sees "site can't be reached"
Provisioning splash page shown while EC2 boots (auto-refreshes)
Cold start ~30s faster (no Caddy/Let's Encrypt)
Cost: Cloudflare Worker free tier or $5/mo

Phase 35: SaaS Production Hardening (post-2026-04-17 retrospective)

Goal: Address security gaps, remove debug code, fix workspace registration, and reduce boot time identified during the SaaS buildout session. See docs/retrospectives/2026-04-17-saas-buildout.md for full context.

Phase 35.1 — Security (CRITICAL, before any public launch)

Fix #756 — X-Workspace-ID header forge bypasses CanCommunicate (derive callerID from authenticated token, not raw header)
Fix #757 — GLOBAL memory poisoning mitigations (content delimiters
- audit log at minimum)
Remove ADMIN_TOKEN from public /cp/orgs/:slug/instance endpoint — store in Worker KV at provision time instead
Encrypt ADMIN_TOKEN in org_instances table (use envelope key)
Remove debug HTTP server (:9999) from workspace boot script
Remove set -ex from boot scripts (leaks env vars to EC2 console)
Restrict workspace EC2 security group (Cloudflare IPs + tenant IP only)
Add HTTPS between Worker and EC2 (or Cloudflare Tunnel)

Phase 35.2 — Workspace registration fix

Pass workspace auth token in EC2 boot script env so runtime can register with POST /registry/register
Or: have runtime request a token at startup via GET /admin/workspaces/:id/test-token
Verify workspace status flips to "online" on Canvas after boot
Test full Canvas flow: deploy → STARTING → online → chat works

Phase 35.3 — Boot time optimization

Pre-baked AMI per runtime (Packer or EC2 Image Builder):
- ami-hermes: Python + openai + anthropic + molecule-runtime + hermes adapter
- ami-claude-code: Node + claude-code SDK + molecule-runtime
- ami-langgraph: Python + langchain + langgraph + molecule-runtime
Runtime switch = launch from different AMI. Boot ~30s vs current ~9 min
Remove apt-get + pip install from boot script (only config + secrets + start)

Phase 35.4 — Stability + CI

Fix go.mod replace directive (PR #900) — unblocks all CI
Use stable origin IP for wildcard DNS (dedicated proxy or Tunnel)
Add workspace boot integration test to CI
Add SaaS tenant smoke test (tests/e2e/test_saas_tenant.sh) to CI
Clean up Cloudflare edge cache poisoning from session (or wait ~24h for natural expiry)

Infra footnote — Temporal

docker-compose.infra.yml now includes Temporal (:7233 gRPC, :8233 Web UI) backing workspace-template/builtin_tools/temporal_workflow.py for durable long-running agent workflows. All infra services share the molecule-monorepo-net Docker network, which infra/scripts/setup.sh creates idempotently. Temporal currently runs with no auth on 0.0.0.0:7233 — dev-only; any production deployment must front it with mTLS, API keys, or a reverse proxy before exposing the cluster.

47 KiB Raw Blame History Unescape Escape