molecule-core/PLAN.md
Hongming Wang dae07d61fd chore: structural cleanup — dead dirs, moves, gitignore
- Delete empty platform/plugins/ (dead remnant; plugins/ at repo root is
  the real registry; router.go comment updated)
- Gitignore local dev cruft: platform/workspace-configs-templates/,
  .agents/ (codex/gemini skill cache), backups/
- Untrack .agents/skills/ (keep local, stop tracking)
- Move examples/remote-agent/ → sdk/python/examples/remote-agent/
  (co-locate with the SDK it exercises); update refs in
  molecule_agent README + __init__ + PLAN.md + the demo's own README
- Move docs/superpowers/plans/ → plugins/superpowers/plans/
  (plans were written by the superpowers plugin's writing-plans
  subskill; belong with the plugin, not under docs)
- Add tests/README.md explaining the unit-tests-per-package +
  root-E2E split so new contributors don't ask
- Add docs/README.md explaining why site tooling lives under docs/
  rather than a separate docs-site/ (VitePress ergonomics)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:06:52 -07:00

305 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# PLAN.md — Molecule AI Build Plan
> Completed phases (111, 1314) are documented in `/docs` and removed from here.
> This file tracks only **in-progress and upcoming work**.
---
## Completed Phases (see /docs for details)
| Phase | Name | Docs |
|-------|------|------|
| 1 | Core Loop | `docs/architecture/architecture.md`, `CLAUDE.md` |
| 2 | E2E Validation | `CLAUDE.md` (build/test commands) |
| 3 | Hierarchy & Communication | `docs/api-protocol/communication-rules.md` |
| 4 | Provisioner | `docs/architecture/provisioner.md` |
| 5 | Agent Management | `CLAUDE.md` (API routes) |
| 6 | Bundle Export/Import | `docs/agent-runtime/bundle-system.md` |
| 7 | Team Expansion | `docs/agent-runtime/team-expansion.md` |
| 8 | Human-in-the-Loop Approvals | `docs/agent-runtime/system-prompt-structure.md` |
| 9 | Hierarchical Memory | `docs/architecture/memory.md` |
| 10 | Observability (Langfuse) | `docs/development/observability.md` |
| 11 | Canvas Polish & UX | `docs/frontend/canvas.md` |
| 13 | Runtime Enhancements | `docs/agent-runtime/workspace-runtime.md` |
| 14 | Production Hardening | `docs/architecture/provisioner.md`, `CLAUDE.md` |
| 15 | Per-Workspace Dir | PR #38`workspace_dir` per workspace |
| 16 | Plugin System | PR #39 — per-workspace plugins with registry |
| 17 | Agent GitHub Access | PR #40 — git/gh in images, GITHUB_TOKEN env |
| 18 | File Browser Lazy Loading | PR #37 — depth=1, path traversal protection |
| 19 | MCP Full Coverage | PR #40 — 52→54 tools (plugins, global secrets, pause/resume, org, delegation) |
| 20 | Canvas UX Sprint | PRs #4, #21, #39 — Settings Panel, Onboarding, Plugins UI, Pause/Resume |
| 21 | Claude Agent SDK Migration | PR #48`ClaudeSDKExecutor` replaces CLI subprocess |
| 22 | Cron Scheduling | PR #49 — recurring tasks via cron expressions, Canvas Schedule tab |
| 23 | Code Quality & Multi-Provider | PR #50 — model fallback, DeepAgents full SDK, 7 LLM providers, 100% test coverage |
| 24 | Async Delegation | PR #41 — non-blocking delegation with status polling, `check_delegation_status` tool |
| 25 | Social Channels | PR #54 — adapter-based Telegram integration, Canvas Channels tab, 7 MCP tools, hot reload, multi-chat IDs, auto-detect, /start auto-reply, full Telegram Bot API audit fixes |
| 26 | Auth Env Vars | PR #55`required_env` config replaces `.auth-token` files, env-var only path; reno-stars 15-agent org template |
| 27 | Channel Polish & Org Auto-link | PR #56 — poller lifetime fix (bgCtx), Restart Pending button (only when needed), org template `channels:` field auto-links Telegram on import |
---
## Phase 12: Code Sandbox — PARTIAL
> MVP done (subprocess + Docker backends). Production backends not started.
- [x] `run_code` tool — `tools/sandbox.py`
- [x] Docker-in-Docker backend (MVP) — throwaway container with resource limits
- [ ] Firecracker backend (production) — MicroVM isolation, faster cold starts
- [ ] E2B backend (cloud) — cloud-hosted via E2B API
- [x] Sandbox config — `SandboxConfig` dataclass in config.py
---
## Phase 20: Canvas UX Sprint — MOSTLY COMPLETE
> UX specs created by UIUX Designer agent. See `docs/ux-specs/` for full specs.
### 20.1 Settings Panel (Global Secrets UI) — DONE
**Spec**: `docs/ux-specs/ux-spec-settings-panel.md`
- [x] Gear icon in canvas top bar (Cmd+, shortcut)
- [x] Slide-over drawer (480px, right-anchored)
- [x] Service groups (GitHub, Anthropic, OpenRouter, Custom)
- [x] CRUD: add, view (masked), edit, delete secrets
- [x] Empty state with guided setup
- [x] Unsaved changes guard on close
### 20.2 Onboarding / Deploy Interception — DONE
**Spec**: `docs/ux-specs/ux-spec-onboarding-interception.md`
- [x] Pre-deploy secret check — detect missing API keys per runtime
- [x] Missing Keys Modal — inline form, only asks for what's needed
- [x] Provisioning timeout → named error state with recovery actions
- [x] No dead ends — every error has a fix action
### 20.3 Canvas UI Improvements — PARTIAL
**Spec**: `docs/ux-specs/ux-spec-canvas-improvements.md`
- [x] Plugins install/uninstall in Skills tab (PR #39)
- [x] Pause/resume from context menu
- [x] Org template import from canvas (PR — `OrgTemplatesSection` in TemplatePalette)
- [ ] Workspace search (Cmd+K)
- [ ] Batch operations
---
## Phase 30: SaaS — Remote Workspaces & Cross-Network Federation — IN PROGRESS
**Goal:** let a Python agent running on a laptop in another city boot,
register, authenticate, accept A2A from its parent PM on the platform,
and appear on the canvas as a first-class workspace.
**Why now:** the self-hostable single-box model has landed; the next
meaningful expansion is letting orgs span machines and networks. This
is the step that turns Molecule AI from "Docker-compose on one box" into
a multi-tenant SaaS-shaped product.
**Design thesis:** ride the existing `runtime='external'` escape hatch.
Every Docker-touching handler already short-circuits when a workspace
is external. We don't need a parallel subsystem — we need to close
four small gaps and add per-workspace auth. See
[`docs/remote-workspaces-readiness.md`](docs/remote-workspaces-readiness.md)
for the full code audit.
### Shipping order (eight bounded steps, ~2 weeks to GA)
- [x] **30.1 Workspace auth tokens** — foundation; prevents spoofing.
New `workspace_auth_tokens` table; `POST /registry/register` issues
a token; middleware validates `Authorization: Bearer <token>` on
`/registry/heartbeat`, `/registry/update-card`. Lazy bootstrap so
in-flight workspaces upgrade gracefully. Transparent to local
containers — provisioner carries the token through the existing env-var
pattern. No feature flag.
- [x] **30.2 Secrets pull endpoint**`GET /workspaces/:id/secrets/values`
returns decrypted secrets JSON, gated by the 30.1 token. Local agents
can use it too (removes env-at-create coupling for rotating secrets).
- [x] **30.3 Plugin tarball download**`GET /plugins/:name/download`
returns a tarball; agent unpacks locally. Replaces Docker-exec plugin
install for remote agents. Behind `REMOTE_PLUGIN_DOWNLOAD_ENABLED`.
- [x] **30.4 Workspace state polling**`GET /workspaces/:id/state`
returns `{status, paused, deleted_at, pending_events[]}` as a drop-in
for the WebSocket feed remote agents can't reach. Behind
`REMOTE_STATE_POLLING_ENABLED`.
- [x] **30.5 A2A proxy token validation** — the proxy enforces the caller's
auth token on `POST /workspaces/:id/a2a`. Mutual auth between agents.
- [x] **30.6 Direct sibling discovery + URL caching** — agents call
`GET /registry/{parent_id}/peers` once, cache sibling URLs, call them
directly for A2A. Resilient to brief platform outages.
- [x] **30.7 Poll-liveness for external runtime**`LivenessChecker`
interface in `registry/`; `PollLiveness` marks offline if no heartbeat
in 90s. Docker checker becomes one implementation, poll-liveness
another. Health sweep routes by runtime. Behind
`REMOTE_LIVENESS_POLLING_ENABLED`.
- [x] **30.8 Remote-agent SDK + docs**`sdk/python/molecule_agent/`
thin client: register → pull secrets → run A2A loop → poll state →
heartbeat. Working `sdk/python/examples/remote-agent/` a new user can run on a
laptop. Remove the three feature flags. Remote workspaces become GA.
### Out of scope for Phase 30
- Mutual TLS / platform-identity verification from the agent side.
Agent trusts any platform URL in its env. Defer until real multi-
tenant deployment forces the question.
- Agent-to-agent mesh across NATs. Direct sibling calls only work when
siblings are reachable from each other. Behind-NAT ↔ behind-NAT needs
a relay — defer to Phase 31.
- Platform-managed persistent state for remote agents. Remote agents
own their filesystem; platform never mounts.
### Success criteria
- `sdk/python/examples/remote-agent/` boots on a laptop disconnected from the
platform's LAN, registers, receives a task from parent PM via A2A,
returns a result, appears on the canvas.
- `tests/e2e/test_federation.sh` spawns a second platform instance +
remote agent pointing at the first; both platforms see the agent as
a workspace in the right state.
- Spoofing test: attempt to impersonate a workspace with a guessed ID
but no token → 401.
---
## PR Workflow Rules
All PRs must follow this checklist:
1. **Branch**: Never push to main. Always create a feature/fix branch.
2. **Code Review**: Run `/code-review` skill and fix all issues before requesting merge.
3. **Tests**: All existing tests must pass. New features require new tests.
4. **Documentation**: Run `/update-docs` skill. Every PR must update:
- `docs/edit-history/` session log
- Relevant docs in `docs/` (API, architecture, frontend, etc.)
- `CLAUDE.md` if routes, env vars, or commands changed
- `PLAN.md` if the work completes a phase or adds new items
5. **E2E Test**: Rebuild, restart service, and manually verify before reporting done.
6. **QA Review**: QA Engineer reviews for edge cases, plan compliance, and documentation completeness before CEO merge approval.
7. **CEO Approval**: Only the CEO approves merges. Never merge without explicit approval.
---
## Ecosystem Awareness
Adjacent projects worth tracking (Holaboss, Hermes, gstack, …) are catalogued
in **[`docs/ecosystem-watch.md`](docs/ecosystem-watch.md)**. Skim quarterly,
add entries liberally, and when one of those projects ships something we
should react to, file a "Signals to react to" line in that doc and create a
Backlog entry below pointing at it. Agents doing research or strategy work
should read `docs/ecosystem-watch.md` first — it's the canonical starting
point for "what else is out there."
---
## Backlog (prioritized)
1. **Canvas: Org template import** — Phase 20.3 (deploy org from canvas UI)
2. **Canvas: Workspace search (Cmd+K)** — Phase 20.3 (quick find)
3. **Canvas: Batch operations** — Phase 20.3 (multi-select delete/restart)
4. **Sandbox: Firecracker/E2B backends** — Phase 12 (production isolation)
5. **NemoClaw adapter** — stub exists at `adapters/nemoclaw/`, no implementation yet
6. **Remote plugin registry** — install plugins from npm/git (currently local only)
7. **Agent git worktrees** — per-agent branches without full clone
8. **SDK follow-ups** — live tool-call visibility, cost telemetry, cancel UX, governance hooks
9. **Real webhook mode for channels** — Phase 27 candidate. Currently polling-only; webhook needs:
- `mode: "webhook"|"polling"` config field
- `PUBLIC_URL` env var
- Platform calls `setWebhook` on channel create (with random `webhook_secret`), `deleteWebhook` on delete
- Canvas toggle to enable webhook mode (only when PUBLIC_URL is set)
- Polling works fine for ≤hundreds of bots; webhook needed at thousands+ scale or for serverless
10. **More channel adapters** — Slack (OAuth + Events API), Discord (Bot + Gateway), WhatsApp (Cloud API)
11. **Delegations list endpoint mismatch**#64. `GET /workspaces/:id/delegations` returns `[]` while the agent's internal `check_delegation_status` shows active/completed delegations. One source of truth.
12. **YAML-configurable per-agent repo access**#65. New `workspace_access: none|read_only|read_write` field in `org.yaml` + `:ro` bind-mount for research agents; eliminates the "PM couriers documents to reports" workaround.
13. **SDK executor swallows subprocess stderr**#66. `workspace-template/claude_sdk_executor.py` surfaces only "Command failed with exit code 1 / Check stderr output for details" when the `claude` CLI crashes, making every failure opaque. Capture stderr, log at ERROR, include first ~1 KB in the A2A error response. **High priority** — blocked real debugging during PLAN.md coordination on 2026-04-12.
14. **Agent MCP client defaults to `localhost:8080`**#67. Inside a workspace container, `localhost` is the container itself, not the platform — so `mcp__molecule__*` tools fail with "platform unreachable." Inject `MOLECULE_URL=${PLATFORM_URL}` into every container at provision time and change the MCP client default to `http://host.docker.internal:8080`. **High priority** — blocks agents from calling platform tools (e.g. PM couldn't restart its own reports).
---
## Test Coverage
| Stack | Tests | Framework |
|-------|-------|-----------|
| Go (platform) | 476 | `go test -race` |
| Python (workspace) | 1,040 | pytest |
| Canvas (frontend) | 352 | Vitest |
| SDK (python) | 87 | pytest |
| **Total** | **1,955** | |
E2E: 68/68 comprehensive checks passing, 62 API tests.
---
## Team Assignments
| Agent | Current Focus |
|-------|--------------|
| PM | Sprint coordination, backlog prioritization |
| Dev Lead | Engineering planning, PR review |
| UIUX Designer | UX specs for Phase 20 (DONE — 5 specs delivered) |
| Frontend Engineer | Phase 20.3 remaining items (org import, search, batch) |
| Backend Engineer | Sandbox production backends, API completeness |
| QA Engineer | **Review every PR for docs + plan compliance** |
| DevOps Engineer | CI/CD, Docker image optimization |
| Security Auditor | API key handling, path traversal, auth review |
---
## Next Steps
1. Frontend Engineer implements remaining Phase 20.3 items (org import from canvas, Cmd+K search)
2. Backend Engineer scopes Firecracker/E2B sandbox backends (Phase 12)
3. QA Engineer reviews PR #52 for docs compliance before merge
4. All agents use `GITHUB_TOKEN` env var to clone repo, branch, and create PRs
---
## Future Work — Plugin Adaptor System
Landed (see `feat/plugin-adaptor-registry` and `feat/agentskills-compliance`):
per-runtime plugin adaptors, hybrid resolver (registry > plugin-shipped >
raw-drop), `AgentskillsAdaptor` covering rule+skill plugins for all
runtimes, `/plugins?runtime=` filter, `/workspaces/:id/plugins/available`
endpoint, `molecule-plugin` SDK, gemini org parity with molecule-dev,
and **full agentskills.io spec compliance** for all first-party skills
(installable in Claude Code, Cursor, Codex, and ~35 other skill-compatible
tools — see `docs/plugins/agentskills-compat.md`).
Deferred, not blocking:
- **Upstream `runtime-adapters/` extension to agentskills.io spec** —
once we've lived with our own per-runtime adapter model for ~month,
propose it as a spec extension to `agentskills/agentskills` so other
tools can share Molecule AI-authored adaptors.
- **Install-from-GitHub-URL flow** — `POST /plugins/install {git_url}` that
clones a repo into the registry, validates the manifest, and runs the
adaptor through a sandbox. Needs signature/version pinning and a review
of the adaptor-execution threat model before shipping.
- **Promote-to-default UI** — today, promoting a community plugin to
"curated" means manually copying its `adapters/<runtime>.py` into
`workspace-template/plugins_registry/<plugin>/`. Later add a canvas
button + PR template that opens an upstream PR automatically.
- **Plugin packs** — manifest that lists other plugins to bundle
(`superpowers-pack` → install `superpowers-tdd` + `superpowers-debug` + …).
Skip until a real user asks; first-party plugins are small enough to
install individually today.
- **Hot-reload on DeepAgents** — upstream docs say skills/sub-agents are
startup-only; would need platform-level container restart on plugin
file change. Defer until users complain.
- **Atomic split of first-party plugins** — `superpowers` and `ecc` still
ship as multi-skill bundles. Pipeline already supports splitting but
non-urgent.
- **Sub-agent plugins for non-DeepAgents runtimes** — Claude Code /
LangGraph don't have a native sub-agent feature; emulating via
tool-routing is possible but invasive. Defer.
- **Workspace install tracking table** — a `workspace_plugin_installs`
table would let uninstall call the adaptor's `uninstall()` path
reliably. Today uninstall is a `rm -rf /configs/plugins/<name>` which
leaves copied skill dirs behind. Low user impact.
- **Shared org-template `system-prompt.md` via `_shared/`** — DRY molecule-dev
and molecule-worker-gemini. Drift risk; revisit at 3+ orgs.