diff --git a/docs/architecture.md b/docs/architecture.md deleted file mode 100644 index c29507c3..00000000 --- a/docs/architecture.md +++ /dev/null @@ -1,220 +0,0 @@ -# Architecture Overview - -Molecule AI is a distributed platform for orchestrating AI agent teams. Three components form the core system, connected by HTTP, WebSocket, and JSON-RPC protocols. - -## System Components - -``` -Browser ──HTTP/WS──> Canvas (Next.js :3000) - │ - HTTP + WS - │ - Platform (Go :8080) - ┌──┴──┐ - Postgres Redis - │ - Docker API - │ - ┌─────────┼─────────┐ - Agent-1 Agent-2 Agent-N - (Python) (Python) (Python) - └──A2A JSON-RPC 2.0──┘ -``` - -### Canvas (Next.js 15 + React Flow) - -The browser-based visual UI. Built with `@xyflow/react` v12, Zustand for state, and Tailwind CSS. - -- Renders workspaces as draggable nodes on a canvas -- Connects to Platform via REST (`http://localhost:8080`) and WebSocket (`ws://localhost:8080/ws`) -- Sends user messages to agents through the Platform's A2A proxy -- Receives real-time updates via WebSocket events (status changes, agent messages, A2A responses) - -Source: `canvas/` - -### Platform (Go / Gin) - -The control plane. Manages workspace lifecycle, provisions containers, proxies A2A communication, and broadcasts events. - -Key responsibilities: -- **Workspace CRUD** -- create, list, update, delete workspaces -- **Container provisioning** -- starts Docker containers for each workspace agent, injects secrets as env vars -- **A2A proxy** -- forwards JSON-RPC requests from canvas to workspace agents, avoiding CORS/Docker network issues -- **Registry** -- agents self-register on startup, send heartbeats, update their AgentCard -- **Discovery** -- workspaces discover peers via hierarchy-based access control rules -- **WebSocket hub** -- broadcasts events to canvas clients (all events) and workspace clients (filtered by access) -- **Secrets management** -- global (`/settings/secrets`) + workspace-level encrypted secrets (AES-256-GCM) with inheritance (workspace overrides global) -- **Liveness monitoring** -- 3-layer health detection: passive (Redis TTL), proactive (Docker health sweep), reactive (A2A proxy check) - -Source: `workspace-server/` - -### Workspace Runtime (Python) - -The execution engine for individual agents. Each workspace runs in its own Docker container. - -- Loads config from `/configs/config.yaml` -- Discovers the appropriate adapter (LangGraph, Claude Code, etc.) -- Wraps the agent in an A2A server (using `a2a-sdk`) -- Self-registers with Platform on startup (`POST /registry/register`) -- Sends periodic heartbeats (`POST /registry/heartbeat`) -- Communicates with other workspaces via A2A JSON-RPC 2.0 - -Source: `workspace/` - -## Message Flow - -### User sends a message to an agent - -``` -1. User types in ChatTab -2. Canvas sends POST /workspaces/:id/a2a with JSON-RPC body -3. Platform resolves workspace URL (cache or DB) -4. Platform wraps body in JSON-RPC 2.0 envelope if needed -5. Platform forwards to agent container (5-min timeout for canvas, 30-min for agent-to-agent) -6. Agent processes via LangGraph/adapter, returns JSON-RPC response -7. Platform broadcasts A2A_RESPONSE via WebSocket (canvas-initiated requests only) -8. Platform logs activity asynchronously -9. Canvas receives A2A_RESPONSE event, extracts text, displays in ChatTab -``` - -### Agent-to-agent delegation - -``` -1. Agent A calls message/send targeting Agent B -2. Request goes through Platform A2A proxy (POST /workspaces/:id/a2a with X-Workspace-ID header) -3. Platform verifies access via CanCommunicate(callerID, targetID) -4. Platform forwards to Agent B's container (30-min timeout) -5. Agent B responds, Platform returns response to Agent A -6. Activity logged for both workspaces -``` - -## Core Concepts - -### Workspace - -The fundamental unit. A workspace represents an organizational **role** (not a task). Each workspace: -- Has a unique UUID, name, role description, and tier (1-4) -- Runs in its own Docker container -- Exposes a single A2A endpoint -- Can be expanded into a sub-team (Team Lead + children) -- Has a lifecycle: `provisioning` -> `online` -> `degraded` -> `offline` -> `removed` - -### Agent Card - -An A2A protocol discovery document. Each workspace agent publishes an AgentCard containing: -- Name, description, version -- URL endpoint -- Capabilities (streaming, push notifications) -- Skills (id, name, description, tags, examples) -- Supported input/output modes - -Updated via `POST /registry/update-card` and broadcast as `AGENT_CARD_UPDATED`. - -### A2A Protocol (Agent-to-Agent) - -Industry-standard JSON-RPC 2.0 protocol for agent communication: -- `message/send` -- synchronous request/response -- `message/stream` -- SSE streaming variant -- `tasks/get` -- poll async task status - -All agent-to-agent traffic flows through the Platform A2A proxy for access control and observability. - -### Hierarchy & Access Control - -The organizational structure IS the network topology. `CanCommunicate(callerID, targetID)` rules: -- Same workspace: allowed -- Parent <-> child: allowed -- Siblings (same parent_id): allowed -- Root-level workspaces (both parent_id IS NULL): allowed -- Everything else: denied - -### Team Expansion (Fractal Architecture) - -Any workspace can recursively expand into a sub-team. From the outside, it still exposes a single A2A endpoint. Inside, a Team Lead coordinates child agents. - -``` -Before: After expand: -┌──────────┐ ┌──────────────────────┐ -│ Marketing│ │ Marketing (Team Lead)│ -│ │ ──expand──> │ ├─ SEO Agent │ -│ │ │ ├─ Content Writer │ -│ │ │ └─ Analytics Agent │ -└──────────┘ └──────────────────────┘ -``` - -- `POST /workspaces/:id/expand` provisions child workspaces from config -- `POST /workspaces/:id/collapse` removes children, reverting to single workspace -- Children are auto-wired: Team Lead ↔ children can communicate, children are siblings -- On the canvas, children render as chips inside the parent node - -### Tiered Security - -| Tier | Name | Isolation | -|------|------|-----------| -| 1 | Sandboxed | Read-only root FS, tmpfs /tmp, no /workspace mount | -| 2 | Standard | 512 MiB memory, 1.0 CPU limit | -| 3 | Privileged | Privileged mode, host PID, Docker network | -| 4 | Full Access | Privileged, host PID, host network, Docker socket | - -## Database (PostgreSQL) - -Key tables: - -| Table | Purpose | -|-------|---------| -| `workspaces` | Core entity: id, name, role, tier, status, url, parent_id, agent_card (JSONB), heartbeat timestamps | -| `workspace_secrets` | Per-workspace encrypted secrets (AES-256-GCM). UNIQUE(workspace_id, key) | -| `global_secrets` | Platform-wide secrets. Workspace secrets with same key override globals | -| `activity_logs` | A2A communication logs: source, target, method, request/response bodies, duration, status | -| `agent_memories` | Hierarchical Memory Architecture: LOCAL, TEAM, GLOBAL scoped memories | -| `structure_events` | Append-only event log (WORKSPACE_ONLINE, AGENT_CARD_UPDATED, etc.) | -| `workspace_config` | Arbitrary JSONB config per workspace | -| `workspace_memory` | Key-value store with optional TTL per workspace | -| `canvas_layouts` | Node x/y positions on the canvas | - -Migrations: `workspace-server/migrations/` (12 files, auto-applied on startup). - -## Directory Structure - -``` -molecule/ -├── canvas/ # Frontend (Next.js 15) -│ └── src/ -│ ├── app/ # Next.js app router pages -│ ├── components/ # React components (tabs/, workspace-node) -│ ├── store/ # Zustand stores (canvas, socket, events) -│ ├── hooks/ # Custom React hooks -│ └── lib/ # Utilities -├── workspace-server/ # Backend (Go / Gin) -│ ├── cmd/server/main.go # Entry point -│ ├── cmd/cli/ # molecli TUI dashboard -│ ├── internal/ -│ │ ├── handlers/ # 24 HTTP handler files -│ │ ├── ws/ # WebSocket hub + client management -│ │ ├── events/ # Broadcaster (WS + Redis pub/sub) -│ │ ├── db/ # PostgreSQL + Redis connections -│ │ ├── provisioner/ # Docker container lifecycle -│ │ ├── registry/ # Liveness, health sweep, access rules -│ │ ├── crypto/ # AES-256-GCM encryption -│ │ └── models/ # Data types -│ └── migrations/ # 12 SQL migration files -├── workspace/ # Agent Runtime (Python) -│ ├── main.py # Entry point -│ ├── a2a_executor.py # A2A request handler -│ ├── config.py # YAML config loader -│ ├── heartbeat.py # Platform heartbeat loop -│ ├── adapters/ # Runtime backends (langgraph, claude-code, ...) -│ └── tools/ # Agent tools (delegation, sandbox, ...) -├── docker-compose.yml # Full stack -└── docker-compose.infra.yml # Infrastructure only (dev) -``` - -## Supporting Infrastructure - -| Service | Image | Purpose | -|---------|-------|---------| -| PostgreSQL 16 | `postgres:16-alpine` | Primary database | -| Redis 7 | `redis:7-alpine` | URL caching, pub/sub, TTL-based liveness | -| Langfuse | `langfuse/langfuse:2` + ClickHouse | LLM call tracing and observability | -| LiteLLM (optional) | `ghcr.io/berriai/litellm` | Unified multi-provider LLM routing | -| Ollama (optional) | `ollama/ollama` | Local model inference | diff --git a/docs/demo/fractal-expansion-script.md b/docs/demo/fractal-expansion-script.md deleted file mode 100644 index c43f4ee6..00000000 --- a/docs/demo/fractal-expansion-script.md +++ /dev/null @@ -1,237 +0,0 @@ -# Fractal Expansion Demo — Recording Script - -This document specifies the exact steps, canvas state, and UI interactions -to record the hero GIF used in the README and marketing materials. - -## Output Spec - -| Property | Value | -|---|---| -| Format | `.gif` (or `.webm` converted to gif) | -| Resolution | `800 × 500 px` (2× for retina: record at 1600×1000, export at 800×500) | -| Frame rate | 20 fps | -| Max file size | 5 MB (use `gifsicle -O3` or `gifski` to compress) | -| Duration | ~12 seconds | -| Loop | Infinite | -| Alt text (for README) | `Molecule AI fractal expansion: a single Engineering Lead node expands into a Frontend Dev, Backend Dev, and QA sub-team, then an A2A task arrives and escalates to human approval` | - ---- - -## Recording Tool Setup - -**Recommended:** [Kap](https://getkap.co/) (macOS) or [LICEcap](https://www.cockos.com/licecap/) (Windows/macOS) - -For highest quality, use [ScreenToGif](https://www.screentogif.com/) (Windows) or -record with QuickTime → convert with `ffmpeg` + `gifski`: - -```bash -# Convert QuickTime .mov → high-quality GIF -ffmpeg -i recording.mov -vf "fps=20,scale=800:-1:flags=lanczos" frames/frame%04d.png -gifski --fps 20 --width 800 -o fractal-expansion.gif frames/*.png -``` - ---- - -## Pre-Recording Setup - -### 1. Environment -- Run `docker compose up` and wait for all services to be healthy. -- Open Chrome at `http://localhost:3000`. -- Set browser zoom to 100%. -- Open DevTools → Network tab → enable "Slow 3G" throttling OFF (use full speed). -- Hide bookmarks bar for a clean capture area. - -### 2. Canvas State (before recording) -Ensure the canvas has **exactly one workspace node** visible: - -``` -┌─────────────────────────────────────┐ -│ │ -│ │ -│ ┌──────────────────────┐ │ -│ │ 🏗️ Engineering Lead │ │ -│ │ status: online │ │ -│ └──────────────────────┘ │ -│ │ -│ │ -└─────────────────────────────────────┘ -``` - -Use `setup-org.sh` to provision the org, then delete all nodes except the -Engineering Lead. Or provision a fresh single workspace via: - -```bash -curl -X POST http://localhost:8080/workspaces \ - -H "Content-Type: application/json" \ - -d '{"name":"Engineering Lead","role":"Engineering Lead","tier":1}' -``` - -### 3. Canvas Viewport -Pan and zoom so the Engineering Lead node is centered, slightly left of centre, -with breathing room above for the sub-nodes to appear during expansion. -Save viewport: `PUT /canvas/viewport`. - ---- - -## Scene-by-Scene Script - -### Scene 1 — Establishing Shot (0:00 – 1:00) - -**Duration:** ~1 second (pause on static canvas) - -**Canvas state:** -- Single "Engineering Lead" node, `status: online` (green dot) -- Node is centred on canvas - -**Narrator voiceover / caption (optional):** -> "One node. One role." - ---- - -### Scene 2 — Right-click → Expand to Team (1:00 – 2:00) - -**Duration:** ~1 second - -**Action:** -1. Move mouse smoothly to the Engineering Lead node (no jitter — use a mouse - recording tool with smoothing enabled) -2. Right-click the node to open the context menu -3. The context menu appears with options including **"Expand to Team"** -4. Hover over "Expand to Team" — it highlights - -**UI detail:** -The context menu is rendered by `WorkspaceContextMenu` in the canvas. -"Expand to Team" appears as the second item below "View Details". - ---- - -### Scene 3 — Sub-nodes Materialise (2:00 – 5:00) - -**Duration:** ~3 seconds - -**Action:** -1. Click **"Expand to Team"** -2. The platform calls `POST /workspaces/:id/expand` -3. Three child nodes animate into view — they should slide in from the - Engineering Lead node with the default React Flow `fade` transition: - -``` - ┌──────────────────────┐ - │ 🏗️ Engineering Lead │ - │ status: online │ - └──────────┬───────────┘ - │ - ┌──────────┼──────────┐ - │ │ │ - ┌──────┴──┐ ┌────┴────┐ ┌──┴──────┐ - │Frontend │ │Backend │ │ QA │ - │ Dev │ │ Dev │ │Engineer │ - └─────────┘ └─────────┘ └─────────┘ -``` - -**Timing note:** If provisioning takes > 3 seconds in your recording, set the -workspace tier to 1 (no Docker pull needed) and pre-build the workspace image -(`docker build -t workspace-template:latest workspace/`). - ---- - -### Scene 4 — Nodes Come Online (5:00 – 7:00) - -**Duration:** ~2 seconds - -**Action:** -- All three child nodes transition from `provisioning` (grey dot) → `online` (green dot) -- The Engineering Lead's tier badge updates to show "Team Lead" indicator -- WebSocket events trigger the canvas update in real time (no manual refresh) - -**Visual target:** -All four nodes showing green online dots. - ---- - -### Scene 5 — A2A Task Arrives (7:00 – 9:00) - -**Duration:** ~2 seconds - -**Action:** -1. Using the MCP server or `curl`, send an A2A message to the Engineering Lead: - -```bash -curl -X POST http://localhost:8080/workspaces//a2a \ - -H "Content-Type: application/json" \ - -d '{ - "method": "message/send", - "params": { - "message": { - "role": "user", - "parts": [{"kind": "text", "text": "Ship the login feature by Friday"}] - } - } - }' -``` - -2. An amber **"current task"** banner appears on the Engineering Lead node: - > *"Decomposing: Ship login feature"* - ---- - -### Scene 6 — Approval Escalation (9:00 – 12:00) - -**Duration:** ~3 seconds - -**Action:** -1. The Engineering Lead detects a high-risk action (e.g., "deploy to production") - and creates an approval request via `POST /workspaces/:id/approvals` -2. An **approval card** animates into view at the top of the canvas: - ``` - ┌─────────────────────────────────────────────┐ - │ ⚠️ Approval Required │ - │ "Deploy login service to production?" │ - │ Requested by: Engineering Lead │ - │ [Approve] [Deny] │ - └─────────────────────────────────────────────┘ - ``` -3. End on this frame — the GIF loops back to Scene 1 (static canvas) - -**How to trigger an approval for the demo:** -```bash -curl -X POST http://localhost:8080/workspaces//approvals \ - -H "Content-Type: application/json" \ - -d '{ - "question": "Deploy login service to production?", - "context": "The login feature is complete and passes all tests.", - "risk_level": "high" - }' -``` - ---- - -## Post-Processing - -```bash -# Compress with gifsicle (reduces file size 40-60%) -gifsicle -O3 --lossy=80 fractal-expansion-raw.gif -o fractal-expansion.gif - -# Verify file size -ls -lh fractal-expansion.gif # target: < 5 MB - -# Save to: -cp fractal-expansion.gif docs/demo/fractal-expansion.gif -``` - -Then update `README.md` — replace the placeholder comment with: -```markdown -![Molecule AI fractal expansion demo](./docs/demo/fractal-expansion.gif) -``` - ---- - -## Checklist Before Publishing - -- [ ] GIF file size < 5 MB -- [ ] Resolution exactly 800 × 500 px -- [ ] All nodes visible and labelled -- [ ] No personally identifiable info in terminal/UI -- [ ] Alt text set in README img tag -- [ ] Loops cleanly (last frame matches first frame visually) -- [ ] Tested in GitHub README preview (GitHub caps animated GIFs at 10 MB) diff --git a/docs/ecosystem-research-outcomes.md b/docs/ecosystem-research-outcomes.md deleted file mode 100644 index 58df1a49..00000000 --- a/docs/ecosystem-research-outcomes.md +++ /dev/null @@ -1,344 +0,0 @@ -# Ecosystem Research Outcomes - -**Input:** [`docs/ecosystem-watch.md`](./ecosystem-watch.md) — Holaboss -(`holaboss-ai/holaboss-ai`), Hermes Agent (`NousResearch/hermes-agent`), -gstack (`garrytan/gstack`). - -**Method:** Molecule AI-dev team coordination — PM fan-out to Research Lead -(3 analysts) and Dev Lead (6 engineers). Full research corpus archived -under `/tmp/eco_research/` during synthesis; raw outputs are what the -team actually said. Cross-referenced against real repo files before -listing any file path in this doc. - -**Date:** 2026-04-12 - ---- - -## Top-5 platform improvements (prioritized) - -Ranking is by convergence across team members + impact for the hours -spent. All effort tags are S (≤1 day), M (1–3 days), L (≥1 week). - -### 1. Memory: Postgres FTS + namespace scoping — **S, high impact** - -Replace the `content ILIKE '%q%'` sequential scan in -`workspace-server/internal/handlers/memories.go:Search` with a `tsvector` -generated column, GIN index, and `ts_rank` ordering. Add a -`namespace VARCHAR(50) DEFAULT 'general'` column plus the -`(workspace_id, namespace)` composite index. Ship as migration -`workspace-server/migrations/017_memories_fts_namespace.sql`. Purely -additive — old rows get `namespace = 'general'`, new query params -(`?q=`, `?namespace=`) are optional, no breaking change. - -Converged across Backend, QA, Frontend, UIUX — everyone proposed -some flavour of this. Combines Hermes's FTS5 recall pattern with -Holaboss's `knowledge/{facts,procedures,blockers,reference}/` -namespace model. Canvas can render namespace-grouped accordions -against the same endpoint with zero backend changes after day one. - -**Ecosystem citations:** Hermes — "FTS5 + LLM-summarization for -cross-session recall — cheap, no vector-store overhead"; Holaboss — -filesystem-as-memory hierarchy. - -### 2. Workspace hibernation: idle watchdog + auto-pause — **M, DevOps win** - -DevOps Engineer's proposal: add a `_idle_watchdog` background job -in `workspace/entrypoint.sh` that reads `/tmp/.last_activity` -(written by `main.py` on each A2A request) and calls the existing -`POST /workspaces/:id/pause` after `IDLE_SHUTDOWN_MINUTES` (default -30). Platform's existing liveness monitor handles resume on next task; -no new Go code required — this is a shell + one `main.py` line + an -env var. Enables Hermes-style serverless-ish behaviour for agents -that only wake for cron audits (Security Auditor, QA Engineer). -Pairs naturally with Proposal 3 below. - -**Ecosystem citation:** Hermes — Daytona / Modal serverless backends -with hibernation. - -### 3. Parallel adapter builds — **S, QoL** - -`workspace/build-all.sh` builds the 6 adapter images -sequentially (~15 min wall-clock). They all `FROM -workspace-template:base` with no inter-adapter dependency — swap the -Step 3 loop for background jobs + `wait`, log each build to -`/tmp/build_.log`. Cuts total build time to ~5–7 minutes. -Prerequisite for hibernate/wake feeling snappy (Proposal 2). - -### 4. Plugin manifest: permissions + version floor + config schema — **S, spec-alignment** - -Extend `pluginInfo` in `workspace-server/internal/handlers/plugins.go` -with `permissions []string` (e.g. `env:GITHUB_TOKEN`, -`path:/workspace/repo`, `docker:CAP`), `min_platform_version` -(semver floor enforced at install time when `PLATFORM_VERSION` -env is set), and `config_schema json.RawMessage` (stored raw so -canvas can render a form without re-parsing). All three are -additive — missing keys unmarshal to zero values. Positions -Molecule AI ahead of the agentskills.io spec picking up -permissions semantics, and mirrors Holaboss's -`workspace.yaml`-forces-prompts-into-AGENTS.md principle -(config stays machine-readable). - -**Ecosystem citations:** Hermes — "if `agentskills.io` spec picks -up mass adoption → align our plugin manifest"; Holaboss — -`workspace.yaml` rejects inline prompt bodies. - -### 5. Fail-secure encryption at boot — **S, security critical** - -Security Auditor's top proposal. Today `SECRETS_ENCRYPTION_KEY` -is optional — when unset, the platform boots and silently falls -back to storing secrets in plaintext. Flip to fail-secure: if the -binary is built with `go build -tags prod` (or `MOLECULE_ENV=prod` -is set), refuse to start without a 32-byte key and log a loud -abort. Dev builds retain the current fallback with a startup -warning. Small, surgical change in `workspace-server/internal/crypto/aes.go` -+ `cmd/server/main.go` init; unit test already exists to verify -encryption path. - -**Ecosystem citation:** gstack CSO persona — OWASP A02:2021 -(cryptographic failures), STRIDE "Tampering / Information -Disclosure." - ---- - -## Per-agent improvement proposals - -Each of the 9 team members produced 2–3 concrete proposals mapped to -real file paths. Summary here; full proposals live in the captured -research (happy to expand any). Proposals adopted into Top-5 above are -marked ✅. - -### Market Analyst (Holaboss axis, 3,164 chars) - -- ✅ Structured filesystem memory layer (→ Top-5 #1) -- Compaction-boundary artifact for long-horizon single-agent mode — - **defer** (we're multi-agent; only useful if we add a persistent - PM-only mode). -- Section-based prompt assembly with per-section cache fingerprints — - **consider** once Claude-SDK prompt-caching becomes a cost line item. - -### Technical Researcher (Hermes axis, 3,874 chars) - -- ✅ Nudge-to-persist memory pattern → exposed in UIUX Proposal 2 below. -- ✅ FTS5 recall (→ Top-5 #1). -- ✅ Daytona/Modal-style hibernation (→ Top-5 #2). -- Honcho dialectic user-modelling backend — **evaluate for PM role - only**; too invasive to bolt onto every workspace. -- `hermes claw migrate` graceful-import pattern — **add to backlog** - if we ever deprecate a runtime adapter. - -### Competitive Intelligence (gstack axis, 2,974 chars) - -- Weekly Retro Synthesis command (`/retro`) — **CEO-side skill, see - below**. -- `/freeze`, `/guard`, `/unfreeze` architectural guardrails — see - QA proposal 3. -- Lift CSO (OWASP + STRIDE) and Designer (AI-slop detection) role - prompts into our Security Auditor and UIUX system-prompts as - attributed additions — **S effort, high leverage**. - -### Frontend Engineer (6,154 chars) - -1. **Namespaced Memory Browser** — `canvas/src/components/tabs/MemoryTab.tsx` - parses `namespace:key` naming into grouped accordions. Zero backend - change for MVP; converges with BE Top-5 #1. -2. **"Save as memory" nudge in ActivityTab** — - `canvas/src/components/tabs/ActivityTab.tsx` renders an inline - "Save as memory →" link on `task_complete` and `skill_promo` - events; clicks pre-populate the MemoryTab add form. Hermes - closed-learning-loop pattern. -3. **[3rd proposal in full output]** — available on request. - -### Backend Engineer (12,938 chars, the longest output) - -1. ✅ Memory FTS + namespace (→ Top-5 #1) -2. ✅ Plugin manifest extension (→ Top-5 #4) -3. Schedule import/export via bundle system — **M**; currently - `workspace_schedules` rows are orphaned on `bundles/export`. Small - handler change in `workspace-server/internal/handlers/bundle.go`. - -### DevOps Engineer (6,761 chars) - -1. ✅ Idle watchdog auto-pause (→ Top-5 #2) -2. ✅ Parallel adapter builds (→ Top-5 #3) -3. Per-adapter CI stages (build + smoke-test each image in its own - GitHub-Actions matrix job) — **M**; currently adapter images only - get built locally. - -### Security Auditor (8,875 chars, strongest deliverable) - -1. ✅ Fail-secure encryption at boot (→ Top-5 #5) -2. Remove `test:*` from production `systemCallerPrefixes` — **S**. - `workspace-server/internal/handlers/a2a_proxy.go:50` currently whitelists - the literal prefix `test:` in every environment; it's an - access-control bypass waiting to be exploited. Guard behind - `MOLECULE_ENV != prod`. -3. Plugin supply-chain hardening — mandate `plugin.yaml` presence - and reject staged trees containing executable bits (`+x`) outside - `skills/*/hook.sh`. **S**; adds a preflight in - `workspace-server/internal/plugins/localresolver.go`. - -### QA Engineer (6,395 chars) - -1. Filesystem memory namespace isolation (tests enforcing namespace - separation) — **S**, complements Top-5 #1. -2. Autonomous skill-creation loop + FTS5 recall test suite — **M**; - Hermes self-improvement pattern needs explicit coverage before - landing. -3. Freeze / Guard / Unfreeze architectural guardrail tests — **M**; - ports gstack's `/freeze` primitive as enforced test fixtures - (e.g. a `/freeze` on the auth middleware fails CI if any handler - modifies it without an override flag). - -### UIUX Designer (14,285 chars, the longest engineering output) - -1. Namespaced Memory Browser — same as FE proposal 1; the two - should be implemented as one ticket, UIUX leads, FE executes. -2. Clickable Skill Promotion Nudge on node card — surfaces Hermes's - skill-promotion pattern at the canvas-graph level. **S**. -3. Inline `initial_prompt` body warning in ConfigTab — - `canvas/src/components/tabs/ConfigTab.tsx` flags when - `initial_prompt:` has inline body text >200 chars with a - "Extract to AGENTS.md" lint-style hint. **S**; encodes the - Holaboss principle that config should stay machine-readable. - ---- - -## CEO workflow improvements - -Patterns to adopt at the **Claude Code CLI** layer (the CEO's -interface), not inside the Molecule AI platform itself. - -### New skills to add under `.claude/skills/` - -1. **`/retro`** — lifted directly from gstack. Generates a weekly - retrospective by reading `git log --since='7 days ago' --oneline - --shortstat`, the merged PR list, the set of closed issues, and - the activity logs across the org. Outputs a markdown doc under - `docs/retros/.md`. High leverage at near-zero cost; - gstack validated the pattern at 70k⭐. -2. **`/freeze `** — sets a repo-level flag (a file under - `.claude/freezes/`) that any future code-review or edit skill - reads. When the next CEO-driven change touches a frozen path, - the edit skill blocks with a clear message. Adopted from - gstack's `/freeze` / `/guard` / `/unfreeze` trio. -3. **`/verify-refs`** — explicit helper for the "verify before - citing" discipline we encoded in the team's hardened prompts. - Takes a draft message, finds `#NNN` / `sha:hex` / `path:` refs, - runs `gh issue view`, `git log -1`, `cat` respectively, and - reports any mismatches before the CEO sends. - -### Settings / Hooks changes - -- **Pre-tool hook on `Bash` commands that match `git push origin main`** - — reject unless `FORCE_PUSH_MAIN=1` is exported. This session we - caught ourselves (and PM) about to commit to `main` twice. A - hook makes the rule programmatic. -- **Status line / telemetry counter** for MCP tool failures — so - PR breakage from upstream MCP-client issues (e.g. #67) surfaces - in the prompt, not only when we try to use it. - -### Process / conventions - -- When briefing PM on a fan-out task: always include the explicit - workspace IDs and instruction to **inline documents** — even - though this is now encoded in the hardened system prompts - (PR #69), the reminder at task-issue time saves a round-trip. -- Treat `Agent error (ProcessError)` as a **platform bug**, not a - transient failure. Restart the affected workspace, note the - incident in the issue tracker referencing #66 and #71 until - they land. - ---- - -## Ecosystem signals to monitor (next quarter) - -Items to watch on `docs/ecosystem-watch.md` and in the repos directly: - -- **agentskills.io spec finalisation** — if the upstream spec - locks in permissions semantics, our `plugin.yaml` should - conform on the first release day. Today's Top-5 #4 positions - us to lead rather than follow. -- **Hermes multi-agent / A2A addition** — would put us in direct - overlap on the core thesis. Signal: any Nous Research blog - post or commit matching `a2a|delegate|subagent_a2a`. -- **gstack parallel / multi-session** — if gstack ships - simultaneous Claude Code workers + routing between them, - their 70k⭐ head start converts into direct competition. - Signal: any `/multi-*` command in the `garrytan/gstack` repo - or a Garry Tan post showing it. -- **Holaboss → A2A** — Holaboss shipping workspace-to-workspace - messaging would put them in the "AI company" shape we occupy - today. Signal: a `workspace.yaml` `connections:` field or a - `holaboss a2a` subcommand. -- **Atropos RL trajectory format** — if Nous standardises the - schema for RL training-data export, our activity logs should - adopt it so users can export Molecule AI runs for training. - ---- - -## Explicit non-adoptions - -Decisions to NOT copy, with reasons, so we don't revisit them: - -- **Holaboss single-active-agent-per-workspace shape** — incompatible - with our core thesis. Keep the concept of workspace-as-container - but don't collapse to a single agent. -- **Hermes six-backend abstraction** (Docker / SSH / Daytona / - Singularity / Modal) — our Docker provisioner is the right - ceiling for v1. Serverless hibernation (Top-5 #2) buys us 80% - of the cost win without the plumbing. -- **gstack's Claude-Code-native-only execution model** — gstack is - a prompt library living inside one Claude Code session. Adopting - that shape would eliminate our multi-agent / multi-runtime - differentiation. We borrow specific role prompts, not the - architecture. -- **`workspace.yaml` banning inline prompts at the schema level** - — Holaboss rejects inline prompt bodies at parse time. We ship a - UIUX *warning* instead (UIUX proposal 3) so existing templates - keep working. The principle is right; the enforcement mechanism - is too blunt for a platform that already has shipped templates - out in the wild. -- **Compaction-boundary artifact** — solves long-horizon single-agent - cost. We are multi-agent with per-workspace checkpointing already; - this would be complexity for no direct gain. - ---- - -## Process observations (meta) - -Notes on how this coordination went that inform future runs: - -1. **`#66` (opaque stderr) and `#71` (initial_prompt replay crash) - are blocking team coordination.** Every fresh org import today - started with ProcessError cascades. Until these land, any - multi-agent research task requires host-side intervention - (touching `.initial_prompt_done`, restarting crashed workspaces). -2. **`#65` (per-agent repo-access YAML) would eliminate the - inline-documents workaround** that every Hard-Learned Rule we - just added to the prompts tells the team to do. This is the - single highest-leverage platform improvement on the list. -3. **Capturing raw analyst outputs from the activity log is a valid - fallback** when PM crashes mid-synthesis. All 9 outputs in this - doc were retrieved from `GET /workspaces/:id/activity` after the - PM/RL/DL round-trip failed. Worth surfacing in platform docs - as the "recovery" path. -4. **The hardened system prompts (PR #69) are already paying off**: - Research Lead and Dev Lead both immediately fanned out in - parallel with delegation IDs, rather than attempting solo - synthesis. The "always fan out" rule is doing work. - ---- - -## Next actions - -Recommend proceeding in this order, each as its own PR: - -1. **Ship #71 fix** (initial_prompt marker up-front) — unblocks all - future fresh org imports. -2. **Ship #66 fix** (stderr capture) — restores debuggability. -3. **Top-5 #1** (memory FTS + namespace) — highest-convergence - team ask, cleanest migration. -4. **Top-5 #5** (fail-secure encryption) — security-critical, trivial. -5. **CEO `/retro` skill** — near-zero effort, compounding weekly. - -Everything else in this doc flows from there. diff --git a/docs/ecosystem-watch.md b/docs/ecosystem-watch.md deleted file mode 100644 index 0c23e545..00000000 --- a/docs/ecosystem-watch.md +++ /dev/null @@ -1,2979 +0,0 @@ -# Ecosystem Watch - -Projects adjacent to molecule-monorepo that are worth tracking — for design -ideas to borrow, terminology collisions to be aware of, and to stay honest -about where our differentiation actually is. - -## How to use this doc - -- **Skim quarterly.** The agent-infra space moves fast; expect entries to be - stale within ~3 months. When a project on this list ships something we - should react to, add a line under "Signals to react to" for that entry - and a short plan. -- **Add entries liberally.** Easier to prune than to miss. -- **One entry per project.** Keep each under ~200 words — link out, don't duplicate. - -## Template - -````markdown -### — `org/repo` - -**Pitch:** one sentence in their words. - -**Shape:** what it actually is (language, deployment target, one-vs-many-agents, etc.) - -**Overlap with us:** where our designs touch. - -**Differentiation:** why we're not the same product. - -**Worth borrowing:** specific ideas we should study. - -**Terminology collisions:** shared words that mean different things. - -**Signals to react to:** what they might ship that would change our roadmap. - -**Last reviewed:** YYYY-MM-DD · **Stars / activity:** -```` - ---- - -## Competitor Snapshot - -> **Machine-readable index for PMM cron diffing.** One YAML entry per competitor — -> the cron diffs this block to detect version bumps, threat escalations, and new -> `notable_changes`, then updates `docs/marketing/competitors.md`. -> -> **Maintenance rule:** whenever you update a narrative entry below, also bump the -> corresponding `date`, `version`, and `notable_changes` fields here. -> -> Fields: `name` · `slug` · `date` (last reviewed) · `version` · `stars` · -> `threat_level` (high / medium / low) · `notable_changes` (≤ 2 sentences) · `source_url` - -```yaml -# competitor-snapshot -# Generated: 2026-04-17 | Maintainer: Research Lead -# PMM cron reads this block, diffs vs. previous commit, updates docs/marketing/competitors.md. -# Update date + version + notable_changes whenever a competitor ships something significant. - -snapshots: - - # ── HIGH THREAT ──────────────────────────────────────────────────────────────────── - # Direct substitutes or major market-erosion risk for Molecule AI. - - - name: Paperclip - slug: paperclip - date: "2026-04-17" - version: "v2026.416.0" - stars: "54.8k" - threat_level: medium - notable_changes: > - Downgraded HIGH → MEDIUM (2026-04-17, deep-dive #571): no A2A protocol, - no visual canvas, no org-chart UI on roadmap. Blocker dependencies are - single-process task-graph DAG, not inter-agent coordination. Execution - policies are budget ceilings, not tool restrictions. Only capability gap - vs Molecule AI is per-workspace budget limits (tracked #541). Brand/ - framing threat ("zero-human companies") but not a technical substitute. - v2026.416.0 (Apr 16) ships chat threads + execution policies. - source_url: https://github.com/paperclipai/paperclip/releases - - - name: OpenAI Agents SDK - slug: openai-agents-sdk - date: "2026-04-17" - version: "v0.14.1" - stars: "14k" - threat_level: high - notable_changes: > - v0.14.1 (Apr 15 2026) patches tracing export on top of v0.14.0's - SandboxAgent beta — persistent isolated workspaces, snapshot/resume, - and sandbox memory directly competing with our workspace lifecycle model. - source_url: https://github.com/openai/openai-agents-python/releases - - - name: OpenAI Codex Agent - slug: openai-codex-agent - date: "2026-04-17" - version: "2026-04-17-launch" - stars: "N/A" - threat_level: high - notable_changes: > - Relaunched Apr 17 2026 as a full autonomous agent product (HN #2, 769 pts): - parallel subagent orchestration, cross-session project memory, autonomous - self-wake scheduling, macOS computer control, inline image generation — - distinct threat surface from openai-agents-sdk; directly overlaps our - workspace lifecycle, agent_memories, and workspace_schedules. - source_url: https://openai.com/index/codex-for-almost-everything/ - - - name: CrewAI - slug: crewai - date: "2026-04-17" - version: "v1.14.1" - stars: "48k" - threat_level: high - notable_changes: > - Deep-dive 2026-04-17: Crew Studio is a real node-and-edge drag-and-drop - canvas (workflow design paradigm, not governance — no org hierarchy, no - auth audit trail). AMP Factory self-hosted confirmed: on-prem/private VPC, - K8s, FedRAMP High certified. A2A spec v0.3.0 first-class (client+server, - matches Molecule AI a2a-sdk==0.3.25) — zero-shim interop confirmed; - CrewAI agents recruitable as Molecule AI workers today. v1.0.0 migration - (Mar 2026 spec) not yet adopted by either side — shared upgrade clock. - ICP unchanged: moat is governance-layer canvas (#582), not visual canvas - alone. File FedRAMP gap as enterprise procurement tracking issue. - source_url: https://github.com/crewAIInc/crewAI/releases - - - name: Google ADK - slug: google-adk - date: "2026-04-17" - version: "v1.30.0" - stars: "19k" - threat_level: high - notable_changes: > - v1.30.0 (Apr 13 2026) adds Auth Provider support to the agent registry, - Parameter Manager integration, and Gemma 4 model support; v2.0.0a3 - pre-release introduces a graph-based execution engine. - source_url: https://github.com/google/adk-python/releases - - - name: Microsoft Agent Framework - slug: microsoft-agent-framework - date: "2026-04-17" - version: "python-1.0.1" - stars: "9.5k" - threat_level: high - notable_changes: > - v1.0 GA (Apr 7 2026): multi-agent orchestration (sequential, concurrent, - group-chat, handoff, magnetic patterns), native A2A+MCP, OpenTelemetry, - pause/resume durability, HITL approvals. AG-UI protocol for SSE-streaming - agent events to frontends — direct competitor to our WebSocket canvas. - Process Framework GA planned Q2 2026. Molecule gap: AG-UI SSE endpoint, - tool governance registry, cost transparency per workspace. - source_url: https://github.com/microsoft/agent-framework/releases - - # ── MEDIUM THREAT ────────────────────────────────────────────────────────────────── - # Significant overlap in adjacent space; no direct substitution risk today. - - - name: Dify - slug: dify - date: "2026-04-17" - version: "v1.13.3" - stars: "60k" - threat_level: medium - notable_changes: > - Latest stable is v1.13.3 (Mar 27 2026); v1.14.0 RC adds Human Input - node (HITL); raised $30M Pre-A (Mar 2026, $180M valuation) with - 280 enterprise deployments — no-code positioning targets business users, - not our developer audience. - source_url: https://github.com/langgenius/dify/releases - - - name: LangGraph - slug: langgraph - date: "2026-04-17" - version: "v1.1.6" - stars: "29k" - threat_level: medium - notable_changes: > - langgraph-cli v0.4.22 (Apr 16 2026) adds deploy source tracking; - core v1.1.6 (Apr 10 2026) ships LangGraph 2.0 declarative guardrail nodes; - LangGraph Cloud hosted execution competes with our scheduler. - source_url: https://github.com/langchain-ai/langgraph/releases - - - name: VoltAgent - slug: voltagent - date: "2026-04-17" - version: "server-elysia@2.0.7" - stars: "8.2k" - threat_level: medium - notable_changes: > - @voltagent/server-elysia v2.0.7 (Apr 11 2026) fixes A2A agent card - endpoints to advertise correct absolute URLs; VoltOps Console is the - closest Canvas analogue in the TypeScript ecosystem. - source_url: https://github.com/VoltAgent/voltagent/releases - - - name: n8n - slug: n8n - date: "2026-04-17" - version: "v2.17.2" - stars: "50k" - threat_level: medium - notable_changes: > - v2.17.2 (Apr 16 2026) improves AI Gateway credentials endpoint; - n8n 2.0 (Dec 2025) added enterprise-grade AI Agent nodes, RBAC, SSO, - and 400+ channel integrations — direct overlap with our workspace_channels. - source_url: https://github.com/n8n-io/n8n/releases - - - name: Claude Code Routines - slug: claude-code-routines - date: "2026-04-17" - version: "cloud-feature" - stars: "n/a" - threat_level: medium - notable_changes: > - Launched Apr 14 2026 (research preview): Anthropic-hosted cron + GitHub- - event-triggered Claude Code sessions running on Anthropic cloud; competes - with our workspace_schedules; single-model, no org canvas. - source_url: https://code.claude.com/docs/en/routines - - - name: Scion - slug: scion - date: "2026-04-17" - version: "active" - stars: "early" - threat_level: medium - notable_changes: > - Launched Apr 8 2026 — GCP experimental container-per-agent harness for - Claude Code/Gemini CLI with parallel isolated workspaces and markdown - workflow definitions; escalation risk to HIGH if productized by Google. - source_url: https://github.com/GoogleCloudPlatform/scion - - - name: Multica - slug: multica - date: "2026-04-17" - version: "active-36-releases" - stars: "12.8k" - threat_level: medium - notable_changes: > - Positioned as open-source Claude Managed Agents alternative (Apr 2026); - local daemon + central backend with pgvector semantic skill compounding; - +1,503 stars/day at launch — no A2A or org canvas but similar architecture. - source_url: https://github.com/multica-ai/multica/releases - - - name: Cline - slug: cline - date: "2026-04-17" - version: "active" - stars: "44k" - threat_level: medium - notable_changes: > - VS Code Claude Code extension with 44k ⭐ and MCP support; primary user - overlap with our Claude Code workspace — developers who outgrow Cline's - single-session model are our conversion path. - source_url: https://github.com/cline/cline/releases - - - name: ClawRun - slug: clawrun - date: "2026-04-17" - version: "active-45-releases" - stars: "84" - threat_level: medium - notable_changes: > - Closest architectural match tracked — sandbox/heartbeat/snapshot-resume/ - channels/cost-tracking feature parity with us; 84 ⭐ but 45 releases - shows active shipping; adding A2A would make this a direct lightweight - competitor. - source_url: https://github.com/clawrun-sh/clawrun/releases - - - name: Gemini CLI - slug: gemini-cli - date: "2026-04-17" - version: "v0.38.1" - stars: "101k" - threat_level: medium - notable_changes: > - v0.38.1 (Apr 15 2026) is a cherry-pick stability patch; 1M-token context - + MCP support; runtime candidate for our workspace adapter — elevated to - MEDIUM because it forms a full agent stack with Google ADK + adk-web. - source_url: https://github.com/google-gemini/gemini-cli/releases - - - name: opencode - slug: opencode - date: "2026-04-17" - version: "v1.4.7" - stars: "145k" - threat_level: medium - notable_changes: > - v1.4.7 (Apr 16 2026); 145k★ open-source provider-agnostic coding agent - (Claude/OpenAI/Google/local); build+plan dual-mode; no A2A, no multi-agent. - Largest open-source coding agent by stars; users outgrowing single-agent - model are direct Molecule conversion path. Evaluate as workspace template - adapter (GH #720). Escalate to HIGH if A2A or multi-agent coordination added. - source_url: https://github.com/anomalyco/opencode/releases - - - name: Qwen3.6-35B-A3B - slug: qwen3-6-agentic - date: "2026-04-17" - version: "3.6-35B-A3B" - stars: "N/A" - threat_level: medium - notable_changes: > - Launched Apr 17 2026 (HN #1, 984 pts): open-weight MoE model (35B total, - 3B active/token) purpose-built for agentic coding loops; frictionless - self-hosted adoption commoditizes the model layer for multi-agent stacks; - erodes API-cost moat for cloud-dependent competitors; watch VoltAgent + - Paperclip BYO-model builds for first-mover Qwen3.6 integration. - source_url: https://qwen.ai/blog?id=qwen3.6-35b-a3b - - # ── LOW THREAT ───────────────────────────────────────────────────────────────────── - # Tools, infra layers, single-agent tools, or products we use — not substitutes. - - - name: Hermes Agent - slug: hermes-agent - date: "2026-04-17" - version: "v0.10.0" - stars: "61k" - threat_level: low - notable_changes: > - v0.10.0 (Apr 16 2026) launches Tool Gateway giving paid Portal subscribers - built-in web search, image generation, TTS, and browser automation; no - multi-agent or org hierarchy — personal AI shape, not platform competitor. - source_url: https://github.com/NousResearch/hermes-agent/releases - - - name: gstack - slug: gstack - date: "2026-04-17" - version: "active" - stars: "70k" - threat_level: low - notable_changes: > - Viral Claude Code skills bundle with 70k ⭐; sequential single-session - persona-switching — no persistent infra, Docker isolation, or A2A protocol; - differentiation holds unless multi-session execution is added. - source_url: https://github.com/garrytan/gstack - - - name: Flowise - slug: flowise - date: "2026-04-17" - version: "flowise@3.1.2" - stars: "30k" - threat_level: low - notable_changes: > - v3.1.2 (Apr 14 2026) delivers security hardening (CORS abuse, credential - leaks, unauthorized access); acquired by Workday (Aug 2025) — repositioned - for HR/finance enterprise, narrowing its developer-team market. - source_url: https://github.com/FlowiseAI/Flowise/releases - - - name: OpenHands - slug: openhands - date: "2026-04-17" - version: "v1.6.0" - stars: "47k" - threat_level: low - notable_changes: > - v1.6.0 (Mar 30 2026) adds hook support and /clear command preserving - sandbox runtime; jumped to v1.x series (was v0.39.0); SWE-Bench top - open-source rank — single-agent software engineer, not a platform. - source_url: https://github.com/All-Hands-AI/OpenHands/releases - - - name: Temporal - slug: temporal - date: "2026-04-17" - version: "v1.30.4" - stars: "13k" - threat_level: low - notable_changes: > - v1.30.4 (Apr 10 2026) patches CVE-2026-5724 MEDIUM authorization - vulnerability; $300M Series D (Feb 2026, $5B valuation); we integrate - Temporal as infra via workspace/builtin_tools/temporal_workflow.py. - source_url: https://github.com/temporalio/temporal/releases - - - name: Chrome DevTools MCP - slug: chrome-devtools-mcp - date: "2026-04-17" - version: "active" - stars: "35.5k" - threat_level: low - notable_changes: > - Official ChromeDevTools org MCP server with 23 browser-control tools; - replaces our bespoke Puppeteer CDP plugin — we adopt it as of issue #540. - source_url: https://github.com/ChromeDevTools/chrome-devtools-mcp - - - name: Composio - slug: composio - date: "2026-04-17" - version: "active" - stars: "18k" - threat_level: low - notable_changes: > - 250+ tool integrations with managed auth; potential skill-pack dependency - for workspace channel integrations rather than a competing platform. - source_url: https://github.com/composio-dev/composio/releases - - - name: AgentScope - slug: agentscope - date: "2026-04-17" - version: "v1.0.18" - stars: "23.8k" - threat_level: low - notable_changes: > - v1.0.18 (Mar 26 2026) from Alibaba/ModelScope with MsgHub typed routing - and OpenTelemetry; MCP integration; no deployment layer — framework only. - source_url: https://github.com/modelscope/agentscope/releases - - - name: Skills CLI - slug: skills-cli - date: "2026-04-17" - version: "active" - stars: "14.2k" - threat_level: low - notable_changes: > - Vercel-backed canonical agentskills.io install CLI covering 45+ agents - including our Claude Code workspace; aligning plugins/ manifest to the - agentskills.io spec gives us free distribution through this channel. - source_url: https://github.com/vercel-labs/skills - - - name: pydantic-ai - slug: pydantic-ai - date: "2026-04-17" - version: "active" - stars: "16.4k" - threat_level: low - notable_changes: > - Python agent framework with native A2A + MCP + HITL; type-safe structured - output via Pydantic validation; FastAPI-like DX. Potential workspace template - adapter target (GH #721) — A2A native means zero-shim Molecule peer if - a2a-sdk version compatible. Reference: Pydantic Evals for agent quality gates. - source_url: https://github.com/pydantic/pydantic-ai/releases - - - name: Archon - slug: archon - date: "2026-04-17" - version: "v0.3.6" - stars: "18.1k" - threat_level: low - notable_changes: > - v0.3.6 active; YAML-DAG coding workflow with mixed AI/deterministic nodes - and human approval gates; reference design for our workspace delivery - pipelines — no multi-agent coordination. - source_url: https://github.com/coleam00/Archon/releases - - - name: Tencent AI-Infra-Guard - slug: tencent-ai-infra-guard - date: "2026-04-17" - version: "v4.1.3" - stars: "3.5k" - threat_level: low - notable_changes: > - v4.1.3 (Apr 9 2026); red team platform scanning MCP server and skills - surfaces — use as security compliance checklist for our MCP server and - plugin registry hardening; not a runtime competitor. - source_url: https://github.com/Tencent/AI-Infra-Guard/releases - - - name: Holaboss - slug: holaboss - date: "2026-04-17" - version: "active" - stars: "1.7k" - threat_level: low - notable_changes: > - Desktop "AI employee" with filesystem-as-memory and compaction boundaries; - single-agent, no A2A — primary concern is terminology collisions - (workspace / MEMORY.md / SKILL.md / agentskills.io). - source_url: https://github.com/holaboss-ai/holaboss-ai - - - name: claude-mem - slug: claude-mem - date: "2026-04-17" - version: "active" - stars: "56k" - threat_level: low - notable_changes: > - SQLite FTS5 + Chroma hybrid cross-session memory with lifecycle hooks; - 56k ⭐ signals strong demand for the gap we need to close in agent_memories - — adopt PostToolUse + SessionEnd observation pipeline. - source_url: https://github.com/thedotmack/claude-mem - - - name: Plannotator - slug: plannotator - date: "2026-04-17" - version: "v0.17.10" - stars: "4.3k" - threat_level: low - notable_changes: > - v0.17.10 (Apr 13 2026); HITL plan annotation UX with structured feedback - types (delete/insert/replace/comment); reference design for improving our - approvals API response schema. - source_url: https://github.com/backnotprop/plannotator/releases - - - name: open-multi-agent - slug: open-multi-agent - date: "2026-04-17" - version: "v1.1.0" - stars: "5.7k" - threat_level: low - notable_changes: > - v1.1.0 (Apr 1 2026); TypeScript multi-agent with runtime goal-to-DAG - decomposition in 3 deps; ephemeral per-run — no persistent identity, - no canvas, no scheduling. - source_url: https://github.com/JackChen-me/open-multi-agent/releases - - - name: Open Agents (Vercel) - slug: open-agents-vercel - date: "2026-04-17" - version: "active" - stars: "2.2k" - threat_level: low - notable_changes: > - +1,020 stars in one day (Apr 15 2026); Vercel Labs reference app for - background coding agents with snapshot-based VM resumption; no multi- - agent coordination — reference template, not a platform. - source_url: https://github.com/vercel-labs/open-agents - - - name: GenericAgent - slug: generic-agent - date: "2026-04-17" - version: "v1.0" - stars: "2.1k" - threat_level: low - notable_changes: > - v1.0 (Jan 16 2026); self-evolving skill tree with four-tier memory - hierarchy (L0 rules → L4 session archives); single-agent, no A2A — - memory taxonomy worth borrowing for agent_memories scopes. - source_url: https://github.com/lsdefine/GenericAgent/releases - - - name: OpenSRE - slug: opensre - date: "2026-04-17" - version: "active" - stars: "900" - threat_level: low - notable_changes: > - AI SRE toolkit with 40+ observability integrations (Grafana/Datadog/ - K8s/AWS/GCP/PagerDuty); potential DevOps workspace skill-pack source - rather than a competing platform. - source_url: https://github.com/Tracer-Cloud/opensre - - - name: AMD GAIA - slug: amd-gaia - date: "2026-04-17" - version: "v0.17.2" - stars: "1.2k" - threat_level: low - notable_changes: > - v0.17.2 (Apr 10 2026); AMD-backed local agent framework hardware-locked - to Ryzen AI 300+ NPU; MCP support; not general-purpose. - source_url: https://github.com/amd/gaia/releases - - - name: Cognee - slug: cognee - date: "2026-04-17" - version: "v1.0.1.dev1" - stars: "15.8k" - threat_level: low - notable_changes: > - Hybrid graph+vector knowledge engine for agent memory; claude-code plugin - + Hermes Agent native integration; cross-agent knowledge sharing with - tenant isolation; reference design for closing our agent_memories gap. - source_url: https://github.com/topoteretes/cognee/releases - - - name: Archestra - slug: archestra - date: "2026-04-17" - version: "platform-v1.2.15" - stars: "3.6k" - threat_level: low - notable_changes: > - Enterprise MCP registry + dual-LLM security gateway (Apr 16 2026); - centralized MCP server governance, Kubernetes-native, AGPL-3.0; - reference design for our plugin registry governance story. - source_url: https://github.com/archestra-ai/archestra/releases - - - name: GitHub MCP Server - slug: github-mcp-server - date: "2026-04-17" - version: "v1.0.0" - stars: "28.9k" - threat_level: low - notable_changes: > - v1.0.0 GA (Apr 16 2026); 60+ tools across 20+ toolsets (repos, issues, - PRs, Actions, security, code scanning); GitHub-hosted or local Docker; - adopt as workspace plugin source for GitHub-native agent orgs. - source_url: https://github.com/github/github-mcp-server/releases - - - name: Skillshare - slug: skillshare - date: "2026-04-17" - version: "v0.19.2" - stars: "1.5k" - threat_level: low - notable_changes: > - v0.19.2 (Apr 14 2026); Go binary syncing SKILL.md + agent configs across - 50+ AI tools (Claude Code, Codex, OpenClaw, Cursor) via symlinks; reference - design for cross-tool skill distribution; direct overlap with our plugins/. - source_url: https://github.com/runkids/skillshare/releases - - - name: Compound Engineering Plugin - slug: compound-engineering-plugin - date: "2026-04-17" - version: "v2.66.1" - stars: "14.5k" - threat_level: low - notable_changes: > - v2.66.1 (Apr 16 2026); TypeScript CLI distributes one plugin to 12 AI - runtimes simultaneously (Claude Code, Cursor, Codex, OpenClaw, Gemini, - Kiro, Windsurf, etc.); competing multi-runtime distribution mechanism - vs. our agentskills.io plugin portability strategy; 103 stars gained today. - source_url: https://github.com/EveryInc/compound-engineering-plugin/releases - - - name: EDDI - slug: eddi - date: "2026-04-17" - version: "v6.0.1" - stars: "296" - threat_level: low - notable_changes: > - Show HN Apr 17 2026; config-driven multi-agent orchestration (Java/Quarkus) - with A2A, cron scheduling, Ed25519 cryptographic agent identity, - GDPR/HIPAA posture, HMAC-SHA256 immutable audit ledger, 12 LLM providers + - MCP; reference design for compliance-guardrails audit trail posture. - source_url: https://github.com/labsai/EDDI/releases - - - name: Cloudflare Artifacts - slug: cloudflare-artifacts - date: "2026-04-17" - version: "beta" - stars: "N/A" - threat_level: low - notable_changes: > - Apr 16 2026 private beta; Git-compatible versioned workspace storage - for agents (programmatic repo create/fork/clone/diff, ~100KB Zig+WASM - Git engine) on Cloudflare Durable Objects; ArtifactFS driver open-sourced; - infrastructure watch — escalate to MEDIUM if Cloudflare Agents SDK - integrates Artifacts as a managed workspace-persistence layer. - source_url: https://blog.cloudflare.com/artifacts-git-for-agents-beta/ - - - name: dimos - slug: dimos - date: "2026-04-17" - version: "v0.0.11" - stars: "2.9k" - threat_level: low - notable_changes: > - GitHub trending Apr 17 2026 (+137 today); agentic OS for robotics - (humanoids, quadrupeds, drones, robotic arms) via natural language; - MCP as primary agent interface; module/blueprint architecture with - typed stream passing; spatial+temporal memory (SLAM + spatio-temporal - RAG); hardware: Unitree, AgileX, DJI, MAVLink. Python/MIT. Watch for - A2A support — would make robot workspaces first-class Molecule AI peers. - source_url: https://github.com/dimensionalOS/dimos - - - name: Cloudflare Workers AI - slug: cloudflare-workers-ai - date: "2026-04-17" - version: "Agents Week 2026" - stars: "N/A" - threat_level: low - notable_changes: > - Agents Week Apr 2026; unified inference layer for agents: 70+ models, - 14+ providers (OpenAI, Anthropic, Google), auto-failover, streaming - resilience, 330 global PoPs. Complements Cloudflare Durable Objects - (agent state), Artifacts (versioned storage), and Agents SDK (multi-step - orchestration). Cloudflare assembling full-stack agent platform. - Escalate to MEDIUM if Agents SDK integrates all four primitives into - one-click multi-agent deployment. - source_url: https://blog.cloudflare.com/ai-workspace-server/ - - - name: EvoMap Evolver - slug: evomap-evolver - date: "2026-04-17" - version: "v1.67.1" - stars: "3.3k" - threat_level: low - notable_changes: > - v1.67.1 (Apr 17 2026, +812 stars today); GEP-powered A2A-native agent - self-evolution engine (JavaScript/GPL-3.0); worker nodes advertise - capability domains on A2A Hub, heartbeat every 6 min, compatible with - our A2A protocol; SKILL.md + networked Skill Store natively align with - agentskills.io; immutable EvolutionEvent JSONL is the closest open-source - audit ledger reference for governance canvas (#582). Integration - opportunity — not a direct competitor. - source_url: https://github.com/EvoMap/evolver/releases - - - name: AI Hedge Fund - slug: ai-hedge-fund - date: "2026-04-17" - version: "n/a" - stars: "55.7k" - threat_level: low - notable_changes: > - +763 stars today (Apr 17 2026); reference multi-agent system with 19 - specialized financial-analysis agents (portfolio manager, risk manager, - bear/bull analysts, sector specialists) collaborating on stock analysis - and trading signals; supports Ollama local LLMs and cloud providers; - high-visibility demand signal for domain-specific multi-agent - orchestration; not a competing platform — a reference implementation. - source_url: https://github.com/virattt/ai-hedge-fund -``` - ---- - -## Entries - -### Holaboss — `holaboss-ai/holaboss-ai` - -**Pitch:** "AI workspace desktop for business — build, run, and package AI -workspaces and workspace templates with a desktop app and portable runtime." - -**Shape:** Electron desktop app + TypeScript runtime. **Single active agent -per workspace.** MIT-licensed OSS core with a hosted Holaboss backend for -some features (proposal ideation). macOS supported; Windows/Linux in progress. - -**Overlap with us:** both call the unit of packaging a "workspace"; -both ship a `skills//SKILL.md` convention; both have a plugin/app -marketplace; both treat long-lived context as important. - -**Differentiation:** Holaboss is the **"AI employee"** shape — one agent -holding one role for months, with heroic effort spent on token-cost -discipline (compaction boundaries, `prompt_cache_profile`, stable vs -volatile prompt sections). We're the **"AI company"** shape — many agents -collaborating via A2A, visual org chart, multiple runtimes. No A2A, no -multi-agent coordination on their side. - -**Worth borrowing:** -- Filesystem-as-memory: `memory/workspace//knowledge/{facts,procedures,blockers,reference}/` + scoped `preference/` and `identity/` namespaces. Clean model for durable memory that beats our current DB-only approach for inspectability. -- Compaction boundary artifact (summary + restoration order + preserved turn ids + request snapshot fingerprint) — if we ever add long-horizon single-agent mode, this is the reference design. -- Section-based prompt assembly with per-section cache fingerprints. Could reduce our Claude Code prompt cost. -- `workspace.yaml` rejects inline prompt bodies — forces prompts into `AGENTS.md`. We should do the same in `config.yaml` to keep runtime plans machine-readable. - -**Terminology collisions:** -- "workspace" — theirs is a directory + agent state; ours is a Docker container running one agent in a team. -- "MEMORY.md" — theirs is the structured memory-service root; ours is the native file Claude Code / DeepAgents read. -- "skills/SKILL.md" — same filesystem convention, both inject into system prompt. Fully compatible in spirit. - -**Signals to react to:** -- If they add A2A between workspaces → direct competitor; revisit differentiation. -- If they publish the compaction-boundary format as a spec → adopt. - -**Last reviewed:** 2026-04-12 · **Stars / activity:** ~1.7k ⭐, pushed today - ---- - -### Hermes Agent — `NousResearch/hermes-agent` - -**Pitch:** "The self-improving AI agent built by Nous Research — creates -skills from experience, improves them during use, searches its own past -conversations, and builds a model of who you are across sessions." - -**Shape:** Python-first agent framework with a TUI + multi-messenger -gateway (Telegram / Discord / Slack / WhatsApp / Signal / Email). Single -user, single continuous agent with a closed **learning loop**. Six -execution backends (local, Docker, SSH, Daytona, Singularity, Modal — -last two are serverless w/ hibernation). MIT, ~61k⭐ and climbing fast. - -**Overlap with us:** -- "Skills" with filesystem convention — compatible with the - [agentskills.io](https://agentskills.io) open standard they back. -- Subagent spawning for parallel work. -- Scheduled automations (natural-language cron). -- Model-agnostic (Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …). - -**Differentiation:** Hermes is the **"personal AI across every messenger"** -shape — one agent that knows *you* deeply and runs anywhere. We're the -**"team of agents behind a canvas"** shape — many roles collaborating on -shared work. Hermes has no visual canvas, no org hierarchy, no A2A between -workspaces. - -**Worth borrowing:** -- **Closed learning loop**: autonomous skill creation after complex tasks, - skills self-improve during use, agent-curated memory with periodic nudges - to persist knowledge. This is a much stronger memory discipline than - ours; the "nudge to persist" pattern in particular is cheap to implement. -- **FTS5 + LLM-summarization** for cross-session recall — cheap, no - vector-store overhead, works great for the "did I tell you about X" case. -- **Honcho dialectic user modeling** (`plastic-labs/honcho`) for building - a model of the user across sessions. Worth evaluating as a memory backend - for Molecule AI's PM workspace specifically (the one role where knowing - the CEO well matters most). -- **Daytona / Modal serverless backends** with hibernation — a great fit - for our DevOps workspaces that only wake for scheduled audits. Could - drop our idle compute cost meaningfully. -- **`hermes claw migrate`** command — gracefully import users from - OpenClaw (the predecessor). Good pattern if we ever deprecate a runtime - adapter. - -**Terminology collisions:** -- "skills" — same direction as ours post-refactor (file-based, installable, - runtime-agnostic). Their - [agentskills.io](https://agentskills.io) spec is worth reading before we - finalize our plugin manifest schema. -- Topic tags on the repo include `openclaw`, `clawdbot`, `moltbot`, - `claude-code`, `codex` — Nous Research has a whole agent family. Our - `workspace/adapters/openclaw/` adapter predates Hermes's - rebrand; check whether it still points to a live project. - -**Signals to react to:** -- If `agentskills.io` spec picks up mass adoption → align our plugin - manifest so the same skill repo installs on Hermes AND Molecule AI. -- If Hermes ships multi-agent / A2A → direct overlap with our core thesis. -- If Atropos RL trajectory generation becomes the standard for training - tool-calling models → our workspace activity logs should adopt the - trajectory schema so users can export training data. - -**Last reviewed:** 2026-04-12 · **Stars / activity:** ~61k ⭐, pushed today - ---- - -### gstack — `garrytan/gstack` - -**Pitch:** "Use Garry Tan's exact Claude Code setup: 23 opinionated tools -that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, -and QA." Claude Code skills bundle, MIT, ~70k⭐ and going viral on X. - -**Shape:** A single directory of Markdown slash-command definitions -installed at `~/.claude/skills/gstack/`, invoked inside one Claude Code -session: `/office-hours`, `/plan-ceo-review`, `/review`, `/qa`, `/ship`, -`/land-and-deploy`, `/cso` (security), `/retro`, etc. No services, no -containers, no DB — just prompts and scripts that the Claude Code CLI -executes in whatever repo the user has open. - -**Overlap with us:** -- **Same role metaphor as molecule-dev.** Both cast AI work as a cast of - roles (CEO, Eng Manager, Designer, Security, QA). The naming overlap is - nearly 1:1 with our org template. -- **Claude Code-native**, Markdown-driven config, "skills" as the unit. -- Team-mode auto-updates shared repos — same instinct as our org templates. - -**Differentiation:** gstack is **sequential, single-session, single-repo.** -One Claude Code session runs each slash command in turn; the "team" is a -persona switch, not separate processes. We're **parallel, multi-session, -hierarchical**: real containers, A2A between siblings, a visual canvas, -real-time WebSocket updates, schedules, org bundles. gstack has no -multi-agent coordination, no A2A, no canvas, no workspace persistence -beyond git — it's a brilliant prompt library, not an orchestration platform. - -**Worth borrowing:** -- **`/retro` command**: generates a weekly retrospective from git history - ("140,751 lines added, 362 commits, ~115k net LOC in one week"). Would - be a natural addition to our PM agent's toolbox — `commit_memory` + - git log synthesis. Cheap win. -- **`/autoplan` and `/freeze` / `/guard` / `/unfreeze`** for architectural - guardrails during a risky change. Maps cleanly onto our approval flow — - could turn into a `/freeze` hook that sets a workspace-level policy flag - preventing certain tool calls during a migration. -- **Role-prompt library.** gstack has spent a lot of effort on the CEO / - Designer / Eng Manager personas. Even without adopting their runtime, - we could lift the prompt text into our molecule-dev system-prompt.md - files with attribution. Their CSO (OWASP + STRIDE audit) and Designer - (AI-slop detection) personas are both stronger than ours today. -- **Team-mode auto-update** (throttled once/hour, network-failure-safe, - silent) — good pattern for keeping plugins in sync across an org - without requiring manual `/plugins/install` calls. - -**Terminology collisions:** -- "Skills" — gstack ships everything as Claude Code skills (filesystem - convention `~/.claude/skills//`). Same filesystem shape as - ours AND Hermes AND Holaboss. Four projects, one spec shape — should - formalize with [agentskills.io](https://agentskills.io). -- "Ship / Release" — their `/ship` is a local PR-and-merge flow; - nothing to do with our A2A lifecycle. -- Mentions "OpenClaw" (247k ⭐ claim) as inspiration — tracks with the - Hermes entry's note that the OpenClaw name is alive in multiple - ecosystems. - -**Signals to react to:** -- If gstack adds multi-session / parallel execution (spawning multiple - Claude Code workers and routing between them) → direct competitor - with a 70k⭐ head start. Revisit our differentiation messaging. -- If their `/plan-ceo-review` prompt or `/qa` browser flow becomes an - informal standard → copy it into molecule-dev's system prompts. -- If Garry Tan posts a video deploying gstack on a new use case → - high-signal about what "everyone" will ask us to support next week. - -**Last reviewed:** 2026-04-12 · **Stars / activity:** ~70k ⭐, pushed yesterday - ---- - -### Composio — `composio-dev/composio` - -**Pitch:** "The integration layer for AI agents — 250+ tools across Slack, -GitHub, Telegram, Linear, Discord, and more, with managed auth." - -**Shape:** Python + TypeScript SDK. Pure integration library — no agent -runtime, no visual canvas. Plugs into any LLM framework (LangChain, -LangGraph, AutoGen, CrewAI, Claude, OpenAI Agents). Managed auth so agents -can act on user-connected accounts. MIT-adjacent, ~18k ⭐. - -**Overlap with us:** Both provide agent-accessible Slack, Telegram, and -Discord channels. Both handle OAuth / credential management for workspace -integrations. Channels feature in `workspace-server/internal/handlers/channels.go` -does a subset of what Composio does for the messaging platforms. - -**Differentiation:** Composio is a tool library, not a runtime or org -hierarchy. No canvas, no A2A between agents, no org structure. They're -"the 250 tools agents can call"; we're "the company that runs the agents." -Composio could be a dependency inside a Molecule AI workspace skill — not a -competitor for the platform layer. - -**Worth borrowing:** -- **Trigger model:** inbound webhook → fire agent → respond in same channel. - Our channels feature handles outbound well but inbound triggers are still - manually configured. Composio's trigger schema is worth adopting. -- **"Connected accounts" pattern:** per-workspace OAuth token stored per - integration, reused across runs. Our `workspace_channels` JSONB config is - close; formalize as a named model. -- **Auth sandbox:** test mode that mocks API calls — useful for our - `POST /workspaces/:id/channels/:id/test` endpoint. - -**Terminology collisions:** -- "actions" = their tool calls; we use "skills." -- "triggers" = their inbound webhooks; we use channels + schedules. - -**Signals to react to:** -- If they add persistent agent identity across trigger runs → direct overlap - with our workspace model. -- If they add A2A between agent sessions or multi-agent orchestration → threat - to our integration story. -- If `agentskills.io` adopts Composio trigger schema → we should too. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~18k ⭐, active - ---- - -### n8n — `n8n-io/n8n` - -**Pitch:** "Fair-code workflow automation with 400+ integrations — build AI -pipelines visually, self-host or cloud." - -**Shape:** Node.js, self-hosted or n8n cloud. Visual workflow builder (nodes -+ edges, not unlike React Flow). 400+ connectors: Slack, Telegram, Discord, -WhatsApp, Email, GitHub, Linear, Notion, … plus dedicated AI nodes -(LLM chains, agent nodes, vector stores, tool use). Fair-code license -(source-available, free for internal use). ~50k ⭐, pushed daily. - -**Overlap with us:** -- Visual graph metaphor for orchestrating work (their nodes ≈ our canvas - workspaces). -- Connects AI agents to Slack / Telegram / Discord / WhatsApp — identical - surface to our `workspace_channels` feature. -- Scheduled automations (cron triggers) → same as `workspace_schedules`. -- Self-hostable, Docker Compose first-class. - -**Differentiation:** n8n is trigger→step→step→output (stateless sequential -workflow per run). No persistent agent identity, no shared memory across -runs, no org hierarchy, no A2A between agents. Each execution is isolated. -We're "agents that remember, collaborate, and hold roles"; they're "workflows -that transform data." The UX audiences barely overlap: n8n users are ops/no-code -builders; Molecule AI users are developers building agent companies. - -**Worth borrowing:** -- **Channel trigger UX:** select platform → OAuth → pick chat → done in - three clicks. Our channel setup requires more manual config; this flow is - the right target for `POST /workspaces/:id/channels`. -- **"Test workflow" dry-run:** one-click test execution with live output. - Maps well onto our `POST /workspaces/:id/channels/:id/test` — we should - fire a real test message and show the round-trip result inline. -- **Sticky notes on canvas:** freeform annotation nodes for documentation. - Cheap win for our canvas — could be a "comment node" workspace type. -- **Execution log with step-level timing:** n8n shows each node's in/out - data and ms. Our `activity_logs` captures A2A traffic but not intra-agent - step timing. Worth adding to the trace view. - -**Terminology collisions:** -- "workflow" — their atomic unit; for us "workflow" is informal. No hard - collision but our marketing copy should avoid it to stay distinct. -- "nodes" — their workflow steps; our canvas nodes are workspaces. Different - enough to not cause user confusion, but worth noting in docs. - -**Signals to react to:** -- If n8n ships persistent agent nodes (memory between runs) → direct - substitute for simple Molecule AI use cases. They've been adding AI nodes - fast (AI Agent node shipped 2024-Q3). -- If they add multi-agent coordination with shared state → revisit our - differentiation messaging. -- If a major Slack/Discord bot tutorial uses n8n instead of a custom agent - → indicates channel-first UX is the market expectation we need to match. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~50k ⭐, pushed daily - ---- - -### Pydantic AI — `pydantic/pydantic-ai` - -**Pitch:** "AI Agent Framework, the Pydantic way." - -**Shape:** Python SDK (MIT), ~16.3k ⭐, last release v1.8.0 on April 10, 2026 — actively maintained at high velocity. Single and multi-agent, with typed dependency injection (`RunContext[DepsType]`), structured/validated outputs (`Agent[Deps, OutputType]`), composable capability bundles (tools + hooks + instructions + model settings), built-in streaming, and human-in-the-loop tool approvals. Supports A2A and MCP natively as first-class integrations. Model-agnostic: OpenAI, Anthropic, Gemini, Mistral, Cohere, DeepSeek, Bedrock, Vertex, Ollama, OpenRouter, and more. Observability via Pydantic Logfire. - -**Overlap with us:** A2A support means Pydantic AI agents can speak directly to Molecule AI workspaces over our native protocol — they're potential consumers of Molecule AI's registry, not just a parallel ecosystem. MCP integration mirrors our workspace tool model. The composable capability bundles are the same instinct as our plugin/skills system. Logfire's agent tracing is a polished alternative to our `GET /workspaces/:id/traces` + Langfuse stack. - -**Differentiation:** Pydantic AI is a library for building agents in Python — no visual canvas, no Docker workspace isolation, no registry/discovery, no scheduling, no WebSocket org chart, no channels. It's the in-process layer; we're the operational platform layer. The two are naturally complementary: a Molecule AI workspace *running* Pydantic AI agents is a valid architecture, not a contradiction. - -**Worth borrowing:** -- **Typed dependency injection via `RunContext`** — passing strongly-typed deps (DB connection, API client, user object) into every tool and instruction without global state. Our `config.yaml` passes env vars; this pattern is safer and more testable. -- **`Agent[Deps, OutputType]` generic typing** — structured, schema-validated agent outputs. Our A2A responses are freeform text; adopting structured output schemas at the A2A layer would enable typed inter-workspace contracts. -- **Composable capability bundles** — reusable packages of tools + hooks + instructions. Our plugins install files; this is the right next evolution (code bundles, not just Markdown). - -**Terminology collisions:** -- "capabilities" — their term for composable tool+instruction bundles; we use "plugins" or "skills." -- "RunContext" — their typed dependency carrier; not a shared term, but will appear in codebases mixing Pydantic AI + Molecule AI adapters. -- "tools" — same word, same meaning. No collision, but documentation should be explicit about Pydantic AI tools vs. MCP tools vs. Molecule AI skills. - -**Signals to react to:** -- If Pydantic AI ships a workspace/session persistence layer → fills the one gap between it and Molecule AI's value; revisit our Python-SDK adapter story. -- If `pydantic-deepagents` (`vstorm-co/pydantic-deepagents`) gains traction — "Claude Code–style deep agents on Pydantic AI" — it would become a direct competitor to our Claude Code runtime adapter. -- If Logfire's agent tracing becomes the de facto standard → align our trace schema so Logfire can ingest Molecule AI workspace traces natively. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~16.3k ⭐, v1.8.0 released April 10, 2026 - ---- - -### Rivet — `Ironclad/rivet` - -**Pitch:** "The open-source visual AI programming environment and TypeScript library." - -**Shape:** Electron desktop app + TypeScript library (MIT), ~4.5k ⭐. Visual node-based editor where AI workflows are built by connecting nodes in a graph: LLM call nodes, tool nodes, subgraph nodes, conditional branches. Runs locally; exports workflows as `.rivet-project` files that can be embedded in applications via the `@ironclad/rivet-node` npm package. Built and open-sourced by Ironclad (a Series D contract intelligence company). Model-agnostic. Plugin marketplace for custom node types. - -**Overlap with us:** The canvas is the obvious overlap — both products present AI agent work as a visual graph. Rivet's subgraph nesting (complex workflows broken into reusable components) maps to our parent/child workspace hierarchy. The plugin marketplace for custom nodes mirrors our `plugins/` registry. Rivet workflows can call external APIs, making them potential consumers of Molecule AI's `/workspaces/:id/a2a` endpoint — a Rivet node that delegates to a Molecule AI agent is a plausible integration. - -**Differentiation:** Rivet is a **workflow authoring tool**, not an agent runtime. A `.rivet-project` file describes a static graph; there's no persistent agent identity, no memory across runs, no org hierarchy, no real-time WebSocket canvas, no scheduling, no Docker container management. The Rivet editor is for building workflows; Molecule AI is for running a live org of agents. The `/channels` angle is absent from Rivet — it has no concept of an agent receiving or sending messages via Telegram, Slack, or other social platforms. Rivet's audience is developers prototyping single pipelines; ours is teams deploying multi-agent organizations. - -**Worth borrowing:** -- **Nested subgraph UX** — Rivet's handling of "graph within graph" as a first-class reusable node is the cleanest visual pattern for our parent/child workspace hierarchy. Our current Canvas flattens deeply nested teams into chips; Rivet's subgraph expand/collapse is the reference UX to study. -- **Node-level debug inspector** — clicking any node in a completed run shows its exact inputs, outputs, and latency. Our Canvas chat shows A2A messages but not intra-workspace step-level data. This is the natural evolution of our trace view. -- **`.rivet-project` portability** — workflow-as-file, embeddable in any TypeScript app via npm. Suggests we should support a "workspace bundle export" that can run outside Molecule AI, not just be imported back into it. - -**Terminology collisions:** -- "graph" — their graph is a workflow definition (static); ours is the live org chart (dynamic, stateful). Different semantics, same word. -- "node" — their nodes are workflow steps; our canvas nodes are workspaces. No runtime collision but documentation must be unambiguous. -- "plugin" — both have plugin systems; theirs extends the node palette, ours extends the workspace runtime. - -**Signals to react to:** -- If Rivet adds persistent agent state between runs → closes the gap with Molecule AI for simple use cases; revisit our "quick start" story for non-enterprise users. -- If Rivet adds a "deploy workflow as agent endpoint" feature → their visual builder becomes a Molecule AI workspace creator; consider a Rivet → Molecule AI import adapter. -- If `.rivet-project` format becomes a de facto workflow interchange standard → support importing Rivet projects as Molecule AI workspace configs. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~4.5k ⭐, actively maintained - ---- - -### Letta — `letta-ai/letta` - -**Pitch:** "The platform for building stateful agents: AI with advanced memory that can learn and self-improve over time." - -**Shape:** Python + TypeScript SDK (Apache-2.0), ~22k ⭐, v0.16.7 released March 31, 2026. Formerly MemGPT (the research project that pioneered OS-inspired virtual context management for LLMs). Letta's defining feature is a **multi-block memory architecture**: each agent holds named, editable in-context memory segments ("core memory") such as `human`, `persona`, and `archival` blocks, which the agent can read and write via tool calls. Memories persist across sessions in a Letta Server (self-hosted or Letta Cloud). Agents are accessed via a REST API. The **ADE (Agent Development Environment)** is a graphical interface for creating, testing, and monitoring agents in real-time. Multi-agent support via subagents and shared memory. Model-agnostic (OpenAI, Anthropic, local LLMs via Ollama). - -**Overlap with us:** Letta's named memory blocks (`human`, `persona`, `archival`) are a structured evolution of the same problem our `agent_memories` table and `MEMORY.md` file solve — persistent, durable knowledge for a long-lived agent. The ADE's graphical agent-monitoring interface overlaps with our Canvas; both offer a UI to inspect and interact with running agents. Letta Server exposes a REST API that accepts messages at agent endpoints — structurally similar to our A2A proxy (`POST /workspaces/:id/a2a`). Multi-agent subagent support maps to our parent/child workspace hierarchy. Letta's `initial_prompt` equivalent (agent system prompt + memory bootstrap) mirrors our `initial_prompt` in `config.yaml`. - -**Differentiation:** Letta is focused on **the single-agent memory problem**, not the multi-agent org problem. No Docker container isolation per agent, no workspace registry, no real-time WebSocket org chart, no scheduling, no channels to Slack/Telegram/Discord. The ADE shows individual agents; it does not visualize an org hierarchy or inter-agent A2A traffic. Letta's multi-agent support is hierarchical subagent spawning within a single Letta Server context — not independently deployable, independently schedulable workspaces. We're "a company of agents"; Letta is "an agent with a very good memory." - -**Worth borrowing:** -- **Named, agent-editable memory blocks** — the `human` / `persona` / `archival` distinction is the clearest taxonomy we've seen for agent memory. Our `agent_memories` namespace is flat; adopting explicit named blocks (at minimum: `self`, `user`, `task-context`, `long-term-knowledge`) would make memory more inspectable and auditable in the Canvas. -- **Memory self-editing as a tool call** — Letta agents call `core_memory_replace(label, old, new)` and `archival_memory_insert(content)` as first-class tool actions, making memory updates part of the visible tool-call trace. Our `commit_memory` MCP tool is close; making it show up in `activity_logs` as a named tool call (not a silent background action) would match this pattern. -- **ADE real-time message inspector** — the ADE shows each tool call, memory read/write, and reasoning step inline in a timeline. This is more granular than our Canvas chat tab; it's the reference design for a "step-through debug mode" in our trace view. - -**Terminology collisions:** -- "archival memory" — Letta: a searchable long-term store the agent queries via tool calls. Ours: not a defined term. Our `agent_memories` table is functionally similar but not surfaced to agents as a named primitive. -- "persona" — Letta: a named memory block containing the agent's self-description. Ours: the `role:` field in `config.yaml` plus the system prompt. Same intent, different packaging. -- "agent" — Letta: a long-lived server-side object with persistent memory, accessed via REST. Ours: a Docker container running one of six runtimes. Same word, substantially different operational model. - -**Signals to react to:** -- If Letta ships a multi-agent canvas that visualizes org hierarchies (not just individual agent inspection) → direct overlap with our Canvas; they have strong memory credibility that could attract our target buyer. -- If Letta formalizes a memory-block schema as an open spec (building on their MemGPT research lineage) → evaluate adopting it as Molecule AI's `agent_memories` schema to gain interoperability with the Letta ecosystem. -- If Letta Cloud adds Slack/Telegram/Discord inbound triggers → they gain channels capability; currently absent, but a REST API means it's one webhook away. -- Watch v0.x → v1.0 trajectory: v0.16.7 suggests pre-1.0 API stability; a 1.0 GA announcement would signal enterprise readiness and an accelerated sales motion. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~22k ⭐, v0.16.7 March 31, 2026 - ---- - -### Trigger.dev — `triggerdotdev/trigger.dev` - -**Pitch:** "Build and deploy fully-managed AI agents and workflows." - -**Shape:** TypeScript (Apache-2.0), ~14.5k ⭐, v4.4.3 released March 10, 2026. Started as a developer-friendly alternative to cron + background jobs; v4 repositions it squarely as **durable execution infrastructure for AI agents**. Tasks are TypeScript functions decorated with `task()` — they run in a managed cloud with: automatic retry with exponential backoff, checkpoint/resume (task state saved to storage, resumed after crash or timeout), queue and concurrency control, and cron scheduling up to one-year duration. Human-in-the-loop via `waitForApproval()`. MCP server available (`trigger-dev` MCP) so AI assistants (Claude Code, Cursor, etc.) can trigger tasks, check run status, and deploy from chat. Warm starts execute in 100–300ms. Fully self-hostable. - -**Overlap with us:** Trigger.dev's `schedules.task()` cron system overlaps directly with our `workspace_schedules` table and `POST /workspaces/:id/schedules` API — both schedule recurring prompts/tasks on a cron expression. The checkpoint/resume model (`waitForApproval`, `wait.for()`) is a precise parallel to our workspace `pause` / `resume` lifecycle. Human-in-the-loop approval gates match our `POST /workspaces/:id/approvals`. The MCP server enabling AI agents to trigger tasks maps to the same use case as our MCP server's `delegate_task` tool. Both platforms treat long-running, fault-tolerant execution as a core design constraint. - -**Differentiation:** Trigger.dev has **no agent identity** — tasks are stateless TypeScript functions, not persistent agents with memory, roles, or system prompts. No visual canvas, no org hierarchy, no A2A protocol, no workspace registry. It is execution infrastructure, not an agent platform. The right mental model: Trigger.dev is to Molecule AI what Temporal is to Molecule AI — a lower-level durable execution substrate that Molecule AI's workspaces could use as a backend for their scheduled tasks, rather than a replacement for Molecule AI itself. Their `/channels` story is inbound-only (HTTP triggers, webhooks, cron) with no native Slack/Telegram messaging surface. - -**Worth borrowing:** -- **Idempotency keys on task invocation** — `trigger("send-report", payload, { idempotencyKey: runId })` ensures a task is only ever executed once for a given key, even if triggered multiple times. Our delegation system has no equivalent guard; duplicate delegations from container-restart races are a known issue (see `delegationRetryDelay` in `delegation.go`). Adding idempotency keys to `POST /workspaces/:id/delegate` would fix the duplicate-execution class of bugs. -- **`waitForApproval()` inline in task code** — instead of a separate approvals table and polling loop, the task itself calls `await wait.for({ event: "approval" })` and suspends. Our approval flow requires a separate API round-trip and the agent to re-check; Trigger.dev's inline suspension is the right long-term model. -- **Warm-start pool for sub-300ms agent starts** — Trigger.dev pre-warms TypeScript runtimes to achieve 100–300ms cold start. Our Docker workspace startup is measured in seconds. Worth evaluating their warm-pool approach for our claude-code and langgraph adapters. - -**Terminology collisions:** -- "task" — Trigger.dev: a decorated TypeScript function, the atomic unit of execution. Ours: informal (used in delegation context and `current_task` heartbeat field). Their definition is more precise; consider whether our heartbeat `current_task` field should be renamed to avoid collision with Trigger.dev vocabulary in integrations. -- "schedule" — same word, same meaning. Trigger.dev's cron schedule API and ours (`workspace_schedules`) are functionally identical at the surface. Our docs should distinguish "Molecule AI schedules" from "Trigger.dev schedules" clearly when positioning integrations. -- "run" — Trigger.dev: a single execution of a task with full lifecycle tracking. Ours: informal. No hard collision. - -**Signals to react to:** -- If Trigger.dev ships native agent identity (persistent state, memory across runs, named agents) → crosses from infrastructure into platform territory; reevaluate positioning. -- If the `trigger-dev` MCP becomes a de facto standard for AI-tool-triggered background work → add a Trigger.dev adapter to our workspace runtime so Molecule AI agents can fire Trigger.dev tasks as a tool call (complementary, not competitive). -- If Trigger.dev ships a Slack/Discord trigger adapter → they gain a channels surface; currently absent. Watch their integration roadmap. -- Their TypeScript-first stack and MCP server target the same developer audience as our Canvas + mcp-server. Co-marketing opportunity: "run your Molecule AI agent on a schedule via Trigger.dev" is a cleaner story than our current in-house cron for some user segments. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~14.5k ⭐, v4.4.3 March 10, 2026 - ---- - -### Mem0 — `mem0ai/mem0` - -**Pitch:** "The memory layer for AI agents — add persistent, adaptive memory to any LLM application." - -**Shape:** Python/TypeScript SDK (Apache 2.0), ~25k ⭐. Runs as an embedded library or managed cloud service. Extracts structured memory objects from conversations (facts, preferences, relationships), stores them with embeddings, and retrieves relevant memories on each new interaction. Supports multiple vector backends (Qdrant, Pinecone, Chroma, Postgres pgvector). REST API available. - -**Overlap with us:** Molecule AI ships `agent_memories` + `/workspaces/:id/memories` for per-agent memory. Mem0 targets exactly this use case and is the incumbent OSS solution for add-on agent memory. Any team evaluating Molecule AI will compare our memory primitives to Mem0's. - -**Differentiation:** Mem0 is a memory service, not an agent platform. It has no workspace lifecycle, no org hierarchy, no A2A protocol, no canvas, no scheduling. Molecule AI memory is scoped per-workspace and stored in Postgres as raw key-value pairs; Mem0 extracts and semantically indexes facts across interactions using vector search. The extraction step is the critical gap — we store what agents explicitly save, Mem0 learns what matters automatically. - -**Worth borrowing:** -- **Structured extraction** — Mem0 auto-extracts facts ("project uses zinc-900 palette") from conversation text. Adding extraction to our memory writes would improve recall quality for long-running agents without agents needing to explicitly call `commit_memory`. -- **pgvector backend** — supports Postgres pgvector; we could add semantic memory search to our existing DB with no new infrastructure. - -**Terminology collisions:** -- "memory" — same word, different semantics. Mem0 memories are extracted semantic facts; our memories are programmatically set key-value pairs. - -**Signals to react to:** -- If Mem0 ships multi-agent scoped memories (shared across an org) → directly competes with our team memory model. -- If Mem0 becomes default memory backend for LangGraph or CrewAI → assess whether our adapters should delegate to Mem0 under the hood. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~25k ⭐, actively maintained - ---- - -### AG2 — `ag2ai/ag2` - -**Pitch:** "A programming framework for agentic AI — the continuation of AutoGen by the original team." - -**Shape:** Python library (Apache 2.0), ~40k ⭐ (combined AutoGen lineage). Community fork maintained by the original AutoGen core contributors after Microsoft redirected `microsoft/autogen` toward a new architecture. Preserves the classic AutoGen API: `AssistantAgent`, `UserProxyAgent`, `GroupChat`, `GroupChatManager`. Actively ships new features (tool calling, code execution, nested chats). `pip install ag2` is now the recommended path for the classic AutoGen experience. - -**Overlap with us:** Molecule AI ships an `autogen` runtime adapter targeting `microsoft/autogen`. AG2 is API-compatible for most use cases but is the fork with active community investment — our adapter should be validated against AG2 and the migration path assessed. - -**Differentiation:** AG2 is a conversation orchestration framework, not an agent platform. Agents are ephemeral Python objects per-conversation; no persistent workspace identity, no canvas, no Docker management, no org hierarchy, no A2A, no scheduling. Molecule AI workspaces are long-lived; AG2 sessions are not. - -**Worth borrowing:** -- **GroupChat speaker selection** — AG2's `GroupChatManager` supports round-robin, auto (LLM-selected), and custom speaker strategies. More sophisticated than our linear PM → Lead → Engineer delegation; study for future dynamic routing. -- **Hardened code execution sandbox** — AG2's Docker-isolated code execution container is the reference design for any Molecule AI feature where engineer agents run arbitrary code. - -**Terminology collisions:** -- "agent" — their agents are ephemeral Python objects; ours are long-lived Docker workspaces. -- "GroupChat" — their multi-agent coordination primitive; analogous to our PM + team hierarchy but stateless. - -**Signals to react to:** -- If the `microsoft/autogen` ↔ AG2 split resolves → update our adapter target accordingly; don't maintain two paths. -- If AG2 ships persistent agent state → direct competitor to our Claude Code and LangGraph adapters. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~40k ⭐ (primary community repo for AutoGen lineage) ---- - -### Super Dev — `shangyankeji/super-dev` - -**Pitch:** "Engineering workflow layer for AI coding tools — specs, review, quality gates, and traceability for commercial-grade AI-assisted delivery." - -**Shape:** Python 3.10+ CLI (MIT), ~217 ⭐, v2.3.7. Not an agent runtime — a governance overlay that injects structured workflow into existing AI coding hosts (Claude Code, Cursor, Cline, Codex). Users invoke via `/super-dev` inside their host tool. Delivers an 8-phase pipeline (research → PRD → architecture → UI/UX → spec → implementation → quality → delivery) with 11 domain-expert context injections per phase (PRODUCT, PM, ARCHITECT, UI, UX, SECURITY, CODE, DBA, QA, DEVOPS, RCA), YAML-driven validation rules, knowledge-file auto-injection, and DORA-4 delivery metrics. Primary audience: Chinese-market developers; bilingual README. 63 forks as of April 2026. - -**Overlap with us:** Both use a PM role, a "skills" directory convention, CLAUDE.md injection, and quality gates. Molecule AI users who run Claude Code workspaces may already use super-dev inside that workspace — orthogonal layers, not competitors. - -**Differentiation:** Super-dev engineers a solo developer's AI coding session; Molecule AI engineers a team of persistent AI agents collaborating via A2A. Super-dev has no agent identity, no workspace lifecycle, no Docker runtime, no multi-agent coordination. Molecule AI has no per-phase expert Playbooks or spec-traceability. Complementary shapes. - -**Worth borrowing:** -- **Expert-Playbook injection** — 11 domain experts with 350-line Playbooks auto-injected per pipeline phase. Our org-template system-prompts are the equivalent, but super-dev's staged injection (only relevant experts per phase) is more surgical than our always-on prompts. -- **Staged pipeline formalism** — explicit phase names (research → spec → quality) with mandatory confirmation gates. Formalizing this in Molecule AI's PM org-template would make agent hand-offs auditable. -- **Spec-Code traceability** — `super-dev spec trace` links implementation files back to spec docs. Worth adding as a workflow convention even without tooling. -- **YAML validation rules with multi-level severity** — 14 built-in rules + custom rules. Adapt for Molecule AI's own QA step. - -**Terminology collisions:** -- "memory" — super-dev has 4 typed memory categories (user / feedback / project / reference) with dream consolidation; ours are key-value pairs programmatically set by agents. -- "skills" — super-dev's `super-dev-skill/` is a host-injection convention; our `skills/` are composable agent behaviours loaded at workspace boot. -- "PM" — their PM is an expert context fragment; ours is a live orchestrating agent. -- "pipeline" — their 8-phase delivery sequence vs our runtime adapter selection + delegation chains. - -**Signals to react to:** -- If super-dev ships multi-agent coordination (shared workspace state, agent hand-offs beyond single-host) → overlap increases materially; assess positioning. -- If super-dev adds a Molecule AI workspace adapter (they already handle Claude Code, Cursor, Cline) → co-marketing / integration opportunity; our Claude Code adapter runs inside their pipeline. -- If the "11 expert Playbook" pattern gets wide adoption → formalize equivalent staged-injection in our PM + Dev Lead system prompts. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~217 ⭐, 63 forks, v2.3.7 pushed Apr 13 2026 ---- - -### Sierra — `sierra.ai` *(commercial, no public repo)* - -**Pitch:** "AI agents for customer service — production-grade conversational AI that handles complex customer issues end-to-end without human escalation." - -**Shape:** Enterprise SaaS (YC-backed, ~$4B valuation, 2024). Sierra builds custom AI agents for specific companies (Sonos, Weight Watchers, OluKai) rather than a general-purpose platform. Each deployment is a brand-trained agent that handles returns, account management, troubleshooting, and purchasing through multi-turn natural-language conversation. No self-serve tier; sold via enterprise contract. Backed by Bret Taylor and Clay Bavor (ex-Google). No public SDK or API. - -**Overlap with us:** Both are "agents with persistent state and human-readable conversation history." Sierra's agent architecture (multi-turn session, tool calls to CRMs/ERPs, escalation triggers) is the same shape as a Molecule AI workspace with A2A access to backend tools. Sierra targets the customer-service vertical; Molecule AI targets engineering teams. Same underlying pattern, radically different buyer. - -**Differentiation:** Sierra is a fully managed, vertically specialized offering — customers buy a branded agent, not a platform. Molecule AI sells the platform and lets teams compose their own agents. Sierra has no org hierarchy, no multi-agent orchestration within a session, no developer API. Molecule AI has no trained vertical-specific knowledge, no out-of-box CRM/ERP connectors, no customer service SLA guarantees. Sierra's moat is vertical depth + enterprise trust; ours is composability + developer control. - -**Worth borrowing:** -- **Agent personality/brand layer** — Sierra's agents adopt a company's tone, policies, and vocabulary as a first-class config layer. Our `SOUL.md` convention in the OpenClaw adapter is the nearest equivalent; worth generalising as a platform concept (a "persona" config block in org.yaml that injects brand voice into every system-prompt). -- **Escalation to human** — Sierra has a defined handoff protocol when confidence drops or the issue requires a human. Our `approvals` table covers the "pause for review" pattern; a formal escalation tool (create a ticket, notify a human via channel) is missing. - -**Terminology collisions:** -- "agent" — Sierra: a deployed brand-trained assistant. Ours: a Docker workspace with a role. Conceptually adjacent, not interchangeable. -- "session" — Sierra: one customer conversation. Ours: not a first-class concept. - -**Signals to react to:** -- If Sierra opens a developer API or self-serve tier → they enter our addressable market for teams that want a customer-facing agent alongside their internal engineering agents. -- If Sierra raises another round or announces a platform play → they may be building the platform we're building, just starting from the customer service vertical rather than engineering. -- Enterprise buyers comparing us to Sierra → emphasize Molecule AI's programmability and multi-agent composition vs Sierra's closed vertical depth. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** commercial SaaS, ~$4B valuation, no public repo - ---- - -### ERNIE / Baidu LLM line — `qianfan.baidubce.com` - -**Pitch:** Baidu's family of large language models — ERNIE 4.5, ERNIE Speed, ERNIE Lite — available via the Qianfan platform with OpenAI-compatible endpoints. Primary model provider for the Chinese-market hackathon ecosystem and the cheapest LLM option for Molecule AI sub-agents given available free credits. - -**Shape:** Cloud API (Baidu Cloud). ERNIE models span capability tiers: ERNIE 4.5 (flagship, strong reasoning), ERNIE Speed (fast, cost-efficient), ERNIE Lite (cheapest, for low-stakes tasks). Accessed via `https://qianfan.baidubce.com/v2` with OpenAI-compat JSON format. Auth: `QIANFAN_API_KEY` (standard) or `AISTUDIO_API_KEY` (via Google AI Studio compat layer at `https://generativelanguage.googleapis.com/v1beta/openai`). Not a competitor; it's infrastructure. - -**Overlap with us:** Molecule AI now has `AISTUDIO_API_KEY` and `QIANFAN_API_KEY` as recognised adapter keys (openclaw adapter fix, SHA d779e16). The MeDo hackathon integration targets the Baidu Cloud ecosystem, making ERNIE models the natural default for hackathon workspaces. ERNIE Speed / ERNIE Lite are cost candidates for Research Lead and Market Analyst sub-agents where we don't need Opus-class reasoning. - -**Differentiation:** ERNIE is a model line, not a platform. No agents, no orchestration, no workflow. Molecule AI is the platform; ERNIE is one of many possible backends. The entry here is about when to route to ERNIE rather than Anthropic or OpenAI. - -**Worth borrowing:** -- **Tiered model routing by task complexity** — ERNIE's Speed/Lite/4.5 tiers make explicit the "pick the cheapest model that can do the job" principle. Molecule AI's PM could route shallow research tasks (keyword search, web fetch) to ERNIE Lite and deep reasoning tasks (code review, architecture analysis) to Claude Opus. A `model_policy` field in org.yaml per-workspace would encode this without hard-coding model IDs. -- **Qianfan model hub metadata** — the Qianfan API surfaces context window, pricing, and availability per model in a machine-readable format. Worth scraping for a Molecule AI model registry that shows operators the cost/capability tradeoff at provisioning time. - -**Terminology collisions:** -- "knowledge base" — Baidu Qianfan's knowledge base feature (RAG pipeline) vs our `agent_memories` table. Overlapping concept; their offering is more mature on retrieval. - -**Signals to react to:** -- If `QIANFAN_API_KEY` free credit expires → swap hackathon sub-agents back to `AISTUDIO_API_KEY` + Gemini Flash. -- If ERNIE 4.5 closes the gap with Claude Sonnet on English-language reasoning → evaluate as a cost-saving default for non-PM workspaces. -- If Baidu opens ERNIE function-calling / tool-use parity with GPT-4o → ERNIE becomes viable for the Backend Engineer and QA Engineer workspaces, which require reliable structured output. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** commercial API (Baidu Cloud), ERNIE 4.5 released Q1 2026 - ---- - -### MeDo — `moda.baidu.com` *(commercial, no public repo)* - -**Pitch:** Baidu's no-code AI application builder — scaffold and publish AI-powered apps through a visual editor with pre-built LLM integrations. - -**Shape:** SaaS platform (Baidu Cloud, Chinese-market primary). Users compose apps from prompt nodes, data connectors, and UI blocks via a drag-and-drop canvas. Published apps get a hosted endpoint. REST API for programmatic create/update/publish. No OSS repo; requires Baidu Cloud account. Hackathon track: MeDo SEEAI May 2026. - -**Overlap with us:** Both expose a canvas (theirs visual, ours org-chart + agent config). Both have an app-publish lifecycle. Our Canvas + workspace provisioner covers roughly the same surface for technical teams; MeDo targets non-developers. Molecule AI is integrating MeDo via the new `medo.py` builtin tool to enter the May 2026 hackathon. - -**Differentiation:** MeDo is a no-code builder for end-user AI apps; Molecule AI is a developer platform for multi-agent engineering workflows. MeDo has no A2A, no workspace Docker runtime, no persistent agent memory. Molecule AI has no no-code UI builder. The integration is complementary: Molecule AI agents can create and publish MeDo apps programmatically as a delivery step. - -**Worth borrowing:** -- **Visual prompt-node composition** — their drag-and-drop prompt pipeline could inspire a simpler Canvas view for non-technical stakeholders who want to inspect an agent's workflow without reading system-prompt.md. - -**Terminology collisions:** -- "app" — a published MeDo application vs a Molecule AI workspace; different lifecycles. -- "canvas" — their visual editor surface vs our org-chart canvas. - -**Signals to react to:** -- If MeDo opens a REST API to third-party agent platforms → expand `medo.py` from stub to full integration; file a Hermes-style adapter PR. -- If the MeDo hackathon win generates user interest → prioritise MeDo as a first-class delivery target alongside GitHub and Slack. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** commercial SaaS (Baidu Cloud), active hackathon track May 2026 - ---- - -### Inngest — `inngest/inngest` - -**Pitch:** "The durable execution engine for AI agents and background functions — write reliable step functions that survive failures, retries, and deploys." - -**Shape:** Go + TypeScript SDK (Apache 2.0), ~5.2k ⭐. Cloud-hosted or self-hosted. Developers define "functions" as async step graphs; Inngest handles scheduling, retries, concurrency limits, rate limits, and failure recovery. HTTP-native — functions live in your existing web server and Inngest calls them. Comparable to Temporal but lighter: no gRPC, no workflow history replay, just durable HTTP step execution. - -**Overlap with us:** Molecule AI ships an in-house cron scheduler and a Temporal adapter for durable background work. Inngest is a third option in the same space: schedule-driven agent tasks, retry-on-failure, fan-out. Any Molecule AI feature that today uses `CronCreate` or temporal_workflow could instead use Inngest's step functions. - -**Differentiation:** Inngest is infrastructure-as-a-service for function scheduling; Molecule AI is an agent platform. Inngest has no concept of persistent agent identity, workspace lifecycle, org hierarchy, or A2A. Our Temporal adapter is the direct equivalent for complex multi-step workflows; Inngest targets simpler event-triggered functions with less operational overhead than Temporal. - -**Worth borrowing:** -- **HTTP-native step graph model** — Inngest steps live in a plain web route. Adopting this pattern for Molecule AI's skill execution would remove the need for the workspace's internal runner process for short tasks. -- **Built-in rate limiting per function** — our current delegation tool has no per-workspace rate limit; Inngest's concurrency + rate-limit primitives are the reference design. - -**Terminology collisions:** -- "function" — Inngest functions are durable async step graphs; ours are Python tool functions decorated with `@tool`. -- "event" — Inngest events trigger functions; our `event_queue` in A2A is different. - -**Signals to react to:** -- If Inngest ships native agent-state primitives (memory, long-running sessions) → direct overlap with our workspace model; re-evaluate our Temporal dependency. -- If Inngest becomes the dominant alternative to Temporal in AI stacks → add an `inngest` adapter alongside `temporal_workflow.py`. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~5.2k ⭐, v0.x actively developed - ---- - -### Arize Phoenix — `Arize-ai/phoenix` - -**Pitch:** "AI observability and evaluation platform — trace, evaluate, and troubleshoot LLM applications and agents in production." - -**Shape:** Python + TypeScript (Apache-2.0), ~5k ⭐, v8.x. Self-hostable or Phoenix Cloud. Ships an OpenTelemetry-compatible tracing SDK (`pip install arize-phoenix-otel`) that auto-instruments LangChain, LangGraph, LlamaIndex, OpenAI, Anthropic, and more. Every LLM call, tool use, retrieval, and agent step is captured as an OpenTelemetry span and displayed in a trace waterfall UI. Built-in evaluation framework (hallucination, Q&A accuracy, toxicity) runs over captured traces. - -**Overlap with us:** Our `GET /workspaces/:id/traces` endpoint and Langfuse integration solve the same problem — making agent behaviour inspectable after the fact. Phoenix's span-level trace waterfall (LLM call → tool call → next LLM call) is more granular than our per-A2A-message `activity_logs`. Any team evaluating Molecule AI will compare our trace depth to Phoenix's. - -**Differentiation:** Phoenix is a pure observability layer — no agent runtime, no org hierarchy, no A2A, no workspace lifecycle. Molecule AI is the platform that runs agents; Phoenix can be wired in as the backend for our trace data. They're complementary by design: an OpenTelemetry exporter in each Molecule AI workspace adapter could ship spans to a Phoenix instance with zero code change. - -**Worth borrowing:** **Span-level trace waterfall** — tool calls, LLM inputs/outputs, and latency shown as a nested tree per agent run. Our current trace view shows A2A messages; this granularity is the natural next step. **Evaluation datasets from traces** — capturing production traces as an eval dataset is a clean pattern for improving agent quality without manual labeling. - -**Terminology collisions:** "traces" — same word, same meaning. Molecule AI's `GET /workspaces/:id/traces` → Langfuse; Phoenix offers an alternative or complementary backend. - -**Signals to react to:** If Phoenix becomes the de facto OTel backend for LangGraph + CrewAI workspaces → add an `OTEL_EXPORTER_OTLP_ENDPOINT` env var to our workspace containers and document Phoenix as the recommended trace backend. If Phoenix ships agent evaluation pipelines that score multi-turn A2A conversations → directly useful for Molecule AI's QA Engineer workspace. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~5k ⭐, v8.x, actively maintained - ---- - -### SWE-agent — `SWE-agent/SWE-agent` - -**Pitch:** "SWE-agent turns LLMs into software engineers that can fix real bugs and implement features in GitHub repos." - -**Shape:** Python framework (MIT), ~16k ⭐, v1.0.6 released March 2026. A research project from Princeton NLP. An LLM is given a **SWE-agent Computer Interface (ACI)** — a curated set of bash tools and file-viewing commands purpose-built for code navigation — and autonomously works through GitHub issues end-to-end. No persistent agent identity; each run is ephemeral. Benchmarked heavily on SWE-bench: scored ~12% (GPT-4 turbo) up to ~53% with Claude 3.7 Sonnet. The ACI (not the LLM) is the key innovation — existing tools like bash/grep/vim are replaced by search-and-edit primitives that reduce LLM confusion in large codebases. - -**Overlap with us:** SWE-agent's ACI is the reference design for what our Backend Engineer, Frontend Engineer, and QA Engineer workspaces *should* have as their tool surface. Our workspaces currently rely on Claude Code's built-in tooling (Read, Edit, Bash, Grep, Glob) plus MCP skills; SWE-agent's research shows that custom ACI primitives improve coding benchmark scores meaningfully. Both platforms run LLMs inside Docker containers to execute code safely. - -**Differentiation:** SWE-agent is a **single-run task solver** — give it an issue, get a patch. No persistent state, no org hierarchy, no scheduling, no multi-agent coordination, no canvas. It's a benchmark runner and research artifact, not an operational platform. Molecule AI workspaces remember context across sessions, hold roles, coordinate with siblings, and run on schedules. SWE-agent is what you'd want our Backend Engineer workspace to *invoke* for a focused one-shot task, not what replaces the workspace. - -**Worth borrowing:** -- **Agent Computer Interface primitives** — `open`, `scroll`, `search_file`, `find_file`, `edit` with line ranges are strictly better than raw bash for LLM coding agents. Our workspaces could expose these as platform-installed skills to reduce token waste on naive bash usage. -- **Thought/action/observation trace format** — SWE-agent logs a structured trace of every reasoning step. Worth adopting as the schema for our `GET /workspaces/:id/traces` endpoint instead of raw activity log text. -- **Cost/performance tradeoff tracking** — SWE-bench results per model at different temperatures are published with cost estimates. This is the data we need for our `model_policy` routing strategy (cheap model for low-stakes tasks, expensive for SWE-bench-class tasks). - -**Terminology collisions:** -- "agent" — SWE-agent: a one-shot issue-solving process. Ours: a long-lived Docker workspace. -- "environment" — SWE-agent: a sandboxed Docker container with the repo. Ours: the `workspace_dir` bind-mount. Same concept, different lifecycle. -- "trajectory" — SWE-agent: one full (thought+action+observation)* run. We should use this term for our trace schema going forward. - -**Signals to react to:** -- If SWE-agent adds persistent memory between runs → crosses from benchmark tool to agent platform; reassess positioning. -- If SWE-bench scores with Claude cross 70% → the underlying ACI + model combo is good enough for production unattended use; evaluate as a Molecule AI runtime adapter for one-shot engineering tasks. -- If the ACI spec gets published as a standard tool surface → adopt it in our platform-installed skill set so Molecule AI coding agents benchmark cleanly on SWE-bench. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~16k ⭐, v1.0.6 March 2026 - ---- - -### Devin — `cognition.ai` *(commercial, no public repo)* - -**Pitch:** "The first AI software engineer — Devin works alongside your team to tackle complex engineering tasks end-to-end." - -**Shape:** Commercial SaaS (Cognition AI, ~$2B valuation). Devin is a fully autonomous AI software engineer: given a task (natural language, GitHub issue, or Slack message), it opens a browser and a terminal in a sandboxed environment, writes and runs code, debugs failures, opens PRs, and iterates until done — all without human intervention. Persistent session per task; Devin can pick up where it left off. Teams access via Slack bot or web UI. Enterprise-tier pricing, no self-serve API. - -**Overlap with us:** Devin's session model — a long-lived, role-holding agent with persistent state that can be assigned tasks asynchronously and delivers results via Slack — is the same shape as a Molecule AI `Backend Engineer` workspace. Both use Docker containers, both accept A2A-style message delegation, both hold a role across sessions. Cognition has productized exactly the "one AI teammate" use case that our per-workspace org model targets. - -**Differentiation:** Devin is a **single fully-managed AI engineer**, not a platform for building multi-agent teams. No org hierarchy, no canvas, no registry, no A2A protocol between multiple Devins. Molecule AI lets teams deploy *many* specialized workspaces that coordinate — a PM delegates to a Dev Lead who delegates to a Backend Engineer. Devin is one very capable engineer; Molecule AI is the company those engineers work in. Devin's moat is vertical depth and polish (browser, full IDE, PR workflow out of the box); ours is composability and multi-agent coordination. - -**Worth borrowing:** -- **Slack-native task assignment** — Devin accepts tasks from Slack with zero friction: `@Devin fix the auth bug in PR #123`. Our Telegram channel integration is close, but formal Slack-bot task routing (task accepted, progress updates, done notification) should match this UX. Map to `workspace_channels` + `approvals` flow. -- **Session replay / audit trail** — Devin records every browser action, terminal command, and file edit in a viewable replay. Our `GET /workspaces/:id/traces` and `activity_logs` give the data; a UI replay view would close the gap for customers who need to audit AI work. -- **Task acceptance confirmation before execution** — Devin sends a plan and waits for explicit human approval before starting expensive work. This maps cleanly onto our `approvals` table: add a "plan approval" step before any long-running delegation. - -**Terminology collisions:** -- "session" — Cognition: a self-contained task execution run with persistent context. Ours: not a first-class concept (workspace is the persistent unit). No hard collision; avoid using "session" in our Devin-comparison docs. -- "teammate" — Devin's primary marketing metaphor. We use "agent" or "workspace." If Devin's framing wins the market, consider adopting "AI teammate" in our onboarding copy. - -**Signals to react to:** -- If Cognition opens a public API for Devin → evaluate as a Molecule AI adapter (`devin` runtime). Teams could provision a Devin workspace alongside Claude Code workspaces for tasks that benefit from browser access. -- If Devin adds multi-agent orchestration (multiple Devins coordinating on a project) → direct competitor to our multi-workspace org model; expect significant marketing push. -- If SWE-bench scores plateau and Cognition shifts positioning toward "AI company" (not just "AI engineer") → direct brand conflict; double down on our team-of-agents narrative. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** commercial SaaS, ~$2B valuation, no public repo - ---- - -### Cline — `cline/cline` - -**Pitch:** "AI coding assistant that lives in VS Code and can autonomously edit files, run commands, and browse the web." - -**Shape:** VS Code extension (Apache-2.0), ~44k ⭐, pushed daily. Wraps any LLM (Claude, GPT-4o, Gemini, DeepSeek, local via Ollama) with a system-level tool belt: read/write files, run shell commands, call browser MCP. Single active session per VS Code window. Marketplace install, no containers, no persistent agent identity between sessions. - -**Overlap with us:** Cline's Claude-backed coding session is the same core loop as a Molecule AI Claude Code workspace — both wrap Claude with file+shell tools and stream results. super-dev explicitly runs inside Cline. Developers who discover Cline as a quick "AI pair programmer" are exactly our target user for the Claude Code runtime. - -**Differentiation:** Cline is a VS Code-local tool, not a multi-agent platform. No persistent identity between sessions, no org hierarchy, no A2A between agents, no WebSocket canvas, no scheduling. "Done" for Cline means a code change lands in the editor; "done" for Molecule AI means a team of agents deployed a feature through a review pipeline. Complementary shapes — a Cline user who needs parallelism is a Molecule AI convert. - -**Worth borrowing:** Auto-approval modes (read-only → write → execute tiers) with per-command diff review — more granular than our single `approvals` gate. The "cost meter" (running token spend shown in UI) is a cheap trust-building feature for our Canvas. - -**Terminology collisions:** "task" — their in-session coding task vs our `current_task` heartbeat field. "tools" — same word, both mean structured LLM tool calls. - -**Signals to react to:** If Cline adds multi-session agent persistence or cross-window agent communication → direct threat to our Claude Code runtime story. If Cline's MCP support becomes the de facto way developers wire tools → align our workspace tool model to the same MCP surface. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~44k ⭐, pushed daily - ---- - -### OpenHands — `All-Hands-AI/OpenHands` - -**Pitch:** "Open-source AI software engineer — let AI be your co-developer: browse the web, write code, run commands, and collaborate on tasks." - -**Shape:** Python + TypeScript (MIT), ~47k ⭐, v0.39.0. Web-hosted UI (or local Docker) where an AI agent operates inside a sandboxed runtime (browser, shell, files) to complete multi-step engineering tasks. Supports Claude, GPT-4o, Gemini, DeepSeek. SWE-Bench top-ranked open-source system. Community of ~3k contributors. - -**Overlap with us:** OpenHands is the closest open-source parallel to a Molecule AI Claude Code workspace — both run an AI agent with shell+file access inside a container. The sandbox model (Docker-isolated execution, browser use, file I/O) is identical to our `workspace-template` runtime layer. Molecule AI users building a "solo engineer" workspace are building what OpenHands ships out of the box. - -**Differentiation:** OpenHands is single-agent, single-task — no org hierarchy, no A2A between agents, no visual canvas, no scheduling, no persistent identity across sessions. A single "project" is one sandboxed run. Molecule AI is a persistent, multi-agent company with A2A, schedules, and a visual org chart. OpenHands is the reference implementation for the solo-agent shape; Molecule AI is the platform for the team shape. - -**Worth borrowing:** **CodeAct action space** — agent emits Python code instead of JSON tool schemas; code is executed directly in the sandbox. More expressive than JSON tool calls and simpler to extend. If our workspace agents need arbitrary tool composition, CodeAct is worth evaluating as an alternative to our MCP tool list. - -**Terminology collisions:** "workspace" — theirs is a sandboxed task run; ours is a long-lived Docker container with an agent role. "agent" — same word, different persistence model. - -**Signals to react to:** If OpenHands ships multi-agent coordination (agents spawning sub-agents with shared memory) → direct overlap with our team model. If their SWE-Bench rank approaches GPT-4o with an open model → cost-effective backend for our DevOps / QA workspaces. - -**Last reviewed:** 2026-04-13 · **Stars / activity:** ~47k ⭐, v0.39.0, very active - ---- - -### Scion — `GoogleCloudPlatform/scion` - -**Pitch:** "An experimental agent hypervisor — each agent runs in its own isolated container with dedicated credentials, config, and git worktree; orchestrates Claude Code, Gemini CLI, Codex, and OpenCode concurrently." - -**Shape:** Go + YAML (Apache-2.0). Container-per-agent isolation via Docker, Podman, Apple Containers, or Kubernetes. Named runtime profiles. Introduces an `agents.md` capability-declaration convention. Not a framework — a harness supervisor. - -**Overlap with us:** Container-per-agent mirrors our Docker workspace model. Multi-harness concurrency maps to multi-workspace A2A topology. Explicitly manages Claude Code — direct contact with our user base. - -**Differentiation:** No persistent agent memory, no visual canvas, no A2A between agents, no channels. It is the container orchestration layer beneath agents; we are the agent identity and collaboration layer above. - -**Worth borrowing:** `agents.md` capability spec — a standard file per workspace declaring what the agent can do. Adopt in `workspace/` for Scion interoperability. - -**Terminology collisions:** "profile" — Scion: named runtime config; ours: undefined. "harness" — both mean "the process managing agent execution." - -**Signals to react to:** If Scion adds A2A or a memory layer → direct overlap. If `agents.md` gains wide adoption → align `workspace/` to the spec. - -**Last reviewed:** 2026-04-15 · **Stars / activity:** GCP repo, 230 HN pts at launch, April 8, 2026 - ---- - -### claude-mem — `thedotmack/claude-mem` - -**Pitch:** "Automatically captures everything Claude does during coding sessions — persistent cross-session memory with search, timeline, and observation retrieval as MCP tools." - -**Shape:** TypeScript (AGPL-3.0), ~56k ⭐, +2,997 stars in one day. Five lifecycle hooks (`SessionStart`, `UserPromptSubmit`, `PostToolUse`, `Stop`, `SessionEnd`) intercept agent actions, compress observations via Claude SDK, store in SQLite FTS5 + Chroma hybrid. Three MCP tools exposed: `search`, `timeline`, `get_observations`. Web viewer at localhost:37777. ⚠️ `ragtime/` retrieval subdirectory is PolyForm Noncommercial — reimplementation required for commercial SaaS use. - -**Overlap with us:** Directly addresses our known cross-session memory gap. Lifecycle hooks are structurally compatible with our harness entry points. - -**Differentiation:** A memory add-on for a single Claude Code session; no A2A, no org hierarchy, no scheduling, no channels. - -**Worth borrowing:** `PostToolUse` + `SessionEnd` → compressed observation pipeline, compatible with our harness lifecycle. Progressive-disclosure retrieval (summaries first, full content on demand) caps token overhead at `SessionStart`. - -**Terminology collisions:** "observations" — their captured agent actions; not a first-class term in our platform. - -**Signals to react to:** If PolyForm NC removed from `ragtime/` → evaluate direct integration. If hook schema is formalized → adopt as standard workspace lifecycle spec. - -**Last reviewed:** 2026-04-15 · **Stars / activity:** ~56k ⭐, +2,997 today - ---- - -### Multica — `multica-ai/multica` - -**Pitch:** "Turn coding agents into real teammates — assign tasks, track progress, compound skills." - -**Shape:** TypeScript + Go (Next.js 16 / Chi router / PostgreSQL 17 + pgvector), ~12.8k ⭐, +1,503 today. Local agent daemons execute Claude Code / Codex / OpenCode in isolation; state syncs to a central backend. Solved tasks are semantically indexed via pgvector and surfaced to future agents team-wide — the "skill-compounding" model. 36 releases, 1.6k forks, actively shipped. - -**Overlap with us:** Skill-compounding maps to our plugin/skills registry but adds automatic semantic indexing. Local-daemon + central-backend mirrors Docker workspaces + Canvas backend. Cross-agent task assignment and scheduling are first-class features. - -**Differentiation:** No visual org-chart canvas, no A2A protocol, no persistent agent identity across restarts, no channel integrations. Central backend is a coordination hub, not peer-to-peer. Closer to a task manager for agents than an agent company platform. - -**Worth borrowing:** pgvector semantic indexing of solved tasks — each completed workspace run contributes to a searchable skill pool, evolving our plugin registry from file-based discovery to semantic retrieval. - -**Terminology collisions:** "skills" — their skills are solved-task embeddings; ours are installed behaviour bundles. - -**Signals to react to:** If Multica adds A2A or persistent agent identity → direct competitor. Star velocity (+1,503/day) warrants weekly tracking. - -**Last reviewed:** 2026-04-15 · **Stars / activity:** ~12.8k ⭐, +1,503 today, 36 releases - ---- - -### Skills CLI — `vercel-labs/skills` - -**Pitch:** "The CLI for the open agent skills ecosystem — discover, install, and share reusable skills across 45+ coding agents." - -**Shape:** TypeScript (MIT), ~14.2k ⭐, +153 today. `npx skills` package manager backed by Vercel. Skills are `SKILL.md` directories following the [agentskills.io](https://agentskills.io) open spec. Targets Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenCode, Hermes, Holaboss, and 37+ others from a single repository. - -**Overlap with us:** Three existing entries (Hermes, gstack, Holaboss) flag "if agentskills.io picks up mass adoption → align our plugin manifest." This is that moment: Vercel ships the canonical install CLI with 14k stars and 45-agent coverage. - -**Differentiation:** Skills CLI is a package manager, not an agent runtime. No canvas, A2A, or scheduling. It installs behavior bundles into whatever agent the developer uses; Molecule AI is the runtime those bundles run inside. - -**Worth borrowing:** Align our `plugins/` manifest to the agentskills.io `SKILL.md` spec so any `npx skills`-installable skill also installs cleanly into a Molecule AI workspace. Dual compatibility = free distribution channel. - -**Terminology collisions:** "skills" — same word, same filesystem convention; full spec alignment is the goal, not a collision to manage. - -**Signals to react to:** If `npx skills` becomes the de facto install path industry-wide → our `plugins/install` should natively consume the same manifest format. If agentskills.io publishes a versioned schema → adopt it immediately in `plugins/`. - -**Last reviewed:** 2026-04-15 · **Stars / activity:** ~14.2k ⭐, +153 today, Vercel-backed - ---- - -### Archon — `coleam00/Archon` - -**Pitch:** "The first open-source harness builder for AI coding — make AI coding deterministic and repeatable." - -**Shape:** TypeScript (MIT), ~18.1k ⭐, +396 today. Defines AI coding workflows as YAML DAGs: planning → implementation → validation → review → PR. Each run is git-worktree-isolated. Nodes are either AI-powered (Claude Code generation) or deterministic (bash, test runners). Human approval gates at any phase. Delivery to Slack, Telegram, Discord, GitHub, or web UI. "What Dockerfiles did for infra, Archon does for AI coding." - -**Overlap with us:** Wraps Claude Code in a structured pipeline — the same pattern as our Dev Lead delegating to a Claude Code workspace. Approval gates map to our `approvals` table. Git-worktree isolation mirrors our `workspace/` worktree pattern. - -**Differentiation:** No persistent agent identity, no org hierarchy, no A2A, no canvas, no multi-session scheduling. Archon defines a single delivery run; Molecule AI is the persistent company those runs operate inside. - -**Worth borrowing:** YAML-DAG workflow definition (planning → implementation → validation → PR) with mixed AI/deterministic nodes — natural extension of `workspace/` for repeatable, auditable delivery pipelines. - -**Terminology collisions:** "workflow" — their YAML DAG vs our informal usage. "harness" — Archon, Scion, and our Claude Code runner all claim the word; Molecule AI docs should clarify its own use. - -**Signals to react to:** If Archon adds multi-workspace coordination → direct competitor to our orchestration layer. If their YAML workflow schema gains wide adoption → add an Archon import adapter to `workspace/`. - -**Last reviewed:** 2026-04-15 · **Stars / activity:** ~18.1k ⭐, +396 today, v0.3.6 - ---- - -### Claude Code Routines — `anthropic.com` *(commercial, no public repo)* - -**Pitch:** "Schedule Claude Code agents to run automatically on timers and GitHub events — agentic workflows in the cloud without manual intervention." - -**Shape:** Anthropic-hosted cloud feature. Users define routines that fire a Claude Code session on cron timers or GitHub events (push, PR, issue). Runs serverlessly inside Anthropic infrastructure. No self-hosting, no public API. HN item 47768133: 611 pts, 355 comments at launch today — significant community concern about vendor lock-in. - -**Overlap with us:** Direct overlap with `workspace_schedules` + cron-triggered workspace execution. Anthropic now competes in the scheduled agentic execution space with a first-party hosted offering. - -**Differentiation:** No persistent agent memory, no org hierarchy, no A2A between agents, no visual canvas, no multi-model support, Anthropic-only lock-in. HN consensus: "trivially reproducible with cron + API." Our differentiators: multi-agent coordination, persistent identity, model-agnosticism, self-hostability. - -**Worth borrowing:** GitHub event triggers (push/PR/issue → fire agent) as first-class schedule trigger types. Our `workspace_schedules` is cron-only; this gap is now competitively visible. - -**Terminology collisions:** "routine" — Anthropic: a scheduled agent session; near-synonym with our `workspace_schedule` rows. - -**Signals to react to:** If Routines adds A2A between routines → direct platform competition from Anthropic with massive distribution advantage. If lock-in backlash grows → double down on "self-hostable, model-agnostic" narrative as the open alternative. - -**Last reviewed:** 2026-04-15 · **Stars / activity:** Anthropic cloud feature, 611 HN pts today (item 47768133) - ---- - -### Anthropic Managed Agents — `api.anthropic.com` *(commercial, public beta)* - -**Pitch:** "Run managed agent sessions with built-in sandboxing, checkpointing, credential management, and end-to-end tracing — without managing infrastructure." - -**Shape:** Anthropic-hosted API, public beta since April 8, 2026 (`managed-agents-2026-04-01` beta header required). Bundles: agent loop + tool execution, sandboxed container per session, state persistence (conversation-history checkpointing per session), credential management + scoped permissions, end-to-end tracing. Pricing: standard API token cost + **$0.08/session-hour** active runtime (idle = zero cost). SSE stream endpoint (`GET /v1/sessions/{id}/stream`) for real-time event delivery. `user.tool_confirmation` SSE event supports async tool approval/denial from the application layer. - -**Overlap with us:** Idle-zero billing addresses the same problem as GH #711 (workspace hibernation). Per-session sandboxing overlaps E2B (#574). Session-level conversation checkpointing partially overlaps Temporal durable execution (#583). - -**Differentiation:** Session checkpointing ≠ Temporal — Managed Agents checkpoints conversation history; Temporal handles cross-workspace workflow orchestration, retry sagas, and distributed state. Our Docker workspace model is richer: persistent identity, multi-agent A2A, org hierarchy, RBAC, visual canvas, model-agnosticism. RBAC passthrough requires an async out-of-band sidecar (our `check_permission` gates run inside the workspace process; Managed Agents loop runs server-side). Cost neutral at ~2 active hrs/day (~$0.16/day vs ~$0.10–0.17/day Fly.io shared-1x); more expensive for high-throughput workspaces (8+ active hrs/day). API surface explicitly unstable ("behaviors may be refined between releases" — Anthropic docs). - -**Signals to react to:** GA announcement → re-evaluate `ClaudeManagedAgentsExecutor` adapter spike (GH #742 closed: WATCH-FOR-GA). Multiagent coordination + memory research-preview features exit waitlist → evaluate whether built-in multi-agent replaces our A2A layer or complements it. `tool_confirmation` API stabilizes → simplifies our RBAC passthrough sidecar design. Price drop below $0.05/session-hour → re-run cost model for high-traffic workspaces. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** Anthropic cloud API, public beta (Apr 8 2026). **Verdict: WATCH-FOR-GA** (GH #742 closed). Adapter estimated ~150–200 LOC, non-trivial async session model, RBAC interception requires architectural work. - ---- - -### Microsoft Agent Framework — `microsoft/agent-framework` - -**Pitch:** "A framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET." - -**Shape:** Python + C#/.NET (MIT), ~9.5k ⭐, April 2026 active releases. Graph-based workflow engine with streaming, checkpointing, and human-in-the-loop approval gates. Supports Azure OpenAI, Microsoft Foundry, and OpenAI. Ships a DevUI for interactive debugging, OpenTelemetry observability, and "AF Labs" (experimental RL-based features). Ships a migration guide from AutoGen — this is the official Microsoft successor to `microsoft/autogen`. - -**Overlap with us:** Our workspace-template adapters target AutoGen/AG2; this is the official Microsoft path forward, making our adapter coverage incomplete. HITL approval gates and graph-based multi-agent routing mirror our `approvals` table + delegation chain. - -**Differentiation:** Orchestration SDK only — no persistent agent memory, no org-chart canvas, no A2A between independently deployed agents, no scheduling, no channel integrations. - -**Worth borrowing:** DevUI interactive debugging panel (inspect agent state mid-run without a full canvas). AF Labs RL routing — agents improve delegation decisions from past run outcomes; worth evaluating for our PM workspace's `delegate_task` routing. - -**Terminology collisions:** "middleware" — their processing pipeline hook; undefined in our platform. "graph" — their workflow DAG vs our live org chart (same word, different semantics). - -**Signals to react to:** AF 1.0 GA shipped April 7 with AG-UI (SSE protocol for streaming agent events to frontends). AG-UI is a direct competitor to our WebSocket canvas events — if AG-UI becomes a standard, we need an AG-UI-compatible SSE endpoint to attract MAF users. Process Framework GA in Q2 2026 will add visual workflow design — evaluate overlap with our Canvas. Google's private Tool Registry (Vertex AI) sets an enterprise expectation for tool governance that we should match with per-org curated plugin registries. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** ~9.5k ⭐, v1.0 GA April 7 2026, AG-UI protocol announced - ---- - -### Open Agents — `vercel-labs/open-agents` - -**Pitch:** "An open-source reference app for building and running background coding agents on Vercel — fork it, adapt it, ship your own cloud coding agent." - -**Shape:** TypeScript (MIT), ~2.2k ⭐, +1,020 today. Three-layer architecture: web UI → agent workflow (Vercel Workflow SDK for durable execution) → isolated sandbox VM. Key design principle: **agent runs *outside* the sandbox VM** and interacts with it through tools — not co-located. Snapshot-based VM resumption, auto-commit/push/PR, session sharing via read-only links, voice input. From Vercel Labs — same team as the Skills CLI entry above. - -**Overlap with us:** Vercel Workflow SDK gives checkpoint-and-resume durability — the same gap our workspace restart-context solves ad hoc. Agent-outside-sandbox mirrors our Docker workspace + adapter separation. Auto-PR creation is a first-class feature we implement manually. - -**Differentiation:** Single coding agent, no org hierarchy, no A2A, no scheduling, no persistent memory across sessions, no channels. A reference template, not an operational platform. - -**Worth borrowing:** Snapshot-based sandbox resumption — preserves VM state across agent restarts without re-cloning the repo. More efficient than our current Docker restart + `git clone` approach for long-running workspace tasks. - -**Terminology collisions:** "workflow" — Vercel's durable execution primitive; our informal delegation chain term. - -**Signals to react to:** If Vercel Workflow SDK becomes a standard durable-execution backend → evaluate as a drop-in for `workspace_schedules` on Vercel-hosted deployments. If open-agents adds multi-agent coordination → direct competitor reference app with Vercel distribution. - -**Last reviewed:** 2026-04-15 · **Stars / activity:** ~2.2k ⭐, +1,020 today, Vercel Labs - ---- - -### Gemini CLI — `google-gemini/gemini-cli` - -**Pitch:** "An open-source AI agent that brings the power of Gemini directly into your terminal." - -**Shape:** TypeScript (Apache 2.0), ~101k ⭐, v0.38.1 released April 15, 2026. Single-agent interactive CLI with a 1M-token context window (Gemini models). Tool surface: file read/write, shell execution, web fetch, Google Search grounding. MCP support via `~/.gemini/settings.json` — any MCP server can extend its tool set. ReAct loop architecture. No persistent agent identity between sessions. Ships from Google's own org (`google-gemini`). - -**Overlap with us:** Direct structural parallel to our Claude Code runtime adapter — both are agentic CLIs wrapping a frontier model with file+shell tools. Developers choosing between Claude Code and Gemini CLI for their workspace runtime will hit our adapter story immediately. MCP support means the same skills installed for a Claude Code workspace *can* target a Gemini CLI workspace with zero changes. - -**Differentiation:** No persistent memory, no org hierarchy, no A2A, no scheduling, no canvas. A session ends when the terminal closes. Molecule AI's Claude Code adapter sits *on top* of Claude Code; Gemini CLI would need a parallel adapter. We're the platform; Gemini CLI is the runtime candidate. - -**Worth borrowing:** Google Search grounding as a first-class tool — grounded web results with citations surfaced inline. Our Research Lead workspace uses raw WebSearch; grounding would reduce hallucinated citations. Consider exposing a `google_search_grounded` tool in our claude-code skill pack. - -**Terminology collisions:** "agent" — their single-session CLI process; our long-lived Docker workspace. - -**Signals to react to:** If Gemini CLI adds persistent memory between sessions → it closes the gap with our Claude Code adapter; push adoption of the `gemini-cli` runtime adapter. If `gemini-cli` MCP adoption exceeds `claude-code` MCP adoption → re-weight our adapter documentation priority. If Google ships a multi-agent layer on top of Gemini CLI → direct platform threat with massive distribution. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~101k ⭐, v0.38.1 April 15, 2026, Google-maintained - ---- - -### open-multi-agent — `JackChen-me/open-multi-agent` - -**Pitch:** "TypeScript multi-agent framework — one `runTeam()` call from goal to result. Auto task decomposition, parallel execution. 3 dependencies, deploys anywhere Node.js runs." - -**Shape:** TypeScript (MIT), ~5.7k ⭐, v1.1.0 released April 1, 2026. Coordinator-based architecture: one coordinator agent decomposes a natural-language goal into a dependency DAG of tasks, assigns each to a specialist agent, and fans results back. Shared message bus + memory pool across the agent pool. Three runtime deps (`@anthropic-ai/sdk`, `openai`, `zod`). MCP servers connected via `connectMCPTools()`. Supports Claude, GPT, Gemini, Grok, Ollama, and any OpenAI-compatible endpoint per-agent. - -**Overlap with us:** Coordinator-DAG decomposition mirrors our PM → Dev Lead → Engineer delegation chain, but automated at runtime from a single goal string — where we rely on system-prompt-encoded delegation rules. The shared message bus maps to our A2A event queue. MCP-native means workspace skills install into `open-multi-agent` teams as easily as ours. The per-agent model selection (cheap model for shallow tasks, expensive for deep) is the same `model_policy` we've been deferring. - -**Differentiation:** No persistent agent identity across runs, no visual canvas, no scheduling, no Docker isolation, no channels. Teams are ephemeral in-process objects. Molecule AI is an operational platform for long-lived agents; `open-multi-agent` is a library for one-shot goal execution. - -**Worth borrowing:** Runtime goal-to-DAG decomposition — instead of hard-coding delegation trees in system prompts, the PM workspace could call a decomposition step that generates a task graph from the user's goal. Cheap to prototype: wrap `runTeam()` logic as a PM skill. - -**Terminology collisions:** "coordinator" — their orchestrating agent; our PM workspace plays the same role but with a persistent identity. "team" — their ephemeral agent pool; our org-chart canvas of live workspaces. - -**Signals to react to:** If `open-multi-agent` adds persistent agent state → library becomes a platform; assess as a dependency or competitor for our TypeScript SDK. If `runTeam()` pattern becomes idiomatic in the Node.js agent ecosystem → expose a compatible API surface in our SDK for parity. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~5.7k ⭐, v1.1.0 April 1, 2026, MIT - ---- - -### AgentScope — `modelscope/agentscope` - -**Pitch:** "Build and run agents you can see, understand and trust." - -**Shape:** Python (Apache 2.0), ~23.8k ⭐, v1.0.18 released March 26, 2026. Alibaba/ModelScope. Multi-agent: `MsgHub` typed message routing, ReAct agents, sequential and concurrent pipelines. MCP client integration. OpenTelemetry observability built-in. Voice agent support. RL-based agent tuning (experimental). - -**Overlap with us:** MCP support means AgentScope agents can call tools exposed by our MCP server — potential consumer of our registry. Pipeline orchestration (sequential / concurrent) is structurally the same as our PM → Dev Lead → Engineer delegation chain. OpenTelemetry instrumentation parallels our `GET /workspaces/:id/traces` + Langfuse stack. - -**Differentiation:** Code-first Python SDK — no visual canvas, no Docker workspace lifecycle, no org-chart hierarchy, no scheduling, no channels, no A2A between independently deployed agents. It's a framework for building agent logic in-process; we're the platform that deploys and coordinates agents as long-lived services. - -**Worth borrowing:** `MsgHub` typed routing (messages carry sender/receiver type metadata, enabling selective fan-out) — more expressive than our flat A2A event queue. RL trajectory logging for agent tuning — if our `activity_logs` adopt the same schema, workspace runs become training data. - -**Terminology collisions:** "pipeline" — their orchestration primitive; we use "delegation chain" informally. "service agent" — their long-running agent variant; close to our workspace concept. - -**Signals to react to:** If AgentScope ships a deployment layer (Docker/Kubernetes-managed agent lifecycle) → direct overlap with our workspace model. If their RL tuning reaches stable → evaluate for PM routing improvement. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~23.8k ⭐, v1.0.18 March 26, 2026, Alibaba/ModelScope - ---- - -### Plannotator — `backnotprop/plannotator` - -**Pitch:** "Annotate and review coding agent plans and code diffs visually — share with your team, send feedback to agents with one click." - -**Shape:** TypeScript (Apache 2.0 + MIT dual), ~4.3k ⭐, v0.17.10 April 13, 2026. CLI install → opens browser UI for plan annotation. Supports Claude Code, Gemini CLI, Codex, OpenCode, Copilot CLI. Annotation primitives: delete, insert, replace, comment. Structured feedback returned to agent. Shareable plan links (URL-encoded or encrypted, 7-day expiry). - -**Overlap with us:** Direct overlap with `hitl.py` (shipped PR #346) and the `approvals` table. Both implement "pause agent → human reviews → structured feedback → resume." Plannotator specifically targets the *plan approval* moment — exactly what `requires_approval` in `hitl.py` gates. The annotation type model (delete/insert/replace/comment) is more expressive than our current `resume_task(message: str)` free-text feedback. - -**Differentiation:** A review UX tool, not an agent platform. No agent runtime, no memory, no scheduling, no A2A, no org hierarchy. Molecule AI runs the agents; Plannotator is what the review UI could look like. - -**Worth borrowing:** Structured annotation types as HITL feedback schema — replace `message: str` in `resume_task` with `{action: "approve"|"reject"|"modify", annotations: [{type: "delete"|"insert"|"replace"|"comment", ...}]}`. Shareable approval links with expiry — our approve/deny URLs are static; time-bounded links improve security. - -**Terminology collisions:** "plan" — their agent's proposed action list; we use this informally in system prompts. - -**Signals to react to:** If Plannotator adds MCP integration → agents could self-request plan review via tool call; evaluate as a native HITL trigger in our platform. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~4.3k ⭐, v0.17.10 April 13, 2026 - ---- - -### GenericAgent — `lsdefine/GenericAgent` - -**Pitch:** "Self-evolving agent: grows a skill tree from a 3.3K-line seed, achieving full system control with 6x less token consumption." - -**Shape:** Python (MIT), ~2.1k ⭐, v1.0 released January 16, 2026. Single-agent, system-level: browser automation, terminal, filesystem, keyboard/mouse, screen vision, mobile/ADB. Nine atomic tools. **Self-evolving skill tree:** each solved task is crystallised into a reusable skill stored in a four-tier memory hierarchy (L0 rules → L1 indices → L2 facts → L3 task-skills → L4 session archives). Subsequent similar tasks skip exploration and replay the stored skill directly. No MCP. No multi-agent. - -**Overlap with us:** The four-tier memory taxonomy (rules / indices / facts / skills / archives) is structurally more expressive than our flat `agent_memories` key-value table. Skill crystallisation — automatically converting a solved task into a reusable procedure — is the same instinct as our `plugins/` registry but applied at runtime rather than install-time. - -**Differentiation:** Single agent, no org hierarchy, no A2A, no canvas, no channels. The skill tree grows from one user's usage; our plugins are shared org-wide. GenericAgent targets "personal OS agent"; we're "AI company for engineering teams." - -**Worth borrowing:** Four-tier memory taxonomy as a named model for `agent_memories` — add explicit labels (rules / facts / skills / archives) to our memory scopes to improve inspectability and retrieval quality. - -**Terminology collisions:** "skills" — theirs are crystallised task executions (runtime-generated procedures); ours are installed behaviour bundles (developer-authored Markdown). Same word, different origin. - -**Signals to react to:** If skill crystallisation gets formalised as a standard (e.g., aligns with agentskills.io schema) → evaluate automatic skill generation from workspace task history. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** ~2.1k ⭐, v1.0 January 16, 2026, active - ---- - -### OpenSRE — `Tracer-Cloud/opensre` - -**Pitch:** "Build your own AI SRE agents — the open source toolkit for the AI era." - -**Shape:** Python (Apache 2.0), ~900 ⭐, active 2026. Framework + toolkit for AI-powered Site Reliability Engineering. Agents autonomously investigate incidents: fetch alert context, correlate logs/metrics/traces, identify root cause, suggest remediation, optionally execute fixes. **40+ pre-built integrations:** LLM providers (OpenAI, Anthropic, Gemini, local), observability (Grafana, Datadog, Honeycomb, CloudWatch), infrastructure (K8s, AWS EKS/EC2/Lambda, GCP, Azure), databases, PagerDuty, Slack. MCP support including GitHub MCP. Incident summaries delivered directly to Slack/PagerDuty channels. - -**Overlap with us:** Our DevOps workspace (`org-templates/molecule-dev/devops/`) handles infrastructure monitoring and deployment tasks — the same surface OpenSRE's agents cover. MCP integration means OpenSRE tools could be consumed by a Molecule AI DevOps workspace as a skill pack. Slack/PagerDuty delivery mirrors our `workspace_channels` feature. - -**Differentiation:** OpenSRE is a specialised SRE toolkit, not a general agent platform. No visual canvas, no org hierarchy, no A2A between agents, no scheduling, no memory across sessions. - -**Worth borrowing:** 40+ production-tested DevOps integrations as a reference skill pack — rather than building infra tool integrations from scratch, evaluate wrapping OpenSRE's adapters as Molecule AI DevOps workspace skills. - -**Terminology collisions:** "agent" — their incident-response runner; our long-lived Docker workspace. - -**Signals to react to:** If OpenSRE ships a workspace/session persistence layer → closes the gap with our DevOps adapter; reassess. If their 40+ integration catalogue becomes the de facto DevOps tool standard → make them a first-class skill pack dependency for DevOps workspaces. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** ~900 ⭐, Apache 2.0, actively maintained - ---- - -### AMD GAIA — `amd/gaia` - -**Pitch:** "Build AI agents for your PC — an open-source framework for agents that run 100% locally on AMD Ryzen AI hardware with no cloud dependency." - -**Shape:** Python + C++ (MIT), ~1.2k ⭐, v0.17.2 April 10, 2026. AMD-backed. Requires AMD Ryzen AI 300+ hardware (NPU-accelerated); no NVIDIA/CPU-only path documented. High-level API: subclass `Agent`, decorate tools with `@tool`, define system prompt. MCP client support — connects to any MCP server for external tool access. Built-in RAG (50+ file formats), vision (Qwen3-VL), voice (Whisper ASR + Kokoro TTS). `pip install amd-gaia`. - -**Overlap with us:** MCP support means GAIA agents can consume the same tool servers our workspaces use. The `@tool` decorator registration pattern is structurally identical to our `@app.workflow_task`. "No cloud dependency" is a shared positioning — we're both self-hostable, privacy-first alternatives to managed cloud agents. GAIA targets the developer's laptop; Molecule AI targets the team's server. - -**Differentiation:** Hardware-locked to AMD Ryzen AI — not general-purpose. No A2A, no org hierarchy, no canvas, no scheduling, no channels. Single-agent. Molecule AI runs anywhere Docker runs. - -**Worth borrowing:** Clean `@tool` decorator pattern for agent tool registration — simpler than our MCP-tool-as-config approach; worth evaluating for the workspace adapter layer. RAG + vision + voice as first-class built-ins show what a complete local agent surface looks like. - -**Terminology collisions:** "agent" — their in-process Python object; our Docker workspace. "tool" — same concept, same decorator pattern. - -**Signals to react to:** If GAIA adds NVIDIA/CPU-only support → becomes a general local-agent framework with serious AMD backing; evaluate as a runtime adapter. If MCP server protocol via GAIA gains adoption → alignment already exists via our MCP server (#313). - -**Last reviewed:** 2026-04-18 · **Stars / activity:** ~1.2k ⭐, v0.17.2 April 10, 2026, AMD-maintained - ---- - -### ClawRun — `clawrun-sh/clawrun` - -**Pitch:** "Deploy and manage AI agents in seconds — one config to launch secure, sandboxed agents across any cloud." - -**Shape:** TypeScript (Apache 2.0), ~84 ⭐, 45 releases, active 2026. Hosting and lifecycle layer for open-source agents: deploys into secure Vercel Sandboxes (more providers planned), manages startup, heartbeat keep-alive, snapshot/resume, and wake-on-message. Channels: Telegram, Discord, Slack, WhatsApp. Web dashboard + CLI. Cost tracking and budget enforcement per channel. Pluggable agent/provider/channel architecture. - -**Overlap with us:** This is the closest architectural match we've tracked. Feature-for-feature: sandbox → our Docker workspace, heartbeat → our `active_tasks` + `last_heartbeat`, snapshot/resume → our workspace pause/resume, channels → our `workspace_channels`, cost tracking → our usage logging, pluggable architecture → our adapter + plugin system. ClawRun is building the same platform from a different starting point (agent hosting → adding channels) vs our approach (multi-agent org → adding deployment). - -**Differentiation:** No visual canvas, no org hierarchy, no A2A between agents, no memory, no scheduling, no multi-agent coordination. 84 stars signals early stage — but 45 releases shows active shipping. Our differentiator: agent identity + memory + A2A coordination vs ClawRun's pure hosting focus. - -**Worth borrowing:** Per-channel budget enforcement — our `workspace_channels` has no cost cap; adding a `budget_limit` field per channel would prevent runaway messaging costs. Wake-on-message lifecycle — agents sleep when idle and wake only when a message arrives; more cost-efficient than our always-on containers for low-traffic workspaces. - -**Terminology collisions:** "sandbox" — their Vercel Sandbox container; our Docker workspace container. "channel" — same word, same concept. - -**Signals to react to:** If ClawRun adds A2A or multi-agent coordination → becomes a direct lightweight competitor with Apache 2.0 and a simpler onboarding story. If their sandbox provider list expands (AWS/GCP/Azure) → pricing pressure on our Docker-first deployment model. - -**Last reviewed:** 2026-04-18 · **Stars / activity:** ~84 ⭐, 45 releases, Apache 2.0, actively shipped - ---- - -### Paperclip — `paperclipai/paperclip` - -**Pitch:** "Open-source orchestration for zero-human companies." - -**Shape:** Python (MIT), ~54.8k ⭐, launched March 4, 2026. Hierarchical multi-agent -system in which a **CEO agent** receives a top-level company goal, spawns **Manager -agents** for functional areas (engineering, marketing, operations, finance), and -Managers spawn **Worker agents** for atomic tasks. Authority and delegation flow -bidirectionally through the org: workers can escalate, managers can override. Humans -serve as the board with veto authority. Per-agent budget constraints and a full audit -trail of every delegation decision. - -**Overlap with us:** The CEO/manager/worker hierarchy is structurally identical to our -PM → Dev Lead → Engineer delegation chain. Their "zero-human companies" is the same -thesis as our "AI company" framing — and they reached 54.8k ⭐ in six weeks. Budget -constraints and audit-trail export are features we've deferred; Paperclip ships both. -Their bidirectional escalation (worker → manager) maps cleanly to our `approvals` table -but is more automatic. - -**Differentiation:** Paperclip is a framework — agents are in-process Python objects, -ephemeral per run. No Docker workspace isolation, no persistent agent memory, no visual -canvas, no A2A protocol, no scheduling, no channel integrations. We're the operational -platform; Paperclip defines the org chart in code for one-shot execution. - -**Worth borrowing:** -- **Per-agent budget constraints** — token/cost ceilings per layer. Add a `budget_limit` - field per workspace in org.yaml; enforce at the A2A delegation layer. -- **Audit trail schema** — Paperclip logs every CEO → manager → worker delegation with - decision rationale. Adopt this as the standard format for our `activity_logs`. -- **Bidirectional authority** — worker escalation to manager without breaking the PM's - delegation model; maps to a `requires_approval` flag on delegation responses. - -**Terminology collisions:** -- "CEO agent" — their top-level orchestrator; our PM workspace plays the same role. -- "zero-human company" vs our "AI company" — identical positioning, watch for brand - collision in marketing copy. - -**Signals to react to:** -- If Paperclip adds persistent agent memory → closes the primary gap; reassess - differentiation urgently (54.8k ⭐ head start matters). -- If they ship a visual org chart → direct Canvas competitor. -- Paperclip is the highest-star agent-orchestration OSS project we've tracked; watch - weekly. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~54.8k ⭐, launched March 4 2026, very active - ---- - -### Google ADK — `google/adk-python` - -**Pitch:** "An open-source, code-first Python toolkit for building, evaluating, and -deploying sophisticated AI agents with flexibility and control." - -**Shape:** Python (Apache-2.0), ~19k ⭐, v1.29.0 released April 9, 2026. Google's -official multi-agent SDK — the framework companion to Gemini CLI (already tracked). -Optimised for Gemini models but model-agnostic. Ships a web DevUI (`google/adk-web`, -~920⭐) for real-time agent debugging, a built-in evaluation framework, and pre-built -tool integrations. Deployed via `pip install google-adk`. Actively maintained inside -Google's own org. - -**Overlap with us:** Google now has a full agent stack: Gemini CLI (interactive terminal -agent) + ADK (framework for building agents) + adk-web (DevUI). Any team evaluating -Molecule AI will weigh ADK + Gemini CLI as a build-your-own path. The adk-web DevUI -overlaps with our Canvas's agent-inspection surface. ADK's evaluation framework is the -same gap our `GET /workspaces/:id/traces` + Langfuse stack addresses. - -**Differentiation:** ADK is a framework, not a platform. No persistent workspace -lifecycle, no Docker container management, no visual org chart, no A2A between -independently deployed agents, no scheduling, no channel integrations. It generates the -agent logic; Molecule AI runs the agents as long-lived services. The two are potentially -complementary: a Molecule AI workspace running ADK agents is a natural pairing. - -**Worth borrowing:** -- **Built-in evaluation framework** — structured agent eval runs tied to traces. Map to - our `GET /workspaces/:id/traces` endpoint; add a companion eval-run API. -- **adk-web DevUI patterns** — event tracking, execution-flow tracing, artifact - management in a browser UI. Reference design for our Canvas trace view. -- **`google-adk` runtime adapter** — add alongside our existing langgraph / autogen / - openclaw adapters so Molecule AI workspaces can run ADK agent logic natively. - -**Terminology collisions:** -- "agent" — their in-process Python object; our long-lived Docker workspace. -- "tool" — same concept; ADK tools and our MCP tools are structurally identical. -- "runner" — ADK's execution context; distinct from our workspace container runtime. - -**Signals to react to:** -- If ADK ships persistent agent state and memory between runs → closes the primary gap - with our platform; update positioning. -- If ADK + Gemini CLI becomes a hosted Vertex AI managed service → Google enters - platform territory with massive distribution; accelerate our model-agnostic story. -- ADK is the official successor for teams currently using LangGraph with Gemini → our - langgraph adapter should note ADK as an alternative path. - -**Last reviewed:** 2026-04-18 · **Stars / activity:** ~22k ⭐, v1.31.0 April 17 2026, Google-maintained - -**v1.31.0 update (2026-04-18):** Multi-language parity landed — Python, TypeScript, Java, Go all at 1.0. Native A2A added: full protocol (agent cards, message/send, task lifecycle, streaming, gRPC v0.3). A2A is Linux Foundation-governed, not Google-only — interops with any framework. **Platform gaps confirmed open**: no scheduling, no cron, no org-level workspace management, no cross-agent HITL (ADK `require_confirmation` explicitly broken across agent boundaries, maintainer-confirmed GitHub Discussion #3276). **Verdict: WATCH** (not elevated). Protocol layer compressed; Molecule platform layer intact. Escalation triggers: Vertex ships org-level workspace mgmt OR ADK fixes cross-agent HITL. - ---- - -### Chrome DevTools MCP — `ChromeDevTools/chrome-devtools-mcp` - -**Pitch:** "Chrome DevTools for coding agents — MCP server enabling agents to control -and inspect live Chrome browsers." - -**Shape:** TypeScript (Apache-2.0), ~35.5k ⭐. Official **ChromeDevTools** org repo — -the same team that maintains Chrome's built-in devtools. An MCP server exposing 23 -tools across six categories: input automation (click, type, scroll), navigation (goto, -back, reload), emulation (viewport, device mode), performance analysis (traces and -Lighthouse insights), network analysis (HAR, request/response inspection), and -debugging (source-mapped stack traces, console, screenshots). Compatible with 29 MCP -clients including Claude Code, Gemini CLI, Cursor, and Copilot. Uses Puppeteer under -the hood with CDP. - -**Overlap with us:** Our `browser-automation` plugin connects to Chrome CDP at -`host.docker.internal:9223` using raw Puppeteer. Chrome DevTools MCP provides the same -capabilities — and much more — as a standard MCP server any workspace agent can call -without custom Puppeteer code. The 23-tool surface covers everything our current CDP -integration does plus performance tracing, network HAR capture, and source-mapped stack -traces we don't currently expose. Official ChromeDevTools org backing makes this the -likely de facto standard for browser tool use in agents. - -**Differentiation:** A pure MCP server — no agent runtime, no memory, no scheduling, no -org hierarchy. Molecule AI is the platform that runs agents that *call* this MCP server. -Complementary by design. - -**Worth borrowing:** -- **Replace custom CDP integration** — update `plugins/browser-automation/` to install - `chrome-devtools-mcp` as the standard MCP server rather than maintaining bespoke - Puppeteer scripts. Agents get performance tracing, HAR capture, and source-mapped - debugging for free. -- **23-tool surface as reference design** — our current browser plugin exposes ~5 tools; - this is the full coverage target. -- **Source-mapped stack traces** — currently absent from our browser-automation debug - output; immediately useful for our QA Engineer workspace. - -**Terminology collisions:** -- "DevTools" — their MCP server name; our plugin is "browser-automation." No user - collision but align naming in skill docs. - -**Signals to react to:** -- If ChromeDevTools org publishes a versioned MCP manifest → treat as the browser-tool - standard and pin a version in our plugin manifest. -- If Anthropic or OpenAI reference this as the recommended browser MCP → accelerate the - `plugins/browser-automation` migration. -- Official org backing + 35.5k ⭐ means this is already the de facto standard. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~35.5k ⭐, ChromeDevTools org, Apache-2.0 - ---- - -### LangGraph — `langchain-ai/langgraph` - -**Pitch:** "Build resilient language agents as graphs — stateful, multi-actor -applications with fine-grained control over agent flow." - -**Shape:** Python + JavaScript/TypeScript library (MIT), ~29k ⭐, v1.1.6 released -April 10 2026. Part of the LangChain ecosystem. Agents are modelled as directed -graphs: nodes are callables (LLM calls, tool calls, conditional branches), edges are -routing rules, and a persistent **state schema** carries data between nodes. -Checkpointing (memory persistence across turns) is built in via a pluggable -`Checkpointer` interface (in-memory, SQLite, Postgres, Redis). Multi-agent -compositions via subgraph nodes. LangGraph Cloud offers hosted execution backed by -LangSmith observability. LangGraph 2.0 GA shipped February 2026, adding declarative -guardrail nodes (content filtering, rate limiting, audit logging as config). - -**Overlap with us:** Molecule AI ships a `langgraph` runtime adapter -(`molecule-ai-workspace-template-langgraph`) — this is us *on top of* LangGraph. -Their graph model (nodes, edges, state) is structurally analogous to our workspace -hierarchy (workspaces, A2A calls, shared context). Their `Checkpointer` is the -lower-level equivalent of our `agent_memories` table. LangGraph Cloud's hosted -execution competes directly with our scheduler + workspace lifecycle. - -**Differentiation:** LangGraph is a framework for *building* the logic of one agent -or pipeline; Molecule AI is a platform for *deploying and coordinating* long-lived -agents as an org. LangGraph has no concept of Docker workspace isolation, org-chart -hierarchy, inter-agent A2A protocol, channel integrations, visual canvas, or cron -scheduling. Our langgraph adapter *runs on top of* LangGraph — they're layered, not -competing, for most use cases. The gap is LangGraph Cloud vs our hosted platform. - -**Worth borrowing:** -- **Declarative guardrail nodes** (v2.0) — content filtering and audit logging as - first-class graph nodes rather than custom code. Map to our `approvals` table: - add declarative gate types (content-filter, rate-limit) in workspace config. -- **Subgraph composition** — composing multi-agent pipelines by nesting graphs. - Our workspace parent/child hierarchy is the operational equivalent; study for - dynamic sub-workspace spawning UX. -- **Checkpointer interface** — the pluggable backend design (SQLite → Postgres → - Redis hot path) is the right abstraction for our `agent_memories` persistence layer. - -**Terminology collisions:** -- "state" — LangGraph: the typed dict carried between graph nodes; ours: workspace - status (online/offline/degraded). No user confusion but docs should disambiguate. -- "node" — LangGraph: a callable in the agent graph; our canvas: a workspace tile. - Same word, very different level of abstraction. -- "graph" — LangGraph: the directed workflow graph; our canvas: the live org chart. - Marketing copy should distinguish "workflow graph" (LangGraph) vs "org chart" (us). - -**Signals to react to:** -- If LangGraph Cloud adds persistent agent identity (long-lived named agents beyond - per-session checkpoints) → direct hosted-platform competition; accelerate our - LangGraph adapter differentiation. -- If LangGraph 2.0 guardrail nodes become the standard compliance primitive for AI - pipelines → expose an equivalent gate type in `workspace/` adapters. -- If LangSmith + LangGraph Cloud bundle as an all-in-one enterprise platform → we - need to position our model-agnostic, self-hostable story more aggressively against - LangChain lock-in. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~29k ⭐, v1.1.6 April 10 2026, very active - ---- - -### CrewAI — `crewAIInc/crewAI` - -**Pitch:** "Framework for orchestrating role-playing, autonomous AI agents — by -fostering collaborative intelligence, CrewAI empowers agents to work together -seamlessly, tackling complex tasks." - -**Shape:** Python library (MIT), ~48k ⭐, v1.14.2 released April 8 2026. Agents are -defined by `role`, `goal`, and `backstory` fields and assembled into a `Crew` with -`Process.sequential` (fixed order) or `Process.hierarchical` (manager agent -delegates) execution. `Flow` (event-driven stateful pipelines, shipped 2024-Q4) -enables complex conditional branching beyond linear crew execution. Model-agnostic: -OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and any LiteLLM-compatible -endpoint. Tools are Python callables or MCP integrations. CrewAI Enterprise is the -commercial SaaS offering. - -**Overlap with us:** Molecule AI ships a `crewai` runtime adapter -(`molecule-ai-workspace-template-crewai`) — our workspaces *run* CrewAI crews. -The Crew role model (`role` + `goal` + `backstory`) is our system-prompt-encoded -persona convention made explicit and typed. `Process.hierarchical` with a manager -agent mirrors our PM → Dev Lead → Engineer delegation chain. Flow's event-driven -branching is analogous to our `workspace_schedules` trigger model. - -**Differentiation:** CrewAI is an in-process Python framework; Molecule AI is the -operational platform. CrewAI agents are ephemeral per crew run — no Docker isolation, -no persistent identity across restarts, no org-chart canvas, no A2A between -independently deployed agents, no cron scheduling, no channel integrations. A -Molecule AI CrewAI workspace *persists* across sessions, holds a role in a larger org, -and coordinates via our A2A protocol — capabilities CrewAI alone does not provide. - -**Worth borrowing:** -- **Typed role schema** — `{role, goal, backstory}` as first-class typed fields - (not free-text system prompt). Our `config.yaml` `role:` is a single string; adopting - a richer `{role, goal, backstory}` triplet would improve agent persona consistency - across restarts and be CrewAI-compatible. -- **`Flow` event-driven pipelines** — conditional state-machine branching triggered by - events. Applicable to our `workspace_schedules` — replace cron-only triggers with - an event graph: "when PR merged → trigger QA workspace → on pass → trigger deploy." -- **Tool decorator pattern** — `@tool` with docstring-as-schema is simpler than our - MCP tool config approach for workspace-local tools. - -**Terminology collisions:** -- "crew" — their multi-agent team; our team is a set of workspaces in an org - hierarchy. Marketing copy should use "workspace org" not "crew" to stay distinct. -- "agent" — their ephemeral in-process Python object; our long-lived Docker workspace. -- "task" — their atomic unit of work assigned to an agent; our `current_task` - heartbeat field. Same word, different scope. - -**A2A interop (confirmed 2026-04-17):** CrewAI implements A2A spec v0.3.0 (client + server), matching Molecule AI's `a2a-sdk[http-server]==0.3.25`. **Zero-shim interop confirmed today** — a Molecule AI org can delegate to a CrewAI A2A endpoint, and CrewAI agents can be registered as worker nodes in a Molecule AI hierarchy without any protocol shim. The shared upgrade clock: A2A spec v1.0.0 (March 12 2026) has breaking wire-format changes (`extendedAgentCard` → `AgentCapabilities`, OAuth flow restructure). Neither side has migrated yet. Schedule a coordinated v1.0.0 migration before either platform upgrades unilaterally. - -**Signals to react to:** -- If CrewAI ships persistent agent state between crew runs → closes primary gap with - our workspace model; ~48k ⭐ means it would land with significant reach. -- If CrewAI Enterprise adds visual org-chart canvas → direct platform competitor (Crew - Studio is workflow-only, not governance org-chart — our Canvas moat intact today). -- If the 2026 State of Agentic AI survey (65% of orgs using agents) accelerates - CrewAI Enterprise sales → their enterprise positioning competes directly with ours; - update ICP messaging. -- If either side upgrades to A2A v1.0.0 before the other → breaking interop; watch - crewAIInc/crewAI CHANGELOG for `protocol_version` bump. - -**Last reviewed:** 2026-04-17 (A2A interop confirmed) · **Stars / activity:** ~48k ⭐, v1.14.2 April 8 2026, very active - ---- - -### Temporal — `temporalio/temporal` - -**Pitch:** "The durable execution platform — write code that runs reliably even in -the face of failures, timeouts, and restarts." - -**Shape:** Go server + SDKs for Go, Java, TypeScript, Python, .NET, PHP (MIT), -~13k ⭐ server repo. Workflow logic is deterministic code that Temporal replays from -event history after failures — no explicit retry/checkpoint code. `Activities` are -the fallible steps; `Signals` allow external input mid-workflow; `Queries` expose -read-only workflow state. Temporal Cloud is the managed SaaS; self-hosted runs on -K8s or Docker. Raised $300M Series D at $5B valuation February 2026, with AI driving -demand for durable execution. v1.30.4 released April 10 2026. - -**Overlap with us:** Molecule AI already integrates Temporal via -`workspace/builtin_tools/temporal_workflow.py`. The `infra/scripts/setup.sh` -starts a local Temporal server (`:7233` gRPC + `:8233` Web UI). Any Molecule AI -workspace that needs bulletproof long-running or retryable work delegates to Temporal. -Temporal's Worker Versioning (GA March 2026) solves the same code-deploy-during-live- -workflow problem our restart-context message handles ad hoc. - -**Differentiation:** Temporal is infrastructure — a durable execution engine with no -concept of agent identity, LLM calls, memory, org hierarchy, canvas, channels, or A2A. -It is the *substrate* beneath agents that need guaranteed execution; we are the -*platform* that decides when to call Temporal vs handle work in the workspace itself. -We are Temporal consumers, not competitors. The distinction for users: use Temporal -when you need workflow history replay and multi-step retry guarantees; use Molecule AI -scheduling for lighter cron-triggered agent prompts. - -**Worth borrowing:** -- **Worker Versioning** (GA March 2026) — pin live workflows to a specific code - version so deploys don't corrupt in-flight runs. Analogous problem to our - workspace restart-context; worth evaluating as the underlying mechanism for - zero-downtime workspace deploys. -- **Workflow Update operation** — synchronous request/response pattern for live - workflows (e.g., human approves mid-workflow). Cleaner than our current - `approvals` polling; evaluate for HITL in long Temporal-backed workspace tasks. -- **Upgrade-on-Continue-as-New** (Public Preview March 2026) — pinned workflows can - opt into a newer code version at a clean continuation boundary. Pattern applicable - to our workspace versioning strategy. - -**Terminology collisions:** -- "workflow" — Temporal: a deterministic, replay-safe code function; ours: informal - delegation chain term. In our docs, "Temporal workflow" should always be qualified - to avoid confusion with "workflow" in general product copy. -- "worker" — Temporal: a process that polls the server and executes workflow/activity - code; ours: not a first-class term (workspaces fill this role). -- "activity" — Temporal: a fallible, retryable step in a workflow; ours: `activity_logs` - table (A2A traffic logs). Different concepts sharing a word. - -**Signals to react to:** -- If Temporal Cloud adds native LLM-aware primitives (e.g., LLM call as a first-class - activity with token tracking, model fallback, prompt versioning) → Temporal becomes - an agent platform, not just an infra layer; reassess our `temporal_workflow.py` - integration depth. -- If the $300M Series D accelerates enterprise sales motion → more enterprises will - arrive with Temporal already deployed; strengthen our Temporal integration story as - a first-class enterprise deployment pattern. -- If Upgrade-on-Continue-as-New becomes stable → adopt for workspace blue/green - deploy pattern (no workspace downtime during code updates). - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~13k ⭐ (server); $5B valuation, $300M Series D Feb 2026; v1.30.4 April 10 2026 - ---- - -### Dify — `langgenius/dify` - -**Pitch:** "Production-ready platform for agentic workflow development — the leading -open-source LLM app development platform." - -**Shape:** Python backend + React frontend (MIT), ~60k ⭐, v1.14.0 released February -2026. Visual drag-drop workflow canvas where LLM calls, RAG retrievers, code -executors, HTTP nodes, and agent loops are wired as a graph. Ships a full app -deployment stack: API server, web UI widget, and Slack/Telegram/WhatsApp channel -integrations. RAG pipeline with knowledge base management (file upload → chunk → -embed → retrieve). Supports 50+ LLM providers. Dify Cloud is the managed SaaS; -self-hosted via Docker Compose. Raised $30M Pre-A round led by HSG, March 2026. - -**Overlap with us:** Both have a visual canvas for connecting AI work. Both support -channel integrations (Slack / Telegram / WhatsApp). Both run LLM-backed agents and -expose a REST API for external trigger. Dify's `Human Input` node (v1.14.0) is the -same pattern as our `approvals` table — pause workflow, wait for human input, resume. -Their knowledge base (RAG) is the equivalent of what our Research Lead workspace does -via tool calls to external retrieval services. Dify Cloud competes with our SaaS -control plane for teams that want a hosted no-code LLM app platform. - -**Differentiation:** Dify targets **no-code and low-code builders** — the UX is -workflow configuration, not code. No persistent agent identity across workflow runs, -no multi-agent org hierarchy (agents in Dify are single workflow nodes, not -first-class citizens), no A2A protocol between independently deployed agents, no -Docker container isolation per agent. Molecule AI targets developers who write -`config.yaml` and system prompts; Dify targets product managers and ops teams who -want to deploy LLM apps without engineering. The ~60k ⭐ signal shows massive -no-code demand that our current product does not address. - -**Worth borrowing:** -- **Human Input node** — native human-in-the-loop as a workflow node type, not a - separate approvals API. Map to our `approvals` table: expose a "wait for human" - node in a future visual workspace config editor. -- **Summary Index** (v1.14.0) — AI-generated summaries per document chunk in the - RAG knowledge base significantly improve retrieval precision. Applicable to our - Research Lead workspace's document retrieval; evaluate for our MCP memory backend. -- **Knowledge base management UI** — file upload → chunk → embed → retrieval test - in a single interface. Reference design for our future `agent_memories` admin UI. -- **Channel trigger UX** — same as n8n: three-click channel connect. Our channel - setup is more manual; Dify is a second data point that this is the target UX. - -**Terminology collisions:** -- "workflow" — Dify: the visual graph of LLM+tool nodes that defines an app; ours: - informal delegation chain. In competitive positioning copy, distinguish "no-code - workflow builder" (Dify) vs "multi-agent org" (us). -- "agent" — Dify: a single ReAct loop node inside a workflow; ours: a long-lived - Docker workspace with an assigned role. Different scope and persistence model. -- "knowledge base" — Dify: an indexed file collection for RAG; ours: not a - first-class term (workspace agents manage their own retrieval). - -**Signals to react to:** -- If Dify ships persistent agent identity (agents that remember state across workflow - runs, not just within one) → closes the primary product gap; ~60k ⭐ + no-code - accessibility is a formidable combination. -- If Dify adds multi-agent coordination (agents that spawn and coordinate sub-agents - as org peers, not just nested workflow nodes) → direct overlap with our multi- - workspace hierarchy. -- If the $30M Pre-A closes more enterprise deals → Dify moves up-market; watch for - enterprise canvas and RBAC features that would narrow our enterprise differentiation. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~60k ⭐, v1.14.0 Feb 2026; $30M Pre-A Mar 2026 - ---- - -### Flowise — `FlowiseAI/Flowise` - -**Pitch:** "Build AI Agents, Visually — drag-drop UI to build LLM flows and agent -pipelines using LangChain and LlamaIndex components." - -**Shape:** Node.js + React (MIT repo; post-Workday acquisition terms TBD), ~30k ⭐, -flowise@3.1.0 released March 16 2026. Drag-drop visual node editor where LangChain -chains, LlamaIndex query engines, vector stores, tools, and agents are wired as a -flow graph. Each flow is exported as a JSON config; the Flowise server exposes a -REST API and a chat widget embed. **Agentflow** (shipped 2024) adds multi-agent -composition: a Supervisor agent routes tasks to Worker agents within a single Flowise -flow. **Acquired by Workday** (announced August 2025) — Flowise is now part of -Workday's AI platform, bringing agent-building capability to Workday customers. -Security: three chained CVEs (CVE-2025-59528, CVE-2025-8943, CVE-2025-26319) enabling -unauthenticated RCE via Custom MCP Node were patched in v3.0.6 (exploit confirmed -April 7 2026). - -**Overlap with us:** Both are drag-drop visual builders for AI agent workflows. Both -support LangChain components under the hood. Flowise's Agentflow (Supervisor + Worker -agents) mirrors our PM → engineer hierarchy, but within a single visual flow rather -than independently deployed Docker workspaces. Flowise's REST API per flow is -structurally similar to our `POST /workspaces/:id/a2a` endpoint — both let external -systems trigger an agent and get a response. Channel integrations overlap with our -`workspace_channels`. - -**Differentiation:** Flowise is a **no-code single-server app builder** — agents are -stateless flow executions, not long-lived Docker workspaces with persistent memory, -schedules, and org identity. Post-Workday acquisition, Flowise targets Workday -enterprise customers (HR, finance, ops) rather than developer-first teams building AI -companies. No persistent agent memory between flow runs, no A2A protocol between -independently deployed agents, no cron scheduling, no org-chart canvas. The Workday -acquisition actually *narrows* Flowise's addressable market to Workday-centric -enterprises — which opens space for Molecule AI as the developer-first alternative. - -**Worth borrowing:** -- **Agentflow Supervisor/Worker pattern** — the Supervisor agent dynamically routes - tasks to Workers based on their capabilities, with results aggregated back. More - flexible than our static PM → Lead delegation; study for dynamic routing in the PM - workspace's `delegate_task`. -- **Flow-as-JSON export/import** — each Flowise flow is a portable JSON blob that - can be versioned, shared, and re-imported. Our workspace `config.yaml` is close; - adding a full workflow export (config + memory schema + skill list) as a bundle - would enable the same portability. -- **Chat widget embed** — single-line script tag embeds a Flowise agent as a chat - widget on any webpage. Our `workspace_channels` is closer to outbound messaging; - a widget embed for inbound is a UX gap worth closing for developer adoption. - -**Terminology collisions:** -- "flow" — Flowise: a visual JSON graph of LangChain nodes; ours: not a first-class - term. Avoid "flow" in our visual canvas docs to prevent confusion with Flowise- - trained users. -- "node" — Flowise: a LangChain component tile in the flow canvas; our canvas: a - workspace tile. Same word, same visual metaphor, different semantics. -- "supervisor" / "worker" — Flowise Agentflow roles; our PM / engineer hierarchy is - the same concept with different names. Our marketing should own "PM + engineer" - framing to stay distinct. - -**Signals to react to:** -- If Workday opens Flowise APIs to non-Workday enterprise customers → Flowise - re-enters the general market with Workday distribution; update competitive messaging. -- If the CVE chain (RCE via Custom MCP Node) causes enterprise churn → opportunity - to position Molecule AI's Docker-isolated workspaces as the security-first - alternative for self-hosted agent deployments. -- If Flowise ships persistent agent memory or cross-flow A2A → closes primary gap; - monitor quarterly given Workday engineering resources. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~30k ⭐, flowise@3.1.0 March 16 2026; acquired by Workday Aug 2025 - ---- -## Candidates to add (backlog) - -Short-list of projects to write up next time someone has an hour: - -- **AutoGen** (`microsoft/autogen`) — Microsoft's original repo; now superseded by - Microsoft Agent Framework (tracked above) and AG2 community fork (tracked above). - Entry should clarify which adapter target is canonical. -- **DeepAgents** (`langchain-ai/deepagents`) — we adapt it; particularly their - sub-agent feature that collides with our "skills" word. -- **OpenClaw** — check if this is still live post-Hermes rebrand; our - adapter may need renaming. -- **Moltiverse / Moltbook** (`molti-verse.com`) — "social network for AI - agents." Not a competitor; orthogonal ecosystem but worth tracking in - case we want agent-to-agent discovery beyond a single org. - ---- - -### OpenAI Agents SDK — Sandbox Agents — `openai/openai-agents-python` - -**Pitch:** "A lightweight, powerful framework for multi-agent workflows — now with -persistent isolated sandbox workspaces, snapshot/resume, and sandbox memory." - -**Shape:** Python (MIT), ~14k ⭐ (110 stars today), v0.14.0 released April 15, 2026. -New beta surface: `SandboxAgent` backed by a `Manifest` (file tree, Git repo, -mounts) and a `SandboxRunConfig` that targets a pluggable execution backend. -Local: `UnixLocalSandboxClient`; containerised: `DockerSandboxClient`; hosted via -optional extras for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel. -**Sandbox memory** lets future runs inherit lessons from prior runs with progressive -disclosure and configurable isolation boundaries. Existing SDK primitives (Agents, -Handoffs, Guardrails, Tracing) are unchanged. - -**Overlap with us:** `SandboxAgent` + hosted backends directly competes with our -workspace lifecycle model — a persistent isolated execution environment, snapshot -and resume, durable memory. The multi-backend strategy (Docker, Modal, Vercel, E2B) -mirrors our Docker workspace + cloud-provider abstraction goal. Sandbox memory is -the same cross-session memory gap we address via `agent_memories`. - -**Differentiation:** Still a framework, not a platform — no visual canvas, no -org-chart hierarchy, no A2A between independently deployed sandboxes (handoffs are -in-process), no cron scheduling, no channel integrations. OpenAI-provider-optimised -in practice. Our differentiators: multi-agent org hierarchy with A2A, model-agnostic, -self-hostable, persistent agent identity beyond a single SDK process. - -**Worth borrowing:** `SandboxRunConfig` backend abstraction — decouple workspace -execution from provider (Docker / Modal / Vercel) using a single config object. -Directly applicable to our workspace provisioner. Sandbox memory progressive -disclosure (summaries first, full context on demand) matches the retrieval strategy -in claude-mem; adopt for `agent_memories` query API. - -**Terminology collisions:** "sandbox" — theirs: an isolated execution backend; ours: -not a first-class term (we use "workspace" / "container"). "memory" — same word, -same intent; our `agent_memories` and their sandbox memory are functionally equivalent. - -**Signals to react to:** If OpenAI adds inter-sandbox A2A (sandboxes delegating to -each other across process boundaries) → direct platform feature parity; accelerate -our A2A documentation and SDK ergonomics. If hosted backends gain TypeScript support -(announced as roadmap) → Vercel + TS stack competes for our TypeScript-native users. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~14k ⭐, v0.14.0 April 15, 2026, OpenAI-maintained - ---- - -### Tencent AI-Infra-Guard — `Tencent/AI-Infra-Guard` - -**Pitch:** "A full-stack AI Red Teaming platform securing AI ecosystems via Agent -Scan, Skills Scan, MCP scan, AI Infra scan, and LLM jailbreak evaluation." - -**Shape:** Python + Go (Apache-2.0), ~3.5k ⭐, v4.1.3 released April 9, 2026. -Tencent Zhuque Lab. Six scanning surfaces: ClawScan (open-source code security), -Agent Scan (runtime agent behaviour audit), Skills Scan (verifying installed agent -skills), MCP Server scan (tool-surface vulnerability detection), AI infrastructure -CVE matching (1000+ CVEs across 57+ AI components including crewai, kubeai, -lobehub), and LLM jailbreak evaluation. Ships a web UI, REST API, Docker deployment, -and integration with ClawHub agent marketplace. - -**Overlap with us:** Our plugin/skills registry and MCP server are exactly the -surfaces AI-Infra-Guard scans. The Skills Scan module validates installed agent -skill packs — the same artefacts our `plugins/` directory ships. MCP Server scan -targets the same `@molecule-ai/mcp-server` surface our platform exposes. If -enterprise customers adopt AI-Infra-Guard for compliance audits, our plugin manifests -and MCP tool definitions need to be compatible with its scanner. - -**Differentiation:** A security tooling product, not an agent framework or platform. -No agent runtime, no orchestration, no canvas, no memory. Molecule AI builds and -runs agents; AI-Infra-Guard audits them and their supply chain. - -**Worth borrowing:** MCP Server scan vulnerability categories — use as a checklist -for hardening our own MCP server (`@molecule-ai/mcp-server`) before an enterprise -security review. Skills Scan concept — add a `plugin validate` sub-command to -`molecli` that runs the same checks locally before installing a plugin. - -**Terminology collisions:** "agent scan" — their runtime audit process; not a term -we use. "skills scan" — their validation of installed skill packs; same artefact, -different word ("plugin audit" in our vocabulary). - -**Signals to react to:** If AI-Infra-Guard publishes a formal MCP tool-surface -security spec → treat as a compliance baseline for our MCP server hardening. If -Tencent integrates this into enterprise procurement checklists → our plugin and MCP -docs need an explicit security posture section to pass audits. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~3.5k ⭐, v4.1.3 April 9, 2026, Tencent Zhuque Lab - ---- - -### VoltAgent — `VoltAgent/voltagent` - -**Pitch:** "The open-source TypeScript AI agent framework with a built-in -observability and deployment console — build agents once, run and monitor them -everywhere." - -**Shape:** TypeScript (MIT), ~8.2k ⭐, 668 releases, latest April 11, 2026. -Two-layer design: `@voltagent/core` framework (typed agent definitions, tool -registry, multi-agent supervisor/sub-agent coordination, memory, RAG, voice, -guardrails) + **VoltOps Console** (hosted or self-hosted web UI for observability, -deployment automation, and agent lifecycle management). MCP client support connects -any MCP server as a tool source. Provider-agnostic: OpenAI, Anthropic, Google, -Ollama, and any OpenAI-compatible endpoint. Ships `@voltagent/server-elysia` for -Bun-native HTTP serving of agents. - -**Overlap with us:** VoltOps Console is the closest analogue to our Canvas we've -tracked in the TypeScript ecosystem — both provide a web UI for managing and -monitoring long-lived agents. The supervisor/sub-agent coordination model mirrors -our PM → engineer delegation. MCP support means workspace skills install into -VoltAgent as easily as ours. `@voltagent/server-elysia` pattern (agent as an HTTP -server) is analogous to our A2A endpoint per workspace. - -**Differentiation:** No Docker workspace isolation, no persistent agent identity -across server restarts, no A2A protocol between independently deployed agents, no -cron scheduling, no channel integrations. VoltOps Console focuses on observability -and deployment automation; our Canvas is the live visual org chart with drag-drop -topology control. Molecule AI targets multi-agent companies; VoltAgent targets -individual TypeScript developers building production agents. - -**Worth borrowing:** VoltOps observability schema — trace views, agent state -inspection, and deployment automation as a single UI surface. Reference design for -merging our Canvas agent-inspection panel with Langfuse traces into a unified -observability tab. `@voltagent/core` typed agent definition API (role, memory, -tools, guardrails as typed config) — cleaner than our YAML-then-system-prompt -pipeline; evaluate for a future typed workspace config schema. - -**Terminology collisions:** "console" — VoltOps Console: their monitoring + deploy -UI; our `molecli`: a TUI dashboard. Both are "consoles" for watching agents. -"supervisor" — their orchestrating agent tier; our PM workspace plays the same role. - -**Signals to react to:** If VoltOps Console adds visual org-chart topology (not just -list view) → direct Canvas competitor in the TypeScript ecosystem. If -`@voltagent/core` multi-agent API becomes idiomatic for TS agent developers → -consider shipping an official Molecule AI VoltAgent runtime adapter alongside our -langgraph/crewai adapters. - -**Last reviewed:** 2026-04-16 · **Stars / activity:** ~8.2k ⭐, 668 releases, latest April 11, 2026 - ---- - -### Cognee — `topoteretes/cognee` - -**Pitch:** "Knowledge Engine for AI Agent Memory in 6 lines of code — hybrid graph + vector search, runs locally, multimodal." - -**Shape:** Python library (Apache 2.0), ~15.8k ⭐, v1.0.1.dev1 April 15, 2026. Six-stage ingest pipeline (`cognify`): classify → permissions → chunk → LLM entity/relationship extraction → LLM summarise → embed into vector + commit graph edges. 14 retrieval modes from top-k cosine up to `GRAPH_COMPLETION` (vector → graph traversal → structured context). Default backends are file-local, zero-config: LanceDB (vectors), KuzuDB (graph), SQLite (metadata). Production upgrade path: Postgres + pgvector or Neo4j via pip extras. Enterprise tier adds cross-agent knowledge sharing with tenant isolation and OTEL tracing. - -**Overlap with us:** Directly addresses the same gap our `agent_memories` table targets — persistent, queryable agent knowledge across sessions. Ships a `claude-code-plugin` for session memory injection (same use case as `claude-mem`'s 56k⭐ demand signal). Native integration with Hermes Agent. The hybrid graph+vector approach (knowledge graph for relationships, vector for semantic recall) is materially more sophisticated than our current key-value `agent_memories` model. - -**Differentiation:** Pure memory library — no workspace lifecycle, no agent orchestration, no A2A, no canvas. Intended to be embedded into any agent framework, including Molecule AI workspaces, not to replace them. - -**Integration path (TR eval 2026-04-17):** **Augment, not replace** the existing key-value `agent_memories` path. -- `cognify` fires 2–5 LLM calls per ingest — must be **async/batched** (on session flush), not inline per-turn. -- `cognee_search (GRAPH_COMPLETION)` latency ~200–500 ms — acceptable for explicit semantic queries, not per-turn default. -- Existing key-value path stays as primary per-turn read (10–50 ms). -- MVP deployment: `pip install cognee` + `LLM_API_KEY` (already supplied as `ANTHROPIC_API_KEY`) + `/configs/cognee/` volume mount. **Zero new containers.** -- Build estimate for `molecule-cognee` plugin: **~3 days** (async ingest wrapper + search skill + plugin.yaml/rules/CI). Recommended sequence: **after #573 (mcp-connector) and #574 (code-sandbox)** land. - -**Worth borrowing:** The four-operation memory API (`remember` / `recall` / `forget` / `improve`) is a clean contract worth adopting in our `agent_memories` API surface. The tenant-isolated cross-agent knowledge graph model (agents share a knowledge base scoped to their org) maps well to our workspace hierarchy. - -**Terminology collisions:** "cognify" — their ingest verb; we'd call this "index" or "ingest". "prune" — their delete; we use `DELETE /workspaces/:id/memories/:id`. - -**Signals to react to:** If Cognee ships a first-class MCP server → immediately relevant as a drop-in memory backend for any MCP-capable workspace. If 56k⭐ `claude-mem` users migrate to Cognee for graph-based recall → validates gap and urgency. - -**Last reviewed:** 2026-04-17 (TR integration eval) · **Stars / activity:** ~15.8k ⭐, v1.0.1.dev1, April 15, 2026 - ---- - -### Archestra — `archestra-ai/archestra` - -**Pitch:** "End the MCP chaos — a self-hosted enterprise platform for governing, securing, and monitoring your organization's MCP servers." - -**Shape:** TypeScript (AGPL-3.0), ~3.6k ⭐, platform v1.2.15 April 16, 2026. Kubernetes-native. Two main surfaces: (1) **MCP Registry** — private, shared MCP server catalog for teams; OAuth + API key management; governance controls on which teams can access which tools. (2) **Security Gateway** — dual-LLM architecture where a security sub-agent intercepts tool responses to block prompt injection and data exfiltration before results reach the primary agent. Also: per-team cost monitoring, ChatGPT-style chat UI with private prompt registry, Terraform provider + Helm chart. - -**Overlap with us:** Our `plugins/` registry and per-workspace plugin install system serve a similar "shared tools across an agent org" purpose. Archestra's MCP governance story (who can call which tools, cost per team, audit trail) is a more formal version of what our `POST /workspaces/:id/plugins` API provides informally. The dual-LLM security gateway pattern is novel and directly applicable to our A2A proxy hardening. - -**Differentiation:** Archestra governs MCP servers, not agent workspaces — it has no multi-agent orchestration, no workspace lifecycle, no A2A protocol, no canvas. It's an MCP-specific control plane, not an agent orchestration platform. Could complement Molecule AI rather than replace it. - -**Worth borrowing:** Dual-LLM security gateway pattern — intercept tool responses with a fast security model before they reach the primary agent. Apply to our A2A proxy (`a2a_proxy.go`) for tool-response sanitisation. Per-team MCP cost attribution model — maps naturally to our workspace tier billing. - -**Terminology collisions:** "orchestrator" — Archestra means "MCP server lifecycle manager"; we mean "multi-agent coordinator". Both use the word for very different things. - -**Signals to react to:** If Archestra adds agent-to-agent coordination on top of its MCP gateway → overlap with our platform increases significantly. If enterprise procurement teams start requiring an MCP governance audit trail → our plugin install API needs a formal audit log surface (issue backlog candidate). - -**Last reviewed:** 2026-04-17 · **Stars / activity:** ~3.6k ⭐, platform v1.2.15, April 16, 2026 - ---- - -### GitHub MCP Server — `github/github-mcp-server` - -**Pitch:** "GitHub's official MCP Server — connect AI agents and assistants directly to your GitHub repositories, issues, PRs, and workflows." - -**Shape:** Go (MIT), ~28.9k ⭐, v1.0.0 April 16, 2026. 60+ tools across 20+ toolsets: repos, issues, PRs, Actions/CI-CD, code security (scanning, Dependabot, secret protection), discussions, gists, git ops, notifications, orgs, projects, labels, users, stargazers. Deployment: GitHub-hosted at `api.githubcopilot.com/mcp/` or local via Docker/compiled binary. Supports dynamic toolset discovery (beta) so hosts can enumerate and enable tools on demand rather than loading all 60+ upfront. - -**Overlap with us:** Chrome DevTools MCP (#540) is already tracked as a tool we adopt into workspaces — GitHub MCP Server is the same pattern for GitHub operations. Any Molecule AI workspace doing code review, PR management, issue triage, or CI monitoring would naturally adopt this. Our Technical Researcher, Dev Lead, and Triage Operator workspace types are obvious candidates. - -**Differentiation:** Tool provider only — no agent orchestration, no workspace model, no A2A. Designed to be consumed by MCP hosts (Claude Code, Copilot, Cursor etc.), not to compete with orchestration platforms. - -**Worth borrowing:** Dynamic toolset discovery (enumerate tools per context, not a monolithic 60-tool blast) — reference design for our workspace plugin `available` endpoint (`GET /workspaces/:id/plugins/available`). Apply the same filtering logic for runtime-aware tool exposure. - -**Terminology collisions:** None significant. - -**Signals to react to:** If GitHub ships an agent-native event webhook model (not just REST polling) → evaluate as a channel adapter alongside our Telegram/Slack integrations. If GitHub exposes repo-scoped A2A agent cards → direct interop opportunity with our registry. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** ~28.9k ⭐, v1.0.0 GA, April 16, 2026 - ---- - -### Skillshare — `runkids/skillshare` - -**Pitch:** "Sync skills across all AI CLI tools with one command — Claude Code, Codex, OpenClaw, Cursor, and 50+ more." - -**Shape:** Go binary (MIT), ~1.5k ⭐, v0.19.2 April 14, 2026. Manages a `~/.config/skillshare/` source-of-truth directory containing SKILL.md files, agent configs, rules, commands, and prompts. Syncs to 50+ AI tool targets via symlinks (macOS/Linux) or NTFS junctions (Windows). Three modes: global (`~/.config/skillshare/`), project (`.skillshare/` per repo, committable), and installable repos (`skillshare install `). Ships a web dashboard UI (`skillshare ui`). Built-in security auditing: scans installed skills for prompt injection and exfiltration patterns. - -**Overlap with us:** Directly overlaps with our `plugins/` distribution model and SKILL.md format — Skillshare treats SKILL.md files as the unit of distribution across tools, the same way our plugin system does. The `skillshare install ` command is equivalent to our `POST /workspaces/:id/plugins` with a `github://` source. The project mode (`.skillshare/` committed to a repo) maps to our org-template skill defaults in `org.yaml`. - -**Differentiation:** Single-user local syncing, not a server-side multi-agent registry. No workspace lifecycle, no per-agent identity, no A2A, no canvas. Designed for individual developer ergonomics across tools, not for governing a fleet of persistent agents. - -**Worth borrowing:** The prompt-injection/exfiltration scanner built into `skillshare sync` — we have no equivalent gate in our plugin install path today. Consider adding a static analysis step to `POST /workspaces/:id/plugins` that scans SKILL.md and rules files for injection patterns before activating. The `install ` one-command install UX is cleaner than our current `{"source":"github://org/repo"}` JSON body — worth documenting as a `molecli` shorthand. - -**Terminology collisions:** "skills" — Skillshare uses this for SKILL.md files that inject instructions into AI tools; we use "skills" for the same concept in our plugin system. Exact collision — no disambiguation needed since we use the same word intentionally. - -**Signals to react to:** If Skillshare adds a server-side shared registry (teams publish skills to a central endpoint) → direct overlap with our plugin registry governance gap that Archestra's MCP registry addresses. If it reaches 10k⭐ → signals the SKILL.md format is becoming a community standard; we should ensure full compatibility. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** ~1.5k ⭐, v0.19.2, April 14, 2026 - ---- - -### Compound Engineering Plugin — `EveryInc/compound-engineering-plugin` - -**Pitch:** "One plugin, 12 runtimes — a CLI that converts a single engineering workflow plugin (brainstorm → plan → work → review) into the correct format for Claude Code, Cursor, Codex, OpenClaw, Gemini CLI, Kiro, Windsurf, Factory Droid, Pi, GitHub Copilot, Qwen, and more simultaneously." - -**Shape:** TypeScript (MIT), ~14.5k ⭐, v2.66.1 April 16, 2026. 97 total releases — high-cadence active project. **Source format: `.claude-plugin/` (Claude Code format) is the canonical input — all other runtimes are generated from it.** `bunx @every-env/compound-plugin install --to ` transpiles to target-specific output via one `.ts` file per runtime in `src/targets/`. Current 11 targets: `codex`, `copilot`, `droid`, `gemini`, `kiro`, `openclaw`, `opencode`, `pi`, `qwen`, `windsurf` + Claude Code source. 12th slot likely Cursor (in-progress). - -**Molecule AI is not on the list.** Adding us requires: (1) `src/targets/molecule-ai.ts` — one `.ts` file handling tool-name mapping and output path generation; (2) one-line export in `index.ts`. Estimated effort: **2–4 hours** (upstream PR to EveryInc/compound-engineering-plugin). Since our `.claude-plugin/` format already matches their source format exactly, this is zero-cost compatibility. - -**Overlap with us:** Distribution-layer overlap with our `agentskills.io` multi-runtime adapter pattern. Compound uses a CLI transpiler (authors run one command); we embed per-runtime `adapters/.py` files inside each plugin (authors maintain adapters). Compound is strictly more ergonomic for authors. The two mechanisms are complementary layers, not in conflict — but if Compound becomes the community standard, absent Molecule AI support means silent bypass of our registry. - -**Differentiation:** Distribution/packaging tool only. No A2A, no workspace lifecycle, no cron, no canvas. Not an orchestration competitor. - -**Worth borrowing:** The `compound install ` one-command UX. Consider a `molecli plugin install ` shorthand. Also: their per-runtime `.ts` target file pattern is cleaner than our `adapters/.py` per-plugin approach — evaluate adopting it for the plugin SDK. - -**Action (time-sensitive):** Open upstream PR to add `molecule-ai.ts` target to EveryInc/compound-engineering-plugin **before the Cursor slot lands** — being 12th (not 13th) matters for perception. This is a ~2-4h Dev Lead task; file as external contribution issue when GH_TOKEN rotates. - -**Signals to react to:** If Compound adds a server-side plugin registry → direct threat to our `plugins/` registry as canonical source. If `molecule-ai.ts` PR is rejected → reassess whether to maintain a Compound-compatible fork. - -**Last reviewed:** 2026-04-17 (CI deep-dive) · **Stars / activity:** ~14.5k ⭐, v2.66.1, April 16, 2026 - ---- - -### EDDI — `labsai/EDDI` - -**Pitch:** "Config-driven multi-agent orchestration middleware — intelligent routing between users, agents, and business systems where agent logic lives in JSON, not code." - -**Shape:** Java 25 + Quarkus (Apache 2.0), ~296 ⭐, v6.0.1, 44 releases. Ships as Docker Compose + Kubernetes manifests. First HN exposure April 17, 2026 (Show HN, early traction). Five enterprise-grade capabilities: Ed25519 cryptographic agent identity per agent, HMAC-SHA256 immutable audit ledger, GDPR/HIPAA-compliant infrastructure, secrets vault with envelope encryption, group conversations with 5 configurable discussion styles. - -**Overlap with us:** Hits five of six Molecule AI orchestration criteria — A2A, cron scheduling, persistent agent identity, self-hostable, model-agnostic (12 LLM providers + MCP). Only gap: no visual canvas. The immutable HMAC audit ledger and GDPR/HIPAA posture directly target the regulated-vertical ICP we sharpened in the #572/#582 market research. - -**Differentiation:** Config-only (JSON) — no graph UI, no org-chart canvas, no Docker workspace isolation per agent. Java stack limits the overlap community; 296 stars = low current traction. Not a near-term competitive threat. - -**Worth borrowing:** The HMAC-SHA256 immutable audit ledger design — every agent action is cryptographically chained so no event can be silently deleted. Relevant to the `compliance-guardrails` plugin spec (staged issue C) and enterprise procurement posture. Also: Ed25519 per-agent signing as a stronger identity mechanism than our current bearer token model. - -**Signals to react to:** If EDDI gains traction (>5k⭐) or ships a visual canvas → reassess threat level. If the HMAC audit ledger pattern gets cited by enterprise compliance auditors as a requirement → accelerate `compliance-guardrails` plugin and add cryptographic chaining to `activity_logs`. - -**Last reviewed:** 2026-04-17 (Show HN) · **Stars / activity:** ~296 ⭐, v6.0.1, Java/Quarkus - ---- - -### Cloudflare Artifacts — `blog.cloudflare.com/artifacts-git-for-agents-beta` - -**Pitch:** "Git for agents — programmatic versioned storage built for agentic workflows: create repos, fork, clone, diff, and branch from code, with Durable Objects durability and ~100KB Zig+WASM Git engine." - -**Shape:** Cloudflare proprietary service (ArtifactFS driver open-sourced), private beta April 16, 2026 — public beta targeted early May 2026. Pricing: $0.15/1k ops (10k/month free), $0.50/GB-month (1 GB free). Not a framework — an infrastructure primitive. - -**Overlap with us:** Not an orchestration platform and does not compete with Molecule AI directly today. Relevant as a new workspace-persistence primitive: any competitor (Paperclip, Scion, VoltAgent) could wire Cloudflare Artifacts into their agent workspace layer to get Git-semantics workspace snapshots cheaper than our current Docker volume + CLAUDE.md prose approach. The fork/clone/diff semantics are a more principled snapshot model than our current `snapshot_id` pattern. - -**Differentiation:** Storage primitive only — no agent identity, no A2A, no scheduling, no canvas. Requires Cloudflare Workers; not self-hostable on arbitrary infra. - -**Worth borrowing:** The `fork()` → `work` → `diff()` → `merge()` lifecycle as a model for workspace snapshot/resume — cleaner than our current lossy prose injection into CLAUDE.md (#583). If ArtifactFS driver becomes usable standalone (non-Cloudflare backend), consider as a replacement for Docker volume snapshots. - -**Signals to react to:** If Cloudflare Agents SDK integrates Artifacts as a built-in workspace-persistence layer → escalate to MEDIUM; Cloudflare would then offer a managed Docker+Git workspace alternative to Molecule AI. If `snapshot_id` semantics become standard across the ecosystem → accelerate #583. - -**Last reviewed:** 2026-04-17 (private beta announcement) · **Stars / activity:** infrastructure service, ArtifactFS driver OSS - ---- - -### dimos — `dimensionalOS/dimos` - -**Pitch:** "Agentic OS for physical space — control humanoids, quadrupeds, drones, and robotic arms via natural language. Python SDK, MCP-native, zero ROS dependency." - -**Shape:** Python (MIT), ~2.9k ⭐, v0.0.11, March 2026. Module-based architecture: components expose typed input/output streams; `autoconnect()` wires them by name+type into a "blueprint." Multiple transports: LCM, shared memory, DDS, ROS 2. Spatial memory via SLAM; temporal memory via spatio-temporal RAG (object permanence across sessions). Hardware support: Unitree Go2/B1/G1, AgileX Piper, Xarm, DJI Mavic, MAVLink drones. MCP is the primary agent-control interface — robots are addressed as MCP tool endpoints. - -**Overlap with us:** Any MCP-capable Molecule AI workspace could issue commands to dimos-managed hardware via the standard MCP tool surface. Spatio-temporal RAG for memory is adjacent to our `agent_memories` approach. - -**Differentiation:** Hardware/robotics domain only — no workspace lifecycle, no A2A, no canvas, no SaaS orchestration. Not a software agent competitor; 278 open issues suggests pre-stability. - -**Worth borrowing:** The `autoconnect()` blueprint wiring (match streams by name+type, not hardcoded edges) is a clean low-ceremony graph composition pattern — applicable to our workflow plugin composition system. - -**Terminology collisions:** "blueprint" = their module-wiring config; we'd call this a workflow or pipeline. - -**Signals to react to:** If dimos ships A2A support → robot-controlling workspaces become first-class Molecule AI peers. If spatio-temporal RAG pattern gains traction in non-hardware agents → revisit `agent_memories` retrieval architecture. - -**Last reviewed:** 2026-04-17 (GitHub trending) · **Stars / activity:** ~2.9k ⭐, v0.0.11, March 2026 - ---- - -### Cloudflare Workers AI — `cloudflare.com/ai-platform` - -**Pitch:** "One API to access any AI model from any provider — built to be fast and reliable. Unified inference layer for agent-native apps with auto-failover and streaming resilience across 330 global PoPs." - -**Shape:** Cloudflare proprietary platform (infrastructure service, some OSS components). Part of Cloudflare "Agents Week" 2026. 70+ models across 14+ providers (OpenAI, Anthropic, Google, etc.). Key capabilities for agents: automatic multi-provider failover, streaming response buffering independent of agent lifetime (reconnect without reprocessing), unified billing + monitoring across all model calls, custom model bring-your-own via Replicate Cog. Part of a broader Cloudflare agent stack: Durable Objects (state), Artifacts (versioned storage, tracked separately), Agents SDK (multi-step orchestration), AI Search (hybrid RAG for agents). - -**Overlap with us:** Cloudflare is assembling a complete managed agent platform: inference + state + storage + orchestration + search. Collectively a competing infrastructure story to Molecule AI's self-hosted model. Neither product has canvas, visual org hierarchy, A2A, or governance tooling. - -**Differentiation:** Pure infrastructure primitives — no agent identity model, no workspace lifecycle, no compliance/governance. Requires Cloudflare Workers (not self-hostable on arbitrary infra). Each piece is standalone; the "platform" is integration, not a packaged product. No pricing announced for full stack. - -**Worth borrowing:** Streaming resilience pattern — buffer streaming LLM responses independently of agent process lifetime, allow graceful reconnection. Apply to our A2A response streaming. Multi-provider failover model — reference design for our model-agnostic workspace layer (`runtime:` field). - -**Terminology collisions:** "Workers" = Cloudflare serverless compute; we call these "workspaces". "Bindings" = their service-to-service connector; we use A2A protocol for agent-to-agent calls. - -**Signals to react to:** If Cloudflare Agents SDK integrates all four primitives (Workers AI + Durable Objects + Artifacts + AI Search) into a one-click multi-agent deployment → escalate to MEDIUM; would offer a competing managed workspace alternative at Cloudflare global scale. Watch for per-agent billing or workspace lifecycle management announcements. - -**Last reviewed:** 2026-04-17 (Agents Week 2026, HN 248pts) · **Stars / activity:** infrastructure service, no public GitHub repo - ---- - -### OpenAI Codex Agent — `openai.com/codex-for-almost-everything` - -**Pitch:** "Codex is an autonomous AI agent — runs parallel subagents, remembers your projects across sessions, controls your desktop, and schedules its own follow-up tasks." - -**Shape:** Proprietary OpenAI product (not open-source), rolling out to ChatGPT desktop users April 17 2026. macOS computer control at launch, Windows forthcoming. Part of ChatGPT subscription. **Distinct from `openai-agents-sdk`** (developer API) — this is the consumer/prosumer agent product. - -**Overlap with us:** The three core features directly mirror Molecule AI: (1) parallel subagent orchestration for write/debug/test ≈ our multi-workspace org hierarchy; (2) cross-session project memory ≈ `agent_memories`; (3) autonomous self-wake scheduling ≈ `workspace_schedules`. Computer use overlaps with our browser-automation plugin. - -**Differentiation:** No org canvas, no multi-tenant governance, no Docker isolation, no custom runtime (OpenAI-only), no A2A, no plugin registry. Single-user prosumer — not an enterprise platform. Our moat: org hierarchy, governance canvas (#582), runtime flexibility, self-hosted deployment. - -**Worth borrowing:** Scheduling UX framing — "schedule a follow-up task" is cleaner than raw cron config. Consider exposing `workspace_schedules` as "follow-up tasks" in the Canvas Config tab. - -**Terminology collisions:** "Projects" = their cross-session persistence unit; we call these "workspaces". "Subagents" = parallel execution units; we call these worker workspaces. - -**Signals to react to:** If subagent API opens to third-party orchestrators → Molecule AI could orchestrate Codex as a specialist worker. If computer control expands to web + Windows → revisit threat level. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** N/A (proprietary) — HN 769 pts / 387 comments at launch - ---- - -### Qwen3.6-35B-A3B — `qwen.ai/blog` - -**Pitch:** "35B MoE model, 3B active parameters per token — agentic coding power, now open to all." - -**Shape:** Open-weight model from Alibaba/Qwen, immediately downloadable. 35B total / 3B active per token via mixture-of-experts routing. Purpose-built for agentic coding loops: tight feedback cycles, low latency, low cost per token. Not an orchestration framework — a model that competitors can wire into their own stacks. - -**Overlap with us:** Indirect. Commoditizes the LLM layer for self-hosted orchestrators. Any competitor (VoltAgent, Paperclip, LangGraph self-hosted) can now offer near-zero API cost for coding agents using Qwen3.6. Erodes the cost argument for cloud-API-locked platforms more than it threatens us (we're already model-agnostic). - -**Differentiation:** Our `runtime:` field is already model-agnostic. Qwen3.6 doesn't threaten our orchestration layer; it pressures cloud-model-dependent competitors. Our cost position is neutral to positive. - -**Worth borrowing:** Add `qwen3.6-35b-a3b` as a documented supported model in workspace config docs before competitors do. Cost-sensitive enterprise buyers wanting self-hosted inference are our conversion path. - -**Terminology collisions:** "Agentic coding" = their framing for autonomous dev-loop use; our framing is "coding workspace." - -**Signals to react to:** If top-tier SWE-bench/Aider benchmark confirms → document as supported model immediately. If VoltAgent or Paperclip ship native Qwen3.6 integration → publish ours first or same day. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** HN #1 story (984 pts / 430 comments); open weights on qwen.ai - ---- - -### EvoMap Evolver — `EvoMap/evolver` - -**Pitch:** "A GEP-powered self-evolution engine for AI agents — turns ad hoc prompt tweaks into auditable, reusable evolution assets with A2A-compatible distributed worker nodes." - -**Shape:** JavaScript (Node.js), GPL-3.0, ~3.3k ⭐, v1.67.1 April 17 2026. Not a general-purpose orchestrator. Deterministic, log-driven prompt-evolution engine: scans `memory/` for error signals → selects Genes/Capsules from local asset library → emits a structured GEP directive → records an immutable `EvolutionEvent` JSONL entry. Three run modes: standalone, `--review` (HITL gate), `--loop` (daemon). Connects to EvoMap Hub via `A2A_HUB_URL` + `A2A_NODE_ID` for distributed worker networks with capability-domain task routing and Evolution Circles (collaborative agent groups with shared context). - -**Overlap with us:** (1) A2A worker pool explicitly uses `A2A_HUB_URL`/`A2A_NODE_ID` — EvoMap nodes can be wired as a specialist `repair`/`harden` role inside a Molecule AI org hierarchy today. (2) Networked Skill Store ships `SKILL.md` natively compatible with agentskills.io. (3) Immutable `EvolutionEvent` JSONL (18 fields: identifiers + execution context + data + HMAC integrity) is the closest open-source implementation of the audit ledger needed by our governance canvas (#582). - -**Differentiation:** No visual canvas, no Docker isolation, no org hierarchy, no scheduling, no multi-runtime. Specialist tool, not a competing platform. GPL-3.0 copyleft: direct code embedding requires legal review; design inspiration is unrestricted. - -**Worth borrowing:** `EvolutionEvent` 18-field JSONL schema as reference for `molecule-audit-ledger` (see also EDDI audit ledger research). `--review` HITL gate pattern for surfacing agent self-edits to the governance canvas approvals UI. - -**Signals to react to:** EvoMap Hub paid-tier adoption → agentskills.io competitive signal. Docker container isolation added → escalate to MEDIUM. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 3,327 ⭐, +812 today, v1.67.1, 351 forks - ---- - -### AI Hedge Fund — `virattt/ai-hedge-fund` - -**Pitch:** "An autonomous AI team of 19 specialized agents designed for financial analysis and trading signal generation." - -**Shape:** Python (MIT), ~55.7k ⭐, +763 stars on 2026-04-17. Reference implementation, not a framework. 19 hard-coded agent roles: portfolio manager, risk manager, bull/bear analysts, sector specialists (tech, healthcare, consumer, energy, financials). Each agent is a prompted LLM call with a defined scope; the portfolio manager orchestrates. Supports Ollama (local LLMs), OpenAI, Anthropic, and Google cloud providers via a `--llm` flag. No persistent state, no Docker isolation, no scheduling, no plugin system. - -**Overlap with us:** Demonstrates domain-specific multi-agent collaboration at scale: 19 agents with distinct roles, a coordinator, shared context. The role taxonomy (risk manager, specialist analysts, coordinator) maps cleanly onto our workspace hierarchy (PM + specialist worker workspaces). High star count signals strong enterprise demand for vertical-specific agent orchestration in finance — a key Molecule AI ICP. - -**Differentiation:** Not a platform. No workspace lifecycle, no A2A, no canvas, no governance, no multi-tenant. A demo/reference implementation that shows what customers will try to build on Molecule AI. The gap between this repo and a production system is exactly the gap Molecule AI fills. - -**Worth borrowing:** The role taxonomy is a compelling sales reference: "here's a 19-agent financial analysis team running on Molecule AI" is a concrete enterprise demo. Consider shipping an `ai-hedge-fund` org template that reproduces this architecture on Molecule AI's canvas with proper workspace isolation and A2A coordination. - -**Terminology collisions:** "Portfolio manager" = their coordinator agent; we'd map this to a PM workspace. "Analysts" = specialist worker workspaces. - -**Signals to react to:** If the repo adds a framework layer (reusable agent registry, scheduling, persistence) → escalate to MEDIUM. If finance-sector enterprises request a hedge-fund template → ship one. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 55,750 ⭐, +763 today, MIT - ---- - -### Strix — `usestrix/strix` - -**Pitch:** "Open-source AI hackers to find and fix your app's vulnerabilities." - -**Shape:** Python (91.6%), Apache-2.0, 24.1k ⭐, available on PyPI as `strix-agent`. CLI-first autonomous security testing platform built on a **graph of agents** architecture: specialized agents coordinate in parallel across attack vectors (injection, SSRF, XSS, IDOR, auth bypass, and more), validate findings with real proof-of-concepts rather than static analysis flags, and emit actionable remediation reports. Toolkit includes HTTP proxy, browser automation, terminal environments, and a Python runtime harness. Supports CI/CD pipeline integration. - -**Overlap with us:** (1) Multi-agent graph architecture is conceptually aligned — parallel specialist agents, dynamic coordination, result aggregation. Not an orchestration framework, but a production signal that autonomous multi-agent pipelines are proven in security verticals. (2) CI/CD integration pattern mirrors how Molecule AI workspaces are embedded in dev pipelines. (3) The auto-remediation + structured reporting loop is a demand signal for audit-trail and human-oversight patterns — directly adjacent to the `molecule-audit-ledger` work (GH #594) and our EU AI Act compliance posture. - -**Differentiation:** Domain-locked (security only), no visual canvas, no org hierarchy, no scheduling, no A2A interoperability. Not a competing platform — a vertical application on top of agent primitives similar to what a Molecule AI org template could deliver. - -**Worth borrowing:** Proof-of-concept validation pattern (agents confirm exploits rather than flag suspects) as a model for grounding agent outputs with verifiable artifacts. Their `--ci` mode integration pattern is worth referencing for the playwright-mcp plugin CI workflow. - -**Signals to react to:** If Strix ships an agent SDK / plugin API → they become a platform player, escalate to MEDIUM. If enterprise security teams start asking about Molecule AI + Strix integration → document a reference org template. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 24,100 ⭐, +202 today, PyPI `strix-agent` - ---- - -### Anthropic Agent Skills — `anthropics/skills` - -**Pitch:** "A cross-platform open standard for portable AI agent skills — declare a skill as `SKILL.md` (YAML frontmatter + Markdown body) and it installs anywhere the standard is adopted." - -**Shape:** Filesystem standard (not a framework), 119k★ on GitHub (trending #1 today), 26+ platform adopters including Cursor, OpenAI Codex, GitHub Copilot, and Gemini CLI. A skill is a `SKILL.md` file with YAML frontmatter (name, description, author, version, tools, compatibility) and Markdown body (instructions). Skills install to `.agents/skills/` or `.claude/skills/`. Anthropic also operates a proprietary REST API track (`/v1/skills`, beta header `skills-2025-10-02`) for org-internal skill upload/management; confirmed pre-built skills: pptx, xlsx, docx, pdf. Partner directory (Atlassian, Figma, Canva, Cloudflare, Sentry, Ramp live; Stripe/Notion/Zapier unconfirmed) is invitation-only with no programmatic import API. - -**Overlap with us:** Molecule AI already uses `SKILL.md` natively — every `configs/plugins/*/skills/*/SKILL.md` is a compliant Agent Skill (confirmed by TR spike 2026-04-17, GH #677). Zero schema chasm. GH #676 (molecule-agent-skills-bridge) will allow Molecule workspaces to install skills from the Anthropic API track and export custom skills to the org registry. - -**Differentiation:** Agent Skills is a portability standard, not a competing orchestration platform. Skills are stateless capability definitions; Molecule AI provides the runtime, lifecycle, governance, and org hierarchy. Compliance with the standard strengthens Molecule's positioning — it joins a 26-platform ecosystem rather than standing outside it. - -**Worth borrowing:** SKILL.md as the canonical external representation of a Molecule skill (already adopted). The `/v1/skills` beta API for distributing skills to partner Claude deployments (org-internal, pending #676). Schema delta to publish: `version`/`author`/`tags` → `metadata` map; `runtimes` → `compatibility` — one-pass transform. - -**Terminology collisions:** "skill" — Anthropic: a SKILL.md capability unit; Molecule: same (no collision). "connector" — claude.com/connectors: Anthropic's Web UI for partner skills; Molecule: channel integrations (Slack, Telegram) — distinct contexts, no collision risk. - -**Signals to react to:** `/v1/skills` API GA (beta header dropped) → ship #676 immediately. New partners added to claude.com/connectors → update #676 supported-partners list. Cross-platform open registry (invitation-only → public) → revisit #676 reverse-export scope. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 119,323★, GitHub trending Python #1 today, 26+ platform adopters - ---- - -### Microsoft APM — `microsoft/apm` - -**Pitch:** "The open-source dependency manager for AI agents — declare agent packages (skills, plugins, MCP servers, prompts, hooks) in a single `apm.yml` and get reproducible setups across teams." - -**Shape:** Python (95%), open-source, v0.8.11 (Apr 6 2026), 1.8k★. CLI distributed as native binaries (macOS/Linux/Windows) + pip. Manages "instructions, skills, prompts, agents, hooks, plugins, MCP servers" via a unified `apm.yml` manifest. Key features: transitive dependency resolution, multi-source installs (GitHub/GitLab/Bitbucket/Azure DevOps/any git host), content-security scanning (`apm audit` blocks hidden-Unicode and compromised packages), marketplace with governance via `apm-policy.yaml`, GitHub Action for CI/CD. Built on open standards: AGENTS.md and agentskills.io specification. - -**Overlap with us:** Molecule AI's plugin system (`plugins/` registry, `plugin.yaml` per plugin, `/workspaces/:id/plugins` API) solves the same problem: reproducible, declarative agent capability composition. An `apm.yml` that installs Molecule plugins would be a natural extension of both systems. If apm gains enough adoption to become the de facto way enterprise teams declare agent dependencies, Molecule plugin authors will expect apm.yml compatibility. See GH #694 for evaluation tracking. - -**Differentiation:** apm is a dependency manager, not an orchestration platform. No visual canvas, no agent lifecycle management, no A2A protocol, no scheduling. It is infrastructure for composing agents, not running them. Molecule AI is the runtime; apm could theoretically become the package manager for Molecule plugins rather than a competitor. - -**Worth borrowing:** `apm audit` content-security model for plugin installs — Molecule's plugin install endpoint has no equivalent hidden-Unicode / compromised-package scanning (relevant to GH #675 molecule-security-scan). The `apm-policy.yaml` governance pattern is a lightweight analog to what molecule-governance (#674) needs for policy-as-code enforcement. CI GitHub Action for validating plugin manifests in PRs. - -**Terminology collisions:** "plugin" — both use it for capability units; apm's scope is broader (includes skills, prompts, hooks). "package" — apm's primary noun; Molecule calls the same thing a plugin. - -**Signals to react to:** apm ships a `molecule-ai` source scheme or native Molecule plugin support → strong ecosystem validation, document compatibility immediately. Microsoft positions apm as "npm for agents" in Agent Framework docs → evaluate making `plugin.yaml` apm-compatible. apm reaches 10k★ → evaluate publishing Molecule plugins to the apm marketplace. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 1,766★, v0.8.11 Apr 6 2026, GitHub trending Python today - ---- - -### Cloudflare Agents — `cloudflare/agents` - -**Pitch:** "Build and deploy persistent, stateful AI agents on Cloudflare's edge infrastructure — millions of concurrent instances, auto-hibernation, zero idle cost." - -**Shape:** TypeScript (99%), Apache-2.0, v0.11.2 (Apr 2026), 4.8k★. Built on Cloudflare Workers + Durable Objects. Core primitives: persistent state synced to clients, cron/one-time scheduling, WebSocket lifecycle hooks, MCP (both server AND client), multi-step durable workflows with HITL approval patterns, email (send/receive/reply via CF Email Routing), and "Code Mode" (LLMs emit TypeScript for orchestration). Agents auto-hibernate when idle — zero infra cost during inactivity. - -**Overlap with us:** Near-complete overlap on workspace lifecycle primitives: state persistence (our Redis + Postgres), scheduling (our `workspace_schedules`), WebSocket (our canvas WS hub), MCP client support (our `mcp-connector` #573), HITL approvals (our `approvals.*`). CF's auto-hibernation + one-Durable-Object-per-agent model is architecturally analogous to Molecule's per-workspace Docker container lifecycle. - -**Differentiation:** No A2A protocol, no org hierarchy, no visual canvas. TypeScript-only (Molecule is Python-first). Serverless edge vs. Molecule's Docker workspace model. CF scales to millions of concurrent single agents via infrastructure; Molecule's value is the *organizational hierarchy* of collaborating specialists. No governance layer, no RBAC, no audit trail. - -**Worth borrowing:** Auto-hibernation — when `active_tasks == 0` for N minutes, auto-pause container; resume on next A2A ping. Closes idle-cost gap; filed as GH #711. "Code Mode" (agent-generated TypeScript orchestration) is a signal that declarative workflow gen will become a table-stakes expectation. - -**Terminology collisions:** "workspace" — CF calls the unit an "Agent" (Durable Object); we call it a Workspace (Docker container + config). - -**Signals to react to:** CF adds A2A support → escalate to HIGH, evaluate CF Workers as a Molecule workspace runtime target. CF bundles Agents + Artifacts + AI Gateway into a single platform pricing tier → direct positioning threat. Reaches 20k★ → publish a CF Workers org template. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 4,776★, v0.11.2 Apr 2026, TypeScript - ---- - -### cognee — `topoteretes/cognee` - -**Pitch:** "Knowledge Engine for AI Agent Memory in 6 lines of code — remember, recall, forget, improve." - -**Shape:** Python (87%) + TypeScript (13%), Apache-2.0, v1.0.1.dev1 (Apr 2026), 16.1k★, 6,700+ commits. Hybrid memory architecture: vector search (semantic retrieval) + graph database (entity relationships) + session cache (fast, syncs to graph in background). Four-verb API: `remember`, `recall`, `forget`, `improve`. MCP-compatible (ships a Claude Code plugin + OpenClaw plugin). Native Hermes Agent integration. - -**Overlap with us:** (1) `agent_memories` — Molecule's HMA scoped memory (Redis + Postgres) vs. cognee's vector+graph hybrid with auto-routing; cognee is a richer retrieval layer. (2) Hermes workspace template — cognee ships native Hermes Agent support, suggesting direct drop-in compatibility with `molecule-ai-workspace-template-hermes`. (3) MCP plugin — cognee exposes memory as MCP tools, consumable via our `mcp-connector` (#573). Tracked for evaluation in GH #717. - -**Differentiation:** cognee is a memory library, not an orchestration platform — no visual canvas, no org hierarchy, no A2A, no scheduling. It augments agent memory; Molecule provides the agent runtime. - -**Worth borrowing:** The `remember`/`recall`/`forget`/`improve` verb API as a higher-level abstraction over `GET/POST /workspaces/:id/memories`. Graph-backed relationship tracking (entities, not just key-value) for richer agent knowledge graphs. - -**Terminology collisions:** "memory" — same word, different layers (cognee: content/semantic store; Molecule: workspace KV memory). "recall" — cognee verb vs. our memory search. - -**Signals to react to:** cognee v1.0.0 stable ships → evaluate as Hermes workspace dep. cognee adds A2A protocol → escalate to MEDIUM. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 16,096★, v1.0.1.dev1 Apr 2026, active (6.7k commits) - ---- - -### opencode — `anomalyco/opencode` - -**Pitch:** "The open source coding agent." - -**Shape:** TypeScript/MDX, MIT-licensed, CLI + desktop app (beta). 145k★, v1.4.7 (Apr 16 2026), 763 releases — heavily shipped. Provider-agnostic: Claude, OpenAI, Google, local models with no vendor coupling. Two built-in agent modes switchable at runtime: **build** (full read/write/execute access) and **plan** (read-only analysis). Client/server architecture with LSP integration for live diagnostics. - -**Overlap with us:** Directly competes with `molecule-ai-workspace-template-claude-code` as the tool developers reach for when they want autonomous full-codebase coding. At 145k★ it is 3× larger than Cline (our prior single-agent coding comparison point). Users who outgrow opencode's single-agent model — needing multi-agent coordination, org hierarchy, or persistent scheduled work — are our conversion path. - -**Differentiation:** No A2A protocol, no multi-agent coordination, no visual canvas, no org hierarchy, no scheduling, no Docker workspace isolation. Pure single-agent coding tool. Molecule provides the *platform* layer opencode lacks. - -**Worth borrowing:** Build/plan mode toggle — a read-only analysis mode before executing is a safety pattern for workspace config. Provider-agnostic runtime model selection aligns with our multi-runtime workspace architecture. - -**Terminology collisions:** "agent" — they call the two modes "agents" (build/plan); we call the container+config unit a "workspace". Risk of developer confusion between "Molecule workspace" and "opencode agent". - -**Signals to react to:** opencode ships an MCP server → plug in via `mcp-connector` (#573). opencode ships a REST/WebSocket API → evaluate as `molecule-ai-workspace-template-opencode` (GH #720). opencode adds A2A → could become a direct workspace peer. Hits 200k★ → publish positioning blog: Molecule as the org layer over opencode. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 145k★, v1.4.7 Apr 16 2026, TypeScript, 763 releases - ---- - -### pydantic-ai — `pydantic/pydantic-ai` - -**Pitch:** "AI Agent Framework, the Pydantic way — build production-grade agents with type safety." - -**Shape:** Python, Apache-2.0, ~16.4k★. Brings Pydantic's validation philosophy to agents: type-safe structured output, dependency injection, Pydantic model validation throughout the tool layer. Ships native A2A protocol support, MCP client, HITL approval gates, durable execution across transient failures, graph-based workflows, Logfire observability, and Pydantic Evals systematic evaluation. Multi-model (OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, 15+ others). Supports declarative YAML/JSON agent definitions. - -**Overlap with us:** (1) **A2A protocol** — pydantic-ai agents speak native A2A, making them potential first-class Molecule workspace peers with zero shim; (2) **MCP client** — native MCP consumption; could use our `@molecule-ai/mcp-server` toolset directly; (3) **HITL approvals** — tool approval gates overlap our `approvals` API; (4) **adapter candidate** — same adapter-target profile as LangGraph but with native A2A. Filed as GH #721. - -**Differentiation:** Library, not platform. No visual canvas, no org hierarchy, no Docker workspace isolation, no scheduling/cron, no registry. Molecule provides the runtime + orchestration + governance layer; pydantic-ai provides the agent logic inside a workspace. - -**Worth borrowing:** Dependency injection for agent tools — clean testability pattern vs. our current tool registration. Pydantic Evals framework as reference design for systematic agent quality gates. YAML-defined agents aligns with our `config.yaml` declarative philosophy. - -**Terminology collisions:** "agent" — pydantic-ai's `Agent` is a Python class; ours is a Docker workspace. "tools" — pydantic-ai tools ≈ our `builtin_tools`/plugins. - -**Signals to react to:** pydantic-ai surpasses LangGraph in GitHub stars → prioritize `molecule-ai-workspace-template-pydantic-ai` (GH #721). A2A version confirmed compatible with our a2a-sdk==0.3.25 → validate zero-shim interop. pydantic-ai ships a Molecule adapter → zero-effort integration. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** ~16.4k★, Python, Apache-2.0, active - ---- - -### goose (AAIF) — `aaif-goose/goose` - -**Pitch:** "An open source, extensible AI agent that goes beyond code suggestions — install, execute, edit, and test with any LLM." - -**Shape:** Rust, Apache-2.0, ~5k★ (moved Apr 2026 from `block/goose` to Agentic AI Foundation / Linux Foundation). Desktop app (macOS, Linux, Windows) + CLI + embeddable API. 15+ LLM providers: Anthropic, OpenAI, Google, Ollama, Azure, Bedrock, OpenRouter. Single-agent, local-machine focus. Extensible via "extensions" (MCP-compatible tool plugins). Bundled with an `AGENTS.md` agent-description standard, now donated to AAIF alongside MCP. - -**Overlap with us:** (1) Both are general-purpose AI agent execution environments with plugin/extension ecosystems. (2) MCP tool support — goose extensions map to our MCP connector. (3) **AGENTS.md** — Block donated this agent-description standard to the Linux Foundation's AAIF alongside MCP; if it gains traction, workspace templates should include a generated `AGENTS.md` for discoverability. (4) Goose's embedding API could make it a `molecule-ai-workspace-template-goose` candidate. - -**Differentiation:** Goose is single-agent, local-machine execution. No multi-agent coordination, no org hierarchy, no visual canvas, no A2A protocol, no Docker workspace isolation, no scheduling. Molecule is the orchestration platform layer goose lacks. - -**Worth borrowing:** `AGENTS.md` agent-description standard — a human+machine readable file describing an agent's capabilities, limitations, and invocation contract. Aligns with our `config.yaml` philosophy and could become an AAIF interop requirement. Multi-provider Rust runtime (performance reference for future Go workspace provisioner work). - -**Terminology collisions:** "extensions" (goose) ≈ "plugins" (Molecule). "recipes" (goose) = reusable workflow scripts ≈ our org template `initial_prompt` patterns. - -**Signals to react to:** AGENTS.md becomes an AAIF / industry standard → add auto-generated `AGENTS.md` to workspace-template build (see GH issue filed). Goose embedding API matures → evaluate `molecule-ai-workspace-template-goose`. Goose ships A2A → could register as a Molecule workspace peer. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** ~5k★ (aaif-goose fork, Apr 2026), Rust, Apache-2.0, Linux Foundation / AAIF - ---- - -### GitHub Awesome Copilot — `github/awesome-copilot` - -**Pitch:** Community-curated marketplace of GitHub Copilot agents, skills, instructions, plugins, hooks, and agentic workflows — installable via `copilot plugin install @awesome-copilot`. - -**Shape:** Python (69%) + TypeScript (5%) + Markdown, MIT, 30.2k★, 1,600+ commits, actively maintained by GitHub. Six artifact types: **agents** (MCP-connected Copilot extensions), **instructions** (file-pattern scoped rules), **skills** (self-contained instruction + asset bundles), **plugins** (curated agent+skill bundles), **hooks** (session-triggered automations), **agentic workflows** (AI GitHub Actions written in Markdown). Pre-registered as default install source in Copilot CLI and VS Code. - -**Overlap with us:** Direct structural parallel to our plugin+skill ecosystem. "Skills" = our `.claude/skills/`; "Plugins" = our `plugins/`; "Hooks" = our `.claude/settings.json` hooks; "Agents" = our workspace roles. The named community registry pattern (`@awesome-copilot`) mirrors what a `@molecule-ai` plugin registry would look like. Agentic Workflows (AI GitHub Actions in Markdown) = our cron/schedule workflow plugins. - -**Differentiation:** Awesome-Copilot is a curated list for a single agent (Copilot), not an orchestration platform. No inter-agent comms, no canvas, no A2A, no Docker isolation, no hierarchy. Molecule provides the multi-agent coordination layer this ecosystem lacks. - -**Worth borrowing:** Named community registry as default install source — `copilot plugin install name@awesome-copilot` pattern is a UX model for `molecule plugin install name@molecule-hub`. Hooks-as-first-class-artifacts pattern validates our `settings.json` hook approach. The six-type taxonomy (agents / instructions / skills / plugins / hooks / workflows) is a clean conceptual frame. - -**Terminology collisions:** **HIGH RISK.** "Skills", "Plugins", "Agents", "Hooks" — every term overlaps with Molecule's vocabulary. If Molecule publishes to both ecosystems, users will conflate them. Recommend explicit disambiguation note in `docs/glossary.md`. - -**Signals to react to:** GitHub publishes a formal plugin schema spec → evaluate cross-compatibility with our `plugin.yaml` format. Awesome-Copilot plugin format adopted by other tools → position Molecule plugins as cross-compatible. Copilot adds MCP server support → Molecule's `@molecule-ai/mcp-server` becomes directly installable as a Copilot plugin. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 30,211★, Python/TS, MIT, GitHub-maintained, 1,600+ commits - ---- - -### Mastra — `mastra-ai/mastra` - -**Pitch:** "Build production AI features in TypeScript — agents, workflows, memory, RAG, evals, and voice in one framework." - -**Shape:** TypeScript, Apache-2.0, 22k★, v1.0 Jan 2026. From the Gatsby/GatsbyJS founders (YC). 1.8M monthly downloads by Feb 2026; 300k+ weekly at v1.0 launch. Multi-provider (Claude, OpenAI, Gemini, etc.). Core primitives: `Agent` (tool-using LLM loop), `Workflow` (step DAG with retry/parallel/conditional), `Memory` (vector + semantic retrieval), `RAG` (document ingestion + retrieval), evals, Langfuse/OpenTelemetry observability, and a voice pipeline. MCP client built-in. TypeScript-first. - -**Overlap with us:** TypeScript-native agent framework that competes for the same developer mindshare as pydantic-ai (Python side). MCP client support maps to our `mcp-connector` (#573). Workflow engine (durable step DAG) is a TypeScript analog to our Temporal integration. Potential `molecule-ai-workspace-template-mastra` adapter candidate. - -**Differentiation:** TypeScript only (no Python). No A2A protocol, no multi-agent org hierarchy, no visual canvas, no Docker workspace isolation, no cron scheduling. Molecule provides the multi-agent orchestration + governance layer; Mastra provides agent logic inside a single workspace. - -**Worth borrowing:** Evals built-in from v1.0 — not bolted on. "Steps" workflow primitive with structured retry + parallel branches is a cleaner abstraction than raw LangGraph graphs. Voice pipeline as first-class primitive. - -**Terminology collisions:** "workflows" (Mastra step DAGs) ≈ our LangGraph-based workflows. "integrations" ≈ our plugins. "agents" ≈ our workspaces. - -**Signals to react to:** Mastra ships A2A protocol → prioritize `molecule-ai-workspace-template-mastra`. Mastra adds multi-agent coordination → escalate threat level. Mastra hits 30k★ → competitive positioning blog needed. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 22k★, TypeScript, Apache-2.0, YC, v1.0 Jan 2026, 1.8M monthly downloads - ---- - -### SAFE-MCP — `safe-agentic-framework/safe-mcp` - -**Pitch:** "An ATT&CK-style threat framework for documenting and mitigating adversary tactics, techniques, and procedures in MCP-based AI agent systems." - -**Shape:** Markdown + Python, MIT. Adopted by Linux Foundation + OpenID Foundation (Apr 2026). 14 tactical categories, 80+ documented attack techniques using SAFE-T#### IDs (mirrors MITRE ATT&CK structure): initial access, tool poisoning, prompt injection via MCP responses, data exfiltration, privilege escalation, persistence. Ships threat modeling guides, developer quickstarts, and per-technique mitigations. - -**Overlap with us:** Our `@molecule-ai/mcp-server` (87 tools) and MCP connector (#573) are directly in scope. Our plugin install pathway (fetch + stage + exec) is a SAFE-T1102 "supply-chain" attack surface. Our workspace bearer-token auth, `PLUGIN_INSTALL_MAX_DIR_BYTES` safeguard, and HMAC audit ledger (#594) map to documented SAFE-MCP mitigations. No runtime overlap — purely a reference/compliance framework. - -**Differentiation:** Not a product — a security threat taxonomy. Pure reference material; no code runtime, no competition. - -**Worth borrowing:** Run SAFE-MCP threat model against `@molecule-ai/mcp-server` before v1.0 customer launch (see GH #747). SAFE-T1102 (tool poisoning) and supply-chain techniques are most applicable to our plugin install flow. - -**Terminology collisions:** None — uses its own SAFE-T#### namespace distinct from ours. - -**Signals to react to:** Enterprise customers ask for SAFE-MCP compliance attestation → generate self-assessment doc. SAFE-MCP ships an automated scanner → add to MCP server CI. SAFE-MCP v2.0 adds A2A threat model → extend audit to our A2A proxy. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** early-stage (LF/OpenID adopted Apr 2026), MIT, foundation-governed - ---- - -### mcp-agent — `lastmile-ai/mcp-agent` - -**Pitch:** "Build effective agents using Model Context Protocol and simple workflow patterns." - -**Shape:** Python, Apache-2.0, 7.4k★, last updated Jan 2026. Batteries-included MCP runtime that implements every pattern from Anthropic's *Building Effective Agents* playbook as composable primitives: `Agent`, `Orchestrator`, `Swarm` (OpenAI Swarm multi-agent pattern, model-agnostic), `ParallelAgent`, `RouterAgent`. Handles MCP server lifecycle, LLM connections, human-in-the-loop signals, and durable execution. Companion repo `lastmile-ai/mcp-eval` evaluates MCP server quality. Pure Python, no framework lock-in. - -**Overlap with us:** (1) Directly targets the same "agent runtime + MCP tools" layer as our workspace-template. (2) Swarm multi-agent pattern implemented without A2A — an alternative coordination model to our JSON-RPC peer-to-peer approach. (3) HITL workflow support overlaps `molecule-hitl` / `@requires_approval`. (4) `mcp-eval` could complement GH #747 SAFE-MCP audit as an MCP server quality gate. - -**Differentiation:** No visual canvas, no org hierarchy, no Docker workspace isolation, no scheduling, no A2A protocol. Single-process Python runtime, not a multi-workspace orchestration platform. Molecule provides the governance + multi-tenant layer mcp-agent lacks. - -**Worth borrowing:** Anthropic's "Building Effective Agents" as the pattern library for our org-template design. `mcp-eval` as an automated quality gate for `@molecule-ai/mcp-server` CI. - -**Terminology collisions:** "Orchestrator" (mcp-agent) = a meta-agent that routes tasks to sub-agents ≈ our PM/Research Lead org template roles. - -**Signals to react to:** mcp-agent ships A2A support → potential `molecule-ai-workspace-template-mcp-agent` adapter. `mcp-eval` adopted broadly → integrate into our MCP server CI (#747). mcp-agent hits 15k★ → assess as competitive threat to workspace-template. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 7,454★, Python, Apache-2.0, Jan 2026 - ---- - -### BeeAI ACP — `i-am-bee/acp` - -**Pitch:** "Open protocol for communication between AI agents, applications, and humans — REST/OpenAPI-based with Python and TypeScript SDKs." - -**Shape:** Python + TypeScript SDKs, Apache-2.0, IBM BeeAI project. OpenAPI spec defines REST endpoints for agent task dispatch, status streaming, and cancellation. HTTP/REST transport — any language with an HTTP client can speak ACP. Designed for multi-runtime, polyglot agent ecosystems. - -**Overlap with us:** Direct overlap with our A2A protocol — both define how agents communicate with each other. ACP = REST/HTTP; A2A = JSON-RPC 2.0. Both now governed by foundations (ACP under BeeAI/IBM; A2A under AAIF/Linux Foundation). If ACP gains enterprise traction via IBM's distribution, Molecule workspaces may need to bridge or support both protocols. OpenAPI spec means auto-generated client SDKs in any language — lower barrier than our current A2A SDK. - -**Differentiation:** ACP has no concept of org hierarchy, workspace lifecycle, or canvas. REST vs JSON-RPC is a transport difference, not a capability gap. Molecule's A2A is AAIF-governed (Linux Foundation + Anthropic + Google + Microsoft co-signatories) — stronger governance coalition. - -**Worth borrowing:** OpenAPI-first protocol design → generates client SDKs automatically. Streaming task status via REST SSE is cleaner than polling. Consider exposing Molecule's A2A via an ACP compatibility shim for IBM enterprise accounts. - -**Terminology collisions:** "tasks" — both use task as the primary coordination unit. "agents" — identical overlap. "runs" (ACP run lifecycle) ≈ our workspace active_task. - -**Signals to react to:** ACP adopted by a major enterprise vendor (SAP, Salesforce, IBM Watson) → Molecule needs ACP bridge. ACP merges with A2A under AAIF → de-duplication milestone. GitHub Copilot CLI ships ACP support (already in preview Jan 2026) → ACP is a GitHub-distribution channel. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** ⚠️ ARCHIVED Aug 27, 2025 — IBM contributed to AAIF/A2A working group; no active development. A2A won the protocol consolidation. No action needed. - ---- - -### smolagents — `huggingface/smolagents` - -**Pitch:** "The simplest library to build powerful agents" — Hugging Face's barebones, code-first agent framework. - -**Shape:** Python, Apache-2.0, 26.5k★, ~1,000 lines of core library code. Primary primitive is `CodeAgent`: instead of emitting tool calls as JSON, the agent writes executable Python that calls tools directly — "thinking in code." Model-agnostic via LiteLLM (OpenAI, Anthropic, Mistral, Ollama, etc.). Sandboxed code execution via E2B, Modal, Docker, or Pyodide (WASM). Hugging Face Hub integration for sharing reusable tools and agents. Multimodal support (text, vision). CLI utilities (`smolagent`, `webagent`). Companion: `huggingface/agents-course` for onboarding. - -**Overlap with us:** (1) Code-first agent execution sits at the same runtime layer as `molecule-ai-workspace-template`. (2) Tool sharing via Hub = a public registry alternative to our internal tool registry. (3) Sandboxed execution (E2B/Docker) mirrors our Docker workspace isolation model. (4) Multimodal + model-agnostic design aligns with our workspace-template flexibility goals. (5) 26.5k★ + Hugging Face distribution = strong community pull for developers who land here before Molecule. - -**Differentiation:** Single-agent, no multi-agent orchestration, no A2A protocol, no org hierarchy, no canvas, no scheduling, no workspace lifecycle management. "Barebones by design" — Molecule is the governance + multi-tenant + orchestration layer smolagents explicitly omits. smolagents' code execution sandbox is local-process; Molecule provides a full Docker workspace per agent. - -**Worth borrowing:** CodeAgent pattern (agent writes Python to call tools) as an optional execution mode for workspace-template. Hub-based tool registry concept — could inform a public Molecule tool/template marketplace. E2B integration pattern for lightweight sandboxing of short-lived tasks. - -**Terminology collisions:** "agents" (identical), "tools" (identical), "CodeAgent" ≈ our workspace-template code execution runner. - -**Signals to react to:** Monitor HF Hub publish progress (template in active development). If SmolAgents ships native A2A → shim becomes zero-LOC, elevate template priority. If HuggingFace officially designates smolagents as the default agent runtime for HF Spaces → distribution advantage increases, fast-track release. Docker-in-Docker gotcha: default must be `executor_type="local"` (AST-sandboxed); `DockerExecutor` requires `--privileged` and must never be the default. - -**Threshold override note:** BUILD authorized at 26,688★ (below 30k criterion). Rationale: HuggingFace corporate backing, zero-cost `Tool.from_langchain` integration path (~145 LOC A2A shim — `fastapi-agents` SmolagentsAgent validates pattern), and ~60-day trajectory to 30k. Waiting risked a community fork defining the integration pattern before us. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 26,688★, Python, Apache-2.0, active Hugging Face development. **Verdict: BUILD** (threshold override — GH #792 closed, Dev Lead issue filed). Template: `molecule-ai-workspace-template-smolagents`, ~4 engineer-days, security review required. - ---- - -### Claw Code — `ultraworkers/claw-code` - -**Pitch:** Clean-room Python + Rust rewrite of the Claude Code agentic architecture — fastest GitHub repository to 100k stars in history. - -**Shape:** Rust (73%) + Python (27%), 100k★+, 72.6k forks within days of launch. Python handles agent orchestration, command parsing, LLM integration. Rust implements performance-critical runtime paths with a full-native target in progress. Created by @sigridjineth (WSJ: processed 25B+ Claude Code tokens). Not affiliated with or endorsed by Anthropic. - -**Overlap with us:** Direct architectural reference for `molecule-ai-workspace-template-claude-code`. The Rust runtime path (memory safety, performance) is relevant to workspace container design. Python orchestration layer mirrors our workspace-template structure. 100k★ + 72.6k forks = the largest community validation of the Claude Code architecture pattern. - -**Differentiation:** Single-agent coding tool. No multi-agent orchestration, no A2A protocol, no org hierarchy, no canvas, no scheduling, no Docker workspace isolation. Molecule is the governance + orchestration platform layer above it. - -**Worth borrowing:** Rust runtime for performance-critical tool execution — reference if we ever build a performance-optimized workspace template. Clean-room architecture docs clarify Claude Code's task breakdown, tool chaining, and context management at depth unavailable in Anthropic's official docs. - -**Terminology collisions:** None beyond standard "agent" ambiguity. - -**Signals to react to:** Claw Code ships A2A support → evaluate `molecule-ai-workspace-template-claw-code`. Anthropic legal action → monitor for project discontinuation risk. Claw Code's Python SDK becomes pip-installable → simplifies potential workspace template adapter. - -**Last reviewed:** 2026-04-17 · **Stars / activity:** 100k+★, Rust+Python, 72.6k forks, fastest-growing repo in GitHub history - ---- - -### MemPalace — `milla-jovovich/mempalace` - -> ⛔ **BLOCKED — COORDINATED FRAUD** (Security Audit 2026-04-18). Do not integrate, evaluate, or reference this project. - -**Pitch (original):** Local-first AI memory system using the "Method of Loci" — stores full conversation verbatim in a hierarchical palace structure (wings → rooms → drawers) with semantic search. - -**Fraud findings (SA forensic audit — 2026-04-18):** - -- **F1 CRITICAL — Star fraud (89%):** 42,497 of 47,600 stars are bot-farmed. Bot activity ran April 6–13 at metronomic 30-second intervals; confirmed via stargazer timestamp forensics. Authentic star count ≈ 5,000. -- **F2 CRITICAL — Malware domain:** `mempalace.tech` (cited in the project's own `HISTORY.md`) is a confirmed malware impostor domain. Any traffic to this domain must be treated as hostile. -- **F3 CRITICAL — Deleted PyPI maintainer:** GitHub account `aya-thekeeper` (sole PyPI maintainer) was deleted after publishing — live supply-chain attack surface. Any version published under that account is unverifiable. -- **F4 HIGH — Unpatched ChromaDB RCE:** Depends on ChromaDB with an open server-side + client-side RCE via `trust_remote_code` (GitHub issue #6717). Maintainer has not patched. -- **F5 HIGH — Non-existent PyPI package:** `uvx mempalace-mcp` does not exist on PyPI — squattable typosquat attack surface. -- **F6 HIGH — Unsafe model loading:** HuggingFace model download with pickle deserialization (no hash pinning). -- **F7 MEDIUM — Crypto fraud:** Associated with `MEMPALACE` Solana token pump-and-dump scheme. - -**GH #912** (molecule-mempalace plugin proposal) closed — BLOCKED by this audit. Do not reopen without a full independent security re-audit. - -**Last reviewed:** 2026-04-18 · **Stars / activity:** 47.6k★ claimed (89% bot-farmed; ~5k authentic), Python, MIT, v3.3.0 April 14 2026. **Verdict: BLOCKED/FRAUD** - ---- - -### chrome-devtools-mcp — `ChromeDevTools/chrome-devtools-mcp` - -**Pitch:** "Chrome DevTools for coding agents" — 29 MCP tools exposing browser navigation, input automation, network inspection, Lighthouse auditing, and performance tracing directly to AI agents. - -**Shape:** Official MCP server from Google's Chrome DevTools team (not a third-party wrapper). TypeScript, Apache-2.0, 35.9k★. 29 tools across 6 categories: input (click, type, fill_form), navigation, emulation, performance traces, network inspection, script execution + screenshots. - -**Overlap with us:** Molecule's MCP client already wires up to `opencode.json` and workspace config — this drops in as a bundled MCP server for any workspace agent. Complements existing `browser-automation` plugin (Puppeteer/CDP scraper) with DevTools-level depth: network HAR exports, JS console, Lighthouse audits, memory snapshots. - -**Differentiation:** Pure MCP server — no orchestration, no agent runtime. Molecule is the governance layer that decides *which* workspaces get browser access. - -**Worth borrowing:** Add as a recommended/bundled MCP server option in workspace templates. Instant browser-equipped agents with no build effort. - -**Terminology collisions:** None. - -**Signals to react to:** Google's own DevTools team shipping an MCP server is the strongest possible MCP adoption signal. If it becomes the canonical browser integration, Molecule's MCP client tier-1 support becomes a harder differentiator. - -**Last reviewed:** 2026-04-18 · **Stars / activity:** 35.9k★, TypeScript, Apache-2.0, official Google Chrome DevTools - ---- - -### craft-agents-oss — `lukilabs/craft-agents-oss` - -**Pitch:** Open-source desktop agent app built on Anthropic's Claude Agent SDK — multi-session inbox, 3-tier permissions, MCP + REST API connections, event-driven automations. - -**Shape:** Electron desktop app (+ headless server + CLI), TypeScript, Apache-2.0, 4.3k★, v0.8.9 released April 16 2026. Single-user; 4 LLM providers (Anthropic, Google, OpenAI, GitHub Copilot); drag-drop file attachments; automations triggered by labels, schedules, or tool usage. - -**Overlap with us:** UI-layer overlap — multi-session management, permission tiers, MCP connections, multi-LLM support all map onto Molecule's workspace lifecycle and canvas. Built on the same Claude Agent SDK stack. - -**Differentiation:** craft-agents-oss is single-user desktop; Molecule is multi-tenant org-graph with A2A inter-agent coordination. No agent-to-agent delegation, no org hierarchy, no Docker workspace isolation. - -**Worth borrowing:** 3-tier permission UI (Explore / Ask to Edit / Auto) and multi-session inbox labeling workflow are clean UX references for Molecule's workspace approval queue. - -**Terminology collisions:** "sessions" = Molecule's "workspaces"; "sources" = Molecule's "tools/plugins." Watch for user confusion. - -**Signals to react to:** 4.3k stars on launch day signals strong demand for a GUI wrapper around Claude Agent SDK. Molecule's org-chart canvas is the richer multi-tenant answer — worth differentiating loudly in positioning. - -**Last reviewed:** 2026-04-18 · **Stars / activity:** 4.3k★, TypeScript, Apache-2.0, v0.8.9 April 16 2026 diff --git a/docs/edit-history/2026-03-31.md b/docs/edit-history/2026-03-31.md deleted file mode 100644 index 87626821..00000000 --- a/docs/edit-history/2026-03-31.md +++ /dev/null @@ -1,28 +0,0 @@ -# Edit History — 2026-03-31 - -## Summary - -Added **canvas node selection, side panel, and workspace creation UI** — the first major Canvas UX features beyond the initial viewer. - -## Canvas Side Panel & Node Selection - -**New components:** - -- `canvas/src/components/SidePanel.tsx` — 420px right-side panel with 5 tabs (Details, Chat, Config, Memory, Events), opens on node click -- `canvas/src/components/CreateWorkspaceDialog.tsx` — FAB "New Workspace" button + modal form (name, role, tier, parent ID) -- `canvas/src/components/tabs/DetailsTab.tsx` — Inline edit name/role/tier, peer discovery, delete with confirmation -- `canvas/src/components/tabs/ChatTab.tsx` — A2A message/send to workspace agent via agent URL discovery -- `canvas/src/components/tabs/ConfigTab.tsx` — JSON config viewer/editor for workspace -- `canvas/src/components/tabs/MemoryTab.tsx` — Key/value memory browser with add/TTL support -- `canvas/src/components/tabs/EventsTab.tsx` — Color-coded workspace event log - -**Modified files:** - -- `canvas/src/store/canvas.ts` — Added `selectedNodeId`, `panelTab`, `selectNode()`, `setPanelTab()`, `getSelectedNode()`, `updateNodeData()`, `removeNode()`. Added `url` and `parentId` to `WorkspaceNodeData`. -- `canvas/src/components/Canvas.tsx` — Integrated SidePanel + CreateWorkspaceButton, added `onPaneClick` to deselect nodes -- `canvas/src/components/WorkspaceNode.tsx` — Click-to-select with blue ring highlight, provisioning pulse animation - -## Documentation Updated - -- `docs/frontend/canvas.md` — Added Side Panel section, Zustand store shape, Create Workspace dialog, updated node selection behavior -- `docs/edit-history/2026-03-31.md` — This file diff --git a/docs/edit-history/2026-04-01.md b/docs/edit-history/2026-04-01.md deleted file mode 100644 index a440d6b8..00000000 --- a/docs/edit-history/2026-04-01.md +++ /dev/null @@ -1,107 +0,0 @@ -# Edit History — 2026-04-01 - -## Summary - -Completed **Phase 2 end-to-end validation** (SEO agent template, Docker build fixes, A2A proxy endpoint) and **comprehensive PLAN.md rewrite** (10 → 15 phases). Two rounds of code review fixes on canvas and platform code. - -## Phase 2: End-to-End Validation - -### SEO Agent Template (8a) -- Created `workspace-configs-templates/seo-agent/config.yaml` — Tier 1, two skills, web_search tool -- Created `workspace-configs-templates/seo-agent/system-prompt.md` — SEO specialist identity -- Created `workspace-configs-templates/seo-agent/skills/generate-seo-page/SKILL.md` — 4-step SEO page generation process -- Created `workspace-configs-templates/seo-agent/skills/generate-seo-page/tools/score_seo.py` — @tool function scoring content for SEO factors (word count, keyword density, H1/H2 structure, meta description) -- Created `workspace-configs-templates/seo-agent/skills/audit-seo-page/SKILL.md` — Comprehensive SEO audit checklist - -### Docker Build Fixes (8b) -- Fixed `workspace/requirements.txt` — `a2a-python>=0.2.0` → `a2a-sdk[http-server]>=0.3.0` (correct PyPI package) -- Fixed `workspace/agent.py` — Use `ChatAnthropic` directly instead of `init_chat_model` (not available in current langchain-core). Added provider-agnostic model loading with ImportError handling. -- Fixed `workspace/skills/loader.py` — Detect tools via `isinstance(BaseTool)` instead of `is_tool` attribute (Pydantic v2 compatibility) -- Fixed `workspace/tools/delegation.py` — Removed `is_tool` attribute set (Pydantic v2 rejects arbitrary attributes on StructuredTool) - -### End-to-End Deployment Verified (8c-8d) -- Container starts, loads 2 skills, serves Agent Card at `/.well-known/agent-card.json` -- Registers with platform → status `online` with full Agent Card (skills, capabilities) -- A2A proxy forwards messages through platform to agent → agent calls Claude API -- Blocked by 429 rate limit during testing (subscription quota contention) - -## POST /workspaces/:id/a2a Proxy Endpoint (Phase 11, 17s) - -**New files:** -- `workspace-server/internal/handlers/workspace.go` — Added `ProxyA2A` handler -- `workspace-server/internal/router/router.go` — Added route - -**Behavior:** -1. Resolves workspace URL via Redis cache → DB fallback -2. Wraps bare method+params in JSON-RPC 2.0 envelope if needed -3. Forwards to agent with 120s timeout via shared `http.Client` -4. Request body capped at 1MB, response at 10MB (`io.LimitReader`) -5. Returns agent response directly - -## Code Review Fixes (Round 3) - -- `workspace.go` — Reuse package-level `http.Client`, add `io.LimitReader` for request (1MB) and response (10MB), handle `json.Marshal` error -- `delegation.py` — Removed `/a2a` suffix from target URL (A2A SDK serves at root), read retry config from env vars (`DELEGATION_RETRY_ATTEMPTS`, `DELEGATION_RETRY_DELAY`, `DELEGATION_TIMEOUT`) -- `a2a_executor.py` — Return error message on empty input instead of sending `str(parts)` to LLM, use tuple message format `("user", text)` -- `loader.py` — Move `BaseTool` import outside the per-file loop -- `score_seo.py` — Use word-boundary regex (`\b`) for keyword density instead of substring count -- `agent.py` — Catch `ImportError` for optional provider packages with clear install message - -## PLAN.md Comprehensive Rewrite - -Expanded from 10 phases to 15 phases after cross-referencing all 29 docs files: -- Added Phase 5 (Agent Management), Phase 9 (HMA Memory), Phase 12 (Code Sandbox), Phase 13 (Runtime Enhancements), Phase 15 (SaaS Preparation) -- Tracked all PRD features: F1.13 connection breakage, F4.4 ClawHub, F6.5 configurable approval rules -- Added event type annotations (AGENT_*, WORKSPACE_EXPANDED/COLLAPSED) -- Final counts: 15 phases, 107 tracked items (25 done, 81 todo, 1 partial) - -## Canvas Drag-to-Nest (Phase 3, 9e) - -- `canvas/src/store/canvas.ts` — Added `dragOverNodeId`, `setDragOverNode`, `nestNode()`, `isDescendant()` to store -- `canvas/src/components/Canvas.tsx` — Wrapped in `ReactFlowProvider`, uses `getIntersectingNodes()` during `onNodeDrag` for overlap detection, `onNodeDragStop` to finalize nesting or un-nesting -- `canvas/src/components/WorkspaceNode.tsx` — Green ring highlight (`border-green-500 ring-2 scale-105`) when node is a valid drop target - -## Provisioner Package (Phase 4, 10a-10g) - -- Created `workspace-server/internal/provisioner/provisioner.go` — Docker SDK integration with `Start()`, `Stop()`, `IsRunning()` -- Wired into workspace creation: `POST /workspaces` with `template` field triggers auto-provisioning -- Added `POST /workspaces/:id/retry` endpoint for failed workspaces -- Secret injection from `workspace_secrets` table -- Lifecycle: provisioning → online (via heartbeat) or → failed (3min timeout) -- Provisioner initialized in `main.go` with graceful degradation if Docker unavailable - -## Agent Management (Phase 5, 11a-11d) - -- Created `workspace-server/internal/handlers/agent.go` with 4 endpoints -- `POST /workspaces/:id/agent` — assign (AGENT_ASSIGNED) -- `PATCH /workspaces/:id/agent` — replace model (AGENT_REPLACED) -- `DELETE /workspaces/:id/agent` — remove (AGENT_REMOVED) -- `POST /workspaces/:id/agent/move` — move between workspaces (AGENT_MOVED on both) - -## Bundle Export/Import (Phase 6, 12a-12c) - -- Created `workspace-server/internal/bundle/` package (types.go, exporter.go, importer.go) -- `GET /bundles/export/:id` — serialize workspace → bundle JSON with recursive sub-workspaces -- `POST /bundles/import` — create workspace records + trigger provisioner from bundle -- Created `workspace-server/internal/handlers/bundle.go` - -## A2A Proxy Fix - -- Fixed `POST /workspaces/:id/a2a` — injects `messageId` into `params.message` (required by a2a-sdk v0.3+) -- Fixed stale process on port 8080 serving old binary - -## Code Review Fixes (Round 4) - -- Removed debug log lines from ProxyA2A -- Guard short ID slicing in provisioner and importer (`[:12]`, `[:8]`) -- Fixed `findConfigDir` to match by workspace name in config.yaml -- Handle `os.WriteFile` errors in bundle importer -- Handle DB error in agent Move handler (not just ErrNoRows) -- Handle `json.Unmarshal` error in bundle exporter -- Use optional chaining in `isDescendant` store function - -## CLAUDE.md Updates - -- Added API routes: config, memory, a2a, retry, agent CRUD, bundles -- Fixed migration count from 5 to 6 -- Added `CONFIGS_DIR` env var diff --git a/docs/edit-history/2026-04-02.md b/docs/edit-history/2026-04-02.md deleted file mode 100644 index 3502f3fa..00000000 --- a/docs/edit-history/2026-04-02.md +++ /dev/null @@ -1,96 +0,0 @@ -# Edit History — 2026-04-02 - -## Summary - -Added **Settings tab** (per-workspace LLM/API key configuration), **Terminal tab** (shell access into containers), **Restart button** for offline workspaces, **editable Agent Card**, and **coding-agent workspace template** (OpenClaw-style). Six rounds of code review fixes for security, cleanup, and robustness. - -## Settings Tab (LLM & API Keys) - -**New files:** -- `canvas/src/components/tabs/SettingsTab.tsx` — Quick-set rows for ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, SERP_API_KEY, MODEL_PROVIDER. Custom env var editor. Values stored via `/workspaces/:id/secrets`, never exposed to browser. -- `workspace-server/internal/handlers/secrets.go` — GET/POST /workspaces/:id/secrets (keys only), DELETE /workspaces/:id/secrets/:key, GET /workspaces/:id/model. UUID validation on workspace ID, BYTEA scan for future encryption compat. - -## Terminal Tab (Container Shell Access) - -**New files:** -- `canvas/src/components/tabs/TerminalTab.tsx` — xterm.js terminal with dark theme, WebSocket to `/workspaces/:id/terminal`, status bar, reconnect button, proper cleanup on unmount. -- `workspace-server/internal/handlers/terminal.go` — WebSocket upgrade, Docker exec /bin/sh, bridges stdin/stdout. Restricted origins (localhost only), shared Docker client from provisioner, 30min idle timeout. - -## Restart Button for Offline/Failed Workspaces - -- `workspace-server/internal/handlers/workspace.go` — Added `POST /workspaces/:id/restart`. Works for offline/failed/degraded. Stops existing container, resets to provisioning, auto-finds template by normalizing workspace name. -- `canvas/src/components/tabs/DetailsTab.tsx` — Green Restart/Retry button visible when workspace is offline, failed, or degraded. - -## Editable Agent Card - -- `canvas/src/components/tabs/DetailsTab.tsx` — AgentCardEditor component. Click "Edit Agent Card" → JSON textarea → Save pushes via `POST /registry/update-card`. Validates JSON, shows success toast. - -## Coding Agent Template (OpenClaw-style) - -**New template: `workspace-configs-templates/coding-agent/`** -- `config.yaml` — Tier 2, 4 skills, filesystem + web_search tools -- `system-prompt.md` — Full-stack engineer identity -- `skills/code-generation/` — SKILL.md + `file_ops.py` (read_file, write_file, list_files, search_code with path traversal protection) -- `skills/shell-exec/` — SKILL.md + `shell.py` (run_shell with timeout, output truncation, destructive command blocklist, process kill on timeout) -- `skills/code-review/` — SKILL.md (review checklist) -- `skills/debug-assist/` — SKILL.md (debug process) - -**Infrastructure:** -- `workspace/Dockerfile` — Added `/workspace` volume -- `workspace-server/internal/provisioner/provisioner.go` — Mount `ws-{id}-workspace` named volume for Tier 2+ (Tier 1 stays read-only) - -## A2A Error Handling Fixes - -- `workspace/a2a_executor.py` — Catch exceptions from `agent.astream()`, return as agent message. Handle Anthropic content blocks (list of dicts). -- `canvas/src/components/tabs/ChatTab.tsx` — Handle JSON-RPC error responses separately from results. Show "Agent error: ..." instead of "(empty response)". -- `workspace-server/internal/handlers/workspace.go` — Inject `messageId` into A2A proxy requests (required by a2a-sdk v0.3+). - -## Code Review Fixes (Rounds 4-6) - -- **Round 4**: Remove debug logs, guard short ID slicing, fix findConfigDir name matching, handle WriteFile errors, handle DB error in agent Move, handle json.Unmarshal error, use optional chaining in isDescendant. -- **Round 5**: Restrict WebSocket origins to localhost, use request context, share Docker client, fix reconnect via connectKey, fix cleanup with refs, validate UUID, per-row saving state, add session timeout, scan BYTEA as []byte. -- **Round 6**: Fix path traversal in file_ops (_resolve validates path stays within WORKSPACE_DIR), add command blocklist in shell.py, kill subprocess on timeout, return error when no template found in Restart. - -## SidePanel Updates - -- Expanded from 5 tabs to 7 tabs (added Settings and Terminal) -- Widened from 420px to 480px -- Tab bar now scrollable with `overflow-x-auto` -- PanelTab type expanded: `"details" | "chat" | "settings" | "terminal" | "config" | "memory" | "events"` - -## Arbitrary Prompt Files (prompt_files support) - -**Problem:** Different agent frameworks use different file structures — OpenClaw has SOUL.md/AGENTS.md/HEARTBEAT.md/etc, Claude Code uses CLAUDE.md, Codex uses AGENTS.md. Our runtime only supported `system-prompt.md`. - -**Solution:** Added `prompt_files` field to `config.yaml` that lists which markdown files to load (in order) as the system prompt: - -```yaml -# OpenClaw-style -prompt_files: [SOUL.md, BOOTSTRAP.md, AGENTS.md, HEARTBEAT.md, TOOLS.md, USER.md] - -# Claude Code-style -prompt_files: [CLAUDE.md] - -# Default (backwards compatible — if omitted, loads system-prompt.md) -``` - -**Files changed:** -- `workspace/config.py` — Added `prompt_files` field to WorkspaceConfig -- `workspace/prompt.py` — `build_system_prompt()` loads prompt_files in order, falls back to `system-prompt.md` -- `workspace/main.py` — Pass `config.prompt_files` to `build_system_prompt()` - -**Coding agent updated to use OpenClaw-style files:** -- Renamed `system-prompt.md` → `SOUL.md` (core identity) -- New `BOOTSTRAP.md` (project orientation on first task) -- New `TOOLS.md` (tool usage guidelines and safety rules) -- New `AGENTS.md` (multi-agent delegation protocol) -- `config.yaml` updated: `prompt_files: [SOUL.md, BOOTSTRAP.md, TOOLS.md, AGENTS.md]` - -**Tested with 5 agent frameworks:** -- OpenClaw (7 files) — PASS -- Claude Code (CLAUDE.md) — PASS -- OpenAI Codex (AGENTS.md) — PASS -- SEO Agent (backwards compat, no prompt_files) — PASS -- Coding Agent (OpenClaw-style, 4 files) — PASS - -Full Docker skill loading and prompt assembly tests also pass (4/4 templates, 9/9 content checks). diff --git a/docs/edit-history/2026-04-04.md b/docs/edit-history/2026-04-04.md deleted file mode 100644 index 1b6060b0..00000000 --- a/docs/edit-history/2026-04-04.md +++ /dev/null @@ -1,210 +0,0 @@ -# Edit History — 2026-04-04 - -## Summary - -Major session covering **file explorer**, **template import/replace**, **bundle drop zone**, **team expansion**, **canvas toolbar**, **viewport persistence**, **hot-reload**, **WebSocket events**, **Docker Compose**, **plugin system** (ECC + Superpowers), **production hardening** (graceful shutdown, rate limiting, AES-256 secrets), **human-in-the-loop** approval chain, **HMA memory** (L1/L2/L3), **coordinator pattern**, **code sandbox**, **memory consolidation**, **Langfuse trace preview**, **search dialog**, **toast notifications**, **OpenRouter provider**, **skill installer**, **ClawHub integration**, **team zoom-in**, **connection breakage visualization**, and 15 code review rounds. UI E2E tested via browser: 95/104 plan items done (91%). - -## File Explorer Tab (FilesTab) - -- Tree view with collapsible directories and file type icons -- Inline code editor with monospace font, Ctrl/Cmd+S save, Tab inserts spaces -- Create new files with path input, delete with confirmation -- Platform endpoints: `GET/PUT/DELETE /workspaces/:id/files/*path` for individual files, `GET /workspaces/:id/files` for tree listing -- Path traversal protection via `validateRelPath()` -- `useMemo` for tree building, ref-based cursor positioning - -## Agent Import & Replace via UI - -- **Import Agent Folder**: button in template palette, uses `webkitdirectory` file picker, uploads to `POST /templates/import` -- **Replace Agent Files**: button in DetailsTab, uploads folder to `PUT /workspaces/:id/files`, auto-restarts, confirmation before destructive replace -- Auto-generates `config.yaml` detecting prompt_files and skills from uploaded files -- CLI script `import-agent.sh` also available - -## Canvas Visual Overhaul - -- **Toolbar**: fixed top-center bar with Molecule AI logo, live status counts, workspace total -- **WorkspaceNode redesign**: status gradient bar, team badge (child count), pill-shaped handles, 4 skill badges -- **SidePanel**: slide-in animation, tab icons, UUID footer, darker backdrop blur -- **ContextMenu**: status dot header, expand/collapse team actions -- **Animations**: node fade-in on mount, panel slide-in from right -- **Scrollbar**: custom thin zinc styling - -## Bundle & Team Features - -- **BundleDropZone**: drag `.bundle.json` onto canvas to import, visual overlay, progress spinner, toast -- **Context menu**: Export Bundle downloads file, Duplicate exports+re-imports -- **Team expansion APIs**: `POST /workspaces/:id/expand` creates sub-workspaces from config, `POST /collapse` removes them -- **Context menu**: Expand to Team / Collapse Team -- **bundle-compile.sh**: compiles all templates to `.bundle.json` artifacts - -## Runtime & Infrastructure - -- **Viewport persistence**: `GET/PUT /canvas/viewport`, debounced save on pan/zoom, restore on load -- **Hot-reload watcher** (`watcher.py`): polls config dir for hash changes, debounced reload -- **WebSocket subscriber** (`events.py`): connects to platform `/ws`, triggers prompt rebuild on peer events -- **Langfuse auto-injection**: detects env vars, creates CallbackHandler -- **Cross-workspace trace linking**: delegation tool passes `parent_task_id` in A2A metadata -- **Workspace forwarding**: discovery follows `forwarded_to` chain (max 5 hops) -- **Full Docker Compose**: 6 services with health checks on shared `molecule-monorepo-net` -- **Auto-discover configs dir**: `findConfigsDir()` searches parent directories - -## Bug Fixes - -- **A2A executor**: switched from `astream` to `ainvoke` for reliable responses across models -- **Chat response parsing**: handle `result.parts[]` with `kind` field (a2a-sdk v0.3 format) -- **Terminal handler**: tries multiple container name patterns (ws-{id}, workspace-name) -- **A2A proxy**: injects `messageId` into `params.message` (required by a2a-sdk) -- **Path traversal**: `validateRelPath()` blocks `../` and absolute paths in file uploads -- **Shared utilities**: `normalizeName()`, `writeFiles()`, `generateDefaultConfig()` extracted to eliminate 3x duplication - -## ECC & Superpowers Integration - -### Workspace Templates -- **ecc-coding-agent**: Everything Claude Code with 10 curated skills (coding-standards, tdd-workflow, e2e-testing, security-review, api-design, backend-patterns, frontend-patterns, deep-research, shell-exec), CLAUDE.md + AGENTS.md + WORKING-CONTEXT.md as prompt files -- **superpowers-agent**: obra/superpowers with 16 skills (14 from superpowers + shell-exec + file-ops), code-reviewer agent, 3 commands (brainstorm, write-plan, execute-plan) -- **import-ecc.sh**: CLI script to import individual or all 156 ECC skills as templates - -### Plugin System (integrated into every workspace) -- `workspace/plugins.py`: scans `/plugins/` for installed plugins, loads rules/*.md, prompt fragments, and skills directories -- `plugins/ecc/`: ECC guardrails rules + 5 shared skills (coding-standards, tdd-workflow, security-review, api-design, deep-research) + AGENTS.md prompt fragment -- `plugins/superpowers/`: 5 shared skills (test-driven-development, systematic-debugging, writing-plans, executing-plans, verification-before-completion) -- Every workspace agent auto-inherits plugin rules + skills (deduplicated by ID, workspace skills take priority) -- Provisioner mounts `/plugins:ro` into every container -- docker-compose.yml mounts `./plugins` for platform - -### Prompt Integration -- `prompt.py`: accepts `plugin_rules` and `plugin_prompts` parameters -- Rules injected as "Platform Rules" section -- Prompt fragments injected as "Platform Guidelines" section -- Plugin skills merged after workspace skills (deduplicated) - -## Code Review Fixes (Rounds 7-9) - -- Round 7: path traversal fix, dedup normalization/config generation, file upload limit (200), delete confirmation for replace -- Round 8: remove unused `getLang`, success timer ref cleanup, delete confirmation for files, textarea ref for Tab key, `useMemo` for tree -- Round 9: deduplicate plugin skills by ID (workspace takes priority), remove arbitrary 50-char prompt size filter -- Round 10: rate limiter goroutine leak (accept context), crypto key warning on invalid, .env.example sync -- Round 11: unused useReactFlow import, search state lifted to store, synthetic event replaced -- Round 12: remove unnecessary comment in Toolbar -- Round 13: N+1 approval polling → single endpoint, auto-expiry, TEAM scope query fix, configurable poll env vars -- Round 14: sandbox shell injection (mount temp file), consolidation fallback path, SandboxConfig wired to load_config -- Round 15: executor returns only AI messages (skip tool results), locale middleware narrowed to known codes -- Round 16: MCP server network error handling + startup platform validation -- Round 17: parallel file downloads in Files tab - -## MCP Server - -- `mcp-server/` — TypeScript MCP server using `@modelcontextprotocol/sdk`, stdio transport -- **20 tools**: list/create/get/delete/restart workspaces, chat with agent, assign model, set/list secrets, list/read/write/delete files, commit/search memory, list templates, expand/collapse team, list/decide approvals -- Works with Claude Code, Cursor, Codex, OpenCode — any MCP client -- Added to `.mcp.json` for immediate Claude Code integration -- Startup validates platform connectivity, logs warning with fix instructions -- Network errors return friendly messages instead of crashing - -## Embedded Sub-Workspaces - -- **WorkspaceNode redesign**: parent nodes expand in size (320-450px) when they have children. Children render as embedded mini-cards in a "Team Members" 2-column grid inside the parent node. -- **No separate nodes/edges**: child nodes set `hidden: true` in React Flow, no parent→child edges created. Children exist in store for data access but render visually inside the parent. -- **Click child chip**: selects the child and opens its side panel. -- **Toolbar**: shows "4 workspaces + 1 sub" to distinguish root vs embedded counts. -- **Fix**: children selector memoized with `useMemo` to prevent infinite render loop (Zustand `.filter()` creates new array reference every render). - -## Files Tab Enhancements - -- **Upload**: folder picker to upload multiple files into workspace -- **Export**: download all workspace files as JSON bundle (parallel via Promise.allSettled) -- **Clear**: delete all files with red confirmation dialog -- **Download**: ↓ button for currently open file -- **Folder delete**: ✕ on hover for directories, confirmation shows "and all its contents", platform uses os.RemoveAll - -## Memory Consolidation Loop (Phase 9, 15g) - -- `consolidation.py`: runs every 5min, checks if LOCAL memories exceed threshold (10), uses agent to summarize into dense TEAM knowledge, deletes originals. Falls back to concatenation if agent unavailable/rate-limited with error-level logging. -- Configurable via `CONSOLIDATION_INTERVAL` and `CONSOLIDATION_THRESHOLD` env vars. - -## Code Sandbox (Phase 12, 18a-18b, 18e) - -- `tools/sandbox.py`: `run_code(code, language)` tool -- Subprocess backend: direct execution with timeout (default for Tier 1-2) -- Docker backend: code written to temp file, mounted read-only at `/sandbox/code.py` (no shell injection). Container runs with `--network none --memory 256m --cpus 0.5 --read-only --tmpfs /tmp:size=32m`. -- Supports python, javascript, shell/bash -- `SandboxConfig` dataclass in config.py, loaded from `sandbox:` in config.yaml -- Configurable via `SANDBOX_BACKEND`, `SANDBOX_TIMEOUT`, `SANDBOX_MEMORY_LIMIT` env vars - -## Connection Breakage Visualization (Phase 11, 17p) - -- Edges in canvas styled by child workspace status: - - Online: animated, dark zinc stroke - - Degraded: amber stroke, thicker (2px), no animation - - Offline/Failed: dashed gray stroke, no animation - -## Interactive Canvas Improvements - -- **Search dialog (⌘K)**: `SearchDialog.tsx` — fuzzy search across name/role/status, click to select + open Details, state managed in Zustand store -- **Toast notifications**: `Toaster.tsx` — global `showToast()` callable anywhere, success/error/info variants, auto-dismiss 4s, slide-up animation. Context menu actions show toasts. -- **Empty state**: `EmptyState.tsx` — shown when no workspaces, visual guide with keyboard shortcuts -- **Keyboard shortcuts**: Escape cascades (close context menu → close panel), ⌘K opens search -- **Toolbar search button**: magnifying glass + ⌘K badge, calls `store.setSearchOpen(true)` - -## Human-in-the-Loop Approval Chain (Phase 8, 14a-14f) - -- **Migration 007_approvals.sql**: `approval_requests` table with status (pending/approved/denied/escalated) -- **handlers/approvals.go**: POST create, GET list, GET /approvals/pending (single query for all), POST decide. Auto-expires pending requests older than 10 minutes. -- **Platform events**: APPROVAL_REQUESTED, APPROVAL_ESCALATED (to parent), APPROVAL_APPROVED, APPROVAL_DENIED -- **tools/approval.py**: `request_approval(action, reason)` tool. Pauses agent, polls for decision (configurable via APPROVAL_POLL_INTERVAL/APPROVAL_TIMEOUT env vars). -- **ApprovalBanner.tsx**: polls single `/approvals/pending` endpoint, shows approve/deny cards with workspace name, slide-in-from-top animation, toast feedback. -- Agent decides when to call `request_approval` based on system prompt guidelines. - -## Hierarchical Memory Architecture (Phase 9, 15a-15f) - -- **Migration 008_agent_memories.sql**: `agent_memories` table with pgvector extension, scope (LOCAL/TEAM/GLOBAL) -- **handlers/memories.go**: POST commit, GET search (with text ILIKE), DELETE - - LOCAL: workspace_id only - - TEAM: parent + siblings via parent_id join, CanCommunicate check, excludes removed workspaces - - GLOBAL: readable by all, write restricted to root (no parent_id) -- **tools/memory.py**: `commit_memory(content, scope)` and `search_memory(query, scope)` tools -- Every agent now has 4 built-in tools: delegate, approve, commit_memory, search_memory - -## OpenRouter Provider + E2E Testing - -- **agent.py**: added OpenRouter as 5th LLM provider (uses langchain-openai with custom base URL) -- **requirements.txt**: added `langchain-openai>=0.3.0` -- **Supported providers**: anthropic, openai, openrouter, google_genai, ollama -- **E2E test results**: 40/41 pass (SEO + Echo agents via OpenRouter claude-3.5-haiku). Tests: workspace CRUD, agent management, registry/discovery, heartbeat/status, A2A chat, secrets, HMA memory, approvals, config, templates, viewport, events, files, cascade delete. - -## Langfuse Trace Preview (Phase 10, 16c) - -- **handlers/traces.go**: `GET /workspaces/:id/traces` proxies to Langfuse API -- **TracesTab.tsx**: expandable trace list with input/output, latency, tokens, cost -- **SidePanel**: 9 tabs now (added Traces) - -## Team Zoom-in + Skill Installer + ClawHub - -- **Canvas zoom-in (13g)**: double-click team node → fitBounds animation -- **SkillInstaller (17n)**: type skill name → creates SKILL.md in workspace files -- **ClawHub (17q)**: "Install from ClawHub" button sends install command via A2A - -## Coordinator Pattern (Phase 7, 13c) - -- `workspace/coordinator.py`: auto-detects children on startup, injects team description into prompt, adds `route_task_to_team` tool -- When workspace has children → becomes coordinator that routes A2A messages to best-suited child based on capabilities -- Coordination rules injected: analyze task, choose member, delegate, aggregate, fallback - -## Production Hardening (Phase 14) - -### Graceful Shutdown (20e) -- `main.go`: signal handler for SIGINT/SIGTERM, context cancellation stops liveness monitor + Redis subscriber, HTTP server drains connections (30s timeout), WebSocket hub Close() disconnects all clients -- `ws/hub.go`: added Close() method - -### Rate Limiting (20d) -- `middleware/ratelimit.go`: token bucket rate limiter (100 req/min/IP), accepts context for clean shutdown, auto-cleanup of stale buckets every 5min -- Applied globally via router middleware - -### Secrets Encryption (20c) -- `crypto/aes.go`: AES-256-GCM encrypt/decrypt, enabled via `SECRETS_ENCRYPTION_KEY` env var (32 bytes raw or base64), graceful degradation to plaintext if not set, warnings on invalid key -- `secrets.go`: encrypt on write, decrypt on read -- `workspace.go`: decrypt secrets before injecting into provisioned containers - -### Delete Team Cascade (13h) -- `workspace.go` Delete handler: without `?confirm=true` returns children list for confirmation, with confirm cascades delete to all sub-workspaces (stops containers, removes from DB, broadcasts events) -- `DetailsTab.tsx`: passes `?confirm=true` after user confirms diff --git a/docs/edit-history/2026-04-05.md b/docs/edit-history/2026-04-05.md deleted file mode 100644 index 5c5c231d..00000000 --- a/docs/edit-history/2026-04-05.md +++ /dev/null @@ -1,208 +0,0 @@ -# Edit History — 2026-04-05 - -## Summary - -Session focused on **recursive sub-workspace rendering**, **eject/extract UX**, and **embedded nesting bug fixes**. Child nodes now properly hide/show when nested/un-nested, the eject button replaces the old close icon with a distinct sky-blue arrow, and sub-workspaces render recursively up to 3 levels deep with full status detail on each chip. Context menu gains "Extract from Team" action. Six code review fixes applied. API test fix for register endpoint field name. - -## Embedded Sub-Workspace Fixes (canvas/src/store/canvas.ts) - -- **`nestNode` visibility**: now sets `hidden: !!targetId` so child nodes disappear from the canvas when nested into a parent and reappear when un-nested (dragged to empty canvas) -- **`removeNode` fix**: was incorrectly reading `n.parentId` (React Flow's layout field) instead of `n.data.parentId` (the actual hierarchy field). Fixed to use `n.data.parentId`. Also properly sets `hidden` on re-parented children and simplified edge cleanup logic. - -## Eject/Extract Button (canvas/src/components/WorkspaceNode.tsx) - -- Replaced the generic close icon on embedded child chips with a new `EjectIcon` SVG (arrow pointing up-right) — visually distinct from delete -- Hover color changed from red to sky-blue to reinforce "extract" (not "delete") semantics -- Each embedded child chip shows the eject button on hover to extract from team - -## Recursive Sub-Workspaces (canvas/src/components/WorkspaceNode.tsx) - -New `TeamMemberChip` component that recursively renders children as mini-cards inside parent nodes: - -- Each sub-card mirrors the parent card layout: status dot + gradient bar, name, tier badge, skills pills, status label, active tasks count, descendant count badge -- Sub-cards can contain their own "Team" section with further nested sub-cards -- `MAX_NESTING_DEPTH = 3` constant caps recursion to prevent runaway rendering -- `countDescendants()` helper counts all descendants recursively (memoized via `useMemo`) -- Parent node dynamically sizes based on nesting depth: - - No children: 210-280px - - With children: 320-450px - - With grandchildren: 400-560px -- Badge shows total descendant count, not just direct children -- Callbacks passed as props (`onSelect`, `onExtract`) instead of individual store subscriptions per chip — avoids N+1 Zustand subscriptions - -## Context Menu Updates (canvas/src/components/ContextMenu.tsx) - -- Added `nestNode` store access -- New "Extract from Team" menu item with up-arrow icon for child nodes -- `handleRemoveFromTeam` with try/catch error handling -- Toast notification says "Extracted from team" (consistent wording with eject button) - -## Code Review Fixes Applied - -1. **`countDescendants` memoized** via `useMemo` to prevent recalculation on every render -2. **Stable `handleExtract` callback** via `useCallback` to prevent unnecessary re-renders -3. **Invalid Tailwind class** `bg-zinc-750/70` changed to valid `bg-zinc-700/70` -4. **Sub-children layout** changed from 2-column grid to `space-y-1` (single column) at all depths to prevent content overflow -5. **Removed fragile `col-span-2` class** that caused layout issues with odd numbers of children - -## Code Review Rounds 18–21 (canvas components + store) - -Comprehensive review across `WorkspaceNode.tsx`, `canvas.ts`, `ContextMenu.tsx`, `Toolbar.tsx`. All issues resolved: - -### Critical fixes -- **`countDescendants` cycle protection**: added `visited` Set parameter to prevent infinite recursion on circular `parentId` references -- **`WORKSPACE_REMOVED` re-parents children**: event handler now re-parents orphaned children to the removed node's parent and clears stale `selectedNodeId` — matching `removeNode` behavior - -### Performance fixes -- **`useHierarchyInfo` consolidated hook**: replaced separate `useChildNodes` + `allNodes` subscriptions with a single stable selector that returns children, `hasGrandchildren`, and `descendantCount` — prevents redundant re-renders on every node drag -- **`EmbeddedTeam` wrapper component**: isolates the `allNodes` store subscription to only mount when children exist, so leaf nodes don't subscribe at all -- **`Toolbar` single-pass counts**: replaced 6 `.filter()` passes with a single `useMemo` reduce loop -- **`ContextMenu` reactive selector**: replaced stale `getState()` during render with proper `useCanvasStore()` selector for `hasChildren`, moved above early return for hooks compliance - -### Type safety / cleanup -- Removed unsafe `data as unknown as WorkspaceNodeData` double cast in `openContextMenu` call -- Removed redundant `as Record | null` casts on `data.agentCard` -- Added runtime `typeof` guard for `agent_card` in `AGENT_CARD_UPDATED` event handler -- Renamed `children` prop to `members` in `EmbeddedTeam` to avoid React reserved prop name -- Removed `console.error` in `savePosition` — silent catch like other non-critical handlers -- Consistent `selectedNodeId` destructuring at top of `applyEvent` instead of separate `get()` call - -## Dual URL Routing for Agent-to-Agent Communication - -Docker containers can't reach `127.0.0.1:PORT` (that's their own loopback). Discovery endpoint now returns different URLs based on caller: - -- **Workspace caller** (`X-Workspace-ID` header present) → Docker-internal URL (`http://:8000`) -- **Canvas/proxy** (no header) → Host-mapped URL (`http://127.0.0.1:`) - -Implementation: -- `CacheInternalURL` / `GetCachedInternalURL` in `db/redis.go` — separate Redis key (`ws:{id}:internal_url`) -- Register endpoint caches agent-reported URL as internal URL -- Discovery checks internal URL first when `X-Workspace-ID` is present, falls back to host URL - -Verified both directions: Echo Agent delegated to SEO Agent (got SEO advice back), SEO Agent delegated to Echo Agent (got echo back). - -## A2A End-to-End Pipeline (8e) — Fully Working - -Verified the full pipeline: Canvas → Platform proxy (POST /workspaces/:id/a2a) → Docker agent container → OpenRouter API → LLM response. - -### Infrastructure fixes to make it work - -1. **`findConfigsDir` validation** (main.go): auto-discovery was finding a stale empty `workspace-server/workspace-configs-templates/` dir before the real one at `../workspace-configs-templates/`. Fixed by requiring at least one template with `config.yaml` inside the dir. -2. **`PLATFORM_URL` for Docker containers** (main.go): was hardcoded to `http://localhost:PORT`. Containers can't reach host's localhost. Changed to `http://host.docker.internal:PORT`. Now configurable via `PLATFORM_URL` env var. -3. **Host port mapping** (provisioner.go): platform runs on host but agents run in Docker. Added ephemeral host port binding (`127.0.0.1:0→8000/tcp`) and resolved actual port via `ContainerInspect` after start. -4. **Provisioner URL preservation** (workspace.go + registry.go): provisioner returns `http://127.0.0.1:PORT` URL, but agent self-registration overwrites it with Docker-internal hostname. Fixed: pre-store provisioner URL in DB+Redis; register endpoint preserves URLs starting with `http://127.0.0.1`. - -### Code review fixes (round 22) -- Provisioner URL storage errors now logged (were silently ignored) -- Registration reads URL from DB instead of Redis (avoids TTL race condition) -- Test timeout configurable via `A2A_TIMEOUT` env var - -### OpenRouter max_tokens fix (workspace/agent.py) -- LangChain ChatOpenAI defaults to 64000 max_tokens which exceeds free-tier credits -- Added `MAX_TOKENS` env var (default 2048) for OpenRouter provider - -## Bundle Round-Trip Test (12j) - -Added to `test_api.sh`: export → delete → import → verify name/tier/agent_card match with new ID. 9 new assertions, all passing. - -## Comprehensive A2A E2E Test Suite (test_a2a_e2e.sh) - -New test script with 22 assertions across 12 test scenarios using free `google/gemini-2.5-flash` via OpenRouter: - -1. Basic message/send — Echo Agent -2. Basic message/send — SEO Agent -3. Auto JSON-RPC envelope wrapping (bare request) -4. Full JSON-RPC 2.0 envelope with custom ID preserved -5. Invalid method returns -32601 error -6. Offline workspace returns error -7. Nonexistent workspace returns 404 -8. Multi-turn conversation -9. Long input handling (50 sentences) -10. Peer discovery (agents see each other) -11. Agent cards reflect skills -12. Heartbeat updates uptime - -## Activity Logging, A2A Communication Tracking, and Current Task Visibility - -Full-stack feature for comprehensive workspace activity logging, inter-agent communication visibility, and real-time current task display. - -### Backend (Go Platform) -- **Migration 009** (`workspace-server/migrations/009_activity_logs.sql`): new `activity_logs` table (workspace_id, activity_type, source/target, method, summary, request/response JSONB, duration_ms, status, error_detail) with composite index. Added `current_task TEXT` to workspaces table. -- **Activity handler** (`workspace-server/internal/handlers/activity.go`): `GET /workspaces/:id/activity` (list with type filter + limit cap at 500), `POST /workspaces/:id/activity` (agent self-report with type validation) -- **A2A proxy logging** (`workspace.go`): ProxyA2A now logs every request/response to activity_logs with method, duration, status. Uses `context.WithoutCancel` for async goroutine. -- **Heartbeat current_task** (`registry.go`): HeartbeatPayload extended with `current_task`. Reads prev value before UPDATE, only broadcasts `TASK_UPDATED` on change. -- **BroadcastOnly** (`broadcaster.go`): WebSocket-only broadcast (no structure_events insert) for high-frequency events. -- **Activity retention**: Background goroutine in `main.go` with configurable retention via `ACTIVITY_RETENTION_DAYS` (default 7) and `ACTIVITY_CLEANUP_INTERVAL_HOURS` (default 6) env vars. - -### Frontend (Canvas) -- **ActivityTab** (`canvas/src/components/tabs/ActivityTab.tsx`): Comprehensive activity log viewer with type filters (All, A2A In/Out, Tasks, Logs, Errors), color-coded entries, A2A flow visualization (source→target), expandable request/response JSON, 5s auto-refresh with live/paused toggle. -- **Current task display**: Amber pulsing banner in WorkspaceNode cards and SidePanel header when agent has active task. -- **Store updates**: `currentTask` field in WorkspaceNodeData, `TASK_UPDATED` event handler, `"activity"` panel tab. - -### MCP Server -- Added `list_activity` tool with type/limit filters. - -### Tests (36 new tests) -- **Go**: 25 total (was 14). Added: TaskChanged/Unchanged/Cleared heartbeat, Activity List/ListByType/ListEmpty/ListCustomLimit/ListMaxLimit, Report/ReportAllValidTypes/ReportMissingBody/Report_InvalidType, WorkspaceGet_CurrentTask. -- **Canvas Vitest**: 58 total (was 52). Added: TASK_UPDATED set/clear/unknown/edge cases, ACTIVITY_LOGGED no-op, hydrate currentTask, setPanelTab activity. -- **Integration** (`test_api.sh`): ~62 checks (was ~43). Added 19 activity + current_task checks. -- **E2E** (`test_activity_e2e.sh`): New script with 25 tests requiring 1 online agent — A2A logging verification, self-report, filtering, task lifecycle, cross-workspace isolation. - -## API Test Fix - -- Register endpoint test updated to use `id` field instead of `workspace_id` — discovered during E2E testing that the platform expects the field named `id` - -## CI Pipeline & Test Infrastructure (PM Review Session) - -PM review identified 7 action items: zero test coverage, no CI, no branch protection, stale tasks, no release tags, incomplete bundle round-trip test. All addressed in this session. - -### GitHub Actions CI (`.github/workflows/ci.yml`) -- 4 parallel jobs: Go build+vet+test, Canvas build+vitest, MCP Server build, Python pytest -- Triggers on push to main and PRs targeting main -- Caching: npm for Canvas/MCP, pip for Python, Go modules via setup-go -- Go version set to `stable` (go.mod says 1.25 which doesn't exist in Actions yet) -- Test steps fail on real failures (no `|| true` swallowing) - -### Canvas Store Tests (47 tests) — `canvas/src/store/__tests__/canvas.test.ts` -- Vitest setup with `vitest.config.ts` (node environment, `@/` path alias) -- Tests: selectNode, hydrate (3), applyEvent (11 covering 6 event types), removeNode (5), isDescendant (6), updateNodeData (2), context menu (2), setPanelTab (2), getSelectedNode (3), savePosition (1), saveViewport (1), nestNode (4 including API revert), misc setters (3) -- Global fetch mock with per-test override for API-calling actions - -### Go Handler Tests (9 tests) — `workspace-server/internal/handlers/handlers_test.go` -- Uses go-sqlmock for DB, miniredis for Redis, real Broadcaster with no-op Hub -- Tests: Register (upsert+event), Heartbeat normal/degraded/recovery (status transitions), WorkspaceCreate (201+provisioning), WorkspaceList (multi-row scan), ProxyA2A wrapping/404/503 -- Each test isolates globals via `t.Cleanup` - -### Python Runtime Tests (45 tests) — `workspace/tests/` -- pytest with conftest.py mocking a2a SDK modules (heavy external dep) -- test_config.py (12): load_config, defaults, env overrides, nested configs, FileNotFoundError -- test_heartbeat.py (9): init, record_success/error, error_rate, async HTTP POST, stop -- test_prompt.py (9): prompt files, fallback, plugins, skills, peers, JSON agent_card -- test_skills_loader.py (7): frontmatter parsing, defaults, load_skills, missing SKILL.md -- test_a2a_executor.py (7): text extraction, empty parts, errors, content blocks - -### Stale Task Cleanup -- Closed 4 awareness tasks from April 1 that were already completed: A2A endpoint, templates endpoint, ANTHROPIC_API_KEY (now uses OpenRouter), garbage task - -### PLAN.md -- Marked 12j (bundle round-trip test) as done — test already existed in test_api.sh - -## Parent Context Inheritance Feature - -Implements automatic context file sharing from parent workspaces to direct children, closing the gap between the HMA docs (L2 Team Memory as "Department Drive") and the actual implementation. - -### How It Works -1. Parent declares `shared_context: [architecture.md, conventions.md]` in config.yaml -2. Platform injects `PARENT_ID` env var when provisioning children during Expand -3. Child calls `GET /workspaces/{parent_id}/shared-context` at startup -4. Parent's shared files injected into child's system prompt as `## Parent Context` -5. Grandchildren only see their direct parent's context (1-level inheritance) - -### Files Changed -- `workspace/config.py` — Added `shared_context` field -- `workspace-server/internal/handlers/team.go` — Inject `PARENT_ID` env var during Expand -- `workspace-server/internal/handlers/templates.go` — New `SharedContext` endpoint -- `workspace-server/internal/router/router.go` — Register new route -- `workspace/coordinator.py` — New `get_parent_context()` function -- `workspace/prompt.py` — Added `parent_context` param to `build_system_prompt()` -- `workspace/main.py` — Wire parent context into startup diff --git a/docs/edit-history/2026-04-06.md b/docs/edit-history/2026-04-06.md deleted file mode 100644 index c94b7f1b..00000000 --- a/docs/edit-history/2026-04-06.md +++ /dev/null @@ -1,302 +0,0 @@ -# Edit History — 2026-04-06 - -## Summary - -Merged PR from `HongmingWang-Rabbit/molecule-monorepo#1` (Claude Code workspace runtime + A2A delegation + canvas improvements — 46 commits, 2,548 additions). Then performed comprehensive code review across all 3 layers (Python, Go, TypeScript) and fixed 18 issues (5 critical, 10 warnings, 3 suggestions). - -## Merged PR: Claude Code Workspace Runtime - -- **CLI-based workspace runtimes** — unified executor for Claude Code, Codex, Ollama, or custom CLI agents -- **A2A delegation via MCP + CLI** — `delegate_task`, `delegate_task_async`, `check_task_status`, `list_peers` -- **Canvas improvements** — legend panel, communication overlay, chat persistence with session sidebar, confirmation dialogs, enhanced thinking indicator -- **Platform fixes** — offline→online heartbeat recovery, file API writes to correct config dir, restart uses workspace's own config, configurable rate limiter, Docker-in-Docker mount resolution -- **Security** — unique temp files, shlex.quote for tokens, subprocess kill on timeout, path traversal prevention - -## Code Review Fixes (18 issues) - -### Critical (5 fixed) - -1. **ChatTab.tsx** — Elapsed time calculation was `Date.now() - Date.now() + thinkingStartTime` (always equals `thinkingStartTime`). Fixed to `Date.now() - thinkingStartTime`. -2. **Canvas.tsx** — `saveTimerRef` debounce timer never cleared on component unmount. Added `useEffect` cleanup. -3. **workspace.go Update handler** — All 5 `ExecContext` calls in `Update()` silently discarded errors. Added `log.Printf` on each. -4. **workspace.go Delete handler** — All 4 cascade delete `ExecContext` calls ignored errors. Added `log.Printf` on each. -5. **cli_executor.py** — Temp files leaked if exception occurred between `mkstemp` and `_temp_files.append()`. Moved `append()` immediately after creation. - -### Warnings (10 fixed) - -6. **a2a_cli.py** — `resp.json()` could crash on malformed JSON response. Wrapped in try/except. -7. **a2a_mcp_server.py** — `chunk.decode()` could crash on invalid UTF-8. Added `errors="replace"`. -8. **a2a_cli.py** — Async mode timeout returned misleading `"submitted_timeout"` status. Changed to `"uncertain"` on stderr. -9. **templates.go** — Config files written with 0644 (world-readable). Changed all 4 occurrences to 0600. -10. **CommunicationOverlay.tsx** — `fetchComms` callback recreated on every `nodes` change, causing interval reset. Stabilized with `useRef`. -11. **ContextMenu.tsx** — Delete confirmation dialog orphaned when context menu closed externally. Added `useEffect` cleanup. -12. **ContextMenu.tsx** — No loading guard on export/duplicate async actions. Added `actionLoading` state to prevent double clicks. -13. **cli_executor.py** — `config.args` appended after prompt, breaking CLI flag parsing. Moved before prompt. -14. **main.py** — Any non-`langgraph` runtime silently treated as CLI. Added validation warning for unknown values. -15. **provisioner.go** — Created container not cleaned up if `ContainerStart` failed. Added `ContainerRemove` on failure. - -### Suggestions (3 fixed) - -16. **router.go** — CORS origins hardcoded to localhost. Now configurable via `CORS_ORIGINS` env var (comma-separated). -17. **config.py** — `int()` conversion on tier crashed on non-numeric YAML. Added `.isdigit()` guard with default 1. -18. **ChatTab.tsx** — `loadSessions()` called twice during mount. Consolidated to single call shared between state initializers. - -## Provisioner Auto-Setup (URL Resolution) - -Fixed the core issue preventing workspace chat from working after creation without manual intervention: - -- **provisioner.go** — Now inspects container after start to resolve the actual host-mapped ephemeral port (`127.0.0.1:`), instead of returning the Docker-internal URL. The host URL is stored in DB and Redis, preserved by the registry's `ON CONFLICT` clause when the agent self-registers. -- **workspace.go** — `provisionWorkspace` now also caches the Docker-internal URL (`ws-:8000`) for inter-container discovery. -- **discovery.go** — When a workspace discovers another workspace (via `X-Workspace-ID` header), constructs the Docker-internal URL from the container name convention (`ws-:8000`) when the Redis cache is empty. This enables inter-agent A2A delegation. - -Before: create workspace → agent registers with Docker hostname → proxy gets 502 → manual re-registration needed. -After: create workspace → provisioner stores host URL → proxy works immediately. - -## Grid Layout for Embedded Team Members - -- **WorkspaceNode.tsx** — Departments render in a 3-column grid at depth 0 (was single column). Sub-teams use 2-column grid at depth 1+. Root nodes wider (720-960px) to accommodate side-by-side layout. Company org chart now fits in one screen without scrolling. - -## Chat UX Improvements - -- **ChatTab.tsx** — 502/503/timeout errors show user-friendly messages ("CEO is not responding. The agent container may not be running. Try restarting the workspace.") instead of raw API error dumps. Input disables after failure. Agent unreachable state shown in empty chat and placeholder. -- **ChatTab.tsx** — Agent and system messages now render markdown (bold, lists, code blocks, headers, tables) via `react-markdown` + `remark-gfm` + `@tailwindcss/typography`. User messages stay as plain text. - -## Workspace Config Cleanup - -- **`.gitignore`** — Added `workspace-configs-templates/ws-*` to exclude auto-generated provisioner instance configs (not templates, shouldn't be committed). -- Removed 15 stale `ws-*` instance directories from the templates folder. - -## Test Infrastructure - -- **test_api.sh** — Fixed degraded status test to re-register before high error rate heartbeat (avoids Redis TTL expiry race). -- **test_activity_e2e.sh** — Fixed assertion to match actual Go binding error field name (`ActivityType` not `activity_type`). -- Full clean-slate E2E verified: nuke → setup → create 11 workspaces → all online with HOST URLs → 21/21 tests pass (peer discovery, access control, chat, delegation, activity logs, current task, URL auto-resolution). - -## Code Review Round 2 (7 fixes) - -### Critical (2 fixed) - -1. **workspace.go** — `workspaceID[:12]` panics on IDs shorter than 12 chars. Added length guard matching `containerName()` pattern. -2. **discovery.go** — Fallback URL synthesis returned a Docker-internal URL even for non-existent or offline workspaces. Now checks workspace status (online/degraded) before constructing URL. - -### Warnings (3 fixed) - -3. **discovery.go** — `CacheInternalURL` error silently discarded (inconsistent with workspace.go). Added `log.Printf`. -4. **ChatTab.tsx** — `ReactMarkdown` rendered for both agent and system messages. System error messages (containing `*`, `#`, etc.) could produce unexpected formatting. Now only renders markdown for `role === "agent"`. -5. **ChatTab.tsx** — `thinkingStartTime` state used in `setInterval` closure was stale (captured before `setThinkingStartTime` applied). Replaced with ref + local variable captured at effect creation time. - -### Suggestions (2 fixed) - -6. **tailwind.config.ts** — `require("@tailwindcss/typography")` replaced with ESM `import typography` for consistency with TypeScript config. -7. **ci.yml** — CI Node.js bumped from 20 to 22 (LTS). Lock file (lockfileVersion 3, npm 11) had `@emnapi` resolution differences with Node 20's npm 10, causing `npm ci` to fail. - -## Code Review Round 3 (DRY + hardening) - -### Refactor: Exported `provisioner.ContainerName()` / `provisioner.InternalURL()` - -The `ws-:8000` URL construction was duplicated in discovery.go, workspace.go, and terminal.go. Exported the provisioner's existing helpers and replaced all inline duplications. Prevents drift if naming convention changes. - -### Fix: Discovery fall-through returned host URLs to container callers - -When a workspace-to-workspace discovery request hit a workspace that was offline/provisioning/failed, the code fell through to the external URL path and returned `http://127.0.0.1:` — unreachable from inside Docker. Now returns `503 workspace not available` (with status) or `404 workspace not found`. - -### Fix: Dead `thinkingStartRef` removed (ChatTab.tsx) - -Round 2 replaced `thinkingStartTime` state with a ref + local variable. The ref was written but never read — only the local `startTime` in the closure was used. Removed the dead ref entirely. - -### Fix: Terminal.go container name lookup - -Replaced inline `"ws-"+workspaceID[:12]` with `provisioner.ContainerName()`. Cached the result in a local `name` variable to avoid calling the function twice. - -### Hardening: `.gitignore` comprehensiveness - -Added 12 missing patterns: `.awareness/`, `**/.next/`, `mcp-server/dist/`, `dist/`, `.pytest_cache/`, `coverage/`, `.nyc_output/`, `*.db`/`*.sqlite*`, `postgres_data/`/`redis_data/`, `.env.production`, `*.bundle.json`. - -## CLI Executor Fixes - -### Fix: Claude Code exit code 1 with valid output - -Claude Code sometimes exits with code 1 but still produces valid output on stdout (e.g. MCP tool failures that don't prevent a response). The executor now accepts stdout output regardless of exit code (`if proc.returncode == 0 or stdout_text`). Also added detailed stderr/stdout logging on non-zero exit. - -### Fix: Empty description crashes AgentCard (main.py) - -Pydantic's `AgentCard` requires a non-null string for `description`. Auto-generated configs had `description: ""`. Fixed with `config.description or config.name`. - -### Fix: No timeout on A2A proxy and CLI executor - -Removed all artificial timeouts from the A2A proxy (`http.Client{}`), CLI executor (`timeout: 0` → `await proc.communicate()` without `wait_for`), and MCP delegation client (`httpx timeout=None`). Delegation chains (PM → Lead → Agent) can take arbitrarily long — agent liveness is monitored via heartbeat, not proxy deadlines. Proxy uses `context.WithoutCancel(ctx)` to survive client disconnect while still canceling on server shutdown. - -## Restart Handler Fixes - -### Fix: Template resolution by config.yaml name field - -`findTemplateByName("PM")` normalized to `"pm"` but the template dir is `org-pm`. Added a second pass that reads `config.yaml` files in template dirs and matches by the `name:` field. - -### Fix: Stale ws-* config dirs take precedence on restart - -A previous restart's `ensureDefaultConfig` created a `ws-/` dir with only `config.yaml` (wrong runtime, empty description). On next restart, the ownDir check found it and used it. Fixed: only use ownDir if it contains more than just `config.yaml` (meaning files were uploaded via the Files API). - -## Live Activity Feed (ChatTab) - -Replaced the fake rotating status messages ("Analyzing your request...", "Almost there...") with a **real-time activity feed** powered by WebSocket events: - -- Opens a dedicated WebSocket while `sending=true` -- Listens for `ACTIVITY_LOGGED` events across all workspaces -- Shows color-coded delegation progress: `→ Delegating to Marketing Lead...` (blue), `← Marketing Lead responded (42s)` (green), `⚠ error` (red) -- MCP server now reports `a2a_send` activity before each delegation call - -## WebSocket Health Check (socket.ts) - -Added periodic rehydration to the canvas WebSocket — if no events arrive for 30s, automatically re-fetches workspace state from the API. Prevents the canvas from showing stale offline status when agents recover between heartbeat cycles without a WebSocket event. - -## Shared Workspace Mount (WORKSPACE_DIR) - -Added `WORKSPACE_DIR` env var for the platform. When set, all provisioned workspace containers bind-mount the host directory as `/workspace` instead of using isolated Docker named volumes. This gives all agents read/write access to the same codebase. - -## Default Org Setup (setup-org.sh) - -Created `setup-org.sh` — reproducible script that creates the full 15-agent org hierarchy: -- PM → Marketing Lead (Content Writer, SEO Specialist, Social Media Manager) -- PM → Research Lead (Market Analyst, Technical Researcher, Competitive Intelligence) -- PM → Dev Lead (Frontend Engineer, Backend Engineer, DevOps Engineer, Security Auditor, QA Engineer) - -All agents use Claude Code runtime with shared OAuth token. Script also extracts the token from macOS keychain and distributes to all `org-*` templates. - -## Canvas Agent Task Visibility - -### Live current_task on workspace cards - -CLI executor now reports `current_task` via immediate heartbeat push when starting/finishing a request. The MCP server also pushes `current_task` when delegating. Each workspace card on the canvas shows an amber task banner with what the agent is currently working on — visible across the entire org chart in real time. - -- `heartbeat.py` — added `current_task` field to heartbeat payload -- `cli_executor.py` — calls `_set_current_task(summary)` on execute start, clears on finish via try/finally -- `a2a_mcp_server.py` — pushes `current_task` heartbeat alongside `report_activity` on delegation - -### Session continuity (Claude Code --resume) - -CLI executor now maintains conversation state across messages using Claude Code's `--resume` flag: -- First message: runs with `--output-format json` to capture `session_id` -- Subsequent messages: runs with `--resume ` to continue the conversation -- System prompt only injected on first message (resumed sessions already have it) - -### Chat input textarea - -Replaced single-line `` with auto-growing `