claude-ceo-assistant 624ef4d06d
perf(workspace-server,canvas): EIC tunnel pool + canvas Promise.all (closes core#11)
## Symptom
Canvas detail-panel "config + filesystem load" took ~20s. Reported on
production hongming tenant, workspace c7c28c0b-... (Claude Code Agent T2).

## Two stacked latency sources

### 1. Server-side: per-call EIC tunnel setup (~80% of the win)

`workspace-server/internal/handlers/template_files_eic.go::realWithEICTunnel`
performed ssh-keygen + SendSSHPublicKey + open-tunnel + waitForPort PER call.
4 callers (read/write/list/delete) each paid the full ~3-5s setup cost even
when fired back-to-back on the same workspace EC2.

Fix: refcounted pool keyed on instanceID with TTL ≤ 50s (under the 60s
SendSSHPublicKey grant). One tunnel serves N file ops; concurrent acquires
for the same instance share the slot via a pendingSetups gate; LRU eviction
caps simultaneous tracked instances at 32. Tunnel-fatal errors (connection
refused, broken pipe, auth failed) poison the entry so the next acquire
builds fresh. A defer-release pattern guarantees cleanup on panic (added
after self-review caught a refcount-leak hazard).

Public API unchanged — `var withEICTunnel` rebinds to `pooledWithEICTunnel`
at package init, so all 4 callers inherit pooling for free.

10 unit tests pin: 4-ops-amortise (1 setup), different-instances-do-not-share,
TTL eviction, poison invalidates, concurrent-acquire-single-setup,
TTL=0 escape hatch, LRU eviction at cap, error classification heuristic,
refcount blocks expired eviction, panic poisons entry. All green.

### 2. Canvas-side: serial fan-out + duplicate fetch (~20% of the win)

`canvas/src/components/tabs/ConfigTab.tsx::loadConfig` awaited 3 independent
metadata GETs (`/workspaces/{id}`, `/model`, `/provider`) serially.
`AgentCardSection` fired a SECOND `/workspaces/{id}` from its own useEffect.

Fix: Promise.all over the 3 metadata GETs (each leg keeps its existing
.catch fallback semantics). AgentCardSection now reads `agentCard` from
the canvas store (`useCanvasStore`) instead of refetching — the canvas
already hydrates `node.data.agentCard` from the platform event stream.
Defensive selector handles test mocks without a `nodes` array.

## Verification

- `go test ./internal/handlers/` 5.07s green (full handlers package, including
  10 new pool tests)
- `go vet ./internal/handlers/` clean
- `npx vitest run` — 1380/1380 canvas unit tests pass (2 test FILES fail on
  a pre-existing xyflow CSS-load issue in vitest config, unrelated to this
  change)
- `npx tsc --noEmit` clean

Live wall-time verification deferred to Phase 4 / E2E (canvas browser session
required; external probe blocked by 403 since the canvas auth chain is
session-cookie + Origin header, not a bearer token I can fabricate).

## Backwards compatibility

API surface unchanged. All 4 EIC handler callers use the rebound var; no
caller migration. Pool defaults to enabled (TTL=50s); tests can disable by
setting poolTTL=0 or by overwriting withEICTunnel directly (existing stub
pattern in template_files_eic_dispatch_test.go preserved).

## Hostile self-review (3 weakest spots)

1. `fnErrIndicatesTunnelFault` is a substring grep on err.Error() — the
   marker list is hand-curated, and ssh client error formats vary across
   OpenSSH versions. A future OpenSSH release that reports a tunnel failure
   with phrasing not in the list would NOT poison the entry, so subsequent
   callers would reuse a dead tunnel until TTL evicts it. Acceptable: TTL
   bounds the impact (≤50s of bad reuse), and the heuristic covers every
   tunnel-error shape seen in the existing test fixtures and known incidents.

2. `acquire`'s for-loop can retry without bound under pathological churn
   (signal closed → new acquirer → setup fails → repeat); there is no
   bounded retry counter. No test currently exercises a flaky setup that
   succeeds, fails, then succeeds again; if observability ever shows this
   shape, add a max-retry guard. Filed as a known limitation, not blocking.

3. The `strings.Contains` substring matching used for tunnel-fault
   classification could false-positive on app-level error messages that
   happen to contain "permission denied" or "broken pipe" verbatim. The
   classification test covers the discriminator, but only against the error
   shapes we know today. Acceptable: poisoning errs on the side of building
   fresh, which is correct-but-slightly-slow rather than incorrect.

## Phase 4 / E2E plan

- Live timing of the canvas detail-panel open against a real workspace
  (browser session, not external probe).
- Target: perceived latency under 2s on warm pool. Cold open still pays
  one tunnel setup (~3-5s) — the pool buys you the SECOND through Nth
  panel-open within the TTL window.
- Memory `feedback_chase_verification_to_staging` applies — will not
  declare done at PR-merge; will follow through to user-visible behavior
  on staging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:17:58 -07:00


English | 中文

The Org-Native Control Plane For Heterogeneous AI Agent Teams

The world's most powerful governance platform for AI agent teams.

License: BSL 1.1


Visual Canvas • Runtime Compatibility • Hierarchical Memory • Skill Evolution • Operational Guardrails

Docs Home • Quick Start • Architecture • Platform API • Workspace Runtime

Deploy on Railway • Deploy to Render


The Pitch

Molecule AI is the most powerful way to govern an AI agent organization in production.

It combines the parts that are usually scattered across demos, internal glue code, and framework-specific tooling into one product:

  • one org-native control plane for teams, roles, hierarchy, and lifecycle
  • one runtime layer that lets LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, and OpenClaw run side by side
  • one memory model that keeps recall, sharing, and skill evolution aligned with organizational boundaries
  • one operational surface for observing, pausing, restarting, inspecting, and improving live workspaces

Most teams can build a workflow, a strong single agent, a coding agent, or a custom multi-agent graph.

Very few teams can run all of that as a governed organization with clear structure, durable memory boundaries, and production operations.

That is the gap Molecule AI closes.

Why Molecule AI Feels Different

1. The node is a role, not a task

In Molecule AI, a workspace is an organizational role. That role can begin as one agent, later expand into a sub-team, and still keep the same external identity, hierarchy position, memory boundary, and A2A interface.

2. The org chart is the topology

You do not wire collaboration paths by hand. Hierarchy defines the default communication surface. The structure is not decorative UI. It is part of the operating model.

3. Runtime choice stops being a dead-end decision

LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, and OpenClaw can all plug into the same workspace abstraction. Teams can standardize governance without forcing every group onto one runtime.

4. Memory is treated like infrastructure

Molecule AI's HMA approach is designed around organizational boundaries, not just “store more context somewhere.” Durable recall, scoped sharing, awareness namespaces, and skill promotion are all part of one coherent system.

5. It comes with a real control plane

Registry, heartbeats, restart, pause/resume, activity logs, approvals, terminal access, files, traces, bundles, templates, and WebSocket fanout are not afterthoughts. They are first-class parts of the platform.

The Category Gap Molecule AI Fills

| Category | What it does well | Where it breaks | What Molecule AI adds |
| --- | --- | --- | --- |
| Workflow builders | Visual task automation | Nodes are tasks, not durable organizational roles | Role-native workspaces, hierarchy, long-lived teams |
| Agent frameworks | Strong runtime semantics | Weak control plane and weak org-level operations | Unified lifecycle, canvas, registry, policies, observability |
| Coding agents | Excellent local execution | Usually not designed as team infrastructure | Workspace abstraction, A2A collaboration, platform ops |
| Custom multi-agent graphs | Full flexibility | Brittle topology and governance sprawl | Standardized operating model without losing runtime freedom |

What Makes Molecule AI Defensible

| Advantage | Why it matters in practice |
| --- | --- |
| Role-native workspace abstraction | Your org structure survives model swaps, framework changes, and team expansion |
| Fractal team expansion | A single specialist can become a managed department without breaking upstream integrations |
| Heterogeneous runtime compatibility | Different teams can keep their preferred agent architecture while sharing one control plane |
| HMA + awareness namespaces | Memory sharing follows hierarchy instead of leaking across the whole system |
| Skill evolution loop | Durable successful workflows can graduate from memory into reusable, hot-reloadable skills |
| WebSocket-first operational UX | The canvas reflects task state, structure changes, and A2A responses in near real time |
| Global secrets with local override | Centralize provider access, then override only where a workspace needs specialized credentials |

Runtime Compatibility, Compared

Molecule AI is not trying to replace the frameworks below. It is the system that makes them easier to run together.

| Runtime / architecture | Status in current repo | Native strength | What Molecule AI adds |
| --- | --- | --- | --- |
| LangGraph | Shipping on main | Graph control, tool use, Python extensibility | Canvas orchestration, hierarchy routing, A2A, memory scopes, operational lifecycle |
| DeepAgents | Shipping on main | Deeper planning and decomposition | Same workspace contract, team topology, activity stream, restart behavior |
| Claude Code | Shipping on main | Real coding workflows, CLI-native continuity | Secure workspace abstraction, A2A delegation, org boundaries, shared control plane |
| CrewAI | Shipping on main | Role-based crews | Persistent workspace identity, policy consistency, shared canvas and registry |
| AutoGen | Shipping on main | Assistant/tool orchestration | Standardized deployment, hierarchy-aware collaboration, shared ops plane |
| OpenClaw | Shipping on main | CLI-native runtime with its own session model | Workspace lifecycle, templates, activity logs, topology-aware collaboration |
| NemoClaw | WIP on `feat/nemoclaw-t4-docker` | NVIDIA-oriented runtime path | Planned to join the same abstraction once merged; not yet part of main |

This is the key idea: many agent runtimes, one organizational operating system.

Why The Memory Architecture Compounds

Most projects stop at “we added memory.” Molecule AI pushes further:

| Conventional memory setup | Molecule AI |
| --- | --- |
| Flat store or weak namespaces | Hierarchy-aligned LOCAL, TEAM, GLOBAL scopes |
| Sharing is easy to overexpose | Sharing is explicit and structure-aware |
| Memory and procedure get mixed together | Memory stores durable facts; skills store repeatable procedure |
| Every agent can become over-privileged | Workspace awareness namespaces reduce blast radius |
| UI memory and runtime memory blur together | Separate surfaces for scoped agent memory, key/value workspace memory, and recall |

The flywheel

Task execution
   -> durable insight captured in memory
   -> repeated success becomes a signal
   -> workflow promoted into a reusable skill
   -> skill hot-reloads into the runtime
   -> future work gets faster and more reliable

This is one of Molecule AI's strongest long-term advantages: the system can get more operationally capable without turning into one giant hidden prompt.

Self-Improving Agent Teams, Built Into Molecule AI

Most agent systems stop at "a smart runtime." Molecule AI pushes further: it gives teams a way to capture what worked, promote repeatable procedure into skills, reload those improvements into live workspaces, and keep the whole loop visible at the platform level.

| Positioning lens | Conventional self-improving agent pattern | Molecule AI |
| --- | --- | --- |
| Unit of improvement | A single agent session or runtime | A workspace, a team, and eventually the whole org graph |
| Operational surface | Mostly hidden inside the agent loop | Visible in the platform, Canvas, activity stream, memory surfaces, and runtime controls |
| Strategic outcome | A smarter agent | A compounding organization with durable knowledge and governed reusable skills |

Where that shows up in Molecule AI

| Core mechanism | Molecule AI module(s) | Why it matters |
| --- | --- | --- |
| Durable memory that survives sessions | `workspace/builtin_tools/memory.py`, `workspace/builtin_tools/awareness_client.py`, `workspace-server/internal/handlers/memories.go` | Memory is not just durable, it is workspace-scoped and can route into awareness namespaces tied to the org structure |
| Cross-session recall | `workspace-server/internal/handlers/activity.go` (`/workspaces/:id/session-search`) | Recall spans both activity history and memory rows, so the system can search what happened and what was learned without inventing a separate hidden store |
| Skills built from experience | `workspace/builtin_tools/memory.py` (`_maybe_log_skill_promotion`) | Promotion from memory into a skill candidate is surfaced as an explicit platform activity, not a silent internal side effect |
| Skill improvement during use | `workspace/skill_loader/watcher.py`, `workspace/skill_loader/loader.py`, `workspace/main.py` | Skills hot-reload into the live runtime, so improvements become available on the next A2A task without restarting the workspace |
| Persistent skill lifecycle | `workspace-server/cmd/cli/cmd_agent_skill.go`, `workspace/plugins.py` | Skills are not just generated once; they can be audited, installed, published, shared, mounted by plugins, and governed as reusable operational assets |

Why this matters in Molecule AI

  1. The learning loop is org-aware, not just session-aware. Memory can live at LOCAL, TEAM, or GLOBAL scope, and awareness namespaces give each workspace a durable identity boundary.

  2. The learning loop is visible to operators. Promotion events, activity logs, current-task updates, traces, and WebSocket fanout mean self-improvement is part of the control plane, not a hidden black box.

  3. The learning loop compounds across teams, not just one agent. A workflow learned by one workspace can become a governed skill, reload into the runtime, appear in the Agent Card, and become usable inside a larger organizational hierarchy.

The result is not just “an agent that learns.” It is an organization that gets more capable as its workspaces accumulate durable memory and reusable procedure.

What Ships In main

Canvas

  • Next.js 15 + React Flow + Zustand
  • drag-to-nest team building
  • empty-state deployment + onboarding wizard
  • template palette
  • bundle import/export
  • 10-tab side panel for chat, activity, details, skills, terminal, config, files, memory, traces, and events

Platform

  • Go/Gin control plane
  • workspace CRUD and provisioning
  • registry and heartbeats
  • browser-safe A2A proxy
  • team expansion/collapse
  • activity logs and approvals
  • secrets and global secrets
  • files API, terminal, bundles, templates, viewport persistence

Runtime

  • unified workspace/ image
  • adapter-driven execution
  • Agent Card registration
  • awareness-backed memory integration
  • plugin-mounted shared rules/skills
  • hot-reloadable local skills
  • coordinator-only delegation path

Ops

  • Langfuse traces
  • current-task reporting
  • pause/resume/restart flows
  • activity streaming
  • runtime tiers
  • direct workspace inspection through terminal and files

Built For Teams That Need More Than A Demo

Molecule AI is especially strong when you need to run:

  • AI engineering teams with PM / Dev Lead / QA / Research / Ops roles
  • mixed runtime organizations where one team prefers LangGraph and another prefers Claude Code
  • long-lived agent organizations that need memory boundaries and reusable procedures
  • internal platforms that want to expose agent teams as structured infrastructure, not ad hoc scripts

Architecture

Canvas (Next.js :3000)  <--HTTP / WS-->  Platform (Go :8080)  <---> Postgres + Redis
         |                                          |
         |                                          +--> Docker provisioner / bundles / templates / secrets
         |
         +-------------------- shows --------------------> workspaces, teams, tasks, traces, events

Workspace Runtime (Python image with adapters)
  - LangGraph / DeepAgents / Claude Code / CrewAI / AutoGen / OpenClaw
  - Agent Card + A2A server
  - heartbeat + activity + awareness-backed memory
  - skills + plugins + hot reload

Quick Start

git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo

cp .env.example .env
# Defaults boot the stack locally out of the box. See .env.example for
# production hardening knobs (ADMIN_TOKEN, SECRETS_ENCRYPTION_KEY, etc.).

./infra/scripts/setup.sh
# Boots Postgres (:5432), Redis (:6379), Langfuse (:3001),
# and Temporal (:7233 gRPC, :8233 UI) on the shared
# `molecule-monorepo-net` Docker network. Temporal runs with
# no auth on localhost — dev-only; production must gate it.
#
# Also populates the template/plugin registry by cloning every repo
# listed in manifest.json into workspace-configs-templates/,
# org-templates/, and plugins/. Requires jq — install via
# `brew install jq` (macOS) or `apt install jq` (Debian). Idempotent:
# re-runs skip any target dir that's already populated.

cd workspace-server
go run ./cmd/server   # applies pending migrations on first boot

cd ../canvas
npm install
npm run dev

Then open http://localhost:3000:

  1. Deploy a template or create a blank workspace from the empty state.
  2. Follow the onboarding guide into Config.
  3. Add a provider key in Secrets & API Keys.
  4. Open Chat and send the first task.

Documentation Map

Current Scope

The current main branch already includes the core platform, canvas, memory model, six production adapters, skill lifecycle, and operational surfaces. Adjacent runtime work such as NemoClaw remains branch-level until merged, and this README keeps that distinction explicit on purpose.

License

Business Source License 1.1 — copyright © 2025 Molecule AI.

Personal, internal, and non-commercial use is permitted without restriction. You may not use the Licensed Work to offer a competing product or service. On January 1, 2029, the license converts to Apache 2.0.