molecule-core

claude-ceo-assistant (Claude Opus 4.7 on Hongming's MacBook) 4b074f631b	2026-05-06 14:23:01 -07:00

feat(provisioner): env-driven RegistryPrefix() for workspace template images (#6)

Add MOLECULE_IMAGE_REGISTRY env var to override the registry prefix used by all workspace-template image references. Defaults to ghcr.io/molecule-ai (unchanged for OSS users); set to an ECR URI in production tenants when mirroring to AWS.

Why this matters: GitHub suspended the Molecule-AI org on 2026-05-06 with no warning. Production tenants kept running because they had images cached locally, but any tenant restart (AWS health event, redeploy, OS reboot) would have failed at `docker pull ghcr.io/molecule-ai/...` because GHCR returned 401. This change introduces the seam needed to point new pulls at a registry we control (AWS ECR) by flipping a single env var on Railway.

Design (RFC: molecule-ai/internal#6):
- New `RegistryPrefix()` function in `provisioner/registry.go` reads MOLECULE_IMAGE_REGISTRY, falls back to "ghcr.io/molecule-ai".
- New `RuntimeImage(runtime)` returns the canonical ref using the prefix.
- `RuntimeImages` map computed at init via `computeRuntimeImages()` so existing callers that range over it still work.
- `DefaultImage` likewise computed via `RuntimeImage(defaultRuntime)`.
- `handlers.TemplateImageRef()` switched from hardcoded format string to `provisioner.RegistryPrefix()`.
- `runtime_image_pin.go::resolveRuntimeImage()` automatically inherits the prefix change because it reads from `provisioner.RuntimeImages[]` and only re-formats the tag suffix to a digest pin.

Alternatives rejected (see RFC):
- Multi-registry fallback chain (try ECR, fall back to GHCR): GHCR is locked from outbound for our org, so the fallback never works for us. Adds code complexity for no benefit.
- Hardcoded ECR-only switch: couples production code to a specific deployment environment. OSS users self-hosting Molecule would need the upstream GHCR.
- Self-hosted Harbor / registry-on-Hetzner: adds a component to operate. Not justified at 3-tenant scale; AWS ECR is mature and IAM-integrated.

Auth (deliberately NOT changed in this commit):
- For GHCR, the existing `ghcrAuthHeader()` reads GHCR_USER/GHCR_TOKEN.
- For ECR, EC2 user-data installs `amazon-ecr-credential-helper` and adds a `credHelpers` entry in `~/.docker/config.json` so the daemon resolves ECR credentials via the EC2 instance role on every pull. The Go code needs no auth change. This keeps the diff minimal.

Backwards compatibility:
- Additive: env unset → identical behavior to today (GHCR).
- Existing tests reference literal `ghcr.io/molecule-ai/...` strings; they continue to pass under the default prefix.
- `RuntimeImages` map preserved for callers that iterate it.
- No interface, schema, API, or migration version bump needed.

Security review:
- No untrusted input: MOLECULE_IMAGE_REGISTRY is set at deploy time (Railway env, EC2 user-data), not by users.
- No expanded data collection or logging changes.
- No new permissions: ECR pull permission is a future user-data + IAM role change, separate from this code change.
- Worst-case: an attacker who already compromises Railway can swap the registry prefix to a malicious URI. That is the same blast radius as compromising Railway today, with no expansion.

Tests:
- 9 new unit tests in `registry_test.go` covering: default fallback, env override, empty env, all 9 known runtimes, unknown runtime, override-applies-to-all, computeRuntimeImages map population, env reflection, alphabetical ordering pin.
- All existing provisioner + handlers tests continue to pass.
- Mutation-tested mentally: deleting `if v := os.Getenv(...)` makes TestRegistryPrefix_RespectsEnv fail. Deleting `for _, r := range knownRuntimes` makes TestRuntimeImage_AllKnownRuntimes fail. The test suite would catch a regression of the original failure mode.

Rollout plan: this PR is safe to merge with no env change. Production cutover happens by setting MOLECULE_IMAGE_REGISTRY on Railway after the AWS ECR mirror is populated (separate ops change, tracked in issue #6 phases 3b–3f).

Tracking:
- RFC: molecule-ai/internal#6
- Tasks: #97 (ECR setup), #98 (CP fallback)
- Tech debt: runbooks/hetzner-rollout-tech-debt-2026-05-06.md item 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
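The env-driven prefix lookup described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in `provisioner/registry.go`: the `registryPrefixFrom` and `runtimeImageWith` helpers, the injected `getenv` parameter, and the `workspace-template-<runtime>:latest` naming are all assumptions made for the example. Only the env var name and the default prefix come from the commit message.

```go
package main

import (
	"fmt"
	"os"
)

// defaultRegistry is the fallback prefix named in the commit message.
const defaultRegistry = "ghcr.io/molecule-ai"

// registryPrefixFrom returns the MOLECULE_IMAGE_REGISTRY override when it is
// set to a non-empty value, and the default otherwise. The getenv parameter is
// a hypothetical seam so the lookup can be exercised without touching the
// real process environment.
func registryPrefixFrom(getenv func(string) string) string {
	if v := getenv("MOLECULE_IMAGE_REGISTRY"); v != "" {
		return v
	}
	return defaultRegistry
}

// RegistryPrefix reads the override from the process environment.
func RegistryPrefix() string {
	return registryPrefixFrom(os.Getenv)
}

// runtimeImageWith builds an image reference from a prefix and a runtime name.
// The "workspace-template-<runtime>:latest" layout is assumed for illustration.
func runtimeImageWith(prefix, runtime string) string {
	return fmt.Sprintf("%s/workspace-template-%s:latest", prefix, runtime)
}

func main() {
	// With the env var unset this prints a reference under the default prefix;
	// with it set, every reference moves to the override.
	fmt.Println(runtimeImageWith(RegistryPrefix(), "langgraph"))
}
```

Injecting the lookup function keeps the fallback logic testable in isolation, which is one way to get the "default fallback" and "env override" cases the commit lists without mutating global state.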
.ci-trigger	chore: PM-triggered CI re-run	2026-04-21 15:40:21 +00:00
.githooks	secret-scan: align local pre-commit + extend drift lint (closes #1569 root)	2026-05-01 23:47:56 -07:00
.github	ci(handlers-postgres-integration): apply legacy *.sql migrations too	2026-05-05 22:02:24 -07:00
canvas	Merge staging into rfc-2991-pr-1 to clear BEHIND (post PR-2993 + PR-3005)	2026-05-05 21:24:20 -07:00
docs	docs: callout Python>=3.11 requirement on Universal MCP install snippet	2026-05-05 13:44:25 -07:00
infra	fix(quickstart): wire up template/plugin registry via manifest.json	2026-04-23 14:55:34 -07:00
scripts	feat(ops): hourly alarm for auto-promote PR stuck on REVIEW_REQUIRED (#2975 )	2026-05-05 17:55:27 -07:00
tests	test(e2e): Phase 3.5 — wheel parser classifies real server response (#2967 )	2026-05-05 17:31:45 -07:00
tools	fix(branch-protection): apply.sh respects live state + full-payload drift	2026-05-04 20:52:11 -07:00
workspace	test(e2e): Phase 3.5 — wheel parser classifies real server response (#2967 )	2026-05-05 17:31:45 -07:00
workspace-server	feat(provisioner): env-driven RegistryPrefix() for workspace template images (#6 )	2026-05-06 14:23:01 -07:00
.coverage-allowlist.txt	ci: fix regex + add coverage allowlist (14 known 0% critical paths)	2026-04-23 11:20:36 -07:00
.env.example	feat(compose): IMAGE_AUTO_REFRESH=true by default in local dev (#2116 )	2026-04-26 13:49:08 -07:00
.gitattributes	chore(gitattributes): pin LF on snapshot golden files	2026-04-28 21:01:44 -07:00
.gitignore	harness(phase-0): sudo-free Host-header path + chat_history + envelope replays	2026-05-01 20:12:49 -07:00
.mcp.json.example	fix(security): GLOBAL memory delimiter spoofing + pin MCP npm version	2026-04-18 11:09:24 -07:00
CODE_OF_CONDUCT.md	chore: open-source preparation — scrub secrets, add community files	2026-04-18 00:10:56 -07:00
CONTRIBUTING.md	docs: surface molecule-mcp-claude-channel plugin in external-workspace creation + CONTRIBUTING	2026-04-29 11:33:31 -07:00
COVERAGE_FLOOR.md	ci(coverage): per-file 75% floor for MCP/inbox/auth Python critical paths	2026-05-04 16:35:21 -07:00
docker-compose.infra.yml	fix(quickstart): make README cp-paste flow bugless end-to-end (#1871 )	2026-04-23 19:53:43 +00:00
docker-compose.yml	feat(compose): IMAGE_AUTO_REFRESH=true by default in local dev (#2116 )	2026-04-26 13:49:08 -07:00
LICENSE	fix: replace residual "Agent Molecule" with "Molecule AI" in LICENSE	2026-04-13 13:06:21 -07:00
manifest.json	manifest: re-add 5 workspace templates pruned by #2536	2026-05-03 05:43:07 -07:00
railway.toml	fix: railway.toml buildContext must be repo root for workspace-server COPY paths	2026-04-18 00:29:38 -07:00
README.md	docs(readme): fix clone + deploy URLs after molecule-core rename	2026-05-01 19:17:03 -07:00
README.zh-CN.md	fix(quickstart): wire up template/plugin registry via manifest.json	2026-04-23 14:55:34 -07:00
render.yaml	chore: open-source restructure — rename dirs, remove internal files, scrub secrets	2026-04-18 00:24:44 -07:00

README.md

English | 中文

The Org-Native Control Plane For Heterogeneous AI Agent Teams

The world's most powerful governance platform for AI agent teams.

Visual Canvas • Runtime Compatibility • Hierarchical Memory • Skill Evolution • Operational Guardrails

Docs Home • Quick Start • Architecture • Platform API • Workspace Runtime

The Pitch

Molecule AI is the most powerful way to govern an AI agent organization in production.

It combines the parts that are usually scattered across demos, internal glue code, and framework-specific tooling into one product:

one org-native control plane for teams, roles, hierarchy, and lifecycle
one runtime layer that lets LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, and OpenClaw run side by side
one memory model that keeps recall, sharing, and skill evolution aligned with organizational boundaries
one operational surface for observing, pausing, restarting, inspecting, and improving live workspaces

Most teams can build a workflow, a strong single agent, a coding agent, or a custom multi-agent graph.

Very few teams can run all of that as a governed organization with clear structure, durable memory boundaries, and production operations.

That is the gap Molecule AI closes.

Why Molecule AI Feels Different

1. The node is a role, not a task

In Molecule AI, a workspace is an organizational role. That role can begin as one agent, later expand into a sub-team, and still keep the same external identity, hierarchy position, memory boundary, and A2A interface.

2. The org chart is the topology

You do not wire collaboration paths by hand. Hierarchy defines the default communication surface. The structure is not decorative UI. It is part of the operating model.
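One way to picture "the org chart is the topology" is a reachability check derived purely from parent links. The sketch below assumes a default rule in which a workspace may message its parent, its direct children, and its siblings. That rule, the `Workspace` type, and `CanMessage` are hypothetical and are not taken from the platform's code.

```go
package main

import "fmt"

// Workspace is a hypothetical org-chart node; ParentID is "" for the root.
type Workspace struct {
	ID       string
	ParentID string
}

// CanMessage reports whether a and b may talk under the assumed default rule:
// parent and child may talk, and so may siblings that share a parent.
func CanMessage(a, b Workspace) bool {
	if a.ID == b.ID {
		return false
	}
	if a.ParentID == b.ID || b.ParentID == a.ID {
		return true // direct parent/child link
	}
	return a.ParentID != "" && a.ParentID == b.ParentID // siblings
}

func main() {
	pm := Workspace{ID: "pm"}
	dev := Workspace{ID: "dev-lead", ParentID: "pm"}
	qa := Workspace{ID: "qa", ParentID: "pm"}
	eng := Workspace{ID: "engineer", ParentID: "dev-lead"}

	fmt.Println("dev-lead <-> qa:", CanMessage(dev, qa))
	fmt.Println("pm <-> engineer:", CanMessage(pm, eng))
	fmt.Println("engineer <-> dev-lead:", CanMessage(eng, dev))
}
```

Because the rule reads only the hierarchy, moving a workspace under a different parent changes who it can reach without any per-edge wiring.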

3. Runtime choice stops being a dead-end decision

LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, and OpenClaw can all plug into the same workspace abstraction. Teams can standardize governance without forcing every group onto one runtime.

4. Memory is treated like infrastructure

Molecule AI's hierarchical memory architecture (HMA) is designed around organizational boundaries, not just “store more context somewhere.” Durable recall, scoped sharing, awareness namespaces, and skill promotion are all part of one coherent system.

5. It comes with a real control plane

Registry, heartbeats, restart, pause/resume, activity logs, approvals, terminal access, files, traces, bundles, templates, and WebSocket fanout are not afterthoughts. They are first-class parts of the platform.

The Category Gap Molecule AI Fills

Category	What it does well	Where it breaks	What Molecule AI adds
Workflow builders	Visual task automation	Nodes are tasks, not durable organizational roles	Role-native workspaces, hierarchy, long-lived teams
Agent frameworks	Strong runtime semantics	Weak control plane and weak org-level operations	Unified lifecycle, canvas, registry, policies, observability
Coding agents	Excellent local execution	Usually not designed as team infrastructure	Workspace abstraction, A2A collaboration, platform ops
Custom multi-agent graphs	Full flexibility	Brittle topology and governance sprawl	Standardized operating model without losing runtime freedom

What Makes Molecule AI Defensible

Advantage	Why it matters in practice
Role-native workspace abstraction	Your org structure survives model swaps, framework changes, and team expansion
Fractal team expansion	A single specialist can become a managed department without breaking upstream integrations
Heterogeneous runtime compatibility	Different teams can keep their preferred agent architecture while sharing one control plane
HMA + awareness namespaces	Memory sharing follows hierarchy instead of leaking across the whole system
Skill evolution loop	Durable successful workflows can graduate from memory into reusable, hot-reloadable skills
WebSocket-first operational UX	The canvas reflects task state, structure changes, and A2A responses in near real time
Global secrets with local override	Centralize provider access, then override only where a workspace needs specialized credentials
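The "global secrets with local override" row in the table above amounts to a two-level lookup. A minimal sketch, assuming secrets are simple key/value maps: `ResolveSecret` and the key names are hypothetical and do not reflect the platform's actual secrets API.

```go
package main

import "fmt"

// ResolveSecret returns the workspace-level value when one exists and falls
// back to the global value otherwise. The bool reports whether the key was
// found at either level. Hypothetical helper for illustration only.
func ResolveSecret(global, workspace map[string]string, key string) (string, bool) {
	if v, ok := workspace[key]; ok {
		return v, true
	}
	v, ok := global[key]
	return v, ok
}

func main() {
	// Illustrative key names and values, not real credentials.
	global := map[string]string{
		"OPENAI_API_KEY":    "global-key",
		"ANTHROPIC_API_KEY": "shared-key",
	}
	research := map[string]string{"OPENAI_API_KEY": "research-team-key"}

	for _, key := range []string{"OPENAI_API_KEY", "ANTHROPIC_API_KEY", "MISSING"} {
		v, ok := ResolveSecret(global, research, key)
		fmt.Printf("%s -> %q (found=%t)\n", key, v, ok)
	}
}
```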

Runtime Compatibility, Compared

Molecule AI is not trying to replace the frameworks below. It is the system that makes them easier to run together.

Runtime / architecture	Status in current repo	Native strength	What Molecule AI adds
LangGraph	Shipping on `main`	Graph control, tool use, Python extensibility	Canvas orchestration, hierarchy routing, A2A, memory scopes, operational lifecycle
DeepAgents	Shipping on `main`	Deeper planning and decomposition	Same workspace contract, team topology, activity stream, restart behavior
Claude Code	Shipping on `main`	Real coding workflows, CLI-native continuity	Secure workspace abstraction, A2A delegation, org boundaries, shared control plane
CrewAI	Shipping on `main`	Role-based crews	Persistent workspace identity, policy consistency, shared canvas and registry
AutoGen	Shipping on `main`	Assistant/tool orchestration	Standardized deployment, hierarchy-aware collaboration, shared ops plane
OpenClaw	Shipping on `main`	CLI-native runtime with its own session model	Workspace lifecycle, templates, activity logs, topology-aware collaboration
NemoClaw	WIP on `feat/nemoclaw-t4-docker`	NVIDIA-oriented runtime path	Planned to join the same abstraction once merged; not yet part of `main`

This is the key idea: many agent runtimes, one organizational operating system.

Why The Memory Architecture Compounds

Most projects stop at “we added memory.” Molecule AI pushes further:

Conventional memory setup	Molecule AI
Flat store or weak namespaces	Hierarchy-aligned `LOCAL`, `TEAM`, `GLOBAL` scopes
Sharing is easy to overexpose	Sharing is explicit and structure-aware
Memory and procedure get mixed together	Memory stores durable facts; skills store repeatable procedure
Every agent can become over-privileged	Workspace awareness namespaces reduce blast radius
UI memory and runtime memory blur together	Separate surfaces for scoped agent memory, key/value workspace memory, and recall
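To make the `LOCAL`, `TEAM`, `GLOBAL` scopes in the table above concrete, here is a small visibility check. It assumes that `LOCAL` memories are readable only by their owning workspace, `TEAM` memories by workspaces in the same team, and `GLOBAL` memories by everyone. Those rules, the `Memory` type, and `Visible` are illustrative assumptions; the real logic lives in the modules the README lists and may differ.

```go
package main

import "fmt"

// Scope names match the README; the rules attached to them below are assumed.
type Scope string

const (
	Local  Scope = "LOCAL"
	Team   Scope = "TEAM"
	Global Scope = "GLOBAL"
)

// Memory is a hypothetical memory row tagged with its owner and team.
type Memory struct {
	Scope   Scope
	OwnerID string
	TeamID  string
	Text    string
}

// Visible reports whether a reader, identified by workspace ID and team ID,
// may recall the memory. Unknown scopes are denied by default.
func Visible(m Memory, readerID, readerTeamID string) bool {
	switch m.Scope {
	case Global:
		return true
	case Team:
		return m.TeamID != "" && m.TeamID == readerTeamID
	case Local:
		return m.OwnerID == readerID
	default:
		return false
	}
}

func main() {
	note := Memory{Scope: Team, OwnerID: "dev-lead", TeamID: "engineering", Text: "staging deploys need approval"}
	fmt.Println("qa in engineering can read:", Visible(note, "qa", "engineering"))
	fmt.Println("analyst in research can read:", Visible(note, "analyst", "research"))
}
```

Denying unknown scopes by default is the conservative choice here: a mislabeled row stays private instead of leaking across the hierarchy.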

The flywheel

Task execution
   -> durable insight captured in memory
   -> repeated success becomes a signal
   -> workflow promoted into a reusable skill
   -> skill hot-reloads into the runtime
   -> future work gets faster and more reliable

This is one of Molecule AI's strongest long-term advantages: the system can get more operationally capable without turning into one giant hidden prompt.
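The flywheel above can be sketched as a counter that promotes a workflow to a skill once it has succeeded often enough. The threshold, the `Promoter` type, and its methods are hypothetical; the README does not state what signal the platform actually uses to decide on promotion.

```go
package main

import "fmt"

// Promoter tracks repeated successes per workflow and marks a workflow as a
// skill once it reaches the threshold. Hypothetical type for illustration.
type Promoter struct {
	threshold int
	successes map[string]int
	skills    map[string]bool
}

// NewPromoter returns a Promoter that promotes after threshold successes.
func NewPromoter(threshold int) *Promoter {
	return &Promoter{
		threshold: threshold,
		successes: make(map[string]int),
		skills:    make(map[string]bool),
	}
}

// RecordSuccess counts one successful run and returns true only on the call
// that promotes the workflow into a skill.
func (p *Promoter) RecordSuccess(workflow string) bool {
	if p.skills[workflow] {
		return false // already promoted; nothing new to report
	}
	p.successes[workflow]++
	if p.successes[workflow] >= p.threshold {
		p.skills[workflow] = true
		return true
	}
	return false
}

// IsSkill reports whether the workflow has been promoted.
func (p *Promoter) IsSkill(workflow string) bool {
	return p.skills[workflow]
}

func main() {
	p := NewPromoter(3)
	for run := 1; run <= 4; run++ {
		promoted := p.RecordSuccess("release-checklist")
		fmt.Printf("run %d: promoted=%t skill=%t\n", run, promoted, p.IsSkill("release-checklist"))
	}
}
```

Returning true exactly once gives the caller a natural place to emit a single promotion event, in line with the README's point that promotion is surfaced as an explicit activity.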

Self-Improving Agent Teams, Built Into Molecule AI

Most agent systems stop at "a smart runtime." Molecule AI pushes further: it gives teams a way to capture what worked, promote repeatable procedure into skills, reload those improvements into live workspaces, and keep the whole loop visible at the platform level.

Positioning lens	Conventional self-improving agent pattern	Molecule AI
Unit of improvement	A single agent session or runtime	A workspace, a team, and eventually the whole org graph
Operational surface	Mostly hidden inside the agent loop	Visible in the platform, Canvas, activity stream, memory surfaces, and runtime controls
Strategic outcome	A smarter agent	A compounding organization with durable knowledge and governed reusable skills

Where that shows up in Molecule AI

Core mechanism	Molecule AI module(s)	Why it matters
Durable memory that survives sessions	`workspace/builtin_tools/memory.py`, `workspace/builtin_tools/awareness_client.py`, `workspace-server/internal/handlers/memories.go`	Memory is not just durable, it is workspace-scoped and can route into awareness namespaces tied to the org structure
Cross-session recall	`workspace-server/internal/handlers/activity.go` (`/workspaces/:id/session-search`)	Recall spans both activity history and memory rows, so the system can search what happened and what was learned without inventing a separate hidden store
Skills built from experience	`workspace/builtin_tools/memory.py` (`_maybe_log_skill_promotion`)	Promotion from memory into a skill candidate is surfaced as an explicit platform activity, not a silent internal side effect
Skill improvement during use	`workspace/skill_loader/watcher.py`, `workspace/skill_loader/loader.py`, `workspace/main.py`	Skills hot-reload into the live runtime, so improvements become available on the next A2A task without restarting the workspace
Persistent skill lifecycle	`workspace-server/cmd/cli/cmd_agent_skill.go`, `workspace/plugins.py`	Skills are not just generated once; they can be audited, installed, published, shared, mounted by plugins, and governed as reusable operational assets

Why this matters in Molecule AI

The learning loop is org-aware, not just session-aware. Memory can live at LOCAL, TEAM, or GLOBAL scope, and awareness namespaces give each workspace a durable identity boundary.
The learning loop is visible to operators. Promotion events, activity logs, current-task updates, traces, and WebSocket fanout mean self-improvement is part of the control plane, not a hidden black box.
The learning loop compounds across teams, not just one agent. A workflow learned by one workspace can become a governed skill, reload into the runtime, appear in the Agent Card, and become usable inside a larger organizational hierarchy.

The result is not just “an agent that learns.” It is an organization that gets more capable as its workspaces accumulate durable memory and reusable procedure.

What Ships In `main`

Canvas

Next.js 15 + React Flow + Zustand
drag-to-nest team building
empty-state deployment + onboarding wizard
template palette
bundle import/export
10-tab side panel for chat, activity, details, skills, terminal, config, files, memory, traces, and events

Platform

Go/Gin control plane
workspace CRUD and provisioning
registry and heartbeats
browser-safe A2A proxy
team expansion/collapse
activity logs and approvals
secrets and global secrets
files API, terminal, bundles, templates, viewport persistence

Runtime

unified workspace/ image
adapter-driven execution
Agent Card registration
awareness-backed memory integration
plugin-mounted shared rules/skills
hot-reloadable local skills
coordinator-only delegation path

Ops

Langfuse traces
current-task reporting
pause/resume/restart flows
activity streaming
runtime tiers
direct workspace inspection through terminal and files

Built For Teams That Need More Than A Demo

Molecule AI is especially strong when you need to run:

AI engineering teams with PM / Dev Lead / QA / Research / Ops roles
mixed runtime organizations where one team prefers LangGraph and another prefers Claude Code
long-lived agent organizations that need memory boundaries and reusable procedures
internal platforms that want to expose agent teams as structured infrastructure, not ad hoc scripts

Architecture

Canvas (Next.js :3000)  <--HTTP / WS-->  Platform (Go :8080)  <---> Postgres + Redis
         |                                          |
         |                                          +--> Docker provisioner / bundles / templates / secrets
         |
         +-------------------- shows --------------------> workspaces, teams, tasks, traces, events

Workspace Runtime (Python image with adapters)
  - LangGraph / DeepAgents / Claude Code / CrewAI / AutoGen / OpenClaw
  - Agent Card + A2A server
  - heartbeat + activity + awareness-backed memory
  - skills + plugins + hot reload

Quick Start

git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo

cp .env.example .env
# Defaults boot the stack locally out of the box. See .env.example for
# production hardening knobs (ADMIN_TOKEN, SECRETS_ENCRYPTION_KEY, etc.).

./infra/scripts/setup.sh
# Boots Postgres (:5432), Redis (:6379), Langfuse (:3001),
# and Temporal (:7233 gRPC, :8233 UI) on the shared
# `molecule-monorepo-net` Docker network. Temporal runs with
# no auth on localhost — dev-only; production must gate it.
#
# Also populates the template/plugin registry by cloning every repo
# listed in manifest.json into workspace-configs-templates/,
# org-templates/, and plugins/. Requires jq — install via
# `brew install jq` (macOS) or `apt install jq` (Debian). Idempotent:
# re-runs skip any target dir that's already populated.

cd workspace-server
go run ./cmd/server   # applies pending migrations on first boot

cd ../canvas
npm install
npm run dev

Then open http://localhost:3000:

Deploy a template or create a blank workspace from the empty state.
Follow the onboarding guide into Config.
Add a provider key in Secrets & API Keys.
Open Chat and send the first task.

Documentation Map

Docs Home
Quick Start
Product Overview
System Architecture
Memory Architecture
Platform API
Workspace Runtime
Canvas UI
Local Development
Backend Parity Matrix — Docker vs EC2 feature parity tracker
Testing Strategy — tiered coverage floors, not blanket 100%
PR Hygiene — small PRs, clean branches, cherry-pick on drift
Engineering Postmortems — architecture + testing lessons from real incidents
Ecosystem Watch — adjacent projects we track (Holaboss, Hermes, gstack, …)
Glossary — how we use "harness", "workspace", "plugin", "flow" vs. ecosystem neighbors

Current Scope

The current main branch already includes the core platform, canvas, memory model, six production adapters, skill lifecycle, and operational surfaces. Adjacent runtime work such as NemoClaw remains branch-level until merged, and this README keeps that distinction explicit on purpose.

License

Personal, internal, and non-commercial use is permitted without restriction. You may not use the Licensed Work to offer a competing product or service. On January 1, 2029, the license converts to Apache 2.0.