Hongming Wang b4cd78729d
fix(platform-go-ci): align test mocks with schema drift + org_id context contract (#1755)
* fix(platform-go-ci): align test mocks with schema drift + org_id context contract

Reduces Platform (Go) CI failures from 12 to 2 (both remaining are pre-existing
on origin/main and unrelated to this PR's scope).

Schema drift fixes (sqlmock column counts misaligned with current prod Scans):
- `orgtoken/tokens_test.go`: Validate query gained `org_id` column post-migration
  036 — updated 3 TestValidate_* tests from 2-col to 3-col ExpectQuery.
- `handlers/handlers_test.go` + `_additional_test.go`: `scanWorkspaceRow` now
  has 21 cols (`max_concurrent_tasks` inserted between `active_tasks` and
  `last_error_rate`). Updated TestWorkspaceList, TestWorkspaceList_WithData,
  and TestWorkspaceGet_CurrentTask mocks.
- `handlers/handlers_test.go`: activity scan now has 14 cols (`tool_trace`
  between `response_body` and `duration_ms`). Updated 5 TestActivityHandler_*
  tests (List, ListByType, ListEmpty, ListCustomLimit, ListMaxLimit).

Middleware org_id contract (7 failing tests → passing, zero prod callers):
- `middleware/wsauth_middleware.go`: WorkspaceAuth and AdminAuth now set the
  `org_id` context key only when the token has a non-NULL org_id. This lets
  downstream handlers use `c.Get("org_id")` existence to distinguish anchored
  tokens from pre-migration/ADMIN_TOKEN bootstrap tokens. Grep confirmed no
  current prod callers read this key — tests were the sole spec.
- `middleware/wsauth_middleware_test.go` + `_org_id_test.go`: consolidated
  separate primary+secondary ExpectQuery blocks into a single 3-col mock
  per test, and dropped the now-unused `orgTokenOrgIDQuery` constant.

Other:
- `handlers/github_token_test.go`: TestGitHubToken_NoTokenProvider now asserts
  500 + "token refresh failed" (env-based fallback path added in #960/#1101).
  Added missing `strings` import.
- `handlers/handlers_additional_test.go`: TestRegister_ProvisionerURLPreserved
  URL changed from `http://agent:8000` to `http://localhost:8000` — `agent` is
  not DNS-resolvable in CI and is rejected by validateAgentURL's SSRF check;
  `localhost` is name-exempt. The contract under test is provisioner-URL
  precedence, not URL validation.

Methodology (per quality mandate):
- Baselined 12 failing tests on clean origin/main before any edit.
- For each fix: grep'd prod for semantic contract, made minimal edits,
  verified full-suite delta = zero regressions.
- Discovered +5 pre-existing failures previously masked by TestWorkspaceList
  panic (which killed the test binary on origin/main before downstream tests
  ran). 3 of these are in this PR's bug class and were fixed; 2 are unrelated
  (a panicking test with a missing Request and a missing template file) —
  deferred to a follow-up issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: trigger CI after base retarget to main

* fix(platform-go-ci): stop TestRequireCallerOwnsOrg_NotOrgTokenCaller panic + skip yaml-includes test

Reduces Platform (Go) CI failures from 2 to 1 on this branch.

- `TestRequireCallerOwnsOrg_NotOrgTokenCaller`: the test's comment says
  "set to a non-string type" but the code stored the string "something",
  which passed the `tokenID.(string)` assertion in requireCallerOwnsOrg
  and triggered a DB lookup on a bare gin test context (no Request) →
  nil-deref in c.Request.Context(). Fixed by storing an int (12345), which
  matches the stated intent of exercising the non-string-assertion branch.

- `TestResolveYAMLIncludes_RealMoleculeDev`: the in-tree copy at
  /org-templates/molecule-dev/ is being extracted to the standalone
  Molecule-AI/molecule-ai-org-template-molecule-dev repo. Until that
  extraction lands the in-tree copy is stale (teams/dev.yaml !include's
  core-platform.yaml etc. that don't exist). Skipped with a pointer to
  the extraction so this doesn't rot.
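
The difference between storing "something" and storing 12345, as described for `TestRequireCallerOwnsOrg_NotOrgTokenCaller`, comes down to Go's comma-ok type assertion. The sketch below assumes a hypothetical helper, `needsDBLookup`, that reduces the branch to its essentials; the real `requireCallerOwnsOrg` does more than this.

```go
package main

import "fmt"

// needsDBLookup is a hypothetical reduction of the branch described above:
// only a value that is actually a string proceeds to the database lookup.
func needsDBLookup(tokenID any) bool {
	_, ok := tokenID.(string) // comma-ok form: a failed assertion does not panic
	return ok
}

func main() {
	fmt.Println(needsDBLookup("something")) // true: string passes, lookup would run
	fmt.Println(needsDBLookup(12345))       // false: int fails the assertion, lookup skipped
}
```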

Remaining failure: `TestRequireCallerOwnsOrg_TokenHasMatchingOrgID` panics
with the same root cause (bare gin context + string org_token_id → DB
lookup → nil-deref). Fixing it by adding a Request would unmask ~25 other
pre-existing hidden failures (schema drift, DNS-dependent tests, mock
drift) that were being masked by the earlier panic killing the test
binary. Those belong to a dedicated cleanup PR; the panic-chain triage
is tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(platform-go-ci): eliminate remaining 25 cascade failures + harden auth

Takes Platform (Go) CI from 1 remaining failure (post–first pass) to 0.
Fixing `TestRequireCallerOwnsOrg_NotOrgTokenCaller`'s panic unmasked ~25
pre-existing handler-package failures that were silently hidden because
the panic killed the test binary mid-run. All are now fixed.

## Prod change
`org_plugin_allowlist.go#requireOrgOwnership` now denies unanchored
org-tokens (org_id NULL in DB) instead of treating them as session/admin.
The stated contract in `requireCallerOwnsOrg`'s comment already said
"those callers get callerOrg="" and are denied"; the downstream check
was the gap. Distinguishes the two `callerOrg == ""` paths by reading
`c.Get("org_token_id")` — key present → unanchored token → deny;
absent → session/ADMIN_TOKEN → allow.
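
A minimal sketch of that decision, assuming a map in place of the gin context and a hypothetical function name `allowCaller`. The non-empty branch (caller org must equal the target org) is an assumption about what ownership means here, not something taken from the source.

```go
package main

import "fmt"

// requestContext is a hypothetical stand-in for gin's key/value store.
type requestContext map[string]any

// allowCaller sketches the two callerOrg == "" paths described above.
// Key present -> unanchored org token -> deny.
// Key absent  -> session or ADMIN_TOKEN caller -> allow.
func allowCaller(ctx requestContext, callerOrg, targetOrg string) bool {
	if callerOrg == "" {
		_, isOrgToken := ctx["org_token_id"]
		return !isOrgToken
	}
	// Assumption for illustration: an anchored token may only act on its own org.
	return callerOrg == targetOrg
}

func main() {
	session := requestContext{}
	unanchored := requestContext{"org_token_id": "tok-1"}

	fmt.Println(allowCaller(session, "", "org-a"))         // true
	fmt.Println(allowCaller(unanchored, "", "org-a"))      // false
	fmt.Println(allowCaller(unanchored, "org-a", "org-a")) // true
}
```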

## Tests fixed by class

**Request-less test-context panic** (7 tests, `org_plugin_allowlist_test.go`):
added `httptest.NewRequest(...)` to each bare `gin.CreateTestContext` so
the DB path in `requireCallerOwnsOrg` can read `c.Request.Context()`
without nil-deref.

**Workspace scan drift — `max_concurrent_tasks` 21st column** (8 tests):
- `TestWorkspaceGet_Success`, `_FinancialFieldsStripped`, `_SensitiveFieldsStripped`
- `TestWorkspaceBudget_Get_NilLimit`, `_WithLimit` (+ shared `wsColumns`)
- `TestWorkspaceBudget_A2A_UnderLimitPassesThrough`, `_NilLimitPassesThrough`,
  `_DBErrorFailOpen` — each also needed `allowLoopbackForTest(t)` because
  the SSRF guard now blocks `httptest.NewServer`'s 127.0.0.1 URL.

**Org-token INSERT param drift — added `org_id` 5th param** (5 tests,
`org_tokens_test.go`): `TestOrgTokenHandler_Create_*` (4) get a 5th
`nil` `WithArgs` arg; `TestOrgTokenHandler_List_HappyPath` gets `org_id`
as the 4th column in its mock row.

**ReplaceFiles/WriteFile restart-cascade SELECT shape change** (3 tests,
`template_import_test.go` + `templates_test.go`): handler now selects
`name, instance_id, runtime` for the post-write restart cascade — tests
now pin the full 3-column shape instead of just `SELECT name`.

**GitHub webhook forwarding** (2 tests, `webhooks_test.go`): added
`allowLoopbackForTest(t)` — same SSRF-guard / loopback-server mismatch
as the budget A2A tests.
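
The SSRF-guard / loopback-server mismatch can be reproduced with the standard library alone. The sketch below is an assumption-laden illustration: `allowLoopback` and `isBlockedByGuard` are hypothetical names, the check covers literal loopback IPs only, and the project's real guard and `allowLoopbackForTest` helper may work differently.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// allowLoopback is a hypothetical package-level switch. A test helper in the
// spirit of allowLoopbackForTest would set it and restore it on cleanup.
var allowLoopback = false

// isBlockedByGuard sketches a loopback-only check. Hostnames that are not
// literal IPs are out of scope for this sketch and are not blocked here.
func isBlockedByGuard(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return true // unparseable input is rejected
	}
	ip := net.ParseIP(u.Hostname())
	if ip == nil {
		return false
	}
	return ip.IsLoopback() && !allowLoopback
}

func main() {
	// httptest.NewServer listens on a 127.0.0.1 URL of this shape.
	target := "http://127.0.0.1:8080/webhook"

	fmt.Println(isBlockedByGuard(target)) // true
	allowLoopback = true
	fmt.Println(isBlockedByGuard(target)) // false
}
```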

**DNS-dependent sentinel hostname** (2 tests): `TestIsSafeURL/public_*`
+ `TestValidateAgentURL/valid_public_*` used `agent.example.com` which
is NXDOMAIN on most resolvers; switched to `example.com` itself (RFC-2606,
resolves globally via Cloudflare Anycast).

**Register C18 hijack assertion** (`registry_test.go`): attacker URL
was `attacker.example.com` (NXDOMAIN) → `validateAgentURL` rejected
with 400 before the C18 auth gate could fire 401. Switched to
`example.com` so the test actually exercises the C18 gate.

**Plugin install error vocabulary** (`plugins_test.go`): handler now
returns generic "invalid plugin source" instead of leaking the internal
`ParseSource` "empty spec" string to the HTTP surface. Test assertion
updated; "empty spec" still covered at the unit level in `plugins/source_test.go`.

**seedInitialMemories tests tripping redactSecrets** (3 tests,
`workspace_provision_test.go`): content was `strings.Repeat("X", N)`
which matches the BASE64_BLOB redactor (33+ chars of `[A-Za-z0-9+/]`)
and got replaced with `[REDACTED:BASE64_BLOB]` before INSERT, making
the `WithArgs` assertion mismatch. Switched to a space-containing
`"hello world "` pattern that breaks the run. Also fixed an unrelated
pre-existing bug in `TestSeedInitialMemories_Truncation` where
`copy([]byte(largeContent), "X")` was a no-op (strings are immutable
in Go — the copy modified a throwaway slice).

Net: Platform (Go) handlers package is now fully green on `go test -race`.
Unblocks PRs #1738, #1743, and any future handlers-package work that was
inheriting the 12→25 baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 07:14:33 +00:00
.ci-trigger chore: PM-triggered CI re-run 2026-04-21 15:40:21 +00:00
.githooks chore: add mol_pk_ and cfut_ to pre-commit secret scanner 2026-04-18 07:38:48 -07:00
.github/workflows ci: canary-verify graceful-skip + draft auto-promote staging→main 2026-04-22 14:40:28 -07:00
canvas chore: extract ContextMenu Zustand fix + a2a_proxy local-docker SSRF bypass + workspace-server Dockerfile GID entrypoint 2026-04-22 20:00:16 -07:00
docs docs(marketing+research): move sensitive strategy + research to internal repo 2026-04-22 17:53:55 -07:00
infra fix: remaining platform/ path references in scripts, tests, compose 2026-04-18 00:32:03 -07:00
marketing docs(assets): add Phase 30 token lifecycle card + canvas fleet mockup 2026-04-21 12:12:17 +00:00
org-templates/molecule-dev chore: remove org-templates/molecule-dev from git tracking 2026-04-20 11:47:13 -07:00
research research: add enterprise-case-study-pipeline-targeting-brief.md 2026-04-21 16:46:57 +00:00
scripts security: remove hardcoded API keys from post-rebuild-setup.sh 2026-04-20 13:02:52 -07:00
tests fix(e2e): delegation raw curl missing X-Molecule-Org-Id 2026-04-21 10:41:17 -07:00
workspace feat(workspace): 45-min gh-token refresh daemon + credential helper cache 2026-04-22 19:52:46 -07:00
workspace-server fix(platform-go-ci): align test mocks with schema drift + org_id context contract (#1755) 2026-04-23 07:14:33 +00:00
.env.example Merge pull request #922 from Molecule-AI/infra/issue-894-anthropic-api-key-docs 2026-04-17 21:40:23 -07:00
.gitattributes chore: final open-source cleanup — binary, stale paths, private refs 2026-04-18 00:38:55 -07:00
.gitignore merge main into staging for #1070 promotion 2026-04-20 08:41:58 -07:00
.mcp.json.example fix(security): GLOBAL memory delimiter spoofing + pin MCP npm version 2026-04-18 11:09:24 -07:00
CODE_OF_CONDUCT.md chore: open-source preparation — scrub secrets, add community files 2026-04-18 00:10:56 -07:00
comment-1172.json docs(marketing): Phase 30 launch — content blog posts, DevRel assets, and execution suite 2026-04-21 06:22:27 +00:00
comment-1173.json docs(marketing): Phase 30 launch — content blog posts, DevRel assets, and execution suite 2026-04-21 06:22:27 +00:00
CONTRIBUTING.md fix(docs): update cd commands for workspace-server/ and workspace/ renames 2026-04-18 01:24:09 -07:00
docker-compose.infra.yml chore: open-source preparation — scrub secrets, add community files 2026-04-18 00:10:56 -07:00
docker-compose.yml fix(canvas): add NEXT_PUBLIC_ADMIN_TOKEN + CSP_DEV_MODE to docker-compose 2026-04-20 12:19:12 -07:00
LICENSE
manifest.json chore: final open-source cleanup — binary, stale paths, private refs 2026-04-18 00:38:55 -07:00
railway.toml fix: railway.toml buildContext must be repo root for workspace-server COPY paths 2026-04-18 00:29:38 -07:00
README.md fix(docs): update cd commands for workspace-server/ and workspace/ renames 2026-04-18 01:24:09 -07:00
README.zh-CN.md fix(docs): update cd commands for workspace-server/ and workspace/ renames 2026-04-18 01:24:09 -07:00
render.yaml chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
tick-reflections-temp.md tick: 2026-04-21 ~04:25Z — PR #1240 merged, PRs #1247/#1248 in flight, CI slow but active 2026-04-21 03:18:29 +00:00


English | 中文

The Org-Native Control Plane For Heterogeneous AI Agent Teams

The world's most powerful governance platform for AI agent teams.

License: BSL 1.1


Visual Canvas • Runtime Compatibility • Hierarchical Memory • Skill Evolution • Operational Guardrails

Docs Home • Quick Start • Architecture • Platform API • Workspace Runtime



The Pitch

Molecule AI is the most powerful way to govern an AI agent organization in production.

It combines the parts that are usually scattered across demos, internal glue code, and framework-specific tooling into one product:

  • one org-native control plane for teams, roles, hierarchy, and lifecycle
  • one runtime layer that lets LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, and OpenClaw run side by side
  • one memory model that keeps recall, sharing, and skill evolution aligned with organizational boundaries
  • one operational surface for observing, pausing, restarting, inspecting, and improving live workspaces

Most teams can build a workflow, a strong single agent, a coding agent, or a custom multi-agent graph.

Very few teams can run all of that as a governed organization with clear structure, durable memory boundaries, and production operations.

That is the gap Molecule AI closes.

Why Molecule AI Feels Different

1. The node is a role, not a task

In Molecule AI, a workspace is an organizational role. That role can begin as one agent, later expand into a sub-team, and still keep the same external identity, hierarchy position, memory boundary, and A2A interface.

2. The org chart is the topology

You do not wire collaboration paths by hand. Hierarchy defines the default communication surface. The structure is not decorative UI. It is part of the operating model.

3. Runtime choice stops being a dead-end decision

LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, and OpenClaw can all plug into the same workspace abstraction. Teams can standardize governance without forcing every group onto one runtime.

4. Memory is treated like infrastructure

Molecule AI's HMA approach is designed around organizational boundaries, not just “store more context somewhere.” Durable recall, scoped sharing, awareness namespaces, and skill promotion are all part of one coherent system.

5. It comes with a real control plane

Registry, heartbeats, restart, pause/resume, activity logs, approvals, terminal access, files, traces, bundles, templates, and WebSocket fanout are not afterthoughts. They are first-class parts of the platform.

The Category Gap Molecule AI Fills

| Category | What it does well | Where it breaks | What Molecule AI adds |
| --- | --- | --- | --- |
| Workflow builders | Visual task automation | Nodes are tasks, not durable organizational roles | Role-native workspaces, hierarchy, long-lived teams |
| Agent frameworks | Strong runtime semantics | Weak control plane and weak org-level operations | Unified lifecycle, canvas, registry, policies, observability |
| Coding agents | Excellent local execution | Usually not designed as team infrastructure | Workspace abstraction, A2A collaboration, platform ops |
| Custom multi-agent graphs | Full flexibility | Brittle topology and governance sprawl | Standardized operating model without losing runtime freedom |

What Makes Molecule AI Defensible

| Advantage | Why it matters in practice |
| --- | --- |
| Role-native workspace abstraction | Your org structure survives model swaps, framework changes, and team expansion |
| Fractal team expansion | A single specialist can become a managed department without breaking upstream integrations |
| Heterogeneous runtime compatibility | Different teams can keep their preferred agent architecture while sharing one control plane |
| HMA + awareness namespaces | Memory sharing follows hierarchy instead of leaking across the whole system |
| Skill evolution loop | Durable successful workflows can graduate from memory into reusable, hot-reloadable skills |
| WebSocket-first operational UX | The canvas reflects task state, structure changes, and A2A responses in near real time |
| Global secrets with local override | Centralize provider access, then override only where a workspace needs specialized credentials |
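
As an illustration of "global secrets with local override", the lookup order can be sketched as a two-level map read. Everything here is hypothetical (the `resolveSecret` name, the map representation, the key names); it shows only the precedence rule, not how the platform stores or resolves secrets.

```go
package main

import "fmt"

// resolveSecret sketches the precedence rule: a workspace-level value wins,
// otherwise the global value is used. The bool reports whether any value exists.
func resolveSecret(key string, global, workspace map[string]string) (string, bool) {
	if v, ok := workspace[key]; ok {
		return v, true
	}
	v, ok := global[key]
	return v, ok
}

func main() {
	// Hypothetical key names and placeholder values, for illustration only.
	global := map[string]string{"PROVIDER_API_KEY": "global-key", "TRACING_KEY": "trace-key"}
	workspace := map[string]string{"PROVIDER_API_KEY": "workspace-key"}

	v, _ := resolveSecret("PROVIDER_API_KEY", global, workspace)
	fmt.Println(v) // workspace-key

	v, _ = resolveSecret("TRACING_KEY", global, workspace)
	fmt.Println(v) // trace-key
}
```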

Runtime Compatibility, Compared

Molecule AI is not trying to replace the frameworks below. It is the system that makes them easier to run together.

| Runtime / architecture | Status in current repo | Native strength | What Molecule AI adds |
| --- | --- | --- | --- |
| LangGraph | Shipping on main | Graph control, tool use, Python extensibility | Canvas orchestration, hierarchy routing, A2A, memory scopes, operational lifecycle |
| DeepAgents | Shipping on main | Deeper planning and decomposition | Same workspace contract, team topology, activity stream, restart behavior |
| Claude Code | Shipping on main | Real coding workflows, CLI-native continuity | Secure workspace abstraction, A2A delegation, org boundaries, shared control plane |
| CrewAI | Shipping on main | Role-based crews | Persistent workspace identity, policy consistency, shared canvas and registry |
| AutoGen | Shipping on main | Assistant/tool orchestration | Standardized deployment, hierarchy-aware collaboration, shared ops plane |
| OpenClaw | Shipping on main | CLI-native runtime with its own session model | Workspace lifecycle, templates, activity logs, topology-aware collaboration |
| NemoClaw | WIP on feat/nemoclaw-t4-docker | NVIDIA-oriented runtime path | Planned to join the same abstraction once merged; not yet part of main |

This is the key idea: many agent runtimes, one organizational operating system.

Why The Memory Architecture Compounds

Most projects stop at “we added memory.” Molecule AI pushes further:

| Conventional memory setup | Molecule AI |
| --- | --- |
| Flat store or weak namespaces | Hierarchy-aligned LOCAL, TEAM, GLOBAL scopes |
| Sharing is easy to overexpose | Sharing is explicit and structure-aware |
| Memory and procedure get mixed together | Memory stores durable facts; skills store repeatable procedure |
| Every agent can become over-privileged | Workspace awareness namespaces reduce blast radius |
| UI memory and runtime memory blur together | Separate surfaces for scoped agent memory, key/value workspace memory, and recall |

The flywheel

Task execution
   -> durable insight captured in memory
   -> repeated success becomes a signal
   -> workflow promoted into a reusable skill
   -> skill hot-reloads into the runtime
   -> future work gets faster and more reliable

This is one of Molecule AI's strongest long-term advantages: the system can get more operationally capable without turning into one giant hidden prompt.
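
The "repeated success becomes a signal" step of the flywheel can be pictured as a counter with a promotion threshold. This is a loose sketch under stated assumptions: the threshold value, the `tracker` type, and the workflow name are all invented for illustration, and the README does not say how promotion is actually decided.

```go
package main

import "fmt"

// promotionThreshold is an invented value for illustration only.
const promotionThreshold = 3

// tracker is a hypothetical in-memory record of successes and promoted skills.
type tracker struct {
	successes map[string]int
	skills    map[string]bool
}

func newTracker() *tracker {
	return &tracker{successes: map[string]int{}, skills: map[string]bool{}}
}

// recordSuccess counts one successful run and reports whether this call
// promoted the workflow into a skill. A workflow is promoted at most once.
func (t *tracker) recordSuccess(workflow string) bool {
	t.successes[workflow]++
	if t.successes[workflow] >= promotionThreshold && !t.skills[workflow] {
		t.skills[workflow] = true
		return true
	}
	return false
}

func main() {
	t := newTracker()
	for i := 1; i <= 4; i++ {
		promoted := t.recordSuccess("triage-failing-ci")
		fmt.Printf("run %d promoted=%v\n", i, promoted)
	}
}
```

Only the third run reports a promotion; later runs leave the existing skill in place.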

Self-Improving Agent Teams, Built Into Molecule AI

Most agent systems stop at "a smart runtime." Molecule AI pushes further: it gives teams a way to capture what worked, promote repeatable procedure into skills, reload those improvements into live workspaces, and keep the whole loop visible at the platform level.

| Positioning lens | Conventional self-improving agent pattern | Molecule AI |
| --- | --- | --- |
| Unit of improvement | A single agent session or runtime | A workspace, a team, and eventually the whole org graph |
| Operational surface | Mostly hidden inside the agent loop | Visible in the platform, Canvas, activity stream, memory surfaces, and runtime controls |
| Strategic outcome | A smarter agent | A compounding organization with durable knowledge and governed reusable skills |

Where that shows up in Molecule AI

| Core mechanism | Molecule AI module(s) | Why it matters |
| --- | --- | --- |
| Durable memory that survives sessions | workspace/builtin_tools/memory.py, workspace/builtin_tools/awareness_client.py, workspace-server/internal/handlers/memories.go | Memory is not just durable, it is workspace-scoped and can route into awareness namespaces tied to the org structure |
| Cross-session recall | workspace-server/internal/handlers/activity.go (/workspaces/:id/session-search) | Recall spans both activity history and memory rows, so the system can search what happened and what was learned without inventing a separate hidden store |
| Skills built from experience | workspace/builtin_tools/memory.py (_maybe_log_skill_promotion) | Promotion from memory into a skill candidate is surfaced as an explicit platform activity, not a silent internal side effect |
| Skill improvement during use | workspace/skill_loader/watcher.py, workspace/skill_loader/loader.py, workspace/main.py | Skills hot-reload into the live runtime, so improvements become available on the next A2A task without restarting the workspace |
| Persistent skill lifecycle | workspace-server/cmd/cli/cmd_agent_skill.go, workspace/plugins.py | Skills are not just generated once; they can be audited, installed, published, shared, mounted by plugins, and governed as reusable operational assets |

Why this matters in Molecule AI

  1. The learning loop is org-aware, not just session-aware. Memory can live at LOCAL, TEAM, or GLOBAL scope, and awareness namespaces give each workspace a durable identity boundary.

  2. The learning loop is visible to operators. Promotion events, activity logs, current-task updates, traces, and WebSocket fanout mean self-improvement is part of the control plane, not a hidden black box.

  3. The learning loop compounds across teams, not just one agent. A workflow learned by one workspace can become a governed skill, reload into the runtime, appear in the Agent Card, and become usable inside a larger organizational hierarchy.

The result is not just “an agent that learns.” It is an organization that gets more capable as its workspaces accumulate durable memory and reusable procedure.

What Ships In main

Canvas

  • Next.js 15 + React Flow + Zustand
  • drag-to-nest team building
  • empty-state deployment + onboarding wizard
  • template palette
  • bundle import/export
  • 10-tab side panel for chat, activity, details, skills, terminal, config, files, memory, traces, and events

Platform

  • Go/Gin control plane
  • workspace CRUD and provisioning
  • registry and heartbeats
  • browser-safe A2A proxy
  • team expansion/collapse
  • activity logs and approvals
  • secrets and global secrets
  • files API, terminal, bundles, templates, viewport persistence

Runtime

  • unified workspace/ image
  • adapter-driven execution
  • Agent Card registration
  • awareness-backed memory integration
  • plugin-mounted shared rules/skills
  • hot-reloadable local skills
  • coordinator-only delegation path

Ops

  • Langfuse traces
  • current-task reporting
  • pause/resume/restart flows
  • activity streaming
  • runtime tiers
  • direct workspace inspection through terminal and files

Built For Teams That Need More Than A Demo

Molecule AI is especially strong when you need to run:

  • AI engineering teams with PM / Dev Lead / QA / Research / Ops roles
  • mixed runtime organizations where one team prefers LangGraph and another prefers Claude Code
  • long-lived agent organizations that need memory boundaries and reusable procedures
  • internal platforms that want to expose agent teams as structured infrastructure, not ad hoc scripts

Architecture

Canvas (Next.js :3000)  <--HTTP / WS-->  Platform (Go :8080)  <---> Postgres + Redis
         |                                          |
         |                                          +--> Docker provisioner / bundles / templates / secrets
         |
         +-------------------- shows --------------------> workspaces, teams, tasks, traces, events

Workspace Runtime (Python image with adapters)
  - LangGraph / DeepAgents / Claude Code / CrewAI / AutoGen / OpenClaw
  - Agent Card + A2A server
  - heartbeat + activity + awareness-backed memory
  - skills + plugins + hot reload

Quick Start

git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo

./infra/scripts/setup.sh
# Boots Postgres (:5432), Redis (:6379), Langfuse (:3001),
# and Temporal (:7233 gRPC, :8233 UI) on the shared
# `molecule-monorepo-net` Docker network. Temporal runs with
# no auth on localhost — dev-only; production must gate it.

cd workspace-server
go run ./cmd/server

cd ../canvas
npm install
npm run dev

Then open http://localhost:3000:

  1. Deploy a template or create a blank workspace from the empty state.
  2. Follow the onboarding guide into Config.
  3. Add a provider key in Secrets & API Keys.
  4. Open Chat and send the first task.

Documentation Map

Current Scope

The current main branch already includes the core platform, canvas, memory model, six production adapters, skill lifecycle, and operational surfaces. Adjacent runtime work such as NemoClaw remains branch-level until merged, and this README keeps that distinction explicit on purpose.

License

Business Source License 1.1 — copyright © 2025 Molecule AI.

Personal, internal, and non-commercial use is permitted without restriction. You may not use the Licensed Work to offer a competing product or service. On January 1, 2029, the license converts to Apache 2.0.