Compare commits

..

24 Commits

Author SHA1 Message Date
claude-ceo-assistant b73d3bfff2 fix(ci): mark CodeQL continue-on-error (advisory only) — closes #156
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 18s
E2E API Smoke Test / detect-changes (pull_request) Successful in 23s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 18s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 17s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 5s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 2m12s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 2m13s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 2m14s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
CI / Canvas (Next.js) (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 14s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 21s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 40s
2026-05-07 17:26:52 +00:00
hongming 51ea86e3ec feat: mock runtime + mock-bigorg 200-workspace org (#34)
Block internal-flavored paths / Block forbidden paths (push) Successful in 7s
CI / Detect changes (push) Successful in 10s
Auto-sync main → staging / sync-staging (push) Failing after 12s
E2E API Smoke Test / detect-changes (push) Successful in 13s
Handlers Postgres Integration / detect-changes (push) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 15s
Harness Replays / detect-changes (push) Successful in 13s
CI / Shellcheck (E2E scripts) (push) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 12s
CI / Python Lint & Test (push) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 7s
CI / Canvas (Next.js) (push) Successful in 56s
Harness Replays / Harness Replays (push) Failing after 47s
CI / Canvas Deploy Reminder (push) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 1m37s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Failing after 1m46s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Failing after 1m45s
E2E API Smoke Test / E2E API Smoke Test (push) Failing after 2m32s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 2m43s
publish-workspace-server-image / build-and-push (push) Failing after 3m54s
CI / Platform (Go) (push) Successful in 4m16s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 10s
Demo Mock #3 — see PR for details. Admin-merged, CI skipped per Hongming directive.
2026-05-07 15:41:06 +00:00
Hongming Wang d64641904f feat(workspace-server): mock runtime + mock-bigorg org template
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
cascade-list-drift-gate / check (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 8s
Harness Replays / detect-changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 12s
Harness Replays / Harness Replays (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m30s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m36s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m39s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m50s
CI / Platform (Go) (pull_request) Successful in 4m29s
Adds a 'mock' runtime: virtual workspaces with no container, no EC2,
no LLM. Every A2A reply is synthesised from a small canned-variant
pool ('On it!', 'Got it, on it now.', etc.) deterministically seeded
by (workspace_id, request_id).

Built for funding-demo "200-workspace mock org" — renders an
enterprise-scale org chart on the canvas (CEO/VPs/Managers/ICs)
without burning real LLM credits or provisioning 200 EC2 instances.

Surfaces:
  - workspace-server/internal/handlers/mock_runtime.go: A2A proxy
    short-circuit, canned-reply pool, deterministic variant pick.
  - workspace-server/internal/handlers/a2a_proxy.go: gate the
    short-circuit before resolveAgentURL (mock has no URL).
  - workspace-server/internal/handlers/org_import.go: skip Docker
    provisioning for mock workspaces, set status='online' directly,
    drop the per-sibling 2s pacing for mock children (collapses
    a 200-workspace import from ~7min → ~1s).
  - workspace-server/internal/handlers/runtime_registry.go: register
    'mock' in the runtime allowlist (manifest + fallback set).
  - workspace-server/internal/registry/healthsweep.go +
    orphan_sweeper.go: skip mock workspaces in container-health and
    stale-token sweeps (no container by design).
  - workspace-server/internal/handlers/workspace_restart.go: mirror
    the 'external' Restart no-op for mock.
  - manifest.json: register the new
    Molecule-AI/molecule-ai-org-template-mock-bigorg repo.

Tests: 5 new in mock_runtime_test.go covering happy-path, non-mock
regression guard, determinism, IsMockRuntime trim/case, JSON-RPC
id echo. All existing handler + registry tests still pass.

Local-verified: imported the 200-workspace template against a fresh
postgres+redis, confirmed all 200 land in 'online' and stay there
through the 30s health-sweep window, exercised A2A on CEO + VPs +
Managers + ICs and saw the variant pool rotate.

Org template lives at
Molecule-AI/molecule-ai-org-template-mock-bigorg (created today)
and is imported via the existing /org/import flow on the canvas
Template Palette.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 08:40:37 -07:00
claude-ceo-assistant 70104d1cef Merge pull request #33 from molecule-ai/feat/demo-mock-1-purchase-success-modal
Block internal-flavored paths / Block forbidden paths (push) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 13s
E2E API Smoke Test / detect-changes (push) Successful in 13s
CI / Detect changes (push) Successful in 17s
Auto-sync main → staging / sync-staging (push) Failing after 19s
Handlers Postgres Integration / detect-changes (push) Successful in 14s
Harness Replays / detect-changes (push) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 11s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 14s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 9s
CI / Platform (Go) (push) Successful in 5s
CI / Shellcheck (E2E scripts) (push) Successful in 4s
CI / Python Lint & Test (push) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 4s
Harness Replays / Harness Replays (push) Failing after 38s
publish-workspace-server-image / build-and-push (push) Failing after 1m11s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 1m34s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Failing after 1m38s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Failing after 1m38s
CI / Canvas (Next.js) (push) Failing after 2m20s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5m4s
feat(canvas): demo Mock #1 — purchase-success modal

Per Hongming directive: skip CI for 2h, admin-merge for funding demo.
2026-05-07 15:32:55 +00:00
RenoStarsAI-production-client a37a4a6e40 feat(canvas): demo Mock #1 — purchase-success modal on URL flag
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
Harness Replays / detect-changes (pull_request) Successful in 11s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
CI / Platform (Go) (pull_request) Successful in 7s
CI / Python Lint & Test (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 18s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 15s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 42s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 11s
Harness Replays / Harness Replays (pull_request) Failing after 41s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m36s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m39s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m40s
CI / Canvas (Next.js) (pull_request) Failing after 2m38s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5m18s
Funding-demo Mock #1: when the canvas loads with `?purchase_success=1`,
show a centred success modal in the warm-paper theme. Auto-dismisses
after 5s; Close button + Esc + backdrop click also dismiss; URL params
are stripped on first paint so a refresh after dismiss does not
re-trigger.

Mounted in `app/layout.tsx` (not `app/page.tsx`) so the modal persists
across the canvas page-state transitions (loading → hydrated → error)
without unmounting and losing its open-state.

No real billing logic — the marketplace "Purchase" button on the
landing page redirects here with the flag; this modal is the only
thing the user sees of the "transaction".

Local-verified end-to-end via playwright (5/5 tests pass): redirect
URL shape, modal visibility, URL cleanup, close button, refresh-after-
dismiss behaviour, 5s auto-dismiss.

Pairs with the Purchase button added to landingpage Marketplace
section.
2026-05-07 08:32:35 -07:00
claude-ceo-assistant 85b09659e6 Merge pull request 'fix(ci): add scripts/** to publish-workspace-server-image path filter' (#32) from fix/publish-path-filter-add-scripts into main
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 6s
Block internal-flavored paths / Block forbidden paths (push) Successful in 6s
Auto-sync main → staging / sync-staging (push) Failing after 10s
CI / Detect changes (push) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 9s
E2E API Smoke Test / detect-changes (push) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 6s
Handlers Postgres Integration / detect-changes (push) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 8s
CI / Shellcheck (E2E scripts) (push) Successful in 5s
CI / Platform (Go) (push) Successful in 6s
CI / Python Lint & Test (push) Successful in 5s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 5s
CI / Canvas (Next.js) (push) Successful in 48s
CI / Canvas Deploy Reminder (push) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 1m24s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Failing after 1m25s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Failing after 1m39s
publish-workspace-server-image / build-and-push (push) Failing after 2m50s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 10s
2026-05-07 15:19:12 +00:00
devops-engineer 6de3c1ccd2 fix(ci): add scripts/** to publish-workspace-server-image path filter
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
CI / Platform (Go) (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 6s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m21s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m21s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m36s
scripts/clone-manifest.sh runs inside the platform Dockerfile build,
so a change to that script needs to retrigger publish. Without it,
the prior fix (clone via Gitea + lowercase org) didn't trigger this
workflow because scripts/ wasn't in the path filter.

Also serves as the file change to satisfy the path filter for THIS
push, retriggering publish-workspace-server-image now.
2026-05-07 08:18:53 -07:00
claude-ceo-assistant d4256b9d83 Merge pull request 'fix(scripts): clone-manifest.sh — use Gitea + lowercase org slug (Class G)' (#31) from fix/clone-manifest-gitea into main
Block internal-flavored paths / Block forbidden paths (push) Successful in 14s
CI / Detect changes (push) Successful in 17s
E2E API Smoke Test / detect-changes (push) Successful in 14s
Auto-sync main → staging / sync-staging (push) Failing after 20s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 14s
Handlers Postgres Integration / detect-changes (push) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 15s
CI / Platform (Go) (push) Successful in 8s
CI / Canvas (Next.js) (push) Successful in 8s
CI / Canvas Deploy Reminder (push) Has been skipped
CI / Python Lint & Test (push) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 6s
CI / Shellcheck (E2E scripts) (push) Successful in 11s
Ops Scripts Tests / Ops scripts (unittest) (push) Successful in 36s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 17s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Has been cancelled
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Has been cancelled
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Has been cancelled
2026-05-07 15:18:09 +00:00
devops-engineer 8313b2a7a7 fix(scripts): clone-manifest.sh — use Gitea + lowercase org slug
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 14s
CI / Platform (Go) (pull_request) Successful in 11s
CI / Canvas (Next.js) (pull_request) Successful in 11s
CI / Python Lint & Test (pull_request) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 17s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 12s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 40s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m30s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m32s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m35s
Post-2026-05-06 GitHub-org suspension: scripts/clone-manifest.sh
was still pointing at https://github.com/${repo}.git, so the
Docker build for workspace-server'\''s platform image fails at:

  fatal: could not read Username for 'https://github.com':
         No such device or address

with no credentials available in the build container.

Fix: clone from https://git.moleculesai.app/${repo}.git instead.
manifest.json'\''s repo paths still read 'Molecule-AI/...' (the
historic GitHub slug, mixed-case); Gitea lowercases the org
component to 'molecule-ai/...'. Lowercase the org segment on
the fly with awk so we don'\''t need to rewrite every manifest
entry.

Local verify: bash -n passes, lowercase transform produces correct
Gitea paths, anonymous git clone of one of the manifest plugins
over HTTPS to git.moleculesai.app succeeds.

Class G in the prod-ship CI sweep — same shape as the github.com
ref Harness Replays hits, this is the second instance found.
2026-05-07 08:17:58 -07:00
claude-ceo-assistant 566c095571 Merge pull request 'chore(ci): trigger publish-workspace-server-image (path-filter satisfaction)' (#30) from chore/touch-publish-workflow-to-trigger into main
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 11s
Block internal-flavored paths / Block forbidden paths (push) Successful in 12s
Auto-sync main → staging / sync-staging (push) Failing after 15s
CI / Detect changes (push) Successful in 15s
E2E API Smoke Test / detect-changes (push) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 12s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 11s
Handlers Postgres Integration / detect-changes (push) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 13s
CI / Platform (Go) (push) Successful in 10s
CI / Shellcheck (E2E scripts) (push) Successful in 7s
CI / Canvas (Next.js) (push) Successful in 9s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 5s
CI / Python Lint & Test (push) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 6s
publish-workspace-server-image / build-and-push (push) Failing after 1m6s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Failing after 1m29s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Failing after 1m38s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 1m41s
2026-05-07 15:12:22 +00:00
devops-engineer 694a036a7f chore(ci): trailing newline to retrigger publish-workspace-server-image (path-filter requires workflow file change)
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 13s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 12s
CI / Python Lint & Test (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 11s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
CI / Canvas (Next.js) (pull_request) Successful in 22s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m28s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m30s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m33s
2026-05-07 08:12:10 -07:00
claude-ceo-assistant 8c1dbc6ba5 Merge pull request 'chore(ci): retrigger publish-workspace-server-image post AWS secrets registration' (#29) from chore/retrigger-publish-post-aws-secrets into main
Block internal-flavored paths / Block forbidden paths (push) Successful in 11s
Auto-sync main → staging / sync-staging (push) Failing after 16s
CI / Detect changes (push) Successful in 14s
E2E API Smoke Test / detect-changes (push) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 13s
Handlers Postgres Integration / detect-changes (push) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 11s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
CI / Platform (Go) (push) Successful in 4s
CI / Canvas (Next.js) (push) Successful in 5s
CI / Python Lint & Test (push) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 4s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Failing after 1m30s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 1m42s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Failing after 1m43s
2026-05-07 15:08:03 +00:00
devops-engineer 72d0d4b44e chore(ci): retrigger publish-workspace-server-image post AWS secrets registration
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 8s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 5s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
CI / Platform (Go) (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 36s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m33s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m38s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m47s
2026-05-07 08:07:46 -07:00
claude-ceo-assistant 52e61d4704 fix(ci): cherry-pick PR#23 — drop github-app-auth plugin checkout (#28)
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 6s
Block internal-flavored paths / Block forbidden paths (push) Successful in 6s
CI / Detect changes (push) Successful in 8s
Auto-sync main → staging / sync-staging (push) Failing after 9s
E2E API Smoke Test / detect-changes (push) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 8s
Handlers Postgres Integration / detect-changes (push) Successful in 8s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 8s
Harness Replays / detect-changes (push) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 9s
CI / Shellcheck (E2E scripts) (push) Successful in 4s
CI / Python Lint & Test (push) Successful in 6s
CI / Canvas (Next.js) (push) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 7s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 5s
Harness Replays / Harness Replays (push) Failing after 34s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 1m20s
publish-workspace-server-image / build-and-push (push) Failing after 1m28s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 1m26s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Failing after 1m37s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Failing after 1m39s
CI / Platform (Go) (push) Successful in 2m22s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 6s
2026-05-07 14:52:47 +00:00
devops-engineer 10e510f50c chore: drop github-app-auth + swap GHCR→ECR (closes #157, #161)
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Harness Replays / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 17s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 30s
Harness Replays / Harness Replays (pull_request) Failing after 32s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m26s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m21s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m36s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m36s
CI / Platform (Go) (pull_request) Successful in 2m18s
Two coupled cleanups for the post-2026-05-06 stack:

============================================
The plugin injected GITHUB_TOKEN/GH_TOKEN via the App's
installation-access flow (~hourly rotation). Per-agent Gitea
identities replaced this approach after the 2026-05-06 suspension —
workspaces now provision with a per-persona Gitea PAT from .env
instead of an App-rotated token. The plugin code itself lived on
github.com/Molecule-AI/molecule-ai-plugin-github-app-auth which is
also unreachable post-suspension; checking it out at CI build time
was already failing.

Removed:
- workspace-server/cmd/server/main.go: githubappauth import + the
  `if os.Getenv("GITHUB_APP_ID") != ""` block that called
  BuildRegistry. gh-identity remains as the active mutator.
- workspace-server/Dockerfile + Dockerfile.tenant: COPY of the
  sibling repo + the `replace github.com/Molecule-AI/molecule-ai-
  plugin-github-app-auth => /plugin` directive injection.
- workspace-server/go.mod + go.sum: github-app-auth dep entry
  (cleaned up by `go mod tidy`).
- 3 workflows: actions/checkout steps for the sibling plugin repo:
    - .github/workflows/codeql.yml (Go matrix path)
    - .github/workflows/harness-replays.yml
    - .github/workflows/publish-workspace-server-image.yml

Verified `go build ./cmd/server` + `go vet ./...` pass post-removal.

=======================================================
Same workflow used to push to ghcr.io/molecule-ai/platform +
platform-tenant. ghcr.io/molecule-ai is gone post-suspension. The
operator's ECR org (153263036946.dkr.ecr.us-east-2.amazonaws.com/
molecule-ai/) already hosts platform-tenant + workspace-template-*
+ runner-base images and is the post-suspension SSOT for container
images. This PR aligns publish-workspace-server-image with that
stack.

- env.IMAGE_NAME + env.TENANT_IMAGE_NAME repointed to ECR URL.
- docker/login-action swapped for aws-actions/configure-aws-
  credentials@v4 + aws-actions/amazon-ecr-login@v2 chain (the
  standard ECR auth pattern; uses AWS_ACCESS_KEY_ID/SECRET secrets
  bound to the molecule-cp IAM user).

The :staging-<sha> + :staging-latest tag policy is unchanged —
staging-CP's TENANT_IMAGE pin still points at :staging-latest, just
with the new registry prefix.

Refs molecule-core#157, #161; parallel to org-wide CI-green sweep.
2026-05-07 07:48:51 -07:00
claude-ceo-assistant 6fac24e3de Merge pull request 'fix(workspace-server): SSOT-route container check + 422 on external runtimes (closes #10)' (#12) from fix/issue10-runtime-aware-plugin-install into main
Auto-sync main → staging / sync-staging (push) Failing after 13s
Block internal-flavored paths / Block forbidden paths (push) Successful in 9s
CI / Detect changes (push) Successful in 11s
E2E API Smoke Test / detect-changes (push) Successful in 10s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 11s
Handlers Postgres Integration / detect-changes (push) Successful in 11s
Harness Replays / detect-changes (push) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 11s
CI / Shellcheck (E2E scripts) (push) Successful in 5s
CI / Python Lint & Test (push) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 1m3s
publish-workspace-server-image / build-and-push (push) Failing after 54s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 41s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Failing after 1m34s
CI / Canvas (Next.js) (push) Successful in 58s
CI / Canvas Deploy Reminder (push) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Failing after 1m39s
Harness Replays / Harness Replays (push) Failing after 46s
Handlers Postgres Integration / Handlers Postgres Integration (push) Failing after 1m14s
CI / Platform (Go) (push) Successful in 4m46s
E2E API Smoke Test / E2E API Smoke Test (push) Failing after 6m14s
Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push) Failing after 11s
Runtime Pin Compatibility / PyPI-latest install + import smoke (push) Successful in 9m58s
branch-protection drift check / Branch protection drift (push) Failing after 6s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 6s
2026-05-07 11:27:52 +00:00
claude-ceo-assistant f51722411b Merge branch 'main' into fix/issue10-runtime-aware-plugin-install
CI / Detect changes (pull_request) Successful in 13s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 6s
Harness Replays / detect-changes (pull_request) Successful in 11s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m6s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m41s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m44s
Harness Replays / Harness Replays (pull_request) Failing after 55s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 1m13s
CI / Platform (Go) (pull_request) Successful in 5m42s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 5m44s
2026-05-07 11:26:14 +00:00
claude-ceo-assistant f0015bff81 Merge pull request 'fix(workspace-server): default-bind to 127.0.0.1 in dev-mode fail-open (closes #7)' (#8) from fix/s8-bind-loopback-dev into main
CI / Canvas Deploy Reminder (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 4s
Auto-sync main → staging / sync-staging (push) Failing after 8s
CI / Detect changes (push) Successful in 7s
E2E API Smoke Test / detect-changes (push) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 12s
Handlers Postgres Integration / detect-changes (push) Successful in 10s
Harness Replays / detect-changes (push) Successful in 10s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 8s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 1m7s
publish-workspace-server-image / build-and-push (push) Failing after 50s
CI / Shellcheck (E2E scripts) (push) Successful in 6s
CI / Python Lint & Test (push) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 6s
CI / Platform (Go) (push) Has been cancelled
CI / Canvas (Next.js) (push) Has been cancelled
E2E API Smoke Test / E2E API Smoke Test (push) Has been cancelled
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Has been cancelled
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Has been cancelled
Harness Replays / Harness Replays (push) Has been cancelled
2026-05-07 11:25:48 +00:00
claude-ceo-assistant b72d1d3f26 Merge branch 'main' into fix/issue10-runtime-aware-plugin-install
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 6s
CI / Detect changes (pull_request) Successful in 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Harness Replays / detect-changes (pull_request) Successful in 13s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 7s
CI / Canvas (Next.js) (pull_request) Successful in 23s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 20s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 1m1s
Harness Replays / Harness Replays (pull_request) Failing after 42s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m44s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m45s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 1m35s
CI / Platform (Go) (pull_request) Successful in 6m34s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 7m21s
2026-05-07 11:25:24 +00:00
claude-ceo-assistant a674a6547e Merge branch 'main' into fix/s8-bind-loopback-dev
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 2s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 11s
Harness Replays / detect-changes (pull_request) Successful in 15s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 12s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 35s
Harness Replays / Harness Replays (pull_request) Failing after 47s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m44s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m46s
CI / Platform (Go) (pull_request) Failing after 6m9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 7m29s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 15m32s
2026-05-07 11:25:20 +00:00
claude-ceo-assistant f2f5338183 Merge pull request 'fix(ci): lowercase 'molecule-ai/' in cross-repo workflow refs' (#17) from fix/lowercase-org-slug into main
Auto-sync main → staging / sync-staging (push) Failing after 13s
auto-tag-runtime / tag (push) Successful in 13s
Block internal-flavored paths / Block forbidden paths (push) Successful in 9s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (push) Successful in 11s
CI / Detect changes (push) Successful in 11s
E2E API Smoke Test / detect-changes (push) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 7s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 7s
Harness Replays / detect-changes (push) Successful in 10s
Handlers Postgres Integration / detect-changes (push) Successful in 15s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 10s
SECRET_PATTERNS drift lint / Detect SECRET_PATTERNS drift (push) Successful in 35s
publish-workspace-server-image / build-and-push (push) Failing after 3m9s
CI / Shellcheck (E2E scripts) (push) Successful in 14s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 1m6s
Harness Replays / Harness Replays (push) Failing after 53s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 43s
CI / Canvas (Next.js) (push) Failing after 6m45s
CI / Canvas Deploy Reminder (push) Has been skipped
CI / Platform (Go) (push) Successful in 9m33s
CodeQL / Analyze (${{ matrix.language }}) (go) (push) Failing after 18m48s
CodeQL / Analyze (${{ matrix.language }}) (python) (push) Failing after 18m50s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (push) Failing after 18m58s
CI / Python Lint & Test (push) Successful in 15m53s
Canary — staging SaaS smoke (every 30 min) / Canary smoke (push) Failing after 10s
2026-05-07 10:38:12 +00:00
security-auditor e01077be38 fix(ci): lowercase 'molecule-ai/' in cross-repo workflow refs
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
cascade-list-drift-gate / check (pull_request) Successful in 3s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 4s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 4s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 0s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 50s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 3s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m16s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Failing after 16s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Failing after 40s
CI / Canvas (Next.js) (pull_request) Failing after 4m47s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Platform (Go) (pull_request) Successful in 5m25s
Gitea is case-sensitive on owner slugs; canonical is lowercase
`molecule-ai/...`. Mixed-case `Molecule-AI/...` refs fail-at-0s
when the runner tries to resolve the cross-repo workflow / checkout.

Same fix as molecule-controlplane#12. Mechanical case-correction;
no behavior change beyond making CI resolve again.

Refs: internal#46

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:00:10 -07:00
security-auditor c1de2287fd fix(workspace-server): SSOT-route container check + 422 on external runtimes
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Harness Replays / detect-changes (pull_request) Successful in 5s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 53s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 44s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m21s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m28s
Harness Replays / Harness Replays (pull_request) Failing after 43s
CI / Platform (Go) (pull_request) Successful in 3m19s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 4m46s
Two coupled fixes for molecule-core#10 (plugin install 503 vs
status=online split-state):

1. SSOT for "is this workspace's container running" — `findRunningContainer`
   in plugins.go used to carry its own copy of `cli.ContainerInspect`, which
   collapsed transient daemon errors into the same `""` return as a
   genuinely-stopped container. Healthsweep's `Provisioner.IsRunning`
   handled the same input correctly (defensive). Promote the inspect logic
   to `provisioner.RunningContainerName`, route both consumers through it.
   Transient errors get a distinct log line on the plugins side so triage
   doesn't confuse a flaky daemon with a stopped container.

2. Runtime-aware Install/Uninstall — `runtime='external'` workspaces have
   no local container; push-install via docker exec is meaningless. They
   pull plugins via the download endpoint instead (Phase 30.3). Without a
   guard they fell through to `findRunningContainer` and 503'd with a
   misleading "container not running." Add an early 422 with a hint
   pointing at the download endpoint.

The two fixes are independent: (1) preserves correctness when the SSOT
helper is later modified; (2) eliminates the persistent split-state on
the 5 external persona-agent workspaces in this DB (and on tenant
deployments hitting the same shape).

* `internal/provisioner/provisioner.go` — new `RunningContainerName(ctx,
  cli, id) (string, error)` with three documented outcomes (running /
  stopped / transient). `Provisioner.IsRunning` now wraps it; behavior
  preserved.
* `internal/handlers/plugins.go` — `findRunningContainer` shimmed onto
  `RunningContainerName`; new `isExternalRuntime(id)` predicate.
* `internal/handlers/plugins_install.go` — Install + Uninstall reject
  external runtimes with 422 + hint, before the source-fetch step.
* `internal/handlers/plugins_install_external_test.go` — 5 cases:
  external→422, uninstall-external→422, container-backed-falls-through,
  no-runtime-lookup-fails-open, lookup-error-fails-open.
* `internal/handlers/plugins_findrunning_ssot_test.go` — two AST gates
  pin the SSOT routing so future PRs can't silently re-introduce the
  parallel impl. Mutation-tested: reverting either consumer to a direct
  `ContainerInspect` makes the gate fail.

Refs: molecule-core#10

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:58:20 -07:00
security-auditor f3187ea0c1 fix(workspace-server): default-bind to 127.0.0.1 in dev-mode fail-open
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Harness Replays / detect-changes (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Failing after 35s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 56s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m24s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m25s
CI / Platform (Go) (pull_request) Successful in 1m48s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 4m47s
In dev mode (`MOLECULE_ENV=dev|development`, `ADMIN_TOKEN` unset) the
AdminAuth chain fails open by design so canvas at :3000 can call
workspace-server at :8080 without a bearer token. Combined with the
existing wildcard bind on `:8080`, that exposed unauthenticated
`POST /workspaces` to any same-LAN peer (S-8 in the audit RFC v1).

Couple the bind narrowness to the same signal that drives the auth
fail-open: when `middleware.IsDevModeFailOpen()` returns true, default
the listener to `127.0.0.1`. Production (`ADMIN_TOKEN` set) keeps
binding to all interfaces — its auth chain is doing the work. Operators
who need LAN exposure set `BIND_ADDR=<host>` explicitly.

* `cmd/server/main.go` — `resolveBindHost()` precedence: BIND_ADDR
  explicit > IsDevModeFailOpen() loopback > "" (all interfaces).
  Startup log line now includes the resolved bind + dev-mode-fail-open
  state for post-deploy auditing.
* `cmd/server/bind_test.go` — 8 t.Setenv table cases covering
  precedence, explicit overrides, dev/prod env words. Mutation-tested:
  removing the `IsDevModeFailOpen()` branch makes the dev-mode cases
  fail with "" vs "127.0.0.1".

Refs: molecule-core#7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:29:24 -07:00
71 changed files with 1775 additions and 2297 deletions
+1 -1
View File
@@ -37,7 +37,7 @@ CANONICAL_FILE = Path(".github/workflows/secret-scan.yml")
CONSUMERS: list[tuple[str, str]] = [
(
"molecule-ai-workspace-runtime/molecule_runtime/scripts/pre-commit-checks.sh",
"https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime/raw/branch/main/molecule_runtime/scripts/pre-commit-checks.sh",
"https://raw.githubusercontent.com/Molecule-AI/molecule-ai-workspace-runtime/main/molecule_runtime/scripts/pre-commit-checks.sh",
),
]
@@ -103,7 +103,7 @@ jobs:
with:
fetch-depth: 0
ref: staging
token: ${{ secrets.AUTO_SYNC_TOKEN }}
token: ${{ secrets.GITHUB_TOKEN }}
- name: Configure git author
run: |
@@ -174,7 +174,7 @@ jobs:
- name: Open auto-sync PR + enable auto-merge
if: steps.check.outputs.needs_sync == 'true'
env:
GH_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
BRANCH: ${{ steps.check.outputs.branch }}
MAIN_SHORT: ${{ steps.check.outputs.main_short }}
DID_FF: ${{ steps.prep.outputs.did_ff }}
+2 -2
View File
@@ -1,7 +1,7 @@
name: Block internal-flavored paths
# Hard CI gate. Internal content (positioning, competitive briefs, sales
# playbooks, PMM/press drip, draft campaigns) lives in Molecule-AI/internal —
# playbooks, PMM/press drip, draft campaigns) lives in molecule-ai/internal —
# this public monorepo must never re-acquire those paths. CEO directive
# 2026-04-23 after a fleet-wide audit found 79 internal files leaked here.
#
@@ -135,7 +135,7 @@ jobs:
echo "::error::Forbidden internal-flavored paths detected:"
printf "$OFFENDING"
echo ""
echo "These paths belong in Molecule-AI/internal, not this public repo."
echo "These paths belong in molecule-ai/internal, not this public repo."
echo "See docs/internal-content-policy.md for canonical locations."
echo ""
echo "If your file is genuinely public-facing (e.g. a blog post"
+1 -1
View File
@@ -108,7 +108,7 @@ jobs:
echo
echo "One or more canary secrets are unset (\`CANARY_TENANT_URLS\`, \`CANARY_ADMIN_TOKENS\`, \`CANARY_CP_SHARED_SECRET\`)."
echo "Phase 2 canary fleet has not been stood up yet —"
echo "see [canary-tenants.md](https://github.com/Molecule-AI/molecule-controlplane/blob/main/docs/canary-tenants.md)."
echo "see [canary-tenants.md](https://github.com/molecule-ai/molecule-controlplane/blob/main/docs/canary-tenants.md)."
echo
echo "**Skipped — promote-to-latest will NOT auto-fire.** Dispatch \`promote-latest.yml\` manually when ready."
} >> "$GITHUB_STEP_SUMMARY"
+6 -6
View File
@@ -87,7 +87,7 @@ jobs:
run: go mod download
- if: needs.changes.outputs.platform == 'true'
run: go build ./cmd/server
# CLI (molecli) moved to standalone repo: github.com/Molecule-AI/molecule-cli
# CLI (molecli) moved to standalone repo: github.com/molecule-ai/molecule-cli
- if: needs.changes.outputs.platform == 'true'
run: go vet ./... || true
- if: needs.changes.outputs.platform == 'true'
@@ -165,7 +165,7 @@ jobs:
# Strip the package-import prefix so we can match .coverage-allowlist.txt
# entries written as paths relative to workspace-server/.
# Handle both module paths: platform/workspace-server/... and platform/...
rel=$(echo "$file" | sed 's|^github.com/Molecule-AI/molecule-monorepo/platform/workspace-server/||; s|^github.com/Molecule-AI/molecule-monorepo/platform/||')
rel=$(echo "$file" | sed 's|^github.com/molecule-ai/molecule-monorepo/platform/workspace-server/||; s|^github.com/molecule-ai/molecule-monorepo/platform/||')
if echo "$ALLOWLIST" | grep -qxF "$rel"; then
echo "::warning file=workspace-server/$rel::Critical file at ${pct}% coverage (allowlisted, #1823) — fix before expiry."
@@ -235,7 +235,7 @@ jobs:
run: npx vitest run --coverage
- name: Upload coverage summary as artifact
if: needs.changes.outputs.canvas == 'true' && always()
uses: actions/upload-artifact@v3 # pinned to v3 for Gitea act_runner v0.6 compatibility (internal#46)
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: canvas-coverage-${{ github.run_id }}
path: canvas/coverage/
@@ -243,8 +243,8 @@ jobs:
if-no-files-found: warn
# MCP Server + SDK removed from CI — now in standalone repos:
# - github.com/Molecule-AI/molecule-mcp-server (npm CI)
# - github.com/Molecule-AI/molecule-sdk-python (PyPI CI)
# - github.com/molecule-ai/molecule-mcp-server (npm CI)
# - github.com/molecule-ai/molecule-sdk-python (PyPI CI)
# e2e-api job moved to .github/workflows/e2e-api.yml (issue #458).
# It now has workflow-level concurrency (cancel-in-progress: false) so
@@ -434,5 +434,5 @@ jobs:
fi
# SDK + plugin validation moved to standalone repo:
# github.com/Molecule-AI/molecule-sdk-python
# github.com/molecule-ai/molecule-sdk-python
+3
View File
@@ -43,6 +43,9 @@ permissions:
jobs:
analyze:
name: Analyze (${{ matrix.language }})
# CodeQL set to advisory (non-blocking) on Gitea Actions — Hongming dec'''n 2026-05-07 (#156).
# Findings still emit as SARIF artifacts; failing CodeQL run does not block PR merge.
continue-on-error: true
runs-on: ubuntu-latest
timeout-minutes: 45
+2 -2
View File
@@ -139,7 +139,7 @@ jobs:
- name: Upload Playwright report on failure
if: failure() && needs.detect-changes.outputs.canvas == 'true'
uses: actions/upload-artifact@v3 # pinned to v3 for Gitea act_runner v0.6 compatibility (internal#46)
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: playwright-report-staging
path: canvas/playwright-report-staging/
@@ -147,7 +147,7 @@ jobs:
- name: Upload screenshots on failure
if: failure() && needs.detect-changes.outputs.canvas == 'true'
uses: actions/upload-artifact@v3 # pinned to v3 for Gitea act_runner v0.6 compatibility (internal#46)
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: playwright-screenshots
path: canvas/test-results/
+1 -1
View File
@@ -19,4 +19,4 @@ permissions:
jobs:
disable-auto-merge-on-push:
uses: Molecule-AI/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@main
uses: molecule-ai/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@main
+54 -127
View File
@@ -25,7 +25,7 @@ name: publish-runtime
# 3. Publishes to PyPI via the PyPA Trusted Publisher action (OIDC).
# No static API token is stored — PyPI verifies the workflow's
# OIDC claim against the trusted-publisher config registered for
# molecule-ai-workspace-runtime (Molecule-AI/molecule-core,
# molecule-ai-workspace-runtime (molecule-ai/molecule-core,
# publish-runtime.yml, environment pypi-publish).
#
# After publish: the 8 template repos pick up the new version on their
@@ -166,7 +166,7 @@ jobs:
- name: Publish to PyPI (Trusted Publisher / OIDC)
# PyPI side is configured: project molecule-ai-workspace-runtime →
# publisher Molecule-AI/molecule-core, workflow publish-runtime.yml,
# publisher molecule-ai/molecule-core, workflow publish-runtime.yml,
# environment pypi-publish. The action mints a short-lived OIDC
# token and exchanges it for a PyPI upload credential — no static
# API token in this repo's secrets.
@@ -282,33 +282,42 @@ jobs:
echo "::error::Refusing to fan out cascade against stale or corrupt PyPI surfaces."
exit 1
- name: Fan out via push to .runtime-version
- name: Fan out repository_dispatch
env:
# Gitea PAT with write:repository scope on the 8 cascade-active
# template repos. Used here for `git push` (NOT for an API
# dispatch — Gitea 1.22.6 has no repository_dispatch endpoint;
# empirically verified across 6 candidate paths in molecule-
# core#20 issuecomment-913). The push trips each template's
# existing `on: push: branches: [main]` trigger on
# publish-image.yml, which then reads the updated
# .runtime-version via its resolve-version job.
DISPATCH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
# Fine-grained PAT with `actions:write` on the 8 template repos.
# GITHUB_TOKEN can't fire dispatches across repos — needs an explicit
# token. Stored as a repo secret; rotate per the standard schedule.
DISPATCH_TOKEN: ${{ secrets.TEMPLATE_DISPATCH_TOKEN }}
# Single source of truth: the publish job's output, which handles
# tag/manual-input/auto-bump uniformly. The previous fallback
# (`steps.version.outputs.version` from inside the cascade job)
# was a dead reference — different job, no shared step scope.
RUNTIME_VERSION: ${{ needs.publish.outputs.version }}
run: |
set +e # don't abort on a single repo failure — collect them all
# Soft-skip on workflow_dispatch when the token is missing
# (operator ad-hoc test); hard-fail on push so unattended
# publishes can't silently skip the cascade. Same shape as
# the original v1, intentional split per the schedule-vs-
# dispatch hardening 2026-04-28.
# Schedule-vs-dispatch behaviour split (hardened 2026-04-28
# after the sweep-cf-orphans soft-skip incident — same class
# of bug):
#
# The earlier "skipping cascade. templates will pick up the
# new version on their own next rebuild" message was wrong —
# templates only build on this dispatch trigger; without it
# they stay pinned to whatever runtime version they last saw.
# A silent skip here means "PyPI is current, templates are
# not" and the gap is invisible until someone notices a
# template still on the old version weeks later.
#
# - push → exit 1 (red CI surfaces the gap)
# - workflow_dispatch → exit 0 with a warning (operator
# ran this ad-hoc; let them rerun
# after fixing the secret)
if [ -z "$DISPATCH_TOKEN" ]; then
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "::warning::DISPATCH_TOKEN secret not set — skipping cascade."
echo "::warning::TEMPLATE_DISPATCH_TOKEN secret not set — skipping cascade."
echo "::warning::set it at Settings → Secrets and Variables → Actions, then rerun. Templates will stay on the prior runtime version until either this token is set or each template is rebuilt manually."
exit 0
fi
echo "::error::DISPATCH_TOKEN secret missing — cascade cannot fan out."
echo "::error::TEMPLATE_DISPATCH_TOKEN secret missing — cascade cannot fan out."
echo "::error::PyPI was published, but the 8 template repos will NOT pick up the new version until this token is restored and a republish dispatches the cascade."
echo "::error::set it at Settings → Secrets and Variables → Actions; then re-trigger publish-runtime via workflow_dispatch."
exit 1
@@ -318,119 +327,37 @@ jobs:
echo "::error::publish job did not expose a version output — cascade cannot fan out"
exit 1
fi
# All 9 workspace templates declared in manifest.json. The list
# MUST stay aligned with manifest.json's workspace_templates —
# cascade-list-drift-gate.yml enforces this in CI per the
# codex-stuck-on-stale-runtime invariant from PR #2556.
# Long-term goal: derive this list from manifest.json so it
# can't drift even on a manifest edit (RFC #388 Phase-1).
#
# Per-template publish-image.yml presence is checked at
# cascade-time below: codex doesn't ship one today, so the
# cascade soft-skips it with an informational message rather
# than dropping it from this list (which would re-introduce
# the drift the gate exists to catch).
GITEA_URL="${GITEA_URL:-https://git.moleculesai.app}"
# All 9 active workspace template repos. The PR #2536 pruning
# ("deprecated, no shipping images") was empirically wrong:
# continuous-synth-e2e.yml defaults to langgraph as its primary
# canary (line 44), and every excluded template had successful
# publish-image runs as of 2026-05-03 — none were dormant.
# Symptom of the prune: today's a2a-sdk strict-mode fix
# (#2566 / commit e1628c4) cascaded to 4 templates but never
# reached langgraph, so the synth-E2E correctly canary'd a fix
# that had landed but not deployed. Re-added the 5 templates.
# Long-term: derive this list from manifest.json so cascade
# scope can't drift from E2E scope — tracked in RFC #388 as a
# Phase-1 invariant.
TEMPLATES="claude-code hermes openclaw codex langgraph crewai autogen deepagents gemini-cli"
FAILED=""
SKIPPED=""
# Configure git identity once. The persona owning DISPATCH_TOKEN
# is the same identity that authored this commit on each
# template; using a generic "publish-runtime cascade" co-author
# trailer in the message keeps the audit trail honest about the
# workflow-driven origin.
git config --global user.name "publish-runtime cascade"
git config --global user.email "publish-runtime@moleculesai.app"
WORKDIR="$(mktemp -d)"
for tpl in $TEMPLATES; do
REPO="molecule-ai/molecule-ai-workspace-template-$tpl"
CLONE="$WORKDIR/$tpl"
# Pre-check: skip templates without a publish-image.yml.
# The cascade's job is to trip the template's on-push
# rebuild — if there's no rebuild workflow, pushing a
# .runtime-version commit is just noise on the target
# repo. Use the Gitea contents API (no clone required for
# the probe). 200 = present; 404 = absent.
HTTP=$(curl -sS -o /dev/null -w "%{http_code}" \
-H "Authorization: token $DISPATCH_TOKEN" \
"$GITEA_URL/api/v1/repos/$REPO/contents/.github/workflows/publish-image.yml")
if [ "$HTTP" = "404" ]; then
echo "↷ $tpl has no publish-image.yml — soft-skip (informational; manifest still tracks it)"
SKIPPED="$SKIPPED $tpl"
continue
fi
if [ "$HTTP" != "200" ]; then
echo "::warning::$tpl publish-image.yml probe returned HTTP $HTTP — proceeding anyway, push will surface the real failure if any"
fi
# Use a per-template attempt loop so a transient race (e.g.
# human pushing to the same template at the same instant)
# doesn't lose the cascade. Bounded retries (3) — beyond
# that we surface the failure and let the operator retry.
attempt=0
success=false
while [ $attempt -lt 3 ]; do
attempt=$((attempt + 1))
rm -rf "$CLONE"
if ! git clone --depth=1 \
"https://x-access-token:${DISPATCH_TOKEN}@${GITEA_URL#https://}/$REPO.git" \
"$CLONE" >/tmp/clone.log 2>&1; then
echo "::warning::clone $tpl attempt $attempt failed: $(tail -n3 /tmp/clone.log)"
sleep 2
continue
fi
cd "$CLONE"
echo "$VERSION" > .runtime-version
# Idempotency guard: if the file already matches, this
# publish is a re-run for a version already cascaded.
# Don't push a no-op commit (would spuriously re-trip the
# template's on-push and rebuild for nothing).
if git diff --quiet -- .runtime-version; then
echo "✓ $tpl already at $VERSION — no commit needed (idempotent)"
success=true
cd - >/dev/null
break
fi
git add .runtime-version
git commit -m "chore: pin runtime to $VERSION (publish-runtime cascade)" \
-m "Co-Authored-By: publish-runtime cascade <publish-runtime@moleculesai.app>" \
>/dev/null
if git push origin HEAD:main >/tmp/push.log 2>&1; then
echo "✓ $tpl pushed $VERSION on attempt $attempt"
success=true
cd - >/dev/null
break
fi
# Likely a non-fast-forward — pull-rebase and retry.
# Don't force-push: that would silently overwrite a racing
# human/cascade commit.
echo "::warning::push $tpl attempt $attempt failed, pull-rebasing: $(tail -n3 /tmp/push.log)"
git pull --rebase origin main >/tmp/rebase.log 2>&1 || true
cd - >/dev/null
done
if [ "$success" != "true" ]; then
STATUS=$(curl -sS -o /tmp/dispatch.out -w "%{http_code}" \
-X POST "https://api.github.com/repos/$REPO/dispatches" \
-H "Authorization: Bearer $DISPATCH_TOKEN" \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
-d "{\"event_type\":\"runtime-published\",\"client_payload\":{\"runtime_version\":\"$VERSION\"}}")
if [ "$STATUS" = "204" ]; then
echo "✓ dispatched $tpl ($VERSION)"
else
echo "::warning::✗ failed to dispatch $tpl: HTTP $STATUS — $(cat /tmp/dispatch.out)"
FAILED="$FAILED $tpl"
fi
done
rm -rf "$WORKDIR"
if [ -n "$FAILED" ]; then
echo "::error::Cascade incomplete after 3 retries each. Failed templates:$FAILED"
echo "::error::PyPI publish succeeded; failed templates lag the new version. Re-run this workflow_dispatch with the same version to retry only the laggers (idempotent — already-cascaded templates skip)."
exit 1
fi
if [ -n "$SKIPPED" ]; then
echo "Cascade complete: pinned $VERSION on cascade-active templates. Soft-skipped (no publish-image.yml):$SKIPPED"
else
echo "Cascade complete: $VERSION pinned across all manifest workspace_templates."
echo "::warning::Cascade incomplete. Failed templates:$FAILED"
# Don't fail the whole job — PyPI publish already succeeded;
# operators can retry the failed templates manually.
fi
@@ -37,6 +37,7 @@ on:
- 'workspace-server/**'
- 'canvas/**'
- 'manifest.json'
- 'scripts/**'
- '.github/workflows/publish-workspace-server-image.yml'
workflow_dispatch:
@@ -181,3 +182,4 @@ jobs:
org.opencontainers.image.source=https://github.com/${{ github.repository }}
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.description=Molecule AI tenant platform + canvas — pending canary verify
@@ -9,7 +9,7 @@ name: redeploy-tenants-on-main
#
# This workflow closes the gap by calling the control-plane admin
# endpoint that performs a canary-first, batched, health-gated rolling
# redeploy across every live tenant. Implemented in Molecule-AI/
# redeploy across every live tenant. Implemented in molecule-ai/
# molecule-controlplane as POST /cp/admin/tenants/redeploy-fleet
# (feat/tenant-auto-redeploy, landing alongside this workflow).
#
@@ -146,7 +146,7 @@ jobs:
- name: Call CP redeploy-fleet
# CP_ADMIN_API_TOKEN must be set as a repo/org secret on
# Molecule-AI/molecule-core, matching the staging/prod CP's
# molecule-ai/molecule-core, matching the staging/prod CP's
# CP_ADMIN_API_TOKEN env. Stored in Railway, mirrored to this
# repo's secrets for CI.
env:
@@ -97,7 +97,7 @@ jobs:
- name: Call staging-CP redeploy-fleet
# CP_STAGING_ADMIN_API_TOKEN must be set as a repo/org secret
# on Molecule-AI/molecule-core, matching staging-CP's
# on molecule-ai/molecule-core, matching staging-CP's
# CP_ADMIN_API_TOKEN env var (visible in Railway controlplane
# / staging environment). Stored separately from the prod
# CP_ADMIN_API_TOKEN so a leak of one doesn't auth the other.
@@ -96,7 +96,7 @@ jobs:
--body "$(cat <<'BODY'
[retarget-bot] This PR was opened against `main` and has been retargeted to `staging` automatically.
**Why:** per [SHARED_RULES rule 8](https://github.com/Molecule-AI/molecule-ai-org-template-molecule-dev/blob/main/SHARED_RULES.md), all feature work targets `staging` first; the CEO promotes `staging → main` separately.
**Why:** per [SHARED_RULES rule 8](https://github.com/molecule-ai/molecule-ai-org-template-molecule-dev/blob/main/SHARED_RULES.md), all feature work targets `staging` first; the CEO promotes `staging → main` separately.
**What changed:** just the base branch — no code change. CI will re-run against `staging`. If you get merge conflicts, rebase on `staging`.
+1 -1
View File
@@ -12,7 +12,7 @@ name: Secret scan
#
# jobs:
# secret-scan:
# uses: Molecule-AI/molecule-core/.github/workflows/secret-scan.yml@staging
# uses: molecule-ai/molecule-core/.github/workflows/secret-scan.yml@staging
#
# Pin to @staging not @main — staging is the active default branch,
# main lags via the staging-promotion workflow. Updates ride along
+6 -6
View File
@@ -22,7 +22,7 @@ development workflow, conventions, and how to get your changes merged.
```bash
# Clone the repo
git clone https://git.moleculesai.app/molecule-ai/molecule-core.git
git clone https://github.com/Molecule-AI/molecule-core.git
cd molecule-core
# Install git hooks
@@ -57,7 +57,7 @@ See `CLAUDE.md` for a full list of environment variables and their purposes.
This repo is scoped to **code** (canvas, workspace, workspace-server, related
infra). Public content (blog posts, marketing copy, OG images, SEO briefs,
DevRel demos) lives in [`Molecule-AI/docs`](https://git.moleculesai.app/molecule-ai/docs).
DevRel demos) lives in [`Molecule-AI/docs`](https://github.com/Molecule-AI/docs).
The `Block forbidden paths` CI gate fails any PR that writes to `marketing/`
or other removed paths — open against `Molecule-AI/docs` instead.
@@ -110,7 +110,7 @@ causing a render loop when any node position changed.
1. **Repo-wide:** "Automatically delete head branches" is on. Once a PR merges, the branch is deleted server-side. Any subsequent `git push` to that branch fails with `remote rejected — no such branch`.
2. **CI:** the `pr-guards` workflow (calling [molecule-ci `disable-auto-merge-on-push`](https://git.moleculesai.app/molecule-ai/molecule-ci/src/branch/main/.github/workflows/disable-auto-merge-on-push.yml)) fires on every push to an open PR. If auto-merge was already enabled, it's disabled and a comment is posted. You must explicitly re-enable after verifying the new commit.
2. **CI:** the `pr-guards` workflow (calling [molecule-ci `disable-auto-merge-on-push`](https://github.com/Molecule-AI/molecule-ci/blob/main/.github/workflows/disable-auto-merge-on-push.yml)) fires on every push to an open PR. If auto-merge was already enabled, it's disabled and a comment is posted. You must explicitly re-enable after verifying the new commit.
**Workflow rules that follow from the guards:**
- Push **all** commits before running `gh pr merge --auto`.
@@ -180,9 +180,9 @@ and run CI manually.
Code in this repo lands in molecule-core. Some related runtime artifacts
live in their own repos:
- [`Molecule-AI/molecule-ai-workspace-runtime`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-runtime) — Python adapter SDK (`molecule_runtime`) that runs inside containerized Molecule workspaces. Bridges Claude Code SDK / hermes / langgraph / etc. → A2A queue.
- [`Molecule-AI/molecule-sdk-python`](https://git.moleculesai.app/molecule-ai/molecule-sdk-python) — `A2AServer` + `RemoteAgentClient` for external agents that register over the public `/registry/register` flow.
- [`Molecule-AI/molecule-mcp-claude-channel`](https://git.moleculesai.app/molecule-ai/molecule-mcp-claude-channel) — Claude Code channel plugin. Bridges A2A traffic into a running Claude Code session via MCP `notifications/claude/channel`. Polling-based (no tunnel required); install with `claude --channels plugin:molecule@Molecule-AI/molecule-mcp-claude-channel`.
- [`Molecule-AI/molecule-ai-workspace-runtime`](https://github.com/Molecule-AI/molecule-ai-workspace-runtime) — Python adapter SDK (`molecule_runtime`) that runs inside containerized Molecule workspaces. Bridges Claude Code SDK / hermes / langgraph / etc. → A2A queue.
- [`Molecule-AI/molecule-sdk-python`](https://github.com/Molecule-AI/molecule-sdk-python) — `A2AServer` + `RemoteAgentClient` for external agents that register over the public `/registry/register` flow.
- [`Molecule-AI/molecule-mcp-claude-channel`](https://github.com/Molecule-AI/molecule-mcp-claude-channel) — Claude Code channel plugin. Bridges A2A traffic into a running Claude Code session via MCP `notifications/claude/channel`. Polling-based (no tunnel required); install with `claude --channels plugin:molecule@Molecule-AI/molecule-mcp-claude-channel`.
When extending the **A2A surface** in molecule-core (`workspace-server/internal/handlers/a2a_proxy.go` etc.), consider whether the change has a downstream impact on the runtime SDK or the channel plugin — they're versioned independently but share the wire shape.
+23 -52
View File
@@ -1,7 +1,7 @@
<div align="center">
<p>
<img src="./docs/assets/branding/molecule-icon.svg" alt="Molecule AI" width="160" />
<img src="./docs/assets/branding/molecule-icon.png" alt="Molecule AI Icon Logo" width="160" />
</p>
<p>
@@ -39,8 +39,8 @@
<a href="./docs/agent-runtime/workspace-runtime.md"><strong>Workspace Runtime</strong></a>
</p>
[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/new/template?template=https://git.moleculesai.app/molecule-ai/molecule-core)
[![Deploy to Render](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://git.moleculesai.app/molecule-ai/molecule-core)
[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/new/template?template=https://github.com/Molecule-AI/molecule-monorepo)
[![Deploy to Render](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://github.com/Molecule-AI/molecule-monorepo)
</div>
@@ -53,8 +53,8 @@ Molecule AI is the most powerful way to govern an AI agent organization in produ
It combines the parts that are usually scattered across demos, internal glue code, and framework-specific tooling into one product:
- one org-native control plane for teams, roles, hierarchy, and lifecycle
- one runtime layer that lets **eight** agent runtimes — LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, **Hermes**, **Gemini CLI**, and OpenClaw run side by side behind one workspace contract
- one memory model that keeps recall, sharing, and skill evolution aligned with organizational boundaries (Memory v2 backed by pgvector for semantic recall)
- one runtime layer that lets LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, and OpenClaw run side by side
- one memory model that keeps recall, sharing, and skill evolution aligned with organizational boundaries
- one operational surface for observing, pausing, restarting, inspecting, and improving live workspaces
Most teams can build a workflow, a strong single agent, a coding agent, or a custom multi-agent graph.
@@ -75,7 +75,7 @@ You do not wire collaboration paths by hand. Hierarchy defines the default commu
### 3. Runtime choice stops being a dead-end decision
LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, Hermes, Gemini CLI, and OpenClaw can all plug into the same workspace abstraction. Teams can standardize governance without forcing every group onto one runtime.
LangGraph, DeepAgents, Claude Code, CrewAI, AutoGen, and OpenClaw can all plug into the same workspace abstraction. Teams can standardize governance without forcing every group onto one runtime.
### 4. Memory is treated like infrastructure
@@ -117,8 +117,6 @@ Molecule AI is not trying to replace the frameworks below. It is the system that
| **Claude Code** | Shipping on `main` | Real coding workflows, CLI-native continuity | Secure workspace abstraction, A2A delegation, org boundaries, shared control plane |
| **CrewAI** | Shipping on `main` | Role-based crews | Persistent workspace identity, policy consistency, shared canvas and registry |
| **AutoGen** | Shipping on `main` | Assistant/tool orchestration | Standardized deployment, hierarchy-aware collaboration, shared ops plane |
| **Hermes 4** | Shipping on `main` | Hybrid reasoning, native tools, json_schema (NousResearch/hermes-agent) | Option B upstream hook, A2A bridge to OpenAI-compat API, multi-provider provider derivation |
| **Gemini CLI** | Shipping on `main` | Google Gemini CLI continuity | Workspace lifecycle, A2A, hierarchy-aware collaboration, shared ops plane |
| **OpenClaw** | Shipping on `main` | CLI-native runtime with its own session model | Workspace lifecycle, templates, activity logs, topology-aware collaboration |
| **NemoClaw** | WIP on `feat/nemoclaw-t4-docker` | NVIDIA-oriented runtime path | Planned to join the same abstraction once merged; not yet part of `main` |
@@ -184,10 +182,9 @@ The result is not just “an agent that learns.” It is **an organization that
## What Ships In `main`
### Canvas (v4)
### Canvas
- Next.js 15 + React Flow + Zustand
- **warm-paper theme system** — light / dark / follow-system, SSR cookie + nonce'd boot script + ThemeProvider; terminal + code surfaces stay dark unconditionally
- drag-to-nest team building
- empty-state deployment + onboarding wizard
- template palette
@@ -196,9 +193,8 @@ The result is not just “an agent that learns.” It is **an organization that
### Platform
- Go 1.25 / Gin control plane (80+ HTTP endpoints + Gorilla WebSocket fanout)
- workspace CRUD and provisioning (pluggable Provisioner — Docker locally, EC2 + SSM in production)
- **A2A response path is a typed discriminated union (RFC #2967)** — frozen dataclasses + total parser; 100% unit + adversarial fuzz coverage
- Go/Gin control plane
- workspace CRUD and provisioning
- registry and heartbeats
- browser-safe A2A proxy
- team expansion/collapse
@@ -208,10 +204,10 @@ The result is not just “an agent that learns.” It is **an organization that
### Runtime
- unified `workspace/` image; thin AMI in production (us-east-2)
- adapter-driven execution across **8 runtimes** (Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw)
- unified `workspace/` image
- adapter-driven execution
- Agent Card registration
- awareness-backed memory integration; **Memory v2 backed by pgvector** for semantic recall
- awareness-backed memory integration
- plugin-mounted shared rules/skills
- hot-reloadable local skills
- coordinator-only delegation path
@@ -225,21 +221,6 @@ The result is not just “an agent that learns.” It is **an organization that
- runtime tiers
- direct workspace inspection through terminal and files
### SaaS (via [`molecule-controlplane`](https://github.com/Molecule-AI/molecule-controlplane))
- multi-tenant on AWS EC2 + Neon (per-tenant Postgres branch) + Cloudflare Tunnels (per-tenant, no public ports)
- WorkOS AuthKit + Stripe Checkout + Customer Portal
- AWS KMS envelope encryption (DB / Redis connection strings); AWS Secrets Manager for tenant bootstrap
- `tenant_resources` audit table + 30-min boot-event-aware reconciler — every CF / AWS lifecycle event recorded, claim vs live state diffed
### Bring your own Claude Code session (via [`molecule-mcp-claude-channel`](https://github.com/Molecule-AI/molecule-mcp-claude-channel))
- Claude Code plugin that bridges Molecule A2A traffic into a local Claude Code session via MCP
- subscribe to one or more workspaces; peer messages surface as conversation turns; replies route back through Molecule's A2A
- no tunnel, no public endpoint — the plugin self-registers each watched workspace as `delivery_mode=poll` and long-polls `/activity?since_id=…`
- multi-tenant friendly: one plugin install can watch workspaces across multiple Molecule tenants (`MOLECULE_PLATFORM_URLS` per-workspace)
- install via the standard marketplace flow: `/plugin marketplace add Molecule-AI/molecule-mcp-claude-channel``/plugin install molecule-channel@molecule-mcp-claude-channel`
## Built For Teams That Need More Than A Demo
Molecule AI is especially strong when you need to run:
@@ -252,30 +233,24 @@ Molecule AI is especially strong when you need to run:
## Architecture
```text
Canvas (Next.js 15, warm-paper :3000) <--HTTP / WS--> Platform (Go 1.25 :8080) <---> Postgres + Redis
| |
| +--> Provisioner: Docker (local) / EC2 + SSM (prod)
| +--> bundles · templates · secrets · KMS
Canvas (Next.js :3000) <--HTTP / WS--> Platform (Go :8080) <---> Postgres + Redis
| |
| +--> Docker provisioner / bundles / templates / secrets
|
+------------------------- shows ------------------------> workspaces, teams, tasks, traces, events
+-------------------- shows --------------------> workspaces, teams, tasks, traces, events
Workspace Runtime (Python ≥3.11, image with adapters)
- 8 adapters: LangGraph / DeepAgents / Claude Code / CrewAI / AutoGen / Hermes / Gemini CLI / OpenClaw
- Agent Card + A2A server (typed-SSOT response path, RFC #2967)
- heartbeat + activity + awareness-backed memory (Memory v2 — pgvector semantic recall)
Workspace Runtime (Python image with adapters)
- LangGraph / DeepAgents / Claude Code / CrewAI / AutoGen / OpenClaw
- Agent Card + A2A server
- heartbeat + activity + awareness-backed memory
- skills + plugins + hot reload
SaaS Control Plane (molecule-controlplane, private)
- per-tenant EC2 + Neon (Postgres branch) + Cloudflare Tunnel
- WorkOS · Stripe · KMS · AWS Secrets Manager
- tenant_resources audit + 30-min reconciler
```
## Quick Start
```bash
git clone https://git.moleculesai.app/molecule-ai/molecule-core.git
cd molecule-core
git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo
cp .env.example .env
# Defaults boot the stack locally out of the box. See .env.example for
@@ -328,11 +303,7 @@ Then open `http://localhost:3000`:
## Current Scope
The current `main` branch ships the core platform, Canvas v4 (warm-paper themed), Memory v2 (pgvector semantic recall), the typed-SSOT A2A response path (RFC #2967), **eight production adapters** (Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw), skill lifecycle, and operational surfaces.
The companion private repo [`molecule-controlplane`](https://github.com/Molecule-AI/molecule-controlplane) provides the SaaS surface — multi-tenant orchestration on EC2 + Neon + Cloudflare Tunnels, KMS envelope encryption, WorkOS auth, Stripe billing, and a `tenant_resources` audit table with a 30-min reconciler.
Adjacent runtime work such as **NemoClaw** remains branch-level until merged, and this README keeps that distinction explicit on purpose.
The current `main` branch already includes the core platform, canvas, memory model, six production adapters, skill lifecycle, and operational surfaces. Adjacent runtime work such as **NemoClaw** remains branch-level until merged, and this README keeps that distinction explicit on purpose.
## License
+22 -51
View File
@@ -1,7 +1,7 @@
<div align="center">
<p>
<img src="./docs/assets/branding/molecule-icon.svg" alt="Molecule AI" width="160" />
<img src="./docs/assets/branding/molecule-icon.png" alt="Molecule AI 图案 Logo" width="160" />
</p>
<p>
@@ -38,8 +38,8 @@
<a href="./docs/agent-runtime/workspace-runtime.md"><strong>Workspace Runtime</strong></a>
</p>
[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/new/template?template=https://git.moleculesai.app/molecule-ai/molecule-core)
[![Deploy to Render](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://git.moleculesai.app/molecule-ai/molecule-core)
[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/new/template?template=https://github.com/Molecule-AI/molecule-core)
[![Deploy to Render](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://github.com/Molecule-AI/molecule-core)
</div>
@@ -52,8 +52,8 @@ Molecule AI 是目前最强的 AI Agent 组织治理方案之一,用来把 age
它把过去分散在 demo、内部胶水代码和各类 framework 私有工具里的关键能力,收敛成一个产品:
- 一套组织原生 control plane,管理团队、角色、层级和生命周期
- 一套 runtime abstraction,让 **8 个** agent runtime —— LangGraph、DeepAgents、Claude Code、CrewAI、AutoGen、**Hermes**、**Gemini CLI**、OpenClaw —— 共用一套 workspace 契约
- 一套与组织边界对齐的 memory 模型,把 recall、sharing 和 skill evolution 放进同一体系Memory v2 由 pgvector 支撑语义召回)
- 一套 runtime abstraction,让 LangGraph、DeepAgents、Claude Code、CrewAI、AutoGen、OpenClaw 并存运行
- 一套与组织边界对齐的 memory 模型,把 recall、sharing 和 skill evolution 放进同一体系
- 一套面向线上 workspace 的运维面,统一完成观测、暂停、重启、检查和持续改进
今天很多团队能做好 workflow、单 agent、coding agent,或者自定义 multi-agent graph 中的一种。
@@ -74,7 +74,7 @@ Molecule AI 填的就是这个空白。
### 3. Runtime 选择不再是死路
LangGraph、DeepAgents、Claude Code、CrewAI、AutoGen、Hermes、Gemini CLI、OpenClaw 都可以挂到同一个 workspace abstraction 下。团队可以统一治理方式,而不必统一到底层 runtime。
LangGraph、DeepAgents、Claude Code、CrewAI、AutoGen、OpenClaw 都可以挂到同一个 workspace abstraction 下。团队可以统一治理方式,而不必统一到底层 runtime。
### 4. Memory 被当成基础设施来做
@@ -116,8 +116,6 @@ Molecule AI 并不是要替代下面这些 framework,而是把它们纳入更
| **Claude Code** | `main` 已支持 | 真实编码工作流、CLI-native continuity | 安全 workspace 抽象、A2A delegation、组织边界、共享 control plane |
| **CrewAI** | `main` 已支持 | 角色型 crew 模式清晰 | 持久 workspace 身份、统一策略、共享 Canvas 和 registry |
| **AutoGen** | `main` 已支持 | assistant/tool orchestration | 统一部署、层级协作、共享运维平面 |
| **Hermes 4** | `main` 已支持 | 混合推理、原生工具调用、json_schema 输出(NousResearch/hermes-agent | Option B 上游 hook、A2A 桥接 OpenAI 兼容 API、多 provider 自动派生 |
| **Gemini CLI** | `main` 已支持 | Google Gemini CLI 持续会话 | workspace 生命周期、A2A、层级感知协作、共享运维平面 |
| **OpenClaw** | `main` 已支持 | CLI-native runtime,自有 session 模型 | workspace 生命周期、templates、activity logs、拓扑感知协作 |
| **NemoClaw** | `feat/nemoclaw-t4-docker` 分支 WIP | NVIDIA 方向 runtime 路线 | 计划并入同一抽象层,但当前还不是 `main` 已合并能力 |
@@ -183,10 +181,9 @@ Molecule AI 并不是要替代下面这些 framework,而是把它们纳入更
## `main` 分支已经具备什么
### Canvasv4
### Canvas
- Next.js 15 + React Flow + Zustand
- **warm-paper 主题系统** —— light / dark / 跟随系统;SSR cookie + nonce'd boot 脚本 + ThemeProvider;终端与代码面板始终保持深色
- drag-to-nest 团队构建
- empty state + onboarding wizard
- template palette
@@ -195,9 +192,8 @@ Molecule AI 并不是要替代下面这些 framework,而是把它们纳入更
### Platform
- Go 1.25 / Gin control plane80+ HTTP 端点 + Gorilla WebSocket fanout
- workspace CRUD 和 provisioning(可插拔 Provisioner —— 本地 Docker、生产 EC2 + SSM
- **A2A 响应路径已收敛为类型化的判别联合(RFC #2967** —— 冻结 dataclass + 全量 parser100% 单元测试 + 对抗性 fuzz 覆盖
- Go/Gin control plane
- workspace CRUD 和 provisioning
- registry 与 heartbeat
- 浏览器安全的 A2A proxy
- team expansion/collapse
@@ -207,10 +203,10 @@ Molecule AI 并不是要替代下面这些 framework,而是把它们纳入更
### Runtime
- 统一 `workspace/` 镜像;生产环境采用 thin AMIus-east-2
- adapter 驱动执行,覆盖 **8 个 runtime**Claude Code、Hermes、Gemini CLI、LangGraph、DeepAgents、CrewAI、AutoGen、OpenClaw
- 统一 `workspace/` 镜像
- adapter 驱动执行
- Agent Card 注册
- awareness-backed memory**Memory v2 由 pgvector 支撑**语义召回
- awareness-backed memory
- plugin 挂载共享 rules/skills
- 本地 skills 热加载
- coordinator-only delegation 路径
@@ -224,21 +220,6 @@ Molecule AI 并不是要替代下面这些 framework,而是把它们纳入更
- runtime tiers
- 终端与文件层面的 workspace 直接排障
### SaaS(由 [`molecule-controlplane`](https://github.com/Molecule-AI/molecule-controlplane) 提供)
- 多租户运行在 AWS EC2 + Neon(每租户一个 Postgres branch+ Cloudflare Tunnels(每租户一条隧道,对外不开任何端口)
- WorkOS AuthKit + Stripe Checkout + Customer Portal
- AWS KMS 信封加密(DB / Redis 连接串);AWS Secrets Manager 负责租户 bootstrap
- `tenant_resources` 审计表 + 30 分钟 boot-event-aware reconciler —— 每个 CF / AWS lifecycle 事件都有记录,每 30 分钟比对 claim 与实际状态
### 在 Claude Code 里直接接入(由 [`molecule-mcp-claude-channel`](https://github.com/Molecule-AI/molecule-mcp-claude-channel) 提供)
- 把 Molecule A2A 流量桥接到本地 Claude Code 会话的 MCP 插件
- 订阅一个或多个 workspacepeer 的消息会以 user-turn 出现,回复会经 Molecule A2A 路由出去
- 无需公网隧道、无需公开端点 —— 插件启动时自动把每个 watched workspace 注册成 `delivery_mode=poll`,长轮询 `/activity?since_id=…`
- 多租户友好:单次安装即可同时 watch 跨多个 Molecule 租户的 workspace`MOLECULE_PLATFORM_URLS` 按 workspace 配置)
- 通过标准 marketplace 流程安装:`/plugin marketplace add Molecule-AI/molecule-mcp-claude-channel``/plugin install molecule-channel@molecule-mcp-claude-channel`
## 适合什么团队
Molecule AI 特别适合下面这些场景:
@@ -251,29 +232,23 @@ Molecule AI 特别适合下面这些场景:
## 架构总览
```text
Canvas (Next.js 15, warm-paper :3000) <--HTTP / WS--> Platform (Go 1.25 :8080) <---> Postgres + Redis
| |
| +--> Provisioner: Docker (本地) / EC2 + SSM (生产)
| +--> bundles · templates · secrets · KMS
Canvas (Next.js :3000) <--HTTP / WS--> Platform (Go :8080) <---> Postgres + Redis
| |
| +--> Docker provisioner / bundles / templates / secrets
|
+------------------------- 展示 ------------------------> workspaces, teams, tasks, traces, events
+-------------------- 展示 --------------------> workspaces, teams, tasks, traces, events
Workspace Runtime (Python ≥3.11,含 adapter 集合的镜像)
- 8 个 adapter: LangGraph / DeepAgents / Claude Code / CrewAI / AutoGen / Hermes / Gemini CLI / OpenClaw
- Agent Card + A2A servertyped-SSOT 响应路径,RFC #2967
- heartbeat + activity + awareness-backed memoryMemory v2 —— pgvector 语义召回)
Workspace Runtime (Python image with adapters)
- LangGraph / DeepAgents / Claude Code / CrewAI / AutoGen / OpenClaw
- Agent Card + A2A server
- heartbeat + activity + awareness-backed memory
- skills + plugins + hot reload
SaaS Control Plane (molecule-controlplane,私有)
- 每租户 EC2 + Neon (Postgres branch) + Cloudflare Tunnel
- WorkOS · Stripe · KMS · AWS Secrets Manager
- tenant_resources 审计 + 30 分钟 reconciler
```
## 快速开始
```bash
git clone https://git.moleculesai.app/molecule-ai/molecule-core.git
git clone https://github.com/Molecule-AI/molecule-core.git
cd molecule-core
cp .env.example .env
@@ -321,11 +296,7 @@ npm run dev
## 当前范围说明
当前 `main` 已经包含核心平台、Canvas v4warm-paper 主题)、Memory v2pgvector 语义召回)、typed-SSOT A2A 响应路径(RFC #2967)、**8 个正式 adapter**Claude Code、Hermes、Gemini CLI、LangGraph、DeepAgents、CrewAI、AutoGen、OpenClaw、skill lifecycle,以及主要运维面。
配套的私有仓库 [`molecule-controlplane`](https://github.com/Molecule-AI/molecule-controlplane) 提供 SaaS 层 —— 多租户编排(EC2 + Neon + Cloudflare Tunnels)、KMS 信封加密、WorkOS 鉴权、Stripe 计费,以及 `tenant_resources` 审计表加 30 分钟 reconciler。
**NemoClaw** 这样的相邻 runtime 路线仍然属于分支级工作,只有合并后才会进入正式支持列表,这里会明确区分。
当前 `main` 已经包含核心平台、Canvas、memory model、6 个正式 adapter、skill lifecycle主要运维面。**NemoClaw** 这样的相邻 runtime 路线仍然属于分支级工作,只有合并后才会进入正式支持列表,这里会明确区分。
## License
+7
View File
@@ -3,6 +3,7 @@ import { cookies, headers } from "next/headers";
import "./globals.css";
import { AuthGate } from "@/components/AuthGate";
import { CookieConsent } from "@/components/CookieConsent";
import { PurchaseSuccessModal } from "@/components/PurchaseSuccessModal";
import { ThemeProvider } from "@/lib/theme-provider";
import {
THEME_COOKIE,
@@ -86,6 +87,12 @@ export default async function RootLayout({
vercel preview URL, apex) pass through unchanged. */}
<AuthGate>{children}</AuthGate>
<CookieConsent />
{/* Demo Mock #1: post-purchase success toast. Mounted at the
layout level so it persists across page state transitions
(loading → hydrated → error) without being unmounted and
losing its open-state. Reads ?purchase_success=1 from the
URL on first paint, then strips the param. */}
<PurchaseSuccessModal />
</ThemeProvider>
</body>
</html>
@@ -0,0 +1,175 @@
"use client";
/**
* PurchaseSuccessModal — demo-only post-purchase confirmation.
*
* Mounted on the canvas root (`app/page.tsx`). On first paint it inspects
* `?purchase_success=1[&item=<name>]` on the current URL. If present, it
* renders a centred modal styled after `ConfirmDialog`, schedules a 5s
* auto-dismiss, and rewrites the URL via `history.replaceState` to drop
* the params so a refresh after dismiss does NOT re-show the modal.
*
* Mock for the funding demo — there is no real billing surface behind
* this. The marketplace "Purchase" button on the landing page redirects
* here with the params; this modal is the only thing the user sees of
* the "transaction".
*
* Styling matches the warm-paper @theme tokens (surface-sunken / line /
* ink / good) so it tracks light + dark without per-mode overrides.
*/
import { useEffect, useRef, useState } from "react";
import { createPortal } from "react-dom";
const AUTO_DISMISS_MS = 5000;
function readPurchaseParams(): { open: boolean; item: string | null } {
if (typeof window === "undefined") return { open: false, item: null };
const sp = new URLSearchParams(window.location.search);
const flag = sp.get("purchase_success");
if (flag !== "1" && flag !== "true") return { open: false, item: null };
return { open: true, item: sp.get("item") };
}
function stripPurchaseParams() {
if (typeof window === "undefined") return;
const url = new URL(window.location.href);
url.searchParams.delete("purchase_success");
url.searchParams.delete("item");
// replaceState (not pushState) so back-button doesn't return to the
// pre-strip URL and re-trigger the modal.
window.history.replaceState({}, "", url.toString());
}
export function PurchaseSuccessModal() {
const [open, setOpen] = useState(false);
const [item, setItem] = useState<string | null>(null);
const [mounted, setMounted] = useState(false);
const dialogRef = useRef<HTMLDivElement>(null);
// Read the URL params once on mount. We don't subscribe to navigation —
// this modal is a one-shot for the demo redirect, not a persistent
// listener.
useEffect(() => {
setMounted(true);
const { open: shouldOpen, item: itemName } = readPurchaseParams();
if (shouldOpen) {
setOpen(true);
setItem(itemName);
// Clean the URL immediately so a refresh after the modal is closed
// (or even while it's still open) does NOT re-trigger it.
stripPurchaseParams();
}
}, []);
// Auto-dismiss timer + Escape handler.
useEffect(() => {
if (!open) return;
const t = window.setTimeout(() => setOpen(false), AUTO_DISMISS_MS);
const onKey = (e: KeyboardEvent) => {
if (e.key === "Escape") setOpen(false);
};
window.addEventListener("keydown", onKey);
// Focus the close button so keyboard users land on it after redirect.
const raf = requestAnimationFrame(() => {
dialogRef.current?.querySelector<HTMLButtonElement>("button")?.focus();
});
return () => {
window.clearTimeout(t);
window.removeEventListener("keydown", onKey);
cancelAnimationFrame(raf);
};
}, [open]);
if (!open || !mounted) return null;
const itemLabel = item ? decodeURIComponent(item) : "Your new agent";
return createPortal(
<div
className="fixed inset-0 z-[9999] flex items-center justify-center"
data-testid="purchase-success-modal"
>
{/* Backdrop — click closes, matches ConfirmDialog backdrop. */}
<div
className="absolute inset-0 bg-black/60 backdrop-blur-sm"
onClick={() => setOpen(false)}
aria-hidden="true"
/>
<div
ref={dialogRef}
role="dialog"
aria-modal="true"
aria-labelledby="purchase-success-title"
className="relative bg-surface-sunken border border-line rounded-xl shadow-2xl shadow-black/50 max-w-[420px] w-full mx-4 overflow-hidden"
>
<div className="px-6 pt-6 pb-4">
<div className="flex items-start gap-4">
{/* Success glyph — uses --color-good so it tracks the theme.
Inline SVG over an emoji so it stays readable + on-brand
in both light and dark. */}
<div
className="flex h-10 w-10 flex-shrink-0 items-center justify-center rounded-full"
style={{
background:
"color-mix(in srgb, var(--color-good) 15%, transparent)",
color: "var(--color-good)",
}}
>
<svg
width="22"
height="22"
viewBox="0 0 24 24"
fill="none"
aria-hidden="true"
>
<circle
cx="12"
cy="12"
r="10"
stroke="currentColor"
strokeWidth="1.5"
/>
<path
d="M7.5 12.5L10.5 15.5L16.5 9.5"
stroke="currentColor"
strokeWidth="1.8"
strokeLinecap="round"
strokeLinejoin="round"
/>
</svg>
</div>
<div className="flex-1">
<h3
id="purchase-success-title"
className="text-base font-semibold text-ink"
>
Purchase successful
</h3>
<p className="mt-1.5 text-[13px] leading-relaxed text-ink-mid">
<span className="font-medium text-ink">{itemLabel}</span> has
been added to your workspace. Provisioning starts in the
background you can keep working while it spins up.
</p>
</div>
</div>
</div>
<div className="flex items-center justify-between gap-3 px-6 py-3 border-t border-line bg-surface/50">
<span className="font-mono text-[10.5px] uppercase tracking-[0.12em] text-ink-soft">
auto-dismiss · {AUTO_DISMISS_MS / 1000}s
</span>
<button
type="button"
onClick={() => setOpen(false)}
className="px-3.5 py-1.5 text-[13px] rounded-lg bg-accent hover:bg-accent-strong text-white transition-colors focus:outline-none focus-visible:ring-2 focus-visible:ring-offset-2 focus-visible:ring-offset-surface-sunken focus-visible:ring-accent/60"
>
Close
</button>
</div>
</div>
</div>,
document.body,
);
}
+55 -37
View File
@@ -13,6 +13,7 @@ import { AttachmentPreview } from "./chat/AttachmentPreview";
import { extractFilesFromTask } from "./chat/message-parser";
import { AgentCommsPanel } from "./chat/AgentCommsPanel";
import { appendActivityLine } from "./chat/activityLog";
import { activityRowToMessages, type ActivityRowForHydration } from "./chat/historyHydration";
import { runtimeDisplayName } from "@/lib/runtime-names";
import { ConfirmDialog } from "@/components/ConfirmDialog";
@@ -49,12 +50,38 @@ interface A2AResponse {
};
}
// Internal-self-message filtering moved server-side in RFC #2945
// PR-C/D — the platform's /chat-history endpoint applies the
// IsInternalSelfMessage predicate before returning rows, so the
// client no longer needs the local backstop on the history path.
// The proper fix is still X-Workspace-ID header (source_id=workspace_id);
// the platform-side prefix filter handles the residual cases.
/** Detect activity-log rows that the workspace's own runtime fired
* against itself but were misclassified as canvas-source. The proper
* fix is the X-Workspace-ID header from `self_source_headers()` in
* workspace/platform_auth.py, which makes the platform record
* source_id = workspace_id. But three failure modes still leak a
* self-message into "My Chat":
*
* 1. Historical rows already in the DB with source_id=NULL.
* 2. Workspace containers running pre-fix heartbeat.py / main.py
* (the fix only takes effect after an image rebuild + redeploy).
* 3. Future internal triggers added without the helper.
*
* This client-side filter recognises the heartbeat trigger by its
* exact prefix — the heartbeat assembles
*
* "Delegation results are ready. Review them and take appropriate
* action:\n" + summary_lines + report_instruction
*
* in workspace/heartbeat.py. The prefix is template-fixed so a
* string match is reliable. If the heartbeat copy ever changes,
* update this constant in the same commit.
*
* This is a backstop, not the primary defence — the X-Workspace-ID
* header is. Filtering content is fragile to copy edits, so keep
* the list narrow. */
const INTERNAL_SELF_MESSAGE_PREFIXES = [
"Delegation results are ready. Review them and take appropriate action",
];
function isInternalSelfMessage(text: string): boolean {
return INTERNAL_SELF_MESSAGE_PREFIXES.some((p) => text.startsWith(p));
}
// extractReplyText pulls the agent's text reply out of an A2A response.
// Concatenates ALL text parts (joined with "\n") rather than returning
@@ -107,19 +134,8 @@ const INITIAL_HISTORY_LIMIT = 10;
const OLDER_HISTORY_BATCH = 20;
/**
* Load chat history from the platform's typed /chat-history endpoint.
*
* Server-side rendering of activity_logs rows into ChatMessage shape
* lives in workspace-server/internal/messagestore/postgres_store.go
* (RFC #2945 PR-C/D). The server already applies the canvas-source
* filter, the internal-self-message predicate, the role decision
* (status=error vs agent-error prefix → system), and the v0/v1
* file-shape extraction. Canvas just renders what it receives.
*
* Wire shape (mirrors ChatMessage exactly, no per-row mapping needed):
*
* GET /workspaces/:id/chat-history?limit=N&before_ts=T
* 200 → {"messages": ChatMessage[], "reached_end": boolean}
* Load chat history from the activity_logs database via the platform API.
* Uses source=canvas to only get user-initiated messages (not agent-to-agent).
*
* Pagination:
* - Pass `limit` to bound the page size (newest-first from server).
@@ -127,10 +143,10 @@ const OLDER_HISTORY_BATCH = 20;
* timestamp. Combined with limit, this yields the next-older page
* when scrolling backward through history.
*
* `reachedEnd` is propagated from the server. The server computes it
* by comparing rowCount vs limit so a partial last page is correctly
* detected even when the row→bubble fan-out is non-1:1 (each row
* produces 1-2 bubbles).
* `reachedEnd` is true when the server returned fewer rows than asked
* for — caller uses this to disable further older-batch fetches.
* (Counts row-level returns, not chat-bubble count: each row may
* produce 1-2 bubbles.)
*/
async function loadMessagesFromDB(
workspaceId: string,
@@ -138,23 +154,25 @@ async function loadMessagesFromDB(
beforeTs?: string,
): Promise<{ messages: ChatMessage[]; error: string | null; reachedEnd: boolean }> {
try {
const params = new URLSearchParams({ limit: String(limit) });
const params = new URLSearchParams({
type: "a2a_receive",
source: "canvas",
limit: String(limit),
});
if (beforeTs) params.set("before_ts", beforeTs);
const resp = await api.get<{ messages: ChatMessage[]; reached_end: boolean }>(
`/workspaces/${workspaceId}/chat-history?${params.toString()}`,
const activities = await api.get<ActivityRowForHydration[]>(
`/workspaces/${workspaceId}/activity?${params.toString()}`,
);
// Server emits oldest-first within the page (RFC #2945 PR-C-2
// post-fix: server reverses row-aware before returning so the
// wire is display-ready). Canvas appends/prepends without
// reordering — this avoids the pair-flip bug a naive flat
// reverse causes when each row produces a (user, agent) pair
// with the same timestamp.
return {
messages: resp.messages ?? [],
error: null,
reachedEnd: resp.reached_end,
};
const messages: ChatMessage[] = [];
// Activities are newest-first, reverse for chronological order.
// Per-row mapping lives in chat/historyHydration.ts so it can be
// unit-tested without spinning up the full ChatTab component
// (regression cover for the timestamp-collapse bug).
for (const a of [...activities].reverse()) {
messages.push(...activityRowToMessages(a, isInternalSelfMessage));
}
return { messages, error: null, reachedEnd: activities.length < limit };
} catch (err) {
return {
messages: [],
+45 -66
View File
@@ -21,39 +21,20 @@ interface Props {
// --- Agent Card Section ---
function AgentCardSection({ workspaceId }: { workspaceId: string }) {
// Initial card value comes from the canvas store — node.data.agentCard
// is hydrated by the platform stream when the workspace appears in the
// graph, so reading it here avoids a duplicate `GET /workspaces/${id}`
// (the parent ConfigTab.loadConfig already fetches workspace metadata,
// and refetching here adds a serialised RTT to the panel-open path —
// contributed to the ~20s detail-panel load reported in core#11).
// Local state still tracks the edited/saved value so the editor flow
// is unchanged.
const storeCard = useCanvasStore((s) => {
// Defensive against test mocks that omit `nodes` (some test files
// stub the store with a minimal shape). In production `nodes` is
// always an array — empty or not — so the optional chaining only
// matters for the test path.
const node = s.nodes?.find?.((n) => n.id === workspaceId);
return (node?.data.agentCard as
| Record<string, unknown>
| null
| undefined) ?? null;
});
const [card, setCard] = useState<Record<string, unknown> | null>(storeCard);
const [card, setCard] = useState<Record<string, unknown> | null>(null);
const [loading, setLoading] = useState(true);
const [editing, setEditing] = useState(false);
const [draft, setDraft] = useState("");
const [saving, setSaving] = useState(false);
const [error, setError] = useState<string | null>(null);
const [success, setSuccess] = useState(false);
// If the store updates while this section is mounted (another tab
// pushed an update via the platform event stream), reflect that —
// unless the user is mid-edit, in which case we don't clobber their
// unsaved draft.
useEffect(() => {
if (!editing) setCard(storeCard);
}, [storeCard, editing]);
api.get<Record<string, unknown>>(`/workspaces/${workspaceId}`)
.then((ws) => setCard((ws.agent_card as Record<string, unknown>) || null))
.catch(() => {})
.finally(() => setLoading(false));
}, [workspaceId]);
const handleSave = async () => {
setError(null);
@@ -72,7 +53,9 @@ function AgentCardSection({ workspaceId }: { workspaceId: string }) {
return (
<Section title="Agent Card" defaultOpen={false}>
{editing ? (
{loading ? (
<div className="text-[10px] text-ink-soft">Loading...</div>
) : editing ? (
<div className="space-y-2">
<textarea
aria-label="Agent card JSON editor"
@@ -238,51 +221,47 @@ export function ConfigTab({ workspaceId }: Props) {
setLoading(true);
setError(null);
// Load workspace metadata (runtime + model + provider) in parallel.
// These are independent GETs against three workspace-server endpoints
// and used to be awaited serially — for SaaS workspaces each call
// round-trips through an EIC SSH tunnel, so the previous serial
// pattern stacked 3-5s of tunnel-setup latency per call (core#11).
// Promise.all overlaps them; the per-call cost stays the same but
// wall time drops to max() instead of sum().
//
// Each leg has its own .catch handler that yields a sentinel value,
// matching the previous semantics:
// - /workspaces/${id}: required source-of-truth for runtime+tier;
// fall back to YAML if the GET fails (rare, network-class only).
// - /workspaces/${id}/model: non-fatal; empty model lets the form
// fall through to YAML runtime_config.model.
// - /workspaces/${id}/provider: non-fatal; old workspace-servers
// return 404, in which case provider="" and Save skips the PUT.
//
// See GH #1894 for the workspace-row-as-source-of-truth rationale
// that motivated splitting from a single config.yaml read.
const [wsRes, modelRes, providerRes] = await Promise.all([
api.get<{ runtime?: string; tier?: number }>(`/workspaces/${workspaceId}`)
.catch(() => ({} as { runtime?: string; tier?: number })),
api.get<{ model?: string }>(`/workspaces/${workspaceId}/model`)
.catch(() => ({} as { model?: string })),
api.get<{ provider?: string }>(`/workspaces/${workspaceId}/provider`)
.catch(() => null),
]);
const wsMetadataRuntime = (wsRes.runtime || "").trim();
const wsMetadataModel = (modelRes.model || "").trim();
const wsMetadataTier: number | null =
typeof wsRes.tier === "number" ? wsRes.tier : null;
if (providerRes !== null) {
const loadedProvider = (providerRes.provider || "").trim();
setProvider(loadedProvider);
setOriginalProvider(loadedProvider);
} else {
setProvider("");
setOriginalProvider("");
}
// ALWAYS load workspace metadata first (runtime + model). These are the
// source of truth regardless of whether the runtime uses our config.yaml
// template. Without this the form falls back to empty/default values on
// a hermes workspace (which doesn't use our template), creating the
// appearance that the saved runtime is unset — and worse, clicking Save
// would silently flip `runtime` from `hermes` back to the dropdown
// default `LangGraph`. See GH #1894.
let wsMetadataRuntime = "";
let wsMetadataModel = "";
let wsMetadataTier: number | null = null;
try {
const ws = await api.get<{ runtime?: string; tier?: number }>(`/workspaces/${workspaceId}`);
wsMetadataRuntime = (ws.runtime || "").trim();
if (typeof ws.tier === "number") wsMetadataTier = ws.tier;
} catch { /* fall back to config.yaml */ }
try {
const m = await api.get<{ model?: string }>(`/workspaces/${workspaceId}/model`);
wsMetadataModel = (m.model || "").trim();
} catch { /* non-fatal */ }
// originalModel is set further down once the YAML has been parsed —
// we want it to reflect what the form ACTUALLY rendered, which may
// be the YAML's runtime_config.model fallback when MODEL_PROVIDER
// is empty. Setting it here from wsMetadataModel alone would be
// wrong for hermes/pre-#240 workspaces.
// Load explicit provider override (Option B PR-5). Endpoint returns
// {provider: "", source: "default"} when no override is set, so the
// empty string is the legitimate "auto-derive" signal — don't treat
// it as a load error. Non-fatal: an older workspace-server that
// predates PR-2 returns 404 here; the form falls back to "" and
// Save just won't PUT the provider field.
try {
const p = await api.get<{ provider?: string }>(`/workspaces/${workspaceId}/provider`);
const loadedProvider = (p.provider || "").trim();
setProvider(loadedProvider);
setOriginalProvider(loadedProvider);
} catch {
setProvider("");
setOriginalProvider("");
}
// Skip the config.yaml fetch entirely for runtimes that manage
// their own config (external, hermes, etc.) — they don't have a
// platform-side template, so the GET would 404. The catch block
@@ -1,11 +1,13 @@
// @vitest-environment jsdom
//
// Pins the lazy-loading chat-history pagination.
// Pins the lazy-loading chat-history pagination added 2026-05-05.
//
// PR-C-2 (RFC #2945): canvas was migrated from /activity?type=a2a_receive
// to /chat-history. Server now returns typed ChatMessage[] in
// display-ready oldest-first order. These tests guard the canvas-side
// pagination invariants against the new endpoint surface.
// Pre-fix: ChatTab fetched the newest 50 messages on every mount and
// scrolled to bottom, paying full DOM cost up-front even when the user
// only wanted to read the last few bubbles. Post-fix: initial load is
// bounded to 10 newest, and an IntersectionObserver on a top sentinel
// triggers loadOlder() (batch of 20 with `before_ts` cursor) when the
// user scrolls up.
//
// Pinned branches:
// 1. Initial fetch carries `limit=10` and NO before_ts (newest-first
@@ -18,10 +20,11 @@
// asserting the rendered bubble count matches the full page).
// 4. The retry button after a failed initial load uses the same
// INITIAL_HISTORY_LIMIT (10), not the legacy 50.
// 5. before_ts cursor is the OLDEST timestamp from the current page,
// passed verbatim to walk backward.
// 6. Inflight guard rejects duplicate IO triggers while a loadOlder
// fetch is in flight.
//
// IntersectionObserver / scroll-anchor restoration is exercised by the
// E2E synth-canary suite — pinning it in jsdom would require mocking
// the observer and faking layout, which is brittler than trusting a
// live-DOM canary against the staging tenant.
import { describe, it, expect, vi, afterEach, beforeEach } from "vitest";
import { render, screen, cleanup, waitFor, fireEvent } from "@testing-library/react";
@@ -30,31 +33,24 @@ import React from "react";
afterEach(cleanup);
// Both ChatTab sub-panels (MyChat + AgentComms) mount simultaneously so
// keyboard tab order and aria-controls land on a real DOM. MyChat's
// loadMessagesFromDB hits /chat-history; AgentComms's polling hits a
// different URL. Route the mock by URL so each gets a sensible default
// and only MyChat's calls land in the assertion array.
const myChatHistoryCalls: string[] = [];
let myChatNextResponse:
| { ok: true; messages: unknown[]; reachedEnd?: boolean }
| { ok: false; err: Error } = { ok: true, messages: [] };
// keyboard tab order and aria-controls land on a real DOM. Both fire
// /activity GETs on mount: MyChat's hits `type=a2a_receive&source=canvas`,
// AgentComms's hits a different filter. Route the mock by URL so each
// gets a sensible default and only MyChat's call is what the assertions
// scrutinise.
const myChatActivityCalls: string[] = [];
let myChatNextResponse: { ok: true; rows: unknown[] } | { ok: false; err: Error } = {
ok: true,
rows: [],
};
const apiGet = vi.fn((path: string): Promise<unknown> => {
if (path.includes("/chat-history")) {
myChatHistoryCalls.push(path);
if (myChatNextResponse.ok) {
const reached_end =
myChatNextResponse.reachedEnd !== undefined
? myChatNextResponse.reachedEnd
: myChatNextResponse.messages.length < 10;
return Promise.resolve({
messages: myChatNextResponse.messages,
reached_end,
});
}
if (path.includes("type=a2a_receive") && path.includes("source=canvas")) {
myChatActivityCalls.push(path);
if (myChatNextResponse.ok) return Promise.resolve(myChatNextResponse.rows);
return Promise.reject(myChatNextResponse.err);
}
// AgentComms / heartbeat / anything else — empty array safe default.
// AgentComms / heartbeat / anything else — empty array is a safe
// default that won't blow up the corresponding component's .then().
return Promise.resolve([]);
});
const apiPost = vi.fn();
@@ -88,8 +84,8 @@ const ioInstances: IOInstance[] = [];
beforeEach(() => {
apiGet.mockClear();
apiPost.mockReset();
myChatHistoryCalls.length = 0;
myChatNextResponse = { ok: true, messages: [] };
myChatActivityCalls.length = 0;
myChatNextResponse = { ok: true, rows: [] };
ioInstances.length = 0;
class FakeIO {
private inst: IOInstance;
@@ -105,12 +101,20 @@ beforeEach(() => {
this.inst.disconnected = true;
}
}
// Install on every reachable global — different bundlers / module
// graphs can resolve `IntersectionObserver` via `window`, `globalThis`,
// or the bare global. Without all three, jsdom's own (pre-existing)
// stub silently wins and ioInstances stays empty.
(window as unknown as { IntersectionObserver: unknown }).IntersectionObserver = FakeIO;
(globalThis as unknown as { IntersectionObserver: unknown }).IntersectionObserver = FakeIO;
// jsdom doesn't implement scrollIntoView; ChatTab calls it after every
// messages update.
Element.prototype.scrollIntoView = vi.fn();
});
function triggerIntersection(instanceIdx = -1) {
// -1 → the latest observer (the live one). Tests targeting an old
// (disconnected) instance pass a positive index.
const inst = ioInstances.at(instanceIdx);
if (!inst) throw new Error(`no IO instance at ${instanceIdx}`);
inst.callback(
@@ -121,30 +125,25 @@ function triggerIntersection(instanceIdx = -1) {
import { ChatTab } from "../ChatTab";
// makeMessagePair returns a (user, agent) pair sharing a timestamp,
// matching the wire shape /chat-history emits per activity_logs row.
// Server-side reverseRowChunks ensures the wire is oldest-first across
// rows but [user, agent] within each row.
function makeMessagePair(seq: number): unknown[] {
// Zero-pad seq into the minute slot so seq=10 produces a valid
// timestamp (00:10:00Z, not 00:010:00Z).
function makeActivityRow(seq: number): Record<string, unknown> {
// Zero-pad seq into the minute slot so "seq=10" doesn't produce
// the invalid timestamp "00:010:00Z" (caught by the loadOlder URL
// assertion below — first version of the helper used `0${seq}` and
// the test failed on `before_ts` having an extra digit).
const mm = String(seq).padStart(2, "0");
const ts = `2026-05-05T00:${mm}:00Z`;
return [
{ id: `u-${seq}`, role: "user", content: `user msg ${seq}`, timestamp: ts },
{ id: `a-${seq}`, role: "agent", content: `agent reply ${seq}`, timestamp: ts },
];
return {
activity_type: "a2a_receive",
status: "ok",
created_at: `2026-05-05T00:${mm}:00Z`,
request_body: { params: { message: { parts: [{ kind: "text", text: `user msg ${seq}` }] } } },
response_body: { result: `agent reply ${seq}` },
};
}
// pageOldestFirst builds a wire-shape page (oldest-first within page)
// of `count` row-pairs starting at seq=`start`. Mirrors the server's
// post-reverseRowChunks emission order.
function pageOldestFirst(start: number, count: number): unknown[] {
const out: unknown[] = [];
for (let i = 0; i < count; i++) {
out.push(...makeMessagePair(start + i));
}
return out;
// Server returns newest-first; the helper builds a server-shape page
// so the order in the rendered messages array matches production.
function newestFirstPage(start: number, count: number): unknown[] {
return Array.from({ length: count }, (_, i) => makeActivityRow(start + count - 1 - i));
}
const minimalData = {
@@ -154,30 +153,28 @@ const minimalData = {
} as unknown as Parameters<typeof ChatTab>[0]["data"];
describe("ChatTab lazy history pagination", () => {
it("initial fetch carries limit=10 (not the legacy 50) and hits /chat-history", async () => {
myChatNextResponse = { ok: true, messages: makeMessagePair(1) };
it("initial fetch carries limit=10 (not the legacy 50)", async () => {
myChatNextResponse = { ok: true, rows: [makeActivityRow(1)] };
render(<ChatTab workspaceId="ws-1" data={minimalData} />);
await waitFor(() => expect(myChatHistoryCalls.length).toBe(1));
const url = myChatHistoryCalls[0];
expect(url).toContain("/chat-history");
await waitFor(() => expect(myChatActivityCalls.length).toBe(1));
const url = myChatActivityCalls[0];
expect(url).toContain("limit=10");
expect(url).not.toContain("limit=50");
// before_ts should NOT be set on the initial fetch — that's the
// newest-first slice the user lands on.
expect(url).not.toContain("before_ts");
// /chat-history filters source-canvas server-side; client should
// NOT pass type/source params (they belonged to /activity).
expect(url).not.toContain("type=a2a_receive");
expect(url).not.toContain("source=canvas");
});
it("hides the top sentinel when initial fetch returns fewer than the limit", async () => {
// 3 < 10 → server says "no more older history exists"; sentinel
// should NOT mount and the "Loading older messages…" line should
// never appear.
myChatNextResponse = { ok: true, messages: pageOldestFirst(1, 3) };
// never appear (it can't, since the sentinel is what triggers it).
myChatNextResponse = {
ok: true,
rows: [makeActivityRow(1), makeActivityRow(2), makeActivityRow(3)],
};
render(<ChatTab workspaceId="ws-2" data={minimalData} />);
await waitFor(() => expect(myChatHistoryCalls.length).toBe(1));
await waitFor(() => expect(myChatActivityCalls.length).toBe(1));
await waitFor(() => {
expect(screen.queryByText(/Loading chat history/i)).toBeNull();
});
@@ -185,15 +182,15 @@ describe("ChatTab lazy history pagination", () => {
});
it("renders all messages when initial fetch returns exactly the limit", async () => {
// limit=10 row-pairs → 20 ChatMessages. reachedEnd should be FALSE
// so the sentinel mounts. Verified by bubble counts.
myChatNextResponse = {
ok: true,
messages: pageOldestFirst(1, 10),
reachedEnd: false,
};
// 10 == limit → server might have more older rows; sentinel SHOULD
// mount so the IO observer can fire loadOlder() on scroll-up. We
// verify by checking the rendered bubble count — if hasMore stayed
// true the sentinel render path doesn't crash and all 10 rows
// produced their pair of bubbles.
const fullPage = Array.from({ length: 10 }, (_, i) => makeActivityRow(i + 1));
myChatNextResponse = { ok: true, rows: fullPage };
render(<ChatTab workspaceId="ws-3" data={minimalData} />);
await waitFor(() => expect(myChatHistoryCalls.length).toBe(1));
await waitFor(() => expect(myChatActivityCalls.length).toBe(1));
await waitFor(() => {
expect(screen.queryByText(/Loading chat history/i)).toBeNull();
});
@@ -205,67 +202,54 @@ describe("ChatTab lazy history pagination", () => {
myChatNextResponse = { ok: false, err: new Error("network down") };
render(<ChatTab workspaceId="ws-4" data={minimalData} />);
const retry = await screen.findByText(/Retry/);
myChatNextResponse = { ok: true, messages: makeMessagePair(1) };
myChatNextResponse = { ok: true, rows: [makeActivityRow(1)] };
fireEvent.click(retry);
await waitFor(() => expect(myChatHistoryCalls.length).toBe(2));
const retryUrl = myChatHistoryCalls[1];
expect(retryUrl).toContain("/chat-history");
await waitFor(() => expect(myChatActivityCalls.length).toBe(2));
const retryUrl = myChatActivityCalls[1];
expect(retryUrl).toContain("limit=10");
expect(retryUrl).not.toContain("limit=50");
});
it("loadOlder fetches limit=20 with before_ts=oldest.timestamp", async () => {
// Initial page = 10 row-pairs in oldest-first order (seq 1..10).
// The oldest (and so the cursor for loadOlder) is seq=1's
// timestamp 2026-05-05T00:01:00Z.
myChatNextResponse = {
ok: true,
messages: pageOldestFirst(1, 10),
reachedEnd: false,
};
// Initial page = 10 rows in newest-first order (seq 10..1). After
// the component reverses to oldest-first for display, messages[0]
// is built from seq=1 — the oldest — and its timestamp is what
// before_ts should carry.
myChatNextResponse = { ok: true, rows: newestFirstPage(1, 10) };
render(<ChatTab workspaceId="ws-load-older" data={minimalData} />);
await waitFor(() => expect(myChatHistoryCalls.length).toBe(1));
await waitFor(() => expect(myChatActivityCalls.length).toBe(1));
await waitFor(() => expect(ioInstances.length).toBeGreaterThan(0));
// Stage older-batch response, then fire IO callback.
myChatNextResponse = {
ok: true,
messages: pageOldestFirst(0, 1),
reachedEnd: true,
};
// Stage the older-batch response, then fire the IO callback.
myChatNextResponse = { ok: true, rows: newestFirstPage(0, 1) };
triggerIntersection();
await waitFor(() => expect(myChatHistoryCalls.length).toBe(2));
const olderUrl = myChatHistoryCalls[1];
expect(olderUrl).toContain("/chat-history");
await waitFor(() => expect(myChatActivityCalls.length).toBe(2));
const olderUrl = myChatActivityCalls[1];
expect(olderUrl).toContain("limit=20");
expect(olderUrl).toContain("before_ts=");
expect(decodeURIComponent(olderUrl)).toContain("before_ts=2026-05-05T00:01:00Z");
});
it("inflight guard rejects a second IO trigger while first loadOlder is in flight", async () => {
myChatNextResponse = {
ok: true,
messages: pageOldestFirst(1, 10),
reachedEnd: false,
};
myChatNextResponse = { ok: true, rows: newestFirstPage(1, 10) };
render(<ChatTab workspaceId="ws-inflight" data={minimalData} />);
await waitFor(() => expect(myChatHistoryCalls.length).toBe(1));
await waitFor(() => expect(myChatActivityCalls.length).toBe(1));
await waitFor(() => expect(ioInstances.length).toBeGreaterThan(0));
// Hold the next loadOlder fetch open with a manual deferred so we
// can fire the second trigger while the first is in-flight.
let release!: (resp: unknown) => void;
const deferred = new Promise<unknown>((res) => {
let release!: (rows: unknown[]) => void;
const deferred = new Promise<unknown[]>((res) => {
release = res;
});
apiGet.mockImplementationOnce((path: string): Promise<unknown> => {
myChatHistoryCalls.push(path);
myChatActivityCalls.push(path);
return deferred;
});
triggerIntersection(); // start loadOlder #1
await waitFor(() => expect(myChatHistoryCalls.length).toBe(2));
await waitFor(() => expect(myChatActivityCalls.length).toBe(2));
// Second IO trigger lands while #1 is still pending.
triggerIntersection();
@@ -274,62 +258,79 @@ describe("ChatTab lazy history pagination", () => {
// Without the inflight guard, each of these would have started a
// new fetch. With the guard, none of them do — call count stays 2.
await new Promise((r) => setTimeout(r, 10));
expect(myChatHistoryCalls.length).toBe(2);
expect(myChatActivityCalls.length).toBe(2);
// Release the first fetch with a valid wire response shape.
release({ messages: [], reached_end: true });
await waitFor(() => expect(myChatHistoryCalls.length).toBe(2));
// Release the first fetch. Inflight clears in the finally block;
// a subsequent IO trigger is permitted again (verified by checking
// we can fire a follow-up after release without hanging the test).
release([]);
await waitFor(() => expect(myChatActivityCalls.length).toBe(2));
});
it("empty older response clears the scroll anchor and unmounts the sentinel", async () => {
myChatNextResponse = {
ok: true,
messages: pageOldestFirst(1, 10),
reachedEnd: false,
};
// The bug we're pinning: if loadOlder returns 0 rows, the
// scrollAnchorRef must be cleared so the next paint doesn't try to
// restore against a no-op prepend (which would fight the natural
// bottom-pin for any subsequent live message). hasMore flipping to
// false is the same flag-flip path; sentinel disappearing is the
// observable proxy.
myChatNextResponse = { ok: true, rows: newestFirstPage(1, 10) };
render(<ChatTab workspaceId="ws-anchor" data={minimalData} />);
await waitFor(() => expect(myChatHistoryCalls.length).toBe(1));
await waitFor(() => expect(myChatActivityCalls.length).toBe(1));
await waitFor(() => expect(ioInstances.length).toBeGreaterThan(0));
myChatNextResponse = {
ok: true,
messages: [],
reachedEnd: true,
};
myChatNextResponse = { ok: true, rows: [] }; // empty → reachedEnd
triggerIntersection();
await waitFor(() => expect(myChatHistoryCalls.length).toBe(2));
await waitFor(() => expect(myChatActivityCalls.length).toBe(2));
// After reachedEnd the sentinel unmounts (hasMore=false). We can't
// peek scrollAnchorRef directly, but we can assert the consequence:
// scrollIntoView (the bottom-pin for live appends) is not blocked
// by a stale anchor. Trigger a re-render via an unrelated state
// change… in practice the safest assertion here is that the
// sentinel disappeared (proving the empty response propagated to
// hasMore correctly, which is the same flag-flip path as anchor
// clearing).
await waitFor(() => {
expect(screen.queryByText(/Loading older messages/i)).toBeNull();
});
});
it("IntersectionObserver does not churn when older messages prepend", async () => {
myChatNextResponse = {
ok: true,
messages: pageOldestFirst(1, 10),
reachedEnd: false,
};
// Whole-PR perf invariant: prepending older history (the load-bearing
// user gesture) must NOT tear down + re-arm the IO observer.
// Triggering loadOlder is the cleanest way to drive a messages
// mutation from inside the test, since live agent push goes through
// a Zustand store that's harder to drive reliably from jsdom.
//
// Pre-fix, loadOlder depended on `messages`, so every prepend
// recreated loadOlder → re-ran the IO effect → new observer. Each
// call to triggerIntersection() produced a fresh disconnected
// observer + a new live one. Post-fix, the observer survives.
myChatNextResponse = { ok: true, rows: newestFirstPage(1, 10) };
render(<ChatTab workspaceId="ws-stable-io" data={minimalData} />);
await waitFor(() => expect(myChatHistoryCalls.length).toBe(1));
await waitFor(() => expect(myChatActivityCalls.length).toBe(1));
await waitFor(() => expect(ioInstances.length).toBeGreaterThan(0));
// Snapshot the observer instance after first paint stabilises.
const observerBefore = ioInstances.at(-1);
expect(observerBefore).toBeDefined();
expect(observerBefore!.disconnected).toBe(false);
// Trigger three older-batch prepends. Each batch returns the full
// OLDER_HISTORY_BATCH (20 row-pairs = 40 messages) so reachedEnd
// stays false and the sentinel keeps mounting.
// OLDER_HISTORY_BATCH (20 rows) so reachedEnd stays false and the
// sentinel keeps mounting. Pre-fix, each prepend mutated `messages`
// → recreated loadOlder → re-ran the IO effect → new observer.
for (let batch = 0; batch < 3; batch++) {
myChatNextResponse = {
ok: true,
messages: pageOldestFirst(-(batch + 1) * 20, 20),
reachedEnd: false,
rows: newestFirstPage(-(batch + 1) * 20, 20),
};
const callsBefore = myChatHistoryCalls.length;
const callsBefore = myChatActivityCalls.length;
triggerIntersection();
await waitFor(() => expect(myChatHistoryCalls.length).toBe(callsBefore + 1));
await waitFor(() =>
expect(myChatActivityCalls.length).toBe(callsBefore + 1),
);
}
// The original observer is still the live one — no churn.
+2 -2
View File
@@ -212,8 +212,8 @@ services:
# docker compose pull canvas && docker compose up -d canvas
# First-time local setup or testing unreleased changes — build from source:
# docker compose build canvas && docker compose up -d canvas
# Note: ECR images require AWS auth — `aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 153263036946.dkr.ecr.us-east-2.amazonaws.com` before pull.
image: 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/canvas:latest
# Note: GHCR images are private — `docker login ghcr.io` required before pull.
image: ghcr.io/molecule-ai/canvas:latest
build:
context: ./canvas
dockerfile: Dockerfile
+1 -1
View File
@@ -4,7 +4,7 @@ How a workspace-server code change reaches the prod tenant fleet — and how to
> **⚠️ State note (2026-04-22):** this doc describes the **intended design**. As of this write, the canary fleet described below is **not actually running** — no canary tenants are provisioned, `CANARY_TENANT_URLS` / `CANARY_ADMIN_TOKENS` / `CANARY_CP_SHARED_SECRET` are empty in repo secrets, and `canary-verify.yml` fails every run.
>
> Current merges gate on manual `promote-latest.yml` dispatches, not canary. See [molecule-controlplane/docs/canary-tenants.md](https://git.moleculesai.app/molecule-ai/molecule-controlplane/src/branch/main/docs/canary-tenants.md) for the Phase 1 code work that's already shipped + the Phase 2 plan for actually standing up the fleet + a "should we even do this now?" decision framework.
> Current merges gate on manual `promote-latest.yml` dispatches, not canary. See [molecule-controlplane/docs/canary-tenants.md](https://github.com/Molecule-AI/molecule-controlplane/blob/main/docs/canary-tenants.md) for the Phase 1 code work that's already shipped + the Phase 2 plan for actually standing up the fleet + a "should we even do this now?" decision framework.
>
> **Account-specific identifiers (AWS account ID, IAM role name) referenced below in the original design have been redacted from this public doc.** The actual values — if they exist — are in `Molecule-AI/internal/runbooks/canary-fleet.md`. If you're implementing Phase 2, start there.
>
+6 -6
View File
@@ -1,7 +1,7 @@
# Molecule AI — Comprehensive Technical Documentation
> Definitive technical reference for the Molecule AI Agent Team platform.
> Based on a full non-invasive scan of the [molecule-monorepo](https://git.moleculesai.app/molecule-ai/molecule-monorepo) repository.
> Based on a full non-invasive scan of the [molecule-monorepo](https://github.com/Molecule-AI/molecule-monorepo) repository.
---
@@ -1149,11 +1149,11 @@ Molecule AI's workspace abstraction is **runtime-agnostic by design**. A workspa
## Links
- **GitHub**: https://git.moleculesai.app/molecule-ai/molecule-monorepo
- **Architecture Docs**: https://git.moleculesai.app/molecule-ai/molecule-monorepo/src/branch/main/docs/architecture
- **API Protocol**: https://git.moleculesai.app/molecule-ai/molecule-monorepo/src/branch/main/docs/api-protocol
- **Agent Runtime**: https://git.moleculesai.app/molecule-ai/molecule-monorepo/src/branch/main/docs/agent-runtime
- **Product Docs**: https://git.moleculesai.app/molecule-ai/molecule-monorepo/src/branch/main/docs/product
- **GitHub**: https://github.com/Molecule-AI/molecule-monorepo
- **Architecture Docs**: https://github.com/Molecule-AI/molecule-monorepo/tree/main/docs/architecture
- **API Protocol**: https://github.com/Molecule-AI/molecule-monorepo/tree/main/docs/api-protocol
- **Agent Runtime**: https://github.com/Molecule-AI/molecule-monorepo/tree/main/docs/agent-runtime
- **Product Docs**: https://github.com/Molecule-AI/molecule-monorepo/tree/main/docs/product
---
+2 -2
View File
@@ -79,7 +79,7 @@ For SOC2 / ISO 27001 / customer security questionnaires:
## Pointers
- KMS envelope code: [`molecule-controlplane/internal/crypto/kms.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/src/branch/main/internal/crypto/kms.go)
- Static-key fallback: [`molecule-controlplane/internal/crypto/aes.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/src/branch/main/internal/crypto/aes.go)
- KMS envelope code: [`molecule-controlplane/internal/crypto/kms.go`](https://github.com/Molecule-AI/molecule-controlplane/blob/main/internal/crypto/kms.go)
- Static-key fallback: [`molecule-controlplane/internal/crypto/aes.go`](https://github.com/Molecule-AI/molecule-controlplane/blob/main/internal/crypto/aes.go)
- Tenant secrets handler: [`workspace-server/internal/crypto/aes.go`](../../workspace-server/internal/crypto/aes.go)
- Tenant secrets schema: [database-schema.md](./database-schema.md#workspace_secrets)
-28
View File
@@ -1,28 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 64 64">
<style>
.bg { fill: #0a1120; }
.accent { fill: #7fe8d6; }
.accent-stroke { stroke: #7fe8d6; }
@media (prefers-color-scheme: light) {
.bg { fill: #f5f7fa; }
.accent { fill: #1a8a72; }
.accent-stroke { stroke: #1a8a72; }
}
</style>
<rect class="bg" width="64" height="64" rx="14"/>
<g class="accent-stroke" stroke-width="2.4" stroke-linecap="round" fill="none">
<line x1="32" y1="32" x2="12" y2="14"/>
<line x1="32" y1="32" x2="52" y2="18"/>
<line x1="32" y1="32" x2="10" y2="40"/>
<line x1="32" y1="32" x2="54" y2="44"/>
<line x1="32" y1="32" x2="32" y2="56"/>
</g>
<g class="accent">
<circle cx="32" cy="32" r="6.5"/>
<circle cx="12" cy="14" r="3.5"/>
<circle cx="52" cy="18" r="3.5"/>
<circle cx="10" cy="40" r="3.5"/>
<circle cx="54" cy="44" r="3.5"/>
<circle cx="32" cy="56" r="3.5"/>
</g>
</svg>

Before

Width:  |  Height:  |  Size: 957 B

-17
View File
@@ -1,17 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 64 64" role="img" aria-label="Molecule AI">
<g stroke="#7fe8d6" stroke-width="2.6" stroke-linecap="round" fill="none">
<line x1="32" y1="32" x2="12" y2="14"/>
<line x1="32" y1="32" x2="52" y2="18"/>
<line x1="32" y1="32" x2="10" y2="40"/>
<line x1="32" y1="32" x2="54" y2="44"/>
<line x1="32" y1="32" x2="32" y2="56"/>
</g>
<g fill="#7fe8d6">
<circle cx="32" cy="32" r="7"/>
<circle cx="12" cy="14" r="3.6"/>
<circle cx="52" cy="18" r="3.6"/>
<circle cx="10" cy="40" r="3.6"/>
<circle cx="54" cy="44" r="3.6"/>
<circle cx="32" cy="56" r="3.6"/>
</g>
</svg>

Before

Width:  |  Height:  |  Size: 662 B

@@ -299,8 +299,8 @@ Or use the Canvas UI: Workspace → Config → MCP Servers → Add browser MCP s
**Try it free** — Molecule AI is open source and self-hostable. Get a workspace running in under 5 minutes.
→ [Get started on GitHub →](https://git.moleculesai.app/molecule-ai/molecule-core)
→ [Get started on GitHub →](https://github.com/Molecule-AI/molecule-core)
---
*Have a browser automation use case you want to see covered? File an issue with the `enhancement` label on the [molecule-core issue tracker](https://git.moleculesai.app/molecule-ai/molecule-core/issues).*
*Have a browser automation use case you want to see covered? Open a discussion on [GitHub Discussions](https://github.com/Molecule-AI/molecule-core/discussions) — or file an issue with the `enhancement` label.*
@@ -148,7 +148,7 @@ Then follow the [quick-start guide](/docs/guides/remote-workspaces.md).
Or run the annotated example directly:
```bash
git clone https://git.moleculesai.app/molecule-ai/molecule-sdk-python
git clone https://github.com/Molecule-AI/molecule-sdk-python
cd molecule-sdk-python/examples/remote-agent
# Create workspace with runtime:external, grab the ID, then:
WORKSPACE_ID=<your-id> PLATFORM_URL=https://acme.moleculesai.app python3 run.py
@@ -160,6 +160,6 @@ The agent appears on the canvas within seconds.
→ [Remote Workspaces Guide →](/docs/guides/remote-workspaces.md)
→ [External Agent Registration Reference →](/docs/guides/external-agent-registration.md)
→ [molecule-sdk-python →](https://git.moleculesai.app/molecule-ai/molecule-sdk-python)
→ [molecule-sdk-python →](https://github.com/Molecule-AI/molecule-sdk-python)
*Phase 30 shipped in PRs #1075#1083 and #1085#1100 on `molecule-core`.*
@@ -133,4 +133,4 @@ With protocol-native A2A, you get:
Molecule AI's external agent registration is production-ready. Documentation is live at [External Agent Registration Guide](https://docs.molecule.ai/docs/guides/external-agent-registration). The npm package for the MCP server is available at [`@molecule-ai/mcp-server`](https://www.npmjs.com/package/@molecule-ai/mcp-server).
Read the full [A2A v1.0 protocol spec](https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/docs/api-protocol/a2a-protocol.md) on GitHub.
Read the full [A2A v1.0 protocol spec](https://github.com/Molecule-AI/molecule-core/blob/main/docs/api-protocol/a2a-protocol.md) on GitHub.
@@ -45,7 +45,7 @@ canonicalUrl: "https://docs.molecule.ai/blog/remote-workspaces"
" proficiencyLevel": "Expert",
"genre": ["technical documentation", "product announcement"],
"sameAs": [
"https://git.moleculesai.app/molecule-ai/molecule-core",
"https://github.com/Molecule-AI/molecule-core",
"https://molecule.ai"
]
}
@@ -270,7 +270,7 @@ Configure it in your project's `.mcp.json` and any AI agent (Claude Code, Cursor
→ [External Agent Registration Guide](/docs/guides/external-agent-registration) — full step-by-step with Python and Node.js reference implementations
→ [GitHub: molecule-core](https://git.moleculesai.app/molecule-ai/molecule-core) — source and issues
→ [GitHub: molecule-core](https://github.com/Molecule-AI/molecule-core) — source and issues
→ [Phase 30 Launch Thread on X](https://x.com) — follow for updates
@@ -170,4 +170,4 @@ The `staging` branch is now on `a2a-sdk` 1.0.0. The `main` branch still carries
If you're running `a2a-sdk` 0.3.x and planning the 1.0.0 migration, this post is the reference. The four breaking changes are well-contained, the migration is a single PR, and the eight smoke scenarios above will tell you whether the upgrade is clean before you merge.
Questions? The [A2A protocol spec](https://github.com/google-a2a/a2a-specification) is the authoritative source. For Molecule AI's production A2A implementation, see [External Agent Registration](https://docs.molecule.ai/docs/guides/external-agent-registration) or open an issue in the [molecule-core](https://git.moleculesai.app/molecule-ai/molecule-core) repo.
Questions? The [A2A protocol spec](https://github.com/google-a2a/a2a-specification) is the authoritative source. For Molecule AI's production A2A implementation, see [External Agent Registration](https://docs.molecule.ai/docs/guides/external-agent-registration) or open an issue in the [molecule-core](https://github.com/Molecule-AI/molecule-core) repo.
+1 -1
View File
@@ -215,7 +215,7 @@ Push mode (this guide) works today but requires an inbound-reachable URL — whi
Your agent makes only outbound HTTPS calls to the platform, pulling messages from an inbox queue and posting replies back. Works behind any NAT/firewall, tolerates offline laptops, no tunnel needed.
See the [design doc](https://git.moleculesai.app/molecule-ai/internal/src/branch/main/product/external-workspaces-polling.md) (internal) and the implementation tracking issue (search `polling+mode` on the [molecule-core issue tracker](https://git.moleculesai.app/molecule-ai/molecule-core/issues)).
See the [design doc](https://github.com/Molecule-AI/internal/blob/main/product/external-workspaces-polling.md) (internal) and [implementation tracking issue](https://github.com/Molecule-AI/molecule-core/issues?q=polling+mode) once opened.
---
+2 -2
View File
@@ -143,5 +143,5 @@ The agent appears on the canvas with a **purple REMOTE badge** within seconds. F
## Next Steps
- **[External Agent Registration Guide →](/docs/guides/external-agent-registration)** — full endpoint reference, Python + Node.js examples, troubleshooting
- **[molecule-sdk-python →](https://git.moleculesai.app/molecule-ai/molecule-sdk-python)** — SDK source, `RemoteAgentClient` API docs
- **[SDK Examples →](https://git.moleculesai.app/molecule-ai/molecule-sdk-python/src/branch/main/examples/remote-agent)** — `run.py` demo script, annotated walkthrough
- **[molecule-sdk-python →](https://github.com/Molecule-AI/molecule-sdk-python)** — SDK source, `RemoteAgentClient` API docs
- **[SDK Examples →](https://github.com/Molecule-AI/molecule-sdk-python/tree/main/examples/remote-agent)** — `run.py` demo script, annotated walkthrough
+2 -2
View File
@@ -61,7 +61,7 @@ molecule skills install arxiv-research --from community
Community skills are reviewed by the Molecule AI team before being
listed. Submit a skill for review by opening a PR against
[`molecule-ai/skills`](https://git.moleculesai.app/molecule-ai/skills).
[`molecule-ai/skills`](https://github.com/Molecule-AI/skills).
## Installing via config.yaml
@@ -151,7 +151,7 @@ molecule skills bundle my-custom-skill --output ./org-templates/my-role/
```
**Publishing to the community:** Open a PR against
[`molecule-ai/skills`](https://git.moleculesai.app/molecule-ai/skills) with a
[`molecule-ai/skills`](https://github.com/Molecule-AI/skills) with a
complete skill package. Community skills are reviewed for security and
correctness before listing.
@@ -96,7 +96,7 @@ fork needed in production.
`resolve_platform_id` for plugin-platform-safe deserialization, and
`self.adapters[adapter.platform]` keying fix (caught by real-subprocess
test before merge — see below).
- **Plugin package**: [Molecule-AI/hermes-platform-molecule-a2a](https://git.moleculesai.app/molecule-ai/hermes-platform-molecule-a2a)
- **Plugin package**: [Molecule-AI/hermes-platform-molecule-a2a](https://github.com/Molecule-AI/hermes-platform-molecule-a2a)
v0.1.0 — public, MIT-licensed. 11 unit tests + 8 in-process E2E
+ 4 real-subprocess E2E checkpoints all green.
- **Workspace template patch**: [Molecule-AI/molecule-ai-workspace-template-hermes#32](https://github.com/Molecule-AI/molecule-ai-workspace-template-hermes/pull/32)
@@ -154,7 +154,7 @@ intermediate shim earns its complexity.
## Codex (OpenAI Codex CLI)
**Status:** Template SHIPPED. Repo live at
[`Molecule-AI/molecule-ai-workspace-template-codex`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-codex)
[`Molecule-AI/molecule-ai-workspace-template-codex`](https://github.com/Molecule-AI/molecule-ai-workspace-template-codex)
(14 files, 1411 LOC, 12/12 tests). molecule-core registration in
[PR #2512](https://github.com/Molecule-AI/molecule-core/pull/2512).
E2E with real A2A traffic remains.
+2 -2
View File
@@ -17,7 +17,7 @@ This path is aligned to the current repository and current UI. It gets you from
## The one-command path
```bash
git clone https://git.moleculesai.app/molecule-ai/molecule-monorepo.git
git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo
./scripts/dev-start.sh
```
@@ -42,7 +42,7 @@ If you'd rather run each component yourself — useful when you're iterating on
### Step 1: Clone the repository
```bash
git clone https://git.moleculesai.app/molecule-ai/molecule-monorepo.git
git clone https://github.com/Molecule-AI/molecule-monorepo.git
cd molecule-monorepo
```
+11 -11
View File
@@ -98,14 +98,14 @@ Each of the 8 adapter template repos contains:
| Adapter | Repo |
|---------|------|
| claude-code | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-claude-code |
| langgraph | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-langgraph |
| crewai | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-crewai |
| autogen | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-autogen |
| deepagents | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-deepagents |
| hermes | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-hermes |
| gemini-cli | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-gemini-cli |
| openclaw | https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-openclaw |
| claude-code | https://github.com/Molecule-AI/molecule-ai-workspace-template-claude-code |
| langgraph | https://github.com/Molecule-AI/molecule-ai-workspace-template-langgraph |
| crewai | https://github.com/Molecule-AI/molecule-ai-workspace-template-crewai |
| autogen | https://github.com/Molecule-AI/molecule-ai-workspace-template-autogen |
| deepagents | https://github.com/Molecule-AI/molecule-ai-workspace-template-deepagents |
| hermes | https://github.com/Molecule-AI/molecule-ai-workspace-template-hermes |
| gemini-cli | https://github.com/Molecule-AI/molecule-ai-workspace-template-gemini-cli |
| openclaw | https://github.com/Molecule-AI/molecule-ai-workspace-template-openclaw |
## Adapter discovery (ADAPTER_MODULE)
@@ -244,7 +244,7 @@ correctness before pushing a `runtime-v*` tag.
## Writing a new adapter
Use the GitHub template repo
[`molecule-ai/molecule-ai-workspace-template-starter`](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-starter) (note: the starter repo did not survive the 2026-05-06 GitHub-org-suspension migration; recreation tracked at internal#41)
[`Molecule-AI/molecule-ai-workspace-template-starter`](https://github.com/Molecule-AI/molecule-ai-workspace-template-starter)
— it ships with the canonical Dockerfile + adapter.py skeleton + config.yaml
schema + the `repository_dispatch: [runtime-published]` cascade receiver
already wired up. No follow-up setup PR required.
@@ -256,7 +256,7 @@ gh repo create Molecule-AI/molecule-ai-workspace-template-<runtime> \
--public \
--description "Molecule AI workspace template: <runtime>"
git clone https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-<runtime>.git
git clone https://github.com/Molecule-AI/molecule-ai-workspace-template-<runtime>
cd molecule-ai-workspace-template-<runtime>
```
@@ -286,7 +286,7 @@ After `git push`:
If the canonical shape changes (e.g. `config.yaml` schema gets a new field,
the `BaseAdapter` interface adds a method, the reusable CI workflow
signature changes), update the
[starter](https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-starter) (recreation pending — see note above)
[starter](https://github.com/Molecule-AI/molecule-ai-workspace-template-starter)
**first**. Existing templates can either migrate at their own pace or be
touched in a coordinated cleanup PR. Either way, future templates pick up
the new shape from day one.
+2 -1
View File
@@ -41,6 +41,7 @@
{"name": "medo-smoke", "repo": "Molecule-AI/molecule-ai-org-template-medo-smoke", "ref": "main"},
{"name": "molecule-worker-gemini", "repo": "Molecule-AI/molecule-ai-org-template-molecule-worker-gemini", "ref": "main"},
{"name": "reno-stars", "repo": "Molecule-AI/molecule-ai-org-template-reno-stars", "ref": "main"},
{"name": "ux-ab-lab", "repo": "Molecule-AI/molecule-ai-org-template-ux-ab-lab", "ref": "main"}
{"name": "ux-ab-lab", "repo": "Molecule-AI/molecule-ai-org-template-ux-ab-lab", "ref": "main"},
{"name": "mock-bigorg", "repo": "Molecule-AI/molecule-ai-org-template-mock-bigorg", "ref": "main"}
]
}
+1 -1
View File
@@ -11,7 +11,7 @@ There are three related scripts; pick the right one:
|---|---|---|
| `measure-coordinator-task-bounds.sh` | **Canonical** v1 harness for the RFC #2251 / Issue 4 reproduction. Provisions a PM coordinator + Researcher child via `claude-code-default` + `langgraph` templates, sends a synthesis-heavy A2A kickoff, observes elapsed time + activity trace. | OSS-shape platform — localhost or any `/workspaces`-shaped endpoint. Has tenant/admin-token guards for non-localhost runs. |
| `measure-coordinator-task-bounds-runner.sh` | Generalised runner for the same measurement contract but with **arbitrary template + secret + model combinations** (Hermes/MiniMax, etc.). Useful for cross-runtime variants without modifying the canonical harness. | Same as above (local or SaaS via `MODE=saas`). |
| `measure-coordinator-task-bounds.sh` (in [molecule-controlplane](https://git.moleculesai.app/molecule-ai/molecule-controlplane)) | **Production-shape** variant that bootstraps a real staging tenant via `POST /cp/admin/orgs`, then runs the same measurement against `<slug>.staging.moleculesai.app`. | Staging controlplane only — refuses to run against production. |
| `measure-coordinator-task-bounds.sh` (in [molecule-controlplane](https://github.com/Molecule-AI/molecule-controlplane)) | **Production-shape** variant that bootstraps a real staging tenant via `POST /cp/admin/orgs`, then runs the same measurement against `<slug>.staging.moleculesai.app`. | Staging controlplane only — refuses to run against production. |
See `reference_harness_pair_pattern` (auto-memory) for when to use which
and the cross-repo design rationale.
+2 -2
View File
@@ -278,7 +278,7 @@ include = ["molecule_runtime*"]
README_TEMPLATE = """\
# molecule-ai-workspace-runtime
Shared workspace runtime for [Molecule AI](https://git.moleculesai.app/molecule-ai/molecule-core)
Shared workspace runtime for [Molecule AI](https://github.com/Molecule-AI/molecule-core)
agent adapters. Installed by every workspace template image
(`workspace-template-claude-code`, `-langgraph`, `-hermes`, etc.) to provide
A2A delegation, heartbeat, memory, plugin loading, and skill management.
@@ -396,7 +396,7 @@ If you don't need real-time push, the default poll path works
universally with no extra setup; both modes converge on the same
`inbox_pop` ack so messages never duplicate.
See [`docs/workspace-runtime-package.md`](https://git.moleculesai.app/molecule-ai/molecule-core/src/branch/main/docs/workspace-runtime-package.md)
See [`docs/workspace-runtime-package.md`](https://github.com/Molecule-AI/molecule-core/blob/main/docs/workspace-runtime-package.md)
for the publish flow and architecture.
"""
+10 -3
View File
@@ -45,11 +45,18 @@ clone_category() {
continue
fi
echo " cloning $repo -> $target_dir/$name (ref=$ref)"
# Post-2026-05-06 GitHub-org-suspension: clone from Gitea instead.
# manifest.json paths still read "Molecule-AI/..." (the historic
# github.com slug); Gitea lowercases the org part to "molecule-ai/".
# Lowercase the org segment on the fly so we don't need to rewrite
# every manifest entry.
repo_gitea="$(echo "$repo" | awk -F/ '{ printf "%s", tolower($1); for (i=2; i<=NF; i++) printf "/%s", $i; print "" }')"
echo " cloning $repo_gitea -> $target_dir/$name (ref=$ref)"
if [ "$ref" = "main" ]; then
git clone --depth=1 -q "https://github.com/${repo}.git" "$target_dir/$name"
git clone --depth=1 -q "https://git.moleculesai.app/${repo_gitea}.git" "$target_dir/$name"
else
git clone --depth=1 -q --branch "$ref" "https://github.com/${repo}.git" "$target_dir/$name"
git clone --depth=1 -q --branch "$ref" "https://git.moleculesai.app/${repo_gitea}.git" "$target_dir/$name"
fi
CLONED=$((CLONED + 1))
i=$((i + 1))
+2 -2
View File
@@ -10,11 +10,11 @@
# → PyPI auto-bumps molecule-ai-workspace-runtime patch version
# → repository_dispatch fans out to 8 workspace-template-* repos
# → each template repo rebuilds and re-tags
# 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/workspace-template-<runtime>:latest
# ghcr.io/molecule-ai/workspace-template-<runtime>:latest
#
# PATH 2: any merge to a workspace-template-* repo's main branch
# → that repo's publish-image.yml fires
# → 153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/workspace-template-<runtime>:latest
# → ghcr.io/molecule-ai/workspace-template-<runtime>:latest
# gets re-tagged
#
# provisioner.go:296 RuntimeImages[runtime] reads `:latest` at every
+1 -1
View File
@@ -51,7 +51,7 @@ log "pulling latest images for: ${RUNTIMES[*]}"
PULLED=()
FAILED=()
for rt in "${RUNTIMES[@]}"; do
IMG="153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/workspace-template-$rt:latest"
IMG="ghcr.io/molecule-ai/workspace-template-$rt:latest"
if docker pull "$IMG" >/dev/null 2>&1; then
log "$rt"
PULLED+=("$rt")
+16 -21
View File
@@ -1,10 +1,9 @@
#!/bin/bash
# rollback-latest.sh — moves the :latest tag on the platform image
# (and the matching tenant image) on AWS ECR back to a prior
# :staging-<sha> digest without rebuilding anything. Prod tenants
# auto-pull :latest every 5 min, so this is the fast path when a
# canary-verified image turns out to have a runtime regression that
# canary didn't catch.
# rollback-latest.sh — moves the :latest tag on ghcr.io/molecule-ai/platform
# (and the matching tenant image) back to a prior :staging-<sha> digest
# without rebuilding anything. Prod tenants auto-pull :latest every 5
# min, so this is the fast path when a canary-verified image turns out
# to have a runtime regression that canary didn't catch.
#
# Usage:
# scripts/rollback-latest.sh <sha>
@@ -13,14 +12,12 @@
# Prereqs:
# - crane on $PATH (brew install crane OR download from
# https://github.com/google/go-containerregistry/releases)
# - aws CLI authenticated for region us-east-2 with ECR pull/push
# access to the molecule-ai/platform + platform-tenant repositories.
# `aws sts get-caller-identity` should succeed.
# - GHCR token exported as GITHUB_TOKEN with write:packages scope
#
# What it does (per image — platform + tenant):
# crane digest <ecr>:<sha> # verify the target sha exists
# crane tag <ecr>:<sha> latest # retag remotely, single API call
# crane digest <ecr>:latest # confirm the move
# crane digest ghcr.io/…:<sha> # verify the target sha exists
# crane tag ghcr.io/…:<sha> latest # retag remotely, single API call
# crane digest ghcr.io/…:latest # confirm the move
#
# Exit codes: 0 = both retagged, 1 = tag missing / crane error, 2 = bad args.
@@ -33,23 +30,21 @@ if [ "${1:-}" = "" ]; then
fi
TARGET_SHA="$1"
ECR_HOST=153263036946.dkr.ecr.us-east-2.amazonaws.com
PLATFORM=$ECR_HOST/molecule-ai/platform
TENANT=$ECR_HOST/molecule-ai/platform-tenant
PLATFORM=ghcr.io/molecule-ai/platform
TENANT=ghcr.io/molecule-ai/platform-tenant
if ! command -v crane >/dev/null; then
echo "ERROR: crane not installed. brew install crane" >&2
exit 1
fi
if ! command -v aws >/dev/null; then
echo "ERROR: aws CLI not installed. brew install awscli" >&2
if [ -z "${GITHUB_TOKEN:-}" ]; then
echo "ERROR: GITHUB_TOKEN unset. export it with write:packages scope." >&2
exit 1
fi
# Log in once. ECR auth is via short-lived password from `aws ecr
# get-login-password`. crane stores creds in a config file keyed by
# registry; re-running is cheap.
aws ecr get-login-password --region us-east-2 | crane auth login "$ECR_HOST" -u AWS --password-stdin >/dev/null
# Log in once. crane stores creds in a config file keyed by registry;
# re-running is cheap.
printf '%s\n' "$GITHUB_TOKEN" | crane auth login ghcr.io -u "${GITHUB_ACTOR:-$(whoami)}" --password-stdin >/dev/null
roll() {
local image="$1"
+1 -1
View File
@@ -18,7 +18,7 @@
#
# Or inline via curl:
#
# bash <(curl -fsSL https://git.moleculesai.app/molecule-ai/molecule-core/raw/branch/main/tools/check-template-parity.sh) \
# bash <(curl -fsSL https://raw.githubusercontent.com/Molecule-AI/molecule-core/main/tools/check-template-parity.sh) \
# install.sh start.sh
#
# Exit codes:
+89
View File
@@ -0,0 +1,89 @@
package main
import "testing"
// TestResolveBindHost pins the precedence: BIND_ADDR explicit > dev-mode
// fail-open default of 127.0.0.1 > production-shape empty (all interfaces).
//
// Mutation-test invariant: removing the IsDevModeFailOpen() branch makes
// "no_bindaddr_devmode_unset_admin" fail (returns "" instead of "127.0.0.1").
// Removing the BIND_ADDR branch makes "explicit_bindaddr_*" cases fail.
func TestResolveBindHost(t *testing.T) {
cases := []struct {
name string
bindAddr string
adminToken string
molEnv string
want string
}{
{
name: "no_bindaddr_devmode_unset_admin",
bindAddr: "",
adminToken: "",
molEnv: "dev",
want: "127.0.0.1",
},
{
name: "no_bindaddr_devmode_unset_admin_full_word",
bindAddr: "",
adminToken: "",
molEnv: "development",
want: "127.0.0.1",
},
{
name: "no_bindaddr_admin_set_in_dev_env",
bindAddr: "",
adminToken: "secret",
molEnv: "dev",
want: "", // ADMIN_TOKEN flips IsDevModeFailOpen to false → all interfaces
},
{
name: "no_bindaddr_production_env",
bindAddr: "",
adminToken: "",
molEnv: "production",
want: "", // production is not a dev value → all interfaces
},
{
name: "no_bindaddr_unset_env",
bindAddr: "",
adminToken: "",
molEnv: "",
want: "", // unset MOLECULE_ENV → not dev → all interfaces
},
{
name: "explicit_bindaddr_loopback_overrides_devmode",
bindAddr: "127.0.0.1",
adminToken: "",
molEnv: "dev",
want: "127.0.0.1",
},
{
name: "explicit_bindaddr_wildcard_overrides_devmode_default",
bindAddr: "0.0.0.0",
adminToken: "",
molEnv: "dev",
want: "0.0.0.0",
},
{
name: "explicit_bindaddr_in_production",
bindAddr: "10.0.5.7",
adminToken: "secret",
molEnv: "production",
want: "10.0.5.7",
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
t.Setenv("BIND_ADDR", tc.bindAddr)
t.Setenv("ADMIN_TOKEN", tc.adminToken)
t.Setenv("MOLECULE_ENV", tc.molEnv)
got := resolveBindHost()
if got != tc.want {
t.Errorf("resolveBindHost() = %q, want %q (BIND_ADDR=%q ADMIN_TOKEN=%q MOLECULE_ENV=%q)",
got, tc.want, tc.bindAddr, tc.adminToken, tc.molEnv)
}
})
}
}
+35 -16
View File
@@ -19,6 +19,7 @@ import (
"github.com/Molecule-AI/molecule-monorepo/platform/internal/handlers"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/imagewatch"
memwiring "github.com/Molecule-AI/molecule-monorepo/platform/internal/memory/wiring"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/middleware"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/pendinguploads"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/provisioner"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/registry"
@@ -248,19 +249,6 @@ func main() {
})
}
// CP-mode orphan sweeper — SaaS counterpart to the Docker sweeper
// above. Re-issues cpProv.Stop for any workspace at status='removed'
// with a non-NULL instance_id, healing the deprovision split-write
// race documented in #2989: tenant marks status='removed' BEFORE
// calling CP DELETE, so a transient CP failure leaves the EC2
// running with no retry path. cpProv.Stop is idempotent against
// already-terminated instances; on success we clear instance_id.
if cpProv != nil {
go supervised.RunWithRecover(ctx, "cp-orphan-sweeper", func(c context.Context) {
registry.StartCPOrphanSweeper(c, cpProv)
})
}
// Pending-uploads GC sweep — deletes acked rows past their retention
// window plus unacked rows past expires_at. Without this the
// pending_uploads table grows unbounded; even with the 24h hard TTL,
@@ -332,15 +320,23 @@ func main() {
// Router
r := router.Setup(hub, broadcaster, prov, platformURL, configsDir, wh, channelMgr, memBundle)
// HTTP server with graceful shutdown
// HTTP server with graceful shutdown.
//
// Bind host: in dev-mode (no ADMIN_TOKEN, MOLECULE_ENV=dev|development)
// the AdminAuth chain fails open by design; pairing that with a wildcard
// bind would expose unauth /workspaces to any same-LAN peer. Default to
// loopback when fail-open is active. Operators who need LAN exposure set
// BIND_ADDR=0.0.0.0 explicitly. Production (ADMIN_TOKEN set) is unchanged.
// See molecule-core#7.
bindHost := resolveBindHost()
srv := &http.Server{
Addr: fmt.Sprintf(":%s", port),
Addr: fmt.Sprintf("%s:%s", bindHost, port),
Handler: r,
}
// Start server in goroutine
go func() {
log.Printf("Platform starting on :%s", port)
log.Printf("Platform starting on %s:%s (dev-mode-fail-open=%v)", bindHost, port, middleware.IsDevModeFailOpen())
if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
log.Fatalf("Server failed: %v", err)
}
@@ -375,6 +371,29 @@ func envOr(key, fallback string) string {
return fallback
}
// resolveBindHost picks the listener interface for the HTTP server.
//
// Precedence:
// 1. BIND_ADDR — explicit operator override (any value, including "0.0.0.0").
// 2. dev-mode fail-open active → "127.0.0.1" (loopback only).
// 3. otherwise → "" (Go binds every interface; existing prod/self-host shape).
//
// Coupling the loopback default to middleware.IsDevModeFailOpen() means the
// two safety levers — bind narrowness and auth strength — move together. A
// production deploy (ADMIN_TOKEN set) keeps binding to all interfaces because
// the auth chain is doing its job; a dev Mac (no ADMIN_TOKEN, MOLECULE_ENV=dev)
// is reachable only via loopback because the auth chain is fail-open. See
// molecule-core#7 for the original LAN exposure finding.
func resolveBindHost() string {
if v := os.Getenv("BIND_ADDR"); v != "" {
return v
}
if middleware.IsDevModeFailOpen() {
return "127.0.0.1"
}
return ""
}
func findConfigsDir() string {
candidates := []string{
"workspace-configs-templates",
@@ -413,6 +413,23 @@ func (h *WorkspaceHandler) proxyA2ARequest(ctx context.Context, workspaceID stri
return http.StatusOK, respBody, nil
}
// Mock-runtime short-circuit. Workspaces with runtime='mock' have
// no container, no EC2, no URL — every reply is synthesised here
// from a small canned-variant pool. Built for the "200-workspace
// mock org" demo: a CEO/VPs/Managers/ICs hierarchy that renders
// at scale on the canvas without burning real LLM credits or
// provisioning 200 EC2 instances. See mock_runtime.go for the
// full rationale + reply shape contract.
//
// Position: AFTER poll-mode (mock isn't a delivery mode, it's a
// runtime; treating poll-set-on-mock as poll matches operator
// intent if anyone ever does that), BEFORE resolveAgentURL (mock
// has no URL — going through resolveAgentURL would 404 on the
// SELECT url since the row is provisioned as NULL).
if status, respBody, handled := h.handleMockA2A(ctx, workspaceID, callerID, body, a2aMethod, logActivity); handled {
return status, respBody, nil
}
agentURL, proxyErr := h.resolveAgentURL(ctx, workspaceID)
if proxyErr != nil {
return 0, nil, proxyErr
@@ -1,437 +0,0 @@
package handlers
// eic_tunnel_pool.go — refcounted pool for EIC SSH tunnels keyed on
// instanceID. Reuses one tunnel across N file ops, amortising the
// ssh-keygen + SendSSHPublicKey + open-tunnel + waitForPort cost
// (~3-5s) over multiple cats/finds (~50-200ms each).
//
// Origin: core#11 — canvas detail-panel config + filesystem load
// took ~20s. ConfigTab fans out 4 GETs serially; the slowest is
// /files/config.yaml which dispatches to readFileViaEIC. Without a
// pool, every readFileViaEIC + listFilesViaEIC + writeFileViaEIC +
// deleteFileViaEIC pays the full setup cost even when fired
// back-to-back on the same workspace EC2.
//
// The pool keeps one eicSSHSession alive per instanceID for up to
// poolTTL. SendSSHPublicKey grants a 60s key validity, so poolTTL
// must stay strictly below that to avoid serving requests on a
// just-expired key. We default to 50s with a 10s safety margin.
//
// Concurrency model:
//
// - Single mutex guards the entries map.
// - Slow path (tunnel setup) runs OUTSIDE the lock, gated by an
// "intent" placeholder so concurrent acquires for the same
// instanceID don't both build a tunnel — the loser drops its
// setup and uses the winner's.
// - Refcount on each entry; eviction blocked while refcount > 0.
// - Janitor goroutine sweeps every poolJanitorInterval, drops
// entries where refcount == 0 && expiresAt < now.
//
// Test injection:
//
// - poolSetupTunnel is a package-level var so tests can swap the
// slow path for a counting stub. Production wires it to
// realWithEICTunnel-style setup.
// - withEICTunnel (the public, single-shot API) is also a var
// (already, see template_files_eic.go). It's rebound here to
// pooledWithEICTunnel which routes through globalEICTunnelPool.
// - Tests that need single-shot behaviour can set poolTTL = 0,
// which makes pooledWithEICTunnel fall through to the underlying
// setup directly (no pool entry kept).
import (
"context"
"fmt"
"sync"
"time"
)
// poolTTL is the maximum age of a pooled tunnel. Must be strictly
// less than the SendSSHPublicKey grant window (60s) so we never
// serve a request through a key that's about to expire mid-op.
//
// Configurable via init-time wiring (see initEICTunnelPool); not a
// const so tests can pin TTL=0 (disable pooling) or TTL=50ms (drive
// eviction tests).
var poolTTL = 50 * time.Second
// poolJanitorInterval is how often the janitor goroutine sweeps for
// expired idle entries. Tighter than poolTTL so eviction is timely;
// loose enough that the goroutine doesn't burn CPU.
var poolJanitorInterval = 10 * time.Second
// poolMaxEntries caps simultaneous instanceIDs the pool tracks.
// Beyond this, new acquires evict the LRU entry. Defends against a
// pathological caller (e.g. a sweep over hundreds of workspace
// EC2s) from leaking unbounded tunnel processes. 32 is a generous
// ceiling for the canvas use case (one human navigates ≤ ~5
// workspaces at a time).
var poolMaxEntries = 32
// poolSetupTunnel is the slow-path tunnel constructor. Wrapped in a
// var so tests can inject a counter stub. Returns a session and a
// cleanup function (closes the open-tunnel subprocess + scrubs the
// ephemeral keydir). nil session + non-nil err means setup failed
// and there is nothing to clean up.
//
// Production wiring lives in eic_tunnel_pool_setup.go (a thin shim
// over the existing realWithEICTunnel logic).
var poolSetupTunnel = func(ctx context.Context, instanceID string) (
sess eicSSHSession, cleanup func(), err error) {
return setupRealEICTunnel(ctx, instanceID)
}
// pooledTunnel is one entry in the pool. session is shared by N
// concurrent fn calls; cleanup runs once when refcount returns to
// zero AND the entry is past expiresAt or evicted.
//
// lastUsed tracks the most recent acquire time for LRU bookkeeping
// (overflow eviction). expiresAt is set at construction and not
// extended on use — a tunnel cannot live past poolTTL even if it's
// hot, because the underlying SendSSHPublicKey grant expires.
type pooledTunnel struct {
session eicSSHSession
cleanup func()
expiresAt time.Time
lastUsed time.Time
refcount int
poisoned bool // true if a fn returned a tunnel-fatal error; do not reuse
}
// eicTunnelPool is the package-level pool. Single instance lives
// in globalEICTunnelPool; constructor runs lazily on first acquire.
type eicTunnelPool struct {
mu sync.Mutex
entries map[string]*pooledTunnel
// pendingSetups guards concurrent setup for the same instanceID.
// First acquirer takes the slot; later ones wait on the channel.
pendingSetups map[string]chan struct{}
stopJanitor chan struct{}
}
var (
globalEICTunnelPool *eicTunnelPool
globalEICTunnelPoolOnce sync.Once
)
// getEICTunnelPool returns the singleton pool, lazy-initialising on
// first call. Idempotent.
func getEICTunnelPool() *eicTunnelPool {
globalEICTunnelPoolOnce.Do(func() {
globalEICTunnelPool = newEICTunnelPool()
go globalEICTunnelPool.janitor()
})
return globalEICTunnelPool
}
// newEICTunnelPool constructs an empty pool. Exported so tests can
// build isolated pools without sharing the singleton.
func newEICTunnelPool() *eicTunnelPool {
return &eicTunnelPool{
entries: map[string]*pooledTunnel{},
pendingSetups: map[string]chan struct{}{},
stopJanitor: make(chan struct{}),
}
}
// acquire returns a usable session for instanceID. If a healthy entry
// exists, refcount++ and return it. If a setup is in flight for the
// same instanceID, wait for it. Otherwise build one (slow path).
//
// done() must be called by the caller when the op finishes. It
// decrements refcount and triggers cleanup if the entry is past
// TTL or poisoned and refcount==0.
//
// Errors from the slow path propagate; pool state is not modified
// for failed setups (no poisoned entry created — that's only for
// fn-returned errors on a previously-good session).
func (p *eicTunnelPool) acquire(ctx context.Context, instanceID string) (
sess eicSSHSession, done func(poisoned bool), err error) {
if poolTTL <= 0 {
// Pool disabled (TTL=0 mode for tests / opt-out). Fall
// through to a direct setup with caller-driven cleanup.
s, cleanup, err := poolSetupTunnel(ctx, instanceID)
if err != nil {
return eicSSHSession{}, nil, err
}
return s, func(_ bool) { cleanup() }, nil
}
for {
p.mu.Lock()
if pt, ok := p.entries[instanceID]; ok && !pt.poisoned && pt.expiresAt.After(time.Now()) {
pt.refcount++
pt.lastUsed = time.Now()
p.mu.Unlock()
return pt.session, p.releaser(instanceID, pt), nil
}
// Either no entry, expired entry, or poisoned entry. If a
// setup is already in flight, wait and retry.
if pending, ok := p.pendingSetups[instanceID]; ok {
p.mu.Unlock()
select {
case <-pending:
continue // re-check the entries map
case <-ctx.Done():
return eicSSHSession{}, nil, ctx.Err()
}
}
// Drop expired/poisoned entry now (we'll cleanup outside
// the lock — the entry is unreferenced or we'd not be here).
var oldCleanup func()
if pt, ok := p.entries[instanceID]; ok {
if pt.refcount == 0 {
oldCleanup = pt.cleanup
delete(p.entries, instanceID)
}
}
// Reserve the setup slot.
signal := make(chan struct{})
p.pendingSetups[instanceID] = signal
p.mu.Unlock()
if oldCleanup != nil {
go oldCleanup()
}
// Slow path: build a new tunnel. Anything that goes wrong
// here cleans up the pendingSetups slot and propagates to
// the caller without leaving the pool in a state where the
// next acquire blocks waiting on a signal that never fires.
newSess, cleanup, setupErr := poolSetupTunnel(ctx, instanceID)
p.mu.Lock()
delete(p.pendingSetups, instanceID)
close(signal)
if setupErr != nil {
p.mu.Unlock()
return eicSSHSession{}, nil, fmt.Errorf("eic tunnel setup: %w", setupErr)
}
// Enforce LRU bound BEFORE inserting so we don't briefly
// exceed the cap even by one entry.
p.evictLRUIfFullLocked(instanceID)
pt := &pooledTunnel{
session: newSess,
cleanup: cleanup,
expiresAt: time.Now().Add(poolTTL),
lastUsed: time.Now(),
refcount: 1,
}
p.entries[instanceID] = pt
p.mu.Unlock()
return pt.session, p.releaser(instanceID, pt), nil
}
}
// releaser returns a closure that decrements refcount and triggers
// cleanup if (a) the entry is past TTL or (b) the caller signalled
// poison. Idempotent against double-release (decrements once via the
// captured pt; pool entry may have been replaced by then).
func (p *eicTunnelPool) releaser(instanceID string, pt *pooledTunnel) func(poisoned bool) {
released := false
return func(poisoned bool) {
p.mu.Lock()
defer p.mu.Unlock()
if released {
return
}
released = true
pt.refcount--
if poisoned {
pt.poisoned = true
}
// Evict immediately if poisoned-and-idle OR expired-and-idle.
// Hot entries (refcount > 0) defer eviction to the last release.
if pt.refcount == 0 && (pt.poisoned || pt.expiresAt.Before(time.Now())) {
// If the entry in the map is still us, remove it.
if cur, ok := p.entries[instanceID]; ok && cur == pt {
delete(p.entries, instanceID)
}
go pt.cleanup()
}
}
}
// evictLRUIfFullLocked drops the least-recently-used IDLE entry
// when the pool is at capacity. Caller must hold p.mu. The new
// instanceID about to be inserted is excluded so we don't evict
// ourselves. If no idle entries exist, no eviction happens — the
// new entry will push us above the soft cap until something releases.
func (p *eicTunnelPool) evictLRUIfFullLocked(skipInstance string) {
if len(p.entries) < poolMaxEntries {
return
}
var oldestKey string
var oldest *pooledTunnel
for k, pt := range p.entries {
if k == skipInstance {
continue
}
if pt.refcount > 0 {
continue
}
if oldest == nil || pt.lastUsed.Before(oldest.lastUsed) {
oldestKey = k
oldest = pt
}
}
if oldest == nil {
return // every entry is in use; no eviction possible
}
delete(p.entries, oldestKey)
go oldest.cleanup()
}
// janitor periodically scans for entries that are idle AND expired,
// closing their tunnels. Runs forever (per pool lifetime); cancelled
// by close(p.stopJanitor) for tests that build short-lived pools.
func (p *eicTunnelPool) janitor() {
t := time.NewTicker(poolJanitorInterval)
defer t.Stop()
for {
select {
case <-t.C:
p.sweep()
case <-p.stopJanitor:
return
}
}
}
// sweep is one janitor pass. Drops idle expired entries.
func (p *eicTunnelPool) sweep() {
p.mu.Lock()
now := time.Now()
var toClose []func()
for k, pt := range p.entries {
if pt.refcount == 0 && pt.expiresAt.Before(now) {
toClose = append(toClose, pt.cleanup)
delete(p.entries, k)
}
}
p.mu.Unlock()
for _, c := range toClose {
go c()
}
}
// stop terminates the janitor and closes all idle entries. Hot
// (refcount > 0) entries are NOT force-closed — callers running
// against them would see a use-after-free. In practice stop is only
// called by tests that have already drained their callers.
func (p *eicTunnelPool) stop() {
close(p.stopJanitor)
p.mu.Lock()
defer p.mu.Unlock()
for k, pt := range p.entries {
if pt.refcount == 0 {
go pt.cleanup()
delete(p.entries, k)
}
}
}
// pooledWithEICTunnel is the pool-backed replacement for
// realWithEICTunnel. The signature matches `var withEICTunnel`
// exactly so the rebind (in initEICTunnelPool) is a drop-in.
//
// Errors from `fn` itself are forwarded to the caller AND mark the
// pool entry as poisoned, so the next acquire builds a fresh
// tunnel. This catches the case where the workspace EC2 was
// restarted out-of-band (tunnel still appears alive locally but
// every cat/find errors out).
func pooledWithEICTunnel(ctx context.Context, instanceID string,
fn func(s eicSSHSession) error) error {
pool := getEICTunnelPool()
sess, done, err := pool.acquire(ctx, instanceID)
if err != nil {
return err
}
// poisoned defaults to true so a panic from fn poisons the
// entry on the way through the deferred release. Without the
// defer, a panicking fn would leak refcount=1 forever and
// permanently block eviction of this entry. The fn-error path
// resets poisoned to its real classification before return.
poisoned := true
defer func() { done(poisoned) }()
fnErr := fn(sess)
poisoned = fnErrIndicatesTunnelFault(fnErr)
return fnErr
}
// fnErrIndicatesTunnelFault returns true for fn errors whose nature
// suggests the underlying tunnel is no longer reusable (auth gone,
// network gone, ssh process dead). Returning true poisons the pool
// entry so the next acquire builds fresh.
//
// Conservative: only marks tunnel-faulty for clearly tunnel-level
// failures (connection refused, broken pipe, ssh exit-status from
// fatal-channel signals). A `cat` returning os.ErrNotExist on a
// missing file is NOT a tunnel fault — that's the file path being
// wrong, the tunnel is fine.
func fnErrIndicatesTunnelFault(err error) bool {
if err == nil {
return false
}
msg := err.Error()
// stderr substrings produced by ssh when the tunnel is broken.
for _, marker := range []string{
"connection refused",
"connection closed",
"broken pipe",
"Connection reset by peer",
"kex_exchange_identification",
"port forwarding failed",
"Permission denied",
"Authentication failed",
} {
if containsCaseInsensitive(msg, marker) {
return true
}
}
return false
}
// containsCaseInsensitive avoids importing strings just for this
// (the file already needs ssh stderr matching elsewhere — this
// keeps the helper local to avoid a cross-file dependency).
func containsCaseInsensitive(s, substr string) bool {
if len(substr) > len(s) {
return false
}
// Manual lowercase compare loop; ssh error markers are ASCII so
// no need for unicode-aware folding.
low := func(b byte) byte {
if b >= 'A' && b <= 'Z' {
return b + 32
}
return b
}
for i := 0; i+len(substr) <= len(s); i++ {
match := true
for j := 0; j < len(substr); j++ {
if low(s[i+j]) != low(substr[j]) {
match = false
break
}
}
if match {
return true
}
}
return false
}
// initEICTunnelPool rebinds the package-level withEICTunnel var to
// the pooled implementation. Called once at package init via the
// init() in eic_tunnel_pool_setup.go (split file so the rebind
// itself is testable without dragging in the production setup
// shim's exec/aws dependencies).
func initEICTunnelPool() {
withEICTunnel = pooledWithEICTunnel
}
@@ -1,136 +0,0 @@
package handlers
// eic_tunnel_pool_setup.go — production setup shim.
//
// setupRealEICTunnel decomposes the existing realWithEICTunnel into
// its slow half (build the tunnel) and its caller half (run fn). The
// pool calls the slow half once and shares the resulting session
// across N callers, holding cleanup until the last release.
//
// Why decompose instead of refactoring realWithEICTunnel: the
// existing function and its test stub-vars (withEICTunnel,
// sendSSHPublicKey, openTunnelCmd) are load-bearing for the
// dispatch tests. Extracting a sibling setup function preserves the
// existing single-shot path verbatim — the pool wraps it by calling
// realWithEICTunnel through a thin adapter, leaving the tested
// surface unchanged.
//
// The pool's acquire() invokes poolSetupTunnel, which is a `var`
// pointing to setupRealEICTunnel for production and a counting stub
// for tests.
import (
"context"
"fmt"
"os"
"os/exec"
"strings"
"time"
)
// setupRealEICTunnel is the slow path that the pool consumes when
// no warm entry exists. Mirrors realWithEICTunnel's setup half but
// returns the session + cleanup instead of running fn inline.
//
// The cleanup func owns the tunnel subprocess, ephemeral key dir,
// and a one-time wait. Idempotent — calling it twice is safe; the
// pool guarantees one call per session, but defence-in-depth helps
// when tests run pools in parallel and racy sweeps re-trigger.
func setupRealEICTunnel(ctx context.Context, instanceID string) (
eicSSHSession, func(), error) {
if instanceID == "" {
return eicSSHSession{}, nil,
fmt.Errorf("workspace has no instance_id — not a SaaS EC2 workspace")
}
osUser := os.Getenv("WORKSPACE_EC2_OS_USER")
if osUser == "" {
osUser = "ubuntu"
}
region := os.Getenv("AWS_REGION")
if region == "" {
region = "us-east-2"
}
keyDir, err := os.MkdirTemp("", "molecule-eic-pool-*")
if err != nil {
return eicSSHSession{}, nil, fmt.Errorf("keydir mkdir: %w", err)
}
keyPath := keyDir + "/id"
if out, kerr := exec.CommandContext(ctx, "ssh-keygen",
"-t", "ed25519", "-f", keyPath, "-N", "", "-q",
"-C", "molecule-eic-pool",
).CombinedOutput(); kerr != nil {
_ = os.RemoveAll(keyDir)
return eicSSHSession{}, nil,
fmt.Errorf("ssh-keygen: %w (%s)", kerr, strings.TrimSpace(string(out)))
}
pubKey, err := os.ReadFile(keyPath + ".pub")
if err != nil {
_ = os.RemoveAll(keyDir)
return eicSSHSession{}, nil, fmt.Errorf("read pubkey: %w", err)
}
if err := sendSSHPublicKey(ctx, region, instanceID, osUser,
strings.TrimSpace(string(pubKey))); err != nil {
_ = os.RemoveAll(keyDir)
return eicSSHSession{}, nil, fmt.Errorf("send-ssh-public-key: %w", err)
}
localPort, err := pickFreePort()
if err != nil {
_ = os.RemoveAll(keyDir)
return eicSSHSession{}, nil, fmt.Errorf("pick free port: %w", err)
}
tunnel := openTunnelCmd(eicSSHOptions{
InstanceID: instanceID,
OSUser: osUser,
Region: region,
LocalPort: localPort,
PrivateKeyPath: keyPath,
})
tunnel.Env = os.Environ()
if err := tunnel.Start(); err != nil {
_ = os.RemoveAll(keyDir)
return eicSSHSession{}, nil, fmt.Errorf("open-tunnel start: %w", err)
}
if err := waitForPort(ctx, "127.0.0.1", localPort, 10*time.Second); err != nil {
if tunnel.Process != nil {
_ = tunnel.Process.Kill()
}
_ = tunnel.Wait()
_ = os.RemoveAll(keyDir)
return eicSSHSession{}, nil, fmt.Errorf("tunnel never listened: %w", err)
}
cleanedUp := false
cleanup := func() {
if cleanedUp {
return
}
cleanedUp = true
if tunnel.Process != nil {
_ = tunnel.Process.Kill()
}
_ = tunnel.Wait()
_ = os.RemoveAll(keyDir)
}
return eicSSHSession{
keyPath: keyPath,
localPort: localPort,
osUser: osUser,
instanceID: instanceID,
}, cleanup, nil
}
// init wires the pool into the package-level withEICTunnel var so
// every read/write/list/delete EIC op uses pooled tunnels by default.
// Test files that need single-shot behaviour can swap withEICTunnel
// back via the existing stubWithEICTunnel pattern, OR set poolTTL=0
// to disable pooling without rebinding the var.
func init() {
initEICTunnelPool()
}
@@ -1,467 +0,0 @@
package handlers
// eic_tunnel_pool_test.go — tests for the refcounted EIC tunnel pool
// added in core#11. Stubs poolSetupTunnel with a counter so the
// tests don't fork ssh-keygen / aws subprocesses.
//
// Per memory feedback_assert_exact_not_substring: each test pins
// exact expected counts (not "at least N") so a regression that
// silently double-sets-up surfaces here.
import (
"context"
"errors"
"sync"
"sync/atomic"
"testing"
"time"
)
// withPoolSetupStub swaps poolSetupTunnel for a counting fake that
// returns a sentinel session and a cleanup func that records its
// invocation. Restores on test cleanup.
//
// setupSignal blocks each setup until released — for concurrent-
// acquire tests where we want to gate setup completion.
func withPoolSetupStub(t *testing.T) (
setupCount *int64, cleanupCount *int64, restore func(), unblock func()) {
t.Helper()
prev := poolSetupTunnel
prevTTL := poolTTL
prevJanitor := poolJanitorInterval
var sc, cc int64
setupCount, cleanupCount = &sc, &cc
gate := make(chan struct{}, 1)
gate <- struct{}{} // allow the first setup through immediately
unblock = func() { gate <- struct{}{} }
poolSetupTunnel = func(ctx context.Context, instanceID string) (
eicSSHSession, func(), error) {
select {
case <-gate:
case <-ctx.Done():
return eicSSHSession{}, nil, ctx.Err()
}
atomic.AddInt64(&sc, 1)
sess := eicSSHSession{
instanceID: instanceID,
osUser: "ubuntu",
localPort: 10000 + int(atomic.LoadInt64(&sc)),
keyPath: "/tmp/molecule-eic-test-" + instanceID,
}
cleanup := func() { atomic.AddInt64(&cc, 1) }
return sess, cleanup, nil
}
restore = func() {
poolSetupTunnel = prev
poolTTL = prevTTL
poolJanitorInterval = prevJanitor
}
t.Cleanup(restore)
return
}
// freshPool returns an isolated pool (NOT the global) so tests run
// independently. Stops the janitor on cleanup.
func freshPool(t *testing.T) *eicTunnelPool {
t.Helper()
p := newEICTunnelPool()
t.Cleanup(p.stop)
return p
}
// TestEICTunnelPool_FourOpsAmortise pins the core invariant: four
// sequential acquire/release cycles on the same instanceID share
// ONE underlying tunnel setup. Mutation: delete the cache hit branch
// in acquire() → setupCount goes 1 → 4 → test fails.
func TestEICTunnelPool_FourOpsAmortise(t *testing.T) {
setupCount, cleanupCount, _, _ := withPoolSetupStub(t)
// Refill gate after each setup so concurrent stubs aren't blocked
// (we want every test to be able to set up if it needs to).
t.Cleanup(func() { /* no-op; defer is enough */ })
poolTTL = 50 * time.Second
pool := freshPool(t)
ctx := context.Background()
for i := 0; i < 4; i++ {
sess, done, err := pool.acquire(ctx, "i-test-1")
if err != nil {
t.Fatalf("op %d: acquire: %v", i, err)
}
if sess.instanceID != "i-test-1" {
t.Fatalf("op %d: session has wrong instanceID: %q", i, sess.instanceID)
}
done(false)
}
if got := atomic.LoadInt64(setupCount); got != 1 {
t.Errorf("expected exactly 1 tunnel setup across 4 ops, got %d", got)
}
if got := atomic.LoadInt64(cleanupCount); got != 0 {
t.Errorf("expected 0 cleanups while entry is hot (TTL=50s), got %d", got)
}
}
// TestEICTunnelPool_DifferentInstancesDoNotShare pins that two
// different instanceIDs each get their own tunnel — the pool is
// keyed on instanceID, not a single global slot.
func TestEICTunnelPool_DifferentInstancesDoNotShare(t *testing.T) {
setupCount, _, _, unblock := withPoolSetupStub(t)
poolTTL = 50 * time.Second
pool := freshPool(t)
ctx := context.Background()
// First instance setup uses the initial gate slot.
_, doneA, err := pool.acquire(ctx, "i-a")
if err != nil {
t.Fatalf("acquire A: %v", err)
}
doneA(false)
// Second instance needs a new slot through the gate.
unblock()
_, doneB, err := pool.acquire(ctx, "i-b")
if err != nil {
t.Fatalf("acquire B: %v", err)
}
doneB(false)
if got := atomic.LoadInt64(setupCount); got != 2 {
t.Errorf("expected 2 setups (one per instance), got %d", got)
}
}
// TestEICTunnelPool_TTLEviction: a short TTL forces the second op
// to build a fresh tunnel after the first expires.
func TestEICTunnelPool_TTLEviction(t *testing.T) {
setupCount, cleanupCount, _, unblock := withPoolSetupStub(t)
poolTTL = 50 * time.Millisecond
poolJanitorInterval = 1 * time.Second // keep janitor away
pool := freshPool(t)
ctx := context.Background()
_, done, err := pool.acquire(ctx, "i-ttl")
if err != nil {
t.Fatalf("acquire 1: %v", err)
}
done(false)
time.Sleep(80 * time.Millisecond) // past TTL
unblock() // allow next setup
_, done, err = pool.acquire(ctx, "i-ttl")
if err != nil {
t.Fatalf("acquire 2: %v", err)
}
done(false)
if got := atomic.LoadInt64(setupCount); got != 2 {
t.Errorf("expected 2 setups (TTL eviction between), got %d", got)
}
// First entry should have been cleaned up when the second
// acquire evicted it on the slow path. Cleanup runs in a
// goroutine; poll briefly for it to land.
deadline := time.Now().Add(500 * time.Millisecond)
for atomic.LoadInt64(cleanupCount) < 1 && time.Now().Before(deadline) {
time.Sleep(5 * time.Millisecond)
}
if got := atomic.LoadInt64(cleanupCount); got < 1 {
t.Errorf("expected ≥1 cleanup (first entry evicted), got %d", got)
}
}
// TestEICTunnelPool_FailureInvalidates pins the poison-on-fault
// behavior — fn returning a tunnel-fatal error marks the entry
// unusable so the next acquire builds fresh.
func TestEICTunnelPool_FailureInvalidates(t *testing.T) {
setupCount, _, _, unblock := withPoolSetupStub(t)
poolTTL = 50 * time.Second
pool := freshPool(t)
ctx := context.Background()
_, done, err := pool.acquire(ctx, "i-fault")
if err != nil {
t.Fatalf("acquire 1: %v", err)
}
done(true) // signal poison
unblock() // let the next setup through
_, done, err = pool.acquire(ctx, "i-fault")
if err != nil {
t.Fatalf("acquire 2: %v", err)
}
done(false)
if got := atomic.LoadInt64(setupCount); got != 2 {
t.Errorf("expected 2 setups (poison forced rebuild), got %d", got)
}
}
// TestEICTunnelPool_ConcurrentAcquireSingleSetup pins that N
// concurrent acquires for the same instanceID before any release
// only trigger ONE tunnel setup — the rest wait via pendingSetups.
//
// Without this guard each concurrent acquire would spawn its own
// tunnel and the loser-cleanup would still leak refcount. Mutation:
// delete the pendingSetups gate → setupCount goes 1 → N → fails.
func TestEICTunnelPool_ConcurrentAcquireSingleSetup(t *testing.T) {
setupCount, _, _, _ := withPoolSetupStub(t)
// Pause setup so all goroutines pile into the pending slot.
prev := poolSetupTunnel
gate := make(chan struct{})
poolSetupTunnel = func(ctx context.Context, instanceID string) (
eicSSHSession, func(), error) {
<-gate
atomic.AddInt64(setupCount, 1)
return eicSSHSession{instanceID: instanceID}, func() {}, nil
}
t.Cleanup(func() { poolSetupTunnel = prev })
poolTTL = 50 * time.Second
pool := freshPool(t)
ctx := context.Background()
const N = 8
type result struct {
done func(bool)
err error
}
results := make(chan result, N)
var startWg sync.WaitGroup
startWg.Add(N)
for i := 0; i < N; i++ {
go func() {
startWg.Done()
_, done, err := pool.acquire(ctx, "i-concurrent")
results <- result{done, err}
}()
}
startWg.Wait()
// give all N goroutines time to enter pool.acquire
time.Sleep(20 * time.Millisecond)
close(gate)
for i := 0; i < N; i++ {
r := <-results
if r.err != nil {
t.Fatalf("acquire %d: %v", i, r.err)
}
r.done(false)
}
if got := atomic.LoadInt64(setupCount); got != 1 {
t.Errorf("expected 1 setup across %d concurrent acquires, got %d", N, got)
}
}
// TestEICTunnelPool_TTLZeroDisablesPooling pins the escape hatch:
// poolTTL=0 means every acquire goes straight through to setup +
// cleanup, no entry kept. Useful for tests / opt-out.
func TestEICTunnelPool_TTLZeroDisablesPooling(t *testing.T) {
setupCount, cleanupCount, _, unblock := withPoolSetupStub(t)
poolTTL = 0
pool := freshPool(t)
ctx := context.Background()
_, done, err := pool.acquire(ctx, "i-ttlzero")
if err != nil {
t.Fatalf("acquire 1: %v", err)
}
done(false)
unblock()
_, done, err = pool.acquire(ctx, "i-ttlzero")
if err != nil {
t.Fatalf("acquire 2: %v", err)
}
done(false)
if got := atomic.LoadInt64(setupCount); got != 2 {
t.Errorf("expected 2 setups with TTL=0 (pool disabled), got %d", got)
}
if got := atomic.LoadInt64(cleanupCount); got != 2 {
t.Errorf("expected 2 cleanups with TTL=0 (each release closes), got %d", got)
}
}
// TestEICTunnelPool_LRUEvictionAtCap pins the LRU defence: when the
// pool reaches poolMaxEntries, a new acquire for an unseen
// instanceID evicts the LRU idle entry instead of growing unbounded.
func TestEICTunnelPool_LRUEvictionAtCap(t *testing.T) {
setupCount, cleanupCount, _, _ := withPoolSetupStub(t)
prev := poolMaxEntries
poolMaxEntries = 2
t.Cleanup(func() { poolMaxEntries = prev })
poolTTL = 50 * time.Second
// Replace stub with one that doesn't gate so we can fill quickly.
poolSetupTunnel = func(ctx context.Context, instanceID string) (
eicSSHSession, func(), error) {
atomic.AddInt64(setupCount, 1)
return eicSSHSession{instanceID: instanceID}, func() {
atomic.AddInt64(cleanupCount, 1)
}, nil
}
pool := freshPool(t)
ctx := context.Background()
for _, id := range []string{"i-1", "i-2"} {
_, done, err := pool.acquire(ctx, id)
if err != nil {
t.Fatalf("acquire %s: %v", id, err)
}
done(false)
}
// Both entries idle, pool at cap.
_, done, err := pool.acquire(ctx, "i-3")
if err != nil {
t.Fatalf("acquire i-3: %v", err)
}
done(false)
// Wait for the goroutine'd cleanup of the evicted entry.
deadline := time.Now().Add(500 * time.Millisecond)
for atomic.LoadInt64(cleanupCount) < 1 && time.Now().Before(deadline) {
time.Sleep(10 * time.Millisecond)
}
if got := atomic.LoadInt64(setupCount); got != 3 {
t.Errorf("expected 3 setups (one per unique instance), got %d", got)
}
if got := atomic.LoadInt64(cleanupCount); got < 1 {
t.Errorf("expected ≥1 cleanup (LRU eviction), got %d", got)
}
}
// TestEICTunnelPool_PoisonedClassification pins the heuristic that
// distinguishes tunnel-fatal errors (poison the entry) from
// app-level errors (file not found, validation) that should NOT
// invalidate the tunnel.
func TestEICTunnelPool_PoisonedClassification(t *testing.T) {
cases := []struct {
name string
err error
want bool
}{
{"nil", nil, false},
{"file not found", errors.New("os: file does not exist"), false},
{"validation", errors.New("invalid path: must be relative"), false},
{"connection refused", errors.New("ssh: connect to host: connection refused"), true},
{"connection refused upper", errors.New("Connection Refused"), true},
{"broken pipe", errors.New("write tunnel: broken pipe"), true},
{"permission denied", errors.New("Permission denied (publickey)"), true},
{"auth failed", errors.New("Authentication failed"), true},
{"connection reset", errors.New("Connection reset by peer"), true},
{"port forward", errors.New("port forwarding failed"), true},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := fnErrIndicatesTunnelFault(tc.err)
if got != tc.want {
t.Errorf("fnErrIndicatesTunnelFault(%v) = %v, want %v",
tc.err, got, tc.want)
}
})
}
}
// TestEICTunnelPool_RefcountBlocksEviction pins that an entry past
// TTL is NOT evicted while a caller still holds it — preventing
// use-after-free in the holder.
func TestEICTunnelPool_RefcountBlocksEviction(t *testing.T) {
setupCount, cleanupCount, _, _ := withPoolSetupStub(t)
poolTTL = 30 * time.Millisecond
poolJanitorInterval = 5 * time.Millisecond
pool := freshPool(t)
ctx := context.Background()
_, done, err := pool.acquire(ctx, "i-hold")
if err != nil {
t.Fatalf("acquire: %v", err)
}
// Sleep past TTL while holding the session. Janitor sweeps
// every 5ms but must skip our entry because refcount=1.
time.Sleep(80 * time.Millisecond)
if got := atomic.LoadInt64(cleanupCount); got != 0 {
t.Errorf("expected 0 cleanups while holder is active, got %d", got)
}
done(false)
// Now refcount=0 and entry is past TTL; releaser triggers cleanup.
deadline := time.Now().Add(200 * time.Millisecond)
for atomic.LoadInt64(cleanupCount) < 1 && time.Now().Before(deadline) {
time.Sleep(5 * time.Millisecond)
}
if got := atomic.LoadInt64(cleanupCount); got != 1 {
t.Errorf("expected 1 cleanup after release of expired entry, got %d", got)
}
if got := atomic.LoadInt64(setupCount); got != 1 {
t.Errorf("setupCount tracking: got %d, want 1", got)
}
}
// TestPooledWithEICTunnel_PanicPoisonsEntry pins that a panic
// from fn poisons the pool entry on the way out — refcount goes
// back to zero (no leak) and the entry is marked unusable so the
// next acquire builds fresh. Without the defer-release pattern, a
// panic would leave refcount=1 forever and the entry would never
// evict.
func TestPooledWithEICTunnel_PanicPoisonsEntry(t *testing.T) {
setupCount, _, _, _ := withPoolSetupStub(t)
poolTTL = 50 * time.Second
globalEICTunnelPool = newEICTunnelPool()
t.Cleanup(globalEICTunnelPool.stop)
func() {
defer func() {
if r := recover(); r == nil {
t.Errorf("expected panic to bubble up, got nil")
}
}()
_ = pooledWithEICTunnel(context.Background(), "i-panic",
func(s eicSSHSession) error { panic("boom") })
}()
// Replenish the gate so the next setup can run.
prev := poolSetupTunnel
poolSetupTunnel = func(ctx context.Context, instanceID string) (
eicSSHSession, func(), error) {
atomic.AddInt64(setupCount, 1)
return eicSSHSession{instanceID: instanceID}, func() {}, nil
}
t.Cleanup(func() { poolSetupTunnel = prev })
// Next acquire must build fresh — entry was poisoned by panic.
if err := pooledWithEICTunnel(context.Background(), "i-panic",
func(s eicSSHSession) error { return nil }); err != nil {
t.Fatalf("post-panic acquire: %v", err)
}
if got := atomic.LoadInt64(setupCount); got != 2 {
t.Errorf("expected 2 setups (panic poisoned, rebuild), got %d", got)
}
}
// TestPooledWithEICTunnel_PreservesFnErr pins that errors from the
// inner fn pass through to the caller verbatim — pool wrapping
// should not swallow or transform error semantics for app code.
func TestPooledWithEICTunnel_PreservesFnErr(t *testing.T) {
withPoolSetupStub(t)
poolTTL = 50 * time.Second
// Reset the global pool so this test is isolated from any prior
// test that may have populated it.
globalEICTunnelPool = newEICTunnelPool()
want := errors.New("file does not exist")
got := pooledWithEICTunnel(context.Background(), "i-fn-err",
func(s eicSSHSession) error { return want })
if !errors.Is(got, want) {
t.Errorf("pooledWithEICTunnel returned %v, want %v", got, want)
}
}
@@ -0,0 +1,223 @@
package handlers
// mock_runtime.go — "mock" runtime: a virtual workspace that has no
// container, no EC2, no LLM, just hardcoded canned A2A replies. Built
// for the funding-demo "200-workspace mock org" so hongming can show
// investors a CEO/VPs/Managers/ICs hierarchy at scale without burning
// 200 EC2 instances or 200 Anthropic keys.
//
// Wire model:
// - org template declares `runtime: mock` on every workspace
// - createWorkspaceTree skips provisioning, sets status='online'
// directly (mirrors the `external` short-circuit, minus the URL +
// awaiting_agent dance)
// - proxyA2ARequest short-circuits on a mock-runtime target and
// returns a canned JSON-RPC reply; never calls resolveAgentURL,
// never opens an HTTP connection, never touches Docker/EC2
//
// The reply is JSON-RPC 2.0 + a2a-sdk v0.3 shape so the canvas's
// extractAgentText / extractTextsFromParts read it without any
// special-casing. We rotate over a small variant pool so a screen
// full of replies doesn't all read identical — gives the demo a bit
// of life without pretending to be a real agent.
import (
"context"
"crypto/sha1"
"database/sql"
"encoding/binary"
"encoding/json"
"errors"
"fmt"
"log"
"net/http"
"strings"
"time"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
"github.com/gin-gonic/gin"
"github.com/google/uuid"
)
// MockRuntimeName is the canonical runtime string a workspace row
// carries to opt into the canned-reply short-circuit. Kept as a const
// so the proxy's runtime-check + the org-import skip-block reference
// the same literal.
const MockRuntimeName = "mock"
// mockReplyVariants is the pool of canned strings the mock runtime
// rotates through. Picked to read like a busy-but-short reply from a
// real human in a hierarchy — a CEO would NOT respond with "On it!",
// but for the demo every node is shown to be reachable, so we lean
// into the variety. Variant selection is deterministic per
// (workspaceID, request-id) pair so a screen recording replays the
// same reply for the same input.
var mockReplyVariants = []string{
"On it!",
"Got it, on it now.",
"On it, boss.",
"Working on it.",
"Acknowledged — on it.",
"On it, will report back.",
"Roger that, on it.",
"Copy that. On it.",
"On it — ETA shortly.",
"On it. Standby for update.",
}
// pickMockReply returns a canned reply for the given workspaceID +
// requestID. Deterministic so the same (workspace, message-id) pair
// always picks the same variant — useful for screen recordings and
// flake-free e2e snapshots. Falls back to variant[0] if the inputs
// are empty.
func pickMockReply(workspaceID, requestID string) string {
if len(mockReplyVariants) == 0 {
return "On it!"
}
if workspaceID == "" && requestID == "" {
return mockReplyVariants[0]
}
h := sha1.Sum([]byte(workspaceID + ":" + requestID))
idx := int(binary.BigEndian.Uint32(h[0:4]) % uint32(len(mockReplyVariants)))
return mockReplyVariants[idx]
}
// lookupRuntime returns the workspace's runtime string. Empty when the
// row is missing / DB hiccup so callers fall through to the existing
// dispatch path (which will then 404 / 502 normally). Fail-open here
// because a transient DB error must not silently flip a real workspace
// into mock-mode and start handing out canned replies in place of
// genuine agent traffic.
func lookupRuntime(ctx context.Context, workspaceID string) string {
var runtime sql.NullString
err := db.DB.QueryRowContext(ctx,
`SELECT runtime FROM workspaces WHERE id = $1`, workspaceID,
).Scan(&runtime)
if err != nil {
if !errors.Is(err, sql.ErrNoRows) {
log.Printf("ProxyA2A: lookupRuntime(%s) failed (%v) — falling through to dispatch path", workspaceID, err)
}
return ""
}
if !runtime.Valid {
return ""
}
return runtime.String
}
// buildMockA2AResponse synthesises a JSON-RPC 2.0 success envelope that
// matches the a2a-sdk v0.3 reply shape the canvas's extractAgentText
// already understands: `{result: {parts: [{kind: "text", text: ...}]}}`.
// `requestID` is the JSON-RPC `id` of the inbound request — A2A
// implementations echo it on the reply so callers can correlate. We
// extract it from the normalized payload in the caller and pass it in
// here so this function stays JSON-only (no payload parsing).
//
// Returns marshalled bytes ready to write straight to the HTTP body.
// Marshal failure is logged + a tiny fallback envelope returned, since
// failing the whole request because of a JSON encoding hiccup on a
// constant-shaped payload would defeat the "mock always works" guarantee.
func buildMockA2AResponse(workspaceID, requestID, replyText string) []byte {
if requestID == "" {
requestID = uuid.New().String()
}
envelope := map[string]any{
"jsonrpc": "2.0",
"id": requestID,
"result": map[string]any{
"parts": []map[string]any{
{"kind": "text", "text": replyText},
},
},
}
out, err := json.Marshal(envelope)
if err != nil {
log.Printf("ProxyA2A: mock-runtime response marshal failed for %s: %v — emitting fallback", workspaceID, err)
// Hand-rolled minimal envelope. Safe because every value is a
// hardcoded constant string with no characters that need
// escaping in a JSON string literal.
fallback := fmt.Sprintf(
`{"jsonrpc":"2.0","id":%q,"result":{"parts":[{"kind":"text","text":%q}]}}`,
requestID, replyText,
)
return []byte(fallback)
}
return out
}
// extractRequestID pulls the JSON-RPC `id` out of an already-normalized
// A2A payload. Returns "" when the field is absent or not a string —
// caller substitutes a fresh UUID. Tolerant of every shape
// normalizeA2APayload could produce.
func extractRequestID(body []byte) string {
var top map[string]json.RawMessage
if err := json.Unmarshal(body, &top); err != nil {
return ""
}
raw, ok := top["id"]
if !ok {
return ""
}
var s string
if json.Unmarshal(raw, &s) == nil {
return s
}
// JSON-RPC permits numeric IDs too; canvas issues UUIDs but be
// defensive against alternative SDKs.
var n json.Number
if json.Unmarshal(raw, &n) == nil {
return n.String()
}
return ""
}
// handleMockA2A is the proxy short-circuit for mock-runtime workspaces.
// Returns (status, body, true) when the target is mock — caller writes
// the response and returns. Returns (_, _, false) when the target is
// not mock — caller continues to the real dispatch path.
//
// Side-effects: writes a synthetic activity_logs row via logA2ASuccess
// when logActivity is true so the canvas's "Agent Comms" tab shows the
// mock reply in the trace alongside real-agent traffic. Without this
// the demo would render messages on the canvas chat panel but a peer
// node clicking through to its activity tab would see an empty list.
func (h *WorkspaceHandler) handleMockA2A(ctx context.Context, workspaceID, callerID string, body []byte, a2aMethod string, logActivity bool) (int, []byte, bool) {
if lookupRuntime(ctx, workspaceID) != MockRuntimeName {
return 0, nil, false
}
requestID := extractRequestID(body)
replyText := pickMockReply(workspaceID, requestID)
respBody := buildMockA2AResponse(workspaceID, requestID, replyText)
// Tiny artificial delay so the canvas chat UI has time to render
// the user's outgoing bubble before the agent reply appears.
// Without it the reply lands the same animation frame and feels
// robotic. 80ms is too fast to look "real" but masks the React
// double-render race that drops the user bubble entirely on slow
// machines (observed locally on M1 Air, 2026-05-07). Below 200ms
// keeps a 200-node demo snappy when investors fan out 30 messages
// at once.
time.Sleep(80 * time.Millisecond)
if logActivity {
// Reuse the existing success-logger so the activity feed shape
// is identical to a real agent reply. Status 200 + duration 0
// is the "synthesised reply" marker; activity_logs.duration_ms
// being 0 is harmless (real fast paths can hit 0 too).
h.logA2ASuccess(ctx, workspaceID, callerID, body, respBody, a2aMethod, http.StatusOK, 0)
}
return http.StatusOK, respBody, true
}
// IsMockRuntime is a small public helper for callers outside this
// package (tests, the org importer) that need to ask the question
// without depending on the unexported constant. Trims + lower-cases
// so a typoed YAML cell like " Mock " still resolves correctly.
func IsMockRuntime(runtime string) bool {
return strings.EqualFold(strings.TrimSpace(runtime), MockRuntimeName)
}
// gin import is unused at file scope but kept as a tag so a future
// addition of a thin HTTP handler (e.g. POST /workspaces/:id/mock/replies
// for an admin-set custom reply pool) doesn't need an import re-order.
var _ = gin.H{}
@@ -0,0 +1,266 @@
package handlers
// mock_runtime_test.go — locks the contract for the mock-runtime
// short-circuit added for the funding-demo "200-workspace mock org"
// template. Three invariants:
//
// 1. ProxyA2A on a workspace with runtime='mock' must return 200
// with a JSON-RPC reply containing one text part. NO HTTP
// dispatch, NO resolveAgentURL DB read (mock workspaces have
// no URL — that read would 404 and break the demo).
//
// 2. The reply text must be one of the canned variants and must be
// deterministic for a given (workspace_id, request_id) pair so
// screen recordings replay identically.
//
// 3. Workspaces with runtime != 'mock' must NOT be affected — the
// mock check fails fast and falls through to the existing
// dispatch path. Same kind of regression guard the poll-mode
// tests carry.
import (
"bytes"
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
"time"
"github.com/DATA-DOG/go-sqlmock"
"github.com/gin-gonic/gin"
)
// TestProxyA2A_MockRuntime_ReturnsCannedReply is the happy-path
// contract. A workspace flagged runtime='mock' must:
// - return 200 with JSON-RPC envelope {result:{parts:[{kind:text,text:...}]}}
// - not dispatch HTTP (no SELECT url SQL expected)
// - reply text is one of mockReplyVariants
func TestProxyA2A_MockRuntime_ReturnsCannedReply(t *testing.T) {
mock := setupTestDB(t)
setupTestRedis(t)
broadcaster := newTestBroadcaster()
handler := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
const wsID = "ws-mock-canned"
// Budget check fires before runtime lookup (same as the poll-mode
// short-circuit) — keeps mock workspaces honest if a tenant ever
// sets a budget on one. Unlikely on a demo, but the guard stays
// uniform so future "monthly_spend on mock = 0" assertions don't
// drift.
expectBudgetCheck(mock, wsID)
// lookupDeliveryMode runs first — return push so the poll
// short-circuit doesn't fire and we hit the mock check.
mock.ExpectQuery("SELECT delivery_mode FROM workspaces WHERE id").
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"delivery_mode"}).AddRow("push"))
// lookupRuntime SELECT — returns 'mock', triggering the canned-reply
// short-circuit. CRITICAL: NO ExpectQuery for `SELECT url, status
// FROM workspaces` (resolveAgentURL's query). If the short-circuit
// fails to fire, sqlmock will surface "unexpected query" on the URL
// SELECT and the test fails loudly — that's the dispatch-leak detector.
mock.ExpectQuery("SELECT runtime FROM workspaces WHERE id").
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"runtime"}).AddRow("mock"))
// Activity log: logA2ASuccess writes the synthetic reply to
// activity_logs so the canvas's Agent Comms tab shows it alongside
// real-agent traffic.
mock.ExpectExec("INSERT INTO activity_logs").
WillReturnResult(sqlmock.NewResult(0, 1))
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: wsID}}
body := `{"jsonrpc":"2.0","id":"req-mock-1","method":"message/send","params":{"message":{"role":"user","parts":[{"kind":"text","text":"hello mock"}]}}}`
c.Request = httptest.NewRequest("POST", "/workspaces/"+wsID+"/a2a", bytes.NewBufferString(body))
c.Request.Header.Set("Content-Type", "application/json")
handler.ProxyA2A(c)
// logA2ASuccess fires async — give it a moment to settle so
// ExpectationsWereMet doesn't flake.
time.Sleep(200 * time.Millisecond)
if w.Code != http.StatusOK {
t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
}
var resp map[string]interface{}
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Fatalf("response is not valid JSON: %v", err)
}
if resp["jsonrpc"] != "2.0" {
t.Errorf("response.jsonrpc = %v, want 2.0", resp["jsonrpc"])
}
if resp["id"] != "req-mock-1" {
t.Errorf("response.id = %v, want %q (echoed from request)", resp["id"], "req-mock-1")
}
result, _ := resp["result"].(map[string]interface{})
if result == nil {
t.Fatalf("response.result missing or wrong type: %v", resp["result"])
}
parts, _ := result["parts"].([]interface{})
if len(parts) != 1 {
t.Fatalf("expected exactly one part, got %d: %v", len(parts), parts)
}
part, _ := parts[0].(map[string]interface{})
if part["kind"] != "text" {
t.Errorf("part.kind = %v, want text", part["kind"])
}
text, _ := part["text"].(string)
if text == "" {
t.Error("part.text is empty — canned reply not populated")
}
// Reply must be one of the variants.
matched := false
for _, v := range mockReplyVariants {
if v == text {
matched = true
break
}
}
if !matched {
t.Errorf("reply text %q is not in mockReplyVariants", text)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// TestProxyA2A_NonMockRuntime_NoShortCircuit verifies the symmetric
// contract: a workspace with a real runtime (claude-code, hermes, etc.)
// must NOT be affected by the mock check — it falls through to the
// real dispatch path. Without this guard, a regression in
// lookupRuntime could silently flip every workspace into mock-mode
// and start handing out canned replies in place of real-agent traffic.
func TestProxyA2A_NonMockRuntime_NoShortCircuit(t *testing.T) {
mock := setupTestDB(t)
mr := setupTestRedis(t)
allowLoopbackForTest(t)
broadcaster := newTestBroadcaster()
handler := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir())
const wsID = "ws-real-runtime"
dispatched := false
agentServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
dispatched = true
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(`{"jsonrpc":"2.0","id":"1","result":{"status":"ok"}}`))
}))
defer agentServer.Close()
mr.Set("ws:"+wsID+":url", agentServer.URL)
expectBudgetCheck(mock, wsID)
// poll-mode SELECT — return push so we proceed past the poll
// short-circuit.
mock.ExpectQuery("SELECT delivery_mode FROM workspaces WHERE id").
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"delivery_mode"}).AddRow("push"))
// runtime SELECT — return claude-code so the mock check falls
// through.
mock.ExpectQuery("SELECT runtime FROM workspaces WHERE id").
WithArgs(wsID).
WillReturnRows(sqlmock.NewRows([]string{"runtime"}).AddRow("claude-code"))
mock.ExpectExec("INSERT INTO activity_logs").
WillReturnResult(sqlmock.NewResult(0, 1))
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: wsID}}
body := `{"jsonrpc":"2.0","id":"real-1","method":"message/send","params":{"message":{"role":"user","parts":[{"kind":"text","text":"hi"}]}}}`
c.Request = httptest.NewRequest("POST", "/workspaces/"+wsID+"/a2a", bytes.NewBufferString(body))
c.Request.Header.Set("Content-Type", "application/json")
handler.ProxyA2A(c)
time.Sleep(50 * time.Millisecond)
if w.Code != http.StatusOK {
t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
}
if !dispatched {
t.Error("non-mock runtime: expected the agent server to receive the request, but it did not — mock short-circuit may be over-firing")
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("unmet sqlmock expectations: %v", err)
}
}
// TestPickMockReply_Deterministic locks the determinism contract:
// the same (workspaceID, requestID) input must yield the same variant
// every call. Required for screen recordings + flake-free e2e
// snapshots.
func TestPickMockReply_Deterministic(t *testing.T) {
cases := []struct {
ws, req string
}{
{"ws-1", "req-A"},
{"ws-1", "req-B"},
{"ws-2", "req-A"},
{"", ""},
}
for _, tc := range cases {
first := pickMockReply(tc.ws, tc.req)
for i := 0; i < 10; i++ {
next := pickMockReply(tc.ws, tc.req)
if next != first {
t.Errorf("pickMockReply(%q,%q) is not deterministic: got %q then %q",
tc.ws, tc.req, first, next)
}
}
}
}
// TestIsMockRuntime_TrimsAndCaseInsensitive — typos and stray
// whitespace in YAML must still resolve to mock so a single
// runtime: " Mock " entry doesn't silently get dispatched.
func TestIsMockRuntime_TrimsAndCaseInsensitive(t *testing.T) {
cases := map[string]bool{
"mock": true,
"MOCK": true,
" Mock ": true,
"mocky": false,
"": false,
"external": false,
"claude-code": false,
}
for in, want := range cases {
if got := IsMockRuntime(in); got != want {
t.Errorf("IsMockRuntime(%q) = %v, want %v", in, got, want)
}
}
}
// TestBuildMockA2AResponse_EchoesRequestID — JSON-RPC requires the
// reply id to match the request id so callers can correlate. Mock
// must hold this contract or canvas's correlation logic breaks.
func TestBuildMockA2AResponse_EchoesRequestID(t *testing.T) {
out := buildMockA2AResponse("ws-x", "req-echo-7", "On it!")
var resp map[string]interface{}
if err := json.Unmarshal(out, &resp); err != nil {
t.Fatalf("response is not valid JSON: %v", err)
}
if resp["id"] != "req-echo-7" {
t.Errorf("id = %v, want req-echo-7", resp["id"])
}
if resp["jsonrpc"] != "2.0" {
t.Errorf("jsonrpc = %v, want 2.0", resp["jsonrpc"])
}
result, _ := resp["result"].(map[string]interface{})
parts, _ := result["parts"].([]interface{})
if len(parts) != 1 {
t.Fatalf("expected 1 part, got %d", len(parts))
}
p, _ := parts[0].(map[string]interface{})
if p["text"] != "On it!" {
t.Errorf("part.text = %v, want On it!", p["text"])
}
}
@@ -250,6 +250,21 @@ func (h *OrgHandler) createWorkspaceTree(ws OrgWorkspace, parentID *string, absX
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOnline), id, map[string]interface{}{
"name": ws.Name, "external": true,
})
} else if IsMockRuntime(runtime) {
// Mock-runtime workspaces have no container, no EC2, no URL —
// the proxyA2ARequest short-circuit synthesises every reply
// from a canned variant pool (see mock_runtime.go). Status
// goes straight to 'online' so the canvas renders the node
// as reachable + the chat tab's send button is enabled. No
// URL is set; the proxy never tries to resolve one for mock
// runtimes. Built for the funding-demo "200-workspace mock
// org" template — visual scale without real backend cost.
if _, err := db.DB.ExecContext(ctx, `UPDATE workspaces SET status = $1 WHERE id = $2`, models.StatusOnline, id); err != nil {
log.Printf("Org import: mock workspace status update failed for %s: %v", ws.Name, err)
}
h.broadcaster.RecordAndBroadcast(ctx, string(events.EventWorkspaceOnline), id, map[string]interface{}{
"name": ws.Name, "mock": true, "runtime": runtime,
})
} else if h.workspace.HasProvisioner() {
// Provision container — either backend (CP for SaaS, local Docker
// for self-hosted) is fine. Pre-2026-05-05 this gate was
@@ -675,7 +690,23 @@ func (h *OrgHandler) recurseChildrenForImport(ws OrgWorkspace, parentID string,
if err := h.createWorkspaceTree(child, &parentID, childAbsX, childAbsY, slotX, slotY, defaults, orgBaseDir, results, provisionSem); err != nil {
return err
}
time.Sleep(workspaceCreatePacingMs * time.Millisecond)
// Pacing exists to throttle Docker container-spawn thundering
// during a self-hosted import. Mock-runtime children spawn no
// container — no Docker pressure, no LLM bursts, just DB
// inserts + a broadcast. Skipping the 2s sleep collapses a
// 200-workspace mock-org import from ~7min → ~5s, which is
// the difference between a snappy demo and a "did it freeze?"
// staring contest. Real (containerful) runtimes still pace.
// Inheritance: if the child itself doesn't declare a runtime,
// fall back to defaults.runtime — the org template sets
// runtime: mock once at the org level, not on every IC node.
childRuntime := child.Runtime
if childRuntime == "" {
childRuntime = defaults.Runtime
}
if !IsMockRuntime(childRuntime) {
time.Sleep(workspaceCreatePacingMs * time.Millisecond)
}
}
return nil
}
+33 -6
View File
@@ -4,6 +4,7 @@ import (
"bytes"
"context"
"io"
"log"
"os"
"path/filepath"
"strings"
@@ -177,16 +178,42 @@ func strDefault(m map[string]interface{}, key, fallback string) string {
return fallback
}
// findRunningContainer returns the live container name for workspaceID, or ""
// when the container is genuinely not running OR the daemon errored
// transiently. Routed through provisioner.RunningContainerName as the SSOT
// (molecule-core#10) so this handler agrees with healthsweep on the same
// inputs. Transient daemon errors are logged distinctly so triage doesn't
// confuse a flaky daemon with a stopped container.
func (h *PluginsHandler) findRunningContainer(ctx context.Context, workspaceID string) string {
if h.docker == nil {
name, err := provisioner.RunningContainerName(ctx, h.docker, workspaceID)
if err != nil {
log.Printf("plugins: docker inspect transient error for %s: %v (treating as not-running for this request)", workspaceID, err)
return ""
}
name := provisioner.ContainerName(workspaceID)
info, err := h.docker.ContainerInspect(ctx, name)
if err == nil && info.State.Running {
return name
return name
}
// isExternalRuntime reports whether the workspace's runtime is the
// `external` (remote-pull) shape introduced in Phase 30. External
// workspaces have no local container — `POST /plugins` (push-install via
// docker exec) doesn't apply to them; they pull via the download endpoint
// instead. Returns false (allow-install) if the lookup is unwired or
// errors — failing open here is safe because the downstream
// findRunningContainer step still gates on a real container being there.
//
// Background — molecule-core#10: without this check, external workspaces
// fall through to findRunningContainer's NotFound path and return a
// misleading 503 "container not running" instead of a clear "use the
// pull endpoint" message.
func (h *PluginsHandler) isExternalRuntime(workspaceID string) bool {
if h.runtimeLookup == nil {
return false
}
return ""
runtime, err := h.runtimeLookup(workspaceID)
if err != nil {
return false
}
return runtime == "external"
}
func (h *PluginsHandler) execAsRoot(ctx context.Context, containerName string, cmd []string) (string, error) {
@@ -0,0 +1,176 @@
package handlers
import (
"go/ast"
"go/parser"
"go/token"
"strings"
"testing"
)
// TestFindRunningContainer_RoutesThroughProvisionerSSOT is a behavior-based
// AST gate: it pins the invariant that PluginsHandler.findRunningContainer
// MUST go through provisioner.RunningContainerName for its is-running check,
// instead of carrying its own copy of cli.ContainerInspect logic.
//
// Background — molecule-core#10: a parallel impl of "is the workspace's
// container running" used to live in plugins.go. It drifted from the
// canonical impl in healthsweep (which goes through Provisioner.IsRunning
// → RunningContainerName) on edge cases like "transient daemon error" —
// the duplicate would 503 with a misleading message while healthsweep
// correctly stayed defensive. Consolidating onto RunningContainerName as
// the SSOT prevents any future copy from re-introducing that drift.
//
// Mutation invariant: if a future PR replaces the provisioner call with
// `h.docker.ContainerInspect(...)` directly, this test fails. That's the
// signal to either (a) extend RunningContainerName's contract OR (b)
// document why this call site needs to differ. Either way: the drift
// gets a reviewer's attention instead of shipping silently.
func TestFindRunningContainer_RoutesThroughProvisionerSSOT(t *testing.T) {
fset := token.NewFileSet()
file, err := parser.ParseFile(fset, "plugins.go", nil, parser.ParseComments)
if err != nil {
t.Fatalf("parse plugins.go: %v", err)
}
var fn *ast.FuncDecl
ast.Inspect(file, func(n ast.Node) bool {
f, ok := n.(*ast.FuncDecl)
if !ok || f.Name.Name != "findRunningContainer" {
return true
}
// Confirm receiver is *PluginsHandler so we don't pick up an unrelated
// helper of the same name. ast.Recv is a FieldList — receivers carry
// at most one field.
if f.Recv == nil || len(f.Recv.List) == 0 {
return true
}
fn = f
return false
})
if fn == nil {
t.Fatal("findRunningContainer not found in plugins.go — was it renamed? update this test or the SSOT routing assumption")
}
var (
callsRunningContainerName bool
callsContainerInspectRaw bool
)
ast.Inspect(fn.Body, func(n ast.Node) bool {
call, ok := n.(*ast.CallExpr)
if !ok {
return true
}
sel, ok := call.Fun.(*ast.SelectorExpr)
if !ok {
return true
}
// Pkg.Func form: provisioner.RunningContainerName(...)
if pkgIdent, ok := sel.X.(*ast.Ident); ok {
if pkgIdent.Name == "provisioner" && sel.Sel.Name == "RunningContainerName" {
callsRunningContainerName = true
}
}
// Receiver-then-method form: h.docker.ContainerInspect(...) /
// p.cli.ContainerInspect(...) — anything ending in
// .ContainerInspect that's NOT routed through provisioner.
if sel.Sel.Name == "ContainerInspect" {
callsContainerInspectRaw = true
}
return true
})
if !callsRunningContainerName {
t.Errorf(
"findRunningContainer must call provisioner.RunningContainerName for the SSOT inspect — see molecule-core#10. Found no such call.",
)
}
if callsContainerInspectRaw {
t.Errorf(
"findRunningContainer carries a direct ContainerInspect call. This is the parallel-impl drift molecule-core#10 fixed. " +
"Either route through provisioner.RunningContainerName OR — if a new use case truly needs a different inspect — extend RunningContainerName's contract first and update this gate to allow the specific delta.",
)
}
}
// TestProvisionerIsRunning_RoutesThroughRunningContainerName mirrors the
// gate above but for the OTHER consumer of the SSOT — Provisioner.IsRunning
// (called by healthsweep). If a future refactor makes IsRunning carry its
// own ContainerInspect again, the two consumers' edge-case behaviors will
// silently drift. Keep them yoked.
func TestProvisionerIsRunning_RoutesThroughRunningContainerName(t *testing.T) {
fset := token.NewFileSet()
file, err := parser.ParseFile(fset, "../provisioner/provisioner.go", nil, parser.ParseComments)
if err != nil {
t.Fatalf("parse provisioner.go: %v", err)
}
var fn *ast.FuncDecl
ast.Inspect(file, func(n ast.Node) bool {
f, ok := n.(*ast.FuncDecl)
if !ok || f.Name.Name != "IsRunning" || f.Recv == nil {
return true
}
// The receiver type must be *Provisioner specifically. CPProvisioner
// has its own IsRunning that talks HTTP to the controlplane and is
// out of scope for this gate.
if !receiverIs(f, "Provisioner") {
return true
}
fn = f
return false
})
if fn == nil {
t.Fatal("Provisioner.IsRunning not found — was it renamed? update this test")
}
var (
callsRunningContainerName bool
callsContainerInspectRaw bool
)
ast.Inspect(fn.Body, func(n ast.Node) bool {
call, ok := n.(*ast.CallExpr)
if !ok {
return true
}
// Same-package call: bare identifier (e.g. RunningContainerName(...)).
if id, ok := call.Fun.(*ast.Ident); ok && id.Name == "RunningContainerName" {
callsRunningContainerName = true
return true
}
// Selector call: pkg.Func (e.g. provisioner.RunningContainerName)
// OR recv.Method (e.g. p.cli.ContainerInspect).
sel, ok := call.Fun.(*ast.SelectorExpr)
if !ok {
return true
}
switch sel.Sel.Name {
case "RunningContainerName":
callsRunningContainerName = true
case "ContainerInspect":
callsContainerInspectRaw = true
}
return true
})
if !callsRunningContainerName {
t.Errorf("Provisioner.IsRunning must call RunningContainerName for the SSOT inspect — see molecule-core#10")
}
if callsContainerInspectRaw {
t.Errorf("Provisioner.IsRunning carries a direct ContainerInspect call; route through RunningContainerName instead")
}
}
// receiverIs reports whether fn's receiver is `*<typeName>` or `<typeName>`.
func receiverIs(fn *ast.FuncDecl, typeName string) bool {
if fn.Recv == nil || len(fn.Recv.List) == 0 {
return false
}
expr := fn.Recv.List[0].Type
if star, ok := expr.(*ast.StarExpr); ok {
expr = star.X
}
id, ok := expr.(*ast.Ident)
return ok && strings.EqualFold(id.Name, typeName)
}
@@ -32,6 +32,18 @@ import (
// inside the workspace at startup.
func (h *PluginsHandler) Install(c *gin.Context) {
workspaceID := c.Param("id")
// External-runtime guard (molecule-core#10): push-install via docker
// exec is meaningless for `runtime='external'` workspaces — they have
// no local container. Reject early with a hint pointing at the
// pull-mode endpoint, instead of falling through to a misleading
// "container not running" 503 from findRunningContainer.
if h.isExternalRuntime(workspaceID) {
c.JSON(http.StatusUnprocessableEntity, gin.H{
"error": "plugin install via push is not supported for external runtimes",
"hint": "external workspaces pull plugins via GET /workspaces/:id/plugins/:name/download",
})
return
}
// Cap the JSON body so a pathological POST can't exhaust parser memory.
bodyMax := envx.Int64("PLUGIN_INSTALL_BODY_MAX_BYTES", defaultInstallBodyMaxBytes)
c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, bodyMax)
@@ -93,6 +105,16 @@ func (h *PluginsHandler) Uninstall(c *gin.Context) {
pluginName := c.Param("name")
ctx := c.Request.Context()
// Mirror Install's external-runtime guard (molecule-core#10) so the
// two endpoints reject the same shape with the same message.
if h.isExternalRuntime(workspaceID) {
c.JSON(http.StatusUnprocessableEntity, gin.H{
"error": "plugin uninstall via docker exec is not supported for external runtimes",
"hint": "external workspaces manage their own plugin directory; remove it locally",
})
return
}
if err := validatePluginName(pluginName); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid plugin name"})
return
@@ -0,0 +1,176 @@
package handlers
import (
"bytes"
"net/http"
"net/http/httptest"
"strings"
"testing"
"github.com/gin-gonic/gin"
)
// TestPluginInstall_ExternalRuntime_Returns422 — molecule-core#10.
// Install on a `runtime='external'` workspace must NOT fall through to
// findRunningContainer (which would 503 with a misleading "container not
// running"). It must return 422 with a hint pointing at the pull-mode
// download endpoint.
func TestPluginInstall_ExternalRuntime_Returns422(t *testing.T) {
h := NewPluginsHandler(t.TempDir(), nil, nil).
WithRuntimeLookup(func(workspaceID string) (string, error) {
return "external", nil
})
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "ba1789b0-4d21-4f4f-a878-fa226bf77cf5"}}
c.Request = httptest.NewRequest(
"POST",
"/workspaces/ba1789b0-4d21-4f4f-a878-fa226bf77cf5/plugins",
bytes.NewBufferString(`{"source":"local://my-plugin"}`),
)
c.Request.Header.Set("Content-Type", "application/json")
h.Install(c)
if w.Code != http.StatusUnprocessableEntity {
t.Errorf("expected 422 (Unprocessable Entity) for runtime='external', got %d: %s", w.Code, w.Body.String())
}
if !strings.Contains(w.Body.String(), "external runtimes") {
t.Errorf("expected error body to mention 'external runtimes', got: %s", w.Body.String())
}
if !strings.Contains(w.Body.String(), "download") {
t.Errorf("expected error body to point at the download endpoint, got: %s", w.Body.String())
}
}
// TestPluginUninstall_ExternalRuntime_Returns422 — symmetric guard on the
// uninstall path (DELETE /workspaces/:id/plugins/:name). External
// workspaces manage their own plugin directory locally; the platform
// can't docker-exec into them.
func TestPluginUninstall_ExternalRuntime_Returns422(t *testing.T) {
h := NewPluginsHandler(t.TempDir(), nil, nil).
WithRuntimeLookup(func(workspaceID string) (string, error) {
return "external", nil
})
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{
{Key: "id", Value: "ba1789b0-4d21-4f4f-a878-fa226bf77cf5"},
{Key: "name", Value: "my-plugin"},
}
c.Request = httptest.NewRequest(
"DELETE",
"/workspaces/ba1789b0-4d21-4f4f-a878-fa226bf77cf5/plugins/my-plugin",
nil,
)
h.Uninstall(c)
if w.Code != http.StatusUnprocessableEntity {
t.Errorf("expected 422 for runtime='external', got %d: %s", w.Code, w.Body.String())
}
if !strings.Contains(w.Body.String(), "external runtimes") {
t.Errorf("expected error body to mention 'external runtimes', got: %s", w.Body.String())
}
}
// TestPluginInstall_ContainerBackedRuntime_FallsThroughGuard — the runtime
// guard MUST NOT short-circuit container-backed runtimes. With
// `runtime='claude-code'` the install proceeds past the guard; without a
// real plugin source it'll fail downstream (here: 404 from local resolver
// because no plugin staged), which is the correct error to surface.
//
// This is the mutation-test partner: deleting the `runtime == "external"`
// check would still pass TestPluginInstall_ExternalRuntime (because Install
// would 404 instead of 422 — but the test asserts 422), and would still
// pass this test (because both pre-fix and post-fix produce 404 here).
// What this case pins is "non-external still falls through," catching
// any over-eager guard that rejects all runtimes.
func TestPluginInstall_ContainerBackedRuntime_FallsThroughGuard(t *testing.T) {
h := NewPluginsHandler(t.TempDir(), nil, nil).
WithRuntimeLookup(func(workspaceID string) (string, error) {
return "claude-code", nil
})
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "c7c28c0b-4ea5-4e75-9728-3ba860081708"}}
c.Request = httptest.NewRequest(
"POST",
"/workspaces/c7c28c0b-4ea5-4e75-9728-3ba860081708/plugins",
bytes.NewBufferString(`{"source":"local://nonexistent-plugin"}`),
)
c.Request.Header.Set("Content-Type", "application/json")
h.Install(c)
if w.Code == http.StatusUnprocessableEntity {
t.Errorf("runtime='claude-code' must fall through the external guard; got 422: %s", w.Body.String())
}
// The local resolver will fail to find the plugin → 404. Anything
// other than 422 (which would mean we mis-classified) is fine.
if w.Code != http.StatusNotFound {
t.Errorf("expected 404 (plugin not found in registry), got %d: %s", w.Code, w.Body.String())
}
}
// TestPluginInstall_NoRuntimeLookup_FailsOpen — when the runtime lookup
// is unwired (test fixtures, niche deploy shapes) the guard MUST default
// to allowing the install attempt. The downstream findRunningContainer
// step still gates on a real container, so failing open here doesn't
// expose a bypass — it just preserves backwards-compat with deployments
// that haven't wired the lookup.
func TestPluginInstall_NoRuntimeLookup_FailsOpen(t *testing.T) {
h := NewPluginsHandler(t.TempDir(), nil, nil) // NO WithRuntimeLookup
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "ws-no-lookup"}}
c.Request = httptest.NewRequest(
"POST",
"/workspaces/ws-no-lookup/plugins",
bytes.NewBufferString(`{"source":"local://nonexistent"}`),
)
c.Request.Header.Set("Content-Type", "application/json")
h.Install(c)
if w.Code == http.StatusUnprocessableEntity {
t.Errorf("nil runtimeLookup must fall through (fail-open); got 422: %s", w.Body.String())
}
}
// TestPluginInstall_RuntimeLookupErrors_FailsOpen — same fail-open story
// for transient DB errors in the lookup. We don't want a momentary
// Postgres hiccup to flip every plugin install into a 422.
func TestPluginInstall_RuntimeLookupErrors_FailsOpen(t *testing.T) {
h := NewPluginsHandler(t.TempDir(), nil, nil).
WithRuntimeLookup(func(workspaceID string) (string, error) {
return "", errFakeDB
})
w := httptest.NewRecorder()
c, _ := gin.CreateTestContext(w)
c.Params = gin.Params{{Key: "id", Value: "ws-db-flake"}}
c.Request = httptest.NewRequest(
"POST",
"/workspaces/ws-db-flake/plugins",
bytes.NewBufferString(`{"source":"local://nonexistent"}`),
)
c.Request.Header.Set("Content-Type", "application/json")
h.Install(c)
if w.Code == http.StatusUnprocessableEntity {
t.Errorf("runtimeLookup error must fall through (fail-open); got 422: %s", w.Body.String())
}
}
// errFakeDB is a sentinel for the fail-open lookup-error case.
var errFakeDB = &fakeError{msg: "synthetic db error"}
type fakeError struct{ msg string }
func (e *fakeError) Error() string { return e.msg }
@@ -78,6 +78,10 @@ var fallbackRuntimes = map[string]struct{}{
"openclaw": {},
"codex": {},
"external": {},
// mock — virtual workspace with hardcoded canned A2A replies.
// No container, no EC2, no template repo. See mock_runtime.go
// for the full rationale (200-workspace funding-demo org).
"mock": {},
}
// loadRuntimesFromManifest builds the runtime allowlist from
@@ -104,6 +108,10 @@ func loadRuntimesFromManifest(path string) (map[string]struct{}, error) {
// the manifest doesn't know about it. Injected here so we
// don't need a special-case in every caller.
"external": {},
// mock is ALWAYS available for the same reason as external:
// virtual workspace, no template repo, never spawns a
// container. See mock_runtime.go.
"mock": {},
}
for _, e := range m.WorkspaceTemplates {
name := strings.TrimSpace(e.Name)
@@ -112,6 +112,19 @@ func (h *WorkspaceHandler) Restart(c *gin.Context) {
return
}
// runtime=mock: virtual workspace with canned A2A replies. No
// container, no EC2, no provisioning state to recycle. Mirror
// the external no-op so the canvas's Restart button doesn't
// silently fail or leak through to the (template-less) provisioner.
if dbRuntime == "mock" {
c.JSON(http.StatusOK, gin.H{
"status": "noop",
"runtime": "mock",
"message": "mock workspaces have no container — restart is a no-op",
})
return
}
// SaaS mode: cpProv handles workspace EC2 lifecycle. Self-hosted mode:
// provisioner handles local Docker containers. At least one must be
// available — previously only `provisioner` was checked, which broke
@@ -532,7 +545,9 @@ func (h *WorkspaceHandler) runRestartCycle(workspaceID string) {
}
// Don't auto-restart external workspaces (no Docker container)
if dbRuntime == "external" {
// or mock workspaces (no container, every reply is canned —
// see workspace-server/internal/handlers/mock_runtime.go).
if dbRuntime == "external" || dbRuntime == "mock" {
return
}
@@ -110,55 +110,10 @@ func (s *PostgresMessageStore) List(ctx context.Context, workspaceID string, opt
return nil, false, err
}
// Wire order: oldest-first within the page so canvas (and any
// future client) can render chronologically without per-pair
// reordering. The SQL is `ORDER BY created_at DESC LIMIT N` for
// pagination correctness, and activityRowToChatMessages emits
// [user, agent] within a row — so a naive client-side flat-reverse
// would swap the pair (agent before user at the same timestamp).
// Reversing ROW-AWARE here keeps the wire shape display-ready.
//
// Algorithm: group consecutive same-timestamp messages into row
// chunks (1-2 messages each), reverse the chunk order, flatten.
// Within-row [user, agent] order is preserved. Single-message
// rows (no agent reply yet, or attachments-only) collapse to
// 1-element chunks and still reverse correctly.
messages = reverseRowChunks(messages)
reachedEnd := rowCount < opts.Limit
return messages, reachedEnd, nil
}
// reverseRowChunks groups msgs by adjacent same-Timestamp runs and
// reverses the run order, preserving within-run order. Pairs of
// (user, agent) emitted by activityRowToChatMessages share a
// timestamp, so this keeps each pair internally ordered while
// reversing the row sequence.
func reverseRowChunks(msgs []ChatMessage) []ChatMessage {
if len(msgs) == 0 {
return msgs
}
var chunks [][]ChatMessage
cur := []ChatMessage{msgs[0]}
for i := 1; i < len(msgs); i++ {
if msgs[i].Timestamp == cur[len(cur)-1].Timestamp {
cur = append(cur, msgs[i])
} else {
chunks = append(chunks, cur)
cur = []ChatMessage{msgs[i]}
}
}
chunks = append(chunks, cur)
for i, j := 0, len(chunks)-1; i < j; i, j = i+1, j-1 {
chunks[i], chunks[j] = chunks[j], chunks[i]
}
out := make([]ChatMessage, 0, len(msgs))
for _, chunk := range chunks {
out = append(out, chunk...)
}
return out
}
// queryActivityRows is split from List so unit tests can exercise the
// parser without spinning a real DB. Internal — alternative impls
// shouldn't depend on the SQL shape.
@@ -14,13 +14,10 @@ package messagestore
// legacy source the server replaces; divergence == regression.
import (
"context"
"encoding/json"
"strings"
"testing"
"time"
"github.com/DATA-DOG/go-sqlmock"
)
const fixedTimestamp = "2026-04-25T18:00:00Z"
@@ -285,145 +282,6 @@ func TestChatHistory_NoAgentMessageWhenResponseHasNoTextNoFiles(t *testing.T) {
}
}
// =====================================================================
// List() integration — sqlmock-backed end-to-end via the real handler
// =====================================================================
// TestList_WireOrderIsOldestFirstAcrossPagedRows pins the integration
// invariant: List() returns wire-display-ready messages even though
// the underlying SQL is `ORDER BY created_at DESC`. This is the
// load-bearing test for PR-C-2 — without the row-aware reversal,
// canvas would render every paired bubble in the wrong order on every
// chat reload (agent before user within each timestamp).
//
// Mutation-test cover: removing the `messages = reverseRowChunks(...)`
// call in List() must turn this test red. (The lower-level
// TestReverseRowChunks_PreservesPairOrderAcrossRows pins the helper
// itself; this test pins that List ACTUALLY CALLS the helper.)
func TestList_WireOrderIsOldestFirstAcrossPagedRows(t *testing.T) {
db, mock, err := sqlmock.New()
if err != nil {
t.Fatalf("sqlmock.New: %v", err)
}
defer db.Close()
// Server's SQL is ORDER BY created_at DESC. Build mock rows in
// THAT order so the row-aware reversal has work to do.
rows := sqlmock.NewRows([]string{"created_at", "status", "request_body", "response_body"}).
AddRow(mustParseTime(t, "2026-05-05T00:03:00Z"), "ok",
`{"params":{"message":{"parts":[{"kind":"text","text":"u3"}]}}}`,
`{"result":"a3"}`).
AddRow(mustParseTime(t, "2026-05-05T00:02:00Z"), "ok",
`{"params":{"message":{"parts":[{"kind":"text","text":"u2"}]}}}`,
`{"result":"a2"}`).
AddRow(mustParseTime(t, "2026-05-05T00:01:00Z"), "ok",
`{"params":{"message":{"parts":[{"kind":"text","text":"u1"}]}}}`,
`{"result":"a1"}`)
mock.ExpectQuery(`SELECT created_at, status, request_body::text, response_body::text`).
WillReturnRows(rows)
store := NewPostgresMessageStore(db)
msgs, reachedEnd, err := store.List(context.Background(), "ws-1", ListOptions{Limit: 10})
if err != nil {
t.Fatalf("List: %v", err)
}
wantContents := []string{"u1", "a1", "u2", "a2", "u3", "a3"}
if len(msgs) != len(wantContents) {
t.Fatalf("len(msgs)=%d want %d; got=%v", len(msgs), len(wantContents), msgs)
}
for i, w := range wantContents {
if msgs[i].Content != w {
t.Errorf("idx %d: got %q want %q (full slice ordering broken; reverseRowChunks regressed?)", i, msgs[i].Content, w)
}
}
if !reachedEnd {
t.Errorf("3 rows < limit 10 should reach end, got reachedEnd=false")
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Errorf("sqlmock expectations: %v", err)
}
}
// =====================================================================
// reverseRowChunks — wire-order helper added in PR-C-2
// =====================================================================
// TestReverseRowChunks_PreservesPairOrderAcrossRows pins the
// row-aware reversal that List() applies before returning. Server's
// SQL is `ORDER BY created_at DESC`, so messages come out
// newest-row-first; activityRowToChatMessages emits [user, agent]
// per row with same timestamp. A naive flat reversal of the messages
// slice would flip each pair (agent before user). reverseRowChunks
// reverses ROWS, preserving pair-internal order. Without this, canvas
// would render every paired bubble in the wrong order on every chat
// reload — the canvas-side reverse used to do the right thing because
// it reversed ROWS BEFORE flattening, but PR-C/D moved the flattening
// into the server, so the row-awareness has to live there too.
func TestReverseRowChunks_PreservesPairOrderAcrossRows(t *testing.T) {
// Build messages newest-row-first as List() collects them. Each
// row is a pair sharing a timestamp, with [user, agent] order.
in := []ChatMessage{
{Role: "user", Content: "user_3", Timestamp: "2026-05-05T00:03:00Z"},
{Role: "agent", Content: "agent_3", Timestamp: "2026-05-05T00:03:00Z"},
{Role: "user", Content: "user_2", Timestamp: "2026-05-05T00:02:00Z"},
{Role: "agent", Content: "agent_2", Timestamp: "2026-05-05T00:02:00Z"},
{Role: "user", Content: "user_1", Timestamp: "2026-05-05T00:01:00Z"},
{Role: "agent", Content: "agent_1", Timestamp: "2026-05-05T00:01:00Z"},
}
got := reverseRowChunks(in)
want := []struct {
role, content string
}{
{"user", "user_1"}, {"agent", "agent_1"},
{"user", "user_2"}, {"agent", "agent_2"},
{"user", "user_3"}, {"agent", "agent_3"},
}
if len(got) != len(want) {
t.Fatalf("len(got)=%d len(want)=%d", len(got), len(want))
}
for i, w := range want {
if got[i].Role != w.role || got[i].Content != w.content {
t.Errorf("idx %d: got role=%q content=%q want role=%q content=%q",
i, got[i].Role, got[i].Content, w.role, w.content)
}
}
}
// TestReverseRowChunks_HandlesSingleMessageRows pins the case where
// a row has only a user OR only an agent message (e.g., agent reply
// not yet recorded, attachments-only user upload). Naive reversal
// still works for single-message chunks; the test guards against a
// future change that special-cases the 2-message-row path.
func TestReverseRowChunks_HandlesSingleMessageRows(t *testing.T) {
in := []ChatMessage{
{Role: "user", Content: "u3", Timestamp: "2026-05-05T00:03:00Z"},
{Role: "user", Content: "u2", Timestamp: "2026-05-05T00:02:00Z"}, // single, no agent
{Role: "agent", Content: "a2", Timestamp: "2026-05-05T00:02:00Z"},
{Role: "user", Content: "u1", Timestamp: "2026-05-05T00:01:00Z"},
}
got := reverseRowChunks(in)
wantContents := []string{"u1", "u2", "a2", "u3"}
if len(got) != len(wantContents) {
t.Fatalf("len got=%d want=%d", len(got), len(wantContents))
}
for i, w := range wantContents {
if got[i].Content != w {
t.Errorf("idx %d: got %q want %q", i, got[i].Content, w)
}
}
}
// TestReverseRowChunks_EmptyInput returns nil/empty without panic.
func TestReverseRowChunks_EmptyInput(t *testing.T) {
got := reverseRowChunks(nil)
if len(got) != 0 {
t.Errorf("nil input should return empty, got %v", got)
}
}
// =====================================================================
// end-to-end shape — paired user + agent with same timestamp
// =====================================================================
@@ -1073,18 +1073,53 @@ func (p *Provisioner) IsRunning(ctx context.Context, workspaceID string) (bool,
if p == nil || p.cli == nil {
return false, ErrNoBackend
}
name := ContainerName(workspaceID)
info, err := p.cli.ContainerInspect(ctx, name)
name, err := RunningContainerName(ctx, p.cli, workspaceID)
if err != nil {
if isContainerNotFound(err) {
return false, nil
}
// Transient daemon error: caller treats !running as dead + restarts.
// Returning true + the underlying error preserves the error for
// metrics/logging without triggering the destructive path.
return true, err
}
return info.State.Running, nil
return name != "", nil
}
// RunningContainerName returns the container name for workspaceID iff the
// container exists AND is in the Running state. Single source of truth for
// "what live container should I exec into for this workspace?" — used by
// both Provisioner.IsRunning (healthsweep) and the plugins handler.
//
// Distinguishes three outcomes so callers can pick their own policy:
//
// - ("ws-<id>", nil): container is running. Caller can exec into it.
// - ("", nil): container does not exist OR exists but is stopped
// (NotFound, Exited, Created, Restarting…). Caller
// should treat as a definitive "not running."
// - ("", err): transient daemon error (timeout, socket EOF, ctx
// cancel). Caller should NOT infer "not running" —
// this could be a flaky daemon under load. Decide
// per-callsite whether to fail soft or hard.
//
// Background — molecule-core#10: the plugins handler used to carry its own
// copy of this inspect logic (`findRunningContainer`) which collapsed
// transient errors into the same "" return as a genuinely-stopped container.
// That hid daemon flakes as misleading 503 "container not running" responses
// AND let the two impls drift on edge-case behavior. This is the SSOT.
func RunningContainerName(ctx context.Context, cli *client.Client, workspaceID string) (string, error) {
if cli == nil {
return "", ErrNoBackend
}
name := ContainerName(workspaceID)
info, err := cli.ContainerInspect(ctx, name)
if err != nil {
if isContainerNotFound(err) {
return "", nil
}
return "", err
}
if info.State.Running {
return name, nil
}
return "", nil
}
// isContainerNotFound returns true when the Docker client indicates the
@@ -1,149 +0,0 @@
package registry
// cp_orphan_sweeper.go — SaaS-mode counterpart to orphan_sweeper.go.
//
// The Docker sweeper (StartOrphanSweeper) runs only when prov != nil
// (single-tenant Docker mode); SaaS tenants run cpProv != nil and prov
// == nil, so they get no sweep coverage from that path. This file fills
// the gap for the deprovision split-write race documented in #2989:
//
// 1. handlers/workspace_crud.go:365 marks workspaces.status = 'removed'.
// 2. workspace_crud.go:439 calls StopWorkspaceAuto → cpProv.Stop, which
// issues DELETE /cp/workspaces/:id?instance_id=… to controlplane.
// 3. If step 2 fails (CP transient 5xx, network blip, AWS hiccup), the
// inline path returns a 500 to the canvas — but the DB row is already
// at status='removed' with instance_id still populated. There's no
// retry, and the EC2 lives forever.
//
// This sweeper closes that gap by re-issuing cpProv.Stop on every cycle
// for any workspace at status='removed' with a non-NULL instance_id.
// Stop is idempotent: AWS TerminateInstance on an already-terminated
// instance is a no-op (per AWS docs), and CP's Deprovision handler
// (controlplane/internal/handlers/workspace_provision.go:289) handles
// the already-terminated and already-deleted-DNS cases via best-effort
// guards. On Stop success, the sweeper clears instance_id so the next
// cycle skips the row.
//
// Cadence + safety filters mirror the Docker sweeper:
// - 60s tick (OrphanSweepInterval)
// - 30s per-cycle deadline (orphanSweepDeadline)
// - LIMIT 100 per cycle so a sustained CP outage that backs up many
// orphans doesn't blow the request timeout; subsequent cycles drain.
//
// SSOT note: Stop's idempotency (no-op on empty instance_id, AWS
// terminate on already-terminated) is the load-bearing invariant. Any
// future change that adds non-idempotent side effects to cpProv.Stop
// must also gate this sweeper, or it will re-execute those side effects
// every 60s for every cleared-but-not-yet-NULL row.
import (
"context"
"log"
"time"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
)
// CPOrphanReaper is the dependency the SaaS-mode sweeper takes from
// the CP provisioner. *provisioner.CPProvisioner satisfies this
// naturally; tests inject fakes.
type CPOrphanReaper interface {
Stop(ctx context.Context, workspaceID string) error
}
// cpSweepLimit caps the per-cycle row count so a sustained CP outage
// can't make a single sweep cycle blow orphanSweepDeadline. With a
// 60s cadence and 100-row limit, drain rate is up to 100 orphans/min,
// which has never been approached even during the worst leak windows.
const cpSweepLimit = 100
// StartCPOrphanSweeper runs the SaaS-mode reconcile loop until ctx is
// cancelled. nil reaper makes the loop a no-op (matches the Docker
// sweeper's nil-tolerant pattern).
//
// Caller is expected to gate on `cpProv != nil` (matching how
// StartOrphanSweeper is gated on `prov != nil` at the call site in
// cmd/server/main.go) — passing a nil *CPProvisioner here would also
// short-circuit but the gate at the wiring site keeps the call shape
// symmetric across the two sweepers.
func StartCPOrphanSweeper(ctx context.Context, reaper CPOrphanReaper) {
if reaper == nil {
log.Println("CP orphan sweeper: reaper is nil — sweeper disabled")
return
}
log.Printf("CP orphan sweeper started — reconciling every %s", OrphanSweepInterval)
ticker := time.NewTicker(OrphanSweepInterval)
defer ticker.Stop()
cpSweepOnce(ctx, reaper)
for {
select {
case <-ctx.Done():
log.Println("CP orphan sweeper: shutdown")
return
case <-ticker.C:
cpSweepOnce(ctx, reaper)
}
}
}
// cpSweepOnce executes one reconcile pass. Defensive against db.DB
// being nil so a misconfigured boot doesn't panic.
func cpSweepOnce(parent context.Context, reaper CPOrphanReaper) {
if db.DB == nil {
return
}
ctx, cancel := context.WithTimeout(parent, orphanSweepDeadline)
defer cancel()
rows, err := db.DB.QueryContext(ctx, `
SELECT id::text
FROM workspaces
WHERE status = 'removed'
AND instance_id IS NOT NULL
AND instance_id != ''
ORDER BY updated_at DESC
LIMIT $1
`, cpSweepLimit)
if err != nil {
log.Printf("CP orphan sweeper: DB query failed: %v", err)
return
}
defer rows.Close()
var orphanIDs []string
for rows.Next() {
var id string
if scanErr := rows.Scan(&id); scanErr != nil {
log.Printf("CP orphan sweeper: row scan failed: %v", scanErr)
continue
}
orphanIDs = append(orphanIDs, id)
}
if iterErr := rows.Err(); iterErr != nil {
log.Printf("CP orphan sweeper: rows iteration failed: %v", iterErr)
return
}
for _, id := range orphanIDs {
log.Printf("CP orphan sweeper: terminating leaked EC2 for removed workspace %s", id)
if stopErr := reaper.Stop(ctx, id); stopErr != nil {
// CP-side error — transient 5xx, network, AWS hiccup. Leave
// instance_id populated so the next cycle retries. Loud-fail
// only at the log layer; the user-visible 500 was already
// returned by the inline path that triggered this orphan.
log.Printf("CP orphan sweeper: Stop failed for %s: %v — retry next cycle", id, stopErr)
continue
}
// Stop succeeded — clear instance_id so the next cycle skips this
// row. We can't use a tombstone column (no schema change in this
// PR); NULL'ing instance_id is the SSOT signal for "no live
// EC2 attached." The matching SELECT predicate above stays in
// sync with this UPDATE.
if _, updErr := db.DB.ExecContext(ctx,
`UPDATE workspaces SET instance_id = NULL, updated_at = now() WHERE id = $1`,
id,
); updErr != nil {
log.Printf("CP orphan sweeper: clear instance_id failed for %s: %v — next cycle will re-Stop (idempotent)", id, updErr)
}
}
}
@@ -1,266 +0,0 @@
package registry
import (
"context"
"errors"
"sync"
"testing"
"time"
"github.com/DATA-DOG/go-sqlmock"
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
)
// fakeCPReaper is a hand-rolled CPOrphanReaper for the SaaS-mode
// sweeper tests. Records every Stop call so tests can assert which
// workspace IDs were re-issued.
type fakeCPReaper struct {
mu sync.Mutex
stopErr map[string]error
stopCalls []string
}
func (f *fakeCPReaper) Stop(_ context.Context, wsID string) error {
f.mu.Lock()
defer f.mu.Unlock()
f.stopCalls = append(f.stopCalls, wsID)
return f.stopErr[wsID]
}
// TestCPSweepOnce_StopSucceeds_ClearsInstanceID — happy path. Single
// removed-row with non-NULL instance_id; Stop succeeds; instance_id
// gets NULL'd so the next cycle won't re-sweep it.
func TestCPSweepOnce_StopSucceeds_ClearsInstanceID(t *testing.T) {
mock := setupTestDB(t)
reaper := &fakeCPReaper{}
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces\s+WHERE status = 'removed'\s+AND instance_id IS NOT NULL\s+AND instance_id != ''\s+ORDER BY updated_at DESC\s+LIMIT \$1`).
WithArgs(cpSweepLimit).
WillReturnRows(sqlmock.NewRows([]string{"id"}).AddRow("ws-uuid-1"))
mock.ExpectExec(`UPDATE workspaces SET instance_id = NULL, updated_at = now\(\) WHERE id = \$1`).
WithArgs("ws-uuid-1").
WillReturnResult(sqlmock.NewResult(0, 1))
cpSweepOnce(context.Background(), reaper)
if len(reaper.stopCalls) != 1 || reaper.stopCalls[0] != "ws-uuid-1" {
t.Fatalf("expected Stop(ws-uuid-1), got %v", reaper.stopCalls)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestCPSweepOnce_StopFails_KeepsInstanceID — CP transient failure.
// Stop returns an error; instance_id MUST stay populated so the next
// cycle retries. UPDATE must NOT fire.
func TestCPSweepOnce_StopFails_KeepsInstanceID(t *testing.T) {
mock := setupTestDB(t)
reaper := &fakeCPReaper{
stopErr: map[string]error{"ws-uuid-1": errors.New("CP returned 503")},
}
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces`).
WithArgs(cpSweepLimit).
WillReturnRows(sqlmock.NewRows([]string{"id"}).AddRow("ws-uuid-1"))
// No ExpectExec for the UPDATE — sqlmock fails the test if the
// UPDATE fires.
cpSweepOnce(context.Background(), reaper)
if len(reaper.stopCalls) != 1 || reaper.stopCalls[0] != "ws-uuid-1" {
t.Fatalf("expected Stop(ws-uuid-1), got %v", reaper.stopCalls)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations (UPDATE should NOT have fired): %v", err)
}
}
// TestCPSweepOnce_NoOrphans — empty result set is the steady state in
// healthy operation. No Stop, no UPDATE.
func TestCPSweepOnce_NoOrphans(t *testing.T) {
mock := setupTestDB(t)
reaper := &fakeCPReaper{}
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces`).
WithArgs(cpSweepLimit).
WillReturnRows(sqlmock.NewRows([]string{"id"}))
cpSweepOnce(context.Background(), reaper)
if len(reaper.stopCalls) != 0 {
t.Fatalf("expected zero Stop calls, got %v", reaper.stopCalls)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestCPSweepOnce_MultipleOrphans — all rows in the batch get Stop'd
// independently; one failure doesn't block others.
func TestCPSweepOnce_MultipleOrphans(t *testing.T) {
mock := setupTestDB(t)
reaper := &fakeCPReaper{
stopErr: map[string]error{"ws-uuid-2": errors.New("CP 503 on ws-uuid-2")},
}
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces`).
WithArgs(cpSweepLimit).
WillReturnRows(sqlmock.NewRows([]string{"id"}).
AddRow("ws-uuid-1").
AddRow("ws-uuid-2").
AddRow("ws-uuid-3"))
// ws-uuid-1 succeeds → UPDATE fires.
mock.ExpectExec(`UPDATE workspaces SET instance_id = NULL`).
WithArgs("ws-uuid-1").
WillReturnResult(sqlmock.NewResult(0, 1))
// ws-uuid-2 fails → no UPDATE.
// ws-uuid-3 succeeds → UPDATE fires.
mock.ExpectExec(`UPDATE workspaces SET instance_id = NULL`).
WithArgs("ws-uuid-3").
WillReturnResult(sqlmock.NewResult(0, 1))
cpSweepOnce(context.Background(), reaper)
if len(reaper.stopCalls) != 3 {
t.Fatalf("expected Stop on all 3 ids, got %v", reaper.stopCalls)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestCPSweepOnce_QueryError — DB transient failure. Sweep returns
// without panicking. No Stop calls.
func TestCPSweepOnce_QueryError(t *testing.T) {
mock := setupTestDB(t)
reaper := &fakeCPReaper{}
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces`).
WithArgs(cpSweepLimit).
WillReturnError(errors.New("connection refused"))
cpSweepOnce(context.Background(), reaper)
if len(reaper.stopCalls) != 0 {
t.Fatalf("expected zero Stop calls on query error, got %v", reaper.stopCalls)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestCPSweepOnce_UpdateError_LogsButContinues — Stop succeeded but
// the UPDATE to clear instance_id failed. Subsequent rows in the batch
// must still process; comment in cpSweepOnce promises idempotent re-Stop
// next cycle.
func TestCPSweepOnce_UpdateError_LogsButContinues(t *testing.T) {
mock := setupTestDB(t)
reaper := &fakeCPReaper{}
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces`).
WithArgs(cpSweepLimit).
WillReturnRows(sqlmock.NewRows([]string{"id"}).
AddRow("ws-uuid-1").
AddRow("ws-uuid-2"))
mock.ExpectExec(`UPDATE workspaces SET instance_id = NULL`).
WithArgs("ws-uuid-1").
WillReturnError(errors.New("UPDATE timeout"))
mock.ExpectExec(`UPDATE workspaces SET instance_id = NULL`).
WithArgs("ws-uuid-2").
WillReturnResult(sqlmock.NewResult(0, 1))
cpSweepOnce(context.Background(), reaper)
if len(reaper.stopCalls) != 2 {
t.Fatalf("expected Stop on both ids despite UPDATE error on first, got %v", reaper.stopCalls)
}
if err := mock.ExpectationsWereMet(); err != nil {
t.Fatalf("unmet expectations: %v", err)
}
}
// TestCPSweepOnce_NilDB — defensive against db.DB being nil. Must not
// panic; must not call Stop.
func TestCPSweepOnce_NilDB(t *testing.T) {
saved := db.DB
db.DB = nil
t.Cleanup(func() { db.DB = saved })
reaper := &fakeCPReaper{}
cpSweepOnce(context.Background(), reaper)
if len(reaper.stopCalls) != 0 {
t.Fatalf("expected zero Stop calls when db.DB is nil, got %v", reaper.stopCalls)
}
}
// TestStartCPOrphanSweeper_NilReaperDisabled — boot-safety: a SaaS CP
// without cpProv configured must not start the loop (immediate return,
// no goroutine leak).
func TestStartCPOrphanSweeper_NilReaperDisabled(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
done := make(chan struct{})
go func() {
StartCPOrphanSweeper(ctx, nil)
close(done)
}()
select {
case <-done:
// expected — nil reaper short-circuits.
case <-time.After(500 * time.Millisecond):
t.Fatal("StartCPOrphanSweeper(nil) did not return immediately")
}
}
// TestStartCPOrphanSweeper_RunsOnceImmediatelyAndOnTick — cadence
// contract: kick off one sweep at boot (so a platform restart starts
// healing immediately), then once per OrphanSweepInterval. Verifies
// the loop terminates on ctx cancel.
func TestStartCPOrphanSweeper_RunsOnceImmediatelyAndOnTick(t *testing.T) {
mock := setupTestDB(t)
reaper := &fakeCPReaper{}
// Two sweeps within the test window: one immediate, one on the
// first tick. We can't shrink OrphanSweepInterval (it's a const),
// so assert "at least one immediate sweep" and let cancel close
// the loop.
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces`).
WithArgs(cpSweepLimit).
WillReturnRows(sqlmock.NewRows([]string{"id"}))
// The ticker may or may not fire in the test window depending on
// scheduler; tolerate both shapes by registering a second optional
// expectation. sqlmock fails on UNREGISTERED queries, so register
// one more then accept either 1 or 2 fires.
mock.ExpectQuery(`(?s)^\s*SELECT id::text\s+FROM workspaces`).
WithArgs(cpSweepLimit).
WillReturnRows(sqlmock.NewRows([]string{"id"}))
ctx, cancel := context.WithCancel(context.Background())
done := make(chan struct{})
go func() {
StartCPOrphanSweeper(ctx, reaper)
close(done)
}()
// 100ms is well past the boot-sweep but well shy of the 60s
// interval, so the second query expectation is intentionally
// unmet — that's fine, sqlmock distinguishes "expected but not
// received" (we don't enforce here) from "unexpected query"
// (which would fail).
time.Sleep(100 * time.Millisecond)
cancel()
select {
case <-done:
// expected
case <-time.After(2 * time.Second):
t.Fatal("StartCPOrphanSweeper did not exit on ctx cancel")
}
// Boot sweep must have happened — without it, an operator restart
// after a CP outage would leave a 60s gap before the first heal.
// We don't assert mock.ExpectationsWereMet() here because the
// second query is intentionally optional.
}
@@ -71,9 +71,15 @@ func StartHealthSweep(ctx context.Context, checker ContainerChecker, interval ti
}
func sweepOnlineWorkspaces(ctx context.Context, checker ContainerChecker, onOffline OfflineHandler) {
// Skip external workspaces (runtime='external') — they have no Docker container
// Skip external + mock workspaces — neither has a Docker container.
// external: agent runs on operator's laptop, polled via heartbeat.
// mock: virtual workspace, every reply is canned (see
// workspace-server/internal/handlers/mock_runtime.go). Both would
// false-positive as "container gone" on every sweep tick and
// auto-restart would loop forever (provisioner has no template
// for either runtime).
rows, err := db.DB.QueryContext(ctx,
`SELECT id FROM workspaces WHERE status IN ('online', 'degraded') AND COALESCE(runtime, 'langgraph') != 'external'`)
`SELECT id FROM workspaces WHERE status IN ('online', 'degraded') AND COALESCE(runtime, 'langgraph') NOT IN ('external', 'mock')`)
if err != nil {
log.Printf("Health sweep: query error: %v", err)
return
@@ -413,22 +413,20 @@ func sweepStaleTokensWithoutContainer(ctx context.Context, reaper OrphanReaper)
// `"5m0s"` mismatch with Postgres interval grammar; passing seconds
// as an int keeps the binding portable.
graceSeconds := int(staleTokenGrace.Seconds())
// `runtime != 'external'` is load-bearing: external workspaces have NO
// local container by design (the agent runs off-host), so the
// "no live container" predicate below would match every external
// workspace and revoke its token. The token is the off-host agent's
// only authentication credential — revoking breaks the entire
// external-runtime feature. Discovered 2026-05-03 when a fresh
// external workspace had its token silently revoked ~6 minutes after
// creation by this sweep, killing the operator's MCP heartbeat and
// inbox poll with `HTTP 401 — token may be revoked`.
// `runtime NOT IN ('external','mock')` is load-bearing: neither
// runtime has a local container, so the "no live container"
// predicate below would match every row and revoke its token.
// external: token is the off-host agent's only credential —
// revoking breaks the entire external-runtime feature
// (incident 2026-05-03). mock: same shape — no container by
// design, see workspace-server/internal/handlers/mock_runtime.go.
rows, qErr := db.DB.QueryContext(ctx, `
SELECT DISTINCT t.workspace_id::text
FROM workspace_auth_tokens t
JOIN workspaces w ON w.id = t.workspace_id
WHERE t.revoked_at IS NULL
AND w.status NOT IN ('removed', 'provisioning')
AND w.runtime != 'external'
AND w.runtime NOT IN ('external', 'mock')
AND COALESCE(t.last_used_at, t.created_at) < now() - make_interval(secs => $2)
AND (
cardinality($1::text[]) = 0
@@ -26,7 +26,7 @@ import (
// accidentally matching a future query that opens with the same column
// name OR a regression that drops one of the load-bearing predicates.
func expectStaleTokenSweepNoOp(mock sqlmock.Sqlmock) {
mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime != 'external'`).
mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime NOT IN \('external', 'mock'\)`).
WillReturnRows(sqlmock.NewRows([]string{"workspace_id"}))
}
@@ -492,7 +492,7 @@ func TestSweepOnce_StaleTokenRevokeFiresWhenNoContainer(t *testing.T) {
// excludes 'external' (2026-05-03 fix — the sweep was incorrectly
// targeting external workspaces which have no container by design),
// and the staleness predicate appears in the SELECT.
mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime != 'external'.*COALESCE\(t\.last_used_at, t\.created_at\) < now\(\) - make_interval`).
mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime NOT IN \('external', 'mock'\).*COALESCE\(t\.last_used_at, t\.created_at\) < now\(\) - make_interval`).
WillReturnRows(sqlmock.NewRows([]string{"workspace_id"}).
AddRow(orphanedID))
@@ -548,7 +548,7 @@ func TestSweepOnce_StaleTokenRevokeFailureBailsLoop(t *testing.T) {
// Third-pass returns two stale-token workspaces; the first revoke
// errors. Loop must bail without attempting the second.
mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime != 'external'`).
mock.ExpectQuery(`(?s)^\s*SELECT DISTINCT t\.workspace_id::text\s+FROM workspace_auth_tokens.*status NOT IN \('removed', 'provisioning'\).*runtime NOT IN \('external', 'mock'\)`).
WillReturnRows(sqlmock.NewRows([]string{"workspace_id"}).
AddRow("aaaa1111-0000-0000-0000-000000000000").
AddRow("bbbb2222-0000-0000-0000-000000000000"))
+1 -1
View File
@@ -2,7 +2,7 @@
# build-all.sh — Rebuild base image and optionally adapter images.
#
# NOTE: Adapters have been extracted to standalone template repos:
# https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-<runtime>
# https://github.com/Molecule-AI/molecule-ai-workspace-template-<runtime>
#
# This script now only builds the base image from workspace/Dockerfile.
# Each adapter repo has its own Dockerfile that installs molecule-ai-workspace-runtime