Commit Graph

74 Commits

Hongming Wang
0d0840d9d9
Merge branch 'staging' into refactor/a2a-tools-messaging-extract-rfc2873-iter4d 2026-05-05 13:41:55 -07:00
Hongming Wang
23d3f057d3
Merge pull request #2890 from Molecule-AI/refactor/a2a-tools-memory-extract-rfc2873-iter4c
refactor(workspace): extract memory tools from a2a_tools.py (RFC #2873 iter 4c)
2026-05-05 20:31:45 +00:00
Hongming Wang
6470e5f41b
Merge pull request #2887 from Molecule-AI/refactor/a2a-tools-delegation-extract-rfc2873-iter4b
refactor(workspace): extract delegation handlers from a2a_tools.py (RFC #2873 iter 4b)
2026-05-05 17:40:40 +00:00
Hongming Wang
abba16beb4
Merge pull request #2883 from Molecule-AI/refactor/a2a-tools-rbac-extract-rfc2873-iter4a
refactor(workspace): extract RBAC helpers from a2a_tools.py (RFC #2873 iter 4a)
2026-05-05 16:59:36 +00:00
Hongming Wang
9c752e0673
Merge pull request #2879 from Molecule-AI/refactor/mcp-cli-split-rfc2873-iter3
refactor(workspace): split mcp_cli.py into focused modules (RFC #2873 iter 3)
2026-05-05 16:58:05 +00:00
Hongming Wang
3e0d2e650a refactor(workspace): extract messaging tools from a2a_tools.py to a2a_tools_messaging.py (RFC #2873 iter 4d)
Fourth slice of the a2a_tools.py split (stacked on iter 4c). Owns the
four human-and-peer messaging MCP tools + the chat-upload helper:

  * _upload_chat_files — stage local paths to /chat/uploads
  * tool_send_message_to_user — push canvas-chat via /notify
  * tool_list_peers — discover peers across registered workspaces
  * tool_get_workspace_info — JSON-encode workspace info
  * tool_chat_history — fetch prior conversation rows with a peer

a2a_tools.py shrinks from 508 → 213 LOC (−295). The remaining 213
lines are just report_activity + back-compat re-exports. Inbox tools
(tool_inbox_peek/pop/wait_for_message) are deferred to iter 4e.

Layered architecture: messaging depends on a2a_tools_rbac (iter 4a),
a2a_client, platform_auth — NOT on kitchen-sink a2a_tools. An
import-contract test pins this so future refactors that add
`from a2a_tools import …` fail in CI.
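An import-contract test of that shape amounts to scanning the extracted module's source for the forbidden import. A sketch (the real test's name, location, and exact mechanism aren't shown in this log):

```python
import ast

def forbidden_imports(source: str, banned: str = "a2a_tools") -> list[str]:
    """Return any imports of `banned` found in `source` (AST-based, so
    comments and docstring mentions don't false-positive)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module == banned:
            hits.append(f"from {banned} import ...")
        elif isinstance(node, ast.Import):
            hits.extend(a.name for a in node.names if a.name == banned)
    return hits

# A module that re-couples to the kitchen-sink is caught:
assert forbidden_imports("from a2a_tools import tool_list_peers") == [
    "from a2a_tools import ..."
]
# The layered module passes (note: a2a_tools_rbac is not a2a_tools):
assert forbidden_imports("import a2a_tools_rbac\nimport a2a_client\n") == []
```

Run under pytest against the messaging module's source file, this fails CI the moment a refactor re-introduces the dependency.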

Tests:
  * 28 patch sites in TestToolSendMessageToUser + TestToolListPeers +
    TestToolGetWorkspaceInfo + TestChatHistory retargeted from
    `a2a_tools.{httpx, get_peers_*, get_workspace_info,
    _upload_chat_files, _peer_*, list_registered_workspaces}` to
    `a2a_tools_messaging.…` because the call sites moved.
  * test_a2a_tools_messaging.py adds 7 new tests:
    - 5 alias drift gates
    - 2 import-contract tests (no top-level a2a_tools dep + a2a_tools
      surfaces every messaging symbol)

137 tests total in the a2a_tools suite, all green.

Refs RFC #2873.
2026-05-05 09:50:47 -07:00
Hongming Wang
210a26d31a refactor(workspace): extract memory tools from a2a_tools.py to a2a_tools_memory.py (RFC #2873 iter 4c)
Third slice of the a2a_tools.py split (stacked on iter 4b). Owns the
two persistent-memory MCP tools:

  * tool_commit_memory — write to /workspaces/:id/memories with RBAC
    + GLOBAL-scope tier-zero enforcement
  * tool_recall_memory — search /workspaces/:id/memories with RBAC

a2a_tools.py shrinks from 609 → 508 LOC (−101). Both handlers depend
ONLY on a2a_tools_rbac (iter 4a), a2a_client, and the platform's
/memories endpoint — no entanglement with delegation or messaging.

Side-effects of the layered architecture: a2a_tools_memory's import
contract is "depends on a2a_tools_rbac, never on a2a_tools" — the
kitchen-sink module is for back-compat re-exports only. A test pins
this so a future refactor that re-introduces `from a2a_tools import …`
fails in CI.

Tests:
  * 49 patch sites in TestToolCommitMemory + TestToolRecallMemory
    retargeted from `a2a_tools.{_check_memory_*, _is_root_workspace,
    httpx.AsyncClient}` to `a2a_tools_memory.…` because the call sites
    moved.
  * test_a2a_tools_memory.py adds 4 new tests (alias drift gate +
    import-contract + a2a_tools-side re-export).

117 tests total (77 impl + 28 rbac + 8 delegation + 4 memory), all green.

Refs RFC #2873.
2026-05-05 09:50:39 -07:00
Hongming Wang
2227a14b1e fix(build): add a2a_tools_delegation to TOP_LEVEL_MODULES drift gate
Iter 4b's new module needs the rewrite-list entry. Stacked on iter 4a
which already added a2a_tools_rbac.

Refs RFC #2873 iter 4b.
2026-05-05 05:01:04 -07:00
Hongming Wang
17aec22f9b fix(build): add a2a_tools_rbac to TOP_LEVEL_MODULES drift gate
Iter 4a's new module needs to be in the rewrite list so the wheel
ships its imports prefixed correctly. Caught by 'PR-built wheel +
import smoke'.

Refs RFC #2873 iter 4a.
2026-05-05 05:00:47 -07:00
Hongming Wang
8388144098 fix(build): add iter-3 mcp_* modules to TOP_LEVEL_MODULES drift gate
The iter-3 split created mcp_heartbeat / mcp_inbox_pollers /
mcp_workspace_resolver but the wheel build's drift-gate check at
scripts/build_runtime_package.py:TOP_LEVEL_MODULES wasn't updated.
Without this fix the wheel ships those modules un-rewritten, so
their imports of platform_auth / configs_dir / etc. break at
runtime. Caught by the 'PR-built wheel + import smoke' check.

Refs RFC #2873 iter 3.
2026-05-05 05:00:29 -07:00
Hongming Wang
86015412eb build(runtime): register inbox_uploads in TOP_LEVEL_MODULES
The drift gate in build_runtime_package.py rejects any workspace/*.py
module not listed in TOP_LEVEL_MODULES — it would ship un-rewritten
and break wheel imports. Add inbox_uploads (introduced in this PR)
to the list.
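The gate's core check is a set comparison over the workspace directory. A sketch (TOP_LEVEL_MODULES here is an illustrative subset; the real list and rejection message live in scripts/build_runtime_package.py):

```python
import tempfile
from pathlib import Path

# Illustrative subset of the real allowlist.
TOP_LEVEL_MODULES = {"a2a_tools", "inbox", "inbox_uploads", "mcp_cli"}

def unregistered_modules(workspace_dir: Path) -> list[str]:
    """Any workspace/*.py module missing from TOP_LEVEL_MODULES would
    ship un-rewritten in the wheel; the gate fails the build on these."""
    found = {p.stem for p in workspace_dir.glob("*.py") if p.stem != "__init__"}
    return sorted(found - TOP_LEVEL_MODULES)

ws = Path(tempfile.mkdtemp())
for name in ("inbox.py", "inbox_uploads.py", "event_log.py"):
    (ws / name).write_text("")

# event_log.py is new and unlisted — the build must reject it.
assert unregistered_modules(ws) == ["event_log"]
```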
2026-05-05 04:41:07 -07:00
Hongming Wang
a8850bac55
Merge pull request #2778 from Molecule-AI/fix/redact-secrets-1777932233
fix(runtime): redact secret-shaped tokens from JSON-RPC error.data
2026-05-04 22:13:29 +00:00
Hongming Wang
28f22609d9 fix(runtime): redact secret-shaped tokens from JSON-RPC error.data
PR #2756 piped adapter.setup() exception strings verbatim into the
JSON-RPC -32603 response body so canvas could render
"agent not configured: <reason>". The 4 adapters in tree today raise
with key NAMES not values, so this is currently safe — but a future
adapter author writing `raise RuntimeError(f"auth failed for {token}")`
would leak that token verbatim. Issue #2760 flagged the risk; this PR
closes it.

workspace/secret_redactor.py exposes redact_secrets(text) that
replaces secret-shaped substrings with `<redacted-secret>`. Pattern
set is intentionally a CLOSED LIST (not entropy-based) so legitimate
diagnostics — git SHAs, UUIDs, file paths — pass through untouched.

Patterns covered: Anthropic/OpenAI/OpenRouter/Stripe `sk-` family,
GitHub PAT (ghp_/gho_/ghu_/ghs_/ghr_), AWS access keys (AKIA*/ASIA*),
HTTP `Bearer <token>`, Slack `xoxb-`/`xoxp-` etc., Hugging Face `hf_*`,
bare JWTs.
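redact_secrets presumably reduces to a sweep over that closed list. A sketch with illustrative patterns (the real module's regexes aren't shown here; prefixes are concatenated so no real-shape token appears in this text, same trick as the test fixtures below):

```python
import re

# Illustrative closed list — NOT the real pattern set.
_PATTERNS = [
    re.compile(r"\b" + "sk" + r"-[A-Za-z0-9_-]{16,}"),   # sk- key family
    re.compile(r"\b" + "ghp" + r"_[A-Za-z0-9]{20,}"),    # GitHub PAT
    re.compile(r"\bAKIA[A-Z0-9]{16}\b"),                 # AWS access key
    re.compile(r"Bearer\s+[A-Za-z0-9._-]{20,}"),         # HTTP bearer
]

def redact_secrets(text):
    """Replace secret-shaped substrings; pass everything else through."""
    if not text:
        return text
    for pat in _PATTERNS:
        text = pat.sub("<redacted-secret>", text)
    return text

token = "sk" + "-" + "a" * 24
assert redact_secrets(f"auth failed for {token}") == "auth failed for <redacted-secret>"
# Closed list, not entropy: a git SHA passes through untouched.
assert redact_secrets("at commit 0d0840d9d9") == "at commit 0d0840d9d9"
```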

Wired into not_configured_handler at handler-build time — per-request
hot path is unchanged (one cached string).

Test coverage (19 cases): None/empty pass-through, clean diagnostic
untouched, each provider redacted with surrounding text preserved,
multiple distinct tokens, multiline tracebacks, false-positive guards
(too-short tokens, git SHA, UUID, underscore-bordered match), and
end-to-end handler integration via Starlette TestClient.

Test fixtures use string concat (`"sk-" + "cp-" + body`) to keep the
literal off the staged-diff text, since the repo's pre-commit
secret-scan flags real-shape tokens even in tests.

`secret_redactor` registered in TOP_LEVEL_MODULES (drift gate).

Closes #2760
Pairs with: PR #2756, PR #2775

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:07:53 -07:00
Hongming Wang
4f4b6c4f90 test(runtime): pin PR #2756's card-vs-setup decoupling with build_routes helper
PR #2756's contract — card route always mounted regardless of
adapter.setup() outcome — lived inline in main.py's `# pragma: no cover`
boot sequence. A future refactor that re-coupled the two would have
silently bypassed PR #2756 and shipped the original "stuck booting
forever" UX again, with no pytest catching it.

This change extracts route assembly into workspace/boot_routes.py's
build_routes(card, executor, adapter_error) and pins the contract with
6 integration tests using Starlette's TestClient:

- test_card_route_serves_200_when_adapter_ready: happy path
- test_card_route_serves_200_when_adapter_failed: misconfigured boot,
  card still 200, skill stubs survive
- test_jsonrpc_returns_503_when_no_executor: full -32603 envelope with
  the adapter_error in error.data
- test_jsonrpc_returns_503_with_generic_when_no_error_string: fallback
  reason for the rare case main.py reaches this branch without one
- test_card_route_does_not_depend_on_executor: direct PR #2756
  regression guard — both branches MUST mount the card route
- test_executor_present_does_not_mount_not_configured_handler: sanity
  that a healthy workspace doesn't return -32603 to every request

Conftest stubs extended with a2a.server.routes / request_handlers
classes so the tests work under the existing a2a-mock infra (pattern
matches the AgentCard/AgentSkill stubs added for PR #2765).

main.py now calls build_routes; the inline if/else is gone. Same
production behaviour, cleaner shape, regression-proof.

Heavy a2a-sdk imports inside build_routes() are lazy (deferred to the
executor-only branch) so tests that only exercise the not-configured
path don't pull DefaultRequestHandler / InMemoryTaskStore.
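Under those constraints, build_routes has roughly this shape (a sketch of the contract only; the real helper returns Starlette routes and the real handlers build full JSON-RPC envelopes — plain tuples stand in here):

```python
def build_routes(card, executor, adapter_error):
    """Card route mounted in BOTH branches; only the JSON-RPC handler
    depends on executor presence (the PR #2756 contract)."""
    def card_handler():
        return 200, card  # reachable even on a failed boot

    if executor is None:
        def rpc_handler():
            reason = adapter_error or "agent not configured"
            return 503, {"error": {"code": -32603, "data": reason}}
    else:
        # Heavy a2a-sdk imports would be deferred to this branch.
        def rpc_handler():
            return 200, {"result": "dispatched to executor"}

    return [("/card", card_handler), ("/", rpc_handler)]

routes = dict(build_routes(card={"name": "w"}, executor=None, adapter_error="no key"))
assert routes["/card"]() == (200, {"name": "w"})   # card survives failed boot
assert routes["/"]()[0] == 503                      # not-configured envelope
```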

card_helpers + boot_routes registered in TOP_LEVEL_MODULES (build
drift gate would have caught the missing entry on the wheel-publish
smoke).

All 18 related tests pass (test_boot_routes.py: 6, test_card_helpers.py:
6, test_not_configured_handler.py: 6).

Closes #2761
Pairs with: PR #2756 (decouple agent-card from setup),
            PR #2765 (defensive isolation of enrichment + transcript)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:59:56 -07:00
Hongming Wang
63ac99788b fix(runtime): isolate card-skill enrichment + transcript handler from adapter shape mismatch
PR #2756 added a try/except around adapter.setup() so a missing LLM key
doesn't crash the workspace boot. Two paths that now run AFTER setup
succeeds were not similarly isolated, leaving small but real coupling
risks for future adapter authors.

1. **Skill metadata enrichment swap (main.py:248-259).** When
   adapter.setup() returns, main.py reads adapter.loaded_skills and
   replaces the static stubs in agent_card.skills with rich metadata
   (description, tags, examples). The list comprehension assumes each
   element exposes .metadata.{id,name,description,tags,examples}. A
   future adapter that returns a non-canonical shape would raise
   AttributeError, propagate to the outer except, capture as
   adapter_error, and silently degrade an OK boot to the
   not-configured state — even though setup() actually succeeded.

   Extract to card_helpers.enrich_card_skills(card, loaded_skills) →
   bool. Helper swallows enrichment failures, logs the cause, returns
   False, leaves the static stubs in place. setup() success path
   continues unchanged. 6 unit tests cover: None input, empty list,
   canonical happy path, missing .metadata attr, partial .metadata
   (missing one canonical field), atomic-failure-no-partial-swap.
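A helper with that contract can be sketched as follows (hypothetical shapes; the real card/skill types come from the a2a SDK and carry more fields):

```python
import logging

log = logging.getLogger("card_helpers")

def enrich_card_skills(card, loaded_skills) -> bool:
    """Swap static skill stubs for rich metadata; never raise.

    Returns False (stubs untouched) on any shape mismatch, so a bad
    adapter can't degrade a successful setup() to not-configured."""
    if not loaded_skills:
        return False
    try:
        enriched = [
            {
                "id": s.metadata.id,
                "name": s.metadata.name,
                "description": s.metadata.description,
            }
            for s in loaded_skills
        ]
    except AttributeError:
        log.warning("skill enrichment failed; keeping static stubs", exc_info=True)
        return False
    card["skills"] = enriched  # atomic: assigned only after the full build
    return True

card = {"skills": [{"id": "stub"}]}
assert enrich_card_skills(card, [object()]) is False   # missing .metadata
assert card["skills"] == [{"id": "stub"}]              # no partial swap
```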

2. **/transcript handler (main.py:513).** Calls await
   adapter.transcript_lines(...) without try/except. BaseAdapter's
   default returns {"supported": false} so today's 4 adapters never
   trigger this — but a future adapter override that assumes setup()
   ran would surface as a 500 from Starlette's default error handler
   instead of a useful 503 with the exception class + message.
   Inline try/except returns 503 with the reason, matching the
   not-configured JSON-RPC handler's pattern.

Both changes match the architectural principle the PR #2756 chain
established: availability (workspace reachable) is decoupled from
configuration / adapter behavior. Operators see useful errors instead
of silent degradation; future adapter authors can't accidentally
break tenant readiness with a shape mismatch.

Adds:
- workspace/card_helpers.py (~50 lines, 100% covered)
- workspace/tests/test_card_helpers.py (6 tests)
- AgentCard/AgentSkill/AgentCapabilities/AgentInterface stubs to
  workspace/tests/conftest.py so future card-related tests work
  under the existing a2a-mock infrastructure
- card_helpers in TOP_LEVEL_MODULES (drift gate would have caught it)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:15:27 -07:00
Hongming Wang
d1122f8d28 fix(build): register not_configured_handler in TOP_LEVEL_MODULES
The wheel-build drift gate caught the new module added in this PR —
without registering it, the published wheel would ship `import
not_configured_handler` un-rewritten, which would `ModuleNotFoundError`
at runtime under `molecule_runtime.main`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 10:24:02 -07:00
Hongming Wang
09010212a0 feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3)
Closes the recurrence path of PR #2556. The data fix realigned 8→4
templates in publish-runtime.yml's TEMPLATES variable, but the
underlying drift hazard was unguarded — the next manifest change
could silently leave cascade out of sync again.

This gate fails any PR that changes manifest.json or
publish-runtime.yml in a way that makes the cascade list diverge
from manifest workspace_templates (suffix-stripped). Either
direction is caught:

  missing-from-cascade  templates that won't auto-rebuild on a new
                       wheel publish (the codex-stuck-on-stale-runtime
                       bug class — PR #2512 added codex to manifest,
                       cascade wasn't updated, codex stayed pinned to
                       its last-built runtime version for weeks).

  extra-in-cascade     cascade dispatches to deprecated templates
                       (the wasted-API-calls + dead-CI-noise class —
                       PR #2536 pruned 5 templates from manifest;
                       cascade kept dispatching to all 8 until
                       PR #2556).

Triggers narrowly: only on PRs that touch manifest.json,
publish-runtime.yml, or the script itself. Fast (single grep+sed+comm
pipeline, no Go build).
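The comparison the shell pipeline performs is equivalent to this Python sketch (the exact suffix stripped from manifest names isn't shown in the log; "-template" is assumed here for illustration):

```python
def cascade_drift(manifest_templates, cascade_list, suffix="-template"):
    """Diff manifest workspace_templates (suffix-stripped) vs cascade."""
    want = {t.removesuffix(suffix) for t in manifest_templates}
    have = set(cascade_list)
    return sorted(want - have), sorted(have - want)  # (missing, extra)

missing, extra = cascade_drift(
    ["claude-code-template", "codex-template", "hermes-template"],
    ["claude-code", "hermes", "langgraph"],
)
assert missing == ["codex"]      # won't auto-rebuild on wheel publish
assert extra == ["langgraph"]    # dispatches to a deprecated template
```

Either non-empty list fails the PR, matching the two failure modes self-tested above.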

Surfaced during the RFC #388 prior-art audit; folded in as the
structural follow-up to the data fix #2556 promised.

Self-tested both failure modes locally before commit:
  - Drop codex from cascade → script fails with "MISSING: codex"
  - Add langgraph to cascade → script fails with "EXTRA: langgraph"

Refs: https://github.com/Molecule-AI/molecule-controlplane/issues/388

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 03:52:39 -07:00
Hongming Wang
6f8f7932d2 feat(ops): add sweep-aws-secrets janitor — orphan tenant bootstrap secrets
CP's deprovision flow calls Secrets.DeleteSecret() (provisioner/ec2.go:806)
but only when the deprovision runs to completion. Crashed provisions and
incomplete teardowns leak the per-tenant `molecule/tenant/<org_id>/bootstrap`
secret. At ~$0.40/secret/month, ~45 leaked secrets surfaced as ~$19/month
on the AWS cost dashboard.

The tenant_resources audit table (mig 024) tracks four kinds today —
CloudflareTunnel, CloudflareDNS, EC2Instance, SecurityGroup — and the
existing reconciler doesn't catch Secrets Manager orphans. The proper fix
(KindSecretsManagerSecret + recorder hook + reconciler enumerator) is filed
as a follow-up controlplane issue. This sweeper is the immediate stopgap.

Parallel-shape to sweep-cf-tunnels.sh:
  - Hourly schedule offset (:30, between sweep-cf-orphans :15 and
    sweep-cf-tunnels :45) so the three janitors don't burst CP admin
    at the same minute.
  - 24h grace window — never deletes a secret younger than the
    provisioning roundtrip, so an in-flight provision can't be raced.
  - MAX_DELETE_PCT=50 default (mirrors sweep-cf-orphans for durable
    resources; tenant secrets should track 1:1 with live tenants).
  - Same schedule-vs-dispatch hardening as the other janitors:
    schedule → hard-fail on missing secrets, dispatch → soft-skip.
  - 8-way xargs parallelism, dry-run by default, --execute to delete.
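The two safety gates (grace window + delete cap) reduce to checks of this shape — a Python sketch of logic the real script implements in shell; names and shapes are illustrative:

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(hours=24)
MAX_DELETE_PCT = 50

def plan_deletes(orphans, total_live, now=None):
    """Filter orphans by the grace window, then enforce the delete cap."""
    now = now or datetime.now(timezone.utc)
    aged = [s for s in orphans if now - s["created"] > GRACE]
    if total_live and 100 * len(aged) / total_live > MAX_DELETE_PCT:
        raise SystemExit(
            f"refusing to delete {len(aged)}/{total_live} secrets "
            f"(> {MAX_DELETE_PCT}%)"
        )
    return aged

now = datetime.now(timezone.utc)
orphans = [
    {"name": "old", "created": now - timedelta(days=3)},
    {"name": "in-flight", "created": now - timedelta(hours=1)},  # inside grace
]
assert [s["name"] for s in plan_deletes(orphans, total_live=10, now=now)] == ["old"]
```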

Requires a dedicated AWS_JANITOR_* IAM principal — the prod molecule-cp
principal lacks secretsmanager:ListSecrets (it only has scoped
Get/Create/Update/Delete). The workflow's verify-secrets step will hard-fail
on the first scheduled run until those secrets are configured, surfacing
the missing setup loudly rather than silently no-op'ing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 02:38:08 -07:00
Hongming Wang
9753d58539 fix(build): register event_log in TOP_LEVEL_MODULES
The wheel-build drift gate caught it correctly: any new top-level
module under workspace/ must be listed in TOP_LEVEL_MODULES so its
`from event_log import …` statements get rewritten to
`from molecule_runtime.event_log import …` at package time.

Without this entry, the published wheel ships event_log.py un-rewritten
and crashes at runtime with ModuleNotFoundError on first heartbeat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:19:30 -07:00
Hongming Wang
2552779d97
Merge pull request #2517 from Molecule-AI/test/all-runtimes-a2a-e2e-harness
test(e2e): unified A2A round-trip parity harness across all 4 runtimes
2026-05-02 11:40:14 +00:00
Hongming Wang
d88c160e56 test(e2e): wire SaaS auth headers (TENANT_ADMIN_TOKEN + TENANT_ORG_ID)
The harness needs Authorization + X-Molecule-Org-Id (per-tenant, NOT
CP_ADMIN_API_TOKEN) when targeting *.moleculesai.app subdomains.
The existing single-Origin-header form silently failed with 404s
against staging tenants, since the SaaS edge WAF rewrites unauthenticated
/workspaces calls to Next.js (per
reference_saas_waf_origin_header.md).

Switch to a headers array so multiple -H flags compose cleanly with
curl arg-quoting, and document the env var contract at the top of
the script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:36:23 -07:00
Hongming Wang
5aaac7d2d9 test(e2e): unified A2A round-trip parity harness across all 4 runtimes
Adds two scripts:

  scripts/test-all-runtimes-a2a-e2e.sh
    Provisions one workspace per runtime (claude-code, hermes, codex,
    openclaw), sets provider keys, waits online, sends two A2A messages
    per workspace. First message validates round-trip; second message
    validates session continuity. Cleans up via trap on EXIT.

  scripts/test-hermes-plugin-e2e.sh
    Hermes-only variant focused on the plugin /a2a/inbound path.
    Proof-point: session continuity between turns (the plugin path's
    deliverable; old chat-completions path lost context per turn).

Both honor SKIP_<runtime> env vars for incremental testing and tolerate
the SaaS edge WAF Origin header requirement (per
reference_saas_waf_origin_header.md).

Run:
  PLATFORM=https://demo-tenant.staging.moleculesai.app \
      ./scripts/test-all-runtimes-a2a-e2e.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:36:23 -07:00
Hongming Wang
8bf29b7d0e fix(sweep-cf-tunnels): parallelize deletes + raise workflow timeout
The hourly Sweep stale Cloudflare Tunnels job got cancelled mid-cleanup
on 2026-05-02 (run 25248788312, killed at 5min after deleting 424/672
stale tunnels). A second manual dispatch finished the remaining 254
fine, so the immediate backlog cleared, but two underlying bugs would
re-trip on the next big cleanup.

Bug 1: serial delete loop. The execute branch was a `while read; do
curl -X DELETE; done` pipeline at ~0.7s/tunnel — fine for the
steady-state cleanup of a handful, but a 600+ backlog needs ~7-8min.
This commit fans out to $SWEEP_CONCURRENCY (default 8) workers via
`xargs -P 8 -L 1 -I {} bash -c '...' _ {} < "$DELETE_PLAN"`. With 8x
parallelism the same 600+ list drains in ~60s. Notes:

  - We use stdin (`<`) not GNU's `xargs -a FILE` so the script stays
    portable to BSD xargs (matters for local-runner testing on macOS).
  - We pass ONLY the tunnel id on argv. xargs tokenizes on whitespace
    by default; tab-separating id+name on argv risks mangling. The
    name is kept in a side-channel id->name map ($NAME_MAP) and looked
    up by the worker only on failure, for FAIL_LOG readability.
  - Workers print exactly `OK` or `FAIL` on stdout; tally with
    `grep -c '^OK$' / '^FAIL$'`.
  - On non-zero FAILED, log the first 20 lines of $FAIL_LOG as
    "Failure detail (first 20):" — same diagnostic surface as before
    but consolidated so we don't spam logs on a flaky CF API.
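In Python terms the fan-out is equivalent to the following sketch — the pattern only (bounded workers, OK/FAIL tally, name side-channel looked up only on failure); the real worker is a bash one-liner under xargs -P:

```python
from concurrent.futures import ThreadPoolExecutor

SWEEP_CONCURRENCY = 8

def sweep(delete_plan, name_map, delete_fn):
    """Fan out deletes across workers; tally OK/FAIL like the grep -c step."""
    fail_log = []

    def worker(tunnel_id: str) -> str:
        try:
            delete_fn(tunnel_id)
            return "OK"
        except Exception as exc:
            # Name resolved only on failure, for FAIL_LOG readability.
            fail_log.append(f"{tunnel_id} ({name_map.get(tunnel_id, '?')}): {exc}")
            return "FAIL"

    with ThreadPoolExecutor(max_workers=SWEEP_CONCURRENCY) as pool:
        results = list(pool.map(worker, delete_plan))
    return results.count("OK"), results.count("FAIL"), fail_log

def flaky_delete(tid):
    if tid.endswith("7"):          # sentinel failure, as in the synthetic test
        raise RuntimeError("success=false")

ok, failed, log = sweep(["t1", "t7", "t3"], {"t7": "stale-tunnel"}, flaky_delete)
assert (ok, failed) == (2, 1) and "stale-tunnel" in log[0]
```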

Bug 2: workflow's 5-min cap was set as a hangs-detector but turned out
to be a real-job-too-slow detector. Raised to 30 min — generous
headroom for the ~60s steady-state run while still surfacing genuine
hangs (and in line with the sweep-cf-orphans companion job).

Bug 3 (drive-by): the existing trap was `trap 'rm -rf "$PAGES_DIR"'
EXIT`, which would have been silently overwritten by any later trap
registration. Replaced with a single `cleanup()` function that wipes
PAGES_DIR + all four new tempfiles (DELETE_PLAN, NAME_MAP, FAIL_LOG,
RESULT_LOG), called once via `trap cleanup EXIT`.

Verification:
  - bash -n scripts/ops/sweep-cf-tunnels.sh: clean
  - shellcheck -S warning scripts/ops/sweep-cf-tunnels.sh: clean
  - python3 yaml.safe_load on the workflow: clean
  - Synthetic 30-line delete plan with every 7th id sentinel'd to
    return {"success":false}: TEST PASS, DELETED=26 FAILED=4, FAIL_LOG
    side-channel name lookup verified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 02:35:46 -07:00
Hongming Wang
a117a60eed fix(sweep-cf-tunnels): buffer pages to disk to avoid argv ARG_MAX
The page-merge loop passed the entire accumulating tunnel JSON to
python3 -c via argv on every iteration. On a busy account (verified
2026-05-02: 672 tunnels, 14 pages on Hongmingwangrabbit account) this
exceeds the GH Ubuntu runner's combined argv+envp limit (~128 KB) and
dies with `python3: Argument list too long` at exit 126 — the workflow
has been silently failing this way since the very first run that hit a
real account, masked earlier by a missing-CF_ACCOUNT_ID secret check.

Buffer each page response to a file under a temp dir, merge from disk
at the end. Also bumps the page cap from 20 to 40 (1000 → 2000 tunnel
ceiling) so the existing soft-cap warning has headroom; the disk-merge
shape is O(n) in tunnel count rather than the previous O(n^2) so the
larger ceiling is cheap.
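The fixed merge shape, sketched: each page's JSON body goes to its own file, and one pass merges the `result` arrays from disk, so no page ever rides argv (file naming here is illustrative):

```python
import json
import tempfile
from pathlib import Path

def write_page(tmp_dir: Path, page_no: int, body: str) -> None:
    """Buffer one API page response to disk instead of accumulating on argv."""
    (tmp_dir / f"page-{page_no:03d}.json").write_text(body)

def merge_pages(tmp_dir: Path) -> list:
    """O(n) merge of all buffered pages' `result` arrays."""
    tunnels = []
    for page in sorted(tmp_dir.glob("page-*.json")):
        tunnels.extend(json.loads(page.read_text())["result"])
    return tunnels

d = Path(tempfile.mkdtemp())
write_page(d, 1, json.dumps({"result": [{"id": "t1"}, {"id": "t2"}]}))
write_page(d, 2, json.dumps({"result": [{"id": "t3"}]}))
assert [t["id"] for t in merge_pages(d)] == ["t1", "t2", "t3"]
```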

Verified locally against the live account (672 tunnels): script now
runs cleanly to the existing MAX_DELETE_PCT safety gate, which trips
at 99% > 90% as designed and surfaces the actual orphan backlog for
operator-driven cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 00:42:25 -07:00
Hongming Wang
b8fdbd9fab fix(runtime): register configs_dir in TOP_LEVEL_MODULES + drop alias
Wheel-build smoke gate detected `configs_dir` missing from
scripts/build_runtime_package.py:TOP_LEVEL_MODULES. Without it the
build would ship `import configs_dir` un-rewritten and every
external-runtime install would die on `ModuleNotFoundError` at first
import.

Two callers used `import configs_dir as _configs_dir` to belt-and-
suspenders against an imagined name collision, but the rewriter
rejects `import X as Y` because the rewrite would produce
`import molecule_runtime.X as X as Y` (invalid syntax). No actual
collision exists (only docstring/comment references). Switched to
plain `import configs_dir`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:13:57 -07:00
Hongming Wang
6d23611620 ops: demo-day freeze + rollback runbook
Demo-day preparation bundle for the funding demo (~2026-05-06). Adds:

- scripts/demo-freeze.sh — captures current ghcr.io
  workspace-template-* :latest digests for all 8 runtimes, then
  disables both cascade vectors that could re-tag :latest mid-demo:
  publish-runtime.yml in molecule-core (PATH 1 — staging push to
  workspace/** auto-bumps the wheel and fans out to 8 templates) and
  publish-image.yml in each of the 8 template repos (PATH 2 — direct
  template repo merge re-tags :latest). Defaults to dry-run; requires
  --execute to apply. Writes both digest + workflow receipts to
  scripts/demo-freeze-snapshots/.

- scripts/demo-thaw.sh — re-enables every workflow demo-freeze.sh
  disabled, keyed off the receipt timestamp. Defaults to executing
  (the inverse safety polarity from freeze, where the destructive
  default is dry-run). --dry-run prints without applying.

- scripts/demo-day-runbook.md — operator runbook indexing the six
  rollback levers (platform image rollback, template image rollback,
  tenant redeploy, workspace delete, Railway rollback, Vercel
  rollback) plus pre-warm timing and post-demo cleanup. Also covers
  read-only diagnostics for "is this working?" moments and the
  CP_ADMIN_API_TOKEN rotation step that must follow demo (the token
  gets copy-pasted into shells during incident response).

- scripts/demo-freeze-snapshots/.gitignore — generated freeze
  receipts are operational state, not source. Tracked .gitkeep so
  the directory exists when the script writes to it.

Both scripts dry-run-tested locally. Did not exercise --execute since
that would actually disable production workflows mid-development.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:04:30 -07:00
Hongming Wang
aacaba024c feat(wheel-smoke): exercise executor.execute() to catch lazy imports (#2275)
The existing wheel-publish smoke (`wheel_smoke.py`) only IMPORTS
`molecule_runtime.main` at module scope. Lazy imports buried inside
`async def execute(...)` bodies (e.g. `from a2a.types import FilePart`)
NEVER evaluate at static-import time — they crash at first message
delivery in production.

The 2026-04-2x v0→v1 a2a-sdk migration shipped 5 such regressions in
templates that all looked fine at module-load smoke. This change adds
`smoke_mode.py` plus a `MOLECULE_SMOKE_MODE=1` short-circuit in
`main.py`: after `adapter.create_executor(...)`, the boot path invokes
`executor.execute(stub_ctx, stub_queue)` once with a 5s timeout
(`MOLECULE_SMOKE_TIMEOUT_SECS`). Healthy import tree → execution
proceeds far enough to hit a network boundary and times out (exit 0).
Broken lazy import → `ImportError` / `ModuleNotFoundError` from inside
the executor body (exit 1). Other downstream errors (auth, validation)
pass — those are caught by adapter-level tests, not this gate.

Stub `(RequestContext, EventQueue)` is built from the real a2a-sdk so
SendMessageRequest/RequestContext constructor changes also surface as
import-tree failures (the regression class also includes "SDK
refactored mid-publish"). The stub-build itself is wrapped — if it
raises, that's a smoke fail too.
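The pass/fail classification amounts to the following sketch (real env-var wiring, stub construction, and exit plumbing live in smoke_mode.py / main.py; names here are illustrative):

```python
import asyncio

async def run_smoke(execute_coro, timeout_secs: float = 5.0) -> int:
    """Exit 0 iff the executor's import tree is healthy.

    Timeout = healthy (execution reached a network boundary).
    ImportError/ModuleNotFoundError = broken lazy import (exit 1).
    Other exceptions pass — they belong to adapter-level tests."""
    try:
        await asyncio.wait_for(execute_coro, timeout_secs)
    except asyncio.TimeoutError:
        return 0
    except ImportError:          # includes ModuleNotFoundError
        return 1
    except Exception:
        return 0
    return 0

async def broken_executor():
    raise ModuleNotFoundError("a2a.types")

async def hangs_at_network():
    await asyncio.sleep(60)

assert asyncio.run(run_smoke(broken_executor(), 0.1)) == 1
assert asyncio.run(run_smoke(hangs_at_network(), 0.1)) == 0
```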

Phase 2 (separate PR, molecule-ci) wires this into
publish-template-image.yml so the publish gate runs the boot smoke
against every template image before pushing the tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:21:18 -07:00
Hongming Wang
6e92fe0a08 chore: rewriter unit tests + drop misleading noqa on import inbox
Three small follow-ups to the PR #2433 → #2436 → #2439 incident chain.

1) `import inbox  # noqa: F401` in workspace/a2a_mcp_server.py was
   misleading — `inbox` IS used (at the bridge wiring inside main()).
   F401 means "imported but unused", which would mask a real future
   F401 if the usage is removed. Drop the noqa, keep the explanatory
   block comment about the rewriter's `import X` → `import mr.X as X`
   expansion (and the `import X as Y` → `import mr.X as X as Y` trap
   the comment exists to prevent re-introducing).

2) scripts/test_build_runtime_package.py — 17 unit tests covering
   `rewrite_imports()` and `build_import_rewriter()` in
   scripts/build_runtime_package.py. Until now the function had zero
   coverage despite the entire wheel build depending on it. Tests
   pin: bare-import aliasing, dotted-import preservation, indented
   imports, from-imports (simple + dotted + multi-symbol + block),
   the `import X as Y` rejection added in PR #2436 (with comment-
   stripping + indented + comma-not-alias edge cases), allowlist
   anchoring (`a2a` ≠ `a2a_tools`), and end-to-end reproduction
   of the PR #2433 failing pattern + the #2436 fix pattern.

3) Wire scripts/test_*.py into CI by adding a second discover pass
   to test-ops-scripts.yml. Top-level scripts/ tests live alongside
   their target file (parallels the scripts/ops/ test layout); the
   existing scripts/ops/ pass keeps running because scripts/ops/
   has no __init__.py so a single discover from scripts/ root
   doesn't recurse. Two passes is simpler than retrofitting
   namespace packages. Path filter widened from `scripts/ops/**`
   to `scripts/**` so PRs touching the build script trigger the
   new tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:45:32 -07:00
Hongming Wang
0acdf3bb56 fix(wheel): import inbox without alias to dodge rewriter collision
PR #2433 (notifications/claude/channel) shipped 'import inbox as
_inbox_module' inside a2a_mcp_server.py:main(). The build script's
import rewriter expands plain 'import inbox' to
'import molecule_runtime.inbox as inbox', so the original source
became 'import molecule_runtime.inbox as inbox as _inbox_module',
which is invalid Python.

Caught at the publish-runtime + PR-built-wheel-smoke gate (the
SyntaxError trace is in run 25200422679). The wheel didn't ship to
PyPI because publish-runtime's smoke-import step refused to install
it, but staging is currently sitting on a broken-build commit until
this fix-forward lands.

Changes:
- a2a_mcp_server.py: lift `import inbox` to top of file (rewriter
  produces clean `import molecule_runtime.inbox as inbox`), call
  inbox.set_notification_callback directly in main()
- build_runtime_package.py: rewrite_imports() now raises ValueError
  when it sees 'import X as Y' for any X in the workspace allowlist,
  instead of silently producing a syntax-error wheel. Operator gets
  a clear actionable error at build time pointing at the offending
  line + suggested rewrites ('from X import …' or plain 'import X').
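Per this description, the rewriter behaves roughly as follows (a simplified sketch; the real rewrite_imports also handles from-imports, blocks, indentation, and comment stripping, and its allowlist is the full TOP_LEVEL_MODULES):

```python
import re

ALLOWLIST = {"inbox", "configs_dir", "a2a_tools"}  # illustrative subset

def rewrite_line(line: str) -> str:
    # 'import X as Y' has no valid rewritten form — fail the build loudly.
    m = re.match(r"\s*import\s+(\w+)\s+as\s+\w+", line)
    if m and m.group(1) in ALLOWLIST:
        raise ValueError(
            f"cannot rewrite {line.strip()!r}: use 'from {m.group(1)} "
            f"import ...' or plain 'import {m.group(1)}'"
        )
    # Plain 'import X' expands to the aliased package-qualified form.
    m = re.match(r"(\s*)import\s+(\w+)\s*$", line)
    if m and m.group(2) in ALLOWLIST:
        ind, mod = m.groups()
        return f"{ind}import molecule_runtime.{mod} as {mod}"
    return line

assert rewrite_line("import inbox") == "import molecule_runtime.inbox as inbox"
try:
    rewrite_line("import inbox as _inbox_module")  # the PR #2433 pattern
    raised = False
except ValueError:
    raised = True
assert raised
```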

The build-time gate (this PR's rewriter check) catches the regression
class earlier than the smoke-time gate (PR #2433's failure). Adding
'PR-built wheel + import smoke' to staging branch protection's
required checks is filed separately so this class doesn't merge again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:21:54 -07:00
Hongming Wang
0a3ec53f34 feat(mcp): notifications/claude/channel for push-feel inbox UX
Adds a notification seam to the universal molecule-mcp wheel so push-
notification-capable MCP hosts (Claude Code today; any compliant
client tomorrow) get inbound A2A messages as conversation interrupts
instead of having to poll wait_for_message / inbox_peek.

Wire-up:
- inbox.py: module-level _NOTIFICATION_CALLBACK + set_notification_callback()
  Fires from InboxState.record() AFTER lock release, with same dict
  shape inbox_peek returns. Best-effort — a raising callback never
  prevents the message from landing in the queue.
- a2a_mcp_server.py: _build_channel_notification() pure helper +
  bridge wiring in main() that schedules notifications via
  asyncio.run_coroutine_threadsafe (poller is a daemon thread, MCP
  loop is asyncio).
- Method name 'notifications/claude/channel' matches the contract
  documented in molecule-mcp-claude-channel/server.ts:509.
- wheel_smoke.py: pin set_notification_callback as a published name,
  same regression class as the 0.1.16 main_sync incident.

Pollers (wait_for_message / inbox_peek) keep working unchanged for
runtimes without notification support.
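The callback seam can be sketched as follows (simplified; the real InboxState also dedupes, persists a cursor, and fires with the inbox_peek dict shape):

```python
import threading

_NOTIFICATION_CALLBACK = None

def set_notification_callback(cb):
    global _NOTIFICATION_CALLBACK
    _NOTIFICATION_CALLBACK = cb

class InboxState:
    def __init__(self):
        self._lock = threading.Lock()
        self._queue = []

    def record(self, msg: dict) -> None:
        with self._lock:
            self._queue.append(msg)
        # Fire AFTER lock release; best-effort — a raising callback
        # never prevents the message from landing in the queue.
        cb = _NOTIFICATION_CALLBACK
        if cb is not None:
            try:
                cb(msg)
            except Exception:
                pass

inbox = InboxState()
seen = []
set_notification_callback(lambda m: seen.append(m["text"]))
inbox.record({"text": "hi"})
assert seen == ["hi"] and len(inbox._queue) == 1

set_notification_callback(lambda m: 1 / 0)   # raising callback
inbox.record({"text": "still lands"})
assert len(inbox._queue) == 2
```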

Tests: 6 new in test_inbox.py (callback fires once on record, dedupe
short-circuits before fire, raising cb doesn't break inbox, set/clear
semantics), 5 new in test_a2a_mcp_server.py (method name pin, content
mapping, meta routing, no-id JSON-RPC notification spec, missing-
field tolerance). All 59 combined tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 20:10:01 -07:00
Hongming Wang
b47d4ceb00 feat(workspace-runtime): add inbox polling for standalone molecule-mcp path
The universal MCP server (a2a_mcp_server.py) was outbound-only — agents
in standalone runtimes (Claude Code, hermes, codex, etc.) could
delegate, list peers, and write memories, but never observed the
canvas-user or peer-agent messages addressed to them. This blocked
"constantly responding" loops without forcing operators back onto a
runtime-specific channel plugin.

This PR closes the inbound gap with a poller-fed in-memory queue and
three new MCP tools:

  - wait_for_message(timeout_secs?) — block until next message arrives
  - inbox_peek(limit?)              — list pending messages (non-destructive)
  - inbox_pop(activity_id)          — drop a handled message

A daemon thread polls /workspaces/:id/activity?type=a2a_receive every
5s, fills the queue from the cursor (since_id), and persists the cursor
to ${CONFIGS_DIR}/.mcp_inbox_cursor so a restart doesn't replay backlog.
On 410 (cursor pruned) we fall back to since_secs=600 for a bounded
recovery window. Activity-row → InboxMessage extraction mirrors the
molecule-mcp-claude-channel plugin's extractText (envelope shapes #1-3
+ summary fallback).

mcp_cli.main starts the poller alongside the existing register +
heartbeat threads. In-container runtimes (which have push delivery via
canvas WebSocket) skip activation, so inbox tools return an
informational "(inbox not enabled)" message instead of double-delivery.
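The cursor-persisting poll step can be sketched as below; `fetch_activity` stands in for the HTTP GET against `/workspaces/:id/activity`, and the real poller's 5s loop, 410 fallback to `since_secs=600`, and envelope extraction are elided — the function names and file format here are assumptions:

```python
import json
import os

def poll_once(fetch_activity, queue, cursor_path):
    """One poll iteration: resume from the persisted cursor, drain new
    rows into the in-memory queue, then persist the advanced cursor so
    a restart doesn't replay backlog."""
    since_id = None
    if os.path.exists(cursor_path):
        with open(cursor_path) as f:
            since_id = json.load(f).get("since_id")
    rows = fetch_activity(since_id=since_id)
    for row in rows:
        queue.append(row)
        since_id = row["id"]
    if since_id is not None:
        with open(cursor_path, "w") as f:
            json.dump({"since_id": since_id}, f)
    return since_id
```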

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:32:48 -07:00
Hongming Wang
169e284d57 feat(workspace-runtime): expose universal MCP server to runtime=external operators
Ship the baseline universal MCP path that any external runtime (Claude
Code, hermes, codex, anything that speaks MCP stdio) can use, before
optimizing per-runtime channels. Today the workspace MCP server only
spins up inside the container; external operators have no way to call
the 8 platform tools (delegate_task, list_peers, send_message_to_user,
commit_memory, etc.) from outside.

Three additive changes:

1. **`platform_auth.get_token()` env-var fallback** — adds
   `MOLECULE_WORKSPACE_TOKEN` as a fallback when no
   `${CONFIGS_DIR}/.auth_token` file exists. File-first preserves
   in-container behavior unchanged. External operators (no /configs
   volume) now have a way to supply the token without faking the
   filesystem layout.
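   The file-first / env-fallback order from item 1, sketched with simplified assumptions (the real `platform_auth.get_token` also caches and builds auth headers):

   ```python
   import os
   from typing import Optional

   def get_token(configs_dir: str) -> Optional[str]:
       """File-first preserves in-container behavior; the env var is
       only consulted when no token file exists. Empty or whitespace-
       only env is treated as unset."""
       token_file = os.path.join(configs_dir, ".auth_token")
       if os.path.exists(token_file):
           with open(token_file) as f:
               tok = f.read().strip()
           if tok:
               return tok
       env = os.environ.get("MOLECULE_WORKSPACE_TOKEN", "").strip()
       return env or None
   ```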

2. **`molecule-mcp` console script** — adds a new entry point in the
   published `molecule-ai-workspace-runtime` PyPI wheel. Operators run
   `pip install molecule-ai-workspace-runtime`, set 3 env vars
   (WORKSPACE_ID, PLATFORM_URL, MOLECULE_WORKSPACE_TOKEN), and register
   the binary in their agent's MCP config. `mcp_cli.main` is a thin
   validator wrapper — it checks env BEFORE importing the heavy
   `a2a_mcp_server` module so a misconfigured first-run gets a friendly
   3-line error instead of a 20-line module-level RuntimeError
   traceback.

3. **Wheel smoke gate** — extends `scripts/wheel_smoke.py` to assert
   `cli_main` and `mcp_cli.main` are importable. Same regression class
   as the 0.1.16 main_sync incident: a silent rename or unrewritten
   import here would break every external operator on the next wheel
   publish (memory: feedback_runtime_publish_pipeline_gates.md).

Test coverage:
- `tests/test_platform_auth.py` — 8 new tests for the env-var fallback:
  file-priority, env-fallback, whitespace handling, cache, header
  construction, empty-env-as-unset.
- `tests/test_mcp_cli.py` — 8 new tests for the validator: each
  required var separately, file-or-env satisfies token requirement,
  whitespace-only env treated as missing, help mentions canvas Tokens
  tab.
- Full `workspace/tests/` suite green: 1346 passed, 1 skipped.
- Local end-to-end: built wheel, installed in venv, ran `molecule-mcp`
  with no env → friendly error; with env → MCP server starts.

Why now / why this shape: the user's redirect was "support the baseline
first so all runtimes can use it, then optimize". A claude-only MCP
channel leaves hermes/codex/third-party operators broken on
runtime=external. This PR ships the runtime-agnostic baseline; per-
runtime polish (claude-channel push delivery, hermes-native
bindings) is a follow-up PR. PR #2412 fixed the partner bug where
canvas Restart silently revoked the operator's token — the two
together unblock the external-runtime story end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:20:19 -07:00
Hongming Wang
41d5f9558f ops: scripts/ops/check-prod-versions.sh — one-line "is each tenant on latest?"
Iterates a list of tenant slugs (default canary set on production,
operator-supplied on staging), curls each tenant's /buildinfo plus
canvas's /api/buildinfo, compares to origin/main's HEAD SHA, prints a
table with one of {current, stale, unreachable} per surface. Returns
non-zero if any surface is stale, so it can be wired into a periodic
alert later.

Why this exists: every "is the fix live?" question used to be
answered with a one-off curl + git rev-parse + manual diff. This
script does that uniformly across every public surface (workspace
tenants + canvas) and is parseable. The redeploy verifier (#2398)
covers the deploy moment; this covers any-time-after.
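The per-surface classification the table is built from reduces to a small pure function; this is a sketch of the logic only (the curl/gh plumbing and table printing are the script's own, and these names are assumptions):

```python
def classify(reported_sha, expected_sha):
    """Map one surface's /buildinfo SHA to {current, stale, unreachable}."""
    if reported_sha is None:
        return "unreachable"   # fetch failed or route 404'd
    return "current" if reported_sha == expected_sha else "stale"

def any_stale(surfaces, expected_sha):
    """Non-zero-exit condition: at least one surface lags origin/main."""
    return any(classify(sha, expected_sha) == "stale" for _, sha in surfaces)
```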

Reads EXPECTED_SHA from `gh api repos/Molecule-AI/molecule-core/
commits/main` so it always reflects the actual upstream tip, not
local working-copy state. Falls back to local origin/main with a
WARN if `gh` isn't logged in — the check stays useful for debugging
even if the comparison may lag.

Depends on:
- #2409 (TenantGuard /buildinfo allowlist) — without it every
  tenant looks "unreachable" because the route 404s before the
  handler. Already merged on staging; will hit production after
  the next staging→main fast-forward + redeploy.
- #2407 (canvas /api/buildinfo) — already on main + Vercel.

Usage:
  ./scripts/ops/check-prod-versions.sh                     # production canary set
  TENANT_SLUGS="a b c" ./scripts/ops/check-prod-versions.sh # custom set
  ENV=staging TENANT_SLUGS="..." ./scripts/ops/check-prod-versions.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 13:13:47 -07:00
Hongming Wang
ef206b5be6 refactor(ci): extract wheel smoke into shared script
publish-runtime.yml had a broad smoke (AgentCard call-shape, well-known
mount alignment, new_text_message) inline as a heredoc. runtime-prbuild-
compat.yml had a narrow inline smoke (just `from main import main_sync`).
Result: a PR could introduce SDK shape regressions that pass at PR time
and only fail at publish time, post-merge.

Extract the broad smoke into scripts/wheel_smoke.py and invoke it from
both workflows. PR-time gate now matches publish-time gate — same script,
same assertions. Eliminates the drift hazard of two heredocs that have
to be kept in lockstep manually.

Verified locally:
  * Built wheel from workspace/ source, installed in venv, ran smoke → pass
  * Simulated AgentCard kwarg-rename regression → smoke catches it as
    `ValueError: Protocol message AgentCard has no "supported_interfaces"
    field` (the exact failure mode of #2179 / supported_protocols incident)

Path filter for runtime-prbuild-compat extended to include
scripts/wheel_smoke.py so smoke-only edits get PR-validated. publish-
runtime path filter intentionally NOT extended — smoke-only edits should
not auto-trigger a PyPI version bump.

Subset of #131 (the broader "invoke main() against stub config" goal
remains pending — main() needs a config dir + stub platform server).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:52:07 -07:00
Hongming Wang
b5df2126b9 fix(test): convert migration-collision tests from pytest to unittest (#2341)
CI failure: the Ops scripts (unittest) job runs `python -m unittest
discover` which doesn't have pytest installed. test_check_migration_
collisions.py imported pytest unconditionally, failing module import:

  ImportError: Failed to import test module: test_check_migration_collisions
  Traceback (most recent call last):
    File ".../test_check_migration_collisions.py", line 12, in <module>
      import pytest
  ModuleNotFoundError: No module named 'pytest'

The tests use no pytest-specific features (just bare assert + plain
class). Sibling test_sweep_cf_decide.py in the same dir already uses
unittest.TestCase. Convert this one to match: drop the pytest import,
make TestMigrationFileRe inherit from unittest.TestCase.

unittest.TestLoader.discover() requires TestCase subclasses for
auto-discovery, so the fix is two lines (drop import, add base).
Bare assert statements work fine inside TestCase methods.

Verified: `python3 -m unittest scripts.ops.test_check_migration_collisions -v`
runs all 9 tests, all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 01:47:27 -07:00
Hongming Wang
ea8ff626a9 ci: hard gate against migration version collisions (#2341)
Two PRs targeting staging can each add a migration with the same
numeric prefix (e.g. 044_*.up.sql). Each passes CI independently.
They collide at merge time. Worst case: second migration silently
doesn't apply and prod schema drifts from what the code expects.

Caught manually 2026-04-30 during PR #2276 rebase: 044_runtime_image_pins
collided with 044_platform_inbound_secret from RFC #2312. This workflow
makes that detection automatic at PR-open time.

How it works:
  scripts/ops/check_migration_collisions.py runs on every PR that
  touches workspace-server/migrations/**. For each new/modified
  migration filename, extracts the numeric prefix and checks:

  1. Does the base branch already have a DIFFERENT migration file with
     the same prefix? (PR branched off an old base, base advanced and
     another PR landed the same number — needs rebase.)

  2. Is another OPEN PR (not this one) also adding a migration with
     the same prefix? (Race-window collision — both pass CI separately,
     would collide at merge time.)

Either case → exit 1 with a clear ::error:: message naming the
conflicting PR(s) so the author knows what to renumber.

Implementation notes:
  - Uses git ls-tree (not working-tree walk) so it works against any
    base ref without checkout.
  - Uses gh pr diff --name-only per open PR, bounded by `gh pr list
    --limit 100`. ~30s worst case for a busy repo, <5s normally.
  - --diff-filter=AM picks up Added or Modified — renaming a migration
    in place is also flagged (intentional; renaming migrations isn't
    safe).
  - Same filename in both PR and base = no collision (PR is editing
    in-place, fine).
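The prefix classifier and the base-branch check (case 1) can be sketched like this — the exact regex and helper names in scripts/ops/check_migration_collisions.py may differ, so treat these as illustrative:

```python
import re

MIGRATION_RE = re.compile(r"^(\d+)_.+\.(up|down)\.sql$")

def prefix_of(filename):
    """Numeric prefix of a migration filename, or None for non-migrations."""
    m = MIGRATION_RE.match(filename)
    return m.group(1) if m else None

def collides(pr_files, base_files):
    """Case 1: PR migrations whose prefix is already taken by a DIFFERENT
    file on the base branch. Same filename in both = in-place edit, fine."""
    base_by_prefix = {}
    for f in base_files:
        p = prefix_of(f)
        if p:
            base_by_prefix.setdefault(p, set()).add(f)
    hits = []
    for f in pr_files:
        p = prefix_of(f)
        if p and any(b != f for b in base_by_prefix.get(p, ())):
            hits.append((f, sorted(base_by_prefix[p] - {f})))
    return hits
```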

Tests:
  scripts/ops/test_check_migration_collisions.py — 9 cases on the
  regex classifier (the load-bearing piece). End-to-end git/gh path
  is exercised by running the workflow against real PRs.

Hard-gates Tier 1 item 1 (#2341). Cheapest, cleanest gate. Catches
one specific class of merge-time foot-gun automatically.

Refs hard-gates discussion 2026-04-30. Tier 1 of 4 (others tracked
in #2342, #2343, #2344).
2026-04-29 21:42:42 -07:00
Hongming Wang
88da3d523b fix(dev-start): detect missing Go and fall back to docker-compose platform
Issue: scripts/dev-start.sh assumed `go` was on PATH; on a fresh dev
box without Go installed, line 111 (`go run ./cmd/server`) failed
with `go: not found` and the script bailed before printing the
readiness banner. The script's own prerequisite list (line 13-21)
said "Go 1.25+" but there was no signpost between "open the doc" and
"command not found."

Fix: detect `go` via `command -v`. If present, keep the existing
`go run` path (fast iteration, attaches to local log). If not,
fall back to `docker compose up -d --build platform` which uses the
published platform container — slower first run but the script
still works without forcing the dev to install Go just to read logs.
Either path leaves /health on :8080 so the rest of the script's
wait loop is unchanged.

If both paths fail, the error message names the install URL
(https://go.dev/dl/) and the fallback diagnostic (`/tmp/molecule-platform.log`)
so the dev has a single, actionable next step.

Verified: `sh -n` syntax check passes.

Closes #2329 item 2.
2026-04-29 20:04:37 -07:00
Hongming Wang
3a6d2f179d feat(ops): add sweep-cf-tunnels janitor — orphan Cloudflare Tunnels accumulate
CP's tenant-delete cascade removes the DNS record (with sweep-cf-orphans
as a backstop) but does NOT delete the underlying Cloudflare Tunnel.
Each E2E provision creates one Tunnel named `tenant-<slug>`; without
cleanup these accumulate indefinitely on the account, consuming the
tunnel quota and cluttering the dashboard.

Observed 2026-04-30: dozens of `tenant-e2e-canvas-*` tunnels in Down
state with zero replicas, weeks past their tenant's deletion. Same
class of bug as the DNS-records leak that drove sweep-cf-orphans
(controlplane#239).

Parallel-shape to sweep-cf-orphans:
  - Same dry-run-by-default + --execute pattern
  - Same MAX_DELETE_PCT safety gate (default 90% — higher than DNS
    sweep's 50% because tenant-shaped tunnels are orphans by design)
  - Same schedule/dispatch hardening (hard-fail on missing secrets
    when scheduled, soft-skip when dispatched)
  - Cron offset to :45 to avoid CF API bursts colliding with the DNS
    sweep at :15

Decision rules (in order):
  1. Name doesn't match `tenant-<slug>` → keep (unknown — never sweep
     tunnels that might belong to platform infra).
  2. Tunnel has active connections (status=healthy or non-empty
     connections array) → keep (defense-in-depth: don't kill a live
     tunnel even if CP forgot the org).
  3. Slug ∈ {prod_slugs ∪ staging_slugs} → keep.
  4. Otherwise → delete (orphan).
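The four rules above as a pure decide function — the tunnel dict loosely mirrors the Cloudflare API (`name`/`status`/`connections`), but the field names and shapes here are assumptions for the sketch, not the script's actual parsing:

```python
import re

TENANT_RE = re.compile(r"^tenant-([a-z0-9-]+)$")

def decide(tunnel, known_slugs):
    m = TENANT_RE.match(tunnel.get("name", ""))
    if not m:
        return "keep"        # rule 1: unknown name — never sweep infra tunnels
    if tunnel.get("status") == "healthy" or tunnel.get("connections"):
        return "keep"        # rule 2: live tunnel, defense-in-depth
    if m.group(1) in known_slugs:
        return "keep"        # rule 3: slug still provisioned somewhere
    return "delete"          # rule 4: orphan
```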

Verified by:
  - shell syntax check (bash -n)
  - YAML lint
  - Decide-logic offline smoke (7 cases, all pass)
  - End-to-end dry-run smoke with stubbed CP + CF APIs

Required secrets (added to existing org-secrets):
  CF_API_TOKEN          must include account:cloudflare_tunnel:edit
                        scope (separate from zone:dns:edit used by
                        sweep-cf-orphans — same token if scope is
                        broad, or a new token if narrowly scoped).
  CF_ACCOUNT_ID         account that owns the tunnels (visible in
                        dash.cloudflare.com URL path).
  CP_PROD_ADMIN_TOKEN   reused from sweep-cf-orphans.
  CP_STAGING_ADMIN_TOKEN reused from sweep-cf-orphans.

Note: CP-side root cause (tenant-delete should cascade to tunnel
delete) is in molecule-controlplane and worth fixing separately. This
janitor is the operational backstop in the meantime — the same pattern
was applied to DNS records while that root cause went unaddressed.
2026-04-29 19:42:47 -07:00
Hongming Wang
e955597a98 feat(chat_files): rewrite Download as HTTP-forward (RFC #2312, PR-D)
Mirrors PR-C's Upload migration: replaces the docker-cp tar-stream
extraction with a streaming HTTP GET to the workspace's own
/internal/file/read endpoint. Closes the SaaS gap for downloads —
without this PR, GET /workspaces/:id/chat/download still returns 503
on Railway-hosted SaaS even after A+B+C+F land.

Stacks: PR-A #2313 → PR-B #2314 → PR-C #2315 → PR-F #2319 → this PR.

Why a single broad /internal/file/read instead of /internal/chat/download:

  Today's chat_files.go::Download already accepts paths under any of the
  four allowed roots {/configs, /workspace, /home, /plugins} — it's not
  strictly chat. Future PRs (template export, etc.) will reuse this
  endpoint via the same forward pattern; reusing avoids three near-
  identical handlers (one per domain) with duplicated path-safety logic.

Path safety is duplicated on platform + workspace sides — defence in
depth via two parallel checks, not "trust the workspace."

Changes:
  * workspace/internal_file_read.py — Starlette handler. Validates path
    (must be absolute, under allowed roots, no traversal, canonicalises
    cleanly). lstat (not stat) so a symlink at the path doesn't redirect
    the read. Streams via FileResponse (no buffering). Mirrors Go's
    contentDispositionAttachment for Content-Disposition header.
  * workspace/main.py — registers GET /internal/file/read alongside the
    POST /internal/chat/uploads/ingest from PR-B.
  * scripts/build_runtime_package.py — adds internal_file_read to
    TOP_LEVEL_MODULES so the publish-runtime cascade rewrites its
    imports correctly. Also includes the PR-B additions
    (internal_chat_uploads, platform_inbound_auth) since this branch
    was rooted before PR-B's drift-gate fix; merge-clean alphabetic
    additions.
  * workspace-server/internal/handlers/chat_files.go — Download
    rewritten as streaming HTTP GET forward. Resolves workspace URL +
    platform_inbound_secret (same shape as Upload), builds GET request
    with path query param, propagates response headers (Content-Type /
    Content-Length / Content-Disposition) + body. Drops archive/tar
    + mime imports (no longer needed). Drops Docker-exec branch entirely
    — Download is now uniform across self-hosted Docker and SaaS EC2.
  * workspace-server/internal/handlers/chat_files_test.go — replaces
    TestChatDownload_DockerUnavailable (stale post-rewrite) with 4
    new tests:
      - TestChatDownload_WorkspaceNotInDB → 404 on missing row
      - TestChatDownload_NoInboundSecret → 503 on NULL column
        (with RFC #2312 detail in body)
      - TestChatDownload_ForwardsToWorkspace_HappyPath → forward shape
        (auth header, GET method, /internal/file/read path) + headers
        propagated + body byte-for-byte
      - TestChatDownload_404FromWorkspacePropagated → 404 from
        workspace propagates (NOT remapped to 500)
    Existing TestChatDownload_InvalidPath path-safety tests preserved.
  * workspace/tests/test_internal_file_read.py — 21 tests covering
    _validate_path matrix (absolute, allowed roots, traversal, double-
    slash, exact-match-on-root), 401 on missing/wrong/no-secret-file
    bearer, 400 on missing path/outside-root/traversal, 404 on missing
    file, happy-path streaming with correct Content-Type +
    Content-Disposition, special-char escaping in Content-Disposition,
    symlink-redirect-rejection (lstat-not-stat protection).
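The path-safety matrix that internal_file_read.py validates can be sketched as below; this is a simplified assumption of the checks (absolute, allowed roots, no traversal, canonicalises cleanly) — the real handler additionally `lstat()`s the target so a symlink can't redirect the read, which a pure string check cannot express:

```python
import os

ALLOWED_ROOTS = ("/configs", "/workspace", "/home", "/plugins")

def validate_path(path: str) -> bool:
    if not path.startswith("/"):
        return False                     # must be absolute
    # A path that changes under normalisation ("..", "//", trailing "/")
    # is rejected outright rather than silently canonicalised.
    if os.path.normpath(path) != path:
        return False
    return any(
        path == root or path.startswith(root + "/")
        for root in ALLOWED_ROOTS
    )
```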

Test results:
  * go test ./internal/handlers/ ./internal/wsauth/ — green
  * pytest workspace/tests/ — 1292 passed (was 1272 before PR-D)

Refs #2312 (parent RFC), #2308 (chat upload+download 503 incident).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:19:02 -07:00
Hongming Wang
499fed5080 docs(scripts): rename /heartbeat-history → /activity in README
PR #2265 renamed the harness trace endpoint and event name; sync the
cross-repo scripts/README.md to match.

Closes #2270

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 02:23:00 -07:00
Hongming Wang
de01544d6b fix(harness-runner): switch from non-existent /heartbeat-history to /activity
The runner was speculatively calling `/workspaces/:id/heartbeat-history` —
that endpoint doesn't exist on workspace-server. On local dev it 404'd;
on tenant builds the platform's :8080 canvas-proxy fallback intercepted
it and returned 28KB of Next.js HTML which then landed in the JSON event
log. Neither outcome was useful trace data.

`GET /workspaces/:id/activity` is the existing endpoint that reads
activity_logs. That table already records the events the RFC §V1.0
step 6 'platform-side transition' check needs (a2a_send / a2a_receive /
task_update / agent_log / error, plus duration_ms + status). Rename
the runner's fetch + emitted event accordingly.

Verified: GET /workspaces/<uuid>/activity?since_secs=60 returns 200
with `[]` against the local platform; no SaaS skip needed since the
endpoint exists in both environments.

Refs: molecule-core#2256 (V1.0 gate #1 measurement comment).
2026-04-28 23:12:51 -07:00
Hongming Wang
dd5c54dbaa fix(harness-runner): WAIT_ONLINE_SECS round-up + SaaS heartbeat skip + UUID/slug validation
Three review-driven fixes to the runner before #2261 merges:

1. `WAIT_ONLINE_SECS / 3` truncated; an operator passing 200 actually
   waited 198s. Round up so 200 → 67 polls × 3s = 201s ≥ requested.

2. The heartbeat-history endpoint isn't on tenant workspace-servers —
   the platform's :8080 fallback proxies unmatched paths to the
   canvas Next.js, so the SaaS run captured 28KB of HTML in the
   `heartbeat_trace` event log. Skip the fetch in MODE=saas; emit an
   explicit `<skipped: ...>` placeholder. Local mode behaviour
   unchanged.

3. ORG_ID and ORG_SLUG had no client-side format check, so a typo'd
   value got swallowed by TenantGuard's intentionally-opaque 404
   (which doesn't tell the operator whether slug, UUID, or auth was
   wrong). Validate UUID and slug shape up front so format mismatches
   produce actionable client-side errors.
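The round-up in fix 1 is ceiling division; a one-function sketch (names assumed, arithmetic from the message):

```python
def poll_count(wait_secs: int, interval: int = 3) -> int:
    """Ceiling division via -(-a // b): total polled time always covers
    the requested wait instead of truncating below it."""
    return -(-wait_secs // interval)
```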

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:29:29 -07:00
Hongming Wang
00e4766046 docs: registry pattern + harness scripts READMEs
Two docs covering load-bearing patterns from today's work that
weren't previously discoverable:

1. workspace/platform_tools/README.md — explains the ToolSpec
   single-source-of-truth pattern (#2240), the CLI-block alignment
   gap that hand-maintained generation can't close (#2258), the
   snapshot golden files + LF-pinning (#2260), and the add/rename/
   remove playbook. The next reader who lands in
   workspace/platform_tools/ now has the design rationale + the
   safe-edit procedure colocated with the code.

2. scripts/README.md — disambiguates the three measure-coordinator-
   task-bounds.sh files that now exist across two repos:

     - scripts/measure-coordinator-task-bounds.sh        (canonical OSS, this repo)
     - scripts/measure-coordinator-task-bounds-runner.sh (Hermes/MiniMax variant, this repo)
     - scripts/measure-coordinator-task-bounds.sh        (production-shape, in molecule-controlplane)

   Cross-references reference_harness_pair_pattern (auto-memory) for
   the cross-repo design rationale. Documents the common safety
   pattern (cleanup trap, DRY_RUN, non-target guard,
   cleanup_*_failed events) and the heartbeat-trace caveat.

Refs: #2240, #2254, #2257, #2258, #2259, #2260; molecule-controlplane#321.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:19:40 -07:00
Hongming Wang
592f47694b feat(harness): SaaS routing + provider-agnostic config for RFC #2251 measurement
The original measure-coordinator-task-bounds.sh was hardcoded for
local-dev (workspace-server on :8080) with claude-code/langgraph
templates and OPENROUTER_API_KEY. Running it against staging requires
both auth-chain plumbing (per-tenant ADMIN_TOKEN + X-Molecule-Org-Id
TenantGuard header + tenant subdomain routing) and template/secret
flexibility (e.g. Hermes/MiniMax for Token Plan keys).

This adds:

* `measure-coordinator-task-bounds-runner.sh` — separate runner that
  wraps the same workspace-server API calls but takes everything as
  env-var inputs. Two MODE values:
  - `local`   → direct workspace-server (no auth/tenant scoping)
  - `saas`    → tenant subdomain + per-tenant ADMIN_TOKEN bearer +
                X-Molecule-Org-Id TenantGuard header. Auto-fetches
                tenant token via CP /cp/admin/orgs/<slug>/admin-token
                given ORG_SLUG + CP_ADMIN_API_TOKEN, OR accepts a
                pre-resolved TENANT_ADMIN_TOKEN.

* Configurable PM_TEMPLATE / CHILD_TEMPLATE / MODEL / SECRET_NAME /
  SECRET_VALUE — defaults match the original (claude-code-default +
  langgraph + OpenRouter). Hermes/MiniMax example documented in the
  header.

* Per-poll status_change events during wait_online, so a workspace
  that never reaches online surfaces its last status (provisioning,
  failed, etc.) instead of a bare timeout.

* WAIT_ONLINE_SECS knob (default 180s; SaaS cold-start needs ~420s
  for first hermes-image pull on a freshly-provisioned EC2 tenant).

* `${args[@]+...}` guard on the api() helper — avoids `set -u`
  exploding on an empty header array (the local-dev hot-path).

The original script also gained a SECRET_VALUE block earlier in the
session — that change (separately staged) makes the secret-name
configurable without forcing every operator through the new runner.

V1.0 gate #1 (RFC #2251, Issue 4 repro) measurement results posted
as a separate comment on molecule-core#2256.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:06:18 -07:00
Hongming Wang
6e5b5c4142 fix(harness): cleanup_failed event + drop misleading exit_code capture
Self-review follow-ups on #2257:

- Drop `local exit_code=$?` from cleanup(). `trap`-handler return values
  are ignored, so capturing $? only misled a future reader into thinking
  exit-code preservation was happening.

- Replace silenced `>/dev/null 2>&1` DELETE with `-w '%{http_code}'`
  capture. ADMIN_TOKEN expiring mid-run was the realistic failure mode
  here — previously we swallowed it under the silenced redirect, leaving
  workspaces leaked with no signal. Now a 401/403/5xx surfaces as a
  `cleanup_failed` JSON event with a remediation hint pointing at
  cleanup-rogue-workspaces.sh; 404 is treated as success (the
  post-condition — workspace absent — holds).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:00:38 -07:00
Hongming Wang
039a41cce3 fix(harness): cleanup trap + tenant scoping + dry-run for measure-coordinator-task-bounds
Three follow-ups from #2254 code review before the harness is safe to
run against staging:

1. Cleanup trap. Workspaces are now auto-deleted on EXIT/INT/TERM. A
   Ctrl-C mid-run no longer leaks the PM + Researcher pair against
   shared infra. KEEP_WORKSPACES=1 opts out for post-run inspection.

2. Tenant scoping + admin auth. Non-localhost PLATFORM values now
   require both ADMIN_TOKEN and TENANT_ID; the script refuses to run
   without them. The previous version sent unauthenticated POSTs that,
   on staging, would either 401 every request or — worse — provision
   into the wrong tenant. Memory `feedback_never_run_cluster_cleanup_
   tests_on_live_platform` calls out the same hazard class.

3. DRY_RUN=1 mode. Prints platform target, tenant id, auth fingerprint,
   and the planned actions, then exits before any state mutation. The
   intended pre-flight before running against staging.

Also tightened OR_KEY check (the chained default silently accepted an
empty OPENROUTER_API_KEY) and added a heartbeat-trace caveat to the
interpretation guide explaining what `<endpoint_unavailable>` means
for the bound question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 20:38:35 -07:00
daea27641f
Merge pull request #2254 from Molecule-AI/docs/rfc-2251-issue-4-repro-harness
docs(rfc-2251): add coordinator task-bounds measurement harness
2026-04-29 02:03:23 +00:00
Hongming Wang
acd7fe76a5 docs(rfc-2251): add coordinator task-bounds measurement harness
Adds a reproduction harness for Issue 4 of the 2026-04-28 CP review,
referenced in RFC molecule-core#2251. The RFC review (issue #2251
comment) flagged that Issue 4 was hypothesized but not reproduced
before V1.0 implementation begins — this script closes that gap.

What it does:
  - Provisions a coordinator (PM, claude-code-default) + 1 child
    (Researcher, langgraph) via the platform API.
  - Sends an A2A kickoff with a synthesis-heavy task that requires
    SYNTHESIS_DEPTH (default 3) sequential delegations followed by a
    600-word post-delegation synthesis.
  - Times the coordinator's full A2A round-trip with millisecond
    precision and emits one JSON event per phase (machine-readable).
  - Pulls the coordinator's heartbeat trace post-run so the team can
    see whether any platform-side state transition fired during the
    long synthesis (the V1.0 RFC's MAX_TASK_EXECUTION_SECS would
    surface as such a transition; absence of one in this trace
    confirms the RFC's premise).

Why a measurement harness, not a pass/fail test:
  Issue 4's claim is "absence of platform-side bound", which is hard
  to assert in a single CI run. Outputting structured measurement
  data lets the team interpret across multiple runs / staging vs
  prod / different SYNTHESIS_DEPTH values rather than relying on one
  reproduction snapshot.

The script's header has the full interpretation guide:
  - ELAPSED < 60s     → not informative (LLM was just fast)
  - 60–300s           → within DELEGATION_TIMEOUT, ambiguous
  - >= 300s without trace transitions → BUG CONFIRMED
  - curl_failed       → coordinator hung past A2A_TIMEOUT or genuinely
                        slow (disambiguate by querying status separately)

Doesn't run in CI by default — invoked manually against staging or a
local platform with PLATFORM=... and OPENROUTER_API_KEY=... env vars.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 18:58:39 -07:00
Hongming Wang
f323def18f chore(build): include platform_tools in runtime wheel SUBPACKAGES
The PR-built wheel + import smoke gate refused the platform_tools
package because it's a new subdirectory under workspace/ that wasn't
in scripts/build_runtime_package.py:SUBPACKAGES. The drift gate (which
exists for exactly this reason) caught it cleanly:

  error: SUBPACKAGES drifted from workspace/ subdirectories:
    in workspace/ but NOT in SUBPACKAGES (will ship un-rewritten or
    be excluded): ['platform_tools']

Adding platform_tools to SUBPACKAGES wires the package into the
runtime wheel + applies the canonical
  from platform_tools.<x> -> from molecule_runtime.platform_tools.<x>
import-rewrite step that every other subpackage uses.

Verified locally: scripts/build_runtime_package.py succeeds, the
rewritten a2a_mcp_server.py reads
  from molecule_runtime.platform_tools.registry import TOOLS
which matches the package layout in the wheel.
2026-04-28 17:19:00 -07:00
Hongming Wang
f2c3594abc feat(dev-start): true single-command spinup — infra + templates + auth posture
Manual fresh-user clean-slate test surfaced three friction points in
the existing dev-start.sh:

  1. The script ran docker compose -f docker-compose.infra.yml
     directly, bypassing infra/scripts/setup.sh — so the workspace
     template registry was never populated and the canvas template
     palette came up empty (the "Template palette is empty"
     troubleshooting hit).
  2. ADMIN_TOKEN was not handled at all. Without it, the AdminAuth
     fail-open gate worked initially but slammed shut the moment the
     first workspace registered a token — at which point the canvas
     could no longer call /workspaces or /templates. New users hit
     401s with no obvious next step.
  3. The script wasn't mentioned in docs/quickstart.md. New users
     followed the documented 4-step manual flow and never discovered
     the single command existed.

Fixes:

  - dev-start.sh now calls infra/scripts/setup.sh, which brings up
    full infra (postgres + redis + langfuse + clickhouse + temporal)
    AND populates the template/plugin registry from manifest.json.
  - On first run, dev-start.sh writes MOLECULE_ENV=development to
    .env. This activates middleware.isDevModeFailOpen() which lets
    the canvas keep calling admin endpoints without a bearer (the
    intended local-dev escape hatch). The .env is preserved on
    re-runs and sourced before the platform launches.
  - The script intentionally does NOT auto-generate an ADMIN_TOKEN.
    A first attempt did, and broke the canvas because isDevModeFailOpen
    requires ADMIN_TOKEN empty AND MOLECULE_ENV=development together.
    Setting ADMIN_TOKEN in dev would close the hatch and the canvas
    has no way to read that token in a dev build (no
    NEXT_PUBLIC_ADMIN_TOKEN bake step here). The .env comment block
    explicitly warns future contributors not to add it.
  - Both processes' logs go to /tmp/molecule-{platform,canvas}.log
    instead of stdout-mixed so the readiness banner stays clean.
  - Health-poll loops cap at 30s with a clear timeout error pointing
    to the log file, instead of hanging forever.
  - The readiness banner now lists the log paths AND tells the user
    the next step is "open localhost:3000 → add API key in Config →
    Secrets & API Keys → Global", instead of just listing service
    URLs.

Quickstart doc rewrite leads with:

    git clone ...
    cd molecule-monorepo
    ./scripts/dev-start.sh

The 4-step manual flow is preserved as "Manual setup (advanced)"
for contributors who want per-component logs.

Verified end-to-end from clean Docker (no containers, no volumes,
no .env) three times: total wall-clock ~12s for a re-run with
cached npm/docker layers. Platform's HTTP 200 on /workspaces
without a bearer confirms the dev-mode auth hatch is active.
2026-04-27 16:29:37 -07:00