molecule-core

Author	SHA1	Message	Date
Hongming Wang	6589929f87	docs: document the two PR auto-merge safety guards Adds a section to CONTRIBUTING.md → "Pull Requests" explaining the two system-level guards that protect against the "I enabled auto-merge then pushed more commits" race: 1. Repo-wide setting: "Automatically delete head branches" (catches pushes to a merged-and-deleted branch — the post-merge orphan case). 2. CI workflow `pr-guards` calling molecule-ci's disable-auto-merge-on-push (catches pushes during queue processing — disables auto-merge, posts a comment, requires explicit re-engage). Why doc-not-just-memory: my agent-side memory is local. Other contributors on other machines need this in the repo where they read it. Cites the 2026-04-27 PR #2174 incident with the specific commit SHAs that got orphaned. Companion: molecule-ci README updated separately to document the reusable workflow under "What each workflow validates" so devs who land in the molecule-ci repo first can find the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 06:45:55 -07:00
Hongming Wang	a354ae2feb	Merge pull request #2174 from Molecule-AI/fix/lib-subpackage-and-drift-gate fix(build): ship lib/ subpackage + extend drift gate to SUBPACKAGES	2026-04-27 13:07:00 +00:00
Hongming Wang	6e732ab714	fix(build): ship lib/ subpackage + extend drift gate to SUBPACKAGES Two compounding bugs that bit hermes (and any other workspace that reaches main.py:142): 1. workspace/lib/ was in EXCLUDE_DIRS so the published wheel didn't contain the directory at all. main.py imports `from lib.pre_stop import read_snapshot` (and `build_snapshot`, `write_snapshot`) so every workspace startup that reaches the snapshot path crashed with `ModuleNotFoundError: No module named 'lib'`. 2. Even if lib/ had shipped, `lib` wasn't in SUBPACKAGES so the import-rewriter would have left the bare `from lib.pre_stop` unqualified — it would still fail because the package would only be reachable as `molecule_runtime.lib`. Fix: move `lib` from EXCLUDE_DIRS to SUBPACKAGES (one entry each). Drift gate extension: the existing gate I added in #2163 only asserted TOP_LEVEL_MODULES against workspace/*.py. This change adds the symmetric assertion for SUBPACKAGES against workspace/<dir>/ (filtered by EXCLUDE_DIRS + presence of __init__.py). Catches both: - Subpackage added to workspace/ but missed in SUBPACKAGES - Subpackage missing from workspace/ but lingering in SUBPACKAGES - Subpackage wrongly in EXCLUDE_DIRS while also referenced by rewritten imports (the lib case) Tested locally: build of 0.1.99 now ships lib/ and main.py contains `from molecule_runtime.lib.pre_stop import ...` correctly rewritten. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 06:03:46 -07:00
Hongming Wang	1100c50da8	Merge pull request #2172 from Molecule-AI/feat/e2e-cover-all-8-runtimes feat(e2e): extend priority-runtimes test to cover all 8 templates	2026-04-27 13:00:43 +00:00
Hongming Wang	c7478af99f	feat(e2e): extend priority-runtimes test to cover all 8 templates Tonight's wire-real E2E sweep exposed 12+ root causes across the post- #87 template extraction. Most would have been caught by an actual provision-and-online test running on each template — but the test only covered claude-code + hermes. Extending it to cover all 8 ensures any future regression in any template fails the test, not production. What's added: - run_openai_runtime(runtime, label): generic provisioner for the 5 OpenAI-backed templates (langgraph, crewai, autogen, deepagents, openclaw). Same shape as run_hermes minus the HERMES_* config block that hermes-agent needs. - run_gemini_cli: separate function — gemini-cli wants a Google AI key (E2E_GEMINI_API_KEY), not OpenAI. - Each new runtime registered in the dispatch loop. New `all` keyword for E2E_RUNTIMES runs every covered runtime. claude-code + hermes keep their dedicated functions; both have unique provisioning quirks (claude-code OAuth + claude-code-specific volume mounts; hermes 15-min cold-boot) that don't generalize cleanly. Skip-if-no-key pattern matches the existing one — partially-keyed CI gets clean skips, not false-fails. Usage: E2E_OPENAI_API_KEY=... E2E_RUNTIMES=langgraph ./test_priority_runtimes_e2e.sh E2E_OPENAI_API_KEY=... E2E_RUNTIMES=all ./test_priority_runtimes_e2e.sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 05:57:59 -07:00
Hongming Wang	1a2ddb4539	Merge pull request #2171 from Molecule-AI/deps/jwt-go-v5.2.2-cve-2025-30204 deps(jwt): bump golang-jwt/jwt/v5 v5.2.1 → v5.2.2 (CVE-2025-30204, HIGH)	2026-04-27 12:44:54 +00:00
Hongming Wang	e63c3b2044	Merge pull request #2170 from Molecule-AI/fix/a2a-executor-sdk-migration fix(a2a_executor): migrate to a2a-sdk 1.x API	2026-04-27 12:44:42 +00:00
Hongming Wang	041d255091	Merge pull request #2168 from Molecule-AI/ops/audit-railway-sha-pins ops: add Railway SHA-pin drift audit script + regression test (#2001)	2026-04-27 12:44:31 +00:00
Hongming Wang	5b05d663ee	test: update a2a.helpers mock to export new_text_message The conftest mock only exposed `new_agent_text_message`, the pre-v1 name. After fixing a2a_executor.py to use the v1 name `new_text_message`, the mock didn't satisfy the import → CI red. Mock both names (aliased to the same lambda) so any in-flight test that still references the old name keeps working until the next sweep removes those references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 05:34:28 -07:00
Hongming Wang	86bdfa3b47	deps(jwt): bump golang-jwt/jwt/v5 v5.2.1 → v5.2.2 (CVE-2025-30204) Closes the HIGH-severity dependabot alert on workspace-server's jwt-go pin. Upstream advisory GHSA-mh63-6h87-95cp / CVE-2025-30204: "jwt-go allows excessive memory allocation during header parsing" — fixed in v5.2.2. Patch bump within the v5.x line; semver guarantees no API change. Full workspace-server test suite passes (`go test ./...` clean across all 18 packages). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 05:31:58 -07:00
Hongming Wang	722e1fd175	fix(a2a_executor): migrate to a2a-sdk 1.x API — new_agent_text_message → new_text_message a2a-sdk v1 renamed `new_agent_text_message` → `new_text_message` (role=Role.agent is now the default). Same fix landed in the hermes template earlier today; this is the runtime-side equivalent. NOT dead code: a2a_executor.py is the LangGraph A2A executor, used by the langgraph + deepagents templates. Both templates currently import it via bare `from a2a_executor import LangGraphA2AExecutor` — which is a separate bug in those templates, filed/fixed separately. Symptom in a2a_executor.py form: any langgraph or deepagents workspace that calls create_executor crashes with `ImportError: cannot import name 'new_agent_text_message' from 'a2a.helpers'`. Doesn't surface for claude-code or hermes (their templates use their own executors and don't load a2a_executor). Five call sites updated, one import line, one comment. Test suite already passes against the new symbol — `python -c "from molecule_runtime.a2a_executor import LangGraphA2AExecutor"` resolves cleanly after this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 05:29:59 -07:00
Hongming Wang	026f5e51d9	ops: add Railway SHA-pin drift audit script + regression test (#2001) #2000 fixed one symptom — TENANT_IMAGE pinned to `staging-a14cf86` (10 days stale) silently no-op'd four upstream fixes on 2026-04-24. This adds the audit pattern as a re-runnable script so the broader class is observable on demand without new CI infrastructure. Audit results today (2026-04-27): controlplane / production: 54 vars audited, 0 drift-prone pins controlplane / staging: 52 vars audited, 0 drift-prone pins So the immediate audit deliverable is clean — TENANT_IMAGE is the only known violation and #2000 already fixed it. The script makes the ongoing audit a 5-second command instead of a manual one. Detection regex catches: * branch-SHA suffixes (`staging\|main\|prod\|production-<6+ hex>`) — the exact 2026-04-24 incident shape * version pins after `:` or `=` (`:v1.2.3`, `=v0.1.16`) — same drift class, just rendered differently Anchoring on `:` or `=` keeps prose like "version 1.2.3 of the api" out of the false-positive set. UUIDs, ARNs, AMI IDs, secrets, and floating tags (`:staging-latest`, `:main`) pass through untouched. Regression test (tests/ops/test_audit_railway_sha_pins.sh) pins 20 representative cases — 9 should-flag (covering all four branch prefixes + semver variants + middle-of-value matches) and 11 should-pass (the false-positive guards). Same regex inlined in both files so a future tweak that weakens detection fails the test in lockstep with weakening the audit. Both files shellcheck clean. CI gate (acceptance criterion's "regression: add a CI check") is deliberately scoped out — querying Railway from CI requires plumbing RAILWAY_TOKEN as a repo secret, which is multi-step setup. The re-runnable script + test cover the same surface today; the CI workflow is a small follow-up once the token is provisioned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 05:01:23 -07:00
Hongming Wang	7cf77f274a	Merge pull request #2166 from Molecule-AI/test/unblock-resolveandstage-test test(plugins): unblock TestResolveAndStage_NoInternalErrorsInHTTPErr (#1814)	2026-04-27 11:36:15 +00:00
Hongming Wang	dc2f6bd378	Merge pull request #2167 from Molecule-AI/fix/saas-federation-tutorial-409 docs(saas-federation): fix workspace-limit response code (409, not 402) (#1754)	2026-04-27 11:36:02 +00:00
Hongming Wang	3679a6eff6	docs(saas-federation): fix workspace-limit response code (409, not 402) (#1754) Quota gates are resource-state conflicts, not payment failures — RFC 9110 reserves 402 for billing/payment failures specifically. The canonical Molecule-AI/docs PR #82 already shipped the corrected text; this brings the molecule-core copy of the tutorial in line. The inline parenthetical "(not 402 Payment Required — quota gates are resource-state conflicts, not payment failures, per RFC 9110)" doubles as a regression anchor: a future edit that flips 409 back to 402 would have to also reword that explanation, making the change a deliberate two-step act rather than a casual oversight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 04:30:46 -07:00
Hongming Wang	a0154ea0b4	test(plugins): unblock TestResolveAndStage_NoInternalErrorsInHTTPErr (#1814) Closes the second of two skipped tests in workspace_provision_test.go that were blocked on interface refactors. The Broadcaster + CP provisioner halves landed in earlier #1814 cycles; this is the plugin-source-registry half. Refactor: - Add handlers.pluginSources interface with the 3 methods handler code actually calls (Register, Resolve, Schemes) - Compile-time assertion `var _ pluginSources = (plugins.Registry)(nil)` catches future method-signature drift at build time - PluginsHandler.sources narrowed from plugins.Registry to the interface; production wiring (NewPluginsHandler, WithSourceResolver) still passes *plugins.Registry — satisfies the interface Production fix (#1206 leak): - resolveAndStage's Fetch-failure path was interpolating err.Error() into the HTTP response body via `failed to fetch plugin from %s: %v`. Resolver errors routinely contain rate-limit text, github request IDs, raw HTTP body fragments, and (for local resolvers) file system paths — none has any business landing in a user's browser. - Body now carries just `failed to fetch plugin from <scheme>`; the status code already differentiates the failure shape (404 not found, 504 timeout, 502 generic). Full err detail stays in the server-side log line one statement above. 
Test: - 6 sub-tests covering every error path inside resolveAndStage: empty source, invalid format, unknown scheme, local path-traversal, unpinned github (PLUGIN_ALLOW_UNPINNED unset), Fetch failure with a leaky synthetic error - The Fetch-failure case plants 5 realistic leak markers in the resolver's error string (rate limit text, x-github-request-id, auth_token, ghp_-prefixed token, /etc/passwd path); the assertion fails if ANY appears in the response body - Table-driven so a future error path added to resolveAndStage gets one new row, not a copy-paste of the assertion logic Verification: - 6/6 sub-tests pass - Full workspace-server test suite passes (interface refactor is non-breaking; production caller paths unchanged) - go build ./... clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 04:00:39 -07:00
hongmingwang-moleculeai	104650941a	Merge pull request #2165 from Molecule-AI/fix/main-sync-entry-point fix: restore main_sync entry point in workspace/main.py	2026-04-27 10:54:44 +00:00
hongmingwang-moleculeai	4c839cb306	Merge pull request #2164 from Molecule-AI/test/unblock-cp-provision-broadcast-test test(provisioner): unblock TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast (#1814)	2026-04-27 10:54:44 +00:00
Hongming Wang	3df5867b56	fix: restore main_sync entry point in workspace/main.py The wheel's pyproject.toml has declared `molecule-runtime = "molecule_runtime.main:main_sync"` since the publish pipeline was created on 2026-04-26, but the function itself was never present in workspace/main.py — it lived in the pre-monorepo molecule-ai-workspace-runtime repo and was lost during the consolidation that made workspace/ the source of truth. The 0.1.15 wheel still had main_sync from a leftover snapshot, so the regression went unnoticed until 0.1.16 (the first wheel built from the new source-of-truth) shipped. Symptom: every workspace container restart loops with ImportError: cannot import name 'main_sync' from 'molecule_runtime.main' — the molecule-runtime CLI script's first line tries to import the missing symbol. Workspaces stay in `provisioning` until the 10-min sweep marks them failed. Caught by .github/workflows/runtime-pin-compat.yml, which already imports the symbol by name as its smoke test. (That check kept failing red on every recent merge_group run; this PR fixes the underlying symbol-not-found instead of the smoke step.) Also strengthens publish-runtime.yml's wheel smoke from `import molecule_runtime.main` (loads the module — passes even when entry-point target is missing) to `from molecule_runtime.main import main_sync` (the actual contract the CLI script needs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 03:35:49 -07:00
Hongming Wang	e15d1182cd	test(provisioner): unblock TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast (#1814) The skipped test exists to assert that provisionWorkspaceCP never leaks err.Error() in WORKSPACE_PROVISION_FAILED broadcasts (regression guard for #1206). Writing the test body required substituting a failing CPProvisioner — but the handler's `cpProv` field was the concrete CPProvisioner type, so a mock had nowhere to plug in. Refactor: - Add provisioner.CPProvisionerAPI interface with the 3 methods handlers actually call (Start, Stop, GetConsoleOutput) - Compile-time assertion `var _ CPProvisionerAPI = (CPProvisioner)(nil)` catches future method-signature drift at build time - WorkspaceHandler.cpProv narrowed to the interface; SetCPProvisioner accepts the interface (production caller passes *CPProvisioner from NewCPProvisioner unchanged) Test: - stubFailingCPProv whose Start returns a deliberately leaky error (machine_type=t3.large, ami=…, vpc=…, raw HTTP body fragment) - Drive provisionWorkspaceCP via the cpProv.Start failure path - Assert broadcast["error"] == "provisioning failed" (canned) - Assert no leak markers (machine type, AMI, VPC, subnet, HTTP body, raw error head) in any broadcast string value - Stop/GetConsoleOutput on the stub panic — flags a future regression that reaches into them on this path Verification: - Full workspace-server test suite passes (interface refactor is non-breaking; production caller path unchanged) - go build ./... clean - The other skipped test in this file (TestResolveAndStage_…) is a separate plugins.Registry refactor and remains skipped Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 03:28:25 -07:00
Hongming Wang	5022a740e1	Merge pull request #2163 from Molecule-AI/fix/build-script-drift-gate-and-main-smoke fix(release): drift-gate TOP_LEVEL_MODULES + smoke-import main (post-0.1.16 incident)	2026-04-27 10:22:06 +00:00
Hongming Wang	c68dc1877f	fix(release): drift-gate TOP_LEVEL_MODULES + smoke-import main in publish Two compounding bugs surfaced when 0.1.16 hit production today: 1. scripts/build_runtime_package.py had a hand-curated TOP_LEVEL_MODULES set listing every workspace/*.py that should get its bare imports rewritten to `molecule_runtime.X`. The set silently went stale: - Missing: transcript_auth (added since #87 phase 1c), runtime_wedge, watcher → unrewritten imports shipped, every workspace startup died with ModuleNotFoundError. - Stale: claude_sdk_executor, cli_executor (both removed in #87), hermes_executor (never existed) → harmless but misleading. 2. publish-runtime.yml's wheel-smoke step asserted on stable invariants (BaseAdapter, AdapterConfig, a2a_client error sentinel) but never imported main. So even though main.py held the broken bare `from transcript_auth import ...`, the smoke check passed. Fixes: - Build script now derives the on-disk module set from workspace/*.py and asserts it matches TOP_LEVEL_MODULES exactly. Drift in either direction fails the build with a specific diff message instead of shipping a broken wheel. Closed-list typo guard preserved (we still edit the set explicitly when a module is added/removed) — the gate just makes drift impossible to ignore. - TOP_LEVEL_MODULES updated to current reality: drop the 3 stale, add the 3 missing. - publish-runtime.yml wheel-smoke now `import molecule_runtime.main` before the invariant asserts. main is the entry point and transitively imports every module — any bare-import bug surfaces as ModuleNotFoundError before PyPI accepts the upload. Tested locally: `python3 scripts/build_runtime_package.py --version 0.1.99 --out /tmp/build-test` succeeds, and /tmp/build-test/molecule_runtime/main.py contains the rewritten `from molecule_runtime.transcript_auth import ...`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 03:19:17 -07:00
Hongming Wang	6f0774c708	Merge pull request #2162 from Molecule-AI/fix/e2e-sanity-rc-normalization fix(e2e-sanity): normalize unexpected curl exit codes in cleanup trap (#2159)	2026-04-27 10:05:14 +00:00
Hongming Wang	99fb61bb8c	fix(e2e-sanity): normalize unexpected curl exit codes in cleanup trap (#2159) When E2E_INTENTIONAL_FAILURE=1 poisons the tenant token, step 5/11's `tenant_call POST /workspaces` curl exits 22 (HTTP error under --fail-with-body). `set -e` propagates rc=22 directly, but the script's documented contract emits only {0,1,2,3,4}, and the sanity workflow's case statement only matches those. rc=22 falls through to "Unexpected rc — investigate harness" and opens a false-positive priority-high "safety net broken" issue (#2159, weekly run on 2026-04-27). The trap now captures $? at entry (must be the first statement before any command clobbers it) and at the end normalizes any non-contract code to 1 (generic failure). Leak detection continues to exit 4 directly, so its semantics are preserved. Adds tests/e2e/test_harness_rc_normalization.sh — a self-contained regression test that builds a stub harness with the same trap pattern, triggers controlled exit codes, and asserts the normalization. Covers the 5 contracted codes + curl-22 (the bug) + 3 representative network-failure codes + sigsegv-139. Verification: - 10/10 regression tests pass - shellcheck clean on both modified files - production teardown path unchanged for legitimate {1,2,3,4} failures and the leak-detection exit 4 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 02:55:44 -07:00
hongmingwang-moleculeai	c3d29941b8	Merge pull request #2161 from Molecule-AI/feat/auto-publish-runtime-on-staging feat(publish-runtime): auto-publish to PyPI on staging pushes touching workspace/	2026-04-27 09:20:12 +00:00
Hongming Wang	7d872f9661	Merge pull request #2160 from Molecule-AI/feat/skill-runtime-compat feat(skills): per-skill runtime compatibility (#119)	2026-04-27 09:15:01 +00:00
Hongming Wang	0a455b7d71	feat(publish-runtime): auto-publish to PyPI on staging pushes that touch workspace/ Adds a third trigger so any merge to staging that changes workspace/ auto-publishes a new molecule-ai-workspace-runtime patch release. Closes the human-in-loop gap that caused tonight's RuntimeCapabilities ImportError outage. Tonight: #117 added RuntimeCapabilities to molecule_runtime.adapters.base. The merge landed at 02:37 UTC. Templates rebuilt their images at 07:37 UTC (4 hours later) and started importing the new symbol. PyPI was still serving 0.1.15 (pre-#117) because nobody remembered to push a runtime-vX.Y.Z tag or workflow_dispatch the publish. Result: every template image shipped tonight runs `from molecule_runtime.adapters.base import RuntimeCapabilities` against an installed runtime that doesn't export it -> ImportError -> workspace never registers -> stuck in provisioning until 10-min sweep. Mechanism: - New trigger: push to staging filtered to paths: ['workspace/']. Path filter applies only to branch pushes; the existing tag trigger still fires unconditionally. - Version derivation for the auto case: query PyPI's JSON API for current latest, bump the patch component. PyPI is the source of truth so concurrent runs don't double-publish (HTTP 400 on collision). - concurrency: group serializes parallel staging merges so they don't race on the bump computation. cancel-in-progress: false because each workspace/** change deserves its own release. - publish job now exposes its derived version as a job-level output so the cascade reads it cleanly. Fixes a latent bug: cascade tried to read steps.version.outputs.version, which is from a different job's scope and silently resolved to empty -- then re-derived from GITHUB_REF_NAME, which would have been "staging" under the new trigger and produced an invalid version. Tag-driven and manual-dispatch paths are unchanged. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 02:11:45 -07:00
Hongming Wang	d19d35f6b3	test(skills): make watcher test fakes accept current_runtime kwarg The runtime-compat change in this branch added a `current_runtime` kwarg to load_skills(); the watcher passes it through. Test mocks that pre-date the kwarg signature broke with TypeError, which the watcher's reload-error try/except swallowed — the symptom was empty callback lists, not a clear failure. Switching the fakes to accept **kwargs keeps them forward-compat for future load_skills additions without another test churn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 02:04:26 -07:00
Hongming Wang	d0057912d2	feat(skills): per-skill runtime compatibility (#119, hermes pattern) SKILL.md frontmatter can now declare `runtime: [claude-code]` or `runtime: [hermes, claude-code]` to opt out of incompatible adapters instead of failing at first invocation. Default `[""]` means universal — existing skill libraries need zero migration. Borrowed from hermes' declarative skill-compat pattern surfaced in the hermes architecture survey. The remaining two patterns (event-log layer, observability config block) stay open under #119. Wiring: - SkillMetadata.runtime: list[str] = [""] - _normalize_runtime_field accepts list, string-sugar, missing -> [""]; malformed warns and falls back to universal so a typo never silently drops a skill. - load_skills(..., current_runtime=...) filters out skills whose runtime list lacks "" or current_runtime, with an INFO log line. - BaseAdapter.start passes type(self).name() so the live adapter drives the filter; SkillsWatcher takes the same kwarg so hot-reload honors it. 8 new tests cover default universal, no-field universal, explicit match/mismatch, string sugar, wildcard short-circuit, current_runtime=None (preserves old behavior), and malformed-warns-not-drops. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:57:43 -07:00
Hongming Wang	e99f937630	Merge pull request #2157 from Molecule-AI/chore/drop-cli-executor-from-runtime chore(workspace): drop cli_executor — Phase 3 of #87 [DRAFT]	2026-04-27 08:24:30 +00:00
Hongming Wang	4959c37040	Merge pull request #2158 from Molecule-AI/feat/steer-agent-to-attachments-field feat(tools): tighten send_message_to_user description to forbid pasting URLs in body	2026-04-27 08:24:02 +00:00
Hongming Wang	98ca5c50fa	chore(workspace): drop cli_executor — Phase 3 of #87 (DRAFT, blocked on gemini-cli image rebuild) DRAFT — do NOT merge until gemini-cli template image rebuilds with its local cli_executor.py copy (template PR #9 just merged at 07:59 UTC; image build kicks off now). Final adapter-specific deletion from molecule-runtime, completing #87 for the priority adapters (claude-code via PR #2156, plus gemini-cli via this PR + template #9). Deletes: - workspace/cli_executor.py (461 LOC) — CLIAgentExecutor + the RUNTIME_PRESETS dict for codex / ollama / gemini-cli. The file moved to molecule-ai-workspace-template-gemini-cli (PR #9, merged). - workspace/tests/test_agent_base_urls.py — only consumer of CLIAgentExecutor in the test suite. Tests for the executor behavior live in the template repo now. Updates: - workspace/tests/test_executor_helpers.py — docstring refresh: executor_helpers.py is the runtime-agnostic shared helpers; the executor classes themselves live in template repos post-#87. Codex / ollama presets disappear naturally with the file. They never had template repos, so no production path could invoke them anyway — this is dead-code removal as a side effect of the move. Verified-safe-to-delete: - heartbeat.py: doesn't import cli_executor - claude_sdk_executor.py: deleted by PR #2156 (in flight) - preflight.py: only references runtime names by string; no import - main.py: doesn't import cli_executor (uses adapter discovery via ADAPTER_MODULE; the template's adapter constructs the executor) - Only test_agent_base_urls.py + test_executor_helpers.py docstring referenced cli_executor Verification: - 1249/1249 workspace pytest pass (was 1251; -2 = test_agent_base_urls.py cases — exact match) - No live import of cli_executor anywhere in molecule-core after deletion (grep verified) Sequencing: 1. ✅ Template PR #9 (gemini-cli local copy) — MERGED 2. ⏳ Template image rebuild — running 3. 
THIS PR — wait until image is published, then mark ready-for-review Closes #87 for the priority adapters: workspace/ is now adapter- agnostic except for adapter discovery (ADAPTER_MODULE) + the runtime_wedge primitive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:22:39 -07:00
Hongming Wang	7504aba934	feat(tools): tighten send_message_to_user description to forbid pasting URLs in body Root-cause fix for #118 (chat attachments rendering as plain text links instead of download chips). User flagged with screenshot 2026-04-26 showing the Design Director agent pasting https://files.catbox.moe/… in the message body — chat rendered the URL as plain markdown text, unclickable in the canvas's bubble layout, and unreachable in any SaaS deployment where the user's browser can't egress to catbox. The structured `attachments` field already exists, the canvas's AttachmentChip already renders well, the WebSocket broadcast already carries attachments verbatim — the missing piece was the LLM choosing the body over the structured field. Tighten the tool description so it trains the right behavior. Three targeted strengthenings: 1. Top-level tool description: enumerated use case (4) now reads "via the `attachments` field (NEVER paste file URLs in `message`)". The all-caps NEVER + the explicit field name move the LLM toward the structured path on first read. 2. `message` param: adds an explicit DO NOT rule with rationale. Includes the SaaS-reachability reason so operators can grep for "SaaS" and find this design constraint instead of re-discovering it after a tenant complaint. Calls out catbox.moe + file:// by name as concrete examples of forbidden hosts (those are the two we've seen in production). 3. `attachments` param: leads with REQUIRED, lists the bad alternatives explicitly (pasting URLs, base64-encoding, telling user to look at a path). LLMs handle "use X, NOT Y" framings better than "use X" alone — observed during prompt-engineering iteration on hermes' tool descriptions. Tests pin all three load-bearing phrases (4 new in test_a2a_mcp_server.py) so a future doc edit that softens or drops them fails CI. Brittle by design — these are prompt-engineering invariants, not implementation details. This is the root-cause fix. 
A defensive canvas-side backstop (auto- detect download-shaped URLs in body and convert to chips) is a follow-up that could land separately if the steering proves insufficient in practice. Verification: - 1190/1190 workspace pytest pass - 4 new test_a2a_mcp_server.py cases all green Closes the steering half of #118. The structured-attachments-only contract was already enforced server-side (PR #2130 added per-attachment validation); this PR closes the prompt-side gap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 01:13:11 -07:00
Hongming Wang	4e6030d783	Merge pull request #2156 from Molecule-AI/chore/drop-claude-sdk-executor-from-runtime chore(workspace): drop claude_sdk_executor — Phase 2 of #87	2026-04-27 08:02:51 +00:00
Hongming Wang	2fbf6b6b27	Merge pull request #2155 from Molecule-AI/feat/preflight-runtime-discovery feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery	2026-04-27 08:02:39 +00:00
Hongming Wang	4b5ac2ebc2	chore(workspace): drop claude_sdk_executor — Phase 2 of #87 Phase 2 of the universal-runtime refactor (task #87). Now that the claude-code template repo ships its own claude_sdk_executor.py (template PR #13 merged + image rebuilt at 07:36 UTC) the molecule-runtime no longer needs to ship the file. Deletes: - workspace/claude_sdk_executor.py (704 LOC) - workspace/tests/test_claude_sdk_executor.py (~1.6K LOC) Updates: - workspace/runtime_wedge.py — drops the "Compatibility shim" docstring section. The shim was time-bounded ("removed once #87 Phase 2 lands"); this is that PR. - workspace/tests/test_runtime_wedge.py — drops the TestClaudeSdkExecutorReExportShim test class (the shim doesn't exist anymore so the identity assertions would fail at import). - workspace/tests/conftest.py — drops the claude_agent_sdk stub. Its only consumer was test_claude_sdk_executor.py which is gone; no other test imports the SDK. - workspace/cli_executor.py — comment refresh: claude-code template repo (not workspace/) is now the home for ClaudeSDKExecutor. Verified-safe-to-delete: - heartbeat.py: migrated to runtime_wedge in PR #2154 (no longer imports from claude_sdk_executor) - cli_executor.py: only comments referenced claude_sdk_executor; its line-117 ValueError defends against accidental routing - tests: only test_claude_sdk_executor.py + test_runtime_wedge.py's shim class consumed the deleted module; both removed in this PR Verification: - 1182/1182 workspace pytest pass (was 1251; -69 = exactly the deleted test cases — zero unexpected regressions) - No live import of claude_sdk_executor anywhere in molecule-core after deletion (grep verified) Closes #87 for the claude-code adapter. Hermes is already template-only. The remaining adapter-specific code in workspace/ is cli_executor.py (codex/ollama/gemini-cli) tracked by task #122. preflight.py's SUPPORTED_RUNTIMES static list is tracked by task #123 (PR #2155 in flight). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:52:55 -07:00
Hongming Wang	7dba700ac3	feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery Closes task #123 — last piece of #87 cleanup. Pre-fix: workspace/preflight.py:11 hardcoded a tuple of "supported" runtime names (claude-code, codex, ollama, langgraph, etc.). Every new template repo required a code change in molecule-runtime to be recognized — direct violation of the universal-runtime principle (#87) where adapters declare themselves and the runtime stays generic. Post-fix: discovery-based validation via the same ADAPTER_MODULE env var that production load paths already consult (workspace/adapters/__init__.py:get_adapter). Distinguished failure modes so operator messages are concrete: - ADAPTER_MODULE unset → "no adapter installed; set the env var" - ADAPTER_MODULE set but module won't import → import error type + message - module imports but no Adapter class → "convention violation, add `Adapter = YourClass`" - Adapter.name() raises → caught with operator message - Adapter.name() returns non-string → contract violation message - Adapter.name() doesn't match config.runtime → drift WARNING (not fatal; the adapter wins in production, config.yaml is just documentation) The drift case is the one behavioral change worth calling out: the prior static-list path would have hard-failed config.runtime values not in the allowlist. With discovery, an unknown runtime in config.yaml is just a documentation drift — the adapter that's actually installed runs regardless. Operator gets a warning naming both the configured and installed names so they can fix whichever is stale. 
Tests: - Replaces the obsolete "static list pass/fail" tests with 6 new cases covering each distinguished failure mode, plus a positive test for the adapter-matches-config happy path - Adds an autouse `_default_langgraph_adapter` fixture that pre-installs a fake adapter via sys.modules monkey-patching, so existing tests building default WorkspaceConfig (runtime="langgraph") inherit a valid adapter without each test setting ADAPTER_MODULE - Failure-mode tests opt out of the default fixture via @pytest.mark.no_default_adapter (registered in pytest.ini) - Sentinel pattern (`_UNSET = object()`) for `name_returns` so None is a passable test value (otherwise `is not None` would skip the None branch — exact bug the sentinel avoids) Verification: - 22/22 preflight tests pass (was 16; +6 new failure-path tests) - 1256/1256 workspace pytest pass (was 1251; +5 net) - No production code path other than preflight changed Source: 2026-04-27 #87 cleanup audit after PR #2154 (wedge extraction). This change is independent of the cli_executor.py template moves (task #122) — completes one of the two remaining cleanup items. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:44:51 -07:00
Hongming Wang	66b9c04057	Merge pull request #2154 from Molecule-AI/refactor/extract-wedge-state-from-claude-sdk refactor(wedge): extract claude_sdk_executor wedge state into runtime_wedge module	2026-04-27 07:22:20 +00:00
Hongming Wang	5e049244d6	refactor(wedge): mark re-exports explicit via __all__ Addresses github-code-quality unused-import flag on the runtime_wedge re-export shim. Adds __all__ listing the names that exist purely for backwards-compat (is_wedged / wedge_reason / _reset_sdk_wedge_for_test) so static analysis recognizes the imports as deliberate exports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:20:23 -07:00
Hongming Wang	feb544938b	refactor(wedge): address review feedback — class wrap + import-path doc + dedupe shim rationale Three changes from /code-review-and-quality on PR #2154: 1. Optional (architecture): wrap state in a private _WedgeState class instead of bare module-level globals. Public API (mark_wedged / clear_wedge / is_wedged / wedge_reason / reset_for_test) is unchanged — adapters never see the class. The class is forward-cover for any future per-scope variant (multiple executors per process, a keyed registry, etc.) without churning the call sites. Today there's exactly one instance (_DEFAULT) so behavior is identical. 2. Optional (readability): clarify the import path in the integration recipe — in a TEMPLATE repo it's `from molecule_runtime.runtime_wedge` (PyPI package); in molecule-core itself it's `from runtime_wedge` (top-level module). Removes the trap where a contributor reading the docstring while editing in-repo copies the template-style import and gets ImportError. 3. Nit (readability): dedupe the shim rationale. claude_sdk_executor's re-export comment now points to runtime_wedge's "Compatibility shim" section as the source of truth instead of restating the same content. Avoids docs-in-two-places drift risk. Verification: - 1251/1251 workspace pytest pass (no behavior change — class wrap is pure plumbing; module-level helpers delegate to the singleton) - All shim re-export identity tests still pass (the shim's `is_wedged is runtime_wedge.is_wedged` assertion holds because we re-export the SAME function object that delegates to _DEFAULT) No new tests needed — the existing test suite covers the public API contract; the class is an implementation detail behind that contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:16:33 -07:00
Hongming Wang	cd899c969f	docs(wedge): integration recipe for adapters that want to flip-to-degraded Doc-only follow-up to the wedge-state extraction. Adds proactive guidance so the next adapter (hermes / codex / langgraph / a future template) discovers the runtime_wedge primitive and integrates the ~6 LOC pattern uniformly instead of inventing its own wedge state. Two additions: - workspace/runtime_wedge.py — new "How to use from a NEW adapter" section in the module docstring with the minimum viable integration recipe, what-you-get-for-free list, and explicit DON'TS (don't store local wedge state, don't mark for transient errors, don't write your own clear logic). Plus a "when wedge is the WRONG primitive" note to keep adopters from over-using it. - workspace/adapter_base.py — adds runtime_wedge to the "Cross-cutting capabilities your adapter can opt into" list in BaseAdapter's docstring (alongside capabilities() and idle_timeout_override()). Discoverability path: adapter author reads BaseAdapter docstring → sees runtime_wedge mention → reads runtime_wedge module docstring → has the recipe. Also tightens the "to add a new agent infra" steps in BaseAdapter to match the actual current model (standalone template repo + ADAPTER_MODULE env var) rather than the obsolete workspace/adapters/<infra>/ layout that hasn't been the path since the universal-runtime extraction started. Zero code change. Tests untouched (1251/1251 still pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:12:14 -07:00
Hongming Wang	1d231ed295	refactor(wedge): extract claude_sdk_executor wedge state into runtime_wedge module Prerequisite for the universal-runtime refactor (task #87) to move claude_sdk_executor.py out of molecule-runtime into the claude-code template repo. heartbeat.py had a hard import: from claude_sdk_executor import is_wedged, wedge_reason which would break the moment the executor moves out of the runtime package — the heartbeat would lose access to the wedge state used to flip workspace status to degraded. Extract the wedge state to a runtime-side module that the heartbeat can keep importing regardless of which adapter executor is wedged: - workspace/runtime_wedge.py — single-flag state + mark_wedged / clear_wedge / is_wedged / wedge_reason / reset_for_test. Same semantics as the original claude_sdk_executor implementation (sticky first-write-wins, auto-clear on observed success). 100 LOC of pure stateless helpers; lock-free ok because there's one executor per workspace process today. - workspace/claude_sdk_executor.py — drops the in-file definitions; re-exports the same names from runtime_wedge as a backwards-compat shim. Any third-party adapter that imported is_wedged / wedge_reason / _mark_sdk_wedged from claude_sdk_executor keeps working for one release cycle while they migrate to runtime_wedge. - workspace/heartbeat.py — _runtime_state_payload() now imports from runtime_wedge instead of claude_sdk_executor. Lazy-import pattern preserved; the docstring updated to explain the new cross-cutting source-of-truth. 
Tests (10 new in test_runtime_wedge.py): - Default state (unwedged), mark sets flag, first-write-wins, clear restores healthy, clear-when-not-wedged is no-op, re-marking after clear is allowed - Re-export shim: each old name in claude_sdk_executor IS the runtime_wedge function (identity check), state is shared (marking via the executor shim is observable via runtime_wedge and vice versa) Verification: - 1251/1251 workspace pytest pass (was 1241 after orphan deletion; +10 = exactly the new test_runtime_wedge.py cases) - All existing test_claude_sdk_executor.py cases (which call _mark_sdk_wedged via the shim) still pass After this lands + the claude-code template image rebuilds with the local claude_sdk_executor.py copy (template PR #13), the molecule-core deletion of workspace/claude_sdk_executor.py becomes safe (the shim deletion comes alongside the file deletion, since runtime_wedge is the new public API). See project memory `project_runtime_native_pluggable.md`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 00:08:53 -07:00
Hongming Wang	c1e9aa7461	Merge pull request #2153 from Molecule-AI/fix/block-internal-paths-shallow-clone-bug fix(ci): block-internal-paths handle merge_group + shallow-clone BASE	2026-04-27 06:58:32 +00:00
hongmingwang-moleculeai	5d49cd7843	Merge pull request #2152 from Molecule-AI/chore/delete-orphan-hermes-executor chore(workspace): delete orphan HermesA2AExecutor (-1.8K LOC dead code)	2026-04-27 06:58:21 +00:00
Hongming Wang	d46d558ca9	Merge pull request #2148 from Molecule-AI/test/canvas-lib-utils-runtime-names-1815 test(canvas): cover utils.cn + runtime-names.runtimeDisplayName (0% → 100%) (#1815)	2026-04-27 06:57:57 +00:00
Hongming Wang	a682dcb502	Merge pull request #2149 from Molecule-AI/test/canvas-actions-1815 test(canvas): cover canvas-actions restart-pending helpers (25% → 100%) (#1815)	2026-04-27 06:55:36 +00:00
Hongming Wang	17a6800374	Merge pull request #2150 from Molecule-AI/feat/priority-runtimes-e2e test(e2e): claude-code + hermes priority-runtimes happy path	2026-04-27 06:55:20 +00:00
Hongming Wang	ae029f8c3f	Merge pull request #2151 from Molecule-AI/test/canvas-class-names-1815 test(canvas): cover store/classNames helpers (17% → 100%) (#1815)	2026-04-27 06:54:37 +00:00
Hongming Wang	516b58dcd7	Merge pull request #2147 from Molecule-AI/feat/canvas-coverage-instrumentation-1815 feat(canvas): vitest coverage instrumentation (#1815, no CI gate yet)	2026-04-27 06:54:22 +00:00
Hongming Wang	7ac7a010fa	fix(ci): block-internal-paths handle merge_group + shallow-clone BASE [Molecule-Platform-Evolvement-Manager] ## What was broken Same bug class as the secret-scan.yml fix in #2120 — block-internal-paths hit `fatal: bad object <sha>` exit 128 on the staging push at 2026-04-27 06:50:33Z. Two cases: 1. `merge_group` events: BASE/HEAD came from `github.event.before` / `.after` which are push-event-only properties. On merge_group both came back empty, the script fell through to "scan entire tree" mode which is correct but inefficient. Worse, when this workflow is required for the merge queue (line 21-22), an empty-BASE entire-tree scan would run on every queue check. 2. `push` events with shallow clones: `fetch-depth: 2` doesn't always cover BASE across true merge commits. When BASE is in the payload but absent from the local object DB, `git diff` errors out with `fatal: bad object <sha>` and the job exits 128. This is what broke today's staging push. ## Fix Same shape as the secret-scan.yml fix (#2120): - Add a dedicated `git fetch` step for `merge_group.base_sha`. - Move event-specific SHAs into a step `env:` block; script uses a `case` over `${{ github.event_name }}` covering pull_request / merge_group / push (rather than `if pull_request / else push` which left merge_group on the empty-BASE branch). - On-demand fetch + `git cat-file -e` guard for push BASE so a SHA that's payload-present-but-DB-absent triggers the fetch, and a fetch failure falls through cleanly to "scan entire tree" instead of exiting 128. ## Test plan - [x] YAML structure preserved (no schema changes) - [x] Bash logic mirrors the secret-scan recovery path tested in #2120 - [ ] CI green on this PR's pull_request scan + push to staging post-merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 23:54:00 -07:00
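
The wedge-state semantics described in commits 1d231ed295 and feb544938b above (sticky first-write-wins, clear restores healthy, module-level helpers delegating to a single private `_WedgeState` instance named `_DEFAULT`) can be illustrated with a minimal Python sketch. The function and class names come from those commit messages; the signatures, the `reason` argument, and the method names on `_WedgeState` are assumptions made for illustration and may not match the actual `workspace/runtime_wedge.py`.

```python
from typing import Optional


class _WedgeState:
    """Single-flag wedge state. Hypothetical reconstruction, not the real module."""

    def __init__(self) -> None:
        # None means healthy; any string (even empty) means wedged.
        self._reason: Optional[str] = None

    def mark(self, reason: str) -> None:
        # Sticky first-write-wins: a later mark does not overwrite the
        # first recorded reason until the state is cleared.
        if self._reason is None:
            self._reason = reason

    def clear(self) -> None:
        # Clearing an already-healthy state is a no-op.
        self._reason = None

    def is_wedged(self) -> bool:
        return self._reason is not None

    def reason(self) -> Optional[str]:
        return self._reason


# Exactly one instance today, as the commit message describes.
_DEFAULT = _WedgeState()


def mark_wedged(reason: str) -> None:
    _DEFAULT.mark(reason)


def clear_wedge() -> None:
    _DEFAULT.clear()


def is_wedged() -> bool:
    return _DEFAULT.is_wedged()


def wedge_reason() -> Optional[str]:
    return _DEFAULT.reason()


def reset_for_test() -> None:
    # Assumed here to be equivalent to clearing the singleton.
    _DEFAULT.clear()
```

Under these assumptions, an adapter would call `mark_wedged(...)` on an unrecoverable executor failure and `clear_wedge()` after an observed success, while the heartbeat only reads `is_wedged()` and `wedge_reason()`.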

3217 Commits