Commit Graph

3217 Commits

Author SHA1 Message Date
Hongming Wang
6589929f87 docs: document the two PR auto-merge safety guards
Adds a section to CONTRIBUTING.md → "Pull Requests" explaining the two
system-level guards that protect against the "I enabled auto-merge then
pushed more commits" race:

1. Repo-wide setting: "Automatically delete head branches" (catches
   pushes to a merged-and-deleted branch — the post-merge orphan case).
2. CI workflow `pr-guards` calling molecule-ci's
   disable-auto-merge-on-push (catches pushes during queue
   processing — disables auto-merge, posts a comment, requires
   explicit re-engage).

Why doc-not-just-memory: my agent-side memory is local. Other
contributors on other machines need this in the repo where they
read it. Cites the 2026-04-27 PR #2174 incident with the
specific commit SHAs that got orphaned.

Companion: molecule-ci README updated separately to document the
reusable workflow under "What each workflow validates" so devs
who land in the molecule-ci repo first can find the contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:45:55 -07:00
Hongming Wang
a354ae2feb
Merge pull request #2174 from Molecule-AI/fix/lib-subpackage-and-drift-gate
fix(build): ship lib/ subpackage + extend drift gate to SUBPACKAGES
2026-04-27 13:07:00 +00:00
Hongming Wang
6e732ab714 fix(build): ship lib/ subpackage + extend drift gate to SUBPACKAGES
Two compounding bugs that bit hermes (and any other workspace that
reaches main.py:142):

1. workspace/lib/ was in EXCLUDE_DIRS so the published wheel didn't
   contain the directory at all. main.py imports `from lib.pre_stop
   import read_snapshot` (and `build_snapshot`, `write_snapshot`) so
   every workspace startup that reaches the snapshot path crashed
   with `ModuleNotFoundError: No module named 'lib'`.

2. Even if lib/ had shipped, `lib` wasn't in SUBPACKAGES so the
   import-rewriter would have left the bare `from lib.pre_stop`
   unqualified — it would still fail because the package would only
   be reachable as `molecule_runtime.lib`.

Fix: move `lib` from EXCLUDE_DIRS to SUBPACKAGES (one entry each).

Drift gate extension: the existing gate I added in #2163 only
asserted TOP_LEVEL_MODULES against workspace/*.py. This change adds
the symmetric assertion for SUBPACKAGES against workspace/<dir>/
(filtered by EXCLUDE_DIRS + presence of __init__.py). Catches both:
- Subpackage added to workspace/ but missed in SUBPACKAGES
- Subpackage missing from workspace/ but lingering in SUBPACKAGES
- Subpackage wrongly in EXCLUDE_DIRS while also referenced by
  rewritten imports (the lib case)

Tested locally: build of 0.1.99 now ships lib/ and main.py contains
`from molecule_runtime.lib.pre_stop import ...` correctly rewritten.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:03:46 -07:00
Hongming Wang
1100c50da8
Merge pull request #2172 from Molecule-AI/feat/e2e-cover-all-8-runtimes
feat(e2e): extend priority-runtimes test to cover all 8 templates
2026-04-27 13:00:43 +00:00
Hongming Wang
c7478af99f feat(e2e): extend priority-runtimes test to cover all 8 templates
Tonight's wire-real E2E sweep exposed 12+ root causes across the post-
#87 template extraction. Most would have been caught by an actual
provision-and-online test running on each template — but the test only
covered claude-code + hermes. Extending it to cover all 8 ensures any
future regression in any template fails the test, not production.

What's added:
- run_openai_runtime(runtime, label): generic provisioner for the 5
  OpenAI-backed templates (langgraph, crewai, autogen, deepagents,
  openclaw). Same shape as run_hermes minus the HERMES_* config block
  that hermes-agent needs.
- run_gemini_cli: separate function — gemini-cli wants a Google AI
  key (E2E_GEMINI_API_KEY), not OpenAI.
- Each new runtime registered in the dispatch loop. New `all` keyword
  for E2E_RUNTIMES runs every covered runtime.

claude-code + hermes keep their dedicated functions; both have unique
provisioning quirks (claude-code OAuth + claude-code-specific volume
mounts; hermes 15-min cold-boot) that don't generalize cleanly.

Skip-if-no-key pattern matches the existing one — partially-keyed CI
gets clean skips, not false-fails.

Usage:
  E2E_OPENAI_API_KEY=... E2E_RUNTIMES=langgraph     ./test_priority_runtimes_e2e.sh
  E2E_OPENAI_API_KEY=... E2E_RUNTIMES=all           ./test_priority_runtimes_e2e.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:57:59 -07:00
Hongming Wang
1a2ddb4539
Merge pull request #2171 from Molecule-AI/deps/jwt-go-v5.2.2-cve-2025-30204
deps(jwt): bump golang-jwt/jwt/v5 v5.2.1 → v5.2.2 (CVE-2025-30204, HIGH)
2026-04-27 12:44:54 +00:00
Hongming Wang
e63c3b2044
Merge pull request #2170 from Molecule-AI/fix/a2a-executor-sdk-migration
fix(a2a_executor): migrate to a2a-sdk 1.x API
2026-04-27 12:44:42 +00:00
Hongming Wang
041d255091
Merge pull request #2168 from Molecule-AI/ops/audit-railway-sha-pins
ops: add Railway SHA-pin drift audit script + regression test (#2001)
2026-04-27 12:44:31 +00:00
Hongming Wang
5b05d663ee test: update a2a.helpers mock to export new_text_message
The conftest mock only exposed `new_agent_text_message`, the pre-v1
name. After fixing a2a_executor.py to use the v1 name
`new_text_message`, the mock didn't satisfy the import → CI red.

Mock both names (aliased to the same lambda) so any in-flight test
that still references the old name keeps working until the next
sweep removes those references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:34:28 -07:00
Hongming Wang
86bdfa3b47 deps(jwt): bump golang-jwt/jwt/v5 v5.2.1 → v5.2.2 (CVE-2025-30204)
Closes the HIGH-severity dependabot alert on workspace-server's jwt-go
pin. Upstream advisory GHSA-mh63-6h87-95cp / CVE-2025-30204:
"jwt-go allows excessive memory allocation during header parsing" —
fixed in v5.2.2.

Patch bump within the v5.x line; semver guarantees no API change. Full
workspace-server test suite passes (\`go test ./...\` clean across all
18 packages).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:31:58 -07:00
Hongming Wang
722e1fd175 fix(a2a_executor): migrate to a2a-sdk 1.x API — new_agent_text_message → new_text_message
a2a-sdk v1 renamed `new_agent_text_message` → `new_text_message`
(role=Role.agent is now the default). Same fix landed in the hermes
template earlier today; this is the runtime-side equivalent.

NOT dead code: a2a_executor.py is the LangGraph A2A executor, used by
the langgraph + deepagents templates. Both templates currently import
it via bare `from a2a_executor import LangGraphA2AExecutor` — which is
a separate bug in those templates, filed/fixed separately.

Symptom in a2a_executor.py form: any langgraph or deepagents workspace
that calls create_executor crashes with `ImportError: cannot import
name 'new_agent_text_message' from 'a2a.helpers'`. Doesn't surface for
claude-code or hermes (their templates use their own executors and
don't load a2a_executor).

Five call sites updated, one import line, one comment. Test suite
already passes against the new symbol — `python -c "from
molecule_runtime.a2a_executor import LangGraphA2AExecutor"` resolves
cleanly after this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:29:59 -07:00
Hongming Wang
026f5e51d9 ops: add Railway SHA-pin drift audit script + regression test (#2001)
#2000 fixed one symptom — TENANT_IMAGE pinned to `staging-a14cf86`
(10 days stale) silently no-op'd four upstream fixes on 2026-04-24.
This adds the audit pattern as a re-runnable script so the broader
class is observable on demand without new CI infrastructure.

Audit results today (2026-04-27):
  controlplane / production: 54 vars audited, 0 drift-prone pins
  controlplane / staging:    52 vars audited, 0 drift-prone pins

So the immediate audit deliverable is clean — TENANT_IMAGE is the only
known violation and #2000 already fixed it. The script makes the
ongoing audit a 5-second command instead of a manual one.

Detection regex catches:
  * branch-SHA suffixes (`staging|main|prod|production-<6+ hex>`)
    — the exact 2026-04-24 incident shape
  * version pins after `:` or `=`  (`:v1.2.3`, `=v0.1.16`)
    — same drift class, just rendered differently

Anchoring on `:` or `=` keeps prose like "version 1.2.3 of the api"
out of the false-positive set. UUIDs, ARNs, AMI IDs, secrets, and
floating tags (`:staging-latest`, `:main`) pass through untouched.

Regression test (tests/ops/test_audit_railway_sha_pins.sh) pins 20
representative cases — 9 should-flag (covering all four branch
prefixes + semver variants + middle-of-value matches) and 11
should-pass (the false-positive guards).  Same regex inlined in both
files so a future tweak that weakens detection fails the test in
lockstep with weakening the audit.

Both files shellcheck clean.

CI gate (acceptance criterion's "regression: add a CI check") is
deliberately scoped out — querying Railway from CI requires plumbing
RAILWAY_TOKEN as a repo secret, which is multi-step setup. The
re-runnable script + test cover the same surface today; the CI
workflow is a small follow-up once the token is provisioned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:01:23 -07:00
Hongming Wang
7cf77f274a
Merge pull request #2166 from Molecule-AI/test/unblock-resolveandstage-test
test(plugins): unblock TestResolveAndStage_NoInternalErrorsInHTTPErr (#1814)
2026-04-27 11:36:15 +00:00
Hongming Wang
dc2f6bd378
Merge pull request #2167 from Molecule-AI/fix/saas-federation-tutorial-409
docs(saas-federation): fix workspace-limit response code (409, not 402) (#1754)
2026-04-27 11:36:02 +00:00
Hongming Wang
3679a6eff6 docs(saas-federation): fix workspace-limit response code (409, not 402) (#1754)
Quota gates are resource-state conflicts, not payment failures —
RFC 9110 reserves 402 for billing/payment failures specifically. The
canonical Molecule-AI/docs PR #82 already shipped the corrected text;
this brings the molecule-core copy of the tutorial in line.

The inline parenthetical "(not 402 Payment Required — quota gates are
resource-state conflicts, not payment failures, per RFC 9110)" doubles
as a regression anchor: a future edit that flips 409 back to 402 would
have to also reword that explanation, making the change a deliberate
two-step act rather than a casual oversight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 04:30:46 -07:00
Hongming Wang
a0154ea0b4 test(plugins): unblock TestResolveAndStage_NoInternalErrorsInHTTPErr (#1814)
Closes the second of two skipped tests in workspace_provision_test.go
that were blocked on interface refactors. The Broadcaster + CP
provisioner halves landed in earlier #1814 cycles; this is the
plugin-source-registry half.

Refactor:
  - Add handlers.pluginSources interface with the 3 methods handler
    code actually calls (Register, Resolve, Schemes)
  - Compile-time assertion `var _ pluginSources = (*plugins.Registry)(nil)`
    catches future method-signature drift at build time
  - PluginsHandler.sources narrowed from *plugins.Registry to the
    interface; production wiring (NewPluginsHandler, WithSourceResolver)
    still passes *plugins.Registry — satisfies the interface

Production fix (#1206 leak):
  - resolveAndStage's Fetch-failure path was interpolating err.Error()
    into the HTTP response body via `failed to fetch plugin from %s: %v`.
    Resolver errors routinely contain rate-limit text, github request
    IDs, raw HTTP body fragments, and (for local resolvers) file system
    paths — none has any business landing in a user's browser.
  - Body now carries just `failed to fetch plugin from <scheme>`; the
    status code already differentiates the failure shape (404 not
    found, 504 timeout, 502 generic). Full err detail stays in the
    server-side log line one statement above.

Test:
  - 6 sub-tests covering every error path inside resolveAndStage:
    empty source, invalid format, unknown scheme, local
    path-traversal, unpinned github (PLUGIN_ALLOW_UNPINNED unset),
    Fetch failure with a leaky synthetic error
  - The Fetch-failure case plants 5 realistic leak markers in the
    resolver's error string (rate limit text, x-github-request-id,
    auth_token, ghp_-prefixed token, /etc/passwd path); the assertion
    fails if ANY appears in the response body
  - Table-driven so a future error path added to resolveAndStage gets
    one new row, not a copy-paste of the assertion logic

Verification:
  - 6/6 sub-tests pass
  - Full workspace-server test suite passes (interface refactor is
    non-breaking; production caller paths unchanged)
  - go build ./... clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 04:00:39 -07:00
104650941a
Merge pull request #2165 from Molecule-AI/fix/main-sync-entry-point
fix: restore main_sync entry point in workspace/main.py
2026-04-27 10:54:44 +00:00
4c839cb306
Merge pull request #2164 from Molecule-AI/test/unblock-cp-provision-broadcast-test
test(provisioner): unblock TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast (#1814)
2026-04-27 10:54:44 +00:00
Hongming Wang
3df5867b56 fix: restore main_sync entry point in workspace/main.py
The wheel's pyproject.toml has declared
`molecule-runtime = "molecule_runtime.main:main_sync"` since the
publish pipeline was created on 2026-04-26, but the function
itself was never present in workspace/main.py — it lived in the
pre-monorepo molecule-ai-workspace-runtime repo and was lost
during the consolidation that made workspace/ the source of truth.

The 0.1.15 wheel still had main_sync from a leftover snapshot,
so the regression went unnoticed until 0.1.16 (the first wheel
built from the new source-of-truth) shipped. Symptom: every
workspace container restart loops with

  ImportError: cannot import name 'main_sync' from 'molecule_runtime.main'

— the molecule-runtime CLI script's first line tries to import
the missing symbol. Workspaces stay in `provisioning` until the
10-min sweep marks them failed.

Caught by .github/workflows/runtime-pin-compat.yml, which already
imports the symbol by name as its smoke test. (That check kept
failing red on every recent merge_group run; this PR fixes the
underlying symbol-not-found instead of the smoke step.)

Also strengthens publish-runtime.yml's wheel smoke from
`import molecule_runtime.main` (loads the module — passes even
when entry-point target is missing) to `from molecule_runtime.main
import main_sync` (the actual contract the CLI script needs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 03:35:49 -07:00
Hongming Wang
e15d1182cd test(provisioner): unblock TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast (#1814)
The skipped test exists to assert that provisionWorkspaceCP never
leaks err.Error() in WORKSPACE_PROVISION_FAILED broadcasts (regression
guard for #1206). Writing the test body required substituting a
failing CPProvisioner — but the handler's `cpProv` field was the
concrete *CPProvisioner type, so a mock had nowhere to plug in.

Refactor:
  - Add provisioner.CPProvisionerAPI interface with the 3 methods
    handlers actually call (Start, Stop, GetConsoleOutput)
  - Compile-time assertion `var _ CPProvisionerAPI = (*CPProvisioner)(nil)`
    catches future method-signature drift at build time
  - WorkspaceHandler.cpProv narrowed to the interface; SetCPProvisioner
    accepts the interface (production caller passes *CPProvisioner
    from NewCPProvisioner unchanged)

Test:
  - stubFailingCPProv whose Start returns a deliberately leaky error
    (machine_type=t3.large, ami=…, vpc=…, raw HTTP body fragment)
  - Drive provisionWorkspaceCP via the cpProv.Start failure path
  - Assert broadcast["error"] == "provisioning failed" (canned)
  - Assert no leak markers (machine type, AMI, VPC, subnet, HTTP
    body, raw error head) in any broadcast string value
  - Stop/GetConsoleOutput on the stub panic — flags a future
    regression that reaches into them on this path

Verification:
  - Full workspace-server test suite passes (interface refactor
    is non-breaking; production caller path unchanged)
  - go build ./... clean
  - The other skipped test in this file (TestResolveAndStage_…)
    is a separate plugins.Registry refactor and remains skipped

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 03:28:25 -07:00
Hongming Wang
5022a740e1
Merge pull request #2163 from Molecule-AI/fix/build-script-drift-gate-and-main-smoke
fix(release): drift-gate TOP_LEVEL_MODULES + smoke-import main (post-0.1.16 incident)
2026-04-27 10:22:06 +00:00
Hongming Wang
c68dc1877f fix(release): drift-gate TOP_LEVEL_MODULES + smoke-import main in publish
Two compounding bugs surfaced when 0.1.16 hit production today:

1. scripts/build_runtime_package.py had a hand-curated TOP_LEVEL_MODULES
   set listing every workspace/*.py that should get its bare imports
   rewritten to `molecule_runtime.X`. The set silently went stale:
   - Missing: transcript_auth (added since #87 phase 1c), runtime_wedge,
     watcher → unrewritten imports shipped, every workspace startup
     died with ModuleNotFoundError.
   - Stale: claude_sdk_executor, cli_executor (both removed in #87),
     hermes_executor (never existed) → harmless but misleading.

2. publish-runtime.yml's wheel-smoke step asserted on stable invariants
   (BaseAdapter, AdapterConfig, a2a_client error sentinel) but never
   imported main. So even though main.py held the broken bare
   `from transcript_auth import ...`, the smoke check passed.

Fixes:

- Build script now derives the on-disk module set from workspace/*.py
  and asserts it matches TOP_LEVEL_MODULES exactly. Drift in either
  direction fails the build with a specific diff message instead of
  shipping a broken wheel. Closed-list typo guard preserved (we still
  edit the set explicitly when a module is added/removed) — the gate
  just makes drift impossible to ignore.

- TOP_LEVEL_MODULES updated to current reality: drop the 3 stale,
  add the 3 missing.

- publish-runtime.yml wheel-smoke now `import molecule_runtime.main`
  before the invariant asserts. main is the entry point and
  transitively imports every module — any bare-import bug surfaces
  as ModuleNotFoundError before PyPI accepts the upload.

Tested locally: `python3 scripts/build_runtime_package.py
--version 0.1.99 --out /tmp/build-test` succeeds, and
/tmp/build-test/molecule_runtime/main.py contains the rewritten
`from molecule_runtime.transcript_auth import ...`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 03:19:17 -07:00
Hongming Wang
6f0774c708
Merge pull request #2162 from Molecule-AI/fix/e2e-sanity-rc-normalization
fix(e2e-sanity): normalize unexpected curl exit codes in cleanup trap (#2159)
2026-04-27 10:05:14 +00:00
Hongming Wang
99fb61bb8c fix(e2e-sanity): normalize unexpected curl exit codes in cleanup trap (#2159)
When E2E_INTENTIONAL_FAILURE=1 poisons the tenant token, step 5/11's
`tenant_call POST /workspaces` curl exits 22 (HTTP error under
--fail-with-body). `set -e` propagates rc=22 directly, but the
script's documented contract emits only {0,1,2,3,4}, and the sanity
workflow's case statement only matches those. rc=22 falls through
to "Unexpected rc — investigate harness" and opens a false-positive
priority-high "safety net broken" issue (#2159, weekly run on
2026-04-27).

The trap now captures $? at entry (must be the first statement
before any command clobbers it) and at the end normalizes any
non-contract code to 1 (generic failure). Leak detection continues
to exit 4 directly, so its semantics are preserved.

Adds tests/e2e/test_harness_rc_normalization.sh — a self-contained
regression test that builds a stub harness with the same trap
pattern, triggers controlled exit codes, and asserts the
normalization. Covers the 5 contracted codes + curl-22 (the bug) +
3 representative network-failure codes + sigsegv-139.

Verification:
  - 10/10 regression tests pass
  - shellcheck clean on both modified files
  - production teardown path unchanged for legitimate {1,2,3,4}
    failures and the leak-detection exit 4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:55:44 -07:00
c3d29941b8
Merge pull request #2161 from Molecule-AI/feat/auto-publish-runtime-on-staging
feat(publish-runtime): auto-publish to PyPI on staging pushes touching workspace/
2026-04-27 09:20:12 +00:00
Hongming Wang
7d872f9661
Merge pull request #2160 from Molecule-AI/feat/skill-runtime-compat
feat(skills): per-skill runtime compatibility (#119)
2026-04-27 09:15:01 +00:00
Hongming Wang
0a455b7d71 feat(publish-runtime): auto-publish to PyPI on staging pushes that touch workspace/
Adds a third trigger so any merge to staging that changes workspace/**
auto-publishes a new molecule-ai-workspace-runtime patch release. Closes
the human-in-loop gap that caused tonight's RuntimeCapabilities
ImportError outage.

Tonight: #117 added RuntimeCapabilities to molecule_runtime.adapters.base.
The merge landed at 02:37 UTC. Templates rebuilt their images at 07:37
UTC (4 hours later) and started importing the new symbol. PyPI was
still serving 0.1.15 (pre-#117) because nobody remembered to push a
runtime-vX.Y.Z tag or workflow_dispatch the publish. Result: every
template image shipped tonight runs `from molecule_runtime.adapters.base
import RuntimeCapabilities` against an installed runtime that doesn't
export it -> ImportError -> workspace never registers -> stuck in
provisioning until 10-min sweep.

Mechanism:
- New trigger: push to staging filtered to paths: ['workspace/**'].
  Path filter applies only to branch pushes; the existing tag trigger
  still fires unconditionally.
- Version derivation for the auto case: query PyPI's JSON API for
  current latest, bump the patch component. PyPI is the source of
  truth so concurrent runs don't double-publish (HTTP 400 on collision).
- concurrency: group serializes parallel staging merges so they don't
  race on the bump computation. cancel-in-progress: false because each
  workspace/** change deserves its own release.
- publish job now exposes its derived version as a job-level output so
  the cascade reads it cleanly. Fixes a latent bug: cascade tried to
  read steps.version.outputs.version, which is from a different job's
  scope and silently resolved to empty -- then re-derived from
  GITHUB_REF_NAME, which would have been "staging" under the new
  trigger and produced an invalid version.

Tag-driven and manual-dispatch paths are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:11:45 -07:00
Hongming Wang
d19d35f6b3 test(skills): make watcher test fakes accept current_runtime kwarg
The runtime-compat change in this branch added a `current_runtime`
kwarg to load_skills(); the watcher passes it through. Test mocks
that pre-date the kwarg signature broke with TypeError, which the
watcher's reload-error try/except swallowed — the symptom was empty
callback lists, not a clear failure.

Switching the fakes to accept **kwargs keeps them forward-compat for
future load_skills additions without another test churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:04:26 -07:00
Hongming Wang
d0057912d2 feat(skills): per-skill runtime compatibility (#119, hermes pattern)
SKILL.md frontmatter can now declare `runtime: [claude-code]` or
`runtime: [hermes, claude-code]` to opt out of incompatible adapters
instead of failing at first invocation. Default `["*"]` means universal —
existing skill libraries need zero migration.

Borrowed from hermes' declarative skill-compat pattern surfaced in the
hermes architecture survey. The remaining two patterns (event-log
layer, observability config block) stay open under #119.

Wiring:
- SkillMetadata.runtime: list[str] = ["*"]
- _normalize_runtime_field accepts list, string-sugar, missing -> ["*"];
  malformed warns and falls back to universal so a typo never silently
  drops a skill.
- load_skills(..., current_runtime=...) filters out skills whose runtime
  list lacks "*" or current_runtime, with an INFO log line.
- BaseAdapter.start passes type(self).name() so the live adapter drives
  the filter; SkillsWatcher takes the same kwarg so hot-reload honors it.

8 new tests cover default universal, no-field universal, explicit
match/mismatch, string sugar, wildcard short-circuit, current_runtime=None
(preserves old behavior), and malformed-warns-not-drops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:57:43 -07:00
Hongming Wang
e99f937630
Merge pull request #2157 from Molecule-AI/chore/drop-cli-executor-from-runtime
chore(workspace): drop cli_executor — Phase 3 of #87 [DRAFT]
2026-04-27 08:24:30 +00:00
Hongming Wang
4959c37040
Merge pull request #2158 from Molecule-AI/feat/steer-agent-to-attachments-field
feat(tools): tighten send_message_to_user description to forbid pasting URLs in body
2026-04-27 08:24:02 +00:00
Hongming Wang
98ca5c50fa chore(workspace): drop cli_executor — Phase 3 of #87 (DRAFT, blocked on gemini-cli image rebuild)
DRAFT — do NOT merge until gemini-cli template image rebuilds with
its local cli_executor.py copy (template PR #9 just merged at
07:59 UTC; image build kicks off now).

Final adapter-specific deletion from molecule-runtime, completing #87
for the priority adapters (claude-code via PR #2156, plus gemini-cli
via this PR + template #9).

Deletes:
  - workspace/cli_executor.py (461 LOC) — CLIAgentExecutor + the
    RUNTIME_PRESETS dict for codex / ollama / gemini-cli. The file
    moved to molecule-ai-workspace-template-gemini-cli (PR #9, merged).
  - workspace/tests/test_agent_base_urls.py — only consumer of
    CLIAgentExecutor in the test suite. Tests for the executor
    behavior live in the template repo now.

Updates:
  - workspace/tests/test_executor_helpers.py — docstring refresh:
    executor_helpers.py is the runtime-agnostic shared helpers; the
    executor classes themselves live in template repos post-#87.

Codex / ollama presets disappear naturally with the file. They never
had template repos, so no production path could invoke them anyway —
this is dead-code removal as a side effect of the move.

Verified-safe-to-delete:
  - heartbeat.py: doesn't import cli_executor
  - claude_sdk_executor.py: deleted by PR #2156 (in flight)
  - preflight.py: only references runtime names by string; no import
  - main.py: doesn't import cli_executor (uses adapter discovery via
    ADAPTER_MODULE; the template's adapter constructs the executor)
  - Only test_agent_base_urls.py + test_executor_helpers.py docstring
    referenced cli_executor

Verification:
  - 1249/1249 workspace pytest pass (was 1251; -2 = test_agent_base_urls.py
    cases — exact match)
  - No live import of cli_executor anywhere in molecule-core after deletion
    (grep verified)

Sequencing:
  1.  Template PR #9 (gemini-cli local copy) — MERGED
  2.  Template image rebuild — running
  3. THIS PR — wait until image is published, then mark ready-for-review

Closes #87 for the priority adapters: workspace/ is now adapter-
agnostic except for adapter discovery (ADAPTER_MODULE) + the
runtime_wedge primitive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:22:39 -07:00
Hongming Wang
7504aba934 feat(tools): tighten send_message_to_user description to forbid pasting URLs in body
Root-cause fix for #118 (chat attachments rendering as plain text links
instead of download chips). User flagged with screenshot 2026-04-26
showing the Design Director agent pasting https://files.catbox.moe/…
in the message body — chat rendered the URL as plain markdown text,
unclickable in the canvas's bubble layout, and unreachable in any SaaS
deployment where the user's browser can't egress to catbox.

The structured `attachments` field already exists, the canvas's
AttachmentChip already renders well, the WebSocket broadcast already
carries attachments verbatim — the missing piece was the LLM choosing
the body over the structured field. Tighten the tool description so it
trains the right behavior.

Three targeted strengthenings:

  1. Top-level tool description: enumerated use case (4) now reads
     "via the `attachments` field (NEVER paste file URLs in `message`)".
     The all-caps NEVER + the explicit field name move the LLM toward
     the structured path on first read.

  2. `message` param: adds an explicit DO NOT rule with rationale.
     Includes the SaaS-reachability reason so operators can grep for
     "SaaS" and find this design constraint instead of re-discovering it
     after a tenant complaint. Calls out catbox.moe + file:// by name as
     concrete examples of forbidden hosts (those are the two we've seen
     in production).

  3. `attachments` param: leads with REQUIRED, lists the bad
     alternatives explicitly (pasting URLs, base64-encoding, telling
     user to look at a path). LLMs handle "use X, NOT Y" framings
     better than "use X" alone — observed during prompt-engineering
     iteration on hermes' tool descriptions.

Tests pin all three load-bearing phrases (4 new in test_a2a_mcp_server.py)
so a future doc edit that softens or drops them fails CI. Brittle by
design — these are prompt-engineering invariants, not implementation
details.

This is the root-cause fix. A defensive canvas-side backstop (auto-
detect download-shaped URLs in body and convert to chips) is a
follow-up that could land separately if the steering proves
insufficient in practice.

Verification:
  - 1190/1190 workspace pytest pass
  - 4 new test_a2a_mcp_server.py cases all green

Closes the steering half of #118. The structured-attachments-only
contract was already enforced server-side (PR #2130 added per-attachment
validation); this PR closes the prompt-side gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:13:11 -07:00
Hongming Wang
4e6030d783
Merge pull request #2156 from Molecule-AI/chore/drop-claude-sdk-executor-from-runtime
chore(workspace): drop claude_sdk_executor — Phase 2 of #87
2026-04-27 08:02:51 +00:00
Hongming Wang
2fbf6b6b27
Merge pull request #2155 from Molecule-AI/feat/preflight-runtime-discovery
feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery
2026-04-27 08:02:39 +00:00
Hongming Wang
4b5ac2ebc2 chore(workspace): drop claude_sdk_executor — Phase 2 of #87
Phase 2 of the universal-runtime refactor (task #87). Now that the
claude-code template repo ships its own claude_sdk_executor.py
(template PR #13 merged + image rebuilt at 07:36 UTC) the
molecule-runtime no longer needs to ship the file.

Deletes:
  - workspace/claude_sdk_executor.py (704 LOC)
  - workspace/tests/test_claude_sdk_executor.py (~1.6K LOC)

Updates:
  - workspace/runtime_wedge.py — drops the "Compatibility shim" docstring
    section. The shim was time-bounded ("removed once #87 Phase 2 lands");
    this is that PR.
  - workspace/tests/test_runtime_wedge.py — drops the
    TestClaudeSdkExecutorReExportShim test class (the shim doesn't
    exist anymore so the identity assertions would fail at import).
  - workspace/tests/conftest.py — drops the claude_agent_sdk stub.
    Its only consumer was test_claude_sdk_executor.py which is gone;
    no other test imports the SDK.
  - workspace/cli_executor.py — comment refresh: claude-code template
    repo (not workspace/) is now the home for ClaudeSDKExecutor.

Verified-safe-to-delete:
  - heartbeat.py: migrated to runtime_wedge in PR #2154 (no longer
    imports from claude_sdk_executor)
  - cli_executor.py: only comments referenced claude_sdk_executor;
    its line-117 ValueError defends against accidental routing
  - tests: only test_claude_sdk_executor.py + test_runtime_wedge.py's
    shim class consumed the deleted module; both removed in this PR

Verification:
  - 1182/1182 workspace pytest pass (was 1251; -69 = exactly the
    deleted test cases — zero unexpected regressions)
  - No live import of claude_sdk_executor anywhere in molecule-core
    after deletion (grep verified)

Closes #87 for the claude-code adapter. Hermes is already template-only.
The remaining adapter-specific code in workspace/ is cli_executor.py
(codex/ollama/gemini-cli) tracked by task #122. preflight.py's
SUPPORTED_RUNTIMES static list is tracked by task #123 (PR #2155 in
flight).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:52:55 -07:00
Hongming Wang
7dba700ac3 feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery
Closes task #123 — last piece of #87 cleanup.

Pre-fix: workspace/preflight.py:11 hardcoded a tuple of "supported"
runtime names (claude-code, codex, ollama, langgraph, etc.). Every
new template repo required a code change in molecule-runtime to be
recognized — direct violation of the universal-runtime principle
(#87) where adapters declare themselves and the runtime stays generic.

Post-fix: discovery-based validation via the same ADAPTER_MODULE env
var that production load paths already consult
(workspace/adapters/__init__.py:get_adapter). Distinguished failure
modes so operator messages are concrete:

  - ADAPTER_MODULE unset → "no adapter installed; set the env var"
  - ADAPTER_MODULE set but module won't import → import error type +
    message
  - module imports but no Adapter class → "convention violation, add
    `Adapter = YourClass`"
  - Adapter.name() raises → caught with operator message
  - Adapter.name() returns non-string → contract violation message
  - Adapter.name() doesn't match config.runtime → drift WARNING (not
    fatal; the adapter wins in production, config.yaml is just
    documentation)

The drift case is the one behavioral change worth calling out: the
prior static-list path would have hard-failed config.runtime values
not in the allowlist. With discovery, an unknown runtime in
config.yaml is just a documentation drift — the adapter that's
actually installed runs regardless. Operator gets a warning naming
both the configured and installed names so they can fix whichever
is stale.

Tests:
  - Replaces the obsolete "static list pass/fail" tests with 6 new
    cases covering each distinguished failure mode, plus a positive
    test for the adapter-matches-config happy path
  - Adds an autouse `_default_langgraph_adapter` fixture that
    pre-installs a fake adapter via sys.modules monkey-patching, so
    existing tests building default WorkspaceConfig (runtime="langgraph")
    inherit a valid adapter without each test setting ADAPTER_MODULE
  - Failure-mode tests opt out of the default fixture via
    @pytest.mark.no_default_adapter (registered in pytest.ini)
  - Sentinel pattern (`_UNSET = object()`) for `name_returns` so None
    is a passable test value (otherwise `is not None` would skip the
    None branch — exact bug the sentinel avoids)

Verification:
  - 22/22 preflight tests pass (was 16; +6 new failure-path tests)
  - 1256/1256 workspace pytest pass (was 1251; +5 net)
  - No production code path other than preflight changed

Source: 2026-04-27 #87 cleanup audit after PR #2154 (wedge extraction).
This change is independent of the cli_executor.py template moves
(task #122) — completes one of the two remaining cleanup items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:44:51 -07:00
Hongming Wang
66b9c04057
Merge pull request #2154 from Molecule-AI/refactor/extract-wedge-state-from-claude-sdk
refactor(wedge): extract claude_sdk_executor wedge state into runtime_wedge module
2026-04-27 07:22:20 +00:00
Hongming Wang
5e049244d6 refactor(wedge): mark re-exports explicit via __all__
Addresses github-code-quality unused-import flag on the runtime_wedge
re-export shim.  Adds __all__ listing the names that exist purely for
backwards-compat (is_wedged / wedge_reason / _reset_sdk_wedge_for_test)
so static analysis recognizes the imports as deliberate exports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:20:23 -07:00
Hongming Wang
feb544938b refactor(wedge): address review feedback — class wrap + import-path doc + dedupe shim rationale
Three changes from /code-review-and-quality on PR #2154:

1. Optional (architecture): wrap state in a private _WedgeState class
   instead of bare module-level globals. Public API (mark_wedged /
   clear_wedge / is_wedged / wedge_reason / reset_for_test) is
   unchanged — adapters never see the class. The class is forward-cover
   for any future per-scope variant (multiple executors per process, a
   keyed registry, etc.) without churning the call sites. Today there's
   exactly one instance (_DEFAULT) so behavior is identical.

2. Optional (readability): clarify the import path in the integration
   recipe — in a TEMPLATE repo it's `from molecule_runtime.runtime_wedge`
   (PyPI package); in molecule-core itself it's `from runtime_wedge`
   (top-level module). Removes the trap where a contributor reading the
   docstring while editing in-repo copies the template-style import and
   gets ImportError.

3. Nit (readability): dedupe the shim rationale. claude_sdk_executor's
   re-export comment now points to runtime_wedge's "Compatibility shim"
   section as the source of truth instead of restating the same content.
   Avoids docs-in-two-places drift risk.

Verification:
  - 1251/1251 workspace pytest pass (no behavior change — class wrap
    is pure plumbing; module-level helpers delegate to the singleton)
  - All shim re-export identity tests still pass (the shim's
    `is_wedged is runtime_wedge.is_wedged` assertion holds because we
    re-export the SAME function object that delegates to _DEFAULT)

No new tests needed — the existing test suite covers the public API
contract; the class is an implementation detail behind that contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:16:33 -07:00
Hongming Wang
cd899c969f docs(wedge): integration recipe for adapters that want to flip-to-degraded
Doc-only follow-up to the wedge-state extraction. Adds proactive
guidance so the next adapter (hermes / codex / langgraph / a future
template) discovers the runtime_wedge primitive and integrates the
~6 LOC pattern uniformly instead of inventing its own wedge state.

Two additions:

  - workspace/runtime_wedge.py — new "How to use from a NEW adapter"
    section in the module docstring with the minimum viable
    integration recipe, what-you-get-for-free list, and explicit
    DON'TS (don't store local wedge state, don't mark for transient
    errors, don't write your own clear logic). Plus a "when wedge is
    the WRONG primitive" note to keep adopters from over-using it.

  - workspace/adapter_base.py — adds runtime_wedge to the
    "Cross-cutting capabilities your adapter can opt into" list in
    BaseAdapter's docstring (alongside capabilities() and
    idle_timeout_override()). Discoverability path: adapter author
    reads BaseAdapter docstring → sees runtime_wedge mention → reads
    runtime_wedge module docstring → has the recipe.

Also tightens the "to add a new agent infra" steps in BaseAdapter to
match the actual current model (standalone template repo + ADAPTER_MODULE
env var) rather than the obsolete workspace/adapters/<infra>/ layout
that hasn't been the path since the universal-runtime extraction
started.

Zero code change. Tests untouched (1251/1251 still pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:12:14 -07:00
Hongming Wang
1d231ed295 refactor(wedge): extract claude_sdk_executor wedge state into runtime_wedge module
Prerequisite for the universal-runtime refactor (task #87) to move
claude_sdk_executor.py out of molecule-runtime into the claude-code
template repo. heartbeat.py had a hard import:

    from claude_sdk_executor import is_wedged, wedge_reason

which would break the moment the executor moves out of the runtime
package — the heartbeat would lose access to the wedge state used to
flip workspace status to degraded.

Extract the wedge state to a runtime-side module that the heartbeat
can keep importing regardless of which adapter executor is wedged:

  - workspace/runtime_wedge.py — single-flag state + mark_wedged /
    clear_wedge / is_wedged / wedge_reason / reset_for_test. Same
    semantics as the original claude_sdk_executor implementation
    (sticky first-write-wins, auto-clear on observed success). 100
    LOC of pure stateless helpers; lock-free ok because there's one
    executor per workspace process today.

  - workspace/claude_sdk_executor.py — drops the in-file definitions;
    re-exports the same names from runtime_wedge as a backwards-compat
    shim. Any third-party adapter that imported is_wedged / wedge_reason
    / _mark_sdk_wedged from claude_sdk_executor keeps working for one
    release cycle while they migrate to runtime_wedge.

  - workspace/heartbeat.py — _runtime_state_payload() now imports
    from runtime_wedge instead of claude_sdk_executor. Lazy-import
    pattern preserved; the docstring updated to explain the new
    cross-cutting source-of-truth.

Tests (10 new in test_runtime_wedge.py):
  - Default state (unwedged), mark sets flag, first-write-wins,
    clear restores healthy, clear-when-not-wedged is no-op,
    re-marking after clear is allowed
  - Re-export shim: each old name in claude_sdk_executor IS the
    runtime_wedge function (identity check), state is shared
    (marking via the executor shim is observable via runtime_wedge
    and vice versa)

Verification:
  - 1251/1251 workspace pytest pass (was 1241 after orphan deletion;
    +10 = exactly the new test_runtime_wedge.py cases)
  - All existing test_claude_sdk_executor.py cases (which call
    _mark_sdk_wedged via the shim) still pass

After this lands + the claude-code template image rebuilds with the
local claude_sdk_executor.py copy (template PR #13), the molecule-
core deletion of workspace/claude_sdk_executor.py becomes safe (the
shim deletion comes alongside the file deletion, since runtime_wedge
is the new public API).

See project memory `project_runtime_native_pluggable.md`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:08:53 -07:00
Hongming Wang
c1e9aa7461
Merge pull request #2153 from Molecule-AI/fix/block-internal-paths-shallow-clone-bug
fix(ci): block-internal-paths handle merge_group + shallow-clone BASE
2026-04-27 06:58:32 +00:00
5d49cd7843
Merge pull request #2152 from Molecule-AI/chore/delete-orphan-hermes-executor
chore(workspace): delete orphan HermesA2AExecutor (-1.8K LOC dead code)
2026-04-27 06:58:21 +00:00
Hongming Wang
d46d558ca9
Merge pull request #2148 from Molecule-AI/test/canvas-lib-utils-runtime-names-1815
test(canvas): cover utils.cn + runtime-names.runtimeDisplayName (0% → 100%) (#1815)
2026-04-27 06:57:57 +00:00
Hongming Wang
a682dcb502
Merge pull request #2149 from Molecule-AI/test/canvas-actions-1815
test(canvas): cover canvas-actions restart-pending helpers (25% → 100%) (#1815)
2026-04-27 06:55:36 +00:00
Hongming Wang
17a6800374
Merge pull request #2150 from Molecule-AI/feat/priority-runtimes-e2e
test(e2e): claude-code + hermes priority-runtimes happy path
2026-04-27 06:55:20 +00:00
Hongming Wang
ae029f8c3f
Merge pull request #2151 from Molecule-AI/test/canvas-class-names-1815
test(canvas): cover store/classNames helpers (17% → 100%) (#1815)
2026-04-27 06:54:37 +00:00
Hongming Wang
516b58dcd7
Merge pull request #2147 from Molecule-AI/feat/canvas-coverage-instrumentation-1815
feat(canvas): vitest coverage instrumentation (#1815, no CI gate yet)
2026-04-27 06:54:22 +00:00
Hongming Wang
7ac7a010fa fix(ci): block-internal-paths handle merge_group + shallow-clone BASE
[Molecule-Platform-Evolvement-Manager]

## What was broken

Same bug class as the secret-scan.yml fix in #2120 — block-internal-paths
hit `fatal: bad object <sha>` exit 128 on the staging push at
2026-04-27 06:50:33Z.

Two cases:

1. **`merge_group` events**: BASE/HEAD came from
   `github.event.before` / `.after` which are push-event-only
   properties. On merge_group both came back empty, the script fell
   through to "scan entire tree" mode which is correct but
   inefficient. Worse, when this workflow is required for the merge
   queue (line 21-22), an empty-BASE entire-tree scan would run on
   every queue check.

2. **`push` events with shallow clones**: `fetch-depth: 2` doesn't
   always cover BASE across true merge commits. When BASE is in the
   payload but absent from the local object DB, `git diff` errors out
   with `fatal: bad object <sha>` and the job exits 128. This is what
   broke today's staging push.

## Fix

Same shape as the secret-scan.yml fix (#2120):

- Add a dedicated `git fetch` step for `merge_group.base_sha`.
- Move event-specific SHAs into a step `env:` block; script uses a
  `case` over `${{ github.event_name }}` covering pull_request /
  merge_group / push (rather than `if pull_request / else push`
  which left merge_group on the empty-BASE branch).
- On-demand fetch + `git cat-file -e` guard for push BASE so a SHA
  that's payload-present-but-DB-absent triggers the fetch, and a
  fetch failure falls through cleanly to "scan entire tree" instead
  of exiting 128.

## Test plan

- [x] YAML structure preserved (no schema changes)
- [x] Bash logic mirrors the secret-scan recovery path tested in #2120
- [ ] CI green on this PR's pull_request scan + push to staging post-merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:54:00 -07:00