Commit Graph

3220 Commits

Author SHA1 Message Date
Hongming Wang
b96f99da0f
Merge pull request #2175 from Molecule-AI/deps/docker-v28.5.2-ghsa-x4rx-4gw3-53p4
deps(docker): bump docker/docker v28.2.2 → v28.5.2 (GHSA-x4rx-4gw3-53p4, medium)
2026-04-27 13:42:29 +00:00
Hongming Wang
182de6f2b3
Merge pull request #2176 from Molecule-AI/feat/pr-guards-caller
ci: add pr-guards caller (disable auto-merge on push)
2026-04-27 13:42:17 +00:00
Hongming Wang
82b366fce5 ci: add pr-guards caller that disables auto-merge on push
Thin caller for molecule-ci's reusable disable-auto-merge-on-push
workflow. Forces operator re-engagement when a commit is pushed to
an open PR with auto-merge already enabled.

Pairs with the org-wide "Automatically delete head branches" repo
setting (also enabled today). Defense in depth:

1. Repo setting blocks pushes to a merged-and-deleted branch
   (post-merge orphan case — what bit #2174 today: my second
   commit landed on an already-merged-and-deleted branch).
2. This workflow catches in-queue races (push lands while the
   merge queue is processing) by disabling auto-merge so the
   operator must explicitly re-engage.

Together they cover the full lifecycle of "auto-merge enabled →
new commits arrive" without relying on operator discipline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:39:31 -07:00
Hongming Wang
394dda2a4a deps(docker): bump docker/docker v28.2.2 → v28.5.2 (GHSA-x4rx-4gw3-53p4)
Closes the medium-severity dependabot alert #7 on workspace-server's
docker pin: "Moby firewalld reload makes published container ports
accessible from remote hosts" — fixed in v28.3.3, pulling v28.5.2
(latest in the v28 line).

Patch+minor bump within the v28 train; no client-API breaks
(workspace-server only uses docker.Client for container exec /
inspect, all stable since v20+).

Verification: full workspace-server test suite passes (18/18 packages
clean). Build clean.

Out of scope:
  - Alerts #10 and #11 (the AuthZ bypass + plugin-priv off-by-one)
    require v29.3.1, which is not yet published to the Go module
    proxy (latest published is v28.5.2). They'll close in a follow-up
    PR once v29 lands as a Go module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:26:53 -07:00
Hongming Wang
a354ae2feb
Merge pull request #2174 from Molecule-AI/fix/lib-subpackage-and-drift-gate
fix(build): ship lib/ subpackage + extend drift gate to SUBPACKAGES
2026-04-27 13:07:00 +00:00
Hongming Wang
6e732ab714 fix(build): ship lib/ subpackage + extend drift gate to SUBPACKAGES
Two compounding bugs that bit hermes (and any other workspace that
reaches main.py:142):

1. workspace/lib/ was in EXCLUDE_DIRS so the published wheel didn't
   contain the directory at all. main.py imports `from lib.pre_stop
   import read_snapshot` (and `build_snapshot`, `write_snapshot`) so
   every workspace startup that reaches the snapshot path crashed
   with `ModuleNotFoundError: No module named 'lib'`.

2. Even if lib/ had shipped, `lib` wasn't in SUBPACKAGES so the
   import-rewriter would have left the bare `from lib.pre_stop`
   unqualified — it would still fail because the package would only
   be reachable as `molecule_runtime.lib`.

Fix: move `lib` from EXCLUDE_DIRS to SUBPACKAGES (one entry each).
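
The import rewrite that SUBPACKAGES feeds can be sketched roughly as below. This is a minimal illustration and not the actual logic in scripts/build_runtime_package.py: the function name `rewrite_imports`, the regex, and the sample set contents are assumptions; only the `molecule_runtime` prefix and the `lib` / `transcript_auth` names come from the commit messages.

```python
import re

# Hypothetical contents; the real sets live in scripts/build_runtime_package.py.
SUBPACKAGES = {"lib"}
TOP_LEVEL_MODULES = {"transcript_auth"}
PACKAGE = "molecule_runtime"


def rewrite_imports(source: str) -> str:
    """Qualify bare imports of known modules and subpackages with the package name."""
    names = "|".join(sorted(map(re.escape, SUBPACKAGES | TOP_LEVEL_MODULES)))
    # Matches "from lib.pre_stop import ..." or "from transcript_auth import ..."
    # at the start of a line. Names outside the two sets are left untouched.
    pattern = re.compile(
        rf"^([ \t]*from\s+)({names})(\.[\w.]+)?(\s+import\b)", re.MULTILINE
    )
    return pattern.sub(rf"\1{PACKAGE}.\2\3\4", source)
```

With `lib` absent from SUBPACKAGES, the same call would leave `from lib.pre_stop import ...` bare, which is the second bug described above.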

Drift gate extension: the existing gate I added in #2163 only
asserted TOP_LEVEL_MODULES against workspace/*.py. This change adds
the symmetric assertion for SUBPACKAGES against workspace/<dir>/
(filtered by EXCLUDE_DIRS + presence of __init__.py). Catches three cases:
- Subpackage added to workspace/ but missed in SUBPACKAGES
- Subpackage missing from workspace/ but lingering in SUBPACKAGES
- Subpackage wrongly in EXCLUDE_DIRS while also referenced by
  rewritten imports (the lib case)

Tested locally: build of 0.1.99 now ships lib/ and main.py contains
`from molecule_runtime.lib.pre_stop import ...` correctly rewritten.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:03:46 -07:00
Hongming Wang
1100c50da8
Merge pull request #2172 from Molecule-AI/feat/e2e-cover-all-8-runtimes
feat(e2e): extend priority-runtimes test to cover all 8 templates
2026-04-27 13:00:43 +00:00
Hongming Wang
c7478af99f feat(e2e): extend priority-runtimes test to cover all 8 templates
Tonight's wire-real E2E sweep exposed 12+ root causes across the post-
#87 template extraction. Most would have been caught by an actual
provision-and-online test running on each template — but the test only
covered claude-code + hermes. Extending it to cover all 8 ensures any
future regression in any template fails the test, not production.

What's added:
- run_openai_runtime(runtime, label): generic provisioner for the 5
  OpenAI-backed templates (langgraph, crewai, autogen, deepagents,
  openclaw). Same shape as run_hermes minus the HERMES_* config block
  that hermes-agent needs.
- run_gemini_cli: separate function — gemini-cli wants a Google AI
  key (E2E_GEMINI_API_KEY), not OpenAI.
- Each new runtime registered in the dispatch loop. New `all` keyword
  for E2E_RUNTIMES runs every covered runtime.

claude-code + hermes keep their dedicated functions; both have unique
provisioning quirks (claude-code OAuth + claude-code-specific volume
mounts; hermes 15-min cold-boot) that don't generalize cleanly.

Skip-if-no-key pattern matches the existing one — partially-keyed CI
gets clean skips, not false-fails.

Usage:
  E2E_OPENAI_API_KEY=... E2E_RUNTIMES=langgraph     ./test_priority_runtimes_e2e.sh
  E2E_OPENAI_API_KEY=... E2E_RUNTIMES=all           ./test_priority_runtimes_e2e.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:57:59 -07:00
Hongming Wang
1a2ddb4539
Merge pull request #2171 from Molecule-AI/deps/jwt-go-v5.2.2-cve-2025-30204
deps(jwt): bump golang-jwt/jwt/v5 v5.2.1 → v5.2.2 (CVE-2025-30204, HIGH)
2026-04-27 12:44:54 +00:00
Hongming Wang
e63c3b2044
Merge pull request #2170 from Molecule-AI/fix/a2a-executor-sdk-migration
fix(a2a_executor): migrate to a2a-sdk 1.x API
2026-04-27 12:44:42 +00:00
Hongming Wang
041d255091
Merge pull request #2168 from Molecule-AI/ops/audit-railway-sha-pins
ops: add Railway SHA-pin drift audit script + regression test (#2001)
2026-04-27 12:44:31 +00:00
Hongming Wang
5b05d663ee test: update a2a.helpers mock to export new_text_message
The conftest mock only exposed `new_agent_text_message`, the pre-v1
name. After fixing a2a_executor.py to use the v1 name
`new_text_message`, the mock didn't satisfy the import → CI red.

Mock both names (aliased to the same lambda) so any in-flight test
that still references the old name keeps working until the next
sweep removes those references.
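
A dual-name mock of this kind might look like the sketch below. The helper `install_a2a_helpers_mock` and the dict payload returned by the lambda are assumptions made for illustration; the real conftest mock and the real a2a-sdk message type may differ.

```python
import sys
import types


def install_a2a_helpers_mock() -> types.ModuleType:
    """Register a stand-in a2a.helpers module exposing both helper names."""
    # Hypothetical payload shape; only the two function names come from the SDK rename.
    make_message = lambda text, **kwargs: {"text": text, **kwargs}

    helpers = types.ModuleType("a2a.helpers")
    helpers.new_text_message = make_message        # v1 name
    helpers.new_agent_text_message = make_message  # pre-v1 name, kept as an alias

    package = types.ModuleType("a2a")
    package.helpers = helpers
    sys.modules["a2a"] = package
    sys.modules["a2a.helpers"] = helpers
    return helpers
```

Because both attributes point at the same callable, code importing either name gets identical behavior.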

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:34:28 -07:00
Hongming Wang
86bdfa3b47 deps(jwt): bump golang-jwt/jwt/v5 v5.2.1 → v5.2.2 (CVE-2025-30204)
Closes the HIGH-severity dependabot alert on workspace-server's jwt-go
pin. Upstream advisory GHSA-mh63-6h87-95cp / CVE-2025-30204:
"jwt-go allows excessive memory allocation during header parsing" —
fixed in v5.2.2.

Patch bump within the v5.x line; semver guarantees no API change. Full
workspace-server test suite passes (`go test ./...` clean across all
18 packages).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:31:58 -07:00
Hongming Wang
722e1fd175 fix(a2a_executor): migrate to a2a-sdk 1.x API — new_agent_text_message → new_text_message
a2a-sdk v1 renamed `new_agent_text_message` → `new_text_message`
(role=Role.agent is now the default). Same fix landed in the hermes
template earlier today; this is the runtime-side equivalent.

NOT dead code: a2a_executor.py is the LangGraph A2A executor, used by
the langgraph + deepagents templates. Both templates currently import
it via bare `from a2a_executor import LangGraphA2AExecutor` — which is
a separate bug in those templates, filed/fixed separately.

Symptom in a2a_executor.py form: any langgraph or deepagents workspace
that calls create_executor crashes with `ImportError: cannot import
name 'new_agent_text_message' from 'a2a.helpers'`. Doesn't surface for
claude-code or hermes (their templates use their own executors and
don't load a2a_executor).

Five call sites updated, one import line, one comment. Test suite
already passes against the new symbol — `python -c "from
molecule_runtime.a2a_executor import LangGraphA2AExecutor"` resolves
cleanly after this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:29:59 -07:00
Hongming Wang
026f5e51d9 ops: add Railway SHA-pin drift audit script + regression test (#2001)
#2000 fixed one symptom — TENANT_IMAGE pinned to `staging-a14cf86`
(10 days stale) silently no-op'd four upstream fixes on 2026-04-24.
This adds the audit pattern as a re-runnable script so the broader
class is observable on demand without new CI infrastructure.

Audit results today (2026-04-27):
  controlplane / production: 54 vars audited, 0 drift-prone pins
  controlplane / staging:    52 vars audited, 0 drift-prone pins

So the immediate audit deliverable is clean — TENANT_IMAGE is the only
known violation and #2000 already fixed it. The script makes the
ongoing audit a 5-second command instead of a manual one.

Detection regex catches:
  * branch-SHA suffixes (`staging|main|prod|production-<6+ hex>`)
    — the exact 2026-04-24 incident shape
  * version pins after `:` or `=`  (`:v1.2.3`, `=v0.1.16`)
    — same drift class, just rendered differently

Anchoring on `:` or `=` keeps prose like "version 1.2.3 of the api"
out of the false-positive set. UUIDs, ARNs, AMI IDs, secrets, and
floating tags (`:staging-latest`, `:main`) pass through untouched.
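
The two detection rules can be approximated in Python as follows. This is a sketch under assumptions, not the regex inlined in the audit script: the exact pattern, the required `v` prefix on version pins, the name `is_drift_prone`, and the registry paths in the usage note are all illustrative.

```python
import re

# Assumed shape of the two detection rules; the real regex lives in the audit script.
DRIFT_PRONE_PIN = re.compile(
    r"\b(?:staging|main|production|prod)-[0-9a-f]{6,}\b"  # branch-SHA suffix
    r"|[:=]v\d+\.\d+\.\d+"                                # version pin after ':' or '='
)


def is_drift_prone(value: str) -> bool:
    """Return True when an env-var value contains a pinned SHA or version."""
    return DRIFT_PRONE_PIN.search(value) is not None
```

Under these assumptions, `is_drift_prone("ghcr.io/org/tenant:staging-a14cf86")` is True, while `:staging-latest`, `:main`, and the prose "version 1.2.3 of the api" are not flagged.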

Regression test (tests/ops/test_audit_railway_sha_pins.sh) pins 20
representative cases — 9 should-flag (covering all four branch
prefixes + semver variants + middle-of-value matches) and 11
should-pass (the false-positive guards).  Same regex inlined in both
files so a future tweak that weakens detection fails the test in
lockstep with weakening the audit.

Both files shellcheck clean.

CI gate (acceptance criterion's "regression: add a CI check") is
deliberately scoped out — querying Railway from CI requires plumbing
RAILWAY_TOKEN as a repo secret, which is multi-step setup. The
re-runnable script + test cover the same surface today; the CI
workflow is a small follow-up once the token is provisioned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 05:01:23 -07:00
Hongming Wang
7cf77f274a
Merge pull request #2166 from Molecule-AI/test/unblock-resolveandstage-test
test(plugins): unblock TestResolveAndStage_NoInternalErrorsInHTTPErr (#1814)
2026-04-27 11:36:15 +00:00
Hongming Wang
dc2f6bd378
Merge pull request #2167 from Molecule-AI/fix/saas-federation-tutorial-409
docs(saas-federation): fix workspace-limit response code (409, not 402) (#1754)
2026-04-27 11:36:02 +00:00
Hongming Wang
3679a6eff6 docs(saas-federation): fix workspace-limit response code (409, not 402) (#1754)
Quota gates are resource-state conflicts, not payment failures —
RFC 9110 reserves 402 for billing/payment failures specifically. The
canonical Molecule-AI/docs PR #82 already shipped the corrected text;
this brings the molecule-core copy of the tutorial in line.

The inline parenthetical "(not 402 Payment Required — quota gates are
resource-state conflicts, not payment failures, per RFC 9110)" doubles
as a regression anchor: a future edit that flips 409 back to 402 would
have to also reword that explanation, making the change a deliberate
two-step act rather than a casual oversight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 04:30:46 -07:00
Hongming Wang
a0154ea0b4 test(plugins): unblock TestResolveAndStage_NoInternalErrorsInHTTPErr (#1814)
Closes the second of two skipped tests in workspace_provision_test.go
that were blocked on interface refactors. The Broadcaster + CP
provisioner halves landed in earlier #1814 cycles; this is the
plugin-source-registry half.

Refactor:
  - Add handlers.pluginSources interface with the 3 methods handler
    code actually calls (Register, Resolve, Schemes)
  - Compile-time assertion `var _ pluginSources = (*plugins.Registry)(nil)`
    catches future method-signature drift at build time
  - PluginsHandler.sources narrowed from *plugins.Registry to the
    interface; production wiring (NewPluginsHandler, WithSourceResolver)
    still passes *plugins.Registry — satisfies the interface

Production fix (#1206 leak):
  - resolveAndStage's Fetch-failure path was interpolating err.Error()
    into the HTTP response body via `failed to fetch plugin from %s: %v`.
    Resolver errors routinely contain rate-limit text, github request
    IDs, raw HTTP body fragments, and (for local resolvers) file system
    paths — none has any business landing in a user's browser.
  - Body now carries just `failed to fetch plugin from <scheme>`; the
    status code already differentiates the failure shape (404 not
    found, 504 timeout, 502 generic). Full err detail stays in the
    server-side log line one statement above.
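
The log-the-detail, return-a-canned-body pattern is sketched here in Python rather than the handler's Go. `FetchError`, `fetch_failure_response`, and the logger name are hypothetical stand-ins; only the body text and the idea that the status code carries the failure shape come from the description above.

```python
import logging

logger = logging.getLogger("plugins")


class FetchError(Exception):
    """Hypothetical resolver error carrying an HTTP-style status code."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status


def fetch_failure_response(scheme: str, err: FetchError) -> tuple[int, str]:
    """Build the client-facing (status, body) pair for a failed plugin fetch."""
    # Full detail goes to the server-side log only.
    logger.error("failed to fetch plugin from %s: %s", scheme, err)
    # The body names the scheme and nothing else; the status code
    # already distinguishes not-found, timeout, and generic failures.
    return err.status, f"failed to fetch plugin from {scheme}"
```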

Test:
  - 6 sub-tests covering every error path inside resolveAndStage:
    empty source, invalid format, unknown scheme, local
    path-traversal, unpinned github (PLUGIN_ALLOW_UNPINNED unset),
    Fetch failure with a leaky synthetic error
  - The Fetch-failure case plants 5 realistic leak markers in the
    resolver's error string (rate limit text, x-github-request-id,
    auth_token, ghp_-prefixed token, /etc/passwd path); the assertion
    fails if ANY appears in the response body
  - Table-driven so a future error path added to resolveAndStage gets
    one new row, not a copy-paste of the assertion logic

Verification:
  - 6/6 sub-tests pass
  - Full workspace-server test suite passes (interface refactor is
    non-breaking; production caller paths unchanged)
  - go build ./... clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 04:00:39 -07:00
104650941a
Merge pull request #2165 from Molecule-AI/fix/main-sync-entry-point
fix: restore main_sync entry point in workspace/main.py
2026-04-27 10:54:44 +00:00
4c839cb306
Merge pull request #2164 from Molecule-AI/test/unblock-cp-provision-broadcast-test
test(provisioner): unblock TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast (#1814)
2026-04-27 10:54:44 +00:00
Hongming Wang
3df5867b56 fix: restore main_sync entry point in workspace/main.py
The wheel's pyproject.toml has declared
`molecule-runtime = "molecule_runtime.main:main_sync"` since the
publish pipeline was created on 2026-04-26, but the function
itself was never present in workspace/main.py — it lived in the
pre-monorepo molecule-ai-workspace-runtime repo and was lost
during the consolidation that made workspace/ the source of truth.

The 0.1.15 wheel still had main_sync from a leftover snapshot,
so the regression went unnoticed until 0.1.16 (the first wheel
built from the new source-of-truth) shipped. Symptom: every
workspace container restart loops with

  ImportError: cannot import name 'main_sync' from 'molecule_runtime.main'

— the molecule-runtime CLI script's first line tries to import
the missing symbol. Workspaces stay in `provisioning` until the
10-min sweep marks them failed.

Caught by .github/workflows/runtime-pin-compat.yml, which already
imports the symbol by name as its smoke test. (That check kept
failing red on every recent merge_group run; this PR fixes the
underlying symbol-not-found instead of the smoke step.)

Also strengthens publish-runtime.yml's wheel smoke from
`import molecule_runtime.main` (loads the module — passes even
when entry-point target is missing) to `from molecule_runtime.main
import main_sync` (the actual contract the CLI script needs).
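
The difference between the two smoke checks can be reproduced with a stand-in module. In this sketch `fake_runtime` is a hypothetical package registered in `sys.modules` to play the role of a wheel whose `main` module lacks `main_sync`; it is not the real molecule_runtime package.

```python
import sys
import types

# Stand-in for a wheel whose main module lacks the entry-point function.
fake_main = types.ModuleType("fake_runtime.main")
sys.modules["fake_runtime"] = types.ModuleType("fake_runtime")
sys.modules["fake_runtime.main"] = fake_main


def weak_smoke() -> bool:
    """Loads the module only; passes even though main_sync is missing."""
    import fake_runtime.main  # noqa: F401
    return True


def strict_smoke() -> bool:
    """Imports the entry-point symbol by name; fails when it is missing."""
    try:
        from fake_runtime.main import main_sync  # noqa: F401
    except ImportError:
        return False
    return True
```

Only the strict form exercises the same lookup the generated CLI script performs.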

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 03:35:49 -07:00
Hongming Wang
e15d1182cd test(provisioner): unblock TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast (#1814)
The skipped test exists to assert that provisionWorkspaceCP never
leaks err.Error() in WORKSPACE_PROVISION_FAILED broadcasts (regression
guard for #1206). Writing the test body required substituting a
failing CPProvisioner — but the handler's `cpProv` field was the
concrete *CPProvisioner type, so a mock had nowhere to plug in.

Refactor:
  - Add provisioner.CPProvisionerAPI interface with the 3 methods
    handlers actually call (Start, Stop, GetConsoleOutput)
  - Compile-time assertion `var _ CPProvisionerAPI = (*CPProvisioner)(nil)`
    catches future method-signature drift at build time
  - WorkspaceHandler.cpProv narrowed to the interface; SetCPProvisioner
    accepts the interface (production caller passes *CPProvisioner
    from NewCPProvisioner unchanged)

Test:
  - stubFailingCPProv whose Start returns a deliberately leaky error
    (machine_type=t3.large, ami=…, vpc=…, raw HTTP body fragment)
  - Drive provisionWorkspaceCP via the cpProv.Start failure path
  - Assert broadcast["error"] == "provisioning failed" (canned)
  - Assert no leak markers (machine type, AMI, VPC, subnet, HTTP
    body, raw error head) in any broadcast string value
  - Stop/GetConsoleOutput on the stub panic — flags a future
    regression that reaches into them on this path

Verification:
  - Full workspace-server test suite passes (interface refactor
    is non-breaking; production caller path unchanged)
  - go build ./... clean
  - The other skipped test in this file (TestResolveAndStage_…)
    is a separate plugins.Registry refactor and remains skipped

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 03:28:25 -07:00
Hongming Wang
5022a740e1
Merge pull request #2163 from Molecule-AI/fix/build-script-drift-gate-and-main-smoke
fix(release): drift-gate TOP_LEVEL_MODULES + smoke-import main (post-0.1.16 incident)
2026-04-27 10:22:06 +00:00
Hongming Wang
c68dc1877f fix(release): drift-gate TOP_LEVEL_MODULES + smoke-import main in publish
Two compounding bugs surfaced when 0.1.16 hit production today:

1. scripts/build_runtime_package.py had a hand-curated TOP_LEVEL_MODULES
   set listing every workspace/*.py that should get its bare imports
   rewritten to `molecule_runtime.X`. The set silently went stale:
   - Missing: transcript_auth (added since #87 phase 1c), runtime_wedge,
     watcher → unrewritten imports shipped, every workspace startup
     died with ModuleNotFoundError.
   - Stale: claude_sdk_executor, cli_executor (both removed in #87),
     hermes_executor (never existed) → harmless but misleading.

2. publish-runtime.yml's wheel-smoke step asserted on stable invariants
   (BaseAdapter, AdapterConfig, a2a_client error sentinel) but never
   imported main. So even though main.py held the broken bare
   `from transcript_auth import ...`, the smoke check passed.

Fixes:

- Build script now derives the on-disk module set from workspace/*.py
  and asserts it matches TOP_LEVEL_MODULES exactly. Drift in either
  direction fails the build with a specific diff message instead of
  shipping a broken wheel. Closed-list typo guard preserved (we still
  edit the set explicitly when a module is added/removed) — the gate
  just makes drift impossible to ignore.

- TOP_LEVEL_MODULES updated to current reality: drop the 3 stale,
  add the 3 missing.

- publish-runtime.yml wheel-smoke now `import molecule_runtime.main`
  before the invariant asserts. main is the entry point and
  transitively imports every module — any bare-import bug surfaces
  as ModuleNotFoundError before PyPI accepts the upload.
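
A drift gate of the kind described in the first fix could be sketched as below. The function name `check_module_drift`, the message format, and the choice to skip `__init__.py` are assumptions; the actual check in scripts/build_runtime_package.py may be structured differently.

```python
from pathlib import Path


def check_module_drift(workspace_dir: Path, declared: set[str]) -> None:
    """Fail loudly when the declared module set and workspace/*.py disagree."""
    # Assumption: __init__.py is not treated as a top-level module.
    on_disk = {p.stem for p in workspace_dir.glob("*.py") if p.stem != "__init__"}
    missing = sorted(on_disk - declared)  # on disk but not declared
    stale = sorted(declared - on_disk)    # declared but no longer on disk
    if missing or stale:
        raise SystemExit(
            f"TOP_LEVEL_MODULES drift: missing={missing} stale={stale}"
        )
```

The declared set stays hand-edited; the gate only turns a silent mismatch into a build failure.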

Tested locally: `python3 scripts/build_runtime_package.py
--version 0.1.99 --out /tmp/build-test` succeeds, and
/tmp/build-test/molecule_runtime/main.py contains the rewritten
`from molecule_runtime.transcript_auth import ...`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 03:19:17 -07:00
Hongming Wang
6f0774c708
Merge pull request #2162 from Molecule-AI/fix/e2e-sanity-rc-normalization
fix(e2e-sanity): normalize unexpected curl exit codes in cleanup trap (#2159)
2026-04-27 10:05:14 +00:00
Hongming Wang
99fb61bb8c fix(e2e-sanity): normalize unexpected curl exit codes in cleanup trap (#2159)
When E2E_INTENTIONAL_FAILURE=1 poisons the tenant token, step 5/11's
`tenant_call POST /workspaces` curl exits 22 (HTTP error under
--fail-with-body). `set -e` propagates rc=22 directly, but the
script's documented contract emits only {0,1,2,3,4}, and the sanity
workflow's case statement only matches those. rc=22 falls through
to "Unexpected rc — investigate harness" and opens a false-positive
priority-high "safety net broken" issue (#2159, weekly run on
2026-04-27).

The trap now captures $? at entry (must be the first statement
before any command clobbers it) and at the end normalizes any
non-contract code to 1 (generic failure). Leak detection continues
to exit 4 directly, so its semantics are preserved.
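
The normalization rule itself is small. It is shown here as a Python function for clarity, although the real implementation is a shell trap; `CONTRACT_CODES` and `normalize_exit_code` are illustrative names.

```python
# Exit codes the harness documents as its contract.
CONTRACT_CODES = frozenset({0, 1, 2, 3, 4})


def normalize_exit_code(rc: int) -> int:
    """Map any exit code outside the documented contract to 1 (generic failure)."""
    return rc if rc in CONTRACT_CODES else 1
```

Under this rule curl's 22 and a segfault's 139 both collapse to 1, while 4 passes through unchanged.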

Adds tests/e2e/test_harness_rc_normalization.sh — a self-contained
regression test that builds a stub harness with the same trap
pattern, triggers controlled exit codes, and asserts the
normalization. Covers the 5 contracted codes + curl-22 (the bug) +
3 representative network-failure codes + sigsegv-139.

Verification:
  - 10/10 regression tests pass
  - shellcheck clean on both modified files
  - production teardown path unchanged for legitimate {1,2,3,4}
    failures and the leak-detection exit 4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:55:44 -07:00
c3d29941b8
Merge pull request #2161 from Molecule-AI/feat/auto-publish-runtime-on-staging
feat(publish-runtime): auto-publish to PyPI on staging pushes touching workspace/
2026-04-27 09:20:12 +00:00
Hongming Wang
7d872f9661
Merge pull request #2160 from Molecule-AI/feat/skill-runtime-compat
feat(skills): per-skill runtime compatibility (#119)
2026-04-27 09:15:01 +00:00
Hongming Wang
0a455b7d71 feat(publish-runtime): auto-publish to PyPI on staging pushes that touch workspace/
Adds a third trigger so any merge to staging that changes workspace/**
auto-publishes a new molecule-ai-workspace-runtime patch release. Closes
the human-in-loop gap that caused tonight's RuntimeCapabilities
ImportError outage.

Tonight: #117 added RuntimeCapabilities to molecule_runtime.adapters.base.
The merge landed at 02:37 UTC. Templates rebuilt their images at 07:37
UTC (5 hours later) and started importing the new symbol. PyPI was
still serving 0.1.15 (pre-#117) because nobody remembered to push a
runtime-vX.Y.Z tag or workflow_dispatch the publish. Result: every
template image shipped tonight runs `from molecule_runtime.adapters.base
import RuntimeCapabilities` against an installed runtime that doesn't
export it -> ImportError -> workspace never registers -> stuck in
provisioning until 10-min sweep.

Mechanism:
- New trigger: push to staging filtered to paths: ['workspace/**'].
  Path filter applies only to branch pushes; the existing tag trigger
  still fires unconditionally.
- Version derivation for the auto case: query PyPI's JSON API for
  current latest, bump the patch component. PyPI is the source of
  truth so concurrent runs don't double-publish (HTTP 400 on collision).
- concurrency: group serializes parallel staging merges so they don't
  race on the bump computation. cancel-in-progress: false because each
  workspace/** change deserves its own release.
- publish job now exposes its derived version as a job-level output so
  the cascade reads it cleanly. Fixes a latent bug: cascade tried to
  read steps.version.outputs.version, which is from a different job's
  scope and silently resolved to empty -- then re-derived from
  GITHUB_REF_NAME, which would have been "staging" under the new
  trigger and produced an invalid version.
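
The patch-bump step can be sketched as a pure function. This assumes plain MAJOR.MINOR.PATCH versions with no pre-release suffix, and `bump_patch` is a hypothetical name; in the workflow the input would come from PyPI's JSON API (the `info.version` field of `https://pypi.org/pypi/<project>/json`).

```python
def bump_patch(version: str) -> str:
    """Return the next patch release for a plain MAJOR.MINOR.PATCH version."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"
```

Parsing the components as integers matters once the patch number reaches two digits: `0.1.9` becomes `0.1.10`, not `0.1.1`.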

Tag-driven and manual-dispatch paths are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:11:45 -07:00
Hongming Wang
d19d35f6b3 test(skills): make watcher test fakes accept current_runtime kwarg
The runtime-compat change in this branch added a `current_runtime`
kwarg to load_skills(); the watcher passes it through. Test mocks
that pre-date the kwarg signature broke with TypeError, which the
watcher's reload-error try/except swallowed — the symptom was empty
callback lists, not a clear failure.

Switching the fakes to accept **kwargs keeps them forward-compat for
future load_skills additions without another test churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 02:04:26 -07:00
Hongming Wang
d0057912d2 feat(skills): per-skill runtime compatibility (#119, hermes pattern)
SKILL.md frontmatter can now declare `runtime: [claude-code]` or
`runtime: [hermes, claude-code]` to opt out of incompatible adapters
instead of failing at first invocation. Default `["*"]` means universal —
existing skill libraries need zero migration.

Borrowed from hermes' declarative skill-compat pattern surfaced in the
hermes architecture survey. The remaining two patterns (event-log
layer, observability config block) stay open under #119.

Wiring:
- SkillMetadata.runtime: list[str] = ["*"]
- _normalize_runtime_field accepts list, string-sugar, missing -> ["*"];
  malformed warns and falls back to universal so a typo never silently
  drops a skill.
- load_skills(..., current_runtime=...) filters out skills whose runtime
  list lacks "*" or current_runtime, with an INFO log line.
- BaseAdapter.start passes type(self).name() so the live adapter drives
  the filter; SkillsWatcher takes the same kwarg so hot-reload honors it.
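
The normalization and filtering rules above might be implemented along these lines. This is a sketch, not the runtime's code: the `Skill` dataclass, `filter_skills`, and the decision to treat an empty list as malformed are assumptions made for illustration.

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

logger = logging.getLogger("skills")


def normalize_runtime_field(raw: object) -> list[str]:
    """Coerce a frontmatter `runtime` value to a list of runtime names."""
    if raw is None:
        return ["*"]  # missing field means universal
    if isinstance(raw, str):
        return [raw]  # string sugar, e.g. runtime: hermes
    if isinstance(raw, list) and raw and all(isinstance(r, str) for r in raw):
        return list(raw)
    # Malformed values warn and fall back to universal so a typo never drops a skill.
    logger.warning("malformed runtime field %r; treating skill as universal", raw)
    return ["*"]


@dataclass
class Skill:
    name: str
    runtime: list[str] = field(default_factory=lambda: ["*"])


def filter_skills(skills: list[Skill], current_runtime: Optional[str]) -> list[Skill]:
    """Keep skills that are universal or that list the current runtime."""
    if current_runtime is None:
        return list(skills)  # no runtime known: keep everything
    return [s for s in skills if "*" in s.runtime or current_runtime in s.runtime]
```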

8 new tests cover default universal, no-field universal, explicit
match/mismatch, string sugar, wildcard short-circuit, current_runtime=None
(preserves old behavior), and malformed-warns-not-drops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:57:43 -07:00
Hongming Wang
e99f937630
Merge pull request #2157 from Molecule-AI/chore/drop-cli-executor-from-runtime
chore(workspace): drop cli_executor — Phase 3 of #87 [DRAFT]
2026-04-27 08:24:30 +00:00
Hongming Wang
4959c37040
Merge pull request #2158 from Molecule-AI/feat/steer-agent-to-attachments-field
feat(tools): tighten send_message_to_user description to forbid pasting URLs in body
2026-04-27 08:24:02 +00:00
Hongming Wang
98ca5c50fa chore(workspace): drop cli_executor — Phase 3 of #87 (DRAFT, blocked on gemini-cli image rebuild)
DRAFT — do NOT merge until gemini-cli template image rebuilds with
its local cli_executor.py copy (template PR #9 just merged at
07:59 UTC; image build kicks off now).

Final adapter-specific deletion from molecule-runtime, completing #87
for the priority adapters (claude-code via PR #2156, plus gemini-cli
via this PR + template #9).

Deletes:
  - workspace/cli_executor.py (461 LOC) — CLIAgentExecutor + the
    RUNTIME_PRESETS dict for codex / ollama / gemini-cli. The file
    moved to molecule-ai-workspace-template-gemini-cli (PR #9, merged).
  - workspace/tests/test_agent_base_urls.py — only consumer of
    CLIAgentExecutor in the test suite. Tests for the executor
    behavior live in the template repo now.

Updates:
  - workspace/tests/test_executor_helpers.py — docstring refresh:
    executor_helpers.py is the runtime-agnostic shared helpers; the
    executor classes themselves live in template repos post-#87.

Codex / ollama presets disappear naturally with the file. They never
had template repos, so no production path could invoke them anyway —
this is dead-code removal as a side effect of the move.

Verified-safe-to-delete:
  - heartbeat.py: doesn't import cli_executor
  - claude_sdk_executor.py: deleted by PR #2156 (in flight)
  - preflight.py: only references runtime names by string; no import
  - main.py: doesn't import cli_executor (uses adapter discovery via
    ADAPTER_MODULE; the template's adapter constructs the executor)
  - Only test_agent_base_urls.py + test_executor_helpers.py docstring
    referenced cli_executor

Verification:
  - 1249/1249 workspace pytest pass (was 1251; -2 = test_agent_base_urls.py
    cases — exact match)
  - No live import of cli_executor anywhere in molecule-core after deletion
    (grep verified)

Sequencing:
  1. Template PR #9 (gemini-cli local copy) — MERGED
  2. Template image rebuild — running
  3. THIS PR — wait until image is published, then mark ready-for-review

Closes #87 for the priority adapters: workspace/ is now adapter-
agnostic except for adapter discovery (ADAPTER_MODULE) + the
runtime_wedge primitive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:22:39 -07:00
Hongming Wang
7504aba934 feat(tools): tighten send_message_to_user description to forbid pasting URLs in body
Root-cause fix for #118 (chat attachments rendering as plain text links
instead of download chips). User flagged with screenshot 2026-04-26
showing the Design Director agent pasting https://files.catbox.moe/…
in the message body — chat rendered the URL as plain markdown text,
unclickable in the canvas's bubble layout, and unreachable in any SaaS
deployment where the user's browser can't egress to catbox.

The structured `attachments` field already exists, the canvas's
AttachmentChip already renders well, the WebSocket broadcast already
carries attachments verbatim — the missing piece was the LLM choosing
the body over the structured field. Tighten the tool description so it
trains the right behavior.

Three targeted strengthenings:

  1. Top-level tool description: enumerated use case (4) now reads
     "via the `attachments` field (NEVER paste file URLs in `message`)".
     The all-caps NEVER + the explicit field name move the LLM toward
     the structured path on first read.

  2. `message` param: adds an explicit DO NOT rule with rationale.
     Includes the SaaS-reachability reason so operators can grep for
     "SaaS" and find this design constraint instead of re-discovering it
     after a tenant complaint. Calls out catbox.moe + file:// by name as
     concrete examples of forbidden hosts (those are the two we've seen
     in production).

  3. `attachments` param: leads with REQUIRED, lists the bad
     alternatives explicitly (pasting URLs, base64-encoding, telling
     user to look at a path). LLMs handle "use X, NOT Y" framings
     better than "use X" alone — observed during prompt-engineering
     iteration on hermes' tool descriptions.

Tests pin all three load-bearing phrases (4 new in test_a2a_mcp_server.py)
so a future doc edit that softens or drops them fails CI. Brittle by
design — these are prompt-engineering invariants, not implementation
details.

This is the root-cause fix. A defensive canvas-side backstop (auto-
detect download-shaped URLs in body and convert to chips) is a
follow-up that could land separately if the steering proves
insufficient in practice.

Verification:
  - 1190/1190 workspace pytest pass
  - 4 new test_a2a_mcp_server.py cases all green

Closes the steering half of #118. The structured-attachments-only
contract was already enforced server-side (PR #2130 added per-attachment
validation); this PR closes the prompt-side gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:13:11 -07:00
Hongming Wang
4e6030d783
Merge pull request #2156 from Molecule-AI/chore/drop-claude-sdk-executor-from-runtime
chore(workspace): drop claude_sdk_executor — Phase 2 of #87
2026-04-27 08:02:51 +00:00
Hongming Wang
2fbf6b6b27
Merge pull request #2155 from Molecule-AI/feat/preflight-runtime-discovery
feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery
2026-04-27 08:02:39 +00:00
Hongming Wang
4b5ac2ebc2 chore(workspace): drop claude_sdk_executor — Phase 2 of #87
Phase 2 of the universal-runtime refactor (task #87). Now that the
claude-code template repo ships its own claude_sdk_executor.py
(template PR #13 merged + image rebuilt at 07:36 UTC) the
molecule-runtime no longer needs to ship the file.

Deletes:
  - workspace/claude_sdk_executor.py (704 LOC)
  - workspace/tests/test_claude_sdk_executor.py (~1.6K LOC)

Updates:
  - workspace/runtime_wedge.py — drops the "Compatibility shim" docstring
    section. The shim was time-bounded ("removed once #87 Phase 2 lands");
    this is that PR.
  - workspace/tests/test_runtime_wedge.py — drops the
    TestClaudeSdkExecutorReExportShim test class (the shim doesn't
    exist anymore so the identity assertions would fail at import).
  - workspace/tests/conftest.py — drops the claude_agent_sdk stub.
    Its only consumer was test_claude_sdk_executor.py which is gone;
    no other test imports the SDK.
  - workspace/cli_executor.py — comment refresh: claude-code template
    repo (not workspace/) is now the home for ClaudeSDKExecutor.

Verified-safe-to-delete:
  - heartbeat.py: migrated to runtime_wedge in PR #2154 (no longer
    imports from claude_sdk_executor)
  - cli_executor.py: only comments referenced claude_sdk_executor;
    its line-117 ValueError defends against accidental routing
  - tests: only test_claude_sdk_executor.py + test_runtime_wedge.py's
    shim class consumed the deleted module; both removed in this PR

Verification:
  - 1182/1182 workspace pytest pass (was 1251; -69 = exactly the
    deleted test cases — zero unexpected regressions)
  - No live import of claude_sdk_executor anywhere in molecule-core
    after deletion (grep verified)

Closes #87 for the claude-code adapter. Hermes is already template-only.
The remaining adapter-specific code in workspace/ is cli_executor.py
(codex/ollama/gemini-cli) tracked by task #122. preflight.py's
SUPPORTED_RUNTIMES static list is tracked by task #123 (PR #2155 in
flight).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:52:55 -07:00
Hongming Wang
7dba700ac3 feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery
Closes task #123 — last piece of #87 cleanup.

Pre-fix: workspace/preflight.py:11 hardcoded a tuple of "supported"
runtime names (claude-code, codex, ollama, langgraph, etc.). Every
new template repo required a code change in molecule-runtime to be
recognized, a direct violation of the universal-runtime principle
(#87), under which adapters declare themselves and the runtime stays
generic.

Post-fix: discovery-based validation via the same ADAPTER_MODULE env
var that production load paths already consult
(workspace/adapters/__init__.py:get_adapter). Failure modes are
distinguished so that operator messages are concrete:

  - ADAPTER_MODULE unset → "no adapter installed; set the env var"
  - ADAPTER_MODULE set but module won't import → import error type +
    message
  - module imports but no Adapter class → "convention violation, add
    `Adapter = YourClass`"
  - Adapter.name() raises → caught with operator message
  - Adapter.name() returns non-string → contract violation message
  - Adapter.name() doesn't match config.runtime → drift WARNING (not
    fatal; the adapter wins in production, config.yaml is just
    documentation)
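
The failure modes above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual preflight code: the
function name check_adapter, the (errors, warnings) return shape, the
message wording, and calling name() directly on the Adapter class are all
hypothetical. Only the ADAPTER_MODULE env var and the `Adapter` attribute
convention come from the commit.

```python
import importlib
import os


def check_adapter(config_runtime, env=None):
    """Hypothetical helper: return (errors, warnings) for the installed adapter."""
    env = os.environ if env is None else env
    errors, warnings = [], []

    module_name = env.get("ADAPTER_MODULE")
    if not module_name:
        errors.append("no adapter installed; set ADAPTER_MODULE")
        return errors, warnings

    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Surface the import error type and message to the operator.
        errors.append(f"cannot import {module_name}: {type(exc).__name__}: {exc}")
        return errors, warnings

    adapter_cls = getattr(module, "Adapter", None)
    if adapter_cls is None:
        errors.append(f"{module_name} has no Adapter; add `Adapter = YourClass`")
        return errors, warnings

    try:
        name = adapter_cls.name()  # assumption: name() is callable on the class
    except Exception as exc:
        errors.append(f"Adapter.name() raised {type(exc).__name__}: {exc}")
        return errors, warnings

    if not isinstance(name, str):
        errors.append(f"Adapter.name() returned {type(name).__name__}, expected str")
        return errors, warnings

    if name != config_runtime:
        # Drift is a warning, not an error: the installed adapter wins.
        warnings.append(
            f"config.runtime={config_runtime!r} but installed adapter is {name!r}"
        )
    return errors, warnings
```

Passing `env` explicitly keeps the sketch testable without touching the
real process environment.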

The drift case is the one behavioral change worth calling out: the
prior static-list path would have hard-failed config.runtime values
not in the allowlist. With discovery, an unknown runtime in
config.yaml is just documentation drift; the adapter that is actually
installed runs regardless. The operator gets a warning naming both the
configured and installed names so they can fix whichever is stale.

Tests:
  - Replaces the obsolete "static list pass/fail" tests with 6 new
    cases covering each distinguished failure mode, plus a positive
    test for the adapter-matches-config happy path
  - Adds an autouse `_default_langgraph_adapter` fixture that
    pre-installs a fake adapter via sys.modules monkey-patching, so
    existing tests building default WorkspaceConfig (runtime="langgraph")
    inherit a valid adapter without each test setting ADAPTER_MODULE
  - Failure-mode tests opt out of the default fixture via
    @pytest.mark.no_default_adapter (registered in pytest.ini)
  - Sentinel pattern (`_UNSET = object()`) for `name_returns` so None
    can be passed as a test value (an `is not None` check would treat
    an explicit None as "not provided" and skip the None branch, which
    is exactly the bug the sentinel avoids)
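
For readers unfamiliar with the sentinel pattern, a small sketch of the
idea. `_UNSET` and `name_returns` are the names used in the commit; the
factory function make_fake_adapter and the "langgraph" default are
assumptions made for illustration.

```python
_UNSET = object()  # unique marker meaning "caller did not pass name_returns"


def make_fake_adapter(name_returns=_UNSET):
    """Hypothetical factory: build a fake adapter whose name() returns a chosen value."""
    # Compare against the sentinel, not None, so an explicit None is honored.
    value = "langgraph" if name_returns is _UNSET else name_returns

    class FakeAdapter:
        @staticmethod
        def name():
            return value

    return FakeAdapter
```

With a plain `name_returns=None` default, make_fake_adapter(name_returns=None)
would be indistinguishable from make_fake_adapter(), so the non-string
contract-violation case could never be exercised.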

Verification:
  - 22/22 preflight tests pass (was 16; +6 new failure-path tests)
  - 1256/1256 workspace pytest pass (was 1251; +5 net)
  - No production code path other than preflight changed

Source: 2026-04-27 #87 cleanup audit after PR #2154 (wedge extraction).
This change is independent of the cli_executor.py template moves
(task #122) — completes one of the two remaining cleanup items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:44:51 -07:00
Hongming Wang
66b9c04057
Merge pull request #2154 from Molecule-AI/refactor/extract-wedge-state-from-claude-sdk
refactor(wedge): extract claude_sdk_executor wedge state into runtime_wedge module
2026-04-27 07:22:20 +00:00
Hongming Wang
5e049244d6 refactor(wedge): mark re-exports explicit via __all__
Addresses github-code-quality unused-import flag on the runtime_wedge
re-export shim.  Adds __all__ listing the names that exist purely for
backwards-compat (is_wedged / wedge_reason / _reset_sdk_wedge_for_test)
so static analysis recognizes the imports as deliberate exports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:20:23 -07:00
Hongming Wang
feb544938b refactor(wedge): address review feedback — class wrap + import-path doc + dedupe shim rationale
Three changes from /code-review-and-quality on PR #2154:

1. Optional (architecture): wrap state in a private _WedgeState class
   instead of bare module-level globals. Public API (mark_wedged /
   clear_wedge / is_wedged / wedge_reason / reset_for_test) is
   unchanged; adapters never see the class. The class leaves room for
   a future per-scope variant (multiple executors per process, a
   keyed registry, etc.) without churning the call sites. Today there's
   exactly one instance (_DEFAULT) so behavior is identical.

2. Optional (readability): clarify the import path in the integration
   recipe — in a TEMPLATE repo it's `from molecule_runtime.runtime_wedge`
   (PyPI package); in molecule-core itself it's `from runtime_wedge`
   (top-level module). Removes the trap where a contributor reading the
   docstring while editing in-repo copies the template-style import and
   gets ImportError.

3. Nit (readability): dedupe the shim rationale. claude_sdk_executor's
   re-export comment now points to runtime_wedge's "Compatibility shim"
   section as the source of truth instead of restating the same content.
   Avoids docs-in-two-places drift risk.
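
A rough sketch of the class-wrap shape described in item 1, assuming the
semantics stated for runtime_wedge (sticky first-write-wins, one default
instance). The method names on _WedgeState, the None-when-healthy return
of wedge_reason(), and the mark_wedged signature are assumptions; only
the public function names and _DEFAULT come from this PR.

```python
class _WedgeState:
    """Holds one sticky wedge reason; None means healthy (assumed representation)."""

    def __init__(self):
        self._reason = None

    def mark(self, reason):
        # Sticky first-write-wins: keep the earliest reason until cleared.
        if self._reason is None:
            self._reason = reason

    def clear(self):
        self._reason = None

    def is_wedged(self):
        return self._reason is not None

    def reason(self):
        return self._reason


_DEFAULT = _WedgeState()  # the single instance that exists today


# Module-level helpers delegate to the singleton, so call sites never
# see the class.
def mark_wedged(reason):
    _DEFAULT.mark(reason)


def clear_wedge():
    _DEFAULT.clear()


def is_wedged():
    return _DEFAULT.is_wedged()


def wedge_reason():
    return _DEFAULT.reason()


def reset_for_test():
    _DEFAULT.clear()
```

Because the module-level functions are ordinary function objects, a shim
that re-exports them keeps passing identity checks of the form
`shim.is_wedged is runtime_wedge.is_wedged`.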

Verification:
  - 1251/1251 workspace pytest pass (no behavior change — class wrap
    is pure plumbing; module-level helpers delegate to the singleton)
  - All shim re-export identity tests still pass (the shim's
    `is_wedged is runtime_wedge.is_wedged` assertion holds because we
    re-export the SAME function object that delegates to _DEFAULT)

No new tests needed — the existing test suite covers the public API
contract; the class is an implementation detail behind that contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:16:33 -07:00
Hongming Wang
cd899c969f docs(wedge): integration recipe for adapters that want to flip-to-degraded
Doc-only follow-up to the wedge-state extraction. Adds proactive
guidance so the next adapter (hermes / codex / langgraph / a future
template) discovers the runtime_wedge primitive and integrates the
~6 LOC pattern uniformly instead of inventing its own wedge state.

Two additions:

  - workspace/runtime_wedge.py — new "How to use from a NEW adapter"
    section in the module docstring with the minimum viable
    integration recipe, what-you-get-for-free list, and explicit
    DON'TS (don't store local wedge state, don't mark for transient
    errors, don't write your own clear logic). Plus a "when wedge is
    the WRONG primitive" note to keep adopters from over-using it.

  - workspace/adapter_base.py — adds runtime_wedge to the
    "Cross-cutting capabilities your adapter can opt into" list in
    BaseAdapter's docstring (alongside capabilities() and
    idle_timeout_override()). Discoverability path: adapter author
    reads BaseAdapter docstring → sees runtime_wedge mention → reads
    runtime_wedge module docstring → has the recipe.

Also tightens the "to add a new agent infra" steps in BaseAdapter to
match the actual current model (standalone template repo + ADAPTER_MODULE
env var) rather than the obsolete workspace/adapters/<infra>/ layout
that hasn't been the path since the universal-runtime extraction
started.

Zero code change. Tests untouched (1251/1251 still pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:12:14 -07:00
Hongming Wang
1d231ed295 refactor(wedge): extract claude_sdk_executor wedge state into runtime_wedge module
Prerequisite for the universal-runtime refactor (task #87) to move
claude_sdk_executor.py out of molecule-runtime into the claude-code
template repo. heartbeat.py had a hard import:

    from claude_sdk_executor import is_wedged, wedge_reason

which would break the moment the executor moves out of the runtime
package — the heartbeat would lose access to the wedge state used to
flip workspace status to degraded.

Extract the wedge state to a runtime-side module that the heartbeat
can keep importing regardless of which adapter executor is wedged:

  - workspace/runtime_wedge.py — single-flag state + mark_wedged /
    clear_wedge / is_wedged / wedge_reason / reset_for_test. Same
    semantics as the original claude_sdk_executor implementation
    (sticky first-write-wins, auto-clear on observed success). 100
    LOC of small helpers around a single flag; going lock-free is
    acceptable because there's one executor per workspace process today.

  - workspace/claude_sdk_executor.py — drops the in-file definitions;
    re-exports the same names from runtime_wedge as a backwards-compat
    shim. Any third-party adapter that imported is_wedged / wedge_reason
    / _mark_sdk_wedged from claude_sdk_executor keeps working for one
    release cycle while it migrates to runtime_wedge.

  - workspace/heartbeat.py — _runtime_state_payload() now imports
    from runtime_wedge instead of claude_sdk_executor. Lazy-import
    pattern preserved; the docstring updated to explain the new
    cross-cutting source-of-truth.

Tests (10 new in test_runtime_wedge.py):
  - Default state (unwedged), mark sets flag, first-write-wins,
    clear restores healthy, clear-when-not-wedged is no-op,
    re-marking after clear is allowed
  - Re-export shim: each old name in claude_sdk_executor IS the
    runtime_wedge function (identity check), state is shared
    (marking via the executor shim is observable via runtime_wedge
    and vice versa)

Verification:
  - 1251/1251 workspace pytest pass (was 1241 after orphan deletion;
    +10 = exactly the new test_runtime_wedge.py cases)
  - All existing test_claude_sdk_executor.py cases (which call
    _mark_sdk_wedged via the shim) still pass

After this lands and the claude-code template image rebuilds with the
local claude_sdk_executor.py copy (template PR #13), deleting
workspace/claude_sdk_executor.py from molecule-core becomes safe (the
shim is deleted alongside the file, since runtime_wedge is the new
public API).

See project memory `project_runtime_native_pluggable.md`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:08:53 -07:00
Hongming Wang
c1e9aa7461
Merge pull request #2153 from Molecule-AI/fix/block-internal-paths-shallow-clone-bug
fix(ci): block-internal-paths handle merge_group + shallow-clone BASE
2026-04-27 06:58:32 +00:00
5d49cd7843
Merge pull request #2152 from Molecule-AI/chore/delete-orphan-hermes-executor
chore(workspace): delete orphan HermesA2AExecutor (-1.8K LOC dead code)
2026-04-27 06:58:21 +00:00
Hongming Wang
d46d558ca9
Merge pull request #2148 from Molecule-AI/test/canvas-lib-utils-runtime-names-1815
test(canvas): cover utils.cn + runtime-names.runtimeDisplayName (0% → 100%) (#1815)
2026-04-27 06:57:57 +00:00
Hongming Wang
a682dcb502
Merge pull request #2149 from Molecule-AI/test/canvas-actions-1815
test(canvas): cover canvas-actions restart-pending helpers (25% → 100%) (#1815)
2026-04-27 06:55:36 +00:00
Hongming Wang
17a6800374
Merge pull request #2150 from Molecule-AI/feat/priority-runtimes-e2e
test(e2e): claude-code + hermes priority-runtimes happy path
2026-04-27 06:55:20 +00:00