Quota gates are resource-state conflicts, not payment failures —
RFC 9110 reserves 402 for billing/payment failures specifically. The
canonical Molecule-AI/docs PR #82 already shipped the corrected text;
this brings the molecule-core copy of the tutorial in line.
The inline parenthetical "(not 402 Payment Required — quota gates are
resource-state conflicts, not payment failures, per RFC 9110)" doubles
as a regression anchor: a future edit that flips 409 back to 402 would
have to also reword that explanation, making the change a deliberate
two-step act rather than a casual oversight.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wheel's pyproject.toml has declared
`molecule-runtime = "molecule_runtime.main:main_sync"` since the
publish pipeline was created on 2026-04-26, but the function
itself was never present in workspace/main.py — it lived in the
pre-monorepo molecule-ai-workspace-runtime repo and was lost
during the consolidation that made workspace/ the source of truth.
The 0.1.15 wheel still had main_sync from a leftover snapshot,
so the regression went unnoticed until 0.1.16 (the first wheel
built from the new source-of-truth) shipped. Symptom: every
workspace container restart loops with
ImportError: cannot import name 'main_sync' from 'molecule_runtime.main'
— the molecule-runtime CLI script's first line tries to import
the missing symbol. Workspaces stay in `provisioning` until the
10-min sweep marks them failed.
Caught by .github/workflows/runtime-pin-compat.yml, which already
imports the symbol by name as its smoke test. (That check kept
failing red on every recent merge_group run; this PR fixes the
underlying symbol-not-found instead of the smoke step.)
Also strengthens publish-runtime.yml's wheel smoke from
`import molecule_runtime.main` (loads the module — passes even
when entry-point target is missing) to `from molecule_runtime.main
import main_sync` (the actual contract the CLI script needs).
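
The gap between the two smoke checks can be reproduced without the real wheel. A minimal sketch, assuming a hypothetical stand-in module `fake_runtime_main` registered in `sys.modules`; the helper names `weak_smoke` and `strong_smoke` are illustrative and do not appear in the workflow:

```python
import importlib
import sys
import types

# Hypothetical stand-in for a wheel whose main module lost its entry-point target.
fake_main = types.ModuleType("fake_runtime_main")
sys.modules["fake_runtime_main"] = fake_main


def weak_smoke(module_name: str) -> bool:
    """Old-style check: only proves that the module loads."""
    importlib.import_module(module_name)
    return True


def strong_smoke(module_name: str, symbol: str) -> bool:
    """New-style check: proves the entry-point target exists and is callable."""
    module = importlib.import_module(module_name)
    return callable(getattr(module, symbol, None))


if __name__ == "__main__":
    print(weak_smoke("fake_runtime_main"))                 # True: module loads
    print(strong_smoke("fake_runtime_main", "main_sync"))  # False: symbol missing
```

The weak check passes on exactly the broken state described above, which is why the stronger by-name import is the better contract test.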
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The skipped test exists to assert that provisionWorkspaceCP never
leaks err.Error() in WORKSPACE_PROVISION_FAILED broadcasts (regression
guard for #1206). Writing the test body required substituting a
failing CPProvisioner — but the handler's `cpProv` field was the
concrete *CPProvisioner type, so a mock had nowhere to plug in.
Refactor:
- Add provisioner.CPProvisionerAPI interface with the 3 methods
handlers actually call (Start, Stop, GetConsoleOutput)
- Compile-time assertion `var _ CPProvisionerAPI = (*CPProvisioner)(nil)`
catches future method-signature drift at build time
- WorkspaceHandler.cpProv narrowed to the interface; SetCPProvisioner
accepts the interface (production caller passes *CPProvisioner
from NewCPProvisioner unchanged)
Test:
- stubFailingCPProv whose Start returns a deliberately leaky error
(machine_type=t3.large, ami=…, vpc=…, raw HTTP body fragment)
- Drive provisionWorkspaceCP via the cpProv.Start failure path
- Assert broadcast["error"] == "provisioning failed" (canned)
- Assert no leak markers (machine type, AMI, VPC, subnet, HTTP
body, raw error head) in any broadcast string value
- Stop/GetConsoleOutput on the stub panic — flags a future
regression that reaches into them on this path
Verification:
- Full workspace-server test suite passes (interface refactor
is non-breaking; production caller path unchanged)
- go build ./... clean
- The other skipped test in this file (TestResolveAndStage_…)
is a separate plugins.Registry refactor and remains skipped
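
The shape of the refactor and test, sketched in Python rather than Go. This is an analogy under stated assumptions, not the handler code: `provision_workspace`, the snake_case method names, and the `broadcast` callable are hypothetical, and the leaky error string reuses only the fragments quoted above.

```python
from typing import Callable, Protocol

LEAK_MARKERS = ("t3.large", "ami=", "vpc=")


class CPProvisionerAPI(Protocol):
    """The narrow surface the handler depends on (three methods)."""

    def start(self, workspace_id: str) -> str: ...
    def stop(self, workspace_id: str) -> None: ...
    def get_console_output(self, workspace_id: str) -> str: ...


class StubFailingCPProv:
    """Start fails with a deliberately leaky error; other methods must stay unreached."""

    def start(self, workspace_id: str) -> str:
        raise RuntimeError("machine_type=t3.large, ami=…, vpc=…")

    def stop(self, workspace_id: str) -> None:
        raise AssertionError("stop reached on the start-failure path")

    def get_console_output(self, workspace_id: str) -> str:
        raise AssertionError("get_console_output reached on the start-failure path")


def provision_workspace(
    prov: CPProvisionerAPI, workspace_id: str, broadcast: Callable[[dict], None]
) -> None:
    """Hypothetical handler: broadcasts a canned message, never str(exc)."""
    try:
        prov.start(workspace_id)
    except Exception:
        broadcast({"event": "WORKSPACE_PROVISION_FAILED", "error": "provisioning failed"})


if __name__ == "__main__":
    events: list[dict] = []
    provision_workspace(StubFailingCPProv(), "ws-1", events.append)
    leaked = [m for e in events for v in e.values() for m in LEAK_MARKERS if m in str(v)]
    print(events[0]["error"], leaked)
```

Because the handler depends only on the protocol, the stub plugs in without touching the production provisioner.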
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two compounding bugs surfaced when 0.1.16 hit production today:
1. scripts/build_runtime_package.py had a hand-curated TOP_LEVEL_MODULES
set listing every workspace/*.py that should get its bare imports
rewritten to `molecule_runtime.X`. The set silently went stale:
- Missing: transcript_auth (added since #87 phase 1c), runtime_wedge,
watcher → unrewritten imports shipped, every workspace startup
died with ModuleNotFoundError.
- Stale: claude_sdk_executor, cli_executor (both removed in #87),
hermes_executor (never existed) → harmless but misleading.
2. publish-runtime.yml's wheel-smoke step asserted on stable invariants
(BaseAdapter, AdapterConfig, a2a_client error sentinel) but never
imported main. So even though main.py held the broken bare
`from transcript_auth import ...`, the smoke check passed.
Fixes:
- Build script now derives the on-disk module set from workspace/*.py
and asserts it matches TOP_LEVEL_MODULES exactly. Drift in either
direction fails the build with a specific diff message instead of
shipping a broken wheel. Closed-list typo guard preserved (we still
edit the set explicitly when a module is added/removed) — the gate
just makes drift impossible to ignore.
- TOP_LEVEL_MODULES updated to current reality: drop the 3 stale,
add the 3 missing.
- publish-runtime.yml wheel-smoke now `import molecule_runtime.main`
before the invariant asserts. main is the entry point and
transitively imports every module — any bare-import bug surfaces
as ModuleNotFoundError before PyPI accepts the upload.
Tested locally: `python3 scripts/build_runtime_package.py
--version 0.1.99 --out /tmp/build-test` succeeds, and
/tmp/build-test/molecule_runtime/main.py contains the rewritten
`from molecule_runtime.transcript_auth import ...`.
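
A minimal sketch of the drift gate, assuming the on-disk set is simply the stem of every top-level `*.py` file. The function names and the sample declared set are hypothetical; the real script may exclude some files or word its messages differently.

```python
import tempfile
from pathlib import Path

# Hypothetical declared set; the real TOP_LEVEL_MODULES lives in the build script.
TOP_LEVEL_MODULES = {"main", "heartbeat", "watcher"}


def discover_modules(workspace_dir: Path) -> set[str]:
    """Return the stem of every top-level *.py file in workspace_dir."""
    return {p.stem for p in workspace_dir.glob("*.py")}


def check_drift(declared: set[str], on_disk: set[str]) -> list[str]:
    """Return drift messages; an empty list means the two sets match exactly."""
    problems = []
    missing = sorted(on_disk - declared)
    stale = sorted(declared - on_disk)
    if missing:
        problems.append(f"on disk but not declared: {missing}")
    if stale:
        problems.append(f"declared but not on disk: {stale}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("main", "heartbeat", "transcript_auth"):
            (root / f"{name}.py").write_text("")
        for line in check_drift(TOP_LEVEL_MODULES, discover_modules(root)):
            print(line)
```

Reporting both directions separately keeps the failure actionable: one list names modules to add, the other names entries to drop.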
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When E2E_INTENTIONAL_FAILURE=1 poisons the tenant token, step 5/11's
`tenant_call POST /workspaces` curl exits 22 (HTTP error under
--fail-with-body). `set -e` propagates rc=22 directly, but the
script's documented contract emits only {0,1,2,3,4}, and the sanity
workflow's case statement only matches those. rc=22 falls through
to "Unexpected rc — investigate harness" and opens a false-positive
priority-high "safety net broken" issue (#2159, weekly run on
2026-04-27).
The trap now captures $? at entry (must be the first statement
before any command clobbers it) and at the end normalizes any
non-contract code to 1 (generic failure). Leak detection continues
to exit 4 directly, so its semantics are preserved.
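
The normalization rule itself is small. The real implementation is a bash trap; the Python below models only the mapping, with `normalize_rc` as a hypothetical name:

```python
# Exit codes the harness documents as its contract.
CONTRACT_CODES = frozenset({0, 1, 2, 3, 4})


def normalize_rc(rc: int) -> int:
    """Map any exit code outside the documented contract to 1 (generic failure)."""
    return rc if rc in CONTRACT_CODES else 1


if __name__ == "__main__":
    # 22 is the curl HTTP-error code from the bug; 139 is a segfault-style code.
    for code in (0, 4, 22, 139):
        print(code, "->", normalize_rc(code))
```

Contract codes pass through untouched, so the leak-detection exit 4 keeps its meaning.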
Adds tests/e2e/test_harness_rc_normalization.sh — a self-contained
regression test that builds a stub harness with the same trap
pattern, triggers controlled exit codes, and asserts the
normalization. Covers the 5 contracted codes + curl-22 (the bug) +
3 representative network-failure codes + sigsegv-139.
Verification:
- 10/10 regression tests pass
- shellcheck clean on both modified files
- production teardown path unchanged for legitimate {1,2,3,4}
failures and the leak-detection exit 4
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a third trigger so any merge to staging that changes workspace/**
auto-publishes a new molecule-ai-workspace-runtime patch release. Closes
the human-in-loop gap that caused tonight's RuntimeCapabilities
ImportError outage.
Tonight: #117 added RuntimeCapabilities to molecule_runtime.adapters.base.
The merge landed at 02:37 UTC. Templates rebuilt their images at 07:37
UTC (5 hours later) and started importing the new symbol. PyPI was
still serving 0.1.15 (pre-#117) because nobody remembered to push a
runtime-vX.Y.Z tag or workflow_dispatch the publish. Result: every
template image shipped tonight runs `from molecule_runtime.adapters.base
import RuntimeCapabilities` against an installed runtime that doesn't
export it -> ImportError -> workspace never registers -> stuck in
provisioning until 10-min sweep.
Mechanism:
- New trigger: push to staging filtered to paths: ['workspace/**'].
Path filter applies only to branch pushes; the existing tag trigger
still fires unconditionally.
- Version derivation for the auto case: query PyPI's JSON API for
current latest, bump the patch component. PyPI is the source of
truth so concurrent runs don't double-publish (HTTP 400 on collision).
- concurrency: group serializes parallel staging merges so they don't
race on the bump computation. cancel-in-progress: false because each
workspace/** change deserves its own release.
- publish job now exposes its derived version as a job-level output so
the cascade reads it cleanly. Fixes a latent bug: cascade tried to
read steps.version.outputs.version, which is from a different job's
scope and silently resolved to empty -- then re-derived from
GITHUB_REF_NAME, which would have been "staging" under the new
trigger and produced an invalid version.
Tag-driven and manual-dispatch paths are unchanged.
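
A sketch of the auto-version derivation, assuming plain `X.Y.Z` versions and a payload already fetched from PyPI's JSON API (which reports the latest release under `info.version`). The function names are hypothetical and no network call is made here:

```python
def latest_version(pypi_payload: dict) -> str:
    """Extract the latest released version from a parsed PyPI JSON API response."""
    return pypi_payload["info"]["version"]


def bump_patch(version: str) -> str:
    """Return the version with its patch component incremented by one."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


if __name__ == "__main__":
    payload = {"info": {"version": "0.1.15"}}  # trimmed, illustrative payload
    print(bump_patch(latest_version(payload)))  # 0.1.16
```

Deriving the next version from the registry rather than from the git ref is what keeps a branch name like "staging" from ever being used as a version string.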
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runtime-compat change in this branch added a `current_runtime`
kwarg to load_skills(); the watcher passes it through. Test mocks
that pre-date the kwarg signature broke with TypeError, which the
watcher's reload-error try/except swallowed — the symptom was empty
callback lists, not a clear failure.
Switching the fakes to accept **kwargs keeps them forward-compatible
with future load_skills additions, avoiding another round of test churn.
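
The failure and the fix can be shown in a few lines. A sketch with hypothetical names (`call_like_watcher`, `strict_fake`, `tolerant_fake`); only the `current_runtime` kwarg comes from the change itself:

```python
def call_like_watcher(load_skills, path, current_runtime):
    """Stand-in for the watcher: forwards the new kwarg to load_skills."""
    return load_skills(path, current_runtime=current_runtime)


def strict_fake(path):
    """Fake written before the kwarg existed: raises TypeError when it is passed."""
    return []


def tolerant_fake(path, **kwargs):
    """Forward-compatible fake: accepts and ignores kwargs it does not know."""
    return []


if __name__ == "__main__":
    print(call_like_watcher(tolerant_fake, "/skills", "claude-code"))
    try:
        call_like_watcher(strict_fake, "/skills", "claude-code")
    except TypeError as exc:
        print("strict fake broke:", type(exc).__name__)
```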
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SKILL.md frontmatter can now declare `runtime: [claude-code]` or
`runtime: [hermes, claude-code]` to opt out of incompatible adapters
instead of failing at first invocation. Default `["*"]` means universal —
existing skill libraries need zero migration.
Borrowed from hermes' declarative skill-compat pattern surfaced in the
hermes architecture survey. The remaining two patterns (event-log
layer, observability config block) stay open under #119.
Wiring:
- SkillMetadata.runtime: list[str] = ["*"]
- _normalize_runtime_field accepts list, string-sugar, missing -> ["*"];
malformed warns and falls back to universal so a typo never silently
drops a skill.
- load_skills(..., current_runtime=...) filters out skills whose runtime
list contains neither "*" nor current_runtime, with an INFO log line.
- BaseAdapter.start passes type(self).name() so the live adapter drives
the filter; SkillsWatcher takes the same kwarg so hot-reload honors it.
8 new tests cover default universal, no-field universal, explicit
match/mismatch, string sugar, wildcard short-circuit, current_runtime=None
(preserves old behavior), and malformed-warns-not-drops.
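
A sketch of the normalization and filter rules listed above. The function bodies are an illustration, not the shipped code; in particular, treating an empty list as malformed is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def normalize_runtime_field(value) -> list[str]:
    """Coerce a frontmatter `runtime` value into a list of runtime names."""
    if value is None:  # field missing: universal
        return ["*"]
    if isinstance(value, str):  # string sugar: "hermes" means ["hermes"]
        return [value]
    if isinstance(value, list) and value and all(isinstance(v, str) for v in value):
        return list(value)
    # Malformed: warn and stay universal so a typo never silently drops a skill.
    logger.warning("malformed runtime field %r; treating skill as universal", value)
    return ["*"]


def is_compatible(runtimes: list[str], current_runtime) -> bool:
    """Decide whether a skill should load under the current runtime."""
    if current_runtime is None:  # no filter requested: old behavior
        return True
    return "*" in runtimes or current_runtime in runtimes


if __name__ == "__main__":
    print(normalize_runtime_field("claude-code"))
    print(is_compatible(["hermes", "claude-code"], "hermes"))
    print(is_compatible(["claude-code"], "hermes"))
```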
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DRAFT — do NOT merge until gemini-cli template image rebuilds with
its local cli_executor.py copy (template PR #9 just merged at
07:59 UTC; image build kicks off now).
Final adapter-specific deletion from molecule-runtime, completing #87
for the priority adapters (claude-code via PR #2156, plus gemini-cli
via this PR + template #9).
Deletes:
- workspace/cli_executor.py (461 LOC) — CLIAgentExecutor + the
RUNTIME_PRESETS dict for codex / ollama / gemini-cli. The file
moved to molecule-ai-workspace-template-gemini-cli (PR #9, merged).
- workspace/tests/test_agent_base_urls.py — only consumer of
CLIAgentExecutor in the test suite. Tests for the executor
behavior live in the template repo now.
Updates:
- workspace/tests/test_executor_helpers.py — docstring refresh:
executor_helpers.py is the runtime-agnostic shared helpers; the
executor classes themselves live in template repos post-#87.
Codex / ollama presets disappear naturally with the file. They never
had template repos, so no production path could invoke them anyway —
this is dead-code removal as a side effect of the move.
Verified-safe-to-delete:
- heartbeat.py: doesn't import cli_executor
- claude_sdk_executor.py: deleted by PR #2156 (in flight)
- preflight.py: only references runtime names by string; no import
- main.py: doesn't import cli_executor (uses adapter discovery via
ADAPTER_MODULE; the template's adapter constructs the executor)
- Only test_agent_base_urls.py + test_executor_helpers.py docstring
referenced cli_executor
Verification:
- 1249/1249 workspace pytest pass (was 1251; -2 = test_agent_base_urls.py
cases — exact match)
- No live import of cli_executor anywhere in molecule-core after deletion
(grep verified)
Sequencing:
1. ✅ Template PR #9 (gemini-cli local copy) — MERGED
2. ⏳ Template image rebuild — running
3. THIS PR — wait until image is published, then mark ready-for-review
Closes #87 for the priority adapters: workspace/ is now
adapter-agnostic except for adapter discovery (ADAPTER_MODULE) + the
runtime_wedge primitive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root-cause fix for #118 (chat attachments rendering as plain text links
instead of download chips). User flagged with screenshot 2026-04-26
showing the Design Director agent pasting https://files.catbox.moe/…
in the message body — chat rendered the URL as plain markdown text,
unclickable in the canvas's bubble layout, and unreachable in any SaaS
deployment where the user's browser can't egress to catbox.
The structured `attachments` field already exists, the canvas's
AttachmentChip already renders well, the WebSocket broadcast already
carries attachments verbatim — the missing piece was the LLM choosing
the body over the structured field. Tighten the tool description so it
trains the right behavior.
Three targeted strengthenings:
1. Top-level tool description: enumerated use case (4) now reads
"via the `attachments` field (NEVER paste file URLs in `message`)".
The all-caps NEVER + the explicit field name move the LLM toward
the structured path on first read.
2. `message` param: adds an explicit DO NOT rule with rationale.
Includes the SaaS-reachability reason so operators can grep for
"SaaS" and find this design constraint instead of re-discovering it
after a tenant complaint. Calls out catbox.moe + file:// by name as
concrete examples of forbidden hosts (those are the two we've seen
in production).
3. `attachments` param: leads with REQUIRED, lists the bad
alternatives explicitly (pasting URLs, base64-encoding, telling
user to look at a path). LLMs handle "use X, NOT Y" framings
better than "use X" alone — observed during prompt-engineering
iteration on hermes' tool descriptions.
Tests pin all three load-bearing phrases (4 new in test_a2a_mcp_server.py)
so a future doc edit that softens or drops them fails CI. Brittle by
design — these are prompt-engineering invariants, not implementation
details.
This is the root-cause fix. A defensive canvas-side backstop (auto-
detect download-shaped URLs in body and convert to chips) is a
follow-up that could land separately if the steering proves
insufficient in practice.
Verification:
- 1190/1190 workspace pytest pass
- 4 new test_a2a_mcp_server.py cases all green
Closes the steering half of #118. The structured-attachments-only
contract was already enforced server-side (PR #2130 added per-attachment
validation); this PR closes the prompt-side gap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the universal-runtime refactor (task #87). Now that the
claude-code template repo ships its own claude_sdk_executor.py
(template PR #13 merged + image rebuilt at 07:36 UTC) the
molecule-runtime no longer needs to ship the file.
Deletes:
- workspace/claude_sdk_executor.py (704 LOC)
- workspace/tests/test_claude_sdk_executor.py (~1.6K LOC)
Updates:
- workspace/runtime_wedge.py — drops the "Compatibility shim" docstring
section. The shim was time-bounded ("removed once #87 Phase 2 lands");
this is that PR.
- workspace/tests/test_runtime_wedge.py — drops the
TestClaudeSdkExecutorReExportShim test class (the shim doesn't
exist anymore so the identity assertions would fail at import).
- workspace/tests/conftest.py — drops the claude_agent_sdk stub.
Its only consumer was test_claude_sdk_executor.py which is gone;
no other test imports the SDK.
- workspace/cli_executor.py — comment refresh: claude-code template
repo (not workspace/) is now the home for ClaudeSDKExecutor.
Verified-safe-to-delete:
- heartbeat.py: migrated to runtime_wedge in PR #2154 (no longer
imports from claude_sdk_executor)
- cli_executor.py: only comments referenced claude_sdk_executor;
its line-117 ValueError defends against accidental routing
- tests: only test_claude_sdk_executor.py + test_runtime_wedge.py's
shim class consumed the deleted module; both removed in this PR
Verification:
- 1182/1182 workspace pytest pass (was 1251; -69 = exactly the
deleted test cases — zero unexpected regressions)
- No live import of claude_sdk_executor anywhere in molecule-core
after deletion (grep verified)
Closes #87 for the claude-code adapter. Hermes is already template-only.
The remaining adapter-specific code in workspace/ is cli_executor.py
(codex/ollama/gemini-cli) tracked by task #122. preflight.py's
SUPPORTED_RUNTIMES static list is tracked by task #123 (PR #2155 in
flight).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes task #123 — last piece of #87 cleanup.
Pre-fix: workspace/preflight.py:11 hardcoded a tuple of "supported"
runtime names (claude-code, codex, ollama, langgraph, etc.). Every
new template repo required a code change in molecule-runtime to be
recognized — direct violation of the universal-runtime principle
(#87) where adapters declare themselves and the runtime stays generic.
Post-fix: discovery-based validation via the same ADAPTER_MODULE env
var that production load paths already consult
(workspace/adapters/__init__.py:get_adapter). Distinguished failure
modes so operator messages are concrete:
- ADAPTER_MODULE unset → "no adapter installed; set the env var"
- ADAPTER_MODULE set but module won't import → import error type +
message
- module imports but no Adapter class → "convention violation, add
`Adapter = YourClass`"
- Adapter.name() raises → caught with operator message
- Adapter.name() returns non-string → contract violation message
- Adapter.name() doesn't match config.runtime → drift WARNING (not
fatal; the adapter wins in production, config.yaml is just
documentation)
The drift case is the one behavioral change worth calling out: the
prior static-list path would have hard-failed config.runtime values
not in the allowlist. With discovery, an unknown runtime in
config.yaml is just a documentation drift — the adapter that's
actually installed runs regardless. Operator gets a warning naming
both the configured and installed names so they can fix whichever
is stale.
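
The failure modes above can be sketched as one function that returns a severity and a message. This is an illustration under assumptions: `check_adapter`, its return shape, and the message wording are hypothetical; only `ADAPTER_MODULE`, the `Adapter` attribute, and `Adapter.name()` come from the description.

```python
import importlib
import os


def check_adapter(configured_runtime: str, env=None) -> tuple[str, str]:
    """Return (level, message) where level is "ok", "warning", or "error"."""
    env = os.environ if env is None else env
    module_name = env.get("ADAPTER_MODULE")
    if not module_name:
        return "error", "no adapter installed; set ADAPTER_MODULE"
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        return "error", f"cannot import {module_name}: {type(exc).__name__}: {exc}"
    adapter = getattr(module, "Adapter", None)
    if adapter is None:
        return "error", f"{module_name} has no Adapter; add `Adapter = YourClass`"
    try:
        name = adapter.name()
    except Exception as exc:
        return "error", f"Adapter.name() raised {type(exc).__name__}: {exc}"
    if not isinstance(name, str):
        return "error", f"Adapter.name() returned {type(name).__name__}, expected str"
    if name != configured_runtime:
        # Drift is non-fatal: the installed adapter runs regardless of config.
        return "warning", f"config says {configured_runtime!r}, installed adapter is {name!r}"
    return "ok", name


if __name__ == "__main__":
    print(check_adapter("langgraph", env={}))
```

Passing `env` explicitly keeps the sketch testable without mutating the process environment.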
Tests:
- Replaces the obsolete "static list pass/fail" tests with 6 new
cases covering each distinguished failure mode, plus a positive
test for the adapter-matches-config happy path
- Adds an autouse `_default_langgraph_adapter` fixture that
pre-installs a fake adapter via sys.modules monkey-patching, so
existing tests building default WorkspaceConfig (runtime="langgraph")
inherit a valid adapter without each test setting ADAPTER_MODULE
- Failure-mode tests opt out of the default fixture via
@pytest.mark.no_default_adapter (registered in pytest.ini)
- Sentinel pattern (`_UNSET = object()`) for `name_returns` so None
is a passable test value (otherwise `is not None` would skip the
None branch — exact bug the sentinel avoids)
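
The sentinel pattern in isolation, as a minimal sketch. `make_fake_adapter` is a hypothetical helper name; `_UNSET`, `name_returns`, and the "langgraph" default come from the description above.

```python
_UNSET = object()  # unique marker meaning "argument was not supplied"


def make_fake_adapter(name_returns=_UNSET):
    """Build a fake adapter class whose name() returns name_returns.

    The default applies only when the argument was omitted, so an explicit
    None is passed through and can exercise the non-string branch.
    """
    value = "langgraph" if name_returns is _UNSET else name_returns

    class FakeAdapter:
        @staticmethod
        def name():
            return value

    return FakeAdapter


if __name__ == "__main__":
    print(make_fake_adapter().name())      # langgraph
    print(make_fake_adapter(None).name())  # None
```

With `None` as the default instead, the second call would be indistinguishable from the first.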
Verification:
- 22/22 preflight tests pass (was 16; +6 new failure-path tests)
- 1256/1256 workspace pytest pass (was 1251; +5 net)
- No production code path other than preflight changed
Source: 2026-04-27 #87 cleanup audit after PR #2154 (wedge extraction).
This change is independent of the cli_executor.py template moves
(task #122) — completes one of the two remaining cleanup items.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses github-code-quality unused-import flag on the runtime_wedge
re-export shim. Adds __all__ listing the names that exist purely for
backwards-compat (is_wedged / wedge_reason / _reset_sdk_wedge_for_test)
so static analysis recognizes the imports as deliberate exports.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes from /code-review-and-quality on PR #2154:
1. Optional (architecture): wrap state in a private _WedgeState class
instead of bare module-level globals. Public API (mark_wedged /
clear_wedge / is_wedged / wedge_reason / reset_for_test) is
unchanged; adapters never see the class. The class leaves room for
a future per-scope variant (multiple executors per process, a
keyed registry, etc.) without churning the call sites. Today there's
exactly one instance (_DEFAULT) so behavior is identical.
2. Optional (readability): clarify the import path in the integration
recipe — in a TEMPLATE repo it's `from molecule_runtime.runtime_wedge`
(PyPI package); in molecule-core itself it's `from runtime_wedge`
(top-level module). Removes the trap where a contributor reading the
docstring while editing in-repo copies the template-style import and
gets ImportError.
3. Nit (readability): dedupe the shim rationale. claude_sdk_executor's
re-export comment now points to runtime_wedge's "Compatibility shim"
section as the source of truth instead of restating the same content.
Avoids docs-in-two-places drift risk.
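
A sketch of the class-wrap described in item 1, assuming sticky first-write-wins semantics and a `None` reason when unwedged. The method names on `_WedgeState` and the unwedged return value are assumptions; the module-level names match the public API listed above.

```python
class _WedgeState:
    """Holds one wedge flag; the first reason recorded wins until cleared."""

    def __init__(self) -> None:
        self._reason = None

    def mark(self, reason: str) -> None:
        if self._reason is None:  # sticky: later marks do not overwrite
            self._reason = reason

    def clear(self) -> None:
        self._reason = None

    def is_wedged(self) -> bool:
        return self._reason is not None

    def reason(self):
        return self._reason


_DEFAULT = _WedgeState()  # the single instance used today


# Module-level helpers delegate to the singleton, so call sites never see the class.
def mark_wedged(reason: str) -> None:
    _DEFAULT.mark(reason)


def clear_wedge() -> None:
    _DEFAULT.clear()


def is_wedged() -> bool:
    return _DEFAULT.is_wedged()


def wedge_reason():
    return _DEFAULT.reason()


def reset_for_test() -> None:
    _DEFAULT.clear()
```

A per-scope variant would only need more `_WedgeState` instances; the module-level functions would stay as the default-scope entry points.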
Verification:
- 1251/1251 workspace pytest pass (no behavior change — class wrap
is pure plumbing; module-level helpers delegate to the singleton)
- All shim re-export identity tests still pass (the shim's
`is_wedged is runtime_wedge.is_wedged` assertion holds because we
re-export the SAME function object that delegates to _DEFAULT)
No new tests needed — the existing test suite covers the public API
contract; the class is an implementation detail behind that contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Doc-only follow-up to the wedge-state extraction. Adds proactive
guidance so the next adapter (hermes / codex / langgraph / a future
template) discovers the runtime_wedge primitive and integrates the
~6 LOC pattern uniformly instead of inventing its own wedge state.
Two additions:
- workspace/runtime_wedge.py — new "How to use from a NEW adapter"
section in the module docstring with the minimum viable
integration recipe, what-you-get-for-free list, and explicit
DON'TS (don't store local wedge state, don't mark for transient
errors, don't write your own clear logic). Plus a "when wedge is
the WRONG primitive" note to keep adopters from over-using it.
- workspace/adapter_base.py — adds runtime_wedge to the
"Cross-cutting capabilities your adapter can opt into" list in
BaseAdapter's docstring (alongside capabilities() and
idle_timeout_override()). Discoverability path: adapter author
reads BaseAdapter docstring → sees runtime_wedge mention → reads
runtime_wedge module docstring → has the recipe.
Also tightens the "to add a new agent infra" steps in BaseAdapter to
match the actual current model (standalone template repo + ADAPTER_MODULE
env var) rather than the obsolete workspace/adapters/<infra>/ layout
that hasn't been the path since the universal-runtime extraction
started.
Zero code change. Tests untouched (1251/1251 still pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prerequisite for the universal-runtime refactor (task #87) to move
claude_sdk_executor.py out of molecule-runtime into the claude-code
template repo. heartbeat.py had a hard import:
from claude_sdk_executor import is_wedged, wedge_reason
which would break the moment the executor moves out of the runtime
package — the heartbeat would lose access to the wedge state used to
flip workspace status to degraded.
Extract the wedge state to a runtime-side module that the heartbeat
can keep importing regardless of which adapter executor is wedged:
- workspace/runtime_wedge.py — single-flag state + mark_wedged /
clear_wedge / is_wedged / wedge_reason / reset_for_test. Same
semantics as the original claude_sdk_executor implementation
(sticky first-write-wins, auto-clear on observed success). 100
LOC of small helpers around one module-level flag; lock-free is OK
because there's one executor per workspace process today.
- workspace/claude_sdk_executor.py — drops the in-file definitions;
re-exports the same names from runtime_wedge as a backwards-compat
shim. Any third-party adapter that imported is_wedged / wedge_reason
/ _mark_sdk_wedged from claude_sdk_executor keeps working for one
release cycle while they migrate to runtime_wedge.
- workspace/heartbeat.py — _runtime_state_payload() now imports
from runtime_wedge instead of claude_sdk_executor. Lazy-import
pattern preserved; the docstring updated to explain the new
cross-cutting source-of-truth.
Tests (10 new in test_runtime_wedge.py):
- Default state (unwedged), mark sets flag, first-write-wins,
clear restores healthy, clear-when-not-wedged is no-op,
re-marking after clear is allowed
- Re-export shim: each old name in claude_sdk_executor IS the
runtime_wedge function (identity check), state is shared
(marking via the executor shim is observable via runtime_wedge
and vice versa)
Verification:
- 1251/1251 workspace pytest pass (was 1241 after orphan deletion;
+10 = exactly the new test_runtime_wedge.py cases)
- All existing test_claude_sdk_executor.py cases (which call
_mark_sdk_wedged via the shim) still pass
After this lands + the claude-code template image rebuilds with the
local claude_sdk_executor.py copy (template PR #13), the molecule-
core deletion of workspace/claude_sdk_executor.py becomes safe (the
shim deletion comes alongside the file deletion, since runtime_wedge
is the new public API).
See project memory `project_runtime_native_pluggable.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
## What was broken
Same bug class as the secret-scan.yml fix in #2120 — block-internal-paths
hit `fatal: bad object <sha>` exit 128 on the staging push at
2026-04-27 06:50:33Z.
Two cases:
1. **`merge_group` events**: BASE/HEAD came from
`github.event.before` / `.after` which are push-event-only
properties. On merge_group both came back empty, the script fell
through to "scan entire tree" mode which is correct but
inefficient. Worse, when this workflow is required for the merge
queue (line 21-22), an empty-BASE entire-tree scan would run on
every queue check.
2. **`push` events with shallow clones**: `fetch-depth: 2` doesn't
always cover BASE across true merge commits. When BASE is in the
payload but absent from the local object DB, `git diff` errors out
with `fatal: bad object <sha>` and the job exits 128. This is what
broke today's staging push.
## Fix
Same shape as the secret-scan.yml fix (#2120):
- Add a dedicated `git fetch` step for `merge_group.base_sha`.
- Move event-specific SHAs into a step `env:` block; script uses a
`case` over `${{ github.event_name }}` covering pull_request /
merge_group / push (rather than `if pull_request / else push`
which left merge_group on the empty-BASE branch).
- On-demand fetch + `git cat-file -e` guard for push BASE so a SHA
that's payload-present-but-DB-absent triggers the fetch, and a
fetch failure falls through cleanly to "scan entire tree" instead
of exiting 128.
## Test plan
- [x] YAML structure preserved (no schema changes)
- [x] Bash logic mirrors the secret-scan recovery path tested in #2120
- [ ] CI green on this PR's pull_request scan + push to staging post-merge
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes:
- workspace/hermes_executor.py (545 LOC) — HermesA2AExecutor, an
OpenAI-compat direct-call executor that was the original hermes
integration before the template was rewritten to bridge to
hermes-agent's sidecar API server.
- workspace/tests/test_hermes_executor.py (1307 LOC) — its test file.
Verified-dead-code analysis:
- Zero `from hermes_executor` / `import hermes_executor` imports
anywhere in workspace/, workspace-server/, or
workspace-configs-templates/ (excluding the file itself + its test).
- The hermes template (workspace-configs-templates/hermes/executor.py)
uses HermesAgentProxyExecutor, NOT HermesA2AExecutor — they're
independent implementations. The executor.py file imports from
`executor` (local), not from molecule_runtime.
- Last touched in PR #1974 (2026 a2a-sdk migration to 1.0.0) for SDK
compatibility — kept compiling but never wired into any code path.
- Before that, the only change was the 2026 open-source restructure rename.
Why now: starting task #87 (universal-runtime violation, move adapter-
specific code out of workspace/). Dead-code deletion is the safest
first step and motivates the broader refactor by clearing the
landscape — no risk of someone defending HermesA2AExecutor as
"actually used somewhere."
Verification:
- 1241/1241 workspace pytest pass (was 1312; the 71 dropped tests
are exactly test_hermes_executor.py's coverage)
- No new failures, no broken imports anywhere
The remaining adapter-specific executors in workspace/ that #87 will
eventually relocate (per the user's scope: claude-code + hermes priority,
others later):
- workspace/claude_sdk_executor.py (757 LOC) → claude-code template repo
- workspace/cli_executor.py (461 LOC) → defer (codex/ollama/etc still
use the runtime presets here; comes back later when those bump versions)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Continues the #1815 coverage rollup. classNames.ts was at 17%
in the baseline; this PR brings it to full coverage.
16 cases across 3 helpers:
**appendClass (6):**
- undefined / empty existing → just `cls`
- single-class → "a b" join
- DEDUP: existing already contains `cls` → existing unchanged.
This is the load-bearing reason classNames.ts exists. Pre-helper
the call sites inlined `${existing} ${cls}` with no dedup, so a
tick that fired the same class twice produced "a a" and React
Flow's className-equality diff saw it as a change every render.
- whitespace normalization (multi-space, leading/trailing)
**removeClass (7):**
- undefined / empty existing → ""
- removes named class
- exact match only ("spawn" must NOT match "spawn-fast")
- removing the only class → ""
- no-op when class absent
- whitespace normalization
**scheduleNodeClassRemoval (3):**
- after delayMs: calls set() with className-removed on target node;
OTHER nodes untouched (the per-id pruning is the contract — pin
it so a future refactor that maps over all nodes doesn't silently
strip classes from siblings)
- does NOT fire before the delay elapses (vi.useFakeTimers + advance)
- SSR safety: when window is undefined, function is a no-op
(neither get nor set fires)
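
The string-level behavior that the appendClass and removeClass cases pin can be rendered in Python for illustration. The real helpers are TypeScript; the function names and bodies below are a hypothetical port, not the source under test.

```python
def append_class(existing, cls: str) -> str:
    """Add cls to a space-separated class string unless it is already present."""
    tokens = (existing or "").split()
    if cls in tokens:  # dedup: avoids "a a", which would look like a change
        return existing
    return " ".join(tokens + [cls])


def remove_class(existing, cls: str) -> str:
    """Remove cls by exact token match, normalizing whitespace."""
    return " ".join(t for t in (existing or "").split() if t != cls)


if __name__ == "__main__":
    print(append_class("a", "b"))              # a b
    print(append_class("a b", "a"))            # a b
    print(remove_class("spawn-fast", "spawn"))  # spawn-fast
```

Splitting on whitespace and comparing whole tokens is what gives exact-match semantics; a substring test would wrongly treat "spawn" as present in "spawn-fast".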
## Note on test environment
Added `// @vitest-environment jsdom` directive — the file's
default `node` environment leaves `window` undefined, which would
make the SSR-guard happy-path test pass for the wrong reason
(every test would short-circuit). With jsdom, the SSR test
explicitly stubs `window` to undefined to exercise the guard.
## Test plan
- [x] All 16 cases pass locally (~1.1s with jsdom env spin-up)
- [x] No SUT changes
- [ ] CI green
## #1815 progress
- [x] Step 1+2: instrumentation (#2147)
- [x] utils.ts + runtime-names.ts (#2148)
- [x] canvas-actions.ts (#2149)
- [x] store/classNames.ts (this PR)
- [ ] store/canvas.ts (73% — biggest absolute gap; bigger surface,
separate cycle)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-contained happy-path E2E for the two runtimes the project commits
to first-class support for (task #116, completes the loop on the
"both must work end-to-end with tests" requirement).
What it proves per runtime:
1. POST /workspaces succeeds with the runtime + secrets
2. Workspace reaches status=online within its cold-boot window
(claude-code: 240s, hermes: 900s on cold apt + uv + sidecar)
3. POST /a2a (message/send "Reply with PONG") returns a non-error,
non-empty reply
4. activity_logs row written with method=message/send and ok|error
status (a2a_proxy.LogActivity contract)
Skip semantics: each phase independently checks for its required env
key (CLAUDE_CODE_OAUTH_TOKEN / E2E_OPENAI_API_KEY) and skips cleanly
if absent. The script always exits 0 if every phase either passed or
skipped, so wiring it into a no-keys CI job validates that the script
itself stays clean without false-failing.
Idempotent: pre-sweeps any prior "Priority E2E (claude-code)" /
"Priority E2E (hermes)" workspaces so a run interrupted by SIGPIPE /
kill -9 (which bypasses the EXIT trap) doesn't poison the next run.
Same defensive pattern as test_notify_attachments_e2e.sh.
CI wiring:
- e2e-api.yml — runs on every PR with no LLM keys, both phases skip,
catches script-level regressions (set -u bugs, syntax issues, etc.)
- canary-staging.yml + e2e-staging-saas.yml already have the keys
via secrets.MOLECULE_STAGING_OPENAI_KEY and exercise real
over-the-wire behavior; this script could be wired in there as an
opt-in if claude-code coverage is wanted too.
Local runs (from this branch, no keys):
=== Results: 0 passed, 0 failed, 2 skipped ===
Validates the capability primitives shipped in PRs #2137-2144: once
template PRs #12 (claude-code) + #25 (hermes) merge with their
declared provides_native_session=True + idle_timeout_override=900,
a manual run with both keys validates the full native+pluggable chain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Continues the #1815 coverage rollup. canvas-actions.ts was at 25%
in the baseline run from #2147; this PR brings the file's two
helpers to full coverage.
5 cases:
**markAllWorkspacesNeedRestart (3):**
- calls updateNodeData on every node with `{needsRestart: true}`
- no-op when the canvas has zero workspaces
- preserves call ordering — matters because the toolbar's
Restart Pending pill observes per-node data changes
incrementally; a refactor that shuffled iteration order would
silently change which workspaces flash first
**markWorkspaceNeedsRestart (2):**
- targeted call: updateNodeData fires exactly once on the named id
- defensive: regardless of how many other workspaces exist in the
store, only the target workspace gets updated. Before this test, a
refactor that accidentally wired this function through the
per-node iteration path of markAll would have silently marked every
workspace; pinning the cardinality here catches that.
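A minimal sketch of the two contracts pinned above (ordering for the bulk helper, cardinality for the targeted one). The `Store` shape, the explicit `store` parameter, and the `makeStore` recorder are assumptions for illustration; the real helpers presumably read the canvas store directly.

```typescript
// Hypothetical, simplified store surface; the real canvas store has more fields.
type NodeData = { needsRestart?: boolean };
type CanvasNode = { id: string; data: NodeData };
type Store = {
  nodes: CanvasNode[];
  updateNodeData: (id: string, patch: Partial<NodeData>) => void;
};

// Marks every workspace, visiting nodes in store order.
function markAllWorkspacesNeedRestart(store: Store): void {
  for (const node of store.nodes) {
    store.updateNodeData(node.id, { needsRestart: true });
  }
}

// Marks exactly one workspace and never iterates the node list.
function markWorkspaceNeedsRestart(store: Store, id: string): void {
  store.updateNodeData(id, { needsRestart: true });
}

// Recording stand-in for a vi.fn() spy: every call is appended to `calls`.
function makeStore(ids: string[]): {
  store: Store;
  calls: Array<[string, Partial<NodeData>]>;
} {
  const calls: Array<[string, Partial<NodeData>]> = [];
  const store: Store = {
    nodes: ids.map((id) => ({ id, data: {} })),
    updateNodeData: (id, patch) => {
      calls.push([id, patch]);
    },
  };
  return { store, calls };
}
```

Asserting on the recorded call list checks both which ids were touched and in what order, which is what the ordering and cardinality cases rely on.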
## Mock strategy
Standard pattern for canvas store: mock useCanvasStore as both the
selector function AND a getState()-bearing object. updateNodeData
is a vi.fn() spy so the test asserts on calls + args directly.
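The dual-shape mock can be sketched without any test framework. This is an illustration only: the `State` type is a hypothetical subset of the store, and a hand-rolled recorder stands in for the `vi.fn()` spy the real tests use.

```typescript
// Hypothetical subset of the canvas store state.
type State = { updateNodeData: (id: string, patch: object) => void };

// Hand-rolled call recorder standing in for vi.fn().
const calls: Array<{ id: string; patch: object }> = [];
const mockState: State = {
  updateNodeData: (id, patch) => {
    calls.push({ id, patch });
  },
};

// Selector-style entry point: useCanvasStore((s) => s.updateNodeData)
function select<T>(selector: (s: State) => T): T {
  return selector(mockState);
}

// The same function object also carries getState(), so both access patterns
// resolve to the same underlying state and the same spy.
const useCanvasStore = Object.assign(select, {
  getState: (): State => mockState,
});
```

Because both paths return the same `mockState` object, a call made through either one is visible to assertions on the single recorder.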
## Test plan
- [x] All 5 cases pass locally (~132ms)
- [x] No SUT changes — pure additive coverage
- [ ] CI green
## #1815 progress
- [x] Step 1+2: instrumentation + script (#2147)
- [x] utils.ts + runtime-names.ts (#2148)
- [x] canvas-actions.ts (this PR)
- [ ] Remaining low-coverage targets: store/classNames.ts (17%),
store/canvas.ts (73% — largest absolute gap by lines)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes two of the 0%-coverage files surfaced by the baseline run in
PR #2147 (vitest coverage instrumentation). Both files are tiny
utility helpers with high-touch read paths.
## utils.cn (8 cases)
Wraps `twMerge(clsx(inputs))` — every conditionally-styled component
flows through here. The load-bearing case is the **last-wins
Tailwind dedup**: `cn("p-2", "p-4")` → "p-4". A regression that lost
twMerge would silently double-apply utilities (cosmetically broken,
breaks `:where()` rules + theme overrides).
Cases:
- single class unchanged
- multiple positional classes joined
- array input flattening (clsx)
- object syntax with truthy/falsy keys
- last-wins dedup on conflicting Tailwind utilities (the
regression-locked guarantee)
- non-conflicting utilities both survive (p-2 + m-4)
- mixed input shapes (string + array + object + string)
- nullish / empty inputs don't throw
## runtime-names.runtimeDisplayName (4 it.each cases + 3 it())
Friendly-name lookup that surfaces the workspace runtime in the chat
indicator, details tab, and a few component labels.
Cases:
- known runtimes map to display strings
(claude-code → Claude Code, langgraph → LangGraph, etc.)
- unknown runtime falls back to input string verbatim
(a NEW runtime not yet in the lookup still renders something
operator-debuggable rather than a generic placeholder)
- empty string falls back to "agent" (final default)
- case-sensitivity pinned: "Claude-Code" / "LANGGRAPH" miss the
lookup. The upstream slug is already normalized lowercase, so a
future refactor that lowercases input "for safety" would
silently change behavior — pinning the contract here.
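The fallback chain described in these cases could look roughly like the sketch below. The table contents beyond the two mappings named above are unknown, so `RUNTIME_NAMES` here is a hypothetical, partial lookup.

```typescript
// Hypothetical partial lookup; the real module may list more runtimes.
const RUNTIME_NAMES = new Map<string, string>([
  ["claude-code", "Claude Code"],
  ["langgraph", "LangGraph"],
]);

function runtimeDisplayName(runtime: string): string {
  // Exact, case-sensitive match: slugs arrive already normalized to lowercase.
  const known = RUNTIME_NAMES.get(runtime);
  if (known !== undefined) return known;
  // Unknown runtimes render verbatim; only the empty string hits the final default.
  return runtime !== "" ? runtime : "agent";
}
```

Note that a mixed-case input such as "Claude-Code" misses the lookup and is returned verbatim, which is the case-sensitivity contract the tests pin.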
## Test plan
- [x] All 17 cases pass locally (~129ms)
- [x] No SUT changes — pure additive coverage
- [ ] CI green
## #1815 progress
- [x] Step 1+2: coverage instrumentation + script (#2147)
- [x] 0%-file gaps utils.ts + runtime-names.ts (this PR)
- [ ] More 0%/low-coverage files: lib/canvas-actions.ts (25%),
store/classNames.ts (17%) — separate PRs
- [ ] Step 3b: thresholds + CI gate once baseline catches up
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes step 1+2 of #1815. Step 3 (CI gate + threshold) is split into
a follow-up because today's baseline is ~46% lines / ~45% statements,
not the 70% the issue's draft thresholds assumed.
## What this lands
- `canvas/vitest.config.ts` — `coverage` block with v8 provider,
reporters: text (terminal) / html (./coverage/index.html) /
json-summary (machine-readable for tooling). NO threshold —
pure observability.
- `canvas/package.json` — adds `test:coverage` script
(`vitest run --coverage`); existing `test` script is unchanged so
the default workflow is identical.
- `canvas/package-lock.json` — adds @vitest/coverage-v8@^4.1.5 (the
v8 provider Vitest uses for native coverage).
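For orientation, the coverage portion of the config described above would look roughly like this fragment. It is a sketch, not the actual `canvas/vitest.config.ts`: everything else in the real file is omitted, and only the provider and reporter names are taken from the description.

```typescript
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      // Native V8 coverage, supplied by the @vitest/coverage-v8 package.
      provider: "v8",
      // Terminal table, browsable HTML report, and machine-readable summary.
      reporter: ["text", "html", "json-summary"],
      // Deliberately no `thresholds` block yet: observability only.
    },
  },
});
```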
## Why no threshold yet
Issue draft threshold was 70%/70%/65%/70% (lines/funcs/branches/stmts).
Local baseline today:
```
Statements : 45.19% (3248/7186)
Branches : 39.87% (2034/5101)
Functions : 40.99% (724/1766)
Lines : 46.36% (2905/6265)
```
Turning on a 70% gate today would either fail CI immediately or get
papered over with an ad-hoc exclude list. Better path: land
observability now, run coverage in PR review for any new code
(via the new script), gate later when the baseline catches up.
## Heatmap (from local run, top gaps)
- `src/lib/runtime-names.ts` — 0% (untouched by tests)
- `src/lib/utils.ts` — 0%
- `src/lib/canvas-actions.ts` — 25%
- `src/store/classNames.ts` — 17%
- `src/store/canvas.ts` — 73% (already-tested but the largest absolute
gap by lines)
Each is a concrete follow-up issue / PR target.
## Test plan
- [x] `npx vitest run --coverage` runs cleanly locally (~10s) and
produces `./coverage/index.html` + a `coverage-summary.json`
- [x] Existing `npm run test` workflow unchanged — instrumentation
only activates with `--coverage` flag
- [x] No production-code changes — pure tooling addition
## Follow-ups (each tracked separately; this PR keeps minimal scope)
- Step 3a — write tests for the 0% files above (~tiny each)
- Step 3b — once baseline ≥ thresholds, add `thresholds` block to
vitest.config.ts + a `npm run test:coverage` step in
`.github/workflows/ci.yml`'s Canvas job
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the fourth and final item from #2071 — but at a slightly
different layer than the issue listed: tests `dragUtils.ts` (the
74-LOC pure-ish geometry helpers) instead of the full 296-LOC
`useDragHandlers` hook. Rationale below.
15 cases across 2 buckets:
**shouldDetach (8):**
- child fully inside parent → false
- child drifted slightly past edge but under DETACH_FRACTION → false
- child past 20% threshold on X → true (un-nest)
- child past 20% threshold on Y → true (un-nest)
- missing child node → true (conservative fallback per source comment)
- missing parent node → true (same)
- measured size absent → falls back to React Flow's 220x120 defaults
(mirrors initial-mount race where measurement hasn't run yet)
- DETACH_FRACTION constant pinned at 0.2 (Miro/tldraw convention)
**clampChildIntoParent (7):**
- child already inside bounds → no-op (no setState — proven by
reference equality on mockState.nodes)
- drifted past top-left → clamps to (0, 0)
- drifted past bottom-right → clamps to (parentW - childW, parentH - childH)
- per-axis independence: X past edge + Y inside → only X clamps
- child not in store → early return, no setState
- child internalNode missing → early return, no setState
- multi-node store: clamping one node MUST NOT touch siblings
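A geometry sketch of the two helpers, under stated assumptions. The exact overlap formula in `dragUtils.ts` is not shown here, so this version assumes "past the threshold" means the share of the child's width or height lying outside the parent exceeds `DETACH_FRACTION`. The `Box` type is hypothetical, and `clampIntoParent` returns the clamped position instead of writing to the store as the real `clampChildIntoParent` does.

```typescript
const DETACH_FRACTION = 0.2;
// Fallback size used when a node has not been measured yet.
const DEFAULT_W = 220;
const DEFAULT_H = 120;

type Point = { x: number; y: number };
// Hypothetical node shape; child position is relative to the parent's top-left.
type Box = { position: Point; measured?: { width?: number; height?: number } };

function sizeOf(box: Box): { w: number; h: number } {
  return {
    w: box.measured?.width ?? DEFAULT_W,
    h: box.measured?.height ?? DEFAULT_H,
  };
}

// Length of the child sticking out past either parent edge on one axis.
function overhang(pos: number, childLen: number, parentLen: number): number {
  return Math.max(0, -pos) + Math.max(0, pos + childLen - parentLen);
}

function shouldDetach(child: Box | undefined, parent: Box | undefined): boolean {
  if (!child || !parent) return true; // conservative fallback for missing nodes
  const c = sizeOf(child);
  const p = sizeOf(parent);
  const outX = overhang(child.position.x, c.w, p.w) / c.w;
  const outY = overhang(child.position.y, c.h, p.h) / c.h;
  return outX > DETACH_FRACTION || outY > DETACH_FRACTION;
}

// Clamps each axis independently into [0, parentLen - childLen].
function clampIntoParent(child: Box, parent: Box): Point {
  const c = sizeOf(child);
  const p = sizeOf(parent);
  const clamp = (v: number, max: number): number =>
    Math.min(Math.max(v, 0), Math.max(0, max));
  return {
    x: clamp(child.position.x, p.w - c.w),
    y: clamp(child.position.y, p.h - c.h),
  };
}
```

Keeping the decision and the clamp as pure functions over positions and sizes is what makes them testable without a React Flow provider.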
## Why dragUtils, not the full useDragHandlers hook
The hook (296 LOC) orchestrates React Flow drag events + Zustand
mutations. Testing it would need heavyweight `useReactFlow` +
internal-node + `setDragOverNode` / `nestNode` / `batchNest` /
`isDescendant` mocks just to drive event handlers — and the
*decisions* the hook makes all delegate to these two helpers:
- `shouldDetach` decides "is this a real un-nest?"
- `clampChildIntoParent` snaps the child back when the user drifted
slightly past the edge without holding Alt/Cmd
Pinning these locks the hot path the user feels. The hook's
remaining surface (modifier-key snapshotting, drop-target
broadcasting, commit-on-release grow pass) is plumbing — worth
testing as a follow-up if it ever regresses, but lower
correctness leverage per LOC of test setup.
## #2071 status after this PR
- [x] useTemplateDeploy (#2121)
- [x] A2AEdge (#2143)
- [x] OrgCancelButton (#2145)
- [x] dragUtils geometry helpers (this PR)
- [ ] Full useDragHandlers hook orchestration — explicit deferral
with rationale above
## Test plan
- [x] All 15 cases pass locally (`vitest run dragUtils.test.ts` — 131ms)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the third item from #2071 (Canvas test gaps follow-up). Builds
on the A2AEdge tests in PR #2143.
10 cases across 4 buckets:
**Render (2):**
- Default pill with `Cancel (N)` text + correct ARIA label
- Confirm dialog NOT visible until pill click
**Pill click (3):**
- Click flips to confirming view + stops propagation (so React Flow
doesn't interpret the click as a node selection)
- Confirm copy pluralizes correctly: count=1 → "Delete 1 workspace?",
count>1 → "Delete N workspaces?". A negative assertion in each case
guards against the wrong form appearing.
**No / cancel-confirm (1):**
- Click No → returns to pill, no API call, no store mutation
**Yes / cascade-delete (4):**
- Happy path: beginDelete locks the WHOLE subtree (root + children,
NOT unrelated workspace) → api.del("/workspaces/<id>?confirm=true")
→ optimistic store filter strips subtree, keeps unrelated → success
toast → endDelete in finally
- WS-event race: the WS_REMOVED handler clears the root mid-flight. The
bail-out branch (`!postDeleteState.nodes.some(n => n.id === rootId)`)
must NOT then run a second optimistic filter. Before the fix, the
post-await subtree walk would miss any orphaned descendants whose
parentId had been reparented upward by handleCanvasEvent; the test
now pins the corrected behaviour.
- Error path: api.del rejects → endDelete undoes the lock + error
toast surfaces the message → subtree STAYS in the store so the user
can retry / interact with the still-deploying nodes
- Non-Error rejection (e.g. string thrown directly): toast surfaces
the canned "Cancel failed" fallback instead of attempting `.message`
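Three of the behaviours exercised above (subtree locking, confirm-copy pluralization, and the non-Error fallback) reduce to small pure functions. The sketch below is an assumption about their shape, not the component's code: `WsNode`, `collectSubtree`, `confirmCopy`, and `toastMessage` are hypothetical names, and the count=0 wording is not specified by the tests.

```typescript
// Hypothetical minimal node shape: only the parent link matters here.
type WsNode = { id: string; parentId?: string };

// Root plus every descendant, found by repeatedly following parentId links.
function collectSubtree(nodes: WsNode[], rootId: string): Set<string> {
  const ids = new Set<string>([rootId]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const n of nodes) {
      if (n.parentId !== undefined && ids.has(n.parentId) && !ids.has(n.id)) {
        ids.add(n.id);
        grew = true;
      }
    }
  }
  return ids;
}

// Singular form for exactly one workspace, plural otherwise.
function confirmCopy(count: number): string {
  return count === 1 ? "Delete 1 workspace?" : `Delete ${count} workspaces?`;
}

// Only real Error instances have a trustworthy .message; anything else
// (for example a thrown string) gets the canned fallback.
function toastMessage(err: unknown): string {
  return err instanceof Error ? err.message : "Cancel failed";
}
```

The optimistic removal then amounts to filtering the store's nodes by membership in the set returned from `collectSubtree`, which leaves unrelated workspaces untouched.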
## Mocking
- `@/lib/api`, `@/components/Toaster`: simple spy mocks
- `@/store/canvas`: object that satisfies BOTH the selector pattern
(`useCanvasStore(s => s.x)`) AND `getState()` / `setState()` since
the cascade-delete handler walks the subtree via `getState()` and
mutates via `setState()` for the optimistic removal. `vi.hoisted`
preserves referential identity so the mock fns wired into the
state object are observed by every consumer.
## Test plan
- [x] All 10 cases pass locally (`vitest run OrgCancelButton.test.tsx` — ~990ms)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
## #2071 progress after this PR
- [x] useTemplateDeploy (PR #2121)
- [x] A2AEdge (PR #2143)
- [x] OrgCancelButton (this PR)
- [ ] useDragHandlers — separate PR
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a target workspace's adapter has declared
provides_native_session=True (claude-code SDK's streaming session,
hermes-agent's in-container event log), the SDK owns its own queue/
session state. Adding the platform's a2a_queue layer on top would
double-buffer the same in-flight state — and worse, the platform
queue's drain timing has no relationship to the SDK's actual readiness,
so the queued request might dispatch while the SDK is STILL busy.
Behavior change: in handleA2ADispatchError, when isUpstreamBusyError(err)
fires and the target declared native_session, return 503 + Retry-After
directly without enqueueing. The caller's adapter handles retry on
its own schedule, and the SDK's own queue absorbs the request when
ready. Response body carries native_session=true so callers can
distinguish this from queue-failure 503s.
Observability is preserved: logA2AFailure still runs above; the
broadcaster still fires; the activity_logs row records the busy event
just like the platform-fallback path.
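The branch taken after `isUpstreamBusyError(err)` fires can be modelled as a small decision function. The real handler is Go; this TypeScript sketch only models the outcome, and `BusyDecision`, `decideOnUpstreamBusy`, and the `retryAfterSeconds` parameter are hypothetical (the actual Retry-After value is not stated here).

```typescript
// Hypothetical model of the two outcomes once the upstream reports busy.
type BusyDecision =
  | {
      kind: "respond";
      status: 503;
      headers: { "Retry-After": string };
      body: { native_session: true };
    }
  | { kind: "enqueue" };

function decideOnUpstreamBusy(
  targetHasNativeSession: boolean,
  retryAfterSeconds: number,
): BusyDecision {
  if (targetHasNativeSession) {
    // The SDK owns its own queue: answer 503 directly and skip a2a_queue.
    return {
      kind: "respond",
      status: 503,
      headers: { "Retry-After": String(retryAfterSeconds) },
      body: { native_session: true },
    };
  }
  // No native session declared: fall through to the platform queue.
  return { kind: "enqueue" };
}
```

Callers can tell the two kinds of 503 apart by checking for the `native_session` marker in the response body.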
This is the consumer that validates the template-side declarations
already shipped in:
- molecule-ai-workspace-template-claude-code PR #12
- molecule-ai-workspace-template-hermes PR #25
Once those merge and the image tags are bumped, busy 503s from
claude-code and hermes workspaces skip the platform queue entirely,
which validates capability primitive #5 end-to-end.
Tests (2 new):
- NativeSession_SkipsEnqueue: cache pre-populated, deliberate
sqlmock with NO INSERT INTO a2a_queue expected — implicit
regression cover (sqlmock fails on unexpected queries). Asserts
503 + Retry-After + native_session=true marker in body.
- NoNativeSession_StillEnqueues: negative pin. With an empty cache, the
same busy error falls through to EnqueueA2A; the enqueue fails in this
test, so the handler falls back to the legacy 503 without the
native_session marker.
Verification:
- All Go handler tests pass (2 new + existing)
- go build + go vet clean
See project memory `project_runtime_native_pluggable.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the second item from #2071 (Canvas test gaps follow-up):
adds behavioural coverage for the custom React Flow edge that renders
delegation counts between workspaces and routes a click into the
source workspace's Activity feed.
10 cases across 2 buckets:
**Render (6):**
- Empty label → BaseEdge only, NO portaled HTML pill (the most
common state for cold edges; pill must not render-through-empty)
- Non-empty label → pill renders with the exact label text
- isHot=true → violet accent classes; blue accent NOT present
- isHot=false → blue accent classes
- ARIA pluralization: count=1 → "1 delegation from …" (singular)
- ARIA pluralization: count=7 → "7 delegations from …" (plural)
**Click behaviour (4):**
- Click → selectNode(source)
- FRESH selection (selectedNodeId != source) → also setPanelTab("activity")
- RE-click of already-selected source → setPanelTab MUST NOT fire
(this is the regression-locked guarantee — preserves the user's
current tab when they intentionally moved to Chat / Memory while
inspecting the same peer)
- stopPropagation: parent onClick must NOT see the event (otherwise
the canvas Pane's clear-selection handler would fire and undo the
edge's own selectNode call)
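The click contract above fits in a few lines. This is a sketch under assumptions: `EdgeStore` and `ClickLike` are hypothetical minimal shapes, the tab ids other than "activity" are guesses, and the store is passed in explicitly whereas the component reads it via `useCanvasStore.getState()`.

```typescript
// Hypothetical tab ids; only "activity" is confirmed by the description.
type PanelTab = "activity" | "chat" | "memory";

type EdgeStore = {
  selectedNodeId: string | null;
  selectNode: (id: string) => void;
  setPanelTab: (tab: PanelTab) => void;
};

// Minimal event surface the handler needs.
type ClickLike = { stopPropagation: () => void };

function handleEdgeClick(event: ClickLike, source: string, store: EdgeStore): void {
  // Keep the canvas pane's clear-selection handler from undoing the selection.
  event.stopPropagation();
  // Read the previous selection BEFORE selecting, so a re-click is detectable.
  const wasAlreadySelected = store.selectedNodeId === source;
  store.selectNode(source);
  // Only a fresh selection switches tabs; a re-click keeps the user's tab.
  if (!wasAlreadySelected) store.setPanelTab("activity");
}
```

Capturing the prior selection first is the detail that matters: checking after `selectNode` would always see the source as selected and the tab switch would never fire.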
## Mocking strategy
- `@xyflow/react`: BaseEdge → <g data-testid>, EdgeLabelRenderer →
inline pass-through (no portal), getBezierPath → fixed [path, x, y].
Lets the test render the component without a ReactFlow provider.
- `@/store/canvas`: vi.hoisted-shared mock state with selectNode +
setPanelTab spies and a mutable selectedNodeId. The store's
getState() returns the same object so the click handler's
`useCanvasStore.getState().selectedNodeId` lookup works.
Pattern matches the existing `A2ATopologyOverlay.test.tsx` setup
in the same module.
## Test plan
- [x] All 10 cases pass locally (`vitest run A2AEdge.test.tsx` — ~1.3s)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
## Remaining #2071 items
- OrgCancelButton tests
- useDragHandlers tests
Each is a separate PR.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>