Closes task #123 — last piece of #87 cleanup.
Pre-fix: workspace/preflight.py:11 hardcoded a tuple of "supported"
runtime names (claude-code, codex, ollama, langgraph, etc.). Every
new template repo required a code change in molecule-runtime to be
recognized — direct violation of the universal-runtime principle
(#87) where adapters declare themselves and the runtime stays generic.
Post-fix: discovery-based validation via the same ADAPTER_MODULE env
var that production load paths already consult
(workspace/adapters/__init__.py:get_adapter). Each failure mode is
reported separately so operator messages are concrete:
- ADAPTER_MODULE unset → "no adapter installed; set the env var"
- ADAPTER_MODULE set but module won't import → import error type +
message
- module imports but no Adapter class → "convention violation, add
`Adapter = YourClass`"
- Adapter.name() raises → caught with operator message
- Adapter.name() returns non-string → contract violation message
- Adapter.name() doesn't match config.runtime → drift WARNING (not
fatal; the adapter wins in production, config.yaml is just
documentation)
The drift case is the one behavioral change worth calling out: the
prior static-list path would have hard-failed config.runtime values
not in the allowlist. With discovery, an unknown runtime in
config.yaml is just a documentation drift — the adapter that's
actually installed runs regardless. Operator gets a warning naming
both the configured and installed names so they can fix whichever
is stale.
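The failure-mode ladder above can be sketched roughly as follows. This is a minimal illustration, not the actual preflight code: the function name `check_adapter`, the `(errors, warnings)` return shape, the exact message wording, and the assumption that `Adapter.name()` is callable on the class are all hypothetical.

```python
import importlib
import os


def check_adapter(config_runtime: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the adapter named by ADAPTER_MODULE.

    Hypothetical helper: mirrors the failure modes listed in this commit,
    but the real preflight function may be shaped differently.
    """
    errors: list[str] = []
    warnings: list[str] = []

    module_name = os.environ.get("ADAPTER_MODULE")
    if not module_name:
        errors.append("no adapter installed; set the ADAPTER_MODULE env var")
        return errors, warnings

    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, SyntaxError in the module, etc.
        errors.append(f"cannot import {module_name!r}: {type(exc).__name__}: {exc}")
        return errors, warnings

    adapter_cls = getattr(module, "Adapter", None)
    if adapter_cls is None:
        errors.append(f"{module_name!r} has no Adapter; add `Adapter = YourClass`")
        return errors, warnings

    try:
        name = adapter_cls.name()  # assumed to be a static/class method
    except Exception as exc:
        errors.append(f"Adapter.name() raised {type(exc).__name__}: {exc}")
        return errors, warnings

    if not isinstance(name, str):
        errors.append(f"Adapter.name() must return str, got {type(name).__name__}")
        return errors, warnings

    if name != config_runtime:
        # Drift is a warning only: the installed adapter wins in production.
        warnings.append(
            f"config.runtime={config_runtime!r} but installed adapter is {name!r}"
        )
    return errors, warnings
```

Returning early at each rung keeps every operator message tied to exactly one cause, and the drift case lands in `warnings` so it never blocks startup.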
Tests:
- Replaces the obsolete "static list pass/fail" tests with 6 new
cases covering each distinguished failure mode, plus a positive
test for the adapter-matches-config happy path
- Adds an autouse `_default_langgraph_adapter` fixture that
pre-installs a fake adapter via sys.modules monkey-patching, so
existing tests building default WorkspaceConfig (runtime="langgraph")
inherit a valid adapter without each test setting ADAPTER_MODULE
- Failure-mode tests opt out of the default fixture via
@pytest.mark.no_default_adapter (registered in pytest.ini)
- Sentinel pattern (`_UNSET = object()`) for `name_returns` so None
is a passable test value (otherwise `is not None` would skip the
None branch — exact bug the sentinel avoids)
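A minimal sketch of the sentinel pattern mentioned in the last bullet. The factory name `make_fake_adapter` and the `"langgraph"` default are illustrative assumptions; only `_UNSET = object()` and the `name_returns` parameter come from this commit.

```python
_UNSET = object()  # unique sentinel: distinguishes "argument omitted" from None


def make_fake_adapter(name_returns=_UNSET):
    """Build a fake adapter class whose name() returns the requested value.

    Hypothetical test helper. Comparing against _UNSET (not None) lets a
    test pass None on purpose to exercise the non-string contract branch.
    """
    value = "langgraph" if name_returns is _UNSET else name_returns

    class FakeAdapter:
        @staticmethod
        def name():
            return value

    return FakeAdapter


# Omitted argument falls back to the default name.
assert make_fake_adapter().name() == "langgraph"
# An explicit None is preserved instead of being mistaken for "omitted".
assert make_fake_adapter(name_returns=None).name() is None
```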
Verification:
- 22/22 preflight tests pass (was 16; +6 new failure-path tests)
- 1256/1256 workspace pytest pass (was 1251; +5 net)
- No production code path other than preflight changed
Source: 2026-04-27 #87 cleanup audit after PR #2154 (wedge extraction).
This change is independent of the cli_executor.py template moves
(task #122) — completes one of the two remaining cleanup items.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses github-code-quality unused-import flag on the runtime_wedge
re-export shim. Adds __all__ listing the names that exist purely for
backwards-compat (is_wedged / wedge_reason / _reset_sdk_wedge_for_test)
so static analysis recognizes the imports as deliberate exports.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes from /code-review-and-quality on PR #2154:
1. Optional (architecture): wrap state in a private _WedgeState class
instead of bare module-level globals. Public API (mark_wedged /
clear_wedge / is_wedged / wedge_reason / reset_for_test) is
unchanged; adapters never see the class. The class leaves room
for a future per-scope variant (multiple executors per process, a
keyed registry, etc.) without churning the call sites. Today there's
exactly one instance (_DEFAULT) so behavior is identical.
2. Optional (readability): clarify the import path in the integration
recipe — in a TEMPLATE repo it's `from molecule_runtime.runtime_wedge`
(PyPI package); in molecule-core itself it's `from runtime_wedge`
(top-level module). Removes the trap where a contributor reading the
docstring while editing in-repo copies the template-style import and
gets ImportError.
3. Nit (readability): dedupe the shim rationale. claude_sdk_executor's
re-export comment now points to runtime_wedge's "Compatibility shim"
section as the source of truth instead of restating the same content.
Avoids docs-in-two-places drift risk.
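For orientation, the class wrap in item 1 might look roughly like this. It is a sketch under assumptions: the method names on `_WedgeState`, the `reason: str` parameter, and `wedge_reason()` returning `None` when healthy are guesses; only the public function names, `_WedgeState`, `_DEFAULT`, and the first-write-wins semantics come from the commit history.

```python
from typing import Optional


class _WedgeState:
    """Private holder for the wedge flag (assumed shape)."""

    def __init__(self) -> None:
        self._reason: Optional[str] = None

    def mark(self, reason: str) -> None:
        # Sticky first-write-wins: a later mark does not overwrite the reason.
        if self._reason is None:
            self._reason = reason

    def clear(self) -> None:
        self._reason = None

    def is_wedged(self) -> bool:
        return self._reason is not None

    def reason(self) -> Optional[str]:
        return self._reason


_DEFAULT = _WedgeState()  # the single instance that exists today


# Module-level helpers keep the public API unchanged and delegate to _DEFAULT.
def mark_wedged(reason: str) -> None:
    _DEFAULT.mark(reason)


def clear_wedge() -> None:
    _DEFAULT.clear()


def is_wedged() -> bool:
    return _DEFAULT.is_wedged()


def wedge_reason() -> Optional[str]:
    return _DEFAULT.reason()


def reset_for_test() -> None:
    _DEFAULT.clear()
```

Because callers only ever touch the module-level functions, a later keyed registry could swap `_DEFAULT` for a lookup without changing any import.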
Verification:
- 1251/1251 workspace pytest pass (no behavior change — class wrap
is pure plumbing; module-level helpers delegate to the singleton)
- All shim re-export identity tests still pass (the shim's
`is_wedged is runtime_wedge.is_wedged` assertion holds because we
re-export the SAME function object that delegates to _DEFAULT)
No new tests needed — the existing test suite covers the public API
contract; the class is an implementation detail behind that contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Doc-only follow-up to the wedge-state extraction. Adds proactive
guidance so the next adapter (hermes / codex / langgraph / a future
template) discovers the runtime_wedge primitive and integrates the
~6 LOC pattern uniformly instead of inventing its own wedge state.
Two additions:
- workspace/runtime_wedge.py — new "How to use from a NEW adapter"
section in the module docstring with the minimum viable
integration recipe, what-you-get-for-free list, and explicit
DON'TS (don't store local wedge state, don't mark for transient
errors, don't write your own clear logic). Plus a "when wedge is
the WRONG primitive" note to keep adopters from over-using it.
- workspace/adapter_base.py — adds runtime_wedge to the
"Cross-cutting capabilities your adapter can opt into" list in
BaseAdapter's docstring (alongside capabilities() and
idle_timeout_override()). Discoverability path: adapter author
reads BaseAdapter docstring → sees runtime_wedge mention → reads
runtime_wedge module docstring → has the recipe.
Also tightens the "to add a new agent infra" steps in BaseAdapter to
match the actual current model (standalone template repo + ADAPTER_MODULE
env var) rather than the obsolete workspace/adapters/<infra>/ layout
that hasn't been the path since the universal-runtime extraction
started.
Zero code change. Tests untouched (1251/1251 still pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prerequisite for the universal-runtime refactor (task #87) to move
claude_sdk_executor.py out of molecule-runtime into the claude-code
template repo. heartbeat.py had a hard import:
from claude_sdk_executor import is_wedged, wedge_reason
which would break the moment the executor moves out of the runtime
package — the heartbeat would lose access to the wedge state used to
flip workspace status to degraded.
Extract the wedge state to a runtime-side module that the heartbeat
can keep importing regardless of which adapter executor is wedged:
- workspace/runtime_wedge.py — single-flag state + mark_wedged /
clear_wedge / is_wedged / wedge_reason / reset_for_test. Same
semantics as the original claude_sdk_executor implementation
(sticky first-write-wins, auto-clear on observed success). 100
LOC of small helpers around one flag; no lock is needed because there's one
executor per workspace process today.
- workspace/claude_sdk_executor.py — drops the in-file definitions;
re-exports the same names from runtime_wedge as a backwards-compat
shim. Any third-party adapter that imported is_wedged / wedge_reason
/ _mark_sdk_wedged from claude_sdk_executor keeps working for one
release cycle while they migrate to runtime_wedge.
- workspace/heartbeat.py — _runtime_state_payload() now imports
from runtime_wedge instead of claude_sdk_executor. Lazy-import
pattern preserved; the docstring updated to explain the new
cross-cutting source-of-truth.
Tests (10 new in test_runtime_wedge.py):
- Default state (unwedged), mark sets flag, first-write-wins,
clear restores healthy, clear-when-not-wedged is no-op,
re-marking after clear is allowed
- Re-export shim: each old name in claude_sdk_executor IS the
runtime_wedge function (identity check), state is shared
(marking via the executor shim is observable via runtime_wedge
and vice versa)
Verification:
- 1251/1251 workspace pytest pass (was 1241 after orphan deletion;
+10 = exactly the new test_runtime_wedge.py cases)
- All existing test_claude_sdk_executor.py cases (which call
_mark_sdk_wedged via the shim) still pass
After this lands and the claude-code template image rebuilds with its
local claude_sdk_executor.py copy (template PR #13), deleting
workspace/claude_sdk_executor.py from molecule-core becomes safe. The
shim is deleted together with the file, since runtime_wedge
is the new public API.
See project memory `project_runtime_native_pluggable.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
## What was broken
Same bug class as the secret-scan.yml fix in #2120 — block-internal-paths
hit `fatal: bad object <sha>` exit 128 on the staging push at
2026-04-27 06:50:33Z.
Two cases:
1. **`merge_group` events**: BASE/HEAD came from
`github.event.before` / `.after`, which are push-event-only
properties. On merge_group both came back empty, so the script fell
through to "scan entire tree" mode, which is correct but
inefficient. Worse, when this workflow is required for the merge
queue (lines 21-22), an empty-BASE entire-tree scan would run on
every queue check.
2. **`push` events with shallow clones**: `fetch-depth: 2` doesn't
always cover BASE across true merge commits. When BASE is in the
payload but absent from the local object DB, `git diff` errors out
with `fatal: bad object <sha>` and the job exits 128. This is what
broke today's staging push.
## Fix
Same shape as the secret-scan.yml fix (#2120):
- Add a dedicated `git fetch` step for `merge_group.base_sha`.
- Move event-specific SHAs into a step `env:` block; script uses a
`case` over `${{ github.event_name }}` covering pull_request /
merge_group / push (rather than `if pull_request / else push`
which left merge_group on the empty-BASE branch).
- On-demand fetch + `git cat-file -e` guard for push BASE so a SHA
that's payload-present-but-DB-absent triggers the fetch, and a
fetch failure falls through cleanly to "scan entire tree" instead
of exiting 128.
## Test plan
- [x] YAML structure preserved (no schema changes)
- [x] Bash logic mirrors the secret-scan recovery path tested in #2120
- [ ] CI green on this PR's pull_request scan + push to staging post-merge
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes:
- workspace/hermes_executor.py (545 LOC) — HermesA2AExecutor, an
OpenAI-compat direct-call executor that was the original hermes
integration before the template was rewritten to bridge to
hermes-agent's sidecar API server.
- workspace/tests/test_hermes_executor.py (1307 LOC) — its test file.
Verified-dead-code analysis:
- Zero `from hermes_executor` / `import hermes_executor` imports
anywhere in workspace/, workspace-server/, or
workspace-configs-templates/ (excluding the file itself + its test).
- The hermes template (workspace-configs-templates/hermes/executor.py)
uses HermesAgentProxyExecutor, NOT HermesA2AExecutor — they're
independent implementations. The executor.py file imports from
`executor` (local), not from molecule_runtime.
- Last touched in PR #1974 (2026 a2a-sdk migration to 1.0.0) for SDK
compatibility — kept compiling but never wired into any code path.
- Before that, the only change was the 2026 open-source restructure rename.
Why now: starting task #87 (universal-runtime violation, move adapter-
specific code out of workspace/). Dead-code deletion is the safest
first step and motivates the broader refactor by clearing the
landscape — no risk of someone defending HermesA2AExecutor as
"actually used somewhere."
Verification:
- 1241/1241 workspace pytest pass (was 1312; the 71 dropped tests
are exactly test_hermes_executor.py's coverage)
- No new failures, no broken imports anywhere
The remaining adapter-specific executors in workspace/ that #87 will
eventually relocate (per the user's scope: claude-code + hermes priority,
others later):
- workspace/claude_sdk_executor.py (757 LOC) → claude-code template repo
- workspace/cli_executor.py (461 LOC) → defer (codex/ollama/etc still
use the runtime presets here; comes back later when those bump versions)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Continues the #1815 coverage rollup. classNames.ts was at 17%
in the baseline; this PR brings it to full coverage.
16 cases across 3 helpers:
**appendClass (6):**
- undefined / empty existing → just `cls`
- single-class → "a b" join
- DEDUP: existing already contains `cls` → existing unchanged.
This is the load-bearing reason classNames.ts exists. Pre-helper
the call sites inlined `${existing} ${cls}` with no dedup, so a
tick that fired the same class twice produced "a a" and React
Flow's className-equality diff saw it as a change every render.
- whitespace normalization (multi-space, leading/trailing)
**removeClass (7):**
- undefined / empty existing → ""
- removes named class
- exact match only ("spawn" must NOT match "spawn-fast")
- removing the only class → ""
- no-op when class absent
- whitespace normalization
**scheduleNodeClassRemoval (3):**
- after delayMs: calls set() with className-removed on target node;
OTHER nodes untouched (the per-id pruning is the contract — pin
it so a future refactor that maps over all nodes doesn't silently
strip classes from siblings)
- does NOT fire before the delay elapses (vi.useFakeTimers + advance)
- SSR safety: when window is undefined, function is a no-op
(neither get nor set fires)
## Note on test environment
Added `// @vitest-environment jsdom` directive. Under the file's
default `node` environment `window` is undefined, so every test
would short-circuit through the SSR guard and the SSR test would
pass for the wrong reason. With jsdom, the SSR test
explicitly stubs `window` to undefined to exercise the guard.
## Test plan
- [x] All 16 cases pass locally (~1.1s with jsdom env spin-up)
- [x] No SUT changes
- [ ] CI green
## #1815 progress
- [x] Step 1+2: instrumentation (#2147)
- [x] utils.ts + runtime-names.ts (#2148)
- [x] canvas-actions.ts (#2149)
- [x] store/classNames.ts (this PR)
- [ ] store/canvas.ts (73% — biggest absolute gap; bigger surface,
separate cycle)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-contained happy-path E2E for the two runtimes the project commits
to first-class support for (task #116, completes the loop on the
"both must work end-to-end with tests" requirement).
What it proves per runtime:
1. POST /workspaces succeeds with the runtime + secrets
2. Workspace reaches status=online within its cold-boot window
(claude-code: 240s, hermes: 900s on cold apt + uv + sidecar)
3. POST /a2a (message/send "Reply with PONG") returns a non-error,
non-empty reply
4. activity_logs row written with method=message/send and ok|error
status (a2a_proxy.LogActivity contract)
Skip semantics: each phase independently checks for its required env
key (CLAUDE_CODE_OAUTH_TOKEN / E2E_OPENAI_API_KEY) and skips cleanly
if absent. The script exits 0 whenever every phase either passed or
skipped, so wiring it into a no-keys CI job validates that the script
itself stays clean without false-failing.
Idempotent: pre-sweeps any prior "Priority E2E (claude-code)" /
"Priority E2E (hermes)" workspaces so a run interrupted by SIGPIPE /
kill -9 (which bypasses the EXIT trap) doesn't poison the next run.
Same defensive pattern as test_notify_attachments_e2e.sh.
CI wiring:
- e2e-api.yml — runs on every PR with no LLM keys, both phases skip,
catches script-level regressions (set -u bugs, syntax issues, etc.)
- canary-staging.yml + e2e-staging-saas.yml already have the keys
via secrets.MOLECULE_STAGING_OPENAI_KEY and exercise wire-real
behavior — could be wired to opt-in if you want claude-code coverage
there too.
Local runs (from this branch, no keys):
=== Results: 0 passed, 0 failed, 2 skipped ===
Validates the capability primitives shipped in PRs #2137-2144: once
template PRs #12 (claude-code) + #25 (hermes) merge with their
declared provides_native_session=True + idle_timeout_override=900,
a manual run with both keys validates the full native+pluggable chain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Continues the #1815 coverage rollup. canvas-actions.ts was at 25%
in the baseline run from #2147; this PR brings the file's two
helpers to full coverage.
5 cases:
**markAllWorkspacesNeedRestart (3):**
- calls updateNodeData on every node with `{needsRestart: true}`
- no-op when the canvas has zero workspaces
- preserves call ordering — matters because the toolbar's
Restart Pending pill observes per-node data changes
incrementally; a refactor that shuffled iteration order would
silently change which workspaces flash first
**markWorkspaceNeedsRestart (2):**
- targeted call: updateNodeData fires exactly once on the named id
- defensive: regardless of how many other workspaces exist in the
store, only the target workspace gets updated. Pre-this-test, a
refactor that accidentally wired this function through the
per-node iteration path of markAll would silently mark every
workspace — pinning the cardinality here catches that.
## Mock strategy
Standard pattern for canvas store: mock useCanvasStore as both the
selector function AND a getState()-bearing object. updateNodeData
is a vi.fn() spy so the test asserts on calls + args directly.
## Test plan
- [x] All 5 cases pass locally (~132ms)
- [x] No SUT changes — pure additive coverage
- [ ] CI green
## #1815 progress
- [x] Step 1+2: instrumentation + script (#2147)
- [x] utils.ts + runtime-names.ts (#2148)
- [x] canvas-actions.ts (this PR)
- [ ] Remaining low-coverage targets: store/classNames.ts (17%),
store/canvas.ts (73% — largest absolute gap by lines)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes two of the 0%-coverage files surfaced by the baseline run in
PR #2147 (vitest coverage instrumentation). Both files are tiny
utility helpers with high-touch read paths.
## utils.cn (8 cases)
Wraps `twMerge(clsx(inputs))` — every conditionally-styled component
flows through here. The load-bearing case is the **last-wins
Tailwind dedup**: `cn("p-2", "p-4")` → "p-4". A regression that lost
twMerge would silently double-apply utilities (cosmetically broken,
breaks `:where()` rules + theme overrides).
Cases:
- single class unchanged
- multiple positional classes joined
- array input flattening (clsx)
- object syntax with truthy/falsy keys
- last-wins dedup on conflicting Tailwind utilities (the
regression-locked guarantee)
- non-conflicting utilities both survive (p-2 + m-4)
- mixed input shapes (string + array + object + string)
- nullish / empty inputs don't throw
## runtime-names.runtimeDisplayName (4 it.each cases + 3 it())
Friendly-name lookup that surfaces the workspace runtime in the chat
indicator, details tab, and a few component labels.
Cases:
- known runtimes map to display strings
(claude-code → Claude Code, langgraph → LangGraph, etc.)
- unknown runtime falls back to input string verbatim
(a NEW runtime not yet in the lookup still renders something
operator-debuggable rather than a generic placeholder)
- empty string falls back to "agent" (final default)
- case-sensitivity pinned: "Claude-Code" / "LANGGRAPH" miss the
lookup. The upstream slug is already normalized lowercase, so a
future refactor that lowercases input "for safety" would
silently change behavior — pinning the contract here.
## Test plan
- [x] All 17 cases pass locally (~129ms)
- [x] No SUT changes — pure additive coverage
- [ ] CI green
## #1815 progress
- [x] Step 1+2: coverage instrumentation + script (#2147)
- [x] 0%-file gaps utils.ts + runtime-names.ts (this PR)
- [ ] More 0%/low-coverage files: lib/canvas-actions.ts (25%),
store/classNames.ts (17%) — separate PRs
- [ ] Step 3b: thresholds + CI gate once baseline catches up
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes step 1+2 of #1815. Step 3 (CI gate + threshold) is split into
a follow-up because today's baseline is ~46% lines / ~45% statements,
not the 70% the issue's draft thresholds assumed.
## What this lands
- `canvas/vitest.config.ts` — `coverage` block with v8 provider,
reporters: text (terminal) / html (./coverage/index.html) /
json-summary (machine-readable for tooling). NO threshold —
pure observability.
- `canvas/package.json` — adds `test:coverage` script
(`vitest run --coverage`); existing `test` script is unchanged so
the default workflow is identical.
- `canvas/package-lock.json` — adds @vitest/coverage-v8@^4.1.5 (the
v8 provider Vitest uses for native coverage).
## Why no threshold yet
Issue draft threshold was 70%/70%/65%/70% (lines/funcs/branches/stmts).
Local baseline today:
```
Statements : 45.19% (3248/7186)
Branches : 39.87% (2034/5101)
Functions : 40.99% (724/1766)
Lines : 46.36% (2905/6265)
```
Turning on a 70% gate today would either fail CI immediately or get
papered over with an ad-hoc exclude list. Better path: land
observability now, run coverage in PR review for any new code
(via the new script), gate later when the baseline catches up.
## Heatmap (from local run, top gaps)
- `src/lib/runtime-names.ts` — 0% (untouched by tests)
- `src/lib/utils.ts` — 0%
- `src/lib/canvas-actions.ts` — 25%
- `src/store/classNames.ts` — 17%
- `src/store/canvas.ts` — 73% (already-tested but the largest absolute
gap by lines)
Each is a concrete follow-up issue / PR target.
## Test plan
- [x] `npx vitest run --coverage` runs cleanly locally (~10s) and
produces `./coverage/index.html` + a `coverage-summary.json`
- [x] Existing `npm run test` workflow unchanged — instrumentation
only activates with `--coverage` flag
- [x] No production-code changes — pure tooling addition
## Follow-ups (each tracked separately; this PR keeps minimal scope)
- Step 3a — write tests for the 0% files above (~tiny each)
- Step 3b — once baseline ≥ thresholds, add `thresholds` block to
vitest.config.ts + a `npm run test:coverage` step in
`.github/workflows/ci.yml`'s Canvas job
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the fourth and final item from #2071 — but at a slightly
different layer than the issue listed: tests `dragUtils.ts` (the
74-LOC pure-ish geometry helpers) instead of the full 296-LOC
`useDragHandlers` hook. Rationale below.
15 cases across 2 buckets:
**shouldDetach (8):**
- child fully inside parent → false
- child drifted slightly past edge but under DETACH_FRACTION → false
- child past 20% threshold on X → true (un-nest)
- child past 20% threshold on Y → true (un-nest)
- missing child node → true (conservative fallback per source comment)
- missing parent node → true (same)
- measured size absent → falls back to React Flow's 220x120 defaults
(mirrors initial-mount race where measurement hasn't run yet)
- DETACH_FRACTION constant pinned at 0.2 (Miro/tldraw convention)
**clampChildIntoParent (7):**
- child already inside bounds → no-op (no setState — proven by
reference equality on mockState.nodes)
- drifted past top-left → clamps to (0, 0)
- drifted past bottom-right → clamps to (parentW - childW, parentH - childH)
- per-axis independence: X past edge + Y inside → only X clamps
- child not in store → early return, no setState
- child internalNode missing → early return, no setState
- multi-node store: clamping one node MUST NOT touch siblings
## Why dragUtils, not the full useDragHandlers hook
The hook (296 LOC) orchestrates React Flow drag events + Zustand
mutations. Testing it would need heavyweight `useReactFlow` +
internal-node + `setDragOverNode` / `nestNode` / `batchNest` /
`isDescendant` mocks just to drive event handlers — and the
*decisions* the hook makes all delegate to these two helpers:
- `shouldDetach` decides "is this a real un-nest?"
- `clampChildIntoParent` snaps the child back when the user drifted
slightly past the edge without holding Alt/Cmd
Pinning these locks the hot path the user feels. The hook's
remaining surface (modifier-key snapshotting, drop-target
broadcasting, commit-on-release grow pass) is plumbing — worth
testing as a follow-up if it ever regresses, but lower
correctness leverage per LOC of test setup.
## #2071 status after this PR
- [x] useTemplateDeploy (#2121)
- [x] A2AEdge (#2143)
- [x] OrgCancelButton (#2145)
- [x] dragUtils geometry helpers (this PR)
- [ ] Full useDragHandlers hook orchestration — explicit deferral
with rationale above
## Test plan
- [x] All 15 cases pass locally (`vitest run dragUtils.test.ts` — 131ms)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the third item from #2071 (Canvas test gaps follow-up). Builds
on the A2AEdge tests in PR #2143.
10 cases across 4 buckets:
**Render (2):**
- Default pill with `Cancel (N)` text + correct ARIA label
- Confirm dialog NOT visible until pill click
**Pill click (3):**
- Click flips to confirming view + stops propagation (so React Flow
doesn't interpret the click as a node selection)
- Confirm copy pluralizes correctly: count=1 → "Delete 1 workspace?",
count>1 → "Delete N workspaces?". Negative assertion guards against
the wrong-form regressing in either direction.
**No / cancel-confirm (1):**
- Click No → returns to pill, no API call, no store mutation
**Yes / cascade-delete (4):**
- Happy path: beginDelete locks the WHOLE subtree (root + children,
NOT unrelated workspace) → api.del("/workspaces/<id>?confirm=true")
→ optimistic store filter strips subtree, keeps unrelated → success
toast → endDelete in finally
- WS-event race: WS_REMOVED handler clears the root mid-flight. The
bail-out branch (`!postDeleteState.nodes.some(n => n.id === rootId)`)
must NOT then run a second optimistic filter. Pre-fix the post-await
subtree walk would miss any orphaned descendants whose parentId got
reparented upward by handleCanvasEvent — pinned now.
- Error path: api.del rejects → endDelete UNDOes the lock + error
toast surfaces the message → subtree STAYS in the store so the user
can retry / interact with the still-deploying nodes
- Non-Error rejection (e.g. string thrown directly): toast surfaces
the canned "Cancel failed" fallback instead of attempting `.message`
## Mocking
- `@/lib/api`, `@/components/Toaster`: simple spy mocks
- `@/store/canvas`: object that satisfies BOTH the selector pattern
(`useCanvasStore(s => s.x)`) AND `getState()` / `setState()` since
the cascade-delete handler walks the subtree via `getState()` and
mutates via `setState()` for the optimistic removal. `vi.hoisted`
preserves referential identity so the mock fns wired into the
state object are observed by every consumer.
## Test plan
- [x] All 10 cases pass locally (`vitest run OrgCancelButton.test.tsx` — ~990ms)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
## #2071 progress after this PR
- [x] useTemplateDeploy (PR #2121)
- [x] A2AEdge (PR #2143)
- [x] OrgCancelButton (this PR)
- [ ] useDragHandlers — separate PR
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a target workspace's adapter has declared
provides_native_session=True (claude-code SDK's streaming session,
hermes-agent's in-container event log), the SDK owns its own queue/
session state. Adding the platform's a2a_queue layer on top would
double-buffer the same in-flight state — and worse, the platform
queue's drain timing has no relationship to the SDK's actual readiness,
so the queued request might dispatch while the SDK is STILL busy.
Behavior change: in handleA2ADispatchError, when isUpstreamBusyError(err)
fires and the target declared native_session, return 503 + Retry-After
directly without enqueueing. The caller's adapter handles retry on
its own schedule, and the SDK's own queue absorbs the request when
ready. Response body carries native_session=true so callers can
distinguish this from queue-failure 503s.
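The branch described above, sketched in Python purely for illustration (the real handler is Go). The `Response` type, the function name, the `enqueue` callback, the 5-second Retry-After value, the 202 shape on a successful enqueue, and the Retry-After header on the legacy 503 are all assumptions, not the actual handler's contract.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response."""

    status: int
    headers: dict = field(default_factory=dict)
    body: dict = field(default_factory=dict)


def handle_upstream_busy(
    native_session: bool,
    enqueue: Callable[[], bool],
    retry_after_seconds: int = 5,  # assumed value; the real one is not stated
) -> Response:
    """Decide what to return once the upstream reported 'busy'."""
    busy_headers = {"Retry-After": str(retry_after_seconds)}

    if native_session:
        # The SDK owns its own queue, so skip the platform queue entirely
        # and mark the body so callers can tell this 503 apart.
        return Response(
            503, busy_headers, {"error": "upstream busy", "native_session": True}
        )

    if enqueue():
        # Assumed success shape for the platform-queue fallback.
        return Response(202, {}, {"queued": True})

    # Enqueue failed: legacy 503 without the native_session marker.
    return Response(503, busy_headers, {"error": "upstream busy"})
```

The key property is that `enqueue` is never called on the native-session path, which is what the sqlmock-based test pins by expecting no INSERT into a2a_queue.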
Observability is preserved: logA2AFailure still runs above; the
broadcaster still fires; the activity_logs row records the busy event
just like the platform-fallback path.
This is the consumer that validates the template-side declarations
already shipped in:
- molecule-ai-workspace-template-claude-code PR #12
- molecule-ai-workspace-template-hermes PR #25
Once those merge + image tags bump, claude-code + hermes workspaces'
busy 503s skip the platform queue end-to-end. End-to-end validation
of capability primitive #5.
Tests (2 new):
- NativeSession_SkipsEnqueue: cache pre-populated, deliberate
sqlmock with NO INSERT INTO a2a_queue expected — implicit
regression cover (sqlmock fails on unexpected queries). Asserts
503 + Retry-After + native_session=true marker in body.
- NoNativeSession_StillEnqueues: negative pin — empty cache, same
busy error → falls through to EnqueueA2A (which fails in this
test, falls through to legacy 503 without native_session marker).
Verification:
- All Go handlers tests pass (2 new + existing)
- go build + go vet clean
See project memory `project_runtime_native_pluggable.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the second item from #2071 (Canvas test gaps follow-up):
adds behavioural coverage for the custom React Flow edge that renders
delegation counts between workspaces and routes a click into the
source workspace's Activity feed.
10 cases across 2 buckets:
**Render (6):**
- Empty label → BaseEdge only, NO portaled HTML pill (the most
common state for cold edges; pill must not render-through-empty)
- Non-empty label → pill renders with the exact label text
- isHot=true → violet accent classes; blue accent NOT present
- isHot=false → blue accent classes
- ARIA pluralization: count=1 → "1 delegation from …" (singular)
- ARIA pluralization: count=7 → "7 delegations from …" (plural)
**Click behaviour (4):**
- Click → selectNode(source)
- FRESH selection (selectedNodeId != source) → also setPanelTab("activity")
- RE-click of already-selected source → setPanelTab MUST NOT fire
(this is the regression-locked guarantee — preserves the user's
current tab when they intentionally moved to Chat / Memory while
inspecting the same peer)
- stopPropagation: parent onClick must NOT see the event (otherwise
the canvas Pane's clear-selection handler would fire and undo the
edge's own selectNode call)
## Mocking strategy
- `@xyflow/react`: BaseEdge → <g data-testid>, EdgeLabelRenderer →
inline pass-through (no portal), getBezierPath → fixed [path, x, y].
Lets the test render the component without a ReactFlow provider.
- `@/store/canvas`: vi.hoisted-shared mock state with selectNode +
setPanelTab spies and a mutable selectedNodeId. The store's
getState() returns the same object so the click handler's
`useCanvasStore.getState().selectedNodeId` lookup works.
Pattern matches the existing `A2ATopologyOverlay.test.tsx` setup
in the same module.
## Test plan
- [x] All 10 cases pass locally (`vitest run A2AEdge.test.tsx` — ~1.3s)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
## Remaining #2071 items
- OrgCancelButton tests
- useDragHandlers tests
Each is a separate PR.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small wins from the hermes-agent design survey, bundled because
each is too small for its own PR but they all improve the priority
adapters (claude-code + hermes) immediately.
1. Hermes-style cap on telemetry fields, applied INSIDE report_activity
so every caller benefits without having to remember to truncate.
error_detail capped at
4096 (hermes' value); summary capped at 256 (one-liner ceiling). The
existing call site in tool_delegate_task already truncated error_detail
at 4096, but moving the cap into the helper closes the door on a
future caller pasting a giant traceback. response_text is NOT capped
(it's the agent's user-visible reply; truncating would silently drop
content). Pinned by 4 new tests including a negative-pin that
response_text MUST stay untruncated.
2. Sharper MCP tool descriptions for commit_memory + recall_memory —
hermes' delegate_task description literally says "WAIT for the response"
and delegate_task_async says "Returns immediately." LLMs pick the
right tool variant from descriptions; ambiguity costs accuracy.
- commit_memory now states it APPENDS (each call creates a row, no
overwrite) and that GLOBAL requires tier 0.
- recall_memory now states it's case-insensitive substring search
with no pagination, returns all matches, and that empty-query is
cheap and safer than a narrow keyword.
3. (no code change) Filed task #120 for the bigger user-flow win — a
per-workspace tool enable/disable menu in Canvas Config — and task
#121 for model-string passthrough (depends on #87 universal-runtime
refactor).
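The cap from item 1 can be sketched as below. This is a minimal illustration, assuming character-based truncation; `build_activity_payload` and `_cap` are hypothetical names, and the real report_activity signature is not shown in this message.

```python
from typing import Optional

ERROR_DETAIL_MAX = 4096  # cap stated in item 1 (hermes' value)
SUMMARY_MAX = 256        # cap stated in item 1 (one-liner ceiling)


def _cap(value: Optional[str], limit: int) -> Optional[str]:
    """Truncate to `limit` characters; None passes through untouched."""
    if value is None:
        return None
    return value[:limit]


def build_activity_payload(
    summary: Optional[str] = None,
    error_detail: Optional[str] = None,
    response_text: Optional[str] = None,
) -> dict:
    """Hypothetical helper: the fields a report_activity call would send."""
    return {
        "summary": _cap(summary, SUMMARY_MAX),
        "error_detail": _cap(error_detail, ERROR_DETAIL_MAX),
        # Deliberately NOT capped: this is the agent's user-visible reply.
        "response_text": response_text,
    }
```

Placing the cap in the helper rather than at each call site means a new caller gets the limit by default.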
Verification:
- 1312/1312 Python pytest pass (was 1308, +4 new)
See task #119 for the architectural follow-ups (event-log layer,
declarative skill compat, observability config block) and project
memory `project_runtime_native_pluggable.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an adapter declares provides_native_status_mgmt=True (because its
SDK reports its own ready/degraded/failed state explicitly), the
platform's error-rate-based status inference fights the adapter's own
state machine. This PR gates the inference branches on the capability
flag — adapter-driven transitions become authoritative.
Components:
- registry.go evaluateStatus: gate the two inferred-status branches
(online → degraded when error_rate ≥ 0.5; degraded → online when
error_rate < 0.1 and runtime_state is empty) behind a check of
runtimeOverrides.HasCapability("status_mgmt").
- The wedged-branch (RuntimeState == "wedged" → degraded) is NOT
gated. That path is the adapter's OWN self-report, not platform
inference, and stays active under native_status_mgmt — adapters
can still drive transitions via runtime_state.
Python side: no change. The capability map is already serialized via
RuntimeCapabilities.to_dict() in PR #2137 and sent in the heartbeat's
runtime_metadata block via PR #2139. An adapter setting
RuntimeCapabilities(provides_native_status_mgmt=True) automatically
flows through.
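The gating can be modelled as a pure function. The sketch below is a Python model of the Go logic, not the registry.go code itself; the function name and the choice to return None for "no transition" are assumptions, as is skipping the wedged transition when the workspace is already degraded.

```python
from typing import Optional

DEGRADE_THRESHOLD = 0.5  # online -> degraded when error_rate >= 0.5
RECOVER_THRESHOLD = 0.1  # degraded -> online when error_rate < 0.1


def evaluate_status(
    current: str,
    error_rate: float,
    runtime_state: str,
    native_status_mgmt: bool,
) -> Optional[str]:
    """Return the new status, or None when no transition should fire."""
    # Adapter self-report: never gated, even under native status management.
    if runtime_state == "wedged":
        return "degraded" if current != "degraded" else None
    # Platform inference: skipped entirely when the adapter owns status.
    if native_status_mgmt:
        return None
    if current == "online" and error_rate >= DEGRADE_THRESHOLD:
        return "degraded"
    if current == "degraded" and error_rate < RECOVER_THRESHOLD and runtime_state == "":
        return "online"
    return None
```

The three new tests correspond to the first three branches: both inferred transitions return None under the native flag, while the wedged self-report still produces "degraded".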
Tests (3 new):
- SkipsDegradeInference: error_rate=0.8 + currentStatus=online + native
flag set → degrade UPDATE does NOT fire (sqlmock fails on unexpected
query, which is the regression cover)
- SkipsRecovery: error_rate=0.05 + currentStatus=degraded + native →
recovery UPDATE does NOT fire
- WedgedStillRespected: runtime_state="wedged" + native → wedged
branch DOES fire (adapter self-report stays active)
Verification:
- All Go handlers tests pass (3 new + existing)
- 1308/1308 Python pytest pass (unchanged — Python side unmodified)
- go build + go vet clean
Stacked on #2140 (already merged via cascade); branch is current with
staging since #2139 and #2140 merged.
See project memory `project_runtime_native_pluggable.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer bot flagged: import was leftover from earlier scaffolding —
all test fixtures use sys.modules monkey-patching with SimpleNamespace
instead. Drop to unblock merge. Tests still 5/5 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer bot flagged: ChatTab.tsx imported extractResponseText but
no longer used it after the loop body moved to historyHydration.ts
(the helper imports it directly). Drop from the named import to
unblock merge. extractFilesFromTask remains used at line 515 for the
WS A2A_RESPONSE handler's reply-files extraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an adapter declares provides_native_scheduler=True (because its
SDK has built-in cron / Temporal-style workflows), the platform's
polling loop must skip firing schedules for that workspace — otherwise
the schedule fires twice (once natively, once via platform). The
native skip preserves observability (next_run_at still advances, the
schedule row stays in the DB, last_run_at would still update) while
moving the FIRE responsibility to the SDK.
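A minimal Python model of the skip follows; it is a sketch of the behavior described above, not the Go scheduler. `Schedule.interval` is a hypothetical stand-in for the real cron expression, and `native_check` plays the role of the injected check function.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Schedule:
    workspace_id: str
    next_run_at: datetime
    interval: timedelta  # hypothetical; stands in for the cron expression


def tick(
    schedules: List[Schedule],
    now: datetime,
    fire: Callable[[Schedule], None],
    native_check: Optional[Callable[[str], bool]] = None,
) -> None:
    for schedule in schedules:
        if schedule.next_run_at > now:
            continue  # not due yet
        # A missing check means "always fire", matching the pre-existing behavior.
        owned_natively = native_check is not None and native_check(schedule.workspace_id)
        if not owned_natively:
            fire(schedule)
        # Advance in both cases so a skipped row is not picked up again next tick.
        schedule.next_run_at = now + schedule.interval
```

Advancing `next_run_at` even when the fire is skipped is what keeps the loop from re-selecting the same due row on every tick.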
Stacked on PR #2139 (idle_timeout_override end-to-end). The
RuntimeMetadata heartbeat block already carries the capability map;
this PR teaches the platform how to read and act on the scheduler bit.
Components:
- handlers/runtime_overrides.go: extended the cache to store
capability flags alongside idle timeout. The two heartbeat fields
are independent: SetIdleTimeout and SetCapabilities each update one
without stomping the other. Defensive copy on SetCapabilities so
a caller mutating its map after the call doesn't retroactively
change cached declarations. Empty entries dropped to avoid stale
husks.
- handlers/runtime_overrides.go: new HasCapability(workspaceID, name)
+ ProvidesNativeScheduler(workspaceID) — the latter is the
package-level adapter the scheduler imports (avoids a
handlers/scheduler import cycle).
- handlers/registry.go: heartbeat handler now calls SetCapabilities
in addition to SetIdleTimeout.
- scheduler/scheduler.go: NativeSchedulerCheck function-pointer DI
(mirrors the existing QueueDrainFunc pattern). New() leaves the
field nil so existing callers preserve today's "always fire"
behavior. SetNativeSchedulerCheck wires production. tick() drops
workspaces declaring native ownership before goroutine fan-out;
advances next_run_at so we don't tight-loop on the same row.
- cmd/server/main.go: wires handlers.ProvidesNativeScheduler into
the cron scheduler at server boot.
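The cache semantics listed above (two independent fields, defensive copy, no stale entries) can be sketched in Python. This is an illustration under assumptions, not the Go implementation: it uses a lock and two plain dicts rather than a single entry per workspace, and `tracked_workspaces` is a hypothetical inspection helper.

```python
import threading
from typing import Dict, Mapping, Optional, Set


class RuntimeOverrides:
    """In-memory per-workspace cache: idle timeout plus capability flags."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle: Dict[str, float] = {}
        self._caps: Dict[str, Dict[str, bool]] = {}

    def set_idle_timeout(self, workspace_id: str, seconds: Optional[float]) -> None:
        if not workspace_id:
            return
        with self._lock:
            if seconds is not None and seconds > 0:
                self._idle[workspace_id] = seconds
            else:
                self._idle.pop(workspace_id, None)  # clearing leaves no husk

    def set_capabilities(self, workspace_id: str, caps: Optional[Mapping[str, bool]]) -> None:
        if not workspace_id:
            return
        with self._lock:
            if caps:
                # Defensive copy: later mutation by the caller cannot leak in.
                self._caps[workspace_id] = dict(caps)
            else:
                self._caps.pop(workspace_id, None)

    def idle_timeout(self, workspace_id: str) -> Optional[float]:
        with self._lock:
            return self._idle.get(workspace_id)

    def has_capability(self, workspace_id: str, name: str) -> bool:
        with self._lock:
            return self._caps.get(workspace_id, {}).get(name, False)

    def tracked_workspaces(self) -> Set[str]:
        with self._lock:
            return set(self._idle) | set(self._caps)
```

Keeping the two fields in separate maps makes the independence property hold by construction; a single-entry design needs an explicit "drop when both fields are empty" step instead.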
Tests:
Go (7 new):
- SetCapabilitiesAndHas (round-trip)
- per-workspace isolation (ws-a's declaration doesn't leak to ws-b)
- nil/empty map clears (adapter dropping the flag restores fallback)
- SetCapabilities is a defensive copy (caller mutation can't
retroactively flip cached value)
- SetIdleTimeout preserves capabilities and vice-versa (two-field
independence)
- empty entry deleted (no stale husks)
- ProvidesNativeScheduler reads the same singleton heartbeat writes
- SetNativeSchedulerCheck wires the function (scheduler-side)
- nil-check safety contract for tick
Python: no change needed — the heartbeat already serializes the
full capability map via _runtime_metadata_payload (PR #2139). An
adapter setting RuntimeCapabilities(provides_native_scheduler=True)
automatically flows through.
Verification:
- 1308 / 1308 Python pytest pass (unchanged)
- All Go handlers + scheduler tests pass
- go build + go vet clean
See project memory `project_runtime_native_pluggable.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capability primitive #2 (task #117). The first cross-cutting capability
where the adapter actually displaces platform behavior — claude-code's
streaming session can legitimately go silent for 8+ minutes during
synthesis + slow tool calls; the platform's hardcoded 5min idle timer
in a2a_proxy.go cancels it mid-flight (the bug PR #2128 patched at
the env-var layer). This PR fixes it at the right layer: the adapter
declares "I need 600s" and the platform's dispatch path honors it.
Wire shape (Python → Go):
    POST /registry/heartbeat
    {
      "workspace_id": "...",
      ...
      "runtime_metadata": {
        "capabilities": {"heartbeat": false, "scheduler": false, ...},
        "idle_timeout_seconds": 600   // optional, omitted = use default
      }
    }
Default behavior preserved: any adapter that doesn't override
BaseAdapter.idle_timeout_override() (returns None by default) sends
no idle_timeout_seconds field; the Go side falls through to
idleTimeoutDuration (env A2A_IDLE_TIMEOUT_SECONDS, default 5min).
Existing langgraph / crewai / deepagents workspaces are unaffected.
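The producer side of this contract can be sketched as follows. It is a simplified model: the real _runtime_metadata_payload lazy-imports the active adapter, whereas this version takes the adapter as a parameter, and returning {} on any exception is an assumption consistent with the missing-ADAPTER_MODULE case. The three adapter classes and `_Caps` are hypothetical fakes.

```python
from typing import Any, Dict, Optional


def runtime_metadata_payload(adapter: Any) -> Dict[str, Any]:
    """Build the runtime_metadata block; never raises."""
    try:
        payload: Dict[str, Any] = {"capabilities": adapter.capabilities().to_dict()}
        override: Optional[int] = adapter.idle_timeout_override()
        # None means "use the platform default"; zero/negative is treated the same,
        # so the field is simply left off the wire.
        if override is not None and override > 0:
            payload["idle_timeout_seconds"] = override
        return payload
    except Exception:
        # Heartbeat must keep flowing even if capability discovery is broken.
        return {}


class _Caps:
    """Hypothetical stand-in for RuntimeCapabilities."""

    def to_dict(self) -> Dict[str, bool]:
        return {"heartbeat": False, "scheduler": False}


class DefaultAdapter:
    def capabilities(self) -> _Caps:
        return _Caps()

    def idle_timeout_override(self) -> Optional[int]:
        return None  # platform-default sentinel


class SlowStreamingAdapter(DefaultAdapter):
    def idle_timeout_override(self) -> Optional[int]:
        return 600


class BrokenAdapter(DefaultAdapter):
    def capabilities(self) -> _Caps:
        raise RuntimeError("capability discovery failed")
```

Omitting the field, rather than sending null or 0, lets the receiving side distinguish "no override" by key absence alone.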
Components:
Python:
- adapter_base.py: idle_timeout_override() method on BaseAdapter
returning None (the platform-default sentinel).
- heartbeat.py: _runtime_metadata_payload() lazy-imports the active
adapter and assembles the capability + override block. Try/except
swallows ANY error so heartbeat never breaks because of capability
discovery — observability outranks capability accuracy.
Go:
- models.HeartbeatPayload.RuntimeMetadata (pointer so absent =
"old runtime, didn't say"; explicit zero-cap = "new runtime,
declared no native ownership").
- handlers.runtimeOverrides: in-memory sync.Map cache keyed by
workspaceID. Populated by the heartbeat handler, consulted on
every dispatchA2A. Reset on platform restart (worst-case 30s of
platform-default behavior — acceptable; nothing about overrides
is correctness-critical).
- a2a_proxy.dispatchA2A: looks up the override before
applyIdleTimeout; falls through to global default when absent.
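The lookup-then-fallthrough order on the dispatch path can be modelled like this. It is a Python sketch of the Go behavior; the function names are hypothetical, and treating an unparsable or non-positive A2A_IDLE_TIMEOUT_SECONDS as "use 5min" is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_IDLE_TIMEOUT_SECONDS = 300  # the 5min platform default


def global_idle_timeout(env: Mapping[str, str] = os.environ) -> int:
    """Platform-wide default, optionally overridden by the env var."""
    raw = env.get("A2A_IDLE_TIMEOUT_SECONDS", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_IDLE_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_IDLE_TIMEOUT_SECONDS


def resolve_idle_timeout(
    workspace_id: str,
    overrides: Mapping[str, int],
    env: Mapping[str, str] = os.environ,
) -> int:
    """Per-workspace override wins; otherwise fall through to the global default."""
    override: Optional[int] = overrides.get(workspace_id)
    if override is not None and override > 0:
        return override
    return global_idle_timeout(env)
```

For example, `resolve_idle_timeout("ws-a", {"ws-a": 600}, {})` yields 600, while any workspace absent from the cache gets the global value.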
Tests:
Python (17, all new):
- RuntimeCapabilities dataclass shape (frozen, defaults, wire keys)
- BaseAdapter.capabilities() default + override + sibling isolation
- idle_timeout_override default, positive override, dropped-override
- Heartbeat metadata producer: default adapter emits all-False,
native adapter emits flag + override, missing ADAPTER_MODULE
returns {} (graceful), zero/negative override is omitted from
wire, exception inside adapter swallowed
Go (6, all new):
- SetIdleTimeout + IdleTimeout round-trip
- Zero/negative duration clears the override
- Empty workspace_id ignored
- Replacement (heartbeat overwrites prior value)
- Reset clears entire cache
- Concurrent reads + writes (sync.Map invariant)
Verification:
- 1308 / 1308 workspace pytest pass (was 1300, +8)
- All Go handlers tests pass (6 new + existing)
- go vet clean
See project memory `project_runtime_native_pluggable.md` for the
architecture principle this implements.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
## What this fixes
Closes one of the three skipped tests in workspace_provision_test.go
that #1814's interface refactor unblocked but that never received a
test body: `TestProvisionWorkspace_NoInternalErrorsInBroadcast`.
The interface blocker (`captureBroadcaster` couldn't substitute for
`*events.Broadcaster`) was already fixed when `events.EventEmitter`
was extracted; this PR ships the test body that the prior refactor
made possible. The test was effectively unverified regression cover
for issue #1206 (internal error leak in WORKSPACE_PROVISION_FAILED
broadcasts) until now.
## What the test pins
Drives the **earliest** failure path in `provisionWorkspace` — the
global-secrets decrypt failure — so the setup needs only:
- one `global_secrets` mock row (with `encryption_version=99` to
force `crypto.DecryptVersioned` to error with a string that
includes the literal version number)
- one `UPDATE workspaces SET status = 'failed'` expectation
- a `captureBroadcaster` (already in the test file) injected via
`NewWorkspaceHandler`
Asserts the captured `WORKSPACE_PROVISION_FAILED` payload:
1. carries the safe canned `"failed to decrypt global secret"` only
2. does NOT contain `"version=99"`, `"platform upgrade required"`,
or the global_secret row's `key` value (`FAKE_KEY`) — the three
leak markers a regression that interpolates `err.Error()` into
the broadcast would surface
## Why not use containsUnsafeString
The test file already has a `containsUnsafeString` helper with
`"secret"` and `"token"` in its prohibition list. Those substrings
match the legitimate redacted message (`"failed to decrypt global
secret"`) — appropriate in user-facing copy, NOT a leak. Using the
broad helper would either fail the test against the source's own
correct message OR require loosening the helper for everyone else.
Per-test explicit leak markers keep the assertion precise without
weakening shared infrastructure.
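The difference between the two approaches can be shown in a few lines. This is a Python model of the Go assertions, not the test itself: the `"error"` payload key is hypothetical, and `contains_unsafe_string` reproduces only the two prohibited substrings named above.

```python
import json
from typing import List, Sequence

SAFE_MESSAGE = "failed to decrypt global secret"
# Per-test markers: strings that only appear if err.Error() leaks through.
LEAK_MARKERS = ("version=99", "platform upgrade required", "FAKE_KEY")


def find_leaks(payload: dict, markers: Sequence[str] = LEAK_MARKERS) -> List[str]:
    """Return every marker found anywhere in the serialized payload."""
    serialized = json.dumps(payload)
    return [marker for marker in markers if marker in serialized]


def contains_unsafe_string(text: str) -> bool:
    """Model of the broad shared helper (only two of its entries shown)."""
    return any(word in text.lower() for word in ("secret", "token"))
```

With these definitions, `find_leaks({"error": SAFE_MESSAGE})` is empty while `contains_unsafe_string(SAFE_MESSAGE)` is true, which is exactly the false positive the explicit markers avoid.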
## What's still skipped (out of scope for this PR)
- `TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast` — same
shape but blocked on a different refactor: `provisionWorkspaceCP`
routes through `*provisioner.CPProvisioner` (concrete pointer,
no interface), so the test would need either an interface
extraction or a real CPProvisioner with a mocked HTTP server.
Larger scope; deferred.
- `TestResolveAndStage_NoInternalErrorsInHTTPErr` — different
blocker (`mockPluginsSources` vs `*plugins.Registry` type
mismatch). Needs a SourceResolver-side interface refactor.
Both still carry their `t.Skip` notes documenting the remaining
work.
## Test plan
- [x] New test passes
- [x] Full handlers package suite still green (`go test ./internal/handlers/`)
- [x] No changes to production code — pure test addition
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation primitive for the native+pluggable runtime principle (task
#117, blocks #87). Lets each adapter declare which cross-cutting
capabilities it owns natively (heartbeat, scheduler, durable session,
status mgmt, retry, activity decoration, channel dispatch) versus
delegates to the platform's fallback implementation.
Pure additive: every existing adapter inherits BaseAdapter.capabilities()
which returns RuntimeCapabilities() — every flag False — so today's
"platform owns everything" behavior is preserved exactly. Subsequent
PRs land platform-side consumers (idle-timeout override, scheduler
skip, status-transition hook, etc.) one capability at a time.
Why a frozen dataclass instead of class attributes: capabilities are
declared at class-load time and read by the platform on every heartbeat.
A mutable value would let a runtime change capabilities mid-flight,
creating impossible-to-debug state where the platform's idea of who-
owns-heartbeat drifts from the adapter's actual code.
Why a `to_dict()` with explicit short keys: the Go side will read these
from the heartbeat payload by string key. The dict's wire names are
pinned independently of Python field names so a Python-side rename
doesn't silently break the Go consumer (test pins this).
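A reduced sketch of the shape described above, showing three of the seven flags. The field names `provides_native_scheduler` and `provides_native_status_mgmt` and the wire keys `heartbeat`, `scheduler`, and `status_mgmt` appear elsewhere in this series; `provides_native_heartbeat` and `NativeSchedulerAdapter` are assumed names for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RuntimeCapabilities:
    # Every flag defaults to False: the platform owns everything unless
    # an adapter explicitly declares otherwise.
    provides_native_heartbeat: bool = False  # field name assumed
    provides_native_scheduler: bool = False
    provides_native_status_mgmt: bool = False

    def to_dict(self) -> Dict[str, bool]:
        # Wire names are spelled out here, decoupled from the field names,
        # so renaming a Python field cannot change the heartbeat payload.
        return {
            "heartbeat": self.provides_native_heartbeat,
            "scheduler": self.provides_native_scheduler,
            "status_mgmt": self.provides_native_status_mgmt,
        }


class BaseAdapter:
    def capabilities(self) -> RuntimeCapabilities:
        return RuntimeCapabilities()


class NativeSchedulerAdapter(BaseAdapter):  # hypothetical subclass
    def capabilities(self) -> RuntimeCapabilities:
        return RuntimeCapabilities(provides_native_scheduler=True)
```

Because the dataclass is frozen, assigning to a flag on an existing instance raises `dataclasses.FrozenInstanceError`; an adapter changes its declaration only by returning a different instance from `capabilities()`.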
Tests (9 new):
- is a frozen dataclass (mutation rejected)
- all 7 default flags are False (load-bearing — flipping any default
silently moves ownership for langgraph/crewai/deepagents)
- to_dict() keys are stable wire names (Go contract)
- BaseAdapter.capabilities() default returns all-False
- subclass override mechanism works
- sibling adapters' defaults aren't affected by an override
Verification:
- 1300/1300 workspace pytest pass (was 1291, +9)
- Zero behavior change for any existing code path
See project memory `project_runtime_native_pluggable.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>