PR #2558 enqueued a Task at the start of new requests so the v1 SDK
would accept TaskUpdater.start_work() — fix #1 of the v0→v1 migration
gap (PR #2170). But after the Task is enqueued, the executor enters
"task mode" and the SDK rejects raw Message enqueues at the terminal
step:
{"code":-32603,"message":"Received Message object in task mode.
Use TaskStatusUpdateEvent or TaskArtifactUpdateEvent instead."}
Synth-E2E 2026-05-03T11:00:34Z surfaced this on the very first run
after the prior fix cascaded. The validation site is the same
a2a/server/agent_execution/active_task.py — the framework's job is
to enforce the v1 invariant; we're catching up to it.
The fix routes both terminal events through TaskUpdater helpers:
- success: updater.complete(message=msg) wraps it in
  TaskStatusUpdateEvent(state=COMPLETED, final=True)
- error: updater.failed(message=...) wraps it in
  TaskStatusUpdateEvent(state=FAILED, final=True)
Both helpers exist in a2a-sdk ≥ 1.0; verified via
TaskUpdater.complete signature.
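Sketch of the terminal step after the change (run_agent and make_error_message
are illustrative stand-ins; only the two helper calls are the point):

    async def finish(updater, context):
        try:
            result_msg = await run_agent(context)       # stand-in for the real agent call
            await updater.complete(message=result_msg)  # TaskStatusUpdateEvent(COMPLETED, final=True)
        except Exception as exc:
            # Any a2a Message works here; previously this path enqueued the raw
            # Message and tripped the task-mode rejection above.
            await updater.failed(message=make_error_message(exc))  # TaskStatusUpdateEvent(FAILED, final=True)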
Tests:
- conftest TaskUpdater stub now records complete/failed calls AND
routes the message back through event_queue.enqueue_event so the
~20 legacy tests asserting on enqueue_event keep working
- 2 new regression tests pin the contract:
* test_terminal_success_routes_via_updater_complete
* test_terminal_error_routes_via_updater_failed
- Both NEW tests verified to FAIL on staging-baseline (without this
fix) and PASS with it — they'd catch the regression before staging
if the wheel-smoke gate covered task-mode terminal events too
(separate yak-shave for #131 follow-up)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the recurrence path of PR #2556. The data fix realigned 8→4
templates in publish-runtime.yml's TEMPLATES variable, but the
underlying drift hazard was unguarded — the next manifest change
could silently leave the cascade out of sync again.
This gate fails any PR that changes manifest.json or
publish-runtime.yml in a way that makes the cascade list diverge
from manifest workspace_templates (suffix-stripped). Either
direction is caught:
- missing-from-cascade: templates that won't auto-rebuild on a new wheel
  publish (the codex-stuck-on-stale-runtime bug class — PR #2512 added
  codex to manifest, cascade wasn't updated, codex stayed pinned to its
  last-built runtime version for weeks).
- extra-in-cascade: cascade dispatches to deprecated templates (the
  wasted-API-calls + dead-CI-noise class — PR #2536 pruned 5 templates
  from manifest; cascade kept dispatching to all 8 until PR #2556).
Triggers narrowly: only on PRs that touch manifest.json,
publish-runtime.yml, or the script itself. Fast (single grep+sed+comm
pipeline, no Go build).
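The shipped gate is that shell pipeline; for illustration only, the same set
comparison in Python (the paths, TEMPLATES layout, and suffix rule here are
assumptions):

    import json
    import re
    import sys

    manifest = set(json.load(open("manifest.json"))["workspace_templates"])
    workflow = open("publish-runtime.yml").read()
    raw = re.search(r"TEMPLATES:\s*(.+)", workflow).group(1)
    cascade = {t.strip().removesuffix("-template")          # suffix rule assumed
               for t in raw.replace(",", " ").split() if t.strip()}

    missing = sorted(manifest - cascade)   # won't auto-rebuild on the next wheel publish
    extra = sorted(cascade - manifest)     # dispatches to deprecated templates
    for name in missing:
        print(f"MISSING: {name}")
    for name in extra:
        print(f"EXTRA: {name}")
    sys.exit(1 if (missing or extra) else 0)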
Surfaced during the RFC #388 prior-art audit; folded in as the
structural follow-up that the #2556 data fix promised.
Self-tested both failure modes locally before commit:
- Drop codex from cascade → script fails with "MISSING: codex"
- Add langgraph to cascade → script fails with "EXTRA: langgraph"
Refs: https://github.com/Molecule-AI/molecule-controlplane/issues/388
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Boot smoke (#2275) exercises executor.execute() against stub deps
and never hits the real provider, so missing auth env is not a real
blocker. Without this bypass, every new auth env var an adapter
introduces must be mirrored into molecule-ci's fake-env list — a
maintenance treadmill that just bit hermes-template:
- 2026-05-03 09:47 UTC: hermes publish-image smoke fails on
HERMES_API_KEY preflight (workflow injects CLAUDE_CODE_OAUTH_TOKEN,
ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY but not
HERMES_API_KEY or OPENROUTER_API_KEY). Failed for two cycles
before being noticed.
The bypass demotes Required-env failures to warnings when
MOLECULE_SMOKE_MODE is truthy, so the unset env stays visible in
the boot log without blocking. Production paths are unchanged
(env unset → fail).
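Rough shape of the demotion (function and variable names other than
MOLECULE_SMOKE_MODE are illustrative):

    import logging
    import os

    def check_required_env(names: list[str]) -> None:
        smoke = os.environ.get("MOLECULE_SMOKE_MODE", "").lower() in ("1", "true", "yes")
        missing = [n for n in names if not os.environ.get(n)]
        if not missing:
            return
        if smoke:
            # Stays visible in the boot log without blocking the smoke run.
            logging.warning("smoke mode: required env unset (would fail in production): %s", missing)
            return
        raise RuntimeError(f"required env unset: {missing}")  # production path unchanged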
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a2a-sdk ≥ 1.0 raises InvalidAgentResponseError when an executor publishes a
TaskStatusUpdateEvent (e.g. via TaskUpdater.start_work) on a fresh request
before any Task event. The framework only auto-creates the Task on
continuation messages (existing task_id resolves via task_manager.get_task);
new requests leave _task_created unset and the SDK validation at
a2a/server/agent_execution/active_task.py rejects the first status update.
PR #2170 migrated the executor surface to v1 but missed this contract. The
synthetic E2E gate caught it on every staging run since (~1 week of
silent failures) with:
{"jsonrpc":"2.0","id":"e2e-msg-1","error":{"code":-32603,
"message":"Agent should enqueue Task before TaskStatusUpdateEvent
event","data":null}}
The fix enqueues a Task(state=SUBMITTED) before the TaskUpdater is
constructed, gated on `context.current_task is None` so continuation
messages don't double-enqueue (which the SDK logs about but doesn't reject).
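Sketch of the guard inside execute() (exact a2a.types constructor fields and
the TaskUpdater signature may differ by SDK version):

    from a2a.types import Task, TaskState, TaskStatus  # stubbed in conftest for unit tests

    if context.current_task is None:  # fresh request: the SDK will not auto-create the Task
        await event_queue.enqueue_event(
            Task(
                id=context.task_id,
                context_id=context.context_id,
                status=TaskStatus(state=TaskState.submitted),
            )
        )
    updater = TaskUpdater(event_queue, context.task_id, context.context_id)
    await updater.start_work()  # now accepted: a Task event precedes the status update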
Tests:
- test_first_event_is_task_for_new_request — pins the new-request path:
first enqueue must be a Task with the expected id/context_id
- test_no_task_enqueue_on_continuation — pins the continuation path: when
context.current_task is set, the executor must NOT re-enqueue Task
- conftest: stub Task / TaskStatus / TaskState in the mocked a2a.types
module so the import inside the executor resolves under unit tests
The google-adk adapter does not have this bug — its execute() only emits
Message events, not TaskStatusUpdateEvent. Its cancel() does emit one,
but cancel is rarely invoked and out of scope for this fix.
Live verification path: this PR's merge → publish-runtime cascade → next
synth-E2E firing should go green at step "8/11 Sending A2A message to
parent — expecting agent response".
CP's deprovision flow calls Secrets.DeleteSecret() (provisioner/ec2.go:806)
but only when the deprovision runs to completion. Crashed provisions and
incomplete teardowns leak the per-tenant `molecule/tenant/<org_id>/bootstrap`
secret. At ~$0.40/secret/month, ~45 leaked secrets surfaced as ~$19/month
on the AWS cost dashboard.
The tenant_resources audit table (mig 024) tracks four kinds today —
CloudflareTunnel, CloudflareDNS, EC2Instance, SecurityGroup — and the
existing reconciler doesn't catch Secrets Manager orphans. The proper fix
(KindSecretsManagerSecret + recorder hook + reconciler enumerator) is filed
as a follow-up controlplane issue. This sweeper is the immediate stopgap.
Parallel-shape to sweep-cf-tunnels.sh:
- Hourly schedule offset (:30, between sweep-cf-orphans :15 and
sweep-cf-tunnels :45) so the three janitors don't burst CP admin
at the same minute.
- 24h grace window — never deletes a secret younger than the
  provisioning roundtrip, so the sweep can't race an in-flight
  provision and delete its secret out from under it.
- MAX_DELETE_PCT=50 default (mirrors sweep-cf-orphans for durable
resources; tenant secrets should track 1:1 with live tenants).
- Same schedule-vs-dispatch hardening as the other janitors:
schedule → hard-fail on missing secrets, dispatch → soft-skip.
- 8-way xargs parallelism, dry-run by default, --execute to delete.
Requires a dedicated AWS_JANITOR_* IAM principal — the prod molecule-cp
principal lacks secretsmanager:ListSecrets (it only has scoped
Get/Create/Update/Delete). The workflow's verify-secrets step will hard-fail
on the first scheduled run until those secrets are configured, surfacing
the missing setup loudly rather than silently no-op'ing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Layer 1 of the runtime-rollout plan. Decouples publish from promotion by
giving operators a `runtime_image_pins` table the provisioner consults at
container-create time. No row = legacy `:latest` behavior; row present =
provisioner pulls `<base>@sha256:<digest>`. One bad publish no longer
breaks every workspace simultaneously.
Mechanics:
- Migration 047: `runtime_image_pins` (template_name PK + sha256 digest +
audit columns) and `workspaces.runtime_image_digest` (nullable, with
partial index) for "show me workspaces still on the old digest" queries.
- `resolveRuntimeImage` (handlers/runtime_image_pin.go): looks up the
pin, returns `<base>@sha256:<digest>` on hit, "" on miss/error so the
provisioner falls through to the legacy tag map. Availability over
pinning — any DB error logs and returns "" rather than blocking the
provision. `WORKSPACE_IMAGE_LOCAL_OVERRIDE=1` short-circuits the
lookup so devs rebuilding template images locally see their fresh
build.
- `WorkspaceConfig.Image` carries the resolved value into the
provisioner. `selectImage` honors it ahead of the runtime→tag map and
falls back to DefaultImage on unknown runtime.
- The existing `imageTagIsMoving` predicate (#215) already returns false
on `@sha256:` form, so digest pins skip the force-pull path naturally.
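The shipped code is Go; purely for illustration, the resolution order above in
Python (identifiers mirror the Go names, details assumed):

    import os

    def resolve_runtime_image(template_name, lookup_pin):
        # Pin hit -> digest ref; miss, DB error, or local override -> "" (legacy path).
        if os.environ.get("WORKSPACE_IMAGE_LOCAL_OVERRIDE") == "1":
            return ""                            # dev rebuilding locally sees their fresh build
        try:
            pin = lookup_pin(template_name)      # stands in for the runtime_image_pins SELECT
        except Exception:
            return ""                            # availability over pinning: never block the provision
        return f"{pin.base}@sha256:{pin.digest}" if pin else ""

    def select_image(config_image, runtime, runtime_tag_map, default_image):
        # Explicit image wins, then the runtime->tag map, then DefaultImage.
        return config_image or runtime_tag_map.get(runtime, default_image)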
Tests:
- Handler-side (sqlmock): no-pin/db-error/with-pin/empty/unknown/local-
override paths cover every branch of `resolveRuntimeImage`.
- Provisioner-side: `selectImage` table covers explicit-image preference,
runtime-map fallback, unknown-runtime → default, empty-config →
default. Plus a struct-literal compile-time pin on `Image` so a future
refactor can't silently drop the field.
Layer 2 (per-ring routing via `workspaces.runtime_image_digest`) and the
admin promote/rollback endpoint ride on top of this and ship separately.
The cascade `TEMPLATES` list in publish-runtime.yml had drifted from
manifest.json:
Currently dispatches to: claude-code, langgraph, crewai, autogen,
deepagents, hermes, gemini-cli, openclaw
manifest.json supports: claude-code, hermes, openclaw, codex (after
PR #2536 pruned to 4 actively-supported)
Two consequences of the drift:
1. `codex` (added in PR #2512, supported in manifest) was never in the
cascade — fresh runtime publishes did NOT trigger a codex template
rebuild. Codex stayed pinned to whatever runtime version it last saw
at its own image-build time.
2. langgraph/crewai/autogen/deepagents/gemini-cli — deprecated, no
   shipping images, no working A2A — were still receiving cascade
   dispatches. Wasted API calls and (worse) green CI on dead repos,
   which masks the "this template is dead, stop maintaining it" signal.
Now matches manifest.json workspace_templates exactly. Surfaced during
RFC #388 (fast workspace provision) prior-art audit.
Long-term fix is to derive TEMPLATES from manifest.json so this can't
drift again — captured as a Phase-1 invariant in RFC #388. This commit
is the data fix only; structural fix lands with the bake pipeline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Independent code review of #2555 caught two contrast regressions left
by the bulk perl pass:
1. text-white → text-ink mass-substitution silently broke destructive
and primary buttons. text-ink resolves to #15181c (warm-paper
near-black) in light mode — dark text on bg-red-600 / bg-amber-600
/ bg-emerald-600 / bg-blue-600 / bg-accent / bg-accent-strong /
/ bg-good / bg-bad fails WCAG contrast and looks broken. The per-line
pass flips text-ink → text-white only when a saturated bg utility
is present; tinted-state pills (bg-red-950/50 etc.) keep their
intentionally-retained text-* literals.
2. The original mapping table was missing bg-zinc-600 (the most-used
hover-state literal for cancel buttons — it made them jump from their
warm-cream resting state to dark zinc on hover in light mode) and
text-zinc-700/800/900 (separator dots and decorative dim text
invisible on warm-paper light bg). Extended mapping fills these
gaps with bg-surface-card / text-ink-soft.
Also: drop stale tailwind.config.ts reference from components.json
(file deleted by the v3→v4 migration); switch baseColor zinc →
neutral and enable cssVariables since v4 uses CSS-driven tokens.
Future shadcn-cli invocations would have failed or written malformed
components without this.
27 sites in 27 files affected by #1, ~20 sites in 20 files by #2.
1214/1214 unit tests still pass; build still clean.
Findings courtesy of multi-model review per code-review-and-quality
skill — different blind spots catch different bugs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's `npm ci` failed because the previous lock was generated on macOS
arm64, which omits the Linux-specific optional deps that
@tailwindcss/postcss → lightningcss-linux-x64-gnu transitively need
(@emnapi/runtime, @emnapi/core).
Re-ran `npm install --include=optional` so the lock includes every
platform variant of lightningcss + the @emnapi packages they pull in.
The runner (Linux x64) now has what it needs; local macOS installs are
still fine (npm picks the matching binary at install time).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of PR #2553 caught an unreachable defensive block at
test_load_skills_call_sites.py:99-103: the inner check guarded
`call.func.__class__.__name__ == "Name"` from a FunctionDef, but
`_find_load_skills_calls` already filters its return type to
`ast.Call` — `FunctionDef` cannot reach that loop body. The block
was a no-op `pass` with a misleading comment.
Removing it keeps the gate behaviorally identical; tests still pass.
The same five-axis review pass that turned this up also approved the
substantive logic of #2553, so no behavior change here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the documentation + audit gap for declarative skill-compat. The
plumbing has been live since PR #117 (RuntimeCapabilities) and
skill_loader's `_normalize_runtime_field` has been emitting filter
decisions for weeks, but:
- No public doc explained the `runtime` frontmatter field, so skill
authors didn't know how to opt in / opt out.
- No structural gate ensured every load_skills() call site threads
current_runtime — a future caller forgetting the kwarg silently
force-loads runtime-incompatible skills (no AttributeError, just a
delayed crash on first tool invocation).
Two changes:
1. docs/agent-runtime/skills.md
- Adds `runtime`, `tags`, `examples` to the Frontmatter Fields table.
- Adds a Runtime Compatibility section with example, accepted shapes
(universal default, list, string sugar), and the "logged + omitted,
not crashed" failure mode. Notes that match values come from each
adapter's name() (the same string in config.yaml's runtime: field).
2. workspace/tests/test_load_skills_call_sites.py
- Static AST gate: walks every workspace/*.py (excluding tests),
finds load_skills(...) Call nodes, fails if any lacks
current_runtime= as a keyword.
- Defense-in-depth `test_known_call_sites_present` — pins that the
scan actually sees the two known callers (adapter_base,
skill_loader.watcher) so a refactor that moves them is loud.
- Sanity-checked the matcher against a synthetic violating module.
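Condensed sketch of the gate's core check (path exclusions and the
known-call-sites pin omitted; only bare-name calls matched here for brevity):

    import ast
    from pathlib import Path

    def find_offending_call_sites(root: Path) -> list[str]:
        offenders = []
        for path in root.glob("*.py"):           # the real gate walks workspace/*.py, excluding tests
            tree = ast.parse(path.read_text(), filename=str(path))
            for node in ast.walk(tree):
                if not (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Name)
                        and node.func.id == "load_skills"):
                    continue
                if not any(kw.arg == "current_runtime" for kw in node.keywords):
                    offenders.append(f"{path}:{node.lineno}")
        return offenders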
Same-shape pattern as PR #2358 (tenant_resources audit-coverage AST
gate, #150) — pin the contract structurally, not just behaviorally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds adapter.event_log property+setter on BaseAdapter so adapters can
emit structured events (tool dispatch, skill load, executor errors)
without coupling to the chosen backend. Default is a shared no-op
DisabledEventLog; main.py overrides at boot from the
observability.event_log config block (PR-2 schema).
The shape is intentionally additive:
- Property is invisible to the BaseAdapter signature snapshot drift
gate (the helper walks vars(cls) for callables only — properties
are not callable). Verified with a regression test in the new
test_adapter_base_event_log.py.
- Existing adapters continue to work unchanged. Template repos that
never call self.event_log get the no-op for free.
- Setter accepts any EventLogBackend, so swapping memory↔disabled
at runtime (or to a future Redis backend) requires no adapter
code change.
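Shape of the additive surface, sketched (import path per workspace/event_log.py;
the wheel build rewrites it to molecule_runtime.event_log):

    from event_log import DisabledEventLog

    _NOOP = DisabledEventLog()  # shared no-op default

    class BaseAdapter:
        _event_log = None

        @property
        def event_log(self):
            # Adapters that never set a backend get the shared no-op for free; the
            # property object is not callable, so the vars(cls) snapshot gate ignores it.
            return self._event_log if self._event_log is not None else _NOOP

        @event_log.setter
        def event_log(self, backend):
            # Accepts any EventLogBackend: memory, disabled, or a future Redis backend.
            self._event_log = backend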
Sequels:
- PR-3c: emit events from claude-code/hermes adapters at the
natural points (tool dispatch, skill load).
- PR-4: skill-compat audit + SKILL.md frontmatter docs.
- Platform-side /workspaces/:id/activity endpoint reads the buffer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the hard-coded HEARTBEAT_INTERVAL=30 in heartbeat.py and
log_level="info" in main.py with values from
ObservabilityConfig (#119 PR-1, schema landed in PR #2538).
Concrete plumbing:
- heartbeat.HeartbeatLoop accepts an `interval_seconds=` keyword
arg. Defaults to the legacy module constant so 2-arg callers
(existing tests, any downstream code that hasn't been updated)
keep their existing 30s behavior.
- main.py constructs HeartbeatLoop with
config.observability.heartbeat_interval_seconds — the value the
config parser already clamped to [5, 300].
- main.py's uvicorn.Config takes log_level from
config.observability.log_level (lowercased — uvicorn's convention
differs from Python logging's) with LOG_LEVEL env still winning
as an ops-side debugging override.
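Sketch of the two call sites (constructor arguments other than interval_seconds
are illustrative):

    # heartbeat.py
    HEARTBEAT_INTERVAL = 30  # legacy module constant, still the default

    class HeartbeatLoop:
        def __init__(self, client, workspace_id, interval_seconds: int = HEARTBEAT_INTERVAL):
            # No re-clamping here: the config parser already bounded the value to [5, 300].
            self.interval_seconds = interval_seconds

    # main.py
    import os
    import uvicorn

    loop = HeartbeatLoop(client, workspace_id,
                         interval_seconds=config.observability.heartbeat_interval_seconds)
    log_level = (os.environ.get("LOG_LEVEL") or config.observability.log_level).lower()
    server = uvicorn.Server(uvicorn.Config(app, log_level=log_level))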
Adapter EventLog wiring deferred to PR-3b (#208 follow-up) — touches
adapter_base interface + needs careful design, kept separate to keep
this PR small + reviewable.
Tests:
- test_heartbeat.py: 3 new tests pin default interval, explicit
override, and the [5, 300] band that the constructor accepts
without re-clamping (clamping is the parser's job).
- All 88 tests in test_heartbeat.py + test_config.py pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Docker-mode orphan sweeper was incorrectly targeting external runtime
workspaces, revoking their auth tokens ~6 minutes after creation (one
sweep cycle past the 5-min grace).
External workspaces have NO local container by design — their agent runs
off-host. The "no live container" predicate the sweep uses to detect
wiped-volume orphans matches every external workspace unconditionally,
which was killing the only auth credential the off-host agent has.
Reproducer: create runtime=external workspace, paste the auth token into
molecule-mcp / curl, wait 5 minutes. Next request returns
`HTTP 401 — token may be revoked`. Platform log shows
`Orphan sweeper: revoking stale tokens for workspace <id> (no live
container; volume likely wiped)`.
Fix: add `AND w.runtime != 'external'` to the sweep's SELECT. The
existing test regexes (third-pass query expectations + the shared
expectStaleTokenSweepNoOp helper) are tightened to require the new
predicate, so a regression that drops it fails CI immediately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wheel-build drift gate caught it correctly: any new top-level
module under workspace/ must be listed in TOP_LEVEL_MODULES so that
imports of it (`from event_log import …`) get rewritten to
`from molecule_runtime.event_log import …` at package time.
Without this entry, the published wheel ships event_log.py un-rewritten
and crashes at runtime with ModuleNotFoundError on first heartbeat.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds workspace/event_log.py with an in-memory EventLog backend and a
disabled no-op variant, plus EventLogConfig nested in
ObservabilityConfig (backend / ttl_seconds / max_entries).
The event log is the append-and-query buffer that the canvas Activity
tab and platform `/activity` endpoint will read in PR-3 of the #119
stack. Two backends ship in this PR:
- InMemoryEventLog: bounded ring buffer with TTL eviction, monotonic
ids that survive eviction so cursors don't break, thread-safe for
concurrent appends from heartbeat + main loop + A2A executor.
- DisabledEventLog: no-op for `backend: disabled` — opts the
workspace out without crashing callers that propagate event ids.
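Compressed sketch of the in-memory backend's core invariants (method names
illustrative; the real class adds the query surface and config plumbing):

    import itertools
    import threading
    import time
    from collections import deque

    class InMemoryEventLog:
        def __init__(self, max_entries=1000, ttl_seconds=3600, clock=time.monotonic):
            self._entries = deque()              # (id, timestamp, event)
            self._ids = itertools.count(1)       # monotonic ids that survive eviction
            self._lock = threading.Lock()        # safe for heartbeat + main loop + A2A executor
            self._max, self._ttl, self._clock = max_entries, ttl_seconds, clock

        def append(self, event):
            with self._lock:
                event_id, now = next(self._ids), self._clock()
                self._entries.append((event_id, now, event))
                while self._entries and (len(self._entries) > self._max
                                         or now - self._entries[0][1] > self._ttl):
                    self._entries.popleft()      # size + TTL eviction from the old end
                return event_id

        def since(self, cursor=0):
            with self._lock:
                return [(i, e) for (i, ts, e) in self._entries if i > cursor]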
Schema-only PR — no consumers wired yet. Wiring lands in PR-3.
Test coverage:
- 34 new test_event_log.py tests (100% line coverage on event_log.py)
- 9 new test_config.py tests for EventLogConfig parsing
- Concurrency stress with 8 threads × 200 appends — verifies unique
monotonic ids under contention
- TTL + max_entries eviction with injected clock (no time.sleep)
- Disabled backend contract pinned
Closes #207.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Heartbeats fire every 60s per workspace and were the dominant caller
of ReadPlatformInboundSecret — one DB SELECT each, purely to redeliver
the same value. For an N-workspace fleet that's N SELECTs/minute of
pure overhead, growing linearly with the fleet (#189).
This adds a sync.Map cache keyed by workspaceID with a 5-minute TTL:
- **Read-through**: cache miss → DB SELECT → populate → return.
- **Write-through**: every IssuePlatformInboundSecret call refreshes
the cache with the new value before returning, so the lazy-heal mint
path (readOrLazyHealInboundSecret) doesn't see a stale read of the
value it just wrote.
- **TTL eviction**: 5 minutes — generous enough that the heartbeat
hot path hits cache for ~5 reads in a row before re-validating, short
enough that an out-of-band rotation (operator running
`UPDATE workspaces SET platform_inbound_secret=...` directly)
propagates within minutes without requiring a redeploy.
- **Absence not cached**: ErrNoInboundSecret skips the cache write so
the lazy-heal recovery contract for the column-NULL case
(readOrLazyHealInboundSecret in workspace_provision_shared.go) keeps
working.
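The shipped cache is a Go sync.Map; for illustration, the read-through,
write-through, and absence-not-cached rules sketched in Python:

    import threading
    import time

    TTL_SECONDS = 5 * 60
    _cache = {}                # workspace_id -> (secret, cached_at)
    _lock = threading.Lock()

    def read_inbound_secret(workspace_id, db_read):
        # Read-through; db_read stands in for the ReadPlatformInboundSecret SELECT.
        now = time.monotonic()
        with _lock:
            hit = _cache.get(workspace_id)
            if hit and now - hit[1] < TTL_SECONDS:
                return hit[0]
        secret = db_read(workspace_id)              # no-secret errors propagate here,
        with _lock:                                 # so absence is never cached
            _cache[workspace_id] = (secret, now)
        return secret

    def issue_inbound_secret(workspace_id, db_write):
        # Write-through: refresh before returning so lazy-heal never reads stale.
        secret = db_write(workspace_id)
        with _lock:
            _cache[workspace_id] = (secret, time.monotonic())
        return secret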
Memory footprint is bounded by the active workspace fleet (~200 bytes
per entry); deleted workspaces leave dead entries until process restart,
which is acceptable given that workspace deletion is operator-rare.
Why in-process instead of Redis: workspace-server runs as a single
Railway service today (per memory project_controlplane_ownership);
adding Redis for this single column read would be over-engineering.
The cache is a self-contained, Redis-free upgrade that keeps the same
semantic surface (read returns the latest secret) while collapsing
the heartbeat read storm. If the deployment ever fans out across
replicas, an operator-side rotation still propagates per replica within
the TTL bound, without needing a shared write log.
Tests: 5 new cases covering cache hit within TTL, refresh after TTL
(simulating an operator rotation via SQL), write-through on Issue,
absence-not-cached, and Reset clearing all entries. The setupMock
helper in wsauth and setupTestDB helper in handlers both call
ResetInboundSecretCacheForTesting() at start + cleanup so write-through
state from one test doesn't shadow SELECT expectations in the next.
SetInboundSecretCacheNowForTesting() exposes a deterministic clock
override so the TTL test doesn't sleep.
Task: #189.
Previously Start() only pulled when the image was missing locally
(imgErr != nil). Once a tenant's Docker daemon had `:latest` cached,
it stuck on that snapshot forever even after publish-runtime pushed
a newer image with the same tag — the same image-cache class that
sibling task #232 closed on the controlplane redeploy path.
Now Start() additionally re-pulls when the tag is "moving"
(`:latest`, no tag, `:staging`, `:main`, `:dev`, `:edge`, `:nightly`,
`:rolling`). Pinned tags (semver, sha-prefixed, date-stamped, build-id)
and digest-pinned references (`@sha256:...`) skip the pull because
their contents are by definition immutable.
The classifier (imageTagIsMoving) is deliberately conservative on the
"moving" side — only the well-known moving tags trip it. Misclassifying
a pinned tag as moving wastes bandwidth on every provision; misclassifying
moving as pinned silently bricks the fleet on stale snapshots, which
is exactly the bug class this fix closes.
Edge cases handled:
- Registry hostname with port (`localhost:5000/foo`) — the `:5000` is
not mistaken for a tag.
- Digest pinning (`image@sha256:...`) — never re-pulled even if a
moving-looking tag is also present.
- Legacy local-build tags (`workspace-template:hermes`) — treated as
pinned (no registry to move from).
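The classifier itself is Go; the decision rules (including the edge cases above)
sketched in Python for illustration:

    MOVING_TAGS = {"latest", "staging", "main", "dev", "edge", "nightly", "rolling"}

    def image_tag_is_moving(ref: str) -> bool:
        if "@sha256:" in ref:
            return False                   # digest-pinned: immutable, never re-pull
        name = ref.rsplit("/", 1)[-1]      # a ':' before the last '/' is a registry port, not a tag
        if ":" not in name:
            return True                    # no tag at all -> implicit :latest
        tag = name.rsplit(":", 1)[1]
        return tag in MOVING_TAGS          # semver / sha / date / build-id tags stay pinned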
Test coverage: 22 cases across all classifier shapes. No changes to
the pull-failure path (still best-effort, ContainerCreate still
surfaces the actionable "image not found" error if the pull failed
and the cache is also empty).
Task: #215. Companion to #232.
The drift gate's monorepoRoot walk-up looked for workspace-configs-templates/
which is gitignored locally and doesn't exist in this repo at all (the
canonical script lives in molecule-ai-workspace-template-hermes). The
test failed on CI from day one with "could not find monorepo root".
Two layered fixes in one PR:
1. Vendor upstream derive-provider.sh as testdata/ + drop monorepoRoot.
The vendored copy has a header pointing operators at the upstream
source and a one-line cp command for refresh. Test now reads two
files (vendored shell + workspace_provision.go) via package-relative
paths — Go test sets cwd to the package dir, so this is hermetic
without any walk-up gymnastics.
2. Update the case-statement regex to match upstream's renamed variable
(${_HERMES_MODEL} since v0.12.0, the resolved value of
HERMES_INFERENCE_MODEL with a HERMES_DEFAULT_MODEL legacy fallback).
Regex now accepts either spelling so a future rename fails loudly
on the parser-sanity check rather than silently returning empty.
Vendoring upstream surfaced real drift the gate was designed to catch:
upstream v0.12.0 added 12 provider prefixes that deriveProviderFromModelSlug
didn't handle (xai/grok, bedrock/aws, tencent/tencent-tokenhub, gmi,
qwen-oauth, lmstudio/lm-studio, minimax-oauth, alibaba-coding-plan,
google-gemini-cli, openai-codex, copilot-acp, copilot). Without these,
Save+Restart on a workspace using one of those prefixes would persist
LLM_PROVIDER="" and the next boot would fall back to derive-provider.sh's
runtime *=auto branch — losing the user's explicit choice on every restart.
Added all 12 case clauses + 16 new table-driven test cases (covering
both canonical and aliased forms). Drift gate now passes; future
upstream additions will fail loudly with a "DRIFT: ..." message
pointing the engineer at the missing case.
Task: #242
PR #2535 added a Go port of derive-provider.sh
(deriveProviderFromModelSlug) so workspace-server can persist
LLM_PROVIDER into workspace_secrets at provision time. This created
two sources of truth — if a future PR adds a provider prefix to one
without the other, the platform's persisted LLM_PROVIDER silently
disagrees with what the container's derive-provider.sh produces at
boot, with no test going red.
This adds a hermetic drift gate that:
1. Parses workspace-configs-templates/hermes/scripts/derive-provider.sh
with regex (handling both single-line `pat/*) PROVIDER="x" ;;`
clauses and multi-line conditional clauses) to build a
map[prefix]provider.
2. Walks workspace_provision.go's AST with go/ast, finds
deriveProviderFromModelSlug, and extracts every case-clause
prefix → return-string-literal pair.
3. Cross-checks both directions and accepts only the two documented
divergences (nousresearch/* and openai/* both → "openrouter" at
provision time because derive-provider.sh's runtime-env checks
aren't loaded yet) via a hardcoded acceptedDivergences map.
4. Fails with an actionable message that names both files and
suggests the exact fix (add the case OR add to divergence list
with a comment).
Pattern: behavior-based AST gate from PR #2367 / memory feedback —
pin the invariant by what the function maps, not by what it's named.
Stdlib-only (go/ast, go/parser, go/token, regexp); no network, no DB,
no docker — reads two monorepo files in-process.
A second sanity-check test pins anchor prefixes the regex must find,
so a future shell-syntax change can't silently produce an empty map
and trivially pass the main gate.
Closes task #242.
PR #2545 self-review findings.
(1) originalModel was set from wsMetadataModel alone. On a hermes/pre-#240
workspace where MODEL_PROVIDER was never written but YAML has
runtime_config.model: "something", originalModel="" while the form
rendered "something" — handleSave's diff fired /model PUT on every
unrelated save (tier change → workspace auto-restart). Now snapshots
originalModel from the actual rendered model in BOTH loadConfig branches
so the diff stays scoped to user-initiated changes.
(2) The store-flush test asserted the call happened but didn't pin
success-gating. A future refactor wrapping the PATCH in try/catch and
unconditionally calling updateNodeData would have shipped green and
left the badge lying about server-rejected writes. New test pins the
PATCH-rejects-no-flush invariant.
(3) Hermes-edge regression test for (1).
All 1214 canvas tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three drift bugs in ConfigTab + ProviderModelSelector. Same root pattern:
the form's display, the diff baseline, and the canvas store all read or
write from different copies of the same data, so what the user sees and
what the runtime actually uses can diverge silently.
(1) currentModelId read runtime_config.model first; loadConfig overrode
only top-level config.model. With template YAML `runtime_config.model:
sonnet` and live MODEL_PROVIDER=`MiniMax-M2`, the form rendered
"Claude Code subscription / Claude Sonnet (OAuth)" while the container
env (and chat) used MiniMax-M2. Fix: loadConfig propagates
wsMetadataModel into BOTH places.
(2) handleSave's nextModel-vs-oldModel diff compared the form value to
the YAML default. With the fix for (1) mirroring wsMetadataModel into
the form's runtime_config.model for display, that diff would be non-zero
even on no-op saves and would fire /model PUT — which auto-restarts. New
originalModel state tracks the loaded MODEL_PROVIDER and is the diff
baseline.
(3) handleSave PATCHed the workspace row but never pushed the same
fields into useCanvasStore.updateNodeData. User picked T3, hit Save &
Restart, DB updated to tier=3, header pill kept showing T2 until full
hydrate. Fix: mirror dbPatch into the store.
Bonus: ProviderModelSelector.handleProviderChange used to auto-default
the model to next.models[0] (alphabetically first) when switching
providers. A user picked the MiniMax provider intending MiniMax-M2.7;
the form silently set MiniMax-M2 (first in the bucket) and the
workspace deployed with the wrong model. Now the model defaults to empty
for multi-model providers, forcing an explicit pick — Save/Deploy already
gate on model.trim() === "".
Three new tests in ConfigTab.provider.test.tsx pin (1)/(2)/(3); two
existing ProviderModelSelector tests updated to reflect the no-silent-
default behaviour, with a new single-model-auto-pick test for the
0-vs-many boundary. 1212/1212 canvas tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups from PR #2543's multi-model code review (audit #253).
1. **Log silent yaml.Unmarshal errors (#256).** When a malformed
config.yaml made `yaml.Unmarshal(data, &raw)` fail, the affected
template silently disappeared from /templates with no trace —
operator could not distinguish "excluded due to parse error" from
"never existed." That widened a real foot-gun once PR #2543 added
structured top-level `providers:` (a string-shaped top-level
`providers:` decoded into `[]providerRegistryEntry` would fail and
drop the whole entry). Now logs `templates list: skip <id>:
yaml.Unmarshal: <err>` and continues with the rest.
2. **Coexistence test (#257 part 1).** PR #2543 covered the structured
registry and slug list in isolation. claude-code-default in
production ships BOTH: top-level `providers:` (structured registry,
2 entries) AND `runtime_config.providers:` (slug list, 3 entries).
New `TestTemplatesList_BothProviderShapesCoexist` mirrors that
layout, asserts both shapes surface independently with no
cross-talk (e.g. a slug-only entry like `anthropic-api` does NOT
synthesize a stub in the structured registry), and pins the JSON
wire-shape for both fields side-by-side.
3. **`base_url: null` decoding assertion (#257 part 3).** Adds an
explicit `got[0].BaseURL == ""` check in the existing
`TestTemplatesList_SurfacesProviderRegistry` test, locking in the
`string` (not `*string`) type. A future change to `*string` would
surface as JSON `null` and break canvas's "no base_url = use
provider defaults" branch — caught loudly by this assertion.
Tests: 11 TestTemplatesList_* now green, including the new
MalformedYAMLLogsAndSkips and BothProviderShapesCoexist.
The remaining piece of #257 — renaming `Providers []string` JSON tag
to `provider_slugs` — requires coordinated canvas updates across 4
files and is intentionally deferred to a separate PR (no canvas
churn while the user is mid-test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the contract drift caught by audit #253. Task #235 ("Server:
enrich /templates payload with structured providers") was marked
completed, but `templates.go` only ever emitted the
`runtime_config.providers []string` slug list — the structured
ProviderEntry shape (auth_env, model_prefixes, model_aliases, base_url)
the description promised was never plumbed.
Templates ship the structured registry under a TOP-LEVEL `providers:`
block (claude-code carries 6+ entries today; hermes still uses the
slug list). Both shapes coexist and are independent — surface them as
two separate fields:
- `providers` → existing []string slug list (unchanged)
- `provider_registry` → new []providerRegistryEntry (structured)
The canvas's ProviderModelSelector comment block already anticipates
this ("Templates that ship explicit vendor metadata (future) should
override the heuristic."). With this field in place, the canvas can
optionally drop its prefix-inference fallback for templates that ship
an explicit registry — separate PR. Today's change is purely additive
on the server side; no canvas change required.
Tests:
- TestTemplatesList_SurfacesProviderRegistry: order preservation +
field plumbing on a claude-code-shaped fixture (oauth + minimax)
+ JSON wire-shape gate to catch struct-tag renames.
- TestTemplatesList_OmitsProviderRegistryWhenAbsent: omitempty so
legacy templates (hermes, langgraph) don't emit `null` and break
Array.isArray on the canvas side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug B fix, server-side complement to molecule-runtime PR #2538.
The runtime PR taught `workspace/config.py` to honour
`MODEL_PROVIDER` over `runtime_config.model` from the template's
verbatim YAML. This PR is the upstream half: workspace-server's
`applyRuntimeModelEnv` now sets `MODEL=<picked>` for **every**
runtime, not just hermes (which got `HERMES_DEFAULT_MODEL` already).
Pre-fix: applyRuntimeModelEnv's per-runtime switch only emitted
HERMES_DEFAULT_MODEL for hermes; every other runtime got nothing,
so the adapter read its template's default model from
/configs/config.yaml. Surfaced 2026-05-02 — picking MiniMax-M2 in
canvas → workspace booted with model=sonnet (claude-code template
default) and demanded CLAUDE_CODE_OAUTH_TOKEN.
Post-fix: MODEL is set unconditionally before the per-runtime switch.
HERMES_DEFAULT_MODEL stays for backwards compat. Adapters opt in by
reading os.environ["MODEL"] in their executor (claude-code adapter
already does this since the same Bug B fix; see
workspace-configs-templates/claude-code-default/adapter.py).
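Adapter-side opt-in is a one-liner, sketched (template_default stands in for the
model read from /configs/config.yaml):

    import os

    model = os.environ.get("MODEL") or template_default  # canvas-picked model wins over the template default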
Tests
=====
- `TestApplyRuntimeModelEnv_SetsUniversalMODELForAllRuntimes`:
table-driven across claude-code/hermes/langgraph/crewai + empty-model
fallback + MODEL_PROVIDER-secret-fallback path. Adding a new
runtime = adding a row, not writing a new test.
- All 6 sub-cases pass + existing
`TestWorkspaceCreate_FirstDeploy_UnknownModel_OnlyMintModelProvider`
pin still green.
Why now
=======
This was authored alongside the runtime PR but stashed (not committed)
during a session-handoff cleanup. The molecule-runtime side shipped at
SHA 16ac895a and is live on PyPI as molecule-ai-workspace-runtime
0.1.84, but until the workspace-server side ships, the canvas-picked
MODEL env never reaches non-hermes adapters.
Caught by the systematic stash audit triggered by the user's
discovery that ProviderModelSelector had been similarly stashed.
Closes the workspace-server side of #246. Builds on merged #2538.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>