Five fixes for the first-time-user wizard. Every new user sees this,
so visual bugs here have outsized impact.
1. Action button hovered LIGHTER: bg-accent-strong/90 hover:bg-accent.
accent is the LIGHTER variant — hovering to it on white text drops
contrast below AA. Flipped the direction: bg-accent
hover:bg-accent-strong, matching the same trap fixed in
ConfirmDialog and ApprovalBanner.
2. "Next" button hover was a no-op (bg-surface-card on top of itself).
Lift to surface-elevated, matching the Cancel pattern in
ConfirmDialog.
3. Progress bar gradient was hardcoded from-blue-500 to-sky-400 —
neither tone exists in the warm-paper light theme, so the bar lost
brand color in light mode. Switched to the accent ramp so it stays
brand-tinted in both.
4. Step indicator was hardcoded text-sky-400/80, same theme-flip
issue. Switched to text-accent.
5. All three buttons (Skip / Action / Next) had no focus-visible
rings. Added the accent ring pattern used across the rest of
the canvas.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous soft-skip-on-dispatch path used `exit 0`, which only
ends the STEP — the rest of the workflow continued with empty
secrets. Caught 2026-05-04 by dispatched run 25296530706:
- E2E_MINIMAX_API_KEY: empty
- verify-secrets printed warning + exit 0
- Install required tools: ran
- Run synthetic E2E: ran with empty MiniMax key
- SECRETS_JSON branched to OpenAI shape (MINIMAX empty → fall through)
- But model slug stayed MiniMax-M2.7-highspeed (workflow env)
- Workspace booted with OpenAI keys + MiniMax model
- 5 min later: "Agent error (Exception)" — claude SDK 401'd
against api.minimax.io with the OpenAI key
The confusing failure mode silently masked the real problem (missing
secret) under a runtime-error label. Fix: drop both soft-skip paths
and exit 1 always. Operators who want to verify a YAML change without
setting up secrets can read the verify-secrets step's stderr — the
failure IS the verification signal.
Pure visibility fix; preserves the cron hard-fail path (now also the
dispatch hard-fail path). No mechanism change beyond the exit code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five fixes for the terms-acceptance modal:
1. CRITICAL: aria-hidden="true" on the modal's wrapper hid the dialog
AND its descendants from screen readers. The entire ToS-acceptance
flow was invisible to AT users. Removed the false aria-hidden — the
wrapper is just a backdrop, the dialog inside still has role=dialog
aria-modal=true so AT recognises it correctly.
2. Added focus management: when the modal opens, focus moves to the
"I agree" button (WCAG 2.4.3). Hard gate so no focus-trap loop or
Esc-dismiss — the user must accept or close the page.
3. "I agree" button hovered LIGHTER (bg-emerald-500 over bg-emerald-600).
On white text that drops below AA — same trap fixed in ApprovalBanner
and ConfirmDialog. Flipped to bg-emerald-700.
4. Added focus-visible ring on the "I agree" button. Was relying on
browser default outline only.
5. Privacy/Terms links: hardcoded text-sky-400 → text-accent (theme-
aware) + hover:text-accent-strong (was hover:text-sky-400, no-op
same color) + focus-visible ring. Added aria-describedby pointing
to the body div so SR can read the description with the title.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the bouncing-dots indicator ChatTab already shows while waiting
for an agent reply. Before this, an operator delegating to one or more
external peers via Agent Comms saw their outbound bubble land and then
silence until the reply (or queued/failed status) arrived — no visual
"the system is working on this" cue.
Per-peer not global: when multiple delegations are in flight to
different peers (the fan-out case), one shared spinner under-reports —
the user can't tell whether ALL peers are still working or just the
visible ones. Per-peer matches Slack typing-indicator semantics and
keeps the signal honest.
Detection rule: walk visible messages, keep only the chronologically-
last bubble per peer. If that tail is `flow === "out"` AND status is
"pending" or "queued", emit a waiting bubble. Once an inbound reply
lands, the tail flips to "in" and the bubble disappears — even if the
backend hasn't mutated the original outbound row to "completed" yet.
This collapses both states into one rule.
Visual: matches the outgoing bubble (cyan-900/30 + cyan-700/20 border,
right-justified) with cyan-300/70 dots that respect prefers-reduced-
motion via `motion-safe:animate-bounce`. Queued case adds copy
explaining the peer is busy. role="status" + aria-label so SR users
also hear "Waiting for reply from <peer>".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four fixes for the cascade-delete confirmation modal:
1. Cancel button hover was a no-op: bg-surface-card on top of the
same base — clicking did something but the button looked dead.
Lifted to surface-elevated, matching the ConfirmDialog Cancel
pattern.
2. Delete button hovered LIGHTER (bg-red-500 over bg-red-600). On
white text that drops contrast below AA — same trap fixed in
ConfirmDialog and ApprovalBanner. Flipped to bg-red-700 so hover
stays readable in both themes.
3. Checkbox ring-offset color was zinc-900 — but the dialog actually
sits on bg-surface-sunken, so the offset showed the wrong color
through the ring gap. Corrected to ring-offset-surface-sunken.
Also moved focus → focus-visible so the ring only shows on
keyboard nav, not mouse clicks.
4. Cancel + Delete had no focus-visible rings. Added accent ring
on Cancel, danger ring on Delete, both with the correct
ring-offset-surface-sunken.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User report: handing the modal's Claude Code channel snippet to an
agent fails immediately with two errors that the snippet doesn't tell
the operator how to resolve:
plugin:molecule@Molecule-AI/molecule-mcp-claude-channel · plugin not installed
plugin:molecule@Molecule-AI/molecule-mcp-claude-channel · not on the approved channels allowlist
Root cause: the snippet's `claude --channels plugin:...` line assumes
the plugin is pre-installed AND that the channel is on Anthropic's
default allowlist. Both assumptions are wrong for a custom Molecule
plugin in a public repo.
Two changes:
1. Rewrite externalChannelTemplate (Go) with full setup chain:
- Bun prereq check (channel plugins are Bun scripts)
- `/plugin marketplace add Molecule-AI/molecule-mcp-claude-channel`
+ `/plugin install molecule@molecule-mcp-claude-channel` BEFORE the
launch — otherwise "plugin not installed"
- `--dangerously-load-development-channels` flag on launch — required
for non-Anthropic-allowlisted channels, otherwise "not on approved
channels allowlist"
- Common-errors block at the bottom mapping each error string to
which numbered step recovers it
- Team/Enterprise managed-settings caveat (the dev-channels flag is
blocked there; admin must use channelsEnabled + allowedChannelPlugins)
Plugin install info verified by reading `Molecule-AI/molecule-mcp-claude-channel`
plugin.json (`name: "molecule"`) and the Claude Code channels +
plugin-discovery docs at code.claude.com/docs/en/{channels,discover-plugins}.
2. Add per-tab HelpBlock to the modal (canvas):
- Collapsible <details> below each snippet, closed by default so the
snippet stays the visual focus
- "Where to install" link (PyPI for runtime, claude.com for Claude
Code, github.com/openai/codex for Codex, NousResearch/hermes-agent
for Hermes)
- "Documentation" link (docs.molecule.ai/docs/guides/*; hostname
confirmed by existing blog post canonical metadata; paths map
1:1 to docs/guides/*.md files in this repo)
- "Common errors" list with concrete recovery steps for each tab
(e.g. Codex tab calls out the codex≥0.57 requirement and TOML
duplicate-table parse error; OpenClaw calls out the :18789 port
conflict check)
URL discipline: every URL is either (a) verified against a file path
in this repo's docs/, (b) the canonical repo of an existing snippet
reference, or (c) a well-known third-party canonical URL. No guessed
URLs — broken links would defeat the purpose of "more comprehensive
instructions."
Verification:
- `go build ./...` clean in workspace-server
- `go test ./internal/handlers/...` passes (4.3s)
- Bash syntax check on test_staging_full_saas.sh (no edits there) clean
- TS brace/paren/bracket counts balanced; no full tsc run because the
worktree's node_modules isn't installed — counterpart Canvas tabs E2E
on the PR will exercise the full type-check + render path
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub Actions scheduler de-prioritises :00 cron firings under load.
Empirical 2026-05-03: the canary's cron was '0,20,40 * * * *' but
actual firings landed at :08, :03, :01, :03 — :20 and :40 silently
dropped. Detection latency degraded from claimed 20 min to actual
~60 min worst case.
Move to '10,30,50 * * * *':
- :10/:30/:50 sit 10 min off the top-of-hour load peak
- Still 5 min from :15 sweep-cf-orphans and :45 sweep-cf-tunnels
(the original constraint that kept us off :15/:45)
- Same 20-min cadence; only the phase changes
No code change beyond the cron expression + comment refresh.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small UIUX fixes for the bundle drag-import surface.
1. Drag overlay was hardcoded blue-950/blue-400 — those tones don't
exist in the warm-paper light theme, so the overlay washed out
inconsistently. Switched to bg-accent/15 + border-accent/40 so
the overlay flips with theme and matches the inner card's
border-accent/50.
2. Importing spinner was visually obvious but invisible to screen
readers — only the result toast had aria-live. Operators relying
on AT had no way to know the import was in flight. Added
role="status" + aria-live="polite" + aria-hidden on the spinner
itself so the SR hears "Importing bundle..." once.
3. animate-spin → motion-safe:animate-spin so the spinner respects
prefers-reduced-motion (Tailwind's built-in variant gates the
animation on the user's OS setting). Layout doesn't change in
either case — text alone communicates state.
Also dropped border-sky-400 → border-accent on the spinner so it
matches the rest of the canvas semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four UIUX fixes for the EC2 console modal:
1. Copy and Close buttons had hover:bg-surface-card on TOP of the
same base bg-surface-card — silent no-op hover. Lifted to
surface-elevated + line-soft border, matching ConfirmDialog's
Cancel pattern. The button visibly responds now.
2. Copy button silently succeeded — no toast, no animation, no UI
feedback. Operators clicking it had no idea whether anything
landed in the clipboard. Now fires showToast on resolve/reject
so the action is observable.
3. × close button was ~10x16px (well under WCAG 2.5.5's 24x24).
Bumped to w-6 h-6 with focus-visible ring + hover bg.
4. Added focus-visible:ring-accent/60 + ring-offset-surface to
all three buttons so keyboard users see focus. Matches the
semantic ring pattern used across the canvas.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small fixes for the batch-action toolbar:
1. The deselect button's title says "Clear selection (Escape)" — but
pressing Escape did NOTHING. The title has been lying since the bar
shipped. Now wired: window keydown handler calls clearSelection
when Esc fires. Skipped while the confirm dialog is open
(`pending !== null`) so the dialog's own Esc-cancels takes
precedence, and skipped during a busy in-flight action so the
user can't strand a partial-failure mid-flight.
2. focus-visible:ring-zinc-500/70 → focus-visible:ring-accent/50
on the deselect button. The hardcoded zinc broke the semantic-
token pattern used by the other action buttons.
Tests: two new vitest cases — Esc clears with selection, Esc no-op
when empty (the bar isn't mounted at count===0 so the listener never
registers). Full suite: 1222/1222.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to #2648 — same `>/dev/null || true` swallow-on-error
pattern existed in:
e2e-staging-canvas.yml (single-slug)
e2e-staging-saas.yml (loop)
e2e-staging-sanity.yml (loop)
e2e-staging-external.yml (loop, was `>/dev/null 2>&1` variant)
All four now capture the HTTP code, log a "[teardown] deleted $slug
(HTTP $code)" line on success, and emit a workflow warning naming
the slug + body excerpt on non-2xx. Loop bodies also tally + summarise
total leaks at the end.
Exit semantics unchanged: a single cleanup miss still doesn't fail-flag
the test (sweep-stale-e2e-orgs is the safety net within ~45 min). The
behavior change is purely surfacing — failures that were silent are
now visible on the workflow run page.
Pairs with #2648's tightened sweeper. Together: per-run cleanup
failures are visible AND the safety net catches them quickly.
Closes the per-workflow port noted as out-of-scope in #2648.
See molecule-controlplane#420.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes that close one of the leak classes from the
molecule-controlplane#420 vCPU audit:
1. sweep-stale-e2e-orgs.yml: cron */15 (was hourly), MAX_AGE_MINUTES
30 (was 120). E2E runs are 8-25 min wall clock; 30 min is safely
above the longest run while shrinking the worst-case leak window
from ~2h to ~45 min (15-min sweep cadence + 30-min threshold).
2. canary-staging.yml teardown: the per-slug DELETE used `>/dev/null
|| true`, which swallowed every failure. A 5xx or timeout from CP
looked identical to "successfully deleted" and the canary tenant
kept eating ~2 vCPU until the sweeper caught it. Now we capture
the response code and surface non-2xx as a workflow warning that
names the leaked slug.
The exit semantics stay unchanged — a single-canary cleanup miss
shouldn't fail-flag the canary itself when the actual smoke check
passed. The sweeper is the safety net for whatever slips past.
Caught during the molecule-controlplane#420 audit on 2026-05-03 —
3 e2e canary tenant orphans were running for 24-95 min, all under
the previous 120-min sweep threshold so they went unnoticed until
manual cleanup. Same `|| true` pattern exists in
e2e-staging-{canvas,external,saas,sanity}.yml; out of scope for
this PR (mechanical port; tracking separately) but the sweeper
tightening covers all of them by reducing the safety-net latency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small a11y fixes for the floating legend.
1. Both buttons (open pill + close ×) had no focus-visible ring —
keyboard users couldn't tell where focus landed. Added the
accent-ring pattern used across the rest of the canvas.
2. Close button was a ~10x16px hit area — well below WCAG 2.5.5's
24x24 minimum. Bumped to w-6 h-6 with negative margin so the
visible × stays in the same spot but the hit area + focus ring
are larger. Hover bg added to make the hit area visible on hover.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three fixes for the cookie banner:
1. role="dialog" aria-modal="true" → <section role="region">. The
banner has no focus trap, doesn't block the page, and the user
can keep using the canvas while it's up — none of which are modal
semantics. Claiming aria-modal="true" without a trap actively
harms screen-reader users: they're told the rest of the page is
inert, jump into the banner, and then can't escape. Region
semantics let AT navigate around it normally. (Forcing a modal
cookie banner would also be a dark pattern under GDPR.)
2. Privacy-policy link: hover:text-accent → hover:text-accent-strong.
The original was a no-op (same color). Also added focus-visible
ring + underline-offset so the link is readable AND keyboard-
distinguishable in both themes.
3. Both buttons: focus-visible:ring-2 + ring-offset-surface so
keyboard users see where focus lands. Mouse clicks unchanged
thanks to focus-visible.
Tests: swapped getByRole("dialog") → getByRole("region") in 8
existing tests, then tightened the role-assertion test into a
regression guard that explicitly asserts NO aria-modal and NO
dialog role exist. Full suite: 1220/1220.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cuts the per-run LLM cost ~10x (MiniMax M2.7 vs gpt-4.1-mini) and
removes the recurring OpenAI-quota-exhaustion failure mode that took
the canary down on 2026-05-03 (#265 — staging quota burnt for ~16h).
Path:
E2E_RUNTIME=claude-code (default)
→ workspace-configs-templates/claude-code-default/config.yaml's
`minimax` provider (lines 64-69)
→ ANTHROPIC_BASE_URL auto-set to api.minimax.io/anthropic
→ reads MINIMAX_API_KEY (per-vendor env, no collision with
GLM/Z.ai etc.)
Workflow changes (continuous-synth-e2e.yml):
- Default runtime: langgraph → claude-code
- New env: E2E_MODEL_SLUG (defaults to MiniMax-M2.7-highspeed,
overridable via workflow_dispatch)
- New secret wire: E2E_MINIMAX_API_KEY ←
secrets.MOLECULE_STAGING_MINIMAX_API_KEY
- Per-runtime missing-secret guard: claude-code requires MINIMAX,
langgraph/hermes require OPENAI. Cron firing hard-fails on missing
key for the active runtime; dispatch soft-skips so operators can
ad-hoc test without setting up the secret first
- Operators can still pick langgraph/hermes via workflow_dispatch;
the OpenAI fallback path stays wired
Script changes (tests/e2e/test_staging_full_saas.sh):
- SECRETS_JSON branches on which key is set:
E2E_MINIMAX_API_KEY → {MINIMAX_API_KEY: <key>} (claude-code path)
E2E_OPENAI_API_KEY → {OPENAI_API_KEY, HERMES_*, MODEL_PROVIDER} (legacy)
MiniMax wins when both are present — claude-code default canary
must not accidentally consume the OpenAI key
Tests (new tests/e2e/test_secrets_dispatch.sh):
- 10 cases pinning the precedence + payload shape per branch
- Discipline check verified: 5 of 10 FAIL on a swapped if/elif
(precedence inversion), all 10 PASS on the fix
- Anchors on the section-comment header so a structural refactor
fails loudly rather than silently sourcing nothing
The model_slug dispatcher (lib/model_slug.sh) needs no change:
E2E_MODEL_SLUG override path is already wired (line 41), and
claude-code template's `minimax-` prefix matcher catches
"MiniMax-M2.7-highspeed" via lowercase-on-lookup.
Operator action required to land green:
- Set MOLECULE_STAGING_MINIMAX_API_KEY in repo secrets
(Settings → Secrets and Variables → Actions). Use
`gh secret set MOLECULE_STAGING_MINIMAX_API_KEY -R Molecule-AI/molecule-core`
to avoid leaking the value into shell history.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The staging canary's A2A step has a ladder of specific regression
classifiers (hermes-agent down, model_not_found, Invalid API key,
etc.) followed by a generic "error|exception" catch-all. Provider-
side OpenAI 429 quota errors fell through to the catch-all, so the
canary issue body and CI log just said "A2A returned an error-shaped
response" — which is technically true but obscures the actual
operator action.
This adds a 7th classifier above the catch-all for "exceeded your
current quota" / "insufficient_quota" — both terms appear in
OpenAI's quota-exhaustion 429 response. When matched, the failure
message names the operator action directly (top up MOLECULE_STAGING_OPENAI_KEY
or rotate the secret) and links to #2578.
Why this is correct, not "lowering the bar":
- Steps 0–7 of the canary cover full platform health (CP up, tenant
provisioned, DNS+TLS reachable, workspace booted, A2A delivered).
- Reaching step 8 with a provider-side 429 means the platform IS
healthy — the failure is downstream of all platform invariants.
- The canary still exits 1 (CI stays red, threshold-3 alarm still
fires); only the failure message changes.
- All 6 existing specific classifiers run BEFORE this one, so any
real platform regression is still caught with its specific message.
Verification:
- Regex tested against the actual 429 string from canary run 25291517608:
"API call failed after 3 retries: HTTP 429: You exceeded your current quota..."
→ matches ✅
- Negative tests: "PONG", "hermes-agent unreachable" → no match ✅
- bash -n syntax check passes
- shellcheck -S error clean
Tracking: #2593 (canary), #2578 (root cause)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>