The staging canary's A2A step has a ladder of specific regression
classifiers (hermes-agent down, model_not_found, Invalid API key,
etc.) followed by a generic "error|exception" catch-all. Provider-
side OpenAI 429 quota errors fell through to the catch-all, so the
canary issue body and CI log just said "A2A returned an error-shaped
response" — which is technically true but obscures the actual
operator action.
This adds a 7th classifier above the catch-all for "exceeded your
current quota" / "insufficient_quota" — both terms appear in
OpenAI's quota-exhaustion 429 response. When matched, the failure
message names the operator action directly (top up MOLECULE_STAGING_OPENAI_KEY
or rotate the secret) and links to #2578.
Why this is correct, not "lowering the bar":
- Steps 0–7 of the canary cover full platform health (CP up, tenant
provisioned, DNS+TLS reachable, workspace booted, A2A delivered).
- Reaching step 8 with a provider-side 429 means the platform IS
healthy — the failure is downstream of all platform invariants.
- The canary still exits 1 (CI stays red, threshold-3 alarm still
fires); only the failure message changes.
- All 6 existing specific classifiers run BEFORE this one, so any
real platform regression is still caught with its specific message.
Verification:
- Regex tested against the actual 429 string from canary run 25291517608:
"API call failed after 3 retries: HTTP 429: You exceeded your current quota..."
→ matches ✅
- Negative tests: "PONG", "hermes-agent unreachable" → no match ✅
- bash -n syntax check passes
- shellcheck -S error clean
Tracking: #2593 (canary), #2578 (root cause)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small UIUX fixes for Cmd+K search.
1. Auto-highlight the first match while the user types. Before, Enter
on a non-empty query was a no-op — focusedIndex stayed at -1 until
the user pressed ↓. Standard search-palette behavior is to highlight
the top result so Enter just works. Empty query keeps -1 (opening
the dialog shows ALL workspaces; arbitrarily pinning one looks
wrong).
2. placeholder-zinc-400 → placeholder-ink-soft. The hardcoded zinc
broke the semantic-token pattern other inputs use; placeholder now
flips with theme correctly. (Also reordered focus:outline-none
ahead of the focus-visible variants — cosmetic, more idiomatic.)
Tests: replaced the "resets to -1" test with two new ones — auto-
highlight on a matching query (Enter selects without ArrowDown), and
no-results query stays a no-op. Full suite 1220/1220.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small fixes for the workspace right-click menu:
1. Off-screen clamp. Right-clicking near the right or bottom edge of
the canvas put part of the menu past the viewport — items hidden
under the scrollbar / off the screen. The menu now measures itself
on the same rAF that auto-focuses the first item, and shifts back
inside with an 8px margin (matching the floating-tooltip top-edge
clamp in Tooltip.tsx). Falls back to the raw cursor coords for the
first paint frame so there's no flash.
2. focus:ring-zinc-600 → focus-visible:ring-accent/50. The hardcoded
zinc tone broke the semantic-token pattern every other surface
uses; flipping to focus-visible also stops the ring from showing
when items are clicked with the mouse (only keyboard nav now
triggers the ring, matching Toolbar/SidePanel behavior).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two diagnostic upgrades to the Playwright staging-setup harness, both
zero-behavior-change:
1. provision-failed throw now includes the full admin-orgs row (boot
stage, last error, terraform/SSM state, etc) instead of just the
slug. Every "provision failed: <slug>" in CI history was followed
by a manual repro to find out WHY — that round-trip is gone.
2. workspace-failed throw dumps the full /workspaces/{id} body when
last_sample_error is empty. Boot crashes, image-pull errors,
missing PYTHONPATH, and OpenAI-quota-at-startup all surface as a
bare "Workspace failed:" today (see #2632). Now they carry the
boot_stage / image / last_error fields the API row exposes.
No fix for the underlying flakes — those are tracked in #2632 (CP race)
and #2578 (OpenAI quota). This just stops them looking identical in the
CI log.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small a11y fixes for the global toast surface:
1. Esc dismisses the newest toast. Errors never auto-expire, so without
a keyboard shortcut a keyboard-only user has to tab through the entire
app to reach the × button on a stuck error.
2. Dismiss button gets focus-visible ring + theme-aware tint. The previous
`opacity-70 hover:opacity-100` gave no visible focus indicator (WCAG
2.4.7). Info toasts use the semantic surface that flips with theme,
so the dismiss tint splits per type — accent ring on info, white ring
on the always-dark success/error toasts.
3. Touch target bumps from p-1 (~24x24) to w-7 h-7 (28x28) toward WCAG
2.5.5 AAA's 44x44 ideal.
Tests: 5 new vitest cases covering Esc on info/error, no-op on empty
queue, accessible label, and per-toast click dismissal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WCAG 1.4.13 (Content on Hover or Focus) requires that tooltip content
be DISMISSIBLE without moving pointer hover or keyboard focus. Tooltip
had no escape hatch — once a keyboard user tabbed onto a control with
a tooltip, the tooltip stayed visible until they tabbed away (which
moves focus and may not be possible if the tooltip is itself blocking
content the user needs to see, e.g. for screen-magnifier users).
Add a window-level Escape listener that's active only while a tooltip
is shown. Pressing Esc clears the tooltip without moving focus or
breaking the hover state, satisfying the dismissible criterion.
Used `capture: true` so we beat any modal/dialog Esc handler that
might also be listening — the tooltip belongs to the focused control,
not the modal it sits inside.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of the modal-tab additions caught footguns in the new
hermes/codex/openclaw snippets. Ship the fixes before merge.
Critical 1 — Hermes `cat >> ~/.hermes/config.yaml` corrupts existing
configs. Most existing hermes installs have a top-level gateway:
block; appending creates a duplicate, which YAML rejects. Replaced
the auto-append with explicit instructions: 'under your existing
gateway: block, add a plugin_platforms entry'.
Critical 2 — Codex `cat >> ~/.codex/config.toml` corrupts on
re-run. TOML rejects duplicate [mcp_servers.molecule] tables; a
second run breaks codex parse. Replaced auto-append with commented
config block + explicit 'open ~/.codex/config.toml in your editor
and paste'. Canvas-side token stamping still hits the literal in
the comment so the operator's clipboard has the real token already
substituted.
Required 3 — OpenClaw `onboard --non-interactive` missing
provider/model defaults. Added explicit --provider + --model
placeholders in a commented form so operators see what's needed
without a stub default applying silently.
Required 4 — OpenClaw gateway started with bare '&' dies on
terminal close. Switched to nohup + log file + disown, with a note
that systemd is the right answer for production.
Optional 5 + 6 (env_vars cleanup, tests) deferred — env_vars stripped
to keep the in-tree-vs-external surface narrow; tests for the new
response fields can land separately when external_connection.go is
next touched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The External Connect modal had tabs for Python SDK / curl / Claude Code
channel / Universal MCP. Operators using hermes / codex / openclaw as
their external runtime had no copy-paste; they pieced together
WORKSPACE_ID + PLATFORM_URL + auth_token into config files by reading
docs.
Adds three runtime-specific snippets stamped server-side:
- **Hermes** — installs molecule-ai-workspace-runtime + the
hermes-channel-molecule plugin, exports the 4 env vars, and writes
the gateway.plugin_platforms.molecule block into ~/.hermes/config.yaml.
Same long-poll-based push semantics the Claude Code channel tab
delivers (push parity with the in-tree template-hermes adapter).
- **Codex** — wires the molecule_runtime A2A MCP server into
~/.codex/config.toml ([mcp_servers.molecule] block with env_vars
passthrough + literal env values). Outbound tools only — codex's
MCP client doesn't route arbitrary notifications/* (verified by
reading codex-rs/codex-mcp/src/connection_manager.rs); push parity
on external codex would need a separate bridge daemon, tracked
as future work. Snippet calls this out so operators know to pair
with Python SDK if they need inbound delivery.
- **OpenClaw** — installs openclaw + onboards, wires the molecule
MCP server via openclaw mcp set, starts the gateway on loopback.
Same outbound-tools-only caveat as codex; the in-tree template-
openclaw adapter implements the full sessions.steer push path,
but an external setup would need the same bridge daemon to translate
platform inbox events into sessions.steer calls. Future work.
Default open tab changed from "Claude Code" to "Universal MCP".
Universal MCP is runtime-agnostic and works as a starting point for
any operator regardless of their downstream agent runtime; runtime-
specific tabs are still one click away. Pre-2026-05-03 the modal
defaulted to Claude Code, so operators using non-Claude runtimes
opened to a tab they had to skip past.
Tab order also reorganized:
Universal MCP → Python SDK → Claude Code → Hermes → Codex → OpenClaw → curl → Fields
Each runtime-specific tab is gated on the platform supplying the
snippet (older platform builds without the field don't show empty
tabs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why: the 2026-05-03 SG-missing-port-22 bug was structurally invisible to
local-dev — handleLocalConnect uses docker exec; only handleRemoteConnect
exercises EIC. The CP provisioner shipped without the EIC ingress rule
for ~6 months and nobody noticed until a paying tenant clicked Terminal.
Continuous synth-E2E runs every 20 min; adding this probe means the same
class of regression (CP provisioner ingress, EIC_ENDPOINT_SG_ID env,
handleRemoteConnect chain, SDK source-group support) surfaces within ~20
min of merge instead of waiting for a user report.
What: after Step 7 (workspace online), call
GET /workspaces/$wid/terminal/diagnose for each workspace. The endpoint
already exists in workspace-server (terminal_diagnose.go); it runs the
full EIC + ssh chain from inside the tenant (which has AWS creds via
its IAM profile) and returns {ok, first_failure, steps[]}. We just need
to call it as the tenant — no AWS creds plumbed onto the GHA runner,
no port-forwarding from CI.
Local-docker workspaces (instance_id NULL) hit diagnoseLocal which
probes docker.Ping + container exec; same ok=true contract, so the
probe works on both production paths.
This is a partial mitigation for task #269 (eliminate handleLocalConnect
bypass — local must mimic prod terminal path). The architectural fix
(refactor terminal.go so local docker also exercises an EIC-shaped
sequence) remains pending; this PR is the "find out issues earlier"
half of the user's directive.
User feedback: chat-bubble agent text still washed out after #2618 +
#2623. Looked at the actual rendered colors and the issue was Tailwind
Typography's `prose-invert` defaults — body text ships at zinc-300,
which lands at ~5.3:1 against bg-zinc-700. Passes AA but visibly
duller than the user bubble's crisp white-on-blue (~10:1).
Override the prose CSS variables on the agent bubble in dark mode:
- body → zinc-100 (was zinc-300)
- headings / bold → white
- inline code → zinc-100
That brings agent body text to ~13:1 against bg-zinc-700, matching the
user bubble's brightness so both sides of the conversation read at
the same crispness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same bug class as #2622 (ConfirmDialog), but on a more critical surface
— this is the top-of-page banner asking the user to approve / deny a
real workspace permission request.
1. **Deny was a no-op hover.** `bg-surface-card hover:bg-surface-card`
gave zero visual feedback before the user clicked a destructive
action. Now lifts to surface-elevated + brightens the text so the
button visibly responds.
2. **Approve hover went LIGHTER.** `bg-emerald-600 hover:bg-emerald-500`
dropped white-text contrast on hover. Reversed to emerald-700.
3. **No focus rings on either button.** Keyboard users had no way to
tell which decision was focused. Added focus-visible rings
(offset against the dark amber banner bg) — emerald for Approve,
amber for Deny so the choice is unambiguous.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered during code review of the #2623 hotfix audit. Same
regression class as #2618: prose-invert applied where the bubble bg
themes between light/dark, leaving markdown unreadable in one theme.
`MarkdownBody` was unconditionally `prose-invert` — fine for the
outgoing-message bubble (bg-cyan-900, dark in both themes) and the
failure bubble (bg-red-950, dark in both themes), but WRONG for the
incoming-message bubble (bg-surface-card, which themes LIGHT in light
mode). Result: light prose body text on light cream bg = invisible
markdown for incoming peer-to-peer messages in light mode.
Added an `invert: "always" | "dark-only"` prop to MarkdownBody. The
NormalMessage call sites switch on `msg.flow` so each bubble gets the
direction matching its bg's theming behavior. Failure bubble keeps
the default ("always") since red-950 stays dark.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regression from PR #2618 (chat dark-contrast).
PR #2618 switched the agent bubble bg to `dark:bg-zinc-700` so it
visibly elevates against the dark panel — but the inner ReactMarkdown
prose div only got `prose-invert` for USER messages. Result: in dark
mode the agent's markdown text rendered with the Tailwind Typography
plugin's default dark body color on top of the new dark bg = invisible
text. User reported empty-looking gray rectangles where agent replies
should be.
Fix: apply `dark:prose-invert` to agent bubbles so prose body text
flips light alongside the bg. Light mode unchanged (default prose
colors against the warm `bg-surface-card`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues on a high-stakes surface (revoke token, delete workspace,
cascade delete):
1. **Cancel hover was a no-op.** `bg-surface-card hover:bg-surface-card`
gave zero visual feedback on hover. Now hovers to surface-elevated
with a softened border so the button visibly lifts.
2. **Confirm hovers went LIGHTER, dropping white-text contrast.**
`bg-red-600 hover:bg-red-500` made the destructive button less
readable on hover. Same for warning (amber) and primary (accent).
Reversed to hover-darker so contrast holds in both themes.
3. **No focus-visible rings on either button.** Keyboard users had no
indication of focus position (WCAG 2.4.7 fail). Added
`focus-visible:ring-2 focus-visible:ring-accent/40` on Cancel and
`focus-visible:ring-2 focus-visible:ring-offset-2 ...accent/60` on
Confirm so the focused destructive action is unambiguous.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2571 fixed synth-E2E by branching MODEL_SLUG per runtime, but only
the langgraph branch was verified at runtime — hermes / claude-code /
override / fallback had zero automated coverage. A future regression
(e.g. dropping the langgraph case) would silently revert and only
surface as "Could not resolve authentication method" mid-E2E.
This PR:
- Extracts the dispatch into tests/e2e/lib/model_slug.sh as a sourceable
pick_model_slug() function. No behavior change.
- Adds tests/e2e/test_model_slug.sh — 9 assertions across all 5 dispatch
branches plus the override path. Verified to FAIL when any branch is
flipped (manually regressed langgraph slash-form to confirm the test
catches it; restored before commit).
- Wires the unit test into ci.yml's existing shellcheck job (only runs
when tests/e2e/ or scripts/ change). Pure-bash, no live infra.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User screenshot showed pale lavender user bubbles with hard-to-read white
text and a nearly-invisible agent bubble blending into the dark panel.
Root causes:
1. Tailwind v4 defaults `dark:` to `prefers-color-scheme: dark`. Our
ThemeProvider writes `data-theme="dark"` on <html> so user toggle wins
over OS — but `dark:` classes elsewhere in the codebase weren't
tracking it. Added `@custom-variant dark` to re-bind the variant.
2. `bg-accent` themes lighter in dark mode (--color-accent: #6883e8),
dropping white-text contrast to ~3:1 (fails WCAG AA). Switched user
bubble to solid blue-600/500 so it stays ~5:1 in both modes.
3. `bg-surface-card` (#1a1d23) was only ~7% lighter than the panel bg
(#0e1014), making agent bubbles disappear. Bumped to zinc-700 in
dark; light mode keeps the warm surface-card tint.
4. System (error) bubble's /10 overlay was nearly invisible; raised to
/25 in dark with stronger border + ink for readability.
Sub-tab + textarea polish included: low-contrast `text-ink-soft` →
`text-ink-mid`, focus-visible rings on tabs, dark variants on textarea.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat_history query
WHERE workspace_id = $1
AND activity_type = 'a2a_receive'
AND (source_id = $2 OR target_id = $2)
ORDER BY created_at DESC
forces a workspace-scoped seq-scan-and-filter at every call —
idx_activity_ws_type_time covers workspace_id+type prefix but the
(source OR target) clause then walks every workspace row. Demo
workspaces (≤50 rows) don't notice; production workspaces accumulate
thousands over months and chat_history latency grows linearly.
Adds two partial btree indexes (workspace_id, source_id) WHERE NOT NULL
and (workspace_id, target_id) WHERE NOT NULL. Postgres BitmapOrs them
into a workspace-scoped BitmapAnd against the existing index, dropping
chat_history from O(workspace_rows) to O(peer_a2a_rows).
Partial WHERE NOT NULL because most activity rows (heartbeats,
agent_log, memory_write, etc.) carry NULL source_id/target_id and
shouldn't bloat the index.
Anti-pattern caveat (per the issue): a single compound (a, b) index
can't serve 'a OR b' — Postgres only uses compound for prefix match.
Two separate indexes + BitmapOr is the right shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Top-of-canvas Toolbar had multiple low-contrast surfaces in light theme:
Action buttons (Stop All, Restart Pending):
- bg-red-950/50 + bg-amber-950/40 → bg-bad/10 + bg-warm/10 with bg-bad/40
+ bg-warm/40 borders. Dark-tinted backgrounds with /40-/50 alpha render
as nearly invisible smudges on warm-paper; semantic tokens at /10 give
a clear pale-bad / pale-warm tint that scales correctly in dark mode.
- Both gain focus-visible:ring-2 focus-visible:ring-{bad,warm}/40.
Toggle button (A2A edges):
- Active state: bg-blue-950/50 → bg-accent/15 (themes correctly).
- Inactive state: bg-surface-card/50 + text-ink-soft → solid bg-surface-card
+ text-ink-mid; hover bumps to text-ink. Drops the redundant
"hover:bg-surface-card/50" identity hover.
Icon buttons (Audit, Search, Help):
- Same pattern as toggle inactive: solid bg-surface-card + text-ink-mid +
text-ink hover, with focus-visible:ring-2 focus-visible:ring-accent/40.
Workspace count + bullet separator:
- text-ink-soft (3.5:1 on warm-paper) → text-ink-mid (7:1).
WS connection status:
- "Live": text-ink-soft → text-ink-mid (paired with the green dot).
- "Reconnecting": text-ink-soft → text-warm (semantic match for amber dot).
- "Offline": text-ink-soft → text-bad (semantic match for red dot).
Status text now reinforces the dot colour instead of disappearing on
light surfaces.
Help popover:
- Close button: text-ink-soft → text-ink-mid + focus-visible:underline.
- HelpRow body text: text-ink-soft → text-ink-mid (was 3.5:1 on the
bg-surface-sunken/45 popover row — failed AA for body text).
Defense-in-depth follow-up to #2481 (peer_id trust-boundary gate).
Same XML-attribute injection vector applies to the four other meta
fields rendered as agent-context attrs in the <channel> tag:
<channel kind="..." method="..." activity_id="..." ts="..." source="molecule">
Each field is now passed through a closed-set / shape-validate gate:
- kind → frozenset {canvas_user, peer_agent} via _safe_meta_field
- method → frozenset {message/send, tasks/send, tasks/get, notify, ""}
- activity_id → UUID-shape regex via _safe_activity_id
- ts → ISO-8601 RFC3339 regex via _safe_ts
Any value outside the allowed shape is replaced with empty string.
Today the values come from a platform-DB column so they're trusted,
but "trust the source" was the same assumption that got peer_id into
trouble (#2481). Closed-enum allowlists make this row-content-blind.
5 new tests mirroring test_envelope_enrichment_strips_path_traversal_peer_id:
- test_envelope_strips_unknown_kind — kind injection stripped
- test_envelope_strips_unknown_method — method injection stripped
- test_envelope_strips_malformed_activity_id — non-UUID stripped
- test_envelope_strips_malformed_ts — non-ISO8601 stripped
- test_envelope_keeps_valid_meta_fields_unchanged — happy-path negative case
Mutation-tested: temporarily making _safe_meta_field permissive kills
both kind/method strip tests with the injection payload reflecting
into the meta dict, confirming the gate is what blocks them.
Two existing tests updated to use UUID-shaped activity_ids ("act-7",
"act-bridge-test" → real UUIDs) since the gate strips synthetic ids.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>