Self-review of the modal-tab additions caught footguns in the new
hermes/codex/openclaw snippets. Ship the fixes before merge.
Critical 1 — Hermes `cat >> ~/.hermes/config.yaml` corrupts existing
configs. Most existing hermes installs have a top-level gateway:
block; appending creates a duplicate, which YAML rejects. Replaced
the auto-append with explicit instructions: 'under your existing
gateway: block, add a plugin_platforms entry'.
Critical 2 — Codex `cat >> ~/.codex/config.toml` corrupts on
re-run. TOML rejects duplicate [mcp_servers.molecule] tables; a
second run breaks codex parse. Replaced auto-append with commented
config block + explicit 'open ~/.codex/config.toml in your editor
and paste'. Canvas-side token stamping still matches the literal in
the comment, so the snippet on the operator's clipboard already has
the real token substituted.
Required 3 — OpenClaw `onboard --non-interactive` missing
provider/model defaults. Added explicit --provider + --model
placeholders in a commented form so operators see what's needed
without a stub default applying silently.
Required 4 — OpenClaw gateway started with bare '&' dies on
terminal close. Switched to nohup + log file + disown, with a note
that systemd is the right answer for production.
Optional 5 + 6 (env_vars cleanup, tests) deferred — env_vars stripped
to keep the in-tree-vs-external surface narrow; tests for the new
response fields can land separately when external_connection.go is
next touched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The External Connect modal had tabs for Python SDK / curl / Claude Code
channel / Universal MCP. Operators using hermes / codex / openclaw as
their external runtime had no copy-paste snippet; they had to piece
WORKSPACE_ID + PLATFORM_URL + auth_token into config files by reading
the docs.
Adds three runtime-specific snippets stamped server-side:
- **Hermes** — installs molecule-ai-workspace-runtime + the
hermes-channel-molecule plugin, exports the 4 env vars, and writes
the gateway.plugin_platforms.molecule block into ~/.hermes/config.yaml.
Same long-poll-based push semantics the Claude Code channel tab
delivers (push parity with the in-tree template-hermes adapter).
- **Codex** — wires the molecule_runtime A2A MCP server into
~/.codex/config.toml ([mcp_servers.molecule] block with env_vars
passthrough + literal env values). Outbound tools only — codex's
MCP client doesn't route arbitrary notifications/* (verified by
reading codex-rs/codex-mcp/src/connection_manager.rs); push parity
on external codex would need a separate bridge daemon, tracked
as future work. Snippet calls this out so operators know to pair
with Python SDK if they need inbound delivery.
- **OpenClaw** — installs openclaw + onboards, wires the molecule
MCP server via openclaw mcp set, starts the gateway on loopback.
Same outbound-tools-only caveat as codex; the in-tree template-
openclaw adapter implements the full sessions.steer push path,
but an external setup would need the same bridge daemon to translate
platform inbox events into sessions.steer calls. Future work.
Default open tab changed from "Claude Code" to "Universal MCP".
Universal MCP is runtime-agnostic and works as a starting point for
any operator regardless of their downstream agent runtime; runtime-
specific tabs are still one click away. Pre-2026-05-03 the modal
defaulted to Claude Code, so operators using non-Claude runtimes
opened to a tab they had to skip past.
Tab order also reorganized:
Universal MCP → Python SDK → Claude Code → Hermes → Codex → OpenClaw → curl → Fields
Each runtime-specific tab is gated on the platform supplying the
snippet (older platform builds without the field don't show empty
tabs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why: the 2026-05-03 SG-missing-port-22 bug was structurally invisible to
local-dev — handleLocalConnect uses docker exec; only handleRemoteConnect
exercises EIC. The CP provisioner shipped without the EIC ingress rule
for ~6 months and nobody noticed until a paying tenant clicked Terminal.
Continuous synth-E2E runs every 20 min; adding this probe means the same
class of regression (CP provisioner ingress, EIC_ENDPOINT_SG_ID env,
handleRemoteConnect chain, SDK source-group support) surfaces within ~20
min of merge instead of waiting for a user report.
What: after Step 7 (workspace online), call
GET /workspaces/$wid/terminal/diagnose for each workspace. The endpoint
already exists in workspace-server (terminal_diagnose.go); it runs the
full EIC + ssh chain from inside the tenant (which has AWS creds via
its IAM profile) and returns {ok, first_failure, steps[]}. We just need
to call it as the tenant — no AWS creds plumbed onto the GHA runner,
no port-forwarding from CI.
Local-docker workspaces (instance_id NULL) hit diagnoseLocal which
probes docker.Ping + container exec; same ok=true contract, so the
probe works on both production paths.
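A sketch of how the probe might interpret the response, assuming only
the {ok, first_failure, steps[]} shape described above. The function
name and message format are illustrative, and nothing is assumed about
the shape of each steps[] entry:

```python
from typing import Any, Mapping, Tuple


def evaluate_diagnose(payload: Mapping[str, Any]) -> Tuple[bool, str]:
    """Reduce a diagnose response to (passed, human-readable reason)."""
    # Anything other than a literal True counts as a failure, so a
    # missing or malformed "ok" field cannot pass by accident.
    if payload.get("ok") is True:
        return True, "terminal path healthy"
    first_failure = payload.get("first_failure") or "unknown step"
    step_count = len(payload.get("steps") or [])
    return False, (
        f"terminal diagnose failed at {first_failure} "
        f"({step_count} steps reported)"
    )
```

Treating a missing `ok` as failure keeps the probe strict: an endpoint
that starts returning an unexpected body fails the E2E run instead of
passing silently.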
This is a partial mitigation for task #269 (eliminate handleLocalConnect
bypass — local must mimic prod terminal path). The architectural fix
(refactor terminal.go so local docker also exercises an EIC-shaped
sequence) remains pending; this PR is the "find out issues earlier"
half of the user's directive.
User feedback: chat-bubble agent text still washed out after #2618 +
#2623. Looked at the actual rendered colors and the issue was Tailwind
Typography's `prose-invert` defaults — body text ships at zinc-300,
which lands at ~5.3:1 against bg-zinc-700. Passes AA but visibly
duller than the user bubble's crisp white-on-blue (~10:1).
Override the prose CSS variables on the agent bubble in dark mode:
- body → zinc-100 (was zinc-300)
- headings / bold → white
- inline code → zinc-100
That brings agent body text to ~13:1 against bg-zinc-700, matching the
user bubble's brightness so both sides of the conversation read at
the same crispness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same bug class as #2622 (ConfirmDialog), but on a more critical surface
— this is the top-of-page banner asking the user to approve / deny a
real workspace permission request.
1. **Deny was a no-op hover.** `bg-surface-card hover:bg-surface-card`
gave zero visual feedback before the user clicked a destructive
action. Now lifts to surface-elevated + brightens the text so the
button visibly responds.
2. **Approve hover went LIGHTER.** `bg-emerald-600 hover:bg-emerald-500`
dropped white-text contrast on hover. Reversed to emerald-700.
3. **No focus rings on either button.** Keyboard users had no way to
tell which decision was focused. Added focus-visible rings
(offset against the dark amber banner bg) — emerald for Approve,
amber for Deny so the choice is unambiguous.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered during code review of the #2623 hotfix audit. Same
regression class as #2618: prose-invert applied where the bubble bg
themes between light/dark, leaving markdown unreadable in one theme.
`MarkdownBody` was unconditionally `prose-invert` — fine for the
outgoing-message bubble (bg-cyan-900, dark in both themes) and the
failure bubble (bg-red-950, dark in both themes), but WRONG for the
incoming-message bubble (bg-surface-card, which themes LIGHT in light
mode). Result: light prose body text on light cream bg = invisible
markdown for incoming peer-to-peer messages in light mode.
Added an `invert: "always" | "dark-only"` prop to MarkdownBody. The
NormalMessage call sites switch on `msg.flow` so each bubble gets the
direction matching its bg's theming behavior. Failure bubble keeps
the default ("always") since red-950 stays dark.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regression from PR #2618 (chat dark-contrast).
PR #2618 switched the agent bubble bg to `dark:bg-zinc-700` so it
visibly elevates against the dark panel — but the inner ReactMarkdown
prose div only got `prose-invert` for USER messages. Result: in dark
mode the agent's markdown text rendered with the Tailwind Typography
plugin's default dark body color on top of the new dark bg = invisible
text. User reported empty-looking gray rectangles where agent replies
should be.
Fix: apply `dark:prose-invert` to agent bubbles so prose body text
flips light alongside the bg. Light mode unchanged (default prose
colors against the warm `bg-surface-card`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues on a high-stakes surface (revoke token, delete workspace,
cascade delete):
1. **Cancel hover was a no-op.** `bg-surface-card hover:bg-surface-card`
gave zero visual feedback on hover. Now hovers to surface-elevated
with a softened border so the button visibly lifts.
2. **Confirm hovers went LIGHTER, dropping white-text contrast.**
`bg-red-600 hover:bg-red-500` made the destructive button less
readable on hover. Same for warning (amber) and primary (accent).
Reversed to hover-darker so contrast holds in both themes.
3. **No focus-visible rings on either button.** Keyboard users had no
indication of focus position (WCAG 2.4.7 fail). Added
`focus-visible:ring-2 focus-visible:ring-accent/40` on Cancel and
`focus-visible:ring-2 focus-visible:ring-offset-2 ...accent/60` on
Confirm so the focused destructive action is unambiguous.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2571 fixed synth-E2E by branching MODEL_SLUG per runtime, but only
the langgraph branch was verified at runtime — hermes / claude-code /
override / fallback had zero automated coverage. A future regression
(e.g. dropping the langgraph case) would silently revert and only
surface as "Could not resolve authentication method" mid-E2E.
This PR:
- Extracts the dispatch into tests/e2e/lib/model_slug.sh as a sourceable
pick_model_slug() function. No behavior change.
- Adds tests/e2e/test_model_slug.sh — 9 assertions across all 5 dispatch
branches plus the override path. Verified to FAIL when any branch is
flipped (manually regressed langgraph slash-form to confirm the test
catches it; restored before commit).
- Wires the unit test into ci.yml's existing shellcheck job (only runs
when tests/e2e/ or scripts/ change). Pure-bash, no live infra.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User screenshot showed pale lavender user bubbles with hard-to-read white
text and a nearly-invisible agent bubble blending into the dark panel.
Root causes:
1. Tailwind v4 defaults `dark:` to `prefers-color-scheme: dark`. Our
ThemeProvider writes `data-theme="dark"` on <html> so user toggle wins
over OS — but `dark:` classes elsewhere in the codebase weren't
tracking it. Added `@custom-variant dark` to re-bind the variant.
2. `bg-accent` themes lighter in dark mode (--color-accent: #6883e8),
dropping white-text contrast to ~3:1 (fails WCAG AA). Switched user
bubble to solid blue-600/500 so it stays ~5:1 in both modes.
3. `bg-surface-card` (#1a1d23) was only ~7% lighter than the panel bg
(#0e1014), making agent bubbles disappear. Bumped to zinc-700 in
dark; light mode keeps the warm surface-card tint.
4. System (error) bubble's /10 overlay was nearly invisible; raised to
/25 in dark with stronger border + ink for readability.
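The contrast figures above can be sanity-checked with the WCAG 2.x
relative-luminance formula. A minimal sketch; the function names are
illustrative, and the hex values are the ones quoted in this message
plus plain white for the text colour:

```python
def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb colour."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio between two colours, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


AA_BODY_TEXT = 4.5  # WCAG AA threshold for normal-size text

white_on_accent = contrast_ratio("#ffffff", "#6883e8")  # dark-mode accent
card_on_panel = contrast_ratio("#1a1d23", "#0e1014")    # surface-card vs panel

print(f"white on accent: {white_on_accent:.2f}:1")
print(f"card vs panel:   {card_on_panel:.2f}:1")
```

White on the dark-mode accent lands below the 4.5:1 AA threshold, and
the card-versus-panel ratio is close to 1:1, which matches the two
symptoms in the screenshot.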
Sub-tab + textarea polish included: low-contrast `text-ink-soft` →
`text-ink-mid`, focus-visible rings on tabs, dark variants on textarea.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat_history query
WHERE workspace_id = $1
AND activity_type = 'a2a_receive'
AND (source_id = $2 OR target_id = $2)
ORDER BY created_at DESC
forces a workspace-scoped scan-and-filter on every call:
idx_activity_ws_type_time covers the workspace_id+type prefix, but the
(source OR target) clause then walks every row for that workspace. Demo
workspaces (≤50 rows) don't notice; production workspaces accumulate
thousands of rows over months and chat_history latency grows linearly.
Adds two partial btree indexes: (workspace_id, source_id) WHERE
source_id IS NOT NULL and (workspace_id, target_id) WHERE target_id IS
NOT NULL. Postgres BitmapOrs them into a workspace-scoped BitmapAnd
against the existing index, dropping chat_history from
O(workspace_rows) to O(peer_a2a_rows).
Partial WHERE NOT NULL because most activity rows (heartbeats,
agent_log, memory_write, etc.) carry NULL source_id/target_id and
shouldn't bloat the index.
Anti-pattern caveat (per the issue): a single compound (a, b) index
can't serve 'a OR b' — Postgres only uses compound for prefix match.
Two separate indexes + BitmapOr is the right shape.
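The index shape can be sketched as follows. This uses Python's bundled
SQLite as a stand-in so the example is self-contained; the real
migration targets Postgres, SQLite's planner does not use BitmapOr, and
the table name, index names, and sample rows are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE activity (              -- hypothetical table name
        workspace_id  TEXT NOT NULL,
        activity_type TEXT NOT NULL,
        source_id     TEXT,
        target_id     TEXT,
        created_at    INTEGER NOT NULL
    );
    -- One partial index per side of the OR; NULL rows stay out of both.
    CREATE INDEX idx_activity_ws_source ON activity (workspace_id, source_id)
        WHERE source_id IS NOT NULL;
    CREATE INDEX idx_activity_ws_target ON activity (workspace_id, target_id)
        WHERE target_id IS NOT NULL;
    """
)

conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?, ?)",
    [
        ("ws1", "a2a_receive", "peer-a", None, 1),
        ("ws1", "a2a_receive", None, "peer-a", 2),
        ("ws1", "a2a_receive", "peer-b", None, 3),
        ("ws1", "heartbeat", None, None, 4),      # NULL ids: not indexed
        ("ws2", "a2a_receive", "peer-a", None, 5),
    ],
)


def chat_history(workspace_id: str, peer_id: str) -> list:
    """Return created_at values for a2a rows where peer_id is sender or recipient."""
    cur = conn.execute(
        """
        SELECT created_at FROM activity
        WHERE workspace_id = ?
          AND activity_type = 'a2a_receive'
          AND (source_id = ? OR target_id = ?)
        ORDER BY created_at DESC
        """,
        (workspace_id, peer_id, peer_id),
    )
    return [row[0] for row in cur.fetchall()]


print(chat_history("ws1", "peer-a"))
```

The query text is unchanged from the one above; only the available
indexes differ, which is why the change needs no application code.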
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Top-of-canvas Toolbar had multiple low-contrast surfaces in light theme:
Action buttons (Stop All, Restart Pending):
- bg-red-950/50 + bg-amber-950/40 → bg-bad/10 + bg-warm/10 with bg-bad/40
+ bg-warm/40 borders. Dark-tinted backgrounds with /40-/50 alpha render
as nearly invisible smudges on warm-paper; semantic tokens at /10 give
a clear pale-bad / pale-warm tint that scales correctly in dark mode.
- Both gain focus-visible:ring-2 focus-visible:ring-{bad,warm}/40.
Toggle button (A2A edges):
- Active state: bg-blue-950/50 → bg-accent/15 (themes correctly).
- Inactive state: bg-surface-card/50 + text-ink-soft → solid bg-surface-card
+ text-ink-mid; hover bumps to text-ink. Drops the redundant
"hover:bg-surface-card/50" identity hover.
Icon buttons (Audit, Search, Help):
- Same pattern as toggle inactive: solid bg-surface-card + text-ink-mid +
text-ink hover, with focus-visible:ring-2 focus-visible:ring-accent/40.
Workspace count + bullet separator:
- text-ink-soft (3.5:1 on warm-paper) → text-ink-mid (7:1).
WS connection status:
- "Live": text-ink-soft → text-ink-mid (paired with the green dot).
- "Reconnecting": text-ink-soft → text-warm (semantic match for amber dot).
- "Offline": text-ink-soft → text-bad (semantic match for red dot).
Status text now reinforces the dot colour instead of disappearing on
light surfaces.
Help popover:
- Close button: text-ink-soft → text-ink-mid + focus-visible:underline.
- HelpRow body text: text-ink-soft → text-ink-mid (was 3.5:1 on the
bg-surface-sunken/45 popover row — failed AA for body text).
Defense-in-depth follow-up to #2481 (peer_id trust-boundary gate).
Same XML-attribute injection vector applies to the four other meta
fields rendered as agent-context attrs in the <channel> tag:
<channel kind="..." method="..." activity_id="..." ts="..." source="molecule">
Each field is now passed through a closed-set / shape-validate gate:
- kind → frozenset {canvas_user, peer_agent} via _safe_meta_field
- method → frozenset {message/send, tasks/send, tasks/get, notify, ""}
- activity_id → UUID-shape regex via _safe_activity_id
- ts → ISO-8601 RFC3339 regex via _safe_ts
Any value outside the allowed shape is replaced with empty string.
Today the values come from a platform-DB column so they're trusted,
but "trust the source" was the same assumption that got peer_id into
trouble (#2481). Closed-enum allowlists make this row-content-blind.
5 new tests mirroring test_envelope_enrichment_strips_path_traversal_peer_id:
- test_envelope_strips_unknown_kind — kind injection stripped
- test_envelope_strips_unknown_method — method injection stripped
- test_envelope_strips_malformed_activity_id — non-UUID stripped
- test_envelope_strips_malformed_ts — non-ISO8601 stripped
- test_envelope_keeps_valid_meta_fields_unchanged — happy-path negative case
Mutation-tested: temporarily making _safe_meta_field permissive kills
both kind/method strip tests with the injection payload reflecting
into the meta dict, confirming the gate is what blocks them.
Two existing tests updated to use UUID-shaped activity_ids ("act-7",
"act-bridge-test" → real UUIDs) since the gate strips synthetic ids.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups from the multi-axis review of #2474:
1. **Docstring inversion** in tool_chat_history. The doc said
'(source_id=peer)' meant 'this workspace is the sender' — actually
it means the *peer* is the sender (source_id is where the activity
came FROM). Reframed to 'where the peer is either the sender or
the recipient' to match the underlying SQL semantics.
2. **Empty-history test**. TestChatHistory had 10 tests but no
200+[] happy-path pin. Added test_empty_history_returns_empty_json_list
asserting result == '[]' on exact-equality (per assert-exact
memory — substring '[]' would match envelope shapes too).
Both changes are pure docs+tests — no behaviour change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Chat bubble fixes (canvas/src/components/tabs/ChatTab.tsx):
- User bubble: bg-accent-strong/30 + text-blue-100 → bg-accent + text-white
(translucent dark-blue overlay on warm-paper surface read as pale lavender
with near-invisible light-blue text — a real WCAG AA failure on the
highest-traffic surface in canvas).
- System/error bubble: bg-red-900/30 + text-red-200 → bg-bad/10 + text-bad,
using semantic tokens so dark-mode adapts automatically.
- Agent bubble: drop /80 + /30 opacity modifiers; solid bg-surface-card +
text-ink + border-line gives consistent contrast in both themes.
- prose-invert was unconditional, so markdown text on agent/system bubbles
rendered as light text on a light surface in light mode. Make it apply
only on the user bubble (the only dark surface in this component).
- Timestamp: text-ink-soft is too pale on warm-paper; use text-ink-mid for
agent/system, white/70 for user (visible on the now-solid blue bg).
Sub-tab bar fixes (canvas/src/components/SidePanel.tsx):
- Right-edge fade was hardcoded `from-zinc-950` — that paints a dark vertical
strip on the right edge of the tab bar in light mode. Switch to
`from-surface` so the gradient blends into whichever theme is active.
- Inactive tab text: text-ink-soft (~3.5:1 on warm-paper) → text-ink-mid
(~7:1). Active tab background: drop the /40 opacity so the selection is
unambiguous on either surface.
No semantic-token additions; all changes use existing warm-paper tokens
that already work in both themes.
The two missing branch tests called out by the multi-axis review of #2471.
a2a_client.enrich_peer_metadata handles two failure shapes (lines 105-112)
that the existing 12 envelope-enrichment tests don't exercise:
1. HTTP 200, response.json() raises (non-JSON body)
2. HTTP 200, valid JSON, but body is list/string/number not dict
Both paths land at the negative-cache write, but no test verified the
discriminator. Pin both with the same call_count == 1 assertion shape
the 5xx + network-exception tests already use.
Verified: temporarily removing the negative-cache write in either
branch makes the corresponding test fail with call_count == 2 — the
assertion correctly discriminates the contract from a fall-through.
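The contract being pinned can be sketched as below. Everything here is
a hypothetical stand-in written for illustration (class names, fake
response, counting fetcher); it is not the real a2a_client code, only
the same negative-cache behaviour in miniature:

```python
from typing import Any, Callable, Dict, Optional


class PeerMetadataEnricher:
    """Hypothetical enrichment path with a negative cache."""

    def __init__(self, fetch: Callable[[str], Any]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Optional[dict]] = {}

    def enrich(self, peer_id: str) -> Optional[dict]:
        if peer_id in self._cache:          # hit, including cached failures
            return self._cache[peer_id]
        metadata: Optional[dict] = None
        try:
            response = self._fetch(peer_id)
            if response.status_code == 200:
                body = response.json()      # raises ValueError on a non-JSON body
                if isinstance(body, dict):  # list/string/number bodies are rejected
                    metadata = body
        except Exception:
            metadata = None
        # Negative-cache write: failures are stored too, so the next call
        # for this peer does not trigger another fetch.
        self._cache[peer_id] = metadata
        return metadata


class FakeResponse:
    """Response double; pass an Exception as body to make json() raise."""

    def __init__(self, status_code: int, body: Any) -> None:
        self.status_code = status_code
        self._body = body

    def json(self) -> Any:
        if isinstance(self._body, Exception):
            raise self._body
        return self._body


class CountingFetch:
    """Fetcher double that records how many times it was called."""

    def __init__(self, response: FakeResponse) -> None:
        self.response = response
        self.call_count = 0

    def __call__(self, peer_id: str) -> FakeResponse:
        self.call_count += 1
        return self.response


fetch = CountingFetch(FakeResponse(200, ValueError("not JSON")))
enricher = PeerMetadataEnricher(fetch)
enricher.enrich("peer-1")
enricher.enrich("peer-1")
print(fetch.call_count)
```

If the cache write is removed, the second `enrich` call fetches again
and the count becomes 2, which is the discriminator the tests rely on.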
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The auto-promote ↔ auto-sync chain has been generating empty PRs
indefinitely since the staging merge_queue ruleset uses MERGE
strategy:
1. Auto-promote merges PR via queue → main = merge commit M2 not in staging
2. Auto-sync opens sync-back PR. Workflow's local `git merge --ff-only`
succeeds (PR title even says "ff to ..."), but the queue lands the
PR via MERGE → staging = merge commit S2 not in main
3. Auto-promote sees staging ahead by 1 → opens new promote PR. Tree
diff vs main = 0 (S2's tree == main's tree). But the gate logic
only checks "all required workflows green", not "actual code to
ship" → opens an empty promote PR
4. ... repeat indefinitely
Each round costs ~30-40 min wallclock, ~2 manual approvals (the queue
requires 1 review and the bot can't self-approve without admin
bypass), and one full CodeQL Go run (~15 min).
Observed today (2026-05-03) across PRs #2592 → #2594 → #2595 → #2596
→ #2597 — 5 PRs, ~3 hours, all empty content.
Fix: before opening the promote PR, check that staging's tree
actually differs from main's tree. If they're identical (the
empty-merge-commit cycle), skip cleanly and let the cycle terminate.
Implementation:
- New step `Skip if staging tree == main tree` runs before the
existing gate check.
- `git diff --quiet origin/main $HEAD_SHA` exits 0 iff trees match.
- On match: emits a step summary explaining the skip + sets
`skip=true`; subsequent gate-check + promote steps are gated on
`skip != 'true'` so they short-circuit.
- Fail-open: if `git fetch` errors, fall through to gate check
(preserve existing behavior). Only skip when diff is DEFINITIVELY
empty.
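The decision logic of the guard, sketched in Python for clarity. The
actual step lives in the workflow; the function names and the
injectable runner are assumptions made so the sketch can be exercised
without a repository:

```python
import subprocess
from typing import Callable, Sequence

GitRunner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Run git with the given arguments and return its exit code."""
    return subprocess.run(["git", *args], check=False).returncode


def should_skip_promote(head_sha: str, run: GitRunner = run_git) -> bool:
    """Return True only when staging's tree is definitively equal to main's."""
    # Fail-open: if the fetch fails we cannot trust the comparison, so
    # fall through to the existing gate check instead of skipping.
    if run(["fetch", "origin", "main"]) != 0:
        return False
    # `git diff --quiet` exits 0 when there is no difference and 1 when
    # there is one; any other code is an error and must not trigger a skip.
    return run(["diff", "--quiet", "origin/main", head_sha]) == 0
```

Comparing against exactly 0, rather than treating "not 1" as a match,
is what keeps a git error from being mistaken for an empty diff.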
Long-term, the cleaner fix is to switch the merge_queue ruleset's
merge_method away from MERGE so FF-able PRs land cleanly without a
new commit — but that's a broader change affecting every staging
PR's commit shape. This guard is the surgical one-step break.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>