Prior CI failures on this PR were infra-class (Detect changes hit
'Error: ENOSPC: no space left on device' from runner disk-full caused
by 120 zombie tasks since drained; Python Lint flaked on perf test
test_batch_fetcher_runs_submitted_rows_concurrently by 3ms under
contended runners — same test passes cleanly on main HEAD 1b0e947).
Re-firing CI on recovered runners; no code change. [no-op]
CI / all-required (pull_request) Compensating status: emitter dropped CI / all-required ctx on this commit (gitea 1.22.6 null-state). 2 non-author APPROVEs present, sop-checklist/sop-tier-check/gate-check-v3 all Successful per status descriptions. Posted per feedback_gitea_emitter_null_state_blocks_merge.
## Summary
- mc#1535 fixed the per-session-overwrite bug in the Universal MCP
snippet (`claude mcp add molecule -s user` keyed by `molecule`, so
installing for a second workspace silently replaced the first). The
same equivalence-class bug exists in EVERY other runtime tab the
Canvas modal renders: each MCP host keys its config by name, and all
five templates hardcoded a fixed `molecule` identifier.
- This PR extends mc#1535's existing `{{MCP_SERVER_NAME}}` placeholder
+ `mcpServerNameForWorkspace()` helper into the 4 remaining
templates so the Canvas snippet a user pastes is unique per
workspace by construction across ALL runtime tabs — multi-workspace
works out-of-the-box with no per-host workarounds.
## Bug shape per runtime tab (mc#1535 sibling)
- **codex** (`~/.codex/config.toml`): `[mcp_servers.molecule]` — TOML
rejects duplicate table keys, so re-paste either breaks parsing or
overwrites.
- **openclaw** (`~/.openclaw/mcp/molecule.json`): `openclaw mcp set
molecule` keyed by name — second workspace overwrites.
- **hermes** (`~/.hermes/config.yaml`): `plugin_platforms.molecule:` —
YAML rejects duplicate mapping keys, second workspace silently
collapses.
- **kimi** (`~/.molecule-ai/kimi-workspace/`): single per-host dir —
second workspace's env+bridge.py overwrites the first.
## What changed
- `workspace-server/internal/handlers/external_connection.go`:
- 4 templates now stamp `{{MCP_SERVER_NAME}}` (the same slug
mc#1535 already derives + plumbs into the universal_mcp snippet)
in the keyed identifier:
- codex: `[mcp_servers.{{MCP_SERVER_NAME}}]` + `.env` table.
- openclaw: `openclaw mcp set {{MCP_SERVER_NAME}}` + log path.
- hermes: `plugin_platforms.{{MCP_SERVER_NAME}}:`.
- kimi: `~/.molecule-ai/kimi-{{MCP_SERVER_NAME}}/` dir + embedded
python `ENV` path.
- Header comment in each template documents the multi-workspace
contract (mirrors mc#1535's universal_mcp header).
- `workspace-server/internal/handlers/external_rotate_test.go`:
- New `TestBuildExternalConnectionPayload_AllRuntimeSnippetsAreWorkspaceUnique`
pins the per-template literal that proves the slug was stamped,
AND asserts no template leaves a literal `{{MCP_SERVER_NAME}}`
placeholder — catches a future template author who forgets to
register a new tab with the stamp pipeline.
- `workspace/a2a_mcp_server.py`:
- Comment-only update on `serverInfo.name` to reflect that the
per-host registration name is workspace-specific. No code change;
`serverInfo.name` stays the generic `"molecule"` self-label.
- `scripts/build_runtime_package.py` (PyPI README generator):
- Updates 3 `claude mcp add molecule -- molecule-mcp` references to
`claude mcp add molecule-<workspace-slug> -- molecule-mcp` so the
PyPI README matches the Canvas-stamped snippet pattern.
- Adds a "Server name in `claude mcp add` is workspace-specific"
bullet pointing at mc#1535 + this PR for context.
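
The stamp-and-verify idea behind the new test can be sketched in Python. This is a hedged illustration only: the real templates and test are Go, and the abbreviated template strings plus the `stamp` and `check_all_stamped` helpers below are hypothetical stand-ins, not the repo's code.

```python
PLACEHOLDER = "{{MCP_SERVER_NAME}}"

# Hypothetical, abbreviated stand-ins for the four runtime templates.
TEMPLATES = {
    "codex": "[mcp_servers.{{MCP_SERVER_NAME}}]\n[mcp_servers.{{MCP_SERVER_NAME}}.env]",
    "openclaw": "openclaw mcp set {{MCP_SERVER_NAME}}",
    "hermes": "plugin_platforms:\n  {{MCP_SERVER_NAME}}:\n    enabled: true",
    "kimi": "~/.molecule-ai/kimi-{{MCP_SERVER_NAME}}/",
}


def stamp(template, server_name):
    """Replace every placeholder occurrence with the per-workspace name."""
    return template.replace(PLACEHOLDER, server_name)


def check_all_stamped(server_name):
    """Stamp every tab and fail if a placeholder survives or the name is absent."""
    stamped = {}
    for tab, template in TEMPLATES.items():
        snippet = stamp(template, server_name)
        if PLACEHOLDER in snippet:
            raise AssertionError(tab + ": literal placeholder left in snippet")
        if server_name not in snippet:
            raise AssertionError(tab + ": workspace-specific name missing")
        stamped[tab] = snippet
    return stamped
```

Checking for a surviving placeholder, rather than only for the expected literal, is what lets a test of this shape catch a newly added tab that was never wired into the stamp pipeline.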
## Open-source-templates cleanliness check
- Templates touched here live in the PRIVATE molecule-core repo
(Canvas modal generator); they STAMP per-workspace server names but
do NOT bake any new `git.moleculesai.app` literal or other
org-internal infra. Generic `pip install
'git+https://git.moleculesai.app/molecule-ai/hermes-channel-molecule.git'`
in the hermes template is the only such URL touched and was
pre-existing — that one points at a public hermes-side plugin and
has its own canonical URL; not in scope for the open-source-template
rule (the rule applies to template-codex/template-hermes/
template-openclaw — separate public repos, untouched here).
- No `.moleculesai.app` literal added; persona-token shape unchanged
(auth_token still per-workspace minted by Rotate/Create — same path
mc#1535 audited).
## Sample stamped snippets (workspace name "my-bot", slug "molecule-my-bot")
- codex: `[mcp_servers.molecule-my-bot]` + `[mcp_servers.molecule-my-bot.env]`
- openclaw: `openclaw mcp set molecule-my-bot "$(cat <<EOF ... )"`
- hermes: `plugin_platforms:\n molecule-my-bot:\n enabled: true`
- kimi: `~/.molecule-ai/kimi-molecule-my-bot/{env,kimi_bridge.py}`
## Diff size
- 4 files, +135/-40 LoC. Most of it is comment text + the new test.
- Did NOT change `BuildExternalConnectionPayload` signature or
`mcpServerNameForWorkspace` semantics — both were already plumbed
by mc#1535 to all 8 snippets via the stamp closure; this PR only
updates the template text to USE the placeholder.
## Test plan
- [x] `go test ./internal/handlers/ -run TestBuildExternalConnectionPayload` — 5/5 green, including new `_AllRuntimeSnippetsAreWorkspaceUnique`.
- [x] `go test ./internal/handlers/` full package — 15.9s green.
- [x] `go vet ./internal/handlers/` — clean.
- [ ] Manual (post-merge, requires mc#1535 also merged): create two
external workspaces, "bot-a" and "bot-b", on staging; paste each
tab's snippet into the corresponding host on a single machine;
verify `claude mcp list` / `cat ~/.codex/config.toml` /
`openclaw mcp list` / `cat ~/.hermes/config.yaml` / `ls
~/.molecule-ai/` each shows BOTH workspaces' entries side by
side, not overwriting.
## Sequencing
- This PR's base is mc#1535's branch
(`fix/add-to-claude-code-unique-server-name-per-workspace`),
because it reuses mc#1535's `{{MCP_SERVER_NAME}}` placeholder +
slug helper + `BuildExternalConnectionPayload(workspaceName)`
signature change. Will need a rebase on main after mc#1535 lands;
prefer to keep stacked to make the review of EACH PR scope-tight.
- CTO 2026-05-18 22:43Z: "Actually, we didn't do the instructions properly; this needs to be filled in."
This PR is the consolidated per-repo doc/generator fix.
## Related
- Sibling: mc#1535 (Universal MCP snippet, already open).
- Follow-up #230: molecule-core stale channel-install mentions
(CONTRIBUTING.md:195, etc.) — separate scope.
Author identity: core-devops (per-role persona; not founder-PAT).
Opened for non-author review, NOT auto-merged.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Universal MCP install snippet hardcoded `claude mcp add molecule -s user`
— `claude mcp add` keys entries by name, so installing for workspace B
silently overwrote workspace A in the user's ~/.claude.json. A single
external Claude Code session ended up able to talk to only ONE molecule
workspace at a time — the CTO-observed "this is per-session" UX
(2026-05-18 22:28Z). MCP itself supports many servers per session; the
install snippet was the only thing standing in the way.
Fix: derive a unique server name per workspace at payload-build time —
`molecule-<slug>` where slug = lowercased/hyphen-collapsed workspace
name (max 24 chars), falling back to the first 8 chars of the workspace
ID when the name is empty or slugifies to nothing. The result is
alphanumeric + hyphens only (URL-safe + Claude-Code-name-safe).
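
A minimal Python sketch of the naming rule described above. The actual helper is the Go function `mcpServerNameForWorkspace`; the function name, regex, and truncation order here are assumptions, including the assumption that the 24-character cap applies to the slug rather than to the whole `molecule-<slug>` name.

```python
import re

MAX_SLUG_LEN = 24  # cap stated in the commit message; assumed to cover the slug only


def mcp_server_name(workspace_name, workspace_id):
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", workspace_name.lower()).strip("-")
    # Truncate, then strip again in case the cut landed on a hyphen.
    slug = slug[:MAX_SLUG_LEN].strip("-")
    if not slug:
        # Name was empty or slugified to nothing: fall back to the ID prefix.
        slug = workspace_id[:8]
    return "molecule-" + slug
```

Under these assumptions a workspace named "my-bot" yields `molecule-my-bot`, and an unnamed workspace whose ID starts with `12345678` yields `molecule-12345678`, matching the sample lines in this message.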
Plumbed through all 3 callers of BuildExternalConnectionPayload:
- Create (workspace.go) passes payload.Name directly.
- Rotate / GetExternalConnection (external_rotate.go) extend the
existing runtime lookup to also SELECT name in the same round-trip
(lookupWorkspaceRuntimeAndName replaces lookupWorkspaceRuntime —
one query, no extra DB load).
Snippet header now documents the multi-workspace contract: re-running
the snippet from another workspace's modal ADDS another entry; same-
name workspaces collide by design, rename one to disambiguate.
Surgical: only externalUniversalMcpTemplate gained a {{MCP_SERVER_NAME}}
placeholder. Other tabs (Python SDK / curl / Hermes / codex / openclaw /
kimi) already use distinct config keys per provider and aren't affected.
Tests: TestBuildExternalConnectionPayload_McpServerNameUniquePerWorkspace
pins 4 cases (plain name, name w/ spaces+caps, name w/ symbols, empty
name fallback to UUID prefix) — would have caught the original
"claude mcp add molecule" regression. Existing rotate/get tests updated
for the 2-column SELECT.
Related: task #229 (molecule-mcp-claude-channel install-doc blockers).
This is the canvas-side counterpart — that PR fixed the plugin docs,
this PR fixes the modal-generator snippet operators actually copy.
Sample generated lines (was → now):
was: claude mcp add molecule -s user -- env WORKSPACE_ID=... molecule-mcp
now: claude mcp add molecule-my-bot -s user -- env WORKSPACE_ID=... molecule-mcp
(where "my-bot" is the workspace name; "molecule-12345678" if unnamed)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds workspace-server/internal/provisioner/t4_privilege_contract.go as the
single source of truth for the T4 ("full machine access") capability set
that template-repo CI workflows currently re-implement as bespoke shell.
Today's t4-conformance gates in template-claude-code / template-hermes /
template-codex each hand-assert agent-uid + token-ownership + host-root
reach. The shell drifts (the Hermes 401 bug class itself came from drift),
and there's no way to add a new capability fleet-wide without N template
PRs.
This contract:
* Defines T4Capability as code (Name/Description/Probe/Severity/Source)
* Lists the closure: agent_uid_1000, auth_token_agent_owned,
host_root_reach_via_nsenter, host_fs_write_readback,
docker_socket_reachable, list_peers_http_200, agent_home_writable,
network_egress_https, privileged_flag_observable, pid_host_visible
* Renders to YAML via AsYAML() and cmd/t4-contract-dump so any
template CI can do:
go run ./workspace-server/cmd/t4-contract-dump > t4_capabilities.yaml
and iterate capabilities — new capabilities propagate without
per-template PRs.
* Pure stdlib + no Molecule-AI-internal deps so fork users can adopt
the same contract.
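
As a rough Python sketch of the "capabilities as data, rendered to deterministic YAML" idea (the real contract is Go; the `Capability` class, `as_yaml` function, output layout, and sort-by-name ordering are assumptions made for illustration, and the probe strings in any usage are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    name: str
    description: str
    probe: str      # shell command a template CI job would run (illustrative)
    severity: str   # "hard" appears in the commit message; other values are assumed
    source: str


def quote(value):
    # Render a YAML double-quoted scalar: escape backslashes, then double quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def as_yaml(caps):
    names = [c.name for c in caps]
    if len(names) != len(set(names)):
        raise ValueError("capability names must be unique")
    lines = ["capabilities:"]
    # Sorting by name makes the output independent of input order.
    for cap in sorted(caps, key=lambda c: c.name):
        lines.append("  - name: " + quote(cap.name))
        for field in ("description", "probe", "severity", "source"):
            lines.append("    " + field + ": " + quote(getattr(cap, field)))
    return "\n".join(lines) + "\n"
```

A template CI job could then loop over the rendered entries and run each probe, so adding a capability to the list needs no per-template change.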
Anti-drift unit tests (7, all green):
- all caps have required fields
- names unique
- core closure (RFC#456 + task #128/#174) is present
- hard-severity is strict majority
- YAML is deterministic + escapes double quotes
- YAML header cites internal#456
- AgentUID const consistent with probes
Does NOT change Docker/Dockerfile or any existing emit-side behavior;
this is purely additive. The provisioner.go T4 branch is unchanged.
Templates adopt the YAML in a separate PR (pilot:
template-claude-code).
Refs: RFC internal#456, task #174, memory
reference_per_template_privilege_contract_class_audit_2026_05_16,
memory feedback_hermes_listpeers_401_token_root600_unreadable_by_uid1000.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mobile browsers (iOS Safari, Chrome on Android in deep-sleep) silently
drop the WebSocket when the tab is backgrounded. The in-page `onclose`
fires very late or never, so the reconnect backoff never schedules — the
canvas appears frozen until the user manually refreshes. Symptoms:
- #223 mobile canvas chat has no real-time updates (must refresh)
- #228 cross-device: user's own chat input doesn't broadcast to
other sessions in real time (must refresh)
Root cause: `canvas/src/store/socket.ts` had no visibility-wake. The
reconnect loop only re-arms on `onclose`, and mobile OSes don't always
fire `onclose` when they kill the WS.
Fix:
- Add `ReconnectingSocket.wake()` — forces an immediate reconnect
when the socket is in CLOSED / CLOSING / null limbo, no-op when
OPEN or CONNECTING. Pre-empts any pending backoff timer and resets
the attempt counter (this was a user-initiated wake, not an
unattended-tab failure cascade).
- Wire a module-level `visibilitychange` + `pageshow` listener inside
`connectSocket()`; remove it in `disconnectSocket()`. `pageshow`
covers Safari's bfcache restore where `visibilitychange` doesn't
fire on its own.
- Export `wakeSocket()` so the test suite can exercise the path
without depending on a jsdom DOM (the existing socket.test.ts
runs under the `node` environment).
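
The wake rules above can be modelled as a small state machine. This Python sketch is only a model of the described behavior: the real implementation is TypeScript in `canvas/src/store/socket.ts`, and the class, field, and method names below are assumptions.

```python
from enum import Enum


class State(Enum):
    CONNECTING = 0
    OPEN = 1
    CLOSING = 2
    CLOSED = 3


class ReconnectModel:
    def __init__(self):
        self.state = None           # None models the "null limbo" case
        self.attempts = 0           # backoff attempt counter
        self.pending_timer = False  # True while a backoff timer is scheduled
        self.connects = 0           # number of sockets created so far
        self.disposed = False       # set by disconnect()

    def connect(self):
        self.state = State.CONNECTING
        self.connects += 1

    def disconnect(self):
        self.disposed = True
        self.pending_timer = False
        self.state = None

    def wake(self):
        """Return True if a new connection attempt was started."""
        if self.disposed:
            return False  # no zombie reconnect after an explicit disconnect
        if self.state in (State.OPEN, State.CONNECTING):
            return False  # healthy or already handshaking: nothing to do
        self.pending_timer = False  # pre-empt any scheduled backoff
        self.attempts = 0           # user-initiated wake resets the backoff
        self.connect()
        return True
```

Each branch corresponds to one of the five test cases listed below.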
Tests (5 new cases under `wakeSocket → reconnect`):
- wake on OPEN: no new WS
- wake on CLOSED: new WS created (the #223 fix)
- wake on CONNECTING: no extra handshake piled on
- wake cancels pending backoff `setTimeout`
- wake after `disconnectSocket()` is a no-op (no zombie)
Closes #223
Closes #228
iOS Safari and PWAs auto-zoom the viewport when a focused input or
textarea has a computed font-size below 16px. Two mobile-canvas inputs
were below that bound, causing the layout to jump and look broken on
focus until the user pinched back:
- MobileSpawn.tsx agent-name input (fontSize: 13.5) — #225
- MobileChat.tsx composer textarea (fontSize: 14.5) — #224
Both bumped to 16px (the minimum that suppresses focus-zoom). This is
the same class of bug as desktop #1434, scoped here to the mobile
breakpoint.
Tests:
- MobileSpawn.test: assert agent-name input renders at fontSize >= 16
- MobileChat.test: assert composer textarea renders at fontSize >= 16
Both parse the inline style.fontSize (jsdom has no layout engine, so
getComputedStyle reports the inline value verbatim).
Closes #224
Closes #225
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
The new prod-team personas (agent-dev-a, agent-dev-b, agent-pm) ship
only `token` + `universal-auth.env` (Infisical UA bootstrap), no `env`
file. loadPersonaEnvFile silently no-ops on them today. With this
fallback, GITEA_TOKEN/USER/EMAIL get populated from the token file
when no env file exists.
Combined with the GIT_ASKPASS injection earlier in this PR, this
makes the askpass helper functional for the new personas.
Wire container-side `git` HTTPS authentication to the persona credentials
that already arrive via workspace_secrets (GITEA_USER / GITEA_TOKEN,
GIT_HTTP_USERNAME / GIT_HTTP_PASSWORD) without mutating ~/.gitconfig or
~/.git-credentials inside the container.
Mechanism:
1. New generic GIT_ASKPASS helper baked into the workspace runtime
image at /usr/local/bin/molecule-askpass. Script body is hostname-
free and vendor-neutral — the deployer decides which remote the
credentials apply to by virtue of populating the env vars.
2. applyAgentGitIdentity (already the per-agent commit-identity
chokepoint at workspace_provision_shared.go:134) now also sets
GIT_ASKPASS=/usr/local/bin/molecule-askpass via the new
applyGitAskpass helper. Idempotent — respects pre-existing
workspace_secret / env-mutator overrides.
When git encounters an HTTPS auth challenge on a host with no configured
credential.helper, it runs the GIT_ASKPASS program once per prompt, and the
helper answers with the username or password read from env. This keeps the
wire-up minimal: no on-disk credential files, no hostname literals in code,
fail-loud on misconfiguration.
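
For illustration, the helper's core logic might look like the Python sketch below. The shipped helper is a shell script; the function name and the precedence of GIT_HTTP_* over GITEA_* are assumptions. What is not an assumption is the calling convention: git passes the prompt text (for example `Username for 'https://host': `) as the first argument and reads the answer from stdout.

```python
def answer_prompt(prompt, env):
    """Pick the credential for a git askpass prompt, or fail loudly."""
    lowered = prompt.lower()
    if lowered.startswith("username"):
        # Assumed precedence: generic GIT_HTTP_* first, then GITEA_*.
        value = env.get("GIT_HTTP_USERNAME") or env.get("GITEA_USER")
    elif lowered.startswith("password"):
        value = env.get("GIT_HTTP_PASSWORD") or env.get("GITEA_TOKEN")
    else:
        value = None
    if not value:
        # Fail loudly instead of printing an empty credential.
        raise LookupError("no credential in env for prompt: " + repr(prompt))
    return value
```

A wrapper script would call `answer_prompt(sys.argv[1], os.environ)` and print the result. Nothing in the function names a host, which is how the script stays vendor-neutral.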
Tests added: GIT_ASKPASS set on success, operator-override respected,
empty-name no-op symmetry, nil-map safety.
Companion PRs on the 3 open-source workspace templates ship the same
generic askpass script at scripts/git-askpass.sh → identical install
path. Image build + helper script are intentionally split so the
platform PR can ship without breaking external template builds, and vice
versa: applyGitAskpass setting a missing helper is harmless (git would
just emit "exec: not found" and fall through to whatever auth chain
existed before — same baseline as no env-only patch at all).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to PR #1504 (role=alert on ConfigTab error divs) — the
AgentAbilitiesSection error div was in a separate render branch and
was missed. WCAG 4.1.3 requires dynamic error messages to be announced
by screen readers immediately.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
SEV-1 #1413 follow-up: sop-tier-check.yml uses
${{ secrets.SOP_TIER_CHECK_TOKEN }} but lacked secrets:read
permission. Without it, the env var substitution fails → token
is empty → API calls get 401 → tier check fails on every PR.
Same fix applied to qa-review/security-review/sop-checklist in PR #1498.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WCAG 4.1.3: two error divs in ConfigTab.tsx used text-bad styling
without declaring themselves as live regions. Screen readers miss
the error announcement.
Fix: add role="alert" aria-live="assertive" to both error divs,
matching the pattern applied in PRs #1463/#1465 by core-uiux for
other tab components.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add focus-visible ring to three buttons missing it:
- Mobile hydration error Retry button
- Desktop hydration error Retry button
- PlatformDownDiagnostic Reload button
- Wrap <Canvas /> in <main aria-label="Agent canvas"> landmark
(WCAG 1.3.1 — main content now has a proper landmark)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- AgentCommsPanel: add focus-visible ring + aria-label to Retry button
(error state). Add focus-visible to CommsTab tab buttons.
- AttachmentViews: add focus-visible ring + aria-label to Remove button
(PendingAttachmentPill) and Download button (AttachmentChip).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Refresh button inside the SecretsTab error state had no focus ring
defined in CSS. Without it, keyboard-only users cannot determine which
element has focus on that error screen.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The free-text model input (shown when /templates returns no models for
the runtime) had a visual <label>Model</label> but the input lacked an
id and the label lacked htmlFor — the association was purely visual.
Added aria-label="Model" to make the name programmatically determinable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The two FilesTab confirm dialogs (delete-all, delete-one) use role="alertdialog"
but were missing aria-modal. These are inline in-page prompts without focus
trapping — aria-modal="false" explicitly documents the non-modal nature so
assistive technology knows the rest of the page remains interactive.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MobileHome: spawn FAB had no focus indicator — added emerald ring.
MobileMe: accent color swatches (all 8 colors) and theme toggle buttons
(Dark / Light / System) had no focus indicators — added emerald ring.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MobileCanvas: reset zoom button had no focus indicator — added
focus-visible:ring-2 with emerald-500 ring (consistent with other
mobile interactive elements in the same branch).
MobileComms: filter toggle buttons (All / Errors) had no focus indicator
— added focus-visible:ring-2 with emerald-500 ring.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MobileChat: composer textarea had no aria-label — added aria-label="Message".
MobileSpawn: name input had no programmatic label — added aria-label="Agent name".
Both inputs had visible text labels above them but no accessible-name association,
violating WCAG 1.3.1 (info/structure relationships).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "Add new" section had two bare <input> elements with only
placeholder text. Added aria-label="Secret key name" and
aria-label="Secret value" — distinct from the per-row Field
inputs that PR #1453 already fixed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- MissingKeysModal.tsx: Add aria-label to both password inputs
(inside map loops where entry.key is the accessible name source).
WCAG 1.3.1 / 4.1.2.
- AuditTrailPanel.tsx: Add role="status" aria-live="polite" to
the loading state div. WCAG 4.1.3.
- ConversationTraceModal.tsx: Add role="status" aria-live="polite"
to both the loading state and empty state divs. WCAG 4.1.3.
Found via systematic accessibility audit sweep of non-tab components.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tests in ExternalConnectModal.test.tsx used document.querySelector("pre")
which returns the first pre in DOM order. After restructuring panels as
always-rendered (hidden CSS for inactive), the first pre was in a hidden
panel, not the expected active one.
Fix: add data-testid to each panel div and update all test queries to
scope within the specific active panel via
document.querySelector("[data-testid='panel-...']").
All 18 tests pass. Build passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add id=, aria-controls=, and tabIndex= to each role=tab button
- Add id= and role=tabpanel + aria-labelledby= to each snippet panel
- Restructure panels as always-rendered (hidden CSS) so aria-controls
targets are stable: the active panel has role=tabpanel, and inactive
panels carry the hidden attribute, which also removes them from the
accessibility tree
- Add ArrowRight/ArrowLeft/ArrowDown/ArrowUp + Home/End keyboard
navigation for the tablist (ARIA tab pattern requirement)
- Compute tabList once after filled* vars to share between tab bar
and keyboard handler
WCAG 4.1.2 (Name, Role, Value): tab controls now have correct
role, aria-selected, aria-controls, and keyboard navigation.
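
The keyboard handling reduces to an index computation over the tab list. A minimal Python sketch follows (the component itself is TSX; the function name and the wrap-around at both ends are assumptions, the latter following common ARIA tab-pattern guidance):

```python
def next_tab_index(key, current, count):
    """Return the index of the tab that should receive focus after `key`."""
    if count <= 0:
        raise ValueError("tab list must not be empty")
    if key in ("ArrowRight", "ArrowDown"):
        return (current + 1) % count  # wrap from last to first
    if key in ("ArrowLeft", "ArrowUp"):
        return (current - 1) % count  # wrap from first to last
    if key == "Home":
        return 0
    if key == "End":
        return count - 1
    return current  # any other key leaves the selection unchanged
```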
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Error divs in EventsTab, TracesTab, ChannelsTab, DetailsTab (save/restart/delete),
and ExternalConnectionSection now use role=alert so assistive technology
announces each error immediately when it appears.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
Force a new workflow run to pick up the /sop-n/a qa-review
and /sop-n/a security-review declarations from infra-runtime-be
(engineers team) and the [core-security-agent] APPROVED comment.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core-qa-agent and core-security-agent approve PRs via issue comments,
not the reviews API. The reviews API returns zero entries for comment-only
approvals (internal#348), causing qa-review / security-review gates to
fail on every PR — even when both agents have explicitly approved.
Changes:
- review-check.sh: after reviews-API candidate check fails, fetch
GET /repos/{owner}/{repo}/issues/{N}/comments and extract logins that
posted (a) the agent-prefix pattern ([core-qa-agent] or
[core-security-agent]) OR (b) a generic approval keyword (APPROVED /
LGTM / ACCEPTED, word-anchored, case-insensitive). Non-author filter
is applied. Candidates from comments are merged and fall through to the
team-membership probe, same as reviews-API candidates.
- _review_check_fixture.py: add T15 (agent-prefix match → exit 0),
T16 (generic keyword match → exit 0), T17 (no approval → exit 1)
scenarios with corresponding issue comments endpoint handler.
- test_review_check.sh: add T15, T16, T17 regression tests.
Also fixes a JQ operator-precedence bug in an earlier draft where
`| $cmt.user.login` was placed OUTSIDE the `or` expression, causing the
filter to always output the login (jq resolves bound variables regardless
of the current context). Fixed by using `if-then-elif-else-empty` so the
login projection only fires on a genuine match.
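
The comment-matching rule can be sketched in Python as follows. The real logic lives in `review-check.sh` using jq; the function name and regexes here are assumptions, and the comment objects are assumed to carry `user.login` and `body` fields.

```python
import re

AGENT_PREFIX = re.compile(r"\[(core-qa-agent|core-security-agent)\]")
# Word-anchored and case-insensitive, so "UNAPPROVED" does not match.
APPROVAL_WORD = re.compile(r"\b(APPROVED|LGTM|ACCEPTED)\b", re.IGNORECASE)


def approving_logins(comments, pr_author):
    """Return logins of non-author commenters whose comment reads as an approval."""
    logins = []
    for comment in comments:
        login = comment["user"]["login"]
        body = comment.get("body") or ""
        if login == pr_author:
            continue  # non-author filter
        # The login is emitted only inside the match branch, which is the
        # property the if-then-else rewrite of the jq filter restores.
        if AGENT_PREFIX.search(body) or APPROVAL_WORD.search(body):
            if login not in logins:
                logins.append(login)
    return logins
```

The returned logins are only candidates; as in the change description, they would still go through the team-membership probe before a gate passes.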
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
POST /workspaces silently substituted langgraph and returned 201 when a
caller named a `template` (intent for a specific runtime) but the runtime
could not be resolved from it (config.yaml unreadable / no `runtime:`
key). This is the molecule-controlplane#188 / #184 contract violation —
it produced 5/5 wrong-runtime workspaces and a false codex E2E pass.
The ws-server `Create` handler is the boundary the product UI actually
hits (the canvas dialog and provision_workspace MCP tool both POST here);
controlplane#188's CP-side gate is the sibling. This closes the
ws-server side: when the caller expressed runtime intent (passed
`runtime`, or named a `template`) but it cannot be honored, return 422
RUNTIME_UNRESOLVED instead of a silent langgraph 201.
The legitimate default path (bare {"name":...} — no template, no
runtime) still defaults to langgraph and returns 201; a regression test
pins that so the fail-closed gate can't over-fire.
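
The decision table above, as a hedged Python sketch. The handler is Go; the function name and arguments are hypothetical, `template_runtime` stands for whatever the template's config.yaml `runtime:` key resolved to (None when unreadable or missing), and treating an explicit `runtime` as always honorable is a simplifying assumption.

```python
DEFAULT_RUNTIME = "langgraph"


def resolve_runtime(runtime, template, template_runtime):
    """Return (http_status, runtime_or_error_code) for a create request."""
    if runtime:
        return 201, runtime  # explicit runtime intent (assumed honorable here)
    if template:
        if template_runtime:
            return 201, template_runtime  # intent resolved from the template
        # Intent was expressed but cannot be honored: fail closed.
        return 422, "RUNTIME_UNRESOLVED"
    # No runtime and no template: the legitimate default path.
    return 201, DEFAULT_RUNTIME
```

Only the final branch may fall back to langgraph, which is what keeps the fail-closed gate from firing on the bare `{"name":...}` request.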
Tests: TestWorkspaceCreate_188_* (missing template, no-runtime-key
template, default-path regression guard, explicit-runtime OK).
Refs: molecule-controlplane#188, #184
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SEV-1 #1413: three CI workflows fail for ALL open PRs because
Gitea Actions cannot substitute secret values without secrets:read
permission. Without it, env vars are empty → every API call gets 401
→ jobs exit 1 → merge-queue blocked.
Fix: add secrets:read to all three workflow permission blocks.
sop-checklist.yml also cleans up stale comment boilerplate around
statuses:write (already declared but undocumented).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The broadcast_enabled and talk_to_user_enabled workspace abilities have
complete, wired backends (commit 29b4bffb: workspace_abilities.go,
workspace_broadcast.go, agent_message_writer.go) but no usable canvas
control — so the CTO cannot see or toggle them from the canvas.
- broadcast_enabled (default FALSE): no canvas control existed at all.
- talk_to_user_enabled (default TRUE): only surfaced as the ChatTab
recovery banner, which renders solely when the flag is false and is
therefore invisible under the TRUE default.
Adds an always-visible "Agent Abilities" section to ConfigTab with two
on/off toggles bound to the existing PATCH /workspaces/:id/abilities
endpoint (same call the ChatTab recovery banner uses), optimistic store
updates via updateNodeData with rollback on failure, and server-truth
reconciliation through the existing canvas-topology hydration.
The ChatTab recovery banner is left unchanged — the disabled-state
recovery path is not regressed; the new toggles are the always-visible
control.
Refs internal#510, internal#511.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Secret scan / Scan diff for credential-shaped strings (pull_request) Compensated by status-reaper (default-branch pull_request status shadowed by successful push status on same SHA; see .gitea/scripts/status-reaper.py)
E2E API Smoke Test flaked (24h history ~137 pass / 3 fail on molecule-core;
not a code path the staging<-main conflict resolution touches; core-devops
re-review ran the full handlers package + a92beb5d regression test green).
Empty commit = the only reliable rerun mechanism on Gitea 1.22.6 (no REST
rerun until 1.26). No gate bypass; CI must pass green; approval will be
re-confirmed (dismiss_stale on push) by a non-author re-review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
core-devops review 4483 (REQUEST_CHANGES) correctly found the prior
blanket keep-staging resolution reverted main-only a92beb5d (synchronous
durable activity_logs INSERT before the queued 200 — the poll-mode
'lose my own message on chat exit' data-loss fix; staging never had it).
This commit keeps MAIN's synchronous LogActivity(insCtx,...) form for the
logA2AReceiveQueued conflict block, and STAGING's tracked-goAsync/asyncWG
A2A P0 form for all other blocks (review confirmed those OK; 1c3b4ff3 and
A2A P0 e740ffe2 not regressed). Regression test
TestProxyA2A_PollMode_PersistsUserMessageSynchronouslyBeforeQueuedResponse
is now GREEN. workspace-server handlers build + vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Screen readers were not announcing error messages in several canvas components.
Each error div now uses role=alert so assistive technology announces the
error immediately and assertively — without the user having to manually
navigate to find the error.
Fixed: ConfigTab, ScheduleTab, MissingKeysModal (per-entry + global),
WorkspaceUsage.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Screen readers were not announcing loading or empty states in several
canvas components. Each conditional div now uses role=status so assistive
technology announces the state change politely (without interrupting
current speech).
Fixed: ActivityTab, MobileChat, MobileComms, MobileDetail, MobileSpawn,
EmptyState.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A2A peer_agent delegation delivery has been 100% broken fleet-wide since
2026-05-12. Delegate() ran the fire-and-forget executeDelegation goroutine
on c.Request.Context(); the handler returns HTTP 202 immediately, which
cancels that context, so every DB op + proxy call in the detached
goroutine failed `context canceled` the instant the response was written.
lookupDeliveryMode swallowed the resulting error and silently defaulted to
push, skipping the poll-mode short-circuit that writes the a2a_receive
inbox row — so poll-mode peers (e.g. hongming-pc) never received messages
and push-mode peers hit the #190-style self-echo timeouts. Introduced by
ce2db75f ("handlers: pass cancellable context through executeDelegation").
Primary fix (delegation.go): derive the goroutine context via
context.WithTimeout(context.WithoutCancel(ctx), 30*time.Minute). WithoutCancel
detaches request cancellation/deadline while preserving all ctx values
(trace/correlation/tenant ids the proxy + broadcaster read). This is the
established pattern in this package (a2a_proxy.go:850,
a2a_proxy_helpers.go:525, registry.go:822); the 30m budget matches the
pre-ce2db75f internal budget and the proxy's own agent-dispatch ceiling.
Secondary fix, surgical (a2a_proxy_helpers.go + a2a_proxy.go), RFC#497
fail-closed theme: lookupDeliveryMode no longer swallows a *context*
error (context.Canceled / context.DeadlineExceeded) into a silent push
default — it propagates so the caller fails closed with a structured 503.
Scope deliberately narrowed to ctx errors only: generic DB errors retain
the long-standing documented fail-open-to-push contract (loud + recoverable
502/SSRF/restart, unlike the silent poll drop), so checkWorkspaceBudget's
intentional fail-open and the existing suite are unaffected. Widening
further is an RFC#497 follow-up, not part of this P0.
Regression tests:
- TestDelegate_DetachedContext_SurvivesRequestCancellation: detached ctx
outlives request cancellation AND preserves parent values + deadline.
- TestLookupDeliveryMode_ContextCanceled_FailsClosed: ctx-cancelled
delivery-mode read returns an error, never push.
- TestProxyA2A_PollMode_FailsClosedToPush: legacy non-ctx-DB-error
fail-open-to-push contract preserved.
Full workspace-server/internal/handlers package suite passes (go test
-count=1), go build ./... and go vet clean.
Refs: internal#497, regression ce2db75f
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lint-bp-context-emit-match / lint-bp-context-emit-match (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
CRITICAL SORT-ORDER FIX:
get_combined_status: The /statuses endpoint returns newest-first (desc by
id), but /status's embedded statuses[] returns oldest-first (asc by id).
Previous code did: combined.statuses = all_statuses (newest-first), which
overwrote newer entries with stale ones. Fix: process combined_statuses with
reversed(sorted()) first (newest-first), then fill gaps from all_statuses.
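The invariant behind the fix is "for each context, the entry with the highest id wins, whatever order the inputs arrive in". The sketch below illustrates that invariant in Go rather than in the script's own language, and it sorts explicitly by id instead of reproducing the two-pass merge; the Status type and latestPerContext are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Status is a hypothetical, trimmed-down commit status entry.
type Status struct {
	ID      int64
	Context string
	State   string
}

// latestPerContext keeps, for every context, the entry with the highest ID,
// regardless of the order in which the inputs arrive.
func latestPerContext(lists ...[]Status) map[string]Status {
	var all []Status
	for _, l := range lists {
		all = append(all, l...)
	}
	// Newest first, so the first entry seen for a context wins.
	sort.Slice(all, func(i, j int) bool { return all[i].ID > all[j].ID })
	out := make(map[string]Status)
	for _, s := range all {
		if _, seen := out[s.Context]; !seen {
			out[s.Context] = s
		}
	}
	return out
}

func main() {
	embedded := []Status{ // oldest-first, like the embedded statuses[]
		{ID: 1, Context: "ci/lint", State: "failure"},
		{ID: 5, Context: "ci/lint", State: "success"},
	}
	listed := []Status{ // newest-first, like the /statuses endpoint
		{ID: 5, Context: "ci/lint", State: "success"},
		{ID: 3, Context: "ci/test", State: "pending"},
	}
	merged := latestPerContext(embedded, listed)
	fmt.Println(merged["ci/lint"].State, merged["ci/test"].State)
}
```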
TIER:LOW SOFT-FAIL:
Add _is_tier_low_pending_ok() helper and pr_labels parameter to
required_contexts_green(). Per sop-checklist-config.yaml tier_failure_mode,
tier:low uses soft-fail: sop-checklist posts state=pending (not success)
when manager/ceo items are informational only. The queue now accepts pending
for sop-checklist contexts on tier:low PRs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #1428: The pull_request CI workflow does not fire for zero-diff PRs
(head == base). Adding a trivial comment to create a minimal diff so
CI runs and posts the required status for the queue to process.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The runtime builds its AgentCard from config.name, which the
CP-regenerated /configs/config.yaml sets to the raw workspace UUID — so
/registry/register stored (and /.well-known/agent-card.json + peer
agent_card_url served) a card with name=<uuid>, description="",
role=null, even though the operator-controlled workspaces.name DB
column holds the friendly name the canvas shows ("Claude Code Agent").
Fleet-wide; live registry confirmed name=UUID for ws 3b81321b while
workspaces.name="Claude Code Agent".
Server-side, platform-controlled repair at the register upsert: when the
runtime-supplied agent_card.name is empty or equals the workspace UUID,
substitute the trusted workspaces.name; default a blank description from
the reconciled name; default role from workspaces.role. Gaps are only
FILLED — a card already carrying a real friendly name (external channel
agents) is never downgraded; malformed/edge cards are stored verbatim
(no-worse-than-before). Identity stays platform-sourced from the
operator-controlled DB row — the agent gains no self-edit. Works for all
runtimes without touching every template or the CP generator. The
WORKSPACE_ONLINE broadcast now carries the reconciled card so the canvas
live-updates with the friendly name.
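A rough sketch of the gap-filling rule described above. The AgentCard and Workspace shapes, the reconcile name, and all sample values are assumptions made for illustration; the real helper in agent_card_reconcile.go may differ.

```go
package main

import "fmt"

// AgentCard and Workspace are hypothetical, trimmed-down shapes.
type AgentCard struct {
	Name        string
	Description string
	Role        string
}

type Workspace struct {
	ID   string
	Name string
	Role string
}

// reconcile fills gaps in a runtime-supplied card from the trusted workspace
// row. It never overwrites a field the card already carries.
func reconcile(card AgentCard, ws Workspace) AgentCard {
	if (card.Name == "" || card.Name == ws.ID) && ws.Name != "" {
		card.Name = ws.Name
	}
	if card.Description == "" {
		card.Description = card.Name
	}
	if card.Role == "" {
		card.Role = ws.Role
	}
	return card
}

func main() {
	// Placeholder id and role; only the friendly name comes from the text above.
	ws := Workspace{ID: "00000000-0000-0000-0000-000000000001", Name: "Claude Code Agent", Role: "placeholder-role"}

	fmt.Printf("%+v\n", reconcile(AgentCard{Name: ws.ID}, ws))
	fmt.Printf("%+v\n", reconcile(AgentCard{Name: "External Bot", Description: "kept", Role: "kept"}, ws))
}
```

Because the function only fills empty or UUID-valued fields, a card that already carries a real name passes through unchanged.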
Pure helper (agent_card_reconcile.go) is exhaustively unit-tested
without DB/HTTP. Upstream CP config.yaml regeneration, the missing role
key in the runtime register payload, and an editable description/skills
surface are RFC-scoped in internal#492.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the per-op context deadline (eicFileOpTimeout=30s) fires,
exec.CommandContext SIGKILLs the ssh subprocess and Run() returns the
bare "signal: killed" with empty stderr. That surfaced to the canvas
Settings/Config tab as an opaque
`500 {"error":"ssh install: signal: killed ()"}` — giving the operator
no signal that the workspace was simply mid-provision with a slow/unready
EIC tunnel (internal#423; recurred 2026-05-17 on claude-code ws
3b81321b, blocking config save).
Detect context abortion explicitly and return a message that names the
cause and points at the Settings -> Secrets encrypted-write path (which
does NOT use this EIC file-write path) as the unblock for applying
provider credentials. The EIC mechanism, timeout value, and success
path are unchanged — this only improves the error a stuck write emits.
Refs internal#423. Same Settings-area opaque-500 theme as #1420.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Secrets test button calls POST ${PLATFORM_URL}/secrets/validate, a
route that has never been implemented on the workspace-server router
(router.go registers /secrets, /secrets/values, /settings/secrets,
/admin/secrets — no /secrets/validate) nor on the Next.js canvas. Live
probe: POST /secrets/validate → HTTP 404 in 0.28s (a fast 404, not a
network timeout).
request() throws ApiError(404); TestConnectionButton's bare `catch {}`
swallowed it and unconditionally rendered the hardcoded string
"Connection timed out. Service may be down." — factually wrong and
indistinguishable from a real outage or a token rejection.
Minimal fix (same "make the dead affordance honest" approach as the
reveal control, internal#490 / PR#1421): bind the caught error and
surface the real failure — distinguish "validation not available"
(404/501), a non-404 server error (with status), and a genuine
connectivity failure. No speculative server-side validate endpoint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The eye/RevealToggle in SecretRow was a dead affordance: it flipped a
local `revealed` boolean but the row always rendered `masked_value` and
never consumed it, so nothing was ever revealed. RevealToggle renders an
eye-WITH-SLASH when revealed=true, so a clicked row looked "active" while
showing nothing, which users read as "this doesn't work" (reported on
CLAUDE_CODE_OAUTH_TOKEN / Anthropic group).
Root cause is not Anthropic/OAuth/category-specific and not a server
4xx/5xx: secret values are write-only from the browser by design — the
server List handler "Never exposes values", there is no per-secret
decrypt route, and the only decrypted path (GET /secrets/values) is bulk
+ token-gated for remote agents and never called by canvas. The client
has no plaintext-fetch function. Reveal is architecturally impossible
without a deliberate security regression (out of scope).
Fix: remove the dead toggle (+ its local state / auto-hide effect) and
show a static write-only indicator (lock + explanatory title). Edit
(rotate/replace) and Delete are unaffected and independent of reveal.
Refs: internal#490; sibling Secrets/Tokens fixes PR #1415 + #1420
(referenced in triage as internal#210 / internal#211). Does not touch
the agent-error path (internal#212).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The queue was retrying the same PR forever when merge returned HTTP 405
("User not allowed to merge PR"). ApiError was caught by main() and returned
0, so the next tick tried the same PR again — infinite loop.
Changes:
- Add MergePermissionError(ApiError) for permanent merge failures
- merge_pull() catches ApiError and re-raises MergePermissionError for
HTTP 403/404/405
- process_once() catches MergePermissionError, posts a comment on the PR
explaining the permission issue, and returns 0
The PR keeps the merge-queue label so future ticks can retry once
the permission issue is resolved.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The spinner SVG inside the test-connection button is decorative — it
visualizes loading state alongside the text label. Add aria-hidden="true"
so screen readers ignore it and use only the visible text as the accessible
button name.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WCAG 2.4.7: DeleteConfirmDialog Cancel and Delete buttons were missing
:focus-visible rules in settings-panel.css. Keyboard users tabbing to
these dialog buttons would see no visible focus indicator.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
WCAG 2.4.7: keyboard-only users need a visible focus indicator on all
interactive buttons. The Copy, Dismiss, and Revoke buttons in OrgTokensTab
and TokensTab had :hover but no :focus-visible, making focus state
invisible when tabbing to these buttons.
Add focus-visible:ring-2 (accent for copy/dismiss, red-400 for revoke)
to all non-disabled action buttons in both tabs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Settings → Workspace Tokens 500'd whenever opened with no canvas node
selected. SettingsPanel passes the literal sentinel "global" as the
workspace id; the backend queries the uuid `workspace_id` column with
it → Postgres `invalid input syntax for type uuid: "global"` → opaque
500 ("failed to list tokens"). Token create in that view broke the same
way. SecretsTab already handles the sentinel (api/secrets.ts reroutes
"global" → /settings/secrets); TokensTab did not — that asymmetry was
the bug. Pre-existing since 2026-04-13, NOT a regression.
Frontend (user-visible fix): TokensTab is now sentinel-aware like
SecretsTab. When workspaceId === "global" (no node selected) it no
longer calls /workspaces/global/tokens — it renders a clean state
pointing the user to the Org API Keys tab (the existing org-wide
surface). No 500, no scary error banner. The red account "Error" in
this view was just this 500 surfacing through TokensTab's local error
banner; it resolves with this guard (verified in code — no separate
widget).
Backend (defense-in-depth, same PR): List/Create/Revoke validate
c.Param("id") as a UUID up front and return 400 {"error":"invalid
workspace id"} instead of leaking a DB type error as a 500. Added the
missing log.Printf on the List query-error branch — it was the only
token handler silently swallowing the DB error, which is why this
incident had zero log trail. Mirrors the uuid.Parse guard already in
handlers/activity.go.
Workaround (pre-merge): select a workspace node before opening the
tab, or use the Org API Keys tab.
Product note for CTO: there is no /workspaces/global/tokens endpoint
(workspace tokens are inherently per-workspace; the org-wide
equivalent is the separate Org API Keys tab), so — unlike SecretsTab
which reroutes to a real global-secrets endpoint — the lowest-risk
safe behavior was a disabled state + pointer to Org API Keys rather
than a reroute. Flag if a different UX is wanted.
Tests: added TokensTab sentinel tests (no API call + Org-pointer) and
a backend table test asserting List/Create/Revoke 400 on non-UUID id
without hitting the DB. Updated existing token handler tests to use
valid UUIDs (they used "ws-1" etc.).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push) Compensated by status-reaper (workflow has no push: trigger; Gitea 1.22.6 hardcoded-suffix bug — see .gitea/scripts/status-reaper.py)
The Publish to PyPI step ran `twine upload` without --verbose. On an HTTP
403, twine's default output prints only the bare status ("Forbidden") and
discards PyPI Warehouse's human-readable response body, which carries the
actual rejection reason (e.g. project-scoped token mismatch, yanked-name
collision, account state). During the internal#469 0.1.1003 publish block
the missing reason body made root-cause diagnosis impossible without
performing another real upload to the live package.
Adding --verbose makes twine log the HTTP request/response metadata and
the Warehouse error body in CI. It does NOT echo the credential: the
PyPI token is passed via --password and sent only in the Basic-Auth
Authorization header, which twine's verbose output does not dump.
Minimal change: single added flag on the existing twine upload
invocation; no other steps or behavior touched.
Refs: internal#469
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a2a_mcp_server.py main()'s stdio read loop used
`await loop.run_in_executor(None, stdin.read, 65536)`. On a PIPE,
read(n) blocks until n bytes accumulate OR EOF. A live MCP client
(openclaw bundle-mcp, Claude Code, Cursor) sends one ~150-byte
newline-delimited request and keeps stdin OPEN waiting for the reply,
so neither condition is met: the server never parses `initialize` and
the client times out (~30s; openclaw: "MCP error -32000: Connection
closed"). This silently broke peer visibility for every pipe-spawned
MCP host while passing all existing stdio tests, which only fed stdin
from a regular file or a heredoc-pipe that CLOSES (EOF returns
immediately). readline() returns as soon as one newline-delimited
line is available — exactly the JSON-RPC framing — and is
backward-compatible with the EOF/file cases.
Root cause of the 2026-05-15 openclaw peer-visibility outage
(workspace 95744c11): the molecule MCP server could not complete the
handshake over openclaw's stdio pipe, so the agent fell back to
native sessions_list. The openclaw template adapter fix
(template-openclaw#16) works around this via HTTP transport; this
patch fixes the stdio root cause so stdio works for all CLI MCP hosts.
Regression coverage:
- tests/test_a2a_mcp_server.py::TestStdioKeepOpenPipe — spawns the
real a2a_mcp_server.py, writes one request over a pipe, and
DELIBERATELY keeps stdin open. FAILS (15s timeout, empty response)
on read(65536); PASSES on readline(). Verified both directions.
- ci-mcp-stdio-transport.yml: new "pipe held OPEN, no EOF" step that
reproduces the literal openclaw failure (the prior steps only
exercised EOF-closing stdin, which is why the outage shipped green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tenant workspace containers run agent-controlled code and must never
receive a Git SCM write credential — agents structurally lacking
merge/approve creds is why the two-eyes review gate is self-bypass-proof
against forged-approval injection.
Latent path: handlers.loadPersonaEnvFile() merges a per-role persona
GITEA_TOKEN into cfg.EnvVars when MOLECULE_PERSONA_ROOT is set on a
tenant host; it then flowed unfiltered through buildContainerEnv()
(local Docker) and CPProvisioner.Start() (tenant EC2). Inert today
(persona dirs are operator-host-only) but unguarded — and the
pre-existing TestBuildContainerEnv_CustomEnvVarsAppended test actually
asserted GITHUB_TOKEN passed through verbatim.
Adds a narrow, auditable exact-match denylist (isSCMWriteTokenKey:
GITEA/GITHUB/GH/GITLAB/GL/BITBUCKET _TOKEN) applied by construction in
both env paths, plus negative-assertion tests covering the normal path
and a persona-file-merge simulation. Non-credential persona identity
(GITEA_USER, GITEA_USER_EMAIL) is intentionally preserved. No
provisioner refactor.
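A minimal sketch of an exact-match denylist of this kind. The key list follows the names given above; the map layout, the filterEnv helper, and the assumption that matching is case-sensitive are illustrative guesses rather than the actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// scmWriteTokenKeys lists the exact env names to drop. Matching is exact, so
// GITEA_USER and GITEA_USER_EMAIL are untouched.
var scmWriteTokenKeys = map[string]struct{}{
	"GITEA_TOKEN":     {},
	"GITHUB_TOKEN":    {},
	"GH_TOKEN":        {},
	"GITLAB_TOKEN":    {},
	"GL_TOKEN":        {},
	"BITBUCKET_TOKEN": {},
}

// isSCMWriteTokenKey reports whether key is on the denylist.
func isSCMWriteTokenKey(key string) bool {
	_, denied := scmWriteTokenKeys[key]
	return denied
}

// filterEnv returns a copy of env without SCM write tokens.
func filterEnv(env map[string]string) map[string]string {
	out := make(map[string]string, len(env))
	for k, v := range env {
		if !isSCMWriteTokenKey(k) {
			out[k] = v
		}
	}
	return out
}

func main() {
	env := filterEnv(map[string]string{
		"GITEA_TOKEN": "placeholder",
		"GITEA_USER":  "persona-bot",
		"PATH":        "/usr/bin",
	})
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys)
}
```

Exact matching keeps the list auditable: a reviewer can read off precisely which variables are blocked, at the cost of not catching differently named credentials.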
Tracking: molecule-ai/internal#438
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause: platform/internal/db.DB is a swappable package global.
setupTestDB (+ peer test helpers) saves/restores it via t.Cleanup, but
production code spawns fire-and-forget goroutines (maybeMarkContainerDead/
preflightContainerHealth -> RestartByID -> runRestartCycle, logA2ASuccess/
Failure activity logging, gracefulPreRestart, sendRestartContext) that
read db.DB. These detached goroutines outlive the test that triggered
them and race the db.DB pointer write in a LATER test's cleanup —
WARNING: DATA RACE on platform/internal/db.DB, surfaced deterministically
by PR#1240's expanded A2A test corpus on staging (a sibling of the
mc#664/mc#774 Phase-3-masked handler-test family). Pre-existing since
be5fbb5a (2026-05-07); NOT introduced by #1240/#1250.
Fix:
- Convert the leaked raw `go ...` restart/a2a-logging goroutines to the
existing tracked h.goAsync (asyncWG) — matches the already-correct
site at a2a_proxy.go:648 and goAsync's documented intent.
- Wire the never-connected test-drain half: a newHandlerHook (nil in
prod, zero cost) lets the test harness register every handler;
setupTestDB's cleanup now drains all tracked async goroutines BEFORE
restoring db.DB, eliminating the race window.
Verified: full `go test -race -timeout ./...` (CI step) green, 0 races,
0 failures; the 8 originally-failing tests pass -race -count=5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of the Files API roots RFC. UI-side wiring for the new
/agent-home root. Backend dispatch is the Phase 2b PR (#TBD) — until
that lands, /agent-home returns the 501 stub from #1247, which the
existing error banner already surfaces gracefully.
Changes:
1. canvas/src/components/tabs/FilesTab/FilesToolbar.tsx — adds
<option value="/agent-home">/agent-home</option> at the bottom
of the root selector. Pre-Phase-2b the dropdown still works
because the server-side 501 is just an error response — same
error-banner path as a transient backend failure.
2. canvas/src/components/tabs/FilesTab.tsx — new
defaultRootForRuntime() function pins the initial root per-
runtime per Hongming Decisions §2 (internal#425):
- openclaw → /agent-home (the user-facing interesting state)
- everything else → /configs (legacy default)
FilesTab now reads workspace runtime from props.data?.runtime
and threads it through to PlatformOwnedFilesTab. Undefined-
runtime callers (legacy tests, pre-load states) default to
/configs — matches today's behaviour, no surprise.
3. canvas/src/components/tabs/FilesTab/FileEditor.tsx — new
SECRET_SHAPE_DENIED_MARKER export + denial-placeholder render
path. When fileContent === marker, the editor renders a
role=region placeholder instead of the textarea, so the matched
bytes never enter a controlled input (DOM value, clipboard,
inspector). Marker constant matches the canonical
'<denied: secret-shape>' string the Phase 2b backend will emit.
Also: /agent-home is read-only via isReadOnlyRoot until Phase
2b decides write semantics. Until then, write attempts would
fail with the 501 stub anyway, but blocking the textarea at the
UI saves the user a round-trip + a confusing error.
Tests (canvas/src/components/tabs/FilesTab/__tests__/agentHome.test.tsx):
- dropdown includes /agent-home option (pins Phase 1 contract)
- dropdown reflects /agent-home as selected value when prop is set
- denied-marker renders placeholder INSTEAD OF textarea (pins
the bytes-don't-leak invariant)
- regular content renders textarea, no placeholder (regression
guard)
- /agent-home renders textarea read-only (pins the gate)
- /configs renders textarea writable (regression guard for the
read-only-everywhere bug)
- marker constant matches the canonical '<denied: secret-shape>'
string (pins the contract value so a typo on either side
breaks the test)
vitest run on FilesTab + new tests: 47 tests passed, 3 files. tsc
--noEmit clean for all edited / created files (the pre-existing TS
errors in FilesTab.test.tsx are unchanged and unrelated).
Refs internal#425.
Phase 2a of the Files API roots RFC. Today, the same credential-shape
regex set lives as a duplicated bash array in two unrelated places:
- .gitea/workflows/secret-scan.yml SECRET_PATTERNS
- molecule-ai-workspace-runtime molecule_runtime/scripts/pre-commit-checks.sh
Adding a pattern requires editing both, and drift is caught only via
secret-scan workflow failures on unrelated PRs (#2090-class vector).
This commit centralises the regex set into a new Go package
workspace-server/internal/secrets — pure-Go SSOT, exposing:
- Patterns: []Pattern slice (Name + Description + regex source)
- ScanBytes(b []byte) (*Match, error)
- ScanString(s string) (*Match, error)
- Match{Name, Description} — deliberately NOT including matched bytes
13 pattern families covered (GitHub PAT classic + 5 OAuth shapes +
fine-grained, Anthropic, OpenAI project/svcacct, MiniMax, Slack 5
variants, AWS access key + STS temp).
Phase 2b (docker-exec backend) will import secrets.ScanBytes to gate
listFilesViaDockerExec / readFileViaDockerExec against both
secret-shaped paths AND content. Today this package has one consumer
— its own unit tests — which is fine because Phase 2a is pure
extraction; the YAML + bash arrays still hold the runtime contract
until 2b lands.
Tests:
- TestEveryPatternCompiles: pins all regex strings parse as RE2
- TestNoDuplicateNames: prevents accidental shadowing
- TestKnownPatternsAllPresent: pins the public set so a rename in
one consumer doesn't silently widen the leak surface
- TestPositiveMatches: table-driven, one fixture per pattern
- TestNegativeShapes: too-short / wrong-prefix / prose / empty
- TestScanString_NoOp: pins the zero-copy wrapper contract
- TestMatch_NoRoundtrip: pins that Match doesn't carry secret bytes
Refs internal#425.
Phase 1 of internal#425 RFC (Files API roots — container-internal home
+ system/agent split). Adds the new /agent-home allowedRoots key plus
short-circuit dispatch that returns 501 with the canonical pending-
message body across List/Read/Write/Delete verbs.
Why a stub:
- Lets the canvas FilesTab design its root-selector UI against the
final shape (the additional option appears in the dropdown today;
the body just says "implementation pending").
- The stub-vs-real transition is server-side only — Phase 2b lands
the docker-exec backend without canvas changes.
- The 501 short-circuit runs BEFORE the DB lookup, so canvases that
speculatively GET /agent-home don't generate workspace-not-found
noise in logs.
Tests:
- TestAgentHomeAllowedRoot pins the allowedRoots membership.
- TestAgentHomeStub_AllVerbs_Return501 pins the canonical 501 +
message body across all four verbs (table-driven for symmetry).
- Both assert the stub short-circuits before the DB / EIC / Docker
paths, so adding the real backend doesn't have to fight a stale
test that exercised a wrong layer.
Existing Files API tests (ListFiles / ReadFile / WriteFile /
DeleteFile / EIC dispatch / shells) still pass — diff is additive.
Refs internal#425.
Canvas "Save & Restart" was timing out for openclaw workspaces because
two bugs compounded:
1. **Pointless config.yaml write.** openclaw manages its own prompt
surface via SOUL/BOOTSTRAP/AGENTS multi-file system — it does NOT
read the platform's config.yaml. But ConfigTab.tsx was still
issuing `PUT /workspaces/:id/files/config.yaml` on every save,
which on tenant EC2 fans out through the slow EIC SSH tunnel path
(`workspace-server/internal/handlers/template_files_eic.go`).
Other runtimes that ship their own config are already exempted via
`RUNTIMES_WITH_OWN_CONFIG` (external, kimi, kimi-cli). Add openclaw
to that set so the platform stops doing work the runtime ignores.
2. **Client aborts before server returns.** `DEFAULT_TIMEOUT_MS` was
15s, but the server's `eicFileOpTimeout` is 30s
(template_files_eic.go L118). When EIC was slow or the EC2's
ec2-instance-connect daemon was unhealthy, the canvas aborted with
a generic timeout *before* the workspace-server returned its real
5xx — so the user saw a useless "request timed out" instead of
the actual cause. Raise the default to 35s so the server's error
surfaces. The AbortController contract is unchanged; callers can
still override `timeoutMs` per-request.
Together these fixes unblock the user-visible "Save & Restart"
behavior on openclaw workspaces. The underlying EIC hang on
i-04e5197e96adb888f (last_healthcheck_at IS NULL) is tracked
separately as a follow-up — this PR makes the canvas honest about
errors instead of swallowing them, and removes the unnecessary write
from openclaw's critical path entirely.
Refs: internal#418 (Canvas Save & Restart timeout on openclaw)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 14:38:43 -07:00
106 changed files with 5548 additions and 383 deletions
echo "::error::${TEAM}-review: non-author review(s) were SUBMITTED but stored as PENDING — almost certainly the wrong Gitea review event string (internal#503)."
echo "::error::Gitea accepts ONLY the exact enum APPROVED / REQUEST_CHANGES / COMMENT. 'APPROVE' or lowercase is silently (HTTP 200) filed as PENDING and is invisible to this gate."
[ -n "${_rid:-}" ] && echo "::error:: review id=${_rid} by '${_rl}': RE-SUBMIT via POST ${API}/repos/${OWNER}/${NAME}/pulls/${PR_NUMBER}/reviews with {\"event\":\"APPROVED\"} (correct enum) — do NOT edit the DB."
done
fi
# --- Fallback (internal#348): check issue comments for agent-approval ---
# core-qa-agent and core-security-agent approve via issue comments, NOT
# the reviews API. The reviews API returns zero entries for comment-only
# approvals. This fallback reads PR issue comments and extracts logins that:
# 1. Posted a comment matching the agent-prefix pattern for this gate:
# qa → "[core-qa-agent] APPROVED"
# security → "[core-security-agent] APPROVED"
# OR posted a generic approval keyword (word-anchored, case-insensitive):
# APPROVED / LGTM / ACCEPTED
# 2. Are not the PR author
# 3. The team-membership probe below is the authoritative filter.
echo "::notice::${TEAM}-review: reviews API found no APPROVED reviews; found $(echo "$CANDIDATES" | wc -w | xargs) comment-based approval candidate(s) — verifying team membership..."
fi
else
debug "could not fetch issue comments (HTTP ${HTTP_CODE})"
fi
fi
if [ -z "${CANDIDATES:-}" ]; then
echo "::error::${TEAM}-review awaiting non-author APPROVE from ${TEAM} team (no candidates from reviews API or issue comments)"
label="Universal MCP — standalone register + heartbeat + tools for any MCP-aware runtime (Claude Code, hermes, codex). Pair with Python or Claude Code tab if you need inbound A2A delivery."
copyKey="mcp"
copied={copiedKey === "mcp"}
onCopy={() => copy(filledUniversalMcp, "mcp")}
/>
)}
{tab === "hermes" && filledHermes && (
<SnippetBlock
value={filledHermes}
label="Hermes channel — bridges this workspace's A2A traffic into your hermes-agent session as platform messages (push parity with Claude Code). Long-poll based; no tunnel needed."
copyKey="hermes"
copied={copiedKey === "hermes"}
onCopy={() => copy(filledHermes, "hermes")}
/>
)}
{tab === "codex" && filledCodex && (
<SnippetBlock
value={filledCodex}
label="Codex MCP config — wires the molecule MCP server into ~/.codex/config.toml. Outbound tools today; inbound A2A push needs the Python SDK tab paired in (codex's MCP runtime doesn't route arbitrary notifications/* yet)."
copyKey="codex"
copied={copiedKey === "codex"}
onCopy={() => copy(filledCodex, "codex")}
/>
)}
{tab === "openclaw" && filledOpenClaw && (
<SnippetBlock
value={filledOpenClaw}
label="OpenClaw MCP config — wires the molecule MCP server via openclaw mcp set + starts the gateway on loopback. Outbound tools today; inbound A2A push on an external openclaw needs the Python SDK tab paired in (a sessions.steer bridge daemon is future work)."
copyKey="openclaw"
copied={copiedKey === "openclaw"}
onCopy={() => copy(filledOpenClaw, "openclaw")}
/>
)}
{tab === "kimi" && filledKimi && (
<SnippetBlock
value={filledKimi}
label="Kimi CLI — self-contained Python bridge. Registers, heartbeats, polls for canvas messages, and echoes replies back. NAT-safe (no public URL). Run in a background terminal or via launchd."
label="Universal MCP — standalone register + heartbeat + tools for any MCP-aware runtime (Claude Code, hermes, codex). Pair with Python or Claude Code tab if you need inbound A2A delivery."
label="Hermes channel — bridges this workspace's A2A traffic into your hermes-agent session as platform messages (push parity with Claude Code). Long-poll based; no tunnel needed."
label="OpenClaw MCP config — wires the molecule MCP server via openclaw mcp set + starts the gateway on loopback. Outbound tools today; inbound A2A push on an external openclaw needs the Python SDK tab paired in (a sessions.steer bridge daemon is future work)."
copyKey="openclaw"
copied={copiedKey === "openclaw"}
onCopy={() => copy(filledOpenClaw, "openclaw")}
/>
)}
</div>
{/* Kimi tab */}
<div
id="panel-kimi"
data-testid="panel-kimi"
role="tabpanel"
aria-labelledby="tab-kimi"
hidden={tab !== "kimi" || !filledKimi}
className={tab === "kimi" && filledKimi ? "" : "hidden"}
>
{filledKimi && (
<SnippetBlock
value={filledKimi}
label="Kimi CLI — self-contained Python bridge. Registers, heartbeats, polls for canvas messages, and echoes replies back. NAT-safe (no public URL). Run in a background terminal or via launchd."
<p id="files-delete-one-msg" className="text-xs text-warm">Delete <span className="font-mono">{confirmDelete}</span>{files.find((f) => f.path === confirmDelete && f.dir) ? " and all its contents" : ""}?</p>
cancel() // simulate the HTTP handler having returned (request ctx dead)
mode, err := lookupDeliveryMode(ctx, "ws-poll-peer")
if err == nil {
t.Fatalf("internal#497 regression: lookupDeliveryMode swallowed a context error and returned mode=%q with nil err — this is the exact 5-day silent-misrouting vector", mode)
}
if mode == models.DeliveryModePush {
t.Errorf("internal#497 regression: context error must NOT default to push (got mode=%q)", mode)
// The HTTP handler "returns 202" → request context is cancelled.
cancelParent()
if err := parent.Err(); err == nil {
t.Fatal("precondition: parent context should be cancelled after the handler returns")
}
// (a) Cancellation MUST NOT propagate to the detached context.
select {
case <-delegationCtx.Done():
t.Fatalf("regression: detached delegation ctx was cancelled by the handler returning (err=%v) — executeDelegation would fail every DB op with `context canceled`", delegationCtx.Err())
default:
// alive — correct
}
// (b) Parent values MUST still be readable (WithoutCancel preserves
// values; trace/correlation/tenant ids the proxy + broadcaster use).
log.Printf("Create: FAIL-CLOSED (controlplane#188) — template=%q requested but runtime could not be resolved; refusing silent langgraph fallback", payload.Template)
c.JSON(http.StatusUnprocessableEntity, gin.H{
"error": "runtime could not be resolved from the requested template; refusing to silently provision langgraph (controlplane#188). Pass an explicit \"runtime\", or use a template whose config.yaml declares one.",
"template": payload.Template,
"code": "RUNTIME_UNRESOLVED",
})
return
}
if payload.Runtime == "" {
// Legitimate default path: no template AND no runtime requested
// (bare {"name":...}) — langgraph is the intended default here.
// SeverityHard / SeverityAdvisory enumerate the only allowed Severity
// values. We do not use Go enums because the YAML consumer is shell.
const (
SeverityHard     = "hard"
SeverityAdvisory = "advisory"
)
// T4PrivilegeContract returns the full T4 capability set.
//
// Add new capabilities here. Each one is automatically picked up by
// any template whose CI consumes `t4_capabilities.yaml` (no per-template
// PR needed for new checks — this is the anti-drift property).
//
// Capability ordering matters for human-readable CI output but is not
// load-bearing for correctness; AsYAML() emits them sorted by Name.
func T4PrivilegeContract() []T4Capability {
return []T4Capability{
{
Name: "agent_uid_1000",
Description: "The container's primary process (the runtime, post-gosu) runs as uid 1000, not root. T4 grants full machine access via privileged + host PID + Docker socket — the WORKLOAD inside that privileged container must still be unprivileged to prevent every untrusted code execution from being trivially root-on-host.",
Description: "/configs/.auth_token is owned by uid 1000 (== AgentUID) so the a2a_mcp_server can read its bearer. In SaaS mode molecule-runtime itself writes the token via save_token() — the ownership equals the runtime's exec uid. If the runtime ever runs as root, this fails and list_peers 401s (the Hermes class bug).",
Description: "The uid-1000 agent can attain host root via `sudo -n nsenter --target 1 --mount --pid -- id -u` returning 0. This is the T4 escalation leg: full machine access means the agent CAN escalate to host root deliberately, even though it does not run as root by default.",
Description: "Host filesystem is mounted at /host and the agent can write+read+remove a file there via sudo. Proves real host reach (not just a PID-1 namespace trick on an isolated init).",
Description: "/var/run/docker.sock is bind-mounted into the container so the agent can manage other containers (T4 use case: agent-as-orchestrator). Proven by 'docker version' returning a server section, which requires the daemon to answer over the socket.",
Probe: `sudo -n docker version --format '{{.Server.Version}}' >/dev/null 2>&1`,
Description: "The platform list_peers HTTP endpoint (served by the in-container a2a_mcp_server) returns HTTP 200 when called from uid 1000 with the bearer from /configs/.auth_token. This proves the WHOLE token-ownership chain end-to-end: token written under correct uid → reader uid matches → bearer non-empty → platform accepts. A self-contained empirical test for the Hermes class bug.",
Description: "/agent-home is writable by the agent (Files API split per task #128). The Files API redesign uses /agent-home as the user-writable root; the agent must be able to create files there without sudo.",
Source: "task #128 Files API redesign; memory reference_post_suspension_pipeline",
},
{
Name: "network_egress_https",
Description: "Generic HTTPS egress works. T4 is unconstrained network; the canonical test target is the Gitea instance over its public name, which any fork user can also resolve. Any reachable HTTPS endpoint satisfies it — the YAML carries the recommended targets but accepts any 200/301/302.",
Probe: `for U in $MOLECULE_T4_EGRESS_TARGETS; do ` +
// Adopters override via MOLECULE_T4_EGRESS_TARGETS.
"https://api.github.com/zen",
"https://www.google.com/generate_204",
},
},
{
Name: "privileged_flag_observable",
Description: "Container is started with --privileged. Observable from inside via /proc/self/status CapEff containing CAP_SYS_ADMIN. Defense-in-depth for the provisioner emission side.",
Description: "Host PID namespace is shared (--pid=host). The container can see host process 1 (systemd or pid-1 on the EC2 instance). Required for nsenter into host mount/pid namespaces.",
Some files were not shown because too many files have changed in this diff