fix(workspace-server): surface secret-safe error_detail on ACTIVITY_LOGGED (internal#212) #1549

2026-05-19T00:29:33Z

core-devops commented

2026-05-19 00:29:33 +00:00

Summary

include error_detail on the ACTIVITY_LOGGED WebSocket broadcast so the canvas chat-tab error banner can show why (provider status + error code + human message) instead of the hardcoded opaque "Agent error (Exception) — see workspace logs for details." string
defense-in-depth sanitizeErrorDetailForBroadcast redacts bearer tokens / sk- API keys / JWT-shaped strings at the broadcast trust boundary while preserving the actionable parts (HTTP status, error code, human-readable provider message) — over-redacting would defeat the whole point of internal#212
omitted from the payload when nil so the canvas's "has actionable reason" guard doesn't false-positive on empty-string keys

Test plan

go test ./internal/handlers/... — TestLogActivity_Broadcast_IncludesErrorDetail, TestLogActivity_Broadcast_OmitsErrorDetailWhenNil, TestSanitizeErrorDetail_StripsSecretShapes/{4 sub-cases}, TestLogActivity_Broadcast_ErrorDetailIsSanitized pass
full handlers suite green (15.7s)

Reason field shape

On the wire, the ACTIVITY_LOGGED event payload now carries:

{
  "activity_type": "a2a_receive",
  "status": "error",
  "error_detail": "Anthropic 403 oauth_org_not_allowed: Your organization has disabled Claude subscription access for Claude Code",
  ...
}

Key is omitted when nil/empty so the canvas can gracefully degrade against an older ws-server.

Refs: internal#212, feedback_surface_actionable_failure_reason_to_user

## Summary - include `error_detail` on the `ACTIVITY_LOGGED` WebSocket broadcast so the canvas chat-tab error banner can show *why* (provider status + error code + human message) instead of the hardcoded opaque "Agent error (Exception) — see workspace logs for details." string - defense-in-depth `sanitizeErrorDetailForBroadcast` redacts bearer tokens / `sk-` API keys / JWT-shaped strings at the broadcast trust boundary while preserving the actionable parts (HTTP status, error code, human-readable provider message) — over-redacting would defeat the whole point of internal#212 - omitted from the payload when nil so the canvas's "has actionable reason" guard doesn't false-positive on empty-string keys ## Test plan - [x] `go test ./internal/handlers/...` — `TestLogActivity_Broadcast_IncludesErrorDetail`, `TestLogActivity_Broadcast_OmitsErrorDetailWhenNil`, `TestSanitizeErrorDetail_StripsSecretShapes/{4 sub-cases}`, `TestLogActivity_Broadcast_ErrorDetailIsSanitized` pass - [x] full handlers suite green (15.7s) ## Reason field shape On the wire, the `ACTIVITY_LOGGED` event payload now carries: ```json { "activity_type": "a2a_receive", "status": "error", "error_detail": "Anthropic 403 oauth_org_not_allowed: Your organization has disabled Claude subscription access for Claude Code", ... } ``` Key is omitted when nil/empty so the canvas can gracefully degrade against an older ws-server. Refs: internal#212, feedback_surface_actionable_failure_reason_to_user

core-devops added 1 commit 2026-05-19 00:29:34 +00:00

fix(workspace-server): surface secret-safe error_detail on ACTIVITY_LOGGED broadcast (internal#212)

Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s

Details

CI / Detect changes (pull_request) Successful in 21s

Details

CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s

Details

E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s

Details

E2E API Smoke Test / detect-changes (pull_request) Successful in 12s

Details

Harness Replays / detect-changes (pull_request) Failing after 8s

Details

Harness Replays / Harness Replays (pull_request) Has been skipped

Details

MCP Stdio Transport Regression / MCP stdio with regular-file stdout (pull_request) Successful in 56s

Details

Handlers Postgres Integration / detect-changes (pull_request) Successful in 18s

Details

Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s

Details

E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s

Details

E2E Chat / detect-changes (pull_request) Successful in 24s

Details

lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 41s

Details

publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped

Details

lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 58s

Details

Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 47s

Details

Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s

Details

Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 14s

Details

qa-review / approved (pull_request) Failing after 8s

Details

gate-check-v3 / gate-check (pull_request) Successful in 11s

Details

sop-checklist / na-declarations (pull_request) N/A: (none)

Details

publish-runtime-autobump / pr-validate (pull_request) Successful in 1m12s

Details

sop-checklist / all-items-acked (pull_request) Successful in 11s

Details

security-review / approved (pull_request) Failing after 12s

Details

Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 36s

Details

sop-tier-check / tier-check (pull_request) Successful in 13s

Details

Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m53s

Details

lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 2m0s

Details

E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 25s

Details

E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s

Details

Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 24s

Details

E2E Chat / E2E Chat (pull_request) Failing after 53s

Details

CI / Python Lint & Test (pull_request) Successful in 6m10s

Details

CI / Platform (Go) (pull_request) Successful in 7m0s

Details

CI / Canvas (Next.js) (pull_request) Successful in 8m36s

Details

Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1m18s

Details

CI / Canvas Deploy Reminder (pull_request) Has been skipped

Details

CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter

audit-force-merge / audit (pull_request) Successful in 17s

Details

94eff31c20

When an a2a_receive row is persisted with status="error" the DB column
error_detail already carries the actionable cause (provider HTTP
status, error code, provider human message). The live ACTIVITY_LOGGED
broadcast dropped it, so the canvas chat-tab error banner fell back
to a hardcoded "Agent error (Exception) — see workspace logs for
details." string with no logs tab to navigate to.

Include error_detail in the broadcast payload, omitted when nil so the
canvas's "has actionable reason" guard doesn't false-positive on empty
keys. Defense-in-depth: a sanitizeErrorDetailForBroadcast scrubber
redacts anything that looks credential-shaped (bearer tokens, sk-
prefixed API keys, JWTs) while preserving the actionable parts
(status codes, error codes, human-readable provider messages) — over-
redacting would defeat the whole point of internal#212.

Tests pin: detail surfaces on the wire, omitted when nil, scrubber
removes secret shapes but keeps actionable text, scrubber survives
the broadcast round-trip.

Refs: internal#212

core-devops referenced this pull request

2026-05-19 00:39:37 +00:00

fix(canvas/chat): surface actionable error reason in chat banner + link to Activity tab (internal#212) #1550

core-be approved these changes 2026-05-19 01:53:33 +00:00

core-be left a comment

5-axis review: correctness OK (secret-safe scrubber preserves actionable text, JWT shape covered, length cap honored); readability strong (comments map each regex to a class); arch OK (re-scrub at broadcast layer is defense-in-depth on top of runtime _sanitize_for_external); security OK (sk-proj-/Bearer/JWT shapes covered, prefix preserved, body capped 4096); perf OK (regex compiled at package init, single replace per call). Tests pin the actionable-substring contract AND the redaction-must-happen contract. APPROVED.

core-qa approved these changes 2026-05-19 01:53:33 +00:00

core-qa left a comment

Reviewed for QA discipline: tests pin both positive (actionable substrings survive) and negative (secret shapes redacted) assertions; nil-error_detail omission test matches request_body/response_body precedent. Real broadcast-layer assertion via recordingBroadcaster, no mocks. APPROVED.

hongming merged commit 83ad7e252b into main

2026-05-19 01:56:11 +00:00

hongming referenced this issue from a commit

2026-05-19 01:56:13 +00:00

Merge pull request 'fix(workspace-server): surface secret-safe error_detail on ACTIVITY_LOGGED (internal#212)' (#1549) from fix/wsserver-broadcast-error-detail into main

Sign in to join this conversation.

3 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1549