fix(workspace-server): surface secret-safe error_detail on ACTIVITY_LOGGED (internal#212) #1549

Merged
hongming merged 1 commits from fix/wsserver-broadcast-error-detail into main 2026-05-19 01:56:11 +00:00
Member

Summary

  • include error_detail on the ACTIVITY_LOGGED WebSocket broadcast so the canvas chat-tab error banner can show why (provider status + error code + human message) instead of the hardcoded opaque "Agent error (Exception) — see workspace logs for details." string
  • defense-in-depth sanitizeErrorDetailForBroadcast redacts bearer tokens / sk- API keys / JWT-shaped strings at the broadcast trust boundary while preserving the actionable parts (HTTP status, error code, human-readable provider message) — over-redacting would defeat the whole point of internal#212
  • omitted from the payload when nil so the canvas's "has actionable reason" guard doesn't false-positive on empty-string keys

Test plan

  • go test ./internal/handlers/...TestLogActivity_Broadcast_IncludesErrorDetail, TestLogActivity_Broadcast_OmitsErrorDetailWhenNil, TestSanitizeErrorDetail_StripsSecretShapes/{4 sub-cases}, TestLogActivity_Broadcast_ErrorDetailIsSanitized pass
  • full handlers suite green (15.7s)

Reason field shape

On the wire, the ACTIVITY_LOGGED event payload now carries:

{
  "activity_type": "a2a_receive",
  "status": "error",
  "error_detail": "Anthropic 403 oauth_org_not_allowed: Your organization has disabled Claude subscription access for Claude Code",
  ...
}

Key is omitted when nil/empty so the canvas can gracefully degrade against an older ws-server.

Refs: internal#212, feedback_surface_actionable_failure_reason_to_user

## Summary - include `error_detail` on the `ACTIVITY_LOGGED` WebSocket broadcast so the canvas chat-tab error banner can show *why* (provider status + error code + human message) instead of the hardcoded opaque "Agent error (Exception) — see workspace logs for details." string - defense-in-depth `sanitizeErrorDetailForBroadcast` redacts bearer tokens / `sk-` API keys / JWT-shaped strings at the broadcast trust boundary while preserving the actionable parts (HTTP status, error code, human-readable provider message) — over-redacting would defeat the whole point of internal#212 - omitted from the payload when nil so the canvas's "has actionable reason" guard doesn't false-positive on empty-string keys ## Test plan - [x] `go test ./internal/handlers/...` — `TestLogActivity_Broadcast_IncludesErrorDetail`, `TestLogActivity_Broadcast_OmitsErrorDetailWhenNil`, `TestSanitizeErrorDetail_StripsSecretShapes/{4 sub-cases}`, `TestLogActivity_Broadcast_ErrorDetailIsSanitized` pass - [x] full handlers suite green (15.7s) ## Reason field shape On the wire, the `ACTIVITY_LOGGED` event payload now carries: ```json { "activity_type": "a2a_receive", "status": "error", "error_detail": "Anthropic 403 oauth_org_not_allowed: Your organization has disabled Claude subscription access for Claude Code", ... } ``` Key is omitted when nil/empty so the canvas can gracefully degrade against an older ws-server. Refs: internal#212, feedback_surface_actionable_failure_reason_to_user
core-devops added 1 commit 2026-05-19 00:29:34 +00:00
fix(workspace-server): surface secret-safe error_detail on ACTIVITY_LOGGED broadcast (internal#212)
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 21s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 10s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Failing after 8s
Harness Replays / Harness Replays (pull_request) Has been skipped
MCP Stdio Transport Regression / MCP stdio with regular-file stdout (pull_request) Successful in 56s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 18s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
E2E Chat / detect-changes (pull_request) Successful in 24s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 41s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 58s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 47s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 14s
qa-review / approved (pull_request) Failing after 8s
gate-check-v3 / gate-check (pull_request) Successful in 11s
sop-checklist / na-declarations (pull_request) N/A: (none)
publish-runtime-autobump / pr-validate (pull_request) Successful in 1m12s
sop-checklist / all-items-acked (pull_request) Successful in 11s
security-review / approved (pull_request) Failing after 12s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 36s
sop-tier-check / tier-check (pull_request) Successful in 13s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m53s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 2m0s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 25s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 24s
E2E Chat / E2E Chat (pull_request) Failing after 53s
CI / Python Lint & Test (pull_request) Successful in 6m10s
CI / Platform (Go) (pull_request) Successful in 7m0s
CI / Canvas (Next.js) (pull_request) Successful in 8m36s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1m18s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) emitter-null compensating success (feedback_gitea_emitter_null_state_blocks_merge); CI ran, state never persisted by Gitea 1.22.6 emitter
audit-force-merge / audit (pull_request) Successful in 17s
94eff31c20
When an a2a_receive row is persisted with status="error" the DB column
error_detail already carries the actionable cause (provider HTTP
status, error code, provider human message). The live ACTIVITY_LOGGED
broadcast dropped it, so the canvas chat-tab error banner fell back
to a hardcoded "Agent error (Exception) — see workspace logs for
details." string with no logs tab to navigate to.

Include error_detail in the broadcast payload, omitted when nil so the
canvas's "has actionable reason" guard doesn't false-positive on empty
keys. Defense-in-depth: a sanitizeErrorDetailForBroadcast scrubber
redacts anything that looks credential-shaped (bearer tokens, sk-
prefixed API keys, JWTs) while preserving the actionable parts
(status codes, error codes, human-readable provider messages) — over-
redacting would defeat the whole point of internal#212.

Tests pin: detail surfaces on the wire, omitted when nil, scrubber
removes secret shapes but keeps actionable text, scrubber survives
the broadcast round-trip.

Refs: internal#212
core-be approved these changes 2026-05-19 01:53:33 +00:00
core-be left a comment
Member

5-axis review: correctness OK (secret-safe scrubber preserves actionable text, JWT shape covered, length cap honored); readability strong (comments map each regex to a class); arch OK (re-scrub at broadcast layer is defense-in-depth on top of runtime _sanitize_for_external); security OK (sk-proj-/Bearer/JWT shapes covered, prefix preserved, body capped 4096); perf OK (regex compiled at package init, single replace per call). Tests pin the actionable-substring contract AND the redaction-must-happen contract. APPROVED.

5-axis review: correctness OK (secret-safe scrubber preserves actionable text, JWT shape covered, length cap honored); readability strong (comments map each regex to a class); arch OK (re-scrub at broadcast layer is defense-in-depth on top of runtime _sanitize_for_external); security OK (sk-proj-/Bearer/JWT shapes covered, prefix preserved, body capped 4096); perf OK (regex compiled at package init, single replace per call). Tests pin the actionable-substring contract AND the redaction-must-happen contract. APPROVED.
core-qa approved these changes 2026-05-19 01:53:33 +00:00
core-qa left a comment
Member

Reviewed for QA discipline: tests pin both positive (actionable substrings survive) and negative (secret shapes redacted) assertions; nil-error_detail omission test matches request_body/response_body precedent. Real broadcast-layer assertion via recordingBroadcaster, no mocks. APPROVED.

Reviewed for QA discipline: tests pin both positive (actionable substrings survive) and negative (secret shapes redacted) assertions; nil-error_detail omission test matches request_body/response_body precedent. Real broadcast-layer assertion via recordingBroadcaster, no mocks. APPROVED.
hongming merged commit 83ad7e252b into main 2026-05-19 01:56:11 +00:00
Sign in to join this conversation.
No Reviewers
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1549