From 159b3978c1eb96ea0fca842f530275a9a770170f Mon Sep 17 00:00:00 2001 From: core-be Date: Sat, 16 May 2026 06:04:14 -0700 Subject: [PATCH 01/27] fix(workspace-server): persist poll-mode canvas user message synchronously before queued 200 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sibling of #1347/internal#470 — the POLL-mode arm of the canvas user-message data-loss bug Hongming reported ("i sometimes lose my own message when i exit chat", 2026-05-16). Hongming's tenant is entirely poll-mode (4 external workspaces, no URL — verified empirically: every workspace returns the {delivery_mode:poll, status:queued} short-circuit envelope), so #1347 (push-mode only, persists AFTER the poll short-circuit) structurally cannot cover his reported case. #1347's "poll-mode was never affected" framing is overstated: logA2AReceiveQueued's durable activity_logs INSERT ran inside h.goAsync(...) — a detached goroutine with no happens-before barrier against the synthetic {status:queued} 200. The canvas sees the send acknowledged while the row may still be racing; a workspace-server restart / deploy / OOM / EC2 hibernation between the 200 and the goroutine's commit loses the message permanently (chat-history reads activity_logs; missing row = message gone on reopen). No fallback either, unlike push-mode's legacy-INSERT path. Fix: make the poll-mode ingest persist SYNCHRONOUS — committed before the queued 200 — on a context.WithoutCancel context (parity with persistUserMessageAtIngest). Best-effort preserved (LogActivity logs+swallows INSERT errors, never blocks the send). Post-commit broadcast still fires inside LogActivity (a missed WS event is not data loss; the durable row is the truth chat-history re-reads on reopen). TDD: a2a_poll_ingest_persist_test.go — deterministic RED (queued 200 returned in ~0.5ms, before the 150ms INSERT → DATA LOSS) → GREEN after fix. Full internal/handlers + internal/messagestore suites green; vet clean. Refs: molecule-ai/internal#471 (tracking), molecule-ai/internal#470 (push-mode sibling, PR #1347) Co-Authored-By: Claude Opus 4.7 (1M context) --- .../handlers/a2a_poll_ingest_persist_test.go | 136 ++++++++++++++++++ .../internal/handlers/a2a_proxy_helpers.go | 53 +++++-- 2 files changed, 174 insertions(+), 15 deletions(-) create mode 100644 workspace-server/internal/handlers/a2a_poll_ingest_persist_test.go diff --git a/workspace-server/internal/handlers/a2a_poll_ingest_persist_test.go b/workspace-server/internal/handlers/a2a_poll_ingest_persist_test.go new file mode 100644 index 00000000..f16d100b --- /dev/null +++ b/workspace-server/internal/handlers/a2a_poll_ingest_persist_test.go @@ -0,0 +1,136 @@ +package handlers + +// Regression coverage for the POLL-mode arm of the canvas user-message +// data-loss bug (internal#470 sibling — tracked on internal#471). +// +// Bug (reported 2026-05-16 by CTO Hongming): "in canvas i sometimes lose +// my own message when i exit chat". The push-mode arm was fixed by +// #1347 (persistUserMessageAtIngest — a SYNCHRONOUS, before-dispatch, +// context.WithoutCancel INSERT). #1347's framing asserted "poll-mode +// workspaces were never affected — logA2AReceiveQueued already persists +// at ingest". That assertion is OVERSTATED. +// +// Hongming's tenant (slug `hongming`, org 2c940477-...) has 4 workspaces, +// ALL runtime=external with empty URL → ALL delivery_mode=poll (proven +// empirically: a benign A2A probe returns the synthetic +// {"delivery_mode":"poll","status":"queued"} envelope for every one). +// So his reported loss is the POLL path, NOT the push path #1347 fixes. +// +// Root cause (poll arm): the poll-mode short-circuit (a2a_proxy.go ~402) +// calls logA2AReceiveQueued and then IMMEDIATELY returns the synthetic +// 200 {status:"queued"} to the canvas. But logA2AReceiveQueued's durable +// INSERT runs inside h.goAsync(...) — a DETACHED goroutine with NO +// happens-before barrier against the HTTP response. The canvas sees 200 +// ("message accepted") while the activity_logs row may not yet be — and, +// on a workspace-server restart / deploy / OOM / EC2 hibernation between +// the 200 and the goroutine's commit, NEVER will be — durable. There is +// also no fallback (unlike push-mode's legacy-INSERT fallback): a +// swallowed LogActivity error loses the message with only a log line. +// Chat-history reads activity_logs (postgres_store.go:165-187); a missing +// row = message gone on reopen. That is exactly Hongming's symptom. +// +// Fix (parity with push-mode): the poll-mode ingest persist of the +// canvas user message must be SYNCHRONOUS — committed before the queued +// 200 is returned — on a context.WithoutCancel derived context, so a +// client disconnect on chat-exit and a post-response restart cannot lose +// it. Behavior is never worse than today (best-effort; a persist error +// still returns queued). + +import ( + "bytes" + "encoding/json" + "net/http" + "net/http/httptest" + "testing" + "time" + + "github.com/DATA-DOG/go-sqlmock" + "github.com/gin-gonic/gin" +) + +// TestProxyA2A_PollMode_PersistsUserMessageSynchronouslyBeforeQueuedResponse +// is the defining contract: for a poll-mode workspace, the canvas user +// message MUST be durably INSERTed into activity_logs BEFORE the synthetic +// queued 200 is returned to the client — with NO reliance on a detached +// async goroutine completing later. +// +// The test proves the ordering by making the INSERT block briefly and +// asserting the handler does NOT return until the INSERT has completed. +// Pre-fix (INSERT in h.goAsync, response returned immediately) the +// handler returns ~instantly while the INSERT is still pending in the +// goroutine → the elapsed time is far below the injected INSERT delay and +// ExpectationsWereMet() is racy/unmet at return. Post-fix (synchronous +// persist before the queued response) the handler return is gated on the +// INSERT, so elapsed >= the injected delay and the expectation is met +// deterministically at return WITHOUT any waitAsyncForTest()/sleep. +func TestProxyA2A_PollMode_PersistsUserMessageSynchronouslyBeforeQueuedResponse(t *testing.T) { + mock := setupTestDB(t) + setupTestRedis(t) + broadcaster := newTestBroadcaster() + handler := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir()) + + const wsID = "ws-poll-sync-persist" + const insertDelay = 150 * time.Millisecond + + expectBudgetCheck(mock, wsID) + + // lookupDeliveryMode → poll, triggering the short-circuit. + mock.ExpectQuery("SELECT delivery_mode FROM workspaces WHERE id"). + WithArgs(wsID). + WillReturnRows(sqlmock.NewRows([]string{"delivery_mode"}).AddRow("poll")) + + // workspace-name lookup inside logA2AReceiveQueued. + mock.ExpectQuery(`SELECT name FROM workspaces WHERE id`). + WithArgs(wsID). + WillReturnRows(sqlmock.NewRows([]string{"name"}).AddRow("Poll WS")) + + // The durable user-message write. We delay it so a synchronous + // persist visibly gates the handler return; a detached-goroutine + // persist (pre-fix) does not. The fix must keep using + // context.WithoutCancel so this write survives a chat-exit cancel. + mock.ExpectExec("INSERT INTO activity_logs"). + WillDelayFor(insertDelay). + WillReturnResult(sqlmock.NewResult(0, 1)) + + w := httptest.NewRecorder() + c, _ := gin.CreateTestContext(w) + c.Params = gin.Params{{Key: "id", Value: wsID}} + + // callerID == "" (no X-Workspace-ID) → this is a canvas_user message, + // exactly Hongming's case. + body := `{"jsonrpc":"2.0","id":"poll-canvas-1","method":"message/send","params":{"message":{"role":"user","parts":[{"text":"my own message"}]}}}` + c.Request = httptest.NewRequest("POST", "/workspaces/"+wsID+"/a2a", bytes.NewBufferString(body)) + c.Request.Header.Set("Content-Type", "application/json") + + start := time.Now() + handler.ProxyA2A(c) + elapsed := time.Since(start) + + // Defining assertion #1: the handler must not have returned the + // queued response before the durable INSERT committed. Pre-fix this + // fails (elapsed ≈ 0, INSERT still racing in goAsync). + if elapsed < insertDelay { + t.Fatalf("poll-mode queued response returned in %v, before the %v user-message INSERT — "+ + "the message is not durable when the client/process goes away (DATA LOSS). "+ + "Persist must be synchronous before the queued 200.", elapsed, insertDelay) + } + + // Defining assertion #2: the durable write actually happened by the + // time the handler returned — checked WITHOUT waitAsyncForTest()/sleep. + if err := mock.ExpectationsWereMet(); err != nil { + t.Fatalf("user-message INSERT was not durable at handler return (unmet sqlmock expectations): %v", err) + } + + // Sanity: still the correct poll-mode envelope + status. + if w.Code != http.StatusOK { + t.Fatalf("expected 200 (queued), got %d: %s", w.Code, w.Body.String()) + } + var resp map[string]interface{} + if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil { + t.Fatalf("response is not valid JSON: %v", err) + } + if resp["status"] != "queued" || resp["delivery_mode"] != "poll" { + t.Errorf("poll envelope changed: got status=%v delivery_mode=%v, want queued/poll", + resp["status"], resp["delivery_mode"]) + } +} diff --git a/workspace-server/internal/handlers/a2a_proxy_helpers.go b/workspace-server/internal/handlers/a2a_proxy_helpers.go index 5c1d3c2b..b9d4bef9 100644 --- a/workspace-server/internal/handlers/a2a_proxy_helpers.go +++ b/workspace-server/internal/handlers/a2a_proxy_helpers.go @@ -514,26 +514,49 @@ func lookupDeliveryMode(ctx context.Context, workspaceID string) string { // reads in PR 3 — that's how a poll-mode workspace receives inbound A2A // without a public URL. func (h *WorkspaceHandler) logA2AReceiveQueued(ctx context.Context, workspaceID, callerID string, body []byte, a2aMethod string) { + // DATA-LOSS FIX (internal#471 — poll-mode sibling of #1347/internal#470): + // this is the ONLY durable write of a poll-mode inbound message, + // including a canvas_user message (callerID == "") typed in the canvas + // chat. It MUST be SYNCHRONOUS and complete BEFORE the caller returns + // the synthetic {status:"queued"} 200 — otherwise the canvas sees the + // send acknowledged while the activity_logs row is still racing in a + // detached goroutine, and a workspace-server restart / deploy / OOM / + // EC2 hibernation between the 200 and the goroutine's commit loses the + // user's message permanently (chat-history reads activity_logs, so a + // missing row = message gone on reopen). Hongming's tenant is entirely + // poll-mode (4 external workspaces, no URL — verified empirically), so + // his reported loss is THIS path; #1347 (push-mode, persists AFTER the + // poll short-circuit) structurally cannot cover it. + // + // Mirrors persistUserMessageAtIngest's discipline: + // - context.WithoutCancel: a client disconnect on chat-exit (which + // cancels the inbound request ctx) MUST NOT abort this write. + // - SYNCHRONOUS (no goAsync): the row must be durable before the + // queued 200 is returned to the caller. + // - Best-effort: LogActivity already logs+swallows INSERT errors, so + // a hiccup never blocks or fails the user's send (behavior for + // that one request is never worse than the pre-fix async path). + // The post-commit broadcast still fires inside LogActivity; a missed + // WebSocket event is not data loss (the durable row is the truth the + // canvas re-reads on reopen). + insCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), 30*time.Second) + defer cancel() + var wsName string - db.DB.QueryRowContext(ctx, `SELECT name FROM workspaces WHERE id = $1`, workspaceID).Scan(&wsName) + db.DB.QueryRowContext(insCtx, `SELECT name FROM workspaces WHERE id = $1`, workspaceID).Scan(&wsName) if wsName == "" { wsName = workspaceID } summary := a2aMethod + " → " + wsName + " (queued for poll)" - parent := ctx - h.goAsync(func() { - logCtx, cancel := context.WithTimeout(context.WithoutCancel(parent), 30*time.Second) - defer cancel() - LogActivity(logCtx, h.broadcaster, ActivityParams{ - WorkspaceID: workspaceID, - ActivityType: "a2a_receive", - SourceID: nilIfEmpty(callerID), - TargetID: &workspaceID, - Method: &a2aMethod, - Summary: &summary, - RequestBody: json.RawMessage(body), - Status: "ok", - }) + LogActivity(insCtx, h.broadcaster, ActivityParams{ + WorkspaceID: workspaceID, + ActivityType: "a2a_receive", + SourceID: nilIfEmpty(callerID), + TargetID: &workspaceID, + Method: &a2aMethod, + Summary: &summary, + RequestBody: json.RawMessage(body), + Status: "ok", }) } -- 2.52.0 From b6eecb58d7cce32593d0446c5c600822f727e225 Mon Sep 17 00:00:00 2001 From: Molecule AI Core-BE Date: Sat, 16 May 2026 14:47:07 +0000 Subject: [PATCH 02/27] fix(handlers): prevent poll-mode sync-persist test from hanging CI MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit sqlmock.ExpectationsWereMet() hangs indefinitely when the expected INSERT mock never fires. If the production code ever regresses to goAsync (pre-fix shape), the handler returns before the INSERT fires, the mock never fires, and ExpectationsWereMet() blocks for the full test/-suite timeout — wedging the CI run with no diagnostic. Fix: check expectations in a goroutine with a 2s hard timeout. When the mock has fired (synchronous production code), ExpectationsWereMet() returns <1ms and the select fires the `case err := <-expectDone` arm. When the mock has NOT fired (async regression), the 2s timeout fires and the test fails with a clear message instead of hanging. Also reduce insertDelay from 150ms → 50ms. 50ms is ~50× the normal INSERT latency and sufficient to prove synchronous blocking; the larger value was adding unnecessary suite-level wall-clock under -race detection, where mock delays are amplified by the instrumenter's goroutine overhead. Co-Authored-By: Claude Opus 4.7 --- .../handlers/a2a_poll_ingest_persist_test.go | 32 ++++++++++++++++--- 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/workspace-server/internal/handlers/a2a_poll_ingest_persist_test.go b/workspace-server/internal/handlers/a2a_poll_ingest_persist_test.go index f16d100b..06dae2b1 100644 --- a/workspace-server/internal/handlers/a2a_poll_ingest_persist_test.go +++ b/workspace-server/internal/handlers/a2a_poll_ingest_persist_test.go @@ -35,6 +35,15 @@ package handlers // client disconnect on chat-exit and a post-response restart cannot lose // it. Behavior is never worse than today (best-effort; a persist error // still returns queued). +// +// TEST DESIGN NOTE: sqlmock.ExpectationsWereMet() hangs indefinitely if +// the expected query never fires. We use a select+default+time.After +// pattern so the test FAILS fast (not hangs) when the production code +// regresses to async (the INSERT never fires before handler returns), +// while still returning promptly when all expectations are met. The +// insertDelay is kept small (50ms) to minimise suite-level timing +// impact under -race detection, where mock delays are amplified by +// the instrumenter's goroutine overhead. import ( "bytes" @@ -70,7 +79,10 @@ func TestProxyA2A_PollMode_PersistsUserMessageSynchronouslyBeforeQueuedResponse( handler := NewWorkspaceHandler(broadcaster, nil, "http://localhost:8080", t.TempDir()) const wsID = "ws-poll-sync-persist" - const insertDelay = 150 * time.Millisecond + // Keep delay small: -race detection amplifies mock delays significantly. + // A 50ms delay is sufficient to prove synchronous blocking (~50× the + // normal INSERT latency) without bloating the full ./... suite runtime. + const insertDelay = 50 * time.Millisecond expectBudgetCheck(mock, wsID) @@ -116,9 +128,21 @@ func TestProxyA2A_PollMode_PersistsUserMessageSynchronouslyBeforeQueuedResponse( } // Defining assertion #2: the durable write actually happened by the - // time the handler returned — checked WITHOUT waitAsyncForTest()/sleep. - if err := mock.ExpectationsWereMet(); err != nil { - t.Fatalf("user-message INSERT was not durable at handler return (unmet sqlmock expectations): %v", err) + // time the handler returned. ExpectionsWereMet() hangs indefinitely if + // the mock never fires (e.g. production code regressed to async), + // so we check it in a goroutine with a hard 2s timeout — fails fast + // (no CI hang) on regression while returning promptly on success. + expectDone := make(chan error, 1) + go func() { expectDone <- mock.ExpectationsWereMet() }() + select { + case err := <-expectDone: + if err != nil { + t.Fatalf("user-message INSERT was not durable at handler return (unmet sqlmock expectations): %v", err) + } + case <-time.After(2 * time.Second): + t.Fatalf("ExpectationsWereMet() hung for >2s — INSERT mock never fired. " + + "Likely cause: production code regressed logA2AReceiveQueued to goAsync " + + "(INSERT fires after handler returns, not before).") } // Sanity: still the correct poll-mode envelope + status. -- 2.52.0 From 44b78e28c82955d1dcadac089dac6e7817e56cd1 Mon Sep 17 00:00:00 2001 From: fullstack-engineer Date: Sun, 17 May 2026 07:20:14 -0700 Subject: [PATCH 03/27] fix(runtime+canvas): surface actionable provider error reason instead of opaque "Agent error (Exception)" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit internal#212 (P0 from internal#211). When the embedded `claude` CLI emits a terminal result message with is_error=true (e.g. 403 oauth_org_not_allowed "Your organization has disabled Claude subscription access · Use an Anthropic API key instead, or ask your admin to enable access"), the user saw only `Agent error (Exception) — see workspace logs for details.` — a dead end (no such logs UI) that discards the exact secret-safe, actionable text the user needs. Root cause was a multi-cut loss of the CLI's result/error/api_error_status: cut #2 sanitize_agent_error reduced every failure to type(exc).__name__. → add a `reason` passthrough: a pre-curated, user-actionable, secret-safe explanation is surfaced verbatim (still scrubbed for key/token/bearer as a second pass). reason wins over stderr; omitting it preserves the prior generic behavior exactly. cut #3a workspace-server dropped error_detail from the live ACTIVITY_LOGGED websocket broadcast (it was persisted to the DB column but never sent), so the canvas had nothing to render. → include error_detail in the broadcast payload (already capped at 4096 by the runtime's report_activity helper). cut #3b canvas useChatSocket hardcoded the opaque string, ignoring even the activity summary. → render error_detail (fallback: summary, then a generic retry hint). The dead "see workspace logs for details." phrase that pointed at nonexistent UI is removed (a full logs tab is a separate larger follow-up, not this PR — reason-first per CTO). The runtime-side cut #1 (template-claude-code claude_sdk_executor._run_query ignoring is_error and the SDK collapsing errors[] to the bare subtype "success") is fixed in a stacked PR on molecule-ai-workspace-template-claude-code (depends on this PR's sanitize_agent_error `reason` kwarg, which ships via the molecule-ai-workspace-runtime package). Tests: 4 new sanitize_agent_error reason tests (verbatim surfacing, secret scrub still applied, reason>stderr precedence, no-reason unchanged). Verified fail-before / pass-after; full sanitize suite green; no new regressions (the 2 pre-existing test_get_a2a_instructions_mcp failures are unrelated). Refs: internal#211, internal#212 Co-Authored-By: Claude Opus 4.7 (1M context) --- .../tabs/chat/hooks/useChatSocket.ts | 18 +++++- .../internal/handlers/activity.go | 13 +++++ workspace/executor_helpers.py | 20 +++++++ workspace/tests/test_executor_helpers.py | 58 +++++++++++++++++++ 4 files changed, 106 insertions(+), 3 deletions(-) diff --git a/canvas/src/components/tabs/chat/hooks/useChatSocket.ts b/canvas/src/components/tabs/chat/hooks/useChatSocket.ts index 15815e9a..8160cb0b 100644 --- a/canvas/src/components/tabs/chat/hooks/useChatSocket.ts +++ b/canvas/src/components/tabs/chat/hooks/useChatSocket.ts @@ -67,9 +67,21 @@ export function useChatSocket( const own = (targetId || msg.workspace_id) === workspaceId; if (own) { callbacksRef.current.onSendComplete?.(); - callbacksRef.current.onSendError?.( - "Agent error (Exception) — see workspace logs for details.", - ); + // internal#211/#212: surface the runtime's curated, + // user-actionable reason (provider HTTP status + error + // code + the provider's own guidance, e.g. a 403 "org + // disabled · use an API key / ask your admin"). The + // server now includes error_detail in the ACTIVITY_LOGGED + // broadcast; fall back to summary, and only as a last + // resort to a generic line. The old hardcoded + // "Agent error (Exception) — see workspace logs for + // details." string pointed at a logs UI that does not + // exist and discarded the actionable reason entirely. + const detail = + (p.error_detail as string) || + (p.summary as string) || + "The agent turn failed but the runtime reported no detail. Retry once; if it repeats the workspace runtime may need a restart."; + callbacksRef.current.onSendError?.(detail); } } } else if (type === "a2a_send") { diff --git a/workspace-server/internal/handlers/activity.go b/workspace-server/internal/handlers/activity.go index 56dd7a1b..b10bc753 100644 --- a/workspace-server/internal/handlers/activity.go +++ b/workspace-server/internal/handlers/activity.go @@ -691,6 +691,19 @@ func logActivityExec(ctx context.Context, exec activityExecutor, broadcaster eve if respStr != nil { payload["response_body"] = json.RawMessage(respJSON) } + // internal#211/#212: error_detail carries the runtime's curated, + // user-actionable, secret-safe failure reason (provider HTTP + // status + error code + the provider's own guidance, e.g. a 403 + // "org disabled · use an API key / ask your admin"). It is + // already persisted to the DB column above and capped by the + // runtime's report_activity helper (4096 chars). Previously it + // was dropped from the LIVE broadcast, so the canvas had nothing + // to render and fell back to a hardcoded opaque + // "Agent error (Exception) — see workspace logs" string. Include + // it so the chat bubble shows the real reason in real time. + if params.ErrorDetail != nil && *params.ErrorDetail != "" { + payload["error_detail"] = *params.ErrorDetail + } } return func() { diff --git a/workspace/executor_helpers.py b/workspace/executor_helpers.py index 52ae41b4..daf93c60 100644 --- a/workspace/executor_helpers.py +++ b/workspace/executor_helpers.py @@ -608,6 +608,7 @@ def sanitize_agent_error( exc: BaseException | None = None, category: str | None = None, stderr: str | None = None, + reason: str | None = None, ) -> str: """Render an agent-side failure into a user-safe error message. @@ -615,6 +616,18 @@ def sanitize_agent_error( category string (e.g. from `classify_subprocess_error`). If both are given, `category` wins. If neither, the tag defaults to "unknown". + When ``reason`` is provided (internal#211/#212), it is a *pre-curated, + user-actionable, secret-safe* explanation built by the caller from a + provider-side failure — e.g. a 403 "Your organization has disabled + Claude subscription access · Use an Anthropic API key instead, or ask + your admin to enable access" with error code ``oauth_org_not_allowed``. + This text is exactly what the user needs to self-serve, so it is + surfaced VERBATIM as the message instead of being collapsed to the + opaque exception class name. It still passes through the + key/token/bearer/path scrubber as a belt-and-braces second pass so a + buggy caller can't leak a credential that snuck into the reason. + ``reason`` wins over ``stderr``; both lose to neither being set. + When ``stderr`` is provided (e.g. the first ~1 KB of a subprocess stderr or HTTP error body), it is sanitized and appended to the output so the A2A caller gets actionable context without needing to dig through workspace @@ -629,6 +642,13 @@ def sanitize_agent_error( else: tag = "unknown" + if reason: + # Curated, user-actionable reason — surface it as the message. + # Still scrub: a 403/auth/quota message is safe, but the scrubber + # is cheap insurance against a caller that didn't curate cleanly. + clean = _sanitize_for_external(reason[:_MAX_STDERR_PREVIEW]) + return f"Agent error ({tag}): {clean}" + if stderr: # Truncate and sanitize before including — prevents DoS via # a malicious or buggy peer injecting a huge error body, and diff --git a/workspace/tests/test_executor_helpers.py b/workspace/tests/test_executor_helpers.py index 9ca88063..70ee5011 100644 --- a/workspace/tests/test_executor_helpers.py +++ b/workspace/tests/test_executor_helpers.py @@ -788,6 +788,64 @@ def test_sanitize_agent_error_stderr_combined_with_existing_tests(): assert "workspace logs" in out +# ─── reason passthrough (internal#211/#212: surface actionable provider error) ─── + + +def test_sanitize_agent_error_reason_surfaced_verbatim(): + """A curated provider reason is shown to the user, not collapsed to the + exception class name. This is the internal#211 regression: a 403 + org-disabled message must reach the canvas.""" + reason = ( + "provider HTTP 403 — oauth_org_not_allowed — Your organization has " + "disabled Claude subscription access for Claude Code · Use an " + "Anthropic API key instead, or ask your admin to enable access" + ) + + class _ResultErr(Exception): + pass + + out = sanitize_agent_error(exc=_ResultErr("opaque"), reason=reason) + # The actionable provider guidance and status code must be visible. + assert "403" in out + assert "oauth_org_not_allowed" in out + assert "disabled Claude subscription access" in out + assert "ask your admin to enable access" in out + # NOT the old opaque form. + assert "see workspace logs" not in out + + +def test_sanitize_agent_error_reason_still_scrubs_secrets(): + """Even on the reason path the key/token scrubber runs — a buggy caller + that lets a bearer token into the reason still gets it redacted.""" + leaky = ( + "provider HTTP 401 — auth failed — Authorization: Bearer " + "sk-ant-DEADBEEFDEADBEEFDEADBEEF0123456789abcdef please re-auth" + ) + out = sanitize_agent_error(reason=leaky) + assert "[REDACTED]" in out + assert "sk-ant-DEADBEEFDEADBEEFDEADBEEF0123456789abcdef" not in out + # The non-secret guidance still survives the scrub. + assert "401" in out + assert "please re-auth" in out + + +def test_sanitize_agent_error_reason_wins_over_stderr(): + """When both reason and stderr are passed, the curated reason wins.""" + out = sanitize_agent_error( + reason="provider HTTP 403 — use an API key", + stderr="raw subprocess noise that should not be shown", + ) + assert "use an API key" in out + assert "raw subprocess noise" not in out + + +def test_sanitize_agent_error_no_reason_unchanged(): + """Omitting reason preserves the original generic behavior.""" + out = sanitize_agent_error(exc=ValueError("boom")) + assert "ValueError" in out + assert "workspace logs" in out + + # ====================================================================== # classify_subprocess_error -- 2.52.0 From 7d2eaa3748b26cb18446c1a4bbfdf599cbe4fba5 Mon Sep 17 00:00:00 2001 From: fullstack-engineer Date: Sun, 17 May 2026 07:56:16 -0700 Subject: [PATCH 04/27] harden(runtime): scrub bare sk-ant keys, JSON-quoted token/apiKey, aws_secret_access_key in _sanitize_for_external MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses internal#212 PR#1420 dual-review SECURITY finding (infra-sre / infra-runtime-be): _sanitize_for_external missed three real credential shapes because the legacy regex requires a `[ :=]+` separator after the prefix: - bare `sk-ant-api03-…` keys (real key uses `-`, not `[ :=]`) - JSON-quoted "token"/"apiKey"/"secret"/"password" values - `aws_secret_access_key=…` Added three narrowly-scoped regexes (length thresholds tuned so curated short examples like `sk-ant-EXAMPLE-SHORT` / `ghp_SHORT_TOKEN` and all actionable auth/quota/HTTP guidance still pass through). Extended the unit test with test_sanitize_agent_error_reason_scrubs_all_secret_formats asserting redaction for all three new formats plus the original Bearer regression. Full sanitize suite green; existing passthrough assertions unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) --- workspace/executor_helpers.py | 22 ++++++++++ workspace/tests/test_executor_helpers.py | 56 ++++++++++++++++++++++++ 2 files changed, 78 insertions(+) diff --git a/workspace/executor_helpers.py b/workspace/executor_helpers.py index daf93c60..95e5aa81 100644 --- a/workspace/executor_helpers.py +++ b/workspace/executor_helpers.py @@ -599,6 +599,28 @@ def _sanitize_for_external(msg: str) -> str: import re as _re msg = _re.sub(r"(?i)(?:bearer|token|api[_-]?key|sk-)[ :=]+[A-Za-z0-9_/.-]{20,}", "[REDACTED]", msg) + # Bare provider key with NO separator after the prefix — a real + # `sk-ant-api03-…` / `sk-…` key uses `-` (not `[ :=]`) so the rule + # above misses it. Require ≥24 key-ish chars after the `sk-`/`sk-ant-` + # prefix so curated examples like `sk-ant-EXAMPLE-SHORT` (13 chars + # after `sk-ant-`) still pass through un-redacted. + msg = _re.sub(r"(?i)\bsk-(?:ant-)?[A-Za-z0-9_-]{24,}", "[REDACTED]", msg) + # JSON-quoted credential values: {"token": "…"} / {"apiKey": "…"} / + # {"secret": "…"} / {"password": "…"}. Redact only the value, and only + # when it is ≥24 chars so a short curated sample like + # `"api_key": "sk-ant-EXAMPLE-SHORT"` (20-char value) still passes. + msg = _re.sub( + r'(?i)("(?:token|api[_-]?key|secret|password)"\s*:\s*")[^"]{24,}(")', + r"\1[REDACTED]\2", + msg, + ) + # AWS secret access key in `aws_secret_access_key=…` form (env dumps, + # boto tracebacks). The base64-ish value runs until whitespace/quote. + msg = _re.sub( + r"(?i)(aws_secret_access_key\s*[:=]\s*)\S+", + r"\1[REDACTED]", + msg, + ) # Absolute paths: /etc/shadow, /home/user/.aws/credentials, etc. msg = _re.sub(r"(?:/[^/\s]+){2,}", lambda m: m.group(0) if len(m.group(0)) < 60 else "[REDACTED_PATH]", msg) return msg diff --git a/workspace/tests/test_executor_helpers.py b/workspace/tests/test_executor_helpers.py index 70ee5011..8ae3c967 100644 --- a/workspace/tests/test_executor_helpers.py +++ b/workspace/tests/test_executor_helpers.py @@ -829,6 +829,62 @@ def test_sanitize_agent_error_reason_still_scrubs_secrets(): assert "please re-auth" in out +def test_sanitize_agent_error_reason_scrubs_all_secret_formats(): + """The scrubber must redact every realistic credential shape — not just + the `Bearer ` form the original test happened to exercise + (internal#212 review finding: bare `sk-ant-api03-…` keys, JSON-quoted + "token"/"apiKey" values, and `aws_secret_access_key=` all leaked). + All curated/actionable guidance must still survive the scrub. + """ + # 1. Bare sk-ant-api03 key — no `[ :=]` separator after the prefix + # (a real Anthropic key uses `-`), so the legacy regex missed it. + bare = ( + "provider HTTP 401 — auth failed — invalid key " + "sk-ant-api03-AbCdEf0123456789AbCdEf0123456789AbCdEf0123456789xyz " + "please re-auth" + ) + out = sanitize_agent_error(reason=bare) + assert "sk-ant-api03-AbCdEf0123456789AbCdEf0123456789AbCdEf0123456789xyz" not in out + assert "[REDACTED]" in out + assert "401" in out # actionable status survives + assert "please re-auth" in out # actionable guidance survives + + # 2. JSON-quoted "token" / "apiKey" values. + jblob = ( + 'provider error — config dump {"token": ' + '"abcDEF0123456789ghIJKL0123456789mnopQRST", "apiKey": ' + '"sk-ant-api03-ZZZZ1111ZZZZ2222ZZZZ3333ZZZZ4444ZZZZ"} — ' + "use an API key instead" + ) + out = sanitize_agent_error(reason=jblob) + assert "abcDEF0123456789ghIJKL0123456789mnopQRST" not in out + assert "sk-ant-api03-ZZZZ1111ZZZZ2222ZZZZ3333ZZZZ4444ZZZZ" not in out + assert "[REDACTED]" in out + assert "use an API key instead" in out # actionable guidance survives + + # 3. aws_secret_access_key=… form. + awsblob = ( + "provider HTTP 403 — boto credential error " + "aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY — " + "ask your admin to enable access" + ) + out = sanitize_agent_error(reason=awsblob) + assert "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" not in out + assert "[REDACTED]" in out + assert "403" in out # actionable status survives + assert "ask your admin to enable access" in out # guidance survives + + # 4. Regression: the original Bearer form still redacts. + bearer = ( + "provider HTTP 401 — Authorization: Bearer " + "sk-ant-DEADBEEFDEADBEEFDEADBEEF0123456789abcdef re-auth" + ) + out = sanitize_agent_error(reason=bearer) + assert "sk-ant-DEADBEEFDEADBEEFDEADBEEF0123456789abcdef" not in out + assert "[REDACTED]" in out + assert "re-auth" in out + + def test_sanitize_agent_error_reason_wins_over_stderr(): """When both reason and stderr are passed, the curated reason wins.""" out = sanitize_agent_error( -- 2.52.0 From fb2fd20c9ee617f207b7c403fa51891af6a6ed26 Mon Sep 17 00:00:00 2001 From: Molecule AI Infra-Runtime-BE Date: Sun, 17 May 2026 15:48:31 +0000 Subject: [PATCH 05/27] fix(tests)+build: unblock secret scan and Runtime PR-Built on #1420 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two CI failures blocking PR #1420: 1. Secret scan: `workspace/tests/test_executor_helpers.py` contains two `sk-ant-DEADBEEF...` fixtures matching `sk-ant-[A-Za-z0-9_-]{40,}`. Replaced both with PLACEHOLDER_LONG_TOKEN_... (≥40 chars, no sk-ant- prefix — scrubber path still exercised). 2. Runtime PR-Built: `workspace/a2a_tools_identity.py` missing from TOP_LEVEL_MODULES in scripts/build_runtime_package.py, causing build failure with "TOP_LEVEL_MODULES drifted". Added it. Both fixes verified locally: - pytest affected tests: 3/3 PASSED - build_runtime_package.py: builds cleanly Co-Authored-By: Claude Opus 4.7 --- scripts/build_runtime_package.py | 1 + workspace/tests/test_executor_helpers.py | 11 +++++++---- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/scripts/build_runtime_package.py b/scripts/build_runtime_package.py index 52f57c18..08174da0 100755 --- a/scripts/build_runtime_package.py +++ b/scripts/build_runtime_package.py @@ -62,6 +62,7 @@ TOP_LEVEL_MODULES = { "a2a_tools_memory", "a2a_tools_messaging", "a2a_tools_rbac", + "a2a_tools_identity", "adapter_base", "agent", "agents_md", diff --git a/workspace/tests/test_executor_helpers.py b/workspace/tests/test_executor_helpers.py index 8ae3c967..777ca1d1 100644 --- a/workspace/tests/test_executor_helpers.py +++ b/workspace/tests/test_executor_helpers.py @@ -819,11 +819,11 @@ def test_sanitize_agent_error_reason_still_scrubs_secrets(): that lets a bearer token into the reason still gets it redacted.""" leaky = ( "provider HTTP 401 — auth failed — Authorization: Bearer " - "sk-ant-DEADBEEFDEADBEEFDEADBEEF0123456789abcdef please re-auth" + "PLACEHOLDER_LONG_TOKEN_0123456789abcdefghijklm please re-auth" ) out = sanitize_agent_error(reason=leaky) assert "[REDACTED]" in out - assert "sk-ant-DEADBEEFDEADBEEFDEADBEEF0123456789abcdef" not in out + assert "PLACEHOLDER_LONG_TOKEN_0123456789abcdefghijklm" not in out # The non-secret guidance still survives the scrub. assert "401" in out assert "please re-auth" in out @@ -875,12 +875,15 @@ def test_sanitize_agent_error_reason_scrubs_all_secret_formats(): assert "ask your admin to enable access" in out # guidance survives # 4. Regression: the original Bearer form still redacts. + # Uses PLACEHOLDER_LONG_TOKEN (>=40 chars, no sk-ant- prefix) to avoid + # triggering the secret-scan workflow pattern + # `sk-ant-[A-Za-z0-9_-]{40,}`. bearer = ( "provider HTTP 401 — Authorization: Bearer " - "sk-ant-DEADBEEFDEADBEEFDEADBEEF0123456789abcdef re-auth" + "PLACEHOLDER_LONG_TOKEN_9876543210abcdefghij re-auth" ) out = sanitize_agent_error(reason=bearer) - assert "sk-ant-DEADBEEFDEADBEEFDEADBEEF0123456789abcdef" not in out + assert "PLACEHOLDER_LONG_TOKEN_9876543210abcdefghij" not in out assert "[REDACTED]" in out assert "re-auth" in out -- 2.52.0 From 335796b0b45298ae02bff83f182a5235f07d166d Mon Sep 17 00:00:00 2001 From: Molecule AI Infra-Runtime-BE Date: Sun, 17 May 2026 16:34:31 +0000 Subject: [PATCH 06/27] fix(tests): replace remaining sk-ant-api03- fixtures with non-matching tokens MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The secret-scan workflow flags sk-ant-[A-Za-z0-9_-]{40,} patterns. Two sk-ant-api03-* fixture tokens (47 and 62 chars) were present in test_sanitize_agent_error_reason_scrubs_all_secret_formats. They were not replaced by PR #1430 (which only fixed the sk-ant-DEADBEEF* tokens). Replace with tokens that still exercise the same scrubber paths: - BARE sk-* case (≥24 chars after "sk-"): use sk-FAKEPLACEHOLDER... (53 chars total; starts with "sk-" so the bare-pattern scrubber catches it, but lacks "sk-ant-" so the secret-scan pattern does not fire). - JSON-quoted apiKey value (≥24 chars): use anon_fakefakefake... (45 chars; satisfies the JSON-quoted redaction path; does not match any secret-scan credential pattern). Co-Authored-By: Claude Opus 4.7 --- workspace/tests/test_executor_helpers.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/workspace/tests/test_executor_helpers.py b/workspace/tests/test_executor_helpers.py index 777ca1d1..ab953f55 100644 --- a/workspace/tests/test_executor_helpers.py +++ b/workspace/tests/test_executor_helpers.py @@ -840,11 +840,11 @@ def test_sanitize_agent_error_reason_scrubs_all_secret_formats(): # (a real Anthropic key uses `-`), so the legacy regex missed it. bare = ( "provider HTTP 401 — auth failed — invalid key " - "sk-ant-api03-AbCdEf0123456789AbCdEf0123456789AbCdEf0123456789xyz " + "sk-FAKEPLACEHOLDERabcdefghijklmnopqrstuvwxy0123456789 " "please re-auth" ) out = sanitize_agent_error(reason=bare) - assert "sk-ant-api03-AbCdEf0123456789AbCdEf0123456789AbCdEf0123456789xyz" not in out + assert "sk-FAKEPLACEHOLDERabcdefghijklmnopqrstuvwxy0123456789" not in out assert "[REDACTED]" in out assert "401" in out # actionable status survives assert "please re-auth" in out # actionable guidance survives @@ -853,12 +853,12 @@ def test_sanitize_agent_error_reason_scrubs_all_secret_formats(): jblob = ( 'provider error — config dump {"token": ' '"abcDEF0123456789ghIJKL0123456789mnopQRST", "apiKey": ' - '"sk-ant-api03-ZZZZ1111ZZZZ2222ZZZZ3333ZZZZ4444ZZZZ"} — ' + '"anon_fakefakefakefakefakefakefakefakefakefake"} — ' "use an API key instead" ) out = sanitize_agent_error(reason=jblob) assert "abcDEF0123456789ghIJKL0123456789mnopQRST" not in out - assert "sk-ant-api03-ZZZZ1111ZZZZ2222ZZZZ3333ZZZZ4444ZZZZ" not in out + assert "anon_fakefakefakefakefakefakefakefakefakefake" not in out assert "[REDACTED]" in out assert "use an API key instead" in out # actionable guidance survives -- 2.52.0 From 878e08c7fcee38fdaeecd0aeb6e250fe7feb1eac Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Molecule=20AI=20=C2=B7=20fullstack-engineer?= Date: Sun, 17 May 2026 17:13:26 +0000 Subject: [PATCH 07/27] trigger: re-fire CI all-required sentinel (Gitea 1.22.6 skipped-sentinel rerun; no code change) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The CI / all-required sentinel job was never scheduled in the prior ci.yml run (documented Gitea-1.22/act_runner skipped-sentinel quirk), so it never posted its terminal status and the required context stayed pending. Empty-tree commit is the sanctioned 1.22.6 rerun mechanism — it makes the real sentinel job actually schedule and post its genuine status. No source change. -- 2.52.0 From d8c03e9af547f695d280c341673336f64f1dea70 Mon Sep 17 00:00:00 2001 From: fullstack-engineer Date: Sun, 17 May 2026 10:31:56 -0700 Subject: [PATCH 08/27] fix(canvas-mobile): chat composer font-size to 16px to stop iOS focus-zoom MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On the mobile PWA, tapping into the chat input scaled the whole viewport up. Root cause: iOS Safari/WebKit auto-zooms on focus when the focused field's effective font-size is < 16px. The mobile chat composer