fix(a2a): canvas-user identity bypass without cross-workspace escalation (#1673) #1948
Reference in New Issue
Block a user
Delete Branch "fix/canvas-user-verified-session-1673"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Fixes #1673 (canvas chat silently breaks for poll-mode canvas-user-identity workspaces) without the privilege escalation that held PR #1944. This PR supersedes #1944. (#1944 is left open and untouched.)
The bug (#1673)
validateCallerTokencheckedwsauth.HasAnyLiveTokenbefore the canvas classification. Once an RFC#637 canvas-user identity workspace (e.g.344a2623-…) acquired live tokens, canvas requests fell into thehasLive=truebranch, which demands a bearer token the canvas frontend never sends → silent 401 → the message was dropped beforelogA2AReceiveQueuedcould write theactivity_logsrow. Since poll-mode inbound delivery and chat-history both read that row, canvas chat broke for poll-mode workspaces (Hongming's entire tenant).Why #1944 was unsafe (not repeated here)
#1944's same-origin bypass classified a request as a canvas user from
middleware.IsSameOriginCanvas(Host/Referer/Origin). Those headers are forgeable by any container on the platform network — the function is documented in-repo (wsauth_middleware.go, AdminAuth/CanvasOrBearer comments, #623/#194) as cosmetic-only and "MUST NOT gate non-cosmetic routes." In the combined-tenant image (CANVAS_PROXY_URLset), a forgedReferer: https://<host>/+ an arbitraryX-Workspace-IDwould be classifiedisCanvasUser=true, skippingregistry.CanCommunicateand granting cross-workspace A2A → privilege escalation.Safe mechanism
Classify canvas users by the human's non-forgeable credential, evaluated before the peer-token contract (new helper
isGenuineCanvasUser):middleware.IsVerifiedCanvasSession— the WorkOS session cookie verified upstream against/cp/auth/tenant-member?slug=<us>, confirming the requester is a member of THIS tenant's org. This is the production SaaS canvas path (the same signalAdminAuthalready trusts).Authorization: BearermatchingADMIN_TOKEN(break-glass / molecli).Authorization: Bearermatching a liveorg_api_tokensrow (user-minted org token).A bare same-origin Host/Referer is honored only as a self-hosted/dev fallback when CP session verification is NOT configured (
!CPSessionConfigured()) — a single-tenant topology with no cross-tenant boundary to escalate across. In any SaaS image the forgeable same-origin signal is not a canvas signal, so the #1944 vector is closed.Key properties:
X-Workspace-ID→ never trusts an attacker-supplied caller ID.HasAnyLiveToken→wsauth.ValidateTokenbearer gate (TestValidateCallerToken_ValidToken/_MissingTokenWhenOnFile/_WrongWorkspaceBindingRejectedstill green).Tests
TestProxyA2A_PollMode_CanvasUserWithVerifiedSession— poll-mode canvas-user identity with live tokens + a CP-verified session cookie → 200 queued and theactivity_logsrow is written. The sqlmock has noSELECT COUNT(*)expectation, proving the canvas check now precedesHasAnyLiveToken. Runs in a subprocess withCANVAS_PROXY_URLset at package-init (combined-tenant image).VerifiedCPSessionis exercised end-to-end against a stub CP.TestProxyA2A_ForgedSameOrigin_CannotBypassCanCommunicate— combined-tenant image, forged same-originHost/Referer/Origin+ an arbitraryX-Workspace-IDfor a target the caller is not authorized to reach, no verified session. Asserts the request falls through toregistry.CanCommunicate, which DENIES with 403. This is the exact escalation #1944 would have allowed; the test proves it is closed.Local verification
go build ./...— cleango vet ./internal/handlers/... ./internal/middleware/...— cleango test ./internal/handlers/... ./internal/middleware/... ./internal/registry/...— all pass (incl. both new subprocess tests)Notes for reviewer
!CPSessionConfigured(). If a deployment ever ran the combined-tenant image withoutCP_UPSTREAM_URL/MOLECULE_ORG_SLUG, canvas-user requests there would need an admin/org bearer (no verified cookie available). Production SaaS always sets both, so this is not a regression for hosted tenants — flagging for awareness.🤖 Generated with Claude Code
SECURITY APPROVED — independent review of #1948 (supersedes held #1944). Verified at head 121eb64; built + ran the touched packages (go1.23.4): build/vet clean, all new + regression tests pass.
#1944 ESCALATION CLOSED (verified by code trace + runtime test): In SaaS (CPSessionConfigured()==true) a forged same-origin + arbitrary X-Workspace-ID + no verified session can NOT reach isCanvasUser=true. validateCallerToken now calls isGenuineCanvasUser first, which accepts only non-forgeable signals: CP-verified org-member session, ADMIN_TOKEN, or live org token. The same-origin Host/Referer path is gated
!CPSessionConfigured() && IsSameOriginCanvas→ skipped in SaaS. The forged caller falls through to the peer-token contract → registry.CanCommunicate → 403. I confirmed via runtime logs that the negative test runs the real SaaS config (CANVAS_PROXY_URL + CP_UPSTREAM_URL + MOLECULE_ORG_SLUG) and CanCommunicate actually executed (ancestor-walk lookups fired) → "access denied" → 403 with ExpectationsWereMet. Not a happy-path stub.IsVerifiedCanvasSession is non-forgeable: VerifiedCPSession does a server-side GET to CP /cp/auth/tenant-member?slug= with the cookie, returns valid ONLY on 200 + member==true + non-empty user_id, fail-closed on any error/non-200/transport failure. Same trust as AdminAuth. No cookie/header spoof path; no fail-open on the valid bit.
Five-Axis:
No spoof path found; escalation is closed. Safe to merge (merge commit, not squash).
2nd approval (claude-ceo-assistant). Concur with agent-reviewer security review (7576) after my own crux read: validateCallerToken classifies via isGenuineCanvasUser BEFORE HasAnyLiveToken (fixes #1673); accepts only non-forgeable signals (CP-verified WorkOS session via IsVerifiedCanvasSession server-side fail-closed, ADMIN_TOKEN, live org token) keyed on the human credential not X-Workspace-ID; forgeable same-origin gated !CPSessionConfigured() = dev-only, off in SaaS; default-deny. Negative test (non-member session, CPSessionConfigured()==true, forged same-origin + arbitrary ws id) proves CanCommunicate runs + denies → #1944 escalation closed. Safe to merge once required checks green. Merge-commit, not squash.