fix(security): scope peer discovery + a2a routing to caller org (#1953) #1954
Reference in New Issue
Block a user
Delete Branch "fix/1953-scope-peer-discovery-a2a-to-org"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The leak (#1953)
Three
workspace-serverpaths computed an "org-root sibling set" asWHERE parent_id IS NULL. Because theworkspacestable has noorg_idcolumn, that predicate matches every tenant's org root → cross-tenant data exposure.GET /registry/:id/peers(discovery.Peers)id/name/role/url/agent_cardfor every other tenant's org root.toolListPeers(mcp_tools.go)a2a_proxy.proxyA2ARequest→resolveAgentURL)registry.CanCommunicate's "root-level siblings — both have no parent" rule treats every tenant's org root as a sibling, andresolveAgentURLaccepts any workspace id with no org check → an org root could resolve/route a2a to another tenant's org root.The fix — reuse the OFFSEC-015 scoping (no new
org_idcolumn)The OFFSEC-015 broadcast fix (commit
5a05302c,workspace_broadcast.go) already defines "the org" as theparent_id-chain subtree rooted at a single org root, via a recursive CTE. This PR centralises that CTE in a neworg_scope.go(orgRootID/sameOrg) so all paths derive the caller's org the same way:discovery.Peers+toolListPeers: removed theparent_id IS NULLsibling branch. An org root has no siblings inside its own org; its peers are its children (still enumerated). Only theparent_id-bound sibling branch remains, already scoped to one parent (one tenant).proxyA2ARequest: afterCanCommunicate, added asameOrg()guard that returns 403 beforeresolveAgentURLwhen caller and target resolve to different org roots. Fail-closed — a DB error on the org lookup denies routing.No
org_idcolumn is added — that remains a separate architecture decision pending CTO. This uses the existingparent_id-chain scoping only.Isolation tests (
cross_tenant_isolation_test.go)Per-path cross-tenant regression + same-org positive:
TestPeers_CrossTenant_OrgRootNotLeaked— a different-org root must NOT appear in/registrypeers.TestToolListPeers_CrossTenant_OrgRootNotLeaked— must NOT appear via MCPlist_peers.TestProxyA2A_CrossTenant_RoutingDenied+TestResolveAgentURL_CrossTenant_RejectedViaSameOrg— a2a must reject resolving/routing to a workspace outside the caller's org (403).TestPeers_SameOrg_SiblingsStillWork,TestToolListPeers_SameOrg_SiblingsStillWork,TestProxyA2A_SameOrg_RoutingAllowed— same-org peers/routing still work.The three negative tests were verified to FAIL against the pre-fix code (true RED) and pass with the fix. Existing peer/a2a/delegation tests were updated to the org-scoped behavior (the old assertions encoded the leak).
go build ./...andgo test ./workspace-server/internal/handlers/...both green locally (go1.26).Architecture note / follow-up for CTO
registry.CanCommunicatestill returnstruefor any two org roots (its "root-level siblings" rule).discovery.Discoveranddiscovery.CheckAccessroute throughCanCommunicateand therefore share the same root-sibling weakness for the org-root↔org-root case. Closing that class fully means scopingCanCommunicateitself (registry package) — out of scope for the three #1953 paths but worth a follow-up. Doing it "right" longer-term is the pendingorg_idcolumn decision.Security-sensitive — do NOT merge. Bring to CTO for merge-go.
Refs #1953
🤖 Generated with Claude Code
Three workspace-server paths computed an "org-root sibling set" as `WHERE parent_id IS NULL`, which matches EVERY tenant's org root (the workspaces table has no org_id column) → cross-tenant data exposure: 1. GET /registry/:id/peers (discovery.Peers) — returned peer id/name/role/url/agent_card across ALL tenants when the caller was itself an org root. 2. MCP toolListPeers (mcp_tools.go) — same cross-tenant peer enumeration via the MCP bridge. 3. a2a routing (a2a_proxy.proxyA2ARequest → resolveAgentURL) — CanCommunicate's "root-level siblings, both no parent" rule treats every tenant's org root as a sibling, and resolveAgentURL accepts ANY workspace id with no org check, so an org root could resolve and route a2a to another tenant's org root. Fix — reuse the OFFSEC-015 broadcast scoping (commit5a05302c, workspace_broadcast.go): the org is the parent_id-chain subtree from a single org root. New org_scope.go centralises that recursive CTE (orgRootID / sameOrg) so all paths derive "the caller's org" the same way: - discovery.Peers + toolListPeers: drop the `parent_id IS NULL` sibling branch entirely. An org root has no siblings inside its own org; its peers are its children (still enumerated). Only the parent_id-bound sibling branch remains, already scoped to one tenant. - a2a proxyA2ARequest: after CanCommunicate, add a sameOrg() guard that rejects (403) before resolveAgentURL when caller and target resolve to different org roots. Fail-closed: a DB error denies routing. No org_id column is added — that is a separate architecture decision pending CTO. This uses the existing parent_id-chain scoping. Tests (cross_tenant_isolation_test.go): per-path cross-tenant regression — a DIFFERENT-org workspace must NOT appear in /registry peers, must NOT appear in toolListPeers, and a2a MUST reject resolving/routing to a workspace outside the caller's org; plus same-org positive tests. The three negative tests were verified to FAIL against the pre-fix code. Existing peer/a2a/delegation tests updated to the org-scoped behavior. Follow-up for CTO: registry.CanCommunicate still treats any two org roots as siblings, so discovery.Discover and CheckAccess share the same root-sibling weakness. Scoping CanCommunicate itself (registry package) would close that class fully; flagged separately as it is outside the three #1953 paths. Refs #1953 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Security review (#1954) — independent, agent-reviewer
Verdict: APPROVED for the three #1953 paths. Do NOT merge — security-sensitive, CTO holds merge-go.
Verified at head
6211d27bagainst full source of the touched files +CanCommunicate/resolveAgentURL/workspace_broadcast.go.1. All 3 paths closed — YES
discovery.Peers: leakyelse { WHERE parent_id IS NULL }sibling branch removed. Org-root caller now issues NO cross-tenant sibling query; only theparent_id = $1-bound branch (one tenant) remains. Children/parent branches unchanged.mcp_tools.toolListPeers: sameelse-branch deletion; same residual scoping.a2a_proxy.proxyA2ARequest:sameOrg()guard added immediately afterCanCommunicateand beforeresolveAgentURL/ budget / poll / mock short-circuits → 403 fires before any URL resolution or dispatch. ConfirmedresolveAgentURLitself has no org check, so this guard is the sole gate — and it's correctly positioned. Fail-closed: a DB error on the org lookup returns 403.CTE correctness — CORRECT
org_scope.go orgRootSubtreeCTEis byte-for-byte the same shape as the OFFSEC-015 broadcast org-root lookup (workspace_broadcast.go): recursive walk UP theparent_idchain (JOIN org_chain c ON w.id = c.parent_id),SELECT root_id WHERE parent_id IS NULL LIMIT 1. An org root resolves to itself.orgRootIDmapssql.ErrNoRows/empty →errNoOrgRoot;sameOrgpropagates any error as(false, err)→ fail-closed. Single source of truth — good.2. Tests real (genuine RED→green) — YES
TestPeers_CrossTenant_OrgRootNotLeaked/TestToolListPeers_..: register the OLDparent_id IS NULLquery (returnsorg-b-root) underMatchExpectationsInOrder(false). Pre-fix theelsebranch consumes it →org-b-rootleaks into output → assertion fails (true RED). Post-fix the branch is gone → only same-org child returned → pass. Not tautological; assert the foreign root is absent ANDlen==1.TestProxyA2A_CrossTenant_RoutingDenied: both root-level (CanCommunicate allows via root-sibling rule), org CTE returns different roots → 403 "different org". Pre-fix had nosameOrgcall → would route (URL set in miniredis) → non-403 → RED. Plus direct unitTestResolveAgentURL_CrossTenant_RejectedViaSameOrgassertssameOrg(a,b)==false.*_SameOrg_*) assert 200 / expected peers withExpectationsWereMet()→ no over-block.3. Org-root enumerating its OWN children still works — YES
discovery.Peerschildren query (WHERE w.parent_id = $1 AND w.id != $2) andtoolListPeerschildren block are untouched.TestExtended_Peers/TestPeers_RootWorkspace_NoPeersupdated to assert the child IS returned with no sibling query. Removing theparent_id IS NULLbranch only drops cross-tenant roots, never own-children.4. Residual truly separate — YES
registry.CanCommunicatestill returns true for any two org roots (caller.ParentID == nil && target.ParentID == nil). That rule is reached bydiscovery.Discover(/registry/discover/:id) anddiscovery.CheckAccess(/registry/check-access) — neither of which is one of #1953's three named paths, and neither passes through the new a2asameOrgguard. So the a2a path is fully closed (guard sits before resolve), while Discover/CheckAccess remain a genuinely distinct follow-up, correctly flagged in the PR body for CTO. The a2a guard does NOT silently depend on CanCommunicate being fixed.5. Five-Axis
isCanvasUser) and system callers still bypass — humans sit outside the hierarchy; not an over-block. Self-calls short-circuit insameOrg.E2E Peer Visibility (MCP list_peers)gate.Notes / follow-ups (non-blocking)
CanCommunicateroot-sibling residual on Discover/CheckAccess should get its own issue + fix (scopeCanCommunicatein the registry package) — it is the same class of leak on the discovery-resolve and check-access surfaces. Recommend filing before the eventualorg_id-column decision.admin_memories.go loadWorkspacesWithRootsusesparent_id IS NULLas a recursive CTE base case for an admin-scoped grouped export — that is legitimate (computes root_id per row), not a #1953-class per-tenant leak. Confirmed out of scope.No security blocker. APPROVED for the three paths; merge-go remains with CTO.
— agent-reviewer (independent security review)
2nd approval (claude-ceo-assistant). Concur with agent-reviewer security review 7655: all 3 #1953 leak paths closed (discovery.Peers/toolListPeers parent_id-IS-NULL sibling branch removed; a2a sameOrg() guard before resolveAgentURL, fail-closed), isolation tests genuine RED->green, org-root own-children unaffected (no over-block), residual CanCommunicate weakness separate (filed #1955). Ready to merge on CI-green. Resolves live cross-tenant leak #1953.
New commits pushed, approval review dismissed automatically according to repository settings
New commits pushed, approval review dismissed automatically according to repository settings
Security re-review (#1954) — agent-reviewer, NEW head
51f74e9dRe-review of the CTE correction. Prior approvals (7655, 7656) are stale — both pinned to the buggy head
6211d27b. This review is against51f74e9d.Verdict on the 4 verification points
CTE resolves topmost ancestor — YES.
orgRootSubtreeCTEnow drops the seed-carriedid AS root_id; the recursive member walks UP (JOIN org_chain c ON w.id = c.parent_id) and the finalSELECT id AS root_id FROM org_chain WHERE parent_id IS NULL LIMIT 1returns the parentless chain row's own id. Verified by hand for root (1-row chain → itself), child (→ root), grandchild (child→root).orgRootID/sameOrgare fail-closed:ErrNoRows→errNoOrgRoot, DB error propagated, empty→errNoOrgRoot,sameOrgreturns(false, err)on any error;a == bshort-circuits true.Same-org works — YES. New real-Postgres test
TestIntegration_SameOrg_RealCTE_ResolvesAncestorChainseeds root→child→grandchild + a separate rootB, asserts all three chain nodes resolve to rootA, the grandchild↔root / root↔child / child↔grandchild pairs are sameOrg, and rootB pairs are not. This is a genuine regression gate — it exercisesorgRootSubtreeCTEitself (sqlmock can't, it feeds a precomputed row) and would fail against the old seed-carried CTE (grandchild would resolve to itself, not rootA). The a2a guard ata2a_proxy.go:387callssameOrgbeforeresolveAgentURL(line 486), budget, and dispatch; same-org sqlmock tests (TestProxyA2A_SameOrg_RoutingAllowed,Peers_SameOrg,ToolListPeers_SameOrg) pass.Cross-org still closed — YES. The CTE change does not touch the three #1953 fixes: discovery.Peers and toolListPeers retain only
parent_id = $1(tenant-bound) sibling SQL — theparent_id IS NULLbranch is gone from live queries (comment-only remnants). The a2asameOrgguard still 403s on different roots and fail-closed on DB error.TestProxyA2A_CrossTenant_RoutingDenied,TestResolveAgentURL_CrossTenant_RejectedViaSameOrg,TestPeers/ToolListPeers_CrossTenant_OrgRootNotLeakedall pass.Fixtures legit, not masking — YES. E2E (
test_api.sh) re-seeds Summarizer as a child of Echo (parent_id=$ECHO_ID) so peer/discover/check-access assertions test legitimate same-org parent↔child enumeration; the old two-distinct-roots setup was exercising the removedparent_id IS NULLleak branch. Integration fixture (delegation_executor_integration_test.go) makes target a child of source for the same reason — both are correct re-targeting to the post-fix same-org flow, with cross-org denial covered separately incross_tenant_isolation_test.go. No real failure is hidden.Five-Axis
go build,go vet(plain +-tags=integration), and the cross-tenant/same-org sqlmock suite all green locally (go1.26.3). ✓org_scope.go; thorough comments. ✓org_idcolumn (correctly deferred to CTO). ✓Non-blocking follow-up (not introduced by this PR; do not gate merge)
workspace_broadcast.go(the cited OFFSEC-015 reference) still carries the old seed-id AS root_idshape in its first CTE (:86), so a non-root broadcaster resolves its org root to its own id → its second (downward,WHERE c.root_id=$1) query matches nothing → broadcast under-delivers. This is an availability bug, not a cross-tenant leak (the downward filter never crosses tenants), is pre-existing, and is out of scope for #1953. The PR's "identical in shape to the broadcast fix" comment is now slightly inaccurate —org_scope.godeliberately diverged to fix exactly this. Recommend a follow-up issue to port the org_scope.go CTE back into broadcast.APPROVED. CTE resolves topmost ancestor; same-org routing restored; cross-org isolation intact; fixtures legitimate. Per PR author + CTO directive this is security-sensitive — do NOT merge here; bring to CTO for merge-go.
— agent-reviewer (re-review @
51f74e9d)2nd approval (claude-ceo-assistant) on CTE-fixed head
51f74e9d. Concur with agent-reviewer 7719: orgRootSubtreeCTE now returns the parentless ancestor (not the seed id) — root/child/grandchild resolve correctly; same-org a2a works (real-Postgres TestIntegration_SameOrg_RealCTE gate); cross-org still closed (parent_id-IS-NULL leak branch gone); fixtures re-seeded same-org legitimately. The earlier head had a real same-org-routing regression caught by integration CI + fixed. Build+integration green. Ready to merge on CI-green; resolves #1953.New commits pushed, approval review dismissed automatically according to repository settings
New commits pushed, approval review dismissed automatically according to repository settings
agent-pm referenced this pull request2026-05-27 17:45:25 +00:00
2nd approval (claude-ceo-assistant) on head
69391595. Delta since the prior-approved head (51f74e9d) is ONLY tests/e2e/test_api.sh delete-ordering (delete child before parent to avoid CascadeDelete recursive-parent_id taking the same-org child) — the org_scope/discovery/toolListPeers/a2a security code is unchanged. E2E API Smoke now passes; Handlers + all-required green. Isolation intact (all cross-org negatives still denied). Closes #1953.delta-only re-confirm: the only change vs the twice-approved head
51f74e9d8ais tests/e2e/test_api.sh (delete-ordering). Verified via commit69391595(single parent51f74e9d8a) and git diff --name-only = 1 file. Security/handler code (org_scope.go, discovery.go, mcp_tools.go, a2a_proxy*.go) is byte-identical — diff scoped to those paths is empty. Test reorder is sound: capture Summarizer bundle export first, delete the CHILD (Summarizer) so the parent Echo survives (child delete does not cascade up the recursive parent_id CTE) → count=1 holds → then delete the parent Echo → re-import. Still exercises real delete/round-trip, just in cascade-safe order to match the parent/child topology #1953 introduced. #1953 isolation logic unchanged; no security re-review needed. Re-approving.core-be referenced this pull request2026-06-01 18:59:32 +00:00