Follow-up #383: empirical confirmation of which delegation path bypassed SDK self-guard on 小董文婷 #1625

Open
opened 2026-05-20 22:50:41 +00:00 by core-be · 0 comments
Member

PR#1624 lands the platform-side defense-in-depth (peers handler excludes self + agent-readable 400 body). This issue tracks the open SDK-side question.

Background

小董文婷 (chloe-dong tenant, external REMOTE-type workspace at chloe-dong.moleculesai.app) showed a repeating Activity-tab self-delegation 400-loop on 2026-05-20.

The SDK at molecule_runtime/a2a_tools_delegation.py:226-233 has an in-process guard:

effective_src = source_workspace_id or _peer_to_source.get(workspace_id) or WORKSPACE_ID
if workspace_id and workspace_id == effective_src:
    return 'Error: cannot delegate_task to your own workspace...'

For the 400-loop to fire, the SDK guard must not have caught the request. That leaves three possibilities: the request bypassed tool_delegate_task entirely; _peer_to_source[workspace_id] resolved to a non-self value that fooled the guard; or the agent went through tool_delegate_task_async, which has its own guard at line 372 that should also have caught it.
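To make the second possibility concrete, the sketch below lifts only the guard expression quoted above into a standalone function. The function name `guard_blocks`, its parameter layout, and the example IDs are illustrative assumptions, not the SDK's actual code. It shows how a polluted `_peer_to_source` entry would let a self-targeted call through.

```python
from typing import Dict, Optional


def guard_blocks(
    workspace_id: str,
    source_workspace_id: Optional[str],
    peer_to_source: Dict[str, str],
    own_workspace_id: str,
) -> bool:
    """Return True when the quoted guard expression would reject the call."""
    # Same precedence as the quoted snippet: explicit source first,
    # then the peer cache, then the process-wide WORKSPACE_ID.
    effective_src = (
        source_workspace_id
        or peer_to_source.get(workspace_id)
        or own_workspace_id
    )
    return bool(workspace_id) and workspace_id == effective_src


# Empty cache: effective_src falls through to our own id, so the guard fires.
print(guard_blocks("ws-self", None, {}, "ws-self"))  # True

# Cache maps our own id to another workspace: the guard is fooled.
print(guard_blocks("ws-self", None, {"ws-self": "ws-other"}, "ws-self"))  # False
```

Because the cache lookup sits ahead of WORKSPACE_ID in the `or` chain, any entry keyed by the workspace's own id overrides the fallback that the guard relies on.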

Investigation needed (with proper credentials this time)

  1. CP admin DB access: find 小董文婷's workspace_id + parent_id chain in the chloe-dong tenant's workspace-server postgres. Confirm whether the row references its own workspace_id in any peer-relation column.

  2. Live SDK probe: ssh into 小董文婷's container (T4 REMOTE, so SSM not EC2), capture the WORKSPACE_ID and MOLECULE_WORKSPACES env vars, and run python -c 'from molecule_runtime.a2a_client import _peer_to_source; print(_peer_to_source)' to see what the peer cache holds.

  3. Loki search: {tenant="tenant-chloe-dong"} |~ "_delegate_sync_via_polling" over the last 1h, to find the exact frame that POSTed to /delegate with target==src.

  4. PyPI version pin: confirm 小董文婷 is running molecule-ai-workspace-runtime 0.1.1013 (the SDK with the guard) vs an older pre-guard build.
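Steps 2 and 4 can be gathered in one pass. The following is a minimal sketch, assuming the import path and env var names given in the steps above; the helper name `collect_probe` and the shape of the returned dict are hypothetical.

```python
import os
from importlib import metadata


def collect_probe(env, peer_to_source, dist_name="molecule-ai-workspace-runtime"):
    """Gather the facts steps 2 and 4 ask for into a single dict."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # distribution not installed in this environment
    own_id = env.get("WORKSPACE_ID")
    return {
        "workspace_id": own_id,
        "molecule_workspaces": env.get("MOLECULE_WORKSPACES"),
        "runtime_version": version,
        # A cache entry keyed by our own id is what H3 predicts.
        "self_entry": peer_to_source.get(own_id) if own_id else None,
    }


if __name__ == "__main__":
    try:
        # Import path taken from step 2.
        from molecule_runtime.a2a_client import _peer_to_source
    except ImportError:
        print("molecule_runtime not importable; using an empty peer cache")
        _peer_to_source = {}
    print(collect_probe(os.environ, _peer_to_source))
```

One caveat: a fresh interpreter only sees what the module populates at import time. If the peer cache lives in the running agent's memory, the probe would have to run inside that process to observe it.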

Hypotheses to test

  • H1: 小董文婷 was using an older PyPI release before #190's guard landed (0.1.1012 or earlier). Test via pip show in the container.
  • H2: The MCP host (Claude Desktop / molecule-mcp-claude-channel) is making raw HTTP POST to /workspaces/:id/delegate, bypassing the Python SDK entirely. Test via Loki ingress logs to find the User-Agent.
  • H3: _peer_to_source got populated with {own_id: some_other_id} via a buggy list_peers call, fooling effective_src. Test via the live probe in step 2.
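For H1, the version reported by pip show has to be compared numerically, not as a string. The sketch below assumes purely numeric dotted versions like the ones quoted in this issue; the names `parse_version` and `predates_guard` are hypothetical, and the 0.1.1013 boundary comes from step 4 above.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.1.1013' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# First release stated in this issue to contain the guard.
FIRST_GUARDED = parse_version("0.1.1013")


def predates_guard(installed):
    """Return True when the installed version is older than the guarded release."""
    return parse_version(installed) < FIRST_GUARDED


print(predates_guard("0.1.1012"))  # True
print(predates_guard("0.1.1013"))  # False
print(predates_guard("0.1.999"))   # True
```

Tuple comparison matters here because a plain string comparison would rank "0.1.999" above "0.1.1013" and misclassify an old build as guarded.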

SDK belt-and-suspenders (gated on findings)

If H1 or H2 is the cause, the right follow-up is:

  • Add a server-side defense in the delegate handler that returns the agent-readable 400 (PR#1624 already does this).
  • File a separate PR in molecule-ai-workspace-runtime to add an explicit filter, p for p in peers if p['id'] != workspace_id, inside get_peers_with_diagnostic (~5 LoC). This closes the class even on stale-SDK clients.
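The proposed filter could look roughly like this. It is a sketch only: the helper name `exclude_self` is hypothetical, and the assumption that peers are dicts carrying an 'id' key comes from the expression in the bullet above, not from the actual get_peers_with_diagnostic signature.

```python
def exclude_self(peers, workspace_id):
    """Drop the caller's own workspace from a peer list."""
    # Mirrors the expression proposed in this issue.
    return [p for p in peers if p["id"] != workspace_id]


peers = [
    {"id": "ws-self", "name": "me"},
    {"id": "ws-other", "name": "peer"},
]
print(exclude_self(peers, "ws-self"))  # [{'id': 'ws-other', 'name': 'peer'}]
```

Filtering at the point where peers are listed means the agent never sees its own workspace as a delegation target, independent of whether the delegate-time guard fires.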

Gate

Dispatch a sub-agent or live-debug session WITH valid CP_ADMIN_API_TOKEN to run steps 1-4.

Refs: #383, PR#1624, #190, #548.

Filed by core-be persona.


Reference: molecule-ai/molecule-core#1625