molecule-core/workspace
Hongming Wang 8b9f809966 fix(a2a): SSOT response parser — handle poll-mode queued envelope (#2967)
Introduce ``workspace/a2a_response.py`` as the single source of truth for
the wire shapes the workspace-server proxy can return at
``/workspaces/<id>/a2a``:

  * ``Result``    — JSON-RPC success
  * ``Error``     — JSON-RPC error or platform-level error (with
                    restart-in-progress metadata when present)
  * ``Queued``    — poll-mode short-circuit envelope: the platform
                    queued the message into the target's inbox, the
                    target will fetch via /activity poll
  * ``Malformed`` — anything the parser can't classify (logged at
                    WARNING so a future server change is loud)

``send_a2a_message`` (in ``a2a_client.py``) now dispatches via
``a2a_response.parse(data)`` instead of inline ``"result" in data`` /
``"error" in data`` sniffing. The Queued variant returns a new
``_A2A_QUEUED_PREFIX`` sentinel so callers can distinguish "delivered
async, no synchronous reply" from both success-with-text and failure.

reno-stars production data caught two intermittent failures that
both reduced to the same root cause:

  1. **File transfer announce silently failed** — when CEO Ryan PC
     (poll-mode external molecule-mcp) sent the harmi.zip
     announcement to Reno Stars Business Intelligent (also poll-mode
     external), ``send_a2a_message`` saw the platform's poll-queued
     envelope ``{"status":"queued","delivery_mode":"poll","method":"..."}``,
     didn't recognize it as the synthetic delivery-acknowledgement
     it is, and returned ``[A2A_ERROR] unexpected response shape``.
     The agent fell back to a chunk-shipping path; receiver did get
     the file but operator-facing logs showed a failure that didn't
     actually fail.

  2. **Duplicated agent comm** — same bug, inverted direction. d76
     delegated to 67d, send_a2a_message returned the unexpected-shape
     error, delegate_task wrapped it as DELEGATION FAILED, the calling
     agent retried with sharper wording, the recipient saw the same
     request twice and self-reported "二次请求 — 我先不执行".

External molecule-mcp standalone runtimes are inherently poll-mode
(they have no public URL), so every external↔external A2A pair was
hitting this on every send. The pre-fix client only handled JSON-RPC
``result``/``error`` keys and treated the queued envelope (which has
neither) as malformed. RFC #2339 PR 2 added the queued envelope on
the server side; the client never caught up.

When ``send_a2a_message`` returns the ``_A2A_QUEUED_PREFIX`` sentinel,
``tool_delegate_task`` now transparently falls back to
``_delegate_sync_via_polling`` (RFC #2829 PR-5's durable
``/delegate`` + ``/delegations`` polling path, which DOES work for
poll-mode peers because the platform's executeDelegation goroutine
writes to the inbox queue and the result row arrives when the target
picks it up + replies). The agent gets a real synchronous reply
instead of the empty queued sentinel.

  * ``test_a2a_response.py`` — 62 tests, **100% line coverage** on
    the parser (verified via ``coverage run --source=a2a_response``).
    Includes adversarial-input fuzzing across ~25 pathological
    payloads — parser must never raise.
  * ``test_a2a_client.py::TestSendA2AMessagePollMode`` — 4 tests for
    the new Queued/Error wiring in ``send_a2a_message``.
  * ``test_delegation_sync_via_polling.py::TestPollModeAutoFallback``
    — 3 tests for the auto-fallback in ``tool_delegate_task``,
    including negative cases (push-mode reply must NOT trigger
    fallback; genuine error must NOT silently retry).
  * **Verified all new tests FAIL on pre-fix source** by stashing
    a2a_client.py + a2a_tools_delegation.py and re-running — 5
    failures including ImportError for the missing
    ``_A2A_QUEUED_PREFIX``.

Per the operator-debuggability directive:

  * INFO at every Queued classification (expected variant; operator
    sees normal poll-mode-peer queueing in log stream).
  * INFO at the auto-fallback decision in ``tool_delegate_task``
    so a future operator can correlate "send returned queued →
    falling back to polling path" without reading the source.
  * WARNING at every Malformed classification (server contract
    drift; operator MUST see this immediately).
  * Existing transient-retry WARNING preserved.

  * Mirror Go-side typed model in workspace-server. The wire shape
    is documented in ``a2a_response.py``'s module docstring with
    file:line pointers to the canonical emitters; a future PR can
    introduce ``models/a2a_response.go`` without changing wire
    behavior. The fixture corpus in ``test_a2a_response.py`` is
    designed so a one-sided edit breaks CI.
  * ``send_message_to_user`` and ``chat_upload_receive`` use a
    different endpoint (``/notify``) and aren't affected by this
    bug; their parsing stays unchanged.

  * 135 tests pass across ``test_a2a_response.py`` +
    ``test_a2a_client.py`` + ``test_delegation_sync_via_polling.py``
    + ``test_a2a_tools_impl.py``.
  * ``coverage run --source=a2a_response -m pytest`` reports 100%
    line coverage with 0 missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:21:28 -07:00
..
adapters fix: comprehensive a2a-sdk 1.x migration sweep across workspace/ 2026-04-27 09:42:57 -07:00
builtin_tools feat(harness): coordinator phase-boundary instrumentation for RFC #2251 2026-04-28 20:11:46 -07:00
lib feat(workspace): pre-stop serialization for pause/resume (closes #1386) 2026-04-21 12:40:44 +00:00
molecule_audit chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
platform_tools feat(mcp): multi-workspace routing for memory + chat_history + workspace_info 2026-05-04 14:17:58 -07:00
plugins_registry feat(plugin): implement MCPServerAdaptor (issue #847) 2026-04-24 01:42:13 +00:00
policies feat(platform): single-source-of-truth tool registry — adapters consume, no drift 2026-04-28 17:11:36 -07:00
scripts fix(git-token-helper): close TOCTOU window + stop swallowing chmod errors (closes #1552) 2026-04-26 08:22:29 -07:00
skill_loader feat(skills): per-skill runtime compatibility (#119, hermes pattern) 2026-04-27 01:57:43 -07:00
tests fix(a2a): SSOT response parser — handle poll-mode queued envelope (#2967) 2026-05-05 17:21:28 -07:00
.coveragerc test(workspace): centralize pytest-cov config + 92% floor (closes #1817) 2026-04-26 06:21:22 -07:00
a2a_cli.py fix(runtime): use lowercase wire role for v0.3 JSON-RPC compat layer 2026-04-27 12:40:11 -07:00
a2a_client.py fix(a2a): SSOT response parser — handle poll-mode queued envelope (#2967) 2026-05-05 17:21:28 -07:00
a2a_executor.py fix(a2a): route terminal Message via TaskUpdater.complete/failed in task mode 2026-05-03 04:06:45 -07:00
a2a_mcp_server.py fix(onboarding): address Claude Code MCP onboarding friction (#2934) 2026-05-05 14:19:09 -07:00
a2a_response.py fix(a2a): SSOT response parser — handle poll-mode queued envelope (#2967) 2026-05-05 17:21:28 -07:00
a2a_tools_delegation.py fix(a2a): SSOT response parser — handle poll-mode queued envelope (#2967) 2026-05-05 17:21:28 -07:00
a2a_tools_inbox.py refactor(workspace): extract inbox tools from a2a_tools.py (RFC #2873 iter 4e) 2026-05-05 14:28:58 -07:00
a2a_tools_memory.py refactor(workspace): extract memory tools from a2a_tools.py to a2a_tools_memory.py (RFC #2873 iter 4c) 2026-05-05 09:50:39 -07:00
a2a_tools_messaging.py refactor(workspace): extract messaging tools from a2a_tools.py to a2a_tools_messaging.py (RFC #2873 iter 4d) 2026-05-05 09:50:47 -07:00
a2a_tools_rbac.py refactor(workspace): extract RBAC helpers from a2a_tools.py to a2a_tools_rbac.py (RFC #2873 iter 4a) 2026-05-05 04:43:16 -07:00
a2a_tools.py refactor(workspace): extract inbox tools from a2a_tools.py (RFC #2873 iter 4e) 2026-05-05 14:28:58 -07:00
adapter_base.py feat: drop shared_context — use memory v2 team namespace instead 2026-05-04 16:30:26 -07:00
agent.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
agents_md.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
boot_routes.py test(runtime): pin PR #2756's card-vs-setup decoupling with build_routes helper 2026-05-04 14:59:56 -07:00
build-all.sh fix: update workspace script comments for workspace-template → workspace rename 2026-04-18 01:48:05 -07:00
card_helpers.py fix(runtime): isolate card-skill enrichment + transcript handler from adapter shape mismatch 2026-05-04 14:15:27 -07:00
config.py feat: drop shared_context — use memory v2 team namespace instead 2026-05-04 16:30:26 -07:00
configs_dir.py fix(runtime): auto-fallback CONFIGS_DIR for non-container hosts (closes #2458) 2026-05-01 13:07:55 -07:00
consolidation.py fix: apply #1124 env-var defaults + scrub F1088 credentials from INCIDENT_LOG.md (#1347) 2026-04-21 08:11:44 +00:00
coordinator.py feat: drop shared_context — use memory v2 team namespace instead 2026-05-04 16:30:26 -07:00
Dockerfile feat(workspace): 45-min gh-token refresh daemon + credential helper cache 2026-04-22 19:52:46 -07:00
entrypoint.sh fix(workspace): credential helper security hardening (#1797) 2026-04-23 18:14:55 +00:00
event_log.py feat(workspace): event_log module + EventLogConfig (#119 PR-2) 2026-05-03 00:17:12 -07:00
events.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
executor_helpers.py docs(a2a): correct misleading v1-tolerance comments 2026-05-02 02:33:00 -07:00
heartbeat.py feat(workspace): wire observability config into heartbeat + uvicorn (#119 PR-3a) 2026-05-03 01:01:57 -07:00
inbox_uploads.py fix(inbox-uploads): cancel BatchFetcher futures on wait_all timeout 2026-05-05 12:34:41 -07:00
inbox.py fix(inbox): drop unused batch_fetcher = None after end-of-batch drain 2026-05-05 11:56:54 -07:00
initial_prompt.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
internal_chat_uploads.py fix(workspace): surface errno + path on chat-upload mkdir failure 2026-05-01 11:47:53 -07:00
internal_file_read.py feat(chat_files): rewrite Download as HTTP-forward (RFC #2312, PR-D) 2026-04-29 15:19:02 -07:00
main.py test(runtime): pin PR #2756's card-vs-setup decoupling with build_routes helper 2026-05-04 14:59:56 -07:00
mcp_cli.py feat(mcp): add molecule-mcp doctor onboarding diagnostic 2026-05-05 15:44:36 -07:00
mcp_doctor.py fix(mcp-doctor): heartbeat (idempotent) instead of register (UPSERT) 2026-05-05 16:11:08 -07:00
mcp_heartbeat.py refactor(workspace): split mcp_cli.py (626 LOC) into focused modules (RFC #2873 iter 3) 2026-05-05 04:33:06 -07:00
mcp_inbox_pollers.py refactor(workspace): split mcp_cli.py (626 LOC) into focused modules (RFC #2873 iter 3) 2026-05-05 04:33:06 -07:00
mcp_workspace_resolver.py mcp: surface specific TOKEN_FILE errors + link follow-ups (#2934) 2026-05-05 15:07:15 -07:00
molecule_ai_status.py fix(runtime): replace remaining /app/ legacy paths in agent prompts + docstrings 2026-04-27 11:22:00 -07:00
not_configured_handler.py fix(runtime): redact secret-shaped tokens from JSON-RPC error.data 2026-05-04 15:07:53 -07:00
platform_auth.py feat(mcp): cross-workspace delegation routing (multi-ws PR-2) 2026-05-04 08:32:24 -07:00
platform_inbound_auth.py fix(runtime): auto-fallback CONFIGS_DIR for non-container hosts (closes #2458) 2026-05-01 13:07:55 -07:00
plugins.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
preflight.py fix(preflight): downgrade required_env + auth_token failures to warnings 2026-05-04 12:20:34 -07:00
prompt.py feat: drop shared_context — use memory v2 team namespace instead 2026-05-04 16:30:26 -07:00
pytest.ini feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery 2026-04-27 00:44:51 -07:00
rebuild-runtime-images.sh fix: update workspace script comments for workspace-template → workspace rename 2026-04-18 01:48:05 -07:00
requirements.txt chore(deps)(deps): update starlette requirement in /workspace 2026-05-03 01:36:45 +00:00
runtime_wedge.py chore(workspace): drop claude_sdk_executor — Phase 2 of #87 2026-04-27 00:52:55 -07:00
secret_redactor.py fix(runtime): redact secret-shaped tokens from JSON-RPC error.data 2026-05-04 15:07:53 -07:00
shared_runtime.py feat(platform): single-source-of-truth tool registry — adapters consume, no drift 2026-04-28 17:11:36 -07:00
smoke_mode.py chore(smoke): runtime_wedge follow-ups from PR #2473 review 2026-05-01 18:01:51 -07:00
transcript_auth.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
watcher.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00