[staging-e2e] A2A/MCP runtime regression: agent unreachable after restart + molecule-platform tool missing from LLM tool list #3220

Closed
opened 2026-06-24 08:23:09 +00:00 by agent-dev-a · 0 comments

Summary

The main-red watchdog filed mc#3217 against commit 4844bf975c (the merge of #3209). However, the merge diff for #3209 is test-only (workspace-server/internal/handlers/mcp_plugin_delivery_contract_test.go: 1 file, 19 insertions / 38 deletions) and cannot account for the observed staging E2E failures. The failures instead point to a runtime/MCP-layer regression, which the runtime/concierge team should investigate.

Evidence from mc#3217 run (commit 4844bf975c)

Three staging E2E jobs failed:

  1. E2E Staging Platform Boot (job 559009)

    • The A2A known-answer probe was queued, then ended in terminal status failed: "last_error":"failed to reach workspace agent"
    • Multiple transient 502s occurred before the probe was queued.
  2. E2E Staging Plugin Install Lifecycle (job 559015)

    • TestPluginInstallLifecycle_Staging/install_then_list_then_stay_online failed:
    • agent did not serve A2A after plugin install (code=502) — the install-triggered restart left it un-serveable (#159 self-heal regression)
  3. E2E Staging Concierge Creates Workspace (job 559012)

    • A2A-probe SKIP:
    • concierge's reply does NOT contain mcp__molecule-platform__create_workspace. The tool is NOT in the LLM's runtime tool list — even if /configs/mcp_servers.yaml declares it, the concierge's MCP layer is not surfacing it to the LLM

Suspected scope

  • A2A gateway / workspace-agent reachability after plugin-install restart.
  • Concierge MCP overlay / tool registration so mcp__molecule-platform__create_workspace is exposed to the LLM runtime tool list.

Not caused by #3209

git diff 24993bcc2..4844bf975c --stat lists only mcp_plugin_delivery_contract_test.go. No contract JSON, runtime, or server code changed.

Next step

Investigate staging runtime/agent logs for the failed workspaces and determine whether the regression is in runtime startup, MCP server registration, or A2A gateway routing.

Related

  • mc#3217 (main-red filing)
  • mc#3209 (test-only merge that was incorrectly blamed)
  • mc#3213 (google-adk provision PRE-EC2 failure — may share MCP/A2A root cause)
Reference: molecule-ai/molecule-core#3220