SEV1 FAST-TRACK PR-3: require callable provision_workspace in readiness gate; reload MCP server on auto-heal #194
Branch: sev1/fix-claude-sdk-readiness-callable-provision-workspace
SEV1 FAST-TRACK PR-3.
Problem
The concierge readiness gate in claude_sdk_executor.py only waited for the platform MCP server to report connected. It did not verify that the management tool (provision_workspace) was actually in the callable tool list, so the gate could report "ready" while the mgmt-MCP was internally failed. The auto-heal also only reset session_id and retried on a fresh subprocess, rather than explicitly reloading the MCP server.

Change
- _await_mcp_ready now requires every declared extra MCP server to be connected AND to expose the SSOT-required tool (provision_workspace) in its tools list.
- The required tool is sourced from contracts/mcp-plugin-delivery.contract.json (the same JSON the molecule-core Go SSOT loads), so PR-1 and PR-3 share one canonical value.
- _run_query_gated now auto-heals by calling client.reconnect_mcp_server() and re-gating, bounded by _MCP_HEAL_MAX_RETRIES. If reloads are exhausted, it marks the runtime wedged so the platform reports degraded.
- Updated the contract JSON to match molecule-core (added required_tool/loaded_mcp_tools_field).
- Updated tests/test_stuck_mcp_autoheal.py for the reconnect behavior and added a test for connected-but-missing-required-tool.

What was NOT touched
_aiter_with_idle_cap, the idle-cap constants, and the completion-stream logic (#193 wedge-fix) are unchanged.

Test plan
- pytest tests/test_extra_mcp_servers.py tests/test_stuck_mcp_autoheal.py passes (27/27).
- python3 -m py_compile claude_sdk_executor.py passes.

Co-Authored-By: Claude <noreply@anthropic.com>
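The gate-and-heal flow this PR describes can be sketched roughly as below. This is a simplified, synchronous illustration under stated assumptions, not the code in claude_sdk_executor.py. ServerStatus, gate_failure, run_gated, MCPWedged, FakeClient, and mcp_status() are hypothetical names. The contract is assumed to carry the tool under a top-level "required_tool" key. MCP_HEAL_MAX_RETRIES = 2 is a placeholder for _MCP_HEAL_MAX_RETRIES, whose real value is not given. reconnect_mcp_server() only mirrors the call named in this PR, so its real signature may differ.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path

# Placeholder for _MCP_HEAL_MAX_RETRIES; the real value is not stated in this PR.
MCP_HEAL_MAX_RETRIES = 2


def required_tool_from_json(text: str) -> str:
    """Read the SSOT-required tool name from contract JSON text.

    Assumption: a top-level "required_tool" key; the real layout may differ.
    """
    tool = json.loads(text).get("required_tool")
    if not isinstance(tool, str) or not tool:
        raise ValueError("contract is missing a non-empty required_tool")
    return tool


def load_required_tool(contract_path: Path) -> str:
    return required_tool_from_json(contract_path.read_text(encoding="utf-8"))


@dataclass
class ServerStatus:
    """Hypothetical per-server snapshot: connection state plus callable tool names."""
    status: str
    tools: list[str] = field(default_factory=list)


def gate_failure(statuses: dict[str, ServerStatus], required_tool: str) -> str | None:
    """Return why the gate is not ready, or None when every server is connected
    AND exposes required_tool. Being "connected" alone is not enough."""
    for server, snap in statuses.items():
        if snap.status != "connected":
            return f"{server}: {snap.status}"
        if required_tool not in snap.tools:
            return f"{server}: connected-missing-{required_tool}"
    return None


class MCPWedged(RuntimeError):
    """Raised when reloads are exhausted; the caller marks the runtime wedged."""


def run_gated(client, required_tool: str, max_retries: int = MCP_HEAL_MAX_RETRIES) -> int:
    """Gate, then reload the MCP server and re-gate on failure; return reloads used."""
    reloads = 0
    reason = gate_failure(client.mcp_status(), required_tool)
    while reason is not None:
        if reloads >= max_retries:
            raise MCPWedged(reason)
        client.reconnect_mcp_server()
        reloads += 1
        reason = gate_failure(client.mcp_status(), required_tool)
    return reloads


class FakeClient:
    """Test double: each reconnect advances to the next status snapshot."""

    def __init__(self, snapshots: list[dict[str, ServerStatus]]) -> None:
        self.snapshots = snapshots
        self.reloads = 0

    def mcp_status(self) -> dict[str, ServerStatus]:
        return self.snapshots[min(self.reloads, len(self.snapshots) - 1)]

    def reconnect_mcp_server(self) -> None:
        self.reloads += 1
```

For example, a FakeClient that first reports a connected server without provision_workspace and then a healthy one passes after a single reload. A client that never exposes the tool raises MCPWedged once the retry budget is spent, which is the case where the platform should report degraded.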
SEV1 FAST-TRACK PR-3.

(a) The readiness gate in _await_mcp_ready no longer treats connected as sufficient: it now verifies that the SSOT-required management tool (provision_workspace, sourced from the mirrored mcp-plugin-delivery contract) is present in the server's callable tools list.
(b) _run_query_gated now auto-heals by calling client.reconnect_mcp_server() and re-gating, instead of only clearing session_id and retrying on a fresh subprocess. Exhausted reloads mark the runtime wedged so the platform reports degraded.

- Updated contracts/mcp-plugin-delivery.contract.json to match molecule-core (added required_tool and loaded_mcp_tools_field).
- Added a _load_platform_mcp_required_tool() helper that sources the verb from the mirrored contract JSON (the same JSON the Go SSOT loads).
- Added _tool_names_from_mcp_server_status() to normalize SDK/tool-list shapes.
- Updated tests/test_stuck_mcp_autoheal.py for reconnect behavior; added a test for connected-but-missing-required-tool.
- Did not touch _aiter_with_idle_cap / idle-cap / completion-stream logic.

Co-Authored-By: Claude <noreply@anthropic.com>

Head updated from fb4474baae to b8cd297aec.

5-axis review @b8cd297aec: REQUEST_CHANGES. The readiness/heal direction is right (the gate requires the SSOT provision_workspace tool, and _run_query_gated reloads the MCP server with client.reconnect_mcp_server()), and the #193 idle-cap/_aiter_with_idle_cap completion-stream fix is still intact. However, the callable-tool probe does not handle the real SDK tool object shape described in its own comment.
The comment on _tool_names_from_mcp_server_status says the SDK returns McpToolInfo objects with a .name attribute, but the implementation only handles dicts and plain strings:
```python
if isinstance(tool, dict):
    name = tool.get('name')
else:
    name = tool
if isinstance(name, str):
    names.add(name)
```
For an object with tool.name == 'provision_workspace', name is bound to the object itself rather than its .name attribute, so the isinstance(name, str) check drops it. The set then omits the tool, and the gate raises connected-missing-provision_workspace even when the real tool is present. That makes the SEV1 readiness gate falsely fail (and report degraded) on the real SDK path. Please also normalize via getattr(tool, 'name', None), and add a test with an object-shaped tool, not just strings and dicts.
Security/performance: no new auth or secret issues, and the bounded reconnect loop is fine. Readability is otherwise good; sourcing the tool from the contract JSON is acceptable, but the callable probe must match the actual SDK tool shape before approval.
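The failure described in this review can be reproduced by wrapping the quoted normalization in a function. This is an illustrative sketch: tool_names_before_fix is not a function in the executor, and types.SimpleNamespace is a stand-in for an SDK McpToolInfo object, modeling only its .name attribute.

```python
from types import SimpleNamespace


def tool_names_before_fix(tools):
    """The pre-fix normalization quoted above: handles dicts and strings only."""
    names = set()
    for tool in tools:
        if isinstance(tool, dict):
            name = tool.get("name")
        else:
            name = tool  # an object falls through here as the object itself
        if isinstance(name, str):  # ...so this check silently drops it
            names.add(name)
    return names


sdk_tool = SimpleNamespace(name="provision_workspace")  # McpToolInfo stand-in
print(tool_names_before_fix([sdk_tool]))  # set()
print(tool_names_before_fix([{"name": "provision_workspace"}, "provision_workspace"]))  # {'provision_workspace'}
```

The object case yields an empty set. This is exactly how the gate ends up reporting connected-missing-provision_workspace while the tool is actually present.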
Head updated from f6e04d1cc9 to 7c9c8ff2c7.

Addressed CR2 #14611.
- _tool_names_from_mcp_server_status now normalizes non-dict tools with getattr(tool, "name", tool), so real SDK McpToolInfo objects (.name), dicts, and plain strings all resolve correctly.
- tests/test_stuck_mcp_autoheal.py::test_tool_names_from_mcp_server_status_handles_object_name_attr covers all three shapes, including provision_workspace as an object.

New head: 7c9c8ff2c72b1279b3fb01a7763d1113479eafff

CI on the new head is green.
Ready for re-review.
Co-Authored-By: Claude <noreply@anthropic.com>
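The fix described in this reply can be sketched as follows. It is simplified to take the tools list directly rather than a full MCP server status. FakeMcpToolInfo is a hypothetical stand-in for the SDK's McpToolInfo that models only the .name attribute, and tool_names is an illustrative name for the normalizer.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeMcpToolInfo:
    """Stand-in for the SDK's McpToolInfo; only the .name attribute matters here."""
    name: str


def tool_names(tools: Iterable[object]) -> set[str]:
    """Collect tool names from dicts, .name-bearing objects, and plain strings."""
    names: set[str] = set()
    for tool in tools:
        if isinstance(tool, dict):
            name = tool.get("name")
        else:
            # Objects resolve via .name; a plain str has no .name attribute,
            # so getattr falls back to the string itself.
            name = getattr(tool, "name", tool)
        if isinstance(name, str):
            names.add(name)
    return names


def test_tool_names_handles_object_dict_and_string():
    assert tool_names([FakeMcpToolInfo("provision_workspace")]) == {"provision_workspace"}
    assert tool_names([{"name": "provision_workspace"}]) == {"provision_workspace"}
    assert tool_names(["provision_workspace"]) == {"provision_workspace"}
```

On the design choice: with the getattr(tool, 'name', None) fallback suggested in the review, plain strings would resolve to None and be dropped unless they got their own branch. Falling back to the tool itself covers both objects and strings in one expression. Dicts keep their explicit branch because a dict has no .name attribute.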
5-axis review @7c9c8ff2c72b1279b3fb01a7763d1113479eafff: approved. Correctness: readiness now requires the SSOT-required provision_workspace callable, reloads MCP via reconnect_mcp_server when missing, and handles real SDK tool objects via .name as well as dict/string stubs. Robustness: bounded reload attempts and terminal statuses still fail degraded; #193 idle-cap/completion-stream logic is untouched. Security/performance: no new auth exposure; polling/reloads remain bounded. Readability/tests: contract mirror and tests cover missing-tool, reload, object/dict/string tool shapes.
SEV1 review at head 7c9c8ff2c7: APPROVED. _tool_names_from_mcp_server_status now accepts SDK McpToolInfo-style objects with .name, dicts with name, and strings, so the callable provision_workspace readiness gate checks the actual tool-list shape. The auto-heal reload path is bounded, and the #193 idle-cap/_aiter_with_idle_cap behavior is untouched.

This PR is approved by both reviewers and CI is green. Ready to merge.
Ready to merge: approved by agent-reviewer-cr2 and agent-researcher, CI green, mergeable=true.