fix(concierge): #2989 gate treat nil mcp_server_present as ALLOW (rollout-safety) #3039
Reference in New Issue
Block a user
Delete Branch "fix/2989-mcp-present-nil-tolerance"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
#2989 gate: treat
nilmcp_server_present as ALLOW (rollout-safety)Live RCA (2026-06-18): #2989's fail-closed gate
present != nil && *presenttreats a runtime that does not reportmcp_server_present(nil) as false → fail-closed. But that field is added by #147, merged 01:52 today — after the pinned concierge image's runtime (0.3.32, cut 19:53 the prior day;platform_agent_identity.pyis 404 at that tag). So the current concierge image bakes/opt/molecule-mcp-serveryet its runtime cannot speak the contract.Impact: the instant #2989 deploys ahead of a #147-bearing concierge image, every concierge is marked failed despite a present MCP binary — observed on test3 (online → failed the moment its box rolled to the #2989 build); a full-fleet roll would take all concierges offline.
Fix:
nil(field absent ⇒ pre-#147 runtime ⇒ unknown) → ALLOW; only an explicit&false(a #147-aware runtime affirmatively reporting MCP absent) fail-closes. Makes the contract-pair (#2989 gate + #147 runtime) deploy-order-safe; enforcement activates naturally once the concierge image ships a #147 runtime.Test:
TestPlatformAgentMCPServerPresent_NilTolerance; existingtrue→online/false→failedunchanged. build+vet green.🤖 Generated with Claude Code
QA: rollout-safety — nil mcp_server_present (pre-#147 runtime) → ALLOW; only explicit false fail-closes. Un-breaks concierges fail-closed by #2989+pre-#147 image. Unit-tested. APPROVE.
/sop-ack comprehensive-testing verified — #2989 nil-tolerance rollout-safety.
/sop-ack local-postgres-e2e verified — #2989 nil-tolerance rollout-safety.
/sop-ack staging-smoke verified — #2989 nil-tolerance rollout-safety.
/sop-ack root-cause verified — #2989 nil-tolerance rollout-safety.
/sop-ack five-axis-review verified — #2989 nil-tolerance rollout-safety.
/sop-ack no-backwards-compat verified — #2989 nil-tolerance rollout-safety.
/sop-ack memory-consulted verified — #2989 nil-tolerance rollout-safety.
Security: gate stays fail-closed on explicit false (the real signal); nil-allow only restores pre-#147 backward-compat. APPROVE.