fix(RCA#2970): protect management MCP from user-plugin eviction on the concierge #159
The bug
Installing ANY user plugin on an ONLINE concierge (POST <tenant>/workspaces/<wsid>/plugins) takes the concierge to failed: platform agent heartbeat denied: /opt/molecule-mcp-server missing; refusing to mark online (RCA #2970 FAIL-CLOSED).
Root cause (evidence-based)
/configs is rebuilt every boot. entrypoint.sh does rm -rf /configs/plugins (template-claude-code entrypoint.sh L233), re-fetches the DB desired-set (declared ∪ installed, desiredPluginSources in core), and the runtime's per-plugin _merge_settings_fragment re-adds each plugin's mcpServers block additively.
Here the plugins in play are molecule-platform and image-gen.
The molecule-platform entry survives only if its plugin (molecule-ai-plugin-molecule-platform-mcp, a private gitea repo) re-fetches and re-merges on the SAME boot. When that private fetch fails (token/404/gitea-hang, recurring core#3065/#3108) while a public user plugin (image-gen) fetches fine, settings.json ends up with only the user plugin's entry → mcpServers["molecule-platform"] gone → _settings_has_management_mcp() False → heartbeat mcp_server_present=false → RCA#2970 gate fail-closes (registry.go L1312-1325).
The fix
Desired set is now protected-platform-entries ∪ declared-user-plugins: the runtime ALWAYS re-asserts the protected molecule-platform entry into /configs/.claude/settings.json at boot, after the plugin merges, additively (never evicting a user plugin). Gated to the baked platform-agent image via MOLECULE_PLATFORM_AGENT_IMAGE_BAKED so ordinary workspaces never declare the org-admin MCP. The protected spec uses the image-baked binary (molecule-platform-mcp, MOLECULE_MCP_MODE=management) per the template's mcp_servers.yaml, so it is independent of the per-boot private-repo plugin fetch and self-heals when that fetch fails.
platform_agent_identity.py: on_platform_agent_image() + ensure_management_mcp_in_settings() (idempotent, additive, corrupt-file safe).
adapter_base._common_setup: call the self-heal after install_plugins_via_registry.
The server-side org-root entitlement + org-admin key injection remain the real privilege boundary; this only wires the local liveness entry.
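The self-heal described above can be sketched roughly as follows. This is a minimal illustration, not the code in platform_agent_identity.py: the function names, the environment marker and the settings path come from the description, while the exact shape of the protected spec, the marker value "1", the path and env parameters, and the boolean return value are assumptions made for the example.

```python
import copy
import json
import os
from pathlib import Path

# Path taken from the PR description; exposed as a parameter for testing.
SETTINGS_PATH = Path("/configs/.claude/settings.json")

PROTECTED_NAME = "molecule-platform"
# ASSUMPTION: spec shape guessed from the description; the real spec is
# defined by the template's mcp_servers.yaml.
PROTECTED_SPEC = {
    "command": "molecule-platform-mcp",
    "env": {"MOLECULE_MCP_MODE": "management"},
}


def on_platform_agent_image(env=None) -> bool:
    """True only when the baked platform-agent image marker is set."""
    env = os.environ if env is None else env
    # ASSUMPTION: the marker is considered set when its value is "1".
    return env.get("MOLECULE_PLATFORM_AGENT_IMAGE_BAKED") == "1"


def ensure_management_mcp_in_settings(path=SETTINGS_PATH, env=None) -> bool:
    """Re-assert the protected entry; return True if the file was rewritten."""
    if not on_platform_agent_image(env):
        return False  # ordinary workspaces never declare the org-admin MCP
    path = Path(path)
    try:
        settings = json.loads(path.read_text())
    except (OSError, ValueError):
        settings = {}  # missing or corrupt file: start from an empty document
    if not isinstance(settings, dict):
        settings = {}
    servers = settings.get("mcpServers")
    if not isinstance(servers, dict):
        servers = {}
    if servers.get(PROTECTED_NAME) == PROTECTED_SPEC:
        return False  # already present: idempotent no-op
    # Additive: only the protected key is written; user entries are untouched.
    servers[PROTECTED_NAME] = copy.deepcopy(PROTECTED_SPEC)
    settings["mcpServers"] = servers
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2))
    return True
```

In this sketch the caller would invoke ensure_management_mcp_in_settings() once after the plugin merges and wrap the call in try/except so that a failure never blocks boot.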
🤖 Generated with Claude Code
APPROVED on current head 8928878f. 5-axis review:
ensure_management_mcp_in_settings() is gated to the baked platform-agent image marker, runs after plugin merges in _common_setup, re-asserts the canonical molecule-platform management MCP spec, and preserves user-plugin mcpServers additively. This closes the private management-plugin fetch failure path without changing ordinary workspace behavior.
Reviewed: self-heal re-asserts the protected molecule-platform management MCP from the IMAGE-BAKED binary (verified Dockerfile.platform-agent L58-59 symlink + L75 MOLECULE_PLATFORM_AGENT_IMAGE_BAKED=1 exist), so the entry is independent of the fragile private-repo fetch that was the RCA#2970 trigger. Additive (protected ∪ user-declared), idempotent, no-op on ordinary images, try/except so it never blocks boot. Test reproduces the eviction with the real merge code then proves self-heal restores both molecule-platform + the user plugin. CI green. Sound. LGTM.
Security: re-asserts a fixed image-baked binary (no network/secret dependency); does not weaken the RCA#2970 fail-closed gate (the gate still requires the management MCP — this just makes it reliably present). No new secret surface. LGTM.
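The eviction scenario that the reviewed test exercises can be modelled in memory along these lines. This is a simplified stand-in, not the real merge code or the real test: merge_settings_fragment and settings_has_management_mcp only approximate _merge_settings_fragment and _settings_has_management_mcp(), while simulate_boot, the fragment contents and the image-gen-mcp command are hypothetical names introduced for illustration.

```python
import copy

# ASSUMPTION: spec shape guessed from the PR description.
PROTECTED = {
    "molecule-platform": {
        "command": "molecule-platform-mcp",
        "env": {"MOLECULE_MCP_MODE": "management"},
    }
}

# Hypothetical per-plugin settings fragments.
FRAGMENTS = {
    "molecule-platform": {"mcpServers": PROTECTED},
    "image-gen": {"mcpServers": {"image-gen": {"command": "image-gen-mcp"}}},
}


def merge_settings_fragment(settings, fragment):
    """Additively merge one plugin's mcpServers block into settings."""
    servers = settings.setdefault("mcpServers", {})
    servers.update(copy.deepcopy(fragment.get("mcpServers", {})))


def settings_has_management_mcp(settings):
    """Liveness check: is the protected entry present?"""
    return "molecule-platform" in settings.get("mcpServers", {})


def simulate_boot(desired, failing_fetches, self_heal):
    """Rebuild settings from scratch, skipping plugins whose fetch failed."""
    settings = {}  # /configs is rebuilt every boot
    for name in desired:
        if name in failing_fetches:
            continue  # fetch failed, so this fragment is never merged
        merge_settings_fragment(settings, FRAGMENTS[name])
    if self_heal:
        # Re-assert the protected entry after the plugin merges, additively.
        merge_settings_fragment(settings, {"mcpServers": PROTECTED})
    return settings


desired = ["molecule-platform", "image-gen"]
before = simulate_boot(desired, {"molecule-platform"}, self_heal=False)
after = simulate_boot(desired, {"molecule-platform"}, self_heal=True)

print(settings_has_management_mcp(before))  # False
print(settings_has_management_mcp(after))   # True
print(sorted(after["mcpServers"]))          # ['image-gen', 'molecule-platform']
```

Without the self-heal step the heartbeat check sees no management MCP and the gate fails closed; with it, both the protected entry and the user plugin's entry are present after the same failed fetch.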