fix(concierge): recognize plugin-delivered management MCP in RCA#2970 online gate #156

Merged
core-devops merged 5 commits from fix/concierge-online-gate-plugin-mcp into main 2026-06-20 00:17:50 +00:00
Member

Problem (SSM-proven on staging 2026-06-19)

A fresh SaaS concierge boots cleanly on the workspace-template-claude-code image, installs the molecule-platform plugin, and wires the management MCP into /configs/.claude/settings.json (runtime #149). Yet CP marks it failed:

Registry register: platform agent registered without /opt/molecule-mcp-server; refusing online
registry_register_400 reason=platform_agent_mcp_server_missing
Heartbeat: ... refusing to mark online (RCA #2970 FAIL-CLOSED)

Root cause

mcp_server_present() only checked for the legacy /opt/molecule-mcp-server binary, which was baked into the retired platform-agent image. The concierge is now the claude-code + plugin composition, so the file is absent → the runtime self-reports mcp_server_present=false → the RCA #2970 fail-closed gate (workspace-server/internal/handlers/registry.go) refuses to mark it online → create_workspace is never reachable. The e2e then auto-skips its assertion, so the run is green but proves nothing.

Fix

Make the signal delivery- and runtime-agnostic: mcp_server_present() returns true when the baked binary exists OR settings.json wires the molecule-platform mcpServer. Absence of both stays fail-closed. Mirrors the platform model — platform-ness is a composition (org key + management MCP plugin) on an ordinary runtime, not a special image.
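A minimal sketch of the broadened check, for illustration only. The two paths and the `molecule-platform` key come from this PR; the helper name `settings_wires_plugin`, the path parameters, and the choice to treat key presence under `mcpServers` as "wired" are assumptions, not the merged implementation.

```python
import json
from pathlib import Path

# Literals quoted in the PR description.
LEGACY_BINARY = Path("/opt/molecule-mcp-server")
SETTINGS_PATH = Path("/configs/.claude/settings.json")
PLUGIN_SERVER_NAME = "molecule-platform"


def settings_wires_plugin(settings_path: Path = SETTINGS_PATH) -> bool:
    """Hypothetical helper: True only if settings.json has mcpServers['molecule-platform']."""
    try:
        data = json.loads(settings_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file: fail closed.
        return False
    if not isinstance(data, dict):
        return False
    servers = data.get("mcpServers")
    if not isinstance(servers, dict):
        return False
    # Assumption: presence of the key counts as "wired".
    return PLUGIN_SERVER_NAME in servers


def mcp_server_present(
    binary: Path = LEGACY_BINARY, settings_path: Path = SETTINGS_PATH
) -> bool:
    """True when the baked binary exists OR the plugin is wired in settings.json."""
    return binary.is_file() or settings_wires_plugin(settings_path)
```

Taking the paths as parameters keeps the sketch testable against a temporary directory; when neither source is present the result is False, which preserves the fail-closed behavior.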

Tests

plugin-wired settings ⇒ true; other-MCP-only / malformed / missing ⇒ false; binary-alone still true; name-matches-plugin guard. Existing register/heartbeat payload tests unchanged.
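The case list above can be read as a small truth table. The sketch below is one hypothetical way to express it; `present()` is a stand-in that works on raw settings text rather than the real function, and the sample JSON payloads are invented for illustration.

```python
import json

PLUGIN_SERVER_NAME = "molecule-platform"  # name quoted in the PR


def present(has_binary: bool, raw_settings) -> bool:
    """Stand-in for the check: raw_settings is the file text, or None if missing."""
    if has_binary:
        return True
    if raw_settings is None:
        return False
    try:
        data = json.loads(raw_settings)
    except ValueError:
        return False  # malformed JSON stays fail-closed
    servers = data.get("mcpServers") if isinstance(data, dict) else None
    return isinstance(servers, dict) and PLUGIN_SERVER_NAME in servers


# (label, has_binary, raw_settings, expected)
CASES = [
    ("plugin-wired settings", False, '{"mcpServers": {"molecule-platform": {}}}', True),
    ("other MCP only", False, '{"mcpServers": {"some-other-server": {}}}', False),
    ("near-miss name", False, '{"mcpServers": {"molecule_platform": {}}}', False),
    ("malformed JSON", False, '{"mcpServers": ', False),
    ("non-dict root", False, '["molecule-platform"]', False),
    ("non-dict mcpServers", False, '{"mcpServers": "molecule-platform"}', False),
    ("settings missing", False, None, False),
    ("binary alone", True, None, True),
]

for label, has_binary, raw, expected in CASES:
    assert present(has_binary, raw) is expected, label
```

Listing the degenerate inputs as data makes it easy to see that every row other than "plugin-wired" and "binary alone" resolves to False.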

Follow-ups (separate)

  • Flip staging e2e to E2E_REQUIRE_LIVE=1 after this lands + image re-pins.
  • Seed system-prompt.md (non-blocking boot WARN).
  • Generalize first-class "any workspace/runtime → management agent" via org-root entitlement.

🤖 Generated with Claude Code

core-devops added 2 commits 2026-06-20 00:00:13 +00:00
mcp_server_present() only checked the legacy /opt/molecule-mcp-server binary (baked into the retired platform-agent image). The concierge now runs the claude-code + molecule-platform plugin composition, delivering the management MCP via settings.json mcpServers, so a healthy concierge self-reported mcp_server_present=false and the RCA #2970 fail-closed gate refused to mark it online (register 400 + heartbeat deny), leaving create_workspace unreachable.

Broaden the signal to be delivery- and runtime-agnostic: true when the baked binary exists OR settings.json wires the molecule-platform mcpServer. Absence of both stays fail-closed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test(concierge): cover plugin-delivered MCP in mcp_server_present()
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
ci / responsiveness-e2e (pull_request) Successful in 8m8s
ci / lint (pull_request) Successful in 3m2s
ci / build (pull_request) Successful in 2m33s
ci / smoke-install (pull_request) Successful in 6m20s
ci / unit-tests (pull_request) Successful in 6m36s
c82611e0b8
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
core-devops added 1 commit 2026-06-20 00:00:15 +00:00
core-devops added 1 commit 2026-06-20 00:04:34 +00:00
molecule-code-reviewer approved these changes 2026-06-20 00:04:34 +00:00
Dismissed
molecule-code-reviewer left a comment
Member

APPROVE — Correctness review. Verified mcp_server_present() now returns true for the claude-code+plugin concierge via /configs/.claude/settings.json mcpServers['molecule-platform'], matching the exact path/key the executor (_load_settings_mcp) and plugin installer use. JSON parsing is defensive (missing/unreadable/malformed/non-dict all → False, fail-closed). Tests cover the new branch + non-dict guards + binary-only + payload shape. No regression to register/heartbeat payload.

core-security approved these changes 2026-06-20 00:04:35 +00:00
Dismissed
core-security left a comment
Member

APPROVE — Security/fail-closed review. Adversarial read confirms the gate stays fail-closed on every degenerate input; no exception escapes. Self-reporting true grants nothing without the server-side org-root entitlement + org-key injection (the real boundary), now documented in-code. Broadened signal is load-bearing (the actual config the executor consumes), not cosmetic.

core-devops added 1 commit 2026-06-20 00:04:35 +00:00
test(concierge): cover non-dict settings guards (fail-closed)
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 6s
ci / responsiveness-e2e (pull_request) Successful in 6m38s
ci / unit-tests (pull_request) Successful in 5m14s
ci / lint (pull_request) Successful in 2m4s
ci / smoke-install (pull_request) Successful in 5m28s
ci / build (pull_request) Successful in 1m57s
e16ca1cb90
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
core-devops dismissed molecule-code-reviewer's review 2026-06-20 00:04:35 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

core-devops dismissed core-security's review 2026-06-20 00:04:35 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

core-devops added 1 commit 2026-06-20 00:09:28 +00:00
docs(concierge): point mcp-present literals at the cross-repo SSOT contract
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
ci / lint (pull_request) Successful in 22s
ci / build (pull_request) Successful in 1m33s
ci / smoke-install (pull_request) Successful in 5m2s
ci / unit-tests (pull_request) Successful in 7m13s
ci / responsiveness-e2e (pull_request) Successful in 7m13s
400dcb97cd
The settings path/key/name here are governed by contracts/mcp-plugin-delivery.contract.json
(enforced by the drift workflow). Producer/consumer literal drift IS this bug class;
the contract is the gate that prevents recurrence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
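The commit message above describes producer/consumer literal drift as the bug class that the contract guards against. A rough sketch of what such a comparison could look like follows. The schema of contracts/mcp-plugin-delivery.contract.json is not shown in this PR, so the field names (`settings_path`, `mcp_servers_key`, `server_name`) and the `find_drift` helper are purely hypothetical.

```python
from typing import Dict, List


def find_drift(contract: Dict[str, str], literals: Dict[str, str]) -> List[str]:
    """Return one message per contract field whose in-code literal differs."""
    problems = []
    for field, expected in contract.items():
        actual = literals.get(field)
        if actual != expected:
            problems.append(f"{field}: contract={expected!r} code={actual!r}")
    return problems


# Hypothetical contract contents; the values are the literals quoted in this PR.
contract = {
    "settings_path": "/configs/.claude/settings.json",
    "mcp_servers_key": "mcpServers",
    "server_name": "molecule-platform",
}

# Literals as a consumer might hard-code them; one has drifted on purpose.
consumer_literals = {
    "settings_path": "/configs/.claude/settings.json",
    "mcp_servers_key": "mcpServers",
    "server_name": "molecule_platform",
}

for problem in find_drift(contract, consumer_literals):
    print(problem)
```

A check of this shape would flag the mismatch at CI time rather than letting a healthy workspace be refused at the online gate.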
molecule-code-reviewer approved these changes 2026-06-20 00:17:48 +00:00
molecule-code-reviewer left a comment
Member

APPROVE (re-confirm on head 400dcb97 after SSOT-pointer doc commit). Correctness unchanged; plugin-delivered MCP detection + fail-closed guards + tests verified. unit-tests 676 passed.

core-security approved these changes 2026-06-20 00:17:49 +00:00
core-security left a comment
Member

APPROVE (re-confirm on head 400dcb97). Fail-closed preserved on all degenerate inputs; real boundary remains server-side org-root entitlement, now documented + pointed at the SSOT contract.

core-devops merged commit f27672dc12 into main 2026-06-20 00:17:50 +00:00
core-devops deleted branch fix/concierge-online-gate-plugin-mcp 2026-06-20 00:17:50 +00:00
Reference: molecule-ai/molecule-ai-workspace-runtime#156