Concierge management MCP never loads: runtime launches Claude Code with --strict-mcp-config (ignores /configs/.claude/settings.json where the plugin writes molecule-platform) #3079

Open
opened 2026-06-19 19:51:51 +00:00 by core-devops · 3 comments
Member

Existing concierge restart/reprovision does NOT deliver the management MCP (create_workspace)

Repro (prod, test3): concierge ws 5452a81e-150f-5276-9a52-f64b97000d80, kind=platform, runtime=claude-code (model=kimi). After BOTH a tenant-API restart (POST /workspaces/:id/restart {apply_template:true}) and a UI reprovision, the concierge is online but:

  • declared_plugins=None, installed plugins []
  • A2A capability probe → NO_MANAGEMENT_MCP (no create_workspace/list_workspaces)
  • restart response was {"config_dir":"claude-code-default",...} — i.e. a plain config-default restart, not the concierge-aware provision path.

Why: the management MCP reaches a concierge via declare → reconcile → boot-install. The declaration only happens in the kind-gated applyConciergeProvisionConfig → seedTemplatePlugins(conciergePlatformMCPSource) (platform_agent.go:277). That hook runs on a fresh provision (verified working on staging), but on an existing concierge the restart/reprovision path doesn't re-run it (or recordDeclaredPlugin skips) → nothing is declared → reconcile has nothing to install → NO_MANAGEMENT_MCP.

Impact: NEW prod concierges get create_workspace (the RFC#3045 chain is live in prod after :latest promoted to 3c080825). But pre-existing concierges (created before the plugin chain) are stuck plugin-less and a restart/reprovision does NOT remediate them.

Fix options:

  1. Make the declaration fire on the restart/reprovision path for kind=platform (not provision-only), OR
  2. A backfill that records the conciergePlatformMCP declaration for existing kind=platform concierges so the post-online reconcile installs it.

To confirm the exact skip path (applyConciergeProvisionConfig-not-invoked vs recordDeclaredPlugin-refuse): read the orchestrator provision log (could not declare … skipped=N) or the concierge box boot-install log — both currently blocked (no prod SSM; test3's box is returning edge 525 / down after repeated reprovisions).

Related: #3049 (declare-as-plugin), #3055 (restart re-stub), CP#874 (deploy fleet), #3075 (main-red fix). Distinct from all of these.

Author
Member

Root cause CONFIRMED (prod test4, fresh concierge) — it's a boot-ordering / re-stub problem, NOT a delivery failure

On a FRESH prod concierge (test4, ws 84addc19), the delivery chain actually works:

  • /configs/plugins/molecule-ai-plugin-molecule-platform-mcp present
  • settings.json → mcpServers.molecule-platform = npx -y @molecule-ai/mcp-server@1.6.1, env [MOLECULE_MCP_MODE, npm_config_@molecule-ai:registry]
  • MOLECULE_ORG_API_KEY + MOLECULE_TEMPLATE_REPO_TOKEN set
  • Manual launch of that exact command → Molecule AI MANAGEMENT MCP server running on stdio … mode: management (connects via Org API Key, derives tenant host). So the MCP server itself is 100% functional.

Yet the agent reports only mcp__a2a__*, no mcp__molecule-platform__* → Claude Code never loaded the server.

Mechanism (two intertwined bugs):

  1. Post-online install ordering. The molecule-platform entry is written into settings.json by the post-online reconcile, after the Claude Code agent has already started and read its (plugin-less) settings.json. Claude Code reads mcpServers once at startup, so it never picks up the post-boot addition. First boot therefore never exposes create_workspace.
  2. Restart re-stubs settings.json. A workspace restart comes back config_dir=claude-code-default — i.e. settings.json is reset to the base config, wiping the prior reconcile's molecule-platform entry. So the next Claude Code start again reads a plugin-less settings.json (#3055 class). This is why re-provisioning test3 didn't help either.

Net: Claude Code never has a boot where settings.json already contains molecule-platform → create_workspace is never available on any concierge, fresh or existing.

Fix options:

  • Install/declare-reconcile the management MCP pre-online (before Claude Code starts) so settings.json contains it at first read; AND/OR
  • After the post-online reconcile writes settings.json, restart the Claude Code agent process (re-read mcpServers) — without re-stubbing; AND
  • Ensure restart does NOT reset settings.json to claude-code-default (preserve the reconciled mcpServers).
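The third option (a restart that resets to the base config without wiping reconciled servers) could look roughly like the sketch below. It is only an illustration: `restub_settings` is a hypothetical helper, not a function in the runtime, and it assumes settings.json holds `mcpServers` as an object keyed by server name.

```python
import copy


def restub_settings(base, current):
    """Reset settings to the base config, carrying over reconciled mcpServers.

    Hypothetical helper: `base` stands in for the claude-code-default config,
    `current` for the settings.json found on the existing volume.
    """
    merged = copy.deepcopy(base)
    carried = current.get("mcpServers", {})
    if carried:
        servers = merged.setdefault("mcpServers", {})
        for name, spec in carried.items():
            # The base config wins if it already defines the same server name.
            servers.setdefault(name, copy.deepcopy(spec))
    return merged


base = {"permissions": {"allow": []}}
current = {
    "permissions": {"allow": ["stale"]},
    "mcpServers": {"molecule-platform": {"command": "npx"}},
}
result = restub_settings(base, current)
print(sorted(result["mcpServers"]))  # ['molecule-platform']
print(result["permissions"])         # {'allow': []}
```

Everything except `mcpServers` is taken from the base config, so the restart still behaves as a reset for all other keys.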

Supersedes the earlier "declaration skipped" hypothesis — the declaration/install DO happen; the agent just never loads the result.

Author
Member

DEFINITIVE root cause — --strict-mcp-config ignores the plugin's settings.json

The runtime (molecule_runtime) launches Claude Code with the a2a MCP passed inline and a strict flag (captured from a live prod concierge, test4):

claude ... --model moonshot/kimi-k2.6 --permission-mode bypassPermissions \
  --mcp-config '{"mcpServers":{"a2a":{"command":"/usr/local/bin/python3.11","args":["/usr/local/lib/python3.11/site-packages/molecule_runtime/a2a_mcp_server.py"]}}}' \
  --strict-mcp-config ...

--strict-mcp-config makes Claude Code load MCP servers only from the inline --mcp-config JSON (a2a only) and ignore all settings files — including /configs/.claude/settings.json, which is exactly where the molecule-platform plugin's MCPServerAdaptor merges its mcpServers. Confirmed on test4:

  • /configs/.claude/settings.json → mcpServers:[molecule-platform] (plugin wrote it ✅)
  • ~/.claude.json, ~/.claude/settings.json → mcpServers:[]
  • launched config → a2a only

Net: the management MCP is fetched, installed, and written to settings.json correctly, but Claude Code is explicitly told not to read that file, so mcp__molecule-platform__* / create_workspace never load. This is independent of fresh-vs-existing, restart, or boot-ordering (supersedes the earlier ordering hypothesis).
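The mismatch described above can be checked mechanically. The following is a minimal diagnostic sketch, not part of the runtime: `inline_mcp_servers` and `unloaded_servers` are hypothetical helpers, the command line is a shortened version of the captured one, and the code assumes `--mcp-config` carries inline JSON and that settings.json stores `mcpServers` as an object keyed by server name.

```python
import json
import shlex


def inline_mcp_servers(argv):
    """Return the server names found in the inline --mcp-config JSON."""
    for i, arg in enumerate(argv):
        if arg == "--mcp-config" and i + 1 < len(argv):
            return set(json.loads(argv[i + 1]).get("mcpServers", {}))
    return set()


def unloaded_servers(argv, settings):
    """Servers declared in settings.json but absent from a strict inline config."""
    if "--strict-mcp-config" not in argv:
        return set()  # the strict-flag check does not apply
    declared = set(settings.get("mcpServers", {}))
    return declared - inline_mcp_servers(argv)


# Shortened form of the captured launch command (a2a args omitted).
cmd = (
    "claude --model moonshot/kimi-k2.6 "
    """--mcp-config '{"mcpServers":{"a2a":{"command":"/usr/local/bin/python3.11"}}}' """
    "--strict-mcp-config"
)
settings = {"mcpServers": {"molecule-platform": {"command": "npx"}}}
print(unloaded_servers(shlex.split(cmd), settings))  # {'molecule-platform'}
```

A non-empty result means a plugin wrote a server that the launched process cannot see, which is the state observed on test4.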

Fix (in molecule_runtime — the Claude Code launcher)

The launcher hardcodes --mcp-config '{a2a only}'. It must merge the plugin-delivered mcpServers (from /configs/.claude/settings.json mcpServers, where MCPServerAdaptor writes molecule-platform) into the inline --mcp-config JSON it passes — so the strict config contains a2a and molecule-platform. Alternatives: drop --strict-mcp-config (less safe — would load any settings), or point --mcp-config at a file the adaptor owns and write both servers there.
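A minimal sketch of the merge, assuming settings.json stores `mcpServers` as an object keyed by server name. `read_settings_servers` and `build_mcp_config` are hypothetical names used for illustration and are not the launcher's actual functions; the a2a entry is copied from the captured command.

```python
import json
import os
import tempfile

# Built-in a2a entry, copied from the captured launch command.
A2A_SERVER = {
    "command": "/usr/local/bin/python3.11",
    "args": [
        "/usr/local/lib/python3.11/site-packages/molecule_runtime/a2a_mcp_server.py"
    ],
}


def read_settings_servers(path):
    """Return the mcpServers object from a settings file, or {} if unusable."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, json.JSONDecodeError):
        return {}
    servers = data.get("mcpServers", {}) if isinstance(data, dict) else {}
    return servers if isinstance(servers, dict) else {}


def build_mcp_config(settings_path):
    """Build the inline --mcp-config JSON: plugin servers plus a2a."""
    servers = dict(read_settings_servers(settings_path))
    servers["a2a"] = A2A_SERVER  # the built-in entry wins on a name clash
    return json.dumps({"mcpServers": servers})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.json")
    plugin = {"command": "npx", "args": ["-y", "@molecule-ai/mcp-server@1.6.1"]}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"mcpServers": {"molecule-platform": plugin}}, fh)
    merged = json.loads(build_mcp_config(path))
    print(sorted(merged["mcpServers"]))  # ['a2a', 'molecule-platform']
```

A missing or malformed settings file falls back to a2a only, so the launcher keeps its current behaviour when no plugin has written anything.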

Confirmed: launching that exact molecule-platform command by hand → MANAGEMENT MCP server running on stdio … mode: management. So once the launcher includes it, create_workspace will work.

core-devops changed title from Existing concierge restart/reprovision does not deliver management MCP (create_workspace) — declaration is provision-time-only to Concierge management MCP never loads: runtime launches Claude Code with --strict-mcp-config (ignores /configs/.claude/settings.json where the plugin writes molecule-platform) 2026-06-19 20:19:15 +00:00
Author
Member

Fix PR drafted: molecule-ai-workspace-template-claude-code #149: _load_settings_mcp() + _apply_settings_mcp_servers() fold /configs/.claude/settings.json mcpServers into the SDK options (since --strict-mcp-config ignores the on-disk file), and _declared_extra_mcp_names() now waits for the plugin server. 14/14 unit tests pass. After merge + runtime-image ship, a fresh concierge (or existing-volume restart) will load create_workspace.

Reference: molecule-ai/molecule-core#3079