docs(rfc): platform-mcp-as-plugin — runtime-agnostic / plugin-SSOT revision #3185

Merged
devops-engineer merged 1 commit from docs/rfc-platform-mcp-plugin-lego-revision into main 2026-06-23 21:41:15 +00:00
+203 -151
@@ -1,194 +1,246 @@
# RFC: Deliver `molecule-platform-mcp` as an entitlement-gated MCP plugin
- **Status:** **Proposed — ready for CTO sign-off** (was Draft). Open questions resolved to decisions (§6), rollout made concrete and gated (§5), image-retirement scoped. Updated 2026-06-23.
- **Status:** Proposed — ready for CTO sign-off (arch + entitlement change)
- **Author:** devops-engineer (agent)
- **Date:** 2026-06-18
- **Related:** `rfc-platform-agent.md` §5.7, RFC #2843 (delivery decoupling),
`MCPServerAdaptor` (runtime issue #847), marketplace/entitlement design,
core PR #3044 (provider-pin seed, orthogonal), template PR #5 (inert).
- **Date:** 2026-06-18 (revised 2026-06-23 — runtime-agnostic / plugin-SSOT reframe per CTO review)
- **Related:** `rfc-platform-agent.md` §5.5/§5.7, `rfc-decouple-config-skill-delivery.md` §10a,
RFC #2843, `MCPServerAdaptor` (runtime #847), marketplace/entitlement design,
core #3044 (provider-pin, orthogonal), #3159 (live staging manifestation), #3164 (fragility).
## 1. Problem
The concierge (org-root `kind=platform` agent) is supposed to be the *Org
Concierge*: a privileged agent that manages the platform through a management
MCP (`molecule-platform-mcp`, 80+ org-admin tools incl. `create_workspace`,
`list_workspaces`). In production today it is **not** that agent.
The concierge (org-root `kind=platform` agent) is meant to be the *Org Concierge*: a privileged
agent that manages the platform via a management MCP (`molecule-platform-mcp`, 80+ org-admin tools
incl. `create_workspace`). In production today it is **not** that agent.
Observed on prod tenant `test3` (2026-06-18), the concierge introspects as:
- System prompt: `"You are a Claude agent, built on Anthropic's Claude Agent SDK."` — the **generic** default, not the concierge persona.
- MCP servers: **`a2a` only** — the management MCP is **not wired**.
- `create_workspace` / `list_workspaces`: **absent**.
Asked to "spawn a test agent," it used Claude Code's **built-in `Task`
sub-agents** in `/workspace` (~17k tokens) instead of calling `create_workspace`
to create a Molecule workspace. It is **vanilla Claude Code**, not the concierge.
Observed on prod `test3` (2026-06-18): system prompt is the generic Claude default (not the concierge
persona); MCP servers = `a2a` only (management MCP **not wired**); `create_workspace`/`list_workspaces`
**absent**. Asked to "spawn a test agent," it used Claude Code's built-in `Task` sub-agents instead of
`create_workspace`. It is vanilla Claude Code, not the concierge.
### Root cause
The concierge identity + MCP wiring are delivered via the **asset/baked
channel** (template `config.yaml` with `prompt_files` + `mcp_servers.yaml`,
optionally a baked `molecule-platform-agent` image). On SaaS this channel does
**not** land: the on-box `/configs/config.yaml` is a **218-byte CP-regenerated
stub** —
```yaml
name: <wsid>
runtime: claude-code
a2a: {port: 8000, streaming: true}
model: 'moonshot/kimi-k2.6'
provider: 'platform'
runtime_config: {model: 'moonshot/kimi-k2.6', provider: 'platform'}
```
— with **no `prompt_files`** (→ generic prompt) and **no `mcp_servers`** (→ no
management MCP). No `concierge.md` or `mcp_servers.yaml` reaches the box. The CP
regenerates this stub on every (re)provision/restart, so even a correct template
`config.yaml` is overwritten.
Meanwhile the **plugin channel works**: the post-online reconcile reliably
installs declared plugins (e.g. the `seo-all` skill) into `/configs`.
The concierge identity + MCP wiring are delivered via the **asset/baked channel** (template
`config.yaml` + optional baked `molecule-platform-agent` image). On SaaS that channel does **not**
land: `/configs/config.yaml` is a 218-byte CP-regenerated stub (no `prompt_files` → generic prompt;
no `mcp_servers` → no management MCP), regenerated on every (re)provision. Meanwhile the **plugin
channel works**: the post-online reconcile reliably installs declared plugins (e.g. `seo-all`).
## 2. Proposal
Deliver `molecule-platform-mcp` as an **entitlement-gated MCP-server plugin**,
declared by the platform-agent template and installed dynamically post-online —
**not** via the asset relay or a baked image.
Deliver `molecule-platform-mcp` as an **entitlement-gated MCP-server plugin**, declared by the
platform-agent template and installed dynamically post-online — **not** via the asset relay or a
baked image.
This is not new machinery. The runtime already ships **`MCPServerAdaptor`**
(`molecule_runtime/plugins_registry/builtins.py`, issue #847), promoted to a
first-class adaptor after four MCP plugins shipped it (`molecule-firecrawl`
#512, `molecule-github-mcp` #520, `molecule-browser-use` #553, `mcp-connector`).
It deep-merges a plugin's `settings-fragment.json` `mcpServers` block into
`/configs/.claude/settings.json` — exactly where the Claude Code SDK reads MCP
servers.
Not new machinery: the runtime already ships `MCPServerAdaptor`
(`molecule_runtime/plugins_registry/builtins.py`, #847), promoted to first-class after several early
MCP-plugin proposals (#512/#520/#553). The registry standardizes on the `molecule-ai-plugin-<name>`
repo convention (e.g. `molecule-ai-plugin-image-gen`, `molecule-ai-plugin-molecule-platform-mcp`).
**Today** the adaptor renders a plugin's MCP into `/configs/.claude/settings.json` (the claude-code
path only); the revision makes that rendering **per-runtime** (§2b).
### Plugin shape
`molecule-platform-mcp` ships a `settings-fragment.json`:
```json
{
"mcpServers": {
"molecule-platform": {
"command": "npx",
"args": ["-y", "@molecule-ai/mcp-server"],
"env": {
"MOLECULE_MCP_MODE": "management",
"MOLECULE_API_URL": "${MOLECULE_API_URL}",
"MOLECULE_ORG_API_KEY": "${MOLECULE_ORG_API_KEY}"
}
}
}
}
```
The secret (`MOLECULE_ORG_API_KEY`) is **referenced, never embedded** — core
keeps injecting it into the container env via `conciergePlatformMCPEnv`. The
concierge persona prompt ships as the plugin's rule/`SKILL.md`-style content (or
remains the one small identity asset, per the RFC #2843 carve-out for
`config.yaml` + prompts).
The plugin's canonical shape is the **runtime-agnostic MCP descriptor** in §2b (the SSOT). The
claude-code adapter *renders* that descriptor into a `settings-fragment.json` — the claude rendering,
**not** the cross-runtime contract. The secret (`MOLECULE_ORG_API_KEY`) is **referenced, never
embedded** — core injects it into the container env via `conciergePlatformMCPEnv`.
### Wiring
- The platform-agent template declares `molecule-platform-mcp` in `workspace_declared_plugins`.
- The post-online reconcile resolves the per-runtime adapter and installs it.
- `npx -y @molecule-ai/mcp-server` launches on demand → **no baked binary**; the standard runtime
image for the concierge's *configured* runtime (claude-code by default, switchable) + this plugin
= concierge.
- **Runtime-agnostic delivery is a requirement, not an assumption** (§2b, §3.4).
- The platform-agent template declares `molecule-platform-mcp` in
`workspace_declared_plugins`.
- The post-online reconcile resolves `MCPServerAdaptor` and installs it.
- `command: npx -y @molecule-ai/mcp-server` launches on demand → **no baked
binary**, so the special `molecule-platform-agent` image is no longer required:
the standard runtime image **for the concierge's CONFIGURED runtime** (claude-code
by default, but switchable — see §3.4) + this plugin = concierge.
- **Runtime-agnostic delivery is a requirement, not an assumption.** `MCPServerAdaptor`
must wire the management MCP into whatever the configured runtime reads for MCP
servers (claude-code's `settings.json`; the equivalent for codex/hermes/etc.) —
so a concierge on any runtime gets `create_workspace`. (The baked image can never
be runtime-agnostic; the plugin can.)
## 2b. The single source of truth: the **plugin declaration** (not a separate `mcp_servers:` list)
**SSOT = the plugin.** An MCP server is delivered *as a plugin*; capabilities are declared **once**,
in the plugin list (`config.yaml: plugins:` / DB `workspace_declared_plugins`). The plugin *package*
carries a runtime-agnostic MCP descriptor + per-runtime adapters; the shape adapter renders it into
the runtime's native MCP config. There is **no separate top-level `mcp_servers:` list** — that would
be a second, competing declaration of the same thing.
> **Correction (CTO review):** an earlier draft proposed `config.yaml: mcp_servers:` as the SSOT.
> That is redundant — `config.yaml` already declares plugins, and the MCP *is* a plugin. A standalone
> `mcp_servers:` list is the "two delivery paths never reconciled" the docs audit flagged (it came
> from `rfc-platform-agent §5.5`, pre-plugin). The plugin declaration supersedes it.
**A) Now.** The plugin list already exists (`WorkspaceConfig.plugins: list[str]`, `config.py:350`).
On SaaS the box gets the 218-byte stub (neither `plugins` nor `mcp_servers` populated). The always-on
`a2a` MCP is a runtime *builtin* (injected directly, not a plugin/config). The management MCP today
rides the claude-only `settings-fragment.json` → `.claude/settings.json`. The `rfc-platform-agent
§5.5` `extra_mcp_servers`/`mcp_servers:` proposal is the redundancy trap to retire, not adopt.
**B) Changes (fix the plugin package — do NOT add a new list).** Declare the MCP as a plugin
(`plugins: [molecule-platform-mcp]`, already true via the template). Add, *inside the plugin package*,
a runtime-agnostic MCP descriptor (`name`/`command`/`args`/`env`, secret referenced) + per-runtime
adapters (`adapters/<runtime>.py`). `settings-fragment.json` becomes the claude adapter's *output*.
**C) Read (one plugin, N renderings).** Plugin declared → reconcile resolves the per-runtime shape
adapter → it materializes the descriptor into the active runtime's native MCP config:
| Runtime | Native MCP config |
|---|---|
| claude-code | `/configs/.claude/settings.json` → `mcpServers` (today's `settings-fragment.json` = this adapter's output) |
| codex | `~/.codex/config.toml` → `[mcp_servers]` |
| gemini-cli | `~/.gemini/settings.json` |
| hermes | `platforms.*` stanza / entry-point |
The server launches on demand and authenticates purely from container **env** — referenced in the
descriptor, injected by core, never embedded.
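
The "one plugin, N renderings" idea can be sketched as follows. This is a minimal illustration, not the `MCPServerAdaptor` implementation: `McpDescriptor`, `render_claude`, `render_codex`, and `render` are hypothetical names, and the codex output assumes a `[mcp_servers.<name>]` table with `command`/`args`/`env` keys, which should be checked against the codex config reference before relying on it.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class McpDescriptor:
    """Runtime-agnostic MCP descriptor (hypothetical shape for this sketch)."""
    name: str
    command: str
    args: tuple[str, ...]
    # Values are ${VAR} references; the secret itself is never stored here.
    env: dict[str, str] = field(default_factory=dict)


def render_claude(desc: McpDescriptor, existing: dict) -> dict:
    """Merge the descriptor into a parsed settings.json dict and return a copy."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers[desc.name] = {
        "command": desc.command,
        "args": list(desc.args),
        "env": dict(desc.env),
    }
    merged["mcpServers"] = servers
    return merged


def _toml_str(value: str) -> str:
    # json.dumps yields a double-quoted, escaped string. That is adequate as a
    # TOML basic string for the plain ASCII values used in this sketch.
    return json.dumps(value)


def render_codex(desc: McpDescriptor) -> str:
    """Return a TOML snippet for config.toml (assumed table layout)."""
    lines = [
        f"[mcp_servers.{desc.name}]",
        f"command = {_toml_str(desc.command)}",
        "args = [" + ", ".join(_toml_str(a) for a in desc.args) + "]",
    ]
    if desc.env:
        pairs = ", ".join(f"{k} = {_toml_str(v)}" for k, v in desc.env.items())
        lines.append("env = { " + pairs + " }")
    return "\n".join(lines) + "\n"


def render(runtime: str, desc: McpDescriptor):
    """Dispatch on the configured runtime; unknown runtimes raise instead of
    silently falling back to the claude path."""
    if runtime == "claude-code":
        return render_claude(desc, {})
    if runtime == "codex":
        return render_codex(desc)
    raise ValueError(f"no MCP adapter for runtime {runtime!r}")
```

The point of the dispatcher is the last branch: a runtime without an adapter is an error, not an implicit write to `.claude/settings.json`.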
> **Identity-gate corollary:** "is the management MCP wired?" must be answered by **asking the active
> adapter**, not by reading `/configs/.claude/settings.json`. Today's claude-only check
> (`platform_agent_identity.py`) fail-closes a codex/hermes concierge **offline** even when its MCP is
> correctly wired — this is the live **#3159** staging failure.
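
A rough sketch of "ask the active adapter" for the identity gate, under stated assumptions: the `McpAdapter` protocol, the two adapter classes, and `management_mcp_wired` are hypothetical names, the config locations are taken relative to a `root` directory for testability, and the codex check is a crude header match rather than a real TOML parse.

```python
import json
from pathlib import Path
from typing import Protocol

SERVER = "molecule-platform"  # server name from the settings fragment


class McpAdapter(Protocol):
    def is_wired(self, root: Path, server: str) -> bool: ...


class ClaudeAdapter:
    def is_wired(self, root: Path, server: str) -> bool:
        try:
            data = json.loads((root / ".claude" / "settings.json").read_text())
        except (OSError, ValueError):
            return False  # missing or unparsable config counts as not wired
        return isinstance(data, dict) and server in data.get("mcpServers", {})


class CodexAdapter:
    def is_wired(self, root: Path, server: str) -> bool:
        try:
            text = (root / ".codex" / "config.toml").read_text()
        except OSError:
            return False
        header = f"[mcp_servers.{server}]"  # assumed table layout
        return any(line.strip() == header for line in text.splitlines())


ADAPTERS: dict[str, McpAdapter] = {
    "claude-code": ClaudeAdapter(),
    "codex": CodexAdapter(),
}


def management_mcp_wired(runtime: str, root: Path) -> bool:
    """Answer the gate question through the adapter for the active runtime."""
    adapter = ADAPTERS.get(runtime)
    if adapter is None:
        return False  # unknown runtime: fail closed
    return adapter.is_wired(root, SERVER)
```

With this shape, a codex concierge whose `config.toml` carries the server passes the gate even though no `.claude/settings.json` exists.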
## 3. Why this is the right channel
1. **Uses the delivery path that works** (plugins) instead of the one that
doesn't (asset/baked stub).
2. **Consistent with platform direction** (`feedback_skills_are_plugins_dynamic_install`):
plugins install dynamically post-boot; the asset relay is for small
identity/config only. An MCP server is a *capability*, so it belongs in the
plugin channel.
3. **Retires a maintenance burden** — no special concierge image to build, pin,
repin (a large share of recent incident toil), and MCP updates ship via the
1. Uses the delivery path that works (plugins) instead of the one that doesn't (asset/baked stub).
2. Consistent with platform direction (`feedback_skills_are_plugins_dynamic_install`): an MCP server
is a *capability* → plugin channel.
3. Retires a maintenance burden — no special image to build/pin/repin; MCP updates ship via the
registry without an image rebuild.
4. **Required for runtime-switchable platform agents (correctness, not just
cleanup).** A platform agent is NOT claude-code-specific — its runtime is
*switchable* (claude-code is only the current default; codex/hermes/openclaw
are first-class). The baked `molecule-platform-agent` image is built **FROM the
claude-code runtime image**, so it structurally **binds the concierge to
claude-code** and cannot serve a codex/hermes concierge. Only the
runtime-agnostic plugin model (MCP wired per the configured runtime via
`MCPServerAdaptor`) supports a switchable-runtime concierge. The image isn't
just redundant — for any non-claude-code platform agent it is **wrong**.
### 3.4 Required for runtime-switchable platform agents — correctness, not cleanup
A platform agent is **NOT** claude-code-specific — its runtime is **switchable** (claude-code is only
the current default; codex/hermes/openclaw are first-class). The baked `molecule-platform-agent`
image is built **FROM the claude-code image**, so it structurally binds the concierge to claude-code
and cannot serve a codex/hermes concierge. Only the runtime-agnostic plugin model supports a
switchable-runtime concierge. **For any non-claude-code platform agent the image is not just
redundant — it is *wrong*.**
## 4. Security — the load-bearing constraint
The management MCP holds the org-admin token and can create/delete workspaces.
It **MUST** be installable **only** on the org-root `kind=platform` concierge,
**never** on a user workspace. A normal public plugin-install path would be a
privilege-escalation hole.
The management MCP holds the org-admin token and can create/delete workspaces. It **MUST** be
installable **only** on the org-root `kind=platform` concierge, never on a user workspace.
Requirements:
- **Entitlement gate:** the platform-mcp plugin is installable only for the
org-root concierge (enforced server-side at install/reconcile, keyed on
`kind=platform` + org-root, not client-asserted). Ties into the
marketplace/entitlement design.
- **Secret separation:** the org-admin token stays a core-injected container
env var; the plugin only references it.
- **Audit:** install of the privileged plugin is logged like any org-admin
action.
- **Entitlement gate:** installable only for the org-root concierge, enforced server-side at
install/reconcile, keyed on `kind=platform` + org-root (not client-asserted). *(Already shipped.)*
- **Secret separation:** the org-admin token stays a core-injected container env var; the plugin only
references it.
- **Audit:** install of the privileged plugin is logged like any org-admin action.
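
The three requirements above can be illustrated with a small sketch. It is written in Python for illustration only; the shipped gate lives in core and may look quite different. `Workspace`, `may_install`, `install`, and the `parent_id is None` test for org-root are all assumptions of this example.

```python
from dataclasses import dataclass
from typing import Optional

# Plugins that carry org-admin capability (only one is named in this RFC).
PRIVILEGED_PLUGINS = frozenset({"molecule-platform-mcp"})


@dataclass(frozen=True)
class Workspace:
    """Server-side workspace record, loaded from the DB and never taken
    from the client request (hypothetical shape)."""
    id: str
    kind: str                 # "platform" for the concierge
    parent_id: Optional[str]  # None marks the org-root in this sketch


def may_install(plugin: str, ws: Workspace) -> bool:
    """Entitlement gate: privileged plugins only on the org-root platform agent."""
    if plugin not in PRIVILEGED_PLUGINS:
        return True
    return ws.kind == "platform" and ws.parent_id is None


def install(plugin: str, ws: Workspace, audit_log: list) -> None:
    """Check the gate, record privileged attempts, and refuse when not entitled."""
    allowed = may_install(plugin, ws)
    if plugin in PRIVILEGED_PLUGINS:
        # Audit both grants and denials of the privileged plugin.
        audit_log.append(
            {"action": "plugin_install", "plugin": plugin,
             "workspace": ws.id, "allowed": allowed}
        )
    if not allowed:
        raise PermissionError(f"{plugin} is not installable on workspace {ws.id}")
    # The actual install step is out of scope for this sketch.
```

Note that nothing here touches the token: the plugin only references `MOLECULE_ORG_API_KEY`, so a denied install never sees it.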
## 5. Migration / rollout — concrete, gated sequence
**Already shipped** (the plugin path exists end-to-end — only the cutover + retirement remain):
- `molecule-platform-mcp` plugin (settings-fragment + `MCPServerAdaptor`) — built, in the registry.
- Org-root-only **entitlement gate** (server-side, keyed on `kind=platform` + org-root) — shipped.
- Platform-agent provisioner **declares** it (`seedTemplatePlugins` in `applyConciergeProvisionConfig`).
- Verified: a fresh concierge can install it + gain `create_workspace`.
- Readiness gate (the original step 3): the management-MCP gate has a 90s warmup grace (`managementMCPUnloadedGrace`, core#3082), so the post-online install window doesn't false-red.
**Already shipped (claude-code path only):** the plugin (settings-fragment + `MCPServerAdaptor`),
the org-root entitlement gate, the provisioner declaration (`seedTemplatePlugins`), verification that
a fresh **claude-code** concierge installs it + gains `create_workspace`, and the 90s warmup grace
(`managementMCPUnloadedGrace`, #3082).
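
As an illustration of the warmup grace, here is a minimal sketch of a readiness decision that tolerates an unloaded management MCP for the first 90 seconds after the agent comes online. The function name and the three status strings are hypothetical; only the 90s figure comes from the text above.

```python
GRACE_SECONDS = 90.0  # the 90s warmup grace named in this RFC


def management_mcp_gate(wired: bool, seconds_since_online: float,
                        grace: float = GRACE_SECONDS) -> str:
    """Return "green" when the MCP is loaded, "warming" while the
    post-online install window is still open, and "red" afterwards."""
    if wired:
        return "green"
    if seconds_since_online < grace:
        return "warming"
    return "red"
```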
**The one blocker before the image can be retired:** the plugin is a **private Gitea repo** fetched at boot, and that fetch has been **flaky** (404 on missing/over-scoped token, gitea hang — #3065/#3108). The baked image is today's *safety net* against exactly that. Retiring it without hardening the fetch trades an intermittent degradation for a hard one.
> **The revision's CORE work — runtime-agnostic delivery — is NOT done yet.** Everything above is the
> claude-code path. The actual lego fix is still to build, and is the prerequisite for step 3:
> - **Per-runtime adapter rendering** — `MCPServerAdaptor` renders the §2b descriptor into each
> runtime's native MCP config (it currently ignores its `runtime` arg and always writes
> `.claude/settings.json`).
> - **Generalize the identity gate** — ask the active adapter "is the management MCP wired?" instead
> of reading `.claude/settings.json` (today's read fail-closes codex/hermes offline = #3159).
> - **Update the delivery contract** — `mcp-plugin-delivery.contract.json` pins the claude path as
> SSOT with a drift test; it must become per-runtime or the new adapters fail it.
> - **Proven by §5b** (per-runtime render tests + local docker MCP-visibility harness).
**Sequence (staging-first, each step gated on the prior):**
1. **Make the failure visible** — land the #3164 instrumentation (runtime PR #171: `platform_mcp_diag` in the heartbeat) so the CP can see *which* path fails (image-fallback vs plugin-fetch) without box SSH. **Already up for review.**
2. **Harden the plugin fetch** — retry-with-backoff + a **fail-LOUD** signal when the management-MCP plugin fetch fails on a `kind=platform` agent, so a concierge can never come up silently "online-but-no-MCP." Closes the safety-net dependency the image currently provides.
3. **Prove plugin-only on staging** — provision a `kind=platform` agent on the **plain runtime image for its configured runtime** (claude-code by default; ALSO smoke at least one non-claude-code runtime, e.g. codex, to prove switchability) (no platform-agent image) + the plugin; confirm `create_workspace` on a FRESH provision and the staging E2E ("Platform Boot", "Concierge Creates Workspace") green.
4. **Cut over provisioning** — flip `kind=platform` image selection to the plain runtime image (retire `resolvePlatformAgentImage` / the `-platform-agent` variant); the plugin becomes the sole MCP delivery.
5. **Re-provision existing concierges** → plugin-only; verify each online + `create_workspace` via real-chat.
6. **Retire the image** — drop the `publish-platform-agent` job; remove the dual-path code (`on_platform_agent_image`, `MOLECULE_PLATFORM_AGENT_IMAGE_BAKED`, the baked-binary branch of `mcp_server_present`). Keep a documented offline/self-host build recipe (§6 Q4) but publish/provision nothing.
7. Keep PR #3044 (provider-pin) — orthogonal (responsiveness, not capability).
**Blocker before the image can retire:** the plugin repo fetch must be reliable. As of 2026-06-23 the
repo `molecule-ai/molecule-ai-plugin-molecule-platform-mcp` is **public** (verified anon HTTP 200), so
this plugin's fetch is token-free and sidesteps the `gitea://` private-repo hang (#3108). Auth-gated
fetch applies only to future *private* plugins.
**Acceptance:** a fresh `kind=platform` agent on the plain image + plugin surfaces `create_workspace`; staging E2E green; no `molecule-platform-agent` image referenced in the provision path.
**Sequence (staging-first; each step gated on the prior):**
1. **Make the failure visible — via the OBS system.** Land #3164 instrumentation (runtime PR #171:
`platform_mcp_diag`). PR #171 ships it on the heartbeat; the companion emits it as a **boot-event
into `org_instance_boot_events`** so it's queryable alongside `image_pull`/`workspace_ready`.
2. **Harden the plugin fetch** — retry-with-backoff + fail-LOUD on a `kind=platform` agent.
3. **Prove plugin-only on staging** — provision a `kind=platform` agent on the plain runtime image
(claude-code, **and** smoke a non-claude runtime e.g. codex to prove switchability) + the plugin;
confirm `create_workspace` on a FRESH provision; staging E2E green.
4. **Cut over provisioning** — flip `kind=platform` *image selection* to the plain runtime image
(retire `resolvePlatformAgentImage`). Scope guard: this retires the *image-selection* path only —
**not the `kind=platform` field**.
5. **Re-provision existing concierges** → plugin-only; verify each via real-chat.
6. **Retire the image code** — drop `publish-platform-agent`; remove the image-specific dual-path code
(`resolvePlatformAgentImage`, `on_platform_agent_image`, `MOLECULE_PLATFORM_AGENT_IMAGE_BAKED`, the
baked-binary branch).
7. **Clean up + verify no-image, no-fallback (final)** — a fresh `kind=platform` provision references
zero platform-agent image and **no fallback path remains**; E2E green with the plugin as sole MCP
delivery.
8. Keep PR #3044 (provider-pin) — orthogonal.
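
Step 2 (retry-with-backoff plus a fail-loud signal) could look roughly like the sketch below. It is an assumption-laden illustration: `fetch_with_backoff`, `fetch_management_mcp`, `PluginFetchError`, the event name, and the choice of `OSError` as the transient error class are all invented for this example.

```python
import time
from typing import Callable


class PluginFetchError(RuntimeError):
    """Raised once every retry has been exhausted."""


def fetch_with_backoff(fetch: Callable[[], object], *, attempts: int = 4,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep):
    """Call fetch() up to `attempts` times, doubling the delay between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as exc:  # treated as transient in this sketch
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise PluginFetchError(
        f"plugin fetch failed after {attempts} attempts"
    ) from last_error


def fetch_management_mcp(fetch: Callable[[], object],
                         report: Callable[[dict], None], **retry_kwargs):
    """Fetch the management-MCP plugin for a kind=platform agent.
    On exhaustion, emit a loud signal and re-raise so the concierge
    cannot come up silently online but without its MCP."""
    try:
        return fetch_with_backoff(fetch, **retry_kwargs)
    except PluginFetchError as exc:
        report({"event": "management_mcp_fetch_failed", "error": str(exc)})
        raise
```

Injecting `sleep` and `report` keeps the retry policy testable in seconds, in line with §5b.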
> **Rollback:** through step 5, cutover is reversible (flip image-selection back + re-provision).
> After step 6 the safety net is gone — recovery rests on step 2's fail-loud signal + redeploying the
> prior image tag. Do not execute step 6 until step 3's per-runtime proof + a clean prod soak; keep
> the last image tag pullable one release cycle.
> **Do NOT retire `kind=platform` / `WORKSPACE_KIND.Platform`.** Only the *image* retires. The field
> is load-bearing: the **canvas** uses it to hide the concierge from the org-map graph (`Canvas.tsx:89`,
> `Toolbar.tsx:58`) and render it as the undeletable org-root (`ConciergeShell.tsx`,
> `canvas-topology.ts`, `socket.ts`); entitlement + provisioning special-handling key on it too.
**Acceptance:** a fresh `kind=platform` agent on the plain image + plugin surfaces `create_workspace`;
staging E2E green; no `molecule-platform-agent` image and no image-fallback branch anywhere in the
provision path.
## 5b. Local testability — no cloud wait
The entire correctness of this RFC is reproducible **locally in minutes**; staging e2e is the final
smoke, not the dev loop. The bug class (#3159/#3164) is runtime+plugin logic — no provisioned cloud
box needed to reproduce.
**Why these bugs reached staging:** (a) the delivery-contract test is claude-only (no per-runtime
render test → `MCPServerAdaptor` ignoring `runtime` was never caught); (b) the
"does the concierge see `create_workspace`" probe only runs LIVE (`E2E_REQUIRE_LIVE=1` on push/cron →
hours; `=0` on PRs → can false-green) so the real check lands after merge.
- **Addition 1 — per-runtime render unit tests (seconds).** Parametrized: assert `MCPServerAdaptor`
writes the right native config per runtime — incl. "codex → `config.toml`, **NOT**
`.claude/settings.json`." Extends `tests/test_mcp_plugin_delivery_contract.py`.
- **Addition 2 — local docker MCP-visibility harness (minutes).** `docker run` the runtime image + a
fixture config declaring the plugin → probe `loaded_mcp_tools` for
`mcp__molecule-platform__create_workspace`, per runtime. Zero-agent pre-check: launch the server,
`tools/list` over an MCP handshake.
Only the **provisioning plumbing** (EC2 cloud-init, tunnel, ECR, Neon, org-slug DNS) needs cloud.
Both additions ship **with** this revision — the lego model isn't done until a non-claude runtime
surfaces `create_workspace` locally; they are the regression guard against a future #3159-class escape.
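
For the zero-agent pre-check in Addition 2, the messages involved can be sketched without any runtime at all. The helpers below build a standard MCP handshake (`initialize`, `notifications/initialized`, `tools/list`) as newline-delimited JSON-RPC and inspect a `tools/list` response. The helper names are hypothetical, the protocol version string is one published MCP revision chosen for illustration, and whether `@molecule-ai/mcp-server` speaks the stdio transport is an assumption based on the `command`/`args` shape of the settings fragment.

```python
import json


def handshake_messages(client_name: str = "mcp-precheck") -> list[dict]:
    """JSON-RPC messages for a minimal MCP session that lists tools."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": "2024-11-05",
                    "capabilities": {},
                    "clientInfo": {"name": client_name, "version": "0.0.0"}}},
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def encode_stdio(messages: list[dict]) -> str:
    """The stdio transport frames each message as one JSON line."""
    return "".join(json.dumps(m) + "\n" for m in messages)


def tool_names(response_line: str) -> set[str]:
    """Extract tool names from a tools/list response (pagination ignored)."""
    message = json.loads(response_line)
    return {tool["name"] for tool in message.get("result", {}).get("tools", [])}


def claude_tool_id(server: str, tool: str) -> str:
    """Name under which claude-code exposes an MCP tool, as quoted in §5b."""
    return f"mcp__{server}__{tool}"
```

In a harness, the encoded lines would be written to the server's stdin and the response with `id` 2 passed to `tool_names`. With 80+ tools the server may paginate, so a real check should follow `nextCursor` rather than trust the first page.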
## 6. Decisions (resolved for sign-off)
1. **Entitlement mechanism** → **core-side org-root-only gate** (already shipped: server-enforced, keyed on `kind=platform` + org-root). The marketplace entitlement broker is the future general path, NOT a blocker for this one privileged plugin.
2. **Persona prompt** → **out of scope here.** This RFC fixes the *MCP* (capability). The concierge *system prompt* is an independent failure (the runtime reads `/configs/system-prompt.md` but the template ships `prompts/concierge.md`, a naming/delivery mismatch) tracked as a companion fix. Retiring the image does not regress the prompt: the baked/asset channel never reliably delivered it on SaaS anyway.
3. **`npx` vs pre-bundled** → **`npx -y @molecule-ai/mcp-server`** (the current settings-fragment shape). Revisit pre-bundling only if first-turn cold-start is measured as a real problem.
4. **Image deprecation** → **RETIRE** the `molecule-platform-agent` image (CTO directive, 2026-06-23): platform agent = the standard image **for its configured runtime** (claude-code by default, switchable to codex/hermes/etc.) + the entitlement-gated plugin. The baked image is claude-code-bound and cannot serve other runtimes (§3.4), so it is incompatible with switchable platform agents regardless. A documented offline/self-host build recipe may be kept for air-gapped use, but nothing in the SaaS provision path references the image.
1. **Entitlement mechanism** → core-side org-root-only gate (shipped). Marketplace broker is the
future general path, not a blocker here.
2. **Persona prompt** → out of scope. This RFC fixes the *MCP*; the system prompt is an independent
failure (`/configs/system-prompt.md` vs template `prompts/concierge.md`), companion fix.
3. **`npx` vs pre-bundled** → `npx -y @molecule-ai/mcp-server`; revisit pre-bundling only if cold-start
is measured as a real problem.
4. **Image deprecation** → RETIRE (CTO 2026-06-23). Platform agent = standard image for its configured
runtime + the entitlement-gated plugin. A documented offline/self-host build recipe may be kept;
nothing in the SaaS provision path references the image.
5. **Naming convention** → noted, not renamed here. Repos are consistent at `molecule-ai-plugin-<name>`,
but the *name* suffix is not (some carry a redundant `molecule-`: `…-molecule-platform-mcp`; some
don't: `…-image-gen`). Renaming the platform plugin is out of scope (churns registry + template
declarations); tracked separately. Also fix the stale `builtins.py:446` comment, which cites
proposal-era names matching no current repo.
## 7. Non-goals
- The general core↔runtime provider-derivation drift (tracked separately,
template-claude-code issue #143).
- The CP config-regeneration-to-stub behavior — this RFC routes *around* it for
the MCP; whether to also stop the stub clobbering `prompt_files` is the
companion system-prompt fix (§6 D2), tracked separately.
- The core↔runtime provider-derivation drift (template-claude-code #143).
- The CP config-regeneration-to-stub behavior — routed *around* for the MCP; the system-prompt clobber
is §6 D2, tracked separately.
- The broader core `if runtime == "claude-code"` string-compares for LLM-auth/config
(`workspace_provision.go` → `runtimeUsesAnthropicNativeProxy`, model-normalization, session-volume).
Same "ask the adapter, don't compare the literal" theme, but out of this RFC's MCP scope — tracked
separately.
## 8. Decision requested (CTO sign-off)
Sign-off requested to adopt, as the committed architecture:
- **(a)** plugin-only delivery of the management MCP — a platform agent is a standard workspace of its **configured runtime** (claude-code by default, but switchable to codex/hermes/etc.) + the entitlement-gated `molecule-platform-mcp` plugin; **no baked image** (the baked image is claude-code-bound and cannot serve other runtimes — §3.4 — so it is structurally incompatible with switchable platform agents);
- **(b)** **retirement** of the `molecule-platform-agent` image per the gated §5 sequence — staging-first, and *blocked on the plugin-fetch hardening (§5 step 2)* so we never trade an intermittent failure for a hard one;
Sign-off to adopt as committed architecture:
- **(a)** plugin-only delivery of the management MCP — a platform agent is a standard workspace of its
*configured* runtime + the entitlement-gated `molecule-platform-mcp` plugin; no baked image (the
baked image is claude-code-bound — §3.4).
- **(b)** retirement of the `molecule-platform-agent` image per the gated §5 sequence — staging-first,
blocked on the plugin-fetch hardening (§5 step 2).
- **(c)** the resolved decisions in §6.
- **(d)** the **runtime-agnostic correctness requirement** (§2b/§3.4) — the plugin wires the MCP per
the configured runtime via per-runtime adapter rendering, with the identity gate + delivery contract
generalized, proven **per-runtime locally** (§5b). First-class architecture, not a follow-up; the §5
"DONE" items are the claude-only path and do not by themselves satisfy (d).
On sign-off, the fleet executes §5 in order. Step 1 (the #3164 `platform_mcp_diag` instrumentation, runtime PR #171) is already up for review and is the empirical gate that tells us whether prod failures today are image-fallback or plugin-fetch — informing the cutover.
**Why now:** the dual image+plugin delivery is the direct root of the recurring #3164 fragility (silent image-fallback → no `molecule-platform-mcp` binary → MCP fails to start → concierge can't `create_workspace` → staging E2E red). Collapsing to plugin-only removes the image-resolution failure mode, the `MOLECULE_PLATFORM_AGENT_IMAGE_BAKED` env-marker gating, and the build/publish/cross-account-pull maintenance burden — with no loss of the privilege boundary (it moves to the already-shipped org-root entitlement gate).
**Sign-off:** ☐ Approved as written ☐ Approved with changes ☐ Hold — _________________ (CTO) — Date: __________
On sign-off, the fleet executes §5 in order. Step 1 (PR #171) is already up and is the empirical gate
telling us whether prod failures are image-fallback or plugin-fetch.