docs(rfc): platform-mcp-as-plugin — runtime-agnostic / plugin-SSOT revision #3185
@@ -1,194 +1,246 @@
|
||||
# RFC: Deliver `molecule-platform-mcp` as an entitlement-gated MCP plugin
|
||||
|
||||
- **Status:** **Proposed — ready for CTO sign-off** (was Draft). Open questions resolved to decisions (§6), rollout made concrete and gated (§5), image-retirement scoped. Updated 2026-06-23.
|
||||
- **Status:** Proposed — ready for CTO sign-off (arch + entitlement change)
|
||||
- **Author:** devops-engineer (agent)
|
||||
- **Date:** 2026-06-18
|
||||
- **Related:** `rfc-platform-agent.md` §5.7, RFC #2843 (delivery decoupling),
|
||||
`MCPServerAdaptor` (runtime issue #847), marketplace/entitlement design,
|
||||
core PR #3044 (provider-pin seed, orthogonal), template PR #5 (inert).
|
||||
- **Date:** 2026-06-18 (revised 2026-06-23 — runtime-agnostic / plugin-SSOT reframe per CTO review)
|
||||
- **Related:** `rfc-platform-agent.md` §5.5/§5.7, `rfc-decouple-config-skill-delivery.md` §10a,
|
||||
RFC #2843, `MCPServerAdaptor` (runtime #847), marketplace/entitlement design,
|
||||
core #3044 (provider-pin, orthogonal), #3159 (live staging manifestation), #3164 (fragility).
|
||||
|
||||
## 1. Problem
|
||||
|
||||
The concierge (org-root `kind=platform` agent) is supposed to be the *Org
|
||||
Concierge*: a privileged agent that manages the platform through a management
|
||||
MCP (`molecule-platform-mcp`, 80+ org-admin tools incl. `create_workspace`,
|
||||
`list_workspaces`). In production today it is **not** that agent.
|
||||
The concierge (org-root `kind=platform` agent) is meant to be the *Org Concierge*: a privileged
|
||||
agent that manages the platform via a management MCP (`molecule-platform-mcp`, 80+ org-admin tools
|
||||
incl. `create_workspace`). In production today it is **not** that agent.
|
||||
|
||||
Observed on prod tenant `test3` (2026-06-18), the concierge introspects as:
|
||||
|
||||
- System prompt: `"You are a Claude agent, built on Anthropic's Claude Agent SDK."` — the **generic** default, not the concierge persona.
|
||||
- MCP servers: **`a2a` only** — the management MCP is **not wired**.
|
||||
- `create_workspace` / `list_workspaces`: **absent**.
|
||||
|
||||
Asked to "spawn a test agent," it used Claude Code's **built-in `Task`
|
||||
sub-agents** in `/workspace` (~17k tokens) instead of calling `create_workspace`
|
||||
to create a Molecule workspace. It is **vanilla Claude Code**, not the concierge.
|
||||
Observed on prod `test3` (2026-06-18): system prompt is the generic Claude default (not the concierge
|
||||
persona); MCP servers = `a2a` only (management MCP **not wired**); `create_workspace`/`list_workspaces`
|
||||
**absent**. Asked to "spawn a test agent," it used Claude Code's built-in `Task` sub-agents instead of
|
||||
`create_workspace`. It is vanilla Claude Code, not the concierge.
|
||||
|
||||
### Root cause
|
||||
|
||||
The concierge identity + MCP wiring are delivered via the **asset/baked
|
||||
channel** (template `config.yaml` with `prompt_files` + `mcp_servers.yaml`,
|
||||
optionally a baked `molecule-platform-agent` image). On SaaS this channel does
|
||||
**not** land: the on-box `/configs/config.yaml` is a **218-byte CP-regenerated
|
||||
stub** —
|
||||
|
||||
```yaml
|
||||
name: <wsid>
|
||||
runtime: claude-code
|
||||
a2a: {port: 8000, streaming: true}
|
||||
model: 'moonshot/kimi-k2.6'
|
||||
provider: 'platform'
|
||||
runtime_config: {model: 'moonshot/kimi-k2.6', provider: 'platform'}
|
||||
```
|
||||
|
||||
— with **no `prompt_files`** (→ generic prompt) and **no `mcp_servers`** (→ no
|
||||
management MCP). No `concierge.md` or `mcp_servers.yaml` reaches the box. The CP
|
||||
regenerates this stub on every (re)provision/restart, so even a correct template
|
||||
`config.yaml` is overwritten.
|
||||
|
||||
Meanwhile the **plugin channel works**: the post-online reconcile reliably
|
||||
installs declared plugins (e.g. the `seo-all` skill) into `/configs`.
|
||||
The concierge identity + MCP wiring are delivered via the **asset/baked channel** (template
|
||||
`config.yaml` + optional baked `molecule-platform-agent` image). On SaaS that channel does **not**
|
||||
land: `/configs/config.yaml` is a 218-byte CP-regenerated stub (no `prompt_files` → generic prompt;
|
||||
no `mcp_servers` → no management MCP), regenerated on every (re)provision. Meanwhile the **plugin
|
||||
channel works**: the post-online reconcile reliably installs declared plugins (e.g. `seo-all`).
|
||||
|
||||
## 2. Proposal
|
||||
|
||||
Deliver `molecule-platform-mcp` as an **entitlement-gated MCP-server plugin**,
|
||||
declared by the platform-agent template and installed dynamically post-online —
|
||||
**not** via the asset relay or a baked image.
|
||||
Deliver `molecule-platform-mcp` as an **entitlement-gated MCP-server plugin**, declared by the
|
||||
platform-agent template and installed dynamically post-online — **not** via the asset relay or a
|
||||
baked image.
|
||||
|
||||
This is not new machinery. The runtime already ships **`MCPServerAdaptor`**
|
||||
(`molecule_runtime/plugins_registry/builtins.py`, issue #847), promoted to a
|
||||
first-class adaptor after four MCP plugins shipped it (`molecule-firecrawl`
|
||||
#512, `molecule-github-mcp` #520, `molecule-browser-use` #553, `mcp-connector`).
|
||||
It deep-merges a plugin's `settings-fragment.json` `mcpServers` block into
|
||||
`/configs/.claude/settings.json` — exactly where the Claude Code SDK reads MCP
|
||||
servers.
|
||||
Not new machinery: the runtime already ships `MCPServerAdaptor`
|
||||
(`molecule_runtime/plugins_registry/builtins.py`, #847), promoted to first-class after several early
|
||||
MCP-plugin proposals (#512/#520/#553). The registry standardizes on the `molecule-ai-plugin-<name>`
|
||||
repo convention (e.g. `molecule-ai-plugin-image-gen`, `molecule-ai-plugin-molecule-platform-mcp`).
|
||||
**Today** the adaptor renders a plugin's MCP into `/configs/.claude/settings.json` (the claude-code
|
||||
path only); the revision makes that rendering **per-runtime** (§2b).
|
||||
|
||||
### Plugin shape
|
||||
|
||||
`molecule-platform-mcp` ships a `settings-fragment.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"molecule-platform": {
|
||||
"command": "npx",
|
||||
"args": ["-y", "@molecule-ai/mcp-server"],
|
||||
"env": {
|
||||
"MOLECULE_MCP_MODE": "management",
|
||||
"MOLECULE_API_URL": "${MOLECULE_API_URL}",
|
||||
"MOLECULE_ORG_API_KEY": "${MOLECULE_ORG_API_KEY}"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The secret (`MOLECULE_ORG_API_KEY`) is **referenced, never embedded** — core
|
||||
keeps injecting it into the container env via `conciergePlatformMCPEnv`. The
|
||||
concierge persona prompt ships as the plugin's rule/`SKILL.md`-style content (or
|
||||
remains the one small identity asset, per the RFC #2843 carve-out for
|
||||
`config.yaml` + prompts).
|
||||
The plugin's canonical shape is the **runtime-agnostic MCP descriptor** in §2b (the SSOT). The
|
||||
claude-code adapter *renders* that descriptor into a `settings-fragment.json` — the claude rendering,
|
||||
**not** the cross-runtime contract. The secret (`MOLECULE_ORG_API_KEY`) is **referenced, never
|
||||
embedded** — core injects it into the container env via `conciergePlatformMCPEnv`.
|
||||
|
||||
### Wiring
|
||||
- The platform-agent template declares `molecule-platform-mcp` in `workspace_declared_plugins`.
|
||||
- The post-online reconcile resolves the per-runtime adapter and installs it.
|
||||
- `npx -y @molecule-ai/mcp-server` launches on demand → **no baked binary**; the standard runtime
|
||||
image for the concierge's *configured* runtime (claude-code by default, switchable) + this plugin
|
||||
= concierge.
|
||||
- **Runtime-agnostic delivery is a requirement, not an assumption** (§2b, §3.4).
|
||||
|
||||
- The platform-agent template declares `molecule-platform-mcp` in
|
||||
`workspace_declared_plugins`.
|
||||
- The post-online reconcile resolves `MCPServerAdaptor` and installs it.
|
||||
- `command: npx -y @molecule-ai/mcp-server` launches on demand → **no baked
|
||||
binary**, so the special `molecule-platform-agent` image is no longer required:
|
||||
the standard runtime image **for the concierge's CONFIGURED runtime** (claude-code
|
||||
by default, but switchable — see §3.4) + this plugin = concierge.
|
||||
- **Runtime-agnostic delivery is a requirement, not an assumption.** `MCPServerAdaptor`
|
||||
must wire the management MCP into whatever the configured runtime reads for MCP
|
||||
servers (claude-code's `settings.json`; the equivalent for codex/hermes/etc.) —
|
||||
so a concierge on any runtime gets `create_workspace`. (The baked image can never
|
||||
be runtime-agnostic; the plugin can.)
|
||||
## 2b. The single source of truth: the **plugin declaration** (not a separate `mcp_servers:` list)
|
||||
|
||||
**SSOT = the plugin.** An MCP server is delivered *as a plugin*; capabilities are declared **once**,
|
||||
in the plugin list (`config.yaml: plugins:` / DB `workspace_declared_plugins`). The plugin *package*
|
||||
carries a runtime-agnostic MCP descriptor + per-runtime adapters; the shape adapter renders it into
|
||||
the runtime's native MCP config. There is **no separate top-level `mcp_servers:` list** — that would
|
||||
be a second, competing declaration of the same thing.
|
||||
|
||||
> **Correction (CTO review):** an earlier draft proposed `config.yaml: mcp_servers:` as the SSOT.
|
||||
> That is redundant — `config.yaml` already declares plugins, and the MCP *is* a plugin. A standalone
|
||||
> `mcp_servers:` list is the "two delivery paths never reconciled" the docs audit flagged (it came
|
||||
> from `rfc-platform-agent §5.5`, pre-plugin). The plugin declaration supersedes it.
|
||||
|
||||
**A) Now.** The plugin list already exists (`WorkspaceConfig.plugins: list[str]`, `config.py:350`).
|
||||
On SaaS the box gets the 218-byte stub (neither `plugins` nor `mcp_servers` populated). The always-on
|
||||
`a2a` MCP is a runtime *builtin* (injected directly, not a plugin/config). The management MCP today
|
||||
rides the claude-only `settings-fragment.json` → `.claude/settings.json`. The `rfc-platform-agent
|
||||
§5.5` `extra_mcp_servers`/`mcp_servers:` proposal is the redundancy trap to retire, not adopt.
|
||||
|
||||
**B) Changes (fix the plugin package — do NOT add a new list).** Declare the MCP as a plugin
|
||||
(`plugins: [molecule-platform-mcp]`, already true via the template). Add, *inside the plugin package*,
|
||||
a runtime-agnostic MCP descriptor (`name`/`command`/`args`/`env`, secret referenced) + per-runtime
|
||||
adapters (`adapters/<runtime>.py`). `settings-fragment.json` becomes the claude adapter's *output*.
|
||||
|
||||
**C) Read (one plugin, N renderings).** Plugin declared → reconcile resolves the per-runtime shape
|
||||
adapter → it materializes the descriptor into the active runtime's native MCP config:
|
||||
|
||||
| Runtime | Native MCP config |
|
||||
|---|---|
|
||||
| claude-code | `/configs/.claude/settings.json` → `mcpServers` (today's `settings-fragment.json` = this adapter's output) |
|
||||
| codex | `~/.codex/config.toml` → `[mcp_servers]` |
|
||||
| gemini-cli | `~/.gemini/settings.json` |
|
||||
| hermes | `platforms.*` stanza / entry-point |
|
||||
|
||||
The server launches on demand and authenticates purely from container **env** — referenced in the
|
||||
descriptor, injected by core, never embedded.
|
||||
|
||||
> **Identity-gate corollary:** "is the management MCP wired?" must be answered by **asking the active
|
||||
> adapter**, not by reading `/configs/.claude/settings.json`. Today's claude-only check
|
||||
> (`platform_agent_identity.py`) fail-closes a codex/hermes concierge **offline** even when its MCP is
|
||||
> correctly wired — this is the live **#3159** staging failure.
|
||||
|
||||
## 3. Why this is the right channel
|
||||
|
||||
1. **Uses the delivery path that works** (plugins) instead of the one that
|
||||
doesn't (asset/baked stub).
|
||||
2. **Consistent with platform direction** — `feedback_skills_are_plugins_dynamic_install`:
|
||||
plugins install dynamically post-boot; the asset relay is for small
|
||||
identity/config only. An MCP server is a *capability*, so it belongs in the
|
||||
plugin channel.
|
||||
3. **Retires a maintenance burden** — no special concierge image to build, pin,
|
||||
repin (a large share of recent incident toil), and MCP updates ship via the
|
||||
1. Uses the delivery path that works (plugins) instead of the one that doesn't (asset/baked stub).
|
||||
2. Consistent with platform direction (`feedback_skills_are_plugins_dynamic_install`): an MCP server
|
||||
is a *capability* → plugin channel.
|
||||
3. Retires a maintenance burden — no special image to build/pin/repin; MCP updates ship via the
|
||||
registry without an image rebuild.
|
||||
4. **Required for runtime-switchable platform agents (correctness, not just
|
||||
cleanup).** A platform agent is NOT claude-code-specific — its runtime is
|
||||
*switchable* (claude-code is only the current default; codex/hermes/openclaw
|
||||
are first-class). The baked `molecule-platform-agent` image is built **FROM the
|
||||
claude-code runtime image**, so it structurally **binds the concierge to
|
||||
claude-code** and cannot serve a codex/hermes concierge. Only the
|
||||
runtime-agnostic plugin model (MCP wired per the configured runtime via
|
||||
`MCPServerAdaptor`) supports a switchable-runtime concierge. The image isn't
|
||||
just redundant — for any non-claude-code platform agent it is **wrong**.
|
||||
|
||||
### 3.4 Required for runtime-switchable platform agents — correctness, not cleanup
|
||||
A platform agent is **NOT** claude-code-specific — its runtime is **switchable** (claude-code is only
|
||||
the current default; codex/hermes/openclaw are first-class). The baked `molecule-platform-agent`
|
||||
image is built **FROM the claude-code image**, so it structurally binds the concierge to claude-code
|
||||
and cannot serve a codex/hermes concierge. Only the runtime-agnostic plugin model supports a
|
||||
switchable-runtime concierge. **For any non-claude-code platform agent the image is not just
|
||||
redundant — it is *wrong*.**
|
||||
|
||||
## 4. Security — the load-bearing constraint
|
||||
|
||||
The management MCP holds the org-admin token and can create/delete workspaces.
|
||||
It **MUST** be installable **only** on the org-root `kind=platform` concierge,
|
||||
**never** on a user workspace. A normal public plugin-install path would be a
|
||||
privilege-escalation hole.
|
||||
The management MCP holds the org-admin token and can create/delete workspaces. It **MUST** be
|
||||
installable **only** on the org-root `kind=platform` concierge, never on a user workspace.
|
||||
|
||||
Requirements:
|
||||
- **Entitlement gate:** the platform-mcp plugin is installable only for the
|
||||
org-root concierge (enforced server-side at install/reconcile, keyed on
|
||||
`kind=platform` + org-root, not client-asserted). Ties into the
|
||||
marketplace/entitlement design.
|
||||
- **Secret separation:** the org-admin token stays a core-injected container
|
||||
env var; the plugin only references it.
|
||||
- **Audit:** install of the privileged plugin is logged like any org-admin
|
||||
action.
|
||||
- **Entitlement gate:** installable only for the org-root concierge, enforced server-side at
|
||||
install/reconcile, keyed on `kind=platform` + org-root (not client-asserted). *(Already shipped.)*
|
||||
- **Secret separation:** the org-admin token stays a core-injected container env var; the plugin only
|
||||
references it.
|
||||
- **Audit:** install of the privileged plugin is logged like any org-admin action.
|
||||
|
||||
## 5. Migration / rollout — concrete, gated sequence
|
||||
|
||||
**Already shipped** (the plugin path exists end-to-end — only the cutover + retirement remain):
|
||||
- `molecule-platform-mcp` plugin (settings-fragment + `MCPServerAdaptor`) — built, in the registry.
|
||||
- Org-root-only **entitlement gate** (server-side, keyed on `kind=platform` + org-root) — shipped.
|
||||
- Platform-agent provisioner **declares** it (`seedTemplatePlugins` in `applyConciergeProvisionConfig`).
|
||||
- Verified: a fresh concierge can install it + gain `create_workspace`.
|
||||
- Readiness gate (the original step 3): the management-MCP gate has a 90s warmup grace (`managementMCPUnloadedGrace`, core#3082), so the post-online install window doesn't false-red.
|
||||
**Already shipped (claude-code path only):** the plugin (settings-fragment + `MCPServerAdaptor`),
|
||||
the org-root entitlement gate, the provisioner declaration (`seedTemplatePlugins`), verification that
|
||||
a fresh **claude-code** concierge installs it + gains `create_workspace`, and the 90s warmup grace
|
||||
(`managementMCPUnloadedGrace`, #3082).
|
||||
|
||||
**The one blocker before the image can be retired:** the plugin is a **private Gitea repo** fetched at boot, and that fetch has been **flaky** (404 on missing/over-scoped token, gitea hang — #3065/#3108). The baked image is today's *safety net* against exactly that. Retiring it without hardening the fetch trades an intermittent degradation for a hard one.
|
||||
> **The revision's CORE work — runtime-agnostic delivery — is NOT done yet.** Everything above is the
|
||||
> claude-code path. The actual lego fix is still to build, and is the prerequisite for step 3:
|
||||
> - **Per-runtime adapter rendering** — `MCPServerAdaptor` renders the §2b descriptor into each
|
||||
> runtime's native MCP config (it currently ignores its `runtime` arg and always writes
|
||||
> `.claude/settings.json`).
|
||||
> - **Generalize the identity gate** — ask the active adapter "is the management MCP wired?" instead
|
||||
> of reading `.claude/settings.json` (today's read fail-closes codex/hermes offline = #3159).
|
||||
> - **Update the delivery contract** — `mcp-plugin-delivery.contract.json` pins the claude path as
|
||||
> SSOT with a drift test; it must become per-runtime or the new adapters fail it.
|
||||
> - **Proven by §5b** (per-runtime render tests + local docker MCP-visibility harness).
|
||||
|
||||
**Sequence (staging-first, each step gated on the prior):**
|
||||
1. **Make the failure visible** — land the #3164 instrumentation (runtime PR #171: `platform_mcp_diag` in the heartbeat) so the CP can see *which* path fails (image-fallback vs plugin-fetch) without box SSH. **Already up for review.**
|
||||
2. **Harden the plugin fetch** — retry-with-backoff + a **fail-LOUD** signal when the management-MCP plugin fetch fails on a `kind=platform` agent, so a concierge can never come up silently "online-but-no-MCP." Closes the safety-net dependency the image currently provides.
|
||||
3. **Prove plugin-only on staging** — provision a `kind=platform` agent on the **plain runtime image for its configured runtime** (claude-code by default; ALSO smoke at least one non-claude-code runtime, e.g. codex, to prove switchability) (no platform-agent image) + the plugin; confirm `create_workspace` on a FRESH provision and the staging E2E ("Platform Boot", "Concierge Creates Workspace") green.
|
||||
4. **Cut over provisioning** — flip `kind=platform` image selection to the plain runtime image (retire `resolvePlatformAgentImage` / the `-platform-agent` variant); the plugin becomes the sole MCP delivery.
|
||||
5. **Re-provision existing concierges** → plugin-only; verify each online + `create_workspace` via real-chat.
|
||||
6. **Retire the image** — drop the `publish-platform-agent` job; remove the dual-path code (`on_platform_agent_image`, `MOLECULE_PLATFORM_AGENT_IMAGE_BAKED`, the baked-binary branch of `mcp_server_present`). Keep a documented offline/self-host build recipe (§6 Q4) but publish/provision nothing.
|
||||
7. Keep PR #3044 (provider-pin) — orthogonal (responsiveness, not capability).
|
||||
**Blocker before the image can retire:** the plugin repo fetch must be reliable. As of 2026-06-23 the
|
||||
repo `molecule-ai/molecule-ai-plugin-molecule-platform-mcp` is **public** (verified anon HTTP 200), so
|
||||
this plugin's fetch is token-free and sidesteps the `gitea://` private-repo hang (#3108). Auth-gated
|
||||
fetch applies only to future *private* plugins.
|
||||
|
||||
**Acceptance:** a fresh `kind=platform` agent on the plain image + plugin surfaces `create_workspace`; staging E2E green; no `molecule-platform-agent` image referenced in the provision path.
|
||||
**Sequence (staging-first; each step gated on the prior):**
|
||||
1. **Make the failure visible — via the OBS system.** Land #3164 instrumentation (runtime PR #171:
|
||||
`platform_mcp_diag`). PR #171 ships it on the heartbeat; the companion emits it as a **boot-event
|
||||
into `org_instance_boot_events`** so it's queryable alongside `image_pull`/`workspace_ready`.
|
||||
2. **Harden the plugin fetch** — retry-with-backoff + fail-LOUD on a `kind=platform` agent.
|
||||
3. **Prove plugin-only on staging** — provision a `kind=platform` agent on the plain runtime image
|
||||
(claude-code, **and** smoke a non-claude runtime e.g. codex to prove switchability) + the plugin;
|
||||
confirm `create_workspace` on a FRESH provision; staging E2E green.
|
||||
4. **Cut over provisioning** — flip `kind=platform` *image selection* to the plain runtime image
|
||||
(retire `resolvePlatformAgentImage`). Scope guard: this retires the *image-selection* path only —
|
||||
**not the `kind=platform` field**.
|
||||
5. **Re-provision existing concierges** → plugin-only; verify each via real-chat.
|
||||
6. **Retire the image code** — drop `publish-platform-agent`; remove the image-specific dual-path code
|
||||
(`resolvePlatformAgentImage`, `on_platform_agent_image`, `MOLECULE_PLATFORM_AGENT_IMAGE_BAKED`, the
|
||||
baked-binary branch).
|
||||
7. **Clean up + verify no-image, no-fallback (final)** — a fresh `kind=platform` provision references
|
||||
zero platform-agent image and **no fallback path remains**; E2E green with the plugin as sole MCP
|
||||
delivery.
|
||||
8. Keep PR #3044 (provider-pin) — orthogonal.
|
||||
|
||||
> **Rollback:** through step 5, cutover is reversible (flip image-selection back + re-provision).
|
||||
> After step 6 the safety net is gone — recovery rests on step 2's fail-loud signal + redeploying the
|
||||
> prior image tag. Do not execute step 6 until step 3's per-runtime proof + a clean prod soak; keep
|
||||
> the last image tag pullable one release cycle.
|
||||
|
||||
> **Do NOT retire `kind=platform` / `WORKSPACE_KIND.Platform`.** Only the *image* retires. The field
|
||||
> is load-bearing: the **canvas** uses it to hide the concierge from the org-map graph (`Canvas.tsx:89`,
|
||||
> `Toolbar.tsx:58`) and render it as the undeletable org-root (`ConciergeShell.tsx`,
|
||||
> `canvas-topology.ts`, `socket.ts`); entitlement + provisioning special-handling key on it too.
|
||||
|
||||
**Acceptance:** a fresh `kind=platform` agent on the plain image + plugin surfaces `create_workspace`;
|
||||
staging E2E green; no `molecule-platform-agent` image and no image-fallback branch anywhere in the
|
||||
provision path.
|
||||
|
||||
## 5b. Local testability — no cloud wait
|
||||
|
||||
The entire correctness of this RFC is reproducible **locally in minutes**; staging e2e is the final
|
||||
smoke, not the dev loop. The bug class (#3159/#3164) is runtime+plugin logic — no provisioned cloud
|
||||
box needed to reproduce.
|
||||
|
||||
**Why these bugs reached staging:** (a) the delivery-contract test is claude-only (no per-runtime
|
||||
render test → `MCPServerAdaptor` ignoring `runtime` was never caught); (b) the
|
||||
"does the concierge see `create_workspace`" probe only runs LIVE (`E2E_REQUIRE_LIVE=1` on push/cron →
|
||||
hours; `=0` on PRs → can false-green) so the real check lands after merge.
|
||||
|
||||
- **Addition 1 — per-runtime render unit tests (seconds).** Parametrized: assert `MCPServerAdaptor`
|
||||
writes the right native config per runtime — incl. "codex → `config.toml`, **NOT**
|
||||
`.claude/settings.json`." Extends `tests/test_mcp_plugin_delivery_contract.py`.
|
||||
- **Addition 2 — local docker MCP-visibility harness (minutes).** `docker run` the runtime image + a
|
||||
fixture config declaring the plugin → probe `loaded_mcp_tools` for
|
||||
`mcp__molecule-platform__create_workspace`, per runtime. Zero-agent pre-check: launch the server,
|
||||
`tools/list` over an MCP handshake.
|
||||
|
||||
Only the **provisioning plumbing** (EC2 cloud-init, tunnel, ECR, Neon, org-slug DNS) needs cloud.
|
||||
Both additions ship **with** this revision — the lego model isn't done until a non-claude runtime
|
||||
surfaces `create_workspace` locally; they are the regression guard against a future #3159-class escape.
|
||||
|
||||
## 6. Decisions (resolved for sign-off)
|
||||
|
||||
1. **Entitlement mechanism** → **core-side org-root-only gate** (already shipped — server-enforced, keyed on `kind=platform` + org-root). The marketplace entitlement broker is the future general path, NOT a blocker for this one privileged plugin.
|
||||
2. **Persona prompt** → **out of scope here.** This RFC fixes the *MCP* (capability). The concierge *system prompt* is an independent failure (the runtime reads `/configs/system-prompt.md` but the template ships `prompts/concierge.md` — a naming/delivery mismatch) tracked as a companion fix. Retiring the image does not regress the prompt — the baked/asset channel never reliably delivered it on SaaS anyway.
|
||||
3. **`npx` vs pre-bundled** → **`npx -y @molecule-ai/mcp-server`** (the current settings-fragment shape). Revisit pre-bundling only if first-turn cold-start is measured as a real problem.
|
||||
4. **Image deprecation** → **RETIRE** the `molecule-platform-agent` image (CTO directive, 2026-06-23): platform agent = the standard image **for its configured runtime** (claude-code by default, switchable to codex/hermes/etc.) + the entitlement-gated plugin. The baked image is claude-code-bound and cannot serve other runtimes (§3.4), so it is incompatible with switchable platform agents regardless. A documented offline/self-host build recipe may be kept for air-gapped use, but nothing in the SaaS provision path references the image.
|
||||
1. **Entitlement mechanism** → core-side org-root-only gate (shipped). Marketplace broker is the
|
||||
future general path, not a blocker here.
|
||||
2. **Persona prompt** → out of scope. This RFC fixes the *MCP*; the system prompt is an independent
|
||||
failure (`/configs/system-prompt.md` vs template `prompts/concierge.md`), companion fix.
|
||||
3. **`npx` vs pre-bundled** → `npx -y @molecule-ai/mcp-server`; revisit pre-bundling only if cold-start
|
||||
is measured as a real problem.
|
||||
4. **Image deprecation** → RETIRE (CTO 2026-06-23). Platform agent = standard image for its configured
|
||||
runtime + the entitlement-gated plugin. A documented offline/self-host build recipe may be kept;
|
||||
nothing in the SaaS provision path references the image.
|
||||
5. **Naming convention** → noted, not renamed here. Repos are consistent at `molecule-ai-plugin-<name>`,
|
||||
but the *name* suffix is not (some carry a redundant `molecule-`: `…-molecule-platform-mcp`; some
|
||||
don't: `…-image-gen`). Renaming the platform plugin is out of scope (churns registry + template
|
||||
declarations); tracked separately. Also fix the stale `builtins.py:446` comment, which cites
|
||||
proposal-era names matching no current repo.
|
||||
|
||||
## 7. Non-goals
|
||||
|
||||
- The general core↔runtime provider-derivation drift (tracked separately,
|
||||
template-claude-code issue #143).
|
||||
- The CP config-regeneration-to-stub behavior — this RFC routes *around* it for
|
||||
the MCP; whether to also stop the stub clobbering `prompt_files` is the
|
||||
companion system-prompt fix (§6 D2), tracked separately.
|
||||
- The core↔runtime provider-derivation drift (template-claude-code #143).
|
||||
- The CP config-regeneration-to-stub behavior — routed *around* for the MCP; the system-prompt clobber
|
||||
is §6 D2, tracked separately.
|
||||
- The broader core `if runtime == "claude-code"` string-compares for LLM-auth/config
|
||||
(`workspace_provision.go` — `runtimeUsesAnthropicNativeProxy`, model-normalization, session-volume).
|
||||
Same "ask the adapter, don't compare the literal" theme, but out of this RFC's MCP scope — tracked
|
||||
separately.
|
||||
|
||||
## 8. Decision requested (CTO sign-off)
|
||||
|
||||
Sign-off requested to adopt, as the committed architecture:
|
||||
|
||||
- **(a)** plugin-only delivery of the management MCP — a platform agent is a standard workspace of its **configured runtime** (claude-code by default, but switchable to codex/hermes/etc.) + the entitlement-gated `molecule-platform-mcp` plugin; **no baked image** (the baked image is claude-code-bound and cannot serve other runtimes — §3.4 — so it is structurally incompatible with switchable platform agents);
|
||||
- **(b)** **retirement** of the `molecule-platform-agent` image per the gated §5 sequence — staging-first, and *blocked on the plugin-fetch hardening (§5 step 2)* so we never trade an intermittent failure for a hard one;
|
||||
Sign-off to adopt as committed architecture:
|
||||
- **(a)** plugin-only delivery of the management MCP — a platform agent is a standard workspace of its
|
||||
*configured* runtime + the entitlement-gated `molecule-platform-mcp` plugin; no baked image (the
|
||||
baked image is claude-code-bound — §3.4).
|
||||
- **(b)** retirement of the `molecule-platform-agent` image per the gated §5 sequence — staging-first,
|
||||
blocked on the plugin-fetch hardening (§5 step 2).
|
||||
- **(c)** the resolved decisions in §6.
|
||||
- **(d)** the **runtime-agnostic correctness requirement** (§2b/§3.4) — the plugin wires the MCP per
|
||||
the configured runtime via per-runtime adapter rendering, with the identity gate + delivery contract
|
||||
generalized, proven **per-runtime locally** (§5b). First-class architecture, not a follow-up; the §5
|
||||
"DONE" items are the claude-only path and do not by themselves satisfy (d).
|
||||
|
||||
On sign-off, the fleet executes §5 in order. Step 1 (the #3164 `platform_mcp_diag` instrumentation, runtime PR #171) is already up for review and is the empirical gate that tells us whether prod failures today are image-fallback or plugin-fetch — informing the cutover.
|
||||
|
||||
**Why now:** the dual image+plugin delivery is the direct root of the recurring #3164 fragility (silent image-fallback → no `molecule-platform-mcp` binary → MCP fails to start → concierge can't `create_workspace` → staging E2E red). Collapsing to plugin-only removes the image-resolution failure mode, the `MOLECULE_PLATFORM_AGENT_IMAGE_BAKED` env-marker gating, and the build/publish/cross-account-pull maintenance burden — with no loss of the privilege boundary (it moves to the already-shipped org-root entitlement gate).
|
||||
|
||||
**Sign-off:** ☐ Approved as written ☐ Approved with changes ☐ Hold — _________________ (CTO) — Date: __________
|
||||
On sign-off, the fleet executes §5 in order. Step 1 (PR #171) is already up and is the empirical gate
|
||||
telling us whether prod failures are image-fallback or plugin-fetch.
|
||||
|
||||
Reference in New Issue
Block a user