docs: reconcile platform-agent docs with the plugin-SSOT / runtime-agnostic model #3191

Merged
devops-engineer merged 1 commit from docs/reconcile-platform-mcp-plugin into main 2026-06-23 23:03:50 +00:00
5 changed files with 143 additions and 8 deletions
+24
@@ -182,6 +182,30 @@ The agent uses these tools naturally — no special instructions needed. Access
Example flow: Marketing uses `delegate_task(seo_id, "What is your status?")` → A2A message to SEO → SEO responds → result returned to Marketing.
### Additional MCP servers via plugins (plugin declaration is the SSOT)
Beyond the always-on `a2a` server, a workspace gains **additional MCP servers
by declaring plugins** — the plugin declaration (`config.yaml: plugins:`) is
the single source of truth for an agent's MCP servers; there is **no separate
hand-maintained `mcp_servers:` list**.
Conceptually: an **MCP plugin** ships a *runtime-agnostic MCP descriptor* (the
logical server definition, with secrets referenced rather than embedded), and a
**per-runtime shape adapter** renders that descriptor into the active runtime's
native MCP config — for claude-code that is the `.claude/settings.json`
`mcpServers` block read by the SDK; for other runtimes it is their own MCP
config location (codex `~/.codex/config.toml`, gemini `~/.gemini/settings.json`,
hermes `platforms.*`). Because the descriptor is runtime-agnostic, the same MCP
plugin works across runtimes and the agent stays **runtime-switchable**.
This is the channel by which the org concierge gets its privileged management /
platform MCP (entitlement-gated to the org-root `kind=platform` agent) — see
[`rfc-platform-mcp-as-plugin.md`](../design/rfc-platform-mcp-as-plugin.md) and
[plugins/agentskills-compat.md](../plugins/agentskills-compat.md#mcp-server-plugins-the-plugin-declaration-is-the-ssot).
The concrete adapter API and registry resolution order live in the workspace
runtime — **see the runtime implementation**; they are intentionally not pinned
here.
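As a minimal sketch of what "the plugin declaration is the SSOT" could mean in practice, the snippet below derives an agent's MCP server set purely from `plugins:`. The catalogue, the plugin names (`platform-mcp`, `seo-skills`), the server name `molecule-platform`, and the function `mcp_servers_for` are all hypothetical and are not the runtime's actual API.

```python
from typing import Any, Dict, List, Optional

# Hypothetical catalogue: plugin name -> name of the MCP server it ships,
# or None for a plugin that carries only skills/rules. Illustrative names only.
PLUGIN_MCP_SERVER: Dict[str, Optional[str]] = {
    "platform-mcp": "molecule-platform",
    "seo-skills": None,
}

ALWAYS_ON = ["a2a"]  # the always-on server named in this section


def mcp_servers_for(config: Dict[str, Any]) -> List[str]:
    """Derive the MCP server set from the plugin declaration alone."""
    if "mcp_servers" in config:
        # The model has no separate list; rejecting it is this sketch's choice.
        raise ValueError("declare MCP servers via plugins:, not mcp_servers:")
    servers = list(ALWAYS_ON)
    for plugin in config.get("plugins", []):
        server = PLUGIN_MCP_SERVER[plugin]  # KeyError for an unknown plugin
        if server is not None:
            servers.append(server)
    return servers


print(mcp_servers_for({"plugins": ["seo-skills", "platform-mcp"]}))
```

Because the server list is computed rather than stored, switching runtimes changes only how each server is rendered, not which servers the agent has.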
### Delegation Error Handling
When `delegate_task` receives an error from a child (auth failure, timeout, offline), the MCP server wraps it as a `DELEGATION FAILED` message with instructions for the calling agent to: (1) try a different peer, (2) handle the task itself, or (3) inform the user which peer is unavailable and provide its own best answer. Errors are tagged with a `[A2A_ERROR]` sentinel prefix so they can be reliably distinguished from normal response text. Coordinator prompts and A2A instructions reinforce that agents must never forward raw error messages to the user.
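A caller-side sketch of the sentinel check might look like the following. Only the `[A2A_ERROR]` prefix and the `DELEGATION FAILED` wording come from the text above; the helper names, the exact message layout, and the fallback wording are assumptions for illustration.

```python
A2A_ERROR_PREFIX = "[A2A_ERROR]"  # sentinel named in the paragraph above


def is_delegation_error(result: str) -> bool:
    """True when a delegate_task result carries the error sentinel prefix."""
    return result.lstrip().startswith(A2A_ERROR_PREFIX)


def user_facing_reply(result: str, peer: str, fallback_answer: str) -> str:
    """Pass normal results through; never forward a raw error to the user.

    Hypothetical helper: on failure it names the unavailable peer and
    returns the caller's own best answer instead of the error text.
    """
    if not is_delegation_error(result):
        return result
    return f"{peer} is currently unavailable. My best answer: {fallback_answer}"


raw = "[A2A_ERROR] DELEGATION FAILED: peer timed out"
print(user_facing_reply(raw, "SEO", "status unknown, last report was healthy"))
```

Matching on a fixed prefix rather than on free-form wording is what lets the caller separate failures from a normal reply that merely mentions an error.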
@@ -2,7 +2,7 @@
**Status:** Draft
**Author:** CEO Assistant (on CTO direction)
**Related:** RCA #2831 (SaaS agents lose config/skills/memory), #2832 (credentials in auto-memory), #2838 (provisioner reconciliation — partial), merged runtime fix #125/#134 (memory re-inject on auto-heal + persistence discipline), seo-template #16 (slash-command format)
**Related:** RCA #2831 (SaaS agents lose config/skills/memory), #2832 (credentials in auto-memory), #2838 (provisioner reconciliation — partial), merged runtime fix #125/#134 (memory re-inject on auto-heal + persistence discipline), seo-template #16 (slash-command format), [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md) (the concierge **management MCP** moves to the plugin channel — companion to the §10a concierge-identity-as-template fix below)
## 1. Summary
@@ -119,6 +119,15 @@ The same "should be a template, not a patch" smell exists for the **org concierg
The concierge has an image (`Dockerfile.platform-agent`) but **no template home for its identity** — so its prompt/config/model live as core string literals, exactly like the SEO skill files did. The fix is the same abstraction: make the concierge a **platform-agent template** (prompt/config/model in template files) delivered via this RFC's generic asset channel, and delete the `conciergeSystemPromptTmpl`/`conciergeMCPServersBlock`/`conciergeIdentityFiles` literals from core. The asset channel introduced here is the enabler for removing **both** the SEO patch **and** the concierge hardcoding.
> **Cross-ref: [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md).** That RFC completes
> this de-hardcoding for the concierge along the **plugin** axis: the `conciergeMCPServersBlock`
> management-MCP wiring moves out of core into an **entitlement-gated MCP plugin** declared by the
> platform-agent template (`config.yaml: plugins:` is the SSOT). It also **retires the
> `Dockerfile.platform-agent` baked image** (the standard runtime image + the plugin is the
> concierge) and makes the platform agent **runtime-switchable** (no hardcoded `runtime: claude-code`).
> In short: this RFC's asset channel carries the small concierge **identity** (config/prompts);
> the plugin channel carries the concierge **capability** (the management MCP).
**Audit scope notes:** per-runtime branches in core (e.g. `if runtime == "hermes"` for provision-timeout/config paths) are adapter/registry concerns, not per-template patches — lower priority, candidates for data-driven cleanup but not in this RFC. No plugin-behavior was found hardcoded in core (the plugin system is used for extensions). The two clear "should be a template" patches are: (1) SEO skill package, (2) concierge identity.
## 10. What we keep
+56 -7
@@ -6,6 +6,22 @@
**This document is the single source of truth (SSOT) for the feature.** Code, OpenAPI, the platform
MCP, and end-user docs reconcile to this RFC — not to each other.
> **Superseded in part by [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md).**
> The conceptual model in this RFC (platform agent as the org root, `kind` discriminator,
> default-target resolver, approval gate, billing/model parity) still stands. What has changed is
> the **delivery mechanism for the management MCP and the concierge identity**:
> - The management MCP is now delivered as an **entitlement-gated MCP plugin** (the plugin
> declaration in `config.yaml: plugins:` is the SSOT), **not** via a `config.yaml: mcp_servers:`
> list (§5.5) and **not** via a dedicated baked image (§5.7).
> - The concierge persona/config/model is a **platform-agent template** (see
> [`rfc-decouple-config-skill-delivery.md`](rfc-decouple-config-skill-delivery.md) §10a),
> not core string literals or a baked image.
> - The platform agent is **runtime-switchable** (claude-code is the default, not a hard
> requirement); the baked `molecule-platform-agent` image is **retired**.
>
> Sections below tagged *(superseded by rfc-platform-mcp-as-plugin)* are retained for history;
> defer to that RFC for the MCP-delivery, image, and runtime-switchability shape.
---
## 1. Summary
@@ -122,7 +138,17 @@ caller's `orgRootID()` and return it iff `kind='platform'`. This is the server h
targets by default; no change to `ProxyA2A`. **Authored in the OpenAPI SSOT first**; MCP/CLI/docs
derive from it.
### 5.5 Runtime: two MCPs, config-driven
### 5.5 Runtime: two MCPs, config-driven *(superseded by rfc-platform-mcp-as-plugin)*
> **Superseded by [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md).** This section
> proposed a dedicated `config.yaml: mcp_servers:` list as the wiring channel for the management MCP.
> That list is now a redundant, competing path: the management MCP is delivered as an **MCP plugin**,
> and the **plugin declaration (`config.yaml: plugins:`) is the SSOT** — there is no separate
> `mcp_servers:` list. The plugin carries a runtime-agnostic MCP descriptor; the per-runtime
> **shape adapter** renders it into the runtime's native MCP config (claude `.claude/settings.json`,
> codex `~/.codex/config.toml`, gemini `~/.gemini/settings.json`, hermes `platforms.*`). This also
> drops the hardcoded `runtime: claude-code` below — the platform agent is runtime-switchable
> (claude-code is just the default). The original text is retained for history.
Make the runtime's `mcp_servers` **config-driven** rather than hardcoded:
- `molecule_runtime/config.py`: add `extra_mcp_servers: list[dict]` to `WorkspaceConfig`, read
@@ -149,6 +175,11 @@ env (passed through to the stdio child) — no per-server `env` block needed.
### 5.6 Hosting & provisioning (tenant EC2 container)
> Note: per [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md), `<platform-agent-image>`
> below is now the **standard runtime image** (claude-code by default, runtime-switchable), not a
> dedicated baked image; the management MCP arrives via the entitlement-gated plugin installed
> post-online, not baked into the image.
In `ec2.go:buildTenantUserDataSM()` add a `start_platform_agent` stage **after** `wait_platform_health`
(the agent registers against `localhost:8080` on boot):
@@ -166,7 +197,17 @@ docker run -d --restart=always --name molecule-platform-agent --network host \
- `--restart=always` provides Docker-level supervision (matches `molecule-tenant`).
- Mirror the block into the redeploy path (`buildRedeployScript`) so existing tenants backfill it.
### 5.7 Image
### 5.7 Image *(superseded by rfc-platform-mcp-as-plugin)*
> **Superseded by [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md): the dedicated
> `molecule-platform-agent` image is RETIRED.** Because the management MCP now ships as a plugin
> (launched on demand, e.g. `npx -y @molecule-ai/mcp-server`), there is **no binary** to bake
> into a special image — the standard runtime image (claude-code by default, or any switchable
> runtime) + the entitlement-gated platform-MCP plugin **is** the concierge. The original
> security hygiene goal ("keep the org-admin MCP out of ordinary workspace images") is now met by
> the **entitlement gate** (the privileged plugin installs only on the org-root `kind=platform`
> concierge, enforced server-side) rather than by image separation. The original text is retained
> for history.
A **dedicated `molecule-platform-agent` image**: `FROM workspace-template-claude-code`, `COPY` the
prebuilt `molecule-mcp-server/dist` + `node_modules` into `/opt/molecule-mcp-server`, and **pin Node
@@ -227,8 +268,11 @@ end-user chat. Mitigations:
- **Approval gate (§5.8)** must ship *with* the agent going user-facing, not after. Until then the
agent is operator-only.
- **Tenant isolation** is unchanged — every reach path still passes `sameOrg()`.
- **MCP not in workspace images** (dedicated image, §5.7); the admin token lives only in the
platform-agent container env on the tenant box.
- **MCP not on ordinary workspaces** — originally via a dedicated image (§5.7); now enforced by the
**entitlement gate** (the privileged management-MCP plugin installs only on the org-root
`kind=platform` concierge — see [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md) §4).
The admin token lives only in the platform-agent container env on the tenant box and is
*referenced* by the plugin, never embedded.
- **Token rotation:** the MCP reads env once at spawn → rotation = `docker restart
molecule-platform-agent` (runbook item).
- Future: a scoped-down org token (no delete/billing/member) — see §10.
@@ -251,9 +295,14 @@ Phase ordering is the rollout contract:
constants; `Register` accepts/validates `kind` with invariants.
1. **Platform-as-root + resolver** (`molecule-core` + CP): CP pre-seeds the platform row and creates
teams under it; per-org re-parent backfill (after the §8 audit); `GET /registry/platform-agent`.
2. **Config-driven two-MCP runtime** (runtime + claude-code template).
3. **Image + tenant provisioning** (CP + image + `molecule-ci`): dedicated image; `start_platform_agent`
in user-data + redeploy; config via the tenant Secrets Manager bundle; billing knob.
2. **Management MCP via plugin** (runtime + template) — *revised per
[`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md)*: the template declares the
entitlement-gated platform-MCP plugin in `config.yaml: plugins:`; the per-runtime shape adapter
wires it into the runtime's native MCP config post-online. (Was: a config-driven `mcp_servers:`
list, superseded.)
3. **Tenant provisioning** (CP + `molecule-ci`) — *revised*: the **standard runtime image** (no
dedicated `molecule-platform-agent` image); `start_platform_agent` in user-data + redeploy;
identity/config via the template asset channel; billing knob.
4. **Approval gate** (`molecule-core`): policy map + `requireApproval` at destructive handlers; OpenAPI
202 shape.
5. **Dashboard concierge UX** (`molecule-app`): design-first, then build against the resolver.
+42
@@ -106,6 +106,48 @@ built-in `AgentskillsAdaptor` covers the common shape (copy skills to
[plugins_registry](../../workspace/plugins_registry/__init__.py)
for the resolution order.
## MCP-server plugins (the plugin declaration is the SSOT)
A plugin can also carry an **MCP server** rather than (or alongside) skills
and rules. This is how privileged capabilities like the **management /
platform MCP** reach an agent — see
[`rfc-platform-mcp-as-plugin.md`](../design/rfc-platform-mcp-as-plugin.md).
The model is the same two-layer split, applied to MCP:
- **The plugin declaration is the single source of truth.** An agent's MCP
servers come from the plugins it declares (`config.yaml: plugins:`), **not**
from a separate, hand-maintained `mcp_servers:` list. There is one place an
MCP capability is named: the plugin.
- **The plugin ships a runtime-agnostic MCP descriptor** — the logical server
definition (command/args/env, with secrets *referenced*, never embedded),
independent of any one runtime's config file format.
- **A per-runtime shape adapter renders that descriptor into the runtime's
native MCP config.** Each runtime reads MCP servers from a different place,
so the adapter writes the descriptor into the right shape for the active
runtime:
| Runtime | Native MCP config the adapter renders into |
|---|---|
| claude-code | `.claude/settings.json` (`mcpServers` block) |
| codex | `~/.codex/config.toml` |
| gemini | `~/.gemini/settings.json` |
| hermes | `platforms.*` config stanza |
Because the descriptor is runtime-agnostic and the adapter is per-runtime,
the **same MCP plugin works across runtimes** — the agent is
runtime-switchable, and the plugin declaration doesn't change when the
runtime does.
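To make the two-layer split concrete, here is a rough sketch of one descriptor rendered for two of the runtimes in the table. The descriptor fields (`env_refs` in particular), the `${VAR}` reference syntax, the function names, and the `[mcp_servers.<name>]` table shape assumed for codex are illustrative assumptions; the real adapter API is owned by the workspace runtime and may look quite different.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical runtime-agnostic descriptor. Secrets are referenced by name.
DESCRIPTOR: Dict[str, Any] = {
    "name": "molecule-platform",
    "command": "npx",
    "args": ["-y", "@molecule-ai/mcp-server"],
    "env_refs": ["MOLECULE_ADMIN_TOKEN"],  # assumed variable name
}


def _env(d: Dict[str, Any]) -> Dict[str, str]:
    # Emit a reference placeholder, never the secret value itself.
    return {name: "${" + name + "}" for name in d["env_refs"]}


def render_claude_code(d: Dict[str, Any]) -> str:
    """Render as a `mcpServers` block for `.claude/settings.json`."""
    entry = {"command": d["command"], "args": d["args"], "env": _env(d)}
    return json.dumps({"mcpServers": {d["name"]: entry}}, indent=2)


def render_codex(d: Dict[str, Any]) -> str:
    """Render as a TOML table for `~/.codex/config.toml` (assumed shape)."""
    args = ", ".join(json.dumps(a) for a in d["args"])
    env = ", ".join(f"{k} = {json.dumps(v)}" for k, v in _env(d).items())
    return (
        f"[mcp_servers.{d['name']}]\n"
        f"command = {json.dumps(d['command'])}\n"
        f"args = [{args}]\n"
        f"env = {{ {env} }}\n"
    )


# One adapter per runtime; the descriptor is the same for all of them.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "claude-code": render_claude_code,
    "codex": render_codex,
}


def render(runtime: str, d: Dict[str, Any]) -> str:
    return ADAPTERS[runtime](d)


print(render("codex", DESCRIPTOR))
```

Switching the runtime changes only the key passed to `render`; the descriptor, and therefore the plugin declaration, stays the same.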
The exact adapter API (class names, function signatures, the registry
resolution order) is owned by the workspace runtime and is being finalized
there — **see the runtime implementation** rather than pinning specifics here.
> Privileged MCP plugins (e.g. the org-admin management MCP) are
> **entitlement-gated**: installable only on the org-root `kind=platform`
> concierge, enforced server-side. See
> [`rfc-platform-mcp-as-plugin.md`](../design/rfc-platform-mcp-as-plugin.md) §4.
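The gate described above could be sketched as a single server-side predicate. The `Workspace` fields, the use of `parent_id is None` to mean "org root", and the plugin name `platform-mcp` are assumptions made for illustration; the authoritative rules live in the RFC.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of plugins that require the entitlement.
PRIVILEGED_PLUGINS = {"platform-mcp"}


@dataclass(frozen=True)
class Workspace:
    id: str
    kind: str                 # e.g. "platform" for the concierge
    parent_id: Optional[str]  # None is assumed to mean "org root"


def may_install(plugin: str, ws: Workspace) -> bool:
    """Server-side check: privileged plugins only on the org-root platform agent."""
    if plugin not in PRIVILEGED_PLUGINS:
        return True
    return ws.kind == "platform" and ws.parent_id is None


concierge = Workspace(id="ws-root", kind="platform", parent_id=None)
seo = Workspace(id="ws-seo", kind="workspace", parent_id="ws-root")
print(may_install("platform-mcp", concierge), may_install("platform-mcp", seo))
```

Checking both `kind` and root position means a non-root workspace cannot gain the privileged plugin merely by claiming `kind=platform` in its own config.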
## Validator
Run before publishing a plugin:
+11
@@ -165,3 +165,14 @@ layer works. The two are wired together but independent: the source
layer's job ends when plugin files are staged on disk; the shape layer
(per-runtime adapter inside the workspace) decides what to do with them
on workspace startup.
One shape is an **MCP server**. An MCP plugin ships a *runtime-agnostic
MCP descriptor*, and the per-runtime shape adapter renders it into that
runtime's native MCP config (claude `.claude/settings.json`, codex
`~/.codex/config.toml`, gemini `~/.gemini/settings.json`, hermes
`platforms.*`). The plugin declaration (`config.yaml: plugins:`) is the
**single source of truth** for an agent's MCP servers — there is no
separate `mcp_servers:` list. This is how the privileged management /
platform MCP reaches the org concierge; see
[agentskills-compat.md](agentskills-compat.md#mcp-server-plugins-the-plugin-declaration-is-the-ssot)
and [`rfc-platform-mcp-as-plugin.md`](../design/rfc-platform-mcp-as-plugin.md).