diff --git a/docs/agent-runtime/cli-runtime.md b/docs/agent-runtime/cli-runtime.md index a9359b7b..98a65986 100644 --- a/docs/agent-runtime/cli-runtime.md +++ b/docs/agent-runtime/cli-runtime.md @@ -182,6 +182,30 @@ The agent uses these tools naturally — no special instructions needed. Access Example flow: Marketing uses `delegate_task(seo_id, "What is your status?")` → A2A message to SEO → SEO responds → result returned to Marketing. +### Additional MCP servers via plugins (plugin declaration is the SSOT) + +Beyond the always-on `a2a` server, a workspace gains **additional MCP servers +by declaring plugins** — the plugin declaration (`config.yaml: plugins:`) is +the single source of truth for an agent's MCP servers; there is **no separate +hand-maintained `mcp_servers:` list**. + +Conceptually: an **MCP plugin** ships a *runtime-agnostic MCP descriptor* (the +logical server definition, with secrets referenced rather than embedded), and a +**per-runtime shape adapter** renders that descriptor into the active runtime's +native MCP config — for claude-code that is the `.claude/settings.json` +`mcpServers` block read by the SDK; for other runtimes it is their own MCP +config location (codex `~/.codex/config.toml`, gemini `~/.gemini/settings.json`, +hermes `platforms.*`). Because the descriptor is runtime-agnostic, the same MCP +plugin works across runtimes and the agent stays **runtime-switchable**. + +This is the channel by which the org concierge gets its privileged management / +platform MCP (entitlement-gated to the org-root `kind=platform` agent) — see +[`rfc-platform-mcp-as-plugin.md`](../design/rfc-platform-mcp-as-plugin.md) and +[plugins/agentskills-compat.md](../plugins/agentskills-compat.md#mcp-server-plugins-the-plugin-declaration-is-the-ssot). +The concrete adapter API and registry resolution order live in the workspace +runtime — **see the runtime implementation**; they are intentionally not pinned +here. 
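A minimal sketch of the "plugin declaration is the SSOT" rule described above, assuming a hypothetical `McpDescriptor` type and a hypothetical in-memory catalog of plugin descriptors. None of these names come from the workspace runtime, whose adapter API is intentionally not pinned here. The plugin names, the `MOLECULE_ADMIN_TOKEN` variable name, and the choice to reject a stray `mcp_servers:` key are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McpDescriptor:
    """Hypothetical runtime-agnostic MCP descriptor shipped by a plugin."""
    name: str
    command: str
    args: tuple[str, ...] = ()
    # Names of environment variables the server needs. Only the names are
    # stored, so secrets are referenced rather than embedded.
    env_refs: tuple[str, ...] = ()


def mcp_servers_for(config: dict, catalog: dict[str, McpDescriptor]) -> list[McpDescriptor]:
    """Derive an agent's MCP servers from its declared plugins only."""
    if "mcp_servers" in config:
        # Assumption: a separate hand-maintained list is treated as an error.
        raise ValueError("mcp_servers: is not a supported key; declare a plugin instead")
    servers = []
    for plugin_name in config.get("plugins", []):
        descriptor = catalog.get(plugin_name)
        if descriptor is not None:  # plugins without an MCP server are skipped
            servers.append(descriptor)
    return servers


# Hypothetical catalog: one MCP plugin; "seo-skills" carries no MCP server.
CATALOG = {
    "platform-mcp": McpDescriptor(
        name="molecule-platform",
        command="npx",
        args=("-y", "@molecule-ai/mcp-server"),
        env_refs=("MOLECULE_ADMIN_TOKEN",),
    ),
}

# Stand-in for a parsed config.yaml.
CONFIG = {"runtime": "claude-code", "plugins": ["platform-mcp", "seo-skills"]}

SERVERS = mcp_servers_for(CONFIG, CATALOG)
```

Changing `runtime` in `CONFIG` leaves `SERVERS` unchanged, which is the property that keeps the agent runtime-switchable: only the rendering step differs per runtime.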
+ ### Delegation Error Handling When `delegate_task` receives an error from a child (auth failure, timeout, offline), the MCP server wraps it as a `DELEGATION FAILED` message with instructions for the calling agent to: (1) try a different peer, (2) handle the task itself, or (3) inform the user which peer is unavailable and provide its own best answer. Errors are tagged with a `[A2A_ERROR]` sentinel prefix so they can be reliably distinguished from normal response text. Coordinator prompts and A2A instructions reinforce that agents must never forward raw error messages to the user. diff --git a/docs/design/rfc-decouple-config-skill-delivery.md b/docs/design/rfc-decouple-config-skill-delivery.md index 1ccbf125..f5c64e0c 100644 --- a/docs/design/rfc-decouple-config-skill-delivery.md +++ b/docs/design/rfc-decouple-config-skill-delivery.md @@ -2,7 +2,7 @@ **Status:** Draft **Author:** CEO Assistant (on CTO direction) -**Related:** RCA #2831 (SaaS agents lose config/skills/memory), #2832 (credentials in auto-memory), #2838 (provisioner reconciliation — partial), merged runtime fix #125/#134 (memory re-inject on auto-heal + persistence discipline), seo-template #16 (slash-command format) +**Related:** RCA #2831 (SaaS agents lose config/skills/memory), #2832 (credentials in auto-memory), #2838 (provisioner reconciliation — partial), merged runtime fix #125/#134 (memory re-inject on auto-heal + persistence discipline), seo-template #16 (slash-command format), [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md) (the concierge **management MCP** moves to the plugin channel — companion to the §10a concierge-identity-as-template fix below) ## 1. Summary @@ -119,6 +119,15 @@ The same "should be a template, not a patch" smell exists for the **org concierg The concierge has an image (`Dockerfile.platform-agent`) but **no template home for its identity** — so its prompt/config/model live as core string literals, exactly like the SEO skill files did. 
The fix is the same abstraction: make the concierge a **platform-agent template** (prompt/config/model in template files) delivered via this RFC's generic asset channel, and delete the `conciergeSystemPromptTmpl`/`conciergeMCPServersBlock`/`conciergeIdentityFiles` literals from core. The asset channel introduced here is the enabler for removing **both** the SEO patch **and** the concierge hardcoding. +> **Cross-ref: [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md).** That RFC completes +> this de-hardcoding for the concierge along the **plugin** axis: the `conciergeMCPServersBlock` +> management-MCP wiring moves out of core into an **entitlement-gated MCP plugin** declared by the +> platform-agent template (`config.yaml: plugins:` is the SSOT). It also **retires the +> `Dockerfile.platform-agent` baked image** (the standard runtime image + the plugin is the +> concierge) and makes the platform agent **runtime-switchable** (no hardcoded `runtime: claude-code`). +> In short: this RFC's asset channel carries the small concierge **identity** (config/prompts); +> the plugin channel carries the concierge **capability** (the management MCP). + **Audit scope notes:** per-runtime branches in core (e.g. `if runtime == "hermes"` for provision-timeout/config paths) are adapter/registry concerns, not per-template patches — lower priority, candidates for data-driven cleanup but not in this RFC. No plugin-behavior was found hardcoded in core (the plugin system is used for extensions). The two clear "should be a template" patches are: (1) SEO skill package, (2) concierge identity. ## 10. 
What we keep diff --git a/docs/design/rfc-platform-agent.md b/docs/design/rfc-platform-agent.md index 248c9ffa..6bcc87f7 100644 --- a/docs/design/rfc-platform-agent.md +++ b/docs/design/rfc-platform-agent.md @@ -6,6 +6,22 @@ **This document is the single source of truth (SSOT) for the feature.** Code, OpenAPI, the platform MCP, and end-user docs reconcile to this RFC — not to each other. +> **Superseded in part by [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md).** +> The conceptual model in this RFC (platform agent as the org root, `kind` discriminator, +> default-target resolver, approval gate, billing/model parity) still stands. What has changed is +> the **delivery mechanism for the management MCP and the concierge identity**: +> - The management MCP is now delivered as an **entitlement-gated MCP plugin** (the plugin +> declaration in `config.yaml: plugins:` is the SSOT), **not** via a `config.yaml: mcp_servers:` +> list (§5.5) and **not** via a dedicated baked image (§5.7). +> - The concierge persona/config/model is a **platform-agent template** (see +> [`rfc-decouple-config-skill-delivery.md`](rfc-decouple-config-skill-delivery.md) §10a), +> not core string literals or a baked image. +> - The platform agent is **runtime-switchable** (claude-code is the default, not a hard +> requirement); the baked `molecule-platform-agent` image is **retired**. +> +> Sections below tagged *(superseded by rfc-platform-mcp-as-plugin)* are retained for history; +> defer to that RFC for the MCP-delivery, image, and runtime-switchability shape. + --- ## 1. Summary @@ -122,7 +138,17 @@ caller's `orgRootID()` and return it iff `kind='platform'`. This is the server h targets by default; no change to `ProxyA2A`. **Authored in the OpenAPI SSOT first**; MCP/CLI/docs derive from it. 
-### 5.5 Runtime: two MCPs, config-driven +### 5.5 Runtime: two MCPs, config-driven *(superseded by rfc-platform-mcp-as-plugin)* + +> **Superseded by [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md).** This section +> proposed a dedicated `config.yaml: mcp_servers:` list as the wiring channel for the management MCP. +> That is the redundant/competing path: the management MCP is now delivered as an **MCP plugin**, +> and the **plugin declaration (`config.yaml: plugins:`) is the SSOT** — there is no separate +> `mcp_servers:` list. The plugin carries a runtime-agnostic MCP descriptor; the per-runtime +> **shape adapter** renders it into the runtime's native MCP config (claude `.claude/settings.json`, +> codex `~/.codex/config.toml`, gemini `~/.gemini/settings.json`, hermes `platforms.*`). This also +> drops the hardcoded `runtime: claude-code` below — the platform agent is runtime-switchable +> (claude-code is just the default). The original text is retained for history. Make the runtime's `mcp_servers` **config-driven** rather than hardcoded: - `molecule_runtime/config.py`: add `extra_mcp_servers: list[dict]` to `WorkspaceConfig`, read @@ -149,6 +175,11 @@ env (passed through to the stdio child) — no per-server `env` block needed. ### 5.6 Hosting & provisioning (tenant EC2 container) +> Note: per [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md), `` +> below is now the **standard runtime image** (claude-code by default, runtime-switchable), not a +> dedicated baked image; the management MCP arrives via the entitlement-gated plugin installed +> post-online, not baked into the image. + In `ec2.go:buildTenantUserDataSM()` add a `start_platform_agent` stage **after** `wait_platform_health` (the agent registers against `localhost:8080` on boot): @@ -166,7 +197,17 @@ docker run -d --restart=always --name molecule-platform-agent --network host \ - `--restart=always` provides Docker-level supervision (matches `molecule-tenant`). 
- Mirror the block into the redeploy path (`buildRedeployScript`) so existing tenants backfill it. -### 5.7 Image +### 5.7 Image *(superseded by rfc-platform-mcp-as-plugin)* + +> **Superseded by [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md): the dedicated +> `molecule-platform-agent` image is RETIRED.** Because the management MCP now ships as a plugin +> (launched on demand, e.g. `npx -y @molecule-ai/mcp-server`), there is **no binary** to bake +> into a special image: the standard runtime image (claude-code by default, or any switchable +> runtime) + the entitlement-gated platform-MCP plugin **is** the concierge. The original +> security hygiene goal ("keep the org-admin MCP out of ordinary workspace images") is now met by +> the **entitlement gate** (the privileged plugin installs only on the org-root `kind=platform` +> concierge, enforced server-side) rather than by image separation. The original text is retained +> for history. A **dedicated `molecule-platform-agent` image**: `FROM workspace-template-claude-code`, `COPY` the prebuilt `molecule-mcp-server/dist` + `node_modules` into `/opt/molecule-mcp-server`, and **pin Node @@ -227,8 +268,11 @@ end-user chat. Mitigations: - **Approval gate (§5.8)** must ship *with* the agent going user-facing, not after. Until then the agent is operator-only. - **Tenant isolation** is unchanged — every reach path still passes `sameOrg()`. -- **MCP not in workspace images** (dedicated image, §5.7); the admin token lives only in the - platform-agent container env on the tenant box. +- **MCP not on ordinary workspaces** — originally via a dedicated image (§5.7); now enforced by the + **entitlement gate** (the privileged management-MCP plugin installs only on the org-root + `kind=platform` concierge — see [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md) §4). 
+ The admin token lives only in the platform-agent container env on the tenant box and is + *referenced* by the plugin, never embedded. - **Token rotation:** the MCP reads env once at spawn → rotation = `docker restart molecule-platform-agent` (runbook item). - Future: a scoped-down org token (no delete/billing/member) — see §10. @@ -251,9 +295,14 @@ Phase ordering is the rollout contract: constants; `Register` accepts/validates `kind` with invariants. 1. **Platform-as-root + resolver** (`molecule-core` + CP): CP pre-seeds the platform row and creates teams under it; per-org re-parent backfill (after the §8 audit); `GET /registry/platform-agent`. -2. **Config-driven two-MCP runtime** (runtime + claude-code template). -3. **Image + tenant provisioning** (CP + image + `molecule-ci`): dedicated image; `start_platform_agent` - in user-data + redeploy; config via the tenant Secrets Manager bundle; billing knob. +2. **Management MCP via plugin** (runtime + template) — *revised per + [`rfc-platform-mcp-as-plugin.md`](rfc-platform-mcp-as-plugin.md)*: the template declares the + entitlement-gated platform-MCP plugin in `config.yaml: plugins:`; the per-runtime shape adapter + wires it into the runtime's native MCP config post-online. (Was: a config-driven `mcp_servers:` + list, superseded.) +3. **Tenant provisioning** (CP + `molecule-ci`) — *revised*: the **standard runtime image** (no + dedicated `molecule-platform-agent` image); `start_platform_agent` in user-data + redeploy; + identity/config via the template asset channel; billing knob. 4. **Approval gate** (`molecule-core`): policy map + `requireApproval` at destructive handlers; OpenAPI 202 shape. 5. **Dashboard concierge UX** (`molecule-app`): design-first, then build against the resolver. 
diff --git a/docs/plugins/agentskills-compat.md b/docs/plugins/agentskills-compat.md index 9f935c6a..fe717617 100644 --- a/docs/plugins/agentskills-compat.md +++ b/docs/plugins/agentskills-compat.md @@ -106,6 +106,48 @@ built-in `AgentskillsAdaptor` covers the common shape (copy skills to [plugins_registry](../../workspace/plugins_registry/__init__.py) for the resolution order. +## MCP-server plugins (the plugin declaration is the SSOT) + +A plugin can also carry an **MCP server** rather than (or alongside) skills +and rules. This is how privileged capabilities like the **management / +platform MCP** reach an agent — see +[`rfc-platform-mcp-as-plugin.md`](../design/rfc-platform-mcp-as-plugin.md). + +The model is the same two-layer split, applied to MCP: + +- **The plugin declaration is the single source of truth.** An agent's MCP + servers come from the plugins it declares (`config.yaml: plugins:`), **not** + from a separate, hand-maintained `mcp_servers:` list. There is one place an + MCP capability is named: the plugin. +- **The plugin ships a runtime-agnostic MCP descriptor** — the logical server + definition (command/args/env, with secrets *referenced*, never embedded), + independent of any one runtime's config file format. +- **A per-runtime shape adapter renders that descriptor into the runtime's + native MCP config.** Each runtime reads MCP servers from a different place, + so the adapter writes the descriptor into the right shape for the active + runtime: + + | Runtime | Native MCP config the adapter renders into | + |---|---| + | claude-code | `.claude/settings.json` (`mcpServers` block) | + | codex | `~/.codex/config.toml` | + | gemini | `~/.gemini/settings.json` | + | hermes | `platforms.*` config stanza | + + Because the descriptor is runtime-agnostic and the adapter is per-runtime, + the **same MCP plugin works across runtimes** — the agent is + runtime-switchable, and the plugin declaration doesn't change when the + runtime does. 
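The descriptor-to-native-config step in the table above can be sketched as follows. This is an illustration under stated assumptions, not the runtime's adapter API: the function names, the `ADAPTERS` registry, and the descriptor dict are hypothetical, and the `[mcp_servers.<name>]` table name used for codex is an assumption about that runtime's config shape. Only two runtimes are shown. No `env` block is rendered, because the secret stays in the container environment and is inherited by the stdio child.

```python
import json

# Hypothetical runtime-agnostic descriptor: the logical server definition,
# with no secret values in it.
DESCRIPTOR = {
    "name": "molecule-platform",
    "command": "npx",
    "args": ["-y", "@molecule-ai/mcp-server"],
}


def render_claude_code(desc: dict) -> str:
    """Render as an `mcpServers` block for `.claude/settings.json`."""
    block = {
        "mcpServers": {
            desc["name"]: {"command": desc["command"], "args": desc["args"]},
        }
    }
    return json.dumps(block, indent=2)


def render_codex(desc: dict) -> str:
    """Render as a TOML table for `~/.codex/config.toml` (table name assumed)."""
    # json.dumps output for a string or a list of strings is also valid TOML.
    return "\n".join([
        f"[mcp_servers.{desc['name']}]",
        f"command = {json.dumps(desc['command'])}",
        f"args = {json.dumps(desc['args'])}",
    ])


# Hypothetical registry: one shape adapter per runtime.
ADAPTERS = {"claude-code": render_claude_code, "codex": render_codex}


def render(runtime: str, desc: dict) -> str:
    """Pick the adapter for the active runtime and render the descriptor."""
    adapter = ADAPTERS.get(runtime)
    if adapter is None:
        raise ValueError(f"no MCP shape adapter for runtime {runtime!r}")
    return adapter(desc)
```

Calling `render("claude-code", DESCRIPTOR)` and `render("codex", DESCRIPTOR)` produces two different files from the same descriptor, so switching runtimes changes the adapter that runs, not the plugin declaration.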
+ +The exact adapter API (class names, function signatures, the registry +resolution order) is owned by the workspace runtime and is being finalized +there — **see the runtime implementation** rather than pinning specifics here. + +> Privileged MCP plugins (e.g. the org-admin management MCP) are +> **entitlement-gated**: installable only on the org-root `kind=platform` +> concierge, enforced server-side. See +> [`rfc-platform-mcp-as-plugin.md`](../design/rfc-platform-mcp-as-plugin.md) §4. + ## Validator Run before publishing a plugin: diff --git a/docs/plugins/sources.md b/docs/plugins/sources.md index 820e47b5..daed5c30 100644 --- a/docs/plugins/sources.md +++ b/docs/plugins/sources.md @@ -165,3 +165,14 @@ layer works. The two are wired together but independent: the source layer's job ends when plugin files are staged on disk; the shape layer (per-runtime adapter inside the workspace) decides what to do with them on workspace startup. + +One shape is an **MCP server**. An MCP plugin ships a *runtime-agnostic +MCP descriptor*, and the per-runtime shape adapter renders it into that +runtime's native MCP config (claude `.claude/settings.json`, codex +`~/.codex/config.toml`, gemini `~/.gemini/settings.json`, hermes +`platforms.*`). The plugin declaration (`config.yaml: plugins:`) is the +**single source of truth** for an agent's MCP servers — there is no +separate `mcp_servers:` list. This is how the privileged management / +platform MCP reaches the org concierge; see +[agentskills-compat.md](agentskills-compat.md#mcp-server-plugins-the-plugin-declaration-is-the-ssot) +and [`rfc-platform-mcp-as-plugin.md`](../design/rfc-platform-mcp-as-plugin.md).