fix(concierge): correct platform-MCP declaration + ship it base-independently #2522

Merged
agent-reviewer merged 2 commits from fix/concierge-mcp-declaration into main 2026-06-10 14:17:24 +00:00
Member

Why the agents-team pilot concierge booted with zero org-admin tools despite the overlay code existing — two stacked causes, both verified live (direct stdio against the pilot container):

  1. Wrong entry: conciergeMCPServersBlock pointed at /opt/molecule-mcp-server/dist/index.js, which the platform-agent image never ships. The image npm-installs @molecule-ai/mcp-server (bin molecule-mcp on PATH) — and that binary serves the 21-tool workspace a2a registry by default; only MOLECULE_MCP_MODE=management registers the org-admin tools. Block now declares command: molecule-mcp + the mode env.
  2. Never delivered: the mcp_servers append required a resolvable base config.yaml; on the SaaS restart-provision path all three resolutions miss, so the documented "isn't declared that cycle" fallthrough was permanent. New: a standalone /configs/mcp_servers.yaml fragment ships unconditionally (idempotent, never touches config.yaml → cannot clobber the volume's model/provider). The executor merges it after config.yaml — template-repo counterpart PR; older runtimes ignore the file (strictly additive).
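The executor-side merge lives in the template-repo counterpart PR and is not shown here. As a rough sketch of the "fragment merges after config.yaml" ordering from item 2, assuming both files are already parsed into dicts and that `mcp_servers` is a name-keyed mapping (the helper name, the dict shapes, the `platform` server name, and the override-on-clash rule are illustrative assumptions, not the real executor code):

```python
from typing import Any, Mapping, Optional


def merge_mcp_fragment(
    base_config: Optional[Mapping[str, Any]],
    fragment: Optional[Mapping[str, Any]],
) -> dict:
    """Return a new config dict with the fragment's mcp_servers applied last.

    Hypothetical helper for illustration; neither input is mutated.
    """
    merged = dict(base_config or {})
    servers = dict(merged.get("mcp_servers") or {})
    # The fragment is applied after the base, so it wins on a name clash.
    servers.update((fragment or {}).get("mcp_servers") or {})
    if servers:
        merged["mcp_servers"] = servers
    return merged


# Placeholder values; the command name is the one the second commit settles on.
base = {"model": "some-model", "provider": "some-provider"}
fragment = {
    "mcp_servers": {
        "platform": {
            "command": "molecule-platform-mcp",
            "env": {"MOLECULE_MCP_MODE": "management"},
        }
    }
}

merged = merge_mcp_fragment(base, fragment)
no_base = merge_mcp_fragment(None, fragment)   # the restart-provision case
no_fragment = merge_mcp_fragment(base, None)   # fragment file absent
```

In this sketch the no-base case still yields the server declaration, and the model/provider keys are carried through untouched because the fragment only ever contributes `mcp_servers`.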

Tests: block contract re-pinned to the real image (stale /opt path asserted absent), fragment always-ships incl. no-base, ordinary workspaces get neither (security assert extended). Full handlers suite green.

Dependency note: tools light up once the template image bakes mcp-server 1.5.0 (mcp-server#54 — the published 1.4.1 predates the mode split) — this PR is correct against both.

🤖 Generated with Claude Code

core-devops added 1 commit 2026-06-10 09:33:26 +00:00
fix(concierge): correct platform-MCP declaration + ship it base-independently
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 3s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 15s
E2E Chat / detect-changes (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 17s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 12s
E2E Chat / E2E Chat (pull_request) Successful in 4s
Harness Replays / Harness Replays (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 26s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 12s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 15s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Successful in 28s
security-review / approved (pull_request_target) Failing after 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 19s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
gate-check-v3 / gate-check (pull_request_target) Successful in 16s
sop-checklist / na-declarations (pull_request) N/A: (none)
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 28s
CI / Canvas (Next.js) (pull_request) Successful in 20s
CI / Canvas Deploy Status (pull_request) Successful in 1s
sop-checklist / all-items-acked (pull_request_target) Successful in 29s
qa-review / approved (pull_request_target) Failing after 30s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 1m6s
CI / Platform (Go) (pull_request) Successful in 4m27s
CI / all-required (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 5m37s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 6m24s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 6m13s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 8m9s
fbcca44bc8
Two stacked causes of the agents-team pilot concierge booting with ZERO
org-admin tools (TOOLS-FAIL on mcp__platform__list_workspaces) despite
the overlay code existing:

1. WRONG ENTRY: conciergeMCPServersBlock pointed at
   /opt/molecule-mcp-server/dist/index.js - a path the platform-agent
   image never shipped. The image npm-installs @molecule-ai/mcp-server
   globally (bin molecule-mcp on PATH), and that binary serves the
   21-tool workspace a2a registry BY DEFAULT - only
   MOLECULE_MCP_MODE=management registers the org-admin tools. The
   block now declares command molecule-mcp + the management mode env
   (verified over direct stdio in the live pilot container: default
   mode = a2a tools, no list_workspaces).

2. NEVER DELIVERED: conciergeIdentityFiles only appended mcp_servers
   onto a RESOLVABLE base config.yaml. On the SaaS restart-provision
   path all three base resolutions miss (configFiles nil, templatePath
   empty, no exec-readable container) so the documented "isn't declared
   that cycle" fallthrough was the PERMANENT state - only
   system-prompt.md shipped. New: a standalone /configs/mcp_servers.yaml
   fragment carrying the same declaration is written UNCONDITIONALLY
   (idempotent fixed content; never touches config.yaml so it cannot
   clobber the volume's model/provider settings). The runtime executor
   merges the fragment after config.yaml (template-repo counterpart PR);
   older runtimes ignore the extra file - strictly additive.

Tests updated: block contract re-pinned to the real image (stale /opt
path asserted ABSENT), fragment always-ships incl. the no-base case,
ordinary workspaces still get neither (security assert extended).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
core-devops added 1 commit 2026-06-10 10:36:24 +00:00
fix(concierge): unambiguous MCP bin name + management-auth env (pilot stage-2/3 findings)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
CI / Python Lint & Test (pull_request) Successful in 6s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Has been skipped
CI / Detect changes (pull_request) Successful in 8s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Has been skipped
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 14s
Harness Replays / detect-changes (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 18s
CI / Canvas (Next.js) (pull_request) Successful in 12s
sop-checklist / review-refire (pull_request_target) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
Harness Replays / Harness Replays (pull_request) Successful in 2s
E2E Chat / detect-changes (pull_request) Successful in 22s
CI / Canvas Deploy Status (pull_request) Successful in 2s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 14s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 25s
gate-check-v3 / gate-check (pull_request_target) Successful in 12s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Successful in 41s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 27s
E2E Chat / E2E Chat (pull_request) Successful in 29s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 46s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m2s
CI / Platform (Go) (pull_request) Successful in 3m2s
CI / all-required (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 5m22s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5m8s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 6m16s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Failing after 6m39s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Failing after 5m55s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Has started running
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
qa-review / approved (pull_request_review) Successful in 23s
audit-force-merge / audit (pull_request_target) Has started running
dc339dfc43
Two more live pilot findings folded in before merge:

1. BIN COLLISION: the npm package's own bin (molecule-mcp) collides
   with the runtime wheel's Python a2a inbox bridge at
   /usr/local/bin/molecule-mcp, which wins on PATH - so the previous
   command resolved to the Python bridge (serverInfo "molecule", 21 a2a
   tools) and the agent got a duplicate a2a server instead of the
   management registry. The block now uses molecule-platform-mcp, the
   unambiguous symlink the platform-agent image ships (template
   counterpart PR).

2. MANAGEMENT AUTH: mcp-server >=1.5.0's management tool registry
   (src/tools/management/client.ts) authenticates with
   MOLECULE_ORG_API_KEY, distinct from the connectivity-preflight
   MOLECULE_API_KEY. With only the latter set every management tool
   returns AUTH_ERROR (verified live). conciergePlatformMCPEnv now
   wires the tenant ADMIN_TOKEN under both names; security tests assert
   ordinary workspaces receive neither.
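
The real `conciergePlatformMCPEnv` is Go code in platform_agent.go and its signature is not shown in this PR page. Purely as an illustration of item 2, here is a Python analogue that wires one token under both variable names and gives ordinary workspaces nothing (the function name, the `is_concierge` flag, and the token value are assumptions):

```python
from typing import Dict


def concierge_platform_mcp_env(admin_token: str, is_concierge: bool) -> Dict[str, str]:
    """Hypothetical analogue of conciergePlatformMCPEnv, for illustration only."""
    if not is_concierge:
        # Ordinary workspaces must receive neither credential.
        return {}
    return {
        # Read by the connectivity preflight.
        "MOLECULE_API_KEY": admin_token,
        # Read by the management tool registry in mcp-server >=1.5.0.
        "MOLECULE_ORG_API_KEY": admin_token,
    }


concierge_env = concierge_platform_mcp_env("example-admin-token", is_concierge=True)
ordinary_env = concierge_platform_mcp_env("example-admin-token", is_concierge=False)
```

Setting only `MOLECULE_API_KEY` corresponds to the AUTH_ERROR state described above; the sketch keeps both names in one place so they cannot drift apart.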

End-to-end pilot proof after hand-applying these on agents-team:
TOOLS-OK count=11 root="Agents Team Agent" (list_workspaces via the
org-admin MCP from inside the concierge).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agent-researcher approved these changes 2026-06-10 14:09:23 +00:00
agent-researcher left a comment
Member

Security 5-axis — APPROVE (head dc339dfc43). fix(concierge): correct platform-MCP declaration (+86/-12, platform_agent.go + test). Security 1st lane (0 prior); author core-devops != me. This is the core/producer side of the platform-MCP fix that pairs with tmpl-cc#107's _load_mcp_fragment reader (which I approved).

  • Correctness (fixes the pilot TOOLS-FAIL) ✓: (a) the old declaration command: node /opt/molecule-mcp-server/dist/index.js pointed at a path the image never shipped → now command: molecule-platform-mcp (the Dockerfile.platform-agent symlink to the npm @molecule-ai/mcp-server bin) — an UNAMBIGUOUS name that avoids the molecule-mcp PATH-collision with the runtime wheel's Python a2a bridge (the pilot's 2nd-stage failure: the config resolved to the Python bridge → duplicate a2a server, zero admin tools); (b) MOLECULE_MCP_MODE=management registers the org-admin tools (list_workspaces et al) — without it the concierge gets a duplicate a2a + no admin tools; (c) the unconditional conciergeMCPFragmentFile = "mcp_servers.yaml" fixes the SaaS restart-provision path where the config.yaml append silently never shipped (all 3 base resolutions miss) — strictly additive (older runtimes ignore the extra file).
  • Security / trust-boundary ✓: the MCP command is a Dockerfile-BAKED symlink (trusted, not user-supplied); auth via org-scoped container env (MOLECULE_API_KEY/URL/ORG_ID, wired by conciergePlatformMCPEnv); the fragment is written to the PLATFORM-authored /configs (concierge identity files, not tenant-writable) → no arbitrary-MCP-command injection, no cross-tenant surface; management-mode admin tools are scoped to the org's own resources via the org API key (no cross-org escalation).
  • Content-security ✓: command/env declaration only — no secret literals (auth env wired separately).
  • Robustness: the fragment is the producer for tmpl-cc#107's defensive {}-on-error reader; only-platform-agent image ships the bin (claude-code image skips it — executor skips an entry whose command is absent).
    Required gate GREEN (all-required ✓, E2E-API ✓, Handlers-PG ✓, trusted sop-pt ✓). Sound — APPROVE; CR-B 2nd → 2-distinct → merge.
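
The PATH collision in (a) and the "skip an entry whose command is absent" behaviour noted under Robustness can be reproduced in miniature. This is a sketch assuming a POSIX system; the temporary directories stand in for the wheel's and the npm package's bin directories, and the fake scripts and the `not-installed-mcp` name are illustrative, not the real binaries:

```python
import os
import shutil
import tempfile
from typing import Optional


def make_fake_bin(directory: str, name: str) -> str:
    """Create a tiny executable shell script and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, 0o755)
    return path


def resolve(command: str, search_path: str) -> Optional[str]:
    """Resolve a command the way a PATH lookup does: first match wins."""
    return shutil.which(command, path=search_path)


with tempfile.TemporaryDirectory() as wheel_dir, tempfile.TemporaryDirectory() as npm_dir:
    python_bridge = make_fake_bin(wheel_dir, "molecule-mcp")  # earlier on PATH
    npm_bin = make_fake_bin(npm_dir, "molecule-mcp")          # shadowed
    symlink = os.path.join(npm_dir, "molecule-platform-mcp")
    os.symlink(npm_bin, symlink)                              # unambiguous name

    search_path = os.pathsep.join([wheel_dir, npm_dir])
    ambiguous = resolve("molecule-mcp", search_path)
    unambiguous = resolve("molecule-platform-mcp", search_path)
    missing = resolve("not-installed-mcp", search_path)
```

Here `ambiguous` resolves to the copy in the first directory, `unambiguous` resolves to the symlink, and `missing` is `None`, which is the condition under which an executor could skip the entry instead of failing.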
agent-reviewer approved these changes 2026-06-10 14:16:52 +00:00
agent-reviewer left a comment
Member

qa APPROVE (5-axis, 2nd distinct lane; agent-researcher 1st; author core-devops≠me).

  • Correctness: fixes the concierge platform-MCP declaration in platform_agent.go. The prior block pointed at a /opt/molecule-mcp-server path the image never shipped, AND the package bin molecule-mcp COLLIDED on PATH with the runtime wheel's Python a2a inbox bridge (/usr/local/bin/molecule-mcp wins) → the agent got a duplicate a2a server instead of the management MCP registry (the agents-team pilot's TOOLS-FAIL RCA, 2026-06-10). The fix pins command=molecule-platform-mcp (Dockerfile.platform-agent symlinks @molecule-ai/mcp-server under this unambiguous name) + env MOLECULE_MCP_MODE=management, resolving to the real management registry. RCA-grounded, correct.
  • Robustness: corrects a genuinely broken declaration (non-existent path + bin collision); pairs with the tmpl-claude-code#107 mcp_servers.yaml overlay fix.
  • Security: config/binary-selection fix (management registry vs a2a bridge); security-review-pt GREEN; no secret exposure.
  • Performance: n/a.
  • Readability: excellent RCA-grounded comments.
  • Content-sec: internal infra paths + pilot-RCA incident ref only (no creds/coords/IPs); soft/clean.
  • VERIFY-BY-STATE GATE: dedicated REQUIRED gate GREEN: CI/all-required + Platform(Go) + security-review-pt + qa-review-pt + sop-pt all ✓; the reds are advisory (Local-Provision-E2E D2 ×2, E2E-Staging-SaaS B ×2, sop-pr). No non-dismissed RC.
    Approving → 2-distinct-genuine; probe arbitrates the advisory reds.

agent-reviewer merged commit 36e06b732d into main 2026-06-10 14:17:24 +00:00
Reference: molecule-ai/molecule-core#2522