feat(workspaces): fail-closed provision_workspace MCP tool #19
Summary
Adds a `provision_workspace` MCP tool so an agent (the orchestrator) can provision a workspace with a guaranteed runtime (`claude-code`, `codex`, `hermes`, `openclaw`, `langgraph`, `autogen`, `crewai`, `deepagents`, BYO `external`/kimi) and fail closed if the platform cannot honor it. Before this, no agent-facing provisioning tool enforced runtime fidelity. `create_workspace` exists, but it returns a bare 201 even when the platform silently falls back to `langgraph` (the #184 / molecule-controlplane#188 footgun: 5/5 codex/claude-code requests came up langgraph).
What it does
1. Validates `runtime` against the supported set before any side effect (defense-in-depth on top of the SDK enum). An unsupported runtime returns a clear `UNSUPPORTED_RUNTIME` error instead of being silently coerced to langgraph.
2. Creates via `POST /workspaces` with both `template` (defaults to `<runtime>-default`) and `runtime`. It does NOT use the CP-direct `/cp/workspaces/provision` path the orchestrator had been forced to use.
3. Reads the workspace back and asserts `resolved runtime == requested`. On mismatch it returns a structured `RUNTIME_MISMATCH`/`PROVISION_UNVERIFIED` error with the resolved value and `provisioned:false`, so the caller can no longer mistake a langgraph fallback for success.
This makes the agent-facing surface honest now. It does not replace the required platform-side hard-gate. See molecule-controlplane#188 and its workspace-server sibling: the product `Create` path at `workspace.go:245-247` defaults an empty/unknown runtime to `langgraph` with no validation, which is the same root-cause family. Per the CTO framing, each adapter stays runtime-specific, and the platform is the unified SSOT that must hard-gate or error+notify, never silent-advisory.
Refs: molecule-controlplane#188, #184.
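The three steps above can be sketched as one async function. This is a minimal TypeScript sketch under stated assumptions, not the PR's `handleProvisionWorkspace`:

- `PlatformClient`, `provisionWorkspace` and the result fields are hypothetical names.
- The HTTP calls sit behind a client interface, so the control flow can be exercised with a fake.
- Errors thrown by the client are left to propagate.
- The BYO `external` carve-out is not modeled, because its exact handling isn't described here.

```typescript
// Hypothetical sketch of the fail-closed provisioning contract; all names are illustrative.

// Managed runtimes from the PR summary (BYO `external` is deliberately not modeled).
const SUPPORTED_RUNTIMES: ReadonlySet<string> = new Set([
  "claude-code", "codex", "hermes", "openclaw",
  "langgraph", "autogen", "crewai", "deepagents",
]);

// Assumed thin wrapper over POST /workspaces and a GET read-back of the created workspace.
interface PlatformClient {
  createWorkspace(body: { name: string; template: string; runtime: string }): Promise<{ id?: string }>;
  getWorkspace(id: string): Promise<{ runtime?: unknown }>;
}

type ProvisionErrorCode = "UNSUPPORTED_RUNTIME" | "PROVISION_UNVERIFIED" | "RUNTIME_MISMATCH";

type ProvisionResult =
  | { ok: true; provisioned: true; workspaceId: string; runtime: string }
  | { ok: false; provisioned: false; code: ProvisionErrorCode; message: string; resolvedRuntime?: unknown };

async function provisionWorkspace(
  client: PlatformClient,
  params: { name: string; runtime: string; template?: string },
): Promise<ProvisionResult> {
  const fail = (code: ProvisionErrorCode, message: string, resolvedRuntime?: unknown): ProvisionResult =>
    ({ ok: false, provisioned: false, code, message, resolvedRuntime });

  // Step 1: validate before any side effect, so no POST is issued for a bad runtime.
  if (!SUPPORTED_RUNTIMES.has(params.runtime)) {
    return fail("UNSUPPORTED_RUNTIME", `runtime ${params.runtime} is not supported`);
  }

  // Step 2: create via the product path, sending both template and runtime.
  const template = params.template ?? `${params.runtime}-default`;
  const created = await client.createWorkspace({ name: params.name, template, runtime: params.runtime });
  if (!created.id) {
    return fail("PROVISION_UNVERIFIED", "create returned no workspace id");
  }

  // Step 3: read back; the create response alone is never treated as success.
  const readBack = await client.getWorkspace(created.id);
  if (typeof readBack.runtime !== "string") {
    return fail("PROVISION_UNVERIFIED", `could not read runtime for ${created.id}`, readBack.runtime);
  }
  if (readBack.runtime !== params.runtime) {
    return fail(
      "RUNTIME_MISMATCH",
      `requested ${params.runtime}, platform resolved ${readBack.runtime}`,
      readBack.runtime,
    );
  }
  return { ok: true, provisioned: true, workspaceId: created.id, runtime: readBack.runtime };
}
```

With a fake client whose read-back always reports `langgraph`, a `codex` request comes back as `RUNTIME_MISMATCH` rather than success. An unsupported runtime never reaches `createWorkspace` at all.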
Test plan
- `npx tsc --noEmit` clean
- `npm test`: 133 passed (3 suites), incl. 5 new `handleProvisionWorkspace` tests:
  - unsupported runtime (no side effect)
  - `RUNTIME_MISMATCH` on langgraph fallback
  - `ok=true` only on match
  - `PROVISION_UNVERIFIED` on unreadable runtime
  - BYO `external` not falsely failed
- `npm run build` clean; `provision_workspace` present in `dist/`
🤖 Generated with Claude Code
Adds a `provision_workspace` MCP tool so an agent can provision a workspace with a GUARANTEED runtime (claude-code/codex/hermes/openclaw/langgraph/autogen/crewai/deepagents) via the correct PRODUCT create path (POST /workspaces with template+runtime), not the CP-direct /cp/workspaces/provision path the orchestrator was forced to use.
Enforces the same fail-closed contract as molecule-controlplane#188 on the agent-facing surface:
1. Validate runtime against the supported set BEFORE any side effect.
2. Create via the product path (template drives config/image).
3. Read the workspace back and assert resolved runtime == requested; return a structured RUNTIME_MISMATCH/PROVISION_UNVERIFIED error (NOT a success) if the platform silently fell back to langgraph.
This makes the agent surface honest now; it does NOT replace the required platform-side hard-gate (controlplane#188 + its workspace-server sibling). Each adapter stays runtime-specific; the platform is the unified SSOT that must error+notify, never silent-advisory.
Refs: molecule-controlplane#188, #184 (CP-direct vs product-create fidelity gap).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LGTM: clean fail-closed design, comprehensive tests, solid error taxonomy. Approved.
Extends the fail-closed provision_workspace tool with an optional role_config { model, config_yaml } block so "create + apply-role-config + read-back-assert" is ONE fail-closed operation instead of two separate, skippable steps.
Motivation (#218 prod-team defect): the 5 prod-team workspaces were provisioned with the correct runtime but template-default role config (generic name, Sonnet instead of the role's model, empty charter), because per-role config was never applied as part of provisioning.
Mechanism (source-verified against molecule-core workspace-server):
- model -> PUT /workspaces/:id/model
  - Writes the MODEL_PROVIDER workspace_secret, which is authoritative over config.yaml runtime_config.model per the claude-code adapter resolution order; auto-restarts.
  - The effective model is read back via GET /workspaces/:id/model and ASSERTED == requested; a write-ack is never trusted as success.
- config.yaml -> PUT /workspaces/:id/files/config.yaml
  - Carries name, description/charter, runtime_config.model and required_env; written via EIC to the workspace EC2, then auto-restarts.
  - NOT read-back-asserted, due to the documented PUT/GET path asymmetry (molecule-core tests/e2e/test_staging_full_saas.sh). The model read-back is the authoritative effective-config gate.
Fail-closed surface:
- ROLE_CONFIG_FAILED: write error, with phase.
- ROLE_CONFIG_MODEL_MISMATCH: effective model != requested after read-back.
- role_config_applied is always present in the result, so a caller cannot mistake a runtime-only provision for a fully configured role.
Tests: +3 (success path, model-mismatch fail-closed, role_config absent). Full suite green: 136 passed, 1 skipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extended this PR with `role_config { model, config_yaml }` on `provision_workspace`, so create + apply-role-config + read-back-assert is ONE fail-closed op (folds in the #218 prod-team role-config fix).
- The model is read-back-asserted via `GET /workspaces/:id/model`, which is authoritative over config.yaml per the claude-code adapter resolution order.
- config.yaml is written via the Files API but NOT read-back-asserted, due to the documented PUT/GET path asymmetry (molecule-core tests/e2e/test_staging_full_saas.sh:564-572).
+3 tests, full suite green (136 passed). Normal review please: NO merge / no CI bypass. Live application to the 5 prod-team workspaces was done out-of-band via the same canonical endpoints and read-back-verified (all 5 PASS, all online).
[sdk-dev] CLAUDE.md tool count gap (87 vs 88) noted; not blocking merge, since documentation is non-functional. Follow-up PR to sync CLAUDE.md after merge.
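The role_config flow described in the commit above can be sketched the same way. This is a hedged TypeScript sketch under stated assumptions:

- `RoleConfigClient`, `applyRoleConfig` and the phase labels (`config_yaml`, `model`, `model_readback`) are hypothetical.
- Writing config.yaml before the model is an assumed ordering; the text above doesn't specify one.
- The endpoint comments only restate the paths named in the commit message.

It shows the two properties the commit relies on. First, the model write counts only after `GET /workspaces/:id/model` returns the requested value. Second, `role_config_applied` is present on every outcome.

```typescript
// Hypothetical sketch of the role_config step; names and phase labels are illustrative.
interface RoleConfig {
  model?: string;
  config_yaml?: string;
}

// Assumed wrappers over the endpoints named in the commit message.
interface RoleConfigClient {
  putConfigYaml(workspaceId: string, yaml: string): Promise<void>; // PUT /workspaces/:id/files/config.yaml
  putModel(workspaceId: string, model: string): Promise<void>;     // PUT /workspaces/:id/model
  getModel(workspaceId: string): Promise<string | undefined>;     // GET /workspaces/:id/model
}

type RolePhase = "config_yaml" | "model" | "model_readback";

type RoleConfigOutcome =
  | { ok: true; role_config_applied: boolean }
  | { ok: false; role_config_applied: false; code: "ROLE_CONFIG_FAILED"; phase: RolePhase; message: string }
  | { ok: false; role_config_applied: false; code: "ROLE_CONFIG_MODEL_MISMATCH"; requested: string; effective?: string };

async function applyRoleConfig(
  client: RoleConfigClient,
  workspaceId: string,
  roleConfig?: RoleConfig,
): Promise<RoleConfigOutcome> {
  // Nothing to apply: a runtime-only provision, reported explicitly as not applied.
  if (!roleConfig || (roleConfig.model === undefined && roleConfig.config_yaml === undefined)) {
    return { ok: true, role_config_applied: false };
  }

  const failed = (phase: RolePhase, err: unknown): RoleConfigOutcome =>
    ({ ok: false, role_config_applied: false, code: "ROLE_CONFIG_FAILED", phase, message: String(err) });

  if (roleConfig.config_yaml !== undefined) {
    try {
      await client.putConfigYaml(workspaceId, roleConfig.config_yaml);
    } catch (err) {
      return failed("config_yaml", err);
    }
    // Deliberately not read back (PUT/GET path asymmetry); the model check below is the gate.
  }

  if (roleConfig.model !== undefined) {
    try {
      await client.putModel(workspaceId, roleConfig.model);
    } catch (err) {
      return failed("model", err);
    }
    let effective: string | undefined;
    try {
      effective = await client.getModel(workspaceId);
    } catch (err) {
      return failed("model_readback", err);
    }
    // The write-ack is not success; only the effective model read back counts.
    if (effective !== roleConfig.model) {
      return {
        ok: false,
        role_config_applied: false,
        code: "ROLE_CONFIG_MODEL_MISMATCH",
        requested: roleConfig.model,
        effective,
      };
    }
  }
  return { ok: true, role_config_applied: true };
}
```

Because the mismatch branch returns a non-success outcome, a caller that skips checking `ok` still sees `role_config_applied: false`.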
LGTM — clean fail-closed design, well-tested. Approved.
Non-author Five-Axis review: REQUEST-CHANGES (a small ask: the last 10% of the contract this PR claims).
The fail-closed core is correct and well-tested:
- ROLE_CONFIG_MODEL_MISMATCH aborts the provision (it does not just log).
- Read-back via `GET /workspaces/:id/model` is the authoritative effective-config gate (not the PUT write-ack).
- 8 new test cases cover the right control-flow branches.
- The BYO carve-out is sensible.
CI green.
Two items hold this back from APPROVE-recommend, both ≤30 LoC:
1. Add a `SUPPORTED_TEMPLATES` allow-list: defense-in-depth on the template footgun (the WHOLE POINT of this PR).
   - Today `runtime` is enum-gated (✓), but `params.template` is passed through unmodified.
   - Per the verified-2026-05-17 footgun memo, an invalid template silently → langgraph (201, no error). Valid ids are `{autogen, claude-code-default, codex, hermes, langgraph, openclaw}`, and `claude-code` MUST be remapped to `claude-code-default`.
   - The runtime read-back catches the silent-langgraph case as a backstop, so effective behavior is fail-closed, but the defense-in-depth layer is missing. Add a parallel `SUPPORTED_TEMPLATES` set and reject before the POST.
   - Also: `defaultTemplateFor` returns `${runtime}-default` for any runtime, including ones whose template may not exist (e.g. `crewai-default`, `deepagents-default`). Assert that the constructed template is in the known set, or document the assumption.
2. Confirm that `apiCall`/`platformGet` set a non-default User-Agent (e.g. `molecule-mcp-server/x.y.z`).
   - If they send `Python-urllib` or the default Node UA, Cloudflare returns 403 error 1010 at the edge, and the whole tool fails in prod with no actionable error.
   - Quick check of `src/api.ts`; if the UA is already set, drop this item and re-review.
   - The async-A2A concern does NOT apply: the read-back hits the `api.moleculesai.app` CP host, not a tenant slug, so there is no CF 524 surface.
Non-blocking 5-axis findings:
- `defaultTemplateFor` could return `undefined` directly instead of returning `""` and then applying `|| undefined` (cosmetic).
Operational note: the existing reviews from sdk-lead/sdk-dev are state=PENDING despite their "Approved" prose (internal#503, wrong-enum mis-file). Re-submit with the exact string `event:"APPROVED"` after the fixes.
PR#20 (CLAUDE.md sync) is held behind this PR; its sequencing review will follow once these two fixes land.
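The two requested fixes are small enough to sketch. This is a hedged TypeScript sketch, not code from the PR:

- `resolveTemplate`, `DEFAULT_TEMPLATE_FOR_RUNTIME`, the `UNSUPPORTED_TEMPLATE` code, `buildHeaders` and the `USER_AGENT` version string are hypothetical.
- The template ids are exactly the set quoted from the 2026-05-17 memo. Taken literally, that set also lacks `codex-default`, `hermes-default` and so on.
- For that reason the sketch maps runtimes to template ids explicitly instead of appending `-default`. Runtimes with no known template (`crewai`, `deepagents`) are rejected before any POST.

```typescript
// Hypothetical sketch of the two requested fixes; identifiers are illustrative.

// Template ids quoted from the verified-2026-05-17 footgun memo.
const SUPPORTED_TEMPLATES: ReadonlySet<string> = new Set([
  "autogen", "claude-code-default", "codex", "hermes", "langgraph", "openclaw",
]);

// Explicit runtime -> default template map; a runtime missing here has no known template.
const DEFAULT_TEMPLATE_FOR_RUNTIME: ReadonlyMap<string, string> = new Map([
  ["claude-code", "claude-code-default"],
  ["codex", "codex"],
  ["hermes", "hermes"],
  ["langgraph", "langgraph"],
  ["openclaw", "openclaw"],
  ["autogen", "autogen"],
]);

type TemplateResolution =
  | { ok: true; template: string }
  | { ok: false; code: "UNSUPPORTED_TEMPLATE"; message: string };

function resolveTemplate(runtime: string, requested?: string): TemplateResolution {
  // Remap the known alias; otherwise fall back to the runtime's default template, if any.
  const candidate =
    requested === "claude-code" ? "claude-code-default" : requested ?? DEFAULT_TEMPLATE_FOR_RUNTIME.get(runtime);
  // Reject before the POST: an unknown template would otherwise silently become langgraph.
  if (candidate === undefined || !SUPPORTED_TEMPLATES.has(candidate)) {
    return {
      ok: false,
      code: "UNSUPPORTED_TEMPLATE",
      message: `no supported template for runtime=${runtime}, template=${requested ?? "(default)"}`,
    };
  }
  return { ok: true, template: candidate };
}

// Hypothetical identifying User-Agent; the version string is a placeholder.
const USER_AGENT = "molecule-mcp-server/0.0.0";

// Headers for platform calls; the explicit UA is spread last so a caller's "User-Agent" key cannot override it.
function buildHeaders(extra: Record<string, string> = {}): Record<string, string> {
  return { ...extra, "User-Agent": USER_AGENT };
}
```

The runtime read-back stays as the backstop; the allow-list only moves the template failure to before the POST, where it has no side effect.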
Moved to monorepo: molecule-ai/molecule-mcp#1
Per task #325 (CTO 2026-05-20), `molecule-mcp-server` is being consolidated into the new `molecule-mcp/` monorepo as `server/`. This PR has been carried over preserving your commit history (subtree merge with the `-X subtree=server` strategy). All review activity should continue on the monorepo PR.
This source-side PR will be closed once the monorepo PR lands. The `molecule-mcp-server` repo will be archived after the monorepo CI is verified green.
5-axis review on `8e64f9f`:
Correctness: APPROVED. The PR adds and exports `provision_workspace`, registers it in workspace tools, updates the stable tool count to 88, and uses the product `/workspaces` create path followed by read-back runtime verification. Tests cover these paths: unsupported runtime, runtime mismatch, unreadable runtime, matching runtime, BYO external, and role_config model read-back.
Robustness: The handler fails before side effects on unsupported runtimes. It returns structured non-success results for platform errors, a missing workspace id, an unreadable runtime, a runtime mismatch, and role_config failures. CI is green on this head.
Security: No new secret exposure. Inputs are constrained through the MCP schema and runtime allowlist; the new calls stay within existing platform API helpers.
Performance: Adds one read-back GET after create, plus optional role_config writes/read-back. That is appropriate for a fail-closed provisioning contract and not a hot path.
Readability: The implementation is explicit about why this differs from `create_workspace`, and the tests document the expected failure modes clearly.
LGTM: green CI, clean diff.
LGTM — green CI, clean diff.