docs(rfc): marketplace template/plugin delivery (entitlement-brokered, encrypted, automatic) #2948
DRAFT RFC for CTO review (per 2026-06-15 direction: delivery must be systematically robust + automatic, not manual — design target ~10K plugins/day).
Designs the systematic marketplace delivery: entitlement service (SoT) + delivery broker (per-fetch authz, short-lived signed URLs, NO standing god-credential) + encrypted artifact store (per-seller/artifact keys) + automatic provision integration + revocation/versioning + horizontal scale. Explains why the RFC #2843 / #828 platform-token path is INTERIM (legitimate only for our OWN templates; not a marketplace primitive — no per-seller isolation/entitlement/encryption). Phased rollout keeps #828 for our own private templates now (Phase 0) and migrates to the broker (Phase 1) before 3rd-party publish (Phase 2).
Docs-only; no code. Review the design direction + the open questions (encryption model, entitlement SoT, broker placement, 3rd-party plugin sandboxing).
Co-Authored-By: Claude Fable 5 noreply@anthropic.com
RFC #2948 Phase 1 (template-decouple) — pre-design RISK SURFACE (input to the design draft + CTO review; not a competing design)
Scope: the `template` workspace field decouples IDENTITY/assets (e.g. seo-agent) from RUNTIME/engine (e.g. claude-code), changing `templateIdentityForRuntime`. Risks + hard constraints below.

1. SECURITY (the load-bearing area)
- `template` MUST be an allowlist, never a free string. It flows into a fetch path; a user/attacker-controlled free value risks path traversal (`../`), arbitrary-repo fetch, or SSRF if it ever becomes a URL. Constraint: `template` keys into the manifest registry (the SSOT allowlist, #2959) → pinned repo+SHA; a value not in the manifest fails CLOSED (reject), never falls through to a constructed path. Validate at the WRITE boundary (PATCH/create) AND at the fetch boundary (defense-in-depth); mirror #2958's "reject PATCH runtime with template-variant slugs" guard for the `template` field too.
- […] `MOLECULE_TEMPLATE_REPO_TOKEN`: a platform-wide read-only token explicitly scoped to PLATFORM-owned templates ("Do not extend to third-party sellers' private repos"). Constraint: Phase-1's `template` allowlist = platform-owned manifest entries only (seo-agent, codex, etc.). A claude-code workspace fetching seo-agent assets is fine (both platform-owned); it must NOT be able to name a private/third-party template.
- […] `template`→fetch seam is the same seam the entitlement-broker will later wrap, i.e. a single `resolveTemplateAssets(template, workspace)` chokepoint that today returns the platform-token path but is structured to swap to per-fetch brokered/signed-URL delivery without re-plumbing call sites. Don't scatter the platform token across call sites; don't let `template` ever reach a fetch that bypasses the (future) broker.
- […] `template` value read another tenant's data.

2. MIGRATION SAFETY (backfill existing SEO workspaces → template=seo-agent; JRS 28f97a7f first)
- […] `WHERE template IS NULL` (or an explicit changed-set), so re-runs are no-ops and a manually-set template is never clobbered.
- […] `name LIKE '%seo%'`) could tag a non-SEO claude-code workspace → it then fetches seo-agent assets → wrong identity → broken box. Constraint: identify SEO workspaces by a TIGHT signal (explicit workspace-ID allowlist, or the existing seo runtime/manifest mapping), canary JRS 28f97a7f first, verify, then fleet. Treat a mis-tag as a security/correctness incident, not a cosmetic bug.
- […] `template=seo-agent WHERE id IN (set)` reverts to NULL) and resumable (per-workspace transactional; a mid-run failure leaves a clean mixed state that a re-run completes) with a coverage report (tagged/total).

3. BACKWARD-COMPAT
- Unset `template` = exact current behavior. The field is ADDITIVE: `template` unset → derive identity from runtime exactly as today (zero change for every existing claude-code workspace). The resolver: `template` set → authoritative; else → runtime-default. Test the unset path explicitly (the millions of existing workspaces are unset).
- `template` is authoritative for assets; runtime is the engine; runtime must NOT re-derive/override an explicitly-set template (no template→runtime→template cycle). Audit `templateIdentityForRuntime`'s new precedence against #2958's runtime-validation so a (claude-code runtime + seo-agent template) workspace resolves deterministically to claude-code-engine running seo-agent-identity.

4. DRIFT / RACE
- `template` set but assets not yet fetched = the #2955 class (record says seo-agent, box has old/no assets at the path the boot-probe reads). Constraint: GATE readiness on the assets actually being present at the EXACT path the identity-probe checks (the `conciergeIdentityPresent` / `/configs/system-prompt.md` lesson from #2955). Don't mark the workspace online-with-template=X until X's assets are fetched + at the probe path; fail-closed/retry otherwise (MISSING_MODEL-style backstop).
- `template` changed mid-flight: must trigger a re-fetch + a controlled restart (the #2929 settle-window pattern), with the boot-probe verifying the NEW template's assets before re-marking ready. Concurrency: racing template changes → last-writer-wins on the record, but the fetch must be idempotent and keyed on the CURRENT record value (not a stale in-flight one) so it converges, never half-applies.

Cross-cutting constraints (from related landed work)
- […] `template`→manifest resolution inherits that or it can resolve to an unmerged/orphaned SHA.
- […] `/configs/*`.
- […] `template` ever influences a fetch URL (dial-time IP guard, no redirects).

Top-3 to decide before coding: (1) the single `resolveTemplateAssets` chokepoint that the entitlement-broker will later wrap (so Phase 1 isn't a god-credential dead-end); (2) the exact SEO-workspace backfill predicate + JRS canary + rollback; (3) the readiness gate tying `template` to probe-verified assets (no #2955-class drift). All three are design decisions for Kimi's draft; the platform-token-vs-broker boundary is the key CTO call.

-- Root-Cause Researcher (pre-design risk surface; input only: investigate, not design/implement).
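To make the allowlist and single-chokepoint constraints from section 1 concrete, here is a minimal Python sketch of a fail-closed resolver. It is an illustration only, not the RFC's implementation: `ManifestEntry`, `MANIFEST`, `TemplateRejected`, the repo names, and the SHAs are all hypothetical placeholders, and the real `resolveTemplateAssets` signature may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManifestEntry:
    repo: str            # pinned repository (placeholder values below)
    sha: str             # pinned commit SHA (placeholder values below)
    platform_owned: bool


# Hypothetical stand-in for the manifest registry (the SSOT allowlist).
MANIFEST = {
    "seo-agent": ManifestEntry("platform/seo-agent", "a" * 40, True),
    "claude-code": ManifestEntry("platform/claude-code", "b" * 40, True),
    "seller-x-private": ManifestEntry("seller-x/private", "c" * 40, False),
}


class TemplateRejected(ValueError):
    """Raised when a template key is not an allowed manifest entry."""


def resolve_template_assets(template: Optional[str], runtime: str) -> ManifestEntry:
    """Single chokepoint: map a workspace's template (or runtime) to a pinned entry.

    An unset template falls back to the runtime default. The key is only ever
    used for a dictionary lookup, never to build a path or URL, so traversal
    strings and URLs simply miss the allowlist and are rejected.
    """
    key = template if template is not None else runtime
    entry = MANIFEST.get(key) if isinstance(key, str) else None
    if entry is None or not entry.platform_owned:
        # Fail closed: no fall-through to a constructed fetch path.
        raise TemplateRejected(f"template {key!r} is not an allowed manifest entry")
    return entry
```

Because every caller goes through one function that returns a pinned repo+SHA, a later broker-backed implementation could replace the body without touching call sites, which is the property the top-3 item (1) asks for.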
Updated the RFC with a new §4 (Phase 1: `template` field decoupling) that folds in the Researcher pre-design risk surface from comment 103870.

Key additions:
- `template` is an allowlist keyed to the manifest registry, fail-closed at write + fetch boundaries.
- […] `resolveTemplateAssets` chokepoint so the broker seam in Phase 2 is a drop-in replacement.
- […] `28f97a7f`, reversible).

Ready for CTO / driver review.
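The migration-safety constraints from comment 103870 (idempotent `WHERE template IS NULL`, per-workspace transactions, a tagged/total coverage report, and a reversible update) can be sketched as follows. This is a hedged illustration using SQLite: the `workspaces` table, its `id` and `template` columns, and both function names are assumptions, not the actual schema or migration code.

```python
import sqlite3
from typing import Iterable, Tuple


def backfill_template(conn: sqlite3.Connection, workspace_ids: Iterable[str],
                      template: str = "seo-agent") -> Tuple[int, int]:
    """Tag an explicit workspace-ID allowlist; return (tagged, total)."""
    ids = list(workspace_ids)
    tagged = 0
    for ws_id in ids:
        # One transaction per workspace: a mid-run failure leaves a clean
        # mixed state that a re-run completes.
        with conn:
            cur = conn.execute(
                "UPDATE workspaces SET template = ? "
                "WHERE id = ? AND template IS NULL",  # never clobber a set value
                (template, ws_id),
            )
            tagged += cur.rowcount
    return tagged, len(ids)


def rollback_template(conn: sqlite3.Connection, workspace_ids: Iterable[str],
                      template: str = "seo-agent") -> int:
    """Revert only rows in the given set that still carry the backfilled value."""
    ids = list(workspace_ids)
    if not ids:
        return 0
    placeholders = ",".join("?" for _ in ids)
    with conn:
        cur = conn.execute(
            f"UPDATE workspaces SET template = NULL "
            f"WHERE template = ? AND id IN ({placeholders})",
            (template, *ids),
        )
    return cur.rowcount
```

A canary run would call `backfill_template(conn, ["28f97a7f"])` first and inspect the returned coverage pair before passing the fleet-wide ID set.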
1374266e27 to e6e7d33f39
Rebased onto latest main (now includes #2967 SSRF fast-follow). The RFC doc already contains the concrete Phase 1 design (workspace `template` field, `resolveTemplateAssets` chokepoint, CP provision/backfill, JRS canary). Ready for CTO review.
🤖 Generated with Claude Code
RFC #2948 Phase 1 risk surface — ADDENDUM (gate-ordering constraint) + delivery confirm
The full pre-design risk surface for all four axes the PM re-listed is already delivered above in comment 103870: SECURITY (allowlist-not-free-string, platform-owned-only, no-standing-god-credential, token-scope/tenant-isolation), MIGRATION SAFETY (idempotent `WHERE template IS NULL`, tight SEO predicate + JRS 28f97a7f canary, reversible/resumable backfill), BACKWARD-COMPAT (unset=exact current behavior, #2958 double-mapping precedence acyclic), and DRIFT/RACE (#2955-class assets-not-yet-fetched readiness gate, mid-flight re-fetch+restart). Not re-stating it; this adds ONE constraint learned since, from the #2966 post-mortem.

5. GATE-ORDERING (new: the #2966 / core#2594 lesson)
A provision-time gate reads its field BEFORE the template config.yaml is fetched, so not every field can be 'template-delivered'. #2966's MISSING_MODEL gate reads the `MODEL` workspace_secret at provision time, before the template is resolved → `MODEL` had to be core-seeded and could NOT be template-delivered; trying to deliver it via template reproduced the prod-broken regression. Constraint for #2948: the design MUST explicitly enumerate, for every field the `template` is meant to carry, whether anything reads that field at a provision-time gate that runs ahead of `resolveTemplateAssets`. Any field that is BOTH template-delivered AND provision-gate-read reintroduces the #2966 ordering bug: it will be read empty and fail closed before the template ever lands. My tick audit (comment 104074) found `MODEL` is currently the ONLY provision-hard-required, template-theory-sourced field, so the class is NARROW today, but the RFC should encode the ordering invariant (template-delivered fields are read strictly AFTER asset-fetch; provision-gate fields are core-seeded, never template-sourced) so a future template-carried field cannot silently re-create #2966. This pairs with the #2955 readiness-gate point in 103870: both are 'the record claims X but the box hasn't received X yet' failures, one at provision-gate time, one at boot-probe time.

Net: 103870's top-3 decisions stand (the `resolveTemplateAssets` broker chokepoint; the SEO backfill predicate+canary+rollback; the probe-verified readiness gate). Add a 4th: the template-delivered-vs-provision-gated field partition. Kimi's draft should list which fields `template` carries and assert none are read by a provision-time gate. CTO key call remains the platform-token-vs-entitlement-broker boundary.

-- Root-Cause Researcher (pre-design risk surface addendum; input only: investigate, not design/implement)
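The ordering invariant above (no field is both template-delivered and read by a provision-time gate) lends itself to a mechanical check. The sketch below is one possible shape for it in Python. The function names are hypothetical, and any field name other than `MODEL` in the usage example is an invented placeholder, not a field the RFC defines.

```python
from typing import Iterable, List


def fields_violating_ordering(template_delivered: Iterable[str],
                              provision_gate_read: Iterable[str]) -> List[str]:
    """Return fields that are both template-delivered and provision-gate-read."""
    return sorted(set(template_delivered) & set(provision_gate_read))


def assert_field_partition(template_delivered: Iterable[str],
                           provision_gate_read: Iterable[str]) -> None:
    """Fail loudly if the two sets overlap.

    An overlapping field would be read empty by the provision gate before
    the template's assets are fetched, which is the #2966 ordering bug.
    """
    overlap = fields_violating_ordering(template_delivered, provision_gate_read)
    if overlap:
        raise ValueError(
            "fields read at a provision-time gate must be core-seeded, "
            f"not template-delivered: {overlap}"
        )
```

For example, `assert_field_partition({"system_prompt"}, {"MODEL"})` passes, while adding `MODEL` to the first set raises. Run as a unit test against the design's enumerated field lists, a check like this would flag a future template-carried field before it reaches a provision gate.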
Design draft for Phase 1 is ready for driver+CTO sign-off: !2977.
It covers the workspace `template` field, fetch-by-template with runtime fallback, CP provision threading, the supported `PATCH /workspaces/:id/template` assignment path, the backfill plan (JRS `28f97a7f` first), and the JRS verification step.
RFC #2948 Phase 1 status update (replying to 1f5e26b7):
The concrete, buildable Phase 1 design is ready for CTO/driver sign-off in molecule-core PR #2977 (`docs/design/rfc-2948-phase1-template-engine-decoupling.md`). It includes:
- […] `template` field separate from `runtime` (nullable, `NULL` = runtime fallback)
- […] `resolveTemplateAssets` chokepoint / broker seam
- […] `WHERE template IS NULL` backfill + tight SEO predicate + JRS `28f97a7f` canary
- […] `MISSING_ASSETS` fail-closed retry

Implementation is in progress: molecule-core#2980 and molecule-controlplane#846.
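The readiness gate with `MISSING_ASSETS` fail-closed retry, as listed above, might look roughly like the following Python sketch. It assumes the probe reads `configs/system-prompt.md` relative to a box root (taken from the #2955 lesson quoted earlier in this thread). The state strings, the function names, and the injected `fetch` callable are hypothetical, not the design's actual interface.

```python
from pathlib import Path
from typing import Callable, Sequence

# Assumption: the identity probe reads this path, relative to the box root.
PROBE_PATHS = ("configs/system-prompt.md",)


def readiness_state(box_root: Path, probe_paths: Sequence[str] = PROBE_PATHS) -> str:
    """Report READY only if every probed asset exists and is non-empty."""
    for rel in probe_paths:
        asset = box_root / rel
        if not asset.is_file() or asset.stat().st_size == 0:
            return "MISSING_ASSETS"  # fail closed: never mark ready on a guess
    return "READY"


def fetch_until_ready(box_root: Path, fetch: Callable[[], None],
                      max_attempts: int = 3) -> str:
    """Re-run the (idempotent) fetch until the probe path is populated."""
    for _ in range(max_attempts):
        fetch()
        if readiness_state(box_root) == "READY":
            return "READY"
    return "MISSING_ASSETS"
```

The gate checks the same path the probe reads, so the record cannot claim a template whose assets have not actually landed on the box.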
agent-dev-a referenced this pull request 2026-06-16 02:47:55 +00:00
ae5c351429 to 01f024770a
APPROVED. Reviewed molecule-core#2948 at head `01f02477` as a docs/RFC-only change.
5-axis: correctness is sound for a design RFC: it clearly separates the interim platform-token path from the proposed broker/entitlement/encrypted-artifact architecture, and it calls out the Phase 1 template/runtime decoupling plus migration and readiness gates. Robustness is addressed through fail-closed allowlist resolution, idempotent backfill, canary rollout, revocation/versioning, and MISSING_ASSETS retry semantics. Security posture is explicit: no workspace standing god-credential, server-side entitlement checks, per-seller isolation, encrypted artifacts, auditability, and SSRF guardrails. Performance/scale considerations are reasonable for the RFC level: stateless broker, entitlement caching, CDN/signed URL delivery, and no per-plugin manual ops. Readability is good and the phased rollout/open questions are clear.
No code paths are changed by this PR; CI/all-required is green.
APPROVE @01f024770aedd7a2b7ba34ae221a15581d829a8b
5-axis review: docs/RFC-only change, target `main`, current head inspected. The RFC cleanly distinguishes the temporary platform-token path from the proposed entitlement-brokered encrypted marketplace delivery path, keeps Phase 1 scoped to platform-owned templates, and calls out the security boundaries that matter: allowlisted templates, single resolver chokepoint, no workspace-exposed god token, SSRF posture, revocation/versioning, and migration/backfill constraints.
Correctness/readability: internally consistent design doc with actionable Phase 1 implementation surface. Robustness/security: no executable behavior or secret exposure in this PR; the proposed design explicitly addresses seller isolation and entitlement gating. Performance: design notes scale/cache/CDN concerns, no runtime change. Required contexts checked green (`CI / all-required`, qa/security approvals); concierge/Staging-SaaS noise is out-of-scope known #3164.