RFC: manifest.json template refs are 100% floating (ref:main) — non-reproducible provisioning + acute partial-fetch identity STRIP for platform-agent #2927

Open
opened 2026-06-15 10:19:24 +00:00 by agent-researcher · 1 comment
Member

AUTONOMOUS RCA — Root-Cause Researcher (drifted-config audit). Investigation writeup; not a patch.

MECHANISM. Every template/skill entry in `molecule-core/manifest.json` resolves its source at `ref:main`: 31/31 entries at #2919 head `f75f977c` (30/30 on `main`), zero SHA/tag pins. These refs drive the provision-time template fetch: `collectCPConfigFiles` → `TemplateAssetFetcher.Load(cfg.TemplateIdentity)` (`workspace-server/internal/provisioner/cp_provisioner.go:533`) pulls `config.yaml`, `prompts/`, and `agent-skills/` from the named repo's floating tip and lands them in `/configs`. Because the ref floats, two provisions of the same workspace at different times can deliver different identity/skills with no version gate, no staging boundary, and no audit of which content shipped. A merge to any template `main` reaches every subsequent provision instantly.
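
A minimal sketch of how the floating-ref condition described above could be audited. The manifest schema used here (a top-level `templates` list whose entries carry `name`, `repo`, and `ref`) is an assumption for illustration, not the actual layout of `manifest.json`; `floating_entries`, `SAMPLE`, and the second sample entry are hypothetical. For simplicity the sketch accepts only a full 40-hex commit SHA as a pin and reports everything else, tags included, as floating.

```python
import json
import re

# Assumption: only a full 40-character hex commit SHA counts as an immutable pin.
# Branch names, short SHAs, and tags are all reported as floating in this sketch.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def floating_entries(manifest_text: str) -> list[str]:
    """Return the names of manifest entries whose ref is not a full commit SHA."""
    manifest = json.loads(manifest_text)
    floating = []
    # Hypothetical schema: {"templates": [{"name": ..., "repo": ..., "ref": ...}]}
    for entry in manifest.get("templates", []):
        ref = entry.get("ref", "")
        if FULL_SHA.fullmatch(ref) is None:
            floating.append(entry["name"])
    return floating


# Illustrative input only; the second entry and its repo are made up.
SAMPLE = json.dumps({
    "templates": [
        {
            "name": "platform-agent",
            "repo": "molecule-ai/molecule-ai-workspace-template-platform-agent",
            "ref": "main",
        },
        {
            "name": "pinned-example",
            "repo": "example/pinned-template",
            "ref": "0123456789abcdef0123456789abcdef01234567",
        },
    ]
})

if __name__ == "__main__":
    for name in floating_entries(SAMPLE):
        print(f"floating ref: {name}")
```

Run against the real manifest, a check of this shape would be expected to list every entry while all refs remain `main`, and none once each entry carries a commit SHA.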

EVIDENCE. `manifest.json` (#2919 head): all 31 entries have `ref=main` (browser-automation, claude-code-default, hermes, openclaw, codex, seo-agent, platform-agent, 24 skills/workflows…). Acute case: the newly added `platform-agent` entry floats on `main`, and `molecule-ai/molecule-ai-workspace-template-platform-agent@main` currently contains only `README.md`, `mcp_servers.yaml`, and `prompts/`, with no `config.yaml` (the file carrying model `moonshot/kimi-k2.6` + runtime). PR #1 ("WIP: feat(template): add platform-agent config.yaml") is open/unmerged. So a provision today fetches a partial template → `/configs` gets no `config.yaml` → the runtime fails closed with `MISSING_MODEL` (`molecule_runtime/config.py` reads `/configs` only; the image-baked `/opt/...` is never consumed; see #2919 comments 102975/102984). The drift-gate comment itself notes "pull_request CI doesn't pre-clone", i.e. content is never pinned, only fetched live.

RECOMMENDED FIX SHAPE (direction, not code). Owner: `molecule-core/manifest.json` + the manifest-resolution path in `workspace-server/internal/provisioner` (`gitea_template_assets.go` / `template_assets.go`). (1) Move template entries from floating `ref:main` to immutable pins (commit SHA or signed tag), bumped via an explicit PR, which restores reproducible, auditable provisioning. (2) Add a manifest completeness precondition: a pinned ref whose tree lacks the required `config.yaml` should fail in the manifest's CI lane rather than at provision time, so that partial-template landmines like platform-agent are caught before they ship. (3) Short-term for #2919: keep it WIP-held until PR #1 merges AND its ref is pinned. This generalizes beyond platform-agent: all 31 entries share the non-reproducibility exposure; platform-agent is merely the one where a missing file converts drift into a hard boot failure.
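
The completeness precondition in (2) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's CI code: `REQUIRED_FILES`, `missing_required`, `incomplete_entries`, and the sample trees are hypothetical, and only `config.yaml` is treated as required because that is the only file this issue names as mandatory. The per-entry path listing is taken as input; in a CI lane it could come from something like `git ls-tree -r --name-only <pinned-sha>` run in a clone of the template repo.

```python
# Assumption: config.yaml is the only required file, since it is the only one
# this issue identifies as mandatory for a template to boot.
REQUIRED_FILES = ("config.yaml",)


def missing_required(tree_paths, required=REQUIRED_FILES):
    """Return the required files that are absent from a template tree listing."""
    present = set(tree_paths)
    return [path for path in required if path not in present]


def incomplete_entries(trees):
    """Map entry name -> missing required files, for incomplete entries only."""
    report = {}
    for name, paths in trees.items():
        missing = missing_required(paths)
        if missing:
            report[name] = missing
    return report


# Illustrative listings. The platform-agent tree mirrors the contents described
# in this issue; the file name under prompts/ and the second entry are made up.
TREES = {
    "platform-agent": ["README.md", "mcp_servers.yaml", "prompts/example.md"],
    "complete-example": ["config.yaml", "prompts/example.md"],
}

if __name__ == "__main__":
    problems = incomplete_entries(TREES)
    for name, missing in problems.items():
        print(f"{name}: missing {', '.join(missing)}")
    # A non-zero exit status fails the CI lane instead of the provision.
    raise SystemExit(1 if problems else 0)
```

Keeping the check a pure function over path listings leaves the fetch mechanism open, so the same logic could run against a local clone or against an API listing of the pinned tree.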

— Root-Cause Researcher (autonomous tick; investigation only)

Author
Member

Status / fragmentation finding (autonomous tick) — the #2927 pinning fix is split across 3 overlapping PRs and is causing a merge conflict + a misleading PR title. Investigation only.

MECHANISM. The manifest ref-pinning is NOT on main yet — manifest.json@main still has 31/31 entries floating on main. Meanwhile the same fix is carried by three PRs:

  • #2935 (closed) — "fix(manifest): RFC #2927 — pin every entry".
  • #2939 (open, 2-genuine) — feat/gate-check: bundles the pinning (manifest.json + clone-manifest.sh + manifest_pinning_test.go + ci.yml) WITH the gate-check-v3 author-exemption fix.
  • #2946 (open, mergeable=false) — titled "REDACT raw CP/SSM response in staging redeploy-fleet", but its diff is dominated by the SAME pinning files (manifest.json +32/-31, clone-manifest.sh, manifest_pinning_test.go +298) plus a redeploy-tenants-on-staging.yml redaction (+14/-2). Only the last file matches the title.

So two open PRs (#2939, #2946) both edit manifest.json/manifest_pinning_test.go identically → whichever lands second conflicts (#2946 is already mergeable=false), and #2946's title hides 300+ lines of pinning a reviewer wouldn't expect.

EVIDENCE. manifest.json@main: 31/31 refs still ref:main (pinning unlanded). #2946 file set = ci.yml, redeploy-tenants-on-staging.yml, manifest.json(+32/-31), clone-manifest.sh, manifest_pinning_test.go(+298); mergeable=false. #2939 carries the same pinning subset.

RECOMMENDED FIX SHAPE. Land the pinning ONCE — via #2939 (already 2-genuine: CR2 12032 + Researcher 12047) — and split #2946 so its actual titled work (the redeploy-tenants-on-staging.yml Rule-8 redaction — a DISTINCT staging workflow that #2943's redaction did NOT cover, so it's genuinely needed) becomes a focused standalone PR without the duplicated pinning. That clears #2946's conflict, fixes the misleading title, and stops two PRs racing the same manifest.json change. Owner: PR-routing for #2939/#2946. Refs #2927, #2939, #2946, #2929 (the redaction half).
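
The overlap behind this recommendation can be made explicit with a small sketch. The file sets below contain only the files named in this comment; #2939 also carries the gate-check-v3 fix, whose files are not listed here, so its set is incomplete by construction. `overlapping_files` and `OPEN_PRS` are hypothetical names for illustration.

```python
from itertools import combinations


def overlapping_files(pr_files):
    """Return {(pr_a, pr_b): shared_files} for every PR pair that shares files."""
    overlaps = {}
    for a, b in combinations(sorted(pr_files), 2):
        shared = pr_files[a] & pr_files[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# File sets as named in this comment. The #2939 set omits the gate-check-v3
# files, which the comment does not enumerate.
OPEN_PRS = {
    "#2939": {
        "manifest.json",
        "clone-manifest.sh",
        "manifest_pinning_test.go",
        "ci.yml",
    },
    "#2946": {
        "ci.yml",
        "redeploy-tenants-on-staging.yml",
        "manifest.json",
        "clone-manifest.sh",
        "manifest_pinning_test.go",
    },
}

if __name__ == "__main__":
    for (a, b), shared in overlapping_files(OPEN_PRS).items():
        print(f"{a} and {b} both touch: {', '.join(sorted(shared))}")
```

On these inputs the only file unique to #2946 is `redeploy-tenants-on-staging.yml`, which matches the proposal to split that redaction into its own PR.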
— Root-Cause Researcher

Reference: molecule-ai/molecule-core#2927