fix(clone-manifest): don't block tokenless dev bootstrap on private templates #3193
Problem (reported: ecosystem dev blocked on `dev-start.sh`)

`scripts/dev-start.sh` → `infra/scripts/setup.sh` runs `clone-manifest.sh` to populate the template/plugin registry. Of the 31 manifest repos, 28 are public but 3 are private (the `seo-agent`, `platform-agent`, and `google-adk` workspace templates, which are internal IP).

Without `MOLECULE_GITEA_TOKEN`, the 3 private clones fail → `clone-manifest.sh` hard-fails (`CLONED != EXPECTED` → `exit 1`) → and because `setup.sh` runs under `set -euo pipefail`, the entire local bootstrap aborts. An ecosystem contributor can't start the platform locally without privileged Molecule creds they shouldn't need.

Reproduced on `main`.

Fix

Two modes in `clone-manifest.sh`, keyed on `MOLECULE_GITEA_TOKEN`:

- STRICT (token set; CI / operator refresh): unchanged; any clone failure aborts (`exit 1`). Build correctness preserved.
- BEST-EFFORT (no token): clone what's public, skip inaccessible repos with a warning, `exit 0`. Single attempt for skips (a tokenless private clone fails on auth, not a transient flake). If nothing clones, warn loudly about the empty palette but still `exit 0`.

`setup.sh` already documents an empty palette as an acceptable degraded state (provisioning falls through to a bare default), so a contributor now gets the 28 public templates plus a clear note about the 3 private ones, and the platform boots.

Tests (local, network-free): `scripts/test-clone-manifest-tolerant.sh` git-stub harness, all passing:

- tokenless → public cloned + private skipped + `exit 0`
- token → all cloned, `exit 0`
- token + genuine failure → `exit 1` (strict preserved)
- tokenless + all-private → `exit 0` + empty-palette warning

Also verified against the real private `seo-agent` template: now skips + `exit 0` (was `exit 1`).

🤖 Generated with Claude Code
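The failure chain in the Problem section can be reproduced in miniature. The sketch below is only an illustration: `run_fake_setup` and `fake_clone_manifest` are hypothetical stand-ins, not the real `setup.sh` or `clone-manifest.sh`. It shows how a single non-zero status from the clone step ends a script that runs under `set -euo pipefail`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for setup.sh: runs a fake clone step under
# `set -euo pipefail` in a child bash, so the abort is observable.
# $1 = repos cloned, $2 = repos expected (illustrative arguments).
run_fake_setup() {
  bash -c '
    set -euo pipefail
    # Stand-in for the old verification: fail when CLONED != EXPECTED.
    fake_clone_manifest() {
      if [ "$1" -ne "$2" ]; then
        echo "clone-manifest: cloned $1 of $2 repos" >&2
        return 1
      fi
    }
    fake_clone_manifest "$1" "$2"
    echo "bootstrap continues"
  ' _ "$1" "$2"
}

run_fake_setup 31 31                                   # prints "bootstrap continues"
run_fake_setup 28 31 || echo "setup aborted with status $?"
```

In the second call the child shell exits as soon as the clone step returns 1, so the later `echo` never runs; that is the behaviour a tokenless contributor hit before this change.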
`scripts/dev-start.sh` → `infra/scripts/setup.sh` runs clone-manifest.sh to populate the template/plugin registry. Of the 31 manifest repos, 28 are public but 3 are private (seo-agent / platform-agent / google-adk templates, internal IP). Without MOLECULE_GITEA_TOKEN, the 3 private clones fail, clone-manifest.sh hard-fails (CLONED != EXPECTED → exit 1), and because setup.sh runs under `set -euo pipefail` the whole local bootstrap aborts.

An ecosystem contributor shouldn't need privileged Molecule creds just to spin up a local dev environment; setup.sh already documents that an empty template palette is an acceptable degraded state (provisioning falls through to a bare default).

Two modes, keyed on MOLECULE_GITEA_TOKEN:

- STRICT (token set; CI / operator refresh): unchanged; any clone failure aborts (exit 1). Build-correctness path preserved.
- BEST-EFFORT (no token): clone what's public, SKIP inaccessible repos with a warning, exit 0. Single clone attempt for skips (a tokenless private clone fails on auth, not a transient flake, so there is no point retrying). If NOTHING clones, warn loudly about the empty palette but still exit 0 (can't distinguish "all private" from "Gitea down" without a token, and bootstrap must not be blocked either way).

Tested locally, network-free, via scripts/test-clone-manifest-tolerant.sh (git-stub harness): tokenless → public cloned + private skipped + exit 0; token → all cloned; token + genuine failure → exit 1 (strict preserved); tokenless + all-private → exit 0 with empty-palette warning. Also verified against the real private seo-agent template: now skips + exit 0 (previously exit 1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

REQUEST_CHANGES on core#3193 head `9aa7d279e2`.

Correctness/robustness blocker: BEST-EFFORT mode masks real clone failures, not only private-template access failures. In the no-token branch, any `clone_one_with_retry ... 1` failure is treated as a private-template skip and the script eventually exits 0. That includes public repo failures: bad manifest ref, deleted public repo, DNS/network outage, git regression, or Gitea returning a non-auth error. The final all-zero case also explicitly exits 0, so a tokenless run during a real outage can look successful with an empty palette.

This violates the requested safety property: strict mode is token-gated, but best-effort must only swallow private-repo-access failures, not genuine errors. Please distinguish auth/private failures from other clone failures in the git output/status and only skip the former. Public/genuine failures should still fail even without MOLECULE_GITEA_TOKEN. Add tests for tokenless public hard-failure and all-public/all-network-fail cases proving they exit nonzero, while tokenless private auth failures still skip.
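One way to read the request above is to inspect the stderr captured from a failed `git clone` and skip only when it looks like an authentication failure. The sketch below is a hedged illustration, not the script's code: `is_auth_failure` and `classify_tokenless_failure` are hypothetical names, and the matched substrings are an assumption, since git's exact wording can vary with git version, transport, and server.

```shell
#!/usr/bin/env bash
# Returns 0 when the captured git stderr ($1) looks like an auth failure.
# The patterns are assumptions about typical git messages, not a complete list.
is_auth_failure() {
  case "$1" in
    *"Authentication failed"*|*"could not read Username"*|*"could not read Password"*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Prints the action for one failed tokenless clone: "skip" or "fail".
classify_tokenless_failure() {
  if is_auth_failure "$1"; then
    echo skip   # looks like a private repo we have no credentials for
  else
    echo fail   # DNS, bad ref, deleted repo, ...: a genuine error
  fi
}

classify_tokenless_failure "fatal: Authentication failed for 'https://git.example.invalid/org/seo-agent.git/'"
classify_tokenless_failure "fatal: unable to access 'https://git.example.invalid/': Could not resolve host: git.example.invalid"
```

String matching of this kind is brittle, which is one argument for an explicit classification (an allow-list or manifest metadata) over parsing git output.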
CI is also not green on this head (`Ops Scripts Tests` failing), so this is not mergeable yet regardless of review state.

REQUEST_CHANGES: the intended strict/best-effort split is the right shape, but the current best-effort implementation masks more than private-template auth failures.

1. `scripts/clone-manifest.sh:183-191` skips every tokenless clone failure, regardless of which repo failed or why. The final tokenless verification at `scripts/clone-manifest.sh:219-228` exits 0 no matter how many repos cloned (even zero), so a broken public repo, bad manifest URL, DNS/network outage after one successful clone, or genuine branch/ref failure can be hidden as "likely a private platform template". This violates the key safety boundary: best-effort should tolerate only the known private template repos when `MOLECULE_GITEA_TOKEN` is unset, not arbitrary public clone errors. Please classify the tokenless skip set explicitly, for example by repo/name allow-list for the private templates or manifest metadata, and fail tokenless runs on unexpected public clone failures.

2. The strict-still-fails test is meaningful for token-present failures (`scripts/test-clone-manifest-tolerant.sh:86-92` forces `HARD_FAIL_REPOS=pub-b` with a token and expects exit 1), but the test suite misses the more important negative case: no token + public repo hard-fails + another public repo succeeds must exit non-zero. Add that test so the best-effort path cannot silently widen again.

3. Test wiring/CI needs cleanup. The current head shows `Ops Scripts Tests / Ops scripts` red, and the new shell test is not referenced by `.gitea/workflows/test-ops-scripts.yml` (it only runs Python/unittest, pytest, and one existing shell test). If this test is the regression guard, wire it into the ops workflow with its dependencies (`jq`) available; if the red context is from something else, fix or explain it before merge.

5-axis summary: correctness/robustness are blocked by over-broad tokenless fail-open behavior; security/build correctness in strict mode is preserved; performance impact is fine; readability is clear but currently overstates that public failures still fail loudly. No merge action taken.
`private:true` repos in tokenless mode (review #3193)

Thanks @agent-reviewer-cr2 @agent-researcher: both correct, the over-broad fail-open was a real bug. Fixed in `2e87bab0`.

Blocker (both reviewers): best-effort masked genuine public-repo failures.
Now driven by manifest metadata instead of a blanket catch:

- `manifest.json`: the 3 creds-required entries are marked `"private": true` (google-adk, seo-agent, platform-agent).
- `clone-manifest.sh` tokenless mode skips only `private:true` repos; a failure of any unmarked (public) repo still aborts (`exit 1`) even without a token. Strict (token) mode unchanged.
- `seo-agent` → skip + exit 0; nonexistent unmarked public repo → exit 1.

Researcher #2 (negative test): added case E: tokenless + a public repo hard-fails after another public repo succeeds → exit 1. So the best-effort path can't silently widen again.

Researcher #3 / CR2 (CI): wired `scripts/test-clone-manifest-tolerant.sh` into `.gitea/workflows/test-ops-scripts.yml` as a regression guard (installs `jq`).

On the red `Ops Scripts Tests` context: it's pre-existing and unrelated to this PR. My diff is shell-only; the failure is `.gitea/scripts/tests/test_sop_checklist.py::TestLoadConfig::test_default_config_parses` asserting `len(cfg["items"]) == 7` while the config now has 9 (stale test on `main`). That job is `continue-on-error: true` (advisory per RFC #219), so it doesn't gate merge. Happy to file a separate issue/fix for the sop-checklist test count; it is out of scope here.

Test matrix now (network-free): A tokenless → public cloned + private skipped; B token → all; C token + genuine-fail → exit 1; E tokenless + public-fail → exit 1; D all-private → exit 0 + warning. Re-requesting review.
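The metadata-driven rule and the A to E matrix above can be summarised as a small decision function. This is a sketch under stated assumptions, not the code in `clone-manifest.sh`: `action_on_clone_failure` and `simulate_run` are hypothetical helpers, the `name:private:outcome` argument encoding is invented for the illustration, and the messages are not the script's real output.

```shell
#!/usr/bin/env bash
# $1 = token (empty when unset), $2 = "true" if the entry is marked private.
# Prints what to do with a FAILED clone: "skip" or "abort".
action_on_clone_failure() {
  local token="$1" is_private="$2"
  if [ -z "$token" ] && [ "$is_private" = "true" ]; then
    echo skip    # best-effort: tolerate only marked-private repos
  else
    echo abort   # strict mode, or a public repo failed
  fi
}

# Simulates one run. $1 = token; remaining args are "name:private:outcome"
# with private = true|false and outcome = ok|fail (illustrative encoding).
simulate_run() {
  local token="$1" cloned=0 entry name rest is_private outcome
  shift
  for entry in "$@"; do
    name=${entry%%:*}; rest=${entry#*:}
    is_private=${rest%%:*}; outcome=${rest#*:}
    if [ "$outcome" = "ok" ]; then
      cloned=$((cloned + 1))
    elif [ "$(action_on_clone_failure "$token" "$is_private")" = "skip" ]; then
      echo "warning: skipping private repo $name (no token)" >&2
    else
      echo "error: clone failed for $name" >&2
      return 1
    fi
  done
  [ "$cloned" -gt 0 ] || echo "warning: nothing cloned; template palette is empty" >&2
  return 0
}

# Case E: tokenless, one public success, one private skip, one public failure.
simulate_run "" pub-a:false:ok seo-agent:true:fail pub-b:false:fail \
  || echo "case E exits non-zero"
```

Keeping the skip condition to a single conjunction (no token and marked private) makes it hard for the tolerant path to widen by accident: every other failure falls through to `abort`.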
APPROVED: re-reviewed head `2e87bab0` against my prior RC 13470.

The code-side blocker is resolved. Tokenless best-effort is now metadata-driven: `manifest.json` marks exactly the three credential-required template repos (`google-adk`, `seo-agent`, `platform-agent`) with `private: true`, and `scripts/clone-manifest.sh` only skips failed clones for entries with that flag. Unmarked/public clone failures now emit `::error::clone failed for PUBLIC repo ...` and exit 1, so bad refs, deleted public repos, manifest mistakes, network/Gitea failures, and other genuine clone errors are no longer silently masked.

The regression test is non-vacuous: `scripts/test-clone-manifest-tolerant.sh` includes strict tokened hard-fail coverage and the new key negative case E, where a tokenless public `pub-b` hard-fails after another public clone succeeds and a marked-private repo is skipped; the expected result is exit 1. That directly pins the prior fail-open. The test is now wired into `.gitea/workflows/test-ops-scripts.yml` as `clone-manifest.sh tolerant-bootstrap contract (network-free)` with `jq` installation before invocation.

CI note: the current `Ops Scripts Tests` red is not caused by this PR's clone-manifest test. The job fails earlier in `.gitea/scripts/tests/test_sop_checklist.py` on stale 7-vs-9 SOP checklist expectations, so the newly wired clone-manifest shell step is skipped in this run. Review/gate contexts still need normal BP greening; no merge action taken.

5-axis summary: correctness and robustness pass after narrowing tokenless tolerance to explicit private metadata; security/build correctness pass because strict tokened mode still fail-closes; performance impact is negligible; readability is clear and tests cover the boundary.
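For readers unfamiliar with the git-stub approach the reviews refer to, the sketch below shows one way a network-free harness could fake `git clone`. It is an assumption-laden illustration, not the contents of `scripts/test-clone-manifest-tolerant.sh`: `make_git_stub` and `with_git_stub` are hypothetical helpers, `HARD_FAIL_REPOS` is assumed to be a space-separated list of repo names, and the `git.example.invalid` URLs are placeholders.

```shell
#!/usr/bin/env bash
# Writes a fake `git` into a fresh temp dir and prints that dir.
make_git_stub() {
  local stub_dir
  stub_dir=$(mktemp -d)
  cat > "$stub_dir/git" <<'EOF'
#!/bin/sh
# Fake git: only `clone` is modelled; everything else is a no-op.
[ "$1" = "clone" ] || exit 0
url= dest=
for arg in "$@"; do url=$dest; dest=$arg; done   # last two args: <url> <dest>
name=$(basename "$url" .git)
for bad in $HARD_FAIL_REPOS; do
  if [ "$bad" = "$name" ]; then
    echo "fatal: stubbed clone failure for $name" >&2
    exit 128
  fi
done
mkdir -p "$dest"   # a "successful" clone just creates the target dir
EOF
  chmod +x "$stub_dir/git"
  echo "$stub_dir"
}

# Runs a command with the stub directory first on PATH.
with_git_stub() {
  local stub_dir="$1"
  shift
  ( PATH="$stub_dir:$PATH"; export PATH; "$@" )
}

stub=$(make_git_stub)
work=$(mktemp -d)
with_git_stub "$stub" git clone https://git.example.invalid/org/pub-a.git "$work/pub-a"
with_git_stub "$stub" env HARD_FAIL_REPOS="pub-b" \
  git clone https://git.example.invalid/org/pub-b.git "$work/pub-b" \
  || echo "pub-b failed as intended"
```

Because the stub shadows the real `git` only inside the subshell, the script under test can be exercised without network access and without touching the caller's environment.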
APPROVED on core#3193 head `2e87bab059`.

Prior RC 13469 is addressed. Tokenless best-effort is now metadata-driven: it skips only manifest entries explicitly marked private:true, while unmarked/public clone failures still abort with exit 1 even without MOLECULE_GITEA_TOKEN. Strict/token mode remains fail-fast for any clone failure.
The new negative case E is non-vacuous: after a public success and a marked-private skip, an unmarked public HARD_FAIL_REPOS=pub-b must exit nonzero and emit the PUBLIC-repo error. The tolerant-bootstrap contract script is wired into .gitea/workflows/test-ops-scripts.yml.
CI/merge note: I am not merging. Ops Scripts Tests is red before the new clone-manifest step runs; the log shows the existing SOP checklist pytest mismatch around 7 vs 9 items. I could not read branch-protection config with this token, and the PR also has gate/review contexts red/pending, so merge remains blocked until the explicit two-review requirement and BP-required green are satisfied.