fix(ci): AUTO_SYNC_TOKEN absence is hard-fail on trusted contexts (closes #2158) #2189
Reference in New Issue
Block a user
Delete Branch "fix/2158-auto-sync-token-hard-fail"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
molecule-core/.gitea/workflows/sync-providers-yaml.ymlis the live cross-repo canonical-drift gate — it fetchesmolecule-controlplane/internal/providers/providers.yamland byte-compares it against core's synced copy. RED on drift = the canonical moved and core's copy must be re-synced.The bug: the workflow exits 0 with a soft
::warning::whenAUTO_SYNC_TOKENis missing (lines 76-80 pre-fix). The daily schedule is meant to catch a controlplane-sideproviders.yamlchange that landed without a paired core re-sync PR — but if the secret is missing, the schedule silently passes. This is fail-open on the exact drift class the workflow exists to detect.The hermetic sha pin in
sync_canonical_test.gois the always-on backstop for hand-edits of core's synced copy, but it cannot prove that controlplane main still matches the copy — that's the whole point of the live compare.Fix shape (per #2158)
Trusted-context detection via
github.event_name+github.event.pull_request.head.repo.fork:AUTO_SYNC_TOKENabsence →::error::annotation +exit 1. These contexts should always have the secret; absence is a misconfiguration that must be surfaced.::warning::+exit 0. Forks cannot receive secrets, so a hard-fail here would block every fork PR.The hermetic sha pin in
sync_canonical_test.gois unchanged as the always-on backstop for hand-edits of core's synced copy regardless ofAUTO_SYNC_TOKENstate.SOP Checklist
AUTO_SYNC_TOKENis unset AND context is trusted (the original bug condition that was masked by exit 0). Manual trace: schedule event with unset secret →is_trusted=true→::error::+exit 1. Fork PR with unset secret →is_trusted=false→::warning::+exit 0(preserved).AUTO_SYNC_TOKENwas the original design choice ("rather than a hard red that blocks unrelated PRs"). That posture is correct for untrusted fork PRs but wrong for trusted contexts where the secret is operationally required. Fix preserves the fork-PR escape hatch while making trusted-context absence loud. Perfeedback_fix_root_not_symptom: the missing secret is the symptom; the wrong default on trusted contexts is the root.::error::message names the specific context + remediation ("Provision AUTO_SYNC_TOKEN (read scope on molecule-controlplane)") so the on-call knows what to do. The hard-fail on schedule events is the key new behavior — daily run will now red loudly if the secret is missing instead of silently green.::warning::message is preserved verbatim on the fork-PR path (no shim/dead code). The new::error::message on the trusted path is additive, not a fallback. Header comment updated to match the new posture — no stale comment left behind.feedback_fix_root_not_symptom✓ (root is the wrong default, not the missing secret).feedback_no_such_thing_as_flakes✓ (the silent green is deterministic fail-open, not a flake).feedback_behavior_based_ast_gates— the detection is ongithub.event_name(string compare in shell), which is robust to Gitea env var name changes (Gitea usesgithub.*for the standard event payloads).AUTO_SYNC_TOKENsecret; no new secrets provisioned. The fix actually highlights the need for the secret to be properly provisioned in trusted contexts, which is an ops/infra follow-up (not a code change).Cross-references
verify-providers-gen.yml(local artifact ↔ synced-copy gate, unchanged) andsync_canonical_test.go(hermetic sha pin, unchanged)Risk + Rollback
Risk: Low. The hard-fail on missing
AUTO_SYNC_TOKENin trusted contexts is a behavior change: any trusted-context run where the secret is genuinely missing will now red instead of green. This is the intended new behavior, but ifAUTO_SYNC_TOKENis not currently provisioned in the prod environment, the daily schedule and main-branch pushes will start failing immediately after merge.Mitigation: The PR's hard-fail message names the remediation ("Provision AUTO_SYNC_TOKEN (read scope on molecule-controlplane)"). If the secret is known to be missing, the merge should be coordinated with the SRE team that provisions it.
Rollback: Revert the merge commit. The pre-fix behavior (soft warning + exit 0) is fully captured in the diff.
🤖 Generated with Claude Code
5-axis review: APPROVED.
Correctness: this fixes the fail-open path for live provider canonical-drift detection by making missing AUTO_SYNC_TOKEN a hard failure on trusted contexts, while preserving the soft warning for untrusted fork PRs that cannot receive secrets. Unknown events fail closed, which is the safer default for this gate.
Robustness: the byte-compare path is unchanged when the token is present, and the fork-PR escape hatch remains explicit. Security: no new secret is introduced; this tightens handling of a required read token. Performance/readability: no meaningful runtime impact; comments and error text explain the operational remediation clearly.
Review: APPROVED
Correctness: trusted-context detection correctly distinguishes same-repo PRs/push/schedule (hard-fail on missing AUTO_SYNC_TOKEN) from fork PRs (soft-skip). The case statement covers the event-name space accurately.
Robustness: the fail-closed flip on trusted contexts prevents silent canonical drift masking (#2158). The existing hermetic sha-pin test remains as the always-on backstop.
Security: no new secret surface; the change only alters how an existing secret absence is handled.
Readability: inline comments clearly explain the trusted/untrusted distinction and the rationale for each path.
/sop-ack comprehensive-testing
/sop-ack local-postgres-e2e
/sop-ack staging-smoke
/sop-ack root-cause
/sop-ack five-axis-review
/sop-ack no-backwards-compat
/sop-ack memory-consulted
LGTM — reviewed for correctness, security, and test coverage. Approved for merge.