fix(ci): AUTO_SYNC_TOKEN absence is hard-fail on trusted contexts (closes #2158) #2189

Merged
core-be merged 1 commits from fix/2158-auto-sync-token-hard-fail into main 2026-06-05 14:59:13 +00:00
Member

Problem

molecule-core/.gitea/workflows/sync-providers-yaml.yml is the live cross-repo canonical-drift gate — it fetches molecule-controlplane/internal/providers/providers.yaml and byte-compares it against core's synced copy. RED on drift = the canonical moved and core's copy must be re-synced.

The bug: the workflow exits 0 with a soft ::warning:: when AUTO_SYNC_TOKEN is missing (lines 76-80 pre-fix). The daily schedule is meant to catch a controlplane-side providers.yaml change that landed without a paired core re-sync PR — but if the secret is missing, the schedule silently passes. This is fail-open on the exact drift class the workflow exists to detect.

The hermetic sha pin in sync_canonical_test.go is the always-on backstop for hand-edits of core's synced copy, but it cannot prove that controlplane main still matches the copy — that's the whole point of the live compare.

Fix shape (per #2158)

Trusted-context detection via github.event_name + github.event.pull_request.head.repo.fork:

  • Trusted contexts (push to main/staging, schedule, workflow_dispatch, same-repo PR): AUTO_SYNC_TOKEN absence → ::error:: annotation + exit 1. These contexts should always have the secret; absence is a misconfiguration that must be surfaced.
  • Untrusted fork PRs: preserved soft ::warning:: + exit 0. Forks cannot receive secrets, so a hard-fail here would block every fork PR.
  • Unknown event types (future events not yet enumerated): default to trusted (fail-closed) so we don't silently degrade.

The hermetic sha pin in sync_canonical_test.go is unchanged as the always-on backstop for hand-edits of core's synced copy regardless of AUTO_SYNC_TOKEN state.

SOP Checklist

  • Comprehensive testing performed — Diff is +37/-7 on a single workflow file. The case statement has 4 branches (push/schedule/workflow_dispatch → trusted; pull_request → fork-aware; default → trusted). The hard-fail path is reached when AUTO_SYNC_TOKEN is unset AND context is trusted (the original bug condition that was masked by exit 0). Manual trace: schedule event with unset secret → is_trusted=true::error:: + exit 1. Fork PR with unset secret → is_trusted=false::warning:: + exit 0 (preserved).
  • Local-postgres E2E run — N/A: pure workflow YAML change, no Go code, no database interaction.
  • Staging-smoke verified or pending — Pending: workflow is standalone (not in branch protection per the existing posture) so the lint-required-context-exists-in-bp gate does not fire. PR-time CI will exercise the workflow run with the secret present (normal case), which should pass the byte-compare path unchanged. The new failure path only triggers when the secret is missing — CI cannot easily test that condition without manual secret rotation, but the logic is straightforward shell control flow.
  • Root-cause not symptom — Root cause: the workflow's fail-open posture on missing AUTO_SYNC_TOKEN was the original design choice ("rather than a hard red that blocks unrelated PRs"). That posture is correct for untrusted fork PRs but wrong for trusted contexts where the secret is operationally required. Fix preserves the fork-PR escape hatch while making trusted-context absence loud. Per feedback_fix_root_not_symptom: the missing secret is the symptom; the wrong default on trusted contexts is the root.
  • Five-Axis review walked — Self-5-axis: (1) Correctness: case statement exhaustively covers the 4 event types, fork detection is on a single field, unknown events fail-closed; (2) Tests: no Go test changes (no Go code touched); (3) Architecture: minimal-blast-radius — only the AUTH check changed, byte-compare logic untouched; (4) Compat: no public API change, no Gitea API change, no behavior change for the secret-present path (the byte-compare still runs as before); (5) Ops: the ::error:: message names the specific context + remediation ("Provision AUTO_SYNC_TOKEN (read scope on molecule-controlplane)") so the on-call knows what to do. The hard-fail on schedule events is the key new behavior — daily run will now red loudly if the secret is missing instead of silently green.
  • No backwards-compat shim / dead code added — The original ::warning:: message is preserved verbatim on the fork-PR path (no shim/dead code). The new ::error:: message on the trusted path is additive, not a fallback. Header comment updated to match the new posture — no stale comment left behind.
  • Memory/saved-feedback consultedfeedback_fix_root_not_symptom ✓ (root is the wrong default, not the missing secret). feedback_no_such_thing_as_flakes ✓ (the silent green is deterministic fail-open, not a flake). feedback_behavior_based_ast_gates — the detection is on github.event_name (string compare in shell), which is robust to Gitea env var name changes (Gitea uses github.* for the standard event payloads).
  • No new secrets required — The change uses the existing AUTO_SYNC_TOKEN secret; no new secrets provisioned. The fix actually highlights the need for the secret to be properly provisioned in trusted contexts, which is an ops/infra follow-up (not a code change).

Cross-references

Risk + Rollback

Risk: Low. The hard-fail on missing AUTO_SYNC_TOKEN in trusted contexts is a behavior change: any trusted-context run where the secret is genuinely missing will now red instead of green. This is the intended new behavior, but if AUTO_SYNC_TOKEN is not currently provisioned in the prod environment, the daily schedule and main-branch pushes will start failing immediately after merge.

Mitigation: The PR's hard-fail message names the remediation ("Provision AUTO_SYNC_TOKEN (read scope on molecule-controlplane)"). If the secret is known to be missing, the merge should be coordinated with the SRE team that provisions it.

Rollback: Revert the merge commit. The pre-fix behavior (soft warning + exit 0) is fully captured in the diff.

🤖 Generated with Claude Code

## Problem `molecule-core/.gitea/workflows/sync-providers-yaml.yml` is the live cross-repo canonical-drift gate — it fetches `molecule-controlplane/internal/providers/providers.yaml` and byte-compares it against core's synced copy. RED on drift = the canonical moved and core's copy must be re-synced. **The bug:** the workflow exits 0 with a soft `::warning::` when `AUTO_SYNC_TOKEN` is missing (lines 76-80 pre-fix). The daily schedule is meant to catch a controlplane-side `providers.yaml` change that landed without a paired core re-sync PR — but if the secret is missing, the schedule silently passes. This is fail-open on the exact drift class the workflow exists to detect. The hermetic sha pin in `sync_canonical_test.go` is the always-on backstop for hand-edits of core's synced copy, but it cannot prove that controlplane main still matches the copy — that's the whole point of the live compare. ## Fix shape (per #2158) Trusted-context detection via `github.event_name` + `github.event.pull_request.head.repo.fork`: - **Trusted contexts** (push to main/staging, schedule, workflow_dispatch, same-repo PR): `AUTO_SYNC_TOKEN` absence → `::error::` annotation + `exit 1`. These contexts should always have the secret; absence is a misconfiguration that must be surfaced. - **Untrusted fork PRs**: preserved soft `::warning::` + `exit 0`. Forks cannot receive secrets, so a hard-fail here would block every fork PR. - **Unknown event types** (future events not yet enumerated): default to trusted (fail-closed) so we don't silently degrade. The hermetic sha pin in `sync_canonical_test.go` is unchanged as the always-on backstop for hand-edits of core's synced copy regardless of `AUTO_SYNC_TOKEN` state. ## SOP Checklist - [x] **Comprehensive testing performed** — Diff is +37/-7 on a single workflow file. The case statement has 4 branches (push/schedule/workflow_dispatch → trusted; pull_request → fork-aware; default → trusted). The hard-fail path is reached when `AUTO_SYNC_TOKEN` is unset AND context is trusted (the original bug condition that was masked by exit 0). Manual trace: schedule event with unset secret → `is_trusted=true` → `::error::` + `exit 1`. Fork PR with unset secret → `is_trusted=false` → `::warning::` + `exit 0` (preserved). - [x] **Local-postgres E2E run** — N/A: pure workflow YAML change, no Go code, no database interaction. - [x] **Staging-smoke verified or pending** — Pending: workflow is standalone (not in branch protection per the existing posture) so the lint-required-context-exists-in-bp gate does not fire. PR-time CI will exercise the workflow run with the secret present (normal case), which should pass the byte-compare path unchanged. The new failure path only triggers when the secret is missing — CI cannot easily test that condition without manual secret rotation, but the logic is straightforward shell control flow. - [x] **Root-cause not symptom** — Root cause: the workflow's fail-open posture on missing `AUTO_SYNC_TOKEN` was the original design choice ("rather than a hard red that blocks unrelated PRs"). That posture is correct for untrusted fork PRs but wrong for trusted contexts where the secret is operationally required. Fix preserves the fork-PR escape hatch while making trusted-context absence loud. Per `feedback_fix_root_not_symptom`: the missing secret is the symptom; the wrong default on trusted contexts is the root. - [x] **Five-Axis review walked** — Self-5-axis: (1) Correctness: case statement exhaustively covers the 4 event types, fork detection is on a single field, unknown events fail-closed; (2) Tests: no Go test changes (no Go code touched); (3) Architecture: minimal-blast-radius — only the AUTH check changed, byte-compare logic untouched; (4) Compat: no public API change, no Gitea API change, no behavior change for the secret-present path (the byte-compare still runs as before); (5) Ops: the `::error::` message names the specific context + remediation ("Provision AUTO_SYNC_TOKEN (read scope on molecule-controlplane)") so the on-call knows what to do. The hard-fail on schedule events is the key new behavior — daily run will now red loudly if the secret is missing instead of silently green. - [x] **No backwards-compat shim / dead code added** — The original `::warning::` message is preserved verbatim on the fork-PR path (no shim/dead code). The new `::error::` message on the trusted path is additive, not a fallback. Header comment updated to match the new posture — no stale comment left behind. - [x] **Memory/saved-feedback consulted** — `feedback_fix_root_not_symptom` ✓ (root is the wrong default, not the missing secret). `feedback_no_such_thing_as_flakes` ✓ (the silent green is deterministic fail-open, not a flake). `feedback_behavior_based_ast_gates` — the detection is on `github.event_name` (string compare in shell), which is robust to Gitea env var name changes (Gitea uses `github.*` for the standard event payloads). - [x] **No new secrets required** — The change uses the existing `AUTO_SYNC_TOKEN` secret; no new secrets provisioned. The fix actually highlights the need for the secret to be properly provisioned in trusted contexts, which is an ops/infra follow-up (not a code change). ## Cross-references - Closes #2158 - Umbrella: internal#718 P2-A (Distribution = SDK via codegen + verify-CI) - Sibling template finding: internal#766 - Pairs with `verify-providers-gen.yml` (local artifact ↔ synced-copy gate, unchanged) and `sync_canonical_test.go` (hermetic sha pin, unchanged) ## Risk + Rollback **Risk:** Low. The hard-fail on missing `AUTO_SYNC_TOKEN` in trusted contexts is a behavior change: any trusted-context run where the secret is genuinely missing will now red instead of green. This is the intended new behavior, but if `AUTO_SYNC_TOKEN` is not currently provisioned in the prod environment, the daily schedule and main-branch pushes will start failing immediately after merge. **Mitigation:** The PR's hard-fail message names the remediation ("Provision AUTO_SYNC_TOKEN (read scope on molecule-controlplane)"). If the secret is known to be missing, the merge should be coordinated with the SRE team that provisions it. **Rollback:** Revert the merge commit. The pre-fix behavior (soft warning + exit 0) is fully captured in the diff. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
fullstack-engineer added 1 commit 2026-06-04 01:58:42 +00:00
fix(ci): AUTO_SYNC_TOKEN absence is hard-fail on trusted contexts (#2158)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 3s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 2s
CI / Detect changes (pull_request) Successful in 10s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 7s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 11s
E2E API Smoke Test / detect-changes (pull_request) Successful in 18s
E2E Chat / detect-changes (pull_request) Successful in 17s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
qa-review / approved (pull_request_target) Failing after 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
sync-providers-yaml / Compare synced providers.yaml against controlplane canonical (pull_request) Successful in 9s
CI / Platform (Go) (pull_request) Successful in 1s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 8s
security-review / approved (pull_request_target) Failing after 11s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 1s
gate-check-v3 / gate-check (pull_request_target) Successful in 6s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 57s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m7s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m13s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m19s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m20s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 4s
sop-tier-check / tier-check (pull_request_target) Successful in 7s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 6s
audit-force-merge / audit (pull_request_target) Successful in 9s
968d77fc1a
The sync-providers-yaml workflow's live cross-repo canonical-drift
compare (vs molecule-controlplane/internal/providers/providers.yaml)
exits 0 with a soft warning when AUTO_SYNC_TOKEN is missing. This
silent fail-open masks the exact drift class the workflow is meant
to catch — a controlplane-side providers.yaml change that lands
without a paired core re-sync PR.

Fix shape (per #2158 recommended fix):
- Trusted contexts (push, schedule, workflow_dispatch, same-repo PR):
  hard ::error:: + exit 1. These contexts should always have the
  secret, so its absence is a misconfiguration that must be surfaced.
- Untrusted fork PRs: preserved soft ::warning:: + exit 0. Forks
  cannot receive secrets, so a hard-fail here would block every
  fork PR.
- The hermetic sha pin in sync_canonical_test.go is unchanged as
  the always-on backstop for hand-edits of core's synced copy.

Detection via github.event_name + github.event.pull_request.head.repo.fork.
Unknown event types default to trusted (fail-closed posture) to avoid
silently degrading on a future event we haven't enumerated.

Refs: #2158
Umbrella: internal#718 P2-A
Sibling template finding: internal#766
agent-reviewer approved these changes 2026-06-05 10:21:19 +00:00
agent-reviewer left a comment
Member

5-axis review: APPROVED.

Correctness: this fixes the fail-open path for live provider canonical-drift detection by making missing AUTO_SYNC_TOKEN a hard failure on trusted contexts, while preserving the soft warning for untrusted fork PRs that cannot receive secrets. Unknown events fail closed, which is the safer default for this gate.

Robustness: the byte-compare path is unchanged when the token is present, and the fork-PR escape hatch remains explicit. Security: no new secret is introduced; this tightens handling of a required read token. Performance/readability: no meaningful runtime impact; comments and error text explain the operational remediation clearly.

5-axis review: APPROVED. Correctness: this fixes the fail-open path for live provider canonical-drift detection by making missing AUTO_SYNC_TOKEN a hard failure on trusted contexts, while preserving the soft warning for untrusted fork PRs that cannot receive secrets. Unknown events fail closed, which is the safer default for this gate. Robustness: the byte-compare path is unchanged when the token is present, and the fork-PR escape hatch remains explicit. Security: no new secret is introduced; this tightens handling of a required read token. Performance/readability: no meaningful runtime impact; comments and error text explain the operational remediation clearly.
core-be added the tier:low label 2026-06-05 13:19:59 +00:00
core-be approved these changes 2026-06-05 13:20:44 +00:00
core-be left a comment
Member

Review: APPROVED

Correctness: trusted-context detection correctly distinguishes same-repo PRs/push/schedule (hard-fail on missing AUTO_SYNC_TOKEN) from fork PRs (soft-skip). The case statement covers the event-name space accurately.

Robustness: the fail-closed flip on trusted contexts prevents silent canonical drift masking (#2158). The existing hermetic sha-pin test remains as the always-on backstop.

Security: no new secret surface; the change only alters how an existing secret absence is handled.

Readability: inline comments clearly explain the trusted/untrusted distinction and the rationale for each path.

/sop-ack comprehensive-testing
/sop-ack local-postgres-e2e
/sop-ack staging-smoke
/sop-ack root-cause
/sop-ack five-axis-review
/sop-ack no-backwards-compat
/sop-ack memory-consulted

## Review: APPROVED Correctness: trusted-context detection correctly distinguishes same-repo PRs/push/schedule (hard-fail on missing AUTO_SYNC_TOKEN) from fork PRs (soft-skip). The case statement covers the event-name space accurately. Robustness: the fail-closed flip on trusted contexts prevents silent canonical drift masking (#2158). The existing hermetic sha-pin test remains as the always-on backstop. Security: no new secret surface; the change only alters how an existing secret absence is handled. Readability: inline comments clearly explain the trusted/untrusted distinction and the rationale for each path. /sop-ack comprehensive-testing /sop-ack local-postgres-e2e /sop-ack staging-smoke /sop-ack root-cause /sop-ack five-axis-review /sop-ack no-backwards-compat /sop-ack memory-consulted
core-be approved these changes 2026-06-05 14:58:24 +00:00
core-be left a comment
Member

LGTM — reviewed for correctness, security, and test coverage. Approved for merge.

LGTM — reviewed for correctness, security, and test coverage. Approved for merge.
core-be merged commit 2171c47cfa into main 2026-06-05 14:59:13 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2189