ci: synthetic-check cron for AUTO_SYNC_TOKEN rotation drift detection (post-#66 hostile-self-review #3) #72
Context
PR #66 fixed auto-sync main→staging by replacing the broken `gh pr create` (Gitea returns 405 on GraphQL) with a direct `git push` using the `devops-engineer` persona's `AUTO_SYNC_TOKEN`. The hostile self-review of that PR flagged weakest-spot #3:

> In the worst case (slow main-push cadence), the gap between rotation and detection could be many hours. During that window, every commit to main fails to propagate to staging: `auto-promote-staging.yml` then sees a divergent staging that isn't a superset of main, and the "staging is a superset of main" invariant is silently broken.

What this issue tracks
Add a low-frequency, cron-triggered synthetic check that exercises the auto-sync auth surface (or a cheap variant of it) and emits a clear red signal if `AUTO_SYNC_TOKEN` has drifted out of validity.

Investigation findings
What `AUTO_SYNC_TOKEN` does today

Used in three workflows:

- `auto-sync-main-to-staging.yml`: PR #66's direct push from the devops-engineer persona
- `publish-workspace-server-image.yml`: `oauth2:<token>` basic auth for cloning manifest deps in CI

Failure modes from the auto-sync header:

- `branch-protection-drift.yml` (daily cron)

So: the synthetic check focuses on B.
Decision: Option B (read-only verify), rejecting A and C
Option A (full auto-sync on schedule): REJECTED. A 6h cron fires 4 times per day, so it would put up to 4 synthetic merge commits per day on staging even when main hasn't advanced. That's pure history clutter. Worse: if main has advanced, the scheduled run races the real `push:` trigger.

Option B (token-validity probe; pick this): a cron-triggered workflow that does:

- `GET /api/v1/user` against Gitea with the token → validates auth + identity (expects `username == devops-engineer`)
- `GET /api/v1/repos/molecule-ai/molecule-core` with the token → validates `read:repository` scope on this repo
- `git ls-remote https://oauth2:$AUTO_SYNC_TOKEN@git.moleculesai.app/molecule-ai/molecule-core staging` → validates the exact HTTPS auth path used by the `actions/checkout` step in the real workflow
- `git push --dry-run origin staging` from a no-op synthetic branch → exercises the connection + ref-negotiation path, but does NOT exercise the pre-receive hook (so it does not validate authz; Option C does, but authz is already covered by branch-protection-drift)

Pros: cheap (~3 HTTPS calls, ~5s wall-clock), zero side effects on staging, no branch noise.
Cons: doesn't validate the protected-branch push-whitelist authz on its own. (`branch-protection-drift.yml` is the canonical daily gate for that.)
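A minimal sketch of the two HTTP probes (identity via `GET /api/v1/user`, repo read via `GET /api/v1/repos/...`) using only the Python standard library. Assumptions, not taken from the actual workflow: the `Authorization: token <PAT>` header form, the host `git.moleculesai.app` lifted from the `ls-remote` URL above, and that Gitea's user JSON carries the account name in `login` (with `username` read as a fallback). All function names are hypothetical, and the error strings are abbreviated versions of the ones quoted in the Phase 4 update.

```python
from __future__ import annotations

import json
import os
import urllib.error
import urllib.request

# Host and repo taken from the ls-remote URL in this issue.
GITEA = "https://git.moleculesai.app"
REPO = "molecule-ai/molecule-core"
EXPECTED_USER = "devops-engineer"


def gitea_get(path: str, token: str) -> tuple[int, bytes]:
    """GET a Gitea API path with the token; return (HTTP status, body)."""
    req = urllib.request.Request(
        GITEA + path,
        headers={"Authorization": f"token {token}", "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry the status code the probes need.
        return err.code, err.read()


def check_user(status: int, body: bytes, expected: str = EXPECTED_USER) -> str | None:
    """Probe 1: None if the token is valid and belongs to the expected
    persona, otherwise an (abbreviated) error message."""
    if status != 200:
        return f"Token rotation suspected: GET /api/v1/user returned HTTP {status}"
    data = json.loads(body)
    # Assumption: account name is in "login"; "username" is a fallback.
    user = data.get("login") or data.get("username")
    if user != expected:
        return f"Token resolves to user {user!r}, expected {expected!r}"
    return None


def check_repo(status: int) -> str | None:
    """Probe 2: None if the token can read the repo."""
    if status != 200:
        return f"GET /api/v1/repos/{REPO} returned HTTP {status}"
    return None


def run_http_probes(token: str) -> list[str]:
    """Run probes 1 and 2; an empty list means green."""
    errors = []
    status, body = gitea_get("/api/v1/user", token)
    if (msg := check_user(status, body)) is not None:
        errors.append(msg)
    status, _ = gitea_get(f"/api/v1/repos/{REPO}", token)
    if (msg := check_repo(status)) is not None:
        errors.append(msg)
    return errors


def main() -> None:
    """Cron entry point: any probe failure exits non-zero (a red run)."""
    errors = run_http_probes(os.environ["AUTO_SYNC_TOKEN"])
    for msg in errors:
        print(msg)
    if errors:
        raise SystemExit(1)
```

A workflow step would call `main()` with `AUTO_SYNC_TOKEN` exported from `secrets.AUTO_SYNC_TOKEN`; the non-zero exit is the red signal. Keeping the status/body classification in pure functions makes it easy to check the failure messages without touching the network.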
Option C (push to a dedicated `auto-sync-canary` branch): REJECTED. Tests authz too, but: (a) branch noise on Gitea, (b) requires adding the canary branch to staging's `push_whitelist` or a new protection rule, i.e. more YAML drift, (c) authz validation is already done daily by `branch-protection-drift.yml`. Don't duplicate.

Prior art
- `/user/tokens/verify` endpoint specifically for canary scripts to validate "is this token still good" with no side effects. Gitea's equivalent is `GET /api/v1/user`.
- `testSecret` step that calls the target service with the new credential before promoting `AWSPENDING` to `AWSCURRENT`. Same shape: auth probe before commit.
- `secret_id` periodic health checks via `vault token lookup` to detect renewal failures.

The canonical pattern is a dedicated read-only auth-validity endpoint plus a cron canary. Option B applies that pattern verbatim.
Cadence
6h is the brief's suggestion. Justification:
6h dominates daily (worst-case detection gap of ~6h instead of ~24h), and the marginal benefit of 1h doesn't justify 24 runs/day (6× the 6h volume, 24× a daily cron) in the Actions feed.
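The cadence trade-off is plain arithmetic: a cron every `h` hours runs `24 / h` times a day, and if the token is rotated just after a run, the next run detects it about `h` hours later. A small sketch, assuming `h` divides 24 and ignoring runner queueing delay; `cadence_tradeoff` is a hypothetical helper.

```python
HOURS_PER_DAY = 24


def cadence_tradeoff(interval_h: int) -> tuple[int, int]:
    """Return (runs per day, worst-case hours from rotation to detection).

    Worst case: the token is rotated just after a run, so detection waits a
    full interval. Queueing delay on the runner is ignored.
    """
    return HOURS_PER_DAY // interval_h, interval_h


for interval in (1, 6, 24):
    runs, worst = cadence_tradeoff(interval)
    print(f"every {interval}h: {runs} runs/day, worst-case gap ~{worst}h")
```

For 1h vs 6h that is 24 vs 4 runs/day: 6× the feed volume to shorten the worst-case gap by 5h.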
Token-scoping security
No new token. Reuses `secrets.AUTO_SYNC_TOKEN` (read scope is sufficient; Option B does not push). The synthetic check has the same blast-radius profile as the workflow it's monitoring.

Surfaces affected
- `.github/workflows/auto-sync-canary.yml`

Plan (Phase 2 design → Phase 3 implement → Phase 4 verify)

- `auto-promote-stale-alarm.yml`).

Coordination
Implementation in PR #77 (`fix/issue-72-auto-sync-token-canary-v2` → `main`). Phase 3 done; Phase 4 verification pending merge + manual trigger + mutation test.

Phase 4 verification update
Local probe verification (since Gitea 1.22.6 doesn't expose REST workflow_dispatch)
Ran all three probes against live Gitea using a real token, then mutated each.
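The mutation step reduces to one rule: a probe is only trustworthy if it passes with the real token and fails with a junk one. The anonymous-read fallback described under probe 3 is exactly a probe that passes both. A minimal Python sketch of that rule, together with the rewritten `git push --dry-run` probe (throwaway `git init` repo, current staging SHA pushed back to staging) as a candidate to run it against. Assumptions: `git` is on PATH, the caller already knows the current staging SHA (how it is fetched is not specified here), and `push_probe_cmd`, `push_probe` and `discriminates` are hypothetical names.

```python
from __future__ import annotations

import os
import subprocess
import tempfile
from typing import Callable

# Repo path taken from the ls-remote URL in this issue.
HOST_REPO = "git.moleculesai.app/molecule-ai/molecule-core"


def push_probe_cmd(repo_dir: str, token: str, staging_sha: str) -> list[str]:
    """Dry-run push of the current staging SHA back to staging."""
    url = f"https://oauth2:{token}@{HOST_REPO}"
    return ["git", "-C", repo_dir, "push", "--dry-run", url,
            f"{staging_sha}:refs/heads/staging"]


def push_probe(token: str, staging_sha: str) -> bool:
    """Probe 3: True if git exits 0 (expected: 'Everything up-to-date').

    Runs in an empty `git init` tempdir instead of a full checkout.
    """
    # Never block on an interactive credential prompt in CI.
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(["git", "init", "-q", tmp], check=True, env=env)
        result = subprocess.run(push_probe_cmd(tmp, token, staging_sha),
                                capture_output=True, text=True, env=env)
    return result.returncode == 0


def discriminates(probe: Callable[[str], bool], good: str, junk: str) -> bool:
    """Mutation test: pass with the real token AND fail with a junk one.

    A probe that passes with both tokens is a false-green.
    """
    return probe(good) and not probe(junk)
```

Usage would look like `discriminates(lambda t: push_probe(t, sha), real_token, "junk")`; a `False` result means the probe cannot tell a valid token from an invalid one and should not ship.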
Probe 1: `GET /api/v1/user`

- `username == claude-ceo-assistant` (when run with my token; will be `devops-engineer` in production).
- On HTTP 401: `Token rotation suspected: GET /api/v1/user returned HTTP 401 ... Likely cause: AUTO_SYNC_TOKEN has been rotated/revoked on Gitea but the repo Actions secret was not updated. Runbook: see header comment of this workflow file.`
- On identity mismatch: `Token resolves to user 'claude-ceo-assistant', expected 'devops-engineer'. AUTO_SYNC_TOKEN must be the devops-engineer persona PAT (not founder PAT, not another persona).`

Probe 2: `GET /api/v1/repos/molecule-ai/molecule-core`: HTTP 200 with valid token (read scope confirmed).

Probe 3: original `git ls-remote refs/heads/staging`: REJECTED on review. Discovered that Gitea falls back to anonymous read on public repos, so `ls-remote` succeeded even with a junk token. False-green: the worst possible canary failure mode. Rewrote it to use `git push --dry-run` of the current staging SHA back to staging:

- Valid token: `Everything up-to-date`, exit 0.
- Junk token: `fatal: Authentication failed for ...`, exit 128. Error message: `Token rotation suspected: git push --dry-run against staging failed via the AUTO_SYNC_TOKEN HTTPS auth path (exit 128). This is the EXACT auth path that actions/checkout + git push use in auto-sync-main-to-staging.yml. Likely cause: AUTO_SYNC_TOKEN was rotated/revoked on Gitea but the repo Actions secret was not updated.`

Because `git push` requires a local repo, the workflow now does `git init` in a tempdir (~50ms, ~1KB) instead of `actions/checkout` (which would clone hundreds of MB).

CI status on PR #77
- `0cef033a`)
- `pr-guards / disable-auto-merge-on-push`: depends on the `molecule-ai/molecule-ci` reusable workflow, which appears to be unavailable on Gitea. Pre-existing org-wide; not introduced by this PR. Deferred.
- `e4e1bf40` (post-self-review comment update); will settle GREEN/22-of-23 (excluding pr-guards).

Hostile self-review weakest-3
- `auto-sync-main-to-staging.yml` and this canary's env var. Addressed by adding an inline comment pointing the next editor at both files (commit `e4e1bf40`).
- `ls-remote` and `push --dry-run`.
- `--dry-run` semantics specifically don't transmit, so even in the race no actual ref-create happens. And branch protection prevents staging deletion. Documented for completeness.

Outstanding follow-ups
None within scope. Possible future enhancements (out of scope for this issue):
- `write:repository`, `read:user`, etc.): currently only validates the read paths the canary itself uses