chore(workflows): delete obsolete promote/sync workflows (Phase 3C of internal#81)

Trunk-based migration final cleanup for molecule-core. The 6 workflows
deleted here all existed to manage the staging↔main branch dance that
trunk-based makes obsolete:

  - auto-promote-staging.yml         fast-forward staging→main on green
  - auto-promote-on-e2e.yml          alt promote path on E2E green
  - auto-promote-stale-alarm.yml     alarm if staging promotion stalls
  - auto-sync-main-to-staging.yml    sync main→staging after UI merges
  - auto-sync-canary.yml             dry-run probe of the auto-sync
                                     token+push path
  - retarget-main-to-staging.yml     rebase open PRs onto staging

After Phase 3A (PR #108 promoted 5 staging-only feature PRs to main)
and Phase 3B (PR #109 dropped staging-branch triggers from the 4 e2e
workflows), main is the only branch the CI cares about. None of the
above workflows have anything to do; they're 1977 lines of dead Go-time-
no-Gitea-time-yes code.

Rollback: `git revert` this commit to restore the workflows. They still
work mechanically; trunk-based just doesn't need them.

The `staging` branch on the remote is deleted in a follow-up step
(`git push origin --delete staging`) after this PR merges, so reviewers
can confirm CI runs cleanly on the new shape before the ref disappears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
claude-ceo-assistant 2026-05-08 14:18:35 +00:00
parent ae2d9eabf6
commit 08e8d325e2
6 changed files with 0 additions and 1977 deletions

View File

@ -1,467 +0,0 @@
name: Auto-promote :latest after main image build
# Retags `ghcr.io/molecule-ai/{platform,platform-tenant}:staging-<sha>`
# → `:latest` after either the image build or E2E completes on a `main`
# push, gated on E2E Staging SaaS not being red for that SHA.
#
# Why two triggers:
#
# `publish-workspace-server-image` and `e2e-staging-saas` are both
# paths-filtered, but with DIFFERENT path sets:
#
# publish-workspace-server-image:
# workspace-server/**, canvas/**, manifest.json
#
# e2e-staging-saas (full lifecycle):
# workspace-server/internal/handlers/{registry,workspace_provision,
# a2a_proxy}.go, workspace-server/internal/middleware/**,
# workspace-server/internal/provisioner/**, tests/e2e/test_staging_full_saas.sh
#
# The E2E set is a strict SUBSET of the publish set. So:
# - canvas/** changes → publish fires, E2E does not
# - workspace-server/cmd/** changes → publish fires, E2E does not
# - workspace-server/internal/sweep/** → publish fires, E2E does not
#
# The previous version triggered ONLY on E2E completion, which meant
# non-E2E-path changes (canvas, cmd, sweep, etc.) rebuilt the image
# but never advanced `:latest`. Result: as of 2026-04-28 this workflow
# had run zero times since merge despite eight main pushes — `:latest`
# was ~7 hours / 9 PRs behind main with no human realising. See
# `molecule-core` Slack discussion 2026-04-28.
#
# Adding `publish-workspace-server-image` as a second trigger closes
# the gap: any image rebuild on main eligibly advances `:latest`.
#
# Why E2E remains a kill-switch (not the trigger):
#
# When E2E DID run for this SHA and ended red, we abort — `:latest`
# stays on the prior known-good digest. When E2E didn't run (paths
# filtered out), we proceed: pre-merge gates already validated this
# SHA on staging via auto-promote-staging requiring CI + E2E Canvas +
# E2E API + CodeQL all green. Image content for non-E2E-paths
# (canvas, cmd, sweep) is exercised by those staging gates.
#
# Why `main` only:
#
# `:latest` is what prod tenants pull. We only want SHAs that have
# reached main (via auto-promote-staging) to advance `:latest`.
# Triggering on staging would let a staging-only revert advance
# `:latest` to a SHA that never reaches main, breaking the "production
# runs what's on main" invariant.
#
# Idempotency:
#
# When a SHA touches paths that match BOTH publish and E2E, both
# workflows fire and complete. Both trigger this workflow on
# completion → two runs race. Both retag `:staging-<sha>` →
# `:latest`. crane tag is idempotent (re-tagging the same digest is a
# no-op), so the second run is harmless. concurrency group serializes
# them anyway.
on:
workflow_run:
workflows:
- 'E2E Staging SaaS (full lifecycle)'
- 'publish-workspace-server-image'
types: [completed]
branches: [main]
workflow_dispatch:
inputs:
sha:
description: 'Short sha to promote (override; defaults to upstream workflow_run head_sha)'
required: false
type: string
permissions:
contents: read
packages: write
concurrency:
# Serialize promotes per-SHA so the publish+E2E both-fired race lands
# cleanly. Different SHAs can promote in parallel.
group: auto-promote-latest-${{ github.event.workflow_run.head_sha || github.event.inputs.sha || github.sha }}
cancel-in-progress: false
env:
IMAGE_NAME: ghcr.io/molecule-ai/platform
TENANT_IMAGE_NAME: ghcr.io/molecule-ai/platform-tenant
jobs:
promote:
# Proceed if upstream succeeded OR manual dispatch. Upstream-failure
# paths are filtered here; the E2E-was-red kill-switch lives in the
# gate-check step below (covers the case where upstream is publish
# success but E2E for the same SHA failed).
if: |
github.event_name == 'workflow_dispatch' ||
(github.event_name == 'workflow_run' && github.event.workflow_run.conclusion == 'success')
runs-on: ubuntu-latest
steps:
- name: Compute short sha
id: sha
run: |
set -euo pipefail
if [ -n "${{ github.event.inputs.sha }}" ]; then
FULL="${{ github.event.inputs.sha }}"
else
FULL="${{ github.event.workflow_run.head_sha }}"
fi
echo "short=${FULL:0:7}" >> "$GITHUB_OUTPUT"
echo "full=${FULL}" >> "$GITHUB_OUTPUT"
- name: Gate — E2E Staging SaaS state for this SHA
# When upstream IS E2E success, we know it's green (filtered by
# the job-level `if` already). When upstream is publish, look up
# E2E state for the same SHA. Four buckets:
#
# - completed/success: E2E confirmed safe → proceed
# - completed/failure|cancelled|timed_out: E2E found a
# regression → ABORT (exit 1), `:latest` stays put
# - in_progress|queued|requested: E2E is RACING with publish
# for a runtime-touching SHA. publish typically completes
# ~5-10min before E2E (~10-15min). If we promote on the
# publish signal here, a later E2E failure can't roll back
# `:latest` — it'd already be wrongly advanced. So we DEFER:
# skip subsequent steps (proceed=false) and let E2E's own
# completion event re-fire this workflow, which then takes
# the upstream-is-E2E path. exit 0 so the run shows as
# success rather than a noisy fake-failure.
# - none/none: E2E was paths-filtered out for this SHA (the
# change touched canvas/cmd/sweep/etc. — paths covered by
# publish but not by E2E). pre-merge gates on staging
# already validated this SHA → proceed.
#
# Manual dispatch skips this check — operator override.
id: gate
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO: ${{ github.repository }}
SHA: ${{ steps.sha.outputs.full }}
UPSTREAM_NAME: ${{ github.event.workflow_run.name }}
EVENT_NAME: ${{ github.event_name }}
run: |
set -euo pipefail
if [ "$EVENT_NAME" = "workflow_dispatch" ]; then
echo "proceed=true" >> "$GITHUB_OUTPUT"
echo "::notice::Manual dispatch — skipping E2E gate (operator override)"
exit 0
fi
if [ "$UPSTREAM_NAME" = "E2E Staging SaaS (full lifecycle)" ]; then
echo "proceed=true" >> "$GITHUB_OUTPUT"
echo "::notice::Upstream is E2E itself (success per job-level if) — gate trivially satisfied"
exit 0
fi
# Upstream is publish-workspace-server-image. Check E2E state
# for the same SHA via Gitea's commit-status API.
#
# GitHub-era this was `gh run list --workflow=X --commit=SHA
# --json status,conclusion` returning either `[]` (no run on
# this SHA) or `[{status, conclusion}]` (the run's state).
# Gitea has NO workflow-runs API at all — `/api/v1/repos/.../
# actions/runs` returns 404 (verified 2026-05-07, issue #75).
# However Gitea Actions DOES emit a commit status per workflow
# job, with `context = "<Workflow Name> / <Job Name> (<event>)"`,
# which is exactly what we need: each E2E run leg becomes one
# status row on the SHA, and the aggregate state encodes the
# run's outcome.
#
# Mapping:
# 0 matched contexts → "none/none" (E2E paths-
# filtered
# out — same
# semantic
# as before)
# any context = pending → "in_progress/none" (defer)
# any context = error|failure → "completed/failure" (abort)
# all contexts = success → "completed/success" (proceed)
#
# The "completed/cancelled" and "completed/timed_out" buckets
# don't have direct Gitea analogs (Gitea statuses are
# success / failure / error / pending / warning). Per-SHA
# concurrency cancellation surfaces as `error` on Gitea, which
# we map to "completed/failure" rather than "completed/cancelled"
# — losing the soft-defer semantic of the cancelled bucket on
# this fleet. Tradeoff: the staleness alarm (auto-promote-stale-
# alarm.yml) still catches a stuck :latest within 4h, and a
# legitimate cancel is rare enough that aborting + manual
# re-dispatch is acceptable. If we measure cancel frequency
# > 1/week, revisit by reading the run-step-summary text via
# a follow-up script.
#
# Network or auth blips collapse to "none/none" via the curl
# `|| true` fallback, matching the pre-Gitea behaviour where
# an empty list also degenerated to none/none.
GITEA_API_URL="${GITHUB_SERVER_URL:-https://git.moleculesai.app}/api/v1"
STATUSES_JSON=$(curl --fail-with-body -sS \
-H "Authorization: token ${GH_TOKEN}" \
-H "Accept: application/json" \
"${GITEA_API_URL}/repos/${REPO}/commits/${SHA}/statuses?limit=100" \
2>/dev/null || echo "[]")
RESULT=$(printf '%s' "$STATUSES_JSON" | jq -r '
# Filter to E2E Staging SaaS (full lifecycle) statuses.
# Match by leading workflow-name prefix so the "<job>
# (<event>)" tail is irrelevant. Gitea emits the workflow
# name verbatim from the YAML `name:` field.
[.[] | select(.context | startswith("E2E Staging SaaS (full lifecycle) /"))] as $rows
| if ($rows | length) == 0 then
"none/none"
elif any($rows[]; .status == "pending") then
"in_progress/none"
elif any($rows[]; .status == "failure" or .status == "error") then
"completed/failure"
elif all($rows[]; .status == "success") then
"completed/success"
else
# Mixed / unknown — fall through to *) bucket below.
"completed/" + ($rows[0].status // "unknown")
end
' 2>/dev/null || echo "none/none")
echo "E2E Staging SaaS for ${SHA:0:7}: $RESULT"
case "$RESULT" in
completed/success)
echo "proceed=true" >> "$GITHUB_OUTPUT"
echo "::notice::E2E green for this SHA — proceeding with promote"
;;
completed/failure|completed/timed_out)
echo "proceed=false" >> "$GITHUB_OUTPUT"
{
echo "## ❌ Auto-promote aborted — E2E Staging SaaS failed"
echo
echo "E2E Staging SaaS for \`${SHA:0:7}\`: \`$RESULT\`"
echo "\`:latest\` stays on the prior known-good digest."
echo
echo "If the failure was a flake, manually dispatch this workflow with the same sha to override."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
completed/cancelled)
# GitHub-era only: cancelled ≠ failure. Gitea statuses
# don't expose a "cancelled" state — a per-SHA concurrency
# cancellation surfaces as `failure` or `error` on Gitea
# and is now handled by the failure branch above. This
# arm is kept for backwards compatibility / dual-host
# operation (if we ever add a non-Gitea fallback) but
# under the post-#75 flow it's unreachable.
echo "proceed=false" >> "$GITHUB_OUTPUT"
{
echo "## ⏭ Auto-promote deferred — E2E Staging SaaS was cancelled"
echo
echo "E2E Staging SaaS for \`${SHA:0:7}\`: \`$RESULT\`"
echo "Likely per-SHA concurrency (newer push superseded this E2E run)."
echo "The newer SHA's E2E will fire its own promote when it lands."
echo "If you need this specific SHA promoted, manually dispatch."
} >> "$GITHUB_STEP_SUMMARY"
;;
in_progress/*|queued/*|requested/*|waiting/*|pending/*)
echo "proceed=false" >> "$GITHUB_OUTPUT"
{
echo "## ⏳ Auto-promote deferred — E2E Staging SaaS still running"
echo
echo "Publish completed before E2E for \`${SHA:0:7}\` (state: \`$RESULT\`)."
echo "Skipping retag here — E2E's own completion event will re-fire this workflow."
echo "If E2E ends green, that run promotes \`:latest\`. If red, it aborts."
} >> "$GITHUB_STEP_SUMMARY"
;;
none/none)
echo "proceed=true" >> "$GITHUB_OUTPUT"
echo "::notice::E2E paths-filtered out for this SHA — pre-merge staging gates carry"
;;
*)
echo "proceed=false" >> "$GITHUB_OUTPUT"
{
echo "## ❓ Auto-promote aborted — unexpected E2E state"
echo
echo "E2E Staging SaaS for \`${SHA:0:7}\`: \`$RESULT\` (unhandled)"
echo "Manual investigation needed; re-dispatch with the same sha once resolved."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
esac
- if: steps.gate.outputs.proceed == 'true'
uses: imjasonh/setup-crane@6da1ae018866400525525ce74ff892880c099987 # v0.5
- name: GHCR login
if: steps.gate.outputs.proceed == 'true'
run: |
echo "${{ secrets.GITHUB_TOKEN }}" | \
crane auth login ghcr.io -u "${{ github.actor }}" --password-stdin
- name: Verify :staging-<sha> exists for both images
# Better to fail fast with a clear message than to half-tag
# (platform retagged but platform-tenant missing → tenants pull
# a stale image).
if: steps.gate.outputs.proceed == 'true'
run: |
set -euo pipefail
for img in "${IMAGE_NAME}" "${TENANT_IMAGE_NAME}"; do
tag="${img}:staging-${{ steps.sha.outputs.short }}"
if ! crane manifest "$tag" >/dev/null 2>&1; then
echo "::error::Missing tag: $tag"
echo "::error::publish-workspace-server-image must complete on this SHA before auto-promote can retag :latest."
exit 1
fi
echo " ok: $tag exists"
done
- name: Ancestry check — refuse to promote :latest backwards
# #2244: workflow_run completions arrive in arbitrary order. If
# SHA-A and SHA-B both reach main within ~10 min and SHA-B's E2E
# completes before SHA-A's, this workflow can fire for SHA-A
# AFTER it already promoted SHA-B → :latest goes backwards. The
# orphan-reconciler "next run corrects it" doesn't apply: there's
# no auto-corrective re-promote, :latest stays wrong until the
# next main push lands.
#
# Detection: read current :latest's `org.opencontainers.image.revision`
# label (set by publish-workspace-server-image.yml at build time)
# and ask the GitHub compare API whether the candidate SHA is
# ahead-of / identical-to / behind / diverged-from current.
# Hard-fail on `behind` and `diverged` per the approved design —
# silent-bypass is the class we're moving away from. Workflow
# goes red, oncall sees it, operator decides how to recover
# (manual dispatch with the right SHA, force-promote, etc.).
#
# Manual dispatch skips this check — operator override semantics
# match the gate-check step above.
#
# Backward-compat: when current :latest carries no revision
# label (legacy image pre-publish-with-label), skip-with-warning.
# All :latest images on main are post-label as of 2026-04-29, so
# this branch will be dead within 90 days; remove then.
if: steps.gate.outputs.proceed == 'true' && github.event_name != 'workflow_dispatch'
id: ancestry
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO: ${{ github.repository }}
TARGET_SHA: ${{ steps.sha.outputs.full }}
run: |
set -euo pipefail
# Read the current :latest config and pull the revision label.
# `crane config` returns the OCI image config blob (not the manifest);
# labels live under `.config.Labels`. `// empty` makes jq return ""
# rather than the literal "null" so the test below works.
CURRENT_REVISION=$(crane config "${IMAGE_NAME}:latest" 2>/dev/null \
| jq -r '.config.Labels["org.opencontainers.image.revision"] // empty' \
|| true)
if [ -z "$CURRENT_REVISION" ]; then
echo "decision=skip-no-label" >> "$GITHUB_OUTPUT"
{
echo "## ⚠ Ancestry check skipped — current :latest has no revision label"
echo
echo "Likely a legacy image built before \`org.opencontainers.image.revision\` was set."
echo "Falling through to retag. After all \`:latest\` images are post-label (TODO 90 days), this branch is dead and should be removed."
} >> "$GITHUB_STEP_SUMMARY"
echo "::warning::Current :latest carries no revision label — skipping ancestry check (legacy image)"
exit 0
fi
if [ "$CURRENT_REVISION" = "$TARGET_SHA" ]; then
echo "decision=identical" >> "$GITHUB_OUTPUT"
echo "::notice:::latest already at ${TARGET_SHA:0:7} — retag will be a no-op"
exit 0
fi
# Ask GitHub which side of the merge graph TARGET_SHA sits on
# relative to CURRENT_REVISION. Returns one of: ahead | identical
# | behind | diverged. Network or auth errors collapse to "error"
# via the explicit fallback so the case below always matches.
STATUS=$(gh api \
"repos/${REPO}/compare/${CURRENT_REVISION}...${TARGET_SHA}" \
--jq '.status' 2>/dev/null || echo "error")
echo "ancestry compare ${CURRENT_REVISION:0:7} → ${TARGET_SHA:0:7}: $STATUS"
case "$STATUS" in
ahead)
echo "decision=ahead" >> "$GITHUB_OUTPUT"
echo "::notice::Target ${TARGET_SHA:0:7} is ahead of current :latest (${CURRENT_REVISION:0:7}) — proceeding with retag"
;;
identical)
echo "decision=identical" >> "$GITHUB_OUTPUT"
echo "::notice::Target identical to :latest — retag will be a no-op"
;;
behind)
echo "decision=behind" >> "$GITHUB_OUTPUT"
{
echo "## ❌ Auto-promote refused — target is BEHIND current :latest"
echo
echo "| Field | Value |"
echo "|---|---|"
echo "| Target SHA | \`$TARGET_SHA\` |"
echo "| Current :latest revision | \`$CURRENT_REVISION\` |"
echo "| GitHub compare status | \`behind\` |"
echo
echo "This guard catches the workflow_run-completion-order race (#2244):"
echo "two rapid main pushes whose E2Es complete out-of-order can otherwise"
echo "promote \`:latest\` backwards. \`:latest\` stays on \`${CURRENT_REVISION:0:7}\`."
echo
echo "**Recovery:** if this is a legitimate revert that should land on \`:latest\`,"
echo "manually dispatch this workflow with the target sha as input — the manual-dispatch"
echo "path skips the ancestry check (operator override)."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
diverged)
echo "decision=diverged" >> "$GITHUB_OUTPUT"
{
echo "## ❓ Auto-promote refused — history diverged"
echo
echo "| Field | Value |"
echo "|---|---|"
echo "| Target SHA | \`$TARGET_SHA\` |"
echo "| Current :latest revision | \`$CURRENT_REVISION\` |"
echo "| GitHub compare status | \`diverged\` |"
echo
echo "Likely cause: force-push rewrote main's history, leaving the previous"
echo "\`:latest\` revision orphaned. Needs human review before \`:latest\` advances."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
error|*)
echo "decision=error" >> "$GITHUB_OUTPUT"
{
echo "## ❌ Auto-promote aborted — ancestry-check API error"
echo
echo "\`gh api repos/${REPO}/compare/${CURRENT_REVISION}...${TARGET_SHA}\` returned unexpected status: \`$STATUS\`"
echo
echo "Manual dispatch with the target sha bypasses this check."
} >> "$GITHUB_STEP_SUMMARY"
exit 1
;;
esac
- name: Retag platform :staging-<sha> → :latest
if: steps.gate.outputs.proceed == 'true'
run: |
crane tag "${IMAGE_NAME}:staging-${{ steps.sha.outputs.short }}" latest
- name: Retag tenant :staging-<sha> → :latest
if: steps.gate.outputs.proceed == 'true'
run: |
crane tag "${TENANT_IMAGE_NAME}:staging-${{ steps.sha.outputs.short }}" latest
- name: Summary
if: steps.gate.outputs.proceed == 'true'
run: |
{
echo "## :latest promoted to ${{ steps.sha.outputs.short }}"
echo
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "- Trigger: manual dispatch"
else
echo "- Upstream: \`${{ github.event.workflow_run.name }}\` ([run](${{ github.event.workflow_run.html_url }}))"
fi
echo "- platform:staging-${{ steps.sha.outputs.short }} → :latest"
echo "- platform-tenant:staging-${{ steps.sha.outputs.short }} → :latest"
echo
echo "Tenant fleet auto-pulls within 5 min via IMAGE_AUTO_REFRESH=true."
echo "Force immediate fanout: dispatch redeploy-tenants-on-main.yml."
} >> "$GITHUB_STEP_SUMMARY"

View File

@ -1,492 +0,0 @@
name: Auto-promote staging → main
# Fires after any of the staging-branch quality gates complete. When ALL
# required gates are green on the same staging SHA, opens (or re-uses)
# a PR `staging → main` and schedules Gitea auto-merge so the PR lands
# automatically once approval + status checks are satisfied.
#
# ============================================================
# What this workflow does
# ============================================================
#
# 1. On a workflow_run completion event for one of the staging gate
# workflows (CI, E2E Staging Canvas, E2E API Smoke, CodeQL),
# checks if the combined status on the staging head SHA is green.
# 2. If green, opens (or re-uses) a PR `head: staging → base: main`
# via Gitea REST `POST /api/v1/repos/.../pulls`.
# 3. Schedules auto-merge via `POST /api/v1/repos/.../pulls/{index}/merge`
# with `merge_when_checks_succeed: true`. Gitea waits for the
# approval requirement on `main` (`required_approvals: 1`) and
# the status-check gates, then merges.
# 4. The merge commit lands on `main` and fires
# `publish-workspace-server-image.yml` naturally via its
# `on: push: branches: [main]` trigger — no explicit dispatch
# needed (see "Why no workflow_dispatch tail" below).
#
# `auto-sync-main-to-staging.yml` is the reverse-direction
# counterpart (main → staging, fast-forward push). Together they
# keep the staging-superset-of-main invariant tight.
#
# ============================================================
# Why Gitea REST (and not `gh pr create`)
# ============================================================
#
# Pre-2026-05-06 this workflow used `gh pr create`, `gh pr merge --auto`,
# `gh run list`, and `gh workflow run` against GitHub. After the
# GitHub→Gitea cutover those calls fail because:
#
# - `gh pr create / merge / view / list` route to GitHub GraphQL
# (`/api/graphql`). Gitea does not expose a GraphQL endpoint;
# every call returns `HTTP 405 Method Not Allowed` — same root
# cause as #65 (auto-sync) which PR #66 fixed by dropping `gh`
# entirely.
# - `gh run list --workflow=...` GitHub-shape; Gitea has the
# simpler `GET /repos/.../commits/{ref}/status` combined-status
# endpoint instead.
# - `gh workflow run X.yml` calls `POST /repos/.../actions/workflows/{id}/dispatches`,
# which does NOT exist on Gitea 1.22.6 (verified via swagger.v1.json).
#
# So this workflow uses direct `curl` calls to Gitea REST. No `gh`
# CLI dependency, no GraphQL, no missing-endpoint footgun.
#
# ============================================================
# Why no workflow_dispatch tail (was load-bearing on GitHub, dead on Gitea)
# ============================================================
#
# The GitHub-era version had a 60-line polling step that waited for
# the promote PR to merge, then explicitly dispatched
# `publish-workspace-server-image.yml` on `--ref main`. That step
# existed because GitHub's GITHUB_TOKEN-initiated merges suppress
# downstream `on: push` workflows (the documented "no recursion" rule
# — https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow).
# The explicit dispatch was the workaround.
#
# Gitea Actions does NOT have this no-recursion rule. PR #66's auto-
# sync merge to main fired `auto-promote-staging` on the next push
# trigger naturally. So the cascade fires on the natural push event;
# the explicit dispatch is dead code. (And even if we wanted to
# preserve it, Gitea has no `workflow_dispatch` REST endpoint.)
#
# Removed in this rewrite. If we ever observe the cascade misfire,
# operator can push an empty commit to `main` to wake it.
#
# ============================================================
# Why open a PR (and not direct push)
# ============================================================
#
# `main` branch protection has `enable_push: false` with NO
# `push_whitelist_usernames`. Direct push is impossible for any
# persona, including admins. PR-mediated merge is the only path,
# which is intentional: prod state mutations (and staging→main IS a
# prod mutation, since the next deploy fans out to tenants) require
# Hongming's approval per `feedback_prod_apply_needs_hongming_chat_go`.
#
# The auto-merge schedule preserves this gate: `merge_when_checks_succeed`
# does NOT bypass `required_approvals: 1`. Gitea waits for BOTH
# approval AND green checks before merging. Hongming reviews via the
# canvas/chat-handle of the PR notification, approves, and Gitea
# auto-merges within seconds.
#
# ============================================================
# Identity + token (anti-bot-ring per saved-memory
# `feedback_per_agent_gitea_identity_default`)
# ============================================================
#
# This workflow uses `secrets.AUTO_SYNC_TOKEN` — a personal access
# token issued to the `devops-engineer` Gitea persona. NOT the
# founder PAT. The bot-ring fingerprint that triggered the GitHub
# org suspension on 2026-05-06 was characterised by founder PAT
# acting as CI at machine speed.
#
# Token scope: `push: true` (read+write) on this repo. The persona
# can: open PRs, comment on PRs, schedule auto-merge. The persona
# CANNOT bypass main's branch protection (`required_approvals: 1`
# still applies — only Hongming's review unblocks merge).
#
# Authorship: the PR is opened by `devops-engineer`; the merge
# commit credits Hongming-as-approver and `devops-engineer` as
# the merger.
#
# ============================================================
# Failure modes & operational notes
# ============================================================
#
# A — staging gates not all green at trigger time:
# - The combined-status check returns `state: pending|failure`.
# Workflow exits 0 with a step-summary "not all green; staying
# on current main". Re-fires on the next gate completion.
#
# B — Gitea PR-create returns non-201 (e.g. 422 already-exists):
# - Idempotent: the workflow first GETs the existing open
# staging→main PR. If found, reuse it; if not, POST a new one.
# 422 should never surface; if it does (race), step summary
# captures the body and the next workflow_run picks up.
#
# C — `merge_when_checks_succeed` schedule fails:
# - 422 with "Pull request is not mergeable" if there are
# conflicts or stale base. Step summary surfaces it; operator
# (or `auto-sync-main-to-staging`) needs to bring staging up
# to date with main first. Workflow exits 1 to surface red.
#
# D — `AUTO_SYNC_TOKEN` rotated / wrong scope:
# - 401/403 on first REST call. Step summary surfaces it.
# Re-issue the token from `~/.molecule-ai/personas/` on the
# operator host and update the repo Actions secret.
#
# ============================================================
# Loop safety
# ============================================================
#
# When the promote PR merges to main, `auto-sync-main-to-staging.yml`
# fires (on:push:main) and pushes the merge commit back to staging.
# That push to staging is by `devops-engineer`, NOT this workflow's
# token, and triggers the staging gate workflows. When they all
# complete, we end up back here — but the tree-diff guard catches
# it: staging tree == main tree (the merge commit changes nothing),
# so we skip and the cycle terminates.
on:
workflow_run:
workflows:
- CI
- E2E Staging Canvas (Playwright)
- E2E API Smoke Test
- CodeQL
types: [completed]
workflow_dispatch:
inputs:
force:
description: "Force promote even when AUTO_PROMOTE_ENABLED is unset (manual override)"
required: false
default: "false"
permissions:
contents: read
pull-requests: write
# Serialize auto-promote runs. Multiple staging gate completions can land
# in quick succession (CI + E2E + CodeQL all finish within seconds of
# each other on a green PR) — without this, two parallel runs both:
# 1. Would race the GET-or-POST PR step.
# 2. Would both call merge-schedule (idempotent — fine on Gitea).
# cancel-in-progress: false because the second run on a fresh staging
# tip should NOT kill the first which has already opened the PR.
concurrency:
group: auto-promote-staging
cancel-in-progress: false
jobs:
check-all-gates-green:
# Only consider staging pushes. PRs into staging don't promote.
if: >
(github.event_name == 'workflow_run' &&
github.event.workflow_run.head_branch == 'staging' &&
github.event.workflow_run.event == 'push')
|| github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
outputs:
all_green: ${{ steps.gates.outputs.all_green }}
head_sha: ${{ steps.gates.outputs.head_sha }}
steps:
# Skip empty-tree promotes (the perpetual auto-promote↔auto-sync
# cycle observed pre-cutover on GitHub). On Gitea the cycle shape
# is different (auto-sync uses fast-forward, no merge commit),
# but the tree-diff guard is cheap insurance and protects against
# any future merge-style regression.
- name: Checkout for tree-diff check
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
ref: staging
- name: Skip if staging tree == main tree (cycle-break safety)
id: tree-diff
env:
HEAD_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}
run: |
set -eu
git fetch origin main --depth=50 || { echo "::warning::git fetch main failed — proceeding (fail-open)"; exit 0; }
if git diff --quiet origin/main "$HEAD_SHA" -- 2>/dev/null; then
{
echo "## Skipped — no code to promote"
echo
echo "staging tip (\`${HEAD_SHA:0:8}\`) and \`main\` have identical trees."
echo "Skipping to avoid opening an empty promote PR."
} >> "$GITHUB_STEP_SUMMARY"
echo "::notice::auto-promote: staging tree == main tree — no code to promote, skipping"
echo "skip=true" >> "$GITHUB_OUTPUT"
else
echo "skip=false" >> "$GITHUB_OUTPUT"
fi
- name: Check combined status on staging head
if: steps.tree-diff.outputs.skip != 'true'
id: gates
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
HEAD_SHA: ${{ github.event.workflow_run.head_sha || github.sha }}
REPO: ${{ github.repository }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
run: |
set -euo pipefail
# Gitea-native combined-status endpoint aggregates every
# check context attached to a SHA. This is structurally
# cleaner than the GitHub-era per-workflow `gh run list`
# loop because:
#
# 1. There's no risk of "workflow name collision" (the
# GitHub-era code had to switch from `--workflow=NAME`
# to `--workflow=FILE.YML` to disambiguate "CodeQL"
# between the explicit workflow and GitHub's UI-
# configured default setup; Gitea has no such
# duplicate-name surface).
# 2. Gitea's combined state already encodes the AND
# across all contexts: success only if EVERY context
# is success. Pending or failure on any context
# produces non-success state.
#
# See https://docs.gitea.com/api/1.22 for the schema —
# `state` is one of: success, pending, failure, error.
echo "head_sha=${HEAD_SHA}" >> "$GITHUB_OUTPUT"
echo "Checking combined status on SHA ${HEAD_SHA}"
# `set +o pipefail` for the http-code capture pattern; restore
# immediately. Pattern hardened per `feedback_curl_status_capture_pollution`.
BODY_FILE=$(mktemp)
set +e
STATUS=$(curl -sS \
-H "Authorization: token ${GITEA_TOKEN}" \
-H "Accept: application/json" \
-o "${BODY_FILE}" \
-w "%{http_code}" \
"${GITEA_HOST}/api/v1/repos/${REPO}/commits/${HEAD_SHA}/status")
CURL_RC=$?
set -e
if [ "${CURL_RC}" -ne 0 ] || [ "${STATUS}" != "200" ]; then
echo "::error::combined-status fetch failed: curl=${CURL_RC} http=${STATUS}"
cat "${BODY_FILE}" | head -c 500 || true
rm -f "${BODY_FILE}"
echo "all_green=false" >> "$GITHUB_OUTPUT"
exit 0
fi
STATE=$(jq -r '.state // "missing"' < "${BODY_FILE}")
TOTAL=$(jq -r '.total_count // 0' < "${BODY_FILE}")
rm -f "${BODY_FILE}"
echo "Combined status: state=${STATE} total_count=${TOTAL}"
if [ "${STATE}" = "success" ] && [ "${TOTAL}" -gt 0 ]; then
echo "all_green=true" >> "$GITHUB_OUTPUT"
echo "::notice::All gates green on ${HEAD_SHA} (${TOTAL} contexts)"
else
echo "all_green=false" >> "$GITHUB_OUTPUT"
{
echo "## Not promoting — combined status not green"
echo
echo "- SHA: \`${HEAD_SHA:0:8}\`"
echo "- Combined state: \`${STATE}\`"
echo "- Context count: ${TOTAL}"
echo
echo "Will re-fire on the next gate completion. Investigate any red gate via the Actions UI."
} >> "$GITHUB_STEP_SUMMARY"
echo "::notice::auto-promote: combined status is ${STATE} on ${HEAD_SHA} — staying on current main"
fi
promote:
needs: check-all-gates-green
if: needs.check-all-gates-green.outputs.all_green == 'true'
runs-on: ubuntu-latest
steps:
- name: Check rollout gate
env:
AUTO_PROMOTE_ENABLED: ${{ vars.AUTO_PROMOTE_ENABLED }}
FORCE_INPUT: ${{ github.event.inputs.force }}
run: |
set -eu
# Repo variable AUTO_PROMOTE_ENABLED=true flips this on. While
# it's unset, the workflow dry-runs (logs what it would have
# done) but doesn't open the promote PR. Set the variable in
# Settings → Actions → Variables.
if [ "${AUTO_PROMOTE_ENABLED:-}" != "true" ] && [ "${FORCE_INPUT:-false}" != "true" ]; then
{
echo "## Auto-promote disabled"
echo
echo "Repo variable \`AUTO_PROMOTE_ENABLED\` is not set to \`true\`."
echo "All gates are green on staging; would have opened a promote PR to \`main\`."
echo
echo "To enable: Settings → Actions → Variables → \`AUTO_PROMOTE_ENABLED=true\`."
echo "To test once manually: workflow_dispatch with \`force=true\`."
} >> "$GITHUB_STEP_SUMMARY"
echo "::notice::auto-promote disabled — dry run only"
exit 0
fi
- name: Open or reuse promote PR + schedule auto-merge
if: ${{ vars.AUTO_PROMOTE_ENABLED == 'true' || github.event.inputs.force == 'true' }}
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
REPO: ${{ github.repository }}
TARGET_SHA: ${{ needs.check-all-gates-green.outputs.head_sha }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
run: |
set -euo pipefail
API="${GITEA_HOST}/api/v1/repos/${REPO}"
AUTH=(-H "Authorization: token ${GITEA_TOKEN}" -H "Accept: application/json")
# http_status_get RESULT_VAR URL
# Sets RESULT_VAR to "<http_code>:<body_file>". Curl status
# capture pattern per `feedback_curl_status_capture_pollution`:
# http_code goes to its own tempfile-equivalent (-w), body to
# another tempfile, set +e/-e bracket protects pipeline state.
http_get() {
local body_file="$1"; shift
local url="$1"; shift
set +e
local code
code=$(curl -sS "${AUTH[@]}" -o "${body_file}" -w "%{http_code}" "${url}")
local rc=$?
set -e
if [ "${rc}" -ne 0 ]; then
echo "::error::curl GET failed (rc=${rc}) on ${url}"
return 99
fi
echo "${code}"
}
http_post_json() {
local body_file="$1"; shift
local data="$1"; shift
local url="$1"; shift
set +e
local code
code=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X POST -d "${data}" -o "${body_file}" -w "%{http_code}" "${url}")
local rc=$?
set -e
if [ "${rc}" -ne 0 ]; then
echo "::error::curl POST failed (rc=${rc}) on ${url}"
return 99
fi
echo "${code}"
}
# Step 1: look for an existing open staging→main promote PR
# (idempotent on workflow re-run). Gitea doesn't have a
# head/base filter on the list endpoint that's as ergonomic
# as gh's, but the dedicated `/pulls/{base}/{head}` lookup
# works.
BODY=$(mktemp)
STATUS=$(http_get "${BODY}" "${API}/pulls/main/staging") || true
PR_NUM=""
if [ "${STATUS}" = "200" ]; then
STATE=$(jq -r '.state // "missing"' < "${BODY}")
if [ "${STATE}" = "open" ]; then
PR_NUM=$(jq -r '.number // ""' < "${BODY}")
echo "::notice::Re-using existing open promote PR #${PR_NUM}"
fi
fi
rm -f "${BODY}"
# Step 2: if no open PR, create one.
if [ -z "${PR_NUM}" ]; then
TITLE="staging → main: auto-promote ${TARGET_SHA:0:7}"
BODY_TEXT=$(cat <<EOFBODY
Automated promotion of \`staging\` (\`${TARGET_SHA:0:8}\`) to \`main\`. All required staging gates are green at this SHA (combined status reported success).
This PR is auto-generated by \`.github/workflows/auto-promote-staging.yml\` whenever every required gate completes green on the same staging SHA.
**Approval gate:** \`main\` branch protection requires 1 approval before this can land. Once approved, Gitea will auto-merge (the workflow scheduled \`merge_when_checks_succeed: true\` immediately after open).
The reverse-direction sync (the merge commit on \`main\` → \`staging\`) is handled automatically by \`auto-sync-main-to-staging.yml\` after this PR lands.
---
- Source: staging at \`${TARGET_SHA}\`
- Opened by: \`devops-engineer\` persona (anti-bot-ring; never founder PAT)
- Refs: #65, #73, #195
EOFBODY
)
REQ=$(jq -n \
--arg title "${TITLE}" \
--arg body "${BODY_TEXT}" \
--arg base "main" \
--arg head "staging" \
'{title:$title, body:$body, base:$base, head:$head}')
BODY=$(mktemp)
STATUS=$(http_post_json "${BODY}" "${REQ}" "${API}/pulls")
if [ "${STATUS}" = "201" ]; then
PR_NUM=$(jq -r '.number // ""' < "${BODY}")
echo "::notice::Opened promote PR #${PR_NUM}"
else
echo "::error::Failed to create promote PR: HTTP ${STATUS}"
jq -r '.message // .' < "${BODY}" | head -c 500
rm -f "${BODY}"
exit 1
fi
rm -f "${BODY}"
fi
# Step 3: schedule auto-merge. merge_when_checks_succeed
# tells Gitea to wait for both:
# - all required status checks to pass
# - the required-approvals gate (1 approval on main)
# before merging. On approval+green, Gitea merges within
# seconds. On any check failing or approval being denied,
# the schedule stays armed but doesn't fire.
#
# Idempotent: re-arming on an already-armed PR is a no-op.
REQ=$(jq -n '{Do:"merge", merge_when_checks_succeed:true}')
BODY=$(mktemp)
STATUS=$(http_post_json "${BODY}" "${REQ}" "${API}/pulls/${PR_NUM}/merge")
# Gitea returns:
# - 200/204 on successful immediate merge (gates already green AND approved)
# - 405 "Please try again later" when scheduled successfully but waiting
# - 422 on "Pull request is not mergeable" (conflict, stale base, etc.)
#
# 405 here is benign — Gitea's way of saying "scheduled, not merging now".
# We treat 200/204/405 as success, anything else as failure.
case "${STATUS}" in
200|204)
MERGE_OUTCOME="merged-immediately"
echo "::notice::Promote PR #${PR_NUM} merged immediately (gates+approval already green)"
;;
405)
MERGE_OUTCOME="auto-merge-scheduled"
echo "::notice::Promote PR #${PR_NUM}: auto-merge scheduled (Gitea will land on approval+green)"
;;
422)
MERGE_OUTCOME="not-mergeable"
echo "::warning::Promote PR #${PR_NUM}: not mergeable (conflict, stale base, or already merging)."
jq -r '.message // .' < "${BODY}" | head -c 500
;;
*)
echo "::error::Unexpected status ${STATUS} on merge schedule"
jq -r '.message // .' < "${BODY}" | head -c 500
rm -f "${BODY}"
exit 1
;;
esac
rm -f "${BODY}"
{
echo "## Auto-promote PR opened"
echo
echo "- Source: staging at \`${TARGET_SHA:0:8}\`"
echo "- PR: #${PR_NUM}"
echo "- Outcome: \`${MERGE_OUTCOME}\`"
echo
if [ "${MERGE_OUTCOME}" = "auto-merge-scheduled" ]; then
echo "Gitea will auto-merge once Hongming approves and all checks are green. No human action needed beyond approval."
elif [ "${MERGE_OUTCOME}" = "merged-immediately" ]; then
echo "Merged immediately. \`publish-workspace-server-image.yml\` will fire naturally on the resulting \`main\` push."
else
echo "PR is not auto-merging. Operator may need to bring staging up to date with main, then re-trigger this workflow via workflow_dispatch."
fi
} >> "$GITHUB_STEP_SUMMARY"

View File

@ -1,83 +0,0 @@
name: auto-promote-stale-alarm
# Hourly cron + on-demand alarm for the silent-block failure mode that
# motivated issue #2975:
# - The auto-promote-staging.yml workflow opened a PR + armed
# auto-merge, but main's branch protection requires a human review
# (reviewDecision=REVIEW_REQUIRED). The PR sat BLOCKED with no
# surface-up-the-stack for 12+ hours, holding 25 commits hostage
# including the Memory v2 redesign and a reno-stars data-loss fix.
#
# This workflow runs `scripts/check-stale-promote-pr.sh` against the
# repo's open auto-promote PRs (base=main head=staging). When a PR has
# been BLOCKED on REVIEW_REQUIRED for >4h, it:
# 1. Emits a workflow-level warning (visible in run summary + the
# Actions UI feed).
# 2. Posts a comment on the PR (idempotent — one alarm per PR).
#
# The detection logic lives in scripts/check-stale-promote-pr.sh so
# it's unit-testable with stubbed `gh` (see test-check-stale-promote-pr.sh).
# This file is the schedule + invocation surface only — SSOT for the
# detector itself.
on:
schedule:
# Hourly. Cheap (one `gh pr list` + jq), and 1h granularity is
# plenty for a 4h staleness threshold — operators see the alarm
# within at most 1h of crossing the threshold.
- cron: "27 * * * *" # at :27 to dodge the cron herd at :00
workflow_dispatch:
inputs:
stale_hours:
description: "Hours after which a BLOCKED+REVIEW_REQUIRED PR is stale (default 4)"
required: false
default: "4"
post_comment:
description: "Post a comment on stale PRs (default true)"
required: false
default: "true"
permissions:
contents: read
pull-requests: write # post comments on stale PRs
# Serialize so the on-demand and scheduled runs don't double-comment
# the same PR. cancel-in-progress=false because the script is idempotent
# (existing comment marker prevents dupes), but a scheduled run firing
# while a manual one runs would just re-list the same PR set.
concurrency:
group: auto-promote-stale-alarm
cancel-in-progress: false
jobs:
scan:
runs-on: ubuntu-latest
steps:
- name: Checkout (need scripts/ only)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
sparse-checkout: |
scripts/check-stale-promote-pr.sh
sparse-checkout-cone-mode: false
- name: Run stale-PR detector
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_REPOSITORY: ${{ github.repository }}
STALE_HOURS: ${{ inputs.stale_hours || '4' }}
POST_COMMENT: ${{ inputs.post_comment || 'true' }}
run: |
# The script's exit code reflects the count of stale PRs.
# We don't want a stale finding to fail the workflow run —
# the warning + comment are the signal, the green/red is
# noise. So convert any non-zero exit to a workflow notice
# and exit 0.
set +e
bash scripts/check-stale-promote-pr.sh
rc=$?
set -e
if [ "$rc" -ne 0 ]; then
echo "::notice::Stale PR detector found $rc PR(s) needing attention. See warnings above + comments on the PRs."
fi
# Always succeed — operator-facing surface is the warning,
# not the workflow status.
exit 0

View File

@ -1,404 +0,0 @@
name: Auto-sync canary — AUTO_SYNC_TOKEN rotation drift
# Synthetic health check for the AUTO_SYNC_TOKEN secret consumed by
# auto-sync-main-to-staging.yml (PR #66) and publish-workspace-server-image.yml.
#
# ============================================================
# Why this workflow exists
# ============================================================
#
# PR #66 fixed auto-sync (replaced GitHub-era `gh pr create` — which
# 405s on Gitea's GraphQL endpoint — with a direct git push from the
# `devops-engineer` persona's `AUTO_SYNC_TOKEN`). Hostile self-review
# weakest spot #3 of that PR:
#
# "Token rotation silently breaks auto-sync. If AUTO_SYNC_TOKEN is
# rotated without updating the repo secret, every push to main
# fails red on the auto-sync push step. The workflow surfaces the
# failure mode in the step summary (failure mode B in the header),
# but there's no proactive monitoring."
#
# Detection latency under the status quo: rotation is only caught on
# the next push to `main`. During quiet periods (no main push for
# many hours) the staging-superset-of-main invariant silently breaks.
#
# This workflow closes the gap: every 6 hours, it fires the auth
# surface that auto-sync depends on and emits a red workflow status
# if AUTO_SYNC_TOKEN has drifted out of validity.
#
# ============================================================
# What this checks (Option B — read-only verify)
# ============================================================
#
# 1. `GET /api/v1/user` against Gitea with the token → validates the
# token authenticates AND resolves to `devops-engineer` (catches
# the case where the token was regenerated under a different
# persona by mistake).
# 2. `GET /api/v1/repos/molecule-ai/molecule-core` with the token →
# validates the token has `read:repository` scope on this repo
# (the v2 scope contract — see saved memory
# `reference_persona_token_v2_scope`).
# 3. `git push --dry-run` of the current staging SHA back to
# `refs/heads/staging` via `https://oauth2:<token>@<gitea>/...`
# → validates the EXACT HTTPS basic-auth path that
# `actions/checkout` + `git push origin staging` use inside
# auto-sync-main-to-staging.yml. NOP by construction (push the
# current tip to itself = "Everything up-to-date"); auth is
# checked at the smart-protocol handshake BEFORE the empty-diff
# computation, so bad token → exit 128 with "Authentication
# failed". `git ls-remote` is NOT used here because Gitea
# falls back to anonymous read on public repos and would
# silently green-light a rotated token.
#
# Each step exits non-zero with an actionable error message if it
# fails. The workflow status itself is the operator-facing surface.
#
# ============================================================
# What this does NOT check (intentional)
# ============================================================
#
# - **Branch-protection authz** (failure mode C in auto-sync header):
# would require an actual write to staging. Already monitored by
# `branch-protection-drift.yml` daily. Don't duplicate.
# - **Conflict resolution** (failure mode A): a real conflict is data-
# driven, not auth-driven; can't synthesise it without polluting
# staging. Already surfaces immediately on the next main push.
# - **Concurrency** (failure mode D): handled by workflow concurrency
# group on auto-sync, not a credential issue.
#
# ============================================================
# Why Option B (read-only) and not the alternatives
# ============================================================
#
# Considered + rejected (see issue #72 for full write-up):
#
# - **Option A — full auto-sync on schedule**: every run creates a
# no-op merge commit on staging when main hasn't advanced. 4 noise
# commits/day. And races the real `push:` trigger when main has
# advanced. Rejected.
#
# - **Option C — push to dedicated `auto-sync-canary` branch**: would
# exercise authz too, but adds branch noise on Gitea AND requires
# maintaining a second branch protection (or expanding staging's
# whitelist to a junk branch). Authz already covered by
# `branch-protection-drift.yml`. Rejected.
#
# Prior art for the chosen Option B shape:
# - Cloudflare's `/user/tokens/verify` endpoint (read-only auth
# probe explicitly designed for credential canaries).
# - AWS Secrets Manager rotation Lambda's `testSecret` step (auth
# probe before promoting AWSPENDING → AWSCURRENT).
# - HashiCorp Vault's `vault token lookup` for renewal canaries.
#
# ============================================================
# Operator runbook — what to do when this workflow goes RED
# ============================================================
#
# 1. **Identify which step failed**:
# - Step "Verify token authenticates as devops-engineer" red →
# token is invalid OR resolves to wrong persona.
# - Step "Verify token has repo read scope" red → token valid but
# stripped of `read:repository` scope (or repo perms changed).
# - Step "Verify git HTTPS auth path via no-op dry-run push to
# staging" red → token rotated/revoked OR Gitea git-HTTPS
# surface is broken (rare). Auth check happens on the
# smart-protocol handshake, separate from the API path.
#
# 2. **Re-issue the token** on the operator host:
# ```
# ssh root@5.78.80.188 'docker exec --user git molecule-gitea-1 \
# gitea admin user generate-access-token \
# --username devops-engineer \
# --token-name persona-devops-engineer-vN \
# --scopes "read:repository,write:repository,read:user,read:organization,read:issue,write:issue,read:notification,read:misc"'
# ```
# Update `/etc/molecule-bootstrap/agent-secrets.env` in place
# (per `feedback_unified_credentials_file`). The previous token
# file lands at `.bak.<date>`.
#
# 3. **Update the repo Actions secret** at:
# Settings → Secrets and variables → Actions → AUTO_SYNC_TOKEN
# Paste the new token. (Don't echo it in chat — but per
# `feedback_passwords_in_chat_are_burned`, a paste in a 1:1
# Claude session is within trust boundary.)
#
# 4. **Re-run this canary** via workflow_dispatch. Confirm GREEN.
#
# 5. **Backfill any missed main → staging syncs** by re-running
# `auto-sync-main-to-staging.yml` from its workflow_dispatch
# surface, OR by pushing an empty commit to main (if you'd
# rather force a real trigger).
#
# ============================================================
# Security notes
# ============================================================
#
# - Token usage: read-only (`GET /api/v1/user`, `GET /api/v1/repos/...`,
# `git ls-remote`). No write paths. Same blast-radius profile as
# `actions/checkout` on a public repo.
# - The token NEVER appears in logs: every `curl` uses a header
# variable, never inline; the `git ls-remote` URL builds the
# `oauth2:$TOKEN@host` form into a single env var that's not
# echoed. GitHub Actions secret-masking covers anything that does
# slip through.
# - No new token introduced — same `AUTO_SYNC_TOKEN` the workflow
# under monitor uses. Per least-privilege we deliberately do NOT
# broaden scope for the canary.
on:
schedule:
# Every 6 hours at :17 (offsets the cron herd at :00). Justification
# from issue #72: cheap to run (~5s wall-clock, no quota), 3h average
# detection latency, 6h max. 1h would be 24× the runs for marginal
# benefit; daily would be 6× longer latency and worse than status
# quo on a quiet-main day.
- cron: '17 */6 * * *'
workflow_dispatch:
# No concurrency group needed — the canary is read-only and idempotent.
# Two parallel runs (e.g. operator dispatch during a scheduled tick) are
# harmless: same result, doubled HTTPS calls, no shared state.
permissions:
contents: read
jobs:
verify-token:
name: Verify AUTO_SYNC_TOKEN validity
runs-on: ubuntu-latest
# 2 min surfaces hangs (Gitea API stall, DNS issue) within one
# cron interval. Realistic worst case is ~10s: 2 curls + 1 git
# ls-remote, each capped by the explicit timeouts below.
timeout-minutes: 2
env:
# Pinned in env so individual steps can read it without
# repeating the secret reference. GitHub masks the value in
# logs automatically.
AUTO_SYNC_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
# MUST stay in sync with auto-sync-main-to-staging.yml's
# `git config user.name "devops-engineer"` line. Renaming the
# devops-engineer persona requires updating both files (and
# the staging branch protection's `push_whitelist_usernames`).
EXPECTED_PERSONA: devops-engineer
GITEA_HOST: git.moleculesai.app
REPO_PATH: molecule-ai/molecule-core
steps:
- name: Verify AUTO_SYNC_TOKEN secret is configured
# Schedule-vs-dispatch behaviour split, per
# `feedback_schedule_vs_dispatch_secrets_hardening`:
#
# - schedule: hard-fail when the secret is missing. The
# whole point of the canary is to surface drift; soft-
# skipping on missing-secret would make the canary
# itself drift-invisible (sweep-cf-orphans #2088 lesson).
# - workflow_dispatch: hard-fail too — there's no scenario
# where an operator wants this canary to silently no-op.
# The workflow has no other ad-hoc utility; if you ran
# it, you wanted the answer.
run: |
if [ -z "${AUTO_SYNC_TOKEN}" ]; then
echo "::error::AUTO_SYNC_TOKEN secret is not set on this repo." >&2
echo "::error::Set it at Settings → Secrets and variables → Actions." >&2
echo "::error::Without it, auto-sync-main-to-staging.yml will fail every push to main." >&2
exit 1
fi
echo "AUTO_SYNC_TOKEN is configured (value masked)."
- name: Verify token authenticates as ${{ env.EXPECTED_PERSONA }}
# Calls Gitea's `/api/v1/user` — the canonical
# auth-probe-with-no-side-effects endpoint (mirrors
# Cloudflare's /user/tokens/verify).
#
# Failure surfaces:
# - HTTP 401: token invalid (rotated, revoked, or never
# correctly registered).
# - HTTP 200 but username != devops-engineer: token was
# regenerated under the wrong persona — this would let
# auth pass but commit attribution would be wrong, and
# branch-protection authz would fail because only
# `devops-engineer` is whitelisted.
run: |
set -euo pipefail
response_file="$(mktemp)"
code_file="$(mktemp)"
# `--max-time 30`: full call ceiling. `--connect-timeout 10`:
# DNS + TCP. `-w "%{http_code}"` routed to a tempfile so curl's
# exit code can't pollute the captured status — see
# feedback_curl_status_capture_pollution + the
# `lint-curl-status-capture.yml` gate that rejects the unsafe
# `$(curl ... || echo "000")` shape.
set +e
curl -sS -o "$response_file" \
--max-time 30 --connect-timeout 10 \
-w "%{http_code}" \
-H "Authorization: token ${AUTO_SYNC_TOKEN}" \
-H "Accept: application/json" \
"https://${GITEA_HOST}/api/v1/user" >"$code_file" 2>/dev/null
set -e
status=$(cat "$code_file" 2>/dev/null || true)
[ -z "$status" ] && status="000"
if [ "$status" != "200" ]; then
echo "::error::Token rotation suspected: GET /api/v1/user returned HTTP $status (expected 200)." >&2
echo "::error::Likely cause: AUTO_SYNC_TOKEN has been rotated/revoked on Gitea but the repo Actions secret was not updated." >&2
echo "::error::Runbook: see header comment of this workflow file." >&2
# Print response body but redact anything that looks like a token.
sed -E 's/[A-Fa-f0-9]{32,}/<redacted>/g' "$response_file" >&2 || true
exit 1
fi
username=$(python3 -c "import json,sys; print(json.load(open(sys.argv[1])).get('login',''))" "$response_file")
if [ "$username" != "${EXPECTED_PERSONA}" ]; then
echo "::error::Token resolves to user '$username', expected '${EXPECTED_PERSONA}'." >&2
echo "::error::AUTO_SYNC_TOKEN must be the devops-engineer persona PAT (not founder PAT, not another persona)." >&2
echo "::error::Auto-sync push will fail because only 'devops-engineer' is whitelisted on staging branch protection." >&2
exit 1
fi
echo "Token authenticates as: $username ✓"
- name: Verify token has repo read scope
# `GET /api/v1/repos/<owner>/<repo>` requires `read:repository`
# on the persona's v2 scope contract. If the scope was
# narrowed/dropped on rotation we catch it here, before the
# next main push reveals it via a checkout failure.
run: |
set -euo pipefail
response_file="$(mktemp)"
code_file="$(mktemp)"
# See first probe step for the rationale on the tempfile-routed
# `-w "%{http_code}"` pattern — the unsafe `|| echo "000"` shape
# is rejected by lint-curl-status-capture.yml.
set +e
curl -sS -o "$response_file" \
--max-time 30 --connect-timeout 10 \
-w "%{http_code}" \
-H "Authorization: token ${AUTO_SYNC_TOKEN}" \
-H "Accept: application/json" \
"https://${GITEA_HOST}/api/v1/repos/${REPO_PATH}" >"$code_file" 2>/dev/null
set -e
status=$(cat "$code_file" 2>/dev/null || true)
[ -z "$status" ] && status="000"
if [ "$status" != "200" ]; then
echo "::error::Token lacks read:repository scope on ${REPO_PATH}: HTTP $status." >&2
echo "::error::Auto-sync's actions/checkout step will fail with this token." >&2
echo "::error::Re-issue with v2 scope contract: read:repository,write:repository,read:user,read:organization,read:issue,write:issue,read:notification,read:misc" >&2
sed -E 's/[A-Fa-f0-9]{32,}/<redacted>/g' "$response_file" >&2 || true
exit 1
fi
echo "Token has read:repository on ${REPO_PATH} ✓"
- name: Verify git HTTPS auth path via no-op dry-run push to staging
# Final probe: exercise the EXACT auth path that
# `actions/checkout` + `git push origin staging` use in
# auto-sync-main-to-staging.yml. Gitea's API and git-HTTPS
# surfaces share the token-lookup code path internally but
# the wire-level error shapes differ — historically (#173)
# the API path was healthy while git-HTTPS rejected, so
# checking only the API would have given false-green.
#
# IMPORTANT: `git ls-remote` on a public repo (which
# molecule-core is) succeeds even with a junk token because
# Gitea falls back to anonymous-read. `ls-remote` therefore
# CANNOT validate auth on this surface. We use
# `git push --dry-run` instead — push is auth-gated even on
# public repos.
#
# NOP shape: read the current staging SHA via authenticated
# ls-remote (the SHA itself is public; auth is incidental
# here, used only to colocate the discovery in one step), then
# `git push --dry-run <SHA>:refs/heads/staging`. Pushing the
# current tip back to itself is "Everything up-to-date" with
# exit 0 when auth succeeds. With a bad token Gitea returns
# HTTP 401 in the smart-protocol handshake and git exits 128
# with "Authentication failed".
#
# The dry-run never reaches Gitea's pre-receive hook (which
# is where branch-protection authz runs), so this probe does
# not validate failure mode C. That's intentional —
# branch-protection-drift.yml owns authz monitoring; this
# canary owns auth.
env:
# Don't hang waiting for password prompt if auth fails on a
# terminal-attached run. (In Actions there's no terminal,
# but the env-var hardens against an interactive runner
# config.)
GIT_TERMINAL_PROMPT: "0"
run: |
set -euo pipefail
# Token is in $AUTO_SYNC_TOKEN (job-level env). Compose the
# URL as a local var that's never echoed.
url="https://oauth2:${AUTO_SYNC_TOKEN}@${GITEA_HOST}/${REPO_PATH}"
# Step a: read current staging SHA. ~1KB; auth-gated only
# on private repos but always works on public — used here
# only to discover the SHA, not to validate auth.
staging_ref=$(timeout 30s git ls-remote --refs "$url" refs/heads/staging 2>&1) || {
redacted=$(echo "$staging_ref" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g")
echo "::error::ls-remote against staging failed (network/DNS issue):" >&2
echo "$redacted" >&2
exit 1
}
if ! echo "$staging_ref" | grep -qE '^[0-9a-f]{40}[[:space:]]+refs/heads/staging$'; then
echo "::error::ls-remote returned unexpected shape:" >&2
echo "$staging_ref" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g" >&2
exit 1
fi
staging_sha=$(echo "$staging_ref" | awk '{print $1}')
# Step b: spin up an ephemeral local repo. `git push` always
# requires a local repo even when pushing a remote SHA that
# isn't in the local object DB (the protocol negotiates and
# discovers we don't need to send any objects). We don't use
# `actions/checkout` for this — it would clone the whole
# repo (~hundreds of MB) for what's essentially `git init`.
tmp_repo="$(mktemp -d)"
trap 'rm -rf "$tmp_repo"' EXIT
git -C "$tmp_repo" init -q
# Author config required for any git operation; values are
# arbitrary because nothing gets committed here.
git -C "$tmp_repo" config user.email canary@auto-sync.local
git -C "$tmp_repo" config user.name auto-sync-canary
# Step c: dry-run push the current staging SHA back to
# staging. NOP by construction — the remote tip equals the
# SHA we're pushing, so "Everything up-to-date" is the
# success path.
#
# Authentication is checked at the smart-protocol handshake,
# BEFORE the dry-run can compute an empty diff. Bad token
# → "Authentication failed", exit 128. Good token → exit 0.
set +e
push_out=$(timeout 30s git -C "$tmp_repo" push --dry-run "$url" "${staging_sha}:refs/heads/staging" 2>&1)
push_rc=$?
set -e
if [ "$push_rc" -ne 0 ]; then
redacted=$(echo "$push_out" | sed -E "s|oauth2:[^@]+@|oauth2:<redacted>@|g")
echo "::error::Token rotation suspected: git push --dry-run against staging failed via the AUTO_SYNC_TOKEN HTTPS auth path (exit $push_rc)." >&2
echo "::error::This is the EXACT auth path that actions/checkout + git push use in auto-sync-main-to-staging.yml." >&2
echo "::error::Likely cause: AUTO_SYNC_TOKEN was rotated/revoked on Gitea but the repo Actions secret was not updated. Runbook: see header." >&2
echo "$redacted" >&2
exit 1
fi
echo "git HTTPS auth path: NOP push --dry-run to staging → ${staging_sha:0:8} ✓"
- name: Summarise canary result
# Everything passed — surface a green summary. (Failures
# already wrote ::error:: lines and exited above; if we got
# here, all three probes passed.)
run: |
{
echo "## Auto-sync canary: GREEN"
echo ""
echo "AUTO_SYNC_TOKEN is healthy:"
echo "- Authenticates as \`${EXPECTED_PERSONA}\` ✓"
echo "- Has \`read:repository\` scope on \`${REPO_PATH}\` ✓"
echo "- Git HTTPS auth path: no-op dry-run push to \`refs/heads/staging\` succeeds ✓"
echo ""
echo "Auto-sync main → staging will succeed on the next push to main."
echo "If this canary ever goes RED, see the runbook in this workflow's header."
} >> "$GITHUB_STEP_SUMMARY"

View File

@ -1,255 +0,0 @@
name: Auto-sync main → staging
# Reflects every push to `main` back onto `staging` so the
# staging-as-superset-of-main invariant holds.
#
# ============================================================
# What this workflow does
# ============================================================
#
# On every push to `main`:
# 1. Checks if staging already contains main → no-op.
# 2. Fetches both branches, merges main into staging in the
# runner workspace (fast-forward if possible, else
# `--no-ff` merge commit).
# 3. Pushes staging directly to origin via the
# `devops-engineer` persona's `AUTO_SYNC_TOKEN`.
#
# Authoritative path: a single `git push origin staging` from
# inside this workflow is the SSOT for advancing staging after
# a main push. No PR, no merge queue, no human approval —
# staging is mechanically maintained as a superset of main.
#
# `auto-promote-staging.yml` is the reverse-direction
# counterpart (staging → main, gated on green CI). Together
# they keep the staging-superset-of-main invariant tight.
#
# ============================================================
# Why direct push (and not "open a PR")
# ============================================================
#
# Pre-2026-05-06 the canonical SCM was GitHub.com, where:
# - The `staging` branch had a `merge_queue` ruleset that
# blocked ALL direct pushes (no bypass even for org
# admins or the GitHub Actions integration).
# - Therefore this workflow opened a PR via `gh pr create`
# and let auto-merge land it through the queue.
#
# Post-2026-05-06 the canonical SCM is Gitea
# (`git.moleculesai.app/molecule-ai/molecule-core`). Gitea:
# - Has no `merge_queue` concept.
# - Allows direct push to protected branches via per-user
# `push_whitelist_usernames` on the branch protection.
# - Does not expose a GraphQL endpoint, so `gh pr create`
# returns `HTTP 405 Method Not Allowed
# (https://git.moleculesai.app/api/graphql)` — the
# pre-suspension architecture cannot work on Gitea.
#
# The molecule-ai/molecule-core staging branch protection
# (verified via `GET /api/v1/repos/.../branch_protections`)
# whitelists `devops-engineer` for direct push. So the
# correct Gitea-shape architecture is: authenticate as
# `devops-engineer`, merge locally, push staging directly.
#
# This is structurally simpler than the GitHub-era PR dance
# and removes the dependence on `gh` CLI / GraphQL entirely.
#
# ============================================================
# Identity + token (anti-bot-ring per saved-memory
# `feedback_per_agent_gitea_identity_default`)
# ============================================================
#
# This workflow uses `secrets.AUTO_SYNC_TOKEN`, which is a
# personal access token issued to the `devops-engineer`
# persona on Gitea — NOT the founder PAT. The bot-ring
# fingerprint that triggered the GitHub org suspension on
# 2026-05-06 was characterised by founder PAT acting as CI
# at machine speed; per-persona identities split the
# attribution honestly.
#
# Token scope on Gitea: repo write. Push target restricted
# to `staging` (this workflow is the only writer; main is
# untouched). Compromise blast radius: bounded to staging
# branch + this repo's read surface.
#
# Commits are authored by the persona email
# `devops-engineer@agents.moleculesai.app` so commit history
# reflects which automation produced the merge.
#
# ============================================================
# Failure modes & operational notes
# ============================================================
#
# A — staging has commits main doesn't, and the merge
# conflicts:
# - The `--no-ff` merge step exits non-zero. Workflow
# fails red. Operator (devops-engineer or human)
# resolves manually:
# git fetch origin
# git checkout staging
# git merge --no-ff origin/main
# # resolve conflicts
# git push origin staging
# - Step summary surfaces the conflict so the failed run
# is self-explanatory.
#
# B — `AUTO_SYNC_TOKEN` rotated / wrong scope:
# - `git push` step exits non-zero with `HTTP 401` /
# `403`. Step summary surfaces the failed push.
# - Re-issue the token from `~/.molecule-ai/personas/`
# on the operator host and update the repo Actions
# secret. Re-run the workflow.
#
# C — staging branch protection no longer whitelists
# `devops-engineer`:
# - `git push` exits non-zero with a Gitea protected-
# branch rejection. Step summary surfaces it.
# - Re-add `devops-engineer` to
# `push_whitelist_usernames` on the staging
# protection (Settings → Branches → staging).
#
# D — concurrent push to main while a sync is in flight:
# - The `concurrency` group below serialises runs.
# The second waits for the first; if main advances
# again while we're syncing, the second run picks
# up the new tip on its own fetch.
#
# ============================================================
# Loop safety
# ============================================================
#
# The push to staging from this workflow does NOT itself
# fire a `push: branches: [main]` event (different branch),
# so there's no risk of self-recursion. `auto-promote-staging.yml`
# fires on `workflow_run` of CI etc. — it sees the new
# staging tip on its next gate-completion event, NOT on this
# push directly. No loop.
on:
push:
branches: [main]
# workflow_dispatch lets operators manually backfill a
# missed sync (e.g. if AUTO_SYNC_TOKEN was rotated and a
# main push slipped through while the secret was stale).
workflow_dispatch:
permissions:
contents: write
concurrency:
group: auto-sync-main-to-staging
cancel-in-progress: false
jobs:
sync-staging:
runs-on: ubuntu-latest
steps:
- name: Checkout staging (with devops-engineer push token)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
ref: staging
# AUTO_SYNC_TOKEN authenticates as the
# `devops-engineer` Gitea persona — the only
# identity whitelisted for direct push to
# staging. See header comment for context.
token: ${{ secrets.AUTO_SYNC_TOKEN }}
- name: Configure git author
run: |
# Per-persona identity, NOT founder PAT.
# `feedback_per_agent_gitea_identity_default`.
git config user.name "devops-engineer"
git config user.email "devops-engineer@agents.moleculesai.app"
- name: Check if staging already contains main
id: check
run: |
set -euo pipefail
git fetch origin main
if git merge-base --is-ancestor origin/main HEAD; then
echo "needs_sync=false" >> "$GITHUB_OUTPUT"
{
echo "## No-op"
echo
echo "staging already contains \`origin/main\` ($(git rev-parse --short=8 origin/main))."
} >> "$GITHUB_STEP_SUMMARY"
else
echo "needs_sync=true" >> "$GITHUB_OUTPUT"
MAIN_SHORT=$(git rev-parse --short=8 origin/main)
echo "main_short=${MAIN_SHORT}" >> "$GITHUB_OUTPUT"
echo "::notice::staging is missing main's tip (${MAIN_SHORT}) — merging in-runner and pushing"
fi
- name: Merge main into staging (in-runner)
if: steps.check.outputs.needs_sync == 'true'
id: merge
run: |
set -euo pipefail
# Already on staging from checkout. Try fast-forward
# first (cleanest history); fall back to merge commit
# if staging has commits main doesn't.
if git merge --ff-only origin/main; then
echo "did_ff=true" >> "$GITHUB_OUTPUT"
echo "::notice::Fast-forwarded staging to origin/main"
else
echo "did_ff=false" >> "$GITHUB_OUTPUT"
if ! git merge --no-ff origin/main \
-m "chore: sync main → staging (auto, ${{ steps.check.outputs.main_short }})"; then
# Hygiene: leave the work tree clean before failing.
git merge --abort || true
{
echo "## Conflict"
echo
echo "Auto-merge \`main → staging\` failed with conflicts."
echo "A human (or devops-engineer persona) needs to resolve manually:"
echo
echo '```'
echo "git fetch origin"
echo "git checkout staging"
echo "git merge --no-ff origin/main"
echo "# resolve conflicts"
echo "git push origin staging"
echo '```'
} >> "$GITHUB_STEP_SUMMARY"
exit 1
fi
fi
- name: Push staging to origin
if: steps.check.outputs.needs_sync == 'true'
run: |
set -euo pipefail
# Direct push to staging. devops-engineer persona is
# whitelisted for direct push on the staging branch
# protection (Settings → Branches → staging).
#
# No --force / --force-with-lease: a fast-forward or
# legitimate merge commit on top of current staging
# is the only thing we'd ever push. If origin/staging
# advanced under us (concurrent merge), the push
# legitimately rejects and the next run picks up the
# new state.
if ! git push origin staging; then
{
echo "## Push rejected"
echo
echo "Direct push to \`staging\` failed. Likely causes:"
echo "- \`AUTO_SYNC_TOKEN\` rotated / wrong scope (HTTP 401/403)"
echo "- \`devops-engineer\` no longer in"
echo " \`push_whitelist_usernames\` on the staging"
echo " branch protection (HTTP 422)"
echo "- staging advanced concurrently — re-running this"
echo " workflow on the new main tip will pick it up"
} >> "$GITHUB_STEP_SUMMARY"
exit 1
fi
{
echo "## Auto-sync succeeded"
echo
echo "- staging advanced to: \`$(git rev-parse --short=8 HEAD)\`"
echo "- main tip: \`${{ steps.check.outputs.main_short }}\`"
echo "- Strategy: $([ "${{ steps.merge.outputs.did_ff }}" = "true" ] && echo "fast-forward" || echo "merge commit")"
echo "- Pushed by: \`devops-engineer\` (per-agent persona, anti-bot-ring)"
} >> "$GITHUB_STEP_SUMMARY"

View File

@ -1,276 +0,0 @@
name: Retarget main PRs to staging
# Mechanical enforcement of SHARED_RULES rule 8 ("Staging-first
# workflow, no exceptions"). When a bot opens a PR against `main`,
# retarget it to `staging` automatically and leave an explanatory
# comment. Human / CEO-authored PRs (the staging→main promotion
# PRs, etc.) are left alone — they're the authorised exception
# to the rule.
#
# ============================================================
# What this workflow does
# ============================================================
#
# On `pull_request_target` opened/reopened against `main`:
# 1. If the PR head is `staging`, skip (the auto-promote PRs
# MUST stay base=main).
# 2. If the PR author is a bot, retarget the PR base to
# `staging` via Gitea REST `PATCH /pulls/{N}` body
# `{"base":"staging"}`.
# 3. If the retarget returns 422 "pull request already exists
# for base branch 'staging'" (issue #1884 case: another PR
# on the same head already targets staging), close the
# now-redundant main-PR via Gitea REST instead of failing
# red.
# 4. Post an explainer comment on the retargeted PR via
# Gitea REST `POST /issues/{N}/comments`.
#
# ============================================================
# Why Gitea REST (and not `gh api / gh pr close / gh pr comment`)
# ============================================================
#
# Pre-2026-05-06 this workflow used `gh api -X PATCH "repos/{owner}/{repo}/pulls/{N}" -f base=staging`
# plus `gh pr close` and `gh pr comment`. After the GitHub→Gitea
# cutover those calls fail because:
#
# - `gh` CLI defaults to `api.github.com`. Even with `GH_HOST`
# pointing at Gitea, `gh pr close / comment` route through
# GraphQL (`/api/graphql`) which Gitea does not expose.
# Empirical: every `gh pr *` call returns
# `HTTP 405 Method Not Allowed (https://git.moleculesai.app/api/graphql)`
# — same root cause as #65 (auto-sync, fixed in PR #66) and
# #73/#195 (auto-promote, fixed in PR #78).
# - `gh api -X PATCH /pulls/{N}` happens to use a REST path
# that Gitea also has, but the `gh` host-resolution layer
# and pagination/retry logic don't always hit Gitea cleanly,
# and the cost of switching to direct `curl` is one extra
# line of code.
#
# So this workflow uses direct `curl` calls to Gitea REST. No
# `gh` CLI dependency, no GraphQL, no flaky host-resolution.
#
# ============================================================
# Identity + token (anti-bot-ring per saved-memory
# `feedback_per_agent_gitea_identity_default`)
# ============================================================
#
# Pre-fix this workflow used the per-job ephemeral
# `secrets.GITHUB_TOKEN`. On Gitea Actions that token has
# narrow scope and unpredictable cross-PR write capability.
#
# Post-fix: `secrets.AUTO_SYNC_TOKEN` (the `devops-engineer`
# Gitea persona). Same persona used by `auto-sync-main-to-staging.yml`
# (PR #66) and `auto-promote-staging.yml` (PR #78). Token scope:
# `push: true` repo write, sufficient for PR-edit + close + comment.
#
# Why this token does NOT need branch-protection bypass:
# patching a PR's base ref is a PR-level operation that does not
# require push perms on either branch (the PR's own commits stay
# put; only the metadata changes).
#
# ============================================================
# Failure modes & operational notes
# ============================================================
#
# A — PATCH base→staging returns 422 "pull request already exists"
# (issue #1884 case):
# - Detected by string-match on response body. Workflow
# falls through to closing the now-redundant main-PR
# (Gitea REST `PATCH /pulls/{N}` with `state: closed`)
# and posts an explanation comment. Step summary surfaces.
#
# B — `AUTO_SYNC_TOKEN` rotated / wrong scope:
# - First REST call returns 401/403. Step summary surfaces.
# Re-issue token from `~/.molecule-ai/personas/` on the
# operator host and update repo Actions secret.
#
# C — PR was deleted between trigger and run:
# - REST call returns 404. Workflow exits 0 with a notice
# (the rule was already enforced or the PR is gone).
#
# D — author is not actually a bot but the filter mis-fires:
# - Filter is conservative: only triggers on
# `user.type == 'Bot'`, `login` ends with `[bot]`, or
# known bot logins (`molecule-ai[bot]`, `app/molecule-ai`).
# Human PRs slip through unaffected. If a NEW bot login
# starts shipping main-PRs, add it to the filter.
on:
pull_request_target:
types: [opened, reopened]
branches: [main]
permissions:
pull-requests: write
jobs:
retarget:
name: Retarget to staging
runs-on: ubuntu-latest
# Only fire for bot-authored PRs. Human CEO PRs (staging→main
# promotion) are intentional and pass through.
#
# Head-ref guard: never retarget a PR whose head IS `staging`
# — those are the auto-promote staging→main PRs (opened by
# `devops-engineer` since PR #78 / #195 fix). Retargeting
# head=staging onto base=staging fails with HTTP 422 "no new
# commits between base 'staging' and head 'staging'", which
# would surface as a noisy red workflow run on every
# auto-promote (caught 2026-05-03 on the GitHub-era PR #2588).
if: >-
github.event.pull_request.head.ref != 'staging'
&& (
github.event.pull_request.user.type == 'Bot'
|| endsWith(github.event.pull_request.user.login, '[bot]')
|| github.event.pull_request.user.login == 'app/molecule-ai'
|| github.event.pull_request.user.login == 'molecule-ai[bot]'
|| github.event.pull_request.user.login == 'devops-engineer'
)
steps:
- name: Retarget PR base to staging via Gitea REST
id: retarget
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
PR_AUTHOR: ${{ github.event.pull_request.user.login }}
# Issue #1884 case: when the bot opens a PR against main
# and there's already another PR on the same head branch
# targeting staging, Gitea's PATCH returns 422 with a
# body mentioning "pull request already exists for base
# branch 'staging'" (the Gitea message wording is
# slightly different from GitHub's; the substring match
# below covers both for forward/back compat).
# The retarget can't proceed — but the right response is
# to close the now-redundant main-PR, not to fail the
# workflow noisily. Detect that specific 422 and close
# instead.
run: |
set -euo pipefail
API="${GITEA_HOST}/api/v1/repos/${REPO}"
AUTH=(-H "Authorization: token ${GITEA_TOKEN}" -H "Accept: application/json")
echo "Retargeting PR #${PR_NUMBER} (author: ${PR_AUTHOR}) from main → staging"
# Curl-status-capture pattern per `feedback_curl_status_capture_pollution`:
# http_code via -w to its own scalar, body to a tempfile, set +e/-e
# bracket so curl's non-zero-on-4xx doesn't pollute the script's exit chain.
BODY_FILE=$(mktemp)
REQ='{"base":"staging"}'
set +e
STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X PATCH -d "${REQ}" \
-o "${BODY_FILE}" -w "%{http_code}" \
"${API}/pulls/${PR_NUMBER}")
CURL_RC=$?
set -e
if [ "${CURL_RC}" -ne 0 ]; then
echo "::error::curl PATCH failed (rc=${CURL_RC})"
rm -f "${BODY_FILE}"
exit 1
fi
if [ "${STATUS}" = "201" ] || [ "${STATUS}" = "200" ]; then
NEW_BASE=$(jq -r '.base.ref // "?"' < "${BODY_FILE}")
rm -f "${BODY_FILE}"
if [ "${NEW_BASE}" = "staging" ]; then
echo "::notice::Retargeted PR #${PR_NUMBER} → staging"
echo "outcome=retargeted" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "::error::PATCH returned ${STATUS} but base.ref is '${NEW_BASE}', not 'staging'"
exit 1
fi
# Specifically match the 422 duplicate-base/head error so
# any OTHER PATCH failure (auth, deleted PR, etc.) still
# surfaces as a real workflow failure.
BODY=$(cat "${BODY_FILE}" || true)
rm -f "${BODY_FILE}"
if [ "${STATUS}" = "422" ] && echo "${BODY}" | grep -qE "(pull request already exists for base branch 'staging'|already exists.*base.*staging)"; then
echo "::notice::PR #${PR_NUMBER}: duplicate target-staging PR exists on same head — closing this main-PR as redundant."
# Close the now-redundant main-PR via Gitea REST
# (PATCH state=closed). Post comment explaining
# rationale BEFORE close so the comment lands on the
# PR (commenting on a closed PR works on Gitea, but
# historically caused notification ordering surprises).
CLOSE_BODY_FILE=$(mktemp)
CMT_REQ=$(jq -n '{body:"[retarget-bot] Closing — another PR on the same head branch already targets `staging`. This PR is redundant. See issue #1884 for the rationale."}')
set +e
CMT_STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X POST -d "${CMT_REQ}" \
-o "${CLOSE_BODY_FILE}" -w "%{http_code}" \
"${API}/issues/${PR_NUMBER}/comments")
set -e
if [ "${CMT_STATUS}" != "201" ]; then
echo "::warning::dup-close comment POST returned ${CMT_STATUS}; continuing to close anyway"
cat "${CLOSE_BODY_FILE}" | head -c 300 || true
fi
rm -f "${CLOSE_BODY_FILE}"
CLOSE_REQ='{"state":"closed"}'
CLOSE_RESP=$(mktemp)
set +e
CL_STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X PATCH -d "${CLOSE_REQ}" \
-o "${CLOSE_RESP}" -w "%{http_code}" \
"${API}/pulls/${PR_NUMBER}")
set -e
if [ "${CL_STATUS}" = "201" ] || [ "${CL_STATUS}" = "200" ]; then
echo "::notice::Closed PR #${PR_NUMBER} as redundant"
echo "outcome=closed-as-duplicate" >> "$GITHUB_OUTPUT"
rm -f "${CLOSE_RESP}"
exit 0
fi
echo "::error::Failed to close redundant PR: HTTP ${CL_STATUS}"
cat "${CLOSE_RESP}" | head -c 300 || true
rm -f "${CLOSE_RESP}"
exit 1
fi
echo "::error::Retarget PATCH failed and was NOT a duplicate-base error: HTTP ${STATUS}"
echo "${BODY}" | head -c 500 >&2
exit 1
- name: Post explainer comment
if: steps.retarget.outputs.outcome == 'retargeted'
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
run: |
set -euo pipefail
API="${GITEA_HOST}/api/v1/repos/${REPO}"
AUTH=(-H "Authorization: token ${GITEA_TOKEN}" -H "Accept: application/json")
# PR comments live on the issue endpoint in Gitea
# (PRs ARE issues — same endpoint, different sub-resources
# for diffs/files/etc.). The body uses jq to safely
# encode the multi-line markdown without shell-quote
# nightmares.
REQ=$(jq -n '{body:"[retarget-bot] This PR was opened against `main` and has been retargeted to `staging` automatically.\n\n**Why:** per [SHARED_RULES rule 8](https://git.moleculesai.app/molecule-ai/molecule-ai-org-template-molecule-dev/src/branch/main/SHARED_RULES.md), all feature work targets `staging` first; the CEO promotes `staging → main` separately.\n\n**What changed:** just the base branch — no code change. CI will re-run against `staging`. If you get merge conflicts, rebase on `staging`.\n\n**If this PR is the CEO`s staging→main promotion:** the Action skipped you (only bot-authored PRs are retargeted, head=staging is also exempted). If you see this comment on your CEO PR, that`s a bug — please tag @hongmingwang."}')
BODY_FILE=$(mktemp)
set +e
STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X POST -d "${REQ}" \
-o "${BODY_FILE}" -w "%{http_code}" \
"${API}/issues/${PR_NUMBER}/comments")
set -e
if [ "${STATUS}" = "201" ]; then
echo "::notice::Posted explainer comment on PR #${PR_NUMBER}"
else
echo "::warning::Failed to post explainer (HTTP ${STATUS}) — retarget itself succeeded"
cat "${BODY_FILE}" | head -c 300 || true
fi
rm -f "${BODY_FILE}"