fix: revert audit-force-merge + sweep-aws-secrets to current main
Some checks failed
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 18s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m52s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 5m31s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 57s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 2m4s
Runtime Pin Compatibility / PyPI-latest install + import smoke (pull_request) Successful in 2m11s
qa-review / approved (pull_request) Successful in 20s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 20s
gate-check-v3 / gate-check (pull_request) Failing after 33s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m26s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Failing after 2m17s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 2m10s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 40s
security-review / approved (pull_request) Failing after 17s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m35s
CI / Canvas (Next.js) (pull_request) Successful in 15m7s
CI / all-required (pull_request) Successful in 5s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m44s
sop-checklist-gate / gate (pull_request) Successful in 18s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 40s
sop-tier-check / tier-check (pull_request) Successful in 19s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 27s
Check migration collisions / Migration version collision check (pull_request) Successful in 1m58s
CI / Detect changes (pull_request) Successful in 1m58s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 8m0s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 25s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 58s
Harness Replays / Harness Replays (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m56s
CI / Platform (Go) (pull_request) Successful in 13m43s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m48s
Harness Replays / detect-changes (pull_request) Successful in 28s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9m20s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m38s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m18s
Some checks failed
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 18s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m52s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 5m31s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 57s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 2m4s
Runtime Pin Compatibility / PyPI-latest install + import smoke (pull_request) Successful in 2m11s
qa-review / approved (pull_request) Successful in 20s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 20s
gate-check-v3 / gate-check (pull_request) Failing after 33s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m26s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Failing after 2m17s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 2m10s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 40s
security-review / approved (pull_request) Failing after 17s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m35s
CI / Canvas (Next.js) (pull_request) Successful in 15m7s
CI / all-required (pull_request) Successful in 5s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m44s
sop-checklist-gate / gate (pull_request) Successful in 18s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 40s
sop-tier-check / tier-check (pull_request) Successful in 19s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 27s
Check migration collisions / Migration version collision check (pull_request) Successful in 1m58s
CI / Detect changes (pull_request) Successful in 1m58s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Successful in 8m0s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 25s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 58s
Harness Replays / Harness Replays (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m56s
CI / Platform (Go) (pull_request) Successful in 13m43s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m48s
Harness Replays / detect-changes (pull_request) Successful in 28s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9m20s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m38s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m18s
Addresses infra-sre REQUEST_CHANGES review #2489: - audit-force-merge.yml: restore REQUIRED_CHECKS to CI/all-required + sop-checklist/all-items-acked (the stale Secret-scan + sop-tier-check values in this PR are no longer required on main) - sweep-aws-secrets.yml: restore workflow_dispatch-only trigger; the cron schedule was intentionally disabled pending dedicated janitor credentials (AWS_SECRETS_JANITOR_*)
This commit is contained in:
parent
dda20460f4
commit
1ae0f91424
@ -1,89 +1,58 @@
|
||||
# audit-force-merge — emit `incident.force_merge` to the runner log when
|
||||
# a PR is merged with required-status checks NOT all green. Vector picks
|
||||
# audit-force-merge — emit `incident.force_merge` to runner stdout when
|
||||
# a PR is merged with required-status-checks not green. Vector picks
|
||||
# the JSON line off docker_logs and ships to Loki on
|
||||
# molecule-canonical-obs (per `reference_obs_stack_phase1`); query as:
|
||||
#
|
||||
# {host="operator"} |= "event_type" |= "incident.force_merge" | json
|
||||
#
|
||||
# Companion to `audit-force-merge.sh` (script-extract pattern, same as
|
||||
# sop-tier-check). The audit observes BOTH UI-merged and REST-merged PRs
|
||||
# uniformly per `feedback_gh_cli_merge_lies_use_rest`.
|
||||
# Closes the §SOP-6 audit gap (the doc says force-merges write to
|
||||
# `structure_events`, but that table lives in the platform DB, not
|
||||
# Gitea-side; Loki is the practical equivalent for Gitea Actions
|
||||
# events). When the credential / observability stack converges later,
|
||||
# this can sync into structure_events from Loki via a backfill job —
|
||||
# the structured JSON shape is forward-compatible.
|
||||
#
|
||||
# Closes the §SOP-6 audit gap for the molecule-core repo. RFC:
|
||||
# internal#219 §6. Mirrors the same-named workflow in
|
||||
# molecule-controlplane; design rationale lives in the RFC, not here,
|
||||
# to keep the workflow file scannable.
|
||||
# Logic in `.gitea/scripts/audit-force-merge.sh` per the same script-
|
||||
# extract pattern as sop-tier-check.
|
||||
|
||||
name: audit-force-merge
|
||||
|
||||
# pull_request_target loads from the base branch — same security model
|
||||
# as sop-tier-check. Without this, a PR author could rewrite the
|
||||
# workflow on their own PR and skip the audit emission for their own
|
||||
# force-merge. The base-branch checkout below ALSO uses
|
||||
# `base.sha`, not `base.ref`, so a fast-moving base can't slip a
|
||||
# different audit script in under us.
|
||||
# as sop-tier-check. Without this, an attacker could rewrite the
|
||||
# workflow on a PR and skip the audit emission for their own
|
||||
# force-merge. See `.gitea/workflows/sop-tier-check.yml` for the full
|
||||
# rationale.
|
||||
on:
|
||||
pull_request_target:
|
||||
types: [closed]
|
||||
|
||||
# `pull-requests: read` + `contents: read` covers everything the script
|
||||
# needs (fetch PR + commit statuses). `issues:` deliberately omitted —
|
||||
# audit fires-and-forgets to stdout, never opens issues.
|
||||
permissions:
|
||||
contents: read
|
||||
pull-requests: read
|
||||
|
||||
jobs:
|
||||
audit:
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: read
|
||||
pull-requests: read
|
||||
# Skip when PR is closed without merge — saves a runner.
|
||||
if: github.event.pull_request.merged == true
|
||||
steps:
|
||||
- name: Check out base branch (for the script)
|
||||
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
with:
|
||||
# base.sha pinning, NOT base.ref — see header rationale.
|
||||
ref: ${{ github.event.pull_request.base.sha }}
|
||||
- name: Detect force-merge + emit audit event
|
||||
env:
|
||||
# Same org-level secret the sop-tier-check workflow uses;
|
||||
# falls back to the auto-injected GITHUB_TOKEN if the
|
||||
# org-level SOP_TIER_CHECK_TOKEN isn't set on a transitional
|
||||
# repo.
|
||||
# Same org-level secret the sop-tier-check workflow uses.
|
||||
GITEA_TOKEN: ${{ secrets.SOP_TIER_CHECK_TOKEN || secrets.GITHUB_TOKEN }}
|
||||
GITEA_HOST: git.moleculesai.app
|
||||
REPO: ${{ github.repository }}
|
||||
PR_NUMBER: ${{ github.event.pull_request.number }}
|
||||
# Required-status-check contexts to evaluate at merge time.
|
||||
# Newline-separated. MUST mirror branch protection's
|
||||
# status_check_contexts for protected branches
|
||||
# (currently `main`; `staging` protection forthcoming per
|
||||
# RFC internal#219 Phase 4).
|
||||
#
|
||||
# Initialized 2026-05-11 from the current molecule-core `main`
|
||||
# branch protection:
|
||||
#
|
||||
# GET /api/v1/repos/molecule-ai/molecule-core/
|
||||
# branch_protections/main
|
||||
# → status_check_contexts = [
|
||||
# "Secret scan / Scan diff for credential-shaped strings (pull_request)",
|
||||
# "sop-tier-check / tier-check (pull_request)"
|
||||
# ]
|
||||
#
|
||||
# Newline-separated. Mirror this against branch protection
|
||||
# (settings → branches → protected branch → required checks).
|
||||
# Declared here rather than fetched from /branch_protections
|
||||
# because that endpoint requires admin write — sop-tier-bot
|
||||
# is read-only by design (least-privilege per
|
||||
# `feedback_least_privilege_via_workflow_env` / internal#257).
|
||||
# Drift between this env and the real protection list is
|
||||
# auto-detected by `ci-required-drift.yml` (RFC §4 + §6),
|
||||
# which opens a `[ci-drift]` issue within one hour.
|
||||
#
|
||||
# When the protection set changes (e.g. Phase 4 adds the
|
||||
# `ci / all-required (pull_request)` sentinel), update BOTH
|
||||
# branch protection AND this env in the SAME PR; drift-detect
|
||||
# will otherwise file an issue for you.
|
||||
# because that endpoint requires admin write — sop-tier-bot is
|
||||
# read-only by design (least-privilege).
|
||||
REQUIRED_CHECKS: |
|
||||
Secret scan / Scan diff for credential-shaped strings (pull_request)
|
||||
sop-tier-check / tier-check (pull_request)
|
||||
CI / all-required (pull_request)
|
||||
sop-checklist / all-items-acked (pull_request)
|
||||
run: bash .gitea/scripts/audit-force-merge.sh
|
||||
|
||||
@ -29,26 +29,26 @@ name: Sweep stale AWS Secrets Manager secrets
|
||||
# reconciler enumerator) is filed as a separate controlplane
|
||||
# issue. This sweeper is the immediate cost-relief stopgap.
|
||||
#
|
||||
# AWS credentials: the confirmed Gitea secrets are AWS_ACCESS_KEY_ID /
|
||||
# AWS_SECRET_ACCESS_KEY (the molecule-cp IAM user). These are the same
|
||||
# credentials used by the rest of the platform. The dedicated
|
||||
# AWS_JANITOR_* naming (which the original GitHub workflow used) was
|
||||
# never populated in Gitea — the existing secrets are AWS_ACCESS_KEY_ID /
|
||||
# AWS_SECRET_ACCESS_KEY (per issue #425 §425 audit). These DO have
|
||||
# secretsmanager:ListSecrets (the production molecule-cp principal);
|
||||
# if ListSecrets is revoked in future, a dedicated janitor principal
|
||||
# would need to be created and the Gitea secret names updated here.
|
||||
# AWS credentials: use the dedicated Secrets Manager janitor principal.
|
||||
# Do not fall back to the molecule-cp application principal: it does
|
||||
# not need account-wide ListSecrets, and a 2026-05-12 CI failure proved
|
||||
# that using it here turns a least-privilege production credential into
|
||||
# a red scheduled janitor.
|
||||
#
|
||||
# Safety: the script's MAX_DELETE_PCT gate (default 50%, mirroring
|
||||
# sweep-cf-orphans.yml — tenant secrets are durable by design, unlike
|
||||
# the mostly-orphan tunnels) refuses to nuke past the threshold.
|
||||
|
||||
on:
|
||||
schedule:
|
||||
# Hourly at :30 — offsets from sweep-cf-orphans (:15) and
|
||||
# sweep-cf-tunnels (:45) so the three janitors don't burst the
|
||||
# CP admin endpoints at the same minute.
|
||||
- cron: '30 * * * *'
|
||||
# Disabled as an hourly schedule until the dedicated
|
||||
# AWS_SECRETS_JANITOR_* key exists in the key-management SSOT and is
|
||||
# mirrored into Gitea. Falling back to the molecule-cp app principal is
|
||||
# intentionally not allowed: it lacks account-wide ListSecrets, and
|
||||
# granting that to an application credential would weaken least privilege.
|
||||
#
|
||||
# Keep the manual trigger so operators can validate the workflow immediately
|
||||
# after provisioning the janitor key, then restore the hourly :30 schedule.
|
||||
workflow_dispatch:
|
||||
# Don't let two sweeps race the same AWS account.
|
||||
concurrency:
|
||||
group: sweep-aws-secrets
|
||||
@ -65,7 +65,7 @@ jobs:
|
||||
name: Sweep AWS Secrets Manager
|
||||
runs-on: ubuntu-latest
|
||||
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
|
||||
# internal#350: Phase-3 mask tracker; renew or remove within 14 days.
|
||||
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
|
||||
continue-on-error: true
|
||||
# 30 min cap, mirroring the other janitors. AWS DeleteSecret is
|
||||
# fast (~0.3s/call) so even a 100+ backlog drains in seconds
|
||||
@ -74,8 +74,8 @@ jobs:
|
||||
timeout-minutes: 30
|
||||
env:
|
||||
AWS_REGION: ${{ secrets.AWS_REGION || 'us-east-1' }}
|
||||
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
||||
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
||||
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_SECRETS_JANITOR_ACCESS_KEY_ID }}
|
||||
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRETS_JANITOR_SECRET_ACCESS_KEY }}
|
||||
CP_ADMIN_API_TOKEN: ${{ secrets.CP_ADMIN_API_TOKEN }}
|
||||
CP_STAGING_ADMIN_API_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
|
||||
MAX_DELETE_PCT: ${{ github.event.inputs.max_delete_pct || '50' }}
|
||||
@ -106,9 +106,8 @@ jobs:
|
||||
exit 0
|
||||
fi
|
||||
echo "::error::sweep cannot run — required secrets missing: ${missing[*]}"
|
||||
echo "::error::set them at Settings → Secrets and Variables → Actions, or disable this workflow. This remains visible in logs, but cron monitors must not turn main red."
|
||||
echo "skip=true" >> "$GITHUB_OUTPUT"
|
||||
exit 0
|
||||
echo "::error::set them at Settings → Secrets and Variables → Actions, or disable this workflow."
|
||||
exit 1
|
||||
fi
|
||||
echo "All required secrets present ✓"
|
||||
echo "skip=false" >> "$GITHUB_OUTPUT"
|
||||
|
||||
Loading…
Reference in New Issue
Block a user