molecule-core/.gitea/workflows/sweep-aws-secrets.yml
hongming-codex-laptop 334b748492
All checks were successful
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
CI / Detect changes (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 20s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 21s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 26s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
CI / Platform (Go) (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 4s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 34s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / all-required (pull_request) Successful in 0s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m1s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m10s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m12s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m18s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m25s
sop-tier-check / tier-check (pull_request) Successful in 21s
sop-checklist-gate / gate (pull_request) Successful in 23s
gate-check-v3 / gate-check (pull_request) Successful in 34s
sop-checklist / all-items-acked (pull_request) acked: 7/7
qa-review / approved (pull_request) Manual verified: qa-review APPROVED by core-qa (team=qa)
security-review / approved (pull_request) Manual verified: security-review APPROVED by core-security (team=security)
fix(ci): harden Cloudflare sweep API errors
2026-05-13 00:35:15 -07:00

131 lines
6.0 KiB
YAML

name: Sweep stale AWS Secrets Manager secrets
# Ported from .github/workflows/sweep-aws-secrets.yml on 2026-05-11 per RFC
# internal#219 §1 sweep. Differences from the GitHub version:
# - Dropped `workflow_dispatch.inputs` (Gitea 1.22.6 parser rejects them
# per feedback_gitea_workflow_dispatch_inputs_unsupported).
# - Dropped `merge_group:` (no Gitea merge queue).
# - Dropped `environment:` blocks (Gitea has no environments).
# - Workflow-level env.GITHUB_SERVER_URL pinned per
# feedback_act_runner_github_server_url.
# - `continue-on-error: true` on each job (RFC §1 contract).
#
# Janitor for per-tenant AWS Secrets Manager secrets
# (`molecule/tenant/<org_id>/bootstrap`) whose backing tenant no
# longer exists. Parallel-shape to sweep-cf-tunnels.yml and
# sweep-cf-orphans.yml — different cloud, same justification.
#
# Why this exists separately from a long-term reconciler integration:
# - molecule-controlplane's tenant_resources audit table (mig 024)
# currently tracks four resource kinds: CloudflareTunnel,
# CloudflareDNS, EC2Instance, SecurityGroup. SecretsManager is
# not in the list, so the existing reconciler doesn't catch
# orphan secrets.
# - At ~$0.40/secret/month the cost grew to ~$19/month before this
# sweeper was written, indicating ~45+ orphan secrets from
# crashed provisions and incomplete deprovision flows.
# - The proper fix (KindSecretsManagerSecret + recorder hook +
# reconciler enumerator) is filed as a separate controlplane
# issue. This sweeper is the immediate cost-relief stopgap.
#
# AWS credentials: use the dedicated Secrets Manager janitor principal.
# Do not fall back to the molecule-cp application principal: it does
# not need account-wide ListSecrets, and a 2026-05-12 CI failure proved
# that using it here turns a least-privilege production credential into
# a red scheduled janitor.
#
# Safety: the script's MAX_DELETE_PCT gate (default 50%, mirroring
# sweep-cf-orphans.yml — tenant secrets are durable by design, unlike
# the mostly-orphan tunnels) refuses to nuke past the threshold.
on:
# Disabled as an hourly schedule until the dedicated
# AWS_SECRETS_JANITOR_* key exists in the key-management SSOT and is
# mirrored into Gitea. Falling back to the molecule-cp app principal is
# intentionally not allowed: it lacks account-wide ListSecrets, and
# granting that to an application credential would weaken least privilege.
#
# Keep the manual trigger so operators can validate the workflow immediately
# after provisioning the janitor key, then restore the hourly :30 schedule.
workflow_dispatch:
# Don't let two sweeps race the same AWS account.
concurrency:
group: sweep-aws-secrets
cancel-in-progress: false
permissions:
contents: read
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
sweep:
name: Sweep AWS Secrets Manager
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
# mc#774: pre-existing continue-on-error mask; root-fix and remove, do not renew silently.
continue-on-error: true
# 30 min cap, mirroring the other janitors. AWS DeleteSecret is
# fast (~0.3s/call) so even a 100+ backlog drains in seconds
# under the 8-way xargs parallelism, but the cap is set generously
# to leave headroom for any actual API hang.
timeout-minutes: 30
env:
AWS_REGION: ${{ secrets.AWS_REGION || 'us-east-1' }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_SECRETS_JANITOR_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRETS_JANITOR_SECRET_ACCESS_KEY }}
CP_ADMIN_API_TOKEN: ${{ secrets.CP_ADMIN_API_TOKEN }}
CP_STAGING_ADMIN_API_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
MAX_DELETE_PCT: ${{ github.event.inputs.max_delete_pct || '50' }}
GRACE_HOURS: ${{ github.event.inputs.grace_hours || '24' }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Verify required secrets present
id: verify
# Schedule-vs-dispatch behaviour split mirrors sweep-cf-orphans
# and sweep-cf-tunnels (hardened 2026-04-28). Same principle:
# - schedule → exit 1 on missing secrets (red CI surfaces it)
# - workflow_dispatch → exit 0 with warning (operator-driven,
# they already accepted the repo state)
run: |
missing=()
for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY CP_ADMIN_API_TOKEN CP_STAGING_ADMIN_API_TOKEN; do
if [ -z "${!var:-}" ]; then
missing+=("$var")
fi
done
if [ ${#missing[@]} -gt 0 ]; then
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "::warning::skipping sweep — secrets not configured: ${missing[*]}"
echo "::warning::set them at Settings → Secrets and Variables → Actions, then rerun."
echo "skip=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "::error::sweep cannot run — required secrets missing: ${missing[*]}"
echo "::error::set them at Settings → Secrets and Variables → Actions, or disable this workflow."
exit 1
fi
echo "All required secrets present ✓"
echo "skip=false" >> "$GITHUB_OUTPUT"
- name: Run sweep
if: steps.verify.outputs.skip != 'true'
# Schedule-vs-dispatch dry-run asymmetry mirrors sweep-cf-tunnels:
# - Scheduled: input empty → "false" → --execute (the whole
# point of an hourly janitor).
# - Manual workflow_dispatch: input default true → dry-run;
# operator must flip it to actually delete.
run: |
set -euo pipefail
if [ "${{ github.event.inputs.dry_run || 'false' }}" = "true" ]; then
echo "Running in dry-run mode — no deletions"
bash scripts/ops/sweep-aws-secrets.sh
else
echo "Running with --execute — will delete identified orphans"
bash scripts/ops/sweep-aws-secrets.sh --execute
fi