fix(ci): canonicalize MOLECULE_STAGING_ADMIN_TOKEN -> CP_STAGING_ADMIN_API_TOKEN (post-#443 rebase) + drop staging-smoke continue-on-error

Re-applies PR#462 on current main (PR#443 merged first and renamed
canary-staging.yml -> staging-smoke.yml, conflicting #462).

Swept 6 files (15 secret-ref flips):

- .gitea/workflows/staging-smoke.yml          (3 refs + drop continue-on-error + add notify-on-failure step)
- .gitea/workflows/e2e-staging-saas.yml       (3 refs)
- .gitea/workflows/e2e-staging-sanity.yml     (3 refs)
- .gitea/workflows/e2e-staging-canvas.yml     (3 refs)
- .gitea/workflows/e2e-staging-external.yml   (3 refs)
- tests/e2e/STAGING_SAAS_E2E.md               (1 heading flip + 1 historical-rename breadcrumb)

Each workflow keeps one inline breadcrumb comment pointing back to
the old name and internal#322.

staging-smoke is the 30-min canary cadence for the entire staging
SaaS stack; silent failure (continue-on-error: true) masked exactly
the regressions the smoke exists to surface, same class as PR#461
(`sweep-stale-e2e-orgs`). Dropped continue-on-error from the smoke
job + added a fail-loud `if: failure()` Notify step mirroring
PR#461. The four other `e2e-staging-*` workflows KEEP
continue-on-error: true per RFC #219 §1 — they are advisory.

Excluded from this PR:
- .gitea/workflows/sweep-stale-e2e-orgs.yml  (PR#461 owns)
- .gitea/workflows/staging-verify.yml         (only references the plural MOLECULE_STAGING_ADMIN_TOKENS canary-fleet secret, out of scope)
- scripts/staging-smoke.sh                    (same — plural only)
- docs/architecture/canary-release.md         (same — plural only)
- .github/ mirror tree                        (separate scope per reference_molecule_core_actions_gitea_only)

Verified locally: yaml.safe_load clean on all 5 workflows; grep
returns ZERO non-breadcrumb references in the swept files; the
plural MOLECULE_STAGING_ADMIN_TOKENS references in
staging-verify.yml / scripts/staging-smoke.sh / canary-release.md
are intentionally untouched.

Refs: internal#322, PR#461, feedback_rename_pr_and_edit_pr_conflict_sequence
This commit is contained in:
claude-ceo-assistant 2026-05-11 04:33:56 -07:00
parent ae30cdef87
commit 8f1d24f33f
6 changed files with 66 additions and 18 deletions

View File

@ -124,7 +124,10 @@ jobs:
env:
CANVAS_E2E_STAGING: '1'
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
defaults:
run:
@ -145,7 +148,7 @@ jobs:
if: needs.detect-changes.outputs.canvas == 'true'
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::Missing MOLECULE_STAGING_ADMIN_TOKEN"
echo "::error::Missing CP_STAGING_ADMIN_API_TOKEN"
exit 2
fi
@ -207,7 +210,7 @@ jobs:
- name: Teardown safety net
if: always() && needs.detect-changes.outputs.canvas == 'true'
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
STATE_FILE=".playwright-staging-state.json"

View File

@ -89,7 +89,10 @@ jobs:
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
E2E_RUN_ID: "${{ github.run_id }}-${{ github.run_attempt }}"
E2E_KEEP_ORG: ${{ github.event.inputs.keep_org && '1' || '0' }}
E2E_STALE_WAIT_SECS: ${{ github.event.inputs.stale_wait_secs || '180' }}
@ -104,7 +107,7 @@ jobs:
# missing — silent skip would mask infra rot. Manual dispatch
# gets the same hard-fail; an operator running this on a fork
# without secrets configured needs to know up-front.
echo "::error::MOLECULE_STAGING_ADMIN_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
echo "Admin token present ✓"
@ -129,7 +132,7 @@ jobs:
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \

View File

@ -86,7 +86,10 @@ jobs:
# Single admin-bearer secret drives provision + tenant-token
# retrieval + teardown. Configure in
# Settings → Secrets and variables → Actions → Repository secrets.
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
# MiniMax is the PRIMARY LLM auth path post-2026-05-04. Switched
# from hermes+OpenAI default after #2578 (the staging OpenAI key
# account went over quota and stayed dead for 36+ hours, taking
@ -122,7 +125,7 @@ jobs:
- name: Verify admin token present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::MOLECULE_STAGING_ADMIN_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
echo "::error::CP_STAGING_ADMIN_API_TOKEN secret not set (Railway staging CP_ADMIN_API_TOKEN)"
exit 2
fi
echo "Admin token present ✓"
@ -189,7 +192,7 @@ jobs:
- name: Teardown safety net (runs on cancel/failure)
if: always()
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
# Best-effort: find any e2e-YYYYMMDD-* orgs matching this run and
# nuke them. Catches the case where the script died before

View File

@ -42,7 +42,10 @@ jobs:
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
E2E_MODE: smoke
E2E_RUNTIME: hermes
E2E_RUN_ID: "sanity-${{ github.run_id }}"
@ -54,7 +57,7 @@ jobs:
- name: Verify admin token present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::MOLECULE_STAGING_ADMIN_TOKEN not set"
echo "::error::CP_STAGING_ADMIN_API_TOKEN not set"
exit 2
fi
@ -118,7 +121,7 @@ jobs:
- name: Teardown safety net
if: always()
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
orgs=$(curl -sS "$MOLECULE_CP_URL/cp/admin/orgs" \

View File

@ -52,8 +52,20 @@ jobs:
smoke:
name: Staging SaaS smoke
runs-on: ubuntu-latest
# Phase 3 (RFC #219 §1): surface broken workflows without blocking.
continue-on-error: true
# NOTE: Phase 3 (RFC #219 §1) `continue-on-error: true` removed
# 2026-05-11. The "surface broken workflows without blocking"
# rationale was correctly applied to advisory/lint workflows but
# wrong for this smoke — it is the 30-min canary cadence for the
# entire staging SaaS stack, and silent failure here masks the
# exact regressions the smoke exists to surface (AMI rot, CF cert
# drift, WorkOS session breakage, secret rotations). Same class of
# failure as PR#461 (`sweep-stale-e2e-orgs`) where Phase-3 silent
# failure leaked EC2. The four other `e2e-staging-*` workflows
# KEEP `continue-on-error: true` per RFC #219 §1 — they are
# advisory and matrix-style; this one is the canary. A follow-up
# `notify-failure` step below also surfaces breakage to ops even
# if branch-protection wiring is adjusted to keep this off the
# required-checks list.
# 25 min headroom over the 15-min TLS-readiness deadline in
# tests/e2e/test_staging_full_saas.sh (#2107). Without the buffer
# the job is killed at the wall-clock 15:00 mark BEFORE the bash
@ -65,7 +77,10 @@ jobs:
env:
MOLECULE_CP_URL: https://staging-api.moleculesai.app
MOLECULE_ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
# 2026-05-11: secret canonicalised from MOLECULE_STAGING_ADMIN_TOKEN
# (dead in org secret store) to CP_STAGING_ADMIN_API_TOKEN per
# internal#322 — see this PR for the cross-workflow sweep.
MOLECULE_ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
# MiniMax is the smoke's PRIMARY LLM auth path post-2026-05-04.
# Switched from hermes+OpenAI after #2578 (the staging OpenAI key
# account went over quota and stayed dead for 36+ hours, taking
@ -111,7 +126,7 @@ jobs:
- name: Verify admin token present
run: |
if [ -z "$MOLECULE_ADMIN_TOKEN" ]; then
echo "::error::MOLECULE_STAGING_ADMIN_TOKEN not set"
echo "::error::CP_STAGING_ADMIN_API_TOKEN not set"
exit 2
fi
@ -241,7 +256,7 @@ jobs:
- name: Teardown safety net
if: always()
env:
ADMIN_TOKEN: ${{ secrets.MOLECULE_STAGING_ADMIN_TOKEN }}
ADMIN_TOKEN: ${{ secrets.CP_STAGING_ADMIN_API_TOKEN }}
run: |
set +e
# Slug prefix matches what test_staging_full_saas.sh emits
@ -316,3 +331,16 @@ jobs:
echo "::warning::smoke teardown left ${#leaks[@]} leak(s): ${leaks[*]}"
fi
exit 0
- name: Notify on smoke failure
# Fail-loud companion to dropping `continue-on-error: true`.
# The Open-issue-on-failure step above handles the human-facing
# alert; this step emits a clearly-tagged ::error:: line that
# log-tail consumers (Loki SOPRefireRule, orchestrator triage
# loop) can grep on. Mirrors PR#461's sweep-stale-e2e-orgs
# pattern. Runs AFTER the teardown safety net (which is
# if: always()) so failures don't suppress cleanup.
if: failure()
run: |
echo "::error::staging-smoke FAILED — staging SaaS canary is red. See prior step logs + the auto-filed alert issue. Common causes: (a) CP_STAGING_ADMIN_API_TOKEN secret missing/rotated, (b) staging-api.moleculesai.app 5xx, (c) MiniMax/Anthropic LLM key dead, (d) AMI/CF/WorkOS drift. The 30-min cron will retry, but a chronic red here indicates the staging SaaS stack is broken end-to-end."
exit 1

View File

@ -49,7 +49,15 @@ Runs the harness with `E2E_INTENTIONAL_FAILURE=1`, which poisons the tenant admi
Set in **Settings → Secrets and variables → Actions → Repository secrets**:
### `MOLECULE_STAGING_ADMIN_TOKEN`
### `CP_STAGING_ADMIN_API_TOKEN`
> **Historical-rename note (2026-05-11):** previously named
> `MOLECULE_STAGING_ADMIN_TOKEN`. Canonicalised to
> `CP_STAGING_ADMIN_API_TOKEN` per internal#322 (the Railway staging
> service exposes it as `CP_ADMIN_API_TOKEN`; the `CP_*` repo-secret
> prefix matches the upstream env name + makes the service it talks
> to obvious in workflow YAMLs). See the original PR for the
> cross-workflow sweep.
The `CP_ADMIN_API_TOKEN` env currently set on the Railway staging molecule-platform → controlplane service.