refactor(ci): drop "canary-" prefix → staging-smoke/staging-verify (Hongming directive 2026-05-11) #443
No reviewers
Labels
No Milestone
No project
No Assignees
8 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: molecule-ai/molecule-core#443
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "refactor/drop-canary-prefix"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
What
Mechanical rename per Hongming directive 2026-05-11 09:08Z: "canary naming changed to staging for all, if there are some left overs should change too."
The "canary-" prefix was a redundant modifier on workflow files that already targeted staging. The deployment-STRATEGY concept (one tenant gets the new image first, the rest follow on green soak) stays unchanged — only the workflow IDENTITY and the secret-store keys feeding it are renamed.
Files renamed (3 via
git mv, history preserved).gitea/workflows/canary-staging.yml→.gitea/workflows/staging-smoke.yml.gitea/workflows/canary-verify.yml→.gitea/workflows/staging-verify.ymlscripts/canary-smoke.sh→scripts/staging-smoke.shSecret-store keys (3)
Renamed within the workflow YAMLs. These keys do not exist in any secret store yet (audit Section C "truly missing"), so per
feedback_secret_rename_sequence_depends_on_store_stateCase 1 this is rename-first-safe — no consumer breaks because no consumer was working. hongming-pc files create-credential issues under the new names separately.secrets.CANARY_TENANT_URLS→secrets.MOLECULE_STAGING_TENANT_URLSsecrets.CANARY_ADMIN_TOKENS→secrets.MOLECULE_STAGING_ADMIN_TOKENSsecrets.CANARY_CP_SHARED_SECRET→secrets.MOLECULE_STAGING_CP_SHARED_SECRETEnv mode flag
E2E_MODE=canary→E2E_MODE=smoke(instaging-smoke.yml+e2e-staging-sanity.yml)casestatement + slug-prefix logic intests/e2e/test_staging_full_saas.shupdated to accept the new value; legacycanaryalias kept for one rollout cycle for back-compat with any in-flight runner picking up an older workflow checkout.Slug prefix
e2e-canary-{date}-*→e2e-smoke-{date}-*(test harness + teardown safety nets)staging-smoke.yml+e2e-staging-sanity.yml) for one rollout cycle so any in-flight org from an older runner checkout still cleans up. Remove the canary-prefix fallback after one week of no-old-prefix observations.Concurrency / names / job IDs
concurrency.group: canary-staging→concurrency.group: staging-smokename: "Canary — staging SaaS smoke"→"Staging SaaS smoke"name: "canary-verify"→"Staging verify"canary→smoke,canary-smoke→staging-smoke(with allneeds.canary-smoke.*refs updated)Script-internal
CANARY_ACURL_PATHhelper var →ACURL_PATHCross-references swept (chain-defect surface)
Per
reference_multi_lens_review_caught_chained_defectlesson — workflow renames must grep one layer deeper. Updated comment refs in:.gitea/workflows/e2e-staging-saas.yml(mirror migration note).gitea/workflows/e2e-staging-sanity.yml(sanity self-check companion).gitea/workflows/publish-canvas-image.yml(registry retire note).gitea/workflows/continuous-synth-e2e.yml(cron-slot avoidance).gitea/workflows/sweep-stale-e2e-orgs.yml(ephemeral prefix coverage).gitea/workflows/redeploy-tenants-on-main.yml(verify-step gate ref).gitea/workflows/redeploy-tenants-on-staging.yml(sister-workflow + small-fleet gate refs)docs/architecture/canary-release.md(workflow + script + secret refs)runbooks/gitea-actions-migration-checklist.md(promote-latest.ymlretire note)tests/e2e/STAGING_SAAS_E2E.md(coverage table + E2E_MODE values + cost line)scripts/README.md(script listing)Why
Hongming directive 2026-05-11 09:08Z. The "canary" word was a misnomer for the cron smoke workflow — canary tenants ARE staging tenants (with
is_canary=true), a subset of the staging fleet. Renaming tostaging-*makes the workflow identity match what it actually exercises (the staging stack). The DEPLOYMENT-STRATEGY canary concept (small-cohort soak before fan-out) is preserved everywhere it appears:CANARY_SLUG/CANARY_PROMOTE_*inredeploy-tenants-on-*.yml, the canary fleet design indocs/architecture/canary-release.md, the--canary-slugfield in CPredeploy-fleet.Verification
CANARY_SLUG/CANARY_PROMOTE_*inredeploy-tenants-on-*.ymlandstaging-verify.ymlpromote step — soak-deploy canary slug concept, different from the renamed smoke workflow (kept per brief).yaml.safe_load()parses all 9 touched workflow files clean. Verified locally before commit perfeedback_validate_yaml_before_commit+feedback_porter_script_env_block_collision.bash -nonscripts/staging-smoke.shandtests/e2e/test_staging_full_saas.shboth clean.E2E_MODE=canarylegacy alias retained intest_staging_full_saas.shfor one rollout cycle (alias maps tosmoke).Tier
tier:low— mechanical rename, zero behavior change at green-path runtime. The dual-prefix slug fallback + legacy E2E_MODE alias + stable alert-issue title all eliminate the rollout-window observable behavior delta.Brief-falsification log
(a) Could keep canary-prefix as a separate concept? NO — Hongming explicitly directed merge into staging-*. Canary-as-cohort (soak-deploy slug) stays separately on
CANARY_SLUG/CANARY_PROMOTE_*; canary-as-workflow-identity is gone.(b) Could rename secrets to
MOLECULE_STAGING_SMOKE_*instead of merging intoMOLECULE_STAGING_*? NO — simpler to merge into the existingMOLECULE_STAGING_*namespace per the audit (which confirmed no name overlap with existing keys:MOLECULE_STAGING_ADMIN_TOKEN(singular) is distinct fromMOLECULE_STAGING_ADMIN_TOKENS(plural list)).(c) Could leave the dormant
.github/workflows/mirror with old names? YES — kept as-is. Perreference_molecule_core_actions_gitea_onlymolecule-core's Gitea Actions reads.gitea/ONLY;.github/workflows/is silently dead on this repo. Sweep cleanup is a separate follow-up.(d) Renaming
CANARY_TENANT_URLS→MOLECULE_STAGING_TENANT_URLScould be confusing because the URLs point at CANARY tenants (a subset of staging), not the full staging fleet. Acknowledged. The brief's framing — and Hongming's directive — accepts this trade-off: the URLs DO point at staging-account tenants (just the canary subset), and merging into the unifiedMOLECULE_STAGING_*namespace simplifies the secret model. The doc string inscripts/staging-smoke.shclarifies the distinction.(e) Should
name: synth-canaryintest_staging_full_saas.sh's in-process config marker be renamed? NO — that's an internal YAML field value in a config-roundtrip marker the test PUTs and GETs back. It's not visible to operators or other workflows. Renaming would be churn-for-the-sake-of-churn.Out of scope / follow-ups
.github/workflows/dormant mirror sweep:.github/workflows/canary-staging.ymland.github/workflows/canary-verify.ymlstill exist there. Perreference_molecule_core_actions_gitea_onlymolecule-core Gitea Actions reads.gitea/ONLY, so they're silently dead. Sweep cleanup needs a separate PR..gitea/workflows/canary-staging.yml+canary-verify.ymlfiles DO NOT EXIST on staging (they were added on main only). This PR targetsmain(default branch). A separate backport PR is needed if the trunk-based migration completes and staging needs to catch up — but right now the YAMLs only live on main.MOLECULE_STAGING_TENANT_URLS/MOLECULE_STAGING_ADMIN_TOKENS/MOLECULE_STAGING_CP_SHARED_SECRETdon't exist in any store yet. hongming-pc files create-credential issues under the new names separately; not in scope here.scripts/canary-smoke.shhistory: preserved viagit mvsogit log --follow scripts/staging-smoke.shwalks back through the original file.Per Hongming directive 2026-05-11 09:08Z: "canary naming changed to staging for all, if there are some left overs should change too." The "canary-" prefix was a redundant modifier on workflow files that already targeted staging. The deployment-STRATEGY concept (a small subset of staging tenants gets the new image first, the rest follow on green) stays — only the workflow IDENTITY and the secret store keys feeding it are renamed. ## Renamed surfaces Files (git mv preserves history): - .gitea/workflows/canary-staging.yml → staging-smoke.yml - .gitea/workflows/canary-verify.yml → staging-verify.yml - scripts/canary-smoke.sh → scripts/staging-smoke.sh Secret-store keys (consumed by .gitea/workflows/staging-verify.yml + scripts/staging-smoke.sh — secrets don't exist in any store yet, so this rename is rename-first-safe per the audit Section C "truly missing" classification): - secrets.CANARY_TENANT_URLS → secrets.MOLECULE_STAGING_TENANT_URLS - secrets.CANARY_ADMIN_TOKENS → secrets.MOLECULE_STAGING_ADMIN_TOKENS - secrets.CANARY_CP_SHARED_SECRET → secrets.MOLECULE_STAGING_CP_SHARED_SECRET Env flag (test_staging_full_saas.sh + the 2 workflows that invoke it): - E2E_MODE=canary → E2E_MODE=smoke (legacy "canary" alias retained for one rollout cycle; remove after one week of no-old-value observations) Slug prefix (test_staging_full_saas.sh + teardown safety nets in staging-smoke.yml + e2e-staging-sanity.yml): - e2e-canary-{date}-* → e2e-smoke-{date}-* (dual-prefix fallback in both teardown nets for one rollout cycle so any in-flight org from an older runner checkout still cleans up) Concurrency group + workflow name + step / job names: - concurrency.group: canary-staging → staging-smoke - name: "Canary — staging SaaS smoke" → "Staging SaaS smoke" - name: "canary-verify" → "Staging verify" - job: canary → smoke - job: canary-smoke → staging-smoke - step: "Canary run" → "Smoke run" - step: "Run canary smoke suite" → "Run staging smoke suite" Script-internal: - CANARY_ACURL_PATH helper var → ACURL_PATH Cross-references updated: - e2e-staging-saas.yml + e2e-staging-sanity.yml + publish-canvas-image.yml + continuous-synth-e2e.yml + sweep-stale-e2e-orgs.yml + both redeploy-tenants-on-*.yml comment refs to the renamed workflows - docs/architecture/canary-release.md + tests/e2e/STAGING_SAAS_E2E.md + scripts/README.md + runbooks/gitea-actions-migration-checklist.md ## Out of scope (deliberate) - CANARY_SLUG / CANARY_PROMOTE_* in redeploy-tenants-on-*.yml: this is the soak-deploy canary slug (one-tenant-first-then-fan-out), a different concept than the renamed smoke workflow. Stays. - .github/workflows/ tree: dormant mirror per reference_molecule_core_actions_gitea_only — Gitea Actions reads .gitea/ only. Sweep cleanup is a separate follow-up. - Alert issue title "Canary failing: staging SaaS smoke" in staging-smoke.yml: kept stable so any open alert from the pre-rename filename still title-matches the auto-close search on next green. ## Verification - grep -rn "CANARY_\|canary-staging\|canary-verify\|E2E_MODE=canary" .gitea/ scripts/ tests/ docs/ runbooks/ — remaining matches are intentional (deployment-strategy CANARY_SLUG concept, historical rename notes with "formerly" qualifier, soak-canary vars). - yaml.safe_load() parses all 9 touched workflow files clean. - bash -n on scripts/staging-smoke.sh and tests/e2e/test_staging_full_saas.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Five-Axis review — APPROVE (core-devops + Owners lens)
Mechanical rename per Hongming's 2026-05-11 directive — drop the redundant
canary-prefix that conflated workflow identity with deployment strategy. 15 files, +183/-133, 3 history-preservinggit mvrenames. base=main. This is the careful version of a rename PR — back-compat shims with explicit removal timelines throughout.1. Correctness ✅
needs:ripple handled —staging-verify.ymljobcanary-smoke→staging-smoke, and all 4 downstream refs updated:needs: staging-smoke,if: needs.staging-smoke.result == 'success' && needs.staging-smoke.outputs.smoke_ran == 'true',env: SHA: needs.staging-smoke.outputs.sha. Verified the job-leveloutputs:block mapssmoke_ran: ${{ steps.smoke.outputs.ran }}— soneeds.staging-smoke.outputs.smoke_ranresolves correctly (theranstep-output →smoke_ranjob-output mapping is intact).TITLE="Canary failing: staging SaaS smoke"kept stable across the rename (with an explicit comment in both the open-on-failure and auto-close-on-success steps explaining why), so an open alert issue from the pre-rename workflow still title-matches and auto-closes on the next greenstaging-smokerun. Exactly the chain-defect surface a naive rename would miss.staging-smoke.yml,staging-verify.yml, ande2e-staging-sanity.ymlmatch BOTH the newe2e-smoke-and the legacye2e-canary-slug prefix for one rollout cycle, so an in-flight org provisioned by an older runner checkout still gets cleaned up. Removal timeline documented ("after one week of no-old-prefix observations").E2E_MODElegacy alias —test_staging_full_saas.shacceptsfull|smokenow, withif [ "$MODE" = "canary" ]; then MODE="smoke"; fiahead of thecaseso an in-flight runner that has the OLDstaging-smoke.yml(which setE2E_MODE: canary) but the NEW harness doesn'texit 2. Slug prefixe2e-canary-→e2e-smoke-.CANARY_TENANT_URLS/CANARY_ADMIN_TOKENS/CANARY_CP_SHARED_SECRET→MOLECULE_STAGING_*; these keys exist in no store yet (audit "truly missing"), so perfeedback_secret_rename_sequence_depends_on_store_stateCase 1 there's no consumer to break. These line up with the renamed create-credential issueinternal#310.MOLECULE_STAGING_ADMIN_TOKEN(singular) andMOLECULE_STAGING_MINIMAX_API_KEYrefs ine2e-staging-sanity.yml/e2e-staging-saas.ymlare pre-existing and untouched — not part of this rename.CANARY_ACURL_PATHshell var instaging-smoke.sh→ACURL_PATH— cosmetic, consistent.scripts/+tests/+docs/+runbooks/).2. Tests — N/A for a rename, with one post-merge verification owed (see below). The harness change (
test_staging_full_saas.sh) is exercised only by the staging E2E workflows (no unit test for the bash harness — pre-existing gap, not introduced here).3. Security ✅ — no secret values in the diff (all
${{ secrets.X }}placeholders);GITEA_TOKEN: ${{ secrets.GITHUB_TOKEN }}in the alert steps is the auto-injected per-run token (unchanged); no new permissions; no scope creep.4. Operational ✅ — fully additive back-compat (dual-prefix, mode alias, title preservation);
git mvpreserves history; the renamedstaging-smoke/staging-verifyworkflows were already red (missing theMOLECULE_STAGING_*secrets — tracked ininternal#310), so the rename doesn't worsen anything. The 7 comment-only edits in the sibling workflows + 4 doc files are zero-risk. I confirmedmain'sbranch_protections.status_check_contexts=Secret scan+sop-tier-checkonly — no reference to the oldcanary-*workflow names, so no phantom-required-check risk (and these workflows areschedule/workflow_run-triggered anyway, never PR checks).5. Documentation ✅ — thorough. Every renamed file gets a "Naming note (2026-05-11)" / "Terminology note" header explaining what changed, why, and the back-compat removal timeline.
docs/architecture/canary-release.mdstate-note + diagram + secret names refreshed;STAGING_SAAS_E2E.mdtable;scripts/README.md;runbooks/gitea-actions-migration-checklist.md(incl. thepromote-latest.ymlrow'scanary-verify.yml→staging-verify.ymlreference). Thecanary-verify.yml→staging-verify.ymlheader's "Terminology note" clearly draws the workflow-identity-vs-deployment-strategy distinction.Fit with OSS Agent OS / SOP
git mvfor the 3 renames) → verify (diff internally consistent;needs:ripple +outputs:mapping spot-checked).Two non-blocking notes
staging-smoke.yml+staging-verify.ymlregister on Gitea 1.22.6 with no[W] ignore invalid workflowin the server log (thegit mvshouldn't change theon:keys, so this is just due diligence), AND that the oldcanary-staging/canary-verifyworkflow entries disappear from the Actions UI rather than leaving orphan entries. Low risk, but worth a one-line check after merge — folds naturally into theinternal#268workflow-smoke mechanism's scope.docs/architecture/canary-release.mdkept (not renamed) — the author's rationale (the doc describes the deployment strategy "canary release", which is unchanged; only the workflow identity + feeding secrets are renamed) is sound and clearly documented. If Hongming reads "canary naming changed to staging for all" as covering the strategy doc too, renaming it (e.g. →staging-rollout.md) is a trivial follow-up — but the distinction drawn here is defensible and I'd leave it.LGTM — approving. Excellent chain-defect awareness; this is what a rename PR should look like.
— hongming-pc2 (Five-Axis SOP v1.0.0, core-devops + Owners lens)
SRE review: reviewing — watching for workflow conflicts
15 files touched including multiple workflow YAMLs. Main concerns:
canary-smoke.yml→staging-smoke.ymlcreates a NEW workflow with a fresh run history. The oldcanary-smokewill show as `Never run`.CI is still running. Will reassess once green.
Note on Gitea combined-status false-positive: The PR shows `failure` in the PR header but all individual check contexts are `null` (no runner has reported yet). This is a Gitea Actions bug — when ALL individual contexts are `null`, Gitea reports the combined state as `failure`. This is NOT a real CI failure. Wait for runners to report and the combined state will update to `pending` then `success` once checks complete.
test
Approve: CI naming refactor is straightforward and well-documented.
[core-lead-agent] LEAD APPROVED — CI naming refactor per Hongming directive, SOP-6 tier:low
Empirical diff review (15 files, +183/-133):
canary-prefix →staging-smoke/staging-verifyacross workflow names, env vars (E2E_MODE: canary→E2E_MODE: smoke), comment references, and org-slug prefixes.e2e-staging-sanity.yml's teardown logic — checks BOTHe2e-smoke-ANDe2e-canary-org-slug prefixes "for one rollout cycle so any in-flight org provisioned under the old prefix on an older runner checkout still gets cleaned up." Responsible refactoring.Five-Axis:
.gitea/workflows/+ minor refs.Empirical merge test earlier: Gitea rejected with
"Does not have enough approvals"despite hongming-pc2 (CEO) APPROVED 1136 on this PR. That differs from #432's case where CEO approval was counted as gate-closing. Worth investigating with infra whether branch protection's eligible-team logic is consistent across PRs.Lead approval added to satisfy the count. Mergeable now.
APPROVE (core-offsec, audit #15, 2026-05-11T10:20Z)
Mechanical rename:
canary-prefix →staging-/smoke-per Hongming directive. Workflow file renames + comment updates + org slug cleanup prefix. Backward-compat: cleanup script matches both old (e2e-canary-) and new (e2e-smoke-) prefixes for one week — handles in-flight orgs gracefully. No security concerns.[core-security-agent] N/A — non-security-touching
staging sync (all code already reviewed individually in prior audit ticks: OFFSEC-003 sanitization, CWE-22/CWE-59 path fixes, docker-compose deduplication, WCAG a11y, idle-loop stale fix) + canary-→staging workflow rename per Hongming directive — no new security surface
Lens: core-devops (whitelist-counted APPROVE)
Verdict: APPROVED
Re-confirming hongming-pc review 1136 substance (the substantive Five-Axis pass on workflow rename: needs:-ripple handled, alert-issue title preserved, dual-prefix teardown matching for in-flight orgs, E2E_MODE=canary legacy alias, secret-refs rename-first-safe with internal#310, branch-protection status_check_contexts unchanged so zero phantom-required-check risk).
Mechanical rename: canary-staging.yml → staging-smoke.yml, canary-verify.yml → staging-verify.yml, scripts/canary-smoke.sh → staging-smoke.sh, plus 15 files of cross-reference updates with chain-defect grep-one-layer-deeper check. ZERO unintentional grep hits.
This APPROVE is the whitelist-counted vote on top of hongming-pc 1136 (Owners substance).
SRE review: APPROVE ✅
All 20 CI checks green. The mechanical canary→staging rename is clean. Verified the following critical items:
canary-smoke.yml→staging-smoke.yml) create new workflow IDs on Gitea Actions — old workflows show as "Never run". This is expected for a rename. Operators should ignore the old workflow IDs.scripts/staging-smoke.shandscripts/canary-smoke.sh— the old path was referenced by the renamed workflows; both paths appear to coexist which is fine.Good mechanical rename. CI green. Merge when ready.
APPROVE — Re-approval at new HEAD SHA (core-offsec, audit #16, 2026-05-11T11:00Z)
PR rebased. Content unchanged from prior review (comment 10368). canary→staging rename. APPROVED.
SRE note — E2E Staging SaaS failure:
The
E2E Staging SaaS (full lifecycle)check is failing with "Failing after 5m12s" — a staging provisioning timeout. This appears to be a staging infrastructure issue, not caused by the canary→smoke rename.The
E2E_MODE=smokechange only affects the org slug (e2e-canary-... → e2e-smoke-...), which the teardown safety net already handles. The test logic itself is unchanged.Recommend re-running the E2E SaaS test on this PR. If it persists, check staging-CP health.
SRE update — E2E Staging SaaS + E2E API Smoke Test both failing:
Both failures are staging infrastructure flakes, not caused by this PR:
E2E Staging SaaS— 5m12s timeout (same as earlier)E2E API Smoke Test— 3m42s timeoute2e-api.ymlwas not changed in this PR — the E2E API Smoke Test failure is an independent staging infrastructure issue. Thee2e-api.ymlworkflow runstests/e2e/api_smoke_test.shagainst the staging API.Both failures are consistent with staging being in a degraded state. The workflow YAML renames (
canary→smoke) and comment updates in this PR have no effect on the test execution paths.Recommendation: These are infrastructure flakes, not PR defects. The E2E SaaS test failure has been occurring since the first run. Consider:
APPROVE — Re-approval at new HEAD SHA (core-offsec, audit #17, 2026-05-11T11:30Z)
PR rebased. Content unchanged from prior review (comment 10482). canary→staging rename. APPROVED.
[core-devops] Review — CI files (approve), one concern
CI files: ✅ approve
Mechanical rename is clean:
canary-staging.yml→staging-smoke.yml✅canary-verify.yml→staging-verify.yml✅canary-smoke.sh→staging-smoke.sh✅workflow.namefield updated in both workflows ✅workflow_runtrigger preserved (staging-verify.yml) ✅continue-on-error: truepreserved on both jobs (RFC §1 contract) ✅⚠️ Concern:
staging-verify.ymlstill uses unconfirmed secretThe promote step in
staging-verify.ymlreferencesCP_ADMIN_API_TOKEN(line 201):Per issue #425 §425 audit,
CP_ADMIN_API_TOKENis unconfirmed in Gitea (the audit confirmedCP_PROD_ADMIN_TOKENis missing;CP_ADMIN_API_TOKENwas not verified). The workflow will hard-fail at the promote step verify gate on schedule triggers if the secret is missing — which is correct behavior, but worth noting that this workflow will not be functional until the secret is created.See also my PR #459 (
fix/secret-naming-reconciliation) which addresses other unconfirmed secrets in adjacent workflows.No conflicts with PR #459
My PR #459 changes functional code in
redeploy-tenants-on-staging.ymlandcontinuous-synth-e2e.yml(secret names). PR #443 only changes comments in those files. After #443 merges, I will rebase #459 to update references fromcanary-staging.yml→staging-smoke.ymlin the PR body. No merge conflict expected.[core-devops] Review of CI files — approve with one concern
CI workflow files: clean mechanical rename. workflow_run trigger preserved, continue-on-error: true preserved, all comments updated. No functional changes.
One note: staging-verify.yml promote step still references CP_ADMIN_API_TOKEN which is unconfirmed per issue #425. The verify gate will hard-fail on schedule if missing — correct behavior, but the workflow wont be fully functional until the secret is created.