molecule-core/.github/workflows
Hongming Wang 0082568448 ci: canary-verify graceful-skip + draft auto-promote staging→main
Two related workflow hygiene changes:

## (1) canary-verify: graceful-skip when canary secrets absent

Before: canary-verify hit `scripts/canary-smoke.sh` which exited
non-zero when CANARY_TENANT_URLS was empty. Every main publish
ran → canary-verify failed → red check on main CI signal (7/7 in
past 24h). Noise, no value.

After: smoke step detects the missing-secrets case, writes a
warning to the step summary, sets an output `smoke_ran=false`,
and exits 0. The workflow completes green without pretending to
have tested anything.

Gated downstream: `promote-to-latest` now requires BOTH
`needs.canary-smoke.result == success` AND
`needs.canary-smoke.outputs.smoke_ran == true`. A skip does NOT
auto-promote — manual `promote-latest.yml` remains the release
gate while Phase 2 canary is absent (see
molecule-controlplane/docs/canary-tenants.md for the fleet
stand-up plan + decision framework).

When the canary fleet is stood up and secrets populated: delete
the early-exit branch + the smoke_ran gate. The workflow goes back
to its original "smoke gates promotion" semantics.

## (2) auto-promote-staging.yml — draft

New workflow that fires after CI / E2E Staging Canvas / E2E API /
CodeQL complete on the staging branch, checks that ALL four are
green on the same SHA, and fast-forwards `main` to that SHA.

Shipped disabled: the promote step is gated behind repo variable
`AUTO_PROMOTE_ENABLED=true`. Until that's set, the workflow
dry-runs and logs what it would have done. Toggle via Settings →
Variables when staging CI has been reliably green for a few days.

Safety:
- workflow_run events only fire on push to staging (PRs into
  staging don't promote).
- Every required gate must be `completed/success` on the same
  head_sha. Pending / failed / skipped / cancelled → abort.
- `--ff-only` push. Refuses to advance main if it has diverged
  from staging history (someone landed a direct-to-main commit
  that's not on staging). Human resolves the fork.
- `workflow_dispatch` with `force=true` lets us test the flow
  end-to-end before flipping the variable on.

Motivation: molecule-core#1496 has been open with 1172 commits
divergence between staging and main. Today that trapped PR #1526
(dynamic canvas runtime dropdown) on staging while prod users
hit the hardcoded-dropdown bug. Auto-promote retires the bulk
staging→main PR pattern once the staging CI it depends on is
reliable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 22:39:23 +00:00
..
auto-promote-staging.yml ci: canary-verify graceful-skip + draft auto-promote staging→main 2026-04-22 22:39:23 +00:00
canary-staging.yml fix(e2e): CP DELETE /cp/admin/tenants body uses 'confirm', not 'confirm_token' 2026-04-21 04:50:28 -07:00
canary-verify.yml ci: canary-verify graceful-skip + draft auto-promote staging→main 2026-04-22 22:39:23 +00:00
ci.yml perf(ci): move all public-repo workflows to ubuntu-latest 2026-04-22 12:56:49 -07:00
codeql.yml perf(ci): move all public-repo workflows to ubuntu-latest 2026-04-22 12:56:49 -07:00
e2e-api.yml perf(ci): move all public-repo workflows to ubuntu-latest 2026-04-22 12:56:49 -07:00
e2e-staging-canvas.yml fix(e2e): CP DELETE /cp/admin/tenants body uses 'confirm', not 'confirm_token' 2026-04-21 04:50:28 -07:00
e2e-staging-saas.yml ci(e2e): wire MOLECULE_STAGING_OPENAI_KEY into workflow env 2026-04-21 11:24:59 -07:00
e2e-staging-sanity.yml fix(e2e): CP DELETE /cp/admin/tenants body uses 'confirm', not 'confirm_token' 2026-04-21 04:50:28 -07:00
promote-latest.yml perf(ci): move all public-repo workflows to ubuntu-latest 2026-04-22 12:56:49 -07:00
publish-canvas-image.yml perf(ci): move all public-repo workflows to ubuntu-latest 2026-04-22 12:56:49 -07:00
publish-workspace-server-image.yml perf(ci): move all public-repo workflows to ubuntu-latest 2026-04-22 12:56:49 -07:00