molecule-core/.github/workflows
Hongming Wang 187a9bf87a feat(e2e): staging full-SaaS workflow — per-run org provision + leak-free teardown
Dedicated CI/CD lane that exercises the whole SaaS cross-EC2 shape end to
end, against live staging:

  1. Accept terms / create org (POST /cp/orgs) — catches ToS gate, slug
     validation, billing/quota, member insert regressions.
  2. Wait for tenant EC2 + cloudflared tunnel + TLS propagation (up to
     15 min cold).
  3. Provision a parent + child workspace via the tenant URL.
  4. Wait for both to come online (exercises the SaaS register + token
     bootstrap flow fixed in #1364).
  5. A2A round-trip on parent — validates the full LLM loop (MCP tools,
     provider auth, JSON-RPC response shape, proxy SSRF gate).
  6. HMA memory write + read — validates awareness namespace + scope
     routing.
  7. Peers + activity smoke — route-registration regression guard.
  8. Teardown via DELETE /cp/admin/tenants/:slug + leak assertion — a
     leaked org at teardown fails CI with exit 4.
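
The waits in steps 2 and 4 amount to polling with a deadline. A minimal
sketch of that pattern, not the script's actual code: wait_until,
tenant_healthy, the /health path, and TENANT_HOST are hypothetical names
chosen for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_until TIMEOUT_SECS INTERVAL_SECS CMD [ARGS...]
# Re-runs CMD until it succeeds or the deadline passes.
# Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  while true; do
    if "$@"; then
      return 0
    fi
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical health probe; the /health path is an assumption.
tenant_healthy() {
  curl -fsS --max-time 10 "https://${1}/health" >/dev/null
}

# Example (not run here): allow up to 15 minutes, polling every 10 seconds.
# wait_until 900 10 tenant_healthy "$TENANT_HOST" || exit 1
```

Checking the deadline only after a failed attempt means the command
always runs at least once, even with a very short timeout.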

Why a dedicated workflow (not folded into ci.yml):
  - ~20 min wall clock per run (EC2 boot is the long pole). Too slow
    for every PR push.
  - Needs its own concurrency group (staging has an org-create quota
    and two overlapping runs would race on slug prefix).
  - Distinct secret surface (session cookie + admin bearer): keep these
    secrets off PR jobs that don't need them.

Triggers: push to main (provisioning-critical paths only), PRs on the
same paths, manual workflow_dispatch (with runtime + keep_org inputs),
and 07:00 UTC nightly cron for drift detection.

Belt-and-braces teardown: the script installs an EXIT trap, and the
workflow has an always()-step that greps e2e-YYYYMMDD-* orgs created
today and force-deletes them via the idempotent admin endpoint. Covers
the case where GitHub cancels the runner before the trap fires.
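
The two teardown layers could look roughly like the sketch below. It is
an illustration under stated assumptions, not the committed script:
today_prefix, filter_todays_slugs, delete_tenant, list_org_slugs,
ADMIN_TOKEN, CP_URL, and SLUG are hypothetical names, and deriving the
date prefix in UTC is an assumption. Only the DELETE
/cp/admin/tenants/:slug route and the e2e-YYYYMMDD- slug shape come from
the description above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Slug prefix for today's runs, e.g. e2e-20260421- (UTC is an assumption).
today_prefix() {
  printf 'e2e-%s-' "$(date -u +%Y%m%d)"
}

# Read slugs on stdin, print only those created by today's e2e runs.
filter_todays_slugs() {
  local prefix
  prefix="$(today_prefix)"
  # grep exits 1 when nothing matches; treat that as an empty result.
  grep -E "^${prefix}" || true
}

# Hypothetical delete call; ADMIN_TOKEN and CP_URL are assumed names.
delete_tenant() {
  curl -fsS -X DELETE \
    -H "Authorization: Bearer ${ADMIN_TOKEN}" \
    "${CP_URL}/cp/admin/tenants/${1}"
}

# In-script layer: tear down this run's org on any exit path.
# trap 'delete_tenant "$SLUG"' EXIT
#
# Workflow layer (always() step): sweep whatever the trap missed.
# list_org_slugs | filter_todays_slugs | while read -r slug; do
#   delete_tenant "$slug"
# done
```

Because the admin endpoint is idempotent, the sweep can safely re-delete
an org the trap already removed.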

Docs: tests/e2e/STAGING_SAAS_E2E.md — what's covered, how to provision
the two required secrets, local-dev notes, cost (~$0.007/run), known
gaps (canvas UI + delegation + claude-code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 03:54:09 -07:00
canary-verify.yml fix(ci): replace sleep 360 with health-check poll in canary-verify (#1013) 2026-04-19 19:29:15 -07:00
ci.yml fix(canvas/test): patch test regressions from PR #1243 + proximity hitbox fix (#1313) 2026-04-21 07:06:57 +00:00
codeql.yml ci: add workflow-level concurrency to ci.yml and codeql.yml (#1242) 2026-04-21 03:07:31 +00:00
e2e-api.yml fix(ci): update working-directory for workspace-server/ and workspace/ renames 2026-04-18 07:05:44 -07:00
e2e-staging-saas.yml feat(e2e): staging full-SaaS workflow — per-run org provision + leak-free teardown 2026-04-21 03:54:09 -07:00
promote-latest.yml ci(promote-latest): suppress brew cleanup that hits perm-denied on shared runner 2026-04-19 05:55:45 -07:00
publish-canvas-image.yml ci: update GitHub Actions to current stable versions (closes #780) 2026-04-18 12:04:10 -07:00
publish-workspace-server-image.yml feat(router): /cp/* reverse-proxy to CP + same-origin canvas fetches 2026-04-20 13:01:40 -07:00