test(e2e): live staging e2e — reconciler heals a terminated EC2 (core#2261) #2270
Branch: `feat/core2261-reconciler-live-e2e`
What
A live staging E2E that proves the core#2261 instance-state reconciler (`workspace-server/internal/registry/cp_instance_reconciler.go`) actually heals a terminated EC2 against real infra. It is the real-infra complement to the deterministic unit tests, which only pin the reconcile logic against fakes.
`tests/e2e/test_reconciler_heals_terminated_instance.sh`:

1. … runtime/model + provisioning/token conventions as `test_staging_full_saas.sh`), polls the tenant API until `status=online`, and captures its `instance_id`.
2. `aws ec2 terminate-instances` on that exact captured `instance_id` (falls back to a slug-tag describe via `lib/aws_leak_check.sh` if the id wasn't surfaced).
3. PRIMARY: `status` leaves `online`, i.e. the reconciler detected the dead instance via `IsRunning` and flipped it. This is the core#2247 regression guard: a dead instance must NOT keep reading `online`. If PRIMARY fails, the script exits 1.
4. `status` returns to `online` on an `instance_id` that differs from the terminated one (the `onOffline → RestartByID` existing-volume heal). If the reprovision doesn't finish within the bound, that is logged clearly but does not fail the run; PRIMARY stands as the gate. A future tightening to a hard fail is deliberately one edit away (noted inline).
5. An `EXIT`/`INT`/`TERM` trap deletes the tenant and leak-sweeps slug-tagged EC2, so a mid-test failure never orphans a box.
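The steps above can be sketched as three small bash helpers: a bounded status poll, a terminate with slug-tag fallback, and an exit-trap teardown. This is a minimal sketch under stated assumptions, not the PR's script. `get_status` and `delete_tenant` are hypothetical hooks standing in for the tenant-API calls, and `SLUG` is a hypothetical variable. The `e2e-slug` tag key is an assumed name; the PR does the slug lookup through `lib/aws_leak_check.sh`, whose interface isn't shown here. Only the `aws ec2 describe-instances` / `terminate-instances` calls and their flags are standard AWS CLI.

```shell
#!/usr/bin/env bash
# Sketch only; NOT the real tests/e2e/test_reconciler_heals_terminated_instance.sh.
# HYPOTHETICAL hooks the caller defines: get_status (prints the tenant's current
# status) and delete_tenant. The tag key "e2e-slug" and $SLUG are ASSUMED names.
set -euo pipefail

# Bounded poll. MODE=eq waits until status == WANT; MODE=ne waits until a
# non-empty status != WANT. TIMEOUT_S is whole seconds (bash $SECONDS).
# Returns 0 as soon as the condition holds, 1 once the bound elapses.
wait_for_status() {
  local want=$1 mode=$2 timeout_s=$3 interval_s=${4:-10}
  local deadline=$(( SECONDS + timeout_s )) status
  while (( SECONDS < deadline )); do
    status=$(get_status) || status=""
    if [[ $mode == eq && $status == "$want" ]]; then return 0; fi
    if [[ $mode == ne && -n $status && $status != "$want" ]]; then return 0; fi
    sleep "$interval_s"
  done
  return 1
}

# Terminate the captured instance id; if none was surfaced, fall back to
# describing pending/running instances by slug tag.
terminate_instance() {
  local id=$1 slug=$2
  if [[ -z $id ]]; then
    id=$(aws ec2 describe-instances \
      --filters "Name=tag:e2e-slug,Values=$slug" \
                "Name=instance-state-name,Values=pending,running" \
      --query 'Reservations[].Instances[].InstanceId' --output text)
  fi
  if [[ -z $id || $id == None ]]; then
    echo "no instance found for slug $slug" >&2
    return 1
  fi
  # Unquoted on purpose: describe may return several whitespace-separated ids.
  # shellcheck disable=SC2086
  aws ec2 terminate-instances --instance-ids $id >/dev/null
}

# Teardown runs from the EXIT trap; INT/TERM are turned into exits so the
# EXIT trap is the single teardown path on every way out.
cleanup() {
  delete_tenant || true
  if [[ -n ${SLUG:-} ]]; then terminate_instance "" "$SLUG" || true; fi
}
trap cleanup EXIT
trap 'exit 130' INT
trap 'exit 143' TERM
```

With these helpers, the PRIMARY gate would read `wait_for_status online ne 600 || exit 1`, and the non-fatal heal check would read `wait_for_status online eq 1200 || echo "WARN: reprovision not seen within bound"`. The timeouts are illustrative, not the PR's values, and the sketch omits the "different `instance_id`" comparison. Turning the heal check into a hard fail would just mean swapping the `echo` for `exit 1`.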
Workflow
`.gitea/workflows/e2e-staging-reconciler.yml`, modeled on `e2e-staging-saas.yml` (same `CP_STAGING_ADMIN_API_TOKEN` + AWS secrets, `E2E_AWS_TERMINATE_LEAKS=1`, "Verify required secrets present" preflight, belt-and-braces teardown).

- Triggers: `workflow_dispatch` + a paths filter on the reconciler source, the new script, and the libs (so it runs when the reconciler changes) + a daily `schedule`.
- NON-required initially (`continue-on-error: true`): a brand-new live E2E that provisions and terminates real EC2 should not hard-gate every merge until it has a green track record. A header note documents the promotion to branch-required.
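A preflight of the "Verify required secrets present" kind can be sketched in bash as below. It fails fast, before anything is provisioned. Assumptions: the helper name `require_env` is hypothetical. The AWS variable names in the usage line are the standard AWS CLI environment variables, not confirmed secret names from this workflow.

```shell
#!/usr/bin/env bash
# Sketch of a "Verify required secrets present" preflight (hypothetical helper).
# CP_STAGING_ADMIN_API_TOKEN is named in this PR; the AWS_* names used in the
# usage line are the standard AWS CLI variables and are an ASSUMPTION here.
set -euo pipefail

# require_env NAME... : report every missing or empty variable, then return 1
# if any were missing so the job stops before touching real infra.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion; unset and empty both count as missing.
    if [[ -z ${!name:-} ]]; then
      echo "ERROR: required secret $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_env CP_STAGING_ADMIN_API_TOKEN AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY || exit 1`. Reporting all missing names in one pass, rather than stopping at the first, saves a rerun when several secrets are absent.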
Validation
- `shellcheck --severity=warning` (CI-exact) clean; default-severity clean.
- `bash -n` parse-clean.
- `tests/e2e/*.sh` clean (no sibling broken).
- `lint_cleanup_traps.sh` clean; workflow-YAML linter + `continue-on-error` tracker linter clean (job-level `continue-on-error` references mc#1982).
- … real EC2 and costs money. It runs against staging only in CI.
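The first two checks listed above can be reproduced locally with a small wrapper like the one below. `validate_scripts` is a hypothetical helper. The only real commands it relies on are `bash -n` and `shellcheck --severity=warning`. Skipping shellcheck when it isn't installed is a convenience choice for this sketch, not necessarily what CI does.

```shell
#!/usr/bin/env bash
# Sketch: parse-check each script with bash -n, then run shellcheck at the
# CI severity when the binary is available. validate_scripts is HYPOTHETICAL.
set -euo pipefail

validate_scripts() {
  local f rc=0
  for f in "$@"; do
    # bash -n parses without executing, so nothing touches real infra.
    if ! bash -n "$f"; then
      echo "parse error: $f" >&2
      rc=1
      continue
    fi
    if command -v shellcheck >/dev/null 2>&1; then
      shellcheck --severity=warning "$f" || rc=1
    fi
  done
  return "$rc"
}
```

Usage: `validate_scripts tests/e2e/*.sh`. The function checks every file before returning, so a single run reports all broken siblings.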
Refs core#2261, core#2247.
🤖 Generated with Claude Code
Security (core#2261). Real-infra e2e; guaranteed teardown prevents EC2 leaks; AWS-creds preflight; slug-tagged for the orphan sweeper. No prod-runtime change. Approve.
QA approve (core#2261 live reconciler e2e).