test(db): real-PG migration replay-from-scratch + InitPostgres ping + redis online-status key/TTL coverage (#2150) #2452
Reference in New Issue
Block a user
Delete Branch "refile/2155-migration-replay-from-scratch"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
What
Refile of the WIP at
origin/regression/2150-migration-replay-from-scratch-real-pgfor issue #2150 ([needs-regression-test] migrations: real-PG replay-from-scratch gate + InitPostgres ping + redis online-status key/TTL). The WIP was stalled 2026-06-03, never advanced past DRAFT, and was branched from a ~200-commit-stale main (60ab864b) — merging it directly would have undone PR #2449's table-presence guard widening, the mc#1982continue-on-error: falsemask removal, and the #2149 scheduler trigger. This refile is re-implemented cleanly on currentmain(00705c11).Closes the WIP PR #2155 as superseded.
Changes (cleanly-separable from #2449 + #2149)
workspace-server/internal/db/postgres_replay_integration_test.go(NEW, 286 lines,//go:build integration):TestIntegration_MigrationReplay_FromScratch— resetspublicschema, runs the productiondb.RunMigrationsover the full forward chain with hard-fail semantics. Asserts every forward migration recorded, zero.down.sqlrecorded (#211 signature), load-bearing tables present, andagent_memoriesabsent at HEAD (late-drop ordering).TestIntegration_MigrationReplay_IsIdempotent_DoubleApply— runs the chain twice; second pass must apply zero new files. Guards the 045 crash-loop class.TestIntegration_InitPostgres_PingSucceeds/..._BadDSNFails— realInitPostgresround-trip; bad-DSN proves theDB.Ping()check is load-bearing.workspace-server/internal/db/redis_test.go(NEW, 291 lines, plain unit, miniredis):SetOnline/IsOnline/RefreshTTL(ws:<id>),CacheURL/GetCachedURL(ws:<id>:url),CacheInternalURL/GetCachedInternalURL(ws:<id>:internal_url),ClearWorkspaceKeys.TestLivenessTTL_ExceedsHeartbeatWindowpinsLivenessTTL >= 5 × 30s(180s today, regression-safe).TestIsOnline_TrueThenExpires/TestRefreshTTL_ExtendsLivenessdrive miniredis's clock for real TTL expiry.TestKeyNamespacesDoNotCollide— the three key namespaces must stay distinct..gitea/workflows/handlers-postgres-integration.yml(+27 lines):molecule_replaydatabase on the same sibling Postgres so the destructiveDROP SCHEMA public CASCADEnever touches the handlersmoleculeDB.continue-on-error: falseon the integration job (mc#1982), the preflightINTEGRATION_DB_URLcheck (PR #2166 blocker #2), the table-presence guard (PR #2449), or the scheduler (#2149) test step..gitea/scripts/detect-changes.py(+5 lines):handlers-postgresprofile now also matchesworkspace-server/internal/db/(additive — preserves theinternal/scheduler/trigger from #2149).redis.goorpostgres.gonow triggers the gate.Why this is the right shape (and not a cherry-pick)
The WIP's
handlers-postgres-integration.ymldiff would have removed:continue-on-error: falsemask removal (regression to mc#1982)INTEGRATION_DB_URLstep (PR #2166 blocker)workspace_schedules+workspace_auth_tokens+org_api_tokensguard entries (PR #2449)internal/scheduler/detect-changes trigger (#2149)So cherry-picking the WIP's CI wiring on top of current main would actively regress the gate. Re-implementing just the substantive step (the new
internal/dbtest run againstmolecule_replay) and the test files preserves all current safety nets.Test plan
Handlers Postgres Integrationjob: green.Migration replay-from-scratch gate (#2150)step passes against the full forward chain on a freshmolecule_replayDB.redis_test.gocases pass under the existingunit-testsjob (nointegrationbuild tag needed).architecture_test.go(db-is-a-leaf) stays green — new imports are confined to_test.gofiles, which it skips.Watch-fail intent
*.sqlglobbing without the.down.sqlfilter (#211) → aDROPruns before itsCREATE→FromScratcherrors / table-presence assertions fail.DoubleApplyfails on the second pass.ws:%s...key format or shrinkLivenessTTL→ aredis_test.gocase fails.References
[needs-regression-test]migrations)QA-team-20 review (agent-reviewer / qa 5-axis) — molecule-core #2452 @
894bd072Scope verified: test + CI-wiring only (4 files, +609/-0) — no production code touched.
Correctness ✅ —
detect-changes.pyadds^workspace-server/internal/db/as a new alternation branch in the handlers profile (interleaved#comments inside the parenthesised implicit string-concat are valid Python); db-package changes now correctly trigger this PG workflow. The new workflow step is appended after the scheduler integration step under the sameneeds.detect-changes.outputs.handlers == 'true'guard — purely additive, does not modify or reorder any existing step/guard.Robustness ✅ — The replay test issues
DROP SCHEMA public CASCADE, but CI runs it against a separatemolecule_replaydatabase (the step provisions it and setsINTEGRATION_DB_URLto it), so it never touches themoleculeDB the handlers tests migrated — no ordering coupling.*_FromScratchproves the full forward chain replays with zero skips (closing the gap the psql apply-loop's⊘ skippedtolerates) and asserts no.down.sqlis recorded (#211 signature).*_IsIdempotent_DoubleApplyruns the chain twice to catch the 045 non-idempotent crash-loop class. redis_test.go uses miniredis (real RESP + real TTL/expiry) and pins key names (ws:<id>,:url,:internal_url), the LivenessTTL≥6×heartbeat invariant, and a namespace-collision guard. All assertions are observable-state, not statement-pinning. Tests skip cleanly when env is unset.Security ✅ — No secrets;
PGPASSWORD: testis the ephemeral CI Postgres only; cache URLs are fake fixtures; secret-scan CI green. No content-security surface (static regex + go test against an isolated DB).Performance ✅ — Bounded: 5m test timeouts, full chain applied at most twice. Acceptable for a real-PG replay gate.
Readability — 1 minor, non-blocking:
freshIntegrationDB's doc comment says it "points the package-global db.DB at this connection," but the function does not — each call site setsDB = connexplicitly. Harmless (no behavioural bug); worth tightening the comment in a follow-up.CI gate: all required contexts green (CI/all-required, E2E API Smoke, Handlers Postgres Integration, trusted sop-checklist pull_request_target); Local-Provision failures are advisory and ignored.
Genuine outcome: APPROVE.
qa-team-20 review — APPROVE (head
894bd072)QA gate (latest-state per context, history paginated past page 1):
CI / all-required✅ ·E2E API Smoke Test✅ ·Handlers Postgres Integration(Handlers PG) ✅ · trustedsop-checklist / all-items-acked (pull_request_target)✅. gate-check-v3 ✅ and Local-Provision (advisory/stub) failures ignored per policy; thesop-checklist (pull_request)failure is superseded by the trusted_targetvariant.5-axis QA:
db.RunMigrationsover aDROP SCHEMA public CASCADEblank slate (real PG): asserts no.down.sqlrecorded (#211 down-wipe), seeds root tables (workspaces/workspace_auth_tokens/...), and double-applies the full chain to catch the 045 non-idempotent crash-loop class. redis_test.go uses miniredis with real RESP/TTL expiry against realredis.gosymbols (SetOnline/RefreshTTL/IsOnline/CacheURL/CacheInternalURL/ClearWorkspaceKeys), pins thews:<id>/ws:<id>:url/ws:<id>:internal_urlnamespaces, prefix-collision, TTL expiry via FastForward, and thatLivenessTTL(180s) ≥ 5×30s heartbeat window. All symbols verified present in source — not vacuous.- if: needs.detect-changes.outputs.handlers == 'true'step is appended after the scheduler test and runs against a SEPARATEmolecule_replayDB (no ordering coupling with the handlersmoleculeDB).PGPASSWORD: test; no token/secret/curl/${{ secrets }}exfil.db/trigger; no production code touched.Outcome: APPROVE.
Security review — security-team-21 (axis: security + safety) — head
894bd072Scope: 4 files — 2 new Go integration tests (
postgres_replay_integration_test.go,redis_test.go), 1 CI workflow step, 1detect-changes.pyregex. No production code paths change; blast radius is CI + test infra.Verdict: APPROVE — no security blockers.
Observations:
postgres:test,PGPASSWORD=test) — throwaway creds for the ephemeral CI Postgres, not real secrets. Acceptable for fixtures. Keep that Postgres unreachable beyond the runner bridge.DROP SCHEMA public CASCADEis intentional (true replay-from-scratch) and well-isolated: CI runs it against a separatemolecule_replayDB, and the test is env-gated onINTEGRATION_DB_URL(skips when unset), so a localgo testcan't nuke a dev DB by accident. Solid.INTEGRATION_DB_URLpoints at a disposable DB — anyone exporting it toward a shared/real DB would silentlyDROP SCHEMA. Consider asserting the DB name matches a disposable pattern (e.g. containsreplay/test) before the destructive reset.redis_test.gouses in-processminiredis— no network listener, no exposure.detect-changes.pyonly widens the trigger tointernal/db/**; benign.Test quality is high — closes the genuine #211 (down-before-up ordering) and #045 (non-idempotent double-apply) gaps the skip-tolerant psql loop misses. LGTM from the security axis.