Migration 043 (2026-04-25) introduced the workspace_status enum but
omitted two values application code had been writing for days, so every
UPDATE that tried to write either value failed silently in production:
'awaiting_agent' (since 2026-04-24, commit 1e8b5e01):
- handlers/workspace.go:333 — external workspace pre-register
- handlers/registry.go (via PR #2382) — liveness offline transition
- registry/healthsweep.go (via PR #2382) — heartbeat-staleness sweep
'hibernating' (since hibernation feature shipped):
- handlers/workspace_restart.go:271 — DB-level claim before stop
All four/five sites swallowed the enum-cast error. User-visible impact:
external workspaces never transition to a stale state when their agent
disconnects (canvas shows them stuck on 'online'/'degraded' indefinitely),
new external workspaces never advance past 'provisioning', and idle
workspaces never auto-hibernate (resources held forever).
PR #2382 didn't *cause* this — it inherited the gap and added two more
silent-fail paths on top. The pre-existing two had been broken for five
days and went unnoticed because:
1. sqlmock matches SQL by regex, not against the live enum constraint.
Every test passed despite the prod-only failure.
2. The handlers either drop the Exec error entirely (workspace.go:333)
or log+continue without an alert (the other three).
Fix in three pieces:
1. migrations/046_*.up.sql — ALTER TYPE workspace_status ADD VALUE
'awaiting_agent', 'hibernating'. IF NOT EXISTS makes it idempotent
across re-runs (RunMigrations re-applies until schema_migrations
records the file). ALTER TYPE ADD VALUE doesn't take a heavy lock
and commits immediately, safe under live traffic.
2. migrations/046_*.down.sql — full rename → recreate → cast → drop
recipe. Postgres has no DROP VALUE so this is the only honest
rollback. Pre-flights existing rows to compatible values
(awaiting_agent → offline, hibernating → hibernated) before the
type swap.
3. internal/db/workspace_status_enum_drift_test.go — static gate that
parses every UPDATE/INSERT against `workspaces` in workspace-server/
internal/, extracts every status literal, and asserts each is in
the enum union (CREATE TYPE + every ALTER TYPE ADD VALUE). The gate
runs in unit tests, no DB required, and would have caught both
omissions on the day they shipped. Pattern matches
feedback_behavior_based_ast_gates and feedback_mock_at_drifting_layer.
Verification:
- go test ./internal/db/ -count=1 -race ✓
- go vet ./... ✓
- Drift gate flips red if I delete either ADD VALUE from the migration
(validated via local mutation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>