molecule-core/workspace-server/internal
Hongming Wang c6cb82e1c0 fix(workspaces): add missing 'awaiting_agent' + 'hibernating' to workspace_status enum
Migration 043 (2026-04-25) introduced the workspace_status enum but
omitted two values application code had been writing for days, so every
UPDATE that tried to write either value failed silently in production:

  'awaiting_agent' (since 2026-04-24, commit 1e8b5e01):
    - handlers/workspace.go:333 — external workspace pre-register
    - handlers/registry.go (via PR #2382) — liveness offline transition
    - registry/healthsweep.go (via PR #2382) — heartbeat-staleness sweep

  'hibernating' (since hibernation feature shipped):
    - handlers/workspace_restart.go:271 — DB-level claim before stop

All four/five sites swallowed the enum-cast error. User-visible impact:
external workspaces never transition to a stale state when their agent
disconnects (canvas shows them stuck on 'online'/'degraded' indefinitely),
new external workspaces never advance past 'provisioning', and idle
workspaces never auto-hibernate (resources held forever).

PR #2382 didn't *cause* this — it inherited the gap and added two more
silent-fail paths on top. The pre-existing two had been broken for five
days and went unnoticed because:

  1. sqlmock matches SQL by regex, not against the live enum constraint.
     Every test passed despite the prod-only failure.
  2. The handlers either drop the Exec error entirely (workspace.go:333)
     or log+continue without an alert (the other three).

Fix in three pieces:

  1. migrations/046_*.up.sql — ALTER TYPE workspace_status ADD VALUE
     'awaiting_agent', 'hibernating'. IF NOT EXISTS makes it idempotent
     across re-runs (RunMigrations re-applies until schema_migrations
     records the file). ALTER TYPE ADD VALUE doesn't take a heavy lock
     and commits immediately, safe under live traffic.

  2. migrations/046_*.down.sql — full rename → recreate → cast → drop
     recipe. Postgres has no DROP VALUE so this is the only honest
     rollback. Pre-flights existing rows to compatible values
     (awaiting_agent → offline, hibernating → hibernated) before the
     type swap.

  3. internal/db/workspace_status_enum_drift_test.go — static gate that
     parses every UPDATE/INSERT against `workspaces` in workspace-server/
     internal/, extracts every status literal, and asserts each is in
     the enum union (CREATE TYPE + every ALTER TYPE ADD VALUE). The gate
     runs in unit tests, no DB required, and would have caught both
     omissions on the day they shipped. Pattern matches
     feedback_behavior_based_ast_gates and feedback_mock_at_drifting_layer.

Verification:
  - go test ./internal/db/ -count=1 -race  ✓
  - go vet ./...                           ✓
  - Drift gate flips red if I delete either ADD VALUE from the migration
    (validated via local mutation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:52:05 -07:00
..
artifacts chore: sync staging to main — 1188 commits, 5 conflicts resolved (#1743) 2026-04-23 18:30:18 +00:00
bundle
channels feat(channels): first-class Lark/Feishu support via schema-driven config 2026-04-24 11:51:15 -07:00
crypto
db fix(workspaces): add missing 'awaiting_agent' + 'hibernating' to workspace_status enum 2026-04-30 08:52:05 -07:00
envx
events test(handlers): introduce events.EventEmitter interface (#1814 partial) 2026-04-26 09:05:52 -07:00
handlers refactor(chat_files): extract streamWorkspaceResponse helper for Upload+Download 2026-04-30 08:27:45 -07:00
imagewatch feat(workspace-server): GHCR digest watcher closes runtime CD chain (#2114) 2026-04-26 13:36:26 -07:00
metrics
middleware refactor(wsauth): extract lookupTokenByHash to dedup auth predicate across 3 callers 2026-04-30 03:11:38 -07:00
models Merge pull request #2348 from Molecule-AI/auto/issue-2339-pr1-delivery-mode 2026-04-30 05:18:03 +00:00
orgtoken fix: F1085 rm scope concat + GH#756 ValidateToken terminal guard + CI test fixes 2026-04-24 07:16:54 +00:00
plugins
provisioner fix(a2a): detect dead EC2 agents on upstream 5xx + reactive auto-restart for SaaS 2026-04-30 00:28:22 -07:00
registry feat(external): default external runtime to poll-mode + awaiting_agent 2026-04-30 06:39:57 -07:00
router fix(team): delegate Expand child-provisioning to shared mint pipeline (#2367) 2026-04-30 02:28:29 -07:00
scheduler feat(runtime): native_scheduler skip — primitive #3 of 6 2026-04-26 22:47:00 -07:00
supervised
ws
wsauth refactor(wsauth): extract lookupTokenByHash to dedup auth predicate across 3 callers 2026-04-30 03:11:38 -07:00