molecule-core/workspace-server/internal
Hongming Wang ed6dfe01e5 feat(delegations): durable per-task ledger + audit-write helper (RFC #2829 PR-1)
Adds the `delegations` table and the DelegationLedger writer that PRs #2-#4
of RFC #2829 build on. Schema-only foundation — no behavior change in this
PR. PR-2 wires the ledger into the existing handlers and ships the result-
push-to-inbox cutover behind a feature flag.

Why a dedicated table when activity_logs already records every delegation
event:

  Today, "what is currently in flight for this workspace" is reconstructed
  by GROUPing activity_logs by delegation_id and ORDER BY created_at DESC.
  PR-3's stuck-task sweeper needs the join

      SELECT delegation_id FROM delegations
      WHERE status = 'in_progress'
        AND last_heartbeat < now() - interval '10 minutes'

  which is impossible to express against the event stream without a window
  over every (delegation_id, latest event) pair — a planner-killing query
  at scale. The dedicated table makes the sweeper an indexed scan.

  Same posture as tenant_resources (PR #2343, memory
  `reference_tenant_resources_audit`): activity_logs remains the audit-
  grade source of truth, delegations is the queryable view for dashboards
  + sweeper joins. Symmetric writes — both tables are written, neither
  blocks orchestration on the other's failure.

Schema highlights:
  - delegation_id PRIMARY KEY (caller-chosen, idempotent retry on
    restart is a no-op via ON CONFLICT DO NOTHING)
  - caller_id / callee_id NOT FK — workspace delete must NOT cascade-
    delete delegation history (audit retention)
  - status CHECK constraint enforces the lifecycle
    (queued|dispatched|in_progress|completed|failed|stuck)
  - last_heartbeat NULL-able; PR-3 sweeper compares to NOW()
  - deadline default now()+6h matches longest-observed legit delegation
    (memory-namespace migrations) — protects against forever-heartbeating
    wedged agents
  - Partial index `idx_delegations_inflight_heartbeat` keeps the sweeper
    hot path tiny (only non-terminal rows)
  - UNIQUE(caller_id, idempotency_key) WHERE NOT NULL — natural
    collision becomes ON CONFLICT no-op without colliding across callers

DelegationLedger.SetStatus enforces forward-only on terminal states
(completed/failed/stuck cannot be revised) as defense-in-depth on the
schema CHECK. Same-status replay is a no-op. Missing-row SetStatus is
a no-op (transient inconsistency the next agent retry will heal).

Heartbeat updates only in-flight rows — terminal-state delegations are
silently skipped.

Coverage:
  - 17 unit tests against sqlmock-backed *sql.DB (Insert happy path,
    missing-required guards, truncation, lifecycle transitions, terminal
    forward-only protection, replay no-op, missing-row no-op, empty-input
    rejection, heartbeat semantics, transition table shape)
  - Migration roundtrip verified on a real Postgres 15 instance:
    up creates the expected schema with all 4 indexes + CHECK, down
    drops everything cleanly.

Refs RFC #2829.
2026-05-04 20:43:06 -07:00
..
artifacts test(handlers): add CWE-22 regression suite + KI-005 terminal access fix + tests (#1574) 2026-04-22 15:30:11 +00:00
buildinfo feat(deploy): verify each tenant /buildinfo matches published SHA after redeploy 2026-04-30 10:55:08 -07:00
bundle refactor(workspace-status): typed constants + AST-based drift gate 2026-04-30 10:41:41 -07:00
channels feat(channels): first-class Lark/Feishu support via schema-driven config 2026-04-24 11:51:15 -07:00
crypto chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
db refactor(workspace-status): catch missed literal in workspace_bootstrap.go + add literal-drift gate 2026-04-30 10:51:01 -07:00
envx chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
events test(handlers): introduce events.EventEmitter interface (#1814 partial) 2026-04-26 09:05:52 -07:00
handlers feat(delegations): durable per-task ledger + audit-write helper (RFC #2829 PR-1) 2026-05-04 20:43:06 -07:00
imagewatch feat(workspace-server): GHCR digest watcher closes runtime CD chain (#2114) 2026-04-26 13:36:26 -07:00
memory fix(memory v2): warn at boot when cutover env half-configured 2026-05-04 17:24:11 -07:00
metrics chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
middleware fix(tenant-guard): allowlist /buildinfo so redeploy verifier can reach it 2026-04-30 12:54:51 -07:00
models refactor(workspace-status): typed constants + AST-based drift gate 2026-04-30 10:41:41 -07:00
orgtoken fix: F1085 rm scope concat + GH#756 ValidateToken terminal guard + CI test fixes 2026-04-24 07:16:54 +00:00
plugins chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
provisioner feat(provisioner): digest-pin workspace images via runtime_image_pins (#2272 layer 1) 2026-05-03 02:30:00 -07:00
registry fix(orphan-sweeper): exclude runtime='external' from stale-token revoke 2026-05-03 00:49:37 -07:00
router fix(provision): StopWorkspaceAuto mirror — close SaaS EC2-leak class 2026-05-04 20:00:23 -07:00
scheduler feat(runtime): native_scheduler skip — primitive #3 of 6 2026-04-26 22:47:00 -07:00
supervised chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
ws chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
wsauth perf(wsauth): in-process cache for platform_inbound_secret reads 2026-05-03 00:04:38 -07:00