fix(delegations): preserve result_preview through completion + add real-Postgres integration gate

Two-part PR:

## Fix: result_preview was lost on completion

Self-review of #2854 caught a real bug. SetStatus has a same-status
replay no-op; the order of calls in `executeDelegation` completion
+ `UpdateStatus` completed branch clobbered the preview field:

  1. updateDelegationStatus(completed, "")    fires
  2. inner recordLedgerStatus(completed, "", "")
       → SetStatus transitions dispatched → completed with preview=""
  3. outer recordLedgerStatus(completed, "", responseText)
       → SetStatus reads current=completed, status=completed
       → SAME-STATUS NO-OP, never writes responseText → preview lost

Confirmed against real Postgres (see integration test). Strict-sqlmock
unit tests passed because they pin SQL shape, not row state.

Fix: call the WITH-PREVIEW recordLedgerStatus FIRST, then
updateDelegationStatus. The inner call becomes the no-op (correctly
preserves the row written by the outer call).

Same gap fixed in UpdateStatus handler — body.ResponsePreview was
never landing in the ledger because updateDelegationStatus's nested
SetStatus(completed, "", "") fired first.

## Gate: real-Postgres integration tests + CI workflow

The unit-test-only workflow that shipped #2854 was the root cause.
Adding two layers of defense:

1. workspace-server/internal/handlers/delegation_ledger_integration_test.go
   — `//go:build integration` tag, requires INTEGRATION_DB_URL env var.
   4 tests:
     * ResultPreviewPreservedThroughCompletion (regression gate for the
       bug above — fires the production call sequence in fixed order
       and asserts row.result_preview matches)
     * ResultPreviewBuggyOrderIsLost (DIAGNOSTIC: confirms the
       same-status no-op contract works as designed; if SetStatus's
       semantics ever change, this test fires)
     * FailedTransitionCapturesErrorDetail (failure-path symmetry)
     * FullLifecycle_QueuedToDispatchedToCompleted (forward-only +
       happy path)

2. .github/workflows/handlers-postgres-integration.yml
   — required check on staging branch protection. Spins postgres:15
   service container, applies the delegations migration, runs
   `go test -tags=integration` against the live DB. Always-runs +
   per-step gating on path filter (handlers/wsauth/migrations) so
   the required-check name is satisfied on PRs that don't touch
   relevant code.

Local dev workflow (file header documents this):

  docker run --rm -d --name pg -e POSTGRES_PASSWORD=test -p 55432:5432 postgres:15-alpine
  psql ... < workspace-server/migrations/049_delegations.up.sql
  INTEGRATION_DB_URL="postgres://postgres:test@localhost:55432/molecule?sslmode=disable" \
    go test -tags=integration ./internal/handlers/ -run "^TestIntegration_"

## Why this matters

Per memory `feedback_mandatory_local_e2e_before_ship`: backend PRs
MUST verify against real Postgres before claiming done. sqlmock pins
SQL shape; only a real DB can verify row state. The workflow makes
this gate mandatory rather than optional.
This commit is contained in:
Hongming Wang 2026-05-05 02:47:52 -07:00
parent ca61213578
commit 4c9f12258d
3 changed files with 388 additions and 4 deletions

View File

@ -0,0 +1,126 @@
name: Handlers Postgres Integration
# Real-Postgres integration tests for workspace-server/internal/handlers/.
# Triggered on every PR/push that touches the handlers package.
#
# Why this workflow exists
# ------------------------
# Strict-sqlmock unit tests pin which SQL statements fire — they're fast
# and let us iterate without a DB. But sqlmock CANNOT detect bugs that
# depend on the row state AFTER the SQL runs. The result_preview-lost
# bug shipped to staging in PR #2854 because every unit test was
# satisfied with "an UPDATE statement fired" — none verified the row's
# preview field actually landed. The local-postgres E2E that retrofit
# self-review caught it took 2 minutes to set up and would have caught
# the bug at PR-time.
#
# This job spins a Postgres service container, applies the migration,
# and runs `go test -tags=integration` against a live DB. Required
# check on staging branch protection — backend handler PRs cannot
# merge without a real-DB regression gate.
#
# Cost: ~30s job (postgres pull from GH cache + go build + 4 tests).
on:
push:
branches: [main, staging]
pull_request:
branches: [main, staging]
merge_group:
types: [checks_requested]
workflow_dispatch:
concurrency:
group: handlers-pg-integ-${{ github.event.pull_request.head.sha || github.sha }}
cancel-in-progress: false
jobs:
detect-changes:
name: detect-changes
runs-on: ubuntu-latest
outputs:
handlers: ${{ steps.filter.outputs.handlers }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: dorny/paths-filter@fbd0ab8f3e69293af611ebaee6363fc25e6d187d # v4.0.1
id: filter
with:
filters: |
handlers:
- 'workspace-server/internal/handlers/**'
- 'workspace-server/internal/wsauth/**'
- 'workspace-server/migrations/**'
- '.github/workflows/handlers-postgres-integration.yml'
# Single-job-with-per-step-if pattern: always runs to satisfy the
# required-check name on branch protection; real work gates on the
# paths filter. See ci.yml's Platform (Go) for the same shape.
integration:
name: Handlers Postgres Integration
needs: detect-changes
runs-on: ubuntu-latest
services:
postgres:
image: postgres:15-alpine
env:
POSTGRES_PASSWORD: test
POSTGRES_DB: molecule
ports:
- 5432:5432
# GHA spins this with --health-cmd built in for postgres images.
options: >-
--health-cmd pg_isready
--health-interval 5s
--health-timeout 5s
--health-retries 10
defaults:
run:
working-directory: workspace-server
steps:
- if: needs.detect-changes.outputs.handlers != 'true'
working-directory: .
run: echo "No handlers/migrations changes — skipping; this job always runs to satisfy the required-check name."
- if: needs.detect-changes.outputs.handlers == 'true'
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- if: needs.detect-changes.outputs.handlers == 'true'
uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
with:
go-version: 'stable'
- if: needs.detect-changes.outputs.handlers == 'true'
name: Apply migrations to Postgres service
env:
PGPASSWORD: test
run: |
# Wait for postgres to actually accept connections (the
# GHA --health-cmd is best-effort but psql can still race).
for i in {1..15}; do
if pg_isready -h localhost -p 5432 -U postgres -q; then break; fi
echo "waiting for postgres..."; sleep 2
done
# Apply only the delegations migration — it doesn't FK to
# workspaces, so we don't need to fold in the full migration
# chain (some earlier migrations have ordering dependencies on
# tables that have since been dropped). Per-test migrations
# under handlers/ should add their schema files to this list
# as needed.
psql -h localhost -U postgres -d molecule -v ON_ERROR_STOP=1 \
-f migrations/049_delegations.up.sql
- if: needs.detect-changes.outputs.handlers == 'true'
name: Run integration tests
env:
INTEGRATION_DB_URL: postgres://postgres:test@localhost:5432/molecule?sslmode=disable
run: |
go test -tags=integration -timeout 5m -v ./internal/handlers/ -run "^TestIntegration_"
- if: needs.detect-changes.outputs.handlers == 'true' && failure()
name: Diagnostic dump on failure
env:
PGPASSWORD: test
run: |
echo "::group::delegations table state"
psql -h localhost -U postgres -d molecule -c "SELECT * FROM delegations LIMIT 50;" || true
echo "::endgroup::"

View File

@ -411,11 +411,15 @@ func (h *DelegationHandler) executeDelegation(sourceID, targetID, delegationID s
log.Printf("Delegation %s: failed to insert success log: %v", delegationID, err)
}
h.updateDelegationStatus(sourceID, delegationID, "completed", "")
// RFC #2829 #318 — mirror result_preview to the durable ledger
// (updateDelegationStatus handles the status flip; ledger gets the
// preview field set on the same row).
// RFC #2829 #318: write the ledger row with result_preview FIRST,
// THEN updateDelegationStatus. Order matters: SetStatus has a
// same-status replay no-op — if updateDelegationStatus's nested
// recordLedgerStatus(completed, "", "") fires first, the outer call
// hits the no-op branch and result_preview is never written.
// Caught by the local-Postgres integration test in
// delegation_ledger_integration_test.go.
recordLedgerStatus(ctx, delegationID, "completed", "", responseText)
h.updateDelegationStatus(sourceID, delegationID, "completed", "")
h.broadcaster.RecordAndBroadcast(ctx, "DELEGATION_COMPLETE", sourceID, map[string]interface{}{
"delegation_id": delegationID,
"target_id": targetID,
@ -534,6 +538,13 @@ func (h *DelegationHandler) UpdateStatus(c *gin.Context) {
return
}
// RFC #2829 #318 — same ordering pin as executeDelegation completion:
// write the with-preview ledger row FIRST so updateDelegationStatus's
// inner same-status no-op doesn't clobber preview.
if body.Status == "completed" {
recordLedgerStatus(ctx, delegationID, "completed", "", body.ResponsePreview)
}
h.updateDelegationStatus(sourceID, delegationID, body.Status, body.Error)
if body.Status == "completed" {

View File

@ -0,0 +1,247 @@
//go:build integration
// +build integration
// delegation_ledger_integration_test.go — REAL Postgres integration tests
// for the RFC #2829 ledger writes.
//
// Run with:
//
// docker run --rm -d --name pg-integration \
// -e POSTGRES_PASSWORD=test -e POSTGRES_DB=molecule \
// -p 55432:5432 postgres:15-alpine
// sleep 4
// psql ... < workspace-server/migrations/049_delegations.up.sql
// cd workspace-server
// INTEGRATION_DB_URL="postgres://postgres:test@localhost:55432/molecule?sslmode=disable" \
// go test -tags=integration ./internal/handlers/ -run Integration_
//
// CI (.github/workflows/handlers-postgres-integration.yml) runs this on
// every PR that touches workspace-server/internal/handlers/**.
//
// Why these are NOT plain unit tests
// ----------------------------------
// The strict-sqlmock unit tests in delegation_ledger_writes_test.go pin
// which SQL statements fire — they are fast and let us iterate without
// a DB. But sqlmock CANNOT detect bugs that depend on the ROW STATE
// after the SQL runs. The result_preview-lost bug shipped to staging in
// PR #2854 because every unit test was satisfied with "an UPDATE
// statement fired" — none verified the row's preview field landed.
//
// These integration tests close that gap by booting a real Postgres,
// running the production helpers, and SELECTing the row to verify the
// observable state matches the expected outcome.
package handlers
import (
"context"
"database/sql"
"os"
"testing"
mdb "github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
_ "github.com/lib/pq"
)
// integrationDB returns the configured integration-test connection or
// skips the test if INTEGRATION_DB_URL is unset. Local devs run the
// docker-postgres incantation in the file header; CI's workflow sets the
// env var via a service container.
func integrationDB(t *testing.T) *sql.DB {
t.Helper()
url := os.Getenv("INTEGRATION_DB_URL")
if url == "" {
t.Skip("INTEGRATION_DB_URL not set; skipping (local devs: see file header)")
}
conn, err := sql.Open("postgres", url)
if err != nil {
t.Fatalf("open: %v", err)
}
if err := conn.Ping(); err != nil {
t.Fatalf("ping: %v", err)
}
// Each test gets a fresh table state — fail loud if cleanup fails so
// a bad test doesn't pollute the next one.
if _, err := conn.ExecContext(context.Background(), `DELETE FROM delegations`); err != nil {
t.Fatalf("cleanup: %v", err)
}
// Wire the package-level db.DB so production helpers (recordLedgerInsert,
// recordLedgerStatus) see the same connection.
prev := mdb.DB
mdb.DB = conn
t.Cleanup(func() {
mdb.DB = prev
conn.Close()
})
return conn
}
// readLedgerRow returns (status, result_preview, error_detail) for the
// given delegation_id, or fails the test on miss.
func readLedgerRow(t *testing.T, conn *sql.DB, id string) (status, preview, errorDetail string) {
t.Helper()
var prev, errDet sql.NullString
err := conn.QueryRowContext(context.Background(),
`SELECT status, result_preview, error_detail FROM delegations WHERE delegation_id = $1`, id,
).Scan(&status, &prev, &errDet)
if err != nil {
t.Fatalf("readLedgerRow(%s): %v", id, err)
}
return status, prev.String, errDet.String
}
// TestIntegration_ResultPreviewPreservedThroughCompletion is the
// regression gate for the bug that shipped in PR #2854 + was caught in
// self-review: when both the inner SetStatus(completed, "", "") (from
// updateDelegationStatus) and an outer SetStatus(completed, "", preview)
// fire, the SECOND one is a same-status no-op — order matters.
//
// The fix in delegation.go calls the WITH-PREVIEW SetStatus FIRST so the
// outer write lands the preview, and the inner becomes the no-op.
//
// This test fires the call sequence in the corrected order and asserts
// the row's result_preview matches.
//
// If a future refactor reverses the order, this test fails on a real
// Postgres — which sqlmock would have missed.
func TestIntegration_ResultPreviewPreservedThroughCompletion(t *testing.T) {
conn := integrationDB(t)
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
id := "integ-deleg-preview-1"
caller := "11111111-1111-1111-1111-111111111111"
callee := "22222222-2222-2222-2222-222222222222"
expectedPreview := "the long-running task's final answer"
// Mirror the production call sequence the FIXED code path uses.
// executeDelegation flow:
// 1. insertDelegationRow → recordLedgerInsert (status=queued)
// 2. updateDelegationStatus("dispatched", "") at the start of execute,
// so the row is at status=dispatched by completion time
// 3. recordLedgerStatus("completed", "", preview) ← outer FIRST (the fix)
// 4. updateDelegationStatus("completed", "") inside, which calls
// recordLedgerStatus("completed", "", "") ← inner same-status no-op
recordLedgerInsert(context.Background(), caller, callee, id, "the question", "")
recordLedgerStatus(context.Background(), id, "dispatched", "", "")
recordLedgerStatus(context.Background(), id, "completed", "", expectedPreview)
recordLedgerStatus(context.Background(), id, "completed", "", "")
status, preview, errDet := readLedgerRow(t, conn, id)
if status != "completed" {
t.Errorf("status: want completed, got %q", status)
}
if preview != expectedPreview {
t.Errorf("result_preview lost: want %q, got %q", expectedPreview, preview)
}
if errDet != "" {
t.Errorf("error_detail should be empty: got %q", errDet)
}
}
// TestIntegration_ResultPreviewBuggyOrderIsLost — DIAGNOSTIC test that
// confirms the ORIGINAL buggy order does lose the preview. Useful when
// auditing similar wiring elsewhere.
//
// This is documented behavior: it asserts the same-status replay no-op
// works as designed in DelegationLedger.SetStatus. The fix in
// delegation.go is to AVOID this order, not to change SetStatus's
// same-status semantics (which the operator dashboard relies on for
// idempotent completion notifications).
func TestIntegration_ResultPreviewBuggyOrderIsLost(t *testing.T) {
conn := integrationDB(t)
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
id := "integ-deleg-preview-2"
caller := "11111111-1111-1111-1111-111111111111"
callee := "22222222-2222-2222-2222-222222222222"
// BUGGY sequence in production-shape order: queued → dispatched →
// completed (no preview) → completed (preview ignored as same-status).
recordLedgerInsert(context.Background(), caller, callee, id, "the question", "")
recordLedgerStatus(context.Background(), id, "dispatched", "", "") // pre-completion stage
recordLedgerStatus(context.Background(), id, "completed", "", "") // inner first
recordLedgerStatus(context.Background(), id, "completed", "", "the answer") // outer same-status no-op
_, preview, _ := readLedgerRow(t, conn, id)
if preview != "" {
t.Errorf("buggy-order preview was unexpectedly non-empty: %q (SetStatus same-status no-op contract may have changed)", preview)
}
}
// TestIntegration_FailedTransitionCapturesErrorDetail — error_detail is
// the failure-path equivalent of result_preview. The legacy path calls
// SetStatus(failed, errorDetail, "") via updateDelegationStatus; no
// outer call exists today (no observed bug). This test pins that
// error_detail lands as expected, so a future refactor adding an outer
// call must consciously preserve the field — same lesson as the preview
// bug, just on the failure path.
func TestIntegration_FailedTransitionCapturesErrorDetail(t *testing.T) {
conn := integrationDB(t)
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
id := "integ-deleg-fail-1"
caller := "11111111-1111-1111-1111-111111111111"
callee := "22222222-2222-2222-2222-222222222222"
expectedError := "callee unreachable: connection refused"
// queued → failed is allowed by allowedTransitions (the failure-on-
// dispatch case) so this exercises a real production path.
recordLedgerInsert(context.Background(), caller, callee, id, "the question", "")
recordLedgerStatus(context.Background(), id, "failed", expectedError, "")
status, preview, errDet := readLedgerRow(t, conn, id)
if status != "failed" {
t.Errorf("status: want failed, got %q", status)
}
if errDet != expectedError {
t.Errorf("error_detail: want %q, got %q", expectedError, errDet)
}
if preview != "" {
t.Errorf("result_preview should be empty on failure: got %q", preview)
}
}
// TestIntegration_FullLifecycle_QueuedToDispatchedToCompleted — pins the
// happy-path lifecycle. INSERT lands the row at queued; SetStatus moves
// it through dispatched and into completed with preview. After the
// terminal transition, no further state change is possible via
// SetStatus (forward-only protection).
func TestIntegration_FullLifecycle_QueuedToDispatchedToCompleted(t *testing.T) {
conn := integrationDB(t)
t.Setenv("DELEGATION_LEDGER_WRITE", "1")
id := "integ-deleg-lifecycle-1"
caller := "11111111-1111-1111-1111-111111111111"
callee := "22222222-2222-2222-2222-222222222222"
recordLedgerInsert(context.Background(), caller, callee, id, "task body", "")
if status, _, _ := readLedgerRow(t, conn, id); status != "queued" {
t.Errorf("after Insert: status want queued, got %q", status)
}
recordLedgerStatus(context.Background(), id, "dispatched", "", "")
if status, _, _ := readLedgerRow(t, conn, id); status != "dispatched" {
t.Errorf("after dispatched: status want dispatched, got %q", status)
}
recordLedgerStatus(context.Background(), id, "completed", "", "the result")
status, preview, _ := readLedgerRow(t, conn, id)
if status != "completed" {
t.Errorf("after completed: status want completed, got %q", status)
}
if preview != "the result" {
t.Errorf("preview after completed: want %q, got %q", "the result", preview)
}
// Forward-only: trying to revise to failed should silently no-op
// (recordLedgerStatus swallows ErrInvalidTransition).
recordLedgerStatus(context.Background(), id, "failed", "post-hoc revision", "")
status, preview, errDet := readLedgerRow(t, conn, id)
if status != "completed" {
t.Errorf("forward-only broken: status changed to %q", status)
}
if preview != "the result" {
t.Errorf("preview clobbered by failed revision: %q", preview)
}
if errDet != "" {
t.Errorf("error_detail clobbered by failed revision: %q", errDet)
}
}