molecule-core/workspace-server/cmd/memory-backfill/verify.go
claude-ceo-assistant 3501e6bfd7
fix(post-suspension): vanity import paths go.moleculesai.app/core/{platform,tests/harness/cp-stub} (closes molecule-ai/internal#71 phase 2)
Migrates the two Go modules under molecule-core off the dead
github.com/Molecule-AI/molecule-monorepo/... identity onto the vanity
host go.moleculesai.app. Also fixes the historical naming
inconsistency where the Gitea repo is molecule-core but the Go module
path said molecule-monorepo.

Module changes:
- workspace-server/go.mod:
    github.com/Molecule-AI/molecule-monorepo/platform
    -> go.moleculesai.app/core/platform
- tests/harness/cp-stub/go.mod:
    github.com/Molecule-AI/molecule-monorepo/tests/harness/cp-stub
    -> go.moleculesai.app/core/tests/harness/cp-stub

Surfaces touched
- 174 *.go files (374 import lines) — every import under
  workspace-server/ + tests/harness/cp-stub/
- 2 Dockerfiles (workspace-server/Dockerfile + Dockerfile.tenant) —
  -ldflags strings updated in lockstep with the module rename so
  buildinfo.GitSHA injection still resolves correctly
- README, docs, scripts, and URLs in comments rewritten to the
  git.moleculesai.app form
- NEW workspace-server/internal/lint/import_path_lint_test.go:
  a structural lint gate that rejects future github.com/Molecule-AI/ or
  Molecule-AI/molecule-monorepo references. It uses the same template as
  the other migration PRs (plugin-gh-identity#3, molecule-cli#2,
  molecule-controlplane#32).
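The import half of such a gate can be sketched with the standard library's `go/parser`, which can parse just a file's import block. This is a minimal illustration under stated assumptions, not the contents of import_path_lint_test.go: the function `legacyImports` and the `legacyPrefixes` list are hypothetical, and the real gate also scans non-Go files such as .md, which this sketch does not.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// legacyPrefixes lists import-path prefixes to reject.
// Hypothetical list for illustration; the real gate's patterns may differ.
var legacyPrefixes = []string{
	"github.com/Molecule-AI/",
}

// legacyImports parses only the import declarations of one Go source
// file and returns every import path that starts with a legacy prefix.
func legacyImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, spec := range f.Imports {
		// spec.Path.Value is the quoted string literal, e.g. "\"fmt\"".
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, p := range legacyPrefixes {
			if strings.HasPrefix(path, p) {
				bad = append(bad, path)
				break
			}
		}
	}
	return bad, nil
}

func main() {
	src := `package x

import (
	"fmt"
	"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
	"go.moleculesai.app/core/platform/internal/textutil"
)
`
	bad, err := legacyImports("x.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(bad)
}
```

Parsing import specs rather than grepping source avoids false hits in string literals, but references in Markdown, scripts, and go.mod still need a plain-text scan.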

Cross-repo dep allowlist (documented in lint gate)
workspace-server requires molecule-ai-plugin-gh-identity, whose own
vanity migration is PR molecule-ai-plugin-gh-identity#3. Until that PR
merges and a tag is cut at go.moleculesai.app/plugin/gh-identity, the
two locations that reference the legacy github.com path
(workspace-server/go.mod require, cmd/server/main.go import) remain
allowlisted. A follow-up PR will drop the allowlist and update both
references in one shot once gh-identity is fully migrated.

Test plan
- go build ./... clean for both modules
- go test ./... green except for two pre-existing failures:
  TestStartSweeper_RecordsMetricsOnSuccess (flaky when run as part of
  the full suite) and TestLocalResolver_BubblesUpCopyFailure (relies on
  read-only filesystem permissions, but the operator host runs tests as
  root). Both reproduce identically on baseline main before the
  migration, so neither is a regression from this PR
- Mutation-tested: lint gate fails on canaries in .go + .md;
  allowlist correctly suppresses cross-repo dep references in go.mod
  while still flagging unrelated additions

Open dependency
- The go.moleculesai.app responder must be deployed before external
  builds starting from a fresh clone can resolve the vanity path.
  Existing CI and Docker builds rely on the pinned go.sum and the
  self-referential module path, so the responder is not on the critical
  path for them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 22:37:42 +00:00

197 lines
5.9 KiB
Go

package main

// verify.go: post-apply parity check.
//
// After a backfill -apply, run with -verify to confirm the migration
// actually produced equivalent data. Picks up to `SampleSize` random
// workspaces, reads each one's agent_memories rows directly, runs a
// plugin search over the workspace's readable namespaces, and compares
// the two result sets by content.
//
// The comparison is best-effort: Postgres's recent-first ordering and
// the plugin's internal ordering may differ, so results are compared as
// multisets, not lists. The check is one-directional multiset
// containment: every legacy row must map to a distinct plugin row with
// identical content, while extra plugin rows are tolerated (see
// verifyParity). IDs are not compared; the backfill preserves them via
// the C1 idempotency key.

import (
	"context"
	"database/sql"
	"fmt"
	"math/rand"
	"os"

	"go.moleculesai.app/core/platform/internal/memory/contract"
	"go.moleculesai.app/core/platform/internal/textutil"
)

// verifyConfig is the typed dependency bundle for verifyParity.
type verifyConfig struct {
	DB          *sql.DB
	Plugin      verifyPlugin
	Resolver    verifyResolver
	SampleSize  int
	WorkspaceID string // optional: limit to one workspace
	Rand        *rand.Rand
}

// verifyPlugin is the slice of memory-plugin client we call.
type verifyPlugin interface {
	Search(ctx context.Context, body contract.SearchRequest) (*contract.SearchResponse, error)
}

// verifyResolver mirrors namespace.Resolver. Same shape as
// backfillResolver but kept distinct so verify isn't tied to
// backfill's interface.
type verifyResolver interface {
	ReadableNamespaces(ctx context.Context, workspaceID string) ([]ResolvedNamespace, error)
}

// ResolvedNamespace is the minimum we need from the resolver. It is kept
// separate so the verify code doesn't depend on the namespace package
// (the live tests inject stubs, the binary uses an adapter).
type ResolvedNamespace struct {
	Name string
}

// verifyReport accumulates the per-workspace results.
type verifyReport struct {
	WorkspacesSampled int
	Matches           int
	Mismatches        int
	Errors            int
}

// verifyParity is the workhorse. It returns a report; the CLI converts
// any non-zero mismatches/errors into a non-zero exit so CI can gate
// the cutover.
func verifyParity(ctx context.Context, cfg verifyConfig, stdout *os.File) (*verifyReport, error) {
	report := &verifyReport{}
	rng := cfg.Rand
	if rng == nil {
		rng = rand.New(rand.NewSource(42)) //nolint:gosec // determinism > unpredictability for ops
	}

	wsIDs, err := pickWorkspaceSample(ctx, cfg.DB, cfg.WorkspaceID, cfg.SampleSize, rng)
	if err != nil {
		return report, fmt.Errorf("pick sample: %w", err)
	}

	for _, wsID := range wsIDs {
		report.WorkspacesSampled++

		legacy, err := queryLegacyMemories(ctx, cfg.DB, wsID)
		if err != nil {
			fmt.Fprintf(stdout, "[err] workspace=%s legacy query: %v\n", wsID, err)
			report.Errors++
			continue
		}

		readable, err := cfg.Resolver.ReadableNamespaces(ctx, wsID)
		if err != nil {
			fmt.Fprintf(stdout, "[err] workspace=%s resolve: %v\n", wsID, err)
			report.Errors++
			continue
		}
		nsList := make([]string, len(readable))
		for i, ns := range readable {
			nsList[i] = ns.Name
		}

		if len(nsList) == 0 {
			// No readable namespaces: the plugin can return nothing, so
			// parity holds only if the legacy table is empty too.
			if len(legacy) == 0 {
				report.Matches++
			} else {
				fmt.Fprintf(stdout, "[mismatch] workspace=%s legacy=%d plugin=0 (no readable namespaces)\n", wsID, len(legacy))
				report.Mismatches++
			}
			continue
		}

		// Assuming Limit caps the result set, a workspace with more than
		// 100 readable memories (its own rows plus any team-shared ones)
		// can surface as a false mismatch, because the legacy query
		// above is unbounded.
		resp, err := cfg.Plugin.Search(ctx, contract.SearchRequest{Namespaces: nsList, Limit: 100})
		if err != nil {
			fmt.Fprintf(stdout, "[err] workspace=%s plugin search: %v\n", wsID, err)
			report.Errors++
			continue
		}

		pluginContents := make(map[string]int, len(resp.Memories))
		for _, m := range resp.Memories {
			pluginContents[m.Content]++
		}

		// Compare as multisets: each legacy row must consume a distinct
		// plugin row with the same content, so N duplicate legacy rows
		// need at least N matching plugin rows. We deliberately tolerate
		// the plugin having MORE rows (the namespace might include
		// team-shared memories from sibling workspaces that aren't in
		// this workspace's agent_memories rows).
		matched := true
		for _, c := range legacy {
			if pluginContents[c] == 0 {
				fmt.Fprintf(stdout, "[mismatch] workspace=%s missing-from-plugin content=%q\n", wsID, textutil.TruncateBytes(c, 80))
				matched = false
				break
			}
			pluginContents[c]--
		}
		if matched {
			report.Matches++
		} else {
			report.Mismatches++
		}
	}
	return report, nil
}

// pickWorkspaceSample returns up to n workspace UUIDs. If workspaceID
// is set, it returns only that one. Otherwise it selects n random
// non-removed workspaces from the workspaces table (TABLESAMPLE would be
// nicer, but SYSTEM/BERNOULLI sampling has surprising distribution
// properties for small populations, so we just ORDER BY random() LIMIT).
//
// The *rand.Rand argument is currently unused: the choice is made by
// Postgres random(), so the sample is not reproducible across runs even
// when the caller supplies a seeded source.
func pickWorkspaceSample(ctx context.Context, db *sql.DB, workspaceID string, n int, _ *rand.Rand) ([]string, error) {
	if workspaceID != "" {
		return []string{workspaceID}, nil
	}
	rows, err := db.QueryContext(ctx, `
		SELECT id::text
		FROM workspaces
		WHERE status != 'removed'
		ORDER BY random()
		LIMIT $1
	`, n)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	out := make([]string, 0, n)
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		out = append(out, id)
	}
	return out, rows.Err()
}

// queryLegacyMemories pulls all agent_memories rows for a workspace,
// newest first (LOCAL + TEAM scopes: what the plugin search would return
// through the resolver's readable list, mapped via PR-6 shim semantics).
func queryLegacyMemories(ctx context.Context, db *sql.DB, workspaceID string) ([]string, error) {
	rows, err := db.QueryContext(ctx, `
		SELECT content
		FROM agent_memories
		WHERE workspace_id = $1
		ORDER BY created_at DESC
	`, workspaceID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	out := []string{}
	for rows.Next() {
		var c string
		if err := rows.Scan(&c); err != nil {
			return nil, err
		}
		out = append(out, c)
	}
	return out, rows.Err()
}

// truncation moved to internal/textutil.TruncateBytes (#2962 SSOT).
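verifyParity's comparison reduces to one-directional multiset containment. The standalone sketch below isolates that count-and-decrement idea so it can be exercised without a database or plugin; `coveredBy` is a hypothetical name and is not part of this file.

```go
package main

import "fmt"

// coveredBy reports whether every element of legacy can be paired with a
// distinct, equal element of plugin (multiset containment). Extra plugin
// elements are allowed. On failure it returns the first unmatched item.
func coveredBy(legacy, plugin []string) (missing string, ok bool) {
	counts := make(map[string]int, len(plugin))
	for _, c := range plugin {
		counts[c]++
	}
	for _, c := range legacy {
		if counts[c] == 0 {
			return c, false
		}
		counts[c]-- // consume one plugin copy so duplicates need duplicates
	}
	return "", true
}

func main() {
	plugin := []string{"a", "b", "team-note"}
	for _, legacy := range [][]string{{"b", "a"}, {"a", "a"}} {
		missing, ok := coveredBy(legacy, plugin)
		fmt.Printf("legacy=%v ok=%v missing=%q\n", legacy, ok, missing)
	}
}
```

The first case passes even though order differs and the plugin holds an extra team-shared row; the second fails because the second legacy `"a"` has no unused plugin copy left to pair with.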