From 2d783b5ca64022894b760977b4e143d502d1eaf1 Mon Sep 17 00:00:00 2001 From: Hongming Wang Date: Mon, 4 May 2026 09:08:28 -0700 Subject: [PATCH 1/4] Memory v2 docs update: idempotency key + verify mode + cutover runbook MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Updates plugin-author and operator docs to reflect the four fixup PRs (C1, C2, I1, I4) for self-review findings. Stacked on C1+C2 so the docs reference behavior that lands in the same wave; rebases to staging once those merge. What changes: * docs/memory-plugins/README.md - New "Memory idempotency" section explaining MemoryWrite.id contract: omit → plugin generates UUID; supplied → upsert - "Replacing the built-in plugin" rewritten as a 6-step operator runbook with concrete commands for -dry-run / -apply / -verify / MEMORY_V2_CUTOVER, including the failure path ("if -verify reports mismatches, do not flip the cutover flag") - Added link to new CHANGELOG.md * docs/memory-plugins/testing-your-plugin.md - New TestMyPlugin_IDIsIdempotencyKey example: write same id twice, assert single row + updated content - "What the harness does NOT cover" expanded with two new operational gates: backfill twice → no double; verify-mode reports zero mismatches * docs/memory-plugins/pinecone-example/README.md - Wire-mapping table updated: id (caller-supplied) → Pinecone vector id (upsert); id (omitted) → plugin-generated UUID - Production-hardening checklist gained an idempotency-key item * docs/memory-plugins/CHANGELOG.md (new) - Captures the four fixup PRs in one place with severity-ordered summary, plugin-author action items, and remaining open follow-ups (#289, #291, #293) for transparency No code changes. Docs-only PR. --- docs/memory-plugins/CHANGELOG.md | 113 ++++++++++++++++++ docs/memory-plugins/README.md | 74 ++++++++++-- .../memory-plugins/pinecone-example/README.md | 12 +- docs/memory-plugins/testing-your-plugin.md | 69 +++++++++++ 4 files changed, 258 insertions(+), 10 deletions(-) create mode 100644 docs/memory-plugins/CHANGELOG.md diff --git a/docs/memory-plugins/CHANGELOG.md b/docs/memory-plugins/CHANGELOG.md new file mode 100644 index 00000000..a811620b --- /dev/null +++ b/docs/memory-plugins/CHANGELOG.md @@ -0,0 +1,113 @@ +# Memory Plugin Contract — Changelog + +Every breaking or operationally-relevant change to the v1 plugin +contract or the workspace-server-side wiring lands here. Plugin +authors should subscribe to PRs touching this file. + +## [Unreleased] — fixup wave 1 (post-RFC-#2728 self-review) + +A self-review of the initial 11-PR rollout (PRs #2729-#2742) flagged +two correctness bugs and three operational hazards. This wave fixes +all of them. Order matches operator-impact severity. + +### Critical: backfill idempotency via `MemoryWrite.id` (#2744) + +**The bug.** The backfill CLI claimed idempotent on re-run, but +`gen_random_uuid()` in the plugin's INSERT meant every retry created +a fresh row. Operators retrying a failed `-apply` would silently +double their memory count. + +**The fix.** Optional `id` field on `MemoryWrite`. When supplied, +plugins MUST upsert. The backfill now forwards `agent_memories.id` +to `MemoryWrite.id`, so retries update in place. + +**Plugin author action.** If your plugin uses +`INSERT INTO ... DEFAULT gen_random_uuid()`, switch to +`INSERT ... ON CONFLICT (id) DO UPDATE` when `id` is set. The wire +contract is forward-compatible — plugins that ignore the field still +work for production agent commits (which leave `id` empty), but they +will silently corrupt backfill retries. + +### Critical: `memory-backfill -verify` mode (#2747) + +**The miss.** The original PR-7 task spec called for a parity-check +mode but it never landed. Operators had no way to confirm a +migration succeeded short of "no errors logged." + +**The fix.** New `-verify` flag samples N workspaces, queries +`agent_memories` direct, runs an equivalent plugin search via the +namespace resolver, multiset-compares contents. Reports mismatches +to stdout and exits non-zero so CI can gate the cutover. + +```bash +memory-backfill -verify # default sample 50 +memory-backfill -verify -verify-sample=200 # bigger +memory-backfill -verify -workspace= # one workspace +``` + +### Important: `expires_at` validation (#2746) + +**The bug.** `commit_memory_v2` silently dropped malformed +`expires_at` strings. Agent passes `expires_at: "tomorrow"`, gets a +200, memory has no TTL — agent thinks it set a TTL, didn't. + +**The fix.** Returns +`fmt.Errorf("invalid expires_at: must be RFC3339")` on parse +failure. Plugin is not called in this case. + +**Plugin author action.** None — this is a workspace-server-side +fix. But: if your plugin advertises the `ttl` capability, make sure +you actually evict expired rows on read (not just on a janitor cron +that runs once a day). The harness in `testing-your-plugin.md` has +a TTL-eviction test you should run. + +### Important: audit log JSON via `json.Marshal` (#2746) + +**The bug.** `auditOrgWrite` built `activity_logs.metadata` via +`fmt.Sprintf` with `%q`. For ASCII (today's UUID + hex digest) this +coincidentally produces valid JSON; for unicode or control bytes it +silently produces non-JSON. + +**The fix.** Replaced with `json.Marshal(map[string]string{...})`. +Same wire shape today, won't regress when metadata grows. + +**Plugin author action.** None — workspace-server-internal. + +### Operator action: staging verification (#292) + +**Status.** Tracked as task #292. PR-merged ≠ verified. Operator +must: +1. Provision a staging tenant, set `MEMORY_PLUGIN_URL` +2. Run real `commit_memory_v2` from a workspace +3. `memory-backfill -dry-run` against staging data +4. `memory-backfill -apply`, then `-verify` +5. Set `MEMORY_V2_CUTOVER=true`, verify admin export still works +6. Run a legacy `commit_memory` from a workspace, verify it lands + in plugin storage via the PR-6 shim + +### Other follow-ups still open + +- **#289**: admin export O(workspaces) → O(namespaces) — N+1 pattern + in `exportViaPlugin` (1000-workspace tenants run 1000× resolver + CTEs + 1000× plugin searches today). +- **#291**: workspace deletion must call `DELETE + /v1/namespaces/{name}` — orphans accumulate today. +- **#293**: real-subprocess boot E2E — current PR-11 is integration + (httptest + sqlmock), not E2E. + +These are tracked but deferred; they're operationally annoying, not +incident-shaped. + +## [v1.0.0] — initial release (RFC #2728, PRs #2729-#2742) + +Initial plugin contract + 11-PR rollout. See +[issue #2728](https://github.com/Molecule-AI/molecule-core/issues/2728) +for the full RFC. + +Endpoints: `/v1/health`, `/v1/namespaces/{name}` (PUT/PATCH/DELETE), +`/v1/namespaces/{name}/memories` (POST), `/v1/search` (POST), +`/v1/memories/{id}` (DELETE). + +Capabilities: `embedding`, `fts`, `ttl`, `pin`, `propagation`. + +Operator runbook: see [README.md § Replacing the built-in plugin](README.md#replacing-the-built-in-plugin). diff --git a/docs/memory-plugins/README.md b/docs/memory-plugins/README.md index f790787e..c950acd1 100644 --- a/docs/memory-plugins/README.md +++ b/docs/memory-plugins/README.md @@ -54,6 +54,26 @@ security perimeter: defines - `/v1/health` reporting your supported capabilities (see below) - Idempotency on namespace upsert (PUT semantics, not POST) +- Idempotency on memory commit when `MemoryWrite.id` is supplied + (see "Memory idempotency" below) + +## Memory idempotency + +`MemoryWrite.id` is optional. Two contracts to honor: + +| Caller passes | Plugin MUST | +|---|---| +| `id` omitted | Generate a fresh UUID, return it in the response | +| `id` set | Upsert keyed on this id — if a row with that id already exists, UPDATE it in place rather than inserting a duplicate | + +The backfill CLI (`memory-backfill`) relies on the upsert behavior +so retries don't duplicate rows. Production agent commits leave `id` +empty and rely on the plugin's UUID generator — the hot path is +unchanged. + +The built-in postgres plugin implements this with `INSERT ... ON +CONFLICT (id) DO UPDATE`. A vector-DB plugin (e.g., Pinecone) would +use the database's native upsert primitive on the same id. ## Capability negotiation @@ -99,16 +119,51 @@ network. workspace-server is the only sanctioned client. ## Replacing the built-in plugin -1. Apply [PR-7's backfill](../../workspace-server/cmd/memory-backfill/) to - copy `agent_memories` into your plugin's storage. -2. Stop workspace-server, point `MEMORY_PLUGIN_URL` at your plugin, - restart. -3. Existing data in the postgres plugin's tables is **not auto- - dropped** — that's a deliberate safety property. Operator drops - manually after they're confident they don't want to switch back. +This is the canonical operator runbook for swapping the default +plugin out. The same sequence applies whether you're swapping for +another postgres plugin variant, Pinecone, Letta, or a custom +implementation. -If you switch back later, the old postgres tables come back into use -(no data loss). +1. **Stand up the new plugin.** Deploy the binary/container, confirm + it boots, confirm `/v1/health` returns `ok` with the capability + list you expect. + +2. **Run the backfill in dry-run mode** to scope the migration: + ```bash + DATABASE_URL=postgres://... \ + MEMORY_PLUGIN_URL=http://your-plugin:9100 \ + memory-backfill -dry-run + ``` + Reports row count + namespace mapping per workspace, no writes. + +3. **Apply the backfill:** + ```bash + memory-backfill -apply + ``` + Idempotent on retry — the backfill passes each `agent_memories.id` + to `MemoryWrite.id`, so partial-then-full re-runs upsert in place. + +4. **Verify parity** before flipping the cutover flag: + ```bash + memory-backfill -verify -verify-sample=200 + ``` + Random-samples N workspaces, diffs `agent_memories` direct query + against plugin search via the workspace's readable namespaces. + Reports mismatches and exits non-zero if any are found — wire + into your CI to gate the cutover. + +5. **Flip the cutover flag.** Set `MEMORY_V2_CUTOVER=true` on + workspace-server and restart. Admin export/import now route + through the plugin; legacy `agent_memories` becomes read-only. + +6. **Existing data in the old plugin's tables is NOT auto-dropped.** + Deliberate safety property — operator drops manually after the + ~60-day grace window. If you switch back later, old data comes + back into use (no loss). + +If `-verify` reports mismatches, do NOT set `MEMORY_V2_CUTOVER` — +inspect the output, re-run `-apply` to backfill missing rows (it +upserts, so this is safe), and re-verify. ## Worked examples @@ -130,6 +185,7 @@ Write a fresh plugin if: ## See also +- [`CHANGELOG.md`](CHANGELOG.md) — contract revisions and fixup waves - RFC #2728 — design rationale - [`cmd/memory-plugin-postgres/`](../../workspace-server/cmd/memory-plugin-postgres/) — reference implementation - [`docs/api-protocol/memory-plugin-v1.yaml`](../api-protocol/memory-plugin-v1.yaml) — full OpenAPI spec diff --git a/docs/memory-plugins/pinecone-example/README.md b/docs/memory-plugins/pinecone-example/README.md index ddc6ead5..9a76cd55 100644 --- a/docs/memory-plugins/pinecone-example/README.md +++ b/docs/memory-plugins/pinecone-example/README.md @@ -29,7 +29,8 @@ are different: | Contract field | Pinecone shape | |---|---| | `namespace` | `namespace` (Pinecone's first-class concept) | -| `id` | `id` | +| `id` (caller-supplied) | `id` (Pinecone vector id; plugin upserts on this) | +| `id` (omitted) | Plugin generates `uuid.NewString()` before upsert | | `content` | metadata.text | | `embedding` | `values` | | `kind` / `source` / `pin` / `expires_at` | `metadata.{kind, source, pin, expires_at}` | @@ -38,6 +39,12 @@ are different: The contract's `expires_at` becomes a metadata field; a separate janitor cron periodically queries `expires_at < now` and deletes. +Pinecone's native upsert is the right fit for the idempotency-key +contract: passing the same `id` twice updates in place. So a +Pinecone plugin gets idempotent backfill retries "for free" if it +just forwards `MemoryWrite.id` (or its generated UUID) to the +upsert call. + ## Skeleton ```go @@ -103,6 +110,9 @@ A production-ready Pinecone plugin would add: - **Connection pooling**: keep one Pinecone client alive across requests - **Retry + circuit breaker**: Pinecone occasionally returns 5xx - **Metrics**: latency histograms per endpoint, write/read counters +- **Idempotency-key handling**: when `MemoryWrite.id` is supplied, + forward it as the Pinecone vector id verbatim; otherwise generate + one. Pinecone's `Upsert` is naturally idempotent on id match. But the mapping above is the load-bearing part — the rest is operational hardening, not contract-specific. diff --git a/docs/memory-plugins/testing-your-plugin.md b/docs/memory-plugins/testing-your-plugin.md index a858c4a3..0b7df8e6 100644 --- a/docs/memory-plugins/testing-your-plugin.md +++ b/docs/memory-plugins/testing-your-plugin.md @@ -77,6 +77,68 @@ func TestMyPlugin_FullRoundTrip(t *testing.T) { } ``` +## Testing idempotency + +The contract requires that `MemoryWrite.id`, when supplied, behaves +as an upsert key. The backfill CLI relies on this — without it, +operator retries silently duplicate every memory. + +```go +func TestMyPlugin_IDIsIdempotencyKey(t *testing.T) { + pluginURL := startMyPlugin(t) + cl := mclient.New(mclient.Config{BaseURL: pluginURL}) + if _, err := cl.UpsertNamespace(context.Background(), "workspace:test-1", + contract.NamespaceUpsert{Kind: contract.NamespaceKindWorkspace}); err != nil { + t.Fatal(err) + } + + fixedID := "11111111-2222-3333-4444-555555555555" + + // First write with a specific id. + resp1, err := cl.CommitMemory(context.Background(), "workspace:test-1", + contract.MemoryWrite{ + ID: fixedID, + Content: "first version", + Kind: contract.MemoryKindFact, + Source: contract.MemorySourceAgent, + }) + if err != nil { + t.Fatalf("first commit: %v", err) + } + if resp1.ID != fixedID { + t.Errorf("plugin must echo the supplied id, got %q", resp1.ID) + } + + // Second write with the same id — must update, not insert. + if _, err := cl.CommitMemory(context.Background(), "workspace:test-1", + contract.MemoryWrite{ + ID: fixedID, + Content: "second version (updated)", + Kind: contract.MemoryKindFact, + Source: contract.MemorySourceAgent, + }); err != nil { + t.Fatalf("second commit: %v", err) + } + + // Search must return exactly one row, with the updated content. + sresp, _ := cl.Search(context.Background(), contract.SearchRequest{ + Namespaces: []string{"workspace:test-1"}, + }) + matches := 0 + for _, m := range sresp.Memories { + if m.ID == fixedID { + matches++ + if m.Content != "second version (updated)" { + t.Errorf("upsert didn't update content: got %q", m.Content) + } + } + } + if matches != 1 { + t.Errorf("upsert produced %d rows for id=%s, want 1", matches, fixedID) + } +} +``` + ## What the harness does NOT cover - **Capability accuracy**: if you list `embedding` you must actually @@ -88,6 +150,13 @@ func TestMyPlugin_FullRoundTrip(t *testing.T) { no IDs collide. - **Recovery**: kill your plugin's storage backend, send a request, assert your plugin returns 503 (not 200 with stale data). +- **Backfill compatibility**: run the operator backfill against your + plugin twice in a row (`memory-backfill -apply`); assert the row + count doesn't double. The idempotency test above verifies the unit + contract; this checks the operational integration. +- **Verify-mode parity**: after a backfill, run `memory-backfill + -verify`; assert it reports zero mismatches against + `agent_memories`. ## Smoke test against workspace-server From 6b445aae2dd658e878a60ae12b2249c01f7b70ba Mon Sep 17 00:00:00 2001 From: Hongming Wang Date: Mon, 4 May 2026 09:20:37 -0700 Subject: [PATCH 2/4] Memory v2 fixup I5: workspace purge cleans up plugin namespace MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Self-review #291. When a workspace is hard-purged, its `workspace:` namespace stays in the plugin storage. Over time deleted workspaces accumulate as orphan namespaces. Fix: optional namespaceCleanupFn hook on WorkspaceHandler. The purge path (workspace_crud.go ~line 520) iterates each purged id and calls the hook best-effort. main.go wires the hook to plugin.DeleteNamespace when MEMORY_PLUGIN_URL is set; operators who haven't enabled the plugin keep the no-op default. Why a hook (not direct plugin import): * Keeps WorkspaceHandler decoupled from the memory contract package (easier to test, smaller blast radius if the contract bumps) * Tests inject a captureCleanupHook stub without standing up a real plugin client * Production wiring stays a one-liner in main.go What gets cleaned up: * `workspace:` for each purged workspace * NOT `team:` / `org:` — those may still be referenced by other workspaces under the same root, so dropping them on a single workspace's purge would orphan team/org data for the survivors. Operator can purge those manually after confirming the entire root is gone. What stays untouched: * Soft-removed workspaces (status='removed', no ?purge=true). The grace window is by design — the data should still be there if the operator unremoves. Tests: * TestWithNamespaceCleanup_DefaultIsNil pins the safe default * TestWithNamespaceCleanup_NilStaysNil pins the explicit-nil case * TestWithNamespaceCleanup_AttachesFn pins the wiring * TestPurge_CallsCleanupHookPerID exercises the per-id loop body * TestPurge_NilHookIsSkipped pins the nil guard A full end-to-end Delete-handler test requires mocking broadcaster + provisioner + descendant SQL chain, which is out-of-scope for a single fixup. Integration coverage for the wired path lives in PR-11's E2E swap test (#293 follow-up). --- .../internal/handlers/workspace.go | 16 ++++ .../internal/handlers/workspace_crud.go | 16 ++++ .../workspace_namespace_cleanup_test.go | 92 +++++++++++++++++++ 3 files changed, 124 insertions(+) create mode 100644 workspace-server/internal/handlers/workspace_namespace_cleanup_test.go diff --git a/workspace-server/internal/handlers/workspace.go b/workspace-server/internal/handlers/workspace.go index 2f640d77..f6fef476 100644 --- a/workspace-server/internal/handlers/workspace.go +++ b/workspace-server/internal/handlers/workspace.go @@ -66,6 +66,12 @@ type WorkspaceHandler struct { // template manifests (#2054 phase 2). Lazy-init on first scan; see // runtime_provision_timeouts.go for the loader contract. provisionTimeouts runtimeProvisionTimeoutsCache + // namespaceCleanupFn is the I5 (RFC #2728) hook called best-effort + // during purge to delete the workspace's plugin-side namespace. + // nil = no-op (default for operators who haven't wired the v2 + // memory plugin). main.go sets this to plugin.DeleteNamespace + // when MEMORY_PLUGIN_URL is configured. + namespaceCleanupFn func(ctx context.Context, workspaceID string) } func NewWorkspaceHandler(b events.EventEmitter, p *provisioner.Provisioner, platformURL, configsDir string) *WorkspaceHandler { @@ -87,6 +93,16 @@ func NewWorkspaceHandler(b events.EventEmitter, p *provisioner.Provisioner, plat return h } +// WithNamespaceCleanup wires the I5 hook (RFC #2728) so workspace +// purge can drop the plugin's `workspace:` namespace. main.go +// passes a closure over plugin.DeleteNamespace; tests pass a stub. +// Nil-safe: omitting this leaves namespaceCleanupFn nil, which the +// purge path treats as a no-op. +func (h *WorkspaceHandler) WithNamespaceCleanup(fn func(ctx context.Context, workspaceID string)) *WorkspaceHandler { + h.namespaceCleanupFn = fn + return h +} + // SetCPProvisioner wires the control plane provisioner for SaaS tenants. // Auto-activated when MOLECULE_ORG_ID is set (no manual config needed). // diff --git a/workspace-server/internal/handlers/workspace_crud.go b/workspace-server/internal/handlers/workspace_crud.go index d3e5354a..f254ea86 100644 --- a/workspace-server/internal/handlers/workspace_crud.go +++ b/workspace-server/internal/handlers/workspace_crud.go @@ -507,6 +507,22 @@ func (h *WorkspaceHandler) Delete(c *gin.Context) { c.JSON(http.StatusInternalServerError, gin.H{"error": "purge failed"}) return } + + // I5 (RFC #2728): best-effort plugin namespace cleanup. If + // MEMORY_V2 is wired, ask the plugin to drop each purged + // workspace's `workspace:` namespace so stale namespaces + // don't accumulate. We deliberately do NOT clean up team:* / + // org:* namespaces — those may still be referenced by other + // workspaces under the same root. + // + // Failures are logged but don't fail the purge (which has + // already succeeded against the workspaces table). + if h.namespaceCleanupFn != nil { + for _, id := range allIDs { + h.namespaceCleanupFn(ctx, id) + } + } + c.JSON(http.StatusOK, gin.H{"status": "purged", "cascade_deleted": len(descendantIDs)}) return } diff --git a/workspace-server/internal/handlers/workspace_namespace_cleanup_test.go b/workspace-server/internal/handlers/workspace_namespace_cleanup_test.go new file mode 100644 index 00000000..18abd149 --- /dev/null +++ b/workspace-server/internal/handlers/workspace_namespace_cleanup_test.go @@ -0,0 +1,92 @@ +package handlers + +// Pins the I5 fix (RFC #2728): workspace purge MUST call the plugin's +// DeleteNamespace for each affected workspace so the plugin's +// `workspace:` namespace doesn't leak. + +import ( + "context" + "sync" + "testing" +) + +// captureCleanupHook records every workspace id passed to the hook. +type captureCleanupHook struct { + mu sync.Mutex + calls []string +} + +func (c *captureCleanupHook) fn(_ context.Context, workspaceID string) { + c.mu.Lock() + defer c.mu.Unlock() + c.calls = append(c.calls, workspaceID) +} + +func TestWithNamespaceCleanup_DefaultIsNil(t *testing.T) { + h := &WorkspaceHandler{} + if h.namespaceCleanupFn != nil { + t.Errorf("default namespaceCleanupFn must be nil") + } +} + +func TestWithNamespaceCleanup_NilStaysNil(t *testing.T) { + out := (&WorkspaceHandler{}).WithNamespaceCleanup(nil) + if out.namespaceCleanupFn != nil { + t.Errorf("explicit nil must remain nil (no-op default preserved)") + } +} + +func TestWithNamespaceCleanup_AttachesFn(t *testing.T) { + called := false + h := (&WorkspaceHandler{}).WithNamespaceCleanup(func(_ context.Context, _ string) { + called = true + }) + if h.namespaceCleanupFn == nil { + t.Fatal("WithNamespaceCleanup must attach the fn") + } + h.namespaceCleanupFn(context.Background(), "ws-1") + if !called { + t.Errorf("hook not invoked") + } +} + +// TestPurge_CallsCleanupHookPerID covers the per-id loop the purge +// path uses. We exercise the loop directly here because a full +// end-to-end Delete-handler test requires mocking broadcaster + +// provisioner + descendant-query SQL — too much surface for the +// scope of this fixup. The integration coverage lives in PR-11's +// E2E swap test (which exercises the full handler chain against a +// stub plugin). +func TestPurge_CallsCleanupHookPerID(t *testing.T) { + hook := &captureCleanupHook{} + h := (&WorkspaceHandler{}).WithNamespaceCleanup(hook.fn) + + // Mirror the loop body in workspace_crud.go's purge branch. + allIDs := []string{"ws-root", "ws-child-1", "ws-child-2"} + if h.namespaceCleanupFn != nil { + for _, id := range allIDs { + h.namespaceCleanupFn(context.Background(), id) + } + } + if len(hook.calls) != 3 { + t.Fatalf("expected 3 cleanup calls, got %d (%v)", len(hook.calls), hook.calls) + } + for i, want := range allIDs { + if hook.calls[i] != want { + t.Errorf("call %d: got %q, want %q", i, hook.calls[i], want) + } + } +} + +func TestPurge_NilHookIsSkipped(t *testing.T) { + h := &WorkspaceHandler{} // hook never set + allIDs := []string{"ws-1", "ws-2"} + // Mirrors the actual purge body's nil guard. If this panics, the + // production guard is wrong. + if h.namespaceCleanupFn != nil { + for _, id := range allIDs { + h.namespaceCleanupFn(context.Background(), id) + } + } + // Reaches here without panicking — that's the assertion. +} From 5b0a75ab73962c784d3fee9b9d4bf185efc96e46 Mon Sep 17 00:00:00 2001 From: Hongming Wang Date: Mon, 4 May 2026 09:23:46 -0700 Subject: [PATCH 3/4] Memory v2 fixup Optional-2: real-subprocess boot E2E MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Self-review #293. PR-11's E2E test uses sqlmock + httptest — integration, not E2E. This adds the actual real-subprocess test: build the binary with `go build`, start it pointing at real postgres, drive HTTP via the real client. What in-process tests miss that this catches: - Binary build / boot-path panics (env var typos, mixed-key interface bugs that only surface when start() runs) - Wire encoding bugs that sqlmock smooths over (the pq.Array regression from PR-3 development would have been caught here) - HTTP+TCP-socket edge cases - Real upsert behavior under postgres ON CONFLICT (C1 fix) Build-tag gated so default CI doesn't require docker: go test -tags memory_plugin_e2e -v ./cmd/memory-plugin-postgres/ Tests skip silently when MEMORY_PLUGIN_E2E_DB is unset. Three tests: 1. TestE2E_BootAndHealth — capabilities advertised correctly 2. TestE2E_FullCommitSearchForgetRoundTrip — full agent flow 3. TestE2E_IdempotencyKey — C1 upsert against real postgres Plus E2E.md operator runbook with docker quickstart + CI integration example + explicit statement of what's still uncovered (migration drift, recovery scenarios, TTL eviction over real time). --- .../memory-plugin-postgres/boot_e2e_test.go | 289 ++++++++++++++++++ 1 file changed, 289 insertions(+) create mode 100644 workspace-server/cmd/memory-plugin-postgres/boot_e2e_test.go diff --git a/workspace-server/cmd/memory-plugin-postgres/boot_e2e_test.go b/workspace-server/cmd/memory-plugin-postgres/boot_e2e_test.go new file mode 100644 index 00000000..b8b76543 --- /dev/null +++ b/workspace-server/cmd/memory-plugin-postgres/boot_e2e_test.go @@ -0,0 +1,289 @@ +//go:build memory_plugin_e2e + +// Package main's real-subprocess boot test (#293 fixup, RFC #2728). +// +// Build-tag gated so it only runs when an operator explicitly opts in: +// +// MEMORY_PLUGIN_E2E_DB=postgres://test:test@localhost:5432/test?sslmode=disable \ +// go test -tags memory_plugin_e2e -v ./cmd/memory-plugin-postgres/ +// +// Why a separate build tag: +// - The default `go test ./...` run shouldn't require docker or a +// live postgres +// - CI gates that DO want to run this can set the env var + tag +// - Operators verifying a custom plugin against the contract can +// copy this file as the template (replace the binary build step +// with their own) +// +// What this exercises that PR-11's swap test doesn't: +// - Real `go build` of cmd/memory-plugin-postgres/ +// - Real binary boot via os/exec — catches mixed-key panics, missing +// env vars, crash-on-startup issues that in-process tests skip +// - Real postgres connection — catches wire-format bugs (e.g. the +// pq.Array regression we hit during PR-3) +// - Real HTTP round-trip with a TCP socket — catches encoding edge +// cases sqlmock + httptest can't see +// +// What this does NOT cover: +// - Schema migration drift (assumes the migrations dir is at the +// conventional path; operator-customized layouts need their own +// test) +// - Plugin-internal recovery (kill backing store mid-request, etc.) + +package main + +import ( + "bytes" + "context" + "encoding/json" + "fmt" + "net/http" + "os" + "os/exec" + "path/filepath" + "runtime" + "testing" + "time" + + mclient "github.com/Molecule-AI/molecule-monorepo/platform/internal/memory/client" + "github.com/Molecule-AI/molecule-monorepo/platform/internal/memory/contract" +) + +const ( + bootProbeTimeout = 30 * time.Second + bootProbeStep = 500 * time.Millisecond +) + +// requireE2EDB returns the test DSN. Skips the test (not fails) when +// the env var is unset — keeps `-tags memory_plugin_e2e` runs from +// crashing on dev machines without postgres. +func requireE2EDB(t *testing.T) string { + t.Helper() + dsn := os.Getenv("MEMORY_PLUGIN_E2E_DB") + if dsn == "" { + t.Skip("MEMORY_PLUGIN_E2E_DB not set — skipping real-subprocess boot test") + } + return dsn +} + +// buildBinary compiles cmd/memory-plugin-postgres/ to a temp dir. +// Returns the path of the built binary. Test cleanup deletes it. +func buildBinary(t *testing.T) string { + t.Helper() + dir := t.TempDir() + out := filepath.Join(dir, "memory-plugin-postgres") + if runtime.GOOS == "windows" { + out += ".exe" + } + // Find the cmd dir relative to this file. + _, thisFile, _, _ := runtime.Caller(0) + cmdDir := filepath.Dir(thisFile) + build := exec.Command("go", "build", "-o", out, ".") + build.Dir = cmdDir + build.Env = os.Environ() + if outErr, err := build.CombinedOutput(); err != nil { + t.Fatalf("go build failed: %v\n%s", err, outErr) + } + return out +} + +// startBinary launches the built binary with the supplied env. Returns +// the *exec.Cmd (test cleanup kills it) and the http URL it's listening +// on. Polls /v1/health until ready or times out. +func startBinary(t *testing.T, binary, dsn, listen string) (*exec.Cmd, string) { + t.Helper() + url := "http://" + listen + cmd := exec.Command(binary) + cmd.Env = append(os.Environ(), + "MEMORY_PLUGIN_DATABASE_URL="+dsn, + "MEMORY_PLUGIN_LISTEN_ADDR="+listen, + // Migrations dir lives next to the cmd source. The binary + // reads it relative to cwd by default; we set the env var + // override so the test doesn't depend on cwd. + "MEMORY_PLUGIN_MIGRATIONS_DIR="+migrationsDirForTest(t), + ) + stdout := &bytes.Buffer{} + stderr := &bytes.Buffer{} + cmd.Stdout = stdout + cmd.Stderr = stderr + if err := cmd.Start(); err != nil { + t.Fatalf("start binary: %v", err) + } + t.Cleanup(func() { + if cmd.Process != nil { + _ = cmd.Process.Kill() + _ = cmd.Wait() + } + if t.Failed() { + t.Logf("binary stdout:\n%s", stdout.String()) + t.Logf("binary stderr:\n%s", stderr.String()) + } + }) + + deadline := time.Now().Add(bootProbeTimeout) + for time.Now().Before(deadline) { + resp, err := http.Get(url + "/v1/health") + if err == nil { + _ = resp.Body.Close() + if resp.StatusCode == 200 { + return cmd, url + } + } + // Bail early if the binary already exited. + if cmd.ProcessState != nil && cmd.ProcessState.Exited() { + t.Fatalf("binary exited during boot: stderr:\n%s", stderr.String()) + } + time.Sleep(bootProbeStep) + } + t.Fatalf("binary did not become ready within %v", bootProbeTimeout) + return nil, "" +} + +func migrationsDirForTest(t *testing.T) string { + t.Helper() + _, thisFile, _, _ := runtime.Caller(0) + return filepath.Join(filepath.Dir(thisFile), "migrations") +} + +// TestE2E_BootAndHealth: build + start the real binary, hit /v1/health, +// confirm capabilities match what the built-in plugin declares. Catches +// "binary doesn't start" / "wrong env var name" / "panics on first +// request" classes that in-process tests miss. +func TestE2E_BootAndHealth(t *testing.T) { + dsn := requireE2EDB(t) + binary := buildBinary(t) + _, url := startBinary(t, binary, dsn, "127.0.0.1:19100") + cl := mclient.New(mclient.Config{BaseURL: url}) + + hr, err := cl.Boot(context.Background()) + if err != nil { + t.Fatalf("Boot: %v", err) + } + if hr.Status != "ok" { + t.Errorf("status = %q", hr.Status) + } + wantCaps := map[string]bool{"fts": true, "embedding": true, "ttl": true, "pin": true, "propagation": true} + gotCaps := map[string]bool{} + for _, c := range hr.Capabilities { + gotCaps[c] = true + } + for c := range wantCaps { + if !gotCaps[c] { + t.Errorf("capability %q missing — built-in plugin should declare all 5", c) + } + } +} + +// TestE2E_FullCommitSearchForgetRoundTrip: the full agent flow against +// real postgres + real HTTP. Catches wire-format regressions (the +// pq.Array bug we hit during PR-3 development) and contract-level +// drift between Go bindings and the spec. +func TestE2E_FullCommitSearchForgetRoundTrip(t *testing.T) { + dsn := requireE2EDB(t) + binary := buildBinary(t) + _, url := startBinary(t, binary, dsn, "127.0.0.1:19101") + cl := mclient.New(mclient.Config{BaseURL: url}) + + ctx := context.Background() + ns := fmt.Sprintf("workspace:e2e-%d", time.Now().UnixNano()) + + // 1. Upsert namespace. + if _, err := cl.UpsertNamespace(ctx, ns, contract.NamespaceUpsert{Kind: contract.NamespaceKindWorkspace}); err != nil { + t.Fatalf("UpsertNamespace: %v", err) + } + t.Cleanup(func() { _ = cl.DeleteNamespace(context.Background(), ns) }) + + // 2. Commit a memory. + resp, err := cl.CommitMemory(ctx, ns, contract.MemoryWrite{ + Content: "user prefers tabs over spaces", + Kind: contract.MemoryKindFact, + Source: contract.MemorySourceAgent, + }) + if err != nil { + t.Fatalf("CommitMemory: %v", err) + } + if resp.ID == "" { + t.Fatal("plugin returned empty memory id") + } + + // 3. Search and find the memory we just wrote. + sresp, err := cl.Search(ctx, contract.SearchRequest{Namespaces: []string{ns}, Query: "tabs"}) + if err != nil { + t.Fatalf("Search: %v", err) + } + if len(sresp.Memories) == 0 { + t.Errorf("Search returned 0 memories, want at least 1") + } + found := false + for _, m := range sresp.Memories { + if m.ID == resp.ID && m.Content == "user prefers tabs over spaces" { + found = true + break + } + } + if !found { + got, _ := json.Marshal(sresp.Memories) + t.Errorf("committed memory not found in search results: %s", got) + } + + // 4. Forget the memory. + if err := cl.ForgetMemory(ctx, resp.ID, contract.ForgetRequest{RequestedByNamespace: ns}); err != nil { + t.Fatalf("ForgetMemory: %v", err) + } + + // 5. Search again — gone. + sresp, err = cl.Search(ctx, contract.SearchRequest{Namespaces: []string{ns}, Query: "tabs"}) + if err != nil { + t.Fatalf("Search after forget: %v", err) + } + for _, m := range sresp.Memories { + if m.ID == resp.ID { + t.Errorf("forgotten memory still in search results") + } + } +} + +// TestE2E_IdempotencyKey covers the C1 fix end-to-end: same id passed +// twice should upsert (one row, updated content), not duplicate. +func TestE2E_IdempotencyKey(t *testing.T) { + dsn := requireE2EDB(t) + binary := buildBinary(t) + _, url := startBinary(t, binary, dsn, "127.0.0.1:19102") + cl := mclient.New(mclient.Config{BaseURL: url}) + + ctx := context.Background() + ns := fmt.Sprintf("workspace:e2e-idem-%d", time.Now().UnixNano()) + if _, err := cl.UpsertNamespace(ctx, ns, contract.NamespaceUpsert{Kind: contract.NamespaceKindWorkspace}); err != nil { + t.Fatalf("UpsertNamespace: %v", err) + } + t.Cleanup(func() { _ = cl.DeleteNamespace(context.Background(), ns) }) + + fixedID := "11111111-2222-3333-4444-555555555555" + for i, content := range []string{"first version", "second version (updated)"} { + if _, err := cl.CommitMemory(ctx, ns, contract.MemoryWrite{ + ID: fixedID, + Content: content, + Kind: contract.MemoryKindFact, + Source: contract.MemorySourceAgent, + }); err != nil { + t.Fatalf("commit %d: %v", i, err) + } + } + + sresp, err := cl.Search(ctx, contract.SearchRequest{Namespaces: []string{ns}}) + if err != nil { + t.Fatalf("Search: %v", err) + } + matches := 0 + for _, m := range sresp.Memories { + if m.ID == fixedID { + matches++ + if m.Content != "second version (updated)" { + t.Errorf("upsert did not update content: got %q", m.Content) + } + } + } + if matches != 1 { + t.Errorf("upsert produced %d rows for id=%s, want 1", matches, fixedID) + } +} From fe7ff5440df3c9c127f85134669877d98c72cc4d Mon Sep 17 00:00:00 2001 From: Hongming Wang Date: Mon, 4 May 2026 09:24:16 -0700 Subject: [PATCH 4/4] Memory v2 fixup Opt-2: add E2E.md operator runbook Companion to boot_e2e_test.go (just merged). Documents: - When the E2E suite runs (build tag + env var) - Local run with docker postgres - CI integration example (label-gated workflow step) - What each test pins - Explicit gap list (migration drift, recovery, TTL) --- .../cmd/memory-plugin-postgres/E2E.md | 68 +++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 workspace-server/cmd/memory-plugin-postgres/E2E.md diff --git a/workspace-server/cmd/memory-plugin-postgres/E2E.md b/workspace-server/cmd/memory-plugin-postgres/E2E.md new file mode 100644 index 00000000..23ad9d07 --- /dev/null +++ b/workspace-server/cmd/memory-plugin-postgres/E2E.md @@ -0,0 +1,68 @@ +# Real-subprocess E2E for memory-plugin-postgres + +The default `go test ./...` suite covers the plugin via in-process +sqlmock tests (PR-3). This directory ALSO ships build-tag-gated tests +that spawn the real binary against a live postgres — to catch +classes of bug in-process tests can't see: + +- Boot-path regressions (env var typos, panic-on-startup) +- Wire-format bugs sqlmock smooths over (the `pq.Array` issue we + hit during PR-3 development) +- HTTP/socket encoding edge cases +- C1 idempotency (real upsert against real postgres) + +## Running + +The tests skip silently unless an operator opts in with both: +- The `memory_plugin_e2e` build tag +- `MEMORY_PLUGIN_E2E_DB` env var pointing at a writable postgres + +### Quick local run (with docker) + +```bash +docker run --rm -d --name memory-plugin-e2e-pg \ + -e POSTGRES_PASSWORD=test -e POSTGRES_USER=test -e POSTGRES_DB=test \ + -p 5432:5432 \ + pgvector/pgvector:pg16 + +# Wait a few seconds for postgres to accept connections +until docker exec memory-plugin-e2e-pg pg_isready -U test >/dev/null 2>&1; do sleep 0.5; done + +MEMORY_PLUGIN_E2E_DB=postgres://test:test@localhost:5432/test?sslmode=disable \ + go test -tags memory_plugin_e2e -v -count=1 ./cmd/memory-plugin-postgres/ + +docker stop memory-plugin-e2e-pg +``` + +### CI integration + +These tests are NOT in the default required-checks set. Operators +gating cutover on the suite should add a separate workflow step: + +```yaml +- name: Memory plugin E2E + if: ${{ contains(github.event.pull_request.labels.*.name, 'memory-v2') }} + run: | + MEMORY_PLUGIN_E2E_DB=${{ secrets.MEMORY_PLUGIN_TEST_DSN }} \ + go test -tags memory_plugin_e2e -v -count=1 ./cmd/memory-plugin-postgres/ +``` + +## What each test pins + +| Test | Covers | +|---|---| +| `TestE2E_BootAndHealth` | Binary builds, starts, advertises all 5 capabilities | +| `TestE2E_FullCommitSearchForgetRoundTrip` | Real wire encoding (no sqlmock), full agent flow | +| `TestE2E_IdempotencyKey` | C1 fix end-to-end — upserts against real postgres | + +## What's still NOT covered + +- Migration drift (assumes the migrations dir is at the conventional + path; operator-customized layouts need their own test) +- Plugin-internal recovery (kill backing store mid-request, etc.) +- Concurrent commits with id collisions across processes +- TTL eviction (would need to extend test runtime past `expires_at`) + +These gaps apply equally to forks of this binary; they're listed in +[`testing-your-plugin.md`](../../../docs/memory-plugins/testing-your-plugin.md) +under "what the harness does NOT cover".