Memory v2 docs update: idempotency key + verify mode + cutover runbook
Updates plugin-author and operator docs to reflect the four fixup
PRs (C1, C2, I1, I4) for self-review findings.
Stacked on C1+C2 so the docs reference behavior that lands in the
same wave; rebases to staging once those merge.
What changes:
* docs/memory-plugins/README.md
- New "Memory idempotency" section explaining MemoryWrite.id
contract: omit → plugin generates UUID; supplied → upsert
- "Replacing the built-in plugin" rewritten as a 6-step
operator runbook with concrete commands for -dry-run / -apply
/ -verify / MEMORY_V2_CUTOVER, including the failure path
("if -verify reports mismatches, do not flip the cutover flag")
- Added link to new CHANGELOG.md
* docs/memory-plugins/testing-your-plugin.md
- New TestMyPlugin_IDIsIdempotencyKey example: write same id
twice, assert single row + updated content
- "What the harness does NOT cover" expanded with two new
operational gates: backfill twice → no double; verify-mode
reports zero mismatches
* docs/memory-plugins/pinecone-example/README.md
- Wire-mapping table updated: id (caller-supplied) → Pinecone
vector id (upsert); id (omitted) → plugin-generated UUID
- Production-hardening checklist gained an idempotency-key item
* docs/memory-plugins/CHANGELOG.md (new)
- Captures the four fixup PRs in one place with severity-ordered
summary, plugin-author action items, and remaining open
follow-ups (#289, #291, #293) for transparency
No code changes. Docs-only PR.
This commit is contained in:
parent
1986260603
commit
2d783b5ca6
113
docs/memory-plugins/CHANGELOG.md
Normal file
113
docs/memory-plugins/CHANGELOG.md
Normal file
@ -0,0 +1,113 @@
|
||||
# Memory Plugin Contract — Changelog
|
||||
|
||||
Every breaking or operationally-relevant change to the v1 plugin
|
||||
contract or the workspace-server-side wiring lands here. Plugin
|
||||
authors should subscribe to PRs touching this file.
|
||||
|
||||
## [Unreleased] — fixup wave 1 (post-RFC-#2728 self-review)
|
||||
|
||||
A self-review of the initial 11-PR rollout (PRs #2729-#2742) flagged
|
||||
two correctness bugs and three operational hazards. This wave fixes
|
||||
all of them. Order matches operator-impact severity.
|
||||
|
||||
### Critical: backfill idempotency via `MemoryWrite.id` (#2744)
|
||||
|
||||
**The bug.** The backfill CLI claimed idempotent on re-run, but
|
||||
`gen_random_uuid()` in the plugin's INSERT meant every retry created
|
||||
a fresh row. Operators retrying a failed `-apply` would silently
|
||||
double their memory count.
|
||||
|
||||
**The fix.** Optional `id` field on `MemoryWrite`. When supplied,
|
||||
plugins MUST upsert. The backfill now forwards `agent_memories.id`
|
||||
to `MemoryWrite.id`, so retries update in place.
|
||||
|
||||
**Plugin author action.** If your plugin uses
|
||||
`INSERT INTO ... DEFAULT gen_random_uuid()`, switch to
|
||||
`INSERT ... ON CONFLICT (id) DO UPDATE` when `id` is set. The wire
|
||||
contract is forward-compatible — plugins that ignore the field still
|
||||
work for production agent commits (which leave `id` empty), but they
|
||||
will silently corrupt backfill retries.
|
||||
|
||||
### Critical: `memory-backfill -verify` mode (#2747)
|
||||
|
||||
**The miss.** The original PR-7 task spec called for a parity-check
|
||||
mode but it never landed. Operators had no way to confirm a
|
||||
migration succeeded short of "no errors logged."
|
||||
|
||||
**The fix.** New `-verify` flag samples N workspaces, queries
|
||||
`agent_memories` direct, runs an equivalent plugin search via the
|
||||
namespace resolver, multiset-compares contents. Reports mismatches
|
||||
to stdout and exits non-zero so CI can gate the cutover.
|
||||
|
||||
```bash
|
||||
memory-backfill -verify # default sample 50
|
||||
memory-backfill -verify -verify-sample=200 # bigger
|
||||
memory-backfill -verify -workspace=<uuid> # one workspace
|
||||
```
|
||||
|
||||
### Important: `expires_at` validation (#2746)
|
||||
|
||||
**The bug.** `commit_memory_v2` silently dropped malformed
|
||||
`expires_at` strings. Agent passes `expires_at: "tomorrow"`, gets a
|
||||
200, memory has no TTL — agent thinks it set a TTL, didn't.
|
||||
|
||||
**The fix.** Returns
|
||||
`fmt.Errorf("invalid expires_at: must be RFC3339")` on parse
|
||||
failure. Plugin is not called in this case.
|
||||
|
||||
**Plugin author action.** None — this is a workspace-server-side
|
||||
fix. But: if your plugin advertises the `ttl` capability, make sure
|
||||
you actually evict expired rows on read (not just on a janitor cron
|
||||
that runs once a day). The harness in `testing-your-plugin.md` has
|
||||
a TTL-eviction test you should run.
|
||||
|
||||
### Important: audit log JSON via `json.Marshal` (#2746)
|
||||
|
||||
**The bug.** `auditOrgWrite` built `activity_logs.metadata` via
|
||||
`fmt.Sprintf` with `%q`. For ASCII (today's UUID + hex digest) this
|
||||
coincidentally produces valid JSON; for unicode or control bytes it
|
||||
silently produces non-JSON.
|
||||
|
||||
**The fix.** Replaced with `json.Marshal(map[string]string{...})`.
|
||||
Same wire shape today, won't regress when metadata grows.
|
||||
|
||||
**Plugin author action.** None — workspace-server-internal.
|
||||
|
||||
### Operator action: staging verification (#292)
|
||||
|
||||
**Status.** Tracked as task #292. PR-merged ≠ verified. Operator
|
||||
must:
|
||||
1. Provision a staging tenant, set `MEMORY_PLUGIN_URL`
|
||||
2. Run real `commit_memory_v2` from a workspace
|
||||
3. `memory-backfill -dry-run` against staging data
|
||||
4. `memory-backfill -apply`, then `-verify`
|
||||
5. Set `MEMORY_V2_CUTOVER=true`, verify admin export still works
|
||||
6. Run a legacy `commit_memory` from a workspace, verify it lands
|
||||
in plugin storage via the PR-6 shim
|
||||
|
||||
### Other follow-ups still open
|
||||
|
||||
- **#289**: admin export O(workspaces) → O(namespaces) — N+1 pattern
|
||||
in `exportViaPlugin` (1000-workspace tenants run 1000× resolver
|
||||
CTEs + 1000× plugin searches today).
|
||||
- **#291**: workspace deletion must call `DELETE
|
||||
/v1/namespaces/{name}` — orphans accumulate today.
|
||||
- **#293**: real-subprocess boot E2E — current PR-11 is integration
|
||||
(httptest + sqlmock), not E2E.
|
||||
|
||||
These are tracked but deferred; they're operationally annoying, not
|
||||
incident-shaped.
|
||||
|
||||
## [v1.0.0] — initial release (RFC #2728, PRs #2729-#2742)
|
||||
|
||||
Initial plugin contract + 11-PR rollout. See
|
||||
[issue #2728](https://github.com/Molecule-AI/molecule-core/issues/2728)
|
||||
for the full RFC.
|
||||
|
||||
Endpoints: `/v1/health`, `/v1/namespaces/{name}` (PUT/PATCH/DELETE),
|
||||
`/v1/namespaces/{name}/memories` (POST), `/v1/search` (POST),
|
||||
`/v1/memories/{id}` (DELETE).
|
||||
|
||||
Capabilities: `embedding`, `fts`, `ttl`, `pin`, `propagation`.
|
||||
|
||||
Operator runbook: see [README.md § Replacing the built-in plugin](README.md#replacing-the-built-in-plugin).
|
||||
@ -54,6 +54,26 @@ security perimeter:
|
||||
defines
|
||||
- `/v1/health` reporting your supported capabilities (see below)
|
||||
- Idempotency on namespace upsert (PUT semantics, not POST)
|
||||
- Idempotency on memory commit when `MemoryWrite.id` is supplied
|
||||
(see "Memory idempotency" below)
|
||||
|
||||
## Memory idempotency
|
||||
|
||||
`MemoryWrite.id` is optional. Two contracts to honor:
|
||||
|
||||
| Caller passes | Plugin MUST |
|
||||
|---|---|
|
||||
| `id` omitted | Generate a fresh UUID, return it in the response |
|
||||
| `id` set | Upsert keyed on this id — if a row with that id already exists, UPDATE it in place rather than inserting a duplicate |
|
||||
|
||||
The backfill CLI (`memory-backfill`) relies on the upsert behavior
|
||||
so retries don't duplicate rows. Production agent commits leave `id`
|
||||
empty and rely on the plugin's UUID generator — the hot path is
|
||||
unchanged.
|
||||
|
||||
The built-in postgres plugin implements this with `INSERT ... ON
|
||||
CONFLICT (id) DO UPDATE`. A vector-DB plugin (e.g., Pinecone) would
|
||||
use the database's native upsert primitive on the same id.
|
||||
|
||||
## Capability negotiation
|
||||
|
||||
@ -99,16 +119,51 @@ network. workspace-server is the only sanctioned client.
|
||||
|
||||
## Replacing the built-in plugin
|
||||
|
||||
1. Apply [PR-7's backfill](../../workspace-server/cmd/memory-backfill/) to
|
||||
copy `agent_memories` into your plugin's storage.
|
||||
2. Stop workspace-server, point `MEMORY_PLUGIN_URL` at your plugin,
|
||||
restart.
|
||||
3. Existing data in the postgres plugin's tables is **not auto-
|
||||
dropped** — that's a deliberate safety property. Operator drops
|
||||
manually after they're confident they don't want to switch back.
|
||||
This is the canonical operator runbook for swapping the default
|
||||
plugin out. The same sequence applies whether you're swapping for
|
||||
another postgres plugin variant, Pinecone, Letta, or a custom
|
||||
implementation.
|
||||
|
||||
If you switch back later, the old postgres tables come back into use
|
||||
(no data loss).
|
||||
1. **Stand up the new plugin.** Deploy the binary/container, confirm
|
||||
it boots, confirm `/v1/health` returns `ok` with the capability
|
||||
list you expect.
|
||||
|
||||
2. **Run the backfill in dry-run mode** to scope the migration:
|
||||
```bash
|
||||
DATABASE_URL=postgres://... \
|
||||
MEMORY_PLUGIN_URL=http://your-plugin:9100 \
|
||||
memory-backfill -dry-run
|
||||
```
|
||||
Reports row count + namespace mapping per workspace, no writes.
|
||||
|
||||
3. **Apply the backfill:**
|
||||
```bash
|
||||
memory-backfill -apply
|
||||
```
|
||||
Idempotent on retry — the backfill passes each `agent_memories.id`
|
||||
to `MemoryWrite.id`, so partial-then-full re-runs upsert in place.
|
||||
|
||||
4. **Verify parity** before flipping the cutover flag:
|
||||
```bash
|
||||
memory-backfill -verify -verify-sample=200
|
||||
```
|
||||
Random-samples N workspaces, diffs `agent_memories` direct query
|
||||
against plugin search via the workspace's readable namespaces.
|
||||
Reports mismatches and exits non-zero if any are found — wire
|
||||
into your CI to gate the cutover.
|
||||
|
||||
5. **Flip the cutover flag.** Set `MEMORY_V2_CUTOVER=true` on
|
||||
workspace-server and restart. Admin export/import now route
|
||||
through the plugin; legacy `agent_memories` becomes read-only.
|
||||
|
||||
6. **Existing data in the old plugin's tables is NOT auto-dropped.**
|
||||
Deliberate safety property — operator drops manually after the
|
||||
~60-day grace window. If you switch back later, old data comes
|
||||
back into use (no loss).
|
||||
|
||||
If `-verify` reports mismatches, do NOT set `MEMORY_V2_CUTOVER` —
|
||||
inspect the output, re-run `-apply` to backfill missing rows (it
|
||||
upserts, so this is safe), and re-verify.
|
||||
|
||||
## Worked examples
|
||||
|
||||
@ -130,6 +185,7 @@ Write a fresh plugin if:
|
||||
|
||||
## See also
|
||||
|
||||
- [`CHANGELOG.md`](CHANGELOG.md) — contract revisions and fixup waves
|
||||
- RFC #2728 — design rationale
|
||||
- [`cmd/memory-plugin-postgres/`](../../workspace-server/cmd/memory-plugin-postgres/) — reference implementation
|
||||
- [`docs/api-protocol/memory-plugin-v1.yaml`](../api-protocol/memory-plugin-v1.yaml) — full OpenAPI spec
|
||||
|
||||
@ -29,7 +29,8 @@ are different:
|
||||
| Contract field | Pinecone shape |
|
||||
|---|---|
|
||||
| `namespace` | `namespace` (Pinecone's first-class concept) |
|
||||
| `id` | `id` |
|
||||
| `id` (caller-supplied) | `id` (Pinecone vector id; plugin upserts on this) |
|
||||
| `id` (omitted) | Plugin generates `uuid.NewString()` before upsert |
|
||||
| `content` | metadata.text |
|
||||
| `embedding` | `values` |
|
||||
| `kind` / `source` / `pin` / `expires_at` | `metadata.{kind, source, pin, expires_at}` |
|
||||
@ -38,6 +39,12 @@ are different:
|
||||
The contract's `expires_at` becomes a metadata field; a separate
|
||||
janitor cron periodically queries `expires_at < now` and deletes.
|
||||
|
||||
Pinecone's native upsert is the right fit for the idempotency-key
|
||||
contract: passing the same `id` twice updates in place. So a
|
||||
Pinecone plugin gets idempotent backfill retries "for free" if it
|
||||
just forwards `MemoryWrite.id` (or its generated UUID) to the
|
||||
upsert call.
|
||||
|
||||
## Skeleton
|
||||
|
||||
```go
|
||||
@ -103,6 +110,9 @@ A production-ready Pinecone plugin would add:
|
||||
- **Connection pooling**: keep one Pinecone client alive across requests
|
||||
- **Retry + circuit breaker**: Pinecone occasionally returns 5xx
|
||||
- **Metrics**: latency histograms per endpoint, write/read counters
|
||||
- **Idempotency-key handling**: when `MemoryWrite.id` is supplied,
|
||||
forward it as the Pinecone vector id verbatim; otherwise generate
|
||||
one. Pinecone's `Upsert` is naturally idempotent on id match.
|
||||
|
||||
But the mapping above is the load-bearing part — the rest is
|
||||
operational hardening, not contract-specific.
|
||||
|
||||
@ -77,6 +77,68 @@ func TestMyPlugin_FullRoundTrip(t *testing.T) {
|
||||
}
|
||||
```
|
||||
|
||||
## Testing idempotency
|
||||
|
||||
The contract requires that `MemoryWrite.id`, when supplied, behaves
|
||||
as an upsert key. The backfill CLI relies on this — without it,
|
||||
operator retries silently duplicate every memory.
|
||||
|
||||
```go
|
||||
func TestMyPlugin_IDIsIdempotencyKey(t *testing.T) {
|
||||
pluginURL := startMyPlugin(t)
|
||||
cl := mclient.New(mclient.Config{BaseURL: pluginURL})
|
||||
if _, err := cl.UpsertNamespace(context.Background(), "workspace:test-1",
|
||||
contract.NamespaceUpsert{Kind: contract.NamespaceKindWorkspace}); err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
fixedID := "11111111-2222-3333-4444-555555555555"
|
||||
|
||||
// First write with a specific id.
|
||||
resp1, err := cl.CommitMemory(context.Background(), "workspace:test-1",
|
||||
contract.MemoryWrite{
|
||||
ID: fixedID,
|
||||
Content: "first version",
|
||||
Kind: contract.MemoryKindFact,
|
||||
Source: contract.MemorySourceAgent,
|
||||
})
|
||||
if err != nil {
|
||||
t.Fatalf("first commit: %v", err)
|
||||
}
|
||||
if resp1.ID != fixedID {
|
||||
t.Errorf("plugin must echo the supplied id, got %q", resp1.ID)
|
||||
}
|
||||
|
||||
// Second write with the same id — must update, not insert.
|
||||
if _, err := cl.CommitMemory(context.Background(), "workspace:test-1",
|
||||
contract.MemoryWrite{
|
||||
ID: fixedID,
|
||||
Content: "second version (updated)",
|
||||
Kind: contract.MemoryKindFact,
|
||||
Source: contract.MemorySourceAgent,
|
||||
}); err != nil {
|
||||
t.Fatalf("second commit: %v", err)
|
||||
}
|
||||
|
||||
// Search must return exactly one row, with the updated content.
|
||||
sresp, _ := cl.Search(context.Background(), contract.SearchRequest{
|
||||
Namespaces: []string{"workspace:test-1"},
|
||||
})
|
||||
matches := 0
|
||||
for _, m := range sresp.Memories {
|
||||
if m.ID == fixedID {
|
||||
matches++
|
||||
if m.Content != "second version (updated)" {
|
||||
t.Errorf("upsert didn't update content: got %q", m.Content)
|
||||
}
|
||||
}
|
||||
}
|
||||
if matches != 1 {
|
||||
t.Errorf("upsert produced %d rows for id=%s, want 1", matches, fixedID)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## What the harness does NOT cover
|
||||
|
||||
- **Capability accuracy**: if you list `embedding` you must actually
|
||||
@ -88,6 +150,13 @@ func TestMyPlugin_FullRoundTrip(t *testing.T) {
|
||||
no IDs collide.
|
||||
- **Recovery**: kill your plugin's storage backend, send a request,
|
||||
assert your plugin returns 503 (not 200 with stale data).
|
||||
- **Backfill compatibility**: run the operator backfill against your
|
||||
plugin twice in a row (`memory-backfill -apply`); assert the row
|
||||
count doesn't double. The idempotency test above verifies the unit
|
||||
contract; this checks the operational integration.
|
||||
- **Verify-mode parity**: after a backfill, run `memory-backfill
|
||||
-verify`; assert it reports zero mismatches against
|
||||
`agent_memories`.
|
||||
|
||||
## Smoke test against workspace-server
|
||||
|
||||
|
||||
Loading…
Reference in New Issue
Block a user