memory: Phase A2 — backfill agent_memories rows into v2 plugin per tenant #1791

Closed
opened 2026-05-24 09:13:21 +00:00 by hongming · 1 comment

Summary

Phase A2 of the v1→v2 memory migration. Phase A1 (#1747 — kill the v1 SQL fallback) is live in production as of 2026-05-24; the v2 plugin is the only memory backend on every active tenant. A2 is the per-tenant cleanup: existing agent_memories rows need to be migrated into the v2 plugin so historical memory isn't orphaned.

Why now

  • Phase A1 verified stable in production (all 4 active tenants on platform-tenant:staging-272cb8b since 2026-05-24 08:05Z, memory plugin sidecar healthy, MEMORY_PLUGIN_URL wired).
  • Every memory write post-A1 goes to v2. Every read goes to v2. The agent_memories table is now read-only legacy state.
  • Without A2, anything an agent wrote pre-A1 is invisible — UI reads from v2, the row is in v1.

Scope

For each active tenant (currently 4: agents-team, hongming, chloe-dong, reno-stars):

  1. Read all rows from agent_memories on the tenant's DB.
  2. For each row, call the local memory plugin's CommitMemory with:
    • namespace = workspace:<workspace_id> (matches workspaceMemoryNamespace())
    • content = agent_memories.content
    • kind = MemoryKindFact (legacy rows are pre-commit_memory_v2; treat as factual baseline)
    • source = MemorySourceAgent (was written by agent in the old MCP path)
  3. Set migrated_at = now() on the source row (additive column, no row delete) so A3 can verify completeness later.
  4. Idempotent: re-running the script must be a no-op (check migrated_at).
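
The four steps above can be sketched as a single loop. This is a minimal in-memory illustration, not the shipped script: `legacyRow`, `commitRequest`, the `committer` interface, `fakePlugin`, and the string values used for kind and source are all hypothetical stand-ins for the real table row, the plugin's `CommitMemory` signature, and the `MemoryKindFact` / `MemorySourceAgent` constants, none of which are shown in this issue.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// legacyRow is a hypothetical stand-in for one agent_memories row.
type legacyRow struct {
	ID          int
	WorkspaceID string
	Content     string
	MigratedAt  *time.Time // nil until the row has been backfilled
}

// commitRequest carries the four fields listed in the scope.
// Field names and string values are assumptions for illustration.
type commitRequest struct {
	Namespace, Content, Kind, Source string
}

// committer abstracts the plugin's CommitMemory call (signature assumed).
type committer interface {
	CommitMemory(req commitRequest) error
}

// counts matches the per-tenant numbers the acceptance list asks for.
type counts struct{ Migrated, Skipped, Failed int }

// backfill commits every row that has no MigratedAt and stamps it on
// success. Rows that fail stay unstamped, so a later run retries them.
func backfill(rows []legacyRow, c committer) counts {
	var n counts
	for i := range rows {
		r := &rows[i]
		if r.MigratedAt != nil {
			n.Skipped++ // idempotency check: already migrated
			continue
		}
		req := commitRequest{
			Namespace: "workspace:" + r.WorkspaceID,
			Content:   r.Content,
			Kind:      "fact",  // stands in for MemoryKindFact
			Source:    "agent", // stands in for MemorySourceAgent
		}
		if err := c.CommitMemory(req); err != nil {
			n.Failed++
			continue
		}
		now := time.Now()
		r.MigratedAt = &now
		n.Migrated++
	}
	return n
}

// fakePlugin records commits and optionally fails on one content value.
type fakePlugin struct {
	committed []commitRequest
	failOn    string
}

func (f *fakePlugin) CommitMemory(req commitRequest) error {
	if f.failOn != "" && req.Content == f.failOn {
		return errors.New("plugin unavailable")
	}
	f.committed = append(f.committed, req)
	return nil
}

func main() {
	rows := []legacyRow{
		{ID: 1, WorkspaceID: "ws-a", Content: "first"},
		{ID: 2, WorkspaceID: "ws-a", Content: "second"},
	}
	p := &fakePlugin{}
	fmt.Printf("%+v\n", backfill(rows, p)) // {Migrated:2 Skipped:0 Failed:0}
	fmt.Printf("%+v\n", backfill(rows, p)) // {Migrated:0 Skipped:2 Failed:0}
}
```

Stamping `MigratedAt` only after a successful commit is what makes the second run a no-op while still letting failed rows be retried.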

Implementation hints

  • Script lives in workspace-server/cmd/memory-plugin-postgres/migrations/ or as a one-shot in scripts/ops/.
  • Best run via SSM on the tenant EC2 — already-authenticated path used by RedeployTenant.
  • Per-tenant run is independent (each tenant has its own DB + plugin).
  • Per memory feedback_per_agent_gitea_identity_default — the migration script should run under a tenant-scoped service identity, not the founder PAT.

Acceptance

  • Script lives in repo with tests against sqlmock covering: empty source table → no-op; populated source → all rows committed; pre-migrated rows → skip.
  • Dry-run mode (--check-only) reports counts without writing.
  • Run against all 4 active tenants. Report per-tenant counts (rows migrated, rows skipped, rows failed).
  • Spot-check: pick 3 rows on a tenant, verify they're now visible via recall_memory MCP call AND via the canvas MemoryInspectorPanel.
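
The dry-run requirement can be sketched with Go's standard `flag` package, which accepts both `-check-only` and `--check-only`. The `report` type, the `run` function, and the boolean slice that models which rows already have `migrated_at` set are hypothetical. The flag name follows this issue's wording and may not match the CLI that was eventually shipped.

```go
package main

import (
	"flag"
	"fmt"
)

// report is a hypothetical summary of one run.
type report struct {
	Pending         int  // rows without migrated_at at the start of the run
	AlreadyMigrated int  // rows skipped because migrated_at was set
	Wrote           bool // true if the run changed any state
}

// run parses args and walks the rows. migrated[i] == true models a row
// whose migrated_at column is already set.
func run(args []string, migrated []bool) (report, error) {
	fs := flag.NewFlagSet("memory-backfill", flag.ContinueOnError)
	checkOnly := fs.Bool("check-only", false, "report counts without writing")
	if err := fs.Parse(args); err != nil {
		return report{}, err
	}
	var rep report
	for i, done := range migrated {
		if done {
			rep.AlreadyMigrated++
			continue
		}
		rep.Pending++
		if !*checkOnly {
			migrated[i] = true // a real run would commit to v2, then stamp the row
			rep.Wrote = true
		}
	}
	return rep, nil
}

func main() {
	state := []bool{true, false, false}
	rep, err := run([]string{"--check-only"}, state)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", rep) // {Pending:2 AlreadyMigrated:1 Wrote:false}
}
```

Counting in the same loop that would perform the writes keeps the dry-run numbers and the real-run numbers from drifting apart.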

Gating for Phase A3

Once this lands and a 30-day soak passes with no v1-vs-v2 divergence reports, file Phase A3 to drop the agent_memories table.

Discovered during

Multi-PR memory-system migration 2026-05-24. Phase A1 = #1747. Phase A2 = this issue. Phase A3 = follow-on.


Phase A2 complete — closing

End-to-end verification on production (2026-05-24):

| Tenant | agent_memories (v1) | memory_plugin.memory_records (v2) | Status |
|---|---|---|---|
| agents-team | 1805 (frozen) | 1805 → 1807 (live writes) | ✅ |
| hongming | 144 (frozen) | 144 | ✅ |
| chloe-dong | 1 (frozen) | 1 | ✅ |
| reno-stars | 102 (frozen) | 102 | ✅ |

Total: 2,052 historical rows migrated (1805 + 144 + 1 + 102), zero errors. Live POST /memories now returns 201 with the row landing in memory_plugin.memory_records. On agents-team the v2 row count grew by 2 (1805 → 1807) immediately post-backfill, confirming that the writer migration (#1794) is taking effect: new writes go to v2 only, and agent_memories is frozen.
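
The per-tenant figures can be rechecked mechanically. The sketch below sums the v1 counts reported in this comment and flags any tenant whose v2 count is lower than its v1 count. The `tenantCount` type and `reconcile` function are hypothetical; the numbers are copied from the verification table.

```go
package main

import "fmt"

// tenantCount holds the row counts reported for one tenant.
type tenantCount struct {
	Tenant string
	V1, V2 int
}

// reconcile returns the total number of v1 rows and the tenants whose v2
// count is below v1. A v2 count above v1 is expected once live writes resume.
func reconcile(cs []tenantCount) (total int, short []string) {
	for _, c := range cs {
		total += c.V1
		if c.V2 < c.V1 {
			short = append(short, c.Tenant)
		}
	}
	return total, short
}

func main() {
	cs := []tenantCount{
		{"agents-team", 1805, 1807},
		{"hongming", 144, 144},
		{"chloe-dong", 1, 1},
		{"reno-stars", 102, 102},
	}
	total, short := reconcile(cs)
	fmt.Println(total, len(short)) // 2052 0
}
```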

Chain summary

| PR | Role |
|---|---|
| #1794 | Route POST /memories through v2 plugin (the writer migration) |
| #1795 | Broadcast ACTIVITY_LOGGED on MCP memory writes (#1754 fix) |
| #1796 | Bundle memory-backfill CLI into tenant image |
| #1798 | URGENT fix: marshalMetadata nil→null (caught latent pq+jsonb bug that #1794 unmasked in production) |
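
The #1798 change itself is not shown here. One plausible reading of "nil→null" is that a nil metadata value is now encoded as the JSON literal `null` rather than as nil or empty bytes, since an empty byte string is not valid JSON and Postgres rejects it for a jsonb column. The sketch below illustrates that reading only: the function body and the `map[string]any` parameter type are assumptions, not the actual fix.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalMetadata encodes metadata for a jsonb column. The nil branch is
// the assumed fix: return the JSON literal null so the bytes are always
// valid JSON, instead of returning nil or empty bytes.
func marshalMetadata(m map[string]any) ([]byte, error) {
	if m == nil {
		return []byte("null"), nil
	}
	return json.Marshal(m)
}

func main() {
	b, _ := marshalMetadata(nil)
	fmt.Println(string(b), json.Valid(b)) // null true

	b, _ = marshalMetadata(map[string]any{"origin": "backfill"})
	fmt.Println(string(b)) // {"origin":"backfill"}

	fmt.Println(json.Valid([]byte{})) // false: empty bytes are not JSON
}
```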

Post-recycle on staging-7604e11:

  1. All 4 tenants verified on new image with /memory-backfill binary
  2. Backfill applied per-tenant via docker exec molecule-tenant /memory-backfill -apply
  3. Live writes verified going to v2

Closes

Closes #1791 (Phase A2). Phase A3 (#1792 — drop agent_memories table) remains open per its 30-day soak gate.

Reference: molecule-ai/molecule-core#1791