fix(memories): upsert namespace before HTTP commit — fleet-wide memory-write outage #2517
Severity: HIGH (every tenant, every memory write through the HTTP surface)

POST /workspaces/:id/memories, the path behind the runtime's a2a commit_memory MCP tool and the canvas, calls plugin.CommitMemory without first ensuring the memory_namespaces row exists. The plugin contract (pgplugin/store.go) is "namespace must already exist (auto-created by handler if not)" and memory_records has an FK to memory_namespaces, so any workspace whose namespace row was never seeded fails every write with a foreign-key violation.

Live evidence (2026-06-10): reproduced 500 "failed to store memory" on jrs-auto, hongming, and agents-team; tenant-box logs show the FK error for the jrs-auto SEO agent (workspace 28f97a7f), which surfaced to the CTO as "The platform hit a technical problem saving memory (possibly an RBAC restriction)". Reads (recall/search) are unaffected, because namespace rows only gate writes.

The MCP tool path (mcp_tools_memory_v2.go:121) has always upserted before committing. Only this HTTP path skipped it, so every workspace created after the Phase A2 backfill that only writes through this surface has silently lost all memory persistence ("memory is the only thing that persists" per the agent runbook, making this a data-loss-class bug).

Fix
Mirror the MCP path: an idempotent UpsertNamespace(ns, kindFromNamespace(ns)) immediately before the write. Upsert failure → same stable generic 500 (no plugin internals leaked), and CommitMemory never runs.

Tests

TestMemoriesCommit_UpsertsNamespaceBeforeWrite: call order pinned (upsert → commit), namespace + kind asserted. MUTATION: drop the upsert → RED with the exact production failure.
TestMemoriesCommit_UpsertError_500: fail-closed, stable error body, no write attempted.
internal/handlers suite green; go vet + gofmt clean; -tags=integration builds.

🤖 Generated with Claude Code
Security+correctness 5-axis: APPROVE (head 0e232f370d). fix(memories): upsert namespace before the HTTP Commit write (+120/-3).

The Commit path went straight to plugin.CommitMemory without ensuring the memory_namespaces row exists (memory_records has an FK to it), so any workspace whose namespace was never seeded (the runtime a2a commit_memory tool + canvas, post-Phase-A2-backfill) failed EVERY write with memory_records_namespace_fkey (fleet-wide, 2026-06-10). The fix adds UpsertNamespace(ctx, nsName, {Kind: kindFromNamespace(nsName)}) before the write, mirroring the MCP-tool path that always did this; it is idempotent (warm namespace = cheap no-op). Sound fix for a real fleet-wide regression.

nsName is derived from the authenticated workspace + body.Scope (not an arbitrary client-spoofable cross-workspace namespace); UpsertNamespace is parameterized via the contract, so there is no SQL injection and no cross-workspace write. Content-clean (no secrets/host). ✓

TestMemoriesCommit_UpsertsNamespaceBeforeWrite pins the regression (non-vacuous: it asserts the upsert precedes the write). ✓

GATE: required aggregate GREEN: CI/all-required ✓, Platform(Go) ✓, E2E-API ✓, Handlers-PG ✓. ⛔ sop-checklist (pull_request_target) is PENDING (running), so the merger must verify by state that it greens before merge (do NOT merge over pending sop-pt). Local-Provision (ignore-list) + qa-review (team-gate) reds are non-code. Author core-devops (≠me). Sound: APPROVE; needs CR-B qa 2nd lane + sop-pt-green → merge.
qa 2nd-lane (full-SHA pinned). fix(memories): upsert namespace before HTTP commit — HIGH-sev (every tenant, every memory write via the runtime's a2a commit_memory). DIFF VALIDATED: memories.go now upserts the namespace row BEFORE the HTTP memory-commit, so a first-write to a not-yet-existing namespace no longer FK-violates (23503) and fails the write. Namespace resolves to 'general' default, ≤50 chars; symmetric WithMemoryV2 wiring; SAFE-T1201 redaction preserved. memories_test.go (+100) covers it. Sound, targeted fix.
⚠️ GATE-TRANSPARENT MERGE-HELD: the red Local Provision Lifecycle E2E is NOT diff-caused. This PR touches only the memories HTTP handler (no provisioning path), and CURRENT MAIN (6f0b7ba8) ITSELF has Local Provision red (one variant SUCCESS 2m1s, one variant FAILURE 7m16s), so it's a MAIN-LEVEL inherited failure. This APPROVE certifies the diff + arms 2-genuine; merge stays HELD via verify-by-state until Local Provision greens (needs the main-level Local-Provision fix/re-run, not a change here). APPROVED (diff-validated; merge-on-LocalProvision-green).