Compare commits

...

1 Commits

Author SHA1 Message Date
hongming 4e16e385ab fix(memory-plugin): emit JSON literal "null" for nil metadata/propagation
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 7s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
E2E Chat / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 7s
qa-review / approved (pull_request) Failing after 5s
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request) Failing after 5s
sop-checklist / review-refire (pull_request) Has been skipped
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 58s
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 12s
Harness Replays / Harness Replays (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m14s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Platform (Go) (pull_request) Successful in 7m10s
CI / all-required (pull_request) Successful in 9m11s
audit-force-merge / audit (pull_request) Successful in 6s
URGENT — fixes a production regression introduced by #1794 hitting a
latent bug in the plugin's marshalMetadata helper.

Symptom: every POST /workspaces/:id/memories on tenants running the
post-#1794 image returns HTTP 500. Tenant logs show:

    Commit memory error (plugin): memory-plugin: internal:
    commit memory: pq: invalid input syntax for type json

Same error fires for UpsertNamespace called from the Phase A2 backfill
(cmd/memory-backfill), causing the client's circuit breaker to open
after 3 failures and rejecting EVERY subsequent plugin call for that
process.

Root cause: marshalMetadata returned Go nil for nil-map input. lib/pq
sends Go nil as a bytea-typed NULL placeholder. PostgreSQL refused to
implicitly cast that placeholder into the target jsonb column
(memory_namespaces.metadata + memory_records.propagation), so the
INSERT failed at parse time. Direct `INSERT ... metadata=NULL` works
fine from psql, so the bug is purely on the driver-binding side —
sending the JSON literal `null` keeps the parameter typed as a
valid jsonb value regardless of pq's auto-conversion heuristics.

Existing sqlmock tests use sqlmock.AnyArg() for the metadata column,
so they didn't catch the real-postgres type mismatch.

Verified:
- All four production tenants on staging-39c8618 currently FAIL every
  POST /memories with the JSON syntax error
- Direct `INSERT INTO memory_plugin.memory_namespaces ... NULL` from
  psql succeeds — confirms PostgreSQL is fine with NULL; only the
  pq-binding path is broken
- pgplugin unit test updated to pin the new contract: nil input →
  `[]byte("null")`

Once this lands + image rebuilds + tenants recycle:
- POST /memories writes will succeed (closes the #1794 regression)
- Phase A2 backfill via /memory-backfill will succeed (closes #1791
  step 2)

Closes the production regression. Phase A2 backfill is now unblocked.
2026-05-24 03:31:54 -07:00
2 changed files with 25 additions and 3 deletions
@@ -349,8 +349,23 @@ func scanMemory(row interface {
}
func marshalMetadata(m map[string]interface{}) ([]byte, error) {
// Returning Go nil here previously made lib/pq send the parameter
// as a bytea-typed NULL placeholder that PostgreSQL refused to
// implicitly cast into the jsonb column. The resulting INSERT
// failed with `pq: invalid input syntax for type json`, which
// surfaced first as "scan namespace" errors from the QueryRow
// path (the row never gets created so RETURNING produces no rows;
// the QueryRow error propagates through Scan). Caught 2026-05-24
// during the Phase A2 backfill — UpsertNamespace failed on every
// call, then the client's circuit breaker opened and ALL plugin
// writes started failing. The same hazard exists on the
// memory_records.propagation column (also marshalled via this
// helper). Sending the JSON literal `null` keeps the parameter
// typed as a valid jsonb value regardless of pq's auto-conversion
// heuristics — semantically identical to NULL for our consumers
// (they treat empty propagation/metadata as absent).
if m == nil {
return nil, nil
return []byte(`null`), nil
}
b, err := json.Marshal(m)
if err != nil {
@@ -17,12 +17,19 @@ import (
// --- marshalMetadata corner cases ---
func TestMarshalMetadata_Nil(t *testing.T) {
// Post-fix (2026-05-24): nil input must return the JSON literal
// `null` (4 bytes), NOT Go nil. Returning Go nil made lib/pq send
// a bytea NULL parameter that PostgreSQL refused to implicitly
// cast into the target jsonb column, breaking every UpsertNamespace
// and CommitMemory call with `pq: invalid input syntax for type
// json`. The Phase A2 backfill exposed it (production saw every
// POST /memories return 500 immediately post-tenant-recycle).
got, err := marshalMetadata(nil)
if err != nil {
t.Errorf("err = %v", err)
}
if got != nil {
t.Errorf("got = %v, want nil", got)
if string(got) != "null" {
t.Errorf("got = %q (%v), want %q", string(got), got, "null")
}
}