Compare commits

...

3 Commits

Author SHA1 Message Date
hongming ea01341a9f ci: refire stale lint workflows on #1742 (post-persona-ack)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 8s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
CI / Detect changes (pull_request) Successful in 13s
E2E Chat / detect-changes (pull_request) Successful in 12s
Harness Replays / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 15s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 22s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 21s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 38s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 20s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 37s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m19s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
gate-check-v3 / gate-check (pull_request) Successful in 17s
qa-review / approved (pull_request) Failing after 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m28s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Failing after 1m41s
sop-checklist / review-refire (pull_request) Has been skipped
sop-checklist / na-declarations (pull_request) N/A: qa-review, security-review
security-review / approved (pull_request) Failing after 9s
sop-checklist / all-items-acked (pull_request) Successful in 9s
sop-tier-check / tier-check (pull_request) Successful in 13s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m42s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m55s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m45s
CI / Canvas (Next.js) (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m57s
E2E Chat / E2E Chat (pull_request) Successful in 5s
Harness Replays / Harness Replays (pull_request) Successful in 1s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 4m50s
CI / all-required (pull_request) Successful in 27m50s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
audit-force-merge / audit (pull_request) Successful in 6s
2026-05-23 19:42:00 -07:00
hongming e03b3fb243 review(memory-plugin): #1742 — search_path fallback to public + per-file embed test
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 10s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Chat / detect-changes (pull_request) Successful in 15s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 46s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Harness Replays / detect-changes (pull_request) Successful in 3s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 7s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m8s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Failing after 1m10s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 19s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m26s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m8s
review-check-tests / review-check.sh regression tests (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
qa-review / approved (pull_request) Failing after 4s
security-review / approved (pull_request) Failing after 4s
sop-checklist / na-declarations (pull_request) N/A: (none)
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m29s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m23s
CI / all-required (pull_request) Failing after 40m27s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m31s
CI / Platform (Go) (pull_request) Successful in 5m24s
CI / Shellcheck (E2E scripts) (pull_request) Has been cancelled
CI / Canvas (Next.js) (pull_request) Has been cancelled
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 1s
gate-check-v3 / gate-check (pull_request) Waiting to run
sop-checklist / all-items-acked (pull_request) Waiting to run
sop-checklist / review-refire (pull_request) Waiting to run
sop-tier-check / tier-check (pull_request) Waiting to run
Addresses two findings from the #1742 review (PR-comment context: the
extension-resolution finding was flagged as a production blocker for
any tenant that previously installed pgvector in `public`):

1. entrypoint-tenant.sh — search_path now `memory_plugin,public`
   (was `memory_plugin`). Pgvector is schema-bound, not relocated by
   search_path. On tenants where pgvector was already installed into
   `public` by a prior boot or operator action, the previous
   single-schema search_path made `vector(1536)` references in
   001_memory_v2.up.sql unresolvable — boot would die inside the 30s
   health gate. The `,public` fallback resolves `vector` via the
   tail schema while still landing every new CREATE TABLE in
   `memory_plugin` (SSOT preserved — search_path picks the FIRST
   writable schema for new objects).

   Verified locally with the exact failure case from the review:
   - Fresh tenant: CREATE EXTENSION vector → extension lands in
     `memory_plugin` (SSOT intent).
   - Pre-existing tenant (pgvector pre-installed in `public`):
     CREATE EXTENSION IF NOT EXISTS is a no-op, extension stays in
     `public`, vector(1536) resolves via fallback, plugin boots,
     memory tables still land in `memory_plugin`.

2. migrations_embed_test.go — replaced the global "at least one file
   has CREATE TABLE somewhere" assertion with explicit per-file
   pins:
   - 000_schema_bootstrap.up.sql MUST contain `CREATE SCHEMA`.
   - 001_memory_v2.up.sql MUST contain `CREATE TABLE` AND mention
     both `memory_records` and `memory_namespaces`.
   The global check let a future rewrite of 001 silently regress to
   schema-only as long as ANY embedded file declared a table.

3. Updated 000_schema_bootstrap.up.sql comment block to document
   the new search_path shape + the pgvector resolution rule, so a
   future reader doesn't have to re-derive why the fallback exists.

Stage A:
- Plugin tests still green (`go test -count=1
  ./cmd/memory-plugin-postgres/...`).
- Local Postgres-in-Docker round-trip exercising BOTH the
  pre-existing-public and fresh-tenant paths; logs + schema
  verification in the commit-discussion thread.

Refs: #1742 (this PR), review-finding C1 (extension resolution) +
C2 (test invariant weakened).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:53:43 -07:00
hongming 354bb5a4d4 feat(memory-plugin): #1733 A0 — isolate v2 plugin tables under memory_plugin schema
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
sop-checklist / review-refire (pull_request) Waiting to run
sop-tier-check / tier-check (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 35s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m0s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Successful in 4s
qa-review / approved (pull_request) Successful in 3s
security-review / approved (pull_request) Failing after 3s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m34s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
Harness Replays / Harness Replays (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 4m17s
CI / all-required (pull_request) Successful in 21m24s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
The Memory v2 plugin sidecar already auto-starts on tenant boot (per
entrypoint-tenant.sh, since 2026-05-14) when `MEMORY_V2_CUTOVER=true`,
which CP sets on every tenant since 2026-05-05 (controlplane
ec2.go:2232). Plugin tables today land in the tenant Postgres's `public`
schema, sharing the namespace with platform-tenant tables — a soft SSOT
violation flagged in #1733.

CTO decision 2026-05-23: same Postgres instance, separate schema.

Changes:
- New `cmd/memory-plugin-postgres/migrations/000_schema_bootstrap.up.sql`
  + matching `.down.sql`. Migration filenames sort alphabetically (per
  `main.go:runMigrationsFromEmbed → sort.Strings`), so 000 runs before
  the existing 001_memory_v2 which assumes the schema exists in
  search_path.
- `entrypoint-tenant.sh`: when defaulting `MEMORY_PLUGIN_DATABASE_URL`
  from the tenant's `DATABASE_URL`, append `search_path=memory_plugin`
  (handles both `?` and `&` URL forms). When the operator explicitly
  sets `MEMORY_PLUGIN_DATABASE_URL` (separate DB entirely), they keep
  full control — no rewriting.
- `migrations_embed_test.go`: relax the per-file assertion (was: every
  *.up.sql must contain `CREATE TABLE`; now: every file must contain a
  recognized DDL keyword AND at least one file declares CREATE TABLE).
  Required because the new 000 file is schema-only.

Stage A — verified locally against `pgvector/pgvector:pg16` (the same
image CP launches on tenant EC2):

- Plugin binary boots, embedded migrations apply in order
  `000_schema_bootstrap → 001_memory_v2`.
- Tables land in `memory_plugin` schema; `public` stays empty.
- End-to-end PUT namespace / POST memory / POST search round-trips
  through the new schema.
- The entrypoint URL-mutation logic exercised on 3 cases: DATABASE_URL
  with `?`, without `?`, and operator-override (no rewrite).
- `go test -short -count=1 ./...` green across all packages
  (workspace-server + plugin).

Operational follow-ups (deliberately out of scope for this PR):

- Tenants currently running with rows in `public.memory_*` need a
  one-time migration to `memory_plugin.memory_*`. Handled in #1733
  Phase A2 (per-tenant backfill via `cmd/memory-backfill`); not gating
  this PR because no production tenant has rows in v2 yet (every
  tenant_workspace_server has been silently falling back to v1 SQL on
  `agent_memories` since 2026-05-05).
- `CREATE EXTENSION vector` (in 001_memory_v2) implicitly lands in the
  first writable search_path schema — i.e. `memory_plugin` now. Dropping
  the schema cascade-drops the extension. If a future operator drops the
  schema intentionally, they must re-`CREATE EXTENSION`. Acceptable for
  a defensive operator action; flagged in `000_schema_bootstrap.down.sql`.

Refs: #1733 (memory SSOT — Phase A0 of revised plan). Followed by A1
(workspace-server: kill v1 fallback, fail-fast on missing plugin) in
a separate core PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 14:27:59 -07:00
4 changed files with 105 additions and 3 deletions
@@ -0,0 +1,9 @@
-- Reverse of 000_schema_bootstrap.up.sql.
--
-- Drops the dedicated schema and every plugin object inside it. Operator
-- runs this only when intentionally tearing down the v2 plugin store on
-- a shared tenant Postgres. Down migrations are NOT auto-applied by the
-- plugin (cmd/memory-plugin-postgres/main.go:runMigrations comment) —
-- this file is for manual operator-driven cleanup.
DROP SCHEMA IF EXISTS memory_plugin CASCADE;
@@ -0,0 +1,31 @@
-- Create the dedicated schema this plugin owns when it shares a Postgres
-- with the tenant platform (the default deployment shape — see
-- entrypoint-tenant.sh which appends `search_path=memory_plugin,public`
-- to DATABASE_URL when the operator hasn't set MEMORY_PLUGIN_DATABASE_URL
-- explicitly).
--
-- The schema name `memory_plugin` matches the search_path injected by
-- the entrypoint; sql.Open's URL controls *where* subsequent CREATE
-- TABLE lands (search_path) but does NOT auto-create the target schema,
-- so the plugin has to do it itself before 001_memory_v2 runs.
--
-- About the `,public` fallback in search_path: pgvector ships as an
-- extension that lives in one schema. On fresh tenants 001_memory_v2's
-- `CREATE EXTENSION IF NOT EXISTS vector` installs it into the first
-- writable schema in search_path (memory_plugin — SSOT preserved). On
-- tenants where pgvector was already installed into `public` by a
-- prior boot, the IF NOT EXISTS is a no-op and the extension stays in
-- public; the `vector(1536)` type reference in 001 then resolves via
-- the public fallback. Without that fallback, 001 would die with
-- "type vector does not exist" on every pre-existing tenant (#1742
-- review finding).
--
-- Operators who point the plugin at a dedicated database (no shared
-- tenant tables) can leave this no-op: CREATE SCHEMA IF NOT EXISTS is
-- idempotent and a fresh DB happily owns a `memory_plugin` schema.
--
-- Migration files are sorted alphabetically (cmd/memory-plugin-postgres/
-- main.go:runMigrationsFromEmbed → sort.Strings), so 000_ runs before
-- 001_memory_v2 which assumes the schema already exists in search_path.
CREATE SCHEMA IF NOT EXISTS memory_plugin;
@@ -34,13 +34,49 @@ func TestMigrationsEmbedded_ContainsCreateTable(t *testing.T) {
t.Errorf("read embedded %q: %v", e.Name(), err)
continue
}
if !strings.Contains(string(data), "CREATE TABLE") {
t.Errorf("embedded %q has no CREATE TABLE — wrong file embedded?", e.Name())
// Each file must contain at least one DDL statement we expect to see
// — guards against truncated / empty files that would silently embed.
if !containsAnyDDL(string(data)) {
t.Errorf("embedded %q contains no recognized DDL (CREATE TABLE/SCHEMA/EXTENSION/INDEX) — wrong or truncated file?", e.Name())
}
}
if !seenUp {
t.Fatal("no *.up.sql in embedded migrations — runtime would have no schema to apply")
}
// Per-file invariants (issue #1742 review finding). The previous global
// "at least one file has CREATE TABLE somewhere" check let a future
// rewrite of 001_memory_v2.up.sql silently regress to schema-only as
// long as any other file declared a table. Pin the load-bearing DDL
// per filename so a wrong-or-truncated 001 fails this test loudly.
assertFileContains(t, "000_schema_bootstrap.up.sql", "CREATE SCHEMA")
assertFileContains(t, "001_memory_v2.up.sql", "CREATE TABLE")
assertFileContains(t, "001_memory_v2.up.sql", "memory_records")
assertFileContains(t, "001_memory_v2.up.sql", "memory_namespaces")
}
// assertFileContains fails the test if the embedded migration `name`
// either can't be read or doesn't contain `needle`. Pulled out so each
// per-file pin reads as one line.
func assertFileContains(t *testing.T, name, needle string) {
t.Helper()
data, err := migrationsFS.ReadFile("migrations/" + name)
if err != nil {
t.Errorf("required migration %q not embedded: %v", name, err)
return
}
if !strings.Contains(string(data), needle) {
t.Errorf("migration %q must contain %q — wrong or truncated file?", name, needle)
}
}
func containsAnyDDL(sql string) bool {
for _, kw := range []string{"CREATE TABLE", "CREATE SCHEMA", "CREATE EXTENSION", "CREATE INDEX", "CREATE OR REPLACE", "ALTER TABLE"} {
if strings.Contains(sql, kw) {
return true
}
}
return false
}
// TestRunMigrationsFromEmbed_OrderingIsAlphabetic pins that we apply
+27 -1
View File
@@ -38,7 +38,33 @@ if [ "$MEMORY_V2_CUTOVER" = "true" ] || [ -n "$MEMORY_PLUGIN_URL" ]; then
memory_plugin_wanted=1
fi
if [ -z "$MEMORY_PLUGIN_DISABLE" ] && [ -n "$memory_plugin_wanted" ] && [ -n "$DATABASE_URL" ]; then
: "${MEMORY_PLUGIN_DATABASE_URL:=$DATABASE_URL}"
# Schema isolation (issue #1733): when defaulting from the tenant
# DATABASE_URL we co-locate the plugin's tables under a dedicated
# `memory_plugin` schema so they never collide with platform-tenant
# tables in `public`. The plugin's 000_schema_bootstrap migration
# creates the schema; search_path here directs every subsequent CREATE
# TABLE / SELECT to land in it.
#
# The search_path includes `public` as a fallback so the `vector` type
# resolves regardless of which schema pgvector was installed into.
# Fresh tenants (no prior `CREATE EXTENSION vector`) install the
# extension into `memory_plugin` (first writable schema in the path),
# keeping the SSOT intent. Tenants where pgvector was already
# installed into `public` by a prior boot or operator action keep the
# extension where it is and resolve `vector(1536)` via the public
# fallback — without this fallback those tenants would crash the
# plugin boot with "type vector does not exist" once the migrations
# try to create memory_records (#1742 review finding).
#
# Operators who explicitly set MEMORY_PLUGIN_DATABASE_URL (separate DB
# entirely) keep full control — search_path is only injected when we
# default from DATABASE_URL.
if [ -z "$MEMORY_PLUGIN_DATABASE_URL" ]; then
case "$DATABASE_URL" in
*\?*) MEMORY_PLUGIN_DATABASE_URL="${DATABASE_URL}&search_path=memory_plugin,public" ;;
*) MEMORY_PLUGIN_DATABASE_URL="${DATABASE_URL}?search_path=memory_plugin,public" ;;
esac
fi
: "${MEMORY_PLUGIN_LISTEN_ADDR:=:9100}"
export MEMORY_PLUGIN_DATABASE_URL MEMORY_PLUGIN_LISTEN_ADDR
echo "memory-plugin: starting sidecar on $MEMORY_PLUGIN_LISTEN_ADDR" >&2