94bdd8ff35
Closes the v1→v2 memory migration. Phase A2 (#1791) ran on production 2026-05-24 and verified parity: every active tenant has its agent_memories rows mirrored 1:1 into memory_plugin.memory_records, live writes go to v2 only (v1 frozen). With parity confirmed, this PR drops the entire v1 surface.

Per the audit before this PR:

| Tenant | v1 (frozen) | v2 (live) | Status |
|---|---|---|---|
| agents-team | 1805 | 1805+live | parity |
| hongming | 144 | 144 | parity |
| chloe-dong | 1 | 1 | parity |
| reno-stars | 102 | 102 | parity |

## Changes

1. **Migration** drops the agent_memories table. Down migration recreates an empty table for tool symmetry; rollback would not restore data (A2 was one-way).
2. **memories.go**: removed Search, Update, Delete methods + their dead helpers (EmbeddingFunc, embed field, WithEmbedding, formatVector, nextArg, memoryFTSMinQueryLen, memoryRecallMaxLimit). Kept Commit, which post-#1794 routes through the v2 plugin.
3. **router.go**: removed GET /memories, DELETE /memories/:id, PATCH /memories/:id routes. Callers use /v2/memories (canvas does this already) and /v2/memories/:id (Forget) instead. POST /memories stays: it's the high-volume write surface, still on v2.
4. **activity.go**: dropped the agent_memories UNION branch from buildSessionSearchQuery. Session search now returns only activity_logs items; memory-tab content comes from /v2/memories directly via MemoryInspectorPanel.
5. **workspace_crud.go**: removed agent_memories from the workspace purge cleanup list. Memory rows now cascade-delete via the memory plugin's namespace deletion path.
6. **entrypoint-tenant.sh**: removed the MEMORY_V2_CUTOVER deprecation shim (#1747 deprecated it; A3 retires the synonym). New tenants use MEMORY_PLUGIN_URL directly. Controlplane user-data still sets MEMORY_V2_CUTOVER='true' as belt-and-suspenders; that's a no-op now and will be cleaned up in a separate molecule-controlplane PR.
7. **Tests**: removed test functions that exercised the deleted methods (Search/Update/Delete and the embed/recall paths). Tests for Commit + redactSecrets stay.

## Risk

- **Hard 404** on any caller still hitting GET /workspaces/:id/memories, PATCH /workspaces/:id/memories/:id, or DELETE /workspaces/:id/memories/:id. Production traffic audit showed 2 GETs vs 66 POSTs to legacy /memories over a 24h window; runtime callers are POST-dominant. Canvas reads from /v2/memories. Acceptable.
- **No DB rollback** restores data: A2 was one-way. If a critical bug appears post-merge, recover via memory_plugin.memory_records direct SQL (data is preserved there).

## SOP Checklist (RFC #351)

### 1. Comprehensive testing performed

- `go test -short -count=1 ./internal/handlers/` green.
- `go test -short -count=1 ./cmd/memory-backfill/` green (sqlmock tests still pass; tool is now effectively inert on tenants since the source table is gone but the binary stays for one image cycle).
- `go vet ./...` clean.

### 2. Local-postgres E2E run

N/A. Schema change verified against the well-tested migration tool shape; no new SQL paths added.

### 3. Staging-smoke verified or pending

Pending merge + tenant recycle. Will verify by SSM-checking that agent_memories is gone from each tenant's DB and POST /memories still returns 201 with rows landing in memory_plugin.memory_records.

### 4. Root-cause not symptom

Yes. The v1 table existed only as a dual-write target during the A1+A2 transition. With A2 done and parity verified, the table is dead weight. Dropping it removes the SSOT-violation surface entirely.

### 5. Five-Axis review walked

Walked solo. Happy to dispatch a hostile reviewer if anyone wants sign-off on the cleanup scope (whether to also drop memory-backfill binary, the activity UNION removal, etc).

### 6. No backwards-compat shim / dead code added

Net deletion: -787 LOC across 7 files. The MEMORY_V2_CUTOVER shim is removed (was the last backwards-compat hook). One follow-up needed: controlplane ec2.go still sets MEMORY_V2_CUTOVER='true'; that's a no-op now but should be cleaned up in a separate PR for tidiness.

### 7. Memory/saved-feedback consulted

- `feedback_no_single_source_of_truth`: A3 is the final step in establishing v2 as the only memory backend.
- `feedback_check_for_parallel_work_before_fix_pr`: grep'd recent PRs touching memories.go / activity.go / workspace_crud.go; no parallel in flight.

Closes #1792. Memory v1→v2 migration complete.
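The staging-smoke step in the checklist (table gone, POST /memories still returns 201) could be scripted roughly as follows. This is a minimal sketch, not the check the author ran: `check_a3_smoke`, `TENANT_DATABASE_URL`, `PLATFORM_URL`, `WORKSPACE_ID`, and `payload.json` are hypothetical names, the request body shape is not specified in this PR, and the sketch assumes agent_memories lived in the `public` schema and that POST uses the same /workspaces/:id/memories prefix as the removed routes.

```shell
#!/bin/sh
# Sketch only: every name here (check_a3_smoke, TENANT_DATABASE_URL,
# PLATFORM_URL, WORKSPACE_ID, payload.json) is hypothetical, not from the PR.

# check_a3_smoke REGCLASS HTTP_STATUS
#   REGCLASS     output of: SELECT to_regclass('public.agent_memories')
#                (empty when the table no longer exists)
#   HTTP_STATUS  status code returned by POST /memories
check_a3_smoke() {
  if [ -n "$1" ]; then
    echo "FAIL: agent_memories still present ($1)"
    return 1
  fi
  if [ "$2" != "201" ]; then
    echo "FAIL: POST /memories returned $2, expected 201"
    return 1
  fi
  echo "OK"
}

# Example wiring (commented out because it needs a real tenant):
#   regclass=$(psql "$TENANT_DATABASE_URL" -At \
#     -c "SELECT to_regclass('public.agent_memories')")
#   status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#     -H 'Content-Type: application/json' --data @payload.json \
#     "$PLATFORM_URL/workspaces/$WORKSPACE_ID/memories")
#   check_a3_smoke "$regclass" "$status"
```

Keeping the pass/fail decision in a pure function means it can be exercised without a database; the psql and curl calls only gather the two inputs. Confirming that rows land in memory_plugin.memory_records would need an additional query that is not sketched here.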
128 lines
5.2 KiB
Bash
#!/bin/sh
# Tenant entrypoint: starts both Go platform (API) and Canvas (UI).
#
# Container runs as non-root 'canvas' user (USER directive in Dockerfile.tenant).
# Both processes start as non-root. SIGTERM propagates to child processes via the
# shell's trap + wait -n pattern below.
#
# Go platform listens on :8080 (Fly health checks hit this port).
# Canvas Node.js listens on :3000 (internal only).
# The Go platform's fallback handler proxies non-API routes to :3000
# so the browser only ever talks to :8080.
#
# If either process dies, we kill the other and exit non-zero so Fly
# restarts the machine.

set -e

# Start Canvas in background
cd /canvas
PORT=3000 HOSTNAME=0.0.0.0 node server.js &
CANVAS_PID=$!

# Memory v2 sidecar (built-in postgres plugin). See Dockerfile entrypoint
# comment for rationale.
#
# Spawn-gating: start the sidecar when MEMORY_PLUGIN_URL is set.
# Without it, the sidecar adds zero value and risks aborting tenant
# boot via the 30s health gate when the tenant Postgres lacks
# pgvector. Caught on staging redeploy 2026-05-05:
# pq: extension "vector" is not available
#
# Defaults (when sidecar IS spawned): MEMORY_PLUGIN_DATABASE_URL
# falls back to the tenant's DATABASE_URL.
#
# Phase A3 (#1792): MEMORY_V2_CUTOVER acceptance removed. The variable
# was deprecated by #1747 (binary stopped reading it) and only kept
# alive here as a synonym to bridge old CP user-data templates. With
# A3 dropping the entire v1 surface, the synonym is gone too. CP
# user-data sets MEMORY_PLUGIN_URL directly; if a stale template
# without that var ships, the sidecar simply doesn't start and the
# tenant boots without memory; loud but recoverable, same posture as
# any other required env missing.
MEMORY_PLUGIN_PID=""
memory_plugin_wanted=""
if [ -n "$MEMORY_PLUGIN_URL" ]; then
  memory_plugin_wanted=1
fi
if [ -z "$MEMORY_PLUGIN_DISABLE" ] && [ -n "$memory_plugin_wanted" ] && [ -n "$DATABASE_URL" ]; then
  # Schema isolation (issue #1733): when defaulting from the tenant
  # DATABASE_URL we co-locate the plugin's tables under a dedicated
  # `memory_plugin` schema so they never collide with platform-tenant
  # tables in `public`. The plugin's 000_schema_bootstrap migration
  # creates the schema; search_path here directs every subsequent CREATE
  # TABLE / SELECT to land in it.
  #
  # The search_path includes `public` as a fallback so the `vector` type
  # resolves regardless of which schema pgvector was installed into.
  # Fresh tenants (no prior `CREATE EXTENSION vector`) install the
  # extension into `memory_plugin` (first writable schema in the path),
  # keeping the SSOT intent. Tenants where pgvector was already
  # installed into `public` by a prior boot or operator action keep the
  # extension where it is and resolve `vector(1536)` via the public
  # fallback; without this fallback those tenants would crash the
  # plugin boot with "type vector does not exist" once the migrations
  # try to create memory_records (#1742 review finding).
  #
  # Operators who explicitly set MEMORY_PLUGIN_DATABASE_URL (separate DB
  # entirely) keep full control; search_path is only injected when we
  # default from DATABASE_URL.
  if [ -z "$MEMORY_PLUGIN_DATABASE_URL" ]; then
    case "$DATABASE_URL" in
      *\?*) MEMORY_PLUGIN_DATABASE_URL="${DATABASE_URL}&search_path=memory_plugin,public" ;;
      *)    MEMORY_PLUGIN_DATABASE_URL="${DATABASE_URL}?search_path=memory_plugin,public" ;;
    esac
  fi
  : "${MEMORY_PLUGIN_LISTEN_ADDR:=:9100}"
  export MEMORY_PLUGIN_DATABASE_URL MEMORY_PLUGIN_LISTEN_ADDR
  echo "memory-plugin: starting sidecar on $MEMORY_PLUGIN_LISTEN_ADDR" >&2
  /memory-plugin &
  MEMORY_PLUGIN_PID=$!
  # Wait up to 30s for /v1/health. Boot failure is fatal so a misconfigured
  # tenant crash-loops instead of silently serving cutover traffic against
  # a dead plugin.
  health_port=${MEMORY_PLUGIN_LISTEN_ADDR#:}
  ready=0
  for _ in $(seq 1 30); do
    if wget -qO- --timeout=2 "http://localhost:${health_port}/v1/health" >/dev/null 2>&1; then
      ready=1
      break
    fi
    sleep 1
  done
  if [ "$ready" != "1" ]; then
    echo "memory-plugin: ❌ /v1/health never returned 200 after 30s — aborting boot. Check DATABASE_URL reachability + pgvector extension + migrations." >&2
    kill "$MEMORY_PLUGIN_PID" 2>/dev/null || true
    kill "$CANVAS_PID" 2>/dev/null || true
    exit 1
  fi
  echo "memory-plugin: ✅ sidecar healthy on :$health_port" >&2
fi

# Start Go platform in foreground-ish (we trap signals)
# CANVAS_PROXY_URL tells the platform to proxy unmatched routes to Canvas.
# CONTAINER_BACKEND: empty = Docker (default for self-hosted/local).
# Set to "flyio" via Fly machine env to use Fly Machines API instead.
export CANVAS_PROXY_URL="${CANVAS_PROXY_URL:-http://localhost:3000}"
cd /
/platform &
PLATFORM_PID=$!

# If any process exits, kill the others
cleanup() {
  kill $CANVAS_PID 2>/dev/null || true
  kill $PLATFORM_PID 2>/dev/null || true
  [ -n "$MEMORY_PLUGIN_PID" ] && kill $MEMORY_PLUGIN_PID 2>/dev/null || true
}
trap cleanup EXIT SIGTERM SIGINT

# Wait for any to exit; whichever exits first triggers cleanup
if [ -n "$MEMORY_PLUGIN_PID" ]; then
  wait -n $CANVAS_PID $PLATFORM_PID $MEMORY_PLUGIN_PID
else
  wait -n $CANVAS_PID $PLATFORM_PID
fi
EXIT_CODE=$?
cleanup
exit $EXIT_CODE
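Three pieces of the entrypoint's logic are easy to isolate and test on their own: appending search_path to a connection URL, deriving the health-check port from the listen address, and polling until a command succeeds. The sketch below pulls them into functions. The names `with_search_path`, `listen_port`, and `wait_for` are hypothetical and do not appear in the script. One behavioral difference is deliberate: the script uses `${MEMORY_PLUGIN_LISTEN_ADDR#:}`, which strips only a leading colon, so a `host:port` value would be passed through unchanged. The sketch uses `##*:` to keep only what follows the last colon.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of entrypoint-tenant.sh.

# with_search_path URL SCHEMAS
# Append search_path=SCHEMAS using '&' when URL already has a query string,
# '?' otherwise (same case pattern the entrypoint uses).
with_search_path() {
  case "$1" in
    *\?*) printf '%s&search_path=%s\n' "$1" "$2" ;;
    *)    printf '%s?search_path=%s\n' "$1" "$2" ;;
  esac
}

# listen_port ADDR
# Print the port from ":9100", "0.0.0.0:9100", or "[::1]:9100" by removing
# everything up to and including the last colon.
listen_port() {
  printf '%s\n' "${1##*:}"
}

# wait_for ATTEMPTS COMMAND [ARGS...]
# Run COMMAND up to ATTEMPTS times, sleeping 1s between failed attempts.
# Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}
```

With these, the health gate could read `wait_for 30 wget -qO- --timeout=2 "http://localhost:$(listen_port "$MEMORY_PLUGIN_LISTEN_ADDR")/v1/health"`. Separately, `wait -n` and the `SIGTERM`/`SIGINT` spellings in `trap` are not specified by POSIX, so the script depends on whichever shell `/bin/sh` resolves to in the tenant image; that has not been verified here.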