feat(memory-v2): bundle memory-plugin-postgres as in-image sidecar

Closes the gap between the merged Memory v2 code (PR #2757 wired the
client into main.go) and operator activation. Without this PR an
operator wanting to flip MEMORY_V2_CUTOVER=true had to provision a
separate memory-plugin service and point MEMORY_PLUGIN_URL at it —
extra ops surface for what the design intends to be a built-in.

What ships:
  * Both Dockerfile + Dockerfile.tenant build the
    cmd/memory-plugin-postgres binary into /memory-plugin.
  * Entrypoints spawn the plugin in the background on :9100 BEFORE
    starting the main server; wait up to 30s for /v1/health to return
    200; abort boot loud if it doesn't (better to crash-loop than to
    silently route cutover traffic against a dead plugin).
  * Default env: MEMORY_PLUGIN_DATABASE_URL=$DATABASE_URL (share the
    existing tenant Postgres — plugin's `memory_namespaces` /
    `memory_records` tables coexist with platform schema, no
    conflicts), MEMORY_PLUGIN_LISTEN_ADDR=:9100.
  * MEMORY_PLUGIN_DISABLE=1 escape hatch for operators running the
    plugin externally on a separate host.
  * Platform image: plugin runs as the `platform` user (not root) via
    su-exec — matches the privilege boundary the main server already
    drops to. Tenant image already starts as `canvas` so the plugin
    inherits non-root automatically.

What stays operator-controlled:
  * MEMORY_V2_CUTOVER is NOT auto-set. Behavior change for existing
    deployments: zero. The wiring at workspace-server/internal/memory/
    wiring/wiring.go skips building the plugin client until the
    operator opts in, so the running sidecar is a no-op for traffic
    until then.
  * MEMORY_PLUGIN_URL is NOT auto-set either, for the same reason —
    setting it implies cutover-active intent. Operators set both on
    staging first, verify a live commit/recall round-trip (closes
    pending task #292), then promote to production.

Operator activation steps after this PR ships:
  1. Verify pgvector extension is available on the target Postgres
     (the plugin's first migration runs CREATE EXTENSION IF NOT
     EXISTS vector). Railway's managed Postgres ships pgvector
     available; some self-hosted operators may need to enable it.
  2. Redeploy the workspace-server with this image.
  3. Set MEMORY_PLUGIN_URL=http://localhost:9100 + MEMORY_V2_CUTOVER=true
     in the environment (staging first).
  4. Watch boot logs for "memory-plugin:  sidecar healthy" and the
     wiring.go cutover messages; do a live commit_memory + recall_memory
     round-trip via the canvas Memory tab to verify.
  5. Promote to production once staging holds for a sweep window.

Refs RFC #2728. Closes the dormant-plugin gap noted in task #294.
This commit is contained in:
Hongming Wang 2026-05-05 11:10:11 -07:00
parent 9bd2a2c45f
commit b89a49ec93
3 changed files with 111 additions and 6 deletions

View File

@ -21,6 +21,14 @@ ARG GIT_SHA=dev
RUN CGO_ENABLED=0 GOOS=linux go build \
-ldflags "-X github.com/Molecule-AI/molecule-monorepo/platform/internal/buildinfo.GitSHA=${GIT_SHA}" \
-o /platform ./cmd/server
# Bundle the built-in memory-plugin-postgres binary so an operator can
# activate Memory v2 by setting MEMORY_V2_CUTOVER=true + (default)
# MEMORY_PLUGIN_URL=http://localhost:9100. The entrypoint starts this
# binary in the background; main /platform talks to it over loopback.
# Stays inert until the operator flips the cutover env var.
RUN CGO_ENABLED=0 GOOS=linux go build \
-ldflags "-X github.com/Molecule-AI/molecule-monorepo/platform/internal/buildinfo.GitSHA=${GIT_SHA}" \
-o /memory-plugin ./cmd/memory-plugin-postgres
# Clone templates + plugins at build time from manifest.json
FROM alpine:3.20 AS templates
@ -30,8 +38,9 @@ COPY scripts/clone-manifest.sh /scripts/clone-manifest.sh
RUN chmod +x /scripts/clone-manifest.sh && /scripts/clone-manifest.sh /manifest.json /workspace-configs-templates /org-templates /plugins
FROM alpine:3.20
RUN apk add --no-cache ca-certificates git tzdata
RUN apk add --no-cache ca-certificates git tzdata wget
COPY --from=builder /platform /platform
COPY --from=builder /memory-plugin /memory-plugin
COPY workspace-server/migrations /migrations
COPY --from=templates /workspace-configs-templates /workspace-configs-templates
COPY --from=templates /org-templates /org-templates
@ -41,6 +50,7 @@ RUN addgroup -g 1000 platform && adduser -u 1000 -G platform -s /bin/sh -D platf
EXPOSE 8080
COPY <<'ENTRY' /entrypoint.sh
#!/bin/sh
# Set up docker-socket group (unchanged from pre-sidecar entrypoint).
if [ -S /var/run/docker.sock ]; then
SOCK_GID=$(stat -c '%g' /var/run/docker.sock 2>/dev/null || stat -f '%g' /var/run/docker.sock 2>/dev/null)
if [ -n "$SOCK_GID" ] && [ "$SOCK_GID" != "0" ]; then
@ -50,6 +60,52 @@ if [ -S /var/run/docker.sock ]; then
addgroup platform root 2>/dev/null || true
fi
fi
# Memory v2 sidecar (built-in postgres plugin). Co-located with the
# main server so operators flipping MEMORY_V2_CUTOVER=true don't need
# to provision a separate service. Stays inert at the protocol layer
# until that env var is set — the workspace-server's wiring.go skips
# building the client without MEMORY_PLUGIN_URL, so the running plugin
# is a no-op for traffic.
#
# Env defaults:
# MEMORY_PLUGIN_DATABASE_URL = $DATABASE_URL (share existing Postgres;
# plugin's `memory_namespaces` / `memory_records` tables coexist
# with `agent_memories` and the rest of the platform schema —
# no conflicts. Operator can override with a separate URL.)
# MEMORY_PLUGIN_LISTEN_ADDR = :9100
#
# Set MEMORY_PLUGIN_DISABLE=1 to skip launching the sidecar entirely
# (e.g. an operator running the plugin externally on a separate host).
if [ -z "$MEMORY_PLUGIN_DISABLE" ] && [ -n "$DATABASE_URL" ]; then
: "${MEMORY_PLUGIN_DATABASE_URL:=$DATABASE_URL}"
: "${MEMORY_PLUGIN_LISTEN_ADDR:=:9100}"
export MEMORY_PLUGIN_DATABASE_URL MEMORY_PLUGIN_LISTEN_ADDR
echo "memory-plugin: starting sidecar on $MEMORY_PLUGIN_LISTEN_ADDR" >&2
# Drop privs to the platform user — the plugin doesn't need root and
# runs unprivileged elsewhere (tenant image already starts as canvas).
su-exec platform /memory-plugin &
MEMORY_PLUGIN_PID=$!
# Wait up to 30s for the plugin's /v1/health to return 200. Boot
# failure here is fatal — better to crash-loop than to silently
# serve cutover traffic against a dead plugin.
health_port=${MEMORY_PLUGIN_LISTEN_ADDR#:}
ready=0
for _ in $(seq 1 30); do
if wget -qO- --timeout=2 "http://localhost:${health_port}/v1/health" >/dev/null 2>&1; then
ready=1
break
fi
sleep 1
done
if [ "$ready" != "1" ]; then
echo "memory-plugin: ❌ /v1/health never returned 200 after 30s — aborting boot. Check that DATABASE_URL is reachable, has the pgvector extension, and the plugin's migrations applied." >&2
kill "$MEMORY_PLUGIN_PID" 2>/dev/null || true
exit 1
fi
echo "memory-plugin: ✅ sidecar healthy on :$health_port" >&2
fi
exec su-exec platform /platform "$@"
ENTRY
RUN chmod +x /entrypoint.sh && apk add --no-cache su-exec

View File

@ -34,6 +34,13 @@ ARG GIT_SHA=dev
RUN CGO_ENABLED=0 GOOS=linux go build \
-ldflags "-X github.com/Molecule-AI/molecule-monorepo/platform/internal/buildinfo.GitSHA=${GIT_SHA}" \
-o /platform ./cmd/server
# Memory v2 sidecar binary (Memory v2 #2728). Bundled so an operator
# can activate cutover by flipping MEMORY_V2_CUTOVER=true without
# provisioning a separate service. See entrypoint-tenant.sh for the
# launch logic.
RUN CGO_ENABLED=0 GOOS=linux go build \
-ldflags "-X github.com/Molecule-AI/molecule-monorepo/platform/internal/buildinfo.GitSHA=${GIT_SHA}" \
-o /memory-plugin ./cmd/memory-plugin-postgres
# ── Stage 2: Canvas Next.js standalone ────────────────────────────────
FROM node:20-alpine AS canvas-builder
@ -74,8 +81,9 @@ RUN deluser --remove-home node 2>/dev/null || true; \
delgroup node 2>/dev/null || true; \
addgroup -g 1000 canvas && adduser -u 1000 -G canvas -s /bin/sh -D canvas
# Go platform binary
# Go platform binary + Memory v2 sidecar
COPY --from=go-builder /platform /platform
COPY --from=go-builder /memory-plugin /memory-plugin
COPY workspace-server/migrations /migrations
# Templates + plugins (cloned from GitHub in stage 3)
@ -91,7 +99,7 @@ COPY --from=canvas-builder /canvas/public ./public
COPY workspace-server/entrypoint-tenant.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && \
chown -R canvas:canvas /canvas /platform /migrations
chown -R canvas:canvas /canvas /platform /memory-plugin /migrations
EXPOSE 8080
# entrypoint.sh starts as root to fix volume perms, then drops to

View File

@ -20,6 +20,42 @@ cd /canvas
PORT=3000 HOSTNAME=0.0.0.0 node server.js &
CANVAS_PID=$!
# Memory v2 sidecar (built-in postgres plugin). See Dockerfile entrypoint
# comment for rationale. Stays inert at the protocol layer until the
# operator sets MEMORY_V2_CUTOVER=true; running it is cheap.
#
# Defaults the plugin's DATABASE_URL to the tenant's DATABASE_URL so
# operators don't need to configure two of them. Plugin tables coexist
# with the platform schema.
MEMORY_PLUGIN_PID=""
if [ -z "$MEMORY_PLUGIN_DISABLE" ] && [ -n "$DATABASE_URL" ]; then
: "${MEMORY_PLUGIN_DATABASE_URL:=$DATABASE_URL}"
: "${MEMORY_PLUGIN_LISTEN_ADDR:=:9100}"
export MEMORY_PLUGIN_DATABASE_URL MEMORY_PLUGIN_LISTEN_ADDR
echo "memory-plugin: starting sidecar on $MEMORY_PLUGIN_LISTEN_ADDR" >&2
/memory-plugin &
MEMORY_PLUGIN_PID=$!
# Wait up to 30s for /v1/health. Boot failure is fatal so a misconfigured
# tenant crash-loops instead of silently serving cutover traffic against
# a dead plugin.
health_port=${MEMORY_PLUGIN_LISTEN_ADDR#:}
ready=0
for _ in $(seq 1 30); do
if wget -qO- --timeout=2 "http://localhost:${health_port}/v1/health" >/dev/null 2>&1; then
ready=1
break
fi
sleep 1
done
if [ "$ready" != "1" ]; then
echo "memory-plugin: ❌ /v1/health never returned 200 after 30s — aborting boot. Check DATABASE_URL reachability + pgvector extension + migrations." >&2
kill "$MEMORY_PLUGIN_PID" 2>/dev/null || true
kill "$CANVAS_PID" 2>/dev/null || true
exit 1
fi
echo "memory-plugin: ✅ sidecar healthy on :$health_port" >&2
fi
# Start Go platform in foreground-ish (we trap signals)
# CANVAS_PROXY_URL tells the platform to proxy unmatched routes to Canvas.
# CONTAINER_BACKEND: empty = Docker (default for self-hosted/local).
@ -29,15 +65,20 @@ cd /
/platform &
PLATFORM_PID=$!
# If either process exits, kill the other
# If any process exits, kill the others
cleanup() {
kill $CANVAS_PID 2>/dev/null || true
kill $PLATFORM_PID 2>/dev/null || true
[ -n "$MEMORY_PLUGIN_PID" ] && kill $MEMORY_PLUGIN_PID 2>/dev/null || true
}
trap cleanup EXIT SIGTERM SIGINT
# Wait for either to exit — whichever exits first triggers cleanup
wait -n $CANVAS_PID $PLATFORM_PID
# Wait for any to exit — whichever exits first triggers cleanup
if [ -n "$MEMORY_PLUGIN_PID" ]; then
wait -n $CANVAS_PID $PLATFORM_PID $MEMORY_PLUGIN_PID
else
wait -n $CANVAS_PID $PLATFORM_PID
fi
EXIT_CODE=$?
cleanup
exit $EXIT_CODE