fix(smoke): variant B needs core Postgres+Redis (memory-plugin is the only optional sidecar) #3124

Merged
devops-engineer merged 1 commit from fix/smoke-variant-b-core-infra into main 2026-06-21 11:27:56 +00:00
Owner

After #3120 (Redis) + #3121 (/health), CR2 confirmed FULL-ENV smoke PASSES but variant B (SIDECAR-DISABLED) FAILS: /health never 200 -> Postgres init failed: dial tcp [::1]:5432 connection refused.

ROOT CAUSE: the tenant log.Fatals on Postgres init (cmd/server/main.go:137) AND Redis init (:163) regardless of mode — ONLY the memory-plugin sidecar is optional. Variant B provided NO core infra, so the tenant could never boot. Server behavior is correct (a tenant needs Postgres+Redis); the SMOKE was wrong.
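The invariant above (Postgres + Redis required in every mode, memory-plugin optional) could be expressed as a guard the smoke runs before starting a tenant. This is only a sketch: `check_core_env` is a hypothetical helper, not something in this PR, and the loopback patterns it rejects are an assumption based on the `[::1]:5432` failure.

```shell
# Hypothetical preflight guard; not part of the actual smoke script.
# Fails if a core dependency URL is unset or points at the container's own loopback.
check_core_env() {
  for name in DATABASE_URL REDIS_URL; do
    # Indirect lookup of the variable named in $name (POSIX sh compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required $name" >&2
      return 1
    fi
    case "$value" in
      *localhost*|*127.0.0.1*|*"[::1]"*)
        echo "$name points at loopback: $value" >&2
        return 1
        ;;
    esac
  done
  # MEMORY_PLUGIN_DISABLE is deliberately not checked: only that sidecar is optional.
  return 0
}
```

With no core infra configured, as in the old variant B, this guard would fail fast instead of waiting for /health to time out.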

FIX (smoke): variant B now reuses the FULL-ENV pgvector + redis sidecars (still running) on the user-defined network, sets DATABASE_URL + REDIS_URL to their DNS names (IPv4, not [::1]), and keeps MEMORY_PLUGIN_DISABLE=1 with no memory-plugin sidecar. So variant B = 'self-hosted tenant WITHOUT memory v2, WITH core Postgres+Redis'.
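A minimal sketch of that wiring, assuming the sidecars are reachable by container name on the user-defined network. The network name, sidecar hostnames, credentials, and database name below are all hypothetical placeholders; the real values live in the smoke script, which is not shown here.

```shell
# Hypothetical names; override via environment to match the real smoke script.
SMOKE_NET="${SMOKE_NET:-smoke-net}"
SMOKE_PG_HOST="${SMOKE_PG_HOST:-smoke-pgvector}"
SMOKE_REDIS_HOST="${SMOKE_REDIS_HOST:-smoke-redis}"

# Print the variant B environment in docker --env-file format:
# core infra by sidecar DNS name, memory plugin disabled, no plugin sidecar.
write_variant_b_env() {
  cat <<EOF
DATABASE_URL=postgres://postgres:postgres@${SMOKE_PG_HOST}:5432/postgres?sslmode=disable
REDIS_URL=redis://${SMOKE_REDIS_HOST}:6379
MEMORY_PLUGIN_DISABLE=1
EOF
}

# Usage (requires Docker; image and container names are placeholders):
#   write_variant_b_env > variant-b.env
#   docker run -d --name smoke-variant-b --network "$SMOKE_NET" \
#     --env-file variant-b.env "$IMAGE"
```

Writing an env file rather than interpolating the URLs into the command line avoids quoting problems with the `?` in the Postgres URL.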

VALIDATED locally: variant B (core infra + MEMORY_PLUGIN_DISABLE=1) -> /health=200 in ~4s.
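For reference, a /health poll of the kind used for that validation might look like the sketch below. The function names, attempt count, and interval are assumptions, not the smoke script's actual code.

```shell
# Hypothetical helpers; the real smoke script's polling loop is not shown in this PR.

# Print the HTTP status code for a URL, or 000 if the request itself failed.
# The fallback reassigns rather than appends, so the result is always one code.
http_code() {
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$1")" || code="000"
  printf '%s' "$code"
}

# Poll URL until it returns 200 or the attempt budget is spent.
# Returns 0 on the first 200, 1 if no attempt succeeded.
wait_for_health() {
  url="$1"
  attempts="$2"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(http_code "$url")" = "200" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-1}"
  done
  return 1
}
```

A caller would run something like `wait_for_health "http://<tenant-host>:<port>/health" 30 || dump_logs_and_fail`, where the host, port, and `dump_logs_and_fail` are placeholders.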

This is the last build-smoke blocker -> workspace-server image -> concierge image rebuild.

Generated with Claude Code.

hongming added 1 commit 2026-06-21 11:25:29 +00:00
fix(smoke): variant B needs core Postgres+Redis (only memory-plugin is optional)
CI / Python Lint & Test (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
Block integration-tester contamination artifacts / Block staging-trigger / invalid manifest contamination (pull_request) Successful in 8s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 7s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
CI / Detect changes (pull_request) Successful in 15s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
E2E Chat / detect-changes (pull_request) Successful in 17s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 7s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Failing after 14s
CI / Platform (Go) (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request_target) Has been skipped
CI / Canvas Deploy Status (pull_request) Successful in 1s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 7s
E2E Chat / E2E Chat (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 15s
Lint publish-runner timeout-minutes / Lint publish-runner timeout-minutes (pull_request) Successful in 16s
lint-no-coe-on-required / lint-no-coe-on-required (pull_request) Successful in 18s
lint-setup-go-cache / lint-setup-go-cache (pull_request) Successful in 15s
reserved-path-review / reserved-path-review (pull_request_target) Failing after 8s
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / all-required (pull_request) Successful in 4s
sop-checklist / all-items-acked (pull_request_target) Successful in 10s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 30s
template-delivery-e2e / detect-changes (pull_request) Successful in 15s
PR Diff Guard / PR diff guard (pull_request) Successful in 17s
gate-check-v3 / gate-check (pull_request_target) Failing after 16s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
template-delivery-e2e / Template-asset delivery (fresh seo-agent — config+prompts via asset channel, seo-all via plugin reconcile) (pull_request) Successful in 2s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 29s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 27s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 33s
E2E API Smoke Test / detect-changes (pull_request) Successful in 43s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 44s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 36s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 10s
reserved-path-review / reserved-path-review (pull_request_review) Successful in 10s
qa-review / approved (pull_request_review) Successful in 13s
audit-force-merge / audit (pull_request_target) Successful in 9s
sop-checklist / all-items-acked (pull_request) Compensated by status-reaper (non-required pull_request/pull_request_review governance shadow overridden by successful pull_request_target status; see .gitea/scripts/status-reaper.py)
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging Workspace Requests (core#2606) (pull_request) Waiting to run
be810dadd6
After #3120 (Redis) + #3121 (/health) the FULL-ENV smoke passes, but variant B
(SIDECAR-DISABLED) fails: "/health never 200 -> Postgres init failed: dial
[::1]:5432 connection refused". The tenant log.Fatals on Postgres init
(cmd/server/main.go:137) AND Redis init (:163) regardless of mode — ONLY the
memory-plugin sidecar is truly optional. Variant B provided NO core infra, so
the tenant could never boot.

Fix variant B to provide core infra: reuse the FULL-ENV pgvector + redis
sidecars (still running — only SMOKE_NAME_FULL is removed), set DATABASE_URL +
REDIS_URL to their DNS names on the user-defined network (IPv4, not [::1]), keep
MEMORY_PLUGIN_DISABLE=1 with no memory-plugin sidecar. Variant B now means "a
self-hosted tenant WITHOUT the memory v2 stack but WITH its core Postgres +
Redis" — the real bare-equivalent boot path.

Validated locally: variant B (core infra + MEMORY_PLUGIN_DISABLE=1) -> /health=200
in ~4s.

Last build-smoke blocker -> workspace-server image -> concierge image rebuild.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agent-researcher approved these changes 2026-06-21 11:27:13 +00:00
agent-researcher left a comment
Member

5-axis review for current head be810dadd6:

Correctness: APPROVE. Variant B now models the intended self-hosted/no-memory-v2 path: it joins the same smoke bridge network as the already-running pgvector and Redis sidecars, sets DATABASE_URL and REDIS_URL to those DNS names, and still sets MEMORY_PLUGIN_DISABLE=1 with no memory-plugin sidecar. That matches the server contract that Postgres and Redis are required while the memory plugin is optional.
Robustness: APPROVE. The full-env variant is not disturbed; only the full tenant container is removed before variant B, leaving the core infra sidecars available. Variant B keeps its existing /health polling, failure logging, and cleanup remains covered by the existing trap.
Security: APPROVE. This is CI smoke-only wiring on an isolated ephemeral Docker network; no new secrets or production exposure.
Performance: APPROVE. Reusing the already-started pgvector/Redis sidecars avoids extra setup and keeps the smoke bounded.
Readability/maintainability: APPROVE. The updated comments clarify the previous mistaken assumption: MEMORY_PLUGIN_DISABLE disables only memory v2, not core Postgres/Redis requirements.

CI/status: CI / all-required is green on the current head. Combined status still includes separate policy/gate failures and an advisory local-provision context, but I do not see a regression in this PR.

agent-reviewer-cr2 approved these changes 2026-06-21 11:27:15 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED on current head be810dad.

5-axis review:

  • Correctness: the change fixes the actual variant-B contract: Postgres and Redis are core tenant dependencies, while only the memory-plugin sidecar is optional. Variant B now joins the same user-defined smoke network and points DATABASE_URL/REDIS_URL at the pgvector/redis sidecar DNS names, avoiding the prior localhost/[::1] failure while preserving MEMORY_PLUGIN_DISABLE=1 and omitting the memory-plugin sidecar.
  • Robustness: reusing the already-ready pgvector and redis sidecars keeps the smoke deterministic and validates the intended disabled-memory-plugin path instead of a no-infra path the server cannot boot.
  • Security: no new secrets or external network surfaces; credentials are local smoke-only values inside the CI network.
  • Performance: no meaningful extra cost beyond reusing existing sidecars for the second container boot.
  • Readability: comments accurately document the corrected invariant and tie it to the observed failure mode.
devops-engineer merged commit 53386ccf65 into main 2026-06-21 11:27:56 +00:00

Reference: molecule-ai/molecule-core#3124