ci(tenant-image): add Redis sidecar to FULL ENV smoke (CR2 RCA from job 538500) #3120
CR2 RCA — full-env smoke needs Redis
CR2 pulled the `publish-workspace-server-image.yml` log for job 538500 step 8 (the tenant full-env smoke) and found that the FULL ENV variant never reaches `/healthz=200` because Redis init fails connecting to `[::1]:6379` (memory-plugin was healthy, pgvector was fine).

The workspace-server's `cmd/server/main.go:161` calls `db.InitRedis(REDIS_URL)` and `log.Fatalf` on failure → tenant exits 1 BEFORE serving `/healthz`. With runtime v0.3.43 from PR #161, the workspace-server startup path requires Redis (events broadcaster + health-sweep subscribe).

The bare-equivalent smoke (`248c7f52`) didn't need Redis because variant (B) short-circuits the same boot path with `MEMORY_PLUGIN_DISABLE=1`. The FULL ENV variant didn't, until now.

Fix
Added a Redis sidecar to the FULL ENV smoke (same pattern as the existing pgvector sidecar):
- `redis:7-alpine` image booted in the user-defined bridge network alongside pgvector
- `--bind 0.0.0.0 --protected-mode no` so go-redis can connect via the user-defined bridge network's DNS-resolved hostname. The `[::1]:6379` IPv6 bind was the exact failure mode CR2's job 538500 log named, fixed by the explicit `--bind 0.0.0.0`
- `--save "" --appendonly no` to skip disk persistence (smoke data is throwaway)
- `redis-cli ... PING` (poll until PONG, 30s budget)
- `-e REDIS_URL="redis://${REDIS_NAME}:6379/0"`
- `REDIS_NAME`

Diff
+61 / -4 lines in `.gitea/workflows/publish-workspace-server-image.yml` (single file).

Verified
- `bash -n` clean
- `--bind 0.0.0.0` (no `[::1]` IPv6) ✓

Why this is critical path
PM's dispatch: "THIS IS CRITICAL PATH for Concierge going live."
`molecule-ai/molecule-ai-platform-agent` concierge image (`start_platform` boot path fails)

Test plan
- `bash -n` clean

Rollback
Single-file revert is safe:
`git revert d6b5ef1f` removes the Redis sidecar block. The previous (`248c7f52`) FULL ENV variant without Redis returns; it's still directionally right but won't catch the Redis class of failure.

Refs: CR2 RCA via job 538500 step 8 log pull, PM dispatch, PR #3111, runtime v0.3.43 from PR #161.
🤖 Generated with Claude Code
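The sidecar steps listed under Fix could look roughly like the shell sketch below. The network and container name defaults, the `DOCKER` override, and the function names are illustrative assumptions and are not copied from the workflow; only the image, the `redis-server` flags, and the `REDIS_URL` shape come from the description above.

```shell
# Sketch under assumptions: the SMOKE_NET and REDIS_NAME defaults and the
# DOCKER override are hypothetical; the real workflow may differ.
DOCKER="${DOCKER:-docker}"
SMOKE_NET="${SMOKE_NET:-smoke-net}"
REDIS_NAME="${REDIS_NAME:-smoke-redis}"

# Start Redis on the user-defined bridge network, bound to all IPv4
# interfaces, with protected mode and disk persistence turned off.
start_redis_sidecar() {
  "$DOCKER" run -d --name "$REDIS_NAME" --network "$SMOKE_NET" \
    redis:7-alpine redis-server \
    --bind 0.0.0.0 --protected-mode no \
    --save "" --appendonly no
}

# Build the URL the tenant container would receive via -e REDIS_URL=...
# The hostname is the sidecar's container name, resolved by bridge DNS.
redis_url() {
  printf 'redis://%s:6379/0\n' "$REDIS_NAME"
}
```

Setting `DOCKER=echo` prints the command instead of running it, which allows checking the flags without a Docker daemon.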
5-axis review for current head `d6b5ef1f8f`:
Correctness: APPROVE. The full-env smoke now starts a Redis sidecar on the same user-defined bridge network as the tenant, waits for `redis-cli PING`, and passes `REDIS_URL=redis://${REDIS_NAME}:6379/0` into the full tenant run. Binding Redis to `0.0.0.0` with protected mode disabled is the right container-network fix for the prior `[::1]:6379` failure, and this should let workspace-server complete `InitRedis` and serve `/healthz`.
Robustness: APPROVE. Readiness is bounded and fail-closed; cleanup now removes the Redis sidecar as well as tenant/pgvector/network. The existing pgvector and memory-plugin wiring remain unchanged, and the bare smoke path is not given `REDIS_URL` and remains covered by its existing `MEMORY_PLUGIN_DISABLE=1` behavior.
Security: APPROVE. Redis is scoped to the ephemeral smoke bridge network and throwaway test data; no secret or production exposure is added.
Performance: APPROVE. `redis:7-alpine` is small and only used in the publish smoke; bounded readiness avoids long hangs.
Readability/maintainability: APPROVE. The workflow comments identify the P0 failure mode and the sidecar mirrors the existing pgvector pattern.
CI/status: `CI / all-required` is green on the current head, and template-delivery/local smoke contexts are green. Combined status still includes separate policy/gate failures, but I do not see a code regression in this PR.

APPROVED on current head `d6b5ef1f`.
5-axis review:
Correctness: the fix addresses the exact job 538500 failure mode. The FULL ENV smoke now starts Redis on the same user-defined bridge network as pgvector and the tenant, waits for PONG, and passes REDIS_URL=redis://${REDIS_NAME}:6379/0 so the tenant resolves Redis by container DNS instead of falling back to [::1]:6379. Binding Redis to 0.0.0.0 with protected-mode off is appropriate for this isolated CI-only smoke network.
Robustness: cleanup covers Redis; readiness is explicit; pgvector/memory-plugin FULL ENV flow remains intact; the SIDECAR-DISABLED variant is not changed.
Security: Redis is only exposed inside the ephemeral smoke bridge network, with no host port published; protected-mode is disabled only in that isolated CI container.
Performance: small CI-only sidecar/startup overhead; no product runtime impact.
Readability: comments document the RCA and why REDIS_URL plus non-loopback bind are required.
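Both reviews lean on two behaviours: a bounded, fail-closed readiness wait and a cleanup that also removes the Redis sidecar. A minimal shell sketch of those two pieces follows; the function names, the `DOCKER` override, and the container and network names in the usage comments are hypothetical, and the workflow's actual implementation may differ.

```shell
# Sketch under assumptions: function names and the DOCKER override are
# hypothetical, not taken from the workflow.
DOCKER="${DOCKER:-docker}"

# wait_for_pong BUDGET_SECONDS CMD [ARGS...]
# Runs CMD about once per second until it prints PONG or the budget is
# spent. Returns 0 on PONG and 1 on timeout, so callers can fail closed.
wait_for_pong() {
  wfp_budget="$1"
  shift
  wfp_i=0
  while [ "$wfp_i" -lt "$wfp_budget" ]; do
    if [ "$("$@" 2>/dev/null)" = "PONG" ]; then
      return 0
    fi
    sleep 1
    wfp_i=$((wfp_i + 1))
  done
  return 1
}

# cleanup_smoke NETWORK CONTAINER...
# Force-removes the listed containers, then the network. Errors are
# ignored so cleanup still completes when something was never started.
cleanup_smoke() {
  cs_net="$1"
  shift
  "$DOCKER" rm -f "$@" 2>/dev/null || true
  "$DOCKER" network rm "$cs_net" 2>/dev/null || true
}

# Assumed usage inside the smoke step (all names are illustrative):
#   trap 'cleanup_smoke smoke-net smoke-tenant smoke-pgvector smoke-redis' EXIT
#   wait_for_pong 30 docker exec smoke-redis redis-cli PING || exit 1
```

Passing the probe as arguments keeps the loop independent of Docker, so the same helper can be exercised with a stub command.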