fix(smoke): poll /health not /healthz (the route the server serves) #3121
CR2 found the next build-and-push smoke failure after #3120: full-env smoke gets /healthz 404 for 180s.
ROOT CAUSE: the smoke polls /healthz but the workspace-server mounts ONLY /health (router.go:93 -> 200); there is no /healthz route. The tenant is up + serving; it's just route-not-found on the wrong path. SMOKE-PATH bug, not a server bug.
/health is canonical: the readiness canary (CanaryTenantURL), the EC2 boot probe (localhost:8080/health), and the server route all use it.
Fix: both smoke variants (A FULL-ENV, B SIDECAR-DISABLED) now poll /health. The memory-plugin sidecar check (/v1/health on :9100) is unchanged.
Unblocks: workspace-server image build -> platform-agent/concierge image rebuild.
Generated with Claude Code.
5-axis review for current head 4ca15b0375:
Correctness: APPROVE. The workflow smoke probes now poll the canonical tenant /health route in both variants: FULL-ENV on localhost:18080/health and SIDECAR-DISABLED on localhost:18081/health. This matches the server/canary/EC2 readiness contract described in the PR and directly fixes the /healthz 404 smoke-path failure. The memory-plugin sidecar check remains on :19100/v1/health, which is the separate plugin health endpoint and should not be changed.
Robustness: APPROVE. Both smoke variants keep their existing retry budgets, fail-closed behavior, log capture, and cleanup; only the tenant health path and human-facing log strings changed.
Security: APPROVE. No auth, secret, network exposure, or runtime server behavior changes.
Performance: APPROVE. No added work; same polling cadence and timeout against the correct endpoint.
Readability/maintainability: APPROVE. Comments and error/success messages now align with the route actually being probed, reducing future RCA noise.
CI/status: CI / all-required is green on the current head and the template-delivery/local contexts visible in status are green or advisory. Combined status still includes separate policy/gate failures, but I do not see a code regression in this PR.
APPROVED on current head 4ca15b03.
5-axis review:
Correctness: this addresses the exact post-#3120 failure mode. The workspace-server smoke was polling /healthz, but the server exposes /health; the PR updates both FULL ENV (:18080) and SIDECAR-DISABLED (:18081) smoke curls and matching log strings to /health. The memory-plugin /v1/health check on :9100 is unchanged.
Robustness: both smoke variants retain their existing retry windows, cleanup, and failure diagnostics; only the canonical platform health path changes.
Security: no secrets, auth, network exposure, or runtime privileges change.
Performance: no runtime impact; CI-only polling path fix.
Readability: comments and notices now match the actual route being checked.