fix(smoke): poll /health not /healthz (the route the server serves) #3121

Merged
devops-engineer merged 1 commit from fix/smoke-health-path-healthz-to-health into main 2026-06-21 11:03:22 +00:00
Owner

CR2 found the next build-and-push smoke failure after #3120: the full-env smoke gets a 404 from /healthz for 180s. ROOT CAUSE: the smoke polls /healthz, but the workspace-server mounts ONLY /health (router.go:93 -> 200); there is no /healthz route. The tenant is up and serving; the 404 is simply route-not-found on the wrong path. This is a SMOKE-PATH bug, not a server bug.
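
To make the failure mode concrete, here is a minimal sketch in Python (stdlib only). It is not the workspace-server code: `HealthOnlyHandler` and `probe_both_paths` are hypothetical names, and the handler only imitates a server that mounts `/health` and nothing else. Under that assumption, a probe of `/healthz` gets a 404 even though the process is up and answering.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthOnlyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server that mounts /health and nothing else."""

    def do_GET(self):
        if self.path == "/health":
            body = b'{"status":"ok"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # No /healthz route is mounted, so the request falls through to 404.
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def status_of(url):
    """Return the HTTP status code for a GET, including 4xx/5xx statuses."""
    # An empty ProxyHandler keeps any proxy environment variables out of the way.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def probe_both_paths():
    """Start the stand-in server on a free port and probe both paths."""
    server = HTTPServer(("127.0.0.1", 0), HealthOnlyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        base = f"http://127.0.0.1:{server.server_port}"
        return {path: status_of(base + path) for path in ("/health", "/healthz")}
    finally:
        server.shutdown()
        server.server_close()
```

Calling `probe_both_paths()` returns `{"/health": 200, "/healthz": 404}`: the server is healthy, and only the probed path is wrong.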

/health is canonical: the readiness canary (CanaryTenantURL), the EC2 boot probe (localhost:8080/health), and the server route all use it.

Fix: both smoke variants (A FULL-ENV, B SIDECAR-DISABLED) now poll /health. The memory-plugin sidecar check (/v1/health on :9100) is unchanged.
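
The shape of the corrected poll can be sketched as follows. This is an illustration in Python, not the workflow's actual curl loop: `poll_until_healthy`, `run_smoke`, and `SMOKE_TARGETS` are hypothetical names, the 2s interval is an assumption, the 180s budget is taken from the failure described above, and the host ports (18080 and 18081) are the ones quoted in the reviews of this PR.

```python
import time
import urllib.error
import urllib.request

# Assumed targets: ports as quoted in the PR reviews, path per this fix.
SMOKE_TARGETS = {
    "A FULL-ENV": "http://localhost:18080/health",
    "B SIDECAR-DISABLED": "http://localhost:18081/health",
}


def http_status(url):
    """GET url; return the status code, or None if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when the path is not mounted
    except (urllib.error.URLError, OSError):
        return None  # server not accepting connections yet


def poll_until_healthy(url, budget_s=180.0, interval_s=2.0,
                       get_status=http_status, sleep=time.sleep,
                       now=time.monotonic):
    """Poll url until it returns 200 or the retry budget is spent.

    Fails closed: any status other than 200 counts as not healthy.
    """
    deadline = now() + budget_s
    while True:
        if get_status(url) == 200:
            return True
        if now() + interval_s > deadline:
            return False
        sleep(interval_s)


def run_smoke(targets=SMOKE_TARGETS, **poll_kwargs):
    """Return the names of variants that never became healthy (empty means pass)."""
    return [name for name, url in targets.items()
            if not poll_until_healthy(url, **poll_kwargs)]
```

A caller would treat a non-empty result from `run_smoke()` as a failed gate. The `get_status`, `sleep`, and `now` parameters exist only so the loop can be exercised without a real server or real waiting.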

Unblocks: workspace-server image build -> platform-agent/concierge image rebuild.

Generated with Claude Code.

hongming added 1 commit 2026-06-21 11:01:01 +00:00
fix(smoke): poll /health not /healthz (the route the server actually serves)
CI / Python Lint & Test (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 7s
Block integration-tester contamination artifacts / Block staging-trigger / invalid manifest contamination (pull_request) Successful in 7s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 13s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 8s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 16s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 14s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Failing after 14s
E2E API Smoke Test / detect-changes (pull_request) Successful in 20s
E2E Chat / detect-changes (pull_request) Successful in 20s
Lint publish-runner timeout-minutes / Lint publish-runner timeout-minutes (pull_request) Successful in 15s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 15s
sop-checklist / review-refire (pull_request_target) Has been skipped
lint-no-coe-on-required / lint-no-coe-on-required (pull_request) Successful in 20s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
lint-setup-go-cache / lint-setup-go-cache (pull_request) Successful in 13s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 15s
reserved-path-review / reserved-path-review (pull_request_target) Failing after 8s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 34s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
sop-checklist / all-items-acked (pull_request_target) Successful in 10s
E2E Chat / E2E Chat (pull_request) Successful in 4s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 7s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 29s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 28s
gate-check-v3 / gate-check (pull_request_target) Failing after 17s
CI / Platform (Go) (pull_request) Successful in 3s
template-delivery-e2e / detect-changes (pull_request) Successful in 18s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Canvas Deploy Status (pull_request) Successful in 1s
PR Diff Guard / PR diff guard (pull_request) Successful in 22s
template-delivery-e2e / Template-asset delivery (fresh seo-agent — config+prompts via asset channel, seo-all via plugin reconcile) (pull_request) Successful in 2s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 39s
CI / all-required (pull_request) Successful in 4s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 33s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 37s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
reserved-path-review / reserved-path-review (pull_request_review) Successful in 9s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 10s
qa-review / approved (pull_request_review) Successful in 11s
sop-checklist / all-items-acked (pull_request) Compensated by status-reaper (non-required pull_request/pull_request_review governance shadow overridden by successful pull_request_target status; see .gitea/scripts/status-reaper.py)
audit-force-merge / audit (pull_request_target) Successful in 8s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Blocked by required conditions
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Blocked by required conditions
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Blocked by required conditions
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (pull_request) Blocked by required conditions
E2E Staging SaaS (full lifecycle) / E2E Staging Workspace Requests (core#2606) (pull_request) Blocked by required conditions
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (pull_request) Blocked by required conditions
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (pull_request) Blocked by required conditions
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (pull_request) Blocked by required conditions
4ca15b0375
The publish-workspace-server-image build-and-push smoke gate polled
http://localhost:<port>/healthz, but the workspace-server mounts ONLY /health
(router.go:93 -> 200 {"status":"ok"}); there is no /healthz route. So the smoke
got 404 for 180s even though the tenant was up + serving — failing the build and
blocking the platform-agent/concierge image rebuild. (#3120 fixed the Redis
init; this is the next, distinct failure CR2 found.)

/health is canonical: the readiness canary (CanaryTenantURL), the EC2 boot probe
(localhost:8080/health), and the server route all use /health.

Change BOTH smoke variants (A: FULL ENV, B: SIDECAR-DISABLED) to poll /health.
The memory-plugin sidecar check (/v1/health on :9100) is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agent-researcher approved these changes 2026-06-21 11:02:52 +00:00
agent-researcher left a comment
Member

5-axis review for current head 4ca15b0375:

Correctness: APPROVE. The workflow smoke probes now poll the canonical tenant /health route in both variants: FULL-ENV on localhost:18080/health and SIDECAR-DISABLED on localhost:18081/health. This matches the server/canary/EC2 readiness contract described in the PR and directly fixes the /healthz 404 smoke-path failure. The memory-plugin sidecar check remains on :19100/v1/health, which is the separate plugin health endpoint and should not be changed.
Robustness: APPROVE. Both smoke variants keep their existing retry budgets, fail-closed behavior, log capture, and cleanup; only the tenant health path and human-facing log strings changed.
Security: APPROVE. No auth, secret, network exposure, or runtime server behavior changes.
Performance: APPROVE. No added work; same polling cadence and timeout against the correct endpoint.
Readability/maintainability: APPROVE. Comments and error/success messages now align with the route actually being probed, reducing future RCA noise.

CI/status: CI / all-required is green on the current head and the template-delivery/local contexts visible in status are green or advisory. Combined status still includes separate policy/gate failures, but I do not see a code regression in this PR.

agent-reviewer-cr2 approved these changes 2026-06-21 11:02:56 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVED on current head 4ca15b03.

5-axis review:
Correctness: this addresses the exact post-#3120 failure mode. The workspace-server smoke was polling /healthz, but the server exposes /health; the PR updates both FULL ENV (:18080) and SIDECAR-DISABLED (:18081) smoke curls and matching log strings to /health. The memory-plugin /v1/health check on :9100 is unchanged.
Robustness: both smoke variants retain their existing retry windows, cleanup, and failure diagnostics; only the canonical platform health path changes.
Security: no secrets, auth, network exposure, or runtime privileges change.
Performance: no runtime impact; CI-only polling path fix.
Readability: comments and notices now match the actual route being checked.

devops-engineer merged commit ad26d6c600 into main 2026-06-21 11:03:22 +00:00
Reference: molecule-ai/molecule-core#3121