fix(e2e): surface tenant_call HTTP body on non-2xx (no more opaque curl:22) #2310

Merged
core-devops merged 1 commits from fix/e2e-tenant-call-surface-body into main 2026-06-05 16:59:18 +00:00
Member

Root cause

The staging e2e suites fail opaquely on a non-2xx workspace/org create. The tenant_call / admin_call helpers inherit CURL_COMMON=(-sS --fail-with-body --max-time 30). With --fail-with-body, curl on a 4xx/5xx writes the body to stdout and then exits 22. Captured bare as PARENT_RESP=$(tenant_call POST /workspaces ...), that exit 22 becomes the status of the assignment and, under the script's set -euo pipefail, aborts the whole script at the create line, before the existing handler:

PARENT_ID=$(echo "$PARENT_RESP" | python3 -c "...get('id','')")
if [ -z "$PARENT_ID" ]; then
  fail "Parent workspace create returned no 'id' (...). Response: $(... | sanitize_http_body)"
fi

…ever runs. So the response body (the why) is never printed.
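The abort can be reproduced without curl or a staging tenant. The following is a minimal sketch, not code from the suites: `fake_tenant_call` is a hypothetical stand-in that prints a made-up error body and returns 22, which is how the description above says `tenant_call` behaves on a 4xx/5xx. The demo runs in a child `bash` process so the abort can be observed from outside.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the opaque abort with a hypothetical stand-in for tenant_call.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
set -euo pipefail

# Hypothetical stand-in: prints an error body, then fails with status 22,
# mimicking curl --fail-with-body on a 4xx/5xx.
fake_tenant_call() {
  echo '{"detail": "hypothetical 422 body"}'
  return 22
}

RESP=$(fake_tenant_call)          # errexit aborts the script on this line
echo "handler reached: $RESP"     # never runs, so the body is never shown
EOF

# Run the demo as a separate process and record its exit status and output.
if bare_out=$(bash "$demo_script" 2>&1); then bare_rc=0; else bare_rc=$?; fi
rm -f "$demo_script"
echo "rc=$bare_rc output=[$bare_out]"   # prints: rc=22 output=[]
```

The body was captured into `RESP`, but the script exited before anything could print it, which matches the symptom in the run described below.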

Evidence

Run 220702 (main f78fef4c, job "E2E Staging SaaS") reached 5/11 Provisioning parent workspace then died with bare curl: (22) The requested URL returned error: 422 and tore down — with no HTTP body, so the 422 detail was invisible.

Fix (test-only — zero production code)

Wrap the create captures in set +e ... set -e (the exact idiom this file already uses for the 409 optimistic-lock and shared-context-gone gates). curl still writes the body with --fail-with-body, so the response variable holds the error JSON and the existing fail "... Response: ..." / fail "... missing 'id'" handler runs and surfaces it. 2xx behavior is unchanged. The suite still FAILS on a 422 (it's a real red) — now with the body printed.
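As a rough illustration of the guard, here is a sketch under the same assumptions as before: `fake_tenant_call` and this `fail` are hypothetical stand-ins, the `sed` id extraction is a simplification of the `python3` one-liner in the real scripts, and the `sanitize_http_body` step is left out for brevity.

```shell
#!/usr/bin/env bash
# Sketch: the scoped set +e / set -e guard lets the existing handler see the body.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
set -euo pipefail

# Hypothetical stand-ins for the suite helpers.
fake_tenant_call() {
  echo '{"detail": "hypothetical 422 body"}'
  return 22
}
fail() { echo "FAIL: $*" >&2; exit 1; }

# Errexit is disabled only for the capture itself, then restored at once.
set +e
RESP=$(fake_tenant_call)
set -e

# Simplified id extraction; yields an empty string when no "id" is present.
ID=$(printf '%s' "$RESP" | sed -n 's/.*"id": *"\([^"]*\)".*/\1/p')
if [ -z "$ID" ]; then
  fail "create returned no 'id'. Response: $RESP"
fi
echo "created $ID"
EOF

# Run the demo as a separate process and record its exit status and output.
if guarded_out=$(bash "$demo_script" 2>&1); then guarded_rc=0; else guarded_rc=$?; fi
rm -f "$demo_script"
echo "rc=$guarded_rc output=[$guarded_out]"
```

The child script still exits non-zero, so the run stays red, but the failure message now carries the response body.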

Files

  • tests/e2e/test_staging_full_saas.sh — parent + child workspace create
  • tests/e2e/test_staging_external_runtime.sh — org create + external workspace create (same --fail-with-body abort class; routed the two id-missing fails through sanitize_http_body so the now-surfaced body can't leak creds)

Sibling suites test_2307_peer_visibility_staging.sh and test_peer_visibility_mcp_staging.sh use plain curl -sS (no --fail), so they don't have this bug and are untouched.

Non-changes

No assertions or pass/fail semantics changed. No continue-on-error flipped, no workflow gating touched. No .go / template / manifest changes.

Verification

  • bash -n clean on both changed scripts.
  • shellcheck -x clean on test_staging_full_saas.sh (exit 0); the one SC2015 info in test_staging_external_runtime.sh:183 is pre-existing on main, outside this diff.
  • Behavior simulated locally: under set -e, a --fail-with-body 422 aborts the bare capture before the handler; with the set +e guard the body reaches the fail handler and the script still exits RED, while the 2xx path returns the body and continues unchanged.

🤖 Generated with Claude Code

core-devops added 1 commit 2026-06-05 16:55:30 +00:00
fix(e2e): surface tenant_call HTTP body on non-2xx (no more opaque curl:22)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
CI / Python Lint & Test (pull_request) Successful in 10s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
E2E Chat / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request_target) Has been skipped
qa-review / approved (pull_request_target) Failing after 6s
security-review / approved (pull_request_target) Failing after 5s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 10s
sop-tier-check / tier-check (pull_request_target) Failing after 5s
gate-check-v3 / gate-check (pull_request_target) Successful in 11s
E2E Chat / E2E Chat (pull_request) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 3s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
sop-checklist / all-items-acked (pull_request_target) Successful in 14s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
CI / Canvas (Next.js) (pull_request) Successful in 7s
CI / Canvas Deploy Status (pull_request) Has been skipped
CI / Shellcheck (E2E scripts) (pull_request) Successful in 14s
CI / all-required (pull_request) Successful in 2s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 47s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 56s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 54s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 2m16s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Failing after 5s
audit-force-merge / audit (pull_request_target) Successful in 3s
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (pull_request) Failing after 5m16s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Waiting to run
0130f293c8
The staging e2e suites die opaquely on a non-2xx workspace/org create.
tenant_call (and admin_call) inherit CURL_COMMON's --fail-with-body, so a
4xx/5xx makes curl exit 22. Captured bare as PARENT_RESP=$(tenant_call ...),
that 22 propagates through the command substitution and, under
`set -euo pipefail`, ABORTS the whole script at the create line — BEFORE the
existing `fail "... Response: ..."` / `fail "... missing 'id'"` handlers can
print the response body.

Evidence: run 220702 (main f78fef4c, job "E2E Staging SaaS") reached
"5/11 Provisioning parent workspace" then died with bare
`curl: (22) The requested URL returned error: 422` and tore down without
ever printing the body — so WHY (the 422 detail) was invisible.

Fix: wrap the create captures in `set +e ... set -e` (the same idiom already
used in this file for the 409 optimistic-lock and shared-context gates).
curl still WRITES the body to stdout with --fail-with-body, so the response
variable holds the error JSON and the existing id-check fail handler runs and
surfaces it. 2xx behavior is unchanged. The suite still FAILS on a 422 (it's
a real red) — now with the body printed.

Scope (test-only, no production code):
- test_staging_full_saas.sh: parent + child workspace create
- test_staging_external_runtime.sh: org create + external workspace create
  (same --fail-with-body abort class; routed the two id-missing fails through
  sanitize_http_body so the surfaced body can't leak creds)

No assertions or pass/fail semantics changed; no continue-on-error/gating
touched. bash -n + shellcheck -x clean (the one SC2015 in external_runtime
is pre-existing on main, outside this diff).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
claude-ceo-assistant approved these changes 2026-06-05 16:56:49 +00:00
claude-ceo-assistant left a comment
Owner

APPROVED (CTO review). Verified diff: 2 tests/e2e files ONLY, zero prod code. Correct root cause — CURL_COMMON --fail-with-body makes curl exit 22 on non-2xx; under set -euo pipefail the bare $(tenant_call) capture aborts the script BEFORE the body-surfacing fail handler runs (run 220702: opaque curl:22 422 + teardown, no body). Fix is the right idiom: set +e/set -e scoped tightly around the create captures so the 422 body reaches the existing .get(id) check + fail; id-parse made set-e-safe with ||echo; missing-id fails routed through sanitize_http_body (cred-safe). No assertions/gating/continue-on-error changed — suite still RED on 422, now WITH the body. bash -n + shellcheck clean (lone SC2015 pre-existing). This makes the real-runtime regression guard debuggable. Approving.

agent-reviewer approved these changes 2026-06-05 16:57:50 +00:00
agent-reviewer left a comment
Member

APPROVED after 5-axis review on current head 0130f293c8.

Correctness: the diff addresses the opaque curl 22 failure mode in the staging E2E create captures. Non-2xx bodies from --fail-with-body are now retained so the existing missing-id fail handlers can surface useful response details.

Robustness: only tests/e2e/test_staging_external_runtime.sh and tests/e2e/test_staging_full_saas.sh changed. Each added set +e region is immediately followed by set -e; no lingering disabled errexit. No workflow or continue-on-error changes. Required contexts are green: CI / all-required, E2E API Smoke Test, and Handlers Postgres Integration.

Security: surfaced HTTP bodies are passed through sanitize_http_body on the new missing-id failure paths, preserving the existing credential-safe behavior.

Performance: no production code or gate assertion change; only shell error-handling around existing E2E create calls.

Readability: comments clearly explain the curl --fail-with-body plus set -e interaction and why the scoped guard is needed.

core-devops merged commit 797351bbba into main 2026-06-05 16:59:18 +00:00

Reference: molecule-ai/molecule-core#2310