fix(tests/e2e): surface diagnose step Detail (subprocess stderr) in EIC smoke output
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 21s
CI / Detect changes (pull_request) Successful in 43s
E2E API Smoke Test / detect-changes (pull_request) Successful in 46s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 51s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 57s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 46s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 40s
qa-review / approved (pull_request) Failing after 15s
gate-check-v3 / gate-check (pull_request) Successful in 24s
security-review / approved (pull_request) Failing after 17s
sop-checklist-gate / gate (pull_request) Successful in 18s
sop-tier-check / tier-check (pull_request) Successful in 21s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m30s
CI / Platform (Go) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Python Lint & Test (pull_request) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 19s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 3m37s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 3s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: 7
audit-force-merge / audit (pull_request) Has been skipped

mc#687 / mc#424 root-cause finding: the smoke test was extracting only
the Go error string (e.g. "exec: ... executable file not found") but
ignoring the subprocess stderr captured in each diagnoseStep's Detail field.

Detail carries the vendor-truth signal — e.g.
  "AccessDeniedException: ... is not authorized to perform:
   ec2-instance-connect:OpenTunnel"
— which is exactly what was needed to root-cause the 21h mc#424 outage
in 5 minutes instead of 21 hours.

The fix extracts both error + detail from the first failing step and
includes detail on its own line in the failure message. The shell
conditional ${DIAG_DETAIL:+\n  detail (subprocess stderr): $DIAG_DETAIL}
only appends the extra line when detail is non-empty, keeping the
output clean for simple errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Molecule AI · core-be 2026-05-12 10:05:58 +00:00
parent a9351ae47d
commit 52ff25ec99

View File

@ -511,8 +511,14 @@ for wid in $WS_TO_CHECK; do
ok " $wid terminal-reachable (canvas terminal will work)"
else
DIAG_FAIL=$(echo "$DIAG_JSON" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('first_failure','unknown'))" 2>/dev/null || echo "unknown")
DIAG_DETAIL=$(echo "$DIAG_JSON" | python3 -c "import json,sys; d=json.load(sys.stdin); s=[x for x in d.get('steps',[]) if not x.get('ok')]; print(s[0].get('error','') if s else '')" 2>/dev/null || echo "")
fail "Workspace $wid terminal diagnose failed at step '$DIAG_FAIL': $DIAG_DETAIL — check tenant SG has tcp/22 from EIC endpoint SG (sg-0785d5c6138220523), EIC_ENDPOINT_SG_ID set in Railway, and EIC endpoint health"
# Extract both error (Go error string) and detail (subprocess stderr — vendor truth).
# detail carries subprocess stderr for EIC/ssh/tunnel failures, which is the
# actionable signal (e.g. "AccessDeniedException: ... is not authorized to perform
# ec2-instance-connect:OpenTunnel"). mc#687 / mc#424 root-cause finding.
DIAG_ERR=$(echo "$DIAG_JSON" | python3 -c "import json,sys; d=json.load(sys.stdin); s=[x for x in d.get('steps',[]) if not x.get('ok')]; print(s[0].get('error','') if s else '')" 2>/dev/null || echo "")
DIAG_DETAIL=$(echo "$DIAG_JSON" | python3 -c "import json,sys; d=json.load(sys.stdin); s=[x for x in d.get('steps',[]) if not x.get('ok')]; print(s[0].get('detail','') if s else '')" 2>/dev/null || echo "")
fail "Workspace $wid terminal diagnose failed at step '$DIAG_FAIL': $DIAG_ERR${DIAG_DETAIL:+
detail (subprocess stderr): $DIAG_DETAIL} — check tenant SG has tcp/22 from EIC endpoint SG (sg-0785d5c6138220523), EIC_ENDPOINT_SG_ID set in Railway, and EIC endpoint health"
fi
done