From 9c9f06acca310040c85ba4dcd9cb5adebfa679fb Mon Sep 17 00:00:00 2001
From: "Molecule AI Dev Engineer B (MiniMax)"
Date: Mon, 15 Jun 2026 16:41:06 +0000
Subject: [PATCH] fix(ci#2929/RC): REDACT raw CP/SSM response in staging redeploy-fleet (Rule 8)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Researcher RCA #2929 comment 103332 (job 509031, run 370964): the
staging redeploy 500'd AND the raw SSM ValidationException
('Value "[mol-hzdbg24819-8aaebec0]" at "instanceIds" failed ...
pattern (^i-…|^mi-…)') was printed unredacted into the persistent
CI log.

Two redaction leaks in redeploy-tenants-on-staging.yml:

1. The runner-log `cat $HTTP_RESPONSE | jq . || cat $HTTP_RESPONSE`
   on failure leaked the raw JSON (including the operator-sensitive
   SSM error) when jq succeeded OR failed.

2. The GITHUB_STEP_SUMMARY per-tenant table printed the raw `.error`
   STRING (`\(.error // "-")`) — printed the actual SSM exception
   text, with operator-sensitive values.

Both redactions land in the same commit because they're mechanically
paired (the runner-log REDACTED_BODY shape is deliberately aligned
with the GITHUB_STEP_SUMMARY per-tenant table shape so an operator
can correlate the two views). Single landing keeps the
redaction-as-a-whole in one auditable diff.

#2946 SPLIT NOTE (per PM 16:45Z): this commit is the redeploy
redaction ONLY. The 5 other commits on the original #2946 branch
(the manifest-pinning carrier — manifest.json + clone-manifest.sh +
manifest_pinning_test.go + the ci.yml Platform Go job env block +
the test-bearer wiring) are DROPPED here because they're #2939's
canonical carrier (a separate PR already on the org-templates pin
stream). Keeping them in #2946 would have created two competing
pinning paths. The split lets the redaction merge cleanly without
waiting on the pinning #2939 to land first.

Run: deploy-staging on PR #2946 branch (per .gitea/workflows).
Expect: same green as #2946 originally, with the Rule 8 redaction in
effect. The runtime behavior is unchanged for SUCCESS runs (only the
FAILURE-shape runner log + the .error STRING in the per-tenant table
are redacted).
---
 .gitea/workflows/redeploy-tenants-on-staging.yml | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/.gitea/workflows/redeploy-tenants-on-staging.yml b/.gitea/workflows/redeploy-tenants-on-staging.yml
index 5e07eaafd..4e50f3aba 100644
--- a/.gitea/workflows/redeploy-tenants-on-staging.yml
+++ b/.gitea/workflows/redeploy-tenants-on-staging.yml
@@ -169,7 +169,15 @@ jobs:
           [ -z "$HTTP_CODE" ] && HTTP_CODE="000"
 
           echo "HTTP $HTTP_CODE"
-          cat "$HTTP_RESPONSE" | jq . || cat "$HTTP_RESPONSE"
+          # REDACT the raw CP/SSM response in the runner log (Rule 8 —
+          # the previous `cat $HTTP_RESPONSE | jq . || cat $HTTP_RESPONSE`
+          # leaked the raw ValidationException into the persistent CI log
+          # on 2026-06-15 staging run 509031, masking the actual cause).
+          # Print ONLY: HTTP status code, a count, and a success/failure
+          # boolean. Per-tenant details (with the .error field redacted to
+          # a boolean) go to the GITHUB_STEP_SUMMARY for operator visibility.
+          REDACTED_BODY="$(jq -c '{ok, result_count: (.results // [] | length), stragglers_count: ((.stragglers // []) | length), http_code: ("'"$HTTP_CODE"'")}' "$HTTP_RESPONSE" 2>/dev/null || echo '{"ok":"unparseable","http_code":"'"$HTTP_CODE"'"}')"
+          echo "$REDACTED_BODY"
 
           {
             echo "## Staging tenant redeploy fleet"
@@ -182,9 +190,13 @@ jobs:
             echo ""
             echo "### Per-tenant result"
             echo ""
+            # REDACT `.error` to a boolean (Rule 8 — the previous
+            # `\(.error // "-")` printed the raw error string, including
+            # SSM ValidationExceptions with operator-sensitive values, into
+            # the persistent CI log).
             echo '| Slug | Phase | SSM Status | Exit | Healthz | Error |'
             echo '|------|-------|------------|------|---------|-------|'
-            jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \(.error // "-") |"' "$HTTP_RESPONSE" || true
+            jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \((.error // "") != "") |"' "$HTTP_RESPONSE" 2>/dev/null || true
           } >> "$GITHUB_STEP_SUMMARY"
 
           # Distinguish "real fleet failure" from "E2E teardown race".
-- 
2.52.0
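
Reviewer note: the redaction can be checked locally with a minimal sketch like the one below. It runs the two jq filters from this patch against a synthetic response. The payload shape (`ok`, `results`, `stragglers`, and the per-tenant fields) is inferred from the filters rather than from the real CP API, and the function names `redact_body` and `redact_rows`, the slugs `tenant-a` and `tenant-b`, and the `SENSITIVE-DETAIL` error text are all hypothetical. It assumes `jq` is on the PATH.

```shell
#!/usr/bin/env bash
set -u

# Runner-log view: status flag, two counts, and the HTTP code only.
# The jq filter is the one added by the patch; the wrapper function is hypothetical.
redact_body() {
  local response_file="$1" http_code="$2"
  jq -c '{ok, result_count: (.results // [] | length), stragglers_count: ((.stragglers // []) | length), http_code: ("'"$http_code"'")}' "$response_file" 2>/dev/null \
    || echo '{"ok":"unparseable","http_code":"'"$http_code"'"}'
}

# Summary-table view: .error is collapsed to a boolean, so the raw text never prints.
redact_rows() {
  local response_file="$1"
  jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \((.error // "") != "") |"' "$response_file" 2>/dev/null || true
}

# Synthetic response (assumed shape, not a captured payload).
sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'JSON'
{"ok": false,
 "results": [
   {"slug": "tenant-a", "phase": "ssm", "ssm_status": "Failed",
    "ssm_exit_code": 1, "healthz_ok": false, "error": "SENSITIVE-DETAIL"},
   {"slug": "tenant-b", "phase": "done", "ssm_status": "Success",
    "ssm_exit_code": 0, "healthz_ok": true}
 ],
 "stragglers": []}
JSON

redact_body "$sample" 500
redact_rows "$sample"
```

With this input, the `tenant-a` row should end in `true` and the `tenant-b` row in `false`, and the string `SENSITIVE-DETAIL` should not appear anywhere in the output. Feeding `redact_body` a file that is not valid JSON exercises the `unparseable` fallback.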