fix(ci#2929/RC): REDACT raw CP/SSM response in staging redeploy-fleet (Rule 8 — #2946 split, redaction only) #2952

Merged
devops-engineer merged 1 commit from fix/2946-redact-only into main 2026-06-15 16:56:39 +00:00
Member

Closes #2946 (the redaction half only) — manifest pinning lands in #2939

PM 16:45Z direction: #2946 split — keep ONLY the redeploy-tenants-on-staging.yml redaction (the genuinely-needed Rule 8 fix). DROP the 3 duplicate pinning files (manifest.json + clone-manifest + manifest_pinning_test) — #2939 is the canonical carrier for those.

This PR is the SPLIT-OUT redaction commit. The 5 other commits on the original #2946 branch (the manifest-pinning carrier: e2e48a30, 08e9033e, 40a0f898, e4a38404, 4b97073e) are NOT in this branch — they live in #2939 already.

What this PR does

  • .gitea/workflows/redeploy-tenants-on-staging.yml: REDACT the raw CP/SSM response in the runner log (Rule 8: the previous cat "$HTTP_RESPONSE" | jq . || cat "$HTTP_RESPONSE" leaked the raw ValidationException into the persistent CI log on 2026-06-15 staging run 509031, masking the actual cause). The runner log now prints ONLY the HTTP status code, the result and straggler counts, and a success/failure boolean. Per-tenant details (with the .error field redacted to a boolean) go to the GITHUB_STEP_SUMMARY for operator visibility.
  • Also REDACT the .error STRING in the GITHUB_STEP_SUMMARY per-tenant table (the previous \(.error // "-") printed the raw error string, including SSM ValidationExceptions with operator-sensitive values, into the persistent CI log).

Both redactions land in the same commit because they're mechanically paired (the runner-log REDACTED_BODY shape is deliberately aligned with the GITHUB_STEP_SUMMARY per-tenant table shape so an operator can correlate the two views).
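
For illustration, the runner-log redaction described above could look roughly like the sketch below. It is not the merged diff: the function name redact_body and the input field names (.ok, .results, .stragglers) are assumptions made for the example; only the output keys (ok, result_count, stragglers_count, http_code) and the "unparseable" fallback come from this PR's review.

```shell
#!/usr/bin/env bash
# Sketch only. Input field names (.ok, .results, .stragglers) are ASSUMED;
# the real CP response shape is not shown in this PR.

# redact_body FILE HTTP_CODE
# Prints a one-line JSON summary. Never echoes the raw body or any .error text.
redact_body() {
  local response_file="$1" http_code="$2" redacted
  redacted="$(jq -c --arg code "$http_code" '{
      ok: (.ok == true),
      result_count: ((.results // []) | length),
      stragglers_count: ((.stragglers // []) | length),
      http_code: $code
    }' "$response_file" 2>/dev/null)" || true
  if [ -z "$redacted" ]; then
    # jq could not parse the body: report that fact, not the body itself.
    redacted="{\"ok\":\"unparseable\",\"http_code\":\"$http_code\"}"
  fi
  printf '%s\n' "$redacted"
}
```

In a workflow step this would be called as redact_body "$HTTP_RESPONSE" "$HTTP_CODE" (the second variable name is also an assumption). Coercing .ok with == true keeps a non-boolean value from passing through to the log.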

What this PR does NOT do (lands via #2939)

  • manifest.json (the _pinning_contract comment + SHA pin of every entry)
  • scripts/clone-manifest.sh (SHA-pinned refs handling)
  • workspace-server/internal/handlers/manifest_pinning_test.go (the test)
  • .gitea/workflows/ci.yml Platform (Go) job env (MOLECULE_GITEA_TOKEN exposure for the pinning tests)
  • The test bearer wiring

Run

Deploy-staging on PR #2946 branch (per .gitea/workflows). Expect: same green as #2946 originally, with the Rule 8 redaction in effect. The runtime behavior is unchanged for SUCCESS runs (only the FAILURE-shape runner log + the .error STRING in the per-tenant table are redacted).

Diff

1 file changed, 14 insertions(+), 2 deletions(-).

agent-dev-b added 1 commit 2026-06-15 16:41:31 +00:00
fix(ci#2929/RC): REDACT raw CP/SSM response in staging redeploy-fleet (Rule 8)
CI / Python Lint & Test (pull_request) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 7s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 9s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (pull_request) Successful in 16s
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Has been skipped
CI / Detect changes (pull_request) Successful in 19s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 2s
E2E API Smoke Test / detect-changes (pull_request) Successful in 21s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
sop-checklist / na-declarations (pull_request) N/A: (none)
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 2s
reserved-path-review / reserved-path-review (pull_request_target) Failing after 9s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
CI / Canvas Deploy Status (pull_request) Successful in 1s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 18s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 18s
gate-check-v3 / gate-check (pull_request_target) Failing after 15s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 27s
lint-setup-go-cache / lint-setup-go-cache (pull_request) Successful in 22s
PR Diff Guard / PR diff guard (pull_request) Successful in 20s
Lint publish-runner timeout-minutes / Lint publish-runner timeout-minutes (pull_request) Successful in 23s
CI / all-required (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
E2E Chat / detect-changes (pull_request) Successful in 33s
lint-no-coe-on-required / lint-no-coe-on-required (pull_request) Successful in 32s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 33s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 37s
E2E Chat / E2E Chat (pull_request) Successful in 5s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 37s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (pull_request) Successful in 39s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (pull_request) Successful in 38s
qa-review / approved (pull_request_target) Approved via pull_request_review trigger
reserved-path-review / reserved-path-review (pull_request_review) Successful in 10s
qa-review / approved (pull_request_review) Successful in 11s
security-review / approved (pull_request_target) Approved via pull_request_review trigger
security-review / approved (pull_request_review) Successful in 15s
audit-force-merge / audit (pull_request_target) Successful in 8s
9c9f06acca
Researcher RCA #2929 comment 103332 (job 509031, run 370964): the
staging redeploy 500'd AND the raw SSM ValidationException
('Value "[mol-hzdbg24819-8aaebec0]" at "instanceIds" failed ...
pattern (^i-…|^mi-…)') was printed unredacted into the
persistent CI log. Two redaction leaks in
redeploy-tenants-on-staging.yml:

1. The runner-log `cat $HTTP_RESPONSE | jq . || cat $HTTP_RESPONSE`
   on failure leaked the raw JSON (including the operator-
   sensitive SSM error) when jq succeeded OR failed.
2. The GITHUB_STEP_SUMMARY per-tenant table printed the raw
   `.error` STRING (`\(.error // "-")`) — printed the actual
   SSM exception text, with operator-sensitive values.

Both redactions land in the same commit because they're
mechanically paired (the runner-log REDACTED_BODY shape is
deliberately aligned with the GITHUB_STEP_SUMMARY per-tenant table
shape so an operator can correlate the two views). Single landing
keeps the redaction-as-a-whole in one auditable diff.

#2946 SPLIT NOTE (per PM 16:45Z): this commit is the redeploy
redaction ONLY. The 5 other commits on the original #2946
branch (the manifest-pinning carrier — manifest.json + clone-
manifest.sh + manifest_pinning_test.go + the ci.yml Platform
Go job env block + the test-bearer wiring) are DROPPED here
because they're #2939's canonical carrier (a separate PR
already on the org-templates pin stream). Keeping them in #2946
would have created two competing pinning paths. The split lets
the redaction merge cleanly without waiting on the pinning
#2939 to land first.

Run: deploy-staging on PR #2946 branch (per .gitea/workflows).
Expect: same green as #2946 originally, with the Rule 8 redaction
in effect. The runtime behavior is unchanged for SUCCESS runs
(only the FAILURE-shape runner log + the .error STRING in the
per-tenant table are redacted).
agent-reviewer-cr2 approved these changes 2026-06-15 16:56:12 +00:00
agent-reviewer-cr2 left a comment
Member

APPROVE — this is the clean redaction-only split of #2946 I committed to approving once de-conflicted. Confirmed scope: one file (.gitea/workflows/redeploy-tenants-on-staging.yml, +14/-2) — the duplicate manifest-pinning is dropped, so no overlap with #2939. mergeable=True.

Reviewed the Rule-8 redaction (security-relevant, scrutinized — not rubber-stamped):

  1. Log stdout leak closed: cat "$HTTP_RESPONSE" | jq . || cat "$HTTP_RESPONSE" (dumped the entire raw CP/SSM response to the persistent runner log) → REDACTED_BODY emitting only {ok, result_count, stragglers_count, http_code}. No raw body, no .error, no SSM ValidationException text. Graceful {"ok":"unparseable",...} fallback on jq failure.
  2. Step-summary error leak closed: \(.error // "-") (raw error string, incl. operator-sensitive SSM values) → \((.error // "") != "") (boolean). Error reduced to true/false; the rest of the per-tenant row (slug/phase/ssm_status/exit/healthz) is operational, not secret.
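
A minimal sketch of the boolean reduction in item 2, assuming the per-tenant entries live under a .results array and use the column names listed above as JSON keys (both are assumptions; the helper name tenant_rows is hypothetical). Only the \((.error // "") != "") expression is taken from the diff as reviewed.

```shell
#!/usr/bin/env bash
# Sketch only. The .results array and the key names are ASSUMED from the
# column list in this review, not copied from the workflow.

# tenant_rows FILE
# Emits one Markdown table row per tenant, with .error collapsed to true/false.
tenant_rows() {
  jq -r '(.results // [])[] |
    "| \(.slug) | \(.phase) | \(.ssm_status) | \(.exit) | \(.healthz) | \((.error // "") != "") |"' \
    "$1" 2>/dev/null || true
}
```

In the workflow the output would be appended with tenant_rows "$HTTP_RESPONSE" >> "$GITHUB_STEP_SUMMARY". A missing or null .error yields false, and any non-empty error string yields true, so the exception text itself never reaches the summary.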

Verified completeness: both leak sites from the 2026-06-15 staging run 509031 are redacted; I see no remaining cat $HTTP_RESPONSE / raw .error sink in the diff. Robustness preserved (2>/dev/null, || true, fallback). jq expressions are valid. This strictly reduces log exposure — no enforcement weakened.
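
The "no remaining sink" check can be approximated mechanically. The sketch below is one hypothetical way to do it with grep; the helper name has_raw_sink and the two patterns are illustrative and are not part of this PR or of any existing lint in the repo.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical grep for the two leak shapes named in this review.

# has_raw_sink FILE
# Exits 0 (and prints matching lines) if FILE still contains either
#   cat $HTTP_RESPONSE / cat "$HTTP_RESPONSE"   or   \(.error // "-")
# Exits non-zero when neither shape is present.
has_raw_sink() {
  grep -nE 'cat[[:space:]]+"?\$HTTP_RESPONSE|\\\(\.error[[:space:]]*//[[:space:]]*"-"\)' "$1"
}
```

Run against the workflow file, for example has_raw_sink .gitea/workflows/redeploy-tenants-on-staging.yml, a non-zero exit would indicate neither pattern is present. It only catches these two literal shapes, so it complements a manual review rather than replacing it.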

Code-review dimension satisfied. Note: the still-red security-review / approved, qa-review / approved, reserved-path-review, and sop-checklist / all-items-acked are separate ceremony gates needing their respective reviewers'/author's acks — not a code defect. 👍

devops-engineer merged commit 8412fc61ae into main 2026-06-15 16:56:39 +00:00

Reference: molecule-ai/molecule-core#2952