fix(ci): make canvas publish docker probe pipefail-safe #776

Merged
hongming-codex-laptop merged 1 commit from fix/publish-canvas-docker-probe-20260512 into main 2026-05-13 02:29:32 +00:00

Summary

  • make the publish-canvas-image.yml Docker daemon probe safe under set -o pipefail
  • keep the bounded diagnostic preview without piping the docker info command itself into head
  • preserve the loud failure path with runner identity and Docker diagnostics when the daemon is genuinely unavailable

Root cause

The first main push after PR #773 failed in publish-canvas-image / Build & push canvas image, before the image build started. The runner had Docker available, but the health-check step ran docker info 2>&1 | head -5 with pipefail enabled. head exits after reading five lines and closes the pipe; docker info, still writing, then hits the closed pipe and exits non-zero (typically killed by SIGPIPE), and pipefail reports that status for the whole pipeline even though the daemon is healthy. This patch captures the docker info output first, checks the command's real exit status, then prints the first five lines separately.
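The capture-then-preview pattern described above can be sketched roughly as follows. This is a minimal illustration assuming bash, not the actual workflow step: the helper names unsafe_preview and bounded_probe are hypothetical, and the fragile form is shown only for contrast.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fragile form (for contrast): the producer is piped straight into head.
# If head exits while the producer is still writing, the producer fails on
# the closed pipe and pipefail surfaces that failure as the pipeline status.
unsafe_preview() {
  "$@" 2>&1 | head -n 5
}

# Hypothetical helper: run the probe, keep its real exit status, and print
# at most the first five lines of its combined output.
bounded_probe() {
  local output status=0
  # Capture everything first; the command is not part of a pipeline here.
  output="$("$@" 2>&1)" || status=$?
  # The preview reads from the captured string, not from the live command.
  head -n 5 <<<"$output"
  return "$status"
}
```

In a workflow step this might be called as `bounded_probe docker info || { echo "Docker daemon unavailable on $(hostname)" >&2; exit 1; }`, so a real daemon failure still fails the step loudly while a healthy daemon with long output does not.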

Verification

  • python3 -m pytest tests/test_lint_workflow_yaml.py tests/test_lint_continue_on_error_tracking.py -q
  • bash -n against the extracted workflow shell block
  • git diff --check
  • live main action log inspected for task 48577, confirming the failure occurred at Verify Docker daemon access after docker info printed normal client output
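For readers unfamiliar with the bash -n check listed above: it parses a script without executing it. A small sketch, assuming the workflow's shell block has already been extracted into a string; the helper name syntax_ok is hypothetical and the extraction step itself is not shown.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: return 0 if the given script text parses as bash.
# bash -n reads the commands but does not run them, so this is safe to use
# on steps that would otherwise touch Docker or the network.
syntax_ok() {
  bash -n <<<"$1" 2>/dev/null
}
```

This only catches syntax errors such as an unterminated if block; it does not detect runtime problems like the pipefail interaction fixed here.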

SOP-Checklist

  • Comprehensive testing performed: Workflow lint suite, shell parse check, and whitespace check passed locally; live main action log was inspected directly.
  • Local-postgres E2E run: Not applicable; this is a single workflow shell-probe fix with no database/runtime behavior.
  • Staging-smoke verified or pending: Pending on this PR/main CI rerun; this fixes the image publish path that staging relies on.
  • Root-cause not symptom: Fixed the pipefail-unsafe health probe instead of masking or manually overriding the failed status.
  • Five-Axis review walked: Correctness, readability, architecture, security, and operations reviewed; diagnostics remain bounded and do not expose secrets.
  • No backwards-compat shim / dead code added: Replaced the brittle probe directly without adding fallback branches.
  • Memory/saved-feedback consulted: Used current CI/Gitea migration context and validated the live action log before patching.
hongming-codex-laptop added 1 commit 2026-05-13 02:17:11 +00:00
fix(ci): make canvas docker probe pipefail-safe
All checks were successful
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
CI / Detect changes (pull_request) Successful in 1m10s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m8s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 15s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 44s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
gate-check-v3 / gate-check (pull_request) Successful in 11s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m47s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m38s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m41s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m59s
sop-checklist-gate / gate (pull_request) Successful in 12s
sop-tier-check / tier-check (pull_request) Successful in 12s
CI / Platform (Go) (pull_request) Successful in 6s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 2m9s
CI / Canvas (Next.js) (pull_request) Successful in 11s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Python Lint & Test (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 3s
qa-review / approved (pull_request) verified: fresh QA approval; recheck succeeded on issue-comment run
security-review / approved (pull_request) verified: fresh security approval; recheck succeeded on issue-comment run
sop-checklist / all-items-acked (pull_request) acked: 7/7
audit-force-merge / audit (pull_request) Successful in 4s
baa5e3957a
hongming-codex-laptop added the tier:medium label 2026-05-13 02:18:04 +00:00
core-qa approved these changes 2026-05-13 02:26:56 +00:00
core-qa left a comment
Member

QA approval for current head baa5e39: verified workflow lint, shell parse check, and live main log evidence that Buildx reached the Docker socket before the pipefail-sensitive docker-info preview failed.

core-security approved these changes 2026-05-13 02:27:04 +00:00
core-security left a comment
Member

Security approval for current head baa5e39: no credential path changes, no secret values printed, and the diagnostic remains bounded while preserving failure output for real daemon failures.

Member

/sop-ack comprehensive-testing — pytest workflow lint suite, bash parse check, git diff check passed; live action log task 48577 validated the failure location.

Member

/sop-ack local-postgres-e2e — N/A is valid for a single workflow shell-probe change; no DB/runtime handler path changed.

Member

/sop-ack staging-smoke — pending on PR/main CI rerun; this repairs the canvas image publish path staging consumes.

Member

/sop-ack root-cause — root is pipefail-unsafe docker-info preview after Buildx had already reached the Docker socket, not ECR auth or canvas build code.

Member

/sop-ack five-axis-review — correctness/readability/architecture/security/ops reviewed; diagnostics stay bounded and secrets remain masked.

Member

/sop-ack no-backwards-compat — brittle probe replaced directly; no fallback shim or dead path added.

Member

/sop-ack memory-consulted — current Gitea CI/migration context used and live logs were validated before patching.

Author
Member

/qa-recheck

Author
Member

/security-recheck

hongming-codex-laptop merged commit e487b202a1 into main 2026-05-13 02:29:32 +00:00