fix(ci): avoid failing canvas publish on gha cache export #777

Merged
hongming-codex-laptop merged 1 commit from fix/publish-canvas-disable-gha-cache-20260512 into main 2026-05-13 02:41:06 +00:00

Summary

  • remove type=gha Buildx cache import/export from publish-canvas-image.yml
  • keep the successful ECR build/push path as the source of truth for the publish job
  • document why Gitea artifact-cache reachability must not fail an already-pushed image
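A minimal sketch of a regression guard for the first bullet, assuming the workflow configures Buildx caching either as `cache-from:`/`cache-to:` action inputs or as `--cache-from`/`--cache-to` CLI flags (the actual contents of publish-canvas-image.yml are not shown on this page). The function `find_gha_cache_refs` is hypothetical and is not part of the repo's existing lint suite.

```python
import re

# Matches Buildx cache settings that use the GitHub Actions cache backend, in
# action-input form ("cache-to: type=gha,mode=max") or CLI-flag form
# ("--cache-from type=gha" / "--cache-from=type=gha").
GHA_CACHE_RE = re.compile(r"cache-(from|to)\s*[:=]?\s*['\"]?type=gha\b")


def find_gha_cache_refs(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line that configures a type=gha cache."""
    hits = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        # Skip YAML comments so notes explaining the removed cache do not trip the guard.
        if line.lstrip().startswith("#"):
            continue
        if GHA_CACHE_RE.search(line):
            hits.append((lineno, line.strip()))
    return hits
```

Run over the workflow file after this change, the guard should return an empty list; any hit would mean the fragile cache export crept back into the publish path.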

Root cause

After PR #776 merged, the publish-canvas-image job on main got past the Docker probe, built the canvas image, and pushed both ECR tags. It then failed during the Buildx cache export to the Gitea Actions artifact-cache endpoint (172.18.0.6:46663) with an I/O timeout. The cache is only an optimization, so treating its export as required turned main red even though the publish had already succeeded.
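The failure mode can be modeled in a few lines: a job is green only if every phase it treats as required succeeds, so a cache export wired in as required can turn a run red after the image already exists in the registry. This is an illustrative model only; the phase names ("docker-probe", "build", "push-ecr", "export-gha-cache") are hypothetical labels for the stages described above, not step names from publish-canvas-image.yml.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    succeeded: bool
    required: bool = True  # a failing required phase fails the whole job


def job_status(phases: list[Phase]) -> str:
    """Return "success" unless some required phase failed."""
    failed = any(p.required and not p.succeeded for p in phases)
    return "failure" if failed else "success"


def image_published(phases: list[Phase]) -> bool:
    """The image exists in the registry once the push phase succeeds, regardless of later phases."""
    return any(p.name == "push-ecr" and p.succeeded for p in phases)
```

With a failed required "export-gha-cache" phase, job_status reports failure while image_published is true, which is exactly the mismatch seen on main. Removing the phase and marking it non-required both reconcile the two; this PR removes it to avoid adding a fallback branch.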

Verification

  • python3 -m pytest tests/test_lint_workflow_yaml.py tests/test_lint_continue_on_error_tracking.py -q
  • git diff --check
  • inspected the live main action log (task 48813): the ECR manifests for latest and sha-e487b20 were pushed successfully before the type=gha cache export timed out and failed the job
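The manual log check in the last bullet can be sketched as an ordering test: every expected tag must have a manifest-push line before the first error line. The line shapes assumed here ("pushing manifest for <repo>:<tag>@sha256:..." and lines starting with "ERROR") are modeled loosely on BuildKit progress output and are assumptions, not text copied from task 48813; the function name is hypothetical.

```python
def pushed_before_first_error(log_lines: list[str], tags: list[str]) -> bool:
    """True if every tag has a 'pushing manifest' line earlier than the first ERROR line."""
    first_error = next(
        (i for i, line in enumerate(log_lines) if line.lstrip().startswith("ERROR")),
        len(log_lines),  # no error at all: every push line counts
    )
    for tag in tags:
        # A ref is assumed to look like ".../canvas:<tag>@sha256:..."; matching
        # ":<tag>@" avoids counting a tag that merely contains <tag> as a prefix.
        pushed = any(
            "pushing manifest for" in line and f":{tag}@" in line
            for line in log_lines[:first_error]
        )
        if not pushed:
            return False
    return True
```

For this PR the call would be pushed_before_first_error(lines, ["latest", "sha-e487b20"]); a True result is what distinguishes "publish succeeded, cache export failed" from a genuine publish failure.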

SOP-Checklist

  • Comprehensive testing performed: Workflow lint suite and whitespace check passed locally; live main action log was inspected directly.
  • Local-postgres E2E run: Not applicable; this is a workflow cache-configuration fix with no DB/runtime handler behavior.
  • Staging-smoke verified or pending: Pending on PR/main CI rerun; the prior main run already pushed the canvas image before cache export failed.
  • Root-cause not symptom: Removed the required type=gha cache export that failed after a successful ECR push instead of overriding the failed status.
  • Five-Axis review walked: Correctness, readability, architecture, security, and operations reviewed; no credential handling changed.
  • No backwards-compat shim / dead code added: Removed the fragile cache optimization from this publish path rather than adding retry/fallback branches.
  • Memory/saved-feedback consulted: Used current Gitea CI context and validated the live action log before patching.
hongming-codex-laptop added 1 commit 2026-05-13 02:37:13 +00:00
fix(ci): avoid failing canvas publish on gha cache export
All checks were successful
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 14s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
E2E API Smoke Test / detect-changes (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 22s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 23s
gate-check-v3 / gate-check (pull_request) Successful in 18s
sop-checklist-gate / gate (pull_request) Successful in 12s
CI / Platform (Go) (pull_request) Successful in 5s
sop-tier-check / tier-check (pull_request) Successful in 13s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 5s
CI / all-required (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m9s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m22s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m24s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m25s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m34s
qa-review / approved (pull_request) verified: fresh QA approval; recheck succeeded on issue-comment run
security-review / approved (pull_request) verified: fresh security approval; recheck succeeded on issue-comment run
sop-checklist / all-items-acked (pull_request) acked: 7/7
audit-force-merge / audit (pull_request) Successful in 5s
cefbc26005
hongming-codex-laptop added the tier:medium label 2026-05-13 02:37:35 +00:00
core-qa approved these changes 2026-05-13 02:38:08 +00:00
core-qa left a comment
Member

QA approval for current head cefbc26: verified workflow lint and live main log task 48813 showing ECR push succeeded before Gitea gha cache export timed out.

core-security approved these changes 2026-05-13 02:38:09 +00:00
core-security left a comment
Member

Security approval for current head cefbc26: no credential path changes; removing Buildx gha cache does not expose secrets and keeps ECR auth unchanged.

Member

/sop-ack comprehensive-testing — workflow lint suite and diff check passed; live main task 48813 validated the cache-export-only failure.

Member

/sop-ack local-postgres-e2e — N/A is valid for workflow cache configuration; no DB/runtime handler path changed.

Member

/sop-ack staging-smoke — pending on PR/main CI rerun; prior run already pushed the ECR image before cache export failed.

Member

/sop-ack root-cause — root is required type=gha cache export timing out after successful ECR push, so the fragile cache optimization was removed.

Member

/sop-ack five-axis-review — correctness/readability/architecture/security/ops reviewed; no credential handling changed.

Member

/sop-ack no-backwards-compat — removed brittle cache optimization directly; no retry shim or dead branch added.

Member

/sop-ack memory-consulted — current Gitea CI context used and live logs were validated before patching.

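The sop-checklist gate's "acked: 7/7" result can be approximated as counting distinct known slugs across "/sop-ack <slug> ..." comments. This is a hypothetical reimplementation for illustration only: the slug set is taken from the comments above, but the real gate's rules (who may ack, whether reasons are validated, how edits are handled) are not shown on this page.

```python
import re

# Checklist slugs as they appear in the /sop-ack comments on this PR.
REQUIRED_ACKS = {
    "comprehensive-testing", "local-postgres-e2e", "staging-smoke", "root-cause",
    "five-axis-review", "no-backwards-compat", "memory-consulted",
}

ACK_RE = re.compile(r"^/sop-ack\s+([a-z0-9-]+)")


def acked_items(comments: list[str]) -> set[str]:
    """Collect known checklist slugs acknowledged by '/sop-ack <slug> ...' comments."""
    acked = set()
    for body in comments:
        m = ACK_RE.match(body.strip())
        # Unknown slugs and non-ack commands such as /qa-recheck are ignored.
        if m and m.group(1) in REQUIRED_ACKS:
            acked.add(m.group(1))
    return acked


def ack_summary(comments: list[str]) -> str:
    """Format the gate-style summary, counting each slug at most once."""
    return f"acked: {len(acked_items(comments))}/{len(REQUIRED_ACKS)}"
```

Using a set means a repeated ack (such as the duplicated comment bodies on this page) still counts once.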
Author
Member

/qa-recheck

Author
Member

/security-recheck

hongming-codex-laptop merged commit bc9c61ff47 into main 2026-05-13 02:41:06 +00:00