ci(canvas): deterministic ordered canvas deploy + digest-pin (core#2226) #2233

Merged
hongming merged 1 commit from fix/core2226-canvas-ordered-deploy into main 2026-06-04 09:04:19 +00:00
Owner

Closes #2226

Problem

The standalone molecule-ai/canvas image had no ordered/verified deploy — unlike the platform (publish-workspace-server-image.yml: build → push :staging-<sha> → fleet redeploy → re-point :latest after /buildinfo verify). publish-canvas-image.yml only built+pushed :latest + :sha-<sha>, and docker-compose.yml:170 referenced canvas:latest unpinned (standing TODO: pin canvas ECR image digest). Tenants/hosts only picked up new canvas as a side effect of the platform fleet-redeploy pulling the mutable :latest — non-deterministic and unverifiable, hence the advisory "Canvas Deploy Reminder".

Fix — mirror the platform's ordered deploy

publish-canvas-image.yml

  • build-and-push now pushes :staging-<sha> + :staging-latest (+ legacy :sha-<sha> for back-compat) and no longer moves :latest — an unpromoted/red build can never become the prod-blessed tag.
  • New promote-canvas job (needs: build-and-push, if: push && main): waits for green main CI on this SHA via the same prod-auto-deploy.py wait-ci SSOT the platform deploy-production uses, then re-points :latest to the verified :staging-<sha> by digest (imagetools create, no rebuild). So :latest == the last CI-green canvas, and platform + canvas advance :latest off the identical signal/SHA. Honors the PROD_AUTO_DEPLOY_DISABLED kill-switch; reuses the platform's writable-HOME/continue-on-error patterns.
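The two jobs above can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the repository name, the function names, and the assumption that the kill-switch is active when set to the string `true` are all hypothetical. Only the tag names, `PROD_AUTO_DEPLOY_DISABLED`, and the use of `docker buildx imagetools create` to re-tag without rebuilding come from the description.

```python
from typing import Dict, List, Optional

# Hypothetical repository name, for illustration only.
REPO = "example.registry/molecule-ai/canvas"


def build_tags(sha: str) -> List[str]:
    """Tags the build job pushes. Note that :latest is deliberately absent."""
    return [
        f"{REPO}:staging-{sha}",
        f"{REPO}:staging-latest",
        f"{REPO}:sha-{sha}",  # legacy tag, kept for back-compat
    ]


def promote_command(sha: str, digest: str, env: Dict[str, str]) -> Optional[List[str]]:
    """Return the argv that re-points :latest, or None if the kill-switch is set.

    Assumes the caller has already confirmed green main CI for `sha`, and that
    `digest` is the digest of the pushed :staging-<sha> image.
    """
    # Assumption: the kill-switch is active when set to "true".
    if env.get("PROD_AUTO_DEPLOY_DISABLED", "").lower() == "true":
        return None
    # Promote by digest so :latest points at exactly the verified image.
    source = f"{REPO}@{digest}"
    return ["docker", "buildx", "imagetools", "create", "--tag", f"{REPO}:latest", source]
```

A caller would pass `dict(os.environ)` as `env` and hand the returned list to `subprocess.run`. Returning the command rather than running it keeps the gate logic testable without a registry.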

docker-compose.yml — canvas image pins via CANVAS_IMAGE_TAG (default latest = prod-blessed; set staging-<sha> or staging-<sha>@<digest> for a fully reproducible deploy). Resolves the standing TODO: pin canvas ECR image digest. Local-dev build: context unchanged.
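How the tag override resolves can be modelled in a few lines. This sketch assumes the compose file uses the `${CANVAS_IMAGE_TAG:-latest}` interpolation form (unset or empty falls back to the default). The repository name and function name are hypothetical.

```python
from typing import Dict

# Hypothetical repository name, for illustration only.
REPO = "example.registry/molecule-ai/canvas"


def resolve_canvas_image(env: Dict[str, str]) -> str:
    """Mimic Compose's `${CANVAS_IMAGE_TAG:-latest}` interpolation.

    With the `:-` form, an unset or empty variable falls back to the default.
    """
    tag = env.get("CANVAS_IMAGE_TAG") or "latest"
    return f"{REPO}:{tag}"


# Default: the prod-blessed tag.
print(resolve_canvas_image({}))
# Pinned to a specific build (placeholder SHA).
print(resolve_canvas_image({"CANVAS_IMAGE_TAG": "staging-abc1234"}))
```

Setting the variable to `staging-<sha>@<digest>` yields a reference that carries both the tag and the digest, which is what makes the deploy fully reproducible.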

ci.yml — replaced the advisory "Canvas Deploy Reminder" (which prescribed a manual docker compose pull canvas) with "Canvas Deploy Status" recording that the ordered deploy is now handling it. (This job was never a required branch-protection context; advisory-only.)

New deploy sequence

canvas/** merge to main → build+push :staging-<sha>/:staging-latest → promote-canvas waits for green main CI on the SHA → promote :latest → :staging-<sha> by digest. The image always exists in ECR before :latest is re-pointed (mirrors the platform build-then-deploy ordering).
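The ordering guarantee in this sequence can be illustrated with a toy in-memory registry. This is only a model of the behaviour described, assuming a registry is a mapping from tag to digest. `FakeRegistry` and `deploy` are hypothetical names and nothing here talks to ECR.

```python
from typing import Dict


class FakeRegistry:
    """In-memory stand-in for an image registry: maps tag -> digest."""

    def __init__(self) -> None:
        self.tags: Dict[str, str] = {}

    def push(self, tag: str, digest: str) -> None:
        self.tags[tag] = digest

    def retag(self, new_tag: str, existing_tag: str) -> None:
        # Raises KeyError if the source image was never pushed, so a tag
        # can only be re-pointed at an image that already exists.
        self.tags[new_tag] = self.tags[existing_tag]


def deploy(registry: FakeRegistry, sha: str, digest: str, ci_green: bool) -> bool:
    """Run the build -> gate -> promote sequence; return True if promoted."""
    # Step 1: the build job pushes staging tags only, never :latest.
    registry.push(f"staging-{sha}", digest)
    registry.push("staging-latest", digest)
    registry.push(f"sha-{sha}", digest)
    # Step 2: gate on green main CI for this SHA.
    if not ci_green:
        return False
    # Step 3: re-point :latest at the already-pushed staging image.
    registry.retag("latest", f"staging-{sha}")
    return True
```

In this model a red build still updates `staging-latest`, but `latest` stays on the last CI-green digest.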

How it's verified

The platform's per-tenant /buildinfo already proves the tenant canvas: prod tenants serve canvas baked into platform-tenant (Dockerfile.tenant Stage 2 builds canvas/ at the same SHA), which is already ordered + /buildinfo-verified. This PR makes the standalone molecule-ai/canvas image (the co-located docker-compose canvas / canvas.moleculesai.app surface) equally deterministic: CI-green gate + immutable :staging-<sha> + digest-pin. The canvas process has no /buildinfo of its own today, so the served-SHA assertion is not yet possible for the standalone image — flagged below.

Validation

  • python3 -c "yaml.safe_load(...)" parses all 3 edited files.
  • lint-workflow-yaml.py — clean (no Gitea-1.22.6-hostile shapes).
  • docker compose config validates; CANVAS_IMAGE_TAG default→:latest, override→:staging-<sha> both resolve; local-dev build: intact.
  • pytest test_prod_auto_deploy.py test_ci_required_drift.py test_ci_workflow_bookkeeping.py — 49 passed (covers the reused wait-ci + the job rename).

Flags / follow-ups (no CP-side dependency required)

  • No CP endpoint change needed. This is self-contained in core (CI workflow + compose). redeploy-fleet is unchanged — it deploys platform-tenant (which already bakes canvas), so the prod tenant fleet's canvas is already ordered+verified. This PR closes the gap for the standalone canvas image.
  • Follow-up (filed-worthy): a canvas /buildinfo endpoint would let promote-canvas assert the served SHA like the platform does. Tracked in #2226's "verify per-tenant via a canvas /buildinfo" item; not built here (would touch the Next.js app + needs a deploy surface to poll).
  • :staging-latest/:staging-<sha> tag scheme is new for this repo's canvas image; any external consumer pinning the legacy :sha-<sha> keeps working (still pushed).

🤖 Generated with Claude Code

hongming added 1 commit 2026-06-04 08:59:55 +00:00
ci(canvas): deterministic ordered canvas deploy + digest-pin (core#2226)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
CI / Python Lint & Test (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 13s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
E2E Chat / detect-changes (pull_request) Successful in 12s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
CI / Detect changes (pull_request) Successful in 23s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 10s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 3s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
gate-check-v3 / gate-check (pull_request_target) Successful in 4s
qa-review / approved (pull_request_target) Failing after 3s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request_target) Failing after 5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m26s
E2E Chat / E2E Chat (pull_request) Successful in 19s
CI / Platform (Go) (pull_request) Successful in 8s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m14s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m18s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 9s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Failing after 1m22s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m21s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m11s
CI / all-required (pull_request) Successful in 6s
sop-tier-check / tier-check (pull_request_target) Has been cancelled
sop-checklist / all-items-acked (pull_request_target) Has been cancelled
sop-checklist / review-refire (pull_request_target) Has been cancelled
CI / Canvas Deploy Status (pull_request) Has been skipped
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 11s
audit-force-merge / audit (pull_request_target) Successful in 5s
41b842cc98
The standalone molecule-ai/canvas image previously only built+pushed
:latest + :sha-<sha> with no deploy step, and docker-compose referenced
canvas:latest UNPINNED. Tenants/hosts picked up new canvas only as a side
effect of the platform fleet-redeploy pulling :latest — non-deterministic
and unverifiable, hence the advisory "Canvas Deploy Reminder".

Mirror the platform's ordered deploy (publish-workspace-server-image.yml):

- publish-canvas-image.yml: build job now pushes :staging-<sha> +
  :staging-latest (+ legacy :sha-<sha>) and no longer moves :latest. New
  promote-canvas job waits for green main CI on the SHA (same
  prod-auto-deploy wait-ci SSOT the platform deploy uses), then re-points
  :latest to the verified :staging-<sha> by digest (imagetools create).
  So :latest == last CI-green canvas, and platform+canvas advance off the
  identical signal/SHA. Honors the PROD_AUTO_DEPLOY_DISABLED kill-switch.

- docker-compose.yml: canvas image pins via CANVAS_IMAGE_TAG (default
  latest = prod-blessed; set staging-<sha> or staging-<sha>@<digest> for a
  reproducible deploy). Resolves the standing TODO: pin canvas ECR digest.
  Local-dev `build:` context unchanged.

- ci.yml: replace the advisory "Canvas Deploy Reminder" (prescribed a
  manual docker compose pull) with "Canvas Deploy Status" recording that
  the ordered deploy is handling it.

Closes #2226

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hongming added the tier:low label 2026-06-04 09:02:54 +00:00
core-qa approved these changes 2026-06-04 09:02:56 +00:00
core-qa left a comment
Member

QA approve. Ordered+digest-pinned canvas deploy (promote-canvas via the platform prod-auto-deploy SSOT); replaces advisory reminder. Found tenants bake canvas into platform-tenant (already ordered) so no CP change; only the standalone canvas image made deterministic. 49 tests pass, YAMLs+compose validate.

core-security approved these changes 2026-06-04 09:02:57 +00:00
core-security left a comment
Member

Security approve. CI/workflow+compose only; digest-pin tightens (not loosens) reproducibility; honors PROD_AUTO_DEPLOY_DISABLED. No auth surface change.

Author
Owner

/qa-recheck

Author
Owner

/security-recheck

hongming merged commit 28519c6dbe into main 2026-06-04 09:04:19 +00:00
3 Participants

Reference: molecule-ai/molecule-core#2233