molecule-core

molecule-ai/molecule-core

Fork 2

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
Molecule AI Core-DevOps	af5406d29e	fix(ci): migrate canary-verify from GHCR to ECR + add POST route smoke tests Some checks failed Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s Details sop-tier-check / tier-check (pull_request) Failing after 4s Details Root cause of issue #213: canary-verify.yml still used GHCR (ghcr.io/molecule-ai/platform-tenant) while publish-workspace-server-image.yml migrated to ECR on 2026-05-07 (commit `10e510f5`). Canary smoke tests were silently testing a stale GHCR image while actual staging/prod tenants ran the ECR build. The POST /org/import and POST /workspaces routes were missing from the ECR binary (likely a Docker layer-caching artefact during the staging push window) but smoke tests passed because they never tested the ECR image at all. Changes: - canary-verify.yml: migrate promote-to-latest from GHCR crane tag ops to the CP redeploy-fleet endpoint (same mechanism as redeploy-tenants-on-main.yml). The wait-for-canaries step already read SHA from the running tenant /health (registry-agnostic), so no change needed there. Pre-fix promote step used `crane tag` against GHCR, which was never updated after the ECR migration. - redeploy-tenants-on-main.yml: update stale comments that reference GHCR to reflect ECR; replace the 30s GHCR CDN propagation wait with a no-op comment (ECR has no CDN cache to wait for). - scripts/canary-smoke.sh: add POST /org/import and POST /workspaces smoke tests (steps 6-8). These assert HTTP 401 unauthenticated (proves AdminAuth enforced AND the route is compiled in — 404 would mean route missing from binary). GET /workspaces was already covered; POST was the untested gap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 02:10:12 +00:00
Hongming Wang	9662590360	feat(canary): smoke harness + GHA verification workflow (Phase 2) Post-deploy verification for staging tenant images. Runs against the canary fleet after each publish-workspace-server-image build — catches auto-update breakage (a la today's E2E current_task drift) before it propagates to the prod tenant fleet that auto-pulls :latest every 5 min. scripts/canary-smoke.sh iterates a space-sep list of canary base URLs (paired with their ADMIN_TOKENs) and checks: - /admin/liveness reachable with admin bearer (tenant boot OK) - /workspaces list responds (wsAuth + DB path OK) - /memories/commit + /memories/search round-trip (encryption + scrubber) - /events admin read (AdminAuth C4 path) - /admin/liveness without bearer returns 401 (C4 fail-closed regression) .github/workflows/canary-verify.yml runs after publish succeeds: - 6-min sleep (tenant auto-updater pulls every 5 min) - bash scripts/canary-smoke.sh with secrets pulled from repo settings - on failure: writes a Step Summary flagging that :latest should be rolled back to prior known-good digest Phase 3 follow-up will split the publish workflow so only :staging-<sha> ships initially, and canary-verify's green gate is what promotes :staging-<sha> → :latest. This commit lays the test gate alone so we have something running against tenants immediately. Secrets to set in GitHub repo settings before this workflow can run: - CANARY_TENANT_URLS (space-sep list) - CANARY_ADMIN_TOKENS (same order as URLs) - CANARY_CP_SHARED_SECRET (matches staging CP PROVISION_SHARED_SECRET) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 03:30:19 -07:00

Molecule AI Core-DevOps

af5406d29e

fix(ci): migrate canary-verify from GHCR to ECR + add POST route smoke tests

Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s

Details

sop-tier-check / tier-check (pull_request) Failing after 4s

Details

Root cause of issue #213: canary-verify.yml still used GHCR
(ghcr.io/molecule-ai/platform-tenant) while
publish-workspace-server-image.yml migrated to ECR on 2026-05-07
(commit 10e510f5). Canary smoke tests were silently testing a stale
GHCR image while actual staging/prod tenants ran the ECR build.
The POST /org/import and POST /workspaces routes were missing from
the ECR binary (likely a Docker layer-caching artefact during the
staging push window) but smoke tests passed because they never tested
the ECR image at all.

Changes:
- canary-verify.yml: migrate promote-to-latest from GHCR crane tag
  ops to the CP redeploy-fleet endpoint (same mechanism as
  redeploy-tenants-on-main.yml). The wait-for-canaries step already
  read SHA from the running tenant /health (registry-agnostic), so
  no change needed there. Pre-fix promote step used `crane tag` against
  GHCR, which was never updated after the ECR migration.
- redeploy-tenants-on-main.yml: update stale comments that reference
  GHCR to reflect ECR; replace the 30s GHCR CDN propagation wait
  with a no-op comment (ECR has no CDN cache to wait for).
- scripts/canary-smoke.sh: add POST /org/import and POST /workspaces
  smoke tests (steps 6-8). These assert HTTP 401 unauthenticated
  (proves AdminAuth enforced AND the route is compiled in — 404 would
  mean route missing from binary). GET /workspaces was already covered;
  POST was the untested gap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-10 02:10:12 +00:00

Hongming Wang

9662590360

feat(canary): smoke harness + GHA verification workflow (Phase 2)

Post-deploy verification for staging tenant images. Runs against the
canary fleet after each publish-workspace-server-image build — catches
auto-update breakage (a la today's E2E current_task drift) before it
propagates to the prod tenant fleet that auto-pulls :latest every 5 min.

scripts/canary-smoke.sh iterates a space-sep list of canary base URLs
(paired with their ADMIN_TOKENs) and checks:
- /admin/liveness reachable with admin bearer (tenant boot OK)
- /workspaces list responds (wsAuth + DB path OK)
- /memories/commit + /memories/search round-trip (encryption + scrubber)
- /events admin read (AdminAuth C4 path)
- /admin/liveness without bearer returns 401 (C4 fail-closed regression)

.github/workflows/canary-verify.yml runs after publish succeeds:
- 6-min sleep (tenant auto-updater pulls every 5 min)
- bash scripts/canary-smoke.sh with secrets pulled from repo settings
- on failure: writes a Step Summary flagging that :latest should be
  rolled back to prior known-good digest

Phase 3 follow-up will split the publish workflow so only
:staging-<sha> ships initially, and canary-verify's green gate is
what promotes :staging-<sha> → :latest. This commit lays the test
gate alone so we have something running against tenants immediately.

Secrets to set in GitHub repo settings before this workflow can run:
- CANARY_TENANT_URLS (space-sep list)
- CANARY_ADMIN_TOKENS (same order as URLs)
- CANARY_CP_SHARED_SECRET (matches staging CP PROVISION_SHARED_SECRET)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-19 03:30:19 -07:00

2 Commits