Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 54s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
CI / Platform (Go) (pull_request) Successful in 3s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Failing after 13s
CI / Canvas (Next.js) (pull_request) Successful in 42s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m18s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m20s
Per documentation-specialist's grep agent (2026-05-07T07:30, see internal#46): runtime-breaking ghcr.io references in shell scripts + docker-compose + the slip-past-workflow lint_secret_pattern_drift.py all need migration. These were missed by security-auditor's workflow-only audit.

Files (6):
- .github/scripts/lint_secret_pattern_drift.py:40 - workspace-runtime pre-commit-checks.sh consumer URL: raw.githubusercontent.com → Gitea raw URL (https://git.moleculesai.app/molecule-ai/.../raw/branch/main/...). The lint job runs in CI and would 404 today.
- scripts/refresh-workspace-images.sh:54 - workspace-template image pull URL: ghcr.io → ECR (153263036946.dkr.ecr.us-east-2.amazonaws.com).
- scripts/rollback-latest.sh - full rewrite of header + auth flow:
  * ghcr.io/molecule-ai/{platform,platform-tenant} → ECR
  * GITHUB_TOKEN with write:packages → AWS ECR auth (aws ecr get-login-password). Per saved memory reference_post_suspension_pipeline, prod cutover is to ECR.
  * Updated header docs to match new auth flow + prereqs.
- scripts/demo-freeze.sh:13,17 - comment-only ghcr → ECR (the script doesn't currently exec these URLs, but the comments describe the cascade and need to match reality).
- docker-compose.yml:215-216 - canvas image: ghcr.io → ECR + updated the auth comment to describe `aws ecr get-login-password` flow.
- tools/check-template-parity.sh:21 - inline curl install instructions: raw.githubusercontent.com → Gitea raw URL.

Hostile self-review:
1. rollback-latest.sh's GITHUB_TOKEN→aws-cli auth swap is a behavior change. Operators using this script now need aws CLI authenticated for region us-east-2 with ECR pull/push perms. Documented in updated header. Operators who don't have aws CLI will get 'ERROR: aws CLI not installed. brew install awscli', which is a clear failure mode (not silent).
2. The Gitea raw URL shape (/raw/branch/main/) differs from GitHub's raw.githubusercontent.com structure. Verified pattern by inspecting other Gitea raw URLs in the codebase. If Gitea's URL changes (1.23+), update via the same one-line edit.
3. Doesn't touch packer/scripts/install-base.sh, which has a similar ghcr.io ref per the grep agent's findings. That's bigger-scope (packer build pipeline) and lives in molecule-controlplane-ish territory; filing as parked follow-up under #46 if not already.

Refs: molecule-ai/internal#46, molecule-ai/internal#37, molecule-ai/internal#38, saved memory reference_post_suspension_pipeline
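
The URL-shape difference described in self-review item 2 can be sketched as a small helper. This is only an illustration and is not part of the PR: the function name `github_raw_to_gitea` and the repository name `example-repo` are hypothetical, the default host is taken from the commit message, and the sketch assumes branch names without slashes.

```bash
#!/bin/bash
# Hypothetical helper (not in this PR): rewrite a raw.githubusercontent.com URL
#   https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>
# into the Gitea raw form
#   https://<gitea-host>/<owner>/<repo>/raw/branch/<branch>/<path>
# Assumption: the branch name contains no "/" characters.
github_raw_to_gitea() {
  local url="${1:-}"
  local gitea_host="${2:-git.moleculesai.app}"
  local prefix="https://raw.githubusercontent.com/"

  case "$url" in
    "$prefix"*) ;;
    *)
      echo "not a raw.githubusercontent.com URL: $url" >&2
      return 1
      ;;
  esac

  # Split the remainder on "/"; the last variable keeps the rest of the path.
  local rest="${url#"$prefix"}"
  local owner repo branch path
  IFS=/ read -r owner repo branch path <<<"$rest"

  if [ -z "$owner" ] || [ -z "$repo" ] || [ -z "$branch" ] || [ -z "$path" ]; then
    echo "URL is missing owner, repo, branch, or path: $url" >&2
    return 1
  fi

  printf 'https://%s/%s/%s/raw/branch/%s/%s\n' \
    "$gitea_host" "$owner" "$repo" "$branch" "$path"
}
```

For example, `github_raw_to_gitea https://raw.githubusercontent.com/molecule-ai/example-repo/main/scripts/check.sh` prints `https://git.moleculesai.app/molecule-ai/example-repo/raw/branch/main/scripts/check.sh`.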
86 lines
2.8 KiB
Bash
Executable File
#!/bin/bash
# rollback-latest.sh: moves the :latest tag on the platform image
# (and the matching tenant image) on AWS ECR back to a prior
# :staging-<sha> digest without rebuilding anything. Prod tenants
# auto-pull :latest every 5 min, so this is the fast path when a
# canary-verified image turns out to have a runtime regression that
# canary didn't catch.
#
# Usage:
#   scripts/rollback-latest.sh <sha>
#   scripts/rollback-latest.sh 4c1d56e
#
# Prereqs:
#   - crane on $PATH (brew install crane OR download from
#     https://github.com/google/go-containerregistry/releases)
#   - aws CLI authenticated for region us-east-2 with ECR pull/push
#     access to the molecule-ai/platform + platform-tenant repositories.
#     `aws sts get-caller-identity` should succeed.
#
# What it does (per image: platform + tenant):
#   crane digest <ecr>:staging-<sha>         # verify the target tag exists
#   crane tag    <ecr>:staging-<sha> latest  # retag remotely, single API call
#   crane digest <ecr>:latest                # confirm the move
#
# Exit codes: 0 = both retagged, 1 = missing tool / tag missing / crane error, 2 = bad args.

set -euo pipefail

if [ "${1:-}" = "" ]; then
  echo "usage: $0 <staging-sha>" >&2
  echo "  e.g. $0 4c1d56e  (retags :latest to :staging-4c1d56e)" >&2
  exit 2
fi

TARGET_SHA="$1"
ECR_HOST=153263036946.dkr.ecr.us-east-2.amazonaws.com
PLATFORM=$ECR_HOST/molecule-ai/platform
TENANT=$ECR_HOST/molecule-ai/platform-tenant

if ! command -v crane >/dev/null; then
  echo "ERROR: crane not installed. brew install crane" >&2
  exit 1
fi
if ! command -v aws >/dev/null; then
  echo "ERROR: aws CLI not installed. brew install awscli" >&2
  exit 1
fi

# Log in once. ECR auth is via short-lived password from `aws ecr
# get-login-password`. crane stores creds in a config file keyed by
# registry; re-running is cheap.
aws ecr get-login-password --region us-east-2 | crane auth login "$ECR_HOST" -u AWS --password-stdin >/dev/null

roll() {
  local image="$1"
  local src="$image:staging-$TARGET_SHA"
  local dst="$image:latest"
  # Declared separately from the assignments below so that a failing
  # crane call is not masked by `local` and still trips `set -e`.
  local src_digest new_digest

  echo "→ $image"
  # Abort the rollback if the target tag doesn't exist in the registry.
  # crane tag would error anyway, but a pre-check gives a clearer
  # message for ops. The same call captures the digest we expect.
  if ! src_digest=$(crane digest "$src" 2>/dev/null); then
    echo "  FAIL: $src not found in registry. Did you type the wrong sha?" >&2
    return 1
  fi

  crane tag "$src" latest
  new_digest=$(crane digest "$dst")

  if [ "$new_digest" != "$src_digest" ]; then
    echo "  FAIL: $dst digest $new_digest does not match expected $src_digest" >&2
    return 1
  fi
  echo "  OK $dst → $new_digest"
}

roll "$PLATFORM"
roll "$TENANT"

echo
echo "=== ROLLBACK COMPLETE ==="
echo "Both images now point :latest at staging-$TARGET_SHA."
echo "Prod tenants will pick up the rollback within their 5-min auto-update cycle."
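
Since the pre-check message in `roll` asks whether the sha was mistyped, one possible hardening is to reject malformed input before contacting the registry at all. The following is a minimal sketch, not something the script currently does. It assumes staging tags are built from lowercase hex git SHAs of 7 to 40 characters, which is inferred only from the `4c1d56e` usage example; `is_git_sha` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical pre-check (not in rollback-latest.sh).
# Assumption: the <sha> argument is a lowercase hex git SHA, 7 to 40 chars.
# Returns 0 when the argument has that shape, 1 otherwise.
is_git_sha() {
  [[ "${1:-}" =~ ^[0-9a-f]{7,40}$ ]]
}
```

It could be called right after the argument check, for example `is_git_sha "$TARGET_SHA" || { echo "ERROR: '$TARGET_SHA' does not look like a git sha" >&2; exit 2; }`, which reuses the script's existing exit code 2 for bad arguments.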