molecule-core/scripts/rollback-latest.sh
documentation-specialist 5d4184f4a3 fix(scripts): migrate ghcr.io→ECR + raw.githubusercontent.com→Gitea (#46)
Per documentation-specialist's grep agent (2026-05-07T07:30, see
internal#46): runtime-breaking ghcr.io references in shell scripts +
docker-compose + the slip-past-workflow lint_secret_pattern_drift.py
all need migration. These were missed by security-auditor's
workflow-only audit.

Files (6):

- .github/scripts/lint_secret_pattern_drift.py:40 — workspace-runtime
  pre-commit-checks.sh consumer URL: raw.githubusercontent.com →
  Gitea raw URL (https://git.moleculesai.app/molecule-ai/.../raw/
  branch/main/...). The lint job runs in CI and would 404 today.

- scripts/refresh-workspace-images.sh:54 — workspace-template image
  pull URL: ghcr.io → ECR (153263036946.dkr.ecr.us-east-2.amazonaws.com).

- scripts/rollback-latest.sh — full rewrite of header + auth flow:
  * ghcr.io/molecule-ai/{platform,platform-tenant} → ECR
  * GITHUB_TOKEN with write:packages → AWS ECR auth
    (aws ecr get-login-password). Per saved memory
    reference_post_suspension_pipeline, prod cutover is to ECR.
  * Updated header docs to match new auth flow + prereqs.

- scripts/demo-freeze.sh:13,17 — comment-only ghcr → ECR
  (the script doesn't currently exec these URLs, but the comments
  describe the cascade and need to match reality).

- docker-compose.yml:215-216 — canvas image: ghcr.io → ECR + updated
  the auth comment to describe `aws ecr get-login-password` flow.

- tools/check-template-parity.sh:21 — inline curl install instructions:
  raw.githubusercontent.com → Gitea raw URL.

Hostile self-review:

1. rollback-latest.sh's GITHUB_TOKEN→aws-cli auth swap is a behavior
   change. Operators using this script now need aws CLI
   authenticated for region us-east-2 with ECR pull/push perms.
   Documented in updated header. Operators who don't have aws CLI
   will get 'aws: command not installed' which is a clear failure
   mode (not silent).
2. The Gitea raw URL shape (/raw/branch/main/) differs from GitHub's
   raw.githubusercontent.com structure. Verified pattern by
   inspecting other Gitea raw URLs in the codebase. If Gitea's URL
   changes (1.23+), update via the same one-line edit.
3. Doesn't touch packer/scripts/install-base.sh which has a similar
   ghcr.io ref per the grep agent's findings — that's bigger-scope
   (packer build pipeline) and lives in molecule-controlplane-ish
   territory; filing as parked follow-up under #46 if not already.

Refs: molecule-ai/internal#46, molecule-ai/internal#37,
molecule-ai/internal#38, saved memory reference_post_suspension_pipeline
2026-05-07 00:56:23 -07:00

86 lines
2.8 KiB
Bash
Executable File

#!/bin/bash
# rollback-latest.sh — moves the :latest tag on the platform image
# (and the matching tenant image) on AWS ECR back to a prior
# :staging-<sha> digest without rebuilding anything. Prod tenants
# auto-pull :latest every 5 min, so this is the fast path when a
# canary-verified image turns out to have a runtime regression that
# canary didn't catch.
#
# Usage:
# scripts/rollback-latest.sh <sha>
# scripts/rollback-latest.sh 4c1d56e
#
# Prereqs:
# - crane on $PATH (brew install crane OR download from
# https://github.com/google/go-containerregistry/releases)
# - aws CLI authenticated for region us-east-2 with ECR pull/push
# access to the molecule-ai/platform + platform-tenant repositories.
# `aws sts get-caller-identity` should succeed.
#
# What it does (per image — platform + tenant):
# crane digest <ecr>:<sha> # verify the target sha exists
# crane tag <ecr>:<sha> latest # retag remotely, single API call
# crane digest <ecr>:latest # confirm the move
#
# Exit codes: 0 = both retagged, 1 = tag missing / crane error, 2 = bad args.
set -euo pipefail
if [ "${1:-}" = "" ]; then
echo "usage: $0 <staging-sha>" >&2
echo " e.g. $0 4c1d56e — retags :latest to :staging-4c1d56e" >&2
exit 2
fi
TARGET_SHA="$1"
ECR_HOST=153263036946.dkr.ecr.us-east-2.amazonaws.com
PLATFORM=$ECR_HOST/molecule-ai/platform
TENANT=$ECR_HOST/molecule-ai/platform-tenant
if ! command -v crane >/dev/null; then
echo "ERROR: crane not installed. brew install crane" >&2
exit 1
fi
if ! command -v aws >/dev/null; then
echo "ERROR: aws CLI not installed. brew install awscli" >&2
exit 1
fi
# Log in once. ECR auth is via short-lived password from `aws ecr
# get-login-password`. crane stores creds in a config file keyed by
# registry; re-running is cheap.
aws ecr get-login-password --region us-east-2 | crane auth login "$ECR_HOST" -u AWS --password-stdin >/dev/null
roll() {
local image="$1"
local src="$image:staging-$TARGET_SHA"
local dst="$image:latest"
echo "$image"
# Abort rollout if the target tag doesn't exist in the registry.
# Otherwise crane tag would error anyway, but a pre-check gives a
# clearer message for ops.
if ! crane digest "$src" >/dev/null 2>&1; then
echo " FAIL: $src not found in registry. Did you type the wrong sha?" >&2
return 1
fi
local src_digest=$(crane digest "$src")
crane tag "$src" latest
local new_digest=$(crane digest "$dst")
if [ "$new_digest" != "$src_digest" ]; then
echo " FAIL: $dst digest $new_digest does not match expected $src_digest" >&2
return 1
fi
echo " OK $dst$new_digest"
}
roll "$PLATFORM"
roll "$TENANT"
echo
echo "=== ROLLBACK COMPLETE ==="
echo "Both images now point :latest at staging-$TARGET_SHA."
echo "Prod tenants will pick up the rollback within their 5-min auto-update cycle."