forked from molecule-ai/molecule-core
ops: demo-day freeze + rollback runbook
Demo-day preparation bundle for the funding demo (~2026-05-06). Adds:

- scripts/demo-freeze.sh — captures current ghcr.io workspace-template-*
  :latest digests for all 8 runtimes, then disables both cascade vectors
  that could re-tag :latest mid-demo: publish-runtime.yml in molecule-core
  (PATH 1 — staging push to workspace/** auto-bumps the wheel and fans out
  to 8 templates) and publish-image.yml in each of the 8 template repos
  (PATH 2 — direct template repo merge re-tags :latest). Defaults to
  dry-run; requires --execute to apply. Writes both digest + workflow
  receipts to scripts/demo-freeze-snapshots/.
- scripts/demo-thaw.sh — re-enables every workflow demo-freeze.sh disabled,
  keyed off the receipt timestamp. Defaults to executing (the inverse
  safety polarity from freeze, where the destructive default is dry-run).
  --dry-run prints without applying.
- scripts/demo-day-runbook.md — operator runbook indexing the six rollback
  levers (platform image rollback, template image rollback, tenant
  redeploy, workspace delete, Railway rollback, Vercel rollback) plus
  pre-warm timing and post-demo cleanup. Also covers read-only diagnostics
  for "is this working?" moments and the CP_ADMIN_API_TOKEN rotation step
  that must follow demo (the token gets copy-pasted into shells during
  incident response).
- scripts/demo-freeze-snapshots/.gitignore — generated freeze receipts are
  operational state, not source. Tracked .gitkeep so the directory exists
  when the script writes to it.

Both scripts dry-run-tested locally. Did not exercise --execute since that
would actually disable production workflows mid-development.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 092724b6d7
commit 6d23611620
306
scripts/demo-day-runbook.md
Normal file
@@ -0,0 +1,306 @@
# Demo-day runbook

Pre-, during-, and post-demo operational procedures for the molecule
production stack. Updated 2026-05-01 ahead of the funding-demo on
~2026-05-06.

The whole stack:

```
Vercel canvas (app.moleculesai.app)
  → Railway controlplane (api.moleculesai.app)
    → CloudFront/Cloudflare per-tenant edge (<slug>.moleculesai.app)
      → EC2 tenant instance running platform container
        → Docker workspaces pulled from
          ghcr.io/molecule-ai/workspace-template-<runtime>:latest
```

Every layer has its own deploy/rollback story. This runbook indexes
them in the order an operator would touch them during an incident.

## Pre-demo (T-48h to T-1h)

### 1. Freeze the runtime + template image cascade

A merge to `molecule-core/staging` that touches `workspace/**` triggers
`publish-runtime.yml` → PyPI bump → repository_dispatch → 8 template
repos rebuild and re-tag `:latest`. A merge to any template repo's
`main` triggers the same final re-tag directly. Either path means a
new workspace provision during the demo pulls whatever `:latest`
resolved to seconds earlier.

Capture current good digests + disable both cascade vectors:

```bash
# Dry-run first — verifies digests can be fetched and tooling is set up
scripts/demo-freeze.sh

# Apply
scripts/demo-freeze.sh --execute
```

The script writes two receipts to `scripts/demo-freeze-snapshots/`:

- `digests-<TS>.txt` — current `:latest` digest per template (rollback target if needed)
- `disabled-workflows-<TS>.txt` — workflow paths to re-enable post-demo

Verify the freeze landed:

```bash
# --all is required: gh workflow list hides disabled workflows by default
gh workflow list --all -R Molecule-AI/molecule-core | grep publish-runtime
# expect: status = disabled_manually
```
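
The check above covers only `molecule-core`, while the freeze touches nine workflows. A minimal sketch for checking all of them in one pass, assuming the template repos follow the `Molecule-AI/molecule-ai-workspace-template-<runtime>` naming that `demo-freeze.sh` uses. `not_frozen` and `list_states` are hypothetical helper names, not part of the repo, and the state lookup goes through the GitHub REST "get a workflow" endpoint via `gh api`:

```bash
# Hypothetical helpers; not part of demo-freeze.sh.

# Reads "<repo> <workflow> <state>" lines on stdin and prints every entry
# whose state is anything other than disabled_manually.
not_frozen() {
  awk '$3 != "disabled_manually" { print $1 "/" $2 " is " $3 }'
}

# Emits one "<repo> <workflow> <state>" line per publish workflow.
# Needs an authenticated gh, so it is only defined here, not called.
list_states() {
  local tpl repo state
  state=$(gh api repos/Molecule-AI/molecule-core/actions/workflows/publish-runtime.yml --jq .state)
  echo "Molecule-AI/molecule-core publish-runtime.yml $state"
  for tpl in claude-code hermes openclaw langgraph deepagents crewai autogen gemini-cli; do
    repo="Molecule-AI/molecule-ai-workspace-template-${tpl}"
    state=$(gh api "repos/${repo}/actions/workflows/publish-image.yml" --jq .state)
    echo "$repo publish-image.yml $state"
  done
}
```

Running `list_states | not_frozen` should print nothing when the freeze is complete; any line it does print names a workflow that can still re-tag `:latest`.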

If a critical fix MUST ship during the freeze window:

1. `gh workflow enable publish-runtime.yml -R Molecule-AI/molecule-core`
2. Merge the fix
3. Watch the cascade through to GHCR `:latest` manually
4. Smoke-verify against a staging tenant (`scripts/api-smoke.sh` or
   manual canvas walkthrough)
5. `gh workflow disable publish-runtime.yml -R Molecule-AI/molecule-core` to re-freeze

Don't auto-promote during the freeze — the value of the freeze is that
nothing happens automatically.

### 2. Confirm production CP is on the expected SHA

```bash
gh run list -R Molecule-AI/molecule-controlplane --branch main --limit 5
# Last `ci` run should be SUCCESS with the SHA you intend to demo on
```

Railway auto-deploys from main. Spot-check `api.moleculesai.app`:

```bash
# The URL is quoted so shells such as zsh do not try to glob the `?`
curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
  "https://api.moleculesai.app/cp/admin/orgs?limit=1"
# Expect: 200 + a JSON {"orgs": [...]}
```

### 3. Confirm production canvas (Vercel) is on main

Vercel auto-deploys `main`. Verify in the Vercel dashboard that the most
recent prod deploy ran from the expected commit SHA.

### 4. Pre-warm the demo tenant

Cold-start times on workspace-template images:

| Runtime | Cold-start (first boot) |
|---|---|
| claude-code | ~30-60s |
| openclaw | ~1-2 min |
| langgraph | ~1 min |
| hermes | **~7 min** (large image) |

If the demo will use `hermes`, provision the demo workspace at least
10 min before the demo starts. The cold-start clock starts when the
workspace is created, not when it's used.
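
The lead time can be turned into a concrete clock time. A minimal sketch, assuming the demo start is given as 24-hour `HH:MM`; `provision_by` is a hypothetical helper, not something that exists in the repo:

```bash
# provision_by <demo-start HH:MM> <lead-minutes>
# Prints the latest wall-clock time at which the workspace must be created.
provision_by() {
  local start="$1" lead="$2" h m total
  h=$((10#${start%%:*}))   # 10# forces base 10 so "08" is not read as octal
  m=$((10#${start##*:}))
  # +1440 and the modulo wrap results that fall before midnight.
  total=$(( (h * 60 + m - lead + 1440) % 1440 ))
  printf '%02d:%02d\n' $((total / 60)) $((total % 60))
}

provision_by 14:00 10   # prints 13:50
```

A result that wraps past midnight refers to the previous day, so for an early-morning demo read the output with that in mind.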

## During demo — emergency rollback levers

### Lever A: Platform-image rollback (canvas/CP layer regression)

If the canvas or platform container shipped a regression, retag
`:latest` to a prior staging SHA without rebuilding:

```bash
# Find a known-good SHA from staging history
gh run list -R Molecule-AI/molecule-core --workflow=publish-canvas-image.yml --limit 5

# Roll both platform + tenant images
GITHUB_TOKEN=$(gh auth token) scripts/rollback-latest.sh <good-sha>
```

`rollback-latest.sh` retags both `ghcr.io/molecule-ai/platform:latest`
and `ghcr.io/molecule-ai/platform-tenant:latest`. Existing tenants
auto-pull `:latest` every 5 min — rollback propagates without manual
restart.

### Lever B: Workspace-template image rollback

If a specific runtime template (claude-code, hermes, etc.) shipped a
broken `:latest`:

```bash
# Get the demo's snapshotted-good digest from the freeze receipt
grep claude-code scripts/demo-freeze-snapshots/digests-<TS>.txt

# Retag :latest back to the snapshotted digest using crane
crane auth login ghcr.io -u "$(gh api user --jq .login)" \
  --password-stdin <<< "$(gh auth token)"
crane tag \
  ghcr.io/molecule-ai/workspace-template-claude-code@sha256:<digest> \
  latest
```

The next workspace provision pulls the rolled-back image. Existing
workspaces are unaffected (their image is already loaded into Docker).
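
Copying a 64-character digest by hand under pressure is error-prone. A minimal sketch that pulls the digest straight from the receipt, assuming the `<runtime>: <digest>` line format that `demo-freeze.sh` documents; `digest_for` is a hypothetical helper name:

```bash
# digest_for <runtime> <receipt-file>
# Prints the digest recorded for <runtime>; fails if the receipt has none.
# Matches the runtime name exactly, so "claude-code" cannot pick up
# a differently named entry that merely contains the same text.
digest_for() {
  local runtime="$1" receipt="$2" digest
  digest=$(awk -F': ' -v rt="$runtime" '$1 == rt { print $2 }' "$receipt")
  [ -n "$digest" ] || { echo "no digest for $runtime in $receipt" >&2; return 1; }
  printf '%s\n' "$digest"
}

# Usage (the retag itself is the crane command from this lever):
#   d=$(digest_for claude-code scripts/demo-freeze-snapshots/digests-<TS>.txt)
#   crane tag "ghcr.io/molecule-ai/workspace-template-claude-code@${d}" latest
```

The receipt stores the digest with its `sha256:` prefix, so the value slots in directly after the `@`.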

### Lever C: Wedged demo tenant — redeploy

If the demo tenant's EC2 instance is wedged (boot succeeded but app
not responding, or a stuck workspace), the controlplane has an admin
redeploy endpoint:

```bash
# AWS-side: forces a fresh EC2 launch with current image. ~3 min.
curl -fsS -X POST \
  -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
  https://api.moleculesai.app/cp/admin/orgs/<slug>/redeploy
```

WARNING: this triggers real EC2 + SSM actions on production.
Double-check `<slug>` against the demo tenant's slug before pressing
return. The `/redeploy` endpoint is idempotent on the EC2 side but
WILL drop active SSH sessions.

### Lever D: Specific bad workspace — delete

If a single workspace inside the demo tenant is misbehaving (e.g.
hermes wedged on cold-start, claude-code returning the generic
"Agent error (Exception)" message), kill it:

```bash
# Get the demo tenant's per-tenant ADMIN_TOKEN
TENANT_ADMIN=$(curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
  https://api.moleculesai.app/cp/admin/orgs/<slug>/admin-token \
  | jq -r .admin_token)

# The URL is quoted so shells such as zsh do not try to glob the `?`
ORG_ID=$(curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
  "https://api.moleculesai.app/cp/admin/orgs?limit=20" \
  | jq -r '.orgs[] | select(.slug=="<slug>") | .id')

# Delete the bad workspace
curl -fsS -X DELETE \
  -H "Origin: https://<slug>.moleculesai.app" \
  -H "Authorization: Bearer $TENANT_ADMIN" \
  -H "X-Molecule-Org-Id: $ORG_ID" \
  https://<slug>.moleculesai.app/workspaces/<workspace-id>
```

Then re-provision a fresh workspace from the canvas. Faster than
debugging the wedged one.

### Lever E: Railway production rollback (CP regression)

If the last Railway deploy of CP introduced a regression that Lever A
can't fix (e.g. a logic bug, not a container issue):

1. Open Railway dashboard → molecule-platform → controlplane → Deployments
2. Find the previous-known-good deployment
3. Click **Rollback to this deployment**

Manual step — no CLI equivalent built. Takes ~30s to redeploy from
the prior image. Note: rollback restores the prior code AND prior env
var snapshot; don't expect any env var changes made since to persist.

### Lever F: Vercel production rollback (canvas regression)

If the canvas ships a regression:

1. Open Vercel dashboard → molecule-app → Deployments
2. Find the previous prod deployment
3. **Promote to Production**

Same pattern as Railway — fast revert, no rebuild.

## Tenant-level read-only diagnostics (not actions)

Useful during an "is this working?" moment without touching anything:

```bash
# Tenant infra state
curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
  "https://api.moleculesai.app/cp/admin/orgs?limit=20" \
  | jq '.orgs[] | select(.slug=="<slug>")'

# Tenant boot events (debug a stuck provision)
curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
  "https://api.moleculesai.app/cp/admin/tenants/<slug>/boot-events?limit=50" \
  | jq

# Workspace activity (debug an unresponsive agent)
# $TENANT_ADMIN and $ORG_ID must be set first, as shown in Lever D
curl -fsS \
  -H "Origin: https://<slug>.moleculesai.app" \
  -H "Authorization: Bearer $TENANT_ADMIN" \
  -H "X-Molecule-Org-Id: $ORG_ID" \
  "https://<slug>.moleculesai.app/workspaces/<workspace-id>/activity?limit=20" \
  | jq
```

## Post-demo (T+30m to T+24h)

### 1. Thaw the cascades

```bash
# Find the freeze receipt
ls scripts/demo-freeze-snapshots/

# Thaw — pass the timestamp suffix
scripts/demo-thaw.sh 20260506-180000
```

The next merge to `molecule-core/staging` (touching `workspace/**`) or any
template repo's `main` will resume the auto-rebuild cascade.
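
If several freezes were run during demo prep, the snapshot directory holds more than one receipt. A minimal sketch for picking the most recent one, assuming the `disabled-workflows-YYYYMMDD-HHMMSS.txt` naming that `demo-freeze.sh` writes; `latest_freeze_ts` is a hypothetical helper name:

```bash
# latest_freeze_ts <snapshot-dir>
# Prints the newest YYYYMMDD-HHMMSS suffix among disabled-workflows-*.txt,
# or returns 1 if the directory holds no receipt.
latest_freeze_ts() {
  local dir="$1" f newest=""
  for f in "$dir"/disabled-workflows-*.txt; do
    [ -e "$f" ] || continue          # glob matched nothing
    f=${f##*/disabled-workflows-}    # strip directory and prefix
    f=${f%.txt}
    # Zero-padded UTC timestamps sort correctly as plain strings.
    if [[ "$f" > "$newest" ]]; then newest="$f"; fi
  done
  if [ -n "$newest" ]; then printf '%s\n' "$newest"; else return 1; fi
}

# Usage: preview the thaw for the latest receipt before applying it.
#   scripts/demo-thaw.sh "$(latest_freeze_ts scripts/demo-freeze-snapshots)" --dry-run
```

Thawing only the latest receipt is a judgment call: an earlier freeze may have disabled a workflow that a later, partially failed freeze did not record, so it is worth a glance at the older receipts too.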

### 2. Audit what was held back

If any merges landed during the freeze, list them:

```bash
# -v-7d is BSD/macOS date syntax; with GNU date use: date -u -d '7 days ago' +%Y-%m-%d
gh pr list -R Molecule-AI/molecule-core --base staging --state merged \
  --search "merged:>=$(date -u -v-7d +%Y-%m-%d)"
```
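
The `-v-7d` form in the command above only works with BSD/macOS `date`. A minimal sketch of a wrapper that works with either flavour, assuming GNU `date` accepts `-d "N days ago"` and BSD `date` rejects it; `days_ago` is a hypothetical helper name:

```bash
# days_ago <n>: prints the UTC date n days back as YYYY-MM-DD.
# Tries the GNU form first and falls back to the BSD/macOS -v form.
days_ago() {
  date -u -d "$1 days ago" +%Y-%m-%d 2>/dev/null \
    || date -u -v-"$1"d +%Y-%m-%d
}

# Usage:
#   gh pr list -R Molecule-AI/molecule-core --base staging --state merged \
#     --search "merged:>=$(days_ago 7)"
```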

Verify each merge's CI is green and dispatch the runtime cascade once
to ensure all templates rebuild against the post-freeze HEAD.

### 3. File a post-mortem if anything fired

If any rollback lever was used during the demo, file a brief doc:

- Which lever (A through F)
- Which SHA was rolled back FROM and TO
- Whether the rollback fully resolved the issue or a follow-up was needed
- Whether the underlying regression should have been caught by CI

## Common issues + first-line fix

| Symptom | First lever to try |
|---|---|
| Workspace boots but agent always errors | Lever D (delete + reprovision) |
| Whole tenant unreachable | Lever C (redeploy) |
| Canvas crashes on load | Lever F (Vercel rollback) |
| Login broken / API errors | Lever E (Railway rollback) |
| Specific runtime broken across tenants | Lever B (template image rollback) |
| Platform container regression | Lever A (rollback-latest.sh) |
| Mid-demo stray PR auto-published a bad image | Lever B + investigate why freeze didn't catch it |

## Auth fingerprint (rotate post-demo)

The freeze + rollback procedures assume:

- `CP_ADMIN_API_TOKEN` available via `railway variables --kv --environment production`
- `gh auth token` returns a working PAT with `workflow:write` + `write:packages`
- `crane` installed (`brew install crane`)

After the demo, **rotate** `CP_ADMIN_API_TOKEN` (it's the keys-to-the-kingdom
token for production) — it likely got copy-pasted into shells during
the demo.

```bash
# Generate a new admin token
NEW_TOKEN=$(openssl rand -hex 32)

# Update Railway production env var (and optionally staging)
railway variables --set CP_ADMIN_API_TOKEN="$NEW_TOKEN" --environment production

# Restart CP service to pick up the change
# (Railway auto-restarts on env var change)

# Verify (URL quoted so shells such as zsh do not try to glob the `?`)
curl -fsS -H "Authorization: Bearer $NEW_TOKEN" \
  "https://api.moleculesai.app/cp/admin/orgs?limit=1"
```
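
The verify step above shows the new token works, but a rotation is only complete once the old token stops working. A minimal sketch of that second check, assuming the control plane answers 401 or 403 for a revoked token (not confirmed by this runbook) and that the old value was saved in a hypothetical `OLD_TOKEN` variable before rotating; `http_status` and `rotation_ok` are hypothetical helper names:

```bash
# http_status <token>: prints only the HTTP status code of an admin call.
# Defined here but not called, since it needs network access.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $1" \
    "https://api.moleculesai.app/cp/admin/orgs?limit=1"
}

# rotation_ok <new-status> <old-status>
# Succeeds only if the new token is accepted and the old one is refused.
# Assumption: a revoked token yields 401 or 403.
rotation_ok() {
  [ "$1" = "200" ] && { [ "$2" = "401" ] || [ "$2" = "403" ]; }
}

# Usage:
#   rotation_ok "$(http_status "$NEW_TOKEN")" "$(http_status "$OLD_TOKEN")" \
#     || echo "old token still valid or new token rejected" >&2
```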
6
scripts/demo-freeze-snapshots/.gitignore
vendored
Normal file
@@ -0,0 +1,6 @@
# Generated by scripts/demo-freeze.sh — receipts are operational state,
# not source. Tracked .gitignore + .gitkeep keep the directory itself
# in version control so the freeze script's output dir always exists.
*
!.gitignore
!.gitkeep
0
scripts/demo-freeze-snapshots/.gitkeep
Normal file
214
scripts/demo-freeze.sh
Executable file
@@ -0,0 +1,214 @@
#!/usr/bin/env bash
# demo-freeze.sh — disable the runtime + template image publish cascades
# during a demo-prep window so a stray staging merge can't auto-rebuild
# `:latest` for the 8 workspace-template images mid-demo.
#
# Demo prep typically runs T-48h to T+1h. During that window:
#
#   PATH 1: any merge to molecule-core/staging that touches workspace/**
#     → publish-runtime.yml fires
#     → PyPI auto-bumps molecule-ai-workspace-runtime patch version
#     → repository_dispatch fans out to 8 workspace-template-* repos
#     → each template repo rebuilds and re-tags
#       ghcr.io/molecule-ai/workspace-template-<runtime>:latest
#
#   PATH 2: any merge to a workspace-template-* repo's main branch
#     → that repo's publish-image.yml fires
#     → ghcr.io/molecule-ai/workspace-template-<runtime>:latest
#       gets re-tagged
#
# provisioner.go:296 RuntimeImages[runtime] reads `:latest` at every
# workspace boot. A new workspace provision during demo pulls whatever
# `:latest` resolved to seconds earlier — so a bad merge minutes
# before the demo can break a tenant the funder is about to see.
#
# This script captures the current good `:latest` digests for all 8
# templates and disables both cascade vectors. The complementary
# demo-thaw.sh re-enables them.
#
# Usage:
#   scripts/demo-freeze.sh             # dry run — print what would happen
#   scripts/demo-freeze.sh --execute   # actually disable workflows + snapshot
#
# Prereqs:
#   - gh CLI authenticated with workflow:write scope on Molecule-AI org
#   - curl + jq (for digest snapshot via GHCR anonymous registry API)
#
# Output:
#   <snapshot dir>/digests-YYYYMMDD-HHMMSS.txt
#     One line per template: "<runtime>: <digest>"
#   <snapshot dir>/disabled-workflows-YYYYMMDD-HHMMSS.txt
#     One line per disabled workflow: "<repo>: <workflow>"
#
# Exit codes:
#   0 — freeze complete (or dry-run successful)
#   1 — pre-flight failure (missing tooling, missing auth, etc.)
#   2 — partial freeze (some workflows did not disable cleanly; see log)

set -euo pipefail

usage() {
  cat <<'USAGE'
demo-freeze.sh — disable the runtime + template image publish cascades
during a demo-prep window.

Captures current :latest digests for all 8 workspace-template-* images
and disables the workflows that would otherwise re-tag them.

Usage:
  scripts/demo-freeze.sh             # dry run — print what would happen
  scripts/demo-freeze.sh --execute   # actually disable workflows + snapshot

See the comment block at the top of this script for the full procedure.
USAGE
}

EXECUTE=0
case "${1:-}" in
  --execute)
    EXECUTE=1
    ;;
  --help|-h)
    usage
    exit 0
    ;;
  "")
    ;;
  *)
    echo "unknown arg: $1" >&2
    usage >&2
    exit 2
    ;;
esac

# Templates and their GHCR repository slugs. Source of truth for the
# runtime → image map is workspace-server/internal/provisioner/provisioner.go
# RuntimeImages — keep this list in sync if a runtime is added.
TEMPLATES=(
  "claude-code"
  "hermes"
  "openclaw"
  "langgraph"
  "deepagents"
  "crewai"
  "autogen"
  "gemini-cli"
)

# Pre-flight: required tooling.
need() {
  command -v "$1" >/dev/null || { echo "ERROR: missing required tool: $1" >&2; exit 1; }
}
need gh
need curl
need jq

# Pre-flight: gh auth. Snapshot via anonymous GHCR token works without
# org auth, but workflow disable needs an authenticated gh.
if ! gh auth status >/dev/null 2>&1; then
  echo "ERROR: gh not authenticated. Run 'gh auth login' first." >&2
  exit 1
fi

# Snapshot location relative to this script. Keeping it under scripts/
# rather than a temp dir means freeze receipts are easy to find again
# during the actual demo.
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
SNAPSHOT_DIR="${SCRIPT_DIR}/demo-freeze-snapshots"
mkdir -p "$SNAPSHOT_DIR"
TS="$(date -u +%Y%m%d-%H%M%S)"
DIGESTS_FILE="${SNAPSHOT_DIR}/digests-${TS}.txt"
WORKFLOWS_FILE="${SNAPSHOT_DIR}/disabled-workflows-${TS}.txt"

if [ $EXECUTE -eq 0 ]; then
  echo "=== DRY RUN (no changes will be made; pass --execute to apply) ==="
else
  echo "=== EXECUTING FREEZE — workflows will be disabled ==="
fi
echo "Snapshot timestamp: $TS"
echo "Digest log: $DIGESTS_FILE"
echo "Workflow log: $WORKFLOWS_FILE"
echo

# Step 1: capture current :latest digest for each template.
echo "→ Capturing current :latest digests"
for tpl in "${TEMPLATES[@]}"; do
  token=$(curl -fsS "https://ghcr.io/token?scope=repository:molecule-ai/workspace-template-${tpl}:pull" | jq -r .token 2>/dev/null || true)
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo " WARN: token fetch failed for $tpl — skipping digest capture"
    continue
  fi
  # The trailing `|| true` stops a failed curl or an empty grep from
  # aborting the script under `set -e -o pipefail`, so the empty-digest
  # check below can warn and move on to the next template.
  digest=$(curl -fsSI \
    -H "Authorization: Bearer $token" \
    -H "Accept: application/vnd.oci.image.index.v1+json" \
    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    "https://ghcr.io/v2/molecule-ai/workspace-template-${tpl}/manifests/latest" 2>/dev/null \
    | grep -i 'docker-content-digest' \
    | awk '{print $2}' \
    | tr -d '\r' || true)
  if [ -z "$digest" ]; then
    echo " WARN: digest fetch failed for $tpl"
    continue
  fi
  echo " $tpl: $digest"
  if [ $EXECUTE -eq 1 ]; then
    echo "$tpl: $digest" >> "$DIGESTS_FILE"
  fi
done
echo

# Step 2: disable publish-runtime.yml in molecule-core (PATH 1 source).
# PARTIAL_FAIL is initialised before this step so that a failure here is
# also reported as exit code 2, not only failures in step 3.
PARTIAL_FAIL=0
echo "→ Disabling publish-runtime.yml in molecule-core (kills runtime → 8-template cascade)"
if [ $EXECUTE -eq 1 ]; then
  if gh workflow disable publish-runtime.yml -R Molecule-AI/molecule-core 2>/tmp/freeze.err; then
    echo " OK molecule-core/publish-runtime.yml disabled"
    echo "Molecule-AI/molecule-core: publish-runtime.yml" >> "$WORKFLOWS_FILE"
  else
    echo " FAIL molecule-core/publish-runtime.yml: $(cat /tmp/freeze.err)" >&2
    PARTIAL_FAIL=1
  fi
else
  echo " (dry-run) would disable: gh workflow disable publish-runtime.yml -R Molecule-AI/molecule-core"
fi
echo

# Step 3: disable publish-image.yml in each of the 8 template repos (PATH 2 sources).
echo "→ Disabling publish-image.yml in each workspace-template-* repo"
for tpl in "${TEMPLATES[@]}"; do
  repo="Molecule-AI/molecule-ai-workspace-template-${tpl}"
  if [ $EXECUTE -eq 1 ]; then
    if gh workflow disable publish-image.yml -R "$repo" 2>/tmp/freeze.err; then
      echo " OK $repo/publish-image.yml disabled"
      echo "${repo}: publish-image.yml" >> "$WORKFLOWS_FILE"
    else
      echo " FAIL $repo/publish-image.yml: $(cat /tmp/freeze.err)" >&2
      PARTIAL_FAIL=1
    fi
  else
    echo " (dry-run) would disable: gh workflow disable publish-image.yml -R $repo"
  fi
done
echo

if [ $EXECUTE -eq 0 ]; then
  echo "=== DRY RUN COMPLETE ==="
  echo "Re-run with --execute to apply the freeze."
  exit 0
fi

echo "=== FREEZE COMPLETE ==="
echo "Receipts: $DIGESTS_FILE"
echo " $WORKFLOWS_FILE"
echo
echo "Next steps:"
# --all is needed because gh workflow list hides disabled workflows by default.
echo " - Verify by running: gh workflow list --all -R Molecule-AI/molecule-core | grep publish-runtime"
echo " Status should be 'disabled_manually'."
echo " - Demo proceeds; new workspaces pull the snapshotted :latest digests."
echo " - Post-demo, run: scripts/demo-thaw.sh ${TS}"
echo " to re-enable every workflow this freeze disabled."
echo
if [ $PARTIAL_FAIL -ne 0 ]; then
  echo "WARNING: one or more workflows did not disable cleanly. Re-run after fixing." >&2
  exit 2
fi
exit 0
124
scripts/demo-thaw.sh
Executable file
@@ -0,0 +1,124 @@
#!/usr/bin/env bash
# demo-thaw.sh — re-enable workflows that demo-freeze.sh disabled.
#
# Usage:
#   scripts/demo-thaw.sh <freeze-timestamp>
#   scripts/demo-thaw.sh 20260503-180000
#
# Reads disabled-workflows-<ts>.txt produced by demo-freeze.sh and
# runs `gh workflow enable` for each entry. Idempotent — re-enabling
# an already-enabled workflow is a no-op.
#
# Defaults to executing (the inverse of freeze, which defaults to
# dry-run). Pass --dry-run to print without executing.
#
# Prereqs:
#   - gh CLI authenticated with workflow:write scope on Molecule-AI org
#
# Exit codes:
#   0 — all workflows re-enabled
#   1 — pre-flight failure (missing receipt file, missing tooling)
#   2 — partial thaw (some workflows did not enable; check output)

set -euo pipefail

usage() {
  cat <<'USAGE'
demo-thaw.sh — re-enable workflows that demo-freeze.sh disabled.

Usage:
  scripts/demo-thaw.sh <freeze-timestamp>             # apply
  scripts/demo-thaw.sh <freeze-timestamp> --dry-run   # print without applying

<freeze-timestamp> is the YYYYMMDD-HHMMSS suffix on
scripts/demo-freeze-snapshots/disabled-workflows-*.txt produced by
demo-freeze.sh.
USAGE
}

DRY_RUN=0
TS=""
for arg in "$@"; do
  case "$arg" in
    --dry-run)
      DRY_RUN=1
      ;;
    --help|-h)
      usage
      exit 0
      ;;
    *)
      if [ -z "$TS" ]; then
        TS="$arg"
      else
        echo "unknown arg: $arg" >&2
        usage >&2
        exit 2
      fi
      ;;
  esac
done

if [ -z "$TS" ]; then
  echo "usage: $0 <freeze-timestamp> [--dry-run]" >&2
  echo " e.g. $0 20260503-180000" >&2
  echo " <freeze-timestamp> is the YYYYMMDD-HHMMSS suffix on demo-freeze-snapshots/disabled-workflows-*.txt" >&2
  exit 2
fi

command -v gh >/dev/null || { echo "ERROR: gh CLI required" >&2; exit 1; }
if ! gh auth status >/dev/null 2>&1; then
  echo "ERROR: gh not authenticated. Run 'gh auth login' first." >&2
  exit 1
fi

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
WORKFLOWS_FILE="${SCRIPT_DIR}/demo-freeze-snapshots/disabled-workflows-${TS}.txt"

if [ ! -f "$WORKFLOWS_FILE" ]; then
  echo "ERROR: receipt not found: $WORKFLOWS_FILE" >&2
  echo "Available receipts:" >&2
  ls "${SCRIPT_DIR}/demo-freeze-snapshots/" 2>/dev/null | grep '^disabled-workflows-' >&2 || echo " (none)" >&2
  exit 1
fi

if [ $DRY_RUN -eq 1 ]; then
  echo "=== DRY RUN (no changes will be made) ==="
else
  echo "=== THAWING — re-enabling workflows ==="
fi
echo "Reading: $WORKFLOWS_FILE"
echo

PARTIAL_FAIL=0
# Each receipt line is "<repo>: <workflow>"; IFS=': ' splits it into the two fields.
while IFS=': ' read -r repo workflow; do
  [ -z "$repo" ] && continue
  if [ $DRY_RUN -eq 1 ]; then
    echo " (dry-run) would enable: gh workflow enable $workflow -R $repo"
  else
    if gh workflow enable "$workflow" -R "$repo" 2>/tmp/thaw.err; then
      echo " OK $repo/$workflow re-enabled"
    else
      echo " FAIL $repo/$workflow: $(cat /tmp/thaw.err)" >&2
      PARTIAL_FAIL=1
    fi
  fi
done < "$WORKFLOWS_FILE"

echo
if [ $DRY_RUN -eq 1 ]; then
  echo "=== DRY RUN COMPLETE ==="
  echo "Re-run without --dry-run to apply."
  exit 0
fi

echo "=== THAW COMPLETE ==="
echo "Cascades restored. Next workspace/** push to molecule-core/staging will"
echo "auto-publish the runtime wheel and fan out to template rebuilds as normal."
if [ $PARTIAL_FAIL -ne 0 ]; then
  echo
  echo "WARNING: one or more workflows did not re-enable cleanly. Re-run or enable manually:" >&2
  # --all is needed because gh workflow list hides disabled workflows by default.
  echo " gh workflow list --all -R <repo>" >&2
  exit 2
fi
exit 0