ops: demo-day freeze + rollback runbook

Demo-day preparation bundle for the funding demo (~2026-05-06).

Adds:

- scripts/demo-freeze.sh: captures current ghcr.io workspace-template-*
  :latest digests for all 8 runtimes, then disables both cascade vectors
  that could re-tag :latest mid-demo: publish-runtime.yml in molecule-core
  (PATH 1: staging push to workspace/** auto-bumps the wheel and fans out
  to 8 templates) and publish-image.yml in each of the 8 template repos
  (PATH 2: direct template repo merge re-tags :latest). Defaults to
  dry-run; requires --execute to apply. Writes both digest + workflow
  receipts to scripts/demo-freeze-snapshots/.

- scripts/demo-thaw.sh: re-enables every workflow demo-freeze.sh
  disabled, keyed off the receipt timestamp. Defaults to executing (the
  inverse safety polarity from freeze, where the destructive default is
  dry-run). --dry-run prints without applying.

- scripts/demo-day-runbook.md: operator runbook indexing the six rollback
  levers (platform image rollback, template image rollback, tenant
  redeploy, workspace delete, Railway rollback, Vercel rollback) plus
  pre-warm timing and post-demo cleanup. Also covers read-only
  diagnostics for "is this working?" moments and the CP_ADMIN_API_TOKEN
  rotation step that must follow demo (the token gets copy-pasted into
  shells during incident response).

- scripts/demo-freeze-snapshots/.gitignore: generated freeze receipts are
  operational state, not source. Tracked .gitkeep so the directory exists
  when the script writes to it.

Both scripts dry-run-tested locally. Did not exercise --execute since
that would actually disable production workflows mid-development.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:04:30 -07:00 · 2026-05-01 12:04:30 -07:00 · 6d23611620
commit 6d23611620
parent 092724b6d7
5 changed files with 650 additions and 0 deletions
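
The digest receipt that demo-freeze.sh writes holds one `<runtime>: <digest>` line per template, and Lever B in the runbook pulls the value out with a plain `grep`. A minimal sketch, assuming that receipt format, of an exact-match lookup that prints the retag command instead of running it. The function name `digest_for` and the sample receipt contents are hypothetical and not part of this commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the digest recorded for exactly one runtime
# in a freeze receipt whose lines look like "<runtime>: sha256:<hex>".
digest_for() {
  local runtime="$1" receipt="$2" name digest
  while IFS=' ' read -r name digest; do
    # name still carries the trailing colon, e.g. "hermes:"
    if [ "$name" = "${runtime}:" ]; then
      printf '%s\n' "$digest"
      return 0
    fi
  done < "$receipt"
  return 1
}

# Demo against a made-up receipt; real ones live in scripts/demo-freeze-snapshots/.
receipt="$(mktemp)"
printf '%s\n' 'claude-code: sha256:aaaa' 'hermes: sha256:bbbb' > "$receipt"
d="$(digest_for hermes "$receipt")"
# Only echoes the command; nothing is retagged.
echo "crane tag ghcr.io/molecule-ai/workspace-template-hermes@${d} latest"
rm -f "$receipt"
```

Matching on the whole runtime field avoids picking up a second line if one runtime name is ever a substring of another, which a bare `grep claude-code` would do.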
--- a/scripts/demo-day-runbook.md
+++ b/scripts/demo-day-runbook.md
@@ -0,0 +1,306 @@
+# Demo-day runbook
+
+Pre-, during-, and post-demo operational procedures for the molecule
+production stack. Updated 2026-05-01 ahead of the funding-demo on
+~2026-05-06.
+
+The whole stack:
+
+```
+Vercel canvas (app.moleculesai.app)
+  → Railway controlplane (api.moleculesai.app)
+    → CloudFront/Cloudflare per-tenant edge (<slug>.moleculesai.app)
+      → EC2 tenant instance running platform container
+        → Docker workspaces pulled from
+          ghcr.io/molecule-ai/workspace-template-<runtime>:latest
+```
+
+Every layer has its own deploy/rollback story. This runbook indexes
+them in the order an operator would touch them during an incident.
+
+## Pre-demo (T-48h to T-1h)
+
+### 1. Freeze the runtime + template image cascade
+
+A merge to `molecule-core/staging` that touches `workspace/**` triggers
+`publish-runtime.yml` → PyPI bump → repository_dispatch → 8 template
+repos rebuild and re-tag `:latest`. A merge to any template repo's
+`main` triggers the same final re-tag directly. Either path means a
+new workspace provision during the demo pulls whatever `:latest`
+resolved to seconds earlier.
+
+Capture current good digests + disable both cascade vectors:
+
+```bash
+# Dry-run first — verifies digests can be fetched and tooling is set up
+scripts/demo-freeze.sh
+
+# Apply
+scripts/demo-freeze.sh --execute
+```
+
+The script writes two receipts to `scripts/demo-freeze-snapshots/`:
+
+- `digests-<TS>.txt` — current `:latest` digest per template (rollback target if needed)
+- `disabled-workflows-<TS>.txt` — workflow paths to re-enable post-demo
+
+Verify the freeze landed:
+
+```bash
+gh workflow list --all -R Molecule-AI/molecule-core | grep publish-runtime
+# expect: status = disabled_manually (--all is required: disabled workflows are hidden by default)
+```
+
+If a critical fix MUST ship during the freeze window:
+
+1. `gh workflow enable publish-runtime.yml -R Molecule-AI/molecule-core`
+2. Merge the fix
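+3. Watch the cascade through to GHCR `:latest` manually. The freeze also disabled `publish-image.yml` in every template repo, so if the rebuild runs through that workflow, enable it for the affected templates first and disable it again in step 5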
+4. Smoke-verify against a staging tenant (`scripts/api-smoke.sh` or
+   manual canvas walkthrough)
+5. `gh workflow disable publish-runtime.yml -R Molecule-AI/molecule-core` to re-freeze
+
+Don't auto-promote during the freeze — the value of the freeze is that
+nothing happens automatically.
+
+### 2. Confirm production CP is on the expected SHA
+
+```bash
+gh run list -R Molecule-AI/molecule-controlplane --branch main --limit 5
+# Last `ci` run should be SUCCESS with the SHA you intend to demo on
+```
+
+Railway auto-deploys from main. Spot-check `api.moleculesai.app`:
+
+```bash
+curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
+  "https://api.moleculesai.app/cp/admin/orgs?limit=1"
+# Expect: 200 + a JSON {"orgs": [...]}
+```
+
+### 3. Confirm production canvas (Vercel) is on main
+
+Vercel auto-deploys `main`. Verify in the Vercel dashboard the most
+recent prod deploy ran from the expected commit SHA.
+
+### 4. Pre-warm the demo tenant
+
+Cold-start times on workspace-template images:
+
+| Runtime | Cold-start (first boot) |
+|---|---|
+| claude-code | ~30-60s |
+| openclaw | ~1-2 min |
+| langgraph | ~1 min |
+| hermes | **~7 min** (large image) |
+
+If the demo will use `hermes`, provision the demo workspace at least
+10 min before. The cold-start clock starts when the workspace is
+created, not when it's used.
+
+## During demo — emergency rollback levers
+
+### Lever A: Platform-image rollback (canvas/CP layer regression)
+
+If the canvas or platform container shipped a regression, retag
+`:latest` to a prior staging SHA without rebuilding:
+
+```bash
+# Find a known-good SHA from staging history
+gh run list -R Molecule-AI/molecule-core --workflow=publish-canvas-image.yml --limit 5
+
+# Roll both platform + tenant images
+GITHUB_TOKEN=$(gh auth token) scripts/rollback-latest.sh <good-sha>
+```
+
+`rollback-latest.sh` retags both `ghcr.io/molecule-ai/platform:latest`
+and `ghcr.io/molecule-ai/platform-tenant:latest`. Existing tenants
+auto-pull `:latest` every 5 min — rollback propagates without manual
+restart.
+
+### Lever B: Workspace-template image rollback
+
+If a specific runtime template (claude-code, hermes, etc.) shipped a
+broken `:latest`:
+
+```bash
+# Get the demo's snapshotted-good digest from the freeze receipt
+grep claude-code scripts/demo-freeze-snapshots/digests-<TS>.txt
+
+# Retag :latest to the snapshotted digest using crane (the receipt value already starts with "sha256:")
+crane auth login ghcr.io -u "$(gh api user --jq .login)" \
+  --password-stdin <<< "$(gh auth token)"
+crane tag \
+  ghcr.io/molecule-ai/workspace-template-claude-code@<digest> \
+  latest
+```
+
+The next workspace provision pulls the rolled-back image. Existing
+workspaces are unaffected (their image is already loaded into Docker).
+
+### Lever C: Wedged demo tenant — redeploy
+
+If the demo tenant's EC2 instance is wedged (boot succeeded but app
+not responding, or a stuck workspace), the controlplane has an admin
+redeploy endpoint:
+
+```bash
+# AWS-side: forces a fresh EC2 launch with current image. ~3 min.
+curl -fsS -X POST \
+  -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
+  https://api.moleculesai.app/cp/admin/orgs/<slug>/redeploy
+```
+
+WARNING: this triggers real EC2 + SSM actions on production.
+Double-check `<slug>` against the demo tenant's slug before pressing
+return. The `/redeploy` endpoint is idempotent on the EC2 side but
+WILL drop active SSH sessions.
+
+### Lever D: Specific bad workspace — delete
+
+If a single workspace inside the demo tenant is misbehaving (e.g.
+hermes wedged on cold-start, claude-code returning the generic
+"Agent error (Exception)" message), kill it:
+
+```bash
+# Get the demo tenant's per-tenant ADMIN_TOKEN
+TENANT_ADMIN=$(curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
+  https://api.moleculesai.app/cp/admin/orgs/<slug>/admin-token \
+  | jq -r .admin_token)
+
+ORG_ID=$(curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
+  "https://api.moleculesai.app/cp/admin/orgs?limit=20" \
+  | jq -r '.orgs[] | select(.slug=="<slug>") | .id')
+
+# Delete the bad workspace
+curl -fsS -X DELETE \
+  -H "Origin: https://<slug>.moleculesai.app" \
+  -H "Authorization: Bearer $TENANT_ADMIN" \
+  -H "X-Molecule-Org-Id: $ORG_ID" \
+  https://<slug>.moleculesai.app/workspaces/<workspace-id>
+```
+
+Then re-provision a fresh workspace from the canvas. Faster than
+debugging the wedged one.
+
+### Lever E: Railway production rollback (CP regression)
+
+If the last Railway deploy of CP introduced a regression that lever A
+can't fix (e.g. a logic bug, not a container issue):
+
+1. Open Railway dashboard → molecule-platform → controlplane → Deployments
+2. Find the previous-known-good deployment
+3. Click **Rollback to this deployment**
+
+Manual step — no CLI equivalent built. Takes ~30s to redeploy from
+the prior image. Note: rollback restores the prior code AND prior env
+var snapshot; don't expect any env var changes made since to persist.
+
+### Lever F: Vercel production rollback (canvas regression)
+
+If the canvas ships a regression:
+
+1. Open Vercel dashboard → molecule-app → Deployments
+2. Find the previous prod deployment
+3. **Promote to Production**
+
+Same pattern as Railway — fast revert, no rebuild.
+
+## Tenant-level read-only diagnostics (not actions)
+
+Useful during an "is this working?" moment without touching anything:
+
+```bash
+# Tenant infra state
+curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
+  "https://api.moleculesai.app/cp/admin/orgs?limit=20" \
+  | jq '.orgs[] | select(.slug=="<slug>")'
+
+# Tenant boot events (debug a stuck provision)
+curl -fsS -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
+  "https://api.moleculesai.app/cp/admin/tenants/<slug>/boot-events?limit=50" \
+  | jq
+
+# Workspace activity (debug an unresponsive agent); needs TENANT_ADMIN and ORG_ID from Lever D
+curl -fsS \
+  -H "Origin: https://<slug>.moleculesai.app" \
+  -H "Authorization: Bearer $TENANT_ADMIN" \
+  -H "X-Molecule-Org-Id: $ORG_ID" \
+  "https://<slug>.moleculesai.app/workspaces/<workspace-id>/activity?limit=20" \
+  | jq
+```
+
+## Post-demo (T+30m to T+24h)
+
+### 1. Thaw the cascades
+
+```bash
+# Find the freeze receipt
+ls scripts/demo-freeze-snapshots/
+
+# Thaw — pass the timestamp suffix
+scripts/demo-thaw.sh 20260506-180000
+```
+
+The next merge to `molecule-core/staging` (workspace/**) or any
+template repo's `main` will resume the auto-rebuild cascade.
+
+### 2. Audit what was held back
+
+If any PRs merged during the freeze, list them:
+
+```bash
+gh pr list -R Molecule-AI/molecule-core --base staging --state merged \
+  --search "merged:>=$(date -u -v-7d +%Y-%m-%d)"  # BSD/macOS date; with GNU date use: date -u -d '7 days ago' +%Y-%m-%d
+```
+
+Verify each merge's CI is green and dispatch the runtime cascade once
+to ensure all templates rebuild against the post-freeze HEAD.
+
+### 3. File a post-mortem if anything fired
+
+If any rollback lever was used during the demo, file a brief doc:
+
+- Which lever (A through F)
+- Which SHA was rolled back FROM and TO
+- Did the rollback fully resolve the issue or was a follow-up needed
+- Whether the underlying regression should have been caught by CI
+
+## Common issues + first-line fix
+
+| Symptom | First lever to try |
+|---|---|
+| Workspace boots but agent always errors | Lever D (delete + reprovision) |
+| Whole tenant unreachable | Lever C (redeploy) |
+| Canvas crashes on load | Lever F (Vercel rollback) |
+| Login broken / API errors | Lever E (Railway rollback) |
+| Specific runtime broken across tenants | Lever B (template image rollback) |
+| Platform container regression | Lever A (rollback-latest.sh) |
+| Mid-demo stray PR auto-published a bad image | Lever B + investigate why freeze didn't catch it |
+
+## Auth fingerprint (rotate post-demo)
+
+The freeze + rollback procedures assume:
+
+- `CP_ADMIN_API_TOKEN` available via `railway variables --kv --environment production`
+- `gh auth token` returns a working PAT with `workflow:write` + `write:packages`
+- `crane` installed (`brew install crane`)
+
+After the demo, **rotate** `CP_ADMIN_API_TOKEN` (it's the keys-to-the-kingdom
+token for production) — it likely got copy-pasted into shells during
+the demo.
+
+```bash
+# Generate a new admin token
+NEW_TOKEN=$(openssl rand -hex 32)
+
+# Update Railway production env var (and optionally staging)
+railway variables --set CP_ADMIN_API_TOKEN="$NEW_TOKEN" --environment production
+
+# Restart CP service to pick up the change
+# (Railway auto-restarts on env var change)
+
+# Verify
+curl -fsS -H "Authorization: Bearer $NEW_TOKEN" \
+  "https://api.moleculesai.app/cp/admin/orgs?limit=1"
+```
--- a/scripts/demo-freeze-snapshots/.gitignore
+++ b/scripts/demo-freeze-snapshots/.gitignore
@@ -0,0 +1,6 @@
+# Generated by scripts/demo-freeze.sh — receipts are operational state,
+# not source. Tracked .gitignore + .gitkeep keep the directory itself
+# in version control so the freeze script's output dir always exists.
+*
+!.gitignore
+!.gitkeep
--- a/scripts/demo-freeze-snapshots/.gitkeep
+++ b/scripts/demo-freeze-snapshots/.gitkeep
--- a/scripts/demo-freeze.sh
+++ b/scripts/demo-freeze.sh
@@ -0,0 +1,214 @@
+#!/usr/bin/env bash
+# demo-freeze.sh — disable the runtime + template image publish cascades
+# during a demo-prep window so a stray staging merge can't auto-rebuild
+# `:latest` for the 8 workspace-template images mid-demo.
+#
+# Demo prep typically runs T-48h to T+1h. During that window:
+#
+#   PATH 1: any merge to molecule-core/staging that touches workspace/**
+#           → publish-runtime.yml fires
+#           → PyPI auto-bumps molecule-ai-workspace-runtime patch version
+#           → repository_dispatch fans out to 8 workspace-template-* repos
+#           → each template repo rebuilds and re-tags
+#             ghcr.io/molecule-ai/workspace-template-<runtime>:latest
+#
+#   PATH 2: any merge to a workspace-template-* repo's main branch
+#           → that repo's publish-image.yml fires
+#           → ghcr.io/molecule-ai/workspace-template-<runtime>:latest
+#             gets re-tagged
+#
+#   provisioner.go:296 RuntimeImages[runtime] reads `:latest` at every
+#   workspace boot. A new workspace provision during demo pulls whatever
+#   `:latest` resolved to seconds earlier — so a bad merge minutes
+#   before the demo can break a tenant the funder is about to see.
+#
+# This script captures the current good `:latest` digests for all 8
+# templates and disables both cascade vectors. The complementary
+# demo-thaw.sh re-enables them.
+#
+# Usage:
+#   scripts/demo-freeze.sh                # dry run — print what would happen
+#   scripts/demo-freeze.sh --execute      # actually disable workflows + snapshot
+#
+# Prereqs:
+#   - gh CLI authenticated with workflow:write scope on Molecule-AI org
+#   - curl + jq (for digest snapshot via GHCR anonymous registry API)
+#
+# Output:
+#   <snapshot dir>/digests-YYYYMMDD-HHMMSS.txt
+#     One line per template: "<runtime>: <digest>"
+#   <snapshot dir>/disabled-workflows-YYYYMMDD-HHMMSS.txt
+#     One line per disabled workflow: "<repo>: <workflow>"
+#
+# Exit codes:
+#   0 — freeze complete (or dry-run successful)
+#   1 — pre-flight failure (missing tooling, missing auth, etc.)
+#   2 — partial freeze (some workflows did not disable cleanly; see log)
+
+set -euo pipefail
+
+usage() {
+  cat <<'USAGE'
+demo-freeze.sh — disable the runtime + template image publish cascades
+during a demo-prep window.
+
+Captures current :latest digests for all 8 workspace-template-* images
+and disables the workflows that would otherwise re-tag them.
+
+Usage:
+  scripts/demo-freeze.sh                # dry run — print what would happen
+  scripts/demo-freeze.sh --execute      # actually disable workflows + snapshot
+
+See the comment block at the top of this script for the full procedure.
+USAGE
+}
+
+EXECUTE=0
+case "${1:-}" in
+  --execute)
+    EXECUTE=1
+    ;;
+  --help|-h)
+    usage
+    exit 0
+    ;;
+  "")
+    ;;
+  *)
+    echo "unknown arg: $1" >&2
+    usage >&2
+    exit 2
+    ;;
+esac
+
+# Templates and their GHCR repository slugs. Source of truth for the
+# runtime → image map is workspace-server/internal/provisioner/provisioner.go
+# RuntimeImages — keep this list in sync if a runtime is added.
+TEMPLATES=(
+  "claude-code"
+  "hermes"
+  "openclaw"
+  "langgraph"
+  "deepagents"
+  "crewai"
+  "autogen"
+  "gemini-cli"
+)
+
+# Pre-flight: required tooling.
+need() {
+  command -v "$1" >/dev/null || { echo "ERROR: missing required tool: $1" >&2; exit 1; }
+}
+need gh
+need curl
+need jq
+
+# Pre-flight: gh auth. Snapshot via anonymous GHCR token works without
+# org auth, but workflow disable needs an authenticated gh.
+if ! gh auth status >/dev/null 2>&1; then
+  echo "ERROR: gh not authenticated. Run 'gh auth login' first." >&2
+  exit 1
+fi
+
+# Snapshot location relative to this script. Keeping it under scripts/
+# rather than a temp dir means freeze receipts are easy to find again
+# during the actual demo.
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+SNAPSHOT_DIR="${SCRIPT_DIR}/demo-freeze-snapshots"
+mkdir -p "$SNAPSHOT_DIR"
+TS="$(date -u +%Y%m%d-%H%M%S)"
+DIGESTS_FILE="${SNAPSHOT_DIR}/digests-${TS}.txt"
+WORKFLOWS_FILE="${SNAPSHOT_DIR}/disabled-workflows-${TS}.txt"
+
+if [ $EXECUTE -eq 0 ]; then
+  echo "=== DRY RUN (no changes will be made; pass --execute to apply) ==="
+else
+  echo "=== EXECUTING FREEZE — workflows will be disabled ==="
+fi
+echo "Snapshot timestamp: $TS"
+echo "Digest log:    $DIGESTS_FILE"
+echo "Workflow log:  $WORKFLOWS_FILE"
+echo
+
+# Step 1: capture current :latest digest for each template.
+echo "→ Capturing current :latest digests"
+for tpl in "${TEMPLATES[@]}"; do
+  token=$(curl -fsS "https://ghcr.io/token?scope=repository:molecule-ai/workspace-template-${tpl}:pull" | jq -r .token 2>/dev/null || true)
+  if [ -z "$token" ] || [ "$token" = "null" ]; then
+    echo "  WARN: token fetch failed for $tpl — skipping digest capture"
+    continue
+  fi
+  digest=$(curl -fsSI \
+    -H "Authorization: Bearer $token" \
+    -H "Accept: application/vnd.oci.image.index.v1+json" \
+    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
+    "https://ghcr.io/v2/molecule-ai/workspace-template-${tpl}/manifests/latest" 2>/dev/null \
+    | grep -i 'docker-content-digest' \
+    | awk '{print $2}' \
+    | tr -d '\r' || true)  # || true: under pipefail a failed fetch must reach the WARN below, not abort
+  if [ -z "$digest" ]; then
+    echo "  WARN: digest fetch failed for $tpl"
+    continue
+  fi
+  echo "  $tpl: $digest"
+  if [ $EXECUTE -eq 1 ]; then
+    echo "$tpl: $digest" >> "$DIGESTS_FILE"
+  fi
+done
+echo
+
+# Step 2: disable publish-runtime.yml in molecule-core (PATH 1 source).
+PARTIAL_FAIL=0  # set to 1 if any workflow (step 2 or step 3) fails to disable
+echo "→ Disabling publish-runtime.yml in molecule-core (kills runtime → 8-template cascade)"
+if [ $EXECUTE -eq 1 ]; then
+  if gh workflow disable publish-runtime.yml -R Molecule-AI/molecule-core 2>/tmp/freeze.err; then
+    echo "  OK   molecule-core/publish-runtime.yml disabled"
+    echo "Molecule-AI/molecule-core: publish-runtime.yml" >> "$WORKFLOWS_FILE"
+  else
+    echo "  FAIL molecule-core/publish-runtime.yml: $(cat /tmp/freeze.err)" >&2; PARTIAL_FAIL=1
+  fi
+else
+  echo "  (dry-run) would disable: gh workflow disable publish-runtime.yml -R Molecule-AI/molecule-core"
+fi
+echo
+
+# Step 3: disable publish-image.yml in each of the 8 template repos (PATH 2 sources).
+echo "→ Disabling publish-image.yml in each workspace-template-* repo"
+for tpl in "${TEMPLATES[@]}"; do
+  repo="Molecule-AI/molecule-ai-workspace-template-${tpl}"
+  if [ $EXECUTE -eq 1 ]; then
+    if gh workflow disable publish-image.yml -R "$repo" 2>/tmp/freeze.err; then
+      echo "  OK   $repo/publish-image.yml disabled"
+      echo "${repo}: publish-image.yml" >> "$WORKFLOWS_FILE"
+    else
+      echo "  FAIL $repo/publish-image.yml: $(cat /tmp/freeze.err)" >&2
+      PARTIAL_FAIL=1
+    fi
+  else
+    echo "  (dry-run) would disable: gh workflow disable publish-image.yml -R $repo"
+  fi
+done
+echo
+
+if [ $EXECUTE -eq 0 ]; then
+  echo "=== DRY RUN COMPLETE ==="
+  echo "Re-run with --execute to apply the freeze."
+  exit 0
+fi
+
+echo "=== FREEZE COMPLETE ==="
+echo "Receipts: $DIGESTS_FILE"
+echo "          $WORKFLOWS_FILE"
+echo
+echo "Next steps:"
+echo "  - Verify by running: gh workflow list --all -R Molecule-AI/molecule-core | grep publish-runtime"
+echo "    Status should be 'disabled_manually'."
+echo "  - Demo proceeds; new workspaces pull the snapshotted :latest digests."
+echo "  - Post-demo, run: scripts/demo-thaw.sh ${TS}"
+echo "    to re-enable every workflow this freeze disabled."
+echo
+if [ $PARTIAL_FAIL -ne 0 ]; then
+  echo "WARNING: one or more workflows did not disable cleanly. Re-run after fixing." >&2
+  exit 2
+fi
+exit 0
--- a/scripts/demo-thaw.sh
+++ b/scripts/demo-thaw.sh
@@ -0,0 +1,124 @@
+#!/usr/bin/env bash
+# demo-thaw.sh — re-enable workflows that demo-freeze.sh disabled.
+#
+# Usage:
+#   scripts/demo-thaw.sh <freeze-timestamp>
+#   scripts/demo-thaw.sh 20260503-180000
+#
+# Reads disabled-workflows-<ts>.txt produced by demo-freeze.sh and
+# runs `gh workflow enable` for each entry. Idempotent — re-enabling
+# an already-enabled workflow is a no-op.
+#
+# Defaults to executing (the inverse of freeze, which defaults to
+# dry-run). Pass --dry-run to print without executing.
+#
+# Prereqs:
+#   - gh CLI authenticated with workflow:write scope on Molecule-AI org
+#
+# Exit codes:
+#   0 — all workflows re-enabled
+#   1 — pre-flight failure (missing receipt file, missing tooling)
+#   2 — partial thaw (some workflows did not enable; check output)
+
+set -euo pipefail
+
+usage() {
+  cat <<'USAGE'
+demo-thaw.sh — re-enable workflows that demo-freeze.sh disabled.
+
+Usage:
+  scripts/demo-thaw.sh <freeze-timestamp>            # apply
+  scripts/demo-thaw.sh <freeze-timestamp> --dry-run  # print without applying
+
+<freeze-timestamp> is the YYYYMMDD-HHMMSS suffix on
+scripts/demo-freeze-snapshots/disabled-workflows-*.txt produced by
+demo-freeze.sh.
+USAGE
+}
+
+DRY_RUN=0
+TS=""
+for arg in "$@"; do
+  case "$arg" in
+    --dry-run)
+      DRY_RUN=1
+      ;;
+    --help|-h)
+      usage
+      exit 0
+      ;;
+    *)
+      if [ -z "$TS" ]; then
+        TS="$arg"
+      else
+        echo "unknown arg: $arg" >&2
+        usage >&2
+        exit 2
+      fi
+      ;;
+  esac
+done
+
+if [ -z "$TS" ]; then
+  echo "usage: $0 <freeze-timestamp> [--dry-run]" >&2
+  echo "  e.g. $0 20260503-180000" >&2
+  echo "  <freeze-timestamp> is the YYYYMMDD-HHMMSS suffix on demo-freeze-snapshots/disabled-workflows-*.txt" >&2
+  exit 2
+fi
+
+command -v gh >/dev/null || { echo "ERROR: gh CLI required" >&2; exit 1; }
+if ! gh auth status >/dev/null 2>&1; then
+  echo "ERROR: gh not authenticated. Run 'gh auth login' first." >&2
+  exit 1
+fi
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+WORKFLOWS_FILE="${SCRIPT_DIR}/demo-freeze-snapshots/disabled-workflows-${TS}.txt"
+
+if [ ! -f "$WORKFLOWS_FILE" ]; then
+  echo "ERROR: receipt not found: $WORKFLOWS_FILE" >&2
+  echo "Available receipts:" >&2
+  ls "${SCRIPT_DIR}/demo-freeze-snapshots/" 2>/dev/null | grep '^disabled-workflows-' >&2 || echo "  (none)" >&2
+  exit 1
+fi
+
+if [ $DRY_RUN -eq 1 ]; then
+  echo "=== DRY RUN (no changes will be made) ==="
+else
+  echo "=== THAWING — re-enabling workflows ==="
+fi
+echo "Reading: $WORKFLOWS_FILE"
+echo
+
+PARTIAL_FAIL=0
+while IFS=': ' read -r repo workflow; do
+  [ -z "$repo" ] && continue
+  if [ $DRY_RUN -eq 1 ]; then
+    echo "  (dry-run) would enable: gh workflow enable $workflow -R $repo"
+  else
+    if gh workflow enable "$workflow" -R "$repo" 2>/tmp/thaw.err; then
+      echo "  OK   $repo/$workflow re-enabled"
+    else
+      echo "  FAIL $repo/$workflow: $(cat /tmp/thaw.err)" >&2
+      PARTIAL_FAIL=1
+    fi
+  fi
+done < "$WORKFLOWS_FILE"
+
+echo
+if [ $DRY_RUN -eq 1 ]; then
+  echo "=== DRY RUN COMPLETE ==="
+  echo "Re-run without --dry-run to apply."
+  exit 0
+fi
+
+echo "=== THAW COMPLETE ==="
+echo "Cascades restored. Next workspace/** push to molecule-core/staging will"
+echo "auto-publish the runtime wheel and fan out to template rebuilds as normal."
+if [ $PARTIAL_FAIL -ne 0 ]; then
+  echo
+  echo "WARNING: one or more workflows did not re-enable cleanly. Re-run or enable manually:" >&2
+  echo "  gh workflow list --all -R <repo>" >&2
+  exit 2
+fi
+exit 0
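
demo-thaw.sh above treats its first non-flag argument as the freeze timestamp and only discovers a typo when the receipt lookup fails. A minimal sketch, assuming receipts always carry the `YYYYMMDD-HHMMSS` suffix that demo-freeze.sh generates with `date -u +%Y%m%d-%H%M%S`, of rejecting a malformed value before the path is built. The helper name `is_freeze_ts` is hypothetical and not part of this commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not part of demo-thaw.sh: succeed only when the
# argument has the YYYYMMDD-HHMMSS shape (8 digits, a dash, 6 digits).
# It checks the shape only, not whether the date itself is valid.
is_freeze_ts() {
  [[ "$1" =~ ^[0-9]{8}-[0-9]{6}$ ]]
}

# Demo: one well-formed timestamp and two likely mistakes.
for candidate in 20260506-180000 2026-05-06 --dryrun; do
  if is_freeze_ts "$candidate"; then
    echo "ok:     $candidate"
  else
    echo "reject: $candidate"
  fi
done
```

A shape check like this would let a mistyped flag such as `--dryrun` fail with a usage error instead of a "receipt not found" message.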