molecule-core/scripts/nuke-and-rebuild.sh
Hongming Wang 44d0444aae fix(scripts): nuke-and-rebuild self-bootstraps templates; add E2E test
Two paper cuts the fix addresses:

1. nuke-and-rebuild.sh wipes the compose stack but never re-populates
   workspace-configs-templates/, org-templates/, or plugins/. Those dirs
   are .gitignored; the curated set lives in manifest.json as external
   repos cloned via clone-manifest.sh (idempotent). Without that step,
   a fresh checkout or a post-deletion run leaves the dirs empty, which
   silently hides the entire template palette in Canvas and makes
   workspace provisioning fall back to bare defaults. Symptom: "Deploy
   your first agent" shows zero templates.

2. The existing ws-* container reap was already in the script (good),
   but it only fires when this script runs. Folks running `docker compose
   down -v` directly leave orphan ws-* containers behind. Documented
   that explicitly in the script comment so future readers understand
   why those lines are critical.

The fix is just `bash clone-manifest.sh` added to the script. clone-
manifest.sh is idempotent — populated dirs short-circuit, so a re-nuke
on a healthy machine pays only a few stat calls.
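
The short-circuit described above could look roughly like the sketch below. It is not the actual clone-manifest.sh; the helper names `dir_is_populated` and `populate_if_empty` are hypothetical, and the populate command is passed in by the caller.

```bash
#!/bin/bash
# Hypothetical helper: succeeds when the directory exists and holds at
# least one entry (including dotfiles).
dir_is_populated() {
  local dir="$1"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Hypothetical helper: run the given populate command only when the
# target directory is missing or empty, so repeat runs are cheap.
populate_if_empty() {
  local dir="$1"; shift
  if dir_is_populated "$dir"; then
    echo "skip: $dir already populated"
  else
    mkdir -p "$dir"
    "$@"
    echo "populated: $dir"
  fi
}
```

On a second run the populate command is never invoked, which is what keeps a re-nuke on a healthy machine cheap.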

scripts/test-nuke-and-rebuild.sh exercises the canonical workflow end-
to-end:
  - plants a fake orphan ws-* container, then asserts it gets reaped
  - renames the manifest dirs to simulate a fresh checkout, then
    asserts they get repopulated
  - waits for /health and asserts the platform sees the same template
    count on disk as via /configs in the container (catches bind-mount
    drift)
  - asserts the image-auto-refresh watcher (PR #2114) starts, since
    that's load-bearing for the CD chain users now rely on

The test pre-flights ports 5432, 6379, and 8080 and exits 0 with a SKIP
message if a non-target compose project is holding any of them, which is
common when parallel monorepo checkouts coexist on one Docker daemon.
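
One way such a pre-flight might be structured is sketched below. The function names are hypothetical, and the sketch only detects ports published by Docker containers: a port held by a non-Docker process would pass this check.

```bash
#!/bin/bash
# Name of the compose project whose container publishes the port, or
# empty when no container does. Relies on the label Docker Compose sets.
port_holder_project() {
  docker ps --filter "publish=$1" \
    --format '{{.Label "com.docker.compose.project"}}' | head -n 1
}

# "ok" when the port is free or held by our own project, else "skip".
preflight_verdict() {
  local target="$1" holder="$2"
  if [ -z "$holder" ] || [ "$holder" = "$target" ]; then
    echo ok
  else
    echo skip
  fi
}

# Check each port; report the first conflict and return non-zero.
preflight() {
  local target="$1"; shift
  local port holder
  for port in "$@"; do
    holder="$(port_holder_project "$port")"
    if [ "$(preflight_verdict "$target" "$holder")" = skip ]; then
      echo "SKIP: port $port is held by compose project '$holder'"
      return 1
    fi
  done
  return 0
}

# Usage sketch (the project name is an assumption):
#   preflight molecule-monorepo 5432 6379 8080 || exit 0
```

Exiting 0 on a conflict keeps the test from failing on machines where a sibling checkout legitimately owns the ports.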

scripts/ is intentionally outside CI shellcheck per ci.yml comment, but
both files pass `shellcheck --severity=warning` anyway.

Defers but does not solve the runtime root-cause for orphan ws-* after
plain `docker compose down -v`: the orphan-sweeper in the platform only
reaps containers whose workspace row says status='removed', so a wiped
DB → no row → sweeper ignores them. Proper fix needs container labels
keyed to a per-platform-instance UUID so the sweeper can confidently
reap "containers I provisioned that aren't in my DB anymore" without
nuking a sibling platform's containers on a shared daemon. Tracked as
task #109's follow-up; out of scope for this PR.
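
For illustration only, the label-keyed sweep proposed above might be sketched as follows. Nothing here exists in the platform today: the label key `molecule.platform-instance` and both function names are hypothetical.

```bash
#!/bin/bash
# Names of all containers (running or not) labelled with this platform
# instance's UUID. The label key is a hypothetical choice.
list_instance_containers() {
  docker ps -a --filter "label=molecule.platform-instance=$1" \
    --format '{{.Names}}'
}

# Print every name in the first newline-separated list that is absent
# from the second: "containers I provisioned that aren't in my DB".
orphans() {
  local provisioned="$1" known="$2" name
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    if ! printf '%s\n' "$known" | grep -Fxq -- "$name"; then
      printf '%s\n' "$name"
    fi
  done <<< "$provisioned"
}

# Usage sketch (how the DB names are obtained is left open):
#   orphans "$(list_instance_containers "$INSTANCE_UUID")" "$db_names" \
#     | xargs -r docker rm -f
```

Because the filter is scoped to one instance UUID, a sibling platform's containers on the same daemon never enter the candidate list.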

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:37:04 -07:00

#!/bin/bash
# Full nuke + rebuild — one command to reset everything.
#
# What "everything" means:
#   1. The compose stack (containers + named volumes + network).
#   2. Dynamically-spawned ws-* workspace containers + their volumes.
#      These are NOT in docker-compose.yml; the provisioner creates them
#      at workspace-create time, so `compose down -v` leaves them behind.
#      Without this step, a fresh DB plus old ws-* containers = ghost
#      containers Canvas can't see, eating CPU + memory.
#   3. Repopulating the manifest-managed dirs (workspace-configs-templates/,
#      org-templates/, plugins/). These are .gitignored, so fresh checkouts
#      and post-deletion runs leave them empty, which silently hides the
#      entire template palette in Canvas. clone-manifest.sh is idempotent,
#      so re-running with already-populated dirs is a fast no-op.
#
# Usage:
#   bash scripts/nuke-and-rebuild.sh
set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
echo "=== NUKE ==="
docker compose -f "$ROOT/docker-compose.yml" down -v 2>/dev/null || true
docker ps -a --format "{{.Names}}" | grep "^ws-" | xargs -r docker rm -f 2>/dev/null || true
docker volume ls --format "{{.Name}}" | grep "^ws-" | xargs -r docker volume rm 2>/dev/null || true
docker network rm molecule-monorepo-net 2>/dev/null || true
echo " cleaned"
echo "=== POPULATE MANIFEST DIRS ==="
# Idempotent: clone-manifest.sh skips dirs that already have content, so a
# re-nuke after templates are populated is a fast no-op (a few stat calls).
# Skip with a clear warning if jq is missing — installing it is a one-time
# step documented in the README quickstart.
if command -v jq >/dev/null 2>&1; then
  bash "$ROOT/scripts/clone-manifest.sh" \
    "$ROOT/manifest.json" \
    "$ROOT/workspace-configs-templates" \
    "$ROOT/org-templates" \
    "$ROOT/plugins" 2>&1 | tail -3
else
  echo " WARNING: jq not installed, skipping template/plugin clone."
  echo " Install (brew install jq) and rerun, or Canvas's template"
  echo " palette will be empty and provisioning falls back to defaults."
fi
echo "=== REBUILD ==="
docker compose -f "$ROOT/docker-compose.yml" up -d --build
echo " platform + canvas up"
echo "=== POST-REBUILD SETUP ==="
bash "$ROOT/scripts/post-rebuild-setup.sh"