Commit Graph

Author SHA1 Message Date
Hongming Wang
44d0444aae fix(scripts): nuke-and-rebuild self-bootstraps templates; add E2E test
This fix addresses two paper cuts:

1. nuke-and-rebuild.sh wipes the compose stack but never re-populates
   workspace-configs-templates/, org-templates/, or plugins/. Those dirs
   are .gitignored: the curated set lives in manifest.json as external
   repos cloned via clone-manifest.sh (idempotent). Without that step,
   a fresh checkout or a post-deletion run leaves the dirs empty, which
   silently hides the entire template palette in Canvas and falls back
   to bare default workspace provisioning. Symptom: "Deploy your first
   agent" shows zero templates.

2. The ws-* container reap was already in the script (good), but it
   only fires when this script runs. Folks running `docker compose
   down -v` directly leave orphan ws-* containers behind. The script
   comment now documents that explicitly so future readers understand
   why those lines are critical.
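
For illustration, the reap described in item 2 could look roughly like
the sketch below. It is not copied from nuke-and-rebuild.sh:
`filter_ws_names` and `reap_ws_containers` are hypothetical names, and
the sketch assumes workspace containers are identifiable purely by a
`ws-` name prefix.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual script contents.

# Reads container names on stdin, prints only those starting with "ws-".
# "|| true" keeps the pipeline from failing when nothing matches.
filter_ws_names() {
  grep -E '^ws-' || true
}

# Force-removes every container (running or stopped) named ws-*.
reap_ws_containers() {
  local names
  names=$(docker ps -a --format '{{.Names}}' | filter_ws_names)
  if [ -z "$names" ]; then
    return 0
  fi
  # One container name per line is passed to docker rm.
  echo "$names" | xargs docker rm -f
}
```

Anchoring the match at the start of the name avoids touching unrelated
containers that merely contain "ws-" somewhere in their name.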

The fix is a single `bash clone-manifest.sh` call added to the script.
clone-manifest.sh is idempotent: populated dirs short-circuit, so a
re-nuke on a healthy machine pays only a few stat calls.
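
A minimal sketch of that short-circuit, assuming "populated" simply
means the directory exists and is non-empty. The helper names
(`dir_is_populated`, `populate_if_empty`) are illustrative and are not
taken from clone-manifest.sh, whose real check may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an idempotent "populate only if empty" guard.

# Succeeds when the directory exists and has at least one entry.
dir_is_populated() {
  local dir="$1"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Runs the given command only when the directory is missing or empty.
populate_if_empty() {
  local dir="$1"
  shift
  if dir_is_populated "$dir"; then
    echo "skip: $dir already populated"
    return 0
  fi
  mkdir -p "$dir"
  "$@"
}

# Example call (the repo URL variable is a placeholder):
#   populate_if_empty plugins git clone --depth 1 "$REPO_URL" plugins/example
```

On a healthy machine every call takes the "skip" branch, so the cost is
one directory listing per target.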

scripts/test-nuke-and-rebuild.sh exercises the canonical workflow
end-to-end:
  - plants a fake orphan ws-* container, then asserts it gets reaped
  - renames the manifest dirs to simulate a fresh checkout, then
    asserts they get repopulated
  - waits for /health and asserts the platform sees the same template
    count on disk as via /configs in the container (catches bind-mount
    drift)
  - asserts the image-auto-refresh watcher (PR #2114) starts, since
    that's load-bearing for the CD chain users now rely on

The test pre-flights ports 5432, 6379, and 8080 and exits 0 with a SKIP
message if a non-target compose project is holding any of them, which is
common when parallel monorepo checkouts coexist on one Docker daemon.
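
One way such a pre-flight could be written is sketched below. It assumes
the holder is identified through the `com.docker.compose.project` label
that Compose sets on its containers; `port_holders` and `should_skip`
are hypothetical names, and the real test may detect holders
differently. Ports held by non-compose processes are not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a port pre-flight with a SKIP exit.

# Prints the compose project of every container publishing a test port.
port_holders() {
  local port
  for port in 5432 6379 8080; do
    docker ps --filter "publish=$port" \
      --format '{{.Label "com.docker.compose.project"}}'
  done
}

# Reads project names on stdin; succeeds (meaning "skip") when any
# non-empty name differs from the target project passed as $1.
should_skip() {
  local target="$1" holder
  while IFS= read -r holder; do
    if [ -n "$holder" ] && [ "$holder" != "$target" ]; then
      return 0
    fi
  done
  return 1
}

# Example usage (TARGET_PROJECT is a placeholder variable):
#   if port_holders | should_skip "$TARGET_PROJECT"; then
#     echo "SKIP: ports held by another compose project"
#     exit 0
#   fi
```

Exiting 0 on SKIP keeps the test from failing on machines where a
sibling checkout legitimately owns the ports.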

scripts/ is intentionally outside CI shellcheck per the comment in
ci.yml, but both files pass `shellcheck --severity=warning` anyway.

This PR defers but does not solve the runtime root cause of orphan ws-*
containers after a plain `docker compose down -v`: the orphan-sweeper in
the platform only reaps containers whose workspace row says
status='removed', so after a DB wipe there is no row and the sweeper
ignores them. A proper fix needs container labels keyed to a
per-platform-instance UUID so the sweeper can confidently reap
"containers I provisioned that aren't in my DB anymore" without nuking a
sibling platform's containers on a shared daemon. Tracked as task #109's
follow-up; out of scope for this PR.
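
As a rough sketch of the label-based direction (not an implementation of
the follow-up): the label keys `platform.instance` and
`platform.workspace`, the `find_orphans` helper, and the file of known
workspace IDs are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a label-scoped orphan sweep.

# stdin: "container_id workspace_id" lines for containers carrying this
# instance's label. $1: file listing workspace IDs still in the DB, one
# per line. Prints the container IDs whose workspace ID is not listed.
find_orphans() {
  local known_file="$1" cid wsid
  while read -r cid wsid; do
    [ -z "$cid" ] && continue
    if ! grep -Fxq -- "$wsid" "$known_file"; then
      echo "$cid"
    fi
  done
}

# Example usage (INSTANCE_UUID and known_ids.txt are placeholders):
#   docker ps -a --filter "label=platform.instance=$INSTANCE_UUID" \
#     --format '{{.ID}} {{.Label "platform.workspace"}}' \
#     | find_orphans known_ids.txt | xargs -r docker rm -f
```

Because the initial filter is scoped to one instance UUID, a wiped DB
(empty known-ID file) marks only this instance's containers as orphans
and leaves a sibling platform's containers alone.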

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:37:04 -07:00