diff --git a/review-pr3029-pr3033.md b/review-pr3029-pr3033.md new file mode 100644 index 000000000..9e2bd73dc --- /dev/null +++ b/review-pr3029-pr3033.md @@ -0,0 +1,130 @@ +# 5-Axis Review: PR #3029 (fix #2989) + PR #3033 (docs refresh) + +**Reviewer:** Kimi / Engineer-A +**Date:** 2026-05-31 +**Scope:** Local review (CR2 auth-down, filling review gap per PM dispatch) + +--- + +## PR #3029 — CP orphan sweeper + registry prefix abstraction + +### Correctness ✅ (with 1 semantic conflict to resolve) + +**cp_orphan_sweeper.go** — The deprovision split-write race fix is sound: +- SELECT `status='removed' AND instance_id IS NOT NULL AND instance_id != ''` correctly targets leaked EC2s. +- Stop → clear instance_id is idempotent; on Stop failure the row stays targeted for retry. +- `ORDER BY updated_at DESC` + `LIMIT $1` + `UPDATE updated_at = now()` creates fair round-robin drain across cycles. +- `supervised.RunWithRecover` wiring in `cmd/server/main.go` mirrors the Docker sweeper pattern. + +**provisioner/registry.go** — Clean env-driven prefix abstraction: +- `RegistryPrefix()` respects `MOLECULE_IMAGE_REGISTRY` override; falls back to GHCR OSS default. +- `RuntimeImage()` returns `""` for unknown runtimes, forcing explicit fallback at call sites. +- `computeRuntimeImages()` runs at init; captures prefix active at boot. + +** provisioner.go migration** — Hardcoded map → `computeRuntimeImages()` is a safe refactor; no behavioral change for OSS default. + +**admin_workspace_images.go** — `TemplateImageRef()` now uses `provisioner.RegistryPrefix()`; keeps admin ops and provisioner pulls consistent. + +### Security ✅ + +- Sweeper SQL has no user-input surface; parameters are internal LIMIT constant and DB-generated IDs. +- `RegistryPrefix()` reads env only; comment correctly notes it is deploy-time trusted (operator-set, not user-supplied). +- No new secrets, auth tokens, or credential exposure. + +### Performance ✅ + +- 60s tick / 30s deadline / LIMIT 100 is conservative and safe. +- Sequential Stop calls share the 30s parent context; with typical CP DELETE latency (<1s), 100 orphans finish well within budget. +- If CP is degraded, deadline expires, UPDATEs don't fire, and next cycle retries — no stampede. + +### Style / Readability ✅ + +- Excellent docstrings; the `#2989` race narrative is clearly documented for future maintainers. +- `CPOrphanReaper` interface is minimal and testable. +- Nil-reaper and nil-DB guards follow existing patterns. +- One minor nit: `cpSweepOnce` could return `[]string` of processed IDs to make post-hoc assertions easier, but the fake-reaper test pattern works fine as-is. + +### Tests ✅ (excellent coverage) + +| Scenario | Covered | +|---|---| +| Happy path: Stop succeeds, instance_id cleared | ✅ | +| Stop fails, instance_id retained for retry | ✅ | +| Empty result set (steady state) | ✅ | +| Multiple orphans, partial failure, others proceed | ✅ | +| DB query error (transient) | ✅ | +| UPDATE error after Stop success (logs, continues) | ✅ | +| Nil db.DB (defensive boot safety) | ✅ | +| Nil reaper (disabled, no goroutine leak) | ✅ | +| Boot sweep + tick cadence + ctx cancel | ✅ | +| Registry prefix default / env override / empty env | ✅ | +| Runtime image format for all known runtimes | ✅ | +| Unknown runtime returns `""` | ✅ | +| Registry override applies to ALL runtimes | ✅ | +| Alphabetical order pin | ✅ | + +**All tests pass:** +``` +ok github.com/.../internal/registry 0.107s (9/9 CP sweeper tests) +ok github.com/.../internal/provisioner 0.009s (7/7 registry tests) +``` + +### ⚠️ BLOCKER: Semantic conflict with PR #3033 + +`registry.go` adds `"codex"` to `knownRuntimes`, making **9** production runtimes: +```go +knownRuntimes = []string{ + "autogen", "claude-code", "codex", "crewai", "deepagents", + "gemini-cli", "hermes", "langgraph", "openclaw", +} +``` + +PR #3033 updates the README to claim **eight** production runtimes and explicitly lists: +> Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw + +`codex` is absent from the README compatibility table, the "What Ships In main" section, and the architecture diagram list. After both PRs merge, the code will support 9 runtimes but the docs will claim 8 — a public-facing drift. + +**Fix path:** Add `codex` to the README runtime list in PR #3033 (or a fast-follow) so the count and table stay accurate. `codex` already exists in `manifest.json` and has a template repo, so it is legitimate to list as "shipping on main." + +--- + +## PR #3033 — Docs refresh (README + branding assets) + +### Correctness ✅ (with 1 semantic drift pending) + +- Terminology standardization ("adapters" → "runtimes") is correct and consistent with platform usage. +- Deploy buttons updated from `molecule-monorepo` → `molecule-core`. +- Canvas v4, Memory v2, SaaS surface, RFC #2967 mentions are all factually accurate. +- **Missing:** `codex` runtime (see blocker above). + +### Security ✅ + +- SVG assets are static branding; no scripts, no external references beyond the existing `