review: 5-axis review of PR #3029/#3033 + remove unused canvasUserMessage type (lint fix) #2055
@@ -0,0 +1,130 @@
|
||||
# 5-Axis Review: PR #3029 (fix #2989) + PR #3033 (docs refresh)
|
||||
|
||||
**Reviewer:** Kimi / Engineer-A
|
||||
**Date:** 2026-05-31
|
||||
**Scope:** Local review (CR2 auth-down, filling review gap per PM dispatch)
|
||||
|
||||
---
|
||||
|
||||
## PR #3029 — CP orphan sweeper + registry prefix abstraction
|
||||
|
||||
### Correctness ✅ (with 1 semantic conflict to resolve)
|
||||
|
||||
**cp_orphan_sweeper.go** — The deprovision split-write race fix is sound:
|
||||
- SELECT `status='removed' AND instance_id IS NOT NULL AND instance_id != ''` correctly targets leaked EC2s.
|
||||
- Stop → clear instance_id is idempotent; on Stop failure the row stays targeted for retry.
|
||||
- `ORDER BY updated_at DESC` + `LIMIT $1` + `UPDATE updated_at = now()` creates fair round-robin drain across cycles.
|
||||
- `supervised.RunWithRecover` wiring in `cmd/server/main.go` mirrors the Docker sweeper pattern.
|
||||
|
||||
**provisioner/registry.go** — Clean env-driven prefix abstraction:
|
||||
- `RegistryPrefix()` respects `MOLECULE_IMAGE_REGISTRY` override; falls back to GHCR OSS default.
|
||||
- `RuntimeImage()` returns `""` for unknown runtimes, forcing explicit fallback at call sites.
|
||||
- `computeRuntimeImages()` runs at init; captures prefix active at boot.
|
||||
|
||||
** provisioner.go migration** — Hardcoded map → `computeRuntimeImages()` is a safe refactor; no behavioral change for OSS default.
|
||||
|
||||
**admin_workspace_images.go** — `TemplateImageRef()` now uses `provisioner.RegistryPrefix()`; keeps admin ops and provisioner pulls consistent.
|
||||
|
||||
### Security ✅
|
||||
|
||||
- Sweeper SQL has no user-input surface; parameters are internal LIMIT constant and DB-generated IDs.
|
||||
- `RegistryPrefix()` reads env only; comment correctly notes it is deploy-time trusted (operator-set, not user-supplied).
|
||||
- No new secrets, auth tokens, or credential exposure.
|
||||
|
||||
### Performance ✅
|
||||
|
||||
- 60s tick / 30s deadline / LIMIT 100 is conservative and safe.
|
||||
- Sequential Stop calls share the 30s parent context; with typical CP DELETE latency (<1s), 100 orphans finish well within budget.
|
||||
- If CP is degraded, deadline expires, UPDATEs don't fire, and next cycle retries — no stampede.
|
||||
|
||||
### Style / Readability ✅
|
||||
|
||||
- Excellent docstrings; the `#2989` race narrative is clearly documented for future maintainers.
|
||||
- `CPOrphanReaper` interface is minimal and testable.
|
||||
- Nil-reaper and nil-DB guards follow existing patterns.
|
||||
- One minor nit: `cpSweepOnce` could return `[]string` of processed IDs to make post-hoc assertions easier, but the fake-reaper test pattern works fine as-is.
|
||||
|
||||
### Tests ✅ (excellent coverage)
|
||||
|
||||
| Scenario | Covered |
|
||||
|---|---|
|
||||
| Happy path: Stop succeeds, instance_id cleared | ✅ |
|
||||
| Stop fails, instance_id retained for retry | ✅ |
|
||||
| Empty result set (steady state) | ✅ |
|
||||
| Multiple orphans, partial failure, others proceed | ✅ |
|
||||
| DB query error (transient) | ✅ |
|
||||
| UPDATE error after Stop success (logs, continues) | ✅ |
|
||||
| Nil db.DB (defensive boot safety) | ✅ |
|
||||
| Nil reaper (disabled, no goroutine leak) | ✅ |
|
||||
| Boot sweep + tick cadence + ctx cancel | ✅ |
|
||||
| Registry prefix default / env override / empty env | ✅ |
|
||||
| Runtime image format for all known runtimes | ✅ |
|
||||
| Unknown runtime returns `""` | ✅ |
|
||||
| Registry override applies to ALL runtimes | ✅ |
|
||||
| Alphabetical order pin | ✅ |
|
||||
|
||||
**All tests pass:**
|
||||
```
|
||||
ok github.com/.../internal/registry 0.107s (9/9 CP sweeper tests)
|
||||
ok github.com/.../internal/provisioner 0.009s (7/7 registry tests)
|
||||
```
|
||||
|
||||
### ⚠️ BLOCKER: Semantic conflict with PR #3033
|
||||
|
||||
`registry.go` adds `"codex"` to `knownRuntimes`, making **9** production runtimes:
|
||||
```go
|
||||
knownRuntimes = []string{
|
||||
"autogen", "claude-code", "codex", "crewai", "deepagents",
|
||||
"gemini-cli", "hermes", "langgraph", "openclaw",
|
||||
}
|
||||
```
|
||||
|
||||
PR #3033 updates the README to claim **eight** production runtimes and explicitly lists:
|
||||
> Claude Code, Hermes, Gemini CLI, LangGraph, DeepAgents, CrewAI, AutoGen, OpenClaw
|
||||
|
||||
`codex` is absent from the README compatibility table, the "What Ships In main" section, and the architecture diagram list. After both PRs merge, the code will support 9 runtimes but the docs will claim 8 — a public-facing drift.
|
||||
|
||||
**Fix path:** Add `codex` to the README runtime list in PR #3033 (or a fast-follow) so the count and table stay accurate. `codex` already exists in `manifest.json` and has a template repo, so it is legitimate to list as "shipping on main."
|
||||
|
||||
---
|
||||
|
||||
## PR #3033 — Docs refresh (README + branding assets)
|
||||
|
||||
### Correctness ✅ (with 1 semantic drift pending)
|
||||
|
||||
- Terminology standardization ("adapters" → "runtimes") is correct and consistent with platform usage.
|
||||
- Deploy buttons updated from `molecule-monorepo` → `molecule-core`.
|
||||
- Canvas v4, Memory v2, SaaS surface, RFC #2967 mentions are all factually accurate.
|
||||
- **Missing:** `codex` runtime (see blocker above).
|
||||
|
||||
### Security ✅
|
||||
|
||||
- SVG assets are static branding; no scripts, no external references beyond the existing `<style>` media query.
|
||||
- No auth or credential surface touched.
|
||||
|
||||
### Performance N/A
|
||||
|
||||
- Docs-only; no runtime impact.
|
||||
|
||||
### Style / Readability ✅
|
||||
|
||||
- warm-paper theme description is concise and helpful.
|
||||
- Architecture diagram update (Docker → EC2 + SSM, KMS, SaaS CP) is accurate.
|
||||
- Quick Start clone URL fixed.
|
||||
|
||||
### Tests N/A
|
||||
|
||||
- No code changes; no test delta.
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| PR | Verdict | Action needed |
|
||||
|---|---|---|
|
||||
| #3029 | **Approve with nit** | Merge-ready after confirming #3033 (or follow-up) adds `codex` to README runtime list. |
|
||||
| #3033 | **Approve with blocker** | Add `codex` to the 8-runtimes list (making 9) and to the compatibility table before merge. |
|
||||
|
||||
**Risk if both merge as-is:** Public docs understate runtime count by 1; operators reading README may think `codex` is not supported when the provisioner already knows about it.
|
||||
|
||||
**Recommended merge order:** #3029 first (adds runtime support), then #3033 with `codex` line added (docs catch up).
|
||||
Reference in New Issue
Block a user