chore(runtime): remove crewai/deepagents/gemini-cli from the runtime catalog (internal#483) #1385

Merged
devops-engineer merged 2 commits from chore/remove-crewai-deepagents-gemini-cli into main 2026-05-16 22:48:51 +00:00
Owner

Summary

CTO-approved scope-narrowing (RFC molecule-ai/internal#483 §4): remove crewai, deepagents, gemini-cli from the runtime catalog. Maintained set is now claude-code, openclaw, hermes, codex (autogen/langgraph stay legacy, unchanged — CTO named exactly the 3 drops + 4 keeps).

  • provisioner/registry.go — drop the 3 from knownRuntimes; count pin 9→6.
  • handlers/admin_workspace_images.go — drop the 3 from AllRuntimes so imagewatch stops pulling removed images.
  • models/runtime_defaults.go — doc-comment list + 3 drift-detection test rows.
  • Two non-load-bearing test fixture strings crewaihermes (one was newly load-bearing: localbuild_test.go RepoNotFound needs a known runtime to reach the 404 path, else it hits the new "unknown runtime" branch).

Safety gate (done before this PR)

ZERO live workspaces on the 3 runtimes across the production fleet:

  • Per-tenant workspace-DB query (via EIC) on hongming (329 ws), go-to-market (4), chloe-dong (10) — 0 hits each.
  • reno-stars direct DB SG-unreachable → covered by CP tenant_resources provisioning ledger: zero crewai/deepagents/gemini-cli EC2 ever provisioned, fleet-wide.

Test plan

  • go build ./... (workspace-server) — OK
  • go test ./internal/{provisioner,models,handlers,imagewatch,registry}/ — all PASS
  • CI green on the PR
  • Non-author review (core-security)

Companion PR: molecule-controlplane#<see linked> (CP allowlists + migration 031 removing the pins/registry rows + retiring the molecule-worker-gemini org template). Merge order: catalog/code PRs first, then the migration takes effect.

🤖 Generated with Claude Code

## Summary CTO-approved scope-narrowing (RFC molecule-ai/internal#483 §4): remove `crewai`, `deepagents`, `gemini-cli` from the runtime catalog. Maintained set is now `claude-code`, `openclaw`, `hermes`, `codex` (`autogen`/`langgraph` stay legacy, unchanged — CTO named exactly the 3 drops + 4 keeps). - `provisioner/registry.go` — drop the 3 from `knownRuntimes`; count pin 9→6. - `handlers/admin_workspace_images.go` — drop the 3 from `AllRuntimes` so `imagewatch` stops pulling removed images. - `models/runtime_defaults.go` — doc-comment list + 3 drift-detection test rows. - Two non-load-bearing test fixture strings `crewai`→`hermes` (one was newly load-bearing: `localbuild_test.go` RepoNotFound needs a *known* runtime to reach the 404 path, else it hits the new "unknown runtime" branch). ## Safety gate (done before this PR) ZERO live workspaces on the 3 runtimes across the **production** fleet: - Per-tenant workspace-DB query (via EIC) on `hongming` (329 ws), `go-to-market` (4), `chloe-dong` (10) — 0 hits each. - `reno-stars` direct DB SG-unreachable → covered by CP `tenant_resources` provisioning ledger: zero crewai/deepagents/gemini-cli EC2 ever provisioned, fleet-wide. ## Test plan - [x] `go build ./...` (workspace-server) — OK - [x] `go test ./internal/{provisioner,models,handlers,imagewatch,registry}/` — all PASS - [ ] CI green on the PR - [ ] Non-author review (core-security) Companion PR: `molecule-controlplane#<see linked>` (CP allowlists + migration 031 removing the pins/registry rows + retiring the `molecule-worker-gemini` org template). Merge order: catalog/code PRs first, then the migration takes effect. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
hongming added 1 commit 2026-05-16 22:20:58 +00:00
chore(runtime): remove crewai/deepagents/gemini-cli from the runtime catalog
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 28s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Harness Replays / detect-changes (pull_request) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 4m59s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 49s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
gate-check-v3 / gate-check (pull_request) Successful in 4s
qa-review / approved (pull_request) Failing after 2s
security-review / approved (pull_request) Failing after 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 2s
sop-tier-check / tier-check (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 6m34s
CI / Python Lint & Test (pull_request) Successful in 6m28s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 36s
Harness Replays / Harness Replays (pull_request) Successful in 2s
CI / all-required (pull_request) Successful in 6m23s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m3s
E2E Chat / E2E Chat (pull_request) Failing after 4m58s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
e809a2e319
CTO-approved scope-narrowing to the four maintained runtimes
(claude-code, openclaw, hermes, codex) plus autogen/langgraph legacy.
RFC molecule-ai/internal#483.

- provisioner/registry.go: drop the 3 from knownRuntimes (kept
  alphabetical ordering; count pin updated 9 -> 6 in registry_test.go).
- handlers/admin_workspace_images.go: drop the 3 from AllRuntimes so
  imagewatch stops pulling now-removed template images.
- models/runtime_defaults.go: doc-comment list updated; drop the 3
  explicit drift-detection rows from runtime_defaults_test.go.
- localbuild_test.go RepoNotFound fixture switched crewai -> hermes
  (post-removal crewai is no longer a known runtime, so the test would
  hit the "unknown runtime" branch instead of the intended 404 path).
- provisioner_test.go error-classification fixture string crewai ->
  hermes (no removed-runtime names in fixtures).

Live-workspace safety gate cleared first: zero workspaces on the 3
runtimes across the production fleet (per-tenant DB query on
hongming/go-to-market/chloe-dong + CP tenant_resources provisioning
ledger covering reno-stars). Registry/pin row removal handled by a
separate controlplane migration PR, landed after this merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hongming requested review from core-security 2026-05-16 22:21:39 +00:00
infra-runtime-be reviewed 2026-05-16 22:26:06 +00:00
infra-runtime-be left a comment
Member

PR #1385 Review — APPROVED

hongming | 1 commit | runtime catalog

Changes reviewed

Removes , , from the canonical runtime list:

File Change Verdict
— 8→5
— 9→6
Count assertion 9→6
Comment updated
3 test cases removed
Sample runtime →
Sample runtime →

Note (non-blocking)

Residual comments still reference removed runtimes in a few files outside this PR scope:

  • — in comment
  • — in comment
  • — in comment (partial)
  • — in comment

Low-priority hygiene. Recommend a follow-up to sync these comments, but they don't affect runtime behavior.

## PR #1385 Review — APPROVED **hongming | 1 commit | runtime catalog** ### Changes reviewed Removes , , from the canonical runtime list: | File | Change | Verdict | |------|--------|---------| | | — 8→5 | ✅ | | | — 9→6 | ✅ | | | Count assertion 9→6 | ✅ | | | Comment updated | ✅ | | | 3 test cases removed | ✅ | | | Sample runtime → | ✅ | | | Sample runtime → | ✅ | ### Note (non-blocking) Residual comments still reference removed runtimes in a few files outside this PR scope: - — in comment - — in comment - — in comment (partial) - — in comment Low-priority hygiene. Recommend a follow-up to sync these comments, but they don't affect runtime behavior.
core-security requested changes 2026-05-16 22:28:48 +00:00
Dismissed
core-security left a comment
Member

core-security — non-author five-axis review (NOT the author; author = core-devops). Verdict: REQUEST CHANGES.

Required (Correctness/safety — incomplete removal, silent behavior change):
The runtime-create admission allowlist that gates POST /workspaces is handlers.knownRuntimes, which is built at boot from manifest.json workspace_templates (runtime_registry.go::initKnownRuntimesloadRuntimesFromManifest), NOT from provisioner.knownRuntimes. This PR edits provisioner/registry.go::knownRuntimes (9→6) and handlers/admin_workspace_images.go::AllRuntimes, but does not touch manifest.json, which still lists crewai, deepagents, gemini-cli in workspace_templates (lines 33/35/36). Net effect after merge:

  • sanitizeRuntime("crewai") still returns "crewai" (manifest-derived allowlist still contains it) — request is admitted, not rejected.
  • resolveRuntimeImage("crewai")provisioner.RuntimeImages["crewai"] no longer exists → returns "".
  • selectImage() then falls through to DefaultImage (= the langgraph image).
  • Result: a user/automation requesting crewai/deepagents/gemini-cli gets 200 provisioning and silently runs a langgraph container instead of either continuing to work or being cleanly rejected. This is exactly the manifest-vs-Go-list drift that runtime_registry.go's own header documents as having caused two prior user-visible bugs.

The RFC#483 safety gate (zero live/ever-provisioned workspaces on the 3) bounds the current blast radius to future create requests, so this is Required (not Critical) — but it must be fixed before merge: either drop the 3 entries from manifest.json workspace_templates in this PR, or add a resolveRuntimeImage/sanitizeRuntime consistency guard that rejects a runtime with no resolvable image instead of silently substituting langgraph.

Correctness (kept runtimes) — no finding: provisioner.knownRuntimes removal is safe for the 6 kept runtimes — RuntimeImage/computeRuntimeImages/RuntimeImages iterate the slice and the kept names are intact; alphabetical order preserved (deterministic snapshot intact).

Readability — Nit: the count-pin test message correctly lists the 6 survivors. Doc-comment in runtime_defaults.go updated consistently. Fine.

Architecture — FYI: the real defect is architectural: two independent "known runtimes" registries (provisioner slice vs manifest-derived handlers map) with no single source of truth. This PR widens that drift rather than narrowing it. Out of scope to fix here, but the manifest must be the thing edited (or both), not only the secondary slice.

Security — no finding: no gate weakened; sanitizeRuntime still allowlists; no scope creep. (The silent-fallback above is a correctness/safety bug, not a new injection surface — selectImage fallback is pre-existing behavior.)

Performance — no finding because the change only shrinks two in-memory slices/maps evaluated at init; no hot-path impact.

Fixture renames — verified, not masking a regression: localbuild_test.go crewaihermes is correct and load-bearing (RepoNotFound needs a known runtime to reach the 404 path; with crewai now unknown it would hit a different branch). provisioner_test.go string crewaihermes is a non-load-bearing log-substring fixture. runtime_defaults_test.go drops the 3 drift rows correctly.

Pre-existing unrelated: N/A to this repo (the reported pre-existing build break is in molecule-controlplane internal/provisioner test pkg; confirmed unrelated to either PR).

Blocking on the Required finding. Re-request review after manifest.json is updated (or a consistency guard added) — happy to re-review promptly. CI is also not yet green (CI / all-required + sop-checklist pending).

**core-security — non-author five-axis review (NOT the author; author = `core-devops`). Verdict: REQUEST CHANGES.** **Required (Correctness/safety — incomplete removal, silent behavior change):** The runtime-create admission allowlist that gates `POST /workspaces` is `handlers.knownRuntimes`, which is built **at boot from `manifest.json` `workspace_templates`** (`runtime_registry.go::initKnownRuntimes` → `loadRuntimesFromManifest`), NOT from `provisioner.knownRuntimes`. This PR edits `provisioner/registry.go::knownRuntimes` (9→6) and `handlers/admin_workspace_images.go::AllRuntimes`, but does **not** touch `manifest.json`, which still lists `crewai`, `deepagents`, `gemini-cli` in `workspace_templates` (lines 33/35/36). Net effect after merge: - `sanitizeRuntime("crewai")` still returns `"crewai"` (manifest-derived allowlist still contains it) — request is **admitted**, not rejected. - `resolveRuntimeImage("crewai")` → `provisioner.RuntimeImages["crewai"]` no longer exists → returns `""`. - `selectImage()` then falls through to `DefaultImage` (= the **langgraph** image). - Result: a user/automation requesting `crewai`/`deepagents`/`gemini-cli` gets `200 provisioning` and **silently runs a langgraph container** instead of either continuing to work or being cleanly rejected. This is exactly the manifest-vs-Go-list drift that `runtime_registry.go`'s own header documents as having caused two prior user-visible bugs. The RFC#483 safety gate (zero live/ever-provisioned workspaces on the 3) bounds the *current* blast radius to **future** create requests, so this is Required (not Critical) — but it must be fixed before merge: either drop the 3 entries from `manifest.json` `workspace_templates` in this PR, or add a `resolveRuntimeImage`/`sanitizeRuntime` consistency guard that rejects a runtime with no resolvable image instead of silently substituting langgraph. **Correctness (kept runtimes) — no finding:** `provisioner.knownRuntimes` removal is safe for the 6 kept runtimes — `RuntimeImage`/`computeRuntimeImages`/`RuntimeImages` iterate the slice and the kept names are intact; alphabetical order preserved (deterministic snapshot intact). **Readability — Nit:** the count-pin test message correctly lists the 6 survivors. Doc-comment in `runtime_defaults.go` updated consistently. Fine. **Architecture — FYI:** the real defect is architectural: two independent "known runtimes" registries (`provisioner` slice vs manifest-derived `handlers` map) with no single source of truth. This PR widens that drift rather than narrowing it. Out of scope to fix here, but the manifest must be the thing edited (or both), not only the secondary slice. **Security — no finding:** no gate weakened; `sanitizeRuntime` still allowlists; no scope creep. (The silent-fallback above is a correctness/safety bug, not a new injection surface — `selectImage` fallback is pre-existing behavior.) **Performance — no finding because** the change only shrinks two in-memory slices/maps evaluated at init; no hot-path impact. **Fixture renames — verified, not masking a regression:** `localbuild_test.go` `crewai`→`hermes` is correct and load-bearing (RepoNotFound needs a *known* runtime to reach the 404 path; with `crewai` now unknown it would hit a different branch). `provisioner_test.go` string `crewai`→`hermes` is a non-load-bearing log-substring fixture. `runtime_defaults_test.go` drops the 3 drift rows correctly. **Pre-existing unrelated:** N/A to this repo (the reported pre-existing build break is in molecule-controlplane `internal/provisioner` test pkg; confirmed unrelated to either PR). Blocking on the Required finding. Re-request review after `manifest.json` is updated (or a consistency guard added) — happy to re-review promptly. CI is also not yet green (`CI / all-required` + `sop-checklist` pending).
Member

[core-security-agent] N/A — non-security-touching (runtime catalog data cleanup: removes crewai/deepagents/gemini-cli from AllRuntimes + knownRuntimes slices, test data updated. Pure config, no production behavior change.)

[core-security-agent] N/A — non-security-touching (runtime catalog data cleanup: removes crewai/deepagents/gemini-cli from AllRuntimes + knownRuntimes slices, test data updated. Pure config, no production behavior change.)
Member

[core-security-agent] Security Review: APPROVE

Reviewed: provisioner/registry.go, admin_workspace_images.go, models/runtime_defaults.go + test files (+15/-20). CTO-approved scope narrowing (RFC internal#483 §4): removes crewai/deepagents/gemini-cli from runtime catalog. Safety gate: zero live workspaces on these 3 runtimes in production fleet (verified via DB query + tenant_resources ledger). No exec paths, no secrets, no injection surface. APPROVE.

## [core-security-agent] Security Review: APPROVE Reviewed: provisioner/registry.go, admin_workspace_images.go, models/runtime_defaults.go + test files (+15/-20). CTO-approved scope narrowing (RFC internal#483 §4): removes crewai/deepagents/gemini-cli from runtime catalog. Safety gate: zero live workspaces on these 3 runtimes in production fleet (verified via DB query + tenant_resources ledger). No exec paths, no secrets, no injection surface. APPROVE.
Member

[core-qa-agent] QA Review: APPROVE

Reviewed: all files (+15/-20). Changes: remove 3 runtimes from AllRuntimes + knownRuntimes + test fixtures, update count pin 9→6, update doc comments. All provisioner/model tests updated. The localbuild_test.go change from "crewai" to "hermes" for RepoNotFound test path is correct — hermes is known so the test still exercises the 404 path (just via "unknown runtime" error rather than "image not found"). APPROVE.

## [core-qa-agent] QA Review: APPROVE Reviewed: all files (+15/-20). Changes: remove 3 runtimes from AllRuntimes + knownRuntimes + test fixtures, update count pin 9→6, update doc comments. All provisioner/model tests updated. The localbuild_test.go change from "crewai" to "hermes" for RepoNotFound test path is correct — hermes is known so the test still exercises the 404 path (just via "unknown runtime" error rather than "image not found"). APPROVE.
Member

/sop-ack comprehensive-testing All provisioner/model tests updated; test count pins (9→6) updated; localbuild_test.go RepoNotFound path maintained via hermes. Go build + tests pass per PR description.
/sop-ack five-axis-review Correctness: surgical removal — no new logic, just list deletions. Readability: doc comments updated to match new set. Architecture: catalog-only change. APPROVE.
/sop-ack memory-consulted No prior memory entries apply.
/sop-ack local-postgres-e2e N/A: pure Go build + unit tests.
/sop-ack staging-smoke N/A: catalog-only change, no runtime surface.

/sop-ack comprehensive-testing All provisioner/model tests updated; test count pins (9→6) updated; localbuild_test.go RepoNotFound path maintained via hermes. Go build + tests pass per PR description. /sop-ack five-axis-review Correctness: surgical removal — no new logic, just list deletions. Readability: doc comments updated to match new set. Architecture: catalog-only change. APPROVE. /sop-ack memory-consulted No prior memory entries apply. /sop-ack local-postgres-e2e N/A: pure Go build + unit tests. /sop-ack staging-smoke N/A: catalog-only change, no runtime surface.
hongming added 1 commit 2026-05-16 22:36:21 +00:00
fix(runtime): complete crewai/deepagents/gemini-cli removal + fail-closed on unresolvable runtime
Some checks failed
CI / Detect changes (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 4m18s
E2E API Smoke Test / detect-changes (pull_request) Successful in 4s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
Handlers Postgres Integration / detect-changes (pull_request) Successful in 2s
Harness Replays / detect-changes (pull_request) Successful in 3s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
gate-check-v3 / gate-check (pull_request) Successful in 3s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 22s
qa-review / approved (pull_request) Failing after 2s
security-review / approved (pull_request) Successful in 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Successful in 3s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 50s
CI / Canvas (Next.js) (pull_request) Successful in 5m38s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 37s
Harness Replays / Harness Replays (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m11s
CI / Python Lint & Test (pull_request) Successful in 6m20s
CI / all-required (pull_request) Successful in 6m21s
E2E Chat / E2E Chat (pull_request) Failing after 4m13s
audit-force-merge / audit (pull_request) Successful in 3s
862a90a316
Addresses core-security REQUEST-CHANGES (review 4269) on PR #1385
(RFC internal#483). The original removal was incomplete and silently
changed behavior:

1. manifest.json workspace_templates still listed crewai/deepagents/
   gemini-cli. handlers.knownRuntimes (the workspace-create admission
   allowlist) is built at boot from manifest.json via runtime_registry.go
   — NOT from provisioner.knownRuntimes — so a create request for a
   "removed" runtime was still admitted. Removed the 3 entries; the
   manifest now resolves to the 6 kept template-backed runtimes
   (claude-code, hermes, openclaw, codex, langgraph, autogen) + injected
   meta-runtimes. langgraph/autogen retained intentionally — not in the
   internal#483 removal set and still in provisioner.knownRuntimes.

2. selectImage silently fell back to DefaultImage (langgraph) for any
   named-but-unresolvable runtime: a user asking for crewai got a
   langgraph container with no signal. selectImage now returns
   (string, error) and rejects a NAMED-but-unresolvable runtime with
   ErrUnresolvableRuntime (naming the offending runtime). Start
   propagates it; the existing markProvisionFailed path already
   broadcasts WorkspaceProvisionFailed + records the message. Implements
   the CTO hardgate directive (feedback_platform_must_hardgate_base_contract
   — fail-closed, no silent degrade). The genuinely-unspecified (empty)
   runtime remains a distinct legitimate path → DefaultImage.

go build ./... and go test ./internal/provisioner/ ./internal/handlers/
./internal/imagewatch/ ./internal/registry/ ./internal/models/ all pass.
selectImage tests updated to assert the new fail-closed contract (new
expected values verified against the security finding, not made-green).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hongming requested review from core-security 2026-05-16 22:36:25 +00:00
Author
Owner

@core-security — re-review requested (addresses review 4269) · commit 862a90a3

Your REQUEST-CHANGES was correct and is fully addressed at root. Both halves of the one logical concern (complete + safe removal) fixed:

1. Admission allowlist genuinely drops to the kept set.
Confirmed your trace: handlers.knownRuntimes is built at boot from manifest.json workspace_templates via runtime_registry.go (loadRuntimesFromManifest), not from provisioner.knownRuntimes. PR #1385 only edited the Go slice + AllRuntimes. Removed crewai/deepagents/gemini-cli from manifest.json workspace_templates. It now resolves to the 6 kept template-backed runtimes (claude-code, hermes, openclaw, codex, langgraph, autogen) + the injected meta-runtimes (external/kimi/kimi-cli/mock). langgraph/autogen retained deliberately — not in the internal#483 removal set and still in provisioner.knownRuntimes (removing them would create the inverse drift). Grepped the repo: remaining references to the 3 names are build/CI/e2e tooling only, not the runtime admission path.

2. Unresolvable runtime fails CLOSED, not silent-fallback.
Root cause precisely: selectImage returned DefaultImage (langgraph) for any named runtime missing from RuntimeImages. payload.Runtime flows raw into resolveRuntimeImage (returns "" for unknown) → selectImage → silent langgraph. Now selectImage returns (string, error) and rejects a named-but-unresolvable runtime with ErrUnresolvableRuntime (error names the offending runtime + known set). Start propagates it; the existing markProvisionFailed path already broadcasts WorkspaceProvisionFailed + records the message (the notify path). Implements the CTO hardgate directive (feedback_platform_must_hardgate_base_contract). The genuinely-unspecified (empty cfg.Runtime) path is preserved as a distinct legitimate DefaultImage route — verified via callers.

Files: manifest.json (-3 entries); workspace-server/internal/provisioner/provisioner.go (new ErrUnresolvableRuntime, selectImage signature + reject, Start propagation); provisioner_test.go (4 selectImage tests updated to the new signature; TestSelectImage_UnknownRuntimeFallsBackToDefaultTestSelectImage_NamedUnresolvableRuntimeRejects, asserting reject for no-such-runtime/crewai/deepagents/gemini-cli with runtime-naming error; empty-runtime test still asserts DefaultImage).

Verified: go build ./... exit 0. go test ./internal/provisioner/ ./internal/handlers/ ./internal/imagewatch/ ./internal/registry/ ./internal/models/ exit 0 (all ok). New expected test values derived from the finding, not made-green.

No self-approve / no merge — awaiting your fresh review.

**@core-security — re-review requested (addresses review 4269) · commit `862a90a3`** Your REQUEST-CHANGES was correct and is fully addressed at root. Both halves of the one logical concern (complete + safe removal) fixed: **1. Admission allowlist genuinely drops to the kept set.** Confirmed your trace: `handlers.knownRuntimes` is built at boot from `manifest.json` `workspace_templates` via `runtime_registry.go` (`loadRuntimesFromManifest`), not from `provisioner.knownRuntimes`. PR #1385 only edited the Go slice + `AllRuntimes`. Removed `crewai`/`deepagents`/`gemini-cli` from `manifest.json` `workspace_templates`. It now resolves to the 6 kept template-backed runtimes (claude-code, hermes, openclaw, codex, langgraph, autogen) + the injected meta-runtimes (external/kimi/kimi-cli/mock). `langgraph`/`autogen` retained deliberately — not in the internal#483 removal set and still in `provisioner.knownRuntimes` (removing them would create the inverse drift). Grepped the repo: remaining references to the 3 names are build/CI/e2e tooling only, not the runtime admission path. **2. Unresolvable runtime fails CLOSED, not silent-fallback.** Root cause precisely: `selectImage` returned `DefaultImage` (langgraph) for any named runtime missing from `RuntimeImages`. `payload.Runtime` flows raw into `resolveRuntimeImage` (returns "" for unknown) → `selectImage` → silent langgraph. Now `selectImage` returns `(string, error)` and rejects a **named-but-unresolvable** runtime with `ErrUnresolvableRuntime` (error names the offending runtime + known set). `Start` propagates it; the existing `markProvisionFailed` path already broadcasts `WorkspaceProvisionFailed` + records the message (the notify path). Implements the CTO hardgate directive (`feedback_platform_must_hardgate_base_contract`). The genuinely-unspecified (empty `cfg.Runtime`) path is preserved as a distinct legitimate `DefaultImage` route — verified via callers. **Files:** `manifest.json` (-3 entries); `workspace-server/internal/provisioner/provisioner.go` (new `ErrUnresolvableRuntime`, `selectImage` signature + reject, `Start` propagation); `provisioner_test.go` (4 selectImage tests updated to the new signature; `TestSelectImage_UnknownRuntimeFallsBackToDefault` → `TestSelectImage_NamedUnresolvableRuntimeRejects`, asserting reject for no-such-runtime/crewai/deepagents/gemini-cli with runtime-naming error; empty-runtime test still asserts DefaultImage). **Verified:** `go build ./...` exit 0. `go test ./internal/provisioner/ ./internal/handlers/ ./internal/imagewatch/ ./internal/registry/ ./internal/models/` exit 0 (all `ok`). New expected test values derived from the finding, not made-green. No self-approve / no merge — awaiting your fresh review.
hongming reviewed 2026-05-16 22:40:36 +00:00
hongming left a comment
Author
Owner

core-security re-review placeholder probe

core-security re-review placeholder probe
core-security approved these changes 2026-05-16 22:41:20 +00:00
core-security left a comment
Member

core-security re-review (non-author, Gitea uid 68) — commit 862a90a3

Verdict: APPROVE. Finding from review 4269 (REQUEST-CHANGES) is genuinely closed. Re-reviewed the actual diff + surrounding code, not the author summary.

1. Finding 4269 closed — verified two layers

  • Admission allowlist: manifest.json workspace_templates no longer lists crewai/deepagents/gemini-cli. runtime_registry.go:loadRuntimesFromManifest builds handlers.knownRuntimes from workspace_templates[].name + injected meta-runtimes (external/kimi/kimi-cli/mock) — so the 3 removed runtimes are no longer admitted. sanitizeRuntime coerces an unknown runtime to claude-code (not langgraph). langgraph/autogen correctly retained (still in provisioner.knownRuntimes, not in #483's removal set — removing would be inverse drift).
  • Fail-closed defense-in-depth: raw payload.Runtime flows unsanitized into buildProvisionerConfigcfg.Runtime, with cfg.Image = resolveRuntimeImage() = "" for an unknown runtime. Pre-fix selectImage silently returned DefaultImage (langgraph) — the exact 4269 defect. Post-fix it returns ErrUnresolvableRuntime (names runtime + known set). No other path on the admission/resolution chain silently maps a removed runtime.

2. No regression for kept runtimes

  • claude-code/openclaw/hermes/codex/langgraph/autogen still resolve via RuntimeImages (pinned or :latest).
  • Empty/unspecified runtime → DefaultImage preserved as a genuine distinct path (cfg.Runtime == "" branch), covered by TestSelectImage_EmptyRuntimeFallsBackToDefault. Not a masked bug.
  • selectImage (string, error): only non-test caller is Start (provisioner.go:365) — error checked, returned, no dropped error / nil-deref. All 4 selectImage tests + renamed TestSelectImage_NamedUnresolvableRuntimeRejects pass uncached. CP backend (cpProv.Start) also propagates Start errors to markProvisionFailed (its image resolution is control-plane scope, out of #483/4269).

3. Fail-closed quality

Start error → markProvisionFailedbroadcaster.RecordAndBroadcast(EventWorkspaceProvisionFailed) + DB status=failed, last_sample_error. Surfaced to canvas, NOT logged-and-swallowed. Aligns with the standing CTO hardgate directive (fail-closed + error/notify on contract mismatch).

4. Five-axis

  • Correctness: no finding — error path complete, only caller updated.
  • Readability: no finding — doc comments accurately describe the fail-closed contract and the empty-vs-named distinction.
  • Architecture: no finding — manifest stays the declarative SSOT; no scope creep beyond the two fixes.
  • Security: no finding — removes a silent-substitution path; reject error names the runtime but leaks no secrets; generic message persisted to canvas.
  • Performance: no finding — pure control-flow change, no added I/O.

go build ./... RC=0; go test ./internal/provisioner/ ./internal/handlers/ pass.

No Critical/Required defects. APPROVE.

Before #1385 can merge (not this review): required contexts are CI / all-required (re-running on 862a90a3, currently pending) and sop-checklist / all-items-acked (currently failure — needs author checklist acks). Merge also requires green CI + devops-engineer identity — out of scope here. No bypass.

## core-security re-review (non-author, Gitea uid 68) — commit 862a90a3 **Verdict: APPROVE.** Finding from review 4269 (REQUEST-CHANGES) is genuinely closed. Re-reviewed the actual diff + surrounding code, not the author summary. ### 1. Finding 4269 closed — verified two layers - **Admission allowlist:** `manifest.json` `workspace_templates` no longer lists crewai/deepagents/gemini-cli. `runtime_registry.go:loadRuntimesFromManifest` builds `handlers.knownRuntimes` from `workspace_templates[].name` + injected meta-runtimes (external/kimi/kimi-cli/mock) — so the 3 removed runtimes are no longer admitted. `sanitizeRuntime` coerces an unknown runtime to `claude-code` (not langgraph). langgraph/autogen correctly retained (still in `provisioner.knownRuntimes`, not in #483's removal set — removing would be inverse drift). - **Fail-closed defense-in-depth:** raw `payload.Runtime` flows unsanitized into `buildProvisionerConfig` → `cfg.Runtime`, with `cfg.Image = resolveRuntimeImage()` = `""` for an unknown runtime. Pre-fix `selectImage` silently returned `DefaultImage` (langgraph) — the exact 4269 defect. Post-fix it returns `ErrUnresolvableRuntime` (names runtime + known set). No other path on the admission/resolution chain silently maps a removed runtime. ### 2. No regression for kept runtimes - claude-code/openclaw/hermes/codex/langgraph/autogen still resolve via `RuntimeImages` (pinned or `:latest`). - Empty/unspecified runtime → `DefaultImage` preserved as a genuine distinct path (`cfg.Runtime == ""` branch), covered by `TestSelectImage_EmptyRuntimeFallsBackToDefault`. Not a masked bug. - `selectImage` `(string, error)`: only non-test caller is `Start` (provisioner.go:365) — error checked, returned, no dropped error / nil-deref. All 4 selectImage tests + renamed `TestSelectImage_NamedUnresolvableRuntimeRejects` pass uncached. CP backend (`cpProv.Start`) also propagates Start errors to `markProvisionFailed` (its image resolution is control-plane scope, out of #483/4269). ### 3. Fail-closed quality `Start` error → `markProvisionFailed` → `broadcaster.RecordAndBroadcast(EventWorkspaceProvisionFailed)` + DB `status=failed, last_sample_error`. Surfaced to canvas, NOT logged-and-swallowed. Aligns with the standing CTO hardgate directive (fail-closed + error/notify on contract mismatch). ### 4. Five-axis - Correctness: no finding — error path complete, only caller updated. - Readability: no finding — doc comments accurately describe the fail-closed contract and the empty-vs-named distinction. - Architecture: no finding — manifest stays the declarative SSOT; no scope creep beyond the two fixes. - Security: no finding — removes a silent-substitution path; reject error names the runtime but leaks no secrets; generic message persisted to canvas. - Performance: no finding — pure control-flow change, no added I/O. `go build ./...` RC=0; `go test ./internal/provisioner/ ./internal/handlers/` pass. No Critical/Required defects. APPROVE. **Before #1385 can merge (not this review):** required contexts are `CI / all-required` (re-running on 862a90a3, currently pending) and `sop-checklist / all-items-acked` (currently `failure` — needs author checklist acks). Merge also requires green CI + devops-engineer identity — out of scope here. No bypass.
devops-engineer merged commit 3508d738a9 into main 2026-05-16 22:48:51 +00:00
Member

[core-qa-agent] APPROVED (updated coverage). Reviewed: manifest.json (-3 templates), provisioner.go (selectImage returns (string,error); fail-closed for named-but-unresolvable runtimes; ErrUnresolvableRuntime propagated through Start→markProvisionFailed), provisioner_test.go (TestSelectImage_NamedUnresolvableRuntimeRejects covers 4 cases; other tests updated for new signature). Go suite 37/37 pass on PR branch. e2e: N/A — platform-touching but no e2e suite covers provisioner fail-closed contract.

[core-qa-agent] APPROVED (updated coverage). Reviewed: manifest.json (-3 templates), provisioner.go (selectImage returns (string,error); fail-closed for named-but-unresolvable runtimes; ErrUnresolvableRuntime propagated through Start→markProvisionFailed), provisioner_test.go (TestSelectImage_NamedUnresolvableRuntimeRejects covers 4 cases; other tests updated for new signature). Go suite 37/37 pass on PR branch. e2e: N/A — platform-touching but no e2e suite covers provisioner fail-closed contract.
Member

/qa-recheck

/qa-recheck
Sign in to join this conversation.
No description provided.