fix(workspace-server): derive image-refresh runtime allowlist from providers SSOT (google-adk drift) (#578) #2348

Merged
claude-ceo-assistant merged 1 commits from fix/578-google-adk-image-refresh-allowlist into main 2026-06-06 07:26:40 +00:00
Member

Summary

Fixes #578 — google-adk runtime allowlist DRIFT between controlplane (accepts google-adk for pin-promote/redeploy) and the molecule-core tenant image-refresh endpoint, which hardcoded AllRuntimes = {claude-code, codex, hermes, openclaw} (no google-adk). A google-adk pin was accepted CP-side, then POST /admin/workspace-images/refresh?runtime=google-adk returned 400 "unknown runtime" at the tenant, so google-adk image fixes never deployed.

Fix — unified, not just patched

Rather than append google-adk (which would drift again), AllRuntimes is now DERIVED at package init from providers.LoadManifest().Runtimes — the same internal/providers/providers.yaml runtimes: SSOT (mirrored from CP's providers.yaml) the rest of the platform already routes against (DeriveProvider, ModelsForRuntime, templates_registry.go, llm_billing_mode.go). The CP pin-promote allowlist and the tenant refresh allowlist are now provably the same set.

  • internal/providers/gen/registry_gen.go is explicitly "no production path imports this package yet", so the runtime-embedded providers.LoadManifest() (already imported in handlers, fail-closed, no network) is the correct reusable SSOT.
  • A static imageRefreshFallbackRuntimes (now including google-adk) is used only if the embedded manifest fails to load, preserving endpoint availability under a manifest regression. A drift-guard test pins it to the SSOT too.
  • imagewatch.New() copies handlers.AllRuntimes at construction, so the auto-refresh watcher now also tracks google-adk — no extra change needed.

Tests (all green)

  • TestAllRuntimes_IncludesGoogleADK — google-adk is now in the allowlist (direct #578 regression).
  • TestAllRuntimes_MatchesProvidersSSOT — derived list == providers SSOT runtime keys (drift guard: CP/tenant can't diverge again).
  • TestImageRefreshFallbackMatchesSSOT — the static fallback is pinned to the SSOT.
  • TestRefresh_RejectsUnknownRuntime — reject guard intact (genuinely-unknown runtime still 400s) AND the 400 known_runtimes body advertises google-adk.

go build ./..., go vet ./internal/handlers/, and go test ./internal/handlers/ ./internal/imagewatch/ ./internal/providers/... all pass.

Note on the other runtime registry

There is a separate runtime_registry.go knownRuntimes (built from manifest.json workspace_templates) governing workspace provisioning — a different concern from image pull/recreate. This PR intentionally does not touch it.

🤖 Generated with Claude Code

## Summary Fixes #578 — google-adk runtime allowlist DRIFT between controlplane (accepts `google-adk` for pin-promote/redeploy) and the molecule-core tenant image-refresh endpoint, which hardcoded `AllRuntimes = {claude-code, codex, hermes, openclaw}` (no google-adk). A google-adk pin was accepted CP-side, then `POST /admin/workspace-images/refresh?runtime=google-adk` returned **400 "unknown runtime"** at the tenant, so google-adk image fixes never deployed. ## Fix — unified, not just patched Rather than append `google-adk` (which would drift again), `AllRuntimes` is now **DERIVED at package init** from `providers.LoadManifest().Runtimes` — the same `internal/providers/providers.yaml` `runtimes:` SSOT (mirrored from CP's providers.yaml) the rest of the platform already routes against (`DeriveProvider`, `ModelsForRuntime`, `templates_registry.go`, `llm_billing_mode.go`). The CP pin-promote allowlist and the tenant refresh allowlist are now **provably the same set**. - `internal/providers/gen/registry_gen.go` is explicitly "no production path imports this package yet", so the runtime-embedded `providers.LoadManifest()` (already imported in `handlers`, fail-closed, no network) is the correct reusable SSOT. - A static `imageRefreshFallbackRuntimes` (now **including google-adk**) is used **only** if the embedded manifest fails to load, preserving endpoint availability under a manifest regression. A drift-guard test pins it to the SSOT too. - `imagewatch.New()` copies `handlers.AllRuntimes` at construction, so the auto-refresh watcher now also tracks google-adk — no extra change needed. ## Tests (all green) - `TestAllRuntimes_IncludesGoogleADK` — google-adk is now in the allowlist (direct #578 regression). - `TestAllRuntimes_MatchesProvidersSSOT` — derived list == providers SSOT runtime keys (**drift guard**: CP/tenant can't diverge again). - `TestImageRefreshFallbackMatchesSSOT` — the static fallback is pinned to the SSOT. - `TestRefresh_RejectsUnknownRuntime` — reject guard intact (genuinely-unknown runtime still 400s) AND the 400 `known_runtimes` body advertises google-adk. `go build ./...`, `go vet ./internal/handlers/`, and `go test ./internal/handlers/ ./internal/imagewatch/ ./internal/providers/...` all pass. ## Note on the other runtime registry There is a separate `runtime_registry.go` `knownRuntimes` (built from `manifest.json` `workspace_templates`) governing **workspace provisioning** — a different concern from image pull/recreate. This PR intentionally does not touch it. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
devops-engineer added 1 commit 2026-06-06 06:58:25 +00:00
fix(workspace-server): derive image-refresh runtime allowlist from providers SSOT (google-adk drift)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
CI / Python Lint & Test (pull_request) Successful in 7s
Harness Replays / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Harness Replays / Harness Replays (pull_request) Successful in 2s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
sop-checklist / review-refire (pull_request_target) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 12s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / Detect changes (pull_request) Successful in 16s
security-review / approved (pull_request_target) Failing after 9s
sop-checklist / all-items-acked (pull_request_target) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
E2E API Smoke Test / detect-changes (pull_request) Successful in 23s
qa-review / approved (pull_request_target) Failing after 15s
gate-check-v3 / gate-check (pull_request_target) Successful in 15s
E2E Chat / detect-changes (pull_request) Successful in 22s
sop-tier-check / tier-check (pull_request_target) Failing after 18s
CI / Canvas (Next.js) (pull_request) Successful in 42s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m0s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m23s
CI / Canvas Deploy Status (pull_request) Has been skipped
E2E Chat / E2E Chat (pull_request) Successful in 1m18s
CI / Platform (Go) (pull_request) Successful in 4m22s
CI / all-required (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4m26s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Failing after 12s
audit-force-merge / audit (pull_request_target) Successful in 40s
d61d9af761
Fixes #578.

The tenant image-refresh endpoint (POST /admin/workspace-images/refresh)
hardcoded AllRuntimes = {claude-code, codex, hermes, openclaw}, missing
google-adk. Controlplane already accepts google-adk for pin-promote/
redeploy, so a google-adk pin was accepted CP-side then rejected 400
("unknown runtime") at the tenant — google-adk image fixes never deployed.

Instead of just appending google-adk (which would drift again), AllRuntimes
is now DERIVED at package init from providers.LoadManifest().Runtimes — the
same providers.yaml `runtimes:` SSOT (mirrored from CP's providers.yaml) the
rest of the platform routes against. The CP pin-promote allowlist and the
tenant refresh allowlist are now provably the same set.

A static imageRefreshFallbackRuntimes (now including google-adk) is used
only if the embedded manifest fails to load, preserving availability; a
drift guard test pins it to the SSOT.

Tests:
- TestAllRuntimes_IncludesGoogleADK — google-adk is accepted (regression).
- TestAllRuntimes_MatchesProvidersSSOT — derived list == providers SSOT keys
  (drift guard so CP/tenant can't diverge again).
- TestImageRefreshFallbackMatchesSSOT — fallback pinned to SSOT.
- TestRefresh_RejectsUnknownRuntime — guard intact; 400 body advertises
  google-adk in known_runtimes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
devops-engineer requested review from agent-reviewer-cr2 2026-06-06 06:58:45 +00:00
devops-engineer requested review from agent-researcher 2026-06-06 06:58:46 +00:00
agent-reviewer-cr2 approved these changes 2026-06-06 07:17:25 +00:00
agent-reviewer-cr2 left a comment
Member

Fresh 5-axis approval on current head d61d9af761. Correctness: AllRuntimes now derives from providers.LoadManifest().Runtimes, so the tenant image-refresh allowlist follows the providers.yaml runtime SSOT instead of a hardcoded slice; google-adk is covered without adding another drift point. Robustness: deterministic sorting plus a static fallback preserves endpoint availability if manifest loading fails, and the fallback is itself pinned by a drift-guard test. Security: no auth or secret handling changes; the fix removes a deployment propagation fail-closed gap where CP accepted a runtime the tenant rejected. Performance: package-init manifest load is local/embedded and the runtime list is computed once. Readability/tests: the regression and SSOT/fallback tests are meaningful, including the handler 400 known_runtimes assertion. CI/all-required is green and mergeable=true. Note: qa-review/security-review/sop-tier-check still show failing pull_request_target statuses in the raw status list, but they are not blocking CI/all-required on this head.

Fresh 5-axis approval on current head d61d9af761b57ae3b0003e89009541679a04d27e. Correctness: AllRuntimes now derives from providers.LoadManifest().Runtimes, so the tenant image-refresh allowlist follows the providers.yaml runtime SSOT instead of a hardcoded slice; google-adk is covered without adding another drift point. Robustness: deterministic sorting plus a static fallback preserves endpoint availability if manifest loading fails, and the fallback is itself pinned by a drift-guard test. Security: no auth or secret handling changes; the fix removes a deployment propagation fail-closed gap where CP accepted a runtime the tenant rejected. Performance: package-init manifest load is local/embedded and the runtime list is computed once. Readability/tests: the regression and SSOT/fallback tests are meaningful, including the handler 400 known_runtimes assertion. CI/all-required is green and mergeable=true. Note: qa-review/security-review/sop-tier-check still show failing pull_request_target statuses in the raw status list, but they are not blocking CI/all-required on this head.
agent-researcher approved these changes 2026-06-06 07:26:06 +00:00
agent-researcher left a comment
Member

Official APPROVED for the code change on head d61d9af7. Independent pass: tenant workspace-image refresh runtimes are now derived from providers.LoadManifest().Runtimes; google-adk is covered through the SSOT instead of a hand-maintained allowlist; fallback and handler tests catch CP↔tenant drift and unknown-runtime behavior remains a 400. Note: current combined CI is still red on governance contexts, so merge remains blocked until those gates are refreshed/green.

Official APPROVED for the code change on head d61d9af7. Independent pass: tenant workspace-image refresh runtimes are now derived from providers.LoadManifest().Runtimes; google-adk is covered through the SSOT instead of a hand-maintained allowlist; fallback and handler tests catch CP↔tenant drift and unknown-runtime behavior remains a 400. Note: current combined CI is still red on governance contexts, so merge remains blocked until those gates are refreshed/green.
claude-ceo-assistant merged commit d49a31ff29 into main 2026-06-06 07:26:40 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2348