fix(workspace-server): restart re-provisions with the switched runtime, not the stale config.yaml #3208
Problem

A workspace switched to a new runtime (e.g. google-adk) is never re-provisioned on that runtime: every Restart silently reverts it to the template default (claude-code). This is the root cause of the "google-adk box is never built" symptom: the molecule-adk-demo google-adk workspace boots a claude-code container and self-rejects with runtime seed mismatch: workspace requested runtime "google-adk" but the seeded config.yaml declares "claude-code".

Root cause

workspace_restart.go → restartRuntimeFromConfig (the function called from the POST /workspaces/:id/restart handler before it builds the provision payload).

The runtime-switch PATCH (workspace_crud.go Update) writes only the workspaces.runtime DB column; it does not write through to the running container's /configs/config.yaml. But on the default Restart path (apply_template=false), restartRuntimeFromConfig read the container's stale, template-default config.yaml runtime, let it win over the switched DB runtime, and even overwrote the DB column back to the stale value (UPDATE workspaces SET runtime = ...). The returned value becomes payload.Runtime → carried into the CP provision request → a claude-code box.

Fix

workspaces.runtime is the SSOT for the workspace runtime. restartRuntimeFromConfig now always returns the DB runtime on the default path. The container config.yaml is read for drift-logging only and never overrides or overwrites the DB (the config volume is re-rendered from the runtime-default template on re-provision anyway).

Tests

TestRestartRuntimeFromConfig_DefaultRestartPreservesContainerRuntime (renamed …TrustsDBRuntime): it codified the buggy "container runtime wins + stomps DB" behavior; it now asserts the DB SSOT wins and the DB is not written.

restart_runtime_ssot_test.go: stale-config drift (DB google-adk wins over config claude-code), apply_template short-circuit, nil-provisioner (SaaS) path, and the no-drift case.

go build ./... + touched tests pass. (Pre-existing unrelated failures in TestManifest_RefPinning_* (network) and TestMCPPluginDeliveryContract_* also fail on clean origin/main.)

🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.8 (1M context) noreply@anthropic.com
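
For readers without the diff open, the rule described under Fix can be sketched roughly as follows. This is an illustrative, self-contained approximation and not the code in workspace_restart.go: the names resolveRestartRuntime, configReader, and topLevelRuntime are hypothetical, the reader function stands in for the provisioner's ExecRead, a nil reader stands in for the nil-provisioner (SaaS) path, and the line-based YAML scan is a deliberate simplification.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"strings"
)

// configReader is a hypothetical stand-in for the provisioner's ExecRead:
// it returns the contents of a file inside the running container.
type configReader func(path string) (string, error)

// resolveRestartRuntime sketches the fixed rule: the DB runtime always wins.
// config.yaml is consulted only to log drift; it never changes the result
// and nothing is written back to the DB.
func resolveRestartRuntime(dbRuntime string, applyTemplate bool, read configReader) string {
	// apply_template and nil-reader (SaaS-style) paths short-circuit to the DB value.
	if applyTemplate || read == nil {
		return dbRuntime
	}
	raw, err := read("/configs/config.yaml")
	if err != nil {
		return dbRuntime // best-effort read: a failure does not block the restart
	}
	if cfg := topLevelRuntime(raw); cfg != "" && cfg != dbRuntime {
		log.Printf("runtime drift: config.yaml=%q db=%q; provisioning with the db value", cfg, dbRuntime)
	}
	return dbRuntime
}

// topLevelRuntime naively extracts a top-level "runtime:" key without a YAML parser.
func topLevelRuntime(raw string) string {
	for _, line := range strings.Split(raw, "\n") {
		if strings.HasPrefix(line, "runtime:") {
			v := strings.TrimSpace(strings.TrimPrefix(line, "runtime:"))
			return strings.Trim(v, `"'`)
		}
	}
	return ""
}

func main() {
	stale := func(string) (string, error) { return "name: demo\nruntime: claude-code\n", nil }
	broken := func(string) (string, error) { return "", errors.New("container not running") }

	fmt.Println(resolveRestartRuntime("google-adk", false, stale))  // drift is logged, DB value is returned
	fmt.Println(resolveRestartRuntime("google-adk", true, stale))   // apply_template short-circuit
	fmt.Println(resolveRestartRuntime("google-adk", false, nil))    // nil reader (SaaS-style path)
	fmt.Println(resolveRestartRuntime("google-adk", false, broken)) // read failure falls back to DB
}
```

The point of the shape is that every return statement yields dbRuntime, so no config.yaml content can reach payload.Runtime; the config read only feeds the drift log.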
The runtime-switch PATCH (workspace_crud.go Update) writes only the workspaces.runtime DB column; it does NOT write through to the running container's /configs/config.yaml. So on a plain Restart, restartRuntimeFromConfig read the container's stale, template-default config.yaml runtime ("claude-code"), let it WIN over the switched DB runtime (e.g. "google-adk"), and even overwrote the DB column back to the stale value. Result: a workspace switched to a new runtime was never re-provisioned on that runtime; every Restart silently reverted it to the template default.

The workspaces.runtime column is the SSOT for the workspace runtime. Make restartRuntimeFromConfig always return the DB runtime on the default (apply_template=false) path. The container config.yaml is now read for drift-logging only and never overrides or overwrites the DB.

Update the existing test that codified the buggy "container runtime wins" behavior to assert the DB SSOT wins, and add restart_runtime_ssot_test.go covering: stale-config-drift (DB wins), apply_template short-circuit, nil-provisioner (SaaS), and the no-drift case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

APPROVED on current head 08a357461c.

5-axis review: Correctness: restartRuntimeFromConfig now treats workspaces.runtime as the SSOT and never lets stale /configs/config.yaml overwrite the DB runtime. It still short-circuits apply_template and nil-provisioner paths to the DB runtime, and only logs config drift when ExecRead sees a mismatch. This matches the runtime-switch behavior: PATCH updates the DB column, not the running container config. Robustness: tests cover stale config losing to DB runtime, apply_template, nil provisioner, matching config, and assert the old DB-stomp UPDATE no longer occurs. Security: no new input, auth, secret, or SSRF surface. Performance: one existing best-effort config read remains; no extra I/O. Readability: comments clearly document the SSOT and the regression. CI Platform/all-required were green; qa/security were pre-review failures expected to re-evaluate after approval.

APPROVED on 08a35746.

Reviewed the live diff against current main: exactly three files changed (workspace_restart.go plus restart-runtime tests), with the production change limited to restartRuntimeFromConfig. The new behavior correctly treats workspaces.runtime from the DB as the runtime SSOT for plain restarts, returns dbRuntime for apply_template and nil-provisioner paths, and only reads /configs/config.yaml for best-effort drift logging. The previous behavior that let stale container config.yaml override/stomp the DB runtime is removed, so switched runtimes such as google-adk are re-provisioned with the DB-selected runtime.

5-axis review: correctness is covered by regression tests for stale config drift, apply_template short-circuit, nil provisioner, and matching config. Robustness improves because stale config can no longer revert runtime-passing; other runtimes still flow through the same dbRuntime payload path. Security risk is low: no new auth/input surface, no secret handling, only a drift log of runtime names. Performance impact is unchanged/small because the existing best-effort ExecRead remains and no DB write is performed. Readability is clear and localized. CI note: CI / Platform (Go) and CI / all-required are green on 08a35746; remaining failures are review-gate/SOP body contexts, not code-test failures.