From b8ccd21c8cf0f772332ee5a63aef13e96b57c755 Mon Sep 17 00:00:00 2001
From: hongming-pc2
Date: Tue, 12 May 2026 14:13:55 -0700
Subject: [PATCH] =?UTF-8?q?fix(platform):=20install=20docker-cli=20in=20wo?=
 =?UTF-8?q?rkspace-server=20image=20=E2=80=94=20unblocks=20RegistryModeLoc?=
 =?UTF-8?q?al?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The platform server's internal/provisioner/localbuild.go (Task #194 /
Issue #63 — the post-2026-05-06 GHCR-suspension fallback) shells out
via exec.Command("docker", "image", "inspect"/"build"/"tag", ...) in
the production dockerHasTagProd / dockerBuildProd / dockerTagProd
functions. The colocated workspace-server/Dockerfile installed
`ca-certificates git tzdata wget` in the alpine runtime layer but NOT
`docker-cli`, so every workspace re-provision in the now-permanent
RegistryModeLocal path fails at step 2 (cache check):

  local-build: image inspect for molecule-local/workspace-template-claude-code: failed (exec: "docker": executable file not found in $PATH); will rebuild
  Provisioner: workspace start failed for : local-build mode: ensure image for runtime "claude-code": local-build: docker build molecule-local/workspace-template-claude-code:: exec: "docker": executable file not found in $PATH

Net: ANY ws-* container that dies (auto-restart on container-dead, the
liveness-monitor RestartByID, plugin auto-restart, secrets-set
auto-restart, manual POST /workspaces/:id/restart) cannot come back up.
Already took down CP-QA (ec6cf05b) and sdk-lead (360d42e4); also blocks
the MiniMax LLM-provider switch for the 6 *-lead workspaces (which
requires postgres UPDATE workspace_secrets + POST /restart to re-bake
the env from the updated secrets).

The Docker SOCKET is already mounted into the platform container — the
entrypoint.sh adds the platform user to the docker group derived from
the socket's gid. Only the CLI binary was missing.

Per `registry_mode.go:Resolve()`, MOLECULE_IMAGE_REGISTRY is the toggle:
set ⇒ RegistryModeSaaS pull from a real registry; unset ⇒
RegistryModeLocal clone+build from Gitea. Since 2026-05-06 the env var
has been unset (GHCR was the only SaaS-mode target and it's unreachable
post-suspension), so RegistryModeLocal is the permanent mode until
internal#231 (GHCR→ECR migration) lands. This Dockerfile needs to
support the mode the code is permanently in.

Diff is +16/-1 (mostly comment explaining why). The single behavioural
change: `docker-cli` added to the apk-add line.

Verification: post-deploy, `POST /workspaces/360d42e4-…/restart` (the
known-failed sdk-lead) should succeed and bring the workspace back up
with its current Claude-Opus secrets — that's the first confirmation
the local-build path is unblocked. Then the MiniMax switch can proceed
(postgres UPDATE on each *-lead's workspace_secrets + POST /restart).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 workspace-server/Dockerfile | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/workspace-server/Dockerfile b/workspace-server/Dockerfile
index b1606e00..ade5812d 100644
--- a/workspace-server/Dockerfile
+++ b/workspace-server/Dockerfile
@@ -35,7 +35,22 @@ RUN CGO_ENABLED=0 GOOS=linux go build \
     -o /memory-plugin ./cmd/memory-plugin-postgres
 
 FROM alpine:3.20@sha256:c64c687cbea9300178b30c95835354e34c4e4febc4badfe27102879de0483b5e
-RUN apk add --no-cache ca-certificates git tzdata wget
+# docker-cli is required by internal/provisioner/localbuild.go which
+# shells out via exec.Command("docker", "image", "inspect"/"build"/"tag", ...)
+# whenever Resolve().Mode == RegistryModeLocal — which is the permanent
+# mode post-2026-05-06 (Molecule-AI GitHub org suspended → GHCR
+# unreachable → MOLECULE_IMAGE_REGISTRY unset → registry_mode.go falls
+# through to RegistryModeLocal). Without docker-cli here the platform
+# fails every workspace re-provision with `local-build: image inspect
+# for molecule-local/workspace-template-: failed
+# (exec: "docker": executable file not found in $PATH)` and the
+# workspace stays status=failed. The Docker SOCKET is already mounted
+# (entrypoint.sh adds the platform user to the docker group) — only
+# the CLI binary was missing. Caught after sdk-lead + CP-QA went down
+# this way during the MiniMax-switch attempt + after-Class-A audit.
+# Related: Task #194 / Issue #63 (local-build path added);
+# `feedback_workspace_image_ghcr_dead`.
+RUN apk add --no-cache ca-certificates docker-cli git tzdata wget
 COPY --from=builder /platform /platform
 COPY --from=builder /memory-plugin /memory-plugin
 COPY workspace-server/migrations /migrations
-- 
2.45.2