fix(platform): install docker-cli in workspace-server image — unblocks RegistryModeLocal #765
Merged
hongming merged 1 commit from infra/dockerfile-add-docker-cli-for-local-build into main on 2026-05-13 04:39:20 +00:00
1 commit

| SHA1 | Message |
|---|---|
| b8ccd21c8c | fix(platform): install docker-cli in workspace-server image — unblocks RegistryModeLocal |
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 18s
CI / Detect changes (pull_request) Successful in 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 18s
Harness Replays / detect-changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 22s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 21s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
qa-review / approved (pull_request) Failing after 13s
security-review / approved (pull_request) Failing after 14s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 25s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m24s
CI / Platform (Go) (pull_request) Has been skipped
CI / Canvas (Next.js) (pull_request) Has been skipped
CI / Shellcheck (E2E scripts) (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Has been skipped
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 1s
sop-checklist-gate / gate (pull_request) Successful in 37s
gate-check-v3 / gate-check (pull_request) Successful in 38s
sop-tier-check / tier-check (pull_request) Successful in 37s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
audit-force-merge / audit (pull_request) Successful in 8s
The platform server's `internal/provisioner/localbuild.go` (Task #194 / Issue #63, the fallback added after the 2026-05-06 GHCR suspension) shells out via `exec.Command("docker", "image", "inspect"/"build"/"tag", ...)` in the production `dockerHasTagProd` / `dockerBuildProd` / `dockerTagProd` functions. The colocated `workspace-server/Dockerfile` installed `ca-certificates git tzdata wget` in the alpine runtime layer but NOT `docker-cli`, so every workspace re-provision in the now-permanent RegistryModeLocal path fails at step 2 (cache check):

`local-build: image inspect for molecule-local/workspace-template-claude-code:<sha> failed (exec: "docker": executable file not found in $PATH); will rebuild`
`Provisioner: workspace start failed for <id>: local-build mode: ensure image for runtime "claude-code": local-build: docker build molecule-local/workspace-template-claude-code:<sha>: exec: "docker": executable file not found in $PATH`

Net effect: ANY ws-* container that dies (auto-restart on container-dead, the liveness-monitor RestartByID, plugin auto-restart, secrets-set auto-restart, manual POST /workspaces/:id/restart) cannot come back up. This has already taken down CP-QA (ec6cf05b) and sdk-lead (360d42e4). It also blocks the MiniMax LLM-provider switch for the 6 *-lead workspaces, which requires a postgres UPDATE on workspace_secrets plus POST /restart to re-bake the env from the updated secrets.

The Docker socket is already mounted into the platform container: entrypoint.sh adds the platform user to the docker group derived from the socket's gid. Only the CLI binary was missing.

Per `registry_mode.go:Resolve()`, MOLECULE_IMAGE_REGISTRY is the toggle: set selects RegistryModeSaaS (pull from a real registry); unset selects RegistryModeLocal (clone and build from Gitea). Since 2026-05-06 the env var has been unset (GHCR was the only SaaS-mode target, and it has been unreachable since the suspension), so RegistryModeLocal is the permanent mode until internal#231 (GHCR→ECR migration) lands.
This Dockerfile needs to support the mode the code is permanently in. The diff is +16/-1 (mostly a comment explaining why); the single behavioural change is adding `docker-cli` to the apk-add line.

Verification: post-deploy, `POST /workspaces/360d42e4-…/restart` (the known-failed sdk-lead) should succeed and bring the workspace back up with its current Claude-Opus secrets. That is the first confirmation that the local-build path is unblocked. The MiniMax switch can then proceed (postgres UPDATE on each *-lead's workspace_secrets + POST /restart).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>