Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 18s
CI / Detect changes (pull_request) Successful in 17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 18s
Harness Replays / detect-changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 22s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 21s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
qa-review / approved (pull_request) Failing after 13s
security-review / approved (pull_request) Failing after 14s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 25s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m24s
CI / Platform (Go) (pull_request) Has been skipped
CI / Canvas (Next.js) (pull_request) Has been skipped
CI / Shellcheck (E2E scripts) (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Has been skipped
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 1s
sop-checklist-gate / gate (pull_request) Successful in 37s
gate-check-v3 / gate-check (pull_request) Successful in 38s
sop-tier-check / tier-check (pull_request) Successful in 37s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4
audit-force-merge / audit (pull_request) Successful in 8s
The platform server's internal/provisioner/localbuild.go (Task #194 / Issue #63, the post-2026-05-06 GHCR-suspension fallback) shells out via exec.Command("docker", "image", "inspect"/"build"/"tag", ...) in the production dockerHasTagProd / dockerBuildProd / dockerTagProd functions. The colocated workspace-server/Dockerfile installed `ca-certificates git tzdata wget` in the alpine runtime layer but NOT `docker-cli`. As a result, every workspace re-provision in the now-permanent RegistryModeLocal path fails at step 2 (cache check), and then again on the rebuild it falls back to:

    local-build: image inspect for molecule-local/workspace-template-claude-code:<sha> failed (exec: "docker": executable file not found in $PATH); will rebuild
    Provisioner: workspace start failed for <id>: local-build mode: ensure image for runtime "claude-code": local-build: docker build molecule-local/workspace-template-claude-code:<sha>: exec: "docker": executable file not found in $PATH

Net: ANY ws-* container that dies cannot come back up. That covers auto-restart on container-dead, the liveness-monitor RestartByID, plugin auto-restart, secrets-set auto-restart, and manual POST /workspaces/:id/restart. This has already taken down CP-QA (ec6cf05b) and sdk-lead (360d42e4). It also blocks the MiniMax LLM-provider switch for the 6 *-lead workspaces, because that switch requires a postgres UPDATE on workspace_secrets plus POST /restart to re-bake the env from the updated secrets.

The Docker SOCKET is already mounted into the platform container, and entrypoint.sh adds the platform user to the docker group derived from the socket's gid. Only the CLI binary was missing.

Per `registry_mode.go:Resolve()`, MOLECULE_IMAGE_REGISTRY is the toggle:

- set ⇒ RegistryModeSaaS, which pulls from a real registry.
- unset ⇒ RegistryModeLocal, which clones and builds from Gitea.

The env var has been unset since 2026-05-06. GHCR was the only SaaS-mode target, and it has been unreachable since the suspension. RegistryModeLocal is therefore the permanent mode until internal#231 (GHCR→ECR migration) lands.
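The mode toggle and the local-build cache check described above can be approximated in POSIX sh. This is a minimal sketch, not the Go implementation. `resolve_registry_mode`, `ensure_local_image` and the `DOCKER` override are hypothetical names introduced here. The sketch also omits the `docker tag` step that localbuild.go runs.

```sh
#!/bin/sh
# Hypothetical shell approximation of registry_mode.go:Resolve() and the
# local-build cache check in internal/provisioner/localbuild.go.
# Names are illustrative only; the real logic lives in Go.

# Resolve(): MOLECULE_IMAGE_REGISTRY set => SaaS (pull), unset => Local (build).
resolve_registry_mode() {
  if [ -n "${MOLECULE_IMAGE_REGISTRY:-}" ]; then
    echo "saas"
  else
    echo "local"
  fi
}

# Local mode: reuse an existing tag if `docker image inspect` finds it,
# otherwise build it. DOCKER (hypothetical) lets tests swap in a stub binary.
ensure_local_image() {
  image="$1"
  context="$2"
  docker_bin="${DOCKER:-docker}"

  # Fail fast with an explicit message when the CLI is missing from PATH,
  # which is the condition that broke the runtime image.
  if ! command -v "$docker_bin" >/dev/null 2>&1; then
    echo "local-build: '$docker_bin' not found in PATH" >&2
    return 127
  fi

  # Cache check: a zero exit from `image inspect` means the tag exists locally.
  if "$docker_bin" image inspect "$image" >/dev/null 2>&1; then
    echo "cache-hit $image"
    return 0
  fi

  # Cache miss: build the image from the given context under the same tag.
  if ! "$docker_bin" build -t "$image" "$context" >/dev/null; then
    echo "local-build: docker build $image failed" >&2
    return 1
  fi
  echo "built $image"
}
```

The real provisioner issues the same docker subcommands through exec.Command. A missing CLI binary therefore surfaces as the `exec: "docker": executable file not found in $PATH` errors quoted above, not as a Docker daemon error.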
This Dockerfile needs to support the mode the code is permanently in. The diff is +16/-1, mostly a comment explaining why. The single behavioural change is adding `docker-cli` to the apk-add line.

Verification: post-deploy, `POST /workspaces/360d42e4-…/restart` for the known-failed sdk-lead should succeed. It should bring the workspace back up with its current Claude-Opus secrets, which is the first confirmation that the local-build path is unblocked. The MiniMax switch can then proceed: a postgres UPDATE on each *-lead's workspace_secrets, followed by POST /restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
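The post-deploy restart check can be scripted. This is a minimal sketch with two assumptions:

- The platform API is reachable at a base URL you supply. The Dockerfile's `EXPOSE 8080` suggests port 8080, but the host is deployment-specific.
- The endpoint needs no extra auth header. If it does, add it to the curl call.

`restart_workspace` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: POST /workspaces/:id/restart and report whether the
# platform accepted it. Base URL and the absence of auth are assumptions.
restart_workspace() {
  base_url="$1"   # e.g. http://localhost:8080 (assumption)
  ws_id="$2"      # full workspace id
  # -s: quiet; -o /dev/null: discard the body; -w: print only the HTTP status.
  # curl prints 000 when it cannot connect; `|| true` keeps `set -e` callers alive.
  code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$base_url/workspaces/$ws_id/restart") || true
  case "$code" in
    2??)
      echo "restart accepted ($code)"
      return 0
      ;;
    *)
      echo "restart failed ($code)" >&2
      return 1
      ;;
  esac
}
```

For the known-failed sdk-lead, pass its full workspace id, which this description elides as `360d42e4-…`. An accepted restart only means the request was taken. The workspace leaving status=failed is the real confirmation.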
138 lines
6.9 KiB
Docker
# Platform-only image (no canvas). Used by publish-workspace-server-image
# workflow for ECR. Tenant image uses Dockerfile.tenant instead.
#
# Templates + plugins are pre-cloned by scripts/clone-manifest.sh (in CI
# or on the operator host) into .tenant-bundle-deps/, the same pattern as
# Dockerfile.tenant. See that file's header for the full rationale; the
# short version is that post-2026-05-06 every workspace-template-* and
# org-template-* repo on Gitea is private, so an in-image `git clone`
# has no auth path that doesn't leak the Gitea token into a layer.
#
# Build context: repo root, with `.tenant-bundle-deps/` populated by the
# workflow's "Pre-clone manifest deps" step (Task #173).

FROM golang:1.25-alpine@sha256:c4ea15b4a7912716eb362a022e2b12317762eca387423760bc59c0f9ae69423c AS builder
WORKDIR /app
COPY workspace-server/go.mod workspace-server/go.sum ./
# github-app-auth plugin removed 2026-05-07 (#157): per-agent Gitea
# identities replaced the GitHub-App-installation token flow after the
# 2026-05-06 suspension. Pre-removal this stage COPY'd the sibling
# plugin repo + injected a `replace` directive; both are gone.
RUN go mod download
COPY workspace-server/ .
# GIT_SHA build arg mirrors Dockerfile.tenant; see that file for the rationale.
ARG GIT_SHA=dev
RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags "-X github.com/Molecule-AI/molecule-monorepo/platform/internal/buildinfo.GitSHA=${GIT_SHA}" \
    -o /platform ./cmd/server
# Bundle the built-in memory-plugin-postgres binary so an operator can
# activate Memory v2 by setting MEMORY_V2_CUTOVER=true + (default)
# MEMORY_PLUGIN_URL=http://localhost:9100. The entrypoint starts this
# binary in the background (only when MEMORY_V2_CUTOVER=true or
# MEMORY_PLUGIN_URL is set); main /platform talks to it over loopback.
# Otherwise it stays inert.
RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags "-X github.com/Molecule-AI/molecule-monorepo/platform/internal/buildinfo.GitSHA=${GIT_SHA}" \
    -o /memory-plugin ./cmd/memory-plugin-postgres

FROM alpine:3.20@sha256:c64c687cbea9300178b30c95835354e34c4e4febc4badfe27102879de0483b5e
# docker-cli is required by internal/provisioner/localbuild.go, which
# shells out via exec.Command("docker", "image", "inspect"/"build"/"tag", ...)
# whenever Resolve().Mode == RegistryModeLocal. That is the permanent
# mode post-2026-05-06 (Molecule-AI GitHub org suspended → GHCR
# unreachable → MOLECULE_IMAGE_REGISTRY unset → registry_mode.go falls
# through to RegistryModeLocal). Without docker-cli here the platform
# fails every workspace re-provision with `local-build: image inspect
# for molecule-local/workspace-template-<runtime>:<sha> failed
# (exec: "docker": executable file not found in $PATH)` and the
# workspace stays status=failed. The Docker SOCKET is already mounted
# (entrypoint.sh adds the platform user to the docker group); only
# the CLI binary was missing. Caught after sdk-lead + CP-QA went down
# this way during the MiniMax-switch attempt + after-Class-A audit.
# Related: Task #194 / Issue #63 (local-build path added);
# `feedback_workspace_image_ghcr_dead`.
RUN apk add --no-cache ca-certificates docker-cli git tzdata wget
COPY --from=builder /platform /platform
COPY --from=builder /memory-plugin /memory-plugin
COPY workspace-server/migrations /migrations
# Templates + plugins (pre-cloned by scripts/clone-manifest.sh in the
# trusted CI / operator-host context, .git already stripped). The Gitea
# token used to clone them never enters this image (same shape as
# Dockerfile.tenant).
COPY .tenant-bundle-deps/workspace-configs-templates /workspace-configs-templates
COPY .tenant-bundle-deps/org-templates /org-templates
COPY .tenant-bundle-deps/plugins /plugins
# Non-root runtime user with Docker socket access for workspace provisioning.
# The container starts as root so entrypoint.sh can join this user to the
# socket's group, then drops to it via su-exec.
RUN addgroup -g 1000 platform && adduser -u 1000 -G platform -s /bin/sh -D platform
EXPOSE 8080
COPY <<'ENTRY' /entrypoint.sh
#!/bin/sh
# Set up docker-socket group (unchanged from pre-sidecar entrypoint).
if [ -S /var/run/docker.sock ]; then
  SOCK_GID=$(stat -c '%g' /var/run/docker.sock 2>/dev/null || stat -f '%g' /var/run/docker.sock 2>/dev/null)
  if [ -n "$SOCK_GID" ] && [ "$SOCK_GID" != "0" ]; then
    addgroup -g "$SOCK_GID" docker 2>/dev/null || true
    addgroup platform docker 2>/dev/null || true
  else
    addgroup platform root 2>/dev/null || true
  fi
fi

# Memory v2 sidecar (built-in postgres plugin). Co-located with the
# main server so operators flipping MEMORY_V2_CUTOVER=true don't need
# to provision a separate service.
#
# Spawn-gating: only start the sidecar when the operator has indicated
# they want it (either MEMORY_V2_CUTOVER=true OR MEMORY_PLUGIN_URL set)
# and DATABASE_URL is set. Without that signal, the sidecar adds zero
# value (the platform's wiring.go skips building the client too) but
# pays a real cost: the plugin's first migration runs
# `CREATE EXTENSION vector`, which fails on tenant Postgres without
# pgvector preinstalled and aborts container boot via the health gate.
# Caught on staging redeploy 2026-05-05.
#
# Env defaults (when sidecar IS spawned):
#   MEMORY_PLUGIN_DATABASE_URL = $DATABASE_URL (share existing Postgres;
#     plugin's `memory_namespaces` / `memory_records` tables coexist
#     with `agent_memories` and the rest of the platform schema, so
#     there are no conflicts. Operator can override with a separate URL.)
#   MEMORY_PLUGIN_LISTEN_ADDR = :9100 (all interfaces; the health check
#     and /platform reach it via localhost)
#
# Set MEMORY_PLUGIN_DISABLE to any non-empty value (e.g. 1) to force-skip
# the sidecar even with cutover env set (e.g. running the plugin
# externally on a separate host).
memory_plugin_wanted=""
if [ "$MEMORY_V2_CUTOVER" = "true" ] || [ -n "$MEMORY_PLUGIN_URL" ]; then
  memory_plugin_wanted=1
fi
if [ -z "$MEMORY_PLUGIN_DISABLE" ] && [ -n "$memory_plugin_wanted" ] && [ -n "$DATABASE_URL" ]; then
  : "${MEMORY_PLUGIN_DATABASE_URL:=$DATABASE_URL}"
  : "${MEMORY_PLUGIN_LISTEN_ADDR:=:9100}"
  export MEMORY_PLUGIN_DATABASE_URL MEMORY_PLUGIN_LISTEN_ADDR
  echo "memory-plugin: starting sidecar on $MEMORY_PLUGIN_LISTEN_ADDR" >&2
  # Drop privs to the platform user: the plugin doesn't need root and
  # runs unprivileged elsewhere (tenant image already starts as canvas).
  su-exec platform /memory-plugin &
  MEMORY_PLUGIN_PID=$!
  # Poll the plugin's /v1/health up to 30 times, ~1s apart. Boot
  # failure here is fatal: better to crash-loop than to silently
  # serve cutover traffic against a dead plugin.
  # Keep only what follows the last ':' so both ":9100" and
  # "127.0.0.1:9100" yield the bare port.
  health_port=${MEMORY_PLUGIN_LISTEN_ADDR##*:}
  ready=0
  for _ in $(seq 1 30); do
    if wget -qO- --timeout=2 "http://localhost:${health_port}/v1/health" >/dev/null 2>&1; then
      ready=1
      break
    fi
    sleep 1
  done
  if [ "$ready" != "1" ]; then
    echo "memory-plugin: ❌ /v1/health never returned 200 after 30 attempts; aborting boot. Check that DATABASE_URL is reachable, has the pgvector extension, and the plugin's migrations applied." >&2
    kill "$MEMORY_PLUGIN_PID" 2>/dev/null || true
    exit 1
  fi
  echo "memory-plugin: ✅ sidecar healthy on :$health_port" >&2
fi

exec su-exec platform /platform "$@"
ENTRY
RUN chmod +x /entrypoint.sh && apk add --no-cache su-exec
ENTRYPOINT ["/entrypoint.sh"]
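The entrypoint's sidecar logic reduces to three small decisions:

- whether to spawn the sidecar,
- which port to probe,
- how long to wait for it.

The sketch below restates them as testable POSIX sh functions. The function names are hypothetical additions for illustration, as are the injectable probe command, attempt count and interval. The real entrypoint inlines this logic.

```sh
#!/bin/sh
# Hypothetical restatement of the entrypoint's memory-plugin logic.

# Spawn only if wanted (MEMORY_V2_CUTOVER=true or MEMORY_PLUGIN_URL set),
# not disabled (MEMORY_PLUGIN_DISABLE empty), and DATABASE_URL is set.
should_spawn_memory_plugin() {
  wanted=""
  if [ "${MEMORY_V2_CUTOVER:-}" = "true" ] || [ -n "${MEMORY_PLUGIN_URL:-}" ]; then
    wanted=1
  fi
  [ -z "${MEMORY_PLUGIN_DISABLE:-}" ] && [ -n "$wanted" ] && [ -n "${DATABASE_URL:-}" ]
}

# Port to probe: everything after the last ':' of a host:port or :port address.
listen_port() {
  printf '%s\n' "${1##*:}"
}

# Run a probe command until it succeeds or the attempt budget runs out.
# $1 = attempts, $2 = seconds to sleep after each failure, rest = probe command.
wait_until_healthy() {
  attempts="$1"
  interval="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

The following call mirrors the entrypoint's loop: `wait_until_healthy 30 1 wget -qO- --timeout=2 "http://localhost:$(listen_port "$MEMORY_PLUGIN_LISTEN_ADDR")/v1/health"`. The worst case is longer than 30s, because each wget attempt can also spend up to its 2s timeout before the 1s sleep.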