Files
molecule-core/workspace-server/Dockerfile
core-devops 1a88608e7c
Block internal-flavored paths / Block forbidden paths (push) Successful in 6s
CI / Python Lint & Test (push) Successful in 5s
Block integration-tester contamination artifacts / Block staging-trigger / invalid manifest contamination (push) Successful in 9s
E2E Peer Visibility (literal MCP list_peers) / detect-changes (push) Successful in 7s
E2E API Smoke Test / detect-changes (push) Successful in 15s
Handlers Postgres Integration / detect-changes (push) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 14s
Harness Replays / detect-changes (push) Successful in 7s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 7s
CI / Detect changes (push) Successful in 17s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (push) Successful in 5s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (push) Has been skipped
E2E Chat / detect-changes (push) Successful in 19s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 4s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 6s
CI / Shellcheck (E2E scripts) (push) Successful in 2s
CI / Canvas (Next.js) (push) Successful in 4s
CI / Canvas Deploy Status (push) Successful in 1s
lint-no-coe-on-required / lint-no-coe-on-required (push) Successful in 21s
template-delivery-e2e / detect-changes (push) Successful in 19s
template-delivery-e2e / Template-asset delivery (fresh seo-agent — config+prompts via asset channel, seo-all via plugin reconcile) (push) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 26s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (stub) (push) Successful in 33s
Harness Replays / Harness Replays (push) Successful in 1m25s
E2E Chat / E2E Chat (push) Successful in 1m36s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 2m19s
Local Provision Lifecycle E2E / Local Provision Lifecycle E2E (real image + MiniMax LLM, advisory) (push) Successful in 2m5s
publish-workspace-server-image / build-and-push (push) Successful in 3m49s
CI / Platform (Go) (push) Successful in 4m14s
CI / all-required (push) Successful in 3s
publish-workspace-server-image / Staging auto-deploy (push) Successful in 37s
publish-workspace-server-image / Production auto-deploy (push) Successful in 3m11s
E2E Staging SaaS (full lifecycle) / pr-validate (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / Prune stale e2e DNS records (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / E2E Staging Platform Boot (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge user_tasks (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / E2E Staging Workspace Requests (core#2606) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Creates Workspace (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge (compile+skip) (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / E2E Staging Concierge Platform Agent (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
E2E Staging SaaS (full lifecycle) / E2E Staging Plugin Install Lifecycle (push) Compensated by status-reaper (push run was cancelled/superseded; Gitea 1.22.6 reports cancelled runs as failure statuses)
workspace-server/Dockerfile: add HEALTHCHECK for /health endpoint (#1261)
Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
Co-committed-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app>
2026-06-23 09:18:38 +00:00

163 lines
8.5 KiB
Docker

# Platform-only image (no canvas). Used by publish-workspace-server-image
# workflow for ECR. Tenant image uses Dockerfile.tenant instead.
#
# Templates + plugins are pre-cloned by scripts/clone-manifest.sh (in CI
# or on the operator host) into .tenant-bundle-deps/ — same pattern as
# Dockerfile.tenant. See that file's header for the full rationale; the
# short version is that post-2026-05-06 every workspace-template-* and
# org-template-* repo on Gitea is private, so an in-image `git clone`
# has no auth path that doesn't leak the Gitea token into a layer.
#
# Build context: repo root, with `.tenant-bundle-deps/` populated by the
# workflow's "Pre-clone manifest deps" step (Task #173).
FROM golang:1.25-alpine@sha256:c4ea15b4a7912716eb362a022e2b12317762eca387423760bc59c0f9ae69423c AS builder
WORKDIR /app
COPY workspace-server/go.mod workspace-server/go.sum ./
# github-app-auth plugin removed 2026-05-07 (#157): per-agent Gitea
# identities replaced the GitHub-App-installation token flow after the
# 2026-05-06 suspension. Pre-removal this stage COPY'd the sibling
# plugin repo + injected a `replace` directive; both are gone.
RUN go mod download
COPY workspace-server/ .
# GIT_SHA mirror of Dockerfile.tenant — see that file for the rationale.
ARG GIT_SHA=dev
# Build flags (RFC#563):
# -trimpath strip absolute build-host paths from the binary
# (also slightly improves reproducibility)
# -ldflags "-s -w" omit symbol table (-s) and DWARF debug info (-w)
# -X ...GitSHA=... preserved — /buildinfo still returns the SHA at
# runtime. -s removes the symbol *table* but not
# -X-injected string vars (they're written into
# static data, not into the symtab).
# Empirical local measurement: ~29% smaller (87→61MB) for /platform.
# Mirrors the pattern already in molecule-controlplane/Dockerfile.
RUN CGO_ENABLED=0 GOOS=linux go build \
-trimpath \
-ldflags "-s -w -X git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/buildinfo.GitSHA=${GIT_SHA}" \
-o /platform ./cmd/server
# Bundle the built-in memory-plugin-postgres binary so an operator can
# activate Memory v2 by setting MEMORY_V2_CUTOVER=true + (default)
# MEMORY_PLUGIN_URL=http://localhost:9100. The entrypoint starts this
# binary in the background; main /platform talks to it over loopback.
# Stays inert until the operator flips the cutover env var.
RUN CGO_ENABLED=0 GOOS=linux go build \
-trimpath \
-ldflags "-s -w -X git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/buildinfo.GitSHA=${GIT_SHA}" \
-o /memory-plugin ./cmd/memory-plugin-postgres
FROM alpine:3.20@sha256:c64c687cbea9300178b30c95835354e34c4e4febc4badfe27102879de0483b5e
# docker-cli + docker-cli-buildx are required by internal/provisioner/
# localbuild.go which shells out via exec.Command("docker", "image",
# "inspect"/"build"/"tag", ...) whenever Resolve().Mode ==
# RegistryModeLocal — which is the permanent mode post-2026-05-06
# (Molecule-AI GitHub org suspended → GHCR unreachable →
# MOLECULE_IMAGE_REGISTRY unset → registry_mode.go falls through to
# RegistryModeLocal). The CLI binary alone is not enough: modern
# Docker (26.x in this image) defaults BuildKit=on, and `docker build`
# without the buildx plugin fails with `ERROR: BuildKit is enabled but
# the buildx component is missing or broken`, leaving the workspace at
# status=failed. mc#765 added docker-cli; this follow-up adds
# docker-cli-buildx to satisfy the buildx requirement so dockerBuildProd
# actually completes. The Docker SOCKET is already mounted (entrypoint.sh
# adds the platform user to the docker group). Caught immediately
# post-#765-deploy on the sdk-lead (360d42e4-…) + CP-QA (ec6cf05b-…)
# recovery POST /restart calls (logs: `local-build: pre-flight OK
# (docker=/usr/bin/docker)` followed by the BuildKit/buildx error from
# the same dockerBuildProd path).
# Related: mc#765 (parent fix), Task #194 / Issue #63 (local-build path
# added); `feedback_workspace_image_ghcr_dead`.
RUN apk add --no-cache ca-certificates docker-cli docker-cli-buildx git tzdata wget
COPY --from=builder /platform /platform
COPY --from=builder /memory-plugin /memory-plugin
COPY workspace-server/migrations /migrations
COPY manifest.json /app/manifest.json
# Templates + plugins (pre-cloned by scripts/clone-manifest.sh in the
# trusted CI / operator-host context, .git already stripped). The Gitea
# token used to clone them never enters this image — same shape as
# Dockerfile.tenant.
COPY .tenant-bundle-deps/workspace-configs-templates /workspace-configs-templates
COPY .tenant-bundle-deps/org-templates /org-templates
COPY .tenant-bundle-deps/plugins /plugins
# Non-root runtime with Docker socket access for workspace provisioning.
RUN addgroup -g 1000 platform && adduser -u 1000 -G platform -s /bin/sh -D platform
EXPOSE 8080
# HEALTHCHECK: probe the /health endpoint so orchestrators and docker's
# health monitoring can detect a crashed or wedged server.
# mc#1158: workspace/Dockerfile has a HEALTHCHECK; workspace-server/Dockerfile
# was missing one, so docker ps never shows (healthy) for this container.
# Interval 30s, timeout 5s, 3 retries, 30s start period (server boot).
HEALTHCHECK --interval=30s --timeout=5s --retries=3 --start-period=30s \
CMD wget -qO- --timeout=5 http://localhost:8080/health || exit 1
COPY <<'ENTRY' /entrypoint.sh
#!/bin/sh
# Set up docker-socket group (unchanged from pre-sidecar entrypoint).
if [ -S /var/run/docker.sock ]; then
SOCK_GID=$(stat -c '%g' /var/run/docker.sock 2>/dev/null || stat -f '%g' /var/run/docker.sock 2>/dev/null)
if [ -n "$SOCK_GID" ] && [ "$SOCK_GID" != "0" ]; then
addgroup -g "$SOCK_GID" docker 2>/dev/null || true
addgroup platform docker 2>/dev/null || true
else
addgroup platform root 2>/dev/null || true
fi
fi
# Memory v2 sidecar (built-in postgres plugin). Co-located with the
# main server so operators flipping MEMORY_V2_CUTOVER=true don't need
# to provision a separate service.
#
# Spawn-gating: only start the sidecar when the operator has indicated
# they want it — either MEMORY_V2_CUTOVER=true OR MEMORY_PLUGIN_URL set.
# Without that signal, the sidecar adds zero value (the platform's
# wiring.go skips building the client too) but pays a real cost: the
# plugin's first migration runs `CREATE EXTENSION vector`, which fails
# on tenant Postgres without pgvector preinstalled and aborts container
# boot via the 30s health gate. Caught on staging redeploy 2026-05-05.
#
# Env defaults (when sidecar IS spawned):
# MEMORY_PLUGIN_DATABASE_URL = $DATABASE_URL (share existing Postgres;
# plugin's `memory_namespaces` / `memory_records` tables coexist
# with `agent_memories` and the rest of the platform schema —
# no conflicts. Operator can override with a separate URL.)
# MEMORY_PLUGIN_LISTEN_ADDR = 127.0.0.1:9100
#
# Set MEMORY_PLUGIN_DISABLE=1 to force-skip the sidecar even with
# cutover env set (e.g. running the plugin externally on a separate host).
memory_plugin_wanted=""
if [ "$MEMORY_V2_CUTOVER" = "true" ] || [ -n "$MEMORY_PLUGIN_URL" ]; then
memory_plugin_wanted=1
fi
if [ -z "$MEMORY_PLUGIN_DISABLE" ] && [ -n "$memory_plugin_wanted" ] && [ -n "$DATABASE_URL" ]; then
: "${MEMORY_PLUGIN_DATABASE_URL:=$DATABASE_URL}"
: "${MEMORY_PLUGIN_LISTEN_ADDR:=:9100}"
export MEMORY_PLUGIN_DATABASE_URL MEMORY_PLUGIN_LISTEN_ADDR
echo "memory-plugin: starting sidecar on $MEMORY_PLUGIN_LISTEN_ADDR" >&2
# Drop privs to the platform user — the plugin doesn't need root and
# runs unprivileged elsewhere (tenant image already starts as canvas).
su-exec platform /memory-plugin &
MEMORY_PLUGIN_PID=$!
# Wait up to 30s for the plugin's /v1/health to return 200. Boot
# failure here is fatal — better to crash-loop than to silently
# serve cutover traffic against a dead plugin.
health_port=${MEMORY_PLUGIN_LISTEN_ADDR#:}
ready=0
for _ in $(seq 1 30); do
if wget -qO- --timeout=2 "http://localhost:${health_port}/v1/health" >/dev/null 2>&1; then
ready=1
break
fi
sleep 1
done
if [ "$ready" != "1" ]; then
echo "memory-plugin: ❌ /v1/health never returned 200 after 30s — aborting boot. Check that DATABASE_URL is reachable, has the pgvector extension, and the plugin's migrations applied." >&2
kill "$MEMORY_PLUGIN_PID" 2>/dev/null || true
exit 1
fi
echo "memory-plugin: ✅ sidecar healthy on :$health_port" >&2
fi
exec su-exec platform /platform "$@"
ENTRY
RUN chmod +x /entrypoint.sh && apk add --no-cache su-exec
ENTRYPOINT ["/entrypoint.sh"]