molecule-core/workspace-server/Dockerfile.tenant
Hongming Wang 998e13c4bd feat(deploy): verify each tenant /buildinfo matches published SHA after redeploy
Closes the gap that let issue #2395 ship: redeploy-fleet workflows reported
ssm_status=Success based on SSM RPC return code alone, while EC2 tenants
silently kept serving the previous :latest digest because docker compose up
without an explicit pull is a no-op when the local tag already exists.

Wire:
  - new buildinfo package exposes GitSHA, set at link time via -ldflags from
    the GIT_SHA build-arg (default "dev" so test runs without ldflags fail
    closed against an unset deploy)
  - router exposes GET /buildinfo returning {git_sha} — public, no auth,
    cheap enough to curl from CI for every tenant
  - both Dockerfiles thread GIT_SHA into the Go build
  - publish-workspace-server-image.yml passes GIT_SHA=github.sha for both
    images
  - redeploy-tenants-on-main.yml + redeploy-tenants-on-staging.yml curl each
    tenant's /buildinfo after the redeploy SSM RPC and fail the workflow on
    digest mismatch; staging treats both :latest and :staging-latest as
    moving tags; verification is skipped only when an operator pinned a
    specific tag via workflow_dispatch

Tests:
  - TestGitSHA_DefaultDevSentinel pins the dev default
  - TestBuildInfoEndpoint_ReturnsGitSHA pins the wire shape that the
    workflow's jq lookup depends on

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:55:08 -07:00

101 lines
4.4 KiB
Docker

# Dockerfile.tenant — combined platform (Go) + canvas (Next.js) image.
#
# Serves both the API (Go on :8080) and the UI (Node.js on :3000) in a
# single container. Go reverse-proxies unknown routes to canvas.
#
# Templates are cloned from standalone GitHub repos at build time so the
# monorepo doesn't need to carry them. The repos are public; no auth.
#
# Build context: repo root.
#
# docker buildx build --platform linux/amd64 \
# -f workspace-server/Dockerfile.tenant \
# -t registry.fly.io/molecule-tenant:latest \
# --push .
# ── Stage 1: Go platform binary ──────────────────────────────────────
FROM golang:1.25-alpine AS go-builder
WORKDIR /app
COPY molecule-ai-plugin-github-app-auth/ /plugin/
COPY workspace-server/go.mod workspace-server/go.sum ./
RUN echo 'replace github.com/Molecule-AI/molecule-ai-plugin-github-app-auth => /plugin' >> go.mod
RUN go mod download
COPY workspace-server/ .
# GIT_SHA is baked into the binary via -ldflags so /buildinfo can return
# it at runtime. CI passes ${{ github.sha }}; local builds default to
# "dev" so an unset value never reads as a real SHA.
#
# Why this matters: the redeploy verification step compares each tenant's
# /buildinfo against the SHA the workflow expects. If GIT_SHA isn't
# threaded through here, every tenant returns "dev" and the verification
# fails closed — which is the correct fail-direction (#2395 root fix).
ARG GIT_SHA=dev
RUN CGO_ENABLED=0 GOOS=linux go build \
-ldflags "-X github.com/Molecule-AI/molecule-monorepo/platform/internal/buildinfo.GitSHA=${GIT_SHA}" \
-o /platform ./cmd/server
# ── Stage 2: Canvas Next.js standalone ────────────────────────────────
FROM node:20-alpine AS canvas-builder
WORKDIR /canvas
COPY canvas/package.json canvas/package-lock.json* ./
RUN npm install
COPY canvas/ .
ARG NEXT_PUBLIC_PLATFORM_URL=""
ARG NEXT_PUBLIC_WS_URL=""
ENV NEXT_PUBLIC_PLATFORM_URL=$NEXT_PUBLIC_PLATFORM_URL
ENV NEXT_PUBLIC_WS_URL=$NEXT_PUBLIC_WS_URL
RUN npm run build
# ── Stage 3: Clone templates + plugins from manifest.json ─────────────
FROM alpine:3.20 AS templates
RUN apk add --no-cache git jq
COPY manifest.json /manifest.json
COPY scripts/clone-manifest.sh /scripts/clone-manifest.sh
RUN chmod +x /scripts/clone-manifest.sh && /scripts/clone-manifest.sh /manifest.json /workspace-configs-templates /org-templates /plugins
# ── Stage 4: Runtime ──────────────────────────────────────────────────
FROM node:20-alpine
RUN apk add --no-cache ca-certificates git tzdata openssh-client aws-cli
# Non-root runtime for the Node.js canvas process.
# The Go binary (started by entrypoint.sh) is also non-root — the
# entrypoint runs as root only long enough to set volume ownership,
# then exec's as the 'canvas' user via su-exec / setpriv.
# The Go platform itself drops privileges after init.
#
# node:20-alpine ships with uid/gid 1000 already taken by `node`. Delete
# it first so we can recreate `canvas` at the same uid/gid without
# conflict. Previously plain addgroup/adduser at 1000 failed with
# "group 'node' in use" — blocked the tenant image build for hours
# 2026-04-21. Picking a different uid would break mounted volumes
# that expect 1000, so we keep the slot and rename the user.
RUN deluser --remove-home node 2>/dev/null || true; \
delgroup node 2>/dev/null || true; \
addgroup -g 1000 canvas && adduser -u 1000 -G canvas -s /bin/sh -D canvas
# Go platform binary
COPY --from=go-builder /platform /platform
COPY workspace-server/migrations /migrations
# Templates + plugins (cloned from GitHub in stage 3)
COPY --from=templates /workspace-configs-templates /workspace-configs-templates
COPY --from=templates /org-templates /org-templates
COPY --from=templates /plugins /plugins
# Canvas standalone
WORKDIR /canvas
COPY --from=canvas-builder /canvas/.next/standalone ./
COPY --from=canvas-builder /canvas/.next/static ./.next/static
COPY --from=canvas-builder /canvas/public ./public
COPY workspace-server/entrypoint-tenant.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && \
chown -R canvas:canvas /canvas /platform /migrations
EXPOSE 8080
# entrypoint.sh starts as root to fix volume perms, then drops to
# canvas user. The Go binary (PID 1 replacement) runs as non-root.
USER canvas
CMD ["/entrypoint.sh"]