Closes the gap that let issue #2395 ship: redeploy-fleet workflows
reported ssm_status=Success based on the SSM RPC return code alone,
while EC2 tenants silently kept serving the previous :latest digest,
because docker compose up without an explicit pull is a no-op when the
local tag already exists.

Wire:
- new buildinfo package exposes GitSHA, set at link time via -ldflags
  from the GIT_SHA build-arg (default "dev", so test runs without
  ldflags fail closed against an unset deploy)
- router exposes GET /buildinfo returning {git_sha}: public, no auth,
  cheap enough to curl from CI for every tenant
- both Dockerfiles thread GIT_SHA into the Go build
- publish-workspace-server-image.yml passes GIT_SHA=github.sha for both
  images
- redeploy-tenants-on-main.yml + redeploy-tenants-on-staging.yml curl
  each tenant's /buildinfo after the redeploy SSM RPC and fail the
  workflow on digest mismatch; staging treats both :latest and
  :staging-latest as moving tags; verification is skipped only when an
  operator pinned a specific tag via workflow_dispatch

Tests:
- TestGitSHA_DefaultDevSentinel pins the dev default
- TestBuildInfoEndpoint_ReturnsGitSHA pins the wire shape that the
  workflow's jq lookup depends on

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Dockerfile.tenant: combined platform (Go) + canvas (Next.js) image.
#
# Serves both the API (Go on :8080) and the UI (Node.js on :3000) in a
# single container. Go reverse-proxies unknown routes to canvas.
#
# Templates are cloned from standalone GitHub repos at build time so the
# monorepo doesn't need to carry them. The repos are public; no auth.
#
# Build context: repo root.
#
#   docker buildx build --platform linux/amd64 \
#     -f workspace-server/Dockerfile.tenant \
#     --build-arg GIT_SHA=$(git rev-parse HEAD) \
#     -t registry.fly.io/molecule-tenant:latest \
#     --push .

# ── Stage 1: Go platform binary ──────────────────────────────────────
FROM golang:1.25-alpine AS go-builder
WORKDIR /app
COPY molecule-ai-plugin-github-app-auth/ /plugin/
COPY workspace-server/go.mod workspace-server/go.sum ./
RUN echo 'replace github.com/Molecule-AI/molecule-ai-plugin-github-app-auth => /plugin' >> go.mod
RUN go mod download
COPY workspace-server/ .

# GIT_SHA is baked into the binary via -ldflags so /buildinfo can return
# it at runtime. CI passes ${{ github.sha }}; local builds default to
# "dev" so an unset value never reads as a real SHA.
#
# Why this matters: the redeploy verification step compares each tenant's
# /buildinfo against the SHA the workflow expects. If GIT_SHA isn't
# threaded through here, every tenant returns "dev" and the verification
# fails closed, which is the correct fail-direction (#2395 root fix).
ARG GIT_SHA=dev
RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags "-X github.com/Molecule-AI/molecule-monorepo/platform/internal/buildinfo.GitSHA=${GIT_SHA}" \
    -o /platform ./cmd/server

# ── Stage 2: Canvas Next.js standalone ────────────────────────────────
FROM node:20-alpine AS canvas-builder
WORKDIR /canvas
COPY canvas/package.json canvas/package-lock.json* ./
RUN npm install
COPY canvas/ .
ARG NEXT_PUBLIC_PLATFORM_URL=""
ARG NEXT_PUBLIC_WS_URL=""
ENV NEXT_PUBLIC_PLATFORM_URL=$NEXT_PUBLIC_PLATFORM_URL
ENV NEXT_PUBLIC_WS_URL=$NEXT_PUBLIC_WS_URL
RUN npm run build

# ── Stage 3: Clone templates + plugins from manifest.json ─────────────
FROM alpine:3.20 AS templates
RUN apk add --no-cache git jq
COPY manifest.json /manifest.json
COPY scripts/clone-manifest.sh /scripts/clone-manifest.sh
RUN chmod +x /scripts/clone-manifest.sh && /scripts/clone-manifest.sh /manifest.json /workspace-configs-templates /org-templates /plugins

# ── Stage 4: Runtime ──────────────────────────────────────────────────
FROM node:20-alpine
RUN apk add --no-cache ca-certificates git tzdata openssh-client aws-cli

# Non-root runtime. USER canvas (set below) applies to CMD, so
# entrypoint.sh and everything it launches (the Go platform binary and
# the Node.js canvas process) run as the 'canvas' user, uid 1000, unless
# the container runtime overrides the user.
#
# node:20-alpine ships with uid/gid 1000 already taken by `node`. Delete
# it first so we can recreate `canvas` at the same uid/gid without
# conflict. Previously, plain addgroup/adduser at 1000 failed with
# "group 'node' in use", which blocked the tenant image build for hours
# on 2026-04-21. Picking a different uid would break mounted volumes
# that expect 1000, so we keep the slot and rename the user.
RUN deluser --remove-home node 2>/dev/null || true; \
    delgroup node 2>/dev/null || true; \
    addgroup -g 1000 canvas && adduser -u 1000 -G canvas -s /bin/sh -D canvas

# Go platform binary
COPY --from=go-builder /platform /platform
COPY workspace-server/migrations /migrations

# Templates + plugins (cloned from GitHub in stage 3)
COPY --from=templates /workspace-configs-templates /workspace-configs-templates
COPY --from=templates /org-templates /org-templates
COPY --from=templates /plugins /plugins

# Canvas standalone
WORKDIR /canvas
COPY --from=canvas-builder /canvas/.next/standalone ./
COPY --from=canvas-builder /canvas/.next/static ./.next/static
COPY --from=canvas-builder /canvas/public ./public

COPY workspace-server/entrypoint-tenant.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && \
    chown -R canvas:canvas /canvas /platform /migrations

EXPOSE 8080
# USER canvas applies to CMD, so entrypoint.sh starts as the non-root
# canvas user (uid 1000), as does the Go binary that replaces it as PID 1.
USER canvas
CMD ["/entrypoint.sh"]