molecule-core/workspace-server/Dockerfile.tenant
Hongming Wang 89d9470ba4 feat(terminal): remote path via aws ec2-instance-connect + pty
Closes the last CP-provisioned-workspace gap: Terminal tab now works
for workspaces running on separate EC2 instances. Follow-up to
#1531 which added instance_id persistence.

How it works:
- HandleConnect checks workspaces.instance_id
- Empty → existing local Docker path (unchanged)
- Set   → spawn `aws ec2-instance-connect ssh --connection-type eice
          --instance-id X --os-user ec2-user -- docker exec -it ws-Y
          /bin/bash` under creack/pty, bridge pty ↔ canvas WebSocket
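
A minimal sketch of the routing rule and argv shape described above. It is not the actual handler: `chooseRoute`, `eicArgv`, the `route` type, and the sample IDs are hypothetical names chosen for illustration, and the flags are copied verbatim from this commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// route names which terminal backend a connection should use.
// Hypothetical type; the real handler may model this differently.
type route string

const (
	routeLocal  route = "local-docker" // existing Docker path
	routeRemote route = "remote-eice"  // aws ec2-instance-connect path
)

// chooseRoute applies the rule from the commit message:
// empty instance_id -> local Docker, otherwise -> remote EC2 instance.
func chooseRoute(instanceID string) route {
	if instanceID == "" {
		return routeLocal
	}
	return routeRemote
}

// eicArgv builds the argv for the remote shell. The flags mirror the
// command quoted in the commit message; the "ws-" prefix follows its
// `ws-Y` container naming.
func eicArgv(instanceID, workspaceID string) []string {
	return []string{
		"aws", "ec2-instance-connect", "ssh",
		"--connection-type", "eice",
		"--instance-id", instanceID,
		"--os-user", "ec2-user",
		"--", "docker", "exec", "-it", "ws-" + workspaceID, "/bin/bash",
	}
}

func main() {
	fmt.Println(chooseRoute(""))
	fmt.Println(chooseRoute("i-0123456789abcdef0"))
	fmt.Println(strings.Join(eicArgv("i-0123456789abcdef0", "42"), " "))
}
```

Keeping the argv builder a pure function makes the argv-shape regression guard a plain slice comparison, with no subprocess involved.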

Why subprocess AWS CLI instead of native AWS SDK:
- EIC Endpoint tunnel needs a signed WebSocket with specific framing
- aws-cli v2 implements it correctly; reimplementing in Go is ~500
  lines of crypto + WS protocol work for zero user-visible benefit
- Tenant image picks up 1MB of aws-cli + openssh-client via apk

Handler design:
- sshCommandFactory is a var so tests can stub it (no real aws calls)
- Context cancellation propagates both ways (WS close → kill ssh;
  ssh exit → close WS)
- User-visible error points at docs/infra/workspace-terminal.md when
  EIC wiring is incomplete (common bootstrap failure)
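
The stubbable factory and two-way cancellation could look roughly like the sketch below. It uses only the standard library, so the creack/pty bridge and the WebSocket are left out: a channel (`wsClosed`) stands in for the WebSocket closing. Apart from `sshCommandFactory`, every name here (`runSession`, `runDemo`, `errClientClosed`) is hypothetical, and the demo assumes a Unix-like system with `sleep` and `true` on PATH.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
)

// errClientClosed reports that the WebSocket side ended the session.
var errClientClosed = errors.New("websocket closed by client")

// sshCommandFactory is a package-level var so tests can replace it
// and never invoke the real aws CLI.
var sshCommandFactory = func(ctx context.Context, instanceID, container string) *exec.Cmd {
	return exec.CommandContext(ctx, "aws", "ec2-instance-connect", "ssh",
		"--connection-type", "eice",
		"--instance-id", instanceID,
		"--os-user", "ec2-user",
		"--", "docker", "exec", "-it", container, "/bin/bash")
}

// runSession starts the command and returns as soon as either side ends.
func runSession(parent context.Context, instanceID, container string, wsClosed <-chan struct{}) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	cmd := sshCommandFactory(ctx, instanceID, container)
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("start remote shell: %w", err)
	}

	exited := make(chan error, 1)
	go func() { exited <- cmd.Wait() }()

	select {
	case <-wsClosed:
		cancel() // WS close -> CommandContext kills the process
		<-exited // reap the child before returning
		return errClientClosed
	case err := <-exited:
		return err // command exit -> caller closes the WS
	}
}

// runDemo stubs the factory with a harmless command and runs one session.
func runDemo(closeWS bool, name string, args ...string) error {
	orig := sshCommandFactory
	defer func() { sshCommandFactory = orig }()
	sshCommandFactory = func(ctx context.Context, _, _ string) *exec.Cmd {
		return exec.CommandContext(ctx, name, args...)
	}

	wsClosed := make(chan struct{})
	if closeWS {
		close(wsClosed)
	}
	return runSession(context.Background(), "i-0123456789abcdef0", "ws-42", wsClosed)
}

func main() {
	fmt.Println("client closed first:", runDemo(true, "sleep", "30"))
	fmt.Println("command exited first:", runDemo(false, "true"))
}
```

Because the factory receives the session context, cancelling that context is enough to tear down the subprocess. A stub only needs to return any `*exec.Cmd` bound to the same context.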

Tests:
- TestHandleConnect_RoutesToRemote — instance_id in DB → CP branch
- TestHandleConnect_RoutesToLocal — empty instance_id → local branch
- TestSshCommandFactory_BuildsEICCommand — argv shape regression guard

Dockerfile.tenant: + openssh-client + aws-cli (installed via apk from Alpine's package repos)

Refs: #1528, #1531

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:13:29 -07:00


# Dockerfile.tenant — combined platform (Go) + canvas (Next.js) image.
#
# Serves both the API (Go on :8080) and the UI (Node.js on :3000) in a
# single container. Go reverse-proxies unknown routes to canvas.
#
# Templates are cloned from standalone GitHub repos at build time so the
# monorepo doesn't need to carry them. The repos are public; no auth.
#
# Build context: repo root.
#
#   docker buildx build --platform linux/amd64 \
#     -f workspace-server/Dockerfile.tenant \
#     -t registry.fly.io/molecule-tenant:latest \
#     --push .
# ── Stage 1: Go platform binary ──────────────────────────────────────
FROM golang:1.25-alpine AS go-builder
WORKDIR /app
COPY molecule-ai-plugin-github-app-auth/ /plugin/
COPY workspace-server/go.mod workspace-server/go.sum ./
RUN echo 'replace github.com/Molecule-AI/molecule-ai-plugin-github-app-auth => /plugin' >> go.mod
RUN go mod download
COPY workspace-server/ .
RUN CGO_ENABLED=0 GOOS=linux go build -o /platform ./cmd/server
# ── Stage 2: Canvas Next.js standalone ────────────────────────────────
FROM node:20-alpine AS canvas-builder
WORKDIR /canvas
COPY canvas/package.json canvas/package-lock.json* ./
RUN npm install
COPY canvas/ .
ARG NEXT_PUBLIC_PLATFORM_URL=""
ARG NEXT_PUBLIC_WS_URL=""
ENV NEXT_PUBLIC_PLATFORM_URL=$NEXT_PUBLIC_PLATFORM_URL
ENV NEXT_PUBLIC_WS_URL=$NEXT_PUBLIC_WS_URL
RUN npm run build
# ── Stage 3: Clone templates + plugins from manifest.json ─────────────
FROM alpine:3.20 AS templates
RUN apk add --no-cache git jq
COPY manifest.json /manifest.json
COPY scripts/clone-manifest.sh /scripts/clone-manifest.sh
RUN chmod +x /scripts/clone-manifest.sh && /scripts/clone-manifest.sh /manifest.json /workspace-configs-templates /org-templates /plugins
# ── Stage 4: Runtime ──────────────────────────────────────────────────
FROM node:20-alpine
RUN apk add --no-cache ca-certificates git tzdata openssh-client aws-cli
# Non-root runtime for both processes in this image.
# USER canvas (set at the end of this file) makes entrypoint.sh, the
# Node.js canvas process, and the Go binary all start as the 'canvas'
# user by default. Any root-only step in entrypoint.sh (fixing volume
# ownership, then dropping to 'canvas' via su-exec / setpriv) can only
# take effect when the container is explicitly started as root.
#
# node:20-alpine ships with uid/gid 1000 already taken by `node`. Delete
# it first so we can recreate `canvas` at the same uid/gid without
# conflict. Previously, plain addgroup/adduser at 1000 failed with
# "group 'node' in use", which blocked the tenant image build for hours
# on 2026-04-21. Picking a different uid would break mounted volumes
# that expect 1000, so we keep the slot and rename the user.
RUN deluser --remove-home node 2>/dev/null || true; \
    delgroup node 2>/dev/null || true; \
    addgroup -g 1000 canvas && adduser -u 1000 -G canvas -s /bin/sh -D canvas
# Go platform binary
COPY --from=go-builder /platform /platform
COPY workspace-server/migrations /migrations
# Templates + plugins (cloned from GitHub in stage 3)
COPY --from=templates /workspace-configs-templates /workspace-configs-templates
COPY --from=templates /org-templates /org-templates
COPY --from=templates /plugins /plugins
# Canvas standalone
WORKDIR /canvas
COPY --from=canvas-builder /canvas/.next/standalone ./
COPY --from=canvas-builder /canvas/.next/static ./.next/static
COPY --from=canvas-builder /canvas/public ./public
COPY workspace-server/entrypoint-tenant.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && \
    chown -R canvas:canvas /canvas /platform /migrations
EXPOSE 8080
# USER canvas makes the container start as the non-root canvas user, so
# entrypoint.sh and the Go binary that replaces it as PID 1 both run
# without root unless the runtime overrides the user.
USER canvas
CMD ["/entrypoint.sh"]