molecule-ai-workspace-templ.../Dockerfile
Molecule AI · core-devops 12dd60413d
Some checks failed
CI / validate (push) Blocked by required conditions
CI / Template validation (static) (push) Successful in 2m5s
CI / Adapter unit tests (push) Successful in 1m57s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 17s
CI / Template validation (static) (pull_request) Successful in 1m44s
CI / Adapter unit tests (pull_request) Successful in 1m49s
CI / Template validation (runtime) (push) Successful in 12m24s
CI / T4 tier-4 conformance (live) (push) Failing after 12m20s
CI / Template validation (runtime) (pull_request) Successful in 9m27s
CI / T4 tier-4 conformance (live) (pull_request) Successful in 8m59s
CI / validate (pull_request) Successful in 16s
feat(claude-code): T4 host-root escalation leg + real tier-4 conformance gate (RFC internal#456 §9-11)
T4 currently ships only the provisioner privileged-container shape;
the in-image uid-1000 agent has NO wired path to host root inside
--privileged --pid=host -v /:/host (--privileged grants caps to root,
not uid-1000; root:docker 0660 docker.sock unusable). This adds the
ADDITIVE escalation leg, preserving the uid-1000 + agent-owned-token
contract:

- Dockerfile: bake sudo + util-linux(nsenter) + docker.io CLI;
  /etc/sudoers.d/agent-t4 `agent ALL=(ALL) NOPASSWD:ALL` (0440,
  visudo-validated at build); `agent` in `docker` group. useradd
  -u 1000 + `exec gosu agent` UNCHANGED — agent stays uid-1000.
- entrypoint.sh: document the agent-owned-token half of the §10
  atomic co-sequencing contract on the existing `chown -R agent
  /configs` (token ownership NOT regressed).
- ci.yml: new `t4-conformance` job — NOT a string-match. Builds the
  real image, runs it under the EXACT controlplane tier-4 flags, and
  asserts on the RUNNING container, atomically: (a) the uid-1000
  agent attains host root (sudo nsenter --target 1 + host-fs
  write/readback through /host) AND (b) /configs/.auth_token
  owner_uid==1000. Wired into the required `validate` aggregator and
  fails closed (no skip except fork-PR short-circuit).

RFC internal#456 §9-11 / PR#474. Atomic per §10: uid-1000 enforcement
and the escalation leg ship in this one image revision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 11:44:43 -07:00

130 lines
6.9 KiB
Docker

FROM python:3.11-slim
# System deps — curl/gosu/node/npm for the runtime; git + gh for agent
# autonomy (agents run `gh issue list`, `gh issue create`, `gh issue edit
# --add-assignee`, `git clone`, etc. per their idle/cron prompts).
# Without these the team's claim-and-ship loop silently returns
# "(no response generated)" because tools error out.
#
# T4 escalation leg (RFC internal#456 §9 / PR#474):
# sudo + util-linux(nsenter) + docker.io(CLI) are baked here so the
# uid-1000 `agent` (see useradd below — UNCHANGED, agent stays
# uid-1000) has a wired, audited path to host root inside the
# provisioner's `--privileged --pid=host -v /:/host
# -v /var/run/docker.sock:/var/run/docker.sock` container. Without
# sudo, a uid-1000 process in --privileged CANNOT nsenter/chroot
# /host (--privileged grants caps to root, not uid-1000) and cannot
# use the root:docker 0660 docker.sock — T4 would be
# provisioner-shape-only (the documented ABSENT-escalation-leg gap).
# The sudoers drop-in + docker-group add are below, after useradd,
# so `agent` exists. This is ADDITIVE: it does NOT change the agent
# uid and does NOT change /configs token ownership (still uid-1000,
# enforced by entrypoint.sh + the Layer-3 conformance gate).
RUN apt-get update && apt-get install -y --no-install-recommends \
curl gosu nodejs npm ca-certificates git sudo util-linux docker.io \
&& install -m 0755 -d /etc/apt/keyrings \
&& curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
&& chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" > /etc/apt/sources.list.d/github-cli.list \
&& apt-get update && apt-get install -y --no-install-recommends gh \
&& rm -rf /var/lib/apt/lists/*
# Install claude-code CLI via npm
RUN npm install -g @anthropic-ai/claude-code 2>/dev/null || true
# Create agent user — UNCHANGED. The agent runs as uid-1000; the T4
# escalation leg below is additive and does NOT promote the agent to
# root. claude-code still refuses --dangerously-skip-permissions as
# root, and /configs/.auth_token must stay agent-owned (Hermes
# list_peers 401 class — RFC internal#456 §10).
RUN useradd -u 1000 -m -s /bin/bash agent
# --- T4 escalation leg (RFC internal#456 §9.3 / PR#474) ---
# Wired path: uid-1000 agent -> host root inside the provisioner's
# --privileged --pid=host -v /:/host -v docker.sock container.
# 1. NOPASSWD sudoers drop-in (mode 0440, visudo-validated at build
# so a malformed sudoers can never ship a broken-sudo image).
# 2. agent in the `docker` group so the bind-mounted root:docker
# 0660 /var/run/docker.sock is usable without sudo.
# Atomic co-sequencing (RFC §10): this ships in the SAME image
# revision as the uid-1000 + agent-owned-token entrypoint contract;
# the Layer-3 conformance gate asserts BOTH on the running container.
RUN set -eux; \
printf 'agent ALL=(ALL) NOPASSWD:ALL\n' > /etc/sudoers.d/agent-t4; \
chmod 0440 /etc/sudoers.d/agent-t4; \
visudo -cf /etc/sudoers.d/agent-t4; \
groupadd -f docker; \
usermod -aG docker agent; \
id agent
WORKDIR /app
# RUNTIME_VERSION is forwarded from the reusable publish workflow as
# a docker build-arg. When set (cascade-triggered builds), it's the
# exact runtime version PyPI just published. Including it as an ARG
# changes the cache key for the pip install layer below — without
# this, identical Dockerfile + identical requirements.txt content
# would let docker reuse the cached layer with the previous version
# baked in (the cache trap that bit us 5x on 2026-04-27).
# Empty default = falls back to whatever requirements.txt resolves to.
ARG RUNTIME_VERSION=
# Install Python deps. The RUNTIME_VERSION ARG is a no-op argument to
# the RUN command itself but its presence as a declared ARG above
# means buildx hashes it into the cache key.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
if [ -n "${RUNTIME_VERSION}" ]; then \
pip install --no-cache-dir --upgrade "molecule-ai-workspace-runtime==${RUNTIME_VERSION}"; \
fi
# Copy adapter code
COPY adapter.py .
COPY __init__.py .
# Provider registry. The adapter's _load_providers walks 4 paths:
# 1. /opt/adapter/config.yaml — provisioner-managed canonical
# 2. os.path.dirname(__file__)/config.yaml — alongside adapter.py (this image)
# 3. ${WORKSPACE_CONFIG_PATH}/config.yaml — workspace per-instance overrides
# 4. _BUILTIN_PROVIDERS — oauth + anthropic-api only
# On this image /opt/adapter/ is never populated by the platform
# provisioner, so path 2 (/app/config.yaml) is the load-bearing one.
# Without this COPY the file isn't in the image, all 3 file paths fail,
# and _load_providers falls through to _BUILTIN_PROVIDERS — every
# MiniMax/GLM/Kimi/DeepSeek model silently routes to anthropic-oauth →
# "Not logged in. Please run /login" at first LLM call. Caused the
# canary's 38h chronic red on 2026-05-07/08 (molecule-core#129).
COPY config.yaml .
# Adapter-specific executor — owned by THIS template (universal-runtime
# refactor, molecule-core task #87). Lives alongside adapter.py so
# Python's import system picks the local /app/claude_sdk_executor.py
# before the same-named module that older molecule-runtime versions
# also shipped under site-packages. Once molecule-core drops the file
# from its workspace/ package and bumps the runtime PyPI version, the
# template will be the sole source of truth.
COPY claude_sdk_executor.py .
# Set the adapter module for runtime discovery
ENV ADAPTER_MODULE=adapter
# Git credential helper + background refresh daemon — fix for #1933 / #1866 / #547.
# Without these, GH_TOKEN injected at provision time expires after ~60 min
# and every subsequent git push/clone returns 401, causing agents to
# infinite-loop status reports back to PMs and overflow A2A queues.
#
# The helper hits the platform's /admin/github-installation-token endpoint
# (and falls back to env-var GH_TOKEN when platform is unreachable). The
# refresh daemon calls _refresh_gh every ~45 min so `gh` CLI auth and the
# helper cache stay warm even when no git operation triggers a refresh.
COPY scripts/molecule-git-token-helper.sh /app/scripts/molecule-git-token-helper.sh
COPY scripts/molecule-gh-token-refresh.sh /app/scripts/molecule-gh-token-refresh.sh
RUN chmod +x /app/scripts/molecule-git-token-helper.sh /app/scripts/molecule-gh-token-refresh.sh
# Drop-priv entrypoint — claude-code refuses --dangerously-skip-permissions
# as root, so we run molecule-runtime as the agent user (uid 1000).
# The script handles volume-ownership fix + session-dir symlink before
# exec'ing via gosu.
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]