molecule-ai-workspace-templ.../Dockerfile
rabbitblood d4ab584deb fix: wire up GitHub App token refresh — fixes #1933
Symptoms before this PR:
- After ~60 min of workspace uptime, every git push/clone returns 401
- PMM, DevRel, Social Media Brand and other content agents infinite-loop
  status reports back to PMs ("I tried, GH_TOKEN dead")
- PM A2A queues overflow with retry-status messages (depth 27 on Marketing
  Lead, 18 on Dev Lead, 11 on Core Platform Lead at peak)

Root cause:
- GH_TOKEN/GITHUB_TOKEN injected at provision time has a ~60 min TTL
  (GitHub App installation tokens cap at one hour)
- Workspace env is frozen at container start — no in-process mechanism
  to refresh after expiry
- The credential-helper architecture exists in the codebase but was
  never wired up at template boot. Specifically the claude-code template:
  - did not COPY the helper scripts into the image
  - did not configure git credential.helper at boot
  - did not start the background refresh daemon
  - did not run initial gh auth login

Fix:
1. Dockerfile COPYs scripts/molecule-git-token-helper.sh and
   scripts/molecule-gh-token-refresh.sh into /app/scripts/
2. entrypoint.sh (root half) configures git credential helper for
   github.com and creates the per-user token cache directory
3. entrypoint.sh (agent half) starts the refresh daemon under a
   respawn loop and runs initial `gh auth login --with-token`

The helper hits the platform's /admin/github-installation-token endpoint
(fallback to env-var GH_TOKEN when platform unreachable). The refresh
daemon calls _refresh_gh every ~45 min ± 2 min jitter so cli auth and
helper cache stay warm even when no git operation triggers a refresh.

Acceptance:
- After this image deploys, `gh api /user` from inside a workspace
  should keep returning 200 even after >60 min uptime
- Marketing Lead / Dev Lead a2a queues should drain to <5 within one
  cycle of the new image rolling

Follow-up issues to file (not in this PR):
- Replicate this wiring in the other 7 template repos (autogen, crewai,
  deepagents, gemini-cli, hermes, langgraph, openclaw)
- Lift the wiring into the molecule-runtime PyPI package so future
  templates inherit it instead of re-implementing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:57:30 -07:00

56 lines
2.6 KiB
Docker

FROM python:3.11-slim
# System deps — curl/gosu/node/npm for the runtime; git + gh for agent
# autonomy (agents run `gh issue list`, `gh issue create`, `gh issue edit
# --add-assignee`, `git clone`, etc. per their idle/cron prompts).
# Without these the team's claim-and-ship loop silently returns
# "(no response generated)" because tools error out.
RUN apt-get update && apt-get install -y --no-install-recommends \
curl gosu nodejs npm ca-certificates git \
&& install -m 0755 -d /etc/apt/keyrings \
&& curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
&& chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" > /etc/apt/sources.list.d/github-cli.list \
&& apt-get update && apt-get install -y --no-install-recommends gh \
&& rm -rf /var/lib/apt/lists/*
# Install claude-code CLI via npm
RUN npm install -g @anthropic-ai/claude-code 2>/dev/null || true
# Create agent user
RUN useradd -u 1000 -m -s /bin/bash agent
WORKDIR /app
# Install Python deps
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy adapter code
COPY adapter.py .
COPY __init__.py .
# Set the adapter module for runtime discovery
ENV ADAPTER_MODULE=adapter
# Git credential helper + background refresh daemon — fix for #1933 / #1866 / #547.
# Without these, GH_TOKEN injected at provision time expires after ~60 min
# and every subsequent git push/clone returns 401, causing agents to
# infinite-loop status reports back to PMs and overflow A2A queues.
#
# The helper hits the platform's /admin/github-installation-token endpoint
# (and falls back to env-var GH_TOKEN when platform is unreachable). The
# refresh daemon calls _refresh_gh every ~45 min so `gh` CLI auth and the
# helper cache stay warm even when no git operation triggers a refresh.
COPY scripts/molecule-git-token-helper.sh /app/scripts/molecule-git-token-helper.sh
COPY scripts/molecule-gh-token-refresh.sh /app/scripts/molecule-gh-token-refresh.sh
RUN chmod +x /app/scripts/molecule-git-token-helper.sh /app/scripts/molecule-gh-token-refresh.sh
# Drop-priv entrypoint — claude-code refuses --dangerously-skip-permissions
# as root, so we run molecule-runtime as the agent user (uid 1000).
# The script handles volume-ownership fix + session-dir symlink before
# exec'ing via gosu.
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]