Symptoms before this PR:
- After ~60 min of workspace uptime, every git push/clone returns 401
- PMM, DevRel, Social Media Brand and other content agents infinite-loop
status reports back to PMs ("I tried, GH_TOKEN dead")
- PM A2A queues overflow with retry-status messages (depth 27 on Marketing
Lead, 18 on Dev Lead, 11 on Core Platform Lead at peak)
Root cause:
- GH_TOKEN/GITHUB_TOKEN injected at provision time has a ~60 min TTL
(GitHub App installation tokens cap at one hour)
- Workspace env is frozen at container start — no in-process mechanism
to refresh after expiry
- The credential-helper architecture exists in the codebase but was
never wired up at template boot. Specifically the claude-code template:
- did not COPY the helper scripts into the image
- did not configure git credential.helper at boot
- did not start the background refresh daemon
- did not run initial gh auth login
Fix:
1. Dockerfile COPYs scripts/molecule-git-token-helper.sh and
scripts/molecule-gh-token-refresh.sh into /app/scripts/
2. entrypoint.sh (root half) configures git credential helper for
github.com and creates the per-user token cache directory
3. entrypoint.sh (agent half) starts the refresh daemon under a
respawn loop and runs initial `gh auth login --with-token`
The helper hits the platform's /admin/github-installation-token endpoint
(fallback to env-var GH_TOKEN when platform unreachable). The refresh
daemon calls _refresh_gh every ~45 min ± 2 min jitter so cli auth and
helper cache stay warm even when no git operation triggers a refresh.
Acceptance:
- After this image deploys, `gh api /user` from inside a workspace
should keep returning 200 even after >60 min uptime
- Marketing Lead / Dev Lead a2a queues should drain to <5 within one
cycle of the new image rolling
Follow-up issues to file (not in this PR):
- Replicate this wiring in the other 7 template repos (autogen, crewai,
deepagents, gemini-cli, hermes, langgraph, openclaw)
- Lift the wiring into the molecule-runtime PyPI package so future
templates inherit it instead of re-implementing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
58 lines
2.0 KiB
Bash
58 lines
2.0 KiB
Bash
#!/bin/bash
|
|
# molecule-gh-token-refresh.sh — background daemon that keeps GitHub
|
|
# credentials fresh inside Molecule AI workspace containers.
|
|
#
|
|
# Started by entrypoint.sh under a respawn wrapper. Every
|
|
# REFRESH_INTERVAL_SEC + jitter (default 45 min ± 2 min) it calls the
|
|
# credential helper's _refresh_gh action.
|
|
#
|
|
# # Jitter
|
|
# A 0..120s random offset prevents 39 containers from synchronizing
|
|
# their refresh requests against /workspaces/:id/github-installation-token.
|
|
#
|
|
# # Security
|
|
# - This daemon NEVER prints token values. Failures log the helper's
|
|
# exit code only, not its stderr, so token bytes can't leak via the
|
|
# docker log pipeline.
|
|
# - The helper script is responsible for chmod 600 on cache files.
|
|
#
|
|
set -uo pipefail
|
|
|
|
HELPER_SCRIPT="${TOKEN_HELPER_SCRIPT:-/app/scripts/molecule-git-token-helper.sh}"
|
|
REFRESH_INTERVAL_SEC="${TOKEN_REFRESH_INTERVAL_SEC:-2700}" # 45 min
|
|
JITTER_MAX_SEC="${TOKEN_REFRESH_JITTER_SEC:-120}"
|
|
INITIAL_DELAY_SEC="${TOKEN_REFRESH_INITIAL_DELAY_SEC:-60}"
|
|
|
|
log() {
|
|
echo "[molecule-gh-token-refresh] $(date -u '+%Y-%m-%dT%H:%M:%SZ') $*" >&2
|
|
}
|
|
|
|
jittered_sleep() {
|
|
local base="$1"
|
|
local jitter=$((RANDOM % (JITTER_MAX_SEC + 1)))
|
|
sleep $((base + jitter))
|
|
}
|
|
|
|
log "starting (interval=${REFRESH_INTERVAL_SEC}s ± ${JITTER_MAX_SEC}s, initial_delay=${INITIAL_DELAY_SEC}s)"
|
|
sleep "${INITIAL_DELAY_SEC}"
|
|
|
|
# Initial refresh — prime the cache + gh auth immediately after boot.
|
|
# Discard helper output to /dev/null so token can't leak via docker logs.
|
|
log "initial token refresh"
|
|
if bash "${HELPER_SCRIPT}" _refresh_gh >/dev/null 2>&1; then
|
|
log "initial refresh succeeded"
|
|
else
|
|
log "initial refresh failed (rc=$?) — will retry in ~${REFRESH_INTERVAL_SEC}s"
|
|
fi
|
|
|
|
# Steady-state loop.
|
|
while true; do
|
|
jittered_sleep "${REFRESH_INTERVAL_SEC}"
|
|
log "periodic token refresh"
|
|
if bash "${HELPER_SCRIPT}" _refresh_gh >/dev/null 2>&1; then
|
|
log "refresh succeeded"
|
|
else
|
|
log "refresh failed (rc=$?) — will retry in ~${REFRESH_INTERVAL_SEC}s"
|
|
fi
|
|
done
|