fix(github): refresh installation token when TTL < 10 min (#547) (#567)

Root cause: the github-app-auth plugin injects GH_TOKEN + GITHUB_TOKEN
into each workspace container's env at provision time (EnvMutator). Those
are GitHub App installation tokens with a fixed ~60 min TTL. The plugin
has an in-process cache that proactively refreshes 5 min before expiry —
but the workspace env is set once at container start and never updated.
Any workspace alive >60 min ends up with an expired token.
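
To make the failure mode concrete, here is a minimal Go sketch of a cache that
fetches a new token once the cached one is within 5 minutes of expiry. It is an
illustration under stated assumptions, not the github-app-auth plugin's actual
code: cachingProvider, fetchFunc and simulate are hypothetical names, and the
sketch refreshes lazily on the next call rather than in the background.

```go
package main

import (
	"context"
	"errors"
	"sync"
	"time"
)

// fetchFunc is a hypothetical stand-in for whatever mints a new installation token.
type fetchFunc func(ctx context.Context) (token string, expiresAt time.Time, err error)

// cachingProvider is a hypothetical cache with the same Token signature as the
// TokenProvider interface added in this commit.
type cachingProvider struct {
	mu           sync.Mutex
	fetch        fetchFunc
	refreshAhead time.Duration    // refresh once the token is this close to expiry
	now          func() time.Time // injectable clock so the sketch is testable
	token        string
	expiresAt    time.Time
}

// Token returns the cached token while it is outside the refresh window and
// otherwise blocks until a new one has been fetched.
func (p *cachingProvider) Token(ctx context.Context) (string, time.Time, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.token != "" && p.now().Before(p.expiresAt.Add(-p.refreshAhead)) {
		return p.token, p.expiresAt, nil
	}
	tok, exp, err := p.fetch(ctx)
	if err != nil {
		return "", time.Time{}, err
	}
	if tok == "" {
		return "", time.Time{}, errors.New("fetch returned an empty token")
	}
	p.token, p.expiresAt = tok, exp
	return tok, exp, nil
}

// simulate calls Token once at each offset (minutes after start) against a fake
// clock and a fake fetcher that mints 60-minute tokens. It returns how many
// times the fetcher ran, or -1 if any call failed.
func simulate(offsetsMin ...int) int {
	start := time.Date(2026, 4, 17, 0, 0, 0, 0, time.UTC)
	now := start
	fetches := 0
	p := &cachingProvider{
		refreshAhead: 5 * time.Minute,
		now:          func() time.Time { return now },
		fetch: func(context.Context) (string, time.Time, error) {
			fetches++
			return "ghs_example", now.Add(60 * time.Minute), nil
		},
	}
	for _, m := range offsetsMin {
		now = start.Add(time.Duration(m) * time.Minute)
		if _, _, err := p.Token(context.Background()); err != nil {
			return -1
		}
	}
	return fetches
}
```

Calling simulate(0, 30, 54) fetches once and simulate(0, 55) fetches twice. A
value copied into a container's env at minute 0 never sees the second token,
which is the gap this commit closes.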

Fix (Option B — on-demand endpoint):

pkg/provisionhook:
  - Add TokenProvider interface: Token(ctx) (token, expiresAt, error)
    Lives in pkg/ (public) so the github-app-auth plugin can implement it.
  - Add Registry.FirstTokenProvider() — discovers the first mutator that
    also satisfies TokenProvider via interface assertion. Safe under
    concurrent reads (existing RWMutex).

platform/internal/handlers/github_token.go:
  - New GitHubTokenHandler serving GET /admin/github-installation-token
  - Delegates to the registered TokenProvider (plugin cache — always fresh)
  - 404 if no GitHub App configured, 500 + [github] prefix log on error
  - Never logs the token itself

platform/internal/handlers/workspace.go:
  - Add TokenRegistry() getter so the router can wire the handler without
    coupling to WorkspaceHandler internals

platform/internal/router/router.go:
  - Register GET /admin/github-installation-token under AdminAuth

workspace-template/:
  - scripts/molecule-git-token-helper.sh — git credential helper; calls
    the platform endpoint on every push/fetch; falls through to next
    helper (operator PAT) if platform unreachable
  - entrypoint.sh — configure the credential helper at startup

Why Option B over Option A (background goroutine):
  - The plugin already has its own cache refresh; nothing to refresh here.
  - Pushing env updates into running containers requires docker exec, which
    the architecture explicitly rejects (issue #547 "Alternatives").
  - Pull-based is stateless, trivially testable, zero extra goroutines.
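
Seen from the caller's side, the pull-based flow can be sketched in Go as
follows. This is a sketch under stated assumptions, not code from this commit:
fetchInstallationToken, tokenResponse, errNotConfigured and demo are
hypothetical names. Only the endpoint path, the response fields and the
"retry with back-off on 500" guidance come from the handler's documentation.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// tokenResponse mirrors the 200 body documented for the endpoint.
type tokenResponse struct {
	Token     string    `json:"token"`
	ExpiresAt time.Time `json:"expires_at"`
}

// errNotConfigured (hypothetical) reports a 404: no GitHub App on the platform.
var errNotConfigured = errors.New("github app not configured")

// fetchInstallationToken (hypothetical) pulls a token on demand. Only a 500 is
// retried, doubling the wait each time; 404 and other statuses fail fast.
func fetchInstallationToken(ctx context.Context, client *http.Client, baseURL, bearer string, attempts int, wait time.Duration) (tokenResponse, error) {
	for i := 0; i < attempts; i++ {
		if i > 0 {
			select {
			case <-ctx.Done():
				return tokenResponse{}, ctx.Err()
			case <-time.After(wait):
			}
			wait *= 2
		}
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/admin/github-installation-token", nil)
		if err != nil {
			return tokenResponse{}, err
		}
		req.Header.Set("Authorization", "Bearer "+bearer)
		resp, err := client.Do(req)
		if err != nil {
			return tokenResponse{}, err
		}
		var body tokenResponse
		decodeErr := json.NewDecoder(resp.Body).Decode(&body)
		resp.Body.Close()
		switch resp.StatusCode {
		case http.StatusOK:
			if decodeErr != nil {
				return tokenResponse{}, decodeErr
			}
			if body.Token == "" {
				return tokenResponse{}, errors.New("empty token in response")
			}
			return body, nil
		case http.StatusNotFound:
			return tokenResponse{}, errNotConfigured
		case http.StatusInternalServerError:
			continue // transient upstream failure: retry after back-off
		default:
			return tokenResponse{}, fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
	}
	return tokenResponse{}, fmt.Errorf("still failing after %d attempts", attempts)
}

// demo runs the client against an in-process fake platform that answers 500
// twice before succeeding. It returns the token and the number of requests.
func demo() (string, int) {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) < 3 {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		fmt.Fprint(w, `{"token":"ghs_example","expires_at":"2026-04-17T22:50:00Z"}`)
	}))
	defer srv.Close()
	got, err := fetchInstallationToken(context.Background(), srv.Client(), srv.URL, "workspace-bearer", 4, time.Millisecond)
	if err != nil {
		return "", int(atomic.LoadInt32(&calls))
	}
	return got.Token, int(atomic.LoadInt32(&calls))
}
```

The client keeps no state between calls, which is what makes the pull-based
option easy to test: demo needs nothing beyond a fake HTTP server.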

Closes #547

Co-authored-by: Molecule AI DevOps Engineer <devops-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
molecule-ai[bot] 2026-04-17 00:47:03 +00:00 committed by GitHub
parent 21f9e70b75
commit 3b5affb0d1
7 changed files with 552 additions and 0 deletions


@ -0,0 +1,115 @@
// Package handlers — GitHub App installation-token refresh endpoint.
//
// GET /admin/github-installation-token returns a fresh GitHub App
// installation token on demand. Long-running workspace containers use
// this as a git credential helper and for explicit `gh auth` re-runs
// so they never operate with an expired GH_TOKEN.
//
// # Why this endpoint?
//
// The github-app-auth plugin (PR #506) injects GH_TOKEN + GITHUB_TOKEN
// into a workspace container's env at provision time. Those tokens are
// GitHub App installation tokens with a fixed ~60 min TTL. The plugin
// keeps a server-side in-process cache and proactively refreshes it
// 5 min before expiry, but the workspace env is set once at container
// start and never updated — so any workspace alive >60 min ends up with
// an expired token (issue #547).
//
// The fix is:
//
//  1. Platform side (this file): expose GET /admin/github-installation-token.
//     The handler delegates to the registered TokenProvider (typically the
//     github-app-auth plugin), whose cache is always fresh. Gated behind
//     AdminAuth: any valid workspace bearer token can call it.
//
//  2. Workspace side: a shell credential helper
//     (workspace-template/scripts/molecule-git-token-helper.sh) configured
//     as the git credential helper. git calls it on every push/fetch;
//     it hits this endpoint and emits the fresh token to stdout. A 30-min
//     cron also runs `gh auth login --with-token` using the same helper.
//
// # Approach chosen
//
// Option B (pre-flight/on-demand): workspaces request a token when
// they need one (credential helper callback). This is preferable to a
// background goroutine pusher (Option A) because:
//
//   - The plugin already maintains its own refresh cache, so there is no
//     token to refresh on the platform side.
//   - Pushing a new token into running containers requires docker exec /
//     env mutation, which the architecture explicitly rejects (see issue
//     #547 "Alternatives considered").
//   - On-demand is pull-based, stateless, and trivially testable.
package handlers

import (
	"log"
	"net/http"
	"time"

	"github.com/Molecule-AI/molecule-monorepo/platform/pkg/provisionhook"
	"github.com/gin-gonic/gin"
)

// GitHubTokenHandler serves GET /admin/github-installation-token.
type GitHubTokenHandler struct {
	registry *provisionhook.Registry
}

// NewGitHubTokenHandler constructs the handler. registry may be nil when
// no GitHub App plugin is registered (dev / self-hosted deployments).
func NewGitHubTokenHandler(reg *provisionhook.Registry) *GitHubTokenHandler {
	return &GitHubTokenHandler{registry: reg}
}

// GetInstallationToken handles GET /admin/github-installation-token.
//
// Returns:
//
//	200 {"token": "ghs_...", "expires_at": "2026-04-17T22:50:00Z"}
//	404 {"error": "no GitHub App configured"}: GITHUB_APP_ID not set
//	404 {"error": "no token provider registered"}: plugin loaded but
//	    doesn't implement TokenProvider
//	500 {"error": "token refresh failed"}: provider returned an error
//	500 {"error": "token refresh failed: empty token"}: provider returned
//	    an empty token
//
// The 404 vs 403 distinction is intentional: a 404 means the feature is
// simply not configured, not that the caller is forbidden. This matches
// the pattern used by GET /admin/workspaces/:id/test-token.
//
// Callers must retry with exponential back-off on 500 — a transient
// upstream GitHub API error should not permanently block git operations.
func (h *GitHubTokenHandler) GetInstallationToken(c *gin.Context) {
	if h.registry == nil {
		c.JSON(http.StatusNotFound, gin.H{"error": "no GitHub App configured"})
		return
	}

	provider := h.registry.FirstTokenProvider()
	if provider == nil {
		c.JSON(http.StatusNotFound, gin.H{"error": "no token provider registered"})
		return
	}

	token, expiresAt, err := provider.Token(c.Request.Context())
	if err != nil {
		log.Printf("[github] token refresh failed: %v", err)
		c.JSON(http.StatusInternalServerError, gin.H{"error": "token refresh failed"})
		return
	}
	if token == "" {
		log.Printf("[github] token provider returned empty token")
		c.JSON(http.StatusInternalServerError, gin.H{"error": "token refresh failed: empty token"})
		return
	}

	// Never log the token itself. The expiry is logged in UTC so it matches
	// the expires_at value returned to the caller.
	log.Printf("[github] served fresh installation token (expires %s, TTL %.0fs)",
		expiresAt.UTC().Format(time.RFC3339),
		time.Until(expiresAt).Seconds())

	c.JSON(http.StatusOK, gin.H{
		"token":      token,
		"expires_at": expiresAt.UTC().Format(time.RFC3339),
	})
}


@ -0,0 +1,232 @@
package handlers

import (
	"context"
	"encoding/json"
	"errors"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"

	"github.com/Molecule-AI/molecule-monorepo/platform/pkg/provisionhook"
	"github.com/gin-gonic/gin"
)

// ─── mock helpers ────────────────────────────────────────────────────────────

// mockMutatorOnly implements EnvMutator but NOT TokenProvider.
type mockMutatorOnly struct{ name string }

func (m *mockMutatorOnly) Name() string { return m.name }
func (m *mockMutatorOnly) MutateEnv(_ context.Context, _ string, _ map[string]string) error {
	return nil
}

// mockTokenMutator implements both EnvMutator and TokenProvider.
// Set err to simulate a provider failure; otherwise returns token + expiresAt.
type mockTokenMutator struct {
	name      string
	token     string
	expiresAt time.Time
	err       error
}

func (m *mockTokenMutator) Name() string { return m.name }
func (m *mockTokenMutator) MutateEnv(_ context.Context, _ string, _ map[string]string) error {
	return nil
}
func (m *mockTokenMutator) Token(_ context.Context) (string, time.Time, error) {
	return m.token, m.expiresAt, m.err
}

// ─── request helper ──────────────────────────────────────────────────────────

func newGitHubTokenRequest() (*httptest.ResponseRecorder, *gin.Context) {
	w := httptest.NewRecorder()
	c, _ := gin.CreateTestContext(w)
	c.Request = httptest.NewRequest(http.MethodGet, "/admin/github-installation-token", nil)
	return w, c
}

// ─── tests ───────────────────────────────────────────────────────────────────
// TestGitHubToken_NilRegistry — no GitHub App plugin loaded at all.
// Expect 404 so operators can distinguish "not configured" from "forbidden".
func TestGitHubToken_NilRegistry(t *testing.T) {
	h := NewGitHubTokenHandler(nil)
	w, c := newGitHubTokenRequest()
	h.GetInstallationToken(c)

	if w.Code != http.StatusNotFound {
		t.Fatalf("expected 404 for nil registry, got %d: %s", w.Code, w.Body.String())
	}
	var body map[string]string
	if err := json.Unmarshal(w.Body.Bytes(), &body); err != nil {
		t.Fatalf("response is not valid JSON: %v", err)
	}
	if body["error"] == "" {
		t.Error("expected non-empty error field in response")
	}
}

// TestGitHubToken_NoTokenProvider — plugin registered but doesn't implement
// TokenProvider (e.g. a non-GitHub mutator in the chain).
// Expect 404 — the GitHub App endpoint is not available.
func TestGitHubToken_NoTokenProvider(t *testing.T) {
	reg := provisionhook.NewRegistry()
	reg.Register(&mockMutatorOnly{name: "other-plugin"})

	h := NewGitHubTokenHandler(reg)
	w, c := newGitHubTokenRequest()
	h.GetInstallationToken(c)

	if w.Code != http.StatusNotFound {
		t.Fatalf("expected 404 when no TokenProvider, got %d: %s", w.Code, w.Body.String())
	}
}

// TestGitHubToken_ProviderError — provider returns an error (e.g. GitHub API
// unreachable). Expect 500 so the workspace credential helper retries.
func TestGitHubToken_ProviderError(t *testing.T) {
	reg := provisionhook.NewRegistry()
	reg.Register(&mockTokenMutator{
		name: "github-app-auth",
		err:  errors.New("github: 503 service unavailable"),
	})

	h := NewGitHubTokenHandler(reg)
	w, c := newGitHubTokenRequest()
	h.GetInstallationToken(c)

	if w.Code != http.StatusInternalServerError {
		t.Fatalf("expected 500 on provider error, got %d: %s", w.Code, w.Body.String())
	}
	var body map[string]string
	if err := json.Unmarshal(w.Body.Bytes(), &body); err != nil {
		t.Fatalf("response is not valid JSON: %v", err)
	}
	if body["error"] == "" {
		t.Error("expected non-empty error field in 500 response")
	}
}

// TestGitHubToken_EmptyToken — provider returns no error but an empty token.
// This should never happen in normal operation but is a programming error in
// the plugin; treat it as a refresh failure.
func TestGitHubToken_EmptyToken(t *testing.T) {
	exp := time.Now().Add(55 * time.Minute)
	reg := provisionhook.NewRegistry()
	reg.Register(&mockTokenMutator{
		name:      "github-app-auth",
		token:     "", // empty (plugin bug)
		expiresAt: exp,
	})

	h := NewGitHubTokenHandler(reg)
	w, c := newGitHubTokenRequest()
	h.GetInstallationToken(c)

	if w.Code != http.StatusInternalServerError {
		t.Fatalf("expected 500 for empty token, got %d: %s", w.Code, w.Body.String())
	}
}

// TestGitHubToken_HappyPath — provider returns a valid token.
// Assert: 200, token present, expires_at is a valid RFC3339 timestamp
// with a positive TTL (i.e. the token is not already expired).
func TestGitHubToken_HappyPath(t *testing.T) {
	exp := time.Now().UTC().Add(55 * time.Minute).Truncate(time.Second)
	reg := provisionhook.NewRegistry()
	reg.Register(&mockTokenMutator{
		name:      "github-app-auth",
		token:     "ghs_TestTokenABC123",
		expiresAt: exp,
	})

	h := NewGitHubTokenHandler(reg)
	w, c := newGitHubTokenRequest()
	h.GetInstallationToken(c)

	if w.Code != http.StatusOK {
		t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
	}
	var body struct {
		Token     string `json:"token"`
		ExpiresAt string `json:"expires_at"`
	}
	if err := json.Unmarshal(w.Body.Bytes(), &body); err != nil {
		t.Fatalf("response is not valid JSON: %v", err)
	}
	if body.Token != "ghs_TestTokenABC123" {
		t.Errorf("expected token 'ghs_TestTokenABC123', got %q", body.Token)
	}
	parsed, err := time.Parse(time.RFC3339, body.ExpiresAt)
	if err != nil {
		t.Fatalf("expires_at is not valid RFC3339: %q: %v", body.ExpiresAt, err)
	}
	if !parsed.After(time.Now()) {
		t.Errorf("expires_at %s is in the past; handler served an expired token", body.ExpiresAt)
	}
}

// TestGitHubToken_FirstProviderWins: two mutators registered, both
// implementing TokenProvider. Confirm the first one is used (registration order).
func TestGitHubToken_FirstProviderWins(t *testing.T) {
	exp := time.Now().UTC().Add(55 * time.Minute)
	reg := provisionhook.NewRegistry()
	reg.Register(&mockTokenMutator{
		name:      "first-provider",
		token:     "ghs_First",
		expiresAt: exp,
	})
	reg.Register(&mockTokenMutator{
		name:      "second-provider",
		token:     "ghs_Second",
		expiresAt: exp,
	})

	h := NewGitHubTokenHandler(reg)
	w, c := newGitHubTokenRequest()
	h.GetInstallationToken(c)

	if w.Code != http.StatusOK {
		t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
	}
	var body map[string]string
	if err := json.Unmarshal(w.Body.Bytes(), &body); err != nil {
		t.Fatalf("response is not valid JSON: %v", err)
	}
	if body["token"] != "ghs_First" {
		t.Errorf("expected first provider's token 'ghs_First', got %q", body["token"])
	}
}

// TestGitHubToken_NonProviderBeforeProvider — a plain EnvMutator is registered
// first, then a TokenProvider. Confirm the provider is still found (skip over
// non-providers).
func TestGitHubToken_NonProviderBeforeProvider(t *testing.T) {
	exp := time.Now().UTC().Add(55 * time.Minute)
	reg := provisionhook.NewRegistry()
	reg.Register(&mockMutatorOnly{name: "env-injector"})
	reg.Register(&mockTokenMutator{
		name:      "github-app-auth",
		token:     "ghs_FoundBehindOther",
		expiresAt: exp,
	})

	h := NewGitHubTokenHandler(reg)
	w, c := newGitHubTokenRequest()
	h.GetInstallationToken(c)

	if w.Code != http.StatusOK {
		t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String())
	}
	var body map[string]string
	if err := json.Unmarshal(w.Body.Bytes(), &body); err != nil {
		t.Fatalf("response is not valid JSON: %v", err)
	}
	if body["token"] != "ghs_FoundBehindOther" {
		t.Errorf("expected 'ghs_FoundBehindOther', got %q", body["token"])
	}
}


@ -60,6 +60,14 @@ func (h *WorkspaceHandler) SetEnvMutators(r *provisionhook.Registry) {
h.envMutators = r
}
// TokenRegistry returns the provisionhook.Registry so the router can
// wire the GET /admin/github-installation-token handler without coupling
// to WorkspaceHandler's internals. Returns nil when no plugin has been
// registered (dev / self-hosted deployments without a GitHub App).
func (h *WorkspaceHandler) TokenRegistry() *provisionhook.Registry {
	return h.envMutators
}

// Create handles POST /workspaces
func (h *WorkspaceHandler) Create(c *gin.Context) {
var payload models.CreateWorkspacePayload


@ -304,6 +304,17 @@ func Setup(hub *ws.Hub, broadcaster *events.Broadcaster, prov *provisioner.Provi
r.GET("/admin/workspaces/:id/test-token", tokh.GetTestToken)
}
// Admin — GitHub App installation token refresh (issue #547).
// Long-running workspaces (>60 min) use this endpoint to refresh
// GH_TOKEN without restarting. Returns the current installation token
// from the github-app-auth plugin's in-process cache (which proactively
// refreshes 5 min before expiry). 404 when no GitHub App is configured
// (dev / self-hosted without GITHUB_APP_ID).
{
ghTokH := handlers.NewGitHubTokenHandler(wh.TokenRegistry())
r.GET("/admin/github-installation-token", middleware.AdminAuth(db.DB), ghTokH.GetInstallationToken)
}
// Terminal — shares Docker client with provisioner
var dockerCli *client.Client
if prov != nil {


@ -48,6 +48,7 @@ import (
"context"
"fmt"
"sync"
"time"
)
// EnvMutator is implemented by plugins that want to inject env vars
@ -64,6 +65,34 @@ type EnvMutator interface {
MutateEnv(ctx context.Context, workspaceID string, env map[string]string) error
}
// TokenProvider is an optional interface that EnvMutator implementations
// may also satisfy. When a mutator implements TokenProvider the platform
// can serve GET /admin/github-installation-token, allowing long-running
// workspaces to fetch a fresh GitHub token without restarting.
//
// # Why a separate interface?
//
// EnvMutator.MutateEnv is called once at provision time and writes into
// an env map. Calling it again just to read the current token would be
// semantically wrong and potentially unsafe (the env map belongs to a
// live workspace struct). TokenProvider cleanly separates "what do I
// inject at boot?" from "what is the live token right now?".
//
// # Plugin contract
//
// Token must return the current valid token and the time at which it
// will expire. If the plugin's internal cache is past its refresh
// threshold it must block until a new token is obtained before
// returning. Token should never return an expired token — callers rely
// on this guarantee and do not do their own expiry check.
//
// Returning a non-nil error causes the HTTP handler to respond 500 and
// log "[github] token refresh failed: <err>". The workspace will retry
// on its next credential-helper invocation.
type TokenProvider interface {
Token(ctx context.Context) (token string, expiresAt time.Time, err error)
}
// Registry holds the ordered list of EnvMutator instances the
// provisioner runs before each workspace boot. Safe for concurrent
// registration + execution.
@ -112,6 +141,26 @@ func (r *Registry) Names() []string {
return names
}
// FirstTokenProvider returns the first registered mutator that also
// implements TokenProvider, or nil if none do. Used to back the
// GET /admin/github-installation-token endpoint so long-running
// workspaces can refresh their GITHUB_TOKEN without a container restart.
//
// A nil registry returns nil (no provider configured).
func (r *Registry) FirstTokenProvider() TokenProvider {
	if r == nil {
		return nil
	}
	r.mu.RLock()
	defer r.mu.RUnlock()
	for _, m := range r.mutators {
		if tp, ok := m.(TokenProvider); ok {
			return tp
		}
	}
	return nil
}

// Run calls every registered mutator in order. The first one to return
// a non-nil error aborts the chain — subsequent mutators do NOT run,
// and the error is returned to the caller (which marks the workspace


@ -55,6 +55,31 @@ else:
echo "=== Molecule AI Workspace ==="
echo "Runtime: $RUNTIME"
# ──────────────────────────────────────────────────────────
# GitHub credential helper — issue #547
# ──────────────────────────────────────────────────────────
# GitHub App installation tokens expire after ~60 min. The platform
# exposes GET /admin/github-installation-token (backed by the plugin's
# in-process refreshing cache) so workspaces can always get a valid
# token without restarting.
#
# Register molecule-git-token-helper.sh as the git credential helper for
# github.com. git calls it on every push/fetch; it hits the platform
# endpoint and emits a fresh token. Falls through to any existing
# credential helper (e.g. operator .env PAT) if the platform is
# unreachable.
#
# Idempotent — safe to re-run on restart.
HELPER_SCRIPT="/workspace-template/scripts/molecule-git-token-helper.sh"
if [ -f "${HELPER_SCRIPT}" ]; then
    # Report success only when git config actually succeeded.
    if git config --global \
        "credential.https://github.com.helper" \
        "!${HELPER_SCRIPT}" 2>/dev/null; then
        echo "[entrypoint] git credential helper registered (molecule-git-token-helper)"
    else
        echo "[entrypoint] WARNING: could not register git credential helper; GitHub tokens may expire after 60 min"
    fi
else
    echo "[entrypoint] WARNING: molecule-git-token-helper.sh not found at ${HELPER_SCRIPT}; GitHub tokens may expire after 60 min"
fi

# NOTE: Adapter-specific deps are now pre-installed in each adapter's Docker image
# (standalone template repos). Each image installs molecule-ai-workspace-runtime
# from PyPI plus the adapter-specific requirements. No per-runtime pip install needed here.


@ -0,0 +1,112 @@
#!/bin/bash
# molecule-git-token-helper.sh — git credential helper for GitHub App tokens
#
# Fetches a fresh GitHub App installation token from the Molecule AI
# platform endpoint GET /admin/github-installation-token on every git
# push/fetch, so workspace containers never use an expired GH_TOKEN after
# the ~60 min GitHub App token TTL.
#
# # Setup (called once at provision time or initial_prompt)
#
# git config --global \
# "credential.https://github.com.helper" \
# "!/workspace-template/scripts/molecule-git-token-helper.sh"
#
# # How git calls this helper
#
# git passes the action as the first positional arg. The protocol is:
# get → output credentials on stdout (we handle this)
# store → persist credentials (no-op — we never cache)
# erase → revoke credentials (no-op — platform manages lifecycle)
#
# On `get`, git reads key=value pairs terminated by an empty line.
# We must emit at minimum:
# username=x-access-token
# password=<token>
# (blank line)
#
# # Auth
#
# The platform endpoint requires a valid workspace bearer token. The
# token is stored at ${CONFIGS_DIR}/.auth_token (written by platform_auth.py
# on first /registry/register). Workspace env var PLATFORM_URL defaults
# to http://platform:8080.
#
# # Fallback
#
# If the platform endpoint is unreachable (e.g. network partition) or
# returns non-200, the script exits 1 without printing credentials so git
# will fall through to the next helper in the chain (if any). This
# preserves the operator's fallback PAT from .env if present.
#
# # gh CLI re-auth (30-min cron)
#
# To also fix `gh` CLI auth, run this from a workspace cron prompt:
#
# token=$(bash /workspace-template/scripts/molecule-git-token-helper.sh _fetch_token)
# echo "$token" | gh auth login --with-token
#
# (The _fetch_token private action returns only the raw token string.)
#
set -euo pipefail
PLATFORM_URL="${PLATFORM_URL:-http://platform:8080}"
CONFIGS_DIR="${CONFIGS_DIR:-/configs}"
TOKEN_FILE="${CONFIGS_DIR}/.auth_token"
ENDPOINT="${PLATFORM_URL}/admin/github-installation-token"
# _fetch_token — internal helper; also callable directly from cron.
# Outputs the raw token string on success; exits non-zero on failure.
_fetch_token() {
    local bearer response token

    if [ ! -f "${TOKEN_FILE}" ]; then
        echo "[molecule-git-token-helper] .auth_token not found at ${TOKEN_FILE}" >&2
        exit 1
    fi

    bearer=$(tr -d '[:space:]' < "${TOKEN_FILE}")
    if [ -z "${bearer}" ]; then
        echo "[molecule-git-token-helper] .auth_token is empty" >&2
        exit 1
    fi

    # -S keeps curl's error message even though -s silences progress output,
    # so the failure log below is not empty.
    response=$(curl -sSf \
        -H "Authorization: Bearer ${bearer}" \
        -H "Accept: application/json" \
        --max-time 10 \
        "${ENDPOINT}" 2>&1) || {
        echo "[molecule-git-token-helper] platform request failed: ${response}" >&2
        exit 1
    }

    # Parse {"token":"ghs_...","expires_at":"..."} with sed (no jq dependency).
    token=$(printf '%s\n' "${response}" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p')
    if [ -z "${token}" ]; then
        # Do not echo the response here: it may contain a token that the
        # sed expression failed to extract.
        echo "[molecule-git-token-helper] no token found in platform response" >&2
        exit 1
    fi

    printf '%s\n' "${token}"
}

ACTION="${1:-get}"

case "${ACTION}" in
    get)
        token=$(_fetch_token) || exit 1
        # Emit git credential protocol response.
        printf 'username=x-access-token\n'
        printf 'password=%s\n' "${token}"
        printf '\n'
        ;;
    store|erase)
        # No-op: the platform manages token lifecycle.
        ;;
    _fetch_token)
        # Private action for cron-based gh auth login --with-token.
        _fetch_token
        ;;
    *)
        echo "[molecule-git-token-helper] unknown action: ${ACTION}" >&2
        exit 1
        ;;
esac
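
For comparison, the reply that the `get` action builds can be sketched in Go.
This is only an illustration of the credential-helper output format described
in the script header; credentialReply is a hypothetical name and is not part of
this commit. Unlike the sed expression, a JSON parser does not depend on the
response being serialized without whitespace.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// credentialReply (hypothetical) turns the platform's JSON body into the
// key=value reply git reads from a credential helper on `get`.
func credentialReply(platformJSON string) (string, error) {
	var body struct {
		Token string `json:"token"`
	}
	if err := json.Unmarshal([]byte(platformJSON), &body); err != nil {
		return "", err
	}
	if body.Token == "" {
		// Mirrors the shell helper: no credentials are emitted without a token.
		return "", errors.New("empty token in platform response")
	}
	var b strings.Builder
	b.WriteString("username=x-access-token\n")
	b.WriteString("password=" + body.Token + "\n")
	b.WriteString("\n") // a blank line terminates the reply
	return b.String(), nil
}
```

An error response such as {"error":"token refresh failed"} yields an error and
no output, which corresponds to the shell helper exiting 1 so that git can try
the next helper.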