molecule-core/platform/pkg/provisionhook/mutator.go
molecule-ai[bot] b1c976a54d
fix(github): refresh installation token when TTL < 10 min (#547) (#567)
Root cause: the github-app-auth plugin injects GH_TOKEN + GITHUB_TOKEN
into each workspace container's env at provision time (EnvMutator). Those
are GitHub App installation tokens with a fixed ~60 min TTL. The plugin
has an in-process cache that proactively refreshes 5 min before expiry —
but the workspace env is set once at container start and never updated.
Any workspace alive >60 min ends up with an expired token.

Fix (Option B — on-demand endpoint):

pkg/provisionhook:
  - Add TokenProvider interface: Token(ctx) (token, expiresAt, error)
    Lives in pkg/ (public) so the github-app-auth plugin can implement it.
  - Add Registry.FirstTokenProvider() — discovers the first mutator that
    also satisfies TokenProvider via interface assertion. Safe under
    concurrent reads (existing RWMutex).

platform/internal/handlers/github_token.go:
  - New GitHubTokenHandler serving GET /admin/github-installation-token
  - Delegates to the registered TokenProvider (plugin cache — always fresh)
  - 404 if no GitHub App configured, 500 + [github] prefix log on error
  - Never logs the token itself

platform/internal/handlers/workspace.go:
  - Add TokenRegistry() getter so the router can wire the handler without
    coupling to WorkspaceHandler internals

platform/internal/router/router.go:
  - Register GET /admin/github-installation-token under AdminAuth

workspace-template/:
  - scripts/molecule-git-token-helper.sh — git credential helper; calls
    the platform endpoint on every push/fetch; falls through to next
    helper (operator PAT) if platform unreachable
  - entrypoint.sh — configure the credential helper at startup

Why Option B over Option A (background goroutine):
  - The plugin already has its own cache refresh; nothing to refresh here.
  - Pushing env updates into running containers requires docker exec, which
    the architecture explicitly rejects (issue #547 "Alternatives").
  - Pull-based is stateless, trivially testable, zero extra goroutines.

Closes #547

Co-authored-by: Molecule AI DevOps Engineer <devops-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:47:03 +00:00


// Package provisionhook is the public extension point that lets external
// plugins mutate the env map a workspace container will boot with, just
// before the provisioner calls Start(cfg).
//
// The package lives under pkg/ (not internal/) because plugins import it
// from outside this Go module. Anything outside pkg/ is core-only.
//
// # Why this exists
//
// Auth providers (GitHub App tokens, GitLab tokens, Bitbucket app
// passwords, internal PAT vaults), secret managers (Vault, AWS Secrets
// Manager, GCP Secret Manager), per-tenant config injectors, and
// observability sidecars all want to write env vars into the workspace
// container before it starts. Each is an OPTIONAL concern that only some
// deployments need. Hardcoding any of them in the platform binary
// violates the "core stays small, capabilities are plugins" principle
// (CEO 2026-04-16, after the monorepo → 44 sub-repos split).
//
// # Plugin shape
//
// A plugin implements EnvMutator and registers an instance with a
// Registry at platform startup. The provisioner calls Run(...) on the
// registry before each workspace container starts.
//
// Plugins live in their own Go modules + repos (e.g.
// github.com/Molecule-AI/molecule-ai-plugin-github-app-auth). Each
// plugin ships its own cmd/server/main.go that imports core's startup
// function + registers the plugin's mutator. Operators deploy the
// plugin binary instead of core's vanilla cmd/server when they want
// the plugin's behaviour.
//
// # Failure handling
//
// MutateEnv returning a non-nil error aborts the provision (workspace
// is marked 'failed', container never starts). Plugins should fail open
// on transient external-service errors (log + return nil) so a flaky
// upstream doesn't block agent provisioning. Reserve errors for hard
// config bugs that the operator must fix.
//
// # Concurrency
//
// Registry is safe for concurrent registration + execution. MutateEnv
// implementations should be safe to call from goroutines (the
// provisioner runs each workspace's provision in its own goroutine).
package provisionhook

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// EnvMutator is implemented by plugins that want to inject env vars
// into a workspace container at provision time.
//
//   - Name returns a stable identifier for logging / metrics. It should
//     match the plugin's repo / module name (e.g. "github-app-auth").
//   - MutateEnv receives the request context, the workspace ID, and a
//     mutable env map. It can read existing values, add new ones, or
//     overwrite as needed. Mutations are visible to subsequent
//     mutators in the chain (registration order).
type EnvMutator interface {
	Name() string
	MutateEnv(ctx context.Context, workspaceID string, env map[string]string) error
}

// TokenProvider is an optional interface that EnvMutator implementations
// may also satisfy. When a mutator implements TokenProvider the platform
// can serve GET /admin/github-installation-token, allowing long-running
// workspaces to fetch a fresh GitHub token without restarting.
//
// # Why a separate interface?
//
// EnvMutator.MutateEnv is called once at provision time and writes into
// an env map. Calling it again just to read the current token would be
// semantically wrong and potentially unsafe (the env map is a live
// workspace struct). TokenProvider cleanly separates "what do I inject
// at boot?" from "what is the live token right now?".
//
// # Plugin contract
//
// Token must return the current valid token and the time at which it
// will expire. If the plugin's internal cache is past its refresh
// threshold it must block until a new token is obtained before
// returning. Token should never return an expired token — callers rely
// on this guarantee and do not do their own expiry check.
//
// Returning a non-nil error causes the HTTP handler to respond 500 and
// log "[github] token refresh failed: <err>". The workspace will retry
// on its next credential-helper invocation.
type TokenProvider interface {
	Token(ctx context.Context) (token string, expiresAt time.Time, err error)
}

// Registry holds the ordered list of EnvMutator instances the
// provisioner runs before each workspace boot. Safe for concurrent
// registration + execution.
type Registry struct {
	mu       sync.RWMutex
	mutators []EnvMutator
}

// NewRegistry returns an empty registry. The platform creates one at
// startup; plugins call Register on it.
func NewRegistry() *Registry {
	return &Registry{}
}

// Register adds a mutator to the chain. Mutators run in registration
// order. Registering the same instance twice is allowed (it'll run
// twice) — the registry doesn't dedupe; that's the caller's
// responsibility if dedup matters.
func (r *Registry) Register(m EnvMutator) {
	if m == nil {
		return
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.mutators = append(r.mutators, m)
}

// Len reports how many mutators are registered. Used by the platform's
// boot log so operators can see which extension hooks are wired.
func (r *Registry) Len() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.mutators)
}

// Names returns the names of registered mutators in registration order.
// Used by the boot log so operators can grep for which plugins are
// active.
func (r *Registry) Names() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, len(r.mutators))
	for i, m := range r.mutators {
		names[i] = m.Name()
	}
	return names
}

// FirstTokenProvider returns the first registered mutator that also
// implements TokenProvider, or nil if none do. Used to back the
// GET /admin/github-installation-token endpoint so long-running
// workspaces can refresh their GITHUB_TOKEN without a container restart.
//
// A nil registry returns nil (no provider configured).
func (r *Registry) FirstTokenProvider() TokenProvider {
	if r == nil {
		return nil
	}
	r.mu.RLock()
	defer r.mu.RUnlock()
	for _, m := range r.mutators {
		if tp, ok := m.(TokenProvider); ok {
			return tp
		}
	}
	return nil
}

// Run calls every registered mutator in order. The first one to return
// a non-nil error aborts the chain — subsequent mutators do NOT run,
// and the error is returned to the caller (which marks the workspace
// failed).
//
// A nil registry is a no-op (returns nil) so the provisioner doesn't
// have to nil-check before calling.
func (r *Registry) Run(ctx context.Context, workspaceID string, env map[string]string) error {
	if r == nil {
		return nil
	}
	r.mu.RLock()
	mutators := make([]EnvMutator, len(r.mutators))
	copy(mutators, r.mutators)
	r.mu.RUnlock()
	for _, m := range mutators {
		if err := m.MutateEnv(ctx, workspaceID, env); err != nil {
			return fmt.Errorf("provisionhook %q: %w", m.Name(), err)
		}
	}
	return nil
}