Two surfaces in workspace-server hardcoded `ghcr.io` and silently bypassed
the `MOLECULE_IMAGE_REGISTRY` env override that flips every other image
operation to the configured private mirror (e.g. AWS ECR in production):
1. internal/imagewatch/watch.go — image-auto-refresh polled
`https://ghcr.io/v2/...` and `https://ghcr.io/token` directly. Post-
suspension, with the platform pointed at ECR, the watcher silently
stopped seeing digest changes (every poll either 404'd or hung on a
registry it has no business talking to).
2. internal/handlers/admin_workspace_images.go — Docker Engine auth
payload pinned `serveraddress: "ghcr.io"`, so when the operator sets
`MOLECULE_IMAGE_REGISTRY=…ecr…/molecule-ai` the engine matched the
wrong credential entry on every authenticated pull.
Fix: extract `provisioner.RegistryHost()` returning the host portion of
`RegistryPrefix()` (e.g. `ghcr.io` ← `ghcr.io/molecule-ai`, or
`004947743811.dkr.ecr.us-east-2.amazonaws.com` ← the ECR mirror prefix),
and route both surfaces through it. Default behavior is unchanged for
OSS users on GHCR.
Tests
- New `TestRegistryHost_SplitsHostFromOrgPath` and
`TestRegistryHost_NeverEmpty` pin the helper across GHCR / ECR /
self-hosted Gitea / bare-host edge cases.
- New `TestGHCRAuthHeader_RespectsRegistryEnv` asserts the Docker auth
payload's `serveraddress` follows MOLECULE_IMAGE_REGISTRY (and never
leaks the org-path suffix).
- New `TestRemoteDigest_RegistryHostFollowsEnv` stands up an httptest
server, points MOLECULE_IMAGE_REGISTRY at it, and confirms both the
token endpoint and the manifest HEAD land there — i.e. the full image-
watch loop respects the env override end-to-end.
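The two surface-level tests (`TestGHCRAuthHeader_RespectsRegistryEnv` and
`TestRemoteDigest_RegistryHostFollowsEnv`) were verified to FAIL on the
pre-fix code path before the helper was wired in, so a future revert
can't silently re-introduce the bug.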
Out of scope (follow-up needed)
ECR uses `aws ecr get-authorization-token` (SigV4 + basic-auth) instead
of GHCR's `/token?service=…&scope=…` flow. This PR makes the URL host-
configurable; the bearer-token negotiation in `fetchPullToken` still
speaks the GHCR flavor. On ECR with `IMAGE_AUTO_REFRESH=true`, the
watcher will now fail loudly at the token fetch (logged per tick) rather
than silently hitting ghcr.io. Operators on ECR should keep
IMAGE_AUTO_REFRESH=false until ECR auth is wired — tracked as a separate
task. Net effect of this PR alone is strictly better than pre-fix:
fail-loud > silent-broken.
Refs: RFC #229 P2-4
tier:low
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
123 lines
4.9 KiB
Go
package provisioner

import (
	"fmt"
	"os"
	"strings"
)

// defaultRegistryPrefix is the upstream OSS face for all workspace template
// images. Self-hosted Molecule deployments without the MOLECULE_IMAGE_REGISTRY
// override pull from here.
const defaultRegistryPrefix = "ghcr.io/molecule-ai"

// knownRuntimes is the canonical list of workspace template runtimes shipped
// in main. Any runtime added here MUST also have a standalone template repo
// (Molecule-AI/molecule-ai-workspace-template-<name>) and an entry in the
// publish-template-image workflow that builds it.
//
// Order matters for deterministic test snapshots; keep alphabetical.
var knownRuntimes = []string{
	"autogen",
	"claude-code",
	"codex",
	"crewai",
	"deepagents",
	"gemini-cli",
	"hermes",
	"langgraph",
	"openclaw",
}

// defaultRuntime is the fallback when a workspace's config doesn't specify a
// runtime. Picked because LangGraph is the most common in our org templates
// and has the smallest "first impression" cold-start surface.
const defaultRuntime = "langgraph"

// RegistryPrefix returns the registry prefix all workspace-template image
// references should use. Defaults to ghcr.io/molecule-ai (the upstream OSS
// face) and is overridden by the MOLECULE_IMAGE_REGISTRY env var in
// production tenants where we mirror images to a private registry.
//
// The override is set at deploy time (Railway env, EC2 user-data) — never
// from user-supplied input — so the value is trusted by the time it reaches
// this code. Validation is deliberately minimal: an operator-supplied
// prefix that points at a registry the EC2 can't authenticate to will fail
// loudly at docker-pull time, which is the right blast radius.
//
// Example values:
//
//	(unset) → ghcr.io/molecule-ai (OSS default)
//	"123456789012.dkr.ecr.us-east-2.amazonaws.com/molecule-ai" → AWS ECR mirror
//	"git.moleculesai.app/molecule-ai" → self-hosted Gitea Container Registry (future)
//
// Auth is registry-specific and configured outside this function:
//   - GHCR: GHCR_USER/GHCR_TOKEN env vars consumed by ghcrAuthHeader()
//   - ECR: docker credential helper (amazon-ecr-credential-helper) configured
//     in EC2 user-data; ~/.docker/config.json has credHelpers entry; the
//     daemon resolves auth automatically on every pull.
func RegistryPrefix() string {
	if v := os.Getenv("MOLECULE_IMAGE_REGISTRY"); v != "" {
		return v
	}
	return defaultRegistryPrefix
}

// RegistryHost returns just the registry host portion of RegistryPrefix() —
// i.e. everything before the first "/" separator. This is the value that
// belongs in:
//
//   - Docker Engine PullOptions.RegistryAuth payloads (`serveraddress` field)
//     — the engine matches credentials against host, not host+org-path.
//   - Docker Registry V2 HTTP API base URLs (e.g. `https://<host>/v2/...`)
//     — the V2 API is host-rooted; the org-path lives in the manifest path.
//
// Examples:
//
//	"ghcr.io/molecule-ai" → "ghcr.io"
//	"123456789012.dkr.ecr.us-east-2.amazonaws.com/molecule-ai" → "123456789012.dkr.ecr.us-east-2.amazonaws.com"
//	"git.moleculesai.app/molecule-ai" → "git.moleculesai.app"
//
// If RegistryPrefix() ever returns a bare host (no `/`), we return it as-is
// rather than letting strings.SplitN produce an empty string — defensive
// against a misconfiguration where the operator sets just the host.
func RegistryHost() string {
	prefix := RegistryPrefix()
	if i := strings.IndexByte(prefix, '/'); i > 0 {
		return prefix[:i]
	}
	return prefix
}

// RuntimeImage returns the canonical image reference for the given runtime,
// using the current RegistryPrefix() and the moving `:latest` tag.
//
// For SHA-pinned references (production thin-AMI launches), the
// runtime_image_pins lookup in handlers/runtime_image_pin.go strips the
// `:latest` suffix and appends an immutable `@sha256:<digest>` from the DB.
// That code path naturally inherits any RegistryPrefix() change because it
// reads from RuntimeImages[runtime] and only re-formats the tag suffix.
//
// Returns the empty string for unknown runtimes; callers should fall through
// to DefaultImage in that case (matching legacy behavior).
func RuntimeImage(runtime string) string {
	for _, r := range knownRuntimes {
		if r == runtime {
			return fmt.Sprintf("%s/workspace-template-%s:latest", RegistryPrefix(), runtime)
		}
	}
	return ""
}

// computeRuntimeImages returns the {runtime: image-ref} map evaluated against
// the current RegistryPrefix(). Called at package init to populate the
// exported RuntimeImages var. Tests that flip MOLECULE_IMAGE_REGISTRY between
// expected values use this helper to rebuild the map mid-run.
func computeRuntimeImages() map[string]string {
	out := make(map[string]string, len(knownRuntimes))
	for _, r := range knownRuntimes {
		out[r] = RuntimeImage(r)
	}
	return out
}