Add MOLECULE_IMAGE_REGISTRY env var to override the registry prefix used by all workspace-template image references. Defaults to ghcr.io/molecule-ai (unchanged for OSS users); set to an ECR URI in production tenants when mirroring to AWS.

Why this matters: GitHub suspended the Molecule-AI org on 2026-05-06 with no warning. Production tenants kept running because they had images cached locally, but any tenant restart (AWS health event, redeploy, OS reboot) would have failed at `docker pull ghcr.io/molecule-ai/...` because GHCR returned 401. This change introduces the seam needed to point new pulls at a registry we control (AWS ECR) by flipping a single env var on Railway.

Design (RFC: molecule-ai/internal#6):
- New `RegistryPrefix()` function in `provisioner/registry.go` reads MOLECULE_IMAGE_REGISTRY, falls back to "ghcr.io/molecule-ai".
- New `RuntimeImage(runtime)` returns the canonical ref using the prefix.
- `RuntimeImages` map computed at init via `computeRuntimeImages()` so existing callers that range over it still work.
- `DefaultImage` likewise computed via `RuntimeImage(defaultRuntime)`.
- `handlers.TemplateImageRef()` switched from hardcoded format string to `provisioner.RegistryPrefix()`.
- `runtime_image_pin.go::resolveRuntimeImage()` automatically inherits the prefix change because it reads from `provisioner.RuntimeImages[]` and only re-formats the tag suffix to a digest pin.

Alternatives rejected (see RFC):
- Multi-registry fallback chain (try ECR, fall back to GHCR): GHCR is locked from outbound for our org, so the fallback never works for us. Adds code complexity for no benefit.
- Hardcoded ECR-only switch: couples production code to a specific deployment environment. OSS users self-hosting Molecule would need the upstream GHCR.
- Self-hosted Harbor / registry-on-Hetzner: adds a component to operate. Not justified at 3-tenant scale; AWS ECR is mature and IAM-integrated.

Auth (deliberately NOT changed in this commit):
- For GHCR, the existing `ghcrAuthHeader()` reads GHCR_USER/GHCR_TOKEN.
- For ECR, EC2 user-data installs `amazon-ecr-credential-helper` and adds a `credHelpers` entry in `~/.docker/config.json` so the daemon resolves ECR credentials via the EC2 instance role on every pull. The Go code needs no auth change. This keeps the diff minimal.

Backwards compatibility:
- Additive: env unset → identical behavior to today (GHCR).
- Existing tests reference literal `ghcr.io/molecule-ai/...` strings; they continue to pass under the default prefix.
- `RuntimeImages` map preserved for callers that iterate it.
- No interface, schema, API, or migration version bump needed.

Security review:
- No untrusted input: MOLECULE_IMAGE_REGISTRY is set at deploy time (Railway env, EC2 user-data), not by users.
- No expanded data collection or logging changes.
- No new permissions: ECR pull permission is a future user-data + IAM role change, separate from this code change.
- Worst-case: an attacker who already compromises Railway can swap the registry prefix to a malicious URI. Same blast radius as compromising Railway today, no expansion.

Tests:
- 9 new unit tests in `registry_test.go` covering: default fallback, env override, empty env, all 9 known runtimes, unknown runtime, override-applies-to-all, computeRuntimeImages map population, env reflection, alphabetical ordering pin.
- All existing provisioner + handlers tests continue to pass.
- Mutation-tested mentally: deleting `if v := os.Getenv(...)` makes TestRegistryPrefix_RespectsEnv fail. Deleting `for _, r := range knownRuntimes` makes TestRuntimeImage_AllKnownRuntimes fail. The test suite would catch a regression of the original failure mode.

Rollout plan: this PR is safe to merge with no env change. Production cutover happens by setting MOLECULE_IMAGE_REGISTRY on Railway after the AWS ECR mirror is populated (separate ops change, tracked in issue #6 phases 3b–3f).

Tracking:
- RFC: molecule-ai/internal#6
- Tasks: #97 (ECR setup), #98 (CP fallback)
- Tech debt: runbooks/hetzner-rollout-tech-debt-2026-05-06.md item 7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
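The design note says `resolveRuntimeImage()` inherits the prefix change because it only re-formats the tag suffix into a digest pin. The sketch below illustrates why that holds. It is not the code in `runtime_image_pin.go`: `pinToDigest` is a hypothetical helper, and the digest is a placeholder value.

```go
package main

import (
	"fmt"
	"strings"
)

// pinToDigest is a HYPOTHETICAL helper (not from the commit). It replaces the
// tag (or an existing digest) of an image reference with "@<digest>" and
// leaves everything before the tag, including the registry prefix, untouched.
func pinToDigest(ref, digest string) string {
	// Drop any digest that is already present.
	if at := strings.Index(ref, "@"); at >= 0 {
		ref = ref[:at]
	}
	// A colon counts as a tag separator only if it comes after the last
	// slash; otherwise it belongs to a registry host:port such as
	// "localhost:5000".
	slash := strings.LastIndex(ref, "/")
	if colon := strings.LastIndex(ref, ":"); colon > slash {
		ref = ref[:colon]
	}
	return ref + "@" + digest
}

func main() {
	// Placeholder digest, for illustration only.
	const digest = "sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"

	refs := []string{
		"ghcr.io/molecule-ai/workspace-template-claude-code:latest",
		"123456789012.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/workspace-template-claude-code:latest",
	}
	for _, ref := range refs {
		fmt.Println(pinToDigest(ref, digest))
	}
}
```

Because only the part after the last slash is rewritten, whatever prefix `RegistryPrefix()` supplies is carried through to the pinned reference unchanged.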
141 lines
5.7 KiB
Go
package provisioner

import (
	"strings"
	"testing"
)

// TestRegistryPrefix_DefaultsToGHCR pins the OSS-default behavior. If a future
// refactor accidentally drops the default, OSS users self-hosting Molecule
// would silently lose image pulls; this test should fail loudly instead.
func TestRegistryPrefix_DefaultsToGHCR(t *testing.T) {
	t.Setenv("MOLECULE_IMAGE_REGISTRY", "")
	got := RegistryPrefix()
	want := "ghcr.io/molecule-ai"
	if got != want {
		t.Fatalf("RegistryPrefix() = %q, want %q (default must remain GHCR for OSS users)", got, want)
	}
}

// TestRegistryPrefix_RespectsEnv verifies the override path used in
// production tenants where MOLECULE_IMAGE_REGISTRY points at a private
// mirror (AWS ECR, self-hosted Harbor, etc.).
func TestRegistryPrefix_RespectsEnv(t *testing.T) {
	t.Setenv("MOLECULE_IMAGE_REGISTRY", "123456789012.dkr.ecr.us-east-2.amazonaws.com/molecule-ai")
	got := RegistryPrefix()
	want := "123456789012.dkr.ecr.us-east-2.amazonaws.com/molecule-ai"
	if got != want {
		t.Fatalf("RegistryPrefix() = %q, want %q (env override path is the production cutover mechanism)", got, want)
	}
}

// TestRegistryPrefix_EmptyEnvFallsBackToDefault: guard against an operator
// setting MOLECULE_IMAGE_REGISTRY="" by mistake (e.g. unset deploy variable
// becomes empty string, not literally absent). We treat "" as "use default"
// so a misconfigured env doesn't mean an empty registry prefix.
func TestRegistryPrefix_EmptyEnvFallsBackToDefault(t *testing.T) {
	t.Setenv("MOLECULE_IMAGE_REGISTRY", "")
	if RegistryPrefix() != defaultRegistryPrefix {
		t.Fatalf("empty MOLECULE_IMAGE_REGISTRY should fall back to %q, got %q", defaultRegistryPrefix, RegistryPrefix())
	}
}

// TestRuntimeImage_AllKnownRuntimes: every runtime in the canonical list
// must produce a properly-formatted image ref. If a new runtime is added to
// knownRuntimes but the format changes, this catches it.
func TestRuntimeImage_AllKnownRuntimes(t *testing.T) {
	t.Setenv("MOLECULE_IMAGE_REGISTRY", "")
	for _, r := range knownRuntimes {
		got := RuntimeImage(r)
		want := "ghcr.io/molecule-ai/workspace-template-" + r + ":latest"
		if got != want {
			t.Errorf("RuntimeImage(%q) = %q, want %q", r, got, want)
		}
	}
	// Pin the count so adding a runtime requires explicit test acknowledgement.
	if len(knownRuntimes) != 9 {
		t.Errorf("knownRuntimes length = %d, want 9 (autogen, claude-code, codex, crewai, deepagents, gemini-cli, hermes, langgraph, openclaw)", len(knownRuntimes))
	}
}

// TestRuntimeImage_UnknownRuntime: defensive. Callers must fall back to
// DefaultImage when a runtime is unknown, never silently use the wrong
// prefix. Returning "" enforces an explicit fallback at every call site.
func TestRuntimeImage_UnknownRuntime(t *testing.T) {
	for _, name := range []string{"", "nonexistent", "WORKSPACE-TEMPLATE-FAKE", "../../../etc/passwd"} {
		if got := RuntimeImage(name); got != "" {
			t.Errorf("RuntimeImage(%q) = %q, want empty string for unknown runtime", name, got)
		}
	}
}

// TestRuntimeImage_RegistryOverrideAppliesToAllRuntimes: the override
// flips ALL runtimes consistently. If a refactor accidentally hardcoded
// the prefix in some runtimes but not others (the failure mode that
// triggered this whole rollout), this test catches it.
func TestRuntimeImage_RegistryOverrideAppliesToAllRuntimes(t *testing.T) {
	const ecr = "999999999999.dkr.ecr.us-east-2.amazonaws.com/molecule-ai"
	t.Setenv("MOLECULE_IMAGE_REGISTRY", ecr)

	for _, r := range knownRuntimes {
		got := RuntimeImage(r)
		if !strings.HasPrefix(got, ecr+"/workspace-template-") {
			t.Errorf("RuntimeImage(%q) = %q, must start with override prefix %q", r, got, ecr)
		}
		if !strings.HasSuffix(got, ":latest") {
			t.Errorf("RuntimeImage(%q) = %q, must keep :latest tag suffix", r, got)
		}
	}
}

// TestComputeRuntimeImages_AllRuntimesPresent: the map must contain every
// known runtime. Drift between knownRuntimes and computeRuntimeImages would
// silently break the runtime → image lookup that provisioner.Start uses.
func TestComputeRuntimeImages_AllRuntimesPresent(t *testing.T) {
	t.Setenv("MOLECULE_IMAGE_REGISTRY", "")
	m := computeRuntimeImages()
	if len(m) != len(knownRuntimes) {
		t.Fatalf("computeRuntimeImages() has %d entries, want %d (one per knownRuntime)", len(m), len(knownRuntimes))
	}
	for _, r := range knownRuntimes {
		img, ok := m[r]
		if !ok {
			t.Errorf("computeRuntimeImages() missing runtime %q", r)
			continue
		}
		if img == "" {
			t.Errorf("computeRuntimeImages()[%q] is empty", r)
		}
	}
}

// TestComputeRuntimeImages_ReflectsCurrentEnv: calling computeRuntimeImages
// after env change rebuilds the map with new prefix. Tests + ops procedures
// that flip the env in-process rely on this.
func TestComputeRuntimeImages_ReflectsCurrentEnv(t *testing.T) {
	t.Setenv("MOLECULE_IMAGE_REGISTRY", "")
	defaultMap := computeRuntimeImages()
	if !strings.HasPrefix(defaultMap["claude-code"], "ghcr.io/molecule-ai/") {
		t.Fatalf("default map should be GHCR-prefixed, got %q", defaultMap["claude-code"])
	}

	const mirror = "registry.example.com/molecule-ai"
	t.Setenv("MOLECULE_IMAGE_REGISTRY", mirror)
	mirrorMap := computeRuntimeImages()
	if !strings.HasPrefix(mirrorMap["claude-code"], mirror+"/") {
		t.Fatalf("mirror-prefixed map should start with %q, got %q", mirror, mirrorMap["claude-code"])
	}
}

// TestKnownRuntimes_AlphabeticalOrder: pin the order so test snapshots
// (and human readers diffing the file) see deterministic output. Adding a
// new runtime out of alphabetical order will fail this test, which is the
// nudge to keep the file readable.
func TestKnownRuntimes_AlphabeticalOrder(t *testing.T) {
	for i := 1; i < len(knownRuntimes); i++ {
		if knownRuntimes[i-1] >= knownRuntimes[i] {
			t.Errorf("knownRuntimes not alphabetical: %q comes before %q", knownRuntimes[i-1], knownRuntimes[i])
		}
	}
}
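
The tests above exercise `RegistryPrefix`, `RuntimeImage`, `computeRuntimeImages`, `knownRuntimes`, and `defaultRegistryPrefix`, none of which are defined in this file. For orientation, here is a minimal sketch of an implementation that would satisfy them. It is reconstructed from the commit message and the test expectations, not copied from `provisioner/registry.go`, so the real file may differ. It uses `package main` only so that it runs standalone.

```go
package main

import (
	"fmt"
	"os"
)

// Default prefix, as stated in the commit message and pinned by the tests.
const defaultRegistryPrefix = "ghcr.io/molecule-ai"

// Runtime names are taken from the count-pin message in
// TestRuntimeImage_AllKnownRuntimes; they are kept in alphabetical order.
var knownRuntimes = []string{
	"autogen", "claude-code", "codex", "crewai", "deepagents",
	"gemini-cli", "hermes", "langgraph", "openclaw",
}

// RegistryPrefix returns MOLECULE_IMAGE_REGISTRY when it is non-empty and the
// GHCR default otherwise, so an empty variable behaves like an unset one.
func RegistryPrefix() string {
	if v := os.Getenv("MOLECULE_IMAGE_REGISTRY"); v != "" {
		return v
	}
	return defaultRegistryPrefix
}

// RuntimeImage returns the image ref for a known runtime, or "" for anything
// else, which forces callers to choose their fallback explicitly.
func RuntimeImage(runtime string) string {
	for _, r := range knownRuntimes {
		if r == runtime {
			return RegistryPrefix() + "/workspace-template-" + r + ":latest"
		}
	}
	return ""
}

// computeRuntimeImages builds the runtime -> image map under the current env.
func computeRuntimeImages() map[string]string {
	m := make(map[string]string, len(knownRuntimes))
	for _, r := range knownRuntimes {
		m[r] = RuntimeImage(r)
	}
	return m
}

func main() {
	// Resolved under whatever MOLECULE_IMAGE_REGISTRY is set to right now.
	fmt.Println(RuntimeImage("claude-code"))

	// Flip the prefix in-process; registry.example.com is a placeholder host.
	if err := os.Setenv("MOLECULE_IMAGE_REGISTRY", "registry.example.com/molecule-ai"); err != nil {
		panic(err)
	}
	fmt.Println(RuntimeImage("claude-code"))
	fmt.Println(len(computeRuntimeImages()), "runtimes mapped")
}
```

Reading the environment on every call, rather than caching the prefix once, is what lets `TestComputeRuntimeImages_ReflectsCurrentEnv` observe a new prefix after `t.Setenv`. Whether the real implementation does the same is an assumption of this sketch.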