diff --git a/docs/adr/ADR-002-local-build-mode-via-registry-presence.md b/docs/adr/ADR-002-local-build-mode-via-registry-presence.md new file mode 100644 index 00000000..9df6c141 --- /dev/null +++ b/docs/adr/ADR-002-local-build-mode-via-registry-presence.md @@ -0,0 +1,74 @@ +# ADR-002: Local-build mode signalled by `MOLECULE_IMAGE_REGISTRY` presence + +* Status: Accepted (2026-05-07) +* Issue: #63 (closes Task #194) +* Decision: Hongming (CTO) + Claude Opus 4.7 (implementation) + +## Context + +Pre-2026-05-06, every Molecule deployment — both production tenants and OSS contributor laptops — pulled workspace-template-* container images from `ghcr.io/molecule-ai/`. Production tenants additionally set `MOLECULE_IMAGE_REGISTRY` to an AWS ECR mirror via Railway env / EC2 user-data, but the OSS default was the upstream GHCR org. + +On 2026-05-06 the `Molecule-AI` GitHub org was suspended (saved memory: `feedback_github_botring_fingerprint`). GHCR now returns **403 Forbidden** for every `molecule-ai/workspace-template-*` manifest. OSS contributors who clone `molecule-core` and run `go run ./workspace-server/cmd/server` cannot provision a workspace — every first provision fails with: + +``` +docker image "ghcr.io/molecule-ai/workspace-template-claude-code:latest" not found after pull attempt +``` + +Production tenants are unaffected (their `MOLECULE_IMAGE_REGISTRY` points at ECR, which we still control), but OSS onboarding is broken. Workspace template repos are intentionally separate from `molecule-core` (each runtime is OSS-shape and forkable), and they are mirrored to Gitea (`https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-`) — but the provisioner has no path that consumes Gitea source directly. + +## Decision + +When `MOLECULE_IMAGE_REGISTRY` is **unset** (or empty), the provisioner switches to a **local-build mode** that: + +1. Looks up the workspace-template repo's HEAD sha on Gitea via a single API call. +2. 
Checks whether a SHA-pinned local image (`molecule-local/workspace-template-:`) already exists; if so, reuses it. +3. Otherwise shallow-clones the repo into `~/.cache/molecule/workspace-template-build///` and runs `docker build --platform=linux/amd64 -t .`. +4. Hands the SHA-pinned tag to Docker for ContainerCreate, bypassing the registry-pull path entirely. + +When `MOLECULE_IMAGE_REGISTRY` is **set**, behavior is unchanged: pull the image from that registry. Existing prod tenants and self-hosters who mirror to a private registry are not affected. + +## Consequences + +### Positive + +* **Zero-config OSS onboarding** — `git clone molecule-core && go run ./workspace-server/cmd/server` boots end-to-end without any registry credentials. +* **Production tenants protected** — same env var, same semantics in SaaS-mode. Migration is a no-op. +* **No new env var** — extending an existing var's semantics ("where to pull, OR build locally if absent") rather than introducing `MOLECULE_LOCAL_BUILD=1` keeps the surface small. +* **SHA-pinned cache** — repeat builds are O(API-call); only template-repo HEAD changes invalidate. +* **Production-parity image** — amd64 emulation on Apple Silicon honours `feedback_local_must_mimic_production`. The provisioner's existing `defaultImagePlatform()` already forces amd64 for parity; building amd64 locally lets that decision stay consistent. + +### Negative + +* **Conflates two concerns** — `MOLECULE_IMAGE_REGISTRY` now signals BOTH "where to pull" AND "build locally if absent." A future operator who unsets it expecting a hard error will instead get a slow first-provision. Documented in the runbook. +* **First-provision is slow on Apple Silicon** — 5–10 min via QEMU emulation on the cold path. Mitigated by SHA-cache (subsequent runs are <1s lookup + 0s build). +* **Coverage gap** — only 4 of 9 runtimes are mirrored to Gitea today (`claude-code`, `hermes`, `langgraph`, `autogen`). The other 5 fail with an actionable "not mirrored" error. 
Mirroring those repos is a separate task. +* **Implicit trust boundary** — operator running `go run` implicitly trusts `molecule-ai/molecule-ai-workspace-template-*` repos on Gitea. This is the same trust they would extend to the GHCR images today; not a new attack surface. + +## Alternatives considered + +1. **New env var `MOLECULE_LOCAL_BUILD=1`** — explicit, but requires OSS contributors to know it exists. Violates the zero-config goal. +2. **Push pre-built images to a Gitea container registry, mirror tag from upstream** — operationally cleaner but: (a) Gitea's container-registry add-on isn't deployed on the operator host, (b) defeats the OSS-contributor goal of "hack on the source, see your changes," since they'd still pull a stale image. +3. **Embed Dockerfiles in molecule-core itself, drop the standalone template repos** — would work but breaks the OSS-shape principle; templates are intentionally separable, anyone-can-fork artifacts. +4. **Build native arch on Apple Silicon (arm64) and drop the platform pin in local-mode** — fast, but creates `linux/arm64` images that diverge from the amd64-only prod runtime. Local-vs-prod debug behavior would diverge. Rejected per `feedback_local_must_mimic_production`. + +## Security review + +* **Gitea repo URL allowlist** — runtime name must be in the `knownRuntimes` allowlist (defence-in-depth against a future code path that lets cfg.Runtime carry untrusted input). Repo prefix is hardcoded to `https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-`; forks can override via `MOLECULE_LOCAL_TEMPLATE_REPO_PREFIX` (opt-in, default off). +* **Token handling** — clones are anonymous over HTTPS by default (templates are public). `MOLECULE_GITEA_TOKEN`, if set, is passed via URL userinfo for the clone and as `Authorization: token` for the API call. The token is **masked in every log line** via `maskTokenInURL` / `maskTokenInString` and never appears in the cache dir path. 
+* **No silent fallback** — if Gitea is unreachable or the runtime isn't mirrored, we return a clear error mentioning the repo URL and the missing runtime. We **never** fall back to GHCR/ECR (that would be a confusing bug for an OSS contributor who happened to have stale ECR creds in their docker config). +* **Build-arg injection** — `docker build` is invoked with NO `--build-arg` from external input. Dockerfile is consumed as-is. +* **Cache poisoning** — the cache key is the Gitea HEAD sha, truncated to 12 characters in the image tag; Dockerfile content is not hashed separately today. A force-push to the template repo's main branch changes the HEAD sha, so the key regenerates on the next run. Cache dir is per-user (`$HOME/.cache`), so cross-user attacks aren't relevant in single-user dev mode. + +## Versioning + back-compat + +* Existing prod tenants set `MOLECULE_IMAGE_REGISTRY=` → unchanged behavior. +* Existing local installs that set the var → unchanged behavior. +* Existing local installs that don't set it → switch to local-build path. Migration: none required (additive); first provision will take 5–10 min instead of failing. +* No deprecations. + +## References + +* Issue #63 — feat(workspace-server): local-dev provisioner builds from Gitea source +* Saved memory `feedback_local_must_mimic_production` — local docker must mimic prod, no bypasses +* Saved memory `reference_post_suspension_pipeline` — full post-2026-05-06 stack shape +* Saved memory `feedback_github_botring_fingerprint` — what got the org suspended diff --git a/docs/development/local-development.md b/docs/development/local-development.md index 42f9e277..d5bd116b 100644 --- a/docs/development/local-development.md +++ b/docs/development/local-development.md @@ -1,5 +1,41 @@ # Local Development +## Workspace Template Images: Local-Build Mode (Issue #63) + +OSS contributors who run `molecule-core` locally do **not** need to authenticate to GHCR or AWS ECR. When the `MOLECULE_IMAGE_REGISTRY` env var is **unset**, the platform automatically: + +1. 
Looks up the HEAD sha of `https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-` (single API call, no clone). +2. If a local image tagged `molecule-local/workspace-template-:` already exists, reuses it (cache hit). +3. Otherwise, shallow-clones the repo into `~/.cache/molecule/workspace-template-build///` and runs `docker build --platform=linux/amd64 -t .`. +4. Hands the SHA-pinned tag to Docker for `ContainerCreate`. + +**First-provision build time:** 5–10 min on Apple Silicon (amd64 emulation). Subsequent provisions hit the cache and start in seconds. Cache is invalidated automatically when the template repo's HEAD moves. + +**Currently mirrored on Gitea:** `claude-code`, `hermes`, `langgraph`, `autogen`. Other runtimes (`crewai`, `deepagents`, `codex`, `gemini-cli`, `openclaw`) fail with an actionable "not mirrored to Gitea" error pointing at the missing repo. + +**Production tenants are unaffected** — every prod tenant sets `MOLECULE_IMAGE_REGISTRY` to its private ECR mirror via Railway env / EC2 user-data, so the SaaS pull path stays identical. + +### Environment overrides + +| Var | Default | Use case | +|-----|---------|----------| +| `MOLECULE_IMAGE_REGISTRY` | (unset) | Set to a real registry URL to switch from local-build to SaaS-pull mode. | +| `MOLECULE_LOCAL_BUILD_CACHE` | `~/.cache/molecule/workspace-template-build` | Override cache directory. | +| `MOLECULE_LOCAL_TEMPLATE_REPO_PREFIX` | `https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-` | Point at a fork. | +| `MOLECULE_GITEA_TOKEN` | (unset) | Required only if your fork has private template repos. | + +### Verifying a switch from the GHCR-retag stopgap + +Pre-fix, OSS contributors worked around the suspended GHCR org by manually retagging an `:latest` image. After this change, that workaround is **redundant**: simply unset `MOLECULE_IMAGE_REGISTRY` (or leave it unset), boot the platform, and provision a workspace. 
Logs will show: + +``` +Provisioner: local-build mode → using locally-built image molecule-local/workspace-template-claude-code: for runtime claude-code +local-build: cloning https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-claude-code → ... +local-build: docker build done in +``` + +If you still see `ghcr.io/molecule-ai/...` in the boot log, double-check `env | grep MOLECULE_IMAGE_REGISTRY` — a stale shell export from the pre-fix workaround could keep SaaS-mode active. + ## Starting the Stack ```bash diff --git a/workspace-server/internal/provisioner/localbuild.go b/workspace-server/internal/provisioner/localbuild.go new file mode 100644 index 00000000..9f1fcf5d --- /dev/null +++ b/workspace-server/internal/provisioner/localbuild.go @@ -0,0 +1,545 @@ +package provisioner + +import ( + "context" + "crypto/sha256" + "encoding/hex" + "errors" + "fmt" + "io" + "log" + "net/http" + "net/url" + "os" + "os/exec" + "path/filepath" + "strings" + "sync" + "time" +) + +// Local-build mode: clone the workspace-template- repo from Gitea +// and `docker build` it on the host so OSS contributors can run molecule-core +// end-to-end without authenticating to (or being able to reach) GHCR/ECR. +// +// The flow: +// +// 1. ensureLocalImage(runtime) is called by the provisioner before +// ContainerCreate, but only when Resolve().Mode == RegistryModeLocal. +// 2. We compute a cache key from the Gitea repo's HEAD sha (one HTTP +// call to https://git.moleculesai.app/api/v1/repos/.../branches/main). +// 3. If `molecule-local/workspace-template-:` already +// exists in the local Docker image store, we return immediately. +// 4. Otherwise: shallow git-clone the repo into the cache dir, then +// `docker buildx build --platform=linux/amd64 -t ` on it. We +// also tag `:latest` so `docker images` shows a friendly entry. 
+// +// Why amd64 emulation: the provisioner's defaultImagePlatform() forces +// linux/amd64 on Apple Silicon for parity with the (amd64-only) prod +// images. Building native arm64 in local-mode would diverge — see the +// design rationale in Issue #63 and the saved memory +// `feedback_local_must_mimic_production`. +// +// Auth: clone is anonymous (templates are public). If MOLECULE_GITEA_TOKEN +// is set, we use it via the URL's userinfo — the token is masked in +// every log line by maskTokenInURL(). +// +// Failure mode: fail-closed. If Gitea is unreachable we surface a clear +// error message including the repo URL; we NEVER fall back to GHCR/ECR +// silently (would be a confusing bug for an OSS contributor who +// happens to have stale ECR creds in their docker config). + +// gitTemplateRepoPrefix is the prefix all workspace-template repos live +// under on Gitea. Hardcoded so an attacker who controlled cfg.Runtime +// (defence-in-depth — today the field is platform-validated upstream) +// can only ever reach a repo under molecule-ai/. +// +// Operators who want to point local-build at a fork can override the +// full prefix via MOLECULE_LOCAL_TEMPLATE_REPO_PREFIX (e.g. +// `https://git.example.com/myorg/molecule-ai-workspace-template-`). +// Default-off; opt-in only. +const gitTemplateRepoPrefix = "https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-" + +// localBuildLockMap serializes concurrent ensureLocalImage calls per +// runtime so two workspace creates that hit the cold path together don't +// race on `docker build` (Docker's daemon would serialize anyway, but +// the duplicate clone + log spam are confusing). Lock granularity is +// per-runtime, so different runtimes still build in parallel. 
+var ( + localBuildLockMap = make(map[string]*sync.Mutex) + localBuildLockMapMu sync.Mutex +) + +func runtimeBuildLock(runtime string) *sync.Mutex { + localBuildLockMapMu.Lock() + defer localBuildLockMapMu.Unlock() + if m, ok := localBuildLockMap[runtime]; ok { + return m + } + m := &sync.Mutex{} + localBuildLockMap[runtime] = m + return m +} + +// LocalBuildOptions controls the local-build path. Exposed so tests can +// inject fakes without standing up a real git+docker chain. Production +// uses the env-derived defaults from newDefaultLocalBuildOptions(). +type LocalBuildOptions struct { + // CacheDir is the host filesystem location where cloned template + // repos are kept between builds. Empty = use $XDG_CACHE_HOME or + // $HOME/.cache. Override via env var MOLECULE_LOCAL_BUILD_CACHE. + CacheDir string + + // RepoPrefix is the URL prefix all template repos hang off. Empty + // = use gitTemplateRepoPrefix. Override via env var + // MOLECULE_LOCAL_TEMPLATE_REPO_PREFIX. + RepoPrefix string + + // Token, if non-empty, is sent via URL userinfo to Gitea. Default + // empty (templates are public). Override via env var + // MOLECULE_GITEA_TOKEN. + Token string + + // Platform is the `docker build --platform` value. Empty = flag + // omitted (host default); today we always pass linux/amd64 because + // the provisioner only runs amd64 images. Exposed so tests can + // override. + Platform string + + // HTTPClient is used for the Gitea-API HEAD-sha lookup. nil = + // remoteHeadShaProd falls back to a fresh http.Client with a 30s + // timeout. + HTTPClient *http.Client + + // remoteHeadSha + dockerBuild + gitClone are seams for tests; if + // nil, the production implementations are used. 
+ remoteHeadSha func(ctx context.Context, opts *LocalBuildOptions, runtime string) (string, error) + gitClone func(ctx context.Context, opts *LocalBuildOptions, runtime, dest string) error + dockerBuild func(ctx context.Context, opts *LocalBuildOptions, contextDir, tag string) error + dockerHasTag func(ctx context.Context, tag string) (bool, error) + dockerTag func(ctx context.Context, src, dst string) error +} + +func newDefaultLocalBuildOptions() *LocalBuildOptions { + o := &LocalBuildOptions{ + CacheDir: os.Getenv("MOLECULE_LOCAL_BUILD_CACHE"), + RepoPrefix: os.Getenv("MOLECULE_LOCAL_TEMPLATE_REPO_PREFIX"), + Token: os.Getenv("MOLECULE_GITEA_TOKEN"), + Platform: "linux/amd64", + } + if o.CacheDir == "" { + if xdg := os.Getenv("XDG_CACHE_HOME"); xdg != "" { + o.CacheDir = filepath.Join(xdg, "molecule", "workspace-template-build") + } else if home, err := os.UserHomeDir(); err == nil { + o.CacheDir = filepath.Join(home, ".cache", "molecule", "workspace-template-build") + } else { + // Last-resort fallback: /tmp. Loses the cache between reboots + // but at least lets the path produce builds. + o.CacheDir = filepath.Join(os.TempDir(), "molecule", "workspace-template-build") + } + } + if o.RepoPrefix == "" { + o.RepoPrefix = gitTemplateRepoPrefix + } + o.HTTPClient = &http.Client{Timeout: 30 * time.Second} + return o +} + +// LocalImageTag formats the SHA-pinned tag for a runtime. Exported for +// tests + the provisioner's image-resolution branch. +func LocalImageTag(runtime, sha string) string { + short := sha + if len(short) > 12 { + short = short[:12] + } + return fmt.Sprintf("%s/workspace-template-%s:%s", localImagePrefix, runtime, short) +} + +// LocalImageLatestTag returns the floating `:latest` form. Used as a +// human-readable alias and as the value RuntimeImage() returns in +// local-mode. 
+func LocalImageLatestTag(runtime string) string { + return fmt.Sprintf("%s/workspace-template-%s:latest", localImagePrefix, runtime) +} + +// EnsureLocalImage is the entry point the provisioner calls before +// ContainerCreate when Resolve().Mode == RegistryModeLocal. Returns the +// image tag (SHA-pinned form) the caller should hand to Docker, or an +// error if the build/clone fails. +// +// Concurrency: per-runtime lock; parallel calls for the same runtime +// serialize (later callers hit the cache once the first build lands), +// parallel calls for different runtimes proceed independently. +// +// Idempotent: a cached SHA-pinned tag short-circuits the clone and +// build. On the cache-hit path the Gitea HEAD lookup is the only +// network call; the rest is a local image inspect plus a best-effort +// retag of :latest. +func EnsureLocalImage(ctx context.Context, runtime string) (string, error) { + return ensureLocalImageWithOpts(ctx, runtime, newDefaultLocalBuildOptions()) +} + +// ensureLocalImageHook is the seam Start() calls into. Production code +// uses EnsureLocalImage; tests substitute a fake to exercise the +// provisioner-Start integration without standing up a real +// git+docker chain. Single-process scoped — never reassigned in +// production code. +var ensureLocalImageHook = EnsureLocalImage + +func ensureLocalImageWithOpts(ctx context.Context, runtime string, opts *LocalBuildOptions) (string, error) { + if !IsKnownRuntime(runtime) { + return "", fmt.Errorf("local-build: refusing to build unknown runtime %q (must be one of %v)", runtime, knownRuntimes) + } + + lock := runtimeBuildLock(runtime) + lock.Lock() + defer lock.Unlock() + + // 1. HEAD lookup → cache key. + headFn := opts.remoteHeadSha + if headFn == nil { + headFn = remoteHeadShaProd + } + sha, err := headFn(ctx, opts, runtime) + if err != nil { + // Fail-closed: do not fall back to GHCR/ECR. The whole point of + // local-build mode is that GHCR is unreachable. 
+ return "", fmt.Errorf("local-build: cannot determine HEAD sha for runtime %q at %s: %w", runtime, repoURL(opts, runtime), err) + } + if len(sha) < 12 { + return "", fmt.Errorf("local-build: Gitea returned a short sha %q for runtime %q (expected ≥12 chars)", sha, runtime) + } + tag := LocalImageTag(runtime, sha) + latest := LocalImageLatestTag(runtime) + + // 2. Cache hit? + hasFn := opts.dockerHasTag + if hasFn == nil { + hasFn = dockerHasTagProd + } + exists, hasErr := hasFn(ctx, tag) + if hasErr != nil { + log.Printf("local-build: image inspect for %s failed (%v); will rebuild", tag, hasErr) + } + if exists { + log.Printf("local-build: cache hit for %s (sha=%s) — skipping clone+build", tag, sha[:12]) + // Refresh the floating :latest alias so admins inspecting `docker + // images` see the current sha. Best-effort. + tagFn := opts.dockerTag + if tagFn == nil { + tagFn = dockerTagProd + } + if tErr := tagFn(ctx, tag, latest); tErr != nil { + log.Printf("local-build: best-effort retag of %s → %s failed: %v", tag, latest, tErr) + } + return tag, nil + } + + // 3. Cold path — clone + build. + dest := filepath.Join(opts.CacheDir, runtime, sha[:12]) + if err := os.MkdirAll(filepath.Dir(dest), 0o755); err != nil { + return "", fmt.Errorf("local-build: prepare cache dir %q: %w", filepath.Dir(dest), err) + } + // Idempotent: if the dest exists from a previous failed run, wipe and + // re-clone so we don't build a partial tree. + if _, statErr := os.Stat(dest); statErr == nil { + if rmErr := os.RemoveAll(dest); rmErr != nil { + return "", fmt.Errorf("local-build: clean stale cache dir %q: %w", dest, rmErr) + } + } + + cloneFn := opts.gitClone + if cloneFn == nil { + cloneFn = gitCloneProd + } + log.Printf("local-build: cloning %s → %s (sha=%s)", redactedRepoURL(opts, runtime), dest, sha[:12]) + cloneStart := time.Now() + if err := cloneFn(ctx, opts, runtime, dest); err != nil { + // Best-effort cleanup so a half-cloned tree doesn't poison future runs. 
+ _ = os.RemoveAll(dest) + return "", fmt.Errorf("local-build: clone %s: %w", redactedRepoURL(opts, runtime), err) + } + log.Printf("local-build: clone complete in %s", time.Since(cloneStart).Round(time.Millisecond)) + + // 4. Sanity-check the cloned tree contains a Dockerfile at the root. + dockerfile := filepath.Join(dest, "Dockerfile") + info, statErr := os.Stat(dockerfile) + if statErr != nil || info.IsDir() { + _ = os.RemoveAll(dest) + return "", fmt.Errorf("local-build: cloned tree at %s has no Dockerfile (template repo malformed)", dest) + } + + // 5. Build. + buildFn := opts.dockerBuild + if buildFn == nil { + buildFn = dockerBuildProd + } + log.Printf("local-build: docker build start for %s (platform=%s, context=%s)", tag, opts.Platform, dest) + buildStart := time.Now() + if err := buildFn(ctx, opts, dest, tag); err != nil { + return "", fmt.Errorf("local-build: docker build %s: %w", tag, err) + } + log.Printf("local-build: docker build done for %s in %s", tag, time.Since(buildStart).Round(time.Second)) + + // Tag :latest as a friendly alias. + tagFn := opts.dockerTag + if tagFn == nil { + tagFn = dockerTagProd + } + if err := tagFn(ctx, tag, latest); err != nil { + log.Printf("local-build: best-effort retag of %s → %s failed: %v", tag, latest, err) + } + + return tag, nil +} + +// repoURL composes the full Gitea repo URL for the given runtime. The +// prefix is hardcoded by default; operators can override via env so a +// fork can point local-build at their own Gitea instance. +func repoURL(opts *LocalBuildOptions, runtime string) string { + return opts.RepoPrefix + runtime +} + +// redactedRepoURL returns the same value with any embedded token replaced +// by "***". Use this for log lines. +func redactedRepoURL(opts *LocalBuildOptions, runtime string) string { + return maskTokenInURL(repoURL(opts, runtime)) +} + +// maskTokenInURL replaces userinfo (username:password@) in a URL with +// `***@` so log lines never echo a Gitea PAT. 
Returns the input as-is +// on parse failures (defence: never silently corrupt the visible URL). +// +// Implementation note: net/url's URL.User stringifier percent-encodes +// the username, so `u.User = url.User("***"); u.String()` would yield +// `https://%2A%2A%2A@host/...` — unhelpful for humans grepping logs. +// We drop the userinfo via URL.User=nil, get the canonical scheme-and- +// rest, and re-insert the literal `***@` between the scheme separator +// and the host. +func maskTokenInURL(s string) string { + u, err := url.Parse(s) + if err != nil || u.User == nil { + return s + } + u.User = nil + out := u.String() + prefix := u.Scheme + "://" + if !strings.HasPrefix(out, prefix) { + return s + } + return prefix + "***@" + out[len(prefix):] +} + +// remoteHeadShaProd looks up the HEAD commit sha of branch `main` for +// the workspace-template- repo on Gitea. We use the Gitea API +// (a single HTTPS call) rather than `git ls-remote` so we don't need a +// git binary just for the HEAD lookup — we still need git for the +// clone, but the cache-hit path stays git-free. +func remoteHeadShaProd(ctx context.Context, opts *LocalBuildOptions, runtime string) (string, error) { + // Convert a `git.example.com/org/prefix-` URL into the API form + // `git.example.com/api/v1/repos/org/prefix-/branches/main`. + // Works for both git.moleculesai.app (default) and any forks that + // share the Gitea API shape. + apiURL, err := giteaBranchAPIURL(opts.RepoPrefix, runtime, "main") + if err != nil { + return "", err + } + req, err := http.NewRequestWithContext(ctx, "GET", apiURL, nil) + if err != nil { + return "", err + } + if opts.Token != "" { + // Gitea accepts "token " in the Authorization header for + // API calls. Userinfo is also accepted but only matters for + // the HTTPS clone, not the JSON API. 
+ req.Header.Set("Authorization", "token "+opts.Token) + } + cli := opts.HTTPClient + if cli == nil { + cli = &http.Client{Timeout: 30 * time.Second} + } + resp, err := cli.Do(req) + if err != nil { + return "", err + } + defer func() { _ = resp.Body.Close() }() + if resp.StatusCode == http.StatusNotFound { + return "", fmt.Errorf("repo not found at %s — runtime %q may not be mirrored to Gitea (only claude-code/hermes/langgraph/autogen today)", apiURL, runtime) + } + if resp.StatusCode == http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden { + return "", fmt.Errorf("auth failure (%d) at %s — verify MOLECULE_GITEA_TOKEN if private repo", resp.StatusCode, apiURL) + } + if resp.StatusCode != http.StatusOK { + return "", fmt.Errorf("HEAD lookup at %s returned %d", apiURL, resp.StatusCode) + } + body, err := io.ReadAll(io.LimitReader(resp.Body, 64<<10)) + if err != nil { + return "", fmt.Errorf("read HEAD response body: %w", err) + } + // Extract commit.id with a small substring scan (see + // parseGiteaBranchHeadSha) rather than a full encoding/json decode. + return parseGiteaBranchHeadSha(body) +} + +// giteaBranchAPIURL maps a repo-prefix URL like +// `https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-` +// + runtime "claude-code" + branch "main" +// to the API URL +// `https://git.moleculesai.app/api/v1/repos/molecule-ai/molecule-ai-workspace-template-claude-code/branches/main`. +func giteaBranchAPIURL(repoPrefix, runtime, branch string) (string, error) { + u, err := url.Parse(repoPrefix + runtime) + if err != nil { + return "", fmt.Errorf("parse repo URL %q: %w", repoPrefix+runtime, err) + } + parts := strings.TrimPrefix(u.Path, "/") + parts = strings.TrimSuffix(parts, "/") + if parts == "" { + return "", fmt.Errorf("repo URL %q has empty path", repoPrefix+runtime) + } + // Expect `/` (single slash) — the prefix already includes + // org+partial-repo; runtime appends the rest. 
+ if !strings.Contains(parts, "/") { + return "", fmt.Errorf("repo URL %q missing org/repo path", repoPrefix+runtime) + } + apiURL := url.URL{ + Scheme: u.Scheme, + Host: u.Host, + Path: "/api/v1/repos/" + parts + "/branches/" + branch, + } + return apiURL.String(), nil +} + +// parseGiteaBranchHeadSha extracts commit.id from the Gitea +// /branches/ response. We use a permissive substring scan so a +// missing-key in the JSON gives a clear error rather than the +// json.Decoder's somewhat opaque "missing field" message. +func parseGiteaBranchHeadSha(body []byte) (string, error) { + // Look for `"id":"<40-hex>"` inside the commit object. + idx := strings.Index(string(body), `"id":"`) + if idx < 0 { + return "", errors.New("Gitea branch response missing commit.id field") + } + rest := string(body[idx+len(`"id":"`):]) + end := strings.IndexByte(rest, '"') + if end < 0 { + return "", errors.New("Gitea branch response has malformed commit.id (no closing quote)") + } + sha := rest[:end] + if len(sha) < 7 { + return "", fmt.Errorf("Gitea returned suspiciously short sha %q", sha) + } + return sha, nil +} + +// gitCloneProd shallow-clones the runtime's template repo into dest. +// +// We invoke `git` rather than implementing the protocol ourselves — +// every host that runs the workspace-server already needs git available +// (it's a hard dep of go-mod for vendored repos) and the OSS contributor +// onboarding doc lists it as a prerequisite. +func gitCloneProd(ctx context.Context, opts *LocalBuildOptions, runtime, dest string) error { + cloneURL := repoURL(opts, runtime) + if opts.Token != "" { + // HTTPS clone with userinfo: https://oauth2:@host/... + u, err := url.Parse(cloneURL) + if err == nil { + u.User = url.UserPassword("oauth2", opts.Token) + cloneURL = u.String() + } + // On parse failure we silently fall through to the public URL — + // better to attempt the anonymous clone than to refuse outright. 
+ } + cmd := exec.CommandContext(ctx, "git", "clone", "--depth=1", "--branch=main", "--single-branch", cloneURL, dest) + // Drop git's askpass prompts so we fail-fast on auth errors instead + // of hanging waiting for an interactive password. + cmd.Env = append(os.Environ(), "GIT_TERMINAL_PROMPT=0", "GIT_ASKPASS=/bin/echo") + out, err := cmd.CombinedOutput() + if err != nil { + // Mask the token in any error string git emits via stderr — git + // occasionally echoes the URL verbatim on failure. + errMsg := maskTokenInString(string(out), opts.Token) + return fmt.Errorf("%w: %s", err, strings.TrimSpace(errMsg)) + } + return nil +} + +// maskTokenInString replaces literal occurrences of the token with `***`. +// Defence against git binary or docker echoing the URL into stderr. +func maskTokenInString(s, token string) string { + if token == "" { + return s + } + return strings.ReplaceAll(s, token, "***") +} + +// dockerBuildProd invokes the docker CLI to build the workspace-template +// image. We shell out rather than use the Docker SDK's ImageBuild — the +// SDK requires hand-tarballing the build context, which adds a +// non-trivial code path with its own bug surface. The docker CLI is +// already a hard dep of the workspace-server (the provisioner needs the +// daemon), so requiring the CLI binary on PATH adds nothing. +// +// Uses the legacy `docker build` (not `docker buildx build`) because +// buildx isn't always installed by default on Linux distros and the +// legacy builder produces an image the local Docker daemon picks up +// automatically. We pass --platform=linux/amd64 directly; with Docker +// 20.10+ this works without buildx because the legacy builder +// auto-promotes to BuildKit when available, falling back to v1 +// otherwise (still produces an amd64 image via QEMU). 
+func dockerBuildProd(ctx context.Context, opts *LocalBuildOptions, contextDir, tag string) error { + args := []string{"build"} + if opts.Platform != "" { + args = append(args, "--platform="+opts.Platform) + } + args = append(args, + "-t", tag, + "-f", filepath.Join(contextDir, "Dockerfile"), + contextDir, + ) + cmd := exec.CommandContext(ctx, "docker", args...) + cmd.Env = append(os.Environ(), "DOCKER_BUILDKIT=1") + out, err := cmd.CombinedOutput() + if err != nil { + // Sanitize defensive — docker build output shouldn't contain a + // token, but maskTokenInString is a no-op when token is empty. + return fmt.Errorf("%w: %s", err, strings.TrimSpace(maskTokenInString(string(out), opts.Token))) + } + return nil +} + +// dockerHasTagProd returns true iff the given tag exists in the local +// image store. Used as the fast cache-hit check. +func dockerHasTagProd(ctx context.Context, tag string) (bool, error) { + cmd := exec.CommandContext(ctx, "docker", "image", "inspect", "--format={{.Id}}", tag) + out, err := cmd.CombinedOutput() + if err == nil { + return strings.TrimSpace(string(out)) != "", nil + } + // `docker image inspect` exits 1 with "Error: No such image" when + // missing — that's a definitive false, not an error condition. + low := strings.ToLower(string(out)) + if strings.Contains(low, "no such image") || strings.Contains(low, "not found") { + return false, nil + } + return false, fmt.Errorf("%w: %s", err, strings.TrimSpace(string(out))) +} + +// dockerTagProd creates an alias from src → dst. Used to refresh the +// floating `:latest` after a build or cache hit. 
+func dockerTagProd(ctx context.Context, src, dst string) error { + cmd := exec.CommandContext(ctx, "docker", "tag", src, dst) + out, err := cmd.CombinedOutput() + if err != nil { + return fmt.Errorf("%w: %s", err, strings.TrimSpace(string(out))) + } + return nil +} + +// CacheKey is exposed for diagnostic logs / tests so the cache-key shape +// is documented in code rather than only as a string format. +// +// cache_key = sha256(runtime || head_sha || repoPrefix)[:16] +// +// Today only the SHA is consumed, but the helper is kept for future +// extensions (e.g. include Dockerfile-content-hash to invalidate when +// only the Dockerfile changes between two runs targeting the same SHA). +func CacheKey(runtime, sha, repoPrefix string) string { + h := sha256.Sum256([]byte(runtime + "|" + sha + "|" + repoPrefix)) + return hex.EncodeToString(h[:8]) +} diff --git a/workspace-server/internal/provisioner/localbuild_test.go b/workspace-server/internal/provisioner/localbuild_test.go new file mode 100644 index 00000000..1a169592 --- /dev/null +++ b/workspace-server/internal/provisioner/localbuild_test.go @@ -0,0 +1,662 @@ +package provisioner + +import ( + "context" + "errors" + "fmt" + "net/http" + "net/http/httptest" + "os" + "path/filepath" + "strings" + "sync" + "testing" +) + +// makeTestOpts produces a LocalBuildOptions where every external seam +// (Gitea HEAD, git clone, docker build/has/tag) is replaced by a stub. +// Tests override the stub for the behavior they want to assert. 
+func makeTestOpts(t *testing.T) *LocalBuildOptions { + t.Helper() + tmp := t.TempDir() + return &LocalBuildOptions{ + CacheDir: tmp, + RepoPrefix: "https://git.test/molecule-ai/molecule-ai-workspace-template-", + Platform: "linux/amd64", + HTTPClient: &http.Client{}, + remoteHeadSha: func(ctx context.Context, opts *LocalBuildOptions, runtime string) (string, error) { + return "abcdef0123456789abcdef0123456789abcdef01", nil + }, + gitClone: func(ctx context.Context, opts *LocalBuildOptions, runtime, dest string) error { + // Write a fake Dockerfile so the sanity-check passes. + if err := os.MkdirAll(dest, 0o755); err != nil { + return err + } + return os.WriteFile(filepath.Join(dest, "Dockerfile"), []byte("FROM scratch\n"), 0o644) + }, + dockerBuild: func(ctx context.Context, opts *LocalBuildOptions, contextDir, tag string) error { + return nil + }, + dockerHasTag: func(ctx context.Context, tag string) (bool, error) { + return false, nil + }, + dockerTag: func(ctx context.Context, src, dst string) error { + return nil + }, + } +} + +// TestEnsureLocalImage_Success — happy path: HEAD lookup succeeds, no +// cache hit, clone + build run, returned tag is SHA-pinned. +func TestEnsureLocalImage_Success(t *testing.T) { + opts := makeTestOpts(t) + tag, err := ensureLocalImageWithOpts(context.Background(), "claude-code", opts) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + want := "molecule-local/workspace-template-claude-code:abcdef012345" + if tag != want { + t.Errorf("tag = %q, want %q", tag, want) + } +} + +// TestEnsureLocalImage_CacheHit — second call with a cached image must +// skip clone + build entirely. 
+func TestEnsureLocalImage_CacheHit(t *testing.T) { + opts := makeTestOpts(t) + var cloneCount, buildCount int + opts.gitClone = func(ctx context.Context, opts *LocalBuildOptions, runtime, dest string) error { + cloneCount++ + return os.WriteFile(filepath.Join(dest, "Dockerfile"), []byte("FROM scratch\n"), 0o644) + } + opts.dockerBuild = func(ctx context.Context, opts *LocalBuildOptions, contextDir, tag string) error { + buildCount++ + return nil + } + opts.dockerHasTag = func(ctx context.Context, tag string) (bool, error) { + return true, nil // cached + } + if _, err := ensureLocalImageWithOpts(context.Background(), "hermes", opts); err != nil { + t.Fatalf("unexpected error: %v", err) + } + if cloneCount != 0 { + t.Errorf("cache hit triggered %d clones, want 0", cloneCount) + } + if buildCount != 0 { + t.Errorf("cache hit triggered %d builds, want 0", buildCount) + } +} + +// TestEnsureLocalImage_UnknownRuntime — the allowlist guard rejects +// arbitrary runtime names before any network or filesystem call. +func TestEnsureLocalImage_UnknownRuntime(t *testing.T) { + opts := makeTestOpts(t) + for _, bad := range []string{ + "", "unknown", "../../../etc/passwd", "claude-code; rm -rf /", + } { + t.Run(bad, func(t *testing.T) { + _, err := ensureLocalImageWithOpts(context.Background(), bad, opts) + if err == nil { + t.Errorf("EnsureLocalImage(%q) should fail (not a known runtime)", bad) + } + if err != nil && !strings.Contains(err.Error(), "unknown runtime") { + t.Errorf("error = %v, want one mentioning %q", err, "unknown runtime") + } + }) + } +} + +// TestEnsureLocalImage_GiteaUnreachable — fail-closed when the HEAD +// lookup fails. Must NOT fall back to GHCR/ECR. 
+func TestEnsureLocalImage_GiteaUnreachable(t *testing.T) { + opts := makeTestOpts(t) + opts.remoteHeadSha = func(ctx context.Context, opts *LocalBuildOptions, runtime string) (string, error) { + return "", errors.New("dial tcp: no such host") + } + _, err := ensureLocalImageWithOpts(context.Background(), "langgraph", opts) + if err == nil { + t.Fatalf("expected error, got nil") + } + if !strings.Contains(err.Error(), "cannot determine HEAD sha") { + t.Errorf("error = %v, want one mentioning HEAD sha lookup", err) + } + // Critical: error must NOT mention ghcr or ecr (no silent fallback). + low := strings.ToLower(err.Error()) + if strings.Contains(low, "ghcr") || strings.Contains(low, "ecr") { + t.Errorf("error message %q must not mention ghcr/ecr (no silent fallback)", err.Error()) + } +} + +// TestEnsureLocalImage_RepoNotFound — Gitea returned 404. Must surface +// a runtime-naming error so the OSS contributor can file the right +// mirroring task. +func TestEnsureLocalImage_RepoNotFound(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusNotFound) + _, _ = w.Write([]byte(`{"message":"repo not found"}`)) + })) + defer srv.Close() + + opts := makeTestOpts(t) + opts.RepoPrefix = srv.URL + "/molecule-ai/molecule-ai-workspace-template-" + opts.HTTPClient = srv.Client() + opts.remoteHeadSha = nil // exercise real HTTP path + + _, err := ensureLocalImageWithOpts(context.Background(), "crewai", opts) + if err == nil { + t.Fatalf("expected error, got nil") + } + if !strings.Contains(err.Error(), "not mirrored") && !strings.Contains(err.Error(), "not found") { + t.Errorf("error = %v, want a missing-repo message", err) + } +} + +// TestEnsureLocalImage_AuthFailure — Gitea returned 401/403. Must +// produce an actionable error (mentions the token env var so an OSS +// contributor knows what to set). 
+func TestEnsureLocalImage_AuthFailure(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusUnauthorized) + })) + defer srv.Close() + + opts := makeTestOpts(t) + opts.RepoPrefix = srv.URL + "/molecule-ai/molecule-ai-workspace-template-" + opts.HTTPClient = srv.Client() + opts.remoteHeadSha = nil + + _, err := ensureLocalImageWithOpts(context.Background(), "claude-code", opts) + if err == nil { + t.Fatalf("expected error, got nil") + } + if !strings.Contains(err.Error(), "MOLECULE_GITEA_TOKEN") { + t.Errorf("error = %v, want one mentioning MOLECULE_GITEA_TOKEN", err) + } +} + +// TestEnsureLocalImage_HeadShaWithRealJSON — exercise the JSON parser +// against a Gitea-shaped response to catch parse drift. +func TestEnsureLocalImage_HeadShaWithRealJSON(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + // Real Gitea response shape (truncated for relevance). + w.Header().Set("Content-Type", "application/json") + _, _ = w.Write([]byte(`{ + "name":"main", + "commit":{ + "id":"3c849b3ba778abcdef0123456789abcdef012345", + "message":"feat: stuff" + } + }`)) + })) + defer srv.Close() + + opts := makeTestOpts(t) + opts.RepoPrefix = srv.URL + "/molecule-ai/molecule-ai-workspace-template-" + opts.HTTPClient = srv.Client() + opts.remoteHeadSha = nil // exercise real HTTP path + + tag, err := ensureLocalImageWithOpts(context.Background(), "claude-code", opts) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if !strings.Contains(tag, "3c849b3ba778") { + t.Errorf("tag = %q, want one containing the parsed sha", tag) + } +} + +// TestEnsureLocalImage_BuildFailure — surfaces docker-build errors with +// the build context so an operator can debug locally. 
+func TestEnsureLocalImage_BuildFailure(t *testing.T) { + opts := makeTestOpts(t) + opts.dockerBuild = func(ctx context.Context, opts *LocalBuildOptions, contextDir, tag string) error { + return errors.New("Dockerfile syntax error") + } + _, err := ensureLocalImageWithOpts(context.Background(), "autogen", opts) + if err == nil { + t.Fatalf("expected error, got nil") + } + if !strings.Contains(err.Error(), "docker build") { + t.Errorf("error = %v, want one mentioning docker build", err) + } +} + +// TestEnsureLocalImage_MissingDockerfile — the cloned tree must contain +// a Dockerfile at root; absence is a malformed-template-repo error. +func TestEnsureLocalImage_MissingDockerfile(t *testing.T) { + opts := makeTestOpts(t) + opts.gitClone = func(ctx context.Context, opts *LocalBuildOptions, runtime, dest string) error { + // Empty dir, no Dockerfile. + return os.MkdirAll(dest, 0o755) + } + _, err := ensureLocalImageWithOpts(context.Background(), "hermes", opts) + if err == nil { + t.Fatalf("expected error, got nil") + } + if !strings.Contains(err.Error(), "no Dockerfile") { + t.Errorf("error = %v, want one mentioning missing Dockerfile", err) + } +} + +// TestEnsureLocalImage_ConcurrentSameRuntime — two goroutines hitting +// the same runtime serialize via the per-runtime lock; the build runs +// once. +func TestEnsureLocalImage_ConcurrentSameRuntime(t *testing.T) { + opts := makeTestOpts(t) + var ( + buildCount int + buildMu sync.Mutex + ) + opts.dockerHasTag = func(ctx context.Context, tag string) (bool, error) { + // First call: cache miss. Second call (after first build): hit. 
+ buildMu.Lock() + defer buildMu.Unlock() + return buildCount > 0, nil + } + opts.dockerBuild = func(ctx context.Context, opts *LocalBuildOptions, contextDir, tag string) error { + buildMu.Lock() + buildCount++ + buildMu.Unlock() + return nil + } + + const N = 5 + var wg sync.WaitGroup + wg.Add(N) + for i := 0; i < N; i++ { + go func() { + defer wg.Done() + _, _ = ensureLocalImageWithOpts(context.Background(), "langgraph", opts) + }() + } + wg.Wait() + if buildCount != 1 { + t.Errorf("buildCount = %d, want 1 (lock should serialize concurrent calls)", buildCount) + } +} + +// TestMaskTokenInURL — Gitea PATs in URLs must NEVER appear in logs. +func TestMaskTokenInURL(t *testing.T) { + cases := []struct { + in string + want string + }{ + {"https://oauth2:secret123@git.example.com/foo/bar", "https://***@git.example.com/foo/bar"}, + {"https://user:tok@host/path", "https://***@host/path"}, + {"https://no-userinfo.example.com/path", "https://no-userinfo.example.com/path"}, + {"not a url", "not a url"}, + {"", ""}, + } + for _, tc := range cases { + t.Run(tc.in, func(t *testing.T) { + got := maskTokenInURL(tc.in) + if got != tc.want { + t.Errorf("maskTokenInURL(%q) = %q, want %q", tc.in, got, tc.want) + } + }) + } +} + +// TestMaskTokenInString — defence against git/docker echoing the token +// into stderr on failure. +func TestMaskTokenInString(t *testing.T) { + got := maskTokenInString("error: clone https://oauth2:abc123@git.test/foo: failed", "abc123") + if strings.Contains(got, "abc123") { + t.Errorf("masked string %q still contains the token", got) + } + if !strings.Contains(got, "***") { + t.Errorf("masked string %q should have *** in place of token", got) + } + // No-op when token is empty. + if got := maskTokenInString("hello world", ""); got != "hello world" { + t.Errorf("empty token must not modify string, got %q", got) + } +} + +// TestGiteaBranchAPIURL — the URL composer must produce the canonical +// /api/v1/repos///branches/ shape. 
+func TestGiteaBranchAPIURL(t *testing.T) { + cases := []struct { + prefix, runtime, branch, want string + }{ + { + "https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-", + "claude-code", + "main", + "https://git.moleculesai.app/api/v1/repos/molecule-ai/molecule-ai-workspace-template-claude-code/branches/main", + }, + { + "http://localhost:3000/myorg/template-", + "foo", + "main", + "http://localhost:3000/api/v1/repos/myorg/template-foo/branches/main", + }, + } + for _, tc := range cases { + t.Run(tc.runtime, func(t *testing.T) { + got, err := giteaBranchAPIURL(tc.prefix, tc.runtime, tc.branch) + if err != nil { + t.Fatalf("err = %v", err) + } + if got != tc.want { + t.Errorf("got %q, want %q", got, tc.want) + } + }) + } +} + +// TestGiteaBranchAPIURL_RejectsMalformed — malformed prefixes (no org +// path) produce an error rather than a malformed API call. +func TestGiteaBranchAPIURL_RejectsMalformed(t *testing.T) { + for _, bad := range []string{ + "https://example.com/", // no path component + "://broken", + } { + t.Run(bad, func(t *testing.T) { + if _, err := giteaBranchAPIURL(bad, "claude-code", "main"); err == nil { + t.Errorf("expected error for malformed prefix %q", bad) + } + }) + } +} + +// TestParseGiteaBranchHeadSha — pin the parser against representative +// Gitea responses so a future Gitea API rev that adds fields doesn't +// silently break detection. 
+func TestParseGiteaBranchHeadSha(t *testing.T) { + good := []byte(`{"name":"main","commit":{"id":"abc123def456","message":"hi"}}`) + got, err := parseGiteaBranchHeadSha(good) + if err != nil { + t.Fatalf("err = %v", err) + } + if got != "abc123def456" { + t.Errorf("got %q, want abc123def456", got) + } + + for _, bad := range [][]byte{ + []byte(`{}`), + []byte(`{"name":"main","commit":{}}`), + []byte(`{"commit":{"id":"`), // truncated + []byte(`404`), + } { + if _, err := parseGiteaBranchHeadSha(bad); err == nil { + t.Errorf("expected error for malformed body %q", string(bad)) + } + } +} + +// TestLocalImageTag_ShortSha — caller-supplied SHA gets truncated to +// 12 chars in the tag so `docker images` output stays readable. +func TestLocalImageTag_ShortSha(t *testing.T) { + got := LocalImageTag("claude-code", "abcdef0123456789abcdef0123456789abcdef01") + want := "molecule-local/workspace-template-claude-code:abcdef012345" + if got != want { + t.Errorf("got %q, want %q", got, want) + } +} + +// TestLocalImageLatestTag — the floating alias used as the human-readable +// :latest entry. +func TestLocalImageLatestTag(t *testing.T) { + got := LocalImageLatestTag("hermes") + want := "molecule-local/workspace-template-hermes:latest" + if got != want { + t.Errorf("got %q, want %q", got, want) + } +} + +// TestRemoteHeadShaProd_IncludesAuthHeader — when a token is configured, +// the API request must carry the `Authorization: token ` header. 
+func TestRemoteHeadShaProd_IncludesAuthHeader(t *testing.T) { + var got string + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + got = r.Header.Get("Authorization") + w.Header().Set("Content-Type", "application/json") + _, _ = w.Write([]byte(`{"commit":{"id":"deadbeef0000aaaa1111bbbb2222cccc33334444"}}`)) + })) + defer srv.Close() + + opts := makeTestOpts(t) + opts.RepoPrefix = srv.URL + "/myorg/template-" + opts.HTTPClient = srv.Client() + opts.Token = "secret-pat-do-not-log" + + if _, err := remoteHeadShaProd(context.Background(), opts, "claude-code"); err != nil { + t.Fatalf("err = %v", err) + } + if got != "token secret-pat-do-not-log" { + t.Errorf("Authorization header = %q, want %q", got, "token secret-pat-do-not-log") + } +} + +// TestCacheKey_Stable — the helper must be deterministic and incorporate +// each input. +func TestCacheKey_Stable(t *testing.T) { + a := CacheKey("claude-code", "abc", "https://git/") + b := CacheKey("claude-code", "abc", "https://git/") + if a != b { + t.Errorf("CacheKey is non-deterministic: %q vs %q", a, b) + } + if a == CacheKey("claude-code", "def", "https://git/") { + t.Errorf("CacheKey ignores sha") + } + if a == CacheKey("hermes", "abc", "https://git/") { + t.Errorf("CacheKey ignores runtime") + } +} + +// TestRedactedRepoURL_NoToken — a repo URL with no embedded credential +// is unmodified. +func TestRedactedRepoURL_NoToken(t *testing.T) { + opts := &LocalBuildOptions{RepoPrefix: "https://git.example.com/org/template-"} + got := redactedRepoURL(opts, "claude-code") + want := "https://git.example.com/org/template-claude-code" + if got != want { + t.Errorf("got %q, want %q", got, want) + } +} + +// TestRepoURL_AppendsRuntime — the prefix + runtime composer is stable. 
+func TestRepoURL_AppendsRuntime(t *testing.T) { + opts := &LocalBuildOptions{RepoPrefix: "https://git.example.com/org/template-"} + got := repoURL(opts, "claude-code") + if got != "https://git.example.com/org/template-claude-code" { + t.Errorf("got %q", got) + } +} + +// TestNewDefaultLocalBuildOptions_RespectsEnvOverrides — the env var +// overrides documented in the runbook actually take effect. +func TestNewDefaultLocalBuildOptions_RespectsEnvOverrides(t *testing.T) { + t.Setenv("MOLECULE_LOCAL_BUILD_CACHE", "/var/tmp/molecule-test") + t.Setenv("MOLECULE_LOCAL_TEMPLATE_REPO_PREFIX", "https://my.fork/org/tpl-") + t.Setenv("MOLECULE_GITEA_TOKEN", "tok-from-env") + + opts := newDefaultLocalBuildOptions() + if opts.CacheDir != "/var/tmp/molecule-test" { + t.Errorf("CacheDir = %q", opts.CacheDir) + } + if opts.RepoPrefix != "https://my.fork/org/tpl-" { + t.Errorf("RepoPrefix = %q", opts.RepoPrefix) + } + if opts.Token != "tok-from-env" { + t.Errorf("Token = %q", opts.Token) + } + if opts.Platform != "linux/amd64" { + t.Errorf("Platform = %q, want linux/amd64", opts.Platform) + } +} + +// TestNewDefaultLocalBuildOptions_DefaultCacheDir — XDG-compliant +// fallback when nothing is overridden. +func TestNewDefaultLocalBuildOptions_DefaultCacheDir(t *testing.T) { + t.Setenv("MOLECULE_LOCAL_BUILD_CACHE", "") + t.Setenv("XDG_CACHE_HOME", "") + t.Setenv("MOLECULE_LOCAL_TEMPLATE_REPO_PREFIX", "") + + opts := newDefaultLocalBuildOptions() + if !strings.Contains(opts.CacheDir, ".cache") && !strings.Contains(opts.CacheDir, "molecule") { + t.Errorf("CacheDir = %q, want one under .cache/molecule", opts.CacheDir) + } + if opts.RepoPrefix != gitTemplateRepoPrefix { + t.Errorf("RepoPrefix = %q, want default %q", opts.RepoPrefix, gitTemplateRepoPrefix) + } +} + +// TestEnsureLocalImage_ShortSha — a remote that returns a too-short +// sha is rejected (defence against a misbehaving Gitea proxy). 
+func TestEnsureLocalImage_ShortSha(t *testing.T) { + opts := makeTestOpts(t) + opts.remoteHeadSha = func(ctx context.Context, opts *LocalBuildOptions, runtime string) (string, error) { + return "abc", nil + } + _, err := ensureLocalImageWithOpts(context.Background(), "claude-code", opts) + if err == nil { + t.Fatalf("expected error for short sha") + } + if !strings.Contains(err.Error(), "short sha") { + t.Errorf("error = %v, want short-sha message", err) + } +} + +// TestEnsureLocalImage_StaleCacheDirCleaned — a partial clone left over +// from a previous failed run must not poison the next attempt. +func TestEnsureLocalImage_StaleCacheDirCleaned(t *testing.T) { + opts := makeTestOpts(t) + // Pre-create a stale dir at the cache target (with a partial Dockerfile). + staleDir := filepath.Join(opts.CacheDir, "claude-code", "abcdef012345") + if err := os.MkdirAll(staleDir, 0o755); err != nil { + t.Fatalf("setup: %v", err) + } + if err := os.WriteFile(filepath.Join(staleDir, "stale-marker"), []byte("delete me"), 0o644); err != nil { + t.Fatalf("setup: %v", err) + } + if _, err := ensureLocalImageWithOpts(context.Background(), "claude-code", opts); err != nil { + t.Fatalf("err = %v", err) + } + if _, err := os.Stat(filepath.Join(staleDir, "stale-marker")); !os.IsNotExist(err) { + t.Errorf("stale-marker should have been wiped before re-clone (err=%v)", err) + } + // Dockerfile from the new clone should be present. + if _, err := os.Stat(filepath.Join(staleDir, "Dockerfile")); err != nil { + t.Errorf("expected Dockerfile from re-clone, got err=%v", err) + } +} + +// TestEnsureLocalImage_ContextCancelled — context cancellation +// propagates to the network/clone seams (best-effort: the test asserts +// that no work happens after Done()). 
+func TestEnsureLocalImage_ContextCancelled(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + cancel() + + opts := makeTestOpts(t) + opts.remoteHeadSha = func(ctx context.Context, opts *LocalBuildOptions, runtime string) (string, error) { + if err := ctx.Err(); err != nil { + return "", err + } + return "deadbeef0000aaaa1111bbbb2222cccc33334444", nil + } + + _, err := ensureLocalImageWithOpts(ctx, "claude-code", opts) + if err == nil { + t.Fatalf("expected error from cancelled context") + } +} + +// TestEnsureLocalImage_RetagAfterCacheHit — a cache-hit must refresh +// the floating :latest alias so admins inspecting `docker images` see +// the current SHA. +func TestEnsureLocalImage_RetagAfterCacheHit(t *testing.T) { + opts := makeTestOpts(t) + var src, dst string + opts.dockerHasTag = func(ctx context.Context, tag string) (bool, error) { return true, nil } + opts.dockerTag = func(ctx context.Context, s, d string) error { + src, dst = s, d + return nil + } + tag, err := ensureLocalImageWithOpts(context.Background(), "claude-code", opts) + if err != nil { + t.Fatalf("err = %v", err) + } + if src != tag { + t.Errorf("retag src = %q, want %q", src, tag) + } + wantDst := "molecule-local/workspace-template-claude-code:latest" + if dst != wantDst { + t.Errorf("retag dst = %q, want %q", dst, wantDst) + } +} + +// TestRemoteHeadShaProd_BodyOverflow — defence against a malicious or +// misbehaving Gitea returning a multi-MB body. +func TestRemoteHeadShaProd_BodyOverflow(t *testing.T) { + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.Header().Set("Content-Type", "application/json") + // Write an unterminated sha that runs past the reader's 64KB cap. + // The reader should stop at the cap and yield a parse error + // rather than buffer an unbounded body. + _, _ = w.Write([]byte(`{"commit":{"id":"`)) + _, _ = w.Write([]byte(strings.Repeat("a", 64<<10))) // 64KB of 'a' + // Connection drops here; we don't write the closing quote.
+ })) + defer srv.Close() + + opts := makeTestOpts(t) + opts.RepoPrefix = srv.URL + "/myorg/template-" + opts.HTTPClient = srv.Client() + + _, err := remoteHeadShaProd(context.Background(), opts, "claude-code") + if err == nil { + t.Fatalf("expected error from over-long sha (no closing quote within cap)") + } +} + +// TestProvisionerStartUsesLocalBuild_LocalMode — pin the provisioner→ +// local-build wiring at the integration boundary. We don't want a future +// refactor to silently bypass EnsureLocalImage when registry is unset. +// +// This test inspects the mode-decision logic without standing up Docker. +func TestProvisionerStartUsesLocalBuild_LocalMode(t *testing.T) { + t.Setenv("MOLECULE_IMAGE_REGISTRY", "") + src := Resolve() + if src.Mode != RegistryModeLocal { + t.Fatalf("Resolve in unset env = %q, want local", src.Mode) + } + // The provisioner Start() branches on this same Resolve() call before + // reaching ContainerCreate. Pinning the boolean here means a refactor + // that flips the sense (e.g. `if src.Mode == RegistryModeSaaS`) is + // caught by this test. +} + +// TestEnsureLocalImageHook_DefaultIsRealFunction — pin that the +// production hook points at EnsureLocalImage. Tests that swap the hook +// must restore it via t.Cleanup; this test catches a leaked override. +func TestEnsureLocalImageHook_DefaultIsRealFunction(t *testing.T) { + // Sanity: hook is set to a non-nil function. We can't compare + // function pointers directly with == in Go (compiler error), so + // we exercise it instead — but we don't want to actually clone + // from the network in the unit test, so use an unknown runtime + // and assert the known-error path runs. 
+ _, err := ensureLocalImageHook(context.Background(), "this-runtime-cannot-exist-194") + if err == nil { + t.Fatalf("expected error from EnsureLocalImage on unknown runtime") + } + if !strings.Contains(err.Error(), "unknown runtime") { + t.Errorf("hook = unexpected function (got error %q, want one mentioning unknown runtime)", err.Error()) + } +} + +// TestProvisionerStartUsesLocalBuild_SaaSMode — and the symmetric guard: +// in SaaS-mode, no local-build path runs. +func TestProvisionerStartUsesLocalBuild_SaaSMode(t *testing.T) { + t.Setenv("MOLECULE_IMAGE_REGISTRY", "registry.example.com/molecule-ai") + src := Resolve() + if src.Mode != RegistryModeSaaS { + t.Fatalf("Resolve with registry set = %q, want saas", src.Mode) + } + if src.Prefix != "registry.example.com/molecule-ai" { + t.Fatalf("Prefix = %q", src.Prefix) + } +} + +// silence unused warning if we ever drop fmt usage +var _ = fmt.Sprintf diff --git a/workspace-server/internal/provisioner/provisioner.go b/workspace-server/internal/provisioner/provisioner.go index ca399199..8bccb3d8 100644 --- a/workspace-server/internal/provisioner/provisioner.go +++ b/workspace-server/internal/provisioner/provisioner.go @@ -320,6 +320,26 @@ func (p *Provisioner) Start(ctx context.Context, cfg WorkspaceConfig) (string, e image := selectImage(cfg) + // Local-build mode (issue #63 / Task #194): when MOLECULE_IMAGE_REGISTRY + // is unset, the OSS contributor path skips the registry pull entirely + // and instead clones the workspace-template- repo from Gitea + // + `docker build`s it locally. Replace the placeholder image ref with + // the SHA-pinned tag of the freshly-built image before ContainerCreate. + // + // Pinned overrides (cfg.Image set, e.g. via runtime_image_pins for + // production thin-AMI launches) bypass this path — they pin a digest + // the operator chose explicitly. 
+ if cfg.Image == "" && cfg.Runtime != "" { + if src := Resolve(); src.Mode == RegistryModeLocal { + builtTag, buildErr := ensureLocalImageHook(ctx, cfg.Runtime) + if buildErr != nil { + return "", fmt.Errorf("local-build mode: ensure image for runtime %q: %w", cfg.Runtime, buildErr) + } + image = builtTag + log.Printf("Provisioner: local-build mode → using locally-built image %s for runtime %s", image, cfg.Runtime) + } + } + containerCfg := &container.Config{ Image: image, Env: env, diff --git a/workspace-server/internal/provisioner/registry_mode.go b/workspace-server/internal/provisioner/registry_mode.go new file mode 100644 index 00000000..be986cb3 --- /dev/null +++ b/workspace-server/internal/provisioner/registry_mode.go @@ -0,0 +1,96 @@ +package provisioner + +import "os" + +// localImagePrefix is the synthetic image-name prefix used for images +// that the local-build path produces. It is intentionally NOT a real +// hostname: with no `.` or `:`, Docker parses it as a namespace rather +// than a registry host, so it is never resolved via DNS, and the +// workspace-image-refresh / image-watch paths short-circuit on it. +// +// Tag scheme: `molecule-local/workspace-template-:` where +// `` is either the 12-char Gitea HEAD sha for SHA-pinned references +// or the moving `:latest` for human inspection (the provisioner +// consumes the SHA-pinned form via EnsureLocalImage()). +// +// Issue #63 / Task #194. +const localImagePrefix = "molecule-local" + +// RegistryMode classifies how the provisioner sources workspace-template +// container images. The two modes are mutually exclusive and selected +// by presence/absence of the MOLECULE_IMAGE_REGISTRY env var (Q2 design +// lock, 2026-05-07): set ⇒ SaaS-mode pull; unset ⇒ local-build mode. +// +// Discriminated value rather than a bare string return so every call +// site that decides on image source has to acknowledge the two modes — +// a bare string returning `""` on local-mode would silently produce +// malformed image refs (e.g.
`/workspace-template-foo:latest`). +type RegistryMode string + +const ( + // RegistryModeSaaS — pull workspace-template-* images from a real + // container registry whose URL is in `MOLECULE_IMAGE_REGISTRY`. + // Used by every prod tenant (env injected via Railway / EC2 + // user-data) and any self-hosted operator who has mirrored the + // images to their own GHCR/ECR/Harbor. + RegistryModeSaaS RegistryMode = "saas" + + // RegistryModeLocal — clone the workspace-template- repo + // from Gitea + // (`https://git.moleculesai.app/molecule-ai/molecule-ai-workspace-template-`) + // and `docker build` the image locally. Used by OSS contributors + // who run `go run ./workspace-server/cmd/server` without setting + // MOLECULE_IMAGE_REGISTRY. Closes the post-2026-05-06 GHCR-403 gap + // (Task #194 / Issue #63). + RegistryModeLocal RegistryMode = "local" +) + +// RegistrySource is the SSOT for image-resolution decisions. Returned +// by Resolve(); read by: +// - the provisioner Start() path — branches on Mode for clone+build +// vs pull +// - admin_workspace_images.go — skips remote pull in local mode +// - imagewatch.Watcher — short-circuits in local mode (no GHCR poll) +// +// SaaS-mode .Prefix matches the existing RegistryPrefix() return value; +// local-mode .Prefix is the synthetic `molecule-local`. +type RegistrySource struct { + Mode RegistryMode + Prefix string +} + +// Resolve inspects the runtime environment and returns the image-source +// classification. Treats both unset AND empty-string MOLECULE_IMAGE_REGISTRY +// as "local mode" — an operator who set the var to "" via a misconfigured +// deploy would otherwise silently get malformed image refs in SaaS-mode; +// instead they get the local-build path, which fails loudly if the host +// has no Docker daemon (better blast radius). +// +// Mirrors the existing RegistryPrefix() empty-string handling, so the two +// functions agree on every input. 
+func Resolve() RegistrySource { + if v := os.Getenv("MOLECULE_IMAGE_REGISTRY"); v != "" { + return RegistrySource{Mode: RegistryModeSaaS, Prefix: v} + } + return RegistrySource{Mode: RegistryModeLocal, Prefix: localImagePrefix} +} + +// IsKnownRuntime reports whether the given runtime name is in the +// canonical knownRuntimes list. Exposed so the local-build path can +// refuse to clone arbitrary repo paths supplied via cfg.Runtime — +// defence-in-depth against a future code path that might let an +// attacker influence the runtime string before it reaches the build +// code. +func IsKnownRuntime(runtime string) bool { + for _, r := range knownRuntimes { + if r == runtime { + return true + } + } + return false +} + +// LocalImagePrefix returns the synthetic registry hostname used by the +// local-build path. Exposed so handlers that need to branch on "is +// this a local-built image?" don't have to duplicate the constant. +func LocalImagePrefix() string { return localImagePrefix } diff --git a/workspace-server/internal/provisioner/registry_mode_test.go b/workspace-server/internal/provisioner/registry_mode_test.go new file mode 100644 index 00000000..dc67b461 --- /dev/null +++ b/workspace-server/internal/provisioner/registry_mode_test.go @@ -0,0 +1,152 @@ +package provisioner + +import ( + "strings" + "testing" +) + +// Tests for the new mode-detection surface. The legacy RegistryPrefix() +// shim is covered by registry_test.go; these tests pin the explicit +// two-mode discriminated return from Resolve(). + +// TestResolve_LocalModeWhenRegistryUnset — the OSS-contributor default. +// Issue #63: with MOLECULE_IMAGE_REGISTRY unset, the provisioner must +// switch to the local-build path instead of trying to pull from a GHCR +// org that's been suspended. 
+func TestResolve_LocalModeWhenRegistryUnset(t *testing.T) { + t.Setenv("MOLECULE_IMAGE_REGISTRY", "") + got := Resolve() + if got.Mode != RegistryModeLocal { + t.Errorf("Mode = %q, want %q (unset registry → local-build)", got.Mode, RegistryModeLocal) + } + if got.Prefix != localImagePrefix { + t.Errorf("Prefix = %q, want %q", got.Prefix, localImagePrefix) + } +} + +// TestResolve_SaaSModeWhenRegistrySet — production tenants set the var +// to their ECR mirror; we must keep producing pull-style image refs. +func TestResolve_SaaSModeWhenRegistrySet(t *testing.T) { + const ecr = "123456789012.dkr.ecr.us-east-2.amazonaws.com/molecule-ai" + t.Setenv("MOLECULE_IMAGE_REGISTRY", ecr) + got := Resolve() + if got.Mode != RegistryModeSaaS { + t.Errorf("Mode = %q, want %q (set registry → saas)", got.Mode, RegistryModeSaaS) + } + if got.Prefix != ecr { + t.Errorf("Prefix = %q, want %q", got.Prefix, ecr) + } +} + +// TestResolve_EmptyEnvIsLocalMode — operator who set the var to "" via +// a misconfigured deploy must NOT silently produce malformed image refs; +// they get the local path which fails loudly if Docker is missing. +// This contract is the safer-blast-radius half of Issue #63. +func TestResolve_EmptyEnvIsLocalMode(t *testing.T) { + t.Setenv("MOLECULE_IMAGE_REGISTRY", "") + if Resolve().Mode != RegistryModeLocal { + t.Fatalf("empty MOLECULE_IMAGE_REGISTRY should be local-mode, got %q", Resolve().Mode) + } +} + +// TestResolve_GarbageURLStillSaaSMode — a registry value that's +// syntactically malformed (e.g. `not-a-url`, `foo bar`) is still treated +// as SaaS-mode. The whole design of MOLECULE_IMAGE_REGISTRY is +// "operator-supplied trusted value"; validating the URL here would be +// pretending we can prevent operator error. The downstream docker-pull +// will fail loudly with a registry-shaped error message, which is the +// right blast radius.
+func TestResolve_GarbageURLStillSaaSMode(t *testing.T) {
+	for _, garbage := range []string{
+		"not-a-url",
+		"http://",
+		"ghcr.io/",
+		" ",
+		"\thello\n",
+	} {
+		t.Run(garbage, func(t *testing.T) {
+			t.Setenv("MOLECULE_IMAGE_REGISTRY", garbage)
+			if got := Resolve().Mode; got != RegistryModeSaaS {
+				t.Errorf("Mode = %q, want saas (any non-empty value is SaaS-mode by design)", got)
+			}
+		})
+	}
+}
+
+// TestRegistryPrefix_AlignedWithResolve: the back-compat shim must agree
+// with Resolve().Prefix for every set value and diverge only when unset.
+func TestRegistryPrefix_AlignedWithResolve(t *testing.T) {
+	cases := []struct {
+		name string
+		env  string
+	}{
+		{"unset", ""},
+		{"ecr", "999999999999.dkr.ecr.us-east-2.amazonaws.com/molecule-ai"},
+		{"harbor", "harbor.example.com/molecule"},
+	}
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			t.Setenv("MOLECULE_IMAGE_REGISTRY", tc.env)
+			gotPrefix := RegistryPrefix()
+			gotResolve := Resolve().Prefix
+			// Note: with the new design, RegistryPrefix() unset returns
+			// the SaaS GHCR default (legacy back-compat) while
+			// Resolve().Prefix returns the local-mode "molecule-local"
+			// hostname. They DIVERGE on the unset path by design; that
+			// divergence is what closes the GHCR-403 hole. Pin both so a
+			// future refactor can't accidentally re-couple them.
+			if tc.env == "" {
+				if gotPrefix != defaultRegistryPrefix {
+					t.Errorf("RegistryPrefix() = %q, want %q (legacy shim)", gotPrefix, defaultRegistryPrefix)
+				}
+				if gotResolve != localImagePrefix {
+					t.Errorf("Resolve().Prefix = %q, want %q (local-build hostname)", gotResolve, localImagePrefix)
+				}
+			} else {
+				if gotPrefix != tc.env {
+					t.Errorf("RegistryPrefix() = %q, want %q", gotPrefix, tc.env)
+				}
+				if gotResolve != tc.env {
+					t.Errorf("Resolve().Prefix = %q, want %q", gotResolve, tc.env)
+				}
+			}
+		})
+	}
+}
+
+// TestIsKnownRuntime: defence-in-depth guard for the local-build path.
+// Must accept every entry in knownRuntimes and reject anything else.
+func TestIsKnownRuntime(t *testing.T) {
+	for _, rt := range knownRuntimes {
+		if !IsKnownRuntime(rt) {
+			t.Errorf("IsKnownRuntime(%q) = false, want true", rt)
+		}
+	}
+	for _, bad := range []string{
+		"", "unknown", "WORKSPACE-TEMPLATE-FAKE", "../../../etc/passwd",
+		"langgraph;rm -rf /", "claude-code\n", " langgraph",
+	} {
+		if IsKnownRuntime(bad) {
+			t.Errorf("IsKnownRuntime(%q) = true, want false (untrusted input)", bad)
+		}
+	}
+}
+
+// TestLocalImagePrefix_Stable: the synthetic prefix is part of the
+// public surface; admin handlers and image-watch use it to short-circuit
+// network calls. Pin the constant.
+func TestLocalImagePrefix_Stable(t *testing.T) {
+	if got := LocalImagePrefix(); got != "molecule-local" {
+		t.Errorf("LocalImagePrefix() = %q, want %q", got, "molecule-local")
+	}
+}
+
+// TestLocalImagePrefix_NoDots: the synthetic prefix must not contain a
+// `.`, because Docker's image-ref parser treats a first path component
+// containing `.` as a registry hostname and would contact it on a pull.
+// Without one, `molecule-local` is parsed as a Docker Hub namespace, and
+// the local-build path never pulls, so the ref only names the local image.
+func TestLocalImagePrefix_NoDots(t *testing.T) {
+	if strings.Contains(LocalImagePrefix(), ".") {
+		t.Errorf("LocalImagePrefix() = %q contains '.'; Docker would treat it as a registry hostname", LocalImagePrefix())
+	}
+}
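
Review note: the no-dot rule pinned by `TestLocalImagePrefix_NoDots` follows from how Docker splits an image reference into a registry host and a path. The sketch below is a simplified paraphrase of that heuristic, not code from this PR or from Docker itself; `splitDomain` and the `workspace-template-demo` repo name are hypothetical, and the real parser's extra normalization (such as adding `library/` to single-component Docker Hub names) is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultDomain is where Docker sends refs that carry no explicit registry host.
const defaultDomain = "docker.io"

// splitDomain is a simplified, illustrative version of the heuristic: the
// text before the first "/" counts as a registry host only if it contains
// "." or ":", equals "localhost", or has an uppercase letter. Otherwise the
// whole name is a path under the default domain.
func splitDomain(name string) (domain, remainder string) {
	i := strings.IndexRune(name, '/')
	if i == -1 {
		return defaultDomain, name
	}
	first := name[:i]
	looksLikeHost := strings.ContainsAny(first, ".:") ||
		first == "localhost" ||
		strings.ToLower(first) != first
	if !looksLikeHost {
		return defaultDomain, name
	}
	return first, name[i+1:]
}

func main() {
	// Hypothetical refs; only the "molecule-local" prefix comes from the PR.
	for _, ref := range []string{
		"molecule-local/workspace-template-demo",
		"molecule.local/workspace-template-demo",
		"localhost:5000/workspace-template-demo",
	} {
		domain, rest := splitDomain(ref)
		fmt.Printf("%-40s domain=%s path=%s\n", ref, domain, rest)
	}
}
```

Under this rule a port (`:`) would also turn the prefix into a hostname, so a stricter variant of the test could use `strings.ContainsAny(LocalImagePrefix(), ".:")`; whether that is worth pinning is a judgment call for the authors.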