From 12bb73d0005a7037670c8dc9bc25d30b2352818f Mon Sep 17 00:00:00 2001
From: cp-lead
Date: Sat, 9 May 2026 19:40:52 -0700
Subject: [PATCH] fix(dockerfile-tenant): chown /org-templates to canvas user so !external resolver can mkdir cache
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Root cause:
Dockerfile.tenant chowns /canvas /platform /memory-plugin /migrations
to canvas:canvas (line ~119) but not /org-templates. The image is
built as root, COPY-ed templates inherit root:root 0755. The platform
binary then runs as the canvas user (uid 1000) because of the USER
directive on line ~124, so when the !external resolver
(org_external.go, internal#77 / task #222) tries
os.MkdirAll("/org-templates//.external-cache/") on first import,
mkdir(2) returns EACCES and the import handler returns 400
"org template expansion failed" (org.go:592).

The user-facing error is generic; only the server log carries:

    Org import: refusing import: !include expansion failed: !external at line 156: fetch git.moleculesai.app/molecule-ai/molecule-dev-department@v1.0.0: mkdir cache root: mkdir /org-templates/molecule-dev/.external-cache: permission denied

Repro:
Tenant staging-cplead-2 (canary AWS 004947743811, image SHA
a93c4ce17725...). POST /org/import {"dir":"molecule-dev"} returns 400
while POST /org/import {"dir":"free-beats-all"} returns 201; only
templates with !external trip the bug.

Fix:
Add /org-templates to the chown -R argv. One-line change. Same
ownership shape as the other writable platform-state dirs.

Why this is safe for prod:
* The platform binary already needs read access to /org-templates, so
  canvas:canvas owning it doesn't widen any attack surface.
* /org-templates is image-resident, not bind-mounted; chown applies
  inside the image layers and prod tenants get the fix on next image
  rebuild + redeploy. Live prod tenants are unaffected until the next
  deploy (no orgs currently using !external in prod; molecule-dev
  consumers are all internal staging).

Verification:
After hand-applying the chown live (docker exec --user 0 ... chown -R
canvas:canvas /org-templates/molecule-dev), POST /org/import
{"dir":"molecule-dev"} returns 201 with 39 workspaces; cp-lead + CP-BE
+ CP-QA + CP-Security all reach status=online within ~2 min.

Refs:
internal#77: !external RFC (Phase 3a)
task #222: resolver PR (introduced the unflagged-permission
  dependency this fixes)
Live incident 2026-05-10: staging-cplead-2 import failed,
  chown-on-host workaround in place pending image rebuild
---
 workspace-server/Dockerfile.tenant | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/workspace-server/Dockerfile.tenant b/workspace-server/Dockerfile.tenant
index c7e039e0..1a560ed6 100644
--- a/workspace-server/Dockerfile.tenant
+++ b/workspace-server/Dockerfile.tenant
@@ -115,8 +115,16 @@ COPY --from=canvas-builder /canvas/.next/static ./.next/static
 COPY --from=canvas-builder /canvas/public ./public
 COPY workspace-server/entrypoint-tenant.sh /entrypoint.sh
 
+# /org-templates must be writable by the canvas user — the !external
+# resolver mkdirs /.external-cache/// on first
+# import to cache cross-repo subtree fetches (org_external.go,
+# internal#77 / task #222). Without this chown the resolver fails with
+# "mkdir cache root: permission denied" and POST /org/import returns
+# 400 "org template expansion failed" for any template that uses
+# !external (e.g. molecule-dev → dev-lead). Caught on staging-cplead-2
+# 2026-05-10 — see internal incident debrief.
 RUN chmod +x /entrypoint.sh && \
-    chown -R canvas:canvas /canvas /platform /memory-plugin /migrations
+    chown -R canvas:canvas /canvas /platform /memory-plugin /migrations /org-templates
 
 EXPOSE 8080
 # entrypoint.sh starts as root to fix volume perms, then drops to