molecule-core/workspace-server
cp-lead 12bb73d000 fix(dockerfile-tenant): chown /org-templates to canvas user so !external resolver can mkdir cache
Root cause:
  Dockerfile.tenant chowns /canvas /platform /memory-plugin /migrations
  to canvas:canvas (line ~119) but not /org-templates. The image is
  built as root, COPY-ed templates inherit root:root 0755. The platform
  binary then runs as the canvas user (uid 1000) because of the USER
  directive on line ~124, so when the !external resolver
  (org_external.go, internal#77 / task #222) tries
  os.MkdirAll("/org-templates/<tmpl>/.external-cache/<repo>") on first
  import, mkdir(2) returns EACCES and the import handler returns 400
  "org template expansion failed" (org.go:592). The user-facing error
  is generic; only the server log carries:

    Org import: refusing import: !include expansion failed:
    !external at line 156: fetch git.moleculesai.app/molecule-ai/molecule-dev-department@v1.0.0:
    mkdir cache root: mkdir /org-templates/molecule-dev/.external-cache: permission denied

Repro:
  Tenant staging-cplead-2 (canary AWS 004947743811, image SHA
  a93c4ce17725...). POST /org/import {"dir":"molecule-dev"} returns 400
  while POST /org/import {"dir":"free-beats-all"} returns 201 — only
  templates with !external trip the bug.

Fix:
  Add /org-templates to the chown -R argv. One-line change. Same
  ownership shape as the other writable platform-state dirs.

Why this is safe for prod:
  * The platform binary already needs read access to /org-templates,
    so canvas:canvas owning it doesn't widen any attack surface.
  * /org-templates is image-resident, not bind-mounted; chown applies
    inside the image layers and prod tenants get the fix on next
    image rebuild + redeploy. Live prod tenants are unaffected until
    the next deploy (no orgs currently using !external in prod —
    molecule-dev consumers are all internal staging).

Verification:
  After hand-applying the chown live (docker exec --user 0 ... chown -R
  canvas:canvas /org-templates/molecule-dev), POST /org/import
  {"dir":"molecule-dev"} returns 201 with 39 workspaces; cp-lead +
  CP-BE + CP-QA + CP-Security all reach status=online within ~2 min.

Refs:
  internal#77 — !external RFC (Phase 3a)
  task #222 — resolver PR (introduced the unflagged-permission
              dependency this fixes)
  Live incident 2026-05-10 — staging-cplead-2 import failed,
              chown-on-host workaround in place pending image rebuild
2026-05-09 19:40:52 -07:00
..
cmd docs(runbook): add admin-auth.md covering test-token route lockdown 2026-05-10 02:20:30 +00:00
internal docs: cycle report 2026-05-10 2026-05-10 01:15:07 +00:00
migrations feat(plugins): plugin drift detector + queue + admin apply endpoint (#123) 2026-05-10 00:39:50 +00:00
pkg/provisionhook
.air.toml feat(local-dev): air-based hot-reload for workspace-server 2026-05-08 08:10:50 -07:00
.ci-force
.gitignore feat(local-dev): containerize platform + canvas stack via docker-compose (closes #126) 2026-05-08 10:53:39 -07:00
.golangci.yaml
Dockerfile ci(docker): pin base image digests in all Dockerfiles 2026-05-09 23:56:39 +00:00
Dockerfile.dev ci(docker): pin base image digests in all Dockerfiles 2026-05-09 23:56:39 +00:00
Dockerfile.tenant fix(dockerfile-tenant): chown /org-templates to canvas user so !external resolver can mkdir cache 2026-05-09 19:40:52 -07:00
entrypoint-tenant.sh
go.mod fix(deps): migrate gh-identity from GitHub to Gitea module path 2026-05-09 22:50:45 +00:00
go.sum chore: drop github-app-auth + swap GHCR→ECR (closes #157, #161) 2026-05-07 07:48:51 -07:00