Closes the last CP-provisioned-workspace gap: Terminal tab now works
for workspaces running on separate EC2 instances. Follow-up to
#1531 which added instance_id persistence.
How it works:
- HandleConnect checks workspaces.instance_id
- Empty → existing local Docker path (unchanged)
- Set → spawn `aws ec2-instance-connect ssh --connection-type eice
--instance-id X --os-user ec2-user -- docker exec -it ws-Y
/bin/bash` under creack/pty, bridge pty ↔ canvas WebSocket
Why subprocess AWS CLI instead of native AWS SDK:
- EIC Endpoint tunnel needs a signed WebSocket with specific framing
- aws-cli v2 implements it correctly; reimplementing in Go is ~500
lines of crypto + WS protocol work for zero user-visible benefit
- Tenant image picks up 1MB of aws-cli + openssh-client via apk
Handler design:
- sshCommandFactory is a var so tests can stub it (no real aws calls)
- Context cancellation propagates both ways (WS close → kill ssh;
ssh exit → close WS)
- User-visible error points at docs/infra/workspace-terminal.md when
EIC wiring is incomplete (common bootstrap failure)
Tests:
- TestHandleConnect_RoutesToRemote — instance_id in DB → CP branch
- TestHandleConnect_RoutesToLocal — empty instance_id → local branch
- TestSshCommandFactory_BuildsEICCommand — argv shape regression guard
Dockerfile.tenant: + openssh-client + aws-cli (Alpine main repo)
Refs: #1528, #1531
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
node:20-alpine ships with a `node` user at uid/gid 1000. The Dockerfile
tried `addgroup -g 1000 canvas` which fails with exit 1 because 1000
is already taken. Publish-workspace-server-image workflow has been
red for hours — tenant image :latest stuck on a digest that predates
the X-Molecule-Admin-Token CPProvisioner fix. Staging workspace
provisioning 401'd because the stale tenant binary never sent the
admin header.
Delete node user+group first (tolerant of future base-image changes
that might not ship it), then create canvas at 1000/1000 as before.
Mounted volumes continue to expect uid 1000.
Repro: publish-workspace-server-image workflow run 24731870797:
"process addgroup -g 1000 canvas && adduser... exit code: 1".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes: #177 (CRITICAL — Dockerfile runs as root)
Dockerfiles changed:
- workspace-server/Dockerfile (platform-only): addgroup/adduser + USER platform
- workspace-server/Dockerfile.tenant (combined Go+Canvas): addgroup/adduser + USER canvas
+ chown canvas:canvas on canvas dir so non-root node process can read it
- canvas/Dockerfile (canvas standalone): addgroup/adduser + USER canvas
- workspace-server/entrypoint-tenant.sh: update header comment (no longer starts
as root; both processes now start non-root)
The entrypoint no longer needs a root→non-root handoff since both the Go
platform and Canvas node run as non-root by default. The 'canvas' user owns
/app and /platform, so volume mounts owned by the host's canvas user work
without needing a root init step.
Co-authored-by: Molecule AI CP-BE <cp-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>