molecule-core/workspace-terminal.md at 46a8d24b2d90371fb5acd797d18b68a4cd2f3cdc

Hongming Wang 46a8d24b2d feat(workspace): persist CP-returned EC2 instance_id on provision

Foundation for the EIC-based terminal handler (#1528). The tenant's
workspace-server needs to map workspace_id → EC2 instance_id to open
an SSH session, but CPProvisioner.Start returned the instance id only
for logging; it was never persisted. This PR adds the column and
writes it at provision time.

Scope kept intentionally small: no terminal code yet. The follow-up
PR will consume this column from the terminal handler.

What's here:
- migrations/038_workspace_instance_id: nullable TEXT column on
  workspaces, plus a partial index on non-null values for fast lookup
- workspace_provision.go: UPDATE after CPProvisioner.Start; a failed
  UPDATE is logged but does not fail provisioning (the row simply lacks
  instance_id, and the terminal falls back to the existing not-reachable
  error)
- docs/infra/workspace-terminal.md: full design for the terminal
  flow, covering EIC vs SSM comparison, IAM policy JSON, SG rules, key
  lifetime, failure modes, and rollout checklist

Refs: #1528
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

| Axis | EIC + SSH | SSM Session Manager |
|---|---|---|
| Uses existing `molecule-cp` creds | Yes | No; needs an instance profile |
| AMI changes | None (EIC ships in the OS on Amazon Linux 2 since 2019 and Ubuntu 20.04+) | Verify the SSM Agent is present |
| Infra changes | IAM policy + security group rule | IAM role + instance profile, possibly NAT or VPC endpoints |
| Audit | CloudTrail for `SendSSHPublicKey` | CloudTrail + SSM session logs (richer) |
| Key rotation | Every session (60 s key lifetime) | Managed by AWS |
| Compliance story | "SSH with per-session keys, CloudTrailed" | "SSM Session Manager with recording available" |

| Condition | User-visible message | Actionable? |
|---|---|---|
| `instance_id IS NULL` (local workspace) | Falls through to the current local-Docker handler | n/a (existing behavior) |
| `instance_id` set, but `DescribeInstances` returns nothing | "workspace instance no longer exists — recreate the workspace" | Yes |
| `SendSSHPublicKey` returns 403 | "tenant lacks EIC permission — contact your admin" | Yes (requires IAM fix) |
| SSH connect timeout | "tenant cannot reach workspace instance — check security group" | Yes (security group fix) |
| `docker exec` fails (no container) | "workspace container is not running — try restart" | Yes (normal ops) |

8.6 KiB

Workspace Terminal over EIC + SSH

- Problem
- Chosen approach: EC2 Instance Connect + SSH
- Why not SSM Session Manager
- Comparison
- Data flow
- IAM policy addition for `molecule-cp`
- Security group rule
- Key lifetime
- Failure modes + their user-visible messages
- Rollout checklist
  1. Infra prep (CP side)
  2. Tenant code (this repo)
  3. Verification
- Future work (not in scope)

IAM policy addition for `molecule-cp`