molecule-core

Hongming Wang d012a803e4 feat(terminal): add diagnose endpoint for SSH probe stages

GET /workspaces/:id/terminal/diagnose runs the same per-stage pipeline as /terminal (ssh-keygen → EIC send-key → tunnel → ssh) but non-interactively and returns JSON. Each stage reports {name, ok, duration_ms, error, detail}, plus a top-level first_failure naming the broken stage.

Why: when the canvas terminal silently disconnects ("Session ended" with no error frame, the user-reported failure mode on hongmingwang's hermes workspace), there is no remote-readable signal of WHICH stage failed. The ssh client's stderr lives only in the workspace-server's stdout on the tenant CP EC2, invisible without shell access. /terminal can't expose stderr cleanly because it has already upgraded to WebSocket binary frames by the time ssh runs.

/terminal/diagnose stays pure HTTP/JSON, so the same auth (WorkspaceAuth + ADMIN_TOKEN fallback) gives operators a one-call probe that splits "IAM broke" (send-ssh-public-key fails) from "tunnel/SG broke" (wait-for-port fails) from "sshd auth broke" (ssh-probe gets Permission denied) from "shell broke" (probe exits non-zero with stderr).

Stages mirrored from handleRemoteConnect in terminal.go:

  1. ssh-keygen            ephemeral session keypair
  2. send-ssh-public-key   AWS EIC API push, IAM-gated
  3. pick-free-port        local port for the tunnel
  4. open-tunnel           aws ec2-instance-connect open-tunnel start
  5. wait-for-port         the tunnel actually listens (folds tunnel stderr into Detail when it doesn't)
  6. ssh-probe             non-interactive `ssh ... 'echo MARKER'` that confirms auth + bash + the marker round-trip (CombinedOutput captures stderr verbatim; this is the whole reason the endpoint exists)

Local Docker workspaces (no instance_id) get a smaller probe: container-found + container-running. Same response shape so callers don't need to branch.

Tests stub sendSSHPublicKey / openTunnelCmd / sshProbeCmd via the existing package-level vars (same pattern as TestSSHCommandCmd_*) so the test suite stays hermetic: no AWS, no network. The three new tests pin:

  (a) routing to remote on instance_id present,
  (b) routing to local on empty instance_id,
  (c) the operationally critical case: full success through wait-for-port, then a probe failure surfaces ssh stderr in the ssh-probe step's Error/Detail with first_failure="ssh-probe".

Auth: rides on existing WorkspaceAuth middleware. Operators with the tenant ADMIN_TOKEN (fetched via /cp/admin/orgs/:slug/admin-token) can probe any workspace without per-workspace token; same admin path as the canvas dashboard reads workspace activity.

Response always returns HTTP 200 (success or step failure are both in the JSON body) so callers don't need to branch on status code; the endpoint either reports a first_failure or doesn't.

Resolves task #200, supports task #193 (workspace EC2 sshd unresponsive; without this endpoint we couldn't pin the failure stage from outside the tenant CP EC2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 21:10:20 -07:00
cmd/server	fix(boot): always start health-sweep goroutine — SaaS tenants need it for external-runtime liveness	2026-04-30 12:05:40 -07:00
internal	feat(terminal): add diagnose endpoint for SSH probe stages	2026-04-30 21:10:20 -07:00
migrations	fix(workspaces): add missing 'awaiting_agent' + 'hibernating' to workspace_status enum	2026-04-30 08:52:05 -07:00
pkg/provisionhook	feat(#1957): wire gh-identity plugin into workspace-server	2026-04-24 15:01:41 +00:00
.ci-force	chore: force Platform(Go) CI run on main — validate go vet clean	2026-04-21 15:43:19 +00:00
.gitignore	feat(ws-server): pull env from CP on startup	2026-04-19 02:41:15 -07:00
.golangci.yaml	chore(workspace-server): add golangci.yaml disabling errcheck	2026-04-24 07:16:54 +00:00
Dockerfile	feat(deploy): verify each tenant /buildinfo matches published SHA after redeploy	2026-04-30 10:55:08 -07:00
Dockerfile.tenant	feat(deploy): verify each tenant /buildinfo matches published SHA after redeploy	2026-04-30 10:55:08 -07:00
entrypoint-tenant.sh	fix(security): add USER directive before ENTRYPOINT in all tenant images (#1155)	2026-04-20 23:51:33 +00:00
go.mod	chore(deps): batch dep bumps — 11 safe upgrades from 2026-04-28 dependabot wave	2026-04-28 16:25:46 -07:00
go.sum	chore(deps): batch dep bumps — 11 safe upgrades from 2026-04-28 dependabot wave	2026-04-28 16:25:46 -07:00