From f39b595a9c5d32c5751134012b51c44baa58b72c Mon Sep 17 00:00:00 2001 From: Hongming Wang Date: Tue, 5 May 2026 20:18:05 -0700 Subject: [PATCH] fix(workspace files API): EIC parity for ListFiles + DeleteFile (closes #2999 PR-A) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## User-visible bug Canvas Files tab returns "0 files / No config files yet" for every SaaS workspace, every root (/configs, /home, /workspace, /plugins). Reported by user (canvas screenshot, hongming.moleculesai.app, Hongming Personal Brand Agent — claude-code, T4, online). ## Root cause `ListFiles` (templates.go) was missing the SSH-via-EIC branch that ReadFile (PR #2785) and WriteFile (PR #1702) already have. On SaaS, dockerCli is nil → findContainer returns "" → falls through to host-side resolveTemplateDir which only matches baked-in template names. For a user-named workspace it matches nothing, so the handler silently returns []fileEntry{}. DeleteFile had the same gap — right-click delete (introduced in PR-C of this issue) would silently no-op once #1 was fixed. ## Fix 1. Extracted shared EIC plumbing into `withEICTunnel` (closure-based, single SSOT for keypair → key push → tunnel → port-wait → cleanup). Refactored writeFileViaEIC + readFileViaEIC to use it. Added listFilesViaEIC + deleteFileViaEIC on the same scaffold. The `LogLevel=ERROR` shim from PR #2822 now lives in one `eicSSHSession.sshArgs()` helper instead of being duplicated per helper — the next time we need to tweak ssh options, one place. 2. Factored remote shell strings into pure functions (buildInstallShell / buildCatShell / buildRmShell / buildFindShell + parseFindOutput) so the wire shape can be pinned without booting a real EIC tunnel. 3. Refactored `resolveWorkspaceFilePath(runtime, root, relPath)` to honor `?root=`. New rule: `/configs` (or empty / unrecognized) → runtime managed-config dir via workspaceFilePathPrefix (preserves the v1 ReadFile/WriteFile behaviour where canvas's Config tab GETs/PUTs config.yaml without specifying a root and lands in the right per-runtime dir); `/home`, `/workspace`, `/plugins` → literal absolute path on the EC2 host. List/Read/Write/Delete now agree on what file a tree row points to — pre-fix List would say "/home contents" but Read/Write would route to /configs. 4. ListFiles + DeleteFile dispatch on instance_id != "" → EIC helper. Errors from the EIC path produce 500 (not silent fall-through to local-Docker, which would mask the failure as "0 files" — the exact user-visible symptom). 5. Added ?root= validation gate to WriteFile + DeleteFile so an out-of-allowlist root is rejected before the resolver runs. ## Test coverage - TestResolveWorkspaceFilePath_RuntimeIndirection — pins the /configs → runtime prefix translation per-runtime (hermes, claude-code, langgraph, external, unknown). Catches the regression where a future edit accidentally drops the runtime indirection. - TestResolveWorkspaceFilePath_LiteralRoots — pins /home, /workspace, /plugins as literal pass-through regardless of runtime. Catches the symmetric regression where the literal roots start getting rewritten to the runtime prefix (which would mean the FilesTab "/home" selector silently routes to /configs on hermes). - TestResolveWorkspaceRootPath — directory-only translation used by listFilesViaEIC, same indirection rules. - TestSSHArgs_HardenedFlags — pins the centralised ssh option set (LogLevel=ERROR + hardening). Catches drift in the one-place-where-ssh-flags-live. - TestEicSSHSessionSingleSourceForSSHFlags — behaviour-based AST gate (per memory). Counts s.sshArgs() callers (must be ≥4 — list/read/write/delete) and asserts LogLevel=ERROR appears exactly once in the source. Fires if anyone copy-pastes a raw ssh args slice instead of going through the helper. - TestBuildInstallShell / TestBuildCatShell / TestBuildRmShell / TestBuildFindShell — pure-function tests pinning the remote command shape. Catches regression like "rm -f silently becomes rm -rf" or "find loses node_modules pruning" without needing a real EC2. - TestBuildFindShell_DepthForwarding — catches a regression where the helper hard-codes a depth instead of using the caller's value. - TestParseFindOutput / TestParseFindOutput_EmptyInput — pin the TYPE|SIZE|REL parser. Empty-input case explicitly returns [] not nil so the JSON wire shape stays a list. - TestListFiles_EICDispatch_Success / Error — sqlmock-driven handler test. Verifies instance_id != "" routes to listFilesViaEIC and surfaces errors as 500 (does NOT silently fall through to local-Docker, which is the exact regression-mode of the original bug). - TestListFiles_EICBranch_NotTakenForSelfHosted — back-compat guard: instance_id == "" must NOT enter the EIC branch (would break self-hosted operators). - TestDeleteFile_EICDispatch_Success / Error — same shape for DeleteFile. - TestListFiles_RootValidation / TestDeleteFile_RootValidation — ?root=/etc must 400 before any DB query or EIC call. ## Verification - `go build ./...` clean - `go test ./...` clean (full workspace-server suite) - Will be live-verified against staging on hongming.moleculesai.app after merge: open Files tab → expect populated /home + /configs + /workspace listings (not "0 files"); right-click delete on /configs/old.yaml → expect file removed on the EC2 host. ## Three weakest spots (hostile self-review) 1. The LogLevel=ERROR drift gate counts source occurrences. A future refactor that intentionally moves the literal somewhere else (e.g. into a constant) would trigger a false positive. The gate's failure message points to the load-bearing constraint (must appear in sshArgs); operator can adjust. 2. `eicFileWriteTimeout` constant kept as an alias for back-compat with prior tests. Documented as intentional + safe to remove on the next pass. 3. The resolver tests pin the runtime → prefix map values (`/home/ubuntu/.hermes`, `/configs`, etc.). A future runtime addition that ships a new prefix needs the test updated. This is intentional — silent prefix changes orphan saved files, so a test failure on map edit IS the right signal. ## Follow-up (RFC #2312 subtask 2) Long-term the right fix is to drop EIC entirely and HTTP-forward to the workspace's own URL (RFC #2312). That's a substantially larger refactor across 5 surfaces (chat upload, files, templates, plugins, terminal) and out of scope for this bug-fix PR. Tracked separately under that RFC. Refs #2999. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../internal/handlers/template_files_eic.go | 611 ++++++++++++------ .../template_files_eic_dispatch_test.go | 303 +++++++++ .../template_files_eic_shells_test.go | 200 ++++++ .../handlers/template_files_eic_test.go | 193 ++++-- .../internal/handlers/template_import.go | 7 +- .../internal/handlers/templates.go | 81 ++- .../internal/handlers/templates_test.go | 16 +- 7 files changed, 1144 insertions(+), 267 deletions(-) create mode 100644 workspace-server/internal/handlers/template_files_eic_dispatch_test.go create mode 100644 workspace-server/internal/handlers/template_files_eic_shells_test.go diff --git a/workspace-server/internal/handlers/template_files_eic.go b/workspace-server/internal/handlers/template_files_eic.go index edce34fc..b6dba98a 100644 --- a/workspace-server/internal/handlers/template_files_eic.go +++ b/workspace-server/internal/handlers/template_files_eic.go @@ -1,23 +1,20 @@ package handlers -// template_files_eic.go — SSH-backed file write for SaaS workspaces -// (EC2-per-workspace). Pairs with the existing Docker-path in templates.go -// (WriteFile) and template_import.go (ReplaceFiles). +// template_files_eic.go — SSH-backed file operations for SaaS workspaces +// (EC2-per-workspace). Pairs with the local-Docker path in templates.go +// (List/Read/Write/Delete) and template_import.go (ReplaceFiles). // -// Flow for a single file write: -// 1. Generate ephemeral ed25519 keypair (on-disk for ≤ write duration). -// 2. Push the public key via `aws ec2-instance-connect send-ssh-public-key` -// so the target sshd accepts it for the next 60s. -// 3. Open a TLS-tunnelled TCP port via `aws ec2-instance-connect open-tunnel` -// from a local free port → workspace's sshd on 22. -// 4. Pipe content to `ssh ... "install -D -m 0644 /dev/stdin "`. -// `install -D` creates any missing parent dirs atomically. File is owned -// by whichever $OSUser we authenticated as (ubuntu by default). -// 5. Close tunnel + wipe keydir. +// Architecture note: every operation goes through `withEICTunnel`, which +// owns the ephemeral-keypair → key-push → tunnel → port-wait dance. Per- +// op helpers (list/read/write/delete) only carry the remote command + +// stdin/stdout shape. This keeps the EIC connection logic in one place +// so a fix to the dance — e.g. PR #2822's `LogLevel=ERROR` shim — only +// touches one helper. // -// All the AWS calls + ssh tunnel exec go through the same package-level -// func vars defined in terminal.go (openTunnelCmd, sendSSHPublicKey) so -// tests can stub them the same way the terminal tests do. +// Path translation rules: see resolveWorkspaceFilePath. `/configs` +// is the per-runtime managed-config indirection (claude-code → /configs, +// hermes → /home/ubuntu/.hermes); other allow-listed roots (`/home`, +// `/workspace`, `/plugins`) pass through literally. import ( "bytes" @@ -32,8 +29,7 @@ import ( ) // workspaceFilePathPrefix maps a runtime name to the absolute base path on -// the workspace EC2 where the Files API's relative paths land. New runtimes -// can be added here without touching handler code. +// the workspace EC2 where the runtime's managed-config dir lives. // // Keep these stable — changing the base path for an existing runtime // without a migration shim will make previously-saved files disappear from @@ -60,41 +56,104 @@ var workspaceFilePathPrefix = map[string]string{ // those runtimes actually have on disk. } -func resolveWorkspaceFilePath(runtime, relPath string) (string, error) { +// resolveWorkspaceFilePath translates (runtime, root, relPath) into an +// absolute path on the workspace EC2. +// +// `root="/configs"` (or empty / unrecognized) is treated as the +// runtime's MANAGED-config dir via workspaceFilePathPrefix — +// /home/ubuntu/.hermes for hermes, /configs for claude-code, etc. +// This preserves the v1 ReadFile/WriteFile behavior where the canvas's +// Config tab GETs/PUTs "config.yaml" without specifying a root and +// lands in the runtime's own config dir, even though that dir's +// absolute path differs per runtime. +// +// Any other allow-listed root (`/home`, `/workspace`, `/plugins`) is +// treated as a LITERAL absolute path on the EC2 host. Those roots are +// universal Linux paths that don't need per-runtime indirection. +// +// Restricting the literal pass-through to allowedRoots is the +// security boundary — the handler also gates this same set, so the +// resolver is defence-in-depth: even if a future caller forgets the +// handler-side check, the resolver won't translate `?root=/etc` into +// a real absolute path. +// +// relPath is sanitised by validateRelPath (no absolute, no `..`). +func resolveWorkspaceFilePath(runtime, root, relPath string) (string, error) { if err := validateRelPath(relPath); err != nil { return "", err } - base, ok := workspaceFilePathPrefix[strings.ToLower(strings.TrimSpace(runtime))] - if !ok { - base = "/configs" - } + base := resolveWorkspaceRootPath(runtime, root) return filepath.Join(base, filepath.Clean(relPath)), nil } -// eicFileWriteTimeout bounds the whole dance. Key push is <500ms, tunnel -// is 1-2s, ssh + write is <2s. 30s gives headroom for slow pulls without -// hanging the Files API forever under EIC misconfiguration. -const eicFileWriteTimeout = 30 * time.Second - -// writeFileViaEIC writes a single file to the workspace EC2 at the -// absolute path that resolveWorkspaceFilePath computed. On success, -// optionally invokes the runtime's reload hook (not implemented yet — -// tracked as follow-up; for today the canvas issues a separate Restart -// after Save). +// resolveWorkspaceRootPath returns the absolute base directory on the +// workspace EC2 for a given (runtime, root) pair, without touching a +// relative file path. Used by listFilesViaEIC to compute the directory +// to walk; resolveWorkspaceFilePath joins this with relPath. // -// instanceID: AWS EC2 instance id from workspaces.instance_id. -// runtime: used only for path-prefix resolution. -// relPath: the relative path the caller validated (no /, no ..). -// content: file body bytes. -func writeFileViaEIC(ctx context.Context, instanceID, runtime, relPath string, content []byte) error { +// Centralising the runtime-vs-literal indirection here means +// list/read/write/delete agree on what `?root=/configs` means for +// hermes vs claude-code vs an unknown runtime — otherwise list could +// show one directory while read/write target another. +func resolveWorkspaceRootPath(runtime, root string) string { + root = strings.TrimSpace(root) + // "/configs" + empty + unrecognized → runtime's managed-config dir. + // The runtime prefix map is the SSOT for that translation. + if root == "" || root == "/configs" || !allowedRoots[root] { + base, ok := workspaceFilePathPrefix[strings.ToLower(strings.TrimSpace(runtime))] + if !ok { + base = "/configs" + } + return base + } + // Literal universal path (`/home`, `/workspace`, `/plugins`). + return root +} + +// eicFileOpTimeout bounds the whole tunnel + ssh dance. Key push is +// <500ms, tunnel is 1-2s, ssh + remote command is <2s for read/write. +// 30s gives headroom for slow EIC pulls + the larger `find` walk that +// listFilesViaEIC issues, without hanging the Files API forever under +// EIC misconfiguration. +const eicFileOpTimeout = 30 * time.Second + +// eicFileOpTimeout was historically named eicFileWriteTimeout when the +// only EIC op was writeFile. Keep an alias so any external test that +// pinned the old name still compiles; rename can land as a follow-up +// once we've gone a release without the alias being touched. +// +//nolint:revive // intentional alias for back-compat with prior tests. +const eicFileWriteTimeout = eicFileOpTimeout + +// eicSSHSession describes an open EIC tunnel ready for an ssh subprocess. +// Only valid inside the closure passed to withEICTunnel — the underlying +// keypair + tunnel are torn down when the closure returns. +type eicSSHSession struct { + keyPath string + localPort int + osUser string + instanceID string +} + +// withEICTunnel sets up an EIC SSH session (ephemeral keypair → push +// → AWS open-tunnel → wait-for-port), invokes fn with a session handle, +// and tears everything down on return. The caller is responsible for +// applying the per-op context.WithTimeout before calling — this helper +// only owns the EIC dance, not the operation budget, so a caller that +// needs a different timeout (e.g. a large bulk import) doesn't have to +// fight a hard-coded one. +// +// All AWS calls go through the package-level func vars in terminal.go +// (sendSSHPublicKey, openTunnelCmd) so tests can stub them the same way +// terminal_test.go does. The whole helper is also assigned to a +// `var` (`withEICTunnel`) so handler-dispatch tests can stub the entire +// dance instead of having to wire up a fake tunnel + fake ssh server. +var withEICTunnel = realWithEICTunnel + +func realWithEICTunnel(ctx context.Context, instanceID string, fn func(s eicSSHSession) error) error { if instanceID == "" { return fmt.Errorf("workspace has no instance_id — not a SaaS EC2 workspace") } - absPath, err := resolveWorkspaceFilePath(runtime, relPath) - if err != nil { - return fmt.Errorf("invalid path: %w", err) - } - osUser := os.Getenv("WORKSPACE_EC2_OS_USER") if osUser == "" { osUser = "ubuntu" @@ -104,11 +163,7 @@ func writeFileViaEIC(ctx context.Context, instanceID, runtime, relPath string, c region = "us-east-2" } - ctx, cancel := context.WithTimeout(ctx, eicFileWriteTimeout) - defer cancel() - - // Ephemeral keypair. - keyDir, err := os.MkdirTemp("", "molecule-filewrite-*") + keyDir, err := os.MkdirTemp("", "molecule-eic-*") if err != nil { return fmt.Errorf("keydir mkdir: %w", err) } @@ -116,7 +171,7 @@ func writeFileViaEIC(ctx context.Context, instanceID, runtime, relPath string, c keyPath := keyDir + "/id" if out, kerr := exec.CommandContext(ctx, "ssh-keygen", "-t", "ed25519", "-f", keyPath, "-N", "", "-q", - "-C", "molecule-filewrite", + "-C", "molecule-eic", ).CombinedOutput(); kerr != nil { return fmt.Errorf("ssh-keygen: %w (%s)", kerr, strings.TrimSpace(string(out))) } @@ -125,24 +180,21 @@ func writeFileViaEIC(ctx context.Context, instanceID, runtime, relPath string, c return fmt.Errorf("read pubkey: %w", err) } - // 1. Push key. if err := sendSSHPublicKey(ctx, region, instanceID, osUser, strings.TrimSpace(string(pubKey))); err != nil { return fmt.Errorf("send-ssh-public-key: %w", err) } - // 2. Open tunnel on an OS-picked free port. localPort, err := pickFreePort() if err != nil { return fmt.Errorf("pick free port: %w", err) } - opts := eicSSHOptions{ + tunnel := openTunnelCmd(eicSSHOptions{ InstanceID: instanceID, OSUser: osUser, Region: region, LocalPort: localPort, PrivateKeyPath: keyPath, - } - tunnel := openTunnelCmd(opts) + }) tunnel.Env = os.Environ() if err := tunnel.Start(); err != nil { return fmt.Errorf("open-tunnel start: %w", err) @@ -157,183 +209,330 @@ func writeFileViaEIC(ctx context.Context, instanceID, runtime, relPath string, c return fmt.Errorf("tunnel never listened: %w", err) } - // 3. SSH + install -D. `install` creates any missing parent dirs and - // writes the file atomically via temp-file-rename. Permissions 0644 - // match the existing tar-unpack defaults on the Docker path. - // - // `sudo -n` (non-interactive) prefix: the canonical containerized - // workspace layout puts /configs at the root, owned by root because - // cloud-init runs as root (see - // molecule-controlplane/internal/provisioner/userdata_containerized.go). - // SSH-as-ubuntu can't write into /configs without escalation. - // Ubuntu has passwordless sudo on EC2 by default; sudo -n fails fast - // (no prompt) if that ever changes, surfacing a clean error instead - // of a hang. The hermes path /home/ubuntu/.hermes is ubuntu-owned - // and doesn't strictly need sudo, but using it uniformly avoids - // per-runtime branching here. - // - // The remote command is fully deterministic — no user-controlled - // input reaches a shell eval (absPath is built from a map + Clean()). - sshArgs := []string{ - "-i", keyPath, + return fn(eicSSHSession{ + keyPath: keyPath, + localPort: localPort, + osUser: osUser, + instanceID: instanceID, + }) +} + +// sshArgs returns the standard ssh CLI args for an EIC session pointed +// at the local tunnel port + a single remote command string. +// +// `LogLevel=ERROR` silences the benign "Warning: Permanently added +// '[127.0.0.1]:NNNNN' to known hosts" notice that ssh emits on every +// fresh tunnel connection. Without this, the notice lands on stderr +// and fools the read/list "empty stdout + empty stderr → not found" +// classifiers into thinking the warning is a real ssh-layer error → 500 +// instead of 404 (Hermes config.yaml load, hongming tenant, 2026-05-05 +// 02:38; PR #2822). Real auth/tunnel errors stay visible because they're +// emitted at ERROR level. +// +// Originally each helper assembled its own ssh args inline, so PR #2822's +// LogLevel=ERROR fix had to be applied to every copy. Centralising here +// means future ssh-option tweaks only land in one place. +func (s eicSSHSession) sshArgs(remoteCommand string) []string { + return []string{ + "-i", s.keyPath, "-o", "StrictHostKeyChecking=no", "-o", "UserKnownHostsFile=/dev/null", - // LogLevel=ERROR silences the benign "Warning: Permanently - // added '[127.0.0.1]:NNNNN' to known hosts" notice that ssh - // emits on every fresh tunnel connection. Without this, the - // notice lands on stderr and fools readFileViaEIC's "empty - // stdout + empty stderr → file not found" classifier into - // thinking the warning is a real ssh-layer error → 500 - // instead of 404 (Hermes config.yaml load, hongming tenant, - // 2026-05-05 02:38). Real auth/tunnel errors stay visible - // because they're emitted at ERROR level. "-o", "LogLevel=ERROR", "-o", "ServerAliveInterval=15", - "-p", fmt.Sprintf("%d", localPort), - fmt.Sprintf("%s@127.0.0.1", osUser), - fmt.Sprintf("sudo -n install -D -m 0644 /dev/stdin %s", shellQuote(absPath)), + "-p", fmt.Sprintf("%d", s.localPort), + fmt.Sprintf("%s@127.0.0.1", s.osUser), + remoteCommand, } - sshCmd := exec.CommandContext(ctx, "ssh", sshArgs...) - sshCmd.Env = os.Environ() - sshCmd.Stdin = bytes.NewReader(content) - var stderr bytes.Buffer - sshCmd.Stderr = &stderr - if err := sshCmd.Run(); err != nil { - return fmt.Errorf("ssh install: %w (%s)", err, strings.TrimSpace(stderr.String())) +} + +// buildInstallShell returns the remote command for atomically writing +// `/dev/stdin` to absPath with mode 0644 via `sudo -n install -D`. +// `install -D` creates any missing parent dirs and writes via +// temp-file-rename (atomic). Pure function for direct testability — +// the only variable input (absPath) is shellQuote-wrapped to defeat +// any shell metachar in a future caller's path. +func buildInstallShell(absPath string) string { + return fmt.Sprintf("sudo -n install -D -m 0644 /dev/stdin %s", shellQuote(absPath)) +} + +// buildCatShell returns the remote command for reading absPath and +// swallowing missing-file stderr (so the empty-stdout + non-zero-exit +// case is unambiguous → os.ErrNotExist at the caller). +func buildCatShell(absPath string) string { + return fmt.Sprintf("sudo -n cat %s 2>/dev/null", shellQuote(absPath)) +} + +// buildRmShell returns the remote command for `sudo -n rm -f` against +// absPath. `-f` (not `-rf`) is intentional — directory removal needs +// its own explicit endpoint if/when the canvas grows that affordance, +// and `-rf` would let a misclassified directory entry trigger a +// recursive delete. +func buildRmShell(absPath string) string { + return fmt.Sprintf("sudo -n rm -f %s", shellQuote(absPath)) +} + +// buildFindShell returns the remote command for enumerating files +// under listPath up to maxDepth, emitting `TYPE|SIZE|REL_PATH` lines +// (matches the local-Docker container path's parser exactly). +// +// `2>/dev/null` swallows find's "No such file" error so a missing +// listing root surfaces as empty stdout (handler returns []) rather +// than 500. +// +// `stat -c %s` is GNU coreutils; `stat -f %z` is BSD. Try GNU first, +// fall back to BSD, then 0 — same shape the local-Docker `sh -c` +// version uses so a future cross-runtime fleet (Alpine vs Ubuntu) +// doesn't regress. +// +// Hidden / cache dir pruning matches the container path: .git, +// __pycache__, node_modules, .DS_Store. Without these the tree drowns +// in transient artefacts on a /workspace listing. +func buildFindShell(listPath string, maxDepth int) string { + return fmt.Sprintf( + `sudo -n find %s -maxdepth %d -not -path '*/.git/*' -not -path '*/__pycache__/*' -not -path '*/node_modules/*' -not -name .DS_Store 2>/dev/null | while IFS= read -r f; do `+ + `rel="${f#%s/}"; [ "$rel" = %s ] && continue; [ -z "$rel" ] && continue; `+ + `if [ -d "$f" ]; then echo "d|0|$rel"; else `+ + `s=$(stat -c %%s "$f" 2>/dev/null || stat -f %%z "$f" 2>/dev/null || echo 0); echo "f|$s|$rel"; `+ + `fi; done`, + shellQuote(listPath), maxDepth, shellQuote(listPath), shellQuote(listPath), + ) +} + +// parseFindOutput parses TYPE|SIZE|REL_PATH lines emitted by +// buildFindShell into eicFileEntry rows. Whitespace-only lines and +// malformed rows are silently skipped — the same behaviour as the +// local-Docker container parser for symmetric output. +func parseFindOutput(raw []byte) []eicFileEntry { + files := make([]eicFileEntry, 0) + for _, line := range strings.Split(string(raw), "\n") { + parts := strings.SplitN(line, "|", 3) + if len(parts) != 3 || parts[2] == "" { + continue + } + var size int64 + fmt.Sscanf(parts[1], "%d", &size) + files = append(files, eicFileEntry{ + Path: parts[2], + Size: size, + Dir: parts[0] == "d", + }) } - log.Printf("writeFileViaEIC: ws instance=%s runtime=%s wrote %d bytes → %s", - instanceID, runtime, len(content), absPath) - return nil + return files } // shellQuote wraps a value in single quotes + escapes embedded single -// quotes for POSIX sh. Used for the sole piece of variable data in the -// remote ssh command. (absPath is already built from a map + Clean() so -// traversal is blocked regardless; this is defence-in-depth against -// future refactor that might accept user paths here.) +// quotes for POSIX sh. Used for the variable parts of remote ssh +// commands (absolute paths). The paths are already built from a +// validated allowlist + Clean(), so traversal is blocked regardless; +// this is defence-in-depth against a future refactor that might accept +// user paths directly here. func shellQuote(s string) string { return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'" } +// writeFileViaEIC writes a single file to the workspace EC2 at the +// absolute path that resolveWorkspaceFilePath computed. On success, +// optionally invokes the runtime's reload hook (not implemented yet — +// tracked as follow-up; for today the canvas issues a separate Restart +// after Save). +// +// `install -D` creates any missing parent dirs and writes atomically +// via temp-file-rename. Permissions 0644 match the existing tar-unpack +// defaults on the Docker path. +// +// `sudo -n` (non-interactive) prefix: the canonical containerized +// workspace layout puts /configs at the root, owned by root because +// cloud-init runs as root (see +// molecule-controlplane/internal/provisioner/userdata_containerized.go). +// SSH-as-ubuntu can't write into /configs without escalation. Ubuntu +// has passwordless sudo on EC2 by default; sudo -n fails fast (no +// prompt) if that ever changes, surfacing a clean error instead of a +// hang. The hermes path /home/ubuntu/.hermes is ubuntu-owned and +// doesn't strictly need sudo, but using it uniformly avoids per-runtime +// branching here. +func writeFileViaEIC(ctx context.Context, instanceID, runtime, root, relPath string, content []byte) error { + absPath, err := resolveWorkspaceFilePath(runtime, root, relPath) + if err != nil { + return fmt.Errorf("invalid path: %w", err) + } + ctx, cancel := context.WithTimeout(ctx, eicFileOpTimeout) + defer cancel() + + return withEICTunnel(ctx, instanceID, func(s eicSSHSession) error { + sshCmd := exec.CommandContext(ctx, "ssh", s.sshArgs(buildInstallShell(absPath))...) + sshCmd.Env = os.Environ() + sshCmd.Stdin = bytes.NewReader(content) + var stderr bytes.Buffer + sshCmd.Stderr = &stderr + if err := sshCmd.Run(); err != nil { + return fmt.Errorf("ssh install: %w (%s)", err, strings.TrimSpace(stderr.String())) + } + log.Printf("writeFileViaEIC: ws instance=%s runtime=%s root=%s wrote %d bytes → %s", + instanceID, runtime, root, len(content), absPath) + return nil + }) +} + // readFileViaEIC reads a single file from the workspace EC2 at the // absolute path that resolveWorkspaceFilePath computes. Mirrors -// writeFileViaEIC end-to-end (ephemeral keypair, EIC tunnel, ssh) so -// canvas's Config tab can GET back what it just PUT. Pre-fix the GET -// path (templates.go ReadFile) only handled local Docker containers -// + a host-side template fallback; SaaS workspaces (EC2-per-workspace) -// always 404'd because neither handles their on-EC2 layout. +// writeFileViaEIC (ephemeral keypair, EIC tunnel, ssh) so the canvas's +// Config tab can GET back what it just PUT. // // Returns ("", os.ErrNotExist) when the remote path doesn't exist so // the handler can map it to HTTP 404 cleanly. Other errors propagate. -func readFileViaEIC(ctx context.Context, instanceID, runtime, relPath string) ([]byte, error) { - if instanceID == "" { - return nil, fmt.Errorf("workspace has no instance_id — not a SaaS EC2 workspace") - } - absPath, err := resolveWorkspaceFilePath(runtime, relPath) +// +// `sudo -n cat`: /configs is root-owned (same reason writeFileViaEIC +// needs sudo). The path is built from a validated map + Clean(), so no +// user-controlled string reaches the shell here. `2>/dev/null` swallows +// `cat: ...: No such file` so the missing-file case returns empty +// stdout + non-zero exit, which we translate to os.ErrNotExist. +func readFileViaEIC(ctx context.Context, instanceID, runtime, root, relPath string) ([]byte, error) { + absPath, err := resolveWorkspaceFilePath(runtime, root, relPath) if err != nil { return nil, fmt.Errorf("invalid path: %w", err) } - - osUser := os.Getenv("WORKSPACE_EC2_OS_USER") - if osUser == "" { - osUser = "ubuntu" - } - region := os.Getenv("AWS_REGION") - if region == "" { - region = "us-east-2" - } - - ctx, cancel := context.WithTimeout(ctx, eicFileWriteTimeout) + ctx, cancel := context.WithTimeout(ctx, eicFileOpTimeout) defer cancel() - keyDir, err := os.MkdirTemp("", "molecule-fileread-*") - if err != nil { - return nil, fmt.Errorf("keydir mkdir: %w", err) - } - defer func() { _ = os.RemoveAll(keyDir) }() - keyPath := keyDir + "/id" - if out, kerr := exec.CommandContext(ctx, "ssh-keygen", - "-t", "ed25519", "-f", keyPath, "-N", "", "-q", - "-C", "molecule-fileread", - ).CombinedOutput(); kerr != nil { - return nil, fmt.Errorf("ssh-keygen: %w (%s)", kerr, strings.TrimSpace(string(out))) - } - pubKey, err := os.ReadFile(keyPath + ".pub") - if err != nil { - return nil, fmt.Errorf("read pubkey: %w", err) - } - - if err := sendSSHPublicKey(ctx, region, instanceID, osUser, strings.TrimSpace(string(pubKey))); err != nil { - return nil, fmt.Errorf("send-ssh-public-key: %w", err) - } - - localPort, err := pickFreePort() - if err != nil { - return nil, fmt.Errorf("pick free port: %w", err) - } - tunnel := openTunnelCmd(eicSSHOptions{ - InstanceID: instanceID, - OSUser: osUser, - Region: region, - LocalPort: localPort, - PrivateKeyPath: keyPath, + var out []byte + runErr := withEICTunnel(ctx, instanceID, func(s eicSSHSession) error { + sshCmd := exec.CommandContext(ctx, "ssh", s.sshArgs(buildCatShell(absPath))...) + sshCmd.Env = os.Environ() + var stdout, stderr bytes.Buffer + sshCmd.Stdout = &stdout + sshCmd.Stderr = &stderr + err := sshCmd.Run() + out = stdout.Bytes() + if err != nil { + // `cat` returns 1 on missing file; with 2>/dev/null we have no + // stderr distinguisher. Treat empty-stdout + empty-stderr + + // non-zero exit as not-found rather than a tunnel/auth error + // (those usually produce stderr from ssh itself, not from the + // remote command). + if len(out) == 0 && stderr.Len() == 0 { + return os.ErrNotExist + } + return fmt.Errorf("ssh cat: %w (%s)", err, strings.TrimSpace(stderr.String())) + } + log.Printf("readFileViaEIC: ws instance=%s runtime=%s root=%s read %d bytes ← %s", + instanceID, runtime, root, len(out), absPath) + return nil }) - tunnel.Env = os.Environ() - if err := tunnel.Start(); err != nil { - return nil, fmt.Errorf("open-tunnel start: %w", err) - } - defer func() { - if tunnel.Process != nil { - _ = tunnel.Process.Kill() - } - _ = tunnel.Wait() - }() - if err := waitForPort(ctx, "127.0.0.1", localPort, 10*time.Second); err != nil { - return nil, fmt.Errorf("tunnel never listened: %w", err) - } - - // `sudo -n cat`: /configs is root-owned by cloud-init (same reason - // writeFileViaEIC needs sudo to install). The path is built from a - // validated map + Clean(), so no user-controlled string reaches the - // shell here. `2>/dev/null` swallows `cat: ...: No such file` so - // the missing-file case returns empty stdout + non-zero exit, which - // we translate to os.ErrNotExist below. - sshCmd := exec.CommandContext(ctx, "ssh", - "-i", keyPath, - "-o", "StrictHostKeyChecking=no", - "-o", "UserKnownHostsFile=/dev/null", - // LogLevel=ERROR silences the benign "Warning: Permanently - // added '[127.0.0.1]:NNNNN' to known hosts" notice that ssh - // emits on every fresh tunnel connection. Without this, the - // notice lands on stderr and fools readFileViaEIC's "empty - // stdout + empty stderr → file not found" classifier into - // thinking the warning is a real ssh-layer error → 500 - // instead of 404 (Hermes config.yaml load, hongming tenant, - // 2026-05-05 02:38). Real auth/tunnel errors stay visible - // because they're emitted at ERROR level. - "-o", "LogLevel=ERROR", - "-o", "ServerAliveInterval=15", - "-p", fmt.Sprintf("%d", localPort), - fmt.Sprintf("%s@127.0.0.1", osUser), - fmt.Sprintf("sudo -n cat %s 2>/dev/null", shellQuote(absPath)), - ) - sshCmd.Env = os.Environ() - var stdout, stderr bytes.Buffer - sshCmd.Stdout = &stdout - sshCmd.Stderr = &stderr - runErr := sshCmd.Run() - out := stdout.Bytes() if runErr != nil { - // `cat` returns 1 on missing file; with 2>/dev/null we have no - // stderr distinguisher. Treat empty-stdout + non-zero exit as - // not-found rather than a tunnel/auth error (those usually - // produce stderr from ssh itself, not from the remote command). - if len(out) == 0 && stderr.Len() == 0 { - return nil, os.ErrNotExist - } - return nil, fmt.Errorf("ssh cat: %w (%s)", runErr, strings.TrimSpace(stderr.String())) + return nil, runErr } - log.Printf("readFileViaEIC: ws instance=%s runtime=%s read %d bytes ← %s", - instanceID, runtime, len(out), absPath) return out, nil } + +// eicFileEntry is the wire shape returned by listFilesViaEIC. It +// matches the inline `fileEntry` in templates.go::ListFiles so the +// handler can emit either path's output without a translation layer. +type eicFileEntry struct { + Path string `json:"path"` + Size int64 `json:"size"` + Dir bool `json:"dir"` +} + +// listFilesViaEIC enumerates files under / on the workspace +// EC2 host, up to the given depth, returning entries with paths +// relative to the listing root (matching the local-Docker path's +// output). Closes the symmetry gap that left ListFiles silently +// returning [] for SaaS workspaces — see issue #2999. +// +// Output line format: TYPE|SIZE|REL_PATH (matches the container's find +// shell so the parser is identical). `find -maxdepth N` traverses up +// to N levels; the canvas requests depth=1 by default and re-fetches +// when the user expands a directory. +// +// Pruning: same hidden / cache dirs as the container path (.git, +// __pycache__, node_modules, .DS_Store) so the canvas's tree doesn't +// drown in transient artefacts. +// +// `sudo -n` matches the read/write paths — even though the universal +// roots (/home, /workspace, /plugins) are typically ubuntu-owned and +// don't need it, /configs and runtime-prefix dirs do (root-owned by +// cloud-init), and using sudo uniformly avoids per-root branching. +func listFilesViaEIC(ctx context.Context, instanceID, runtime, root, sub string, depth int) ([]eicFileEntry, error) { + if sub != "" { + if err := validateRelPath(sub); err != nil { + return nil, fmt.Errorf("invalid sub: %w", err) + } + } + if depth < 1 { + depth = 1 + } + if depth > 5 { + depth = 5 + } + listPath := resolveWorkspaceRootPath(runtime, root) + if sub != "" { + listPath = filepath.Join(listPath, filepath.Clean(sub)) + } + + ctx, cancel := context.WithTimeout(ctx, eicFileOpTimeout) + defer cancel() + + var rawOutput []byte + runErr := withEICTunnel(ctx, instanceID, func(s eicSSHSession) error { + sshCmd := exec.CommandContext(ctx, "ssh", s.sshArgs(buildFindShell(listPath, depth))...) + sshCmd.Env = os.Environ() + var stdout, stderr bytes.Buffer + sshCmd.Stdout = &stdout + sshCmd.Stderr = &stderr + if err := sshCmd.Run(); err != nil { + // Empty stdout + empty stderr after we swallowed find's + // own error stream means the listing root genuinely + // doesn't exist on this workspace — return an empty + // slice rather than a 500. Real ssh/tunnel errors emit + // to stderr at LogLevel=ERROR. + if stdout.Len() == 0 && stderr.Len() == 0 { + rawOutput = nil + return nil + } + return fmt.Errorf("ssh find: %w (%s)", err, strings.TrimSpace(stderr.String())) + } + rawOutput = stdout.Bytes() + return nil + }) + if runErr != nil { + return nil, runErr + } + + files := parseFindOutput(rawOutput) + log.Printf("listFilesViaEIC: ws instance=%s runtime=%s root=%s sub=%s depth=%d → %d entries from %s", + instanceID, runtime, root, sub, depth, len(files), listPath) + return files, nil +} + +// deleteFileViaEIC removes a single file from the workspace EC2. +// Returns nil for both "deleted" and "didn't exist" — `rm -f` doesn't +// distinguish, and the canvas's delete-then-refresh flow doesn't need +// it to. +// +// Symmetry note: pre-fix DeleteFile (templates.go:514) had no EIC +// branch, so right-click delete on a SaaS workspace would fall through +// to the local-Docker path, find no container (dockerCli is nil on +// SaaS), and try the ephemeral-volume path which itself only handles +// local Docker volumes. Net effect: silent no-op. Closing this gap is +// part of issue #2999. +func deleteFileViaEIC(ctx context.Context, instanceID, runtime, root, relPath string) error { + absPath, err := resolveWorkspaceFilePath(runtime, root, relPath) + if err != nil { + return fmt.Errorf("invalid path: %w", err) + } + ctx, cancel := context.WithTimeout(ctx, eicFileOpTimeout) + defer cancel() + + return withEICTunnel(ctx, instanceID, func(s eicSSHSession) error { + sshCmd := exec.CommandContext(ctx, "ssh", s.sshArgs(buildRmShell(absPath))...) + sshCmd.Env = os.Environ() + var stderr bytes.Buffer + sshCmd.Stderr = &stderr + if err := sshCmd.Run(); err != nil { + return fmt.Errorf("ssh rm: %w (%s)", err, strings.TrimSpace(stderr.String())) + } + log.Printf("deleteFileViaEIC: ws instance=%s runtime=%s root=%s removed %s", + instanceID, runtime, root, absPath) + return nil + }) +} diff --git a/workspace-server/internal/handlers/template_files_eic_dispatch_test.go b/workspace-server/internal/handlers/template_files_eic_dispatch_test.go new file mode 100644 index 00000000..8b8d5892 --- /dev/null +++ b/workspace-server/internal/handlers/template_files_eic_dispatch_test.go @@ -0,0 +1,303 @@ +package handlers + +// template_files_eic_dispatch_test.go — handler-level tests for the +// EIC dispatch added in PR-A of issue #2999. Pre-PR-A, ListFiles and +// DeleteFile silently fell through to the local-Docker path on SaaS +// workspaces (where dockerCli is nil) and returned [] / silent no-op. +// These tests pin the new behavior: +// +// 1. instance_id != "" → handler invokes the EIC helper +// 2. EIC success → 200 with the helper's payload +// 3. EIC error → 500 (does NOT fall through to local-Docker / +// template-dir, which would mask the real failure) +// 4. instance_id == "" → existing local-Docker / template-dir +// fallback (back-compat with self-hosted operators) +// +// Stubs `withEICTunnel` so the entire EIC dance (keypair, AWS calls, +// tunnel, ssh) is replaced with a fake closure that yields a captured +// session — lets the test capture what the inner closure would have +// done without spinning up a real sshd. The test for the actual +// remote shell shapes lives in template_files_eic_shells_test.go +// (pure-function tests on buildFindShell / buildInstallShell etc). + +import ( + "context" + "encoding/json" + "errors" + "net/http" + "net/http/httptest" + "strings" + "testing" + + "github.com/DATA-DOG/go-sqlmock" + "github.com/gin-gonic/gin" +) + +// stubWithEICTunnel replaces the package-level withEICTunnel with a +// closure that records its inputs and runs fn against a fake session, +// returning fnErr from the inner fn if non-nil. Restores the original +// on test cleanup. +func stubWithEICTunnel(t *testing.T, fnErr error) (calls *[]string) { + t.Helper() + captured := []string{} + calls = &captured + prev := withEICTunnel + withEICTunnel = func(ctx context.Context, instanceID string, fn func(s eicSSHSession) error) error { + captured = append(captured, instanceID) + // Hand the closure a sentinel session so any code that pulls + // session fields gets deterministic non-empty values. The + // closure's exec.Command call will fail at runtime because no + // real ssh exists for instanceID="i-test"; but most + // dispatch-tests inject fnErr directly to skip that. + return fnErr + } + t.Cleanup(func() { withEICTunnel = prev }) + return calls +} + +// stubWithEICTunnelReturning is like stubWithEICTunnel but lets the +// test substitute the inner fn entirely so it can populate `out` / +// return shaped errors without invoking the real ssh closure. +func stubWithEICTunnelReturning(t *testing.T, replacement func(s eicSSHSession) error) (calls *[]string) { + t.Helper() + captured := []string{} + calls = &captured + prev := withEICTunnel + withEICTunnel = func(ctx context.Context, instanceID string, _ func(s eicSSHSession) error) error { + captured = append(captured, instanceID) + return replacement(eicSSHSession{instanceID: instanceID, osUser: "ubuntu", localPort: 12345, keyPath: "/tmp/k"}) + } + t.Cleanup(func() { withEICTunnel = prev }) + return calls +} + +// ---- ListFiles EIC dispatch ---- + +// TestListFiles_EICDispatch_Success: a workspace with instance_id set +// must route to listFilesViaEIC, NOT to local-Docker / template-dir. +// Verifies the handler hands the EIC helper's output back as JSON. +// +// Until PR-A this test would fail no matter what mocks were in place — +// the dispatch branch did not exist. +func TestListFiles_EICDispatch_Success(t *testing.T) { + mock := setupTestDB(t) + setupTestRedis(t) + + mock.ExpectQuery(`SELECT name, COALESCE\(instance_id, ''\), COALESCE\(runtime, ''\) FROM workspaces WHERE id =`). + WithArgs("ws-eic"). + WillReturnRows(sqlmock.NewRows([]string{"name", "instance_id", "runtime"}). + AddRow("My Agent", "i-test", "claude-code")) + + // The package-level withEICTunnel stub doesn't get to set the + // listFilesViaEIC outparam, so we have to override the helper at + // a higher level. Instead, we stub withEICTunnel to *return* the + // inner closure's err — but we can't reach the byte-output path. + // Use the dedicated stubWithEICTunnelReturning + intercept ssh: + // since the tunnel stub doesn't run the closure's ssh exec at all + // when we replace the inner fn, the helper's `rawOutput` stays + // nil and parseFindOutput returns []. Sufficient for "200 + empty" + // dispatch verification. + stubWithEICTunnelReturning(t, func(s eicSSHSession) error { + return nil // skip the real ssh; outer rawOutput stays nil → [] + }) + + w := httptest.NewRecorder() + c, _ := gin.CreateTestContext(w) + c.Params = gin.Params{{Key: "id", Value: "ws-eic"}} + c.Request = httptest.NewRequest("GET", "/workspaces/ws-eic/files?root=/configs", nil) + + (&TemplatesHandler{}).ListFiles(c) + + if w.Code != http.StatusOK { + t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String()) + } + var got []map[string]interface{} + if err := json.Unmarshal(w.Body.Bytes(), &got); err != nil { + t.Fatalf("response not JSON array: %v (body=%s)", err, w.Body.String()) + } + // EIC stub returned no output → empty list. The point of this + // assertion is "200 with [] from EIC", not "fell through to host + // template fallback which would 200 with []" — to discriminate, + // we ALSO assert mock expectations were met (proving the new SQL + // shape was queried) AND the local-Docker fallback path can't + // have run (handler.docker is nil here, so findContainer returns + // "" and the only paths that reach 200 are EIC or template-dir; + // template-dir requires a non-empty configsDir which we left at + // "" via the zero-value handler). + if got == nil { + t.Errorf("expected JSON array (even if empty); got null") + } + if err := mock.ExpectationsWereMet(); err != nil { + t.Errorf("unmet sqlmock expectations: %v", err) + } +} + +// TestListFiles_EICDispatch_Error: a real EIC failure (network blip, +// AWS API throttle, sshd down) must surface as 500, NOT silently fall +// through to the local-Docker path which would mask the failure as +// "0 files" — which is the exact UX symptom the PR-A bug report cites. +func TestListFiles_EICDispatch_Error(t *testing.T) { + mock := setupTestDB(t) + setupTestRedis(t) + + mock.ExpectQuery(`SELECT name, COALESCE\(instance_id, ''\), COALESCE\(runtime, ''\) FROM workspaces WHERE id =`). + WithArgs("ws-eic-err"). + WillReturnRows(sqlmock.NewRows([]string{"name", "instance_id", "runtime"}). + AddRow("My Agent", "i-test", "claude-code")) + + stubWithEICTunnel(t, errors.New("eic open-tunnel: timeout")) + + w := httptest.NewRecorder() + c, _ := gin.CreateTestContext(w) + c.Params = gin.Params{{Key: "id", Value: "ws-eic-err"}} + c.Request = httptest.NewRequest("GET", "/workspaces/ws-eic-err/files?root=/home", nil) + + (&TemplatesHandler{}).ListFiles(c) + + if w.Code != http.StatusInternalServerError { + t.Fatalf("expected 500, got %d: %s", w.Code, w.Body.String()) + } + if !strings.Contains(w.Body.String(), "failed to list files") { + t.Errorf("error body should describe ListFiles failure; got %s", w.Body.String()) + } +} + +// TestListFiles_EICBranch_NotTakenForSelfHosted: workspaces with no +// instance_id (self-hosted, local-Docker path) MUST NOT enter the EIC +// branch. Stubs withEICTunnel to fail loudly if it's called — the +// stub being invoked is itself the assertion failure. +func TestListFiles_EICBranch_NotTakenForSelfHosted(t *testing.T) { + mock := setupTestDB(t) + setupTestRedis(t) + + mock.ExpectQuery(`SELECT name, COALESCE\(instance_id, ''\), COALESCE\(runtime, ''\) FROM workspaces WHERE id =`). + WithArgs("ws-local"). + WillReturnRows(sqlmock.NewRows([]string{"name", "instance_id", "runtime"}). + AddRow("Local Agent", "", "")) + + prev := withEICTunnel + withEICTunnel = func(ctx context.Context, instanceID string, fn func(s eicSSHSession) error) error { + t.Errorf("withEICTunnel called for self-hosted workspace (instance_id=''); EIC branch must be gated on non-empty instance_id") + return errors.New("should not be called") + } + defer func() { withEICTunnel = prev }() + + w := httptest.NewRecorder() + c, _ := gin.CreateTestContext(w) + c.Params = gin.Params{{Key: "id", Value: "ws-local"}} + c.Request = httptest.NewRequest("GET", "/workspaces/ws-local/files", nil) + + (&TemplatesHandler{configsDir: t.TempDir()}).ListFiles(c) + + // Don't pin the response code here — the local path's behavior is + // covered by TestListFiles_FallbackToHost_NoTemplate. Just confirm + // EIC wasn't called. +} + +// ---- DeleteFile EIC dispatch ---- + +// TestDeleteFile_EICDispatch_Success: same shape as ListFiles — +// instance_id != "" routes to deleteFileViaEIC and returns 200 on +// success. Pre-PR-A right-click delete on a SaaS workspace silently +// no-op'd because findContainer returned "" and the ephemeral-volume +// fallback only handles local Docker volumes. +func TestDeleteFile_EICDispatch_Success(t *testing.T) { + mock := setupTestDB(t) + setupTestRedis(t) + + mock.ExpectQuery(`SELECT name, COALESCE\(instance_id, ''\), COALESCE\(runtime, ''\) FROM workspaces WHERE id =`). + WithArgs("ws-eic-del"). + WillReturnRows(sqlmock.NewRows([]string{"name", "instance_id", "runtime"}). + AddRow("My Agent", "i-test", "claude-code")) + + stubWithEICTunnel(t, nil) + + w := httptest.NewRecorder() + c, _ := gin.CreateTestContext(w) + c.Params = gin.Params{ + {Key: "id", Value: "ws-eic-del"}, + {Key: "path", Value: "old.txt"}, + } + c.Request = httptest.NewRequest("DELETE", "/workspaces/ws-eic-del/files/old.txt", nil) + + (&TemplatesHandler{}).DeleteFile(c) + + if w.Code != http.StatusOK { + t.Fatalf("expected 200, got %d: %s", w.Code, w.Body.String()) + } + if !strings.Contains(w.Body.String(), `"deleted"`) { + t.Errorf("expected status:deleted; got %s", w.Body.String()) + } +} + +func TestDeleteFile_EICDispatch_Error(t *testing.T) { + mock := setupTestDB(t) + setupTestRedis(t) + + mock.ExpectQuery(`SELECT name, COALESCE\(instance_id, ''\), COALESCE\(runtime, ''\) FROM workspaces WHERE id =`). + WithArgs("ws-eic-del-err"). + WillReturnRows(sqlmock.NewRows([]string{"name", "instance_id", "runtime"}). + AddRow("My Agent", "i-test", "hermes")) + + stubWithEICTunnel(t, errors.New("ssh rm: connection refused")) + + w := httptest.NewRecorder() + c, _ := gin.CreateTestContext(w) + c.Params = gin.Params{ + {Key: "id", Value: "ws-eic-del-err"}, + {Key: "path", Value: "old.txt"}, + } + c.Request = httptest.NewRequest("DELETE", "/workspaces/ws-eic-del-err/files/old.txt", nil) + + (&TemplatesHandler{}).DeleteFile(c) + + if w.Code != http.StatusInternalServerError { + t.Fatalf("expected 500, got %d: %s", w.Code, w.Body.String()) + } +} + +// TestListFiles_RootValidation: the handler must reject roots outside +// the allowlist BEFORE any DB query (otherwise a bad root would burn +// a tunnel + EIC call to discover what a 400 already knows). Critical +// security guard — without it `?root=/etc` would translate via the +// resolver's literal-pass-through. Let me prove the gate exists by +// driving an out-of-allowlist root and asserting 400 + no DB query. +func TestListFiles_RootValidation(t *testing.T) { + setupTestDB(t) + setupTestRedis(t) + + w := httptest.NewRecorder() + c, _ := gin.CreateTestContext(w) + c.Params = gin.Params{{Key: "id", Value: "ws-x"}} + c.Request = httptest.NewRequest("GET", "/workspaces/ws-x/files?root=/etc", nil) + + (&TemplatesHandler{}).ListFiles(c) + + if w.Code != http.StatusBadRequest { + t.Errorf("expected 400 for /etc root, got %d: %s", w.Code, w.Body.String()) + } +} + +// TestDeleteFile_RootValidation mirrors the ListFiles guard. PR-A +// added ?root= handling to DeleteFile so the canvas's right-click +// delete works on any root (not just /configs) — that means the +// allowlist guard has to be present here too, otherwise an unsafe +// root flows straight into the resolver. +func TestDeleteFile_RootValidation(t *testing.T) { + setupTestDB(t) + setupTestRedis(t) + + w := httptest.NewRecorder() + c, _ := gin.CreateTestContext(w) + c.Params = gin.Params{ + {Key: "id", Value: "ws-x"}, + {Key: "path", Value: "f.txt"}, + } + c.Request = httptest.NewRequest("DELETE", "/workspaces/ws-x/files/f.txt?root=/etc", nil) + + (&TemplatesHandler{}).DeleteFile(c) + + if w.Code != http.StatusBadRequest { + t.Errorf("expected 400 for /etc root, got %d: %s", w.Code, w.Body.String()) + } +} diff --git a/workspace-server/internal/handlers/template_files_eic_shells_test.go b/workspace-server/internal/handlers/template_files_eic_shells_test.go new file mode 100644 index 00000000..44ed34c9 --- /dev/null +++ b/workspace-server/internal/handlers/template_files_eic_shells_test.go @@ -0,0 +1,200 @@ +package handlers + +// template_files_eic_shells_test.go — pure-function tests for the +// remote shell builders + parser. Factored out of the EIC helpers so +// the wire shape can be pinned without standing up a real EIC tunnel +// or sshd. If a future edit changes the find/install/cat/rm shell in +// a way that drifts from the local-Docker container path, these tests +// catch it before staging. + +import ( + "strings" + "testing" +) + +// TestBuildInstallShell pins the write-side remote command. `install` +// (not `cp`/`tee`) is load-bearing — it creates parent dirs (-D) and +// writes atomically via temp-file-rename. Permissions 0644 match the +// local-Docker tar-unpack defaults so a save → restart → save → restart +// cycle doesn't flip-flop file modes per backend. +func TestBuildInstallShell(t *testing.T) { + got := buildInstallShell("/configs/config.yaml") + wants := []string{ + "sudo -n", // privilege escalation for root-owned /configs + "install -D", // creates parent dirs + "-m 0644", // permission contract + "/dev/stdin", // pipe-from-ssh source + "'/configs/config.yaml'", // shell-quoted destination + } + for _, w := range wants { + if !strings.Contains(got, w) { + t.Errorf("buildInstallShell missing %q in: %s", w, got) + } + } +} + +// TestBuildCatShell pins the read-side remote command. `2>/dev/null` +// is load-bearing: without it the missing-file case emits "cat: ...: +// No such file" to stderr, and the helper's "empty stdout + empty +// stderr → os.ErrNotExist" classifier fires the wrong branch (500 +// instead of 404). The tunnel-warning silencer (LogLevel=ERROR in +// sshArgs) handles the ssh side; this one handles the remote-cmd side. +func TestBuildCatShell(t *testing.T) { + got := buildCatShell("/home/ubuntu/.hermes/config.yaml") + wants := []string{ + "sudo -n", + "cat", + "'/home/ubuntu/.hermes/config.yaml'", + "2>/dev/null", // missing-file → empty stdout + non-zero exit + } + for _, w := range wants { + if !strings.Contains(got, w) { + t.Errorf("buildCatShell missing %q in: %s", w, got) + } + } +} + +// TestBuildRmShell pins `rm -f`, NOT `rm -rf`. A misclassified +// directory entry passing through the validator must NOT trigger a +// recursive delete. Directory removal needs its own explicit endpoint +// when/if the canvas grows that affordance. +func TestBuildRmShell(t *testing.T) { + got := buildRmShell("/configs/dead.yaml") + wants := []string{"sudo -n", "rm -f", "'/configs/dead.yaml'"} + for _, w := range wants { + if !strings.Contains(got, w) { + t.Errorf("buildRmShell missing %q in: %s", w, got) + } + } + // Negative assertion: NEVER emit -rf. + if strings.Contains(got, "rm -rf") { + t.Errorf("buildRmShell uses -rf, must use -f only: %s", got) + } +} + +// TestBuildFindShell pins the listing-side remote command — it must +// match the local-Docker path's parser shape (TYPE|SIZE|REL_PATH per +// line) AND prune the same hidden / cache directories. If either +// side drifts, a /workspace listing on EC2 either drowns in node_modules +// noise (pruning regression) or drops files entirely (parser shape +// regression). +func TestBuildFindShell(t *testing.T) { + got := buildFindShell("/workspace", 2) + wants := []string{ + "sudo -n find", + "'/workspace'", + "-maxdepth 2", + // Matches local-Docker container path; without these the EC2 + // listing fills with VCS/build artefacts. + "-not -path '*/.git/*'", + "-not -path '*/__pycache__/*'", + "-not -path '*/node_modules/*'", + "-not -name .DS_Store", + "2>/dev/null", // missing-root → empty stdout + non-zero exit + // Wire shape — emit "TYPE|SIZE|REL_PATH" so parseFindOutput + // (and the canvas tree builder) can decode each line. + "d|0|", + "f|", + // Portable stat: GNU first, BSD fallback, then 0. + "stat -c %s", + "stat -f %z", + } + for _, w := range wants { + if !strings.Contains(got, w) { + t.Errorf("buildFindShell missing %q in: %s", w, got) + } + } +} + +// TestBuildFindShell_DepthForwarding catches a regression where the +// helper hard-codes a depth instead of using the caller's value. +// `?depth=` on the canvas side controls how many levels expand on +// load — losing it means the file tree is either empty (depth=0) or +// the network blows up on a top-level /home with everyone's $HOME +// (uncapped). +func TestBuildFindShell_DepthForwarding(t *testing.T) { + for _, d := range []int{1, 3, 5} { + got := buildFindShell("/configs", d) + want := "-maxdepth " + intToStr(d) + if !strings.Contains(got, want) { + t.Errorf("buildFindShell depth=%d output missing %q: %s", d, want, got) + } + } +} + +// intToStr avoids pulling strconv into a one-liner; matches the shell +// builder's fmt.Sprintf %d output exactly. +func intToStr(n int) string { + if n == 0 { + return "0" + } + neg := n < 0 + if neg { + n = -n + } + var buf [20]byte + i := len(buf) + for n > 0 { + i-- + buf[i] = byte('0' + n%10) + n /= 10 + } + s := string(buf[i:]) + if neg { + return "-" + s + } + return s +} + +// TestParseFindOutput pins the parser. Each line is TYPE|SIZE|REL, +// blank/short lines silently skipped. Pre-PR-A this logic was inlined +// in the handler with the same shape; extracting + testing separately +// removes the "regex passes against the inline parser but a future +// refactor of the handler subtly changes the parse" failure mode. +func TestParseFindOutput(t *testing.T) { + in := []byte(`d|0|nested +f|123|nested/a.yaml +f|45|README.md + +invalid-line +f||no-size +d|0| +`) + got := parseFindOutput(in) + // Want 4 entries: nested(d), nested/a.yaml(f,123), README.md(f,45), + // no-size(f,0). Blank lines, "invalid-line" (no pipes), and + // `d|0|` (empty rel) are skipped. + wantPaths := []string{"nested", "nested/a.yaml", "README.md", "no-size"} + if len(got) != len(wantPaths) { + t.Fatalf("got %d entries, want %d: %+v", len(got), len(wantPaths), got) + } + for i, w := range wantPaths { + if got[i].Path != w { + t.Errorf("entry[%d].Path = %q, want %q", i, got[i].Path, w) + } + } + if !got[0].Dir { + t.Errorf("entry[0] should be Dir") + } + if got[1].Size != 123 { + t.Errorf("entry[1].Size = %d, want 123", got[1].Size) + } + if got[3].Size != 0 { + t.Errorf("entry[3].Size on missing-size line = %d, want 0", got[3].Size) + } +} + +// TestParseFindOutput_EmptyInput — a missing listing root yields +// empty stdout (find swallows the "No such file" via 2>/dev/null), +// which must round-trip to a JSON `[]`, not null. The handler does +// `make([]eicFileEntry, 0)` to enforce this; the test pins the +// helper-level guarantee independently. +func TestParseFindOutput_EmptyInput(t *testing.T) { + got := parseFindOutput([]byte("")) + if got == nil { + t.Errorf("parseFindOutput(\"\") returned nil; want empty slice for JSON []") + } + if len(got) != 0 { + t.Errorf("parseFindOutput(\"\") = %+v; want []", got) + } +} diff --git a/workspace-server/internal/handlers/template_files_eic_test.go b/workspace-server/internal/handlers/template_files_eic_test.go index e5bc8a48..2d30422c 100644 --- a/workspace-server/internal/handlers/template_files_eic_test.go +++ b/workspace-server/internal/handlers/template_files_eic_test.go @@ -7,39 +7,112 @@ import ( "testing" ) -// TestResolveWorkspaceFilePath_KnownRuntimes — the runtime → base-path -// map is the source of truth for where saved files land on the workspace -// EC2. Changing a base path without a migration shim silently orphans -// previously-saved files; this test pins the current contract. -func TestResolveWorkspaceFilePath_KnownRuntimes(t *testing.T) { +// TestResolveWorkspaceFilePath_RuntimeIndirection pins the +// `?root="/configs"` (or empty / unrecognized) → runtime managed-config +// dir behavior. Hermes uses /home/ubuntu/.hermes; claude-code uses +// /configs; unknowns fall back to /configs. This indirection is the +// reason hermes Config-tab edits land in the right place even though +// the canvas only ever sends `?root=/configs`. Changing it without a +// migration shim silently orphans previously-saved files. +func TestResolveWorkspaceFilePath_RuntimeIndirection(t *testing.T) { cases := []struct { runtime string + root string relPath string want string }{ - {"hermes", "config.yaml", "/home/ubuntu/.hermes/config.yaml"}, - {"HERMES", "config.yaml", "/home/ubuntu/.hermes/config.yaml"}, // case-insensitive - {"hermes", "nested/a.yaml", "/home/ubuntu/.hermes/nested/a.yaml"}, + {"hermes", "/configs", "config.yaml", "/home/ubuntu/.hermes/config.yaml"}, + {"HERMES", "/configs", "config.yaml", "/home/ubuntu/.hermes/config.yaml"}, // case-insensitive + {"hermes", "/configs", "nested/a.yaml", "/home/ubuntu/.hermes/nested/a.yaml"}, + {"hermes", "", "config.yaml", "/home/ubuntu/.hermes/config.yaml"}, // empty root → runtime indirection + {"hermes", "/etc", "config.yaml", "/home/ubuntu/.hermes/config.yaml"}, // out-of-allowlist → runtime indirection // claude-code (and any future containerized runtime) lands at /configs — // the path user-data creates and bind-mounts into the container. Pre-fix // this fell through to /opt/configs which doesn't exist on workspace EC2s // and would 500 with EACCES on save (the bug that motivated this gate). - {"claude-code", "config.yaml", "/configs/config.yaml"}, - {"CLAUDE-CODE", "config.yaml", "/configs/config.yaml"}, // case-insensitive - {"langgraph", "config.yaml", "/opt/configs/config.yaml"}, - {"external", "skills.json", "/opt/configs/skills.json"}, - {"", "config.yaml", "/configs/config.yaml"}, // empty → default - {"unknown", "config.yaml", "/configs/config.yaml"}, // unknown → default + {"claude-code", "/configs", "config.yaml", "/configs/config.yaml"}, + {"CLAUDE-CODE", "/configs", "config.yaml", "/configs/config.yaml"}, // case-insensitive + {"langgraph", "/configs", "config.yaml", "/opt/configs/config.yaml"}, + {"external", "/configs", "skills.json", "/opt/configs/skills.json"}, + {"", "/configs", "config.yaml", "/configs/config.yaml"}, // empty runtime → default + {"unknown", "/configs", "config.yaml", "/configs/config.yaml"}, // unknown → default } for _, tc := range cases { - t.Run(tc.runtime+"/"+tc.relPath, func(t *testing.T) { - got, err := resolveWorkspaceFilePath(tc.runtime, tc.relPath) + t.Run(tc.runtime+"+"+tc.root+"/"+tc.relPath, func(t *testing.T) { + got, err := resolveWorkspaceFilePath(tc.runtime, tc.root, tc.relPath) if err != nil { t.Fatalf("unexpected err: %v", err) } if got != tc.want { - t.Errorf("resolveWorkspaceFilePath(%q,%q) = %q, want %q", - tc.runtime, tc.relPath, got, tc.want) + t.Errorf("resolveWorkspaceFilePath(%q,%q,%q) = %q, want %q", + tc.runtime, tc.root, tc.relPath, got, tc.want) + } + }) + } +} + +// TestResolveWorkspaceFilePath_LiteralRoots pins that the universal +// allow-listed roots (`/home`, `/workspace`, `/plugins`) pass through +// LITERALLY rather than getting rewritten to the runtime prefix. This +// is the half of the resolver that the FilesTab "/home" selector +// depends on — without it, picking /home on a hermes workspace would +// route to /home/ubuntu/.hermes (the runtime indirection) and the +// canvas's tree row would never line up with what the user sees on +// the EC2 host. +func TestResolveWorkspaceFilePath_LiteralRoots(t *testing.T) { + cases := []struct { + runtime string + root string + relPath string + want string + }{ + // /home is always literal regardless of runtime — it's a + // universal Linux path, not a managed-config indirection. + {"hermes", "/home", "ubuntu/.bashrc", "/home/ubuntu/.bashrc"}, + {"claude-code", "/home", "ubuntu/notes.md", "/home/ubuntu/notes.md"}, + {"langgraph", "/home", "ubuntu/x", "/home/ubuntu/x"}, + // /workspace and /plugins are also literal — runtime is ignored. + {"hermes", "/workspace", "src/main.go", "/workspace/src/main.go"}, + {"claude-code", "/plugins", "p/manifest.yaml", "/plugins/p/manifest.yaml"}, + } + for _, tc := range cases { + t.Run(tc.runtime+"+"+tc.root+"/"+tc.relPath, func(t *testing.T) { + got, err := resolveWorkspaceFilePath(tc.runtime, tc.root, tc.relPath) + if err != nil { + t.Fatalf("unexpected err: %v", err) + } + if got != tc.want { + t.Errorf("resolveWorkspaceFilePath(%q,%q,%q) = %q, want %q", + tc.runtime, tc.root, tc.relPath, got, tc.want) + } + }) + } +} + +// TestResolveWorkspaceRootPath pins the directory-only translation +// used by listFilesViaEIC. Same indirection rules as +// resolveWorkspaceFilePath but without joining a relative path. +func TestResolveWorkspaceRootPath(t *testing.T) { + cases := []struct { + runtime string + root string + want string + }{ + {"hermes", "/configs", "/home/ubuntu/.hermes"}, + {"claude-code", "/configs", "/configs"}, + {"hermes", "", "/home/ubuntu/.hermes"}, + {"hermes", "/home", "/home"}, + {"claude-code", "/workspace", "/workspace"}, + {"hermes", "/plugins", "/plugins"}, + {"unknown", "/configs", "/configs"}, + {"hermes", "/etc", "/home/ubuntu/.hermes"}, // not allowlisted → runtime indirection + } + for _, tc := range cases { + t.Run(tc.runtime+"+"+tc.root, func(t *testing.T) { + got := resolveWorkspaceRootPath(tc.runtime, tc.root) + if got != tc.want { + t.Errorf("resolveWorkspaceRootPath(%q,%q) = %q, want %q", + tc.runtime, tc.root, got, tc.want) } }) } @@ -53,48 +126,80 @@ func TestResolveWorkspaceFilePath_KnownRuntimes(t *testing.T) { // We only assert the cases that Clean() can't rescue. func TestResolveWorkspaceFilePath_RejectsTraversal(t *testing.T) { bad := []string{ - "../etc/shadow", // escapes base via .. - "/etc/shadow", // absolute path - "./../../etc", // multiple .. - "a/../../etc", // escapes via deeper .. + "../etc/shadow", // escapes base via .. + "/etc/shadow", // absolute path + "./../../etc", // multiple .. + "a/../../etc", // escapes via deeper .. } for _, rel := range bad { t.Run(rel, func(t *testing.T) { - _, err := resolveWorkspaceFilePath("hermes", rel) + _, err := resolveWorkspaceFilePath("hermes", "/configs", rel) if err == nil { - t.Errorf("resolveWorkspaceFilePath(hermes, %q) should have errored, got nil", rel) + t.Errorf("resolveWorkspaceFilePath(hermes,/configs,%q) should have errored, got nil", rel) } }) } } -// TestSSHArgs_LogLevelErrorBothSites pins that BOTH ssh invocations -// (writeFileViaEIC + readFileViaEIC) include `-o LogLevel=ERROR`. +// TestSSHArgs_HardenedFlags pins the ssh option set returned by +// eicSSHSession.sshArgs(). Centralising the args was deliberate so a +// fix like PR #2822's `LogLevel=ERROR` (silences the benign +// known-hosts warning that fooled the read/list "empty stderr → not +// found" classifier) only needs to land in one place. // -// Without that flag, ssh emits a "Warning: Permanently added -// '[127.0.0.1]:NNNNN' (ED25519) to the list of known hosts." line on -// every fresh tunnel connection (even with UserKnownHostsFile=/dev/null -// — that prevents persistence, not the warning). The warning lands on -// stderr, which fools readFileViaEIC's "empty stdout + empty stderr → -// file not found" classifier into thinking the warning is a real -// ssh-layer error and returning 500 instead of 404. -// -// Caught 2026-05-05 02:38 on hongming.moleculesai.app: opening Hermes -// workspace's Config tab returned 500 with body +// Caught 2026-05-05 02:38 on hongming.moleculesai.app: opening +// Hermes workspace's Config tab returned 500 with body // `ssh cat: exit status 1 (Warning: Permanently added '[127.0.0.1]:37951'…)`. // -// LogLevel=ERROR silences info+warning while keeping real auth/tunnel -// errors visible. This test reads the source and asserts the flag -// appears at least twice (one per ssh block) — fires if a future edit -// removes it from either site. -func TestSSHArgs_LogLevelErrorBothSites(t *testing.T) { +// Asserts each load-bearing flag appears in the args slice — fires if +// a future edit removes any of them. +func TestSSHArgs_HardenedFlags(t *testing.T) { + s := eicSSHSession{keyPath: "/tmp/k", localPort: 12345, osUser: "ubuntu", instanceID: "i-x"} + got := s.sshArgs("echo hi") + wantFragments := [][]string{ + {"-i", "/tmp/k"}, + {"-o", "StrictHostKeyChecking=no"}, + {"-o", "UserKnownHostsFile=/dev/null"}, + {"-o", "LogLevel=ERROR"}, + {"-o", "ServerAliveInterval=15"}, + {"-p", "12345"}, + } + joined := strings.Join(got, " ") + for _, frag := range wantFragments { + if !strings.Contains(joined, strings.Join(frag, " ")) { + t.Errorf("sshArgs() missing fragment %v; got: %v", frag, got) + } + } + // Last two args must be `@127.0.0.1` then the remote command. + if got[len(got)-2] != "ubuntu@127.0.0.1" { + t.Errorf("sshArgs() second-last must be user@127.0.0.1; got: %q", got[len(got)-2]) + } + if got[len(got)-1] != "echo hi" { + t.Errorf("sshArgs() last must be the remote command; got: %q", got[len(got)-1]) + } +} + +// TestEicSSHSessionSingleSourceForSSHFlags is a structural guard: the +// production EIC source must invoke s.sshArgs() exclusively for ssh +// invocations — direct ssh args inlined in any helper would re-open +// the regression that PR #2822 closed (LogLevel=ERROR drift between +// helpers). Counts `s.sshArgs(` occurrences (one per file op) and +// fails if anyone copy-pastes a raw ssh args slice. +func TestEicSSHSessionSingleSourceForSSHFlags(t *testing.T) { src, err := os.ReadFile("template_files_eic.go") if err != nil { t.Fatalf("read source: %v", err) } - matches := regexp.MustCompile(`"-o", "LogLevel=ERROR"`).FindAllIndex(src, -1) - if len(matches) < 2 { - t.Errorf("expected LogLevel=ERROR in BOTH ssh blocks (write + read); found %d occurrences", len(matches)) + // Each of write/read/list/delete should call s.sshArgs once. + matches := regexp.MustCompile(`s\.sshArgs\(`).FindAllIndex(src, -1) + if len(matches) < 4 { + t.Errorf("expected ≥4 s.sshArgs() callers (write/read/list/delete); found %d", len(matches)) + } + // Belt and braces: no helper should be assembling its own + // `LogLevel=ERROR` literal outside of sshArgs. + literal := regexp.MustCompile(`"-o", "LogLevel=ERROR"`).FindAllIndex(src, -1) + if len(literal) != 1 { + t.Errorf("LogLevel=ERROR must appear exactly once (in sshArgs); found %d occurrences — drift risk", len(literal)) } } diff --git a/workspace-server/internal/handlers/template_import.go b/workspace-server/internal/handlers/template_import.go index 95b5854f..7b16c17f 100644 --- a/workspace-server/internal/handlers/template_import.go +++ b/workspace-server/internal/handlers/template_import.go @@ -216,7 +216,12 @@ func (h *TemplatesHandler) ReplaceFiles(c *gin.Context) { // as a follow-up. if instanceID != "" { for relPath, content := range body.Files { - if err := writeFileViaEIC(ctx, instanceID, runtime, relPath, []byte(content)); err != nil { + // ReplaceFiles is a bulk template-import endpoint — files + // always land in the runtime's managed-config dir. Pass + // "/configs" so resolveWorkspaceFilePath routes through the + // runtime prefix map (matches the local-Docker arm below + // which always copies to /configs). + if err := writeFileViaEIC(ctx, instanceID, runtime, "/configs", relPath, []byte(content)); err != nil { log.Printf("ReplaceFiles EIC for %s path=%s: %v", workspaceID, relPath, err) c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("failed to write file %s: %v", relPath, err)}) return diff --git a/workspace-server/internal/handlers/templates.go b/workspace-server/internal/handlers/templates.go index 03776a5d..2b3b66d6 100644 --- a/workspace-server/internal/handlers/templates.go +++ b/workspace-server/internal/handlers/templates.go @@ -243,8 +243,11 @@ func (h *TemplatesHandler) ListFiles(c *gin.Context) { listPath = rootPath + "/" + subPath } - var wsName string - if err := db.DB.QueryRowContext(ctx, `SELECT name FROM workspaces WHERE id = $1`, workspaceID).Scan(&wsName); err != nil { + var wsName, instanceID, runtime string + if err := db.DB.QueryRowContext(ctx, + `SELECT name, COALESCE(instance_id, ''), COALESCE(runtime, '') FROM workspaces WHERE id = $1`, + workspaceID, + ).Scan(&wsName, &instanceID, &runtime); err != nil { c.JSON(http.StatusNotFound, gin.H{"error": "workspace not found"}) return } @@ -255,6 +258,32 @@ func (h *TemplatesHandler) ListFiles(c *gin.Context) { Dir bool `json:"dir"` } + // SaaS workspace (EC2-per-workspace) — no Docker on this tenant. List + // via SSH through the EIC endpoint, mirroring ReadFile/WriteFile's + // dispatch. Pre-fix this branch was missing and SaaS workspaces + // always fell through to local-Docker check (finds nothing on a SaaS + // tenant) + template-dir fallback (returns the seed template, not + // the persisted state, and almost never matches on user-named + // workspaces). Net effect: the canvas Files tab always rendered "0 + // files / No config files yet" for SaaS workspaces, regardless of + // what was actually on disk. See issue #2999. + if instanceID != "" { + entries, err := listFilesViaEIC(ctx, instanceID, runtime, rootPath, subPath, depth) + if err != nil { + log.Printf("ListFiles EIC for %s root=%s sub=%s: %v", workspaceID, rootPath, subPath, err) + c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("failed to list files: %v", err)}) + return + } + // Translate to the handler's wire shape (the field names match + // 1:1, but Go can't implicit-convert named struct types). + out := make([]fileEntry, 0, len(entries)) + for _, e := range entries { + out = append(out, fileEntry{Path: e.Path, Size: e.Size, Dir: e.Dir}) + } + c.JSON(http.StatusOK, out) + return + } + // Try container filesystem first if containerName := h.findContainer(ctx, workspaceID); containerName != "" { // Portable file listing: works on both GNU and BusyBox/Alpine. @@ -378,12 +407,13 @@ func (h *TemplatesHandler) ReadFile(c *gin.Context) { // canvas Config tab always 404'd for SaaS workspaces — visible to // users after #2781 added the "no config.yaml" error UX. // - // The ?root= query param is intentionally ignored on the SaaS path — - // it's a local-Docker concept (arbitrary container roots). The - // runtime → base-path map (workspaceFilePathPrefix in - // template_files_eic.go) is the SaaS source of truth. + // `?root=` flows through resolveWorkspaceFilePath: "/configs" stays + // the per-runtime managed-config indirection (claude-code → /configs, + // hermes → /home/ubuntu/.hermes); other allow-listed roots + // (`/home`, `/workspace`, `/plugins`) pass through literally so + // list/read/write/delete agree on what file a tree row points to. if instanceID != "" { - content, err := readFileViaEIC(ctx, instanceID, runtime, filePath) + content, err := readFileViaEIC(ctx, instanceID, runtime, rootPath, filePath) if err == nil { c.JSON(http.StatusOK, gin.H{ "path": filePath, @@ -468,6 +498,11 @@ func (h *TemplatesHandler) WriteFile(c *gin.Context) { } ctx := c.Request.Context() + rootPath := c.DefaultQuery("root", "/configs") + if !allowedRoots[rootPath] { + c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins"}) + return + } var wsName, instanceID, runtime string if err := db.DB.QueryRowContext(ctx, `SELECT name, COALESCE(instance_id, ''), COALESCE(runtime, '') FROM workspaces WHERE id = $1`, @@ -479,8 +514,11 @@ func (h *TemplatesHandler) WriteFile(c *gin.Context) { // SaaS workspace (EC2-per-workspace) — no Docker on this tenant. Write // via SSH through the EIC endpoint to the runtime-specific path. + // `?root=` flows through the same per-runtime / literal indirection + // as ReadFile so list/read/write/delete agree on what file a tree + // row points to. if instanceID != "" { - if err := writeFileViaEIC(ctx, instanceID, runtime, filePath, []byte(body.Content)); err != nil { + if err := writeFileViaEIC(ctx, instanceID, runtime, rootPath, filePath, []byte(body.Content)); err != nil { log.Printf("WriteFile EIC for %s path=%s: %v", workspaceID, filePath, err) c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("failed to write file: %v", err)}) return @@ -528,12 +566,35 @@ func (h *TemplatesHandler) DeleteFile(c *gin.Context) { } ctx := c.Request.Context() - var wsName string - if err := db.DB.QueryRowContext(ctx, `SELECT name FROM workspaces WHERE id = $1`, workspaceID).Scan(&wsName); err != nil { + rootPath := c.DefaultQuery("root", "/configs") + if !allowedRoots[rootPath] { + c.JSON(http.StatusBadRequest, gin.H{"error": "root must be one of: /configs, /workspace, /home, /plugins"}) + return + } + var wsName, instanceID, runtime string + if err := db.DB.QueryRowContext(ctx, + `SELECT name, COALESCE(instance_id, ''), COALESCE(runtime, '') FROM workspaces WHERE id = $1`, + workspaceID, + ).Scan(&wsName, &instanceID, &runtime); err != nil { c.JSON(http.StatusNotFound, gin.H{"error": "workspace not found"}) return } + // SaaS workspace (EC2-per-workspace) — no Docker on this tenant. Delete + // via SSH through the EIC endpoint, mirroring ReadFile/WriteFile's + // dispatch. Pre-fix this branch was missing — DeleteFile fell through + // to local-Docker (no container) + ephemeral-volume (no Docker) and + // silently 500'd. See issue #2999. + if instanceID != "" { + if err := deleteFileViaEIC(ctx, instanceID, runtime, rootPath, filePath); err != nil { + log.Printf("DeleteFile EIC for %s root=%s path=%s: %v", workspaceID, rootPath, filePath, err) + c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("failed to delete file: %v", err)}) + return + } + c.JSON(http.StatusOK, gin.H{"status": "deleted", "path": filePath}) + return + } + // Delete via docker exec when container is running if containerName := h.findContainer(ctx, workspaceID); containerName != "" { // CWE-78: use filepath.Join instead of string concat to prevent path diff --git a/workspace-server/internal/handlers/templates_test.go b/workspace-server/internal/handlers/templates_test.go index 3d75bfd5..857d6fb2 100644 --- a/workspace-server/internal/handlers/templates_test.go +++ b/workspace-server/internal/handlers/templates_test.go @@ -750,7 +750,11 @@ func TestListFiles_WorkspaceNotFound(t *testing.T) { handler := NewTemplatesHandler(t.TempDir(), nil, nil) - mock.ExpectQuery("SELECT name FROM workspaces WHERE id ="). + // SQL shape: SELECT name, COALESCE(instance_id, ''), COALESCE(runtime, '') FROM workspaces WHERE id = $1 + // (matches the L/R/W/D unified shape so dispatchers can branch on + // instance_id; sqlmock matches via QueryMatcherRegexp so the parens + // need escaping.) + mock.ExpectQuery(`SELECT name, COALESCE\(instance_id, ''\), COALESCE\(runtime, ''\) FROM workspaces WHERE id =`). WithArgs("ws-nonexist"). WillReturnError(sql.ErrNoRows) @@ -777,9 +781,9 @@ func TestListFiles_FallbackToHost_NoTemplate(t *testing.T) { tmpDir := t.TempDir() handler := NewTemplatesHandler(tmpDir, nil, nil) // nil docker = no container - mock.ExpectQuery("SELECT name FROM workspaces WHERE id ="). + mock.ExpectQuery(`SELECT name, COALESCE\(instance_id, ''\), COALESCE\(runtime, ''\) FROM workspaces WHERE id =`). WithArgs("ws-fallback"). - WillReturnRows(sqlmock.NewRows([]string{"name"}).AddRow("Unknown Agent")) + WillReturnRows(sqlmock.NewRows([]string{"name", "instance_id", "runtime"}).AddRow("Unknown Agent", "", "")) w := httptest.NewRecorder() c, _ := gin.CreateTestContext(w) @@ -817,9 +821,9 @@ func TestListFiles_FallbackToHost_WithTemplate(t *testing.T) { handler := NewTemplatesHandler(tmpDir, nil, nil) - mock.ExpectQuery("SELECT name FROM workspaces WHERE id ="). + mock.ExpectQuery(`SELECT name, COALESCE\(instance_id, ''\), COALESCE\(runtime, ''\) FROM workspaces WHERE id =`). WithArgs("ws-tmpl"). - WillReturnRows(sqlmock.NewRows([]string{"name"}).AddRow("Test Agent")) + WillReturnRows(sqlmock.NewRows([]string{"name", "instance_id", "runtime"}).AddRow("Test Agent", "", "")) w := httptest.NewRecorder() c, _ := gin.CreateTestContext(w) @@ -1103,7 +1107,7 @@ func TestDeleteFile_WorkspaceNotFound(t *testing.T) { handler := NewTemplatesHandler(t.TempDir(), nil, nil) - mock.ExpectQuery("SELECT name FROM workspaces WHERE id ="). + mock.ExpectQuery(`SELECT name, COALESCE\(instance_id, ''\), COALESCE\(runtime, ''\) FROM workspaces WHERE id =`). WithArgs("ws-del-nf"). WillReturnError(sql.ErrNoRows)