From a242ca8b01ce89cccffbc2cd4debb2dbb8e5c4dd Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Mon, 4 May 2026 14:43:58 -0700
Subject: [PATCH] test(synth-e2e): add Files API config.yaml round-trip gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Today's user-visible bug ("PUT /workspaces//files/config.yaml: 500 …
install: cannot create directory '/opt/configs': Permission denied",
fixed in #2769) shipped to production and was caught only when an
operator opened the Canvas Config tab and clicked Save & Restart on a
claude-code workspace. Two compounding root causes:

1. Path-map fall-through: claude-code wasn't in
   workspaceFilePathPrefix, so it fell through to the /opt/configs
   default — a path the workspace EC2 doesn't have (cloud-init only
   creates /configs).
2. Permission: /configs is root-owned, but the SSH-as-ubuntu install
   command had no sudo prefix, so the write would have failed with
   EACCES even with the right path.

The synth E2E provisions a fresh workspace every cron firing but never
PUTs a file via the Files API. So neither failure mode could fail the
canary.

Add a new step 7c (between terminal-diagnose and A2A) that:

- PUTs a known marker into config.yaml on each provisioned workspace
- GETs it back and asserts the marker is present
- Fails with an actionable message that names the likely class of
  regression (path map vs permission) so the next operator doesn't
  have to re-discover today's debugging path

The marker includes the run ID so stale state from a prior canary
can't false-pass.

Why round-trip (not just PUT-and-200): a 200 from PUT only proves the
SSH install succeeded somewhere on disk; the GET-back proves the file
landed at the path the runtime actually reads from (i.e., that the
host:/configs → container:/configs bind-mount sees it). Without the
GET, a future bug that writes to a non-bind-mounted host path would
silently no-op from the runtime's POV but pass the gate.

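The round-trip argument can be illustrated with a minimal local sketch.
This is not the gate itself: it assumes two temp directories standing
in for the path the Files API writes to and the path the runtime reads
from. The helper round_trip_ok, the directory names, and the demo
markers are hypothetical and do not exist in the test script.

```shell
#!/usr/bin/env bash
# Local sketch only: two temp dirs stand in for the path the Files API
# writes to and the path the runtime reads from. round_trip_ok is a
# hypothetical helper, not part of test_staging_full_saas.sh.

round_trip_ok() {
  local write_dir="$1" read_dir="$2" marker="$3"
  # "PUT": a successful write only proves the file landed somewhere.
  printf '%s\nname: synth-canary\n' "$marker" > "$write_dir/config.yaml" || return 1
  # "GET": the marker must be visible at the path the reader uses.
  grep -qF -- "$marker" "$read_dir/config.yaml" 2>/dev/null
}

tmp=$(mktemp -d)
mkdir -p "$tmp/configs" "$tmp/opt-configs"

# Run 1: writer and reader agree on the path, so the check passes.
if round_trip_ok "$tmp/configs" "$tmp/configs" "# demo-marker: run-1"; then
  same_path_result=pass
else
  same_path_result=fail
fi

# Run 2: the write lands at a path the reader never sees. The write itself
# succeeds, and a stale run-1 file exists at the read path, but the
# per-run marker keeps that stale file from false-passing.
if round_trip_ok "$tmp/opt-configs" "$tmp/configs" "# demo-marker: run-2"; then
  wrong_path_result=pass
else
  wrong_path_result=fail
fi

echo "same path:  $same_path_result"
echo "wrong path: $wrong_path_result"
rm -rf "$tmp"
```

In this sketch the second run's write succeeds (the analogue of a 200
from PUT), yet the check fails because the marker never shows up where
the reader looks. A write-only check would have passed it.
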
Deferred (separate PR, requires AWS-creds wiring): a parallel gate that
runs aws ec2 describe-instances on the workspace EC2 and asserts the
attached IamInstanceProfile.Arn, which would directly catch the #466
IAM profile gap class. Punted because it needs
aws-actions/configure-aws-credentials added to
continuous-synth-e2e.yml + a read-only IAM role provisioned on the AWS
side. Tracked as task #301.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 tests/e2e/test_staging_full_saas.sh | 48 +++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/tests/e2e/test_staging_full_saas.sh b/tests/e2e/test_staging_full_saas.sh
index 41c58fd5..cffd5b42 100755
--- a/tests/e2e/test_staging_full_saas.sh
+++ b/tests/e2e/test_staging_full_saas.sh
@@ -504,6 +504,54 @@ for wid in $WS_TO_CHECK; do
   fi
 done
 
+# ─── 7c. Workspace files API config.yaml round-trip ────────────────────
+# Pin the config-save path that drives the Canvas Config tab's Save &
+# Restart. Two failure classes this gate catches in one shot:
+#
+#   1. Path map drift (PR #2769). Runtime falls through to the wrong
+#      base path (e.g. /opt/configs when user-data only created /configs)
+#      → SSH `install -D` fails with EACCES while creating the missing
+#      parent dir. The user-visible 500 was unobservable without
+#      exercising this code path on a fresh workspace.
+#   2. Permission drift on /configs. The path is root-owned by cloud-init,
+#      so the SSH-as-ubuntu install needs `sudo -n`. Any future change
+#      that drops the sudo, switches to a non-passwordless-sudo OS user,
+#      or moves the path to a non-ubuntu-writable dir without sudo will
+#      regress this gate.
+#
+# Round-trip: PUT a known marker, GET it back, assert the marker is present.
+# Marker shape includes the run id so a stale file from a prior canary
+# can't false-pass.
+log "7c/11 Files API config.yaml round-trip..."
+CONFIG_MARKER="# molecule-synth-e2e: ${E2E_RUN_ID:-unknown} ${RUNTIME} $(date -u +%Y-%m-%dT%H:%M:%SZ)"
+CONFIG_PAYLOAD="${CONFIG_MARKER}
+name: synth-canary
+runtime: ${RUNTIME}
+"
+for wid in $WS_TO_CHECK; do
+  PUT_BODY=$(python3 -c "import json,sys; print(json.dumps({'content': sys.stdin.read()}))" <<< "$CONFIG_PAYLOAD")
+  PUT_RESP=$(tenant_call PUT "/workspaces/$wid/files/config.yaml" \
+    -H "Content-Type: application/json" \
+    -d "$PUT_BODY" \
+    -w $'\n%{http_code}\n' \
+    2>/dev/null || printf '\n500\n')
+  PUT_CODE=$(echo "$PUT_RESP" | tail -n 1)
+  PUT_BODY_OUT=$(echo "$PUT_RESP" | sed '$d')
+  if [ "$PUT_CODE" != "200" ] && [ "$PUT_CODE" != "204" ]; then
+    fail "Workspace $wid Files API PUT config.yaml returned $PUT_CODE: $PUT_BODY_OUT — likely a path-map or permission regression in workspace-server template_files_eic.go"
+  fi
+  # GET back and assert the marker line is present. Don't require exact
+  # equality — the runtime's loader may normalize trailing newline /
+  # quoting; presence of the marker proves the content landed at the
+  # path the runtime reads from (vs landing at a host path that's
+  # invisible to the bind-mounted container).
+  GET_RESP=$(tenant_call GET "/workspaces/$wid/files/config.yaml" 2>/dev/null || echo "")
+  if ! echo "$GET_RESP" | grep -qF "$CONFIG_MARKER"; then
+    fail "Workspace $wid Files API GET config.yaml does not contain the marker just written ('$CONFIG_MARKER'). Either the PUT landed at a host path the container doesn't bind-mount, or the GET reads from a different path. Either way, Canvas Save & Restart will appear to succeed but the workspace won't pick up the change."
+  fi
+  ok " $wid config.yaml round-trip OK"
+done
+
 # ─── 8. A2A round-trip on parent ───────────────────────────────────────
 log "8/11 Sending A2A message to parent — expecting agent response..."
 # Smoke prompt phrasing — DO NOT trim back to the bare "Reply with exactly: PONG"