From 46fbffb95bae7e8a9dd78d20af3e848df0e2507e Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Fri, 24 Apr 2026 01:26:13 -0700
Subject: [PATCH] =?UTF-8?q?fix(canvas/e2e):=20raise=20staging-setup=20dead?=
 =?UTF-8?q?line=2015=20min=20=E2=86=92=2020=20min?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Matches tests/e2e/test_staging_full_saas.sh's 20-min budget (#1930).
Canvas E2E was still stuck at 900s (15 min), which regularly flakes on
tenant cold boots in the 12-15 min range, especially on staging, where
workspace-server image pulls + AMI bootstrapping add 3-5 min vs prod.

Concrete blocker: 2026-04-24 staging→main sync (#1981) kept failing on
"tenant provision: timed out after 900s" in canvas/e2e/staging-setup.ts
despite the actual sync E2E going green. Canvas-side timeout was
strictly tighter than the sync-side timeout.

Also raises WORKSPACE_ONLINE_TIMEOUT_MS to 20 min to cover the case
where the workspace EC2 is provisioned but hermes cold-install (apt +
uv + hermes-agent clone + gateway boot) takes longer than the original
10-min budget. This matches the 20-min workspace deadline in SaaS E2E.

No behavior change when things are fast. Just covers the tail.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 canvas/e2e/staging-setup.ts | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/canvas/e2e/staging-setup.ts b/canvas/e2e/staging-setup.ts
index 598fb877..7147f4ea 100644
--- a/canvas/e2e/staging-setup.ts
+++ b/canvas/e2e/staging-setup.ts
@@ -26,8 +26,13 @@ const CP_URL = process.env.MOLECULE_CP_URL || "https://staging-api.moleculesai.a
 const ADMIN_TOKEN = process.env.MOLECULE_ADMIN_TOKEN;
 const STAGING = process.env.CANVAS_E2E_STAGING === "1";
 
-const PROVISION_TIMEOUT_MS = 15 * 60 * 1000;
-const WORKSPACE_ONLINE_TIMEOUT_MS = 10 * 60 * 1000;
+// Tenant cold boot on staging regularly takes 12-15 min when the
+// workspace-server Docker image isn't already cached on the AMI. Raised
+// to 20 min to match tests/e2e/test_staging_full_saas.sh (PR #1930)
+// after repeated "tenant provision: timed out after 900s" flakes
+// were blocking staging→main syncs on 2026-04-24.
+const PROVISION_TIMEOUT_MS = 20 * 60 * 1000;
+const WORKSPACE_ONLINE_TIMEOUT_MS = 20 * 60 * 1000;
 const TLS_TIMEOUT_MS = 3 * 60 * 1000;
 
 async function jsonFetch(