fix(e2e): increase hermes workspace wait from 20 to 30 min

Root cause of PR #1981 E2E failures (step 7 timeout):
- hermes-agent install from NousResearch (Node 22 tarball + Python deps from source) + gateway health wait takes 15-25 min on staging
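
For illustration only, a minimal sketch of the deadline-style wait that step 7 relies on, plus the headroom arithmetic between the new 1800 s wait and the 45 min job timeout. `get_status` and `wait_for_online` are hypothetical names, not helpers from the test script; the real status lookup is not part of this diff, so a marker file stands in for it here.

```shell
#!/usr/bin/env bash
# Sketch only: get_status is a hypothetical stand-in for the real
# workspace status lookup, which this diff does not show.
get_status() {
  # Reports "online" once the marker file exists, "provisioning" otherwise.
  [ -e "$1" ] && echo online || echo provisioning
}

# Poll until the target reports online or the time budget runs out.
# Args: target, budget in seconds, poll interval in seconds (default 1).
wait_for_online() {
  local target="$1" budget_s="$2" interval_s="${3:-1}"
  local deadline=$(( $(date +%s) + budget_s ))
  while :; do
    if [ "$(get_status "$target")" = "online" ]; then
      return 0
    fi
    # Any non-online read keeps polling; only the deadline is a hard failure.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
}

WS_WAIT_S=1800                 # step 7 budget after this commit (30 min)
JOB_TIMEOUT_S=$(( 45 * 60 ))   # timeout-minutes: 45
HEADROOM_MIN=$(( (JOB_TIMEOUT_S - WS_WAIT_S) / 60 ))
echo "headroom for the other steps: ${HEADROOM_MIN} min"
```

With these numbers the remaining steps have 15 min before the job-level timeout fires, assuming step 7 uses its full budget.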
2026-04-24 17:11:37 +00:00 · ca7fa3b65e
commit ca7fa3b65e
parent 6b62391e5d
2 changed files with 4 additions and 4 deletions
--- a/.github/workflows/e2e-staging-saas.yml
+++ b/.github/workflows/e2e-staging-saas.yml
@@ -5,7 +5,7 @@ name: E2E Staging SaaS (full lifecycle)
 # HMA memory → activity → peers), then tears down and asserts leak-free.
 #
 # Why a separate workflow (not folded into ci.yml):
-#   - The run takes ~20 min (EC2 boot + cloudflared DNS + provision sweeps +
+#   - The run takes ~25-35 min (EC2 boot + cloudflared DNS + provision sweeps +
 #     agent bootstrap), way too slow for every PR.
 #   - Needs its own concurrency group so two pushes don't fight over the
 #     same staging org slug prefix.
@@ -68,7 +68,7 @@ jobs:
  e2e-staging-saas:
    name: E2E Staging SaaS
    runs-on: ubuntu-latest
-    timeout-minutes: 30
+    timeout-minutes: 45
    permissions:
      contents: read

--- a/tests/e2e/test_staging_full_saas.sh
+++ b/tests/e2e/test_staging_full_saas.sh
@@ -308,8 +308,8 @@ fi
 #     polling, only hard-fail at the deadline. Pre-bootstrap-watcher-fix
 #     (controlplane#245) this was a flake generator: workspace went
 #     failed→online inside our window but we bailed at the failed read.
-log "7/11 Waiting for workspace(s) to reach status=online (up to 20 min — hermes cold boot)..."
-WS_DEADLINE=$(( $(date +%s) + 1200 ))
+log "7/11 Waiting for workspace(s) to reach status=online (up to 30 min — hermes cold boot)..."
+WS_DEADLINE=$(( $(date +%s) + 1800 ))
 WS_TO_CHECK="$PARENT_ID"
 [ -n "$CHILD_ID" ] && WS_TO_CHECK="$WS_TO_CHECK $CHILD_ID"
 for wid in $WS_TO_CHECK; do