Hermes workspaces cold-boot in 8-13 min (ripgrep + ffmpeg + node22 +
hermes-agent source build + Playwright + Chromium ~300MB). The canvas's
2-min hardcoded "Provisioning Timeout" warning fired at ~2min and told
users their workspace was "stuck" while it was still mid-install. Users
hit Retry, triggering fresh cold boots and cancelling healthy workspaces.
User-facing symptom (reported 2026-04-24 18:35Z): hermes workspace showed
"has been provisioning for 3m 15s — it may have encountered an issue"
with Retry + Cancel buttons, while the EC2 was installing node_modules.
Fix:
- Keep DEFAULT_PROVISION_TIMEOUT_MS = 120_000 (2min) — correct for fast
docker runtimes (claude-code, langgraph, crewai) where cold boot is
30-90s.
- Add RUNTIME_TIMEOUT_OVERRIDES_MS = { hermes: 720_000 } (12min).
Aligns with tests/e2e/test_staging_full_saas.sh's
PROVISION_TIMEOUT_SECS=900 (15min) so UI warns shortly before the
backend itself gives up.
- New timeoutForRuntime() resolves the base; per-node lookup in the
check-timeouts interval so a mixed batch (1 hermes + 2 langgraph) uses
the right threshold for each.
- timeoutMs prop is now optional. Undefined → per-runtime lookup; a
number → forces a single threshold for every workspace (tests use this
for deterministic behavior).
Tests: 4 new cases pinning the runtime-aware resolution, including a
guard that catches future regressions that would weaken hermes's budget.
Existing tests unchanged (they import DEFAULT_PROVISION_TIMEOUT_MS which
still exports 120_000).
13/13 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>