molecule-core/tests/e2e
Hongming Wang b3da0b29c5 fix(e2e): hermes cold-boot tolerance — 20min deadline + treat failed as transient
Today's E2E run 24864011116 timed out at 10 min waiting for the workspace
to reach `online`. A Hermes cold boot measured 13 min against the same
day's apt mirror (my manual repro on 18.217.175.225), so the original
10 min deadline was too tight by roughly 2x once slack is included.

Also: the `failed` branch was a hard fail, but bootstrap-watcher
(cp#245) marks workspace=failed at 5 min if install.sh hasn't
finished yet. Heartbeat then transitions failed → online around
10-13 min. Before this fix, the E2E bailed at the `failed` read and
missed the recovery that was seconds away.

## Changes

- Deadline: 10 min → 20 min (Hermes worst case 15 min + slack)
- `failed` status: now tolerated as transient; loop logs once then
  keeps polling. Only hard-fails at the final deadline.
- Added transition logging (`WS_LAST_STATUS`) so CI output shows
  the provisioning → failed → online flow instead of silent polling.
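
The loop behaviour described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the actual contents of `test_staging_full_saas.sh`: the function name `wait_for_online`, its arguments, and the log wording are hypothetical; only `WS_LAST_STATUS` and the status names come from this commit message.

```bash
#!/usr/bin/env bash
# Sketch only: wait_for_online and its arguments are hypothetical names,
# not the real code in test_staging_full_saas.sh.

# Usage: wait_for_online STATUS_CMD DEADLINE_SECS POLL_SECS
# STATUS_CMD is any command or function that prints the current status.
wait_for_online() {
  local status_cmd="$1" deadline_secs="$2" poll_secs="$3"
  local start=$SECONDS status
  WS_LAST_STATUS=""

  while (( SECONDS - start < deadline_secs )); do
    status="$("$status_cmd")"

    # Log only when the status changes, so a long `failed` phase
    # produces a single line instead of one line per poll.
    if [[ "$status" != "$WS_LAST_STATUS" ]]; then
      echo "workspace status: ${WS_LAST_STATUS:-<none>} -> ${status}"
      WS_LAST_STATUS="$status"
    fi

    # `online` is the only state that ends the loop early;
    # `failed` is treated as transient and polling continues.
    if [[ "$status" == "online" ]]; then
      return 0
    fi

    sleep "$poll_secs"
  done

  # Hard failure happens only here, at the final deadline.
  echo "deadline of ${deadline_secs}s reached; last status: ${WS_LAST_STATUS:-<none>}" >&2
  return 1
}
```

With a hypothetical `get_workspace_status` helper that prints the workspace status, the 20 min budget would correspond to a call like `wait_for_online get_workspace_status $((20 * 60)) 10`, where the 10 s poll interval is an assumed value.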

## Why not fix cp#245 instead

Both should be fixed. cp#245 (bootstrap-watcher deadline) is the
root cause; this E2E fix is defense in depth. When cp#245 lands,
the `failed` transient log will stop firing but the rest of the
logic still protects against other slow-apt-day spikes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:42:52 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| `_extract_token.py` | chore: apply round-7 review nits | 2026-04-13 17:08:45 -07:00 |
| `_lib.sh` | feat(platform): GET /admin/workspaces/:id/test-token for E2E (#6) | 2026-04-14 09:35:26 -07:00 |
| `STAGING_SAAS_E2E.md` | feat(e2e): pivot to admin-bearer-only auth + add sanity self-check workflow | 2026-04-21 04:34:11 -07:00 |
| `test_a2a_e2e.sh` | initial commit — Molecule AI platform | 2026-04-13 11:55:37 -07:00 |
| `test_activity_e2e.sh` | chore: apply code-review round-6 suggestions | 2026-04-13 17:08:45 -07:00 |
| `test_api.sh` | fix(e2e): stop asserting current_task on public workspace GET (#966) | 2026-04-19 02:19:15 -07:00 |
| `test_claude_code_e2e.sh` | chore: final open-source cleanup — binary, stale paths, private refs | 2026-04-18 00:38:55 -07:00 |
| `test_comprehensive_e2e.sh` | fix(e2e): make provisioning-status assertions robust to CI environment | 2026-04-13 17:31:07 -07:00 |
| `test_dev_mode.sh` | fix(quickstart): hotfixes discovered during live testing session | 2026-04-23 14:57:18 -07:00 |
| `test_saas_tenant.sh` | chore: final open-source cleanup — binary, stale paths, private refs | 2026-04-18 00:38:55 -07:00 |
| `test_staging_full_saas.sh` | fix(e2e): hermes cold-boot tolerance — 20min deadline + treat failed as transient | 2026-04-23 17:42:52 -07:00 |