molecule-core/.github
Hongming Wang 3a36d732e4 fix(ci): sweep prior UTC day in e2e safety nets (midnight-rollover)
[Molecule-Platform-Evolvement-Manager]

## What was breaking

All three staging e2e workflows' "Teardown safety net" steps
filtered candidate slugs by `f'e2e-...-{today}-...'` where `today`
was computed at safety-net-step time via `datetime.date.today()`.

When a run crossed midnight UTC (start before 00:00, end after),
`today` became the NEXT day, but the slug it created carried the
PRIOR day's date. The filter never matched its own slug → leak.

## Today's incident

E2E Staging Canvas run [24970092066](
https://github.com/Molecule-AI/molecule-core/actions/runs/24970092066):
  - started 2026-04-26 23:45:59Z
  - created slug `e2e-canvas-20260426-1u8nz3` at 23:59Z
  - ended 2026-04-27 00:12:47Z (failure)
  - safety-net step ran with `today=20260427`
  - filter `e2e-canvas-20260427-` did not match `...20260426-1u8nz3`
  - tenant + child workspace EC2 both stayed up

Confirmed via CP staging logs: no DELETE for `1u8nz3` was ever issued.
The Playwright globalTeardown didn't fire (test crashed mid-run);
the workflow safety net was the last line of defense, and it missed.
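
The miss can be reproduced offline from the timestamps above. This is a minimal sketch, not the workflow's actual code: `old_prefix` is a hypothetical helper, and the run's end time stands in for the moment the safety-net step ran.

```python
import datetime

def old_prefix(now_utc: datetime.datetime) -> str:
    """Pre-fix behaviour: a single prefix built from the date at sweep time."""
    return f"e2e-canvas-{now_utc.strftime('%Y%m%d')}-"

# Slug created before midnight UTC (from the incident above).
slug = "e2e-canvas-20260426-1u8nz3"

# Assumption: the safety-net step ran at roughly the run's end time.
sweep_time = datetime.datetime(2026, 4, 27, 0, 12, 47, tzinfo=datetime.timezone.utc)

prefix = old_prefix(sweep_time)
missed = not slug.startswith(prefix)
print(prefix, "missed" if missed else "matched")
```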

## Fix

All three workflows now sweep both today's and yesterday's UTC dates,
so a run that crosses midnight still matches its own slug:

```python
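import datetime

# date.today() uses the runner's local timezone; it matches the UTC dates
# in the slugs only when the runner clock is set to UTC.
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yesterday.strftime('%Y%m%d'))
prefixes = tuple(f'e2e-canvas-{d}-' for d in dates)  # (canvas variant)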
```

Per-run-id scoping (saas + canary) is preserved — the prior-day
prefix still includes the run_id, so cross-midnight runs only sweep
their own slugs, not other in-flight runs from yesterday.
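
A rough sketch of how run-id scoping could combine with the two-day window. The slug layout `e2e-<kind>-<date>-<run_id>-<suffix>`, the `sweep_prefixes` helper, and the run ids are assumptions for illustration; the real saas and canary slug formats are not shown in this commit message.

```python
import datetime

def sweep_prefixes(kind, today, run_id=None):
    """Return today's and yesterday's slug prefixes, optionally scoped to one run."""
    dates = (today, today - datetime.timedelta(days=1))
    # Hypothetical layout: the run id directly follows the date segment.
    scope = f"{run_id}-" if run_id is not None else ""
    return tuple(f"e2e-{kind}-{d.strftime('%Y%m%d')}-{scope}" for d in dates)

today = datetime.date(2026, 4, 27)
canvas = sweep_prefixes("canvas", today)             # no run-id scoping
saas = sweep_prefixes("saas", today, run_id="111")   # scoped to run 111

own_slug = "e2e-saas-20260426-111-abc"     # created by run 111 before midnight
other_slug = "e2e-saas-20260426-222-xyz"   # another run from yesterday

# str.startswith accepts a tuple and matches if any prefix matches.
own_matches = own_slug.startswith(saas)
other_matches = other_slug.startswith(saas)
```

Under these assumptions the cross-midnight run still finds its own slug through the prior-day prefix, while the other run's slug is left alone.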

## Why two-day window vs. arbitrary lookback

A run can't legitimately last more than 24h on GitHub-hosted
runners (the workflows' `timeout-minutes` caps are far lower:
canary=25, e2e-saas=45, canvas=30), so it crosses at most one
midnight. A two-day window therefore covers any cross-midnight
run without widening the cross-run-cleanup blast radius further.
The `sweep-stale-e2e-orgs.yml` cron (with its 120-min age threshold)
remains the catch-all for anything older that drifts through.
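
The window size can be sanity-checked by brute force. This sketch uses the timeout values quoted above and assumes the safety-net step runs no later than the timeout; `max_date_rollovers` is a hypothetical helper.

```python
import datetime

# timeout-minutes caps quoted in this commit message.
TIMEOUT_MINUTES = {"canary": 25, "e2e-saas": 45, "canvas": 30}

def max_date_rollovers(timeout_minutes: int) -> int:
    """Worst-case number of UTC date changes between a run's start and its timeout."""
    base = datetime.datetime(2026, 4, 26, tzinfo=datetime.timezone.utc)
    worst = 0
    # Try every start minute of the day and keep the largest date difference.
    for start_minute in range(24 * 60):
        start = base + datetime.timedelta(minutes=start_minute)
        end = start + datetime.timedelta(minutes=timeout_minutes)
        worst = max(worst, (end.date() - start.date()).days)
    return worst

rollovers = {name: max_date_rollovers(m) for name, m in TIMEOUT_MINUTES.items()}
print(rollovers)
```

A worst case of one rollover per workflow means the slug's date is always either today's or yesterday's at sweep time.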

## Test plan

- [x] Manual logic simulation: post-midnight slug matches yesterday's
      prefix; same-day still matches; 2-days-ago does NOT match;
      production tenant never matches
- [x] All three workflow YAMLs syntactically valid
- [ ] Next cross-midnight run cleans up its own slug
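
The manual logic simulation in the first item could look roughly like this for the canvas variant. Apart from the incident slug, every slug here is made up for illustration, and `should_sweep` is a hypothetical helper rather than the workflow's code.

```python
import datetime

def should_sweep(slug: str, today: datetime.date) -> bool:
    """Two-day window: match today's or yesterday's canvas prefix."""
    yesterday = today - datetime.timedelta(days=1)
    prefixes = tuple(
        f"e2e-canvas-{d.strftime('%Y%m%d')}-" for d in (today, yesterday)
    )
    return slug.startswith(prefixes)

today = datetime.date(2026, 4, 27)  # date seen by the safety-net step

# slug -> expected result
cases = {
    "e2e-canvas-20260426-1u8nz3": True,   # post-midnight sweep, prior-day slug
    "e2e-canvas-20260427-aaaaaa": True,   # same-day slug (hypothetical)
    "e2e-canvas-20260425-bbbbbb": False,  # two days ago (hypothetical)
    "acme-prod": False,                   # production tenant (hypothetical)
}

results = {slug: should_sweep(slug, today) for slug in cases}
```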

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:23:36 -07:00
workflows fix(ci): sweep prior UTC day in e2e safety nets (midnight-rollover) 2026-04-26 19:23:36 -07:00
CODEOWNERS chore: add CODEOWNERS to auto-route agent PRs to personal review account 2026-04-26 13:40:13 -07:00