molecule-core/.github/workflows
Hongming Wang ff1003e5f6 ci(canary): bump timeout-minutes 12 → 20 to absorb apt tail latency
Today's 4 cancelled canaries (25319625186 / 25320942822 / 25321618230 /
25322499952) were all blown by the workflow timeout despite the
underlying tenant boot completing successfully (PR molecule-controlplane#455
fix verified — boot events all reach `boot_script_finished/ok`).

Why the budget was wrong:

The tenant user-data install phase runs apt-get update + install of
docker.io / jq / awscli / caddy / amazon-ssm-agent FROM RAW UBUNTU on
every tenant boot — none of it is pre-baked into the tenant AMI
(EC2_AMI=ami-0ea3c35c5c3284d82, raw Jammy 22.04). Empirical
fetch_secrets/ok timing across today's canaries:

  51s   debug-mm-1777888039 (09:47Z)
  82s   25319625186          (12:42Z)
  143s  25320942822          (13:11Z)
  625s  25322499952          (13:43Z)

Same EC2_AMI, same instance type (t3.small), same user-data install
sequence — variance is entirely apt-mirror tail latency. A 12-min job
budget leaves only ~2 min for the workspace on slow-apt days; the
workspace itself needs ~3.5 min for claude-code cold boot, so the
budget is structurally too tight whenever apt is slow.

20 min absorbs even the 10+ min boot worst-case and still leaves the
workspace its full ~7 min budget. Cap stays well under the runner's
6-hour ubuntu-latest job ceiling.

Real fix: pre-bake caddy + ssm-agent into the tenant AMI so the boot
phase is no-ops on cached pkgs (will file controlplane#TBD as
follow-up — packer/install-base.sh today only bakes the WORKSPACE thin
AMI, not the tenant AMI; tenants always boot from raw Ubuntu).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 07:02:12 -07:00
..
auto-promote-on-e2e.yml chore(deps)(deps): bump imjasonh/setup-crane from 0.4 to 0.5 2026-05-02 19:23:13 +00:00
auto-promote-staging.yml fix(auto-promote): skip empty-tree promotes to break perpetual cycle 2026-05-03 08:56:44 -07:00
auto-sync-main-to-staging.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
auto-tag-runtime.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
block-internal-paths.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
canary-staging.yml e2e: add direct-Anthropic LLM-key path alongside MiniMax + OpenAI 2026-05-04 00:51:14 -07:00
canary-verify.yml Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6 2026-05-03 01:36:57 +00:00
cascade-list-drift-gate.yml feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3) 2026-05-03 03:52:39 -07:00
check-merge-group-trigger.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
check-migration-collisions.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
ci.yml test(e2e): pin pick_model_slug behavior with bash unit tests 2026-05-03 12:04:12 -07:00
codeql.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
continuous-synth-e2e.yml ci(canary): bump timeout-minutes 12 → 20 to absorb apt tail latency 2026-05-04 07:02:12 -07:00
e2e-api.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
e2e-staging-canvas.yml ci: port DELETE-verify pattern to remaining staging e2e workflows 2026-05-03 16:24:43 -07:00
e2e-staging-external.yml ci: port DELETE-verify pattern to remaining staging e2e workflows 2026-05-03 16:24:43 -07:00
e2e-staging-saas.yml e2e: add direct-Anthropic LLM-key path alongside MiniMax + OpenAI 2026-05-04 00:51:14 -07:00
e2e-staging-sanity.yml ci: port DELETE-verify pattern to remaining staging e2e workflows 2026-05-03 16:24:43 -07:00
harness-replays.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
pr-guards.yml
promote-latest.yml chore(deps)(deps): bump imjasonh/setup-crane from 0.4 to 0.5 2026-05-02 19:23:13 +00:00
publish-canvas-image.yml Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6 2026-05-03 01:36:57 +00:00
publish-runtime.yml fix(publish-runtime): re-add 5 templates wrongly removed from cascade (#2566) 2026-05-03 05:41:53 -07:00
publish-workspace-server-image.yml Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6 2026-05-03 01:36:57 +00:00
railway-pin-audit.yml Merge pull request #2523 from Molecule-AI/dependabot/github_actions/actions/github-script-9.0.0 2026-05-03 01:37:00 +00:00
redeploy-tenants-on-main.yml ci(redeploy): fix stale canary_slug default 'hongmingwang' → 'hongming' 2026-05-03 05:06:01 -07:00
redeploy-tenants-on-staging.yml ci(deploy): broaden ephemeral-prefix matchers to cover rt-e2e-* 2026-05-03 04:28:29 -07:00
retarget-main-to-staging.yml fix(retarget): skip PRs whose head is staging (auto-promote PRs) 2026-05-03 07:34:24 -07:00
runtime-pin-compat.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
runtime-prbuild-compat.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
secret-pattern-drift.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
secret-scan.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
sweep-aws-secrets.yml feat(ops): add sweep-aws-secrets janitor — orphan tenant bootstrap secrets 2026-05-03 02:38:08 -07:00
sweep-cf-orphans.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
sweep-cf-tunnels.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
sweep-stale-e2e-orgs.yml ci: tighten e2e cleanup race window 120m -> ~45m worst case 2026-05-03 16:08:40 -07:00
test-ops-scripts.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00