molecule-core/.github/workflows
dev-lead 5c0c15eb4f chore(canary): workflow_dispatch input keep_on_failure for log capture
Investigating molecule-core#129 failure mode #1 (claude-code "Agent
error (Exception)") needs the workspace's docker logs to find the
actual exception. The canary tears down the tenant on every failure,
so the workspace container is destroyed before anyone can SSM in.

Add a workflow_dispatch input `keep_on_failure: bool` (default false).
When true, sets `E2E_KEEP_ORG=1` for the canary script — its existing
debug path skips teardown, leaving the tenant + EC2 + CF tunnel + DNS
alive. Operator can then SSM into the workspace EC2 (via the same
flow as recover-tunnels.py) and capture `docker logs` from the
claude-code container.

Cron-triggered runs never set the input (it only exists on dispatch),
so unattended scheduled canaries always tear down — no risk of
unattended cost leak.

Operator workflow:
  1. Dispatch canary-staging.yml with keep_on_failure=true
  2. Watch CI; on failure (likely, given the 38h chronic red),
     note the SLUG / TENANT_URL printed at step 1/11
  3. SSM exec into the workspace EC2 (us-east-2) and run
     `docker logs <claude-code-container>` to find the actual
     exception traceback
  4. Manually delete via DELETE /cp/admin/tenants/<slug> when done
     (the script logs this reminder on E2E_KEEP_ORG=1 path)

Refs: molecule-core#129 (canary investigation)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:58:19 -07:00
..
auto-tag-runtime.yml fix(ci): replace gh pr CLI with Gitea v1 REST in workflows + scripts (#75 class A) 2026-05-07 15:29:26 -07:00
block-internal-paths.yml fix(ci): lowercase 'molecule-ai/' in cross-repo workflow refs 2026-05-07 01:00:10 -07:00
branch-protection-drift.yml ci(branch-protection): check-name parity gate (#144) 2026-05-07 14:42:50 -07:00
canary-staging.yml chore(canary): workflow_dispatch input keep_on_failure for log capture 2026-05-08 10:58:19 -07:00
canary-verify.yml fix(post-suspension): migrate github.com/Molecule-AI refs to git.moleculesai.app (Class G #168) 2026-05-07 13:08:15 -07:00
cascade-list-drift-gate.yml feat(ci): structural drift gate for cascade list vs manifest (RFC #388 PR-3) 2026-05-03 03:52:39 -07:00
check-merge-group-trigger.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
check-migration-collisions.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
ci.yml fix(ci): pin actions/upload-artifact + download-artifact to @v3 for Gitea compatibility 2026-05-07 16:54:44 -07:00
codeql.yml fix(ci): convert CodeQL workflow to no-op stub on Gitea (#156) 2026-05-07 14:26:57 -07:00
continuous-synth-e2e.yml ci(canary): bump timeout-minutes 12 → 20 to absorb apt tail latency 2026-05-04 07:02:12 -07:00
e2e-api.yml fix(ci): e2e-api — parallel-safe postgres/redis containers + provisioner setup 2026-05-07 18:59:56 -07:00
e2e-staging-canvas.yml chore(workflows): drop staging-branch triggers (Phase 3b of internal#81) 2026-05-08 13:08:51 +00:00
e2e-staging-external.yml chore(workflows): drop staging-branch triggers (Phase 3b of internal#81) 2026-05-08 13:08:51 +00:00
e2e-staging-saas.yml chore(workflows): drop staging-branch triggers (Phase 3b of internal#81) 2026-05-08 13:08:51 +00:00
e2e-staging-sanity.yml fix(workflows): preserve curl stderr in 8 status-capture sites 2026-05-04 18:54:50 -07:00
handlers-postgres-integration.yml chore(ci): retrigger Handlers Postgres Integration for second-green proof 2026-05-07 18:23:05 -07:00
harness-replays.yml fix(ci): pre-clone manifest deps in harness-replays workflow (#173 followup) 2026-05-07 14:26:52 -07:00
lint-curl-status-capture.yml fix(workflows): rewrite curl status-capture to prevent exit-code pollution 2026-05-04 18:29:38 -07:00
pr-guards.yml fix(ci): close 3 chronic Gitea-Actions workflow flakes (closes #88) 2026-05-07 17:06:09 -07:00
promote-latest.yml chore(deps)(deps): bump imjasonh/setup-crane from 0.4 to 0.5 2026-05-02 19:23:13 +00:00
publish-canvas-image.yml Merge pull request #2521 from Molecule-AI/dependabot/github_actions/actions/checkout-6 2026-05-03 01:36:57 +00:00
publish-runtime.yml chore: reconcile main → staging post-suspension divergence 2026-05-07 14:24:37 -07:00
publish-workspace-server-image.yml chore(ci): retrigger publish-workspace-server-image after ECR repo create (#173) 2026-05-07 13:54:11 -07:00
railway-pin-audit.yml Merge pull request #2523 from Molecule-AI/dependabot/github_actions/actions/github-script-9.0.0 2026-05-03 01:37:00 +00:00
redeploy-tenants-on-main.yml fix(ci): lowercase 'molecule-ai/' in cross-repo workflow refs 2026-05-07 01:00:10 -07:00
redeploy-tenants-on-staging.yml chore(workflows): drop staging-branch triggers (Phase 3b of internal#81) 2026-05-08 13:08:51 +00:00
runtime-pin-compat.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
runtime-prbuild-compat.yml fix(ci): include event_name in runtime-prbuild-compat concurrency group 2026-05-05 04:01:20 -07:00
secret-pattern-drift.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
secret-scan.yml fix(ci): lowercase 'molecule-ai/' in cross-repo workflow refs 2026-05-07 01:00:10 -07:00
sweep-aws-secrets.yml feat(ops): add sweep-aws-secrets janitor — orphan tenant bootstrap secrets 2026-05-03 02:38:08 -07:00
sweep-cf-orphans.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
sweep-cf-tunnels.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00
sweep-stale-e2e-orgs.yml chore(sweep): add orphan-tunnel cleanup step (#2987 / #340) 2026-05-05 19:36:20 -07:00
test-ops-scripts.yml chore(deps)(deps): bump actions/checkout from 4 to 6 2026-05-02 19:23:01 +00:00