molecule-core/.github
dev-lead 5c0c15eb4f
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 4s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 10s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 3s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 12s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 2s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 2s
CI / Detect changes (pull_request) Successful in 11s
branch-protection drift check / Branch protection drift (pull_request) Successful in 14s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
chore(canary): workflow_dispatch input keep_on_failure for log capture
Investigating molecule-core#129 failure mode #1 (claude-code "Agent
error (Exception)") needs the workspace's docker logs to find the
actual exception. The canary tears down the tenant on every failure,
so the workspace container is destroyed before anyone can SSM in.

Add a workflow_dispatch input `keep_on_failure: bool` (default false).
When true, sets `E2E_KEEP_ORG=1` for the canary script — its existing
debug path skips teardown, leaving the tenant + EC2 + CF tunnel + DNS
alive. Operator can then SSM into the workspace EC2 (via the same
flow as recover-tunnels.py) and capture `docker logs` from the
claude-code container.

Cron-triggered runs never set the input (it only exists on dispatch),
so unattended scheduled canaries always tear down — no risk of
unattended cost leak.

Operator workflow:
  1. Dispatch canary-staging.yml with keep_on_failure=true
  2. Watch CI; on failure (likely, given the 38h chronic red),
     note the SLUG / TENANT_URL printed at step 1/11
  3. SSM exec into the workspace EC2 (us-east-2) and run
     `docker logs <claude-code-container>` to find the actual
     exception traceback
  4. Manually delete via DELETE /cp/admin/tenants/<slug> when done
     (the script logs this reminder on E2E_KEEP_ORG=1 path)

Refs: molecule-core#129 (canary investigation)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:58:19 -07:00
..
scripts fix(scripts): migrate ghcr.io→ECR + raw.githubusercontent.com→Gitea (#46) 2026-05-07 00:56:23 -07:00
workflows chore(canary): workflow_dispatch input keep_on_failure for log capture 2026-05-08 10:58:19 -07:00
CODEOWNERS chore: add CODEOWNERS to auto-route agent PRs to personal review account 2026-04-26 13:40:13 -07:00
dependabot.yml chore(security): pin Actions to SHAs + enable Dependabot auto-bumps 2026-04-28 15:37:06 -07:00