chore(canary): workflow_dispatch input keep_on_failure for log capture #132
Why
Investigating molecule-core#129 failure mode #1 (the chronic-red claude-code "Agent error (Exception)" the canary keeps catching) needs the workspace container's docker logs to find the actual exception. The canary tears down the tenant on every failure, so the workspace container is destroyed before anyone can SSM in.
What this changes
Adds a workflow_dispatch input keep_on_failure: bool (default false). When true, it sets E2E_KEEP_ORG=1 for the canary script, whose existing debug path skips teardown, leaving the tenant + EC2 + CF tunnel + DNS alive. The operator can then SSM into the workspace EC2 and capture docker logs from the claude-code container.
Cron-triggered runs never set the input (it only exists on dispatch), so unattended scheduled canaries always tear down. There is no risk of an unattended cost leak.
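
A minimal sketch of how the input and the environment mapping described above could be wired in canary-staging.yml. Only keep_on_failure, E2E_KEEP_ORG, and the 'true' / '0' comparison come from this PR; the cron schedule, job name, runner label, and script path are hypothetical placeholders, and the real workflow may be laid out differently.

```yaml
# Sketch only: everything marked "hypothetical" is an assumption, not the actual workflow.
on:
  schedule:
    - cron: "*/30 * * * *"        # hypothetical schedule
  workflow_dispatch:
    inputs:
      keep_on_failure:
        description: "Skip teardown on failure so logs can be captured"
        type: boolean
        default: false

jobs:
  canary:                          # hypothetical job name
    runs-on: ubuntu-latest         # hypothetical runner
    env:
      # github.event.inputs values are strings. On cron runs the input does not
      # exist, so the comparison is false and the expression yields '0'.
      E2E_KEEP_ORG: ${{ github.event.inputs.keep_on_failure == 'true' && '1' || '0' }}
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/canary.sh   # hypothetical path to the canary script
```

The `cond && '1' || '0'` form is the usual stand-in for a ternary in workflow expressions. It is safe here because '1' is a non-empty string and therefore truthy, so the '0' fallback is reached only when the comparison is false.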
Operator workflow
1. Dispatch canary-staging.yml with keep_on_failure=true
2. Note the SLUG/TENANT_URL printed at step 1/11
3. SSM into the workspace EC2 and run docker logs <claude-code-container> to find the actual exception traceback
4. DELETE /cp/admin/tenants/<slug> when done (the script logs this reminder on the E2E_KEEP_ORG=1 path)
Verification
When the input is not set, as on cron-triggered runs, github.event.inputs.keep_on_failure resolves to empty (not 'true'), so the ternary lands on the '0' branch and the no-leak behavior is preserved.
Refs:
molecule-core#129 (canary investigation, failure mode #1).
🤖 Generated with Claude Code
LGTM. workflow_dispatch input only — no cron impact, no unattended leak risk. Unblocks live log capture for #129.