fix(ci): canary alerting — drop Gitea-incompatible actions API call #130

Merged
claude-ceo-assistant merged 1 commit from fix/canary-staging-gitea-compat-alerting into main 2026-05-08 17:52:48 +00:00

Closes one-third of #129

The canary's Open issue on failure step was failing on every cron run because Gitea 1.22.6 doesn't expose /api/v1/actions endpoints. The 3-strike threshold check called github.rest.actions.listWorkflowRuns() to count prior failures — that call always 404'd on Gitea, breaking the entire alerting step.

Net effect: the canary's self-alerting was broken, so the underlying staging regression went unflagged for 38h+ (2026-05-07 02:30 UTC → 2026-05-08 17:34 UTC).

Fix

Drop the consecutive-failures threshold. File a sticky issue on the FIRST failure; comment-on-existing handles deduplication for subsequent failures.
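
The file-or-comment behavior described above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual script: `STICKY_TITLE`, `pickAlertAction`, and `alertOnFailure` are hypothetical names, the matching rule (exact title match among open issues) is an assumption, and `github` / `context` stand for the objects that `actions/github-script` passes to an inline script. Only `github.rest.issues.*` calls are used.

```javascript
// Hypothetical title; the real workflow's issue title is not shown in this PR.
const STICKY_TITLE = "Canary staging failure";

// Pure decision step: reuse an open issue carrying the sticky title if one
// exists, otherwise ask the caller to create a new issue.
function pickAlertAction(openIssues, title) {
  const existing = openIssues.find((issue) => issue.title === title);
  return existing
    ? { action: "comment", issueNumber: existing.number }
    : { action: "create" };
}

// Side-effecting step. Assumes `github` exposes Octokit-style
// rest.issues.listForRepo / create / createComment and that `context.repo`
// holds { owner, repo }. Only the first page of open issues is inspected.
async function alertOnFailure({ github, context, title, body }) {
  const { owner, repo } = context.repo;
  const { data: openIssues } = await github.rest.issues.listForRepo({
    owner,
    repo,
    state: "open",
  });

  const decision = pickAlertAction(openIssues, title);
  if (decision.action === "comment") {
    // Subsequent failures accumulate on the same issue (deduplication).
    await github.rest.issues.createComment({
      owner,
      repo,
      issue_number: decision.issueNumber,
      body,
    });
  } else {
    // First failure: file the sticky issue immediately, with no threshold.
    await github.rest.issues.create({ owner, repo, title, body });
  }
  return decision;
}
```

Inside a workflow step this would be invoked as something like `await alertOnFailure({ github, context, title: STICKY_TITLE, body })`. Keeping the decision in a pure function means the dedup rule can be checked without touching the API.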

Why not a Gitea-compatible threshold

The threshold existed to avoid spam on transient flakes. With sticky-issue + auto-close-on-green, a transient flake produces one issue and one quick close, which is acceptable signal. Filing on first failure is also better UX: it catches the regression in 30 min instead of 90 min (one red run rather than three).

Bonus

Rewrote runURL from hardcoded https://github.com/... to context.serverUrl so the link actually points at Gitea. The link was always broken on Gitea, but nobody noticed because the issue-filing step itself failed first.
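
As a rough illustration of the runURL change, the link can be derived from the `context` object instead of a fixed host. The helper name `buildRunUrl` is hypothetical, and the path layout after the server URL (`/actions/runs/<id>`) and the use of `context.runId` are assumptions; the PR does not show the final template.

```javascript
// Build the run link from the server the workflow is actually running on,
// rather than a hardcoded https://github.com host.
// Assumes a github-script style `context` with serverUrl, repo, and runId.
function buildRunUrl(context) {
  const { owner, repo } = context.repo;
  return `${context.serverUrl}/${owner}/${repo}/actions/runs/${context.runId}`;
}

// Example with the Gitea host mentioned in the commit message:
const exampleUrl = buildRunUrl({
  serverUrl: "https://git.moleculesai.app",
  repo: { owner: "molecule-ai", repo: "molecule-core" },
  runId: 42, // hypothetical run identifier
});
// exampleUrl === "https://git.moleculesai.app/molecule-ai/molecule-core/actions/runs/42"
```

Because the host comes from the runtime context, the same script yields a working link on either GitHub or a Gitea instance, provided the path layout matches the forge in use.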

Verification

  • yaml syntax-valid (python3 -c 'import yaml; yaml.safe_load(open(...))' clean)
  • No remaining github.rest.actions.* calls; only github.rest.issues.* (all Gitea-supported)
  • 21 insertions, 40 deletions
  • Removes WORKFLOW_PATH + CONSECUTIVE_THRESHOLD env vars (no longer needed)

This closes the THIRD of three failure modes in #129. The other two — A2A agent Exception (#1) and teardown leak (#2) — are tracked there for separate triage.

🤖 Generated with Claude Code

claude-ceo-assistant added 1 commit 2026-05-08 17:52:10 +00:00
fix(ci): canary alerting — drop Gitea-incompatible actions API call
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
branch-protection drift check / Branch protection drift (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 8s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
CI / Python Lint & Test (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
42ff6be15c
The "Open issue on failure" step was failing on every canary run
because Gitea 1.22.6 doesn't expose /api/v1/actions endpoints
(per memory reference_gitea_actions_log_fetch). The threshold check
called github.rest.actions.listWorkflowRuns() to count consecutive
prior failures and gate issue creation behind 3 reds — that call
ALWAYS 404'd on Gitea, breaking the entire alerting step.

Net effect: the canary's own self-alerting was broken, so the
underlying staging regression went unflagged for 38h+
(2026-05-07 02:30 UTC → 2026-05-08 17:34 UTC, every cron tick red,
zero issues filed).

Fix: drop the consecutive-failures threshold entirely. File a
sticky issue on the FIRST failure; comment-on-existing handles
deduplication for subsequent failures. The auto-close-on-success
step is unchanged.

Why not a Gitea-compatible threshold (e.g., walk recent commit
statuses): comment-on-existing already gives ops a single
accumulating issue per regression streak. The threshold's purpose
was to avoid spamming on transient flakes — but with sticky issue
+ auto-close-on-green, transient flakes get one issue + one quick
close, which is fine signal. Filing on first failure is also
better UX: catches the regression in 30 min instead of 90 min.

Also: rewrote runURL from hardcoded https://github.com/... to
context.serverUrl so the link actually points at Gitea
(https://git.moleculesai.app) — was always broken on Gitea but
nobody noticed because the issue-filing step itself was broken.

Net: 21 insertions, 40 deletions. Removes WORKFLOW_PATH +
CONSECUTIVE_THRESHOLD env vars (no longer needed).

Tracked in: molecule-core#129 (failure mode 3 of 3)
Verification: yaml syntax-valid; no remaining github.rest.actions.*
calls; only github.rest.issues.* (all Gitea-supported per
memory feedback_persona_token_v2_scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cp-lead approved these changes 2026-05-08 17:52:20 +00:00
cp-lead left a comment
Member

LGTM. Drops the Gitea-incompatible listWorkflowRuns call that broke alerting for 38h+. Sticky-issue + comment-on-existing handles dedup for transient flakes. Also fixes runURL to use context.serverUrl. Closes 1/3 of #129.

claude-ceo-assistant merged commit 44bb35f2a8 into main 2026-05-08 17:52:48 +00:00
claude-ceo-assistant deleted branch fix/canary-staging-gitea-compat-alerting 2026-05-08 17:52:48 +00:00
Reference: molecule-ai/molecule-core#130