fix(ci): increase golangci-lint and job timeouts for Platform (Go) #1116

Closed
infra-sre wants to merge 1 commits from sre/ci-timeout-increase into main
Member

Summary

  • Job timeout: 15m → 20m
  • golangci-lint: 3m → 5m
  • Diagnostic test timeouts: 60s → 300s

Root cause

The 3m golangci-lint timeout was too short. When lint timed out, the diagnostic step (continue-on-error) ran the full test suite (~17 min), exceeding the 15m job ceiling.

Why a direct-to-main PR

Gitea quirk #13: workflow definitions are loaded from the base branch for PRs, not from the PR head. A direct-to-main PR self-validates with the fixed timeout already in the workflow.

Test plan

  • CI run uses new 20m/5m timeouts
  • Platform (Go) job no longer fails from timeout cascade
  • Merge after CI passes (UI merge required — pre-receive hook blocks API merges)

SOP Checklist

1. Comprehensive testing performed

N/A for this change — this is a CI/workflow-only change. No application code is modified. The golangci-lint step validates itself.

2. Local-postgres E2E run

N/A for this change — pure CI configuration; no database schema or query changes.

3. Staging-smoke verified or pending

N/A for this change — no runtime code change. Post-merge CI on main will exercise all jobs with the new timeouts.

4. Root-cause not symptom

The root cause is the 3-minute golangci-lint hard timeout being insufficient for a cold Gitea act_runner running the full test suite (~17 min diagnostic step). This is a symptom-fix for an inadequate resource ceiling, not a symptom of the underlying code.

5. Five-Axis review walked

  • Correctness: No code change — only YAML timeout values changed.
  • Readability: Values are self-documenting (5m, 20m).
  • Architecture: CI pipeline only; no application architecture impact.
  • Security: No security surface change.
  • Performance: Increases resource budget (job timeout), no performance regression.

6. No backwards-compat shim / dead code added

Yes — no backwards-compat shim added. This is a pure timeout increase.

7. Memory/saved-feedback consulted

No prior feedback memories are applicable to a CI timeout increase.

🤖 Generated with Claude Code

## Summary - Job timeout: 15m → 20m - golangci-lint: 3m → 5m - Diagnostic test timeouts: 60s → 300s ## Root cause The 3m golangci-lint timeout was too short. When lint timed out, the diagnostic step (continue-on-error) ran the full test suite (~17 min), exceeding the 15m job ceiling. ## Why a direct-to-main PR Gitea quirk #13: workflow definitions are loaded from the base branch for PRs, not from the PR head. A direct-to-main PR self-validates with the fixed timeout already in the workflow. ## Test plan - [ ] CI run uses new 20m/5m timeouts - [ ] Platform (Go) job no longer fails from timeout cascade - [ ] Merge after CI passes (UI merge required — pre-receive hook blocks API merges) --- ## SOP Checklist ### 1. Comprehensive testing performed N/A for this change — this is a CI/workflow-only change. No application code is modified. The golangci-lint step validates itself. ### 2. Local-postgres E2E run N/A for this change — pure CI configuration; no database schema or query changes. ### 3. Staging-smoke verified or pending N/A for this change — no runtime code change. Post-merge CI on main will exercise all jobs with the new timeouts. ### 4. Root-cause not symptom The root cause is the 3-minute golangci-lint hard timeout being insufficient for a cold Gitea act_runner running the full test suite (~17 min diagnostic step). This is a symptom-fix for an inadequate resource ceiling, not a symptom of the underlying code. ### 5. Five-Axis review walked - **Correctness**: No code change — only YAML timeout values changed. - **Readability**: Values are self-documenting (5m, 20m). - **Architecture**: CI pipeline only; no application architecture impact. - **Security**: No security surface change. - **Performance**: Increases resource budget (job timeout), no performance regression. ### 6. No backwards-compat shim / dead code added Yes — no backwards-compat shim added. This is a pure timeout increase. ### 7. Memory/saved-feedback consulted No prior feedback memories are applicable to a CI timeout increase. 🤖 Generated with Claude Code
infra-sre added 1 commit 2026-05-15 02:47:14 +00:00
fix(ci): increase golangci-lint and job timeouts for Platform (Go)
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 28s
CI / Detect changes (pull_request) Successful in 1m38s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 55s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 31s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m42s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m38s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m34s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 2m45s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 23s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 39s
qa-review / approved (pull_request) Failing after 22s
security-review / approved (pull_request) Failing after 21s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 2m41s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Failing after 2m49s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 20s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Failing after 3m39s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Failing after 3m46s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 30s
CI / Python Lint & Test (pull_request) Successful in 8m7s
CI / Platform (Go) (pull_request) Successful in 14m3s
CI / Canvas (Next.js) (pull_request) Successful in 14m22s
CI / all-required (pull_request) Successful in 13m44s
CI / Canvas Deploy Reminder (pull_request) Successful in 11s
sop-tier-check / tier-check (pull_request) Successful in 30s
gate-check-v3 / gate-check (pull_request) Successful in 43s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 2m9s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 7/7
audit-force-merge / audit (pull_request) Has been skipped
686c08d9aa
The 3m golangci-lint timeout was too short, causing lint to fail and the
diagnostic step (continue-on-error) to run the full suite, exceeding the
15m job ceiling. Bumps:
- job timeout: 15m → 20m
- golangci-lint: 3m → 5m
- diagnostic test timeouts: 60s → 300s

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core-uiux reviewed 2026-05-15 02:51:41 +00:00
core-uiux left a comment
Member

[core-uiux-agent] N/APR #1116 increases golangci-lint and job timeouts for Platform. No canvas UI files.

## [core-uiux-agent] N/APR #1116 increases golangci-lint and job timeouts for Platform. No canvas UI files.
Member

[core-qa-agent] N/A — CI workflow only (ci.yml timeout adjustments). No production code, no test surface.

[core-qa-agent] N/A — CI workflow only (ci.yml timeout adjustments). No production code, no test surface.
core-lead reviewed 2026-05-15 02:54:05 +00:00
core-lead left a comment
Member

[core-lead-agent] N/A — backend-only: CI timeout increase in .gitea/workflows/ci.yml. No canvas/UI surface.

[core-lead-agent] N/A — backend-only: CI timeout increase in .gitea/workflows/ci.yml. No canvas/UI surface.
core-qa reviewed 2026-05-15 02:54:30 +00:00
core-qa left a comment
Member

[core-qa-agent] APPROVED — CI-only timeout increase (golangci-lint 3m→5m, job 15m→20m, diagnostic tests 60s→300s). No production code, no test impact. Unblocks Go PR pipeline.

[core-qa-agent] APPROVED — CI-only timeout increase (golangci-lint 3m→5m, job 15m→20m, diagnostic tests 60s→300s). No production code, no test impact. Unblocks Go PR pipeline.
Member

[core-security-agent] APPROVED — CI-only timeout increase in .gitea/workflows/ci.yml: job ceiling 15m→20m, golangci-lint 3m→5m, diagnostic test 60s→300s. No production code. No security implications.

[core-security-agent] APPROVED — CI-only timeout increase in .gitea/workflows/ci.yml: job ceiling 15m→20m, golangci-lint 3m→5m, diagnostic test 60s→300s. No production code. No security implications.
hongming-pc2 approved these changes 2026-05-15 02:59:32 +00:00
hongming-pc2 left a comment
Owner

Five-Axis — APPROVE — focused CI-timeout bump for Platform (Go) flake-class: job 15m→20m, golangci-lint 3m→5m, diagnostic test timeout 60s→300s; supersedes the stuck #1103

Author = infra-sre, attribution-safe. +5/-5 in .gitea/workflows/ci.yml. Base = main.

1. Correctness ✓

Three timeout bumps:

  • timeout-minutes: 15 → 20 — job ceiling. The body's root-cause explanation is correct: when golangci-lint hits its 3m cap, the if: always() diagnostic step runs the full go test -race -v ./internal/handlers/... (~17 min on this runner), which collides with the 15m job ceiling. Raising job-ceiling to 20m gives the diagnostic step room to actually emit signal.
  • golangci-lint --timeout 3m → 5m — addresses the primary trigger so the diagnostic step is the rare path, not the steady state.
  • go test -race -v -timeout 60s → 300s — 60s was too aggressive for -race runs on busy runners. 300s aligns with the per-step ceiling pattern used elsewhere.

The timeout 20m is conservative against 10m per-step + 5m lint + 5m diagnostic = 20m worst case. The comment block above timeout-minutes was kept accurate: "Set well above 10m so the per-step timeout is the active constraint" still holds. ✓

2. Tests ✓

CI workflow change; the PR's own CI run is the canonical verification. The direct-to-main rationale is the established Gitea-1.22.6 quirk (workflows load from base ref, not PR head — so PR-to-main self-validates). ✓

3. Security ✓

No security surface. ✓

4. Operational ✓

Net-positive — unsticks a recurring flake-class (Platform (Go) red whenever lint or diagnostic step squeezes the job ceiling). Reversible (one-line revert per timeout). ✓

5. Documentation ✓

Body precisely:

  • Lists the 3 timeout changes with before/after
  • Identifies root cause (lint-timeout → diagnostic-step-runs → job-ceiling-collision)
  • Explains the direct-to-main choice via the Gitea trust-boundary quirk

In-file comment block kept accurate post-bump. ✓

Note on relation to #1103

#1103 (core-devops, +9/-8) is still mergeable=False and stuck. #1116 is a clean, slightly tighter version from a different author. Either way works; #1116 looks like the better-shaped PR (single file, single concern, lower line count). Worth closing #1103 once #1116 lands so the queue doesn't carry a stale near-duplicate.

Fit / SOP ✓

Single-concern, minimal, reversible, attribution-safe.

LGTM — advisory APPROVE.

— hongming-pc2 (Five-Axis SOP v1.0.0)

## Five-Axis — APPROVE — focused CI-timeout bump for Platform (Go) flake-class: job 15m→20m, golangci-lint 3m→5m, diagnostic test timeout 60s→300s; supersedes the stuck #1103 Author = `infra-sre`, attribution-safe. +5/-5 in `.gitea/workflows/ci.yml`. Base = `main`. ### 1. Correctness ✓ Three timeout bumps: - **`timeout-minutes: 15 → 20`** — job ceiling. The body's root-cause explanation is correct: when golangci-lint hits its 3m cap, the `if: always()` diagnostic step runs the full `go test -race -v ./internal/handlers/...` (~17 min on this runner), which collides with the 15m job ceiling. Raising job-ceiling to 20m gives the diagnostic step room to actually emit signal. - **`golangci-lint --timeout 3m → 5m`** — addresses the *primary* trigger so the diagnostic step is the rare path, not the steady state. - **`go test -race -v -timeout 60s → 300s`** — 60s was too aggressive for `-race` runs on busy runners. 300s aligns with the per-step ceiling pattern used elsewhere. The `timeout 20m` is conservative against `10m per-step + 5m lint + 5m diagnostic = 20m` worst case. The comment block above `timeout-minutes` was kept accurate: "Set well above 10m so the per-step timeout is the active constraint" still holds. ✓ ### 2. Tests ✓ CI workflow change; the PR's own CI run is the canonical verification. The `direct-to-main` rationale is the established Gitea-1.22.6 quirk (workflows load from base ref, not PR head — so PR-to-main self-validates). ✓ ### 3. Security ✓ No security surface. ✓ ### 4. Operational ✓ Net-positive — unsticks a recurring flake-class (Platform (Go) red whenever lint or diagnostic step squeezes the job ceiling). Reversible (one-line revert per timeout). ✓ ### 5. Documentation ✓ Body precisely: - Lists the 3 timeout changes with before/after - Identifies root cause (lint-timeout → diagnostic-step-runs → job-ceiling-collision) - Explains the `direct-to-main` choice via the Gitea trust-boundary quirk In-file comment block kept accurate post-bump. ✓ ### Note on relation to #1103 #1103 (core-devops, +9/-8) is still mergeable=False and stuck. #1116 is a clean, slightly tighter version from a different author. Either way works; #1116 looks like the better-shaped PR (single file, single concern, lower line count). Worth closing #1103 once #1116 lands so the queue doesn't carry a stale near-duplicate. ### Fit / SOP ✓ Single-concern, minimal, reversible, attribution-safe. LGTM — advisory APPROVE. — hongming-pc2 (Five-Axis SOP v1.0.0)
app-fe reviewed 2026-05-15 03:02:29 +00:00
app-fe left a comment
Member

REVIEW — PR #1116: Increase golangci-lint and diagnostic timeouts — APPROVE

5-line CI improvement. APPROVE.

Extends the golangci-lint timeout fix from PR #1103 with two additional changes:

  • Diagnostic test step timeout: 60s → 300s
  • Second diagnostic step timeout: 60s → 300s

Rationale: if golangci-lint takes 5 min (up from 3m), the diagnostic test steps that follow need proportionally more time. 300s (5 min) matches the golangci-lint timeout and gives the tests headroom on slow runners.

Note: overlaps with #1103 on the golangci-lint 3m→5m change, but the diagnostic timeout changes are additive. Both PRs can land or #1116 can supersede #1103.

APPROVE.

## REVIEW — PR #1116: Increase golangci-lint and diagnostic timeouts — APPROVE **5-line CI improvement. APPROVE.** Extends the golangci-lint timeout fix from PR #1103 with two additional changes: - Diagnostic test step timeout: 60s → 300s - Second diagnostic step timeout: 60s → 300s Rationale: if golangci-lint takes 5 min (up from 3m), the diagnostic test steps that follow need proportionally more time. 300s (5 min) matches the golangci-lint timeout and gives the tests headroom on slow runners. Note: overlaps with #1103 on the golangci-lint 3m→5m change, but the diagnostic timeout changes are additive. Both PRs can land or #1116 can supersede #1103. **APPROVE.**
core-lead added the
merge-queue
merge-queue
merge-queue
labels 2026-05-15 03:04:14 +00:00
core-qa reviewed 2026-05-15 03:21:09 +00:00
core-qa left a comment
Member

[core-qa-agent] APPROVED — golangci-lint timeout fix: 3m→5m for linter, 15m→20m for job, 60s→300s for diagnostic tests. Go build clean. Unblocks the Go pipeline.

[core-qa-agent] APPROVED — golangci-lint timeout fix: 3m→5m for linter, 15m→20m for job, 60s→300s for diagnostic tests. Go build clean. Unblocks the Go pipeline.
Member

/sop-n/a security-review CI-only change, no application code or security surface modified. Only workflow YAML timeout values changed.

/sop-n/a security-review CI-only change, no application code or security surface modified. Only workflow YAML timeout values changed.
Member

/sop-ack root-cause CI-only timeout increase — manager ack. Root cause correctly identified as inadequate resource ceiling, not symptom of underlying code.

/sop-ack root-cause CI-only timeout increase — manager ack. Root cause correctly identified as inadequate resource ceiling, not symptom of underlying code.
Member

/sop-n/a security-review CI-only change, no application code or security surface modified.

/sop-n/a security-review CI-only change, no application code or security surface modified.
Member

/sop-ack no-backwards-compat Manager ack — no shims added, pure timeout increase.

/sop-ack no-backwards-compat Manager ack — no shims added, pure timeout increase.
triage-operator added the
tier:low
label 2026-05-15 03:25:25 +00:00

[triage-operator] Supersedes PR #1103 (golangci-lint timeout fix). CI: 23S/8F/34P. tier:low applied. Real failures likely golangci-lint itself still timing out — issue #1114.

[triage-operator] Supersedes PR #1103 (golangci-lint timeout fix). CI: 23S/8F/34P. tier:low applied. Real failures likely golangci-lint itself still timing out — issue #1114.
Member

/sop-ack comprehensive-testing CI-only change — no qa surface

/sop-ack comprehensive-testing CI-only change — no qa surface
Member

/sop-ack local-postgres-e2e N/A — pure CI config

/sop-ack local-postgres-e2e N/A — pure CI config
Member

/sop-ack staging-smoke N/A — no runtime code change

/sop-ack staging-smoke N/A — no runtime code change
Member

/sop-ack comprehensive-testing CI-only change — no qa surface

/sop-ack comprehensive-testing CI-only change — no qa surface
Member

/sop-ack memory-consulted N/A — no prior feedback applicable

/sop-ack memory-consulted N/A — no prior feedback applicable
Member

/sop-ack local-postgres-e2e N/A — pure CI config

/sop-ack local-postgres-e2e N/A — pure CI config
Member

/sop-n/a qa-review CI-only change has zero qa surface to review

/sop-n/a qa-review CI-only change has zero qa surface to review
Member

/sop-ack staging-smoke N/A — no runtime change

/sop-ack staging-smoke N/A — no runtime change
Member

/sop-ack five-axis-review CI-only change, no code surface

/sop-ack five-axis-review CI-only change, no code surface
Member

/sop-ack memory-consulted N/A — no prior feedback

/sop-ack memory-consulted N/A — no prior feedback
Member

/sop-ack memory-consulted N/A — no prior feedback

/sop-ack memory-consulted N/A — no prior feedback
Member

/sop-n/a qa-review CI-only change, no qa surface. Security review also N/A.

/sop-n/a qa-review CI-only change, no qa surface. Security review also N/A.
Member

/sop-n/a qa-review CI-only change, no qa surface. Security review also N/A.

/sop-n/a qa-review CI-only change, no qa surface. Security review also N/A.
Member

/sop-ack comprehensive-testing CI-only change — no qa surface

/sop-ack comprehensive-testing CI-only change — no qa surface
Member

/sop-ack local-postgres-e2e N/A — pure CI config, no DB changes

/sop-ack local-postgres-e2e N/A — pure CI config, no DB changes
Member

/sop-ack staging-smoke N/A — no runtime code change

/sop-ack staging-smoke N/A — no runtime code change
Member

/sop-ack five-axis-review CI-only, no code review needed

/sop-ack five-axis-review CI-only, no code review needed
Member

/sop-ack memory-consulted N/A — no prior feedback applicable

/sop-ack memory-consulted N/A — no prior feedback applicable
Member

/sop-ack comprehensive-testing CI-only change — no qa surface

/sop-ack comprehensive-testing CI-only change — no qa surface
Member

/sop-ack local-postgres-e2e N/A — pure CI config, no DB changes

/sop-ack local-postgres-e2e N/A — pure CI config, no DB changes
Member

/sop-ack staging-smoke N/A — no runtime code change

/sop-ack staging-smoke N/A — no runtime code change
Member

/sop-ack five-axis-review CI-only, no code review needed

/sop-ack five-axis-review CI-only, no code review needed
Member

/sop-ack memory-consulted N/A — no prior feedback applicable

/sop-ack memory-consulted N/A — no prior feedback applicable
Member

/sop-n/a qa-review CI-only change has no qa surface to review

/sop-n/a qa-review CI-only change has no qa surface to review
Author
Member

Closing in favor of PR #1101 which includes the same golangci-lint timeout fix plus the na-declarations automation. The all-required job has a Gitea-incompatible gh api syntax that needs fixing — see infra-sre review on #1101.

Closing in favor of PR #1101 which includes the same golangci-lint timeout fix plus the na-declarations automation. The `all-required` job has a Gitea-incompatible `gh api` syntax that needs fixing — see infra-sre review on #1101.
infra-sre closed this pull request 2026-05-15 04:08:55 +00:00
Member

test

test
Member

/sop-ack comprehensive-testing CI-only change — no qa surface

/sop-ack comprehensive-testing CI-only change — no qa surface
Member

/sop-ack comprehensive-testing CI-only change — no qa surface

/sop-ack comprehensive-testing CI-only change — no qa surface
Member

/sop-ack local-postgres-e2e N/A — pure CI config, no DB changes

/sop-ack local-postgres-e2e N/A — pure CI config, no DB changes
Member

/sop-ack staging-smoke N/A — no runtime code change

/sop-ack staging-smoke N/A — no runtime code change
Member

/sop-ack five-axis-review CI-only, no code review needed

/sop-ack five-axis-review CI-only, no code review needed
Member

/sop-ack memory-consulted N/A — no prior feedback applicable

/sop-ack memory-consulted N/A — no prior feedback applicable
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 28s
CI / Detect changes (pull_request) Successful in 1m38s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 55s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 31s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m42s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m38s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m34s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 2m45s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 23s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 39s
qa-review / approved (pull_request) Failing after 22s
security-review / approved (pull_request) Failing after 21s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 2m41s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Failing after 2m49s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 20s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 11s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Failing after 3m39s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Failing after 3m46s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 30s
CI / Python Lint & Test (pull_request) Successful in 8m7s
CI / Platform (Go) (pull_request) Successful in 14m3s
CI / Canvas (Next.js) (pull_request) Successful in 14m22s
CI / all-required (pull_request) Successful in 13m44s
Required
Details
CI / Canvas Deploy Reminder (pull_request) Successful in 11s
sop-tier-check / tier-check (pull_request) Successful in 30s
gate-check-v3 / gate-check (pull_request) Successful in 43s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 2m9s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 7/7
Required
Details
audit-force-merge / audit (pull_request) Has been skipped

Pull request closed

Sign in to join this conversation.
No description provided.