fix(sre): add explicit 15s timeout to gate-check-v3 HTTP calls
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 11s
CI / Detect changes (pull_request) Successful in 26s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 26s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 25s
qa-review / approved (pull_request) Failing after 12s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 30s
security-review / approved (pull_request) Failing after 13s
gate-check-v3 / gate-check (pull_request) Successful in 20s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 28s
sop-tier-check / tier-check (pull_request) Successful in 13s
CI / Platform (Go) (pull_request) Successful in 6s
CI / Canvas (Next.js) (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 3s
audit-force-merge / audit (pull_request) Has been skipped

gate-check-v3 workflow is failing ~15s on main with
"Failing after 15s" (job-level failure). Root cause is likely
SOP_TIER_CHECK_TOKEN not provisioned (confirmed absent from repo
Actions secrets), causing unauthenticated API calls to hang or
fail. Without an explicit timeout, urllib uses the OS TCP default
(~60s on some platforms, ~15s on others).

Fix: add DEFAULT_TIMEOUT=15 to gate_check.py and pass it to all
urllib.request.urlopen() calls (api_get + comment POST/PATCH).
Cron step inline Python gets socket.setdefaulttimeout(15).

This is defense-in-depth. The real fix is provisioning
SOP_TIER_CHECK_TOKEN (tracked separately). The timeout ensures
a missing/invalid token causes a fast failure rather than
an indefinite hang that blocks the job for minutes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Molecule AI · infra-sre 2026-05-11 23:23:34 +00:00
parent 8bd3585f55
commit da1487adad
2 changed files with 10 additions and 4 deletions

View File

@ -72,7 +72,8 @@ jobs:
set -euo pipefail
# Fetch all open PRs and run gate-check on each
pr_numbers=$(python3 -c "
import urllib.request, json, os
import urllib.request, json, os, socket
socket.setdefaulttimeout(15)
token = os.environ['GITEA_TOKEN']
req = urllib.request.Request(
'https://git.moleculesai.app/api/v1/repos/${{ github.repository }}/pulls?state=open&limit=100',

View File

@ -35,6 +35,11 @@ GITEA_HOST = os.environ.get("GITEA_HOST", "git.moleculesai.app")
GITEA_TOKEN = os.environ.get("GITEA_TOKEN", os.environ.get("GITHUB_TOKEN", ""))
API_BASE = f"https://{GITEA_HOST}/api/v1"
# Timeout for all HTTP requests (seconds). Prevents indefinite hangs when the
# Gitea instance is slow or unreachable. Must be short enough that a timeout
# failure doesn't block the overall job beyond a reasonable window.
DEFAULT_TIMEOUT = 15
def api_get(path: str) -> dict | list:
url = f"{API_BASE}{path}"
@ -46,7 +51,7 @@ def api_get(path: str) -> dict | list:
},
)
try:
with urllib.request.urlopen(req) as r:
with urllib.request.urlopen(req, timeout=DEFAULT_TIMEOUT) as r:
return json.loads(r.read())
except urllib.error.HTTPError as e:
body = e.read().decode(errors="replace")
@ -521,12 +526,12 @@ def run(repo: str, pr_number: int, post_comment: bool = False) -> dict:
comment_id = our_comments[-1]["id"]
url = f"{API_BASE}/repos/{owner}/{name}/issues/comments/{comment_id}"
req = urllib.request.Request(url, data=json.dumps({"body": comment_body}).encode(), headers=headers, method="PATCH")
with urllib.request.urlopen(req) as r:
with urllib.request.urlopen(req, timeout=DEFAULT_TIMEOUT) as r:
r.read()
else:
url = f"{API_BASE}/repos/{owner}/{name}/issues/{pr_number}/comments"
req = urllib.request.Request(url, data=json.dumps({"body": comment_body}).encode(), headers=headers, method="POST")
with urllib.request.urlopen(req) as r:
with urllib.request.urlopen(req, timeout=DEFAULT_TIMEOUT) as r:
r.read()
except urllib.error.HTTPError as e:
if e.code == 403: