docs(runbooks): update gitea-operational-quirks with Compare API as primary fix
All checks were successful
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 19s
CI / Detect changes (pull_request) Successful in 1m17s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m19s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m20s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 20s
sop-tier-check / tier-check (pull_request) Successful in 22s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m10s
CI / Platform (Go) (pull_request) Successful in 10s
CI / Canvas (Next.js) (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 11s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 10s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 7s
Add SRE's empirical corrections (PR #478): shallow fetch succeeds ~16s, runner CAN reach git.moleculesai.app, full-history fetch times out due to ~75MB repo size (not network isolation).

Also add Compare API (PR #476) as the primary recommended fix for detect-changes git-fetch timeout, superseding the legacy timeout+fallback approach documented in PR #441.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent e7db2d3344
commit e05db75bb8
@@ -8,57 +8,53 @@ runbooks.
 
 ---
 
-## Gitea 1.22.6 runner network isolation
+## Large repo causes fetch timeout on Gitea Actions runner
 
 ### Finding
 
-The Gitea Actions runner (container on host `5.78.80.188`) cannot reach the
-git remote (`https://git.moleculesai.app`) over HTTPS from inside the runner
-container. Any `git fetch`, `git clone`, or `git push` command that contacts
-the remote times out at 12–15 s.
+The Gitea Actions runner (container on host `5.78.80.188`) CAN reach the git
+remote (`https://git.moleculesai.app`) over HTTPS. A single-commit shallow fetch
+(`--depth=1`) succeeds in ~16 s. However, fetching the **full compressed repo
+history** (~75+ MB) exceeds the runner's network timeout window (~15 s).
 
-This is **not a Gitea Actions bug** — it is an operator-level network policy
-where the runner container's network namespace is restricted from reaching the
-Gitea host HTTPS endpoint. The runner can reach external hosts (GitHub,
-Docker Hub, PyPI) normally.
+This is **not a Gitea Actions bug** and **not a network isolation policy** —
+it is a repo-size constraint. The runner can reach external hosts (GitHub,
+Docker Hub, PyPI) without issue.
 
 ### Impact
 
-Workflows that rely on `git fetch origin <ref>` or `actions/checkout` with
-`fetch-depth: 0` (full history) will hang or time out.
-
-Specifically:
-- `actions/checkout@v*` with `fetch-depth: 0` hangs (fetching full repo
-history takes >30 s before hitting the timeout).
-- `git fetch origin main --depth=1` times out at ~15 s.
-- `git clone <url>` times out at ~15 s.
+Workflows that rely on `actions/checkout` with `fetch-depth: 0` (full history)
+or `git clone` will time out. The `git fetch origin <ref> --depth=1` pattern
+succeeds reliably.
 
 ### Affected workflows
 
-| Workflow | Issue | Workaround |
+| Workflow | Issue | Fix |
 |---|---|---|
-| `harness-replays.yml` detect-changes job | `git fetch origin main --depth=1` times out | Added `timeout 20` + graceful fallback to `run=true` (always run harness) per PR #441 |
+| `harness-replays.yml` detect-changes | `fetch-depth: 1` shallow clone + `git fetch origin $BASE --depth=1` times out | Use Gitea Compare API (Gitea→Gitea, no runner network needed) — **primary fix** (PR #476) |
 | `publish-workspace-server-image.yml` | In-image `git clone` of workspace templates | Pre-clone manifest deps before compose build (Task #173 pattern) |
-| Any workflow using `fetch-depth: 0` | Full history fetch times out | Use `fetch-depth: 1` + explicit `git fetch` for needed refs |
+| Any workflow using `fetch-depth: 0` | Full history fetch times out | Use `fetch-depth: 1` + Compare API for changed-file detection |
 
 ### How to diagnose
 
 ```bash
 # From inside the runner (add as a debug step):
 timeout 20 git fetch origin main --depth=1
-# If this times out: runner cannot reach git remote
+# If this SUCCEEDS (~16s): runner can reach the git remote — repo is too large for full-history fetch.
+# If this times out: true network isolation (check firewall rules).
 ```
 
 ### Verification
 
-Confirmed 2026-05-11 by running `timeout 20 git fetch origin main --depth=1`
-in the `detect-changes` job of `harness-replays.yml` — consistently times
-out at 15 s. Runner can reach `https://api.github.com` and `https://pypi.org`
-without issue.
+Confirmed 2026-05-11 by running `timeout 20 git fetch origin base.ref --depth=1`
+in the `detect-changes` job of `harness-replays.yml` — **succeeds in ~16 s**.
+Runner can reach `https://api.github.com` and `https://pypi.org` without issue,
+confirming this is a repo-size constraint, not network isolation.
 
 ### References
 
-- PR #441: fix for `harness-replays.yml` detect-changes
+- PR #476: **primary fix** — use Gitea Compare API instead of git fetch/diff
+- PR #441: legacy timeout+fallback fix (now superseded by PR #476)
 - Task #173: pre-clone manifest deps pattern for compose build
 - internal#102: tracking customer-private + marketplace third-party repos
 - `feedback_oss_first_repo_visibility_default`: 5 workspace-template repos
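Note on the diagnostic step in the hunk above: the outcomes it distinguishes can be told apart by exit status rather than by watching the clock. The following is a minimal sketch, not part of the commit. It assumes GNU coreutils `timeout`, which exits with status 124 when the limit is hit, and `classify_fetch` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the commit): run a command under a time limit
# and report which diagnostic case applies.
# Assumes GNU coreutils `timeout`, which exits 124 when the limit is reached.
classify_fetch() {
  local limit="$1"
  shift
  local rc=0
  # Discard the command's output; only the exit status matters here.
  timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0)   echo "ok" ;;       # command finished in time: remote is reachable
    124) echo "timeout" ;;  # limit hit: suspect isolation, check firewall rules
    *)   echo "error" ;;    # failed for another reason (bad ref, auth, ...)
  esac
}
```

As a debug step this could be called as `classify_fetch 20 git fetch origin main --depth=1`. The third case matters because a bad ref or an auth failure would otherwise be misread as one of the two cases the runbook names.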
@@ -87,7 +83,7 @@ exits with code 0 (e.g., append `|| true` to commands that might fail).
 
 | Workflow | Fix |
 |---|---|
-| `harness-replays.yml` detect-changes | Added `continue-on-error: true` to fetch step + decide step; added `|| true` to `DIFF=$(git diff ...)` per PR #441 |
+| `harness-replays.yml` detect-changes | Added `continue-on-error: true` to fetch step + decide step; replaced git diff with Compare API per PR #476 |
 
 ### How to diagnose
 
@@ -111,7 +107,7 @@ jobs:
 ### References
 
 - Gitea Actions quirk #10 (from migration checklist)
-- PR #441: fix applied to `harness-replays.yml`
+- PR #476: Compare API fix applied to `harness-replays.yml`
 
 ---
 
@@ -144,7 +140,10 @@ files. Secrets and variables are repo-level.
 `actions/checkout` with `fetch-depth: 0` triggers a full repo history fetch
 which exceeds the runner's network timeout to the git remote (~15 s).
 
-**Workaround**: Use `fetch-depth: 1` (default) and add explicit
+**Primary fix**: Use `fetch-depth: 1` + Gitea Compare API for changed-file
+detection (PR #476). This avoids git network calls entirely.
+
+**Legacy workaround**: Use `fetch-depth: 1` (default) and add explicit
 `git fetch origin <ref> --depth=1` for any additional refs needed.
 
-**Reference**: PR #441 detect-changes fetch step.
+**Reference**: PR #476 (Compare API fix), PR #441 (legacy timeout+fallback).
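Note on the "Primary fix" text: the diff names the Gitea Compare API but does not show a call. Below is a rough sketch of what changed-file detection through that API might look like; it is not the code from PR #476. Assumptions: `curl` and `jq` are available on the runner; `GITEA_URL`, `OWNER`, `REPO` and `GITEA_TOKEN` are placeholder variables; the response exposes file names under `.commits[].files[].filename`, which should be checked against the API reference for the Gitea version in use. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: names and response shape are assumptions, not taken from PR #476.

# Print the files changed between two refs, one per line, using the Gitea
# Compare API instead of git fetch/diff.
# Assumes GITEA_URL, OWNER, REPO and GITEA_TOKEN are set by the workflow, and
# that the JSON response has .commits[].files[].filename (verify per version).
list_changed_files() {
  local base="$1" head="$2"
  curl -fsS --max-time 20 \
    -H "Authorization: token ${GITEA_TOKEN}" \
    "${GITEA_URL}/api/v1/repos/${OWNER}/${REPO}/compare/${base}...${head}" |
    jq -r '[.commits[]?.files[]?.filename] | unique | .[]'
}

# Read changed paths on stdin and print run=true if any path starts with one
# of the prefixes given as arguments, otherwise run=false.
decide_run() {
  local path prefix
  while IFS= read -r path; do
    for prefix in "$@"; do
      if [[ "$path" == "$prefix"* ]]; then
        echo "run=true"
        return 0
      fi
    done
  done
  echo "run=false"
}
```

A step might wire the two together as `list_changed_files "$BASE" "$HEAD" | decide_run harness/`, where `harness/` is a placeholder prefix. In this sketch a failed API call produces an empty list and therefore `run=false`; a real step would likely check the exit status of `curl` and fall back to `run=true`, as the PR #441 approach did.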