fix(runbooks): correct Gitea runner network/fetch timing facts
SRE review of PR #457 flagged two factual errors that were not addressed before merge. Applying corrections directly per SRE mandate: no manual production changes without config-as-code. Corrections: 1. Remove "git fetch --depth=1 times out" — shallow fetch succeeds in ~16s per PR #441 detect-changes evidence. Only fetch-depth:0 and git clone time out due to ~75MB repo history size. 2. Rewrite "runner cannot reach git remote" to accurately state: runner CAN reach the remote; fetching full compressed history exceeds the ~15s network timeout window. Repo-size constraint, not network isolation. 3. Updated diagnosis snippet and verification section to match. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
6403c5196f
commit
58be7b29ac
@ -8,36 +8,36 @@ runbooks.
|
||||
|
||||
---
|
||||
|
||||
## Gitea 1.22.6 runner network isolation
|
||||
## Large repo causes fetch timeout on Gitea Actions runner
|
||||
|
||||
### Finding
|
||||
|
||||
The Gitea Actions runner (container on host `5.78.80.188`) cannot reach the
|
||||
git remote (`https://git.moleculesai.app`) over HTTPS from inside the runner
|
||||
container. Any `git fetch`, `git clone`, or `git push` command that contacts
|
||||
the remote times out at 12–15 s.
|
||||
The Gitea Actions runner (container on host `5.78.80.188`) can reach the git
|
||||
remote (`https://git.moleculesai.app`) over HTTPS — a single-commit shallow
|
||||
fetch (`--depth=1`) succeeds in ~16 s. However, fetching the **full compressed
|
||||
repo history** (~75+ MB) exceeds the runner's network timeout window (~15 s).
|
||||
|
||||
This is **not a Gitea Actions bug** — it is an operator-level network policy
|
||||
where the runner container's network namespace is restricted from reaching the
|
||||
Gitea host HTTPS endpoint. The runner can reach external hosts (GitHub,
|
||||
Docker Hub, PyPI) normally.
|
||||
This is **not a Gitea Actions bug** and **not a network isolation policy** —
|
||||
it is a repo-size constraint. The runner can reach external hosts (GitHub,
|
||||
Docker Hub, PyPI) without issue.
|
||||
|
||||
### Impact
|
||||
|
||||
Workflows that rely on `git fetch origin <ref>` or `actions/checkout` with
|
||||
`fetch-depth: 0` (full history) will hang or time out.
|
||||
Workflows that rely on `actions/checkout` with `fetch-depth: 0` (full history)
|
||||
or `git clone` will time out.
|
||||
|
||||
Specifically:
|
||||
- `actions/checkout@v*` with `fetch-depth: 0` hangs (fetching full repo
|
||||
history takes >30 s before hitting the timeout).
|
||||
- `git fetch origin main --depth=1` times out at ~15 s.
|
||||
- `git clone <url>` times out at ~15 s.
|
||||
history takes >15 s before hitting the timeout).
|
||||
- `git clone <url>` hangs for the same reason.
|
||||
- `git fetch origin <ref> --depth=1` **succeeds** in ~16 s — this is the
|
||||
working pattern.
|
||||
|
||||
### Affected workflows
|
||||
|
||||
| Workflow | Issue | Workaround |
|
||||
|---|---|---|
|
||||
| `harness-replays.yml` detect-changes job | `git fetch origin main --depth=1` times out | Added `timeout 20` + graceful fallback to `run=true` (always run harness) per PR #441 |
|
||||
| `harness-replays.yml` detect-changes job | `fetch-depth: 0` + `git clone` time out | Added `timeout 20 git fetch origin base.ref --depth=1` + `continue-on-error: true` + fallback to `run=true` per PR #441 |
|
||||
| `publish-workspace-server-image.yml` | In-image `git clone` of workspace templates | Pre-clone manifest deps before compose build (Task #173 pattern) |
|
||||
| Any workflow using `fetch-depth: 0` | Full history fetch times out | Use `fetch-depth: 1` + explicit `git fetch` for needed refs |
|
||||
|
||||
@ -46,15 +46,17 @@ Specifically:
|
||||
```bash
|
||||
# From inside the runner (add as a debug step):
|
||||
timeout 20 git fetch origin main --depth=1
|
||||
# If this times out: runner cannot reach git remote
|
||||
# If this SUCCEEDS (~16s): runner can reach the git remote — the repo is
|
||||
# too large for full-history fetch.
|
||||
# If this times out: true network isolation (unlikely; check firewall rules).
|
||||
```
|
||||
|
||||
### Verification
|
||||
|
||||
Confirmed 2026-05-11 by running `timeout 20 git fetch origin main --depth=1`
|
||||
in the `detect-changes` job of `harness-replays.yml` — consistently times
|
||||
out at 15 s. Runner can reach `https://api.github.com` and `https://pypi.org`
|
||||
without issue.
|
||||
Confirmed 2026-05-11 by running `timeout 20 git fetch origin base.ref --depth=1`
|
||||
in the `detect-changes` job of `harness-replays.yml` — **succeeds in ~16 s**.
|
||||
Runner can reach `https://api.github.com` and `https://pypi.org` without issue,
|
||||
confirming this is a repo-size constraint, not network isolation.
|
||||
|
||||
### References
|
||||
|
||||
|
||||
Loading…
Reference in New Issue
Block a user