diff --git a/runbooks/gitea-operational-quirks.md b/runbooks/gitea-operational-quirks.md
index 43c0dbaa..be4bafa6 100644
--- a/runbooks/gitea-operational-quirks.md
+++ b/runbooks/gitea-operational-quirks.md
@@ -8,57 +8,53 @@ runbooks.
 
 ---
 
-## Gitea 1.22.6 runner network isolation
+## Large repo causes fetch timeout on Gitea Actions runner
 
 ### Finding
 
-The Gitea Actions runner (container on host `5.78.80.188`) cannot reach the
-git remote (`https://git.moleculesai.app`) over HTTPS from inside the runner
-container. Any `git fetch`, `git clone`, or `git push` command that contacts
-the remote times out at 12–15 s.
+The Gitea Actions runner (container on host `5.78.80.188`) CAN reach the git
+remote (`https://git.moleculesai.app`) over HTTPS. A single-commit shallow fetch
+(`--depth=1`) succeeds in ~16 s. However, fetching the **full compressed repo
+history** (~75+ MB) exceeds the runner's network timeout window (~15 s).
 
-This is **not a Gitea Actions bug** — it is an operator-level network policy
-where the runner container's network namespace is restricted from reaching the
-Gitea host HTTPS endpoint. The runner can reach external hosts (GitHub,
-Docker Hub, PyPI) normally.
+This is **not a Gitea Actions bug** and **not a network isolation policy** —
+it is a repo-size constraint. The runner can reach external hosts (GitHub,
+Docker Hub, PyPI) without issue.
 
 ### Impact
 
-Workflows that rely on `git fetch origin ` or `actions/checkout` with
-`fetch-depth: 0` (full history) will hang or time out.
-
-Specifically:
-- `actions/checkout@v*` with `fetch-depth: 0` hangs (fetching full repo
-  history takes >30 s before hitting the timeout).
-- `git fetch origin main --depth=1` times out at ~15 s.
-- `git clone ` times out at ~15 s.
+Workflows that rely on `actions/checkout` with `fetch-depth: 0` (full history)
+or `git clone` will time out. The `git fetch origin --depth=1` pattern
+succeeds reliably.
 
 ### Affected workflows
 
-| Workflow | Issue | Workaround |
+| Workflow | Issue | Fix |
 |---|---|---|
-| `harness-replays.yml` detect-changes job | `git fetch origin main --depth=1` times out | Added `timeout 20` + graceful fallback to `run=true` (always run harness) per PR #441 |
+| `harness-replays.yml` detect-changes | `fetch-depth: 1` shallow clone + `git fetch origin $BASE --depth=1` times out | Use Gitea Compare API (Gitea→Gitea, no runner network needed) — **primary fix** (PR #476) |
 | `publish-workspace-server-image.yml` | In-image `git clone` of workspace templates | Pre-clone manifest deps before compose build (Task #173 pattern) |
-| Any workflow using `fetch-depth: 0` | Full history fetch times out | Use `fetch-depth: 1` + explicit `git fetch` for needed refs |
+| Any workflow using `fetch-depth: 0` | Full history fetch times out | Use `fetch-depth: 1` + Compare API for changed-file detection |
 
 ### How to diagnose
 
 ```bash
 # From inside the runner (add as a debug step):
 timeout 20 git fetch origin main --depth=1
-# If this times out: runner cannot reach git remote
+# If this SUCCEEDS (~16s): runner can reach the git remote — repo is too large for full-history fetch.
+# If this times out: true network isolation (check firewall rules).
 ```
 
 ### Verification
 
-Confirmed 2026-05-11 by running `timeout 20 git fetch origin main --depth=1`
-in the `detect-changes` job of `harness-replays.yml` — consistently times
-out at 15 s. Runner can reach `https://api.github.com` and `https://pypi.org`
-without issue.
+Confirmed 2026-05-11 by running `timeout 20 git fetch origin base.ref --depth=1`
+in the `detect-changes` job of `harness-replays.yml` — **succeeds in ~16 s**.
+Runner can reach `https://api.github.com` and `https://pypi.org` without issue,
+confirming this is a repo-size constraint, not network isolation.
 
 ### References
 
-- PR #441: fix for `harness-replays.yml` detect-changes
+- PR #476: **primary fix** — use Gitea Compare API instead of git fetch/diff
+- PR #441: legacy timeout+fallback fix (now superseded by PR #476)
 - Task #173: pre-clone manifest deps pattern for compose build
 - internal#102: tracking customer-private + marketplace third-party repos
 - `feedback_oss_first_repo_visibility_default`: 5 workspace-template repos
@@ -87,7 +83,7 @@ exits with code 0 (e.g., append `|| true` to commands that might fail).
 
 | Workflow | Fix |
 |---|---|
-| `harness-replays.yml` detect-changes | Added `continue-on-error: true` to fetch step + decide step; added `|| true` to `DIFF=$(git diff ...)` per PR #441 |
+| `harness-replays.yml` detect-changes | Added `continue-on-error: true` to fetch step + decide step; replaced git diff with Compare API per PR #476 |
 
 ### How to diagnose
 
@@ -111,7 +107,7 @@ jobs:
 ### References
 
 - Gitea Actions quirk #10 (from migration checklist)
-- PR #441: fix applied to `harness-replays.yml`
+- PR #476: Compare API fix applied to `harness-replays.yml`
 
 ---
 
@@ -144,7 +140,10 @@ files. Secrets and variables are repo-level.
 `actions/checkout` with `fetch-depth: 0` triggers a full repo history fetch
 which exceeds the runner's network timeout to the git remote (~15 s).
 
-**Workaround**: Use `fetch-depth: 1` (default) and add explicit
+**Primary fix**: Use `fetch-depth: 1` + Gitea Compare API for changed-file
+detection (PR #476). This avoids git network calls entirely.
+
+**Legacy workaround**: Use `fetch-depth: 1` (default) and add explicit
 `git fetch origin --depth=1` for any additional refs needed.
 
-**Reference**: PR #441 detect-changes fetch step.
+**Reference**: PR #476 (Compare API fix), PR #441 (legacy timeout+fallback).
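
The shallow-fetch probe from the "How to diagnose" section can be made more explicit by branching on the exit status instead of reading the log by eye. The sketch below is illustrative only: `classify_fetch_status` is a hypothetical helper name, not something defined in the runbook or in PR #476. It assumes GNU `timeout`, which exits with status 124 when the time limit is hit; any other non-zero status is treated as a git-level failure unrelated to the time limit.

```bash
#!/usr/bin/env bash
# Sketch: map the exit status of `timeout 20 git fetch ...` to a short label.
# classify_fetch_status is a hypothetical helper, not part of any existing workflow.

classify_fetch_status() {
  local status="$1"
  case "$status" in
    0)   echo "reachable" ;;   # fetch completed within the time limit
    124) echo "timed-out" ;;   # GNU timeout killed the command at the limit
    *)   echo "failed" ;;      # git itself failed (auth, DNS, bad ref, ...)
  esac
}

# Possible use in a debug step; written so it also works under `bash -e`:
#   rc=0
#   timeout 20 git fetch origin main --depth=1 || rc=$?
#   classify_fetch_status "$rc"
```

Read against the findings in this change, `reachable` points to the repo-size constraint (the remote is reachable, only the full-history fetch is too slow), while `timed-out` suggests checking firewall rules for true network isolation.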