docs(runbooks): update gitea-operational-quirks with Compare API as primary fix

Add SRE's empirical corrections (PR #478): shallow fetch succeeds in ~16 s,
the runner CAN reach git.moleculesai.app, and full-history fetch times out
due to the ~75 MB repo size (not network isolation).

Also add Compare API (PR #476) as the primary recommended fix for
detect-changes git-fetch timeout, superseding the legacy timeout+fallback
approach documented in PR #441.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Molecule AI · core-devops 2026-05-11 13:13:27 +00:00
parent e7db2d3344
commit e05db75bb8


@@ -8,57 +8,53 @@ runbooks.
---
## Gitea 1.22.6 runner network isolation
## Large repo causes fetch timeout on Gitea Actions runner
### Finding
The Gitea Actions runner (container on host `5.78.80.188`) cannot reach the
git remote (`https://git.moleculesai.app`) over HTTPS from inside the runner
container. Any `git fetch`, `git clone`, or `git push` command that contacts
the remote times out at 12 to 15 s.
The Gitea Actions runner (container on host `5.78.80.188`) CAN reach the git
remote (`https://git.moleculesai.app`) over HTTPS. A single-commit shallow fetch
(`--depth=1`) succeeds in ~16 s. However, fetching the **full compressed repo
history** (~75+ MB) exceeds the runner's network timeout window (~15 s).
This is **not a Gitea Actions bug** — it is an operator-level network policy
where the runner container's network namespace is restricted from reaching the
Gitea host HTTPS endpoint. The runner can reach external hosts (GitHub,
Docker Hub, PyPI) normally.
This is **not a Gitea Actions bug** and **not a network isolation policy**;
it is a repo-size constraint. The runner can reach external hosts (GitHub,
Docker Hub, PyPI) without issue.
### Impact
Workflows that rely on `git fetch origin <ref>` or `actions/checkout` with
`fetch-depth: 0` (full history) will hang or time out.
Specifically:
- `actions/checkout@v*` with `fetch-depth: 0` hangs (fetching full repo
history takes >30 s before hitting the timeout).
- `git fetch origin main --depth=1` times out at ~15 s.
- `git clone <url>` times out at ~15 s.
Workflows that rely on `actions/checkout` with `fetch-depth: 0` (full history)
or `git clone` will time out. The `git fetch origin <ref> --depth=1` pattern
succeeds reliably.
### Affected workflows
| Workflow | Issue | Workaround |
| Workflow | Issue | Fix |
|---|---|---|
| `harness-replays.yml` detect-changes job | `git fetch origin main --depth=1` times out | Added `timeout 20` + graceful fallback to `run=true` (always run harness) per PR #441 |
| `harness-replays.yml` detect-changes | `fetch-depth: 1` shallow clone + `git fetch origin $BASE --depth=1` times out | Use Gitea Compare API (Gitea→Gitea, no runner network needed) — **primary fix** (PR #476) |
| `publish-workspace-server-image.yml` | In-image `git clone` of workspace templates | Pre-clone manifest deps before compose build (Task #173 pattern) |
| Any workflow using `fetch-depth: 0` | Full history fetch times out | Use `fetch-depth: 1` + explicit `git fetch` for needed refs |
| Any workflow using `fetch-depth: 0` | Full history fetch times out | Use `fetch-depth: 1` + Compare API for changed-file detection |
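
The Compare API approach named in the table can be sketched roughly as follows. This is an illustration, not the code from PR #476: the environment variables `GITEA_URL`, `REPO`, and `GITEA_TOKEN`, the function names, and the watched path prefixes are all hypothetical. It also assumes that Gitea's `GET /api/v1/repos/{owner}/{repo}/compare/{basehead}` response carries per-commit `files[].filename` entries, and that `curl` and `jq` are available on the runner.

```bash
#!/usr/bin/env bash
# Sketch only: names and env vars below are assumptions, not taken from PR #476.

# Print one changed filename per line for base...head using the Compare API.
# Returns non-zero if the HTTP call fails, so callers can fall back.
changed_files() {
  local base="$1" head="$2" body
  body="$(curl --fail --silent --show-error --max-time 20 \
    -H "Authorization: token ${GITEA_TOKEN}" \
    "${GITEA_URL}/api/v1/repos/${REPO}/compare/${base}...${head}")" || return 1
  # Assumed response shape: .commits[].files[].filename
  printf '%s' "$body" | jq -r '[.commits[]?.files[]?.filename] | unique | .[]'
}

# Read filenames on stdin; print "true" if any starts with one of the
# extended-regex prefixes given as arguments, otherwise print "false".
should_run() {
  local pattern
  pattern="$(IFS='|'; printf '%s' "$*")"
  if grep -Eq -- "^(${pattern})"; then
    echo true
  else
    echo false
  fi
}

# Example wiring inside a workflow step (hypothetical prefixes):
#   if files="$(changed_files "$BASE_SHA" "$HEAD_SHA")"; then
#     run="$(printf '%s\n' "$files" | should_run 'harness/' 'scripts/')"
#   else
#     run=true   # API failure: run the job rather than skip it
#   fi
```

Keeping the path decision in `should_run` separate from the HTTP call means the filter can be exercised locally with a plain list of filenames, without a token or network access.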
### How to diagnose
```bash
# From inside the runner (add as a debug step):
timeout 20 git fetch origin main --depth=1
# If this times out: runner cannot reach git remote
# If this SUCCEEDS (~16s): runner can reach the git remote — repo is too large for full-history fetch.
# If this times out: true network isolation (check firewall rules).
```
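
The outcomes described in the comments above can also be told apart mechanically. A minimal sketch, assuming GNU `timeout` (which exits with status 124 when it kills the command); the function names `classify_fetch_status` and `probe_fetch` are hypothetical and not part of the runbook or any referenced PR:

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative assumptions.

# Map the exit status of `timeout 20 git fetch ...` to a short diagnosis.
# GNU timeout exits with 124 when the command ran past the limit.
classify_fetch_status() {
  case "$1" in
    0)   echo "reachable" ;;
    124) echo "timed-out" ;;
    *)   echo "git-error" ;;
  esac
}

# Run a time-limited fetch with the given arguments and print
# "<diagnosis> <elapsed seconds>s".
probe_fetch() {
  local start status
  start=$SECONDS
  status=0
  timeout 20 git fetch "$@" >/dev/null 2>&1 || status=$?
  echo "$(classify_fetch_status "$status") $((SECONDS - start))s"
}
```

Calling `probe_fetch origin main --depth=1` in a debug step prints the diagnosis followed by the elapsed seconds, which makes it easier to compare runs across runners.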
### Verification
Confirmed 2026-05-11 by running `timeout 20 git fetch origin main --depth=1`
in the `detect-changes` job of `harness-replays.yml`: it consistently times
out at 15 s. Runner can reach `https://api.github.com` and `https://pypi.org`
without issue.
Confirmed 2026-05-11 by running `timeout 20 git fetch origin base.ref --depth=1`
in the `detect-changes` job of `harness-replays.yml`: it **succeeds in ~16 s**.
Runner can reach `https://api.github.com` and `https://pypi.org` without issue,
confirming this is a repo-size constraint, not network isolation.
### References
- PR #441: fix for `harness-replays.yml` detect-changes
- PR #476: **primary fix** — use Gitea Compare API instead of git fetch/diff
- PR #441: legacy timeout+fallback fix (now superseded by PR #476)
- Task #173: pre-clone manifest deps pattern for compose build
- internal#102: tracking customer-private + marketplace third-party repos
- `feedback_oss_first_repo_visibility_default`: 5 workspace-template repos
@@ -87,7 +83,7 @@ exits with code 0 (e.g., append `|| true` to commands that might fail).
| Workflow | Fix |
|---|---|
| `harness-replays.yml` detect-changes | Added `continue-on-error: true` to fetch step + decide step; added `|| true` to `DIFF=$(git diff ...)` per PR #441 |
| `harness-replays.yml` detect-changes | Added `continue-on-error: true` to fetch step + decide step; replaced git diff with Compare API per PR #476 |
### How to diagnose
@@ -111,7 +107,7 @@ jobs:
### References
- Gitea Actions quirk #10 (from migration checklist)
- PR #441: fix applied to `harness-replays.yml`
- PR #476: Compare API fix applied to `harness-replays.yml`
---
@@ -144,7 +140,10 @@ files. Secrets and variables are repo-level.
`actions/checkout` with `fetch-depth: 0` triggers a full repo history fetch
which exceeds the runner's network timeout to the git remote (~15 s).
**Workaround**: Use `fetch-depth: 1` (default) and add explicit
**Primary fix**: Use `fetch-depth: 1` + Gitea Compare API for changed-file
detection (PR #476). This avoids git network calls entirely.
**Legacy workaround**: Use `fetch-depth: 1` (default) and add explicit
`git fetch origin <ref> --depth=1` for any additional refs needed.
**Reference**: PR #441 detect-changes fetch step.
**Reference**: PR #476 (Compare API fix), PR #441 (legacy timeout+fallback).