molecule-core/runbooks/gitea-operational-quirks.md
Molecule AI Core-DevOps df821c8258
fix(ci): sop-tier-check gracefully handles empty/invalid token
SOP_FAIL_OPEN=1 was not preventing CI failures because three API calls
with `set -euo pipefail` would abort the script before reaching the
SOP_FAIL_OPEN exit block:

1. `WHOAMI=$(curl ... | jq -r ...)` — jq exits 1 on empty input,
   triggering set -e → script exits before SOP_FAIL_OPEN check.
2. `curl` for reviews — curl exits non-zero on 401 from empty token,
   triggering set -e → same problem.
3. `curl` for org teams list — same issue.

Fix: add `|| true` to jq pipelines and `set +e` / `set -e` guards
around curl calls that may fail with empty token. When SOP_FAIL_OPEN=1
and the token is invalid, the script now exits 0 instead of 1,
preventing blocking CI failures on unconfigured runners.

Refs: sop-tier-check failure on PRs #617, #621, #587, #562
2026-05-12 03:16:17 +00:00


Gitea Actions operational quirks (molecule-core)

Documents persistent operational findings about Gitea Actions runner behaviour that differ from GitHub Actions and require workarounds in workflow YAML or runbooks.

Last updated: 2026-05-12 (infra-runtime-be-agent)


Quirk #1 — Large repo causes fetch timeout on Gitea Actions runner

Finding

The Gitea Actions runner (container on host 5.78.80.188) can reach the git remote (https://git.moleculesai.app) over HTTPS — a single-commit shallow fetch (--depth=1) succeeds in ~16 s. However, fetching the full compressed repo history (~75+ MB) exceeds the runner's network timeout window (~15 s).

This is not a Gitea Actions bug and not a network isolation policy — it is a repo-size constraint. The runner can reach external hosts (GitHub, Docker Hub, PyPI) without issue.

Impact

Workflows that rely on actions/checkout with fetch-depth: 0 (full history) or git clone will time out.

Specifically:

  • actions/checkout@v* with fetch-depth: 0 hangs (fetching full repo history takes >15 s before hitting the timeout).
  • git clone <url> hangs for the same reason.
  • git fetch origin <ref> --depth=1 succeeds in ~16 s — this is the working pattern.

Affected workflows

| Workflow | Issue | Workaround |
| --- | --- | --- |
| harness-replays.yml detect-changes job | fetch-depth: 0 + git clone time out | Added timeout 20 git fetch origin base.ref --depth=1 + continue-on-error: true + fallback to run=true per PR #441 |
| publish-workspace-server-image.yml | In-image git clone of workspace templates | Pre-clone manifest deps before compose build (Task #173 pattern) |
| Any workflow using fetch-depth: 0 | Full history fetch times out | Use fetch-depth: 1 + explicit git fetch for needed refs |

How to diagnose

# From inside the runner (add as a debug step):
timeout 20 git fetch origin main --depth=1
# If this SUCCEEDS (~16s): runner can reach the git remote — the repo is
#   too large for full-history fetch.
# If this times out: true network isolation (unlikely; check firewall rules).
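The working pattern above (a time-boxed shallow fetch that falls back to run=true when it fails) could be wrapped in a small helper. This is a minimal sketch, not the code from PR #441: the function name shallow_fetch_or_fallback and its output lines are hypothetical, and it assumes timeout(1) and git are on the runner's PATH.

```shell
#!/bin/sh
# Sketch only: time-boxed shallow fetch with a fail-open fallback.
# shallow_fetch_or_fallback REF [SECONDS]
#   REF      ref to fetch from the "origin" remote
#   SECONDS  hard limit handed to timeout(1); defaults to 20 as in this runbook
# Prints "fetched=true" on success. On any failure (timeout, auth error,
# missing remote) prints "fetched=false" and "run=true" so the caller can
# fail open instead of failing the job.
shallow_fetch_or_fallback() {
  sf_ref="$1"
  sf_limit="${2:-20}"
  if timeout "$sf_limit" git fetch origin "$sf_ref" --depth=1 >/dev/null 2>&1; then
    echo "fetched=true"
  else
    echo "fetched=false"
    echo "run=true"
  fi
}
```

In a workflow step the printed lines could be appended to $GITHUB_OUTPUT, assuming the runner exposes that file the way GitHub Actions does.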

Verification

Confirmed 2026-05-11 by running timeout 20 git fetch origin base.ref --depth=1 in the detect-changes job of harness-replays.yml: the fetch succeeds in ~16 s. The runner can also reach https://api.github.com and https://pypi.org without issue, confirming this is a repo-size constraint, not network isolation.

References

  • PR #441: fix for harness-replays.yml detect-changes
  • Task #173: pre-clone manifest deps pattern for compose build
  • internal#102: tracking customer-private + marketplace third-party repos
  • feedback_oss_first_repo_visibility_default: 5 workspace-template repos flipped public to allow pre-clone without auth

Quirk #2 — continue-on-error only works at step level, not job level

Finding

Gitea Actions (1.22.6) does not honour continue-on-error: true at the job level the way GitHub Actions does. A job with continue-on-error: true that fails still reports status: failure in the commit status API.

Only continue-on-error: true at the step level works as expected.

Impact

If you want a job to always "pass" in the status API (so dependent jobs can run and the overall CI does not show failure), you must add continue-on-error: true to every step that can fail, AND ensure each step exits with code 0 (e.g., append || true to commands that might fail).

Affected workflows

Workflow Fix
harness-replays.yml detect-changes Added continue-on-error: true to fetch step + decide step; added `

How to diagnose

# WRONG — job reports as failure despite flag
jobs:
  my-job:
    continue-on-error: true   # ← ignored by Gitea
    steps:
      - run: git diff ...    # ← if this fails, job = failure
        # job-level flag does not help

# RIGHT — step-level flag prevents step from failing
jobs:
  my-job:
    steps:
      - run: git diff ... || true  # ← step exits 0
        continue-on-error: true     # ← belt and suspenders

References

  • Quirk #10 (this document): Gitea does NOT auto-populate secrets.GITHUB_TOKEN
  • PR #441: fix applied to harness-replays.yml

Quirk #3 — workflow_dispatch.inputs not supported

The Gitea 1.22.6 parser rejects workflow_dispatch.inputs. Drop the inputs: block from every workflow YAML file ported from GitHub Actions; manual triggers should use a bare workflow_dispatch with no inputs:.

Reference: feedback_gitea_workflow_dispatch_inputs_unsupported


Quirk #4 — merge_group not supported

Gitea has no merge queue concept. Drop merge_group: triggers from all workflow YAML files.


Quirk #5 — environment: blocks not supported

Gitea has no environments concept. Drop environment: from all workflow YAML files. Secrets and variables are repo-level.
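Quirks #3 to #5 all come down to keys that must be stripped from ported workflows, so a rough pre-merge scan might help. The sketch below is a line-based grep, not a YAML parser, so it can flag legitimate uses (for example inputs: under workflow_call). The function name find_unsupported_keys is hypothetical, and the default directory is assumed from the .gitea/workflows path mentioned elsewhere in this document.

```shell
#!/bin/sh
# Sketch only: list lines in workflow files that use keys this runbook says
# Gitea 1.22.6 does not support (merge_group, environment, inputs).
# find_unsupported_keys [DIR]
#   DIR  directory to scan; defaults to .gitea/workflows (assumed location)
# Prints "file:line:text" for each hit and nothing when the files are clean.
find_unsupported_keys() {
  fu_dir="${1:-.gitea/workflows}"
  # -n prints line numbers, -H forces the file name, -E enables the
  # alternation. "|| true" keeps a no-match result from failing the caller.
  find "$fu_dir" -type f \( -name '*.yml' -o -name '*.yaml' \) \
    -exec grep -nHE '^[[:space:]]*(merge_group|environment|inputs):' {} + || true
}
```

Every hit still needs a manual look before the key is deleted, since the regex cannot tell which parent key a match belongs to.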


Quirk #6 — Gitea combined status reports failure when all contexts are null

Finding

When ALL individual status contexts for a commit have state: null (no runner has reported yet), Gitea reports the combined commit status as failure. This is a Gitea Actions bug — it conflates "no status reported yet" with "failed".

Impact

  • The main-red-watchdog workflow opens a [main-red] issue for every scheduled workflow run where the combined state is failure — even when the failure is entirely due to Gitea's combined-status bug.
  • This causes spurious [main-red] issues that waste SRE time investigating non-existent failures.
  • This is especially confusing for schedule:-only workflows (canary, sweep jobs, synth-E2E): Gitea attributes their scheduled runs to main's HEAD commit, so if a scheduled run fires while all contexts are still state: null, the watchdog opens a [main-red] issue on the latest main commit even though that commit itself is perfectly fine.

How to diagnose

Always check the individual context state fields, not the combined state/combined_state. In the /repos/{org}/{repo}/commits/{sha}/statuses API response, look for "state": null on every entry — if all are null, the combined failure is Gitea's bug, not a real CI failure.

{
  "combined_state": "failure",   // ← Gitea bug when all are null
  "contexts": [
    { "context": "CI / Lint", "state": null },  // still running
    { "context": "CI / Test", "state": null }   // still running
  ]
}
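The check described above (ignore the combined state, look at each context) can be written as a small classifier. This is a sketch under assumptions: classify_combined_failure and its three labels are hypothetical, and the individual states are passed in as plain strings, with JSON extraction left out.

```shell
#!/bin/sh
# Sketch only: decide whether a combined "failure" is backed by any context.
# classify_combined_failure STATE...
#   Each STATE is one context's state as a string ("null", "success", ...).
# Prints:
#   real          at least one context is "failure" or "error"
#   spurious      every context is "null" or empty (also when none are given)
#   inconclusive  no failures, but some contexts have reported other states
classify_combined_failure() {
  cc_reported=0
  for cc_state in "$@"; do
    case "$cc_state" in
      failure|error) echo "real"; return 0 ;;
      null|"") ;;
      *) cc_reported=1 ;;
    esac
  done
  if [ "$cc_reported" -eq 0 ]; then
    echo "spurious"
  else
    echo "inconclusive"
  fi
}
```

A watchdog could skip opening an issue when the result is spurious and re-check later.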

Affected workflows

All workflows, but especially schedule:-only workflows that run on main. The main-red-watchdog (.gitea/workflows/main-red-watchdog.yml) is the primary consumer of combined status and is affected.

References

  • Issue #481: first real-world case of this bug (2026-05-11)
  • feedback_no_such_thing_as_flakes: watchdog directive

Quirk #7 — TBD

[Placeholder — document here when a new Gitea Actions quirk is discovered.]

Finding

[What Gitea Actions does differently from GitHub Actions.]

Impact

[Which workflows or operations are affected.]

Workaround

[How to work around this quirk.]

References

  • internal#[N]: first observation

Quirk #8 — TBD

[Placeholder — document here when a new Gitea Actions quirk is discovered.]

Finding

[What Gitea Actions does differently from GitHub Actions.]

Impact

[Which workflows or operations are affected.]

Workaround

[How to work around this quirk.]

References

  • internal#[N]: first observation

Quirk #9 — TBD

[Placeholder — document here when a new Gitea Actions quirk is discovered.]

Finding

[What Gitea Actions does differently from GitHub Actions.]

Impact

[Which workflows or operations are affected.]

Workaround

[How to work around this quirk.]

References

  • internal#[N]: first observation

Quirk #10 — Gitea does NOT auto-populate secrets.GITHUB_TOKEN

Finding

Gitea Actions (1.22.6) does not auto-populate secrets.GITHUB_TOKEN the way GitHub Actions does. A workflow that references secrets.GITHUB_TOKEN without explicitly provisioning a named secret gets an empty string — not a read-only token scoped to the repo.

Impact

Workflows that call the Gitea REST API using secrets.GITHUB_TOKEN as auth receive HTTP 401 on every API call. Affected workflows in molecule-core:

| Workflow | Symptom | Workaround |
| --- | --- | --- |
| gate-check-v3.yml | Reports BLOCKED on every PR | Provision SOP_TIER_CHECK_TOKEN; update workflow to use it |
| qa-review.yml | Fails immediately on PR open | Same: needs named secret |
| security-review.yml | Fails immediately on PR open | Same: needs named secret |

How to diagnose

Add a debug step to the failing workflow:

- name: Diagnose token
  run: |
    echo "Token present: ${{ secrets.GITHUB_TOKEN != '' }}"
    curl -sS --fail -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
      "$GITHUB_SERVER_URL/api/v1/user" | jq -r '.login'
    # Expected (GitHub): prints your username.
    # Actual (Gitea): HTTP 401 or empty string.    
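Once a workflow uses a named secret, a guard at the top of the script can turn the silent empty-string case into an explicit message before any API call returns 401. This is a minimal sketch: require_token is a hypothetical helper, and it assumes the secret has already been exported into the step's environment.

```shell
#!/bin/sh
# Sketch only: fail fast when a named secret arrives as an empty string.
# require_token NAME VALUE
#   NAME   label used in the error message (e.g. SOP_TIER_CHECK_TOKEN)
#   VALUE  the secret's value as seen by the step
# Returns 0 when VALUE is non-empty. Otherwise prints a hint to stderr and
# returns 1. The value itself is never printed.
require_token() {
  if [ -z "${2:-}" ]; then
    echo "secret $1 is empty; provision it as a named secret" >&2
    return 1
  fi
  return 0
}
```

A step might call require_token SOP_TIER_CHECK_TOKEN "$SOP_TIER_CHECK_TOKEN" || exit 1 before its first curl.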

References

  • internal#325: root-cause analysis and token provisioning
  • feedback_gitea_no_auto_supplied_github_token

Quirk #11 — PR-create event dispatcher races — only 1 of N workflows fires on pull_request opened

Finding

When a PR is created via the Gitea web UI or API, the Gitea Actions event dispatcher may fire only 1 of N eligible workflows on the initial pull_request opened event. All other eligible workflows are silently dropped.

This was observed on molecule-core PR #558 (created 2026-05-11T19:54:10Z): 12+ workflows had no paths: filter and should have fired, but only sop-tier-check.yml dispatched.

Concurrent PRs created within the same minute received 12 to 30 dispatches each, confirming this is specific to the PR-create event dispatch, not a general runner capacity issue.

Impact

  • PRs may not run the full CI suite on first open.
  • gate-check-v3, secret-scan, qa-review, and security-review can be silently absent from the PR's status checks.
  • Branch protection may block merge even though CI is effectively green.

How to diagnose

# List workflow runs for the PR:
gh run list --event pull_request --repo molecule-ai/molecule-core \
  | grep "$(gh pr view $PR --json number --jq '.number')"

# Expected: 12+ runs on PR open.
# Actual (when race fires): only 1 run.

Workaround

Force a second dispatch by pushing an empty commit to the PR branch:

git commit --allow-empty -m "chore: trigger workflows"
git push

The push raises a pull_request synchronize event, which reliably triggers all eligible workflows. Keep skip markers such as [skip ci] out of the commit message: if the instance honours them, the re-trigger would be suppressed.
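To see which checks the dispatcher dropped, the list of expected check names can be diffed against the ones actually present on the PR. This sketch assumes both lists have already been saved as plain text files, one name per line; the missing_checks helper and the file layout are hypothetical.

```shell
#!/bin/sh
# Sketch only: print expected check names that are absent from the PR.
# missing_checks EXPECTED_FILE OBSERVED_FILE
#   Both files hold one check name per line.
# Prints every line of EXPECTED_FILE that has no exact match in OBSERVED_FILE.
missing_checks() {
  if [ ! -s "$2" ]; then
    # Nothing was observed, so every expected check is missing.
    cat "$1"
    return 0
  fi
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
  grep -vxFf "$2" "$1" || true
}
```

Empty output means every expected check fired; any printed name is a candidate for the empty-commit re-trigger.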

References

  • internal#329: first observation on PR #558
  • feedback_gitea_pr_create_dispatcher_race

When you find a new quirk

Copy the template below, increment the quirk number, and fill in the finding, impact, workaround, and references. Place the new section in the correct numerical position (before the next higher-numbered quirk). Update this section's final paragraph to remove the next slot's number.

Template

## Quirk #N — <short title>

### Finding

<What Gitea Actions does differently from GitHub Actions.>

### Impact

<Which workflows or operations are affected. Include an affected workflows
table if more than one is affected.>

### How to diagnose

<Shell commands or API calls that confirm this is the quirk, not a real failure.>

### Workaround

<How to work around this quirk in workflow YAML or operations.>

### References

- internal#[N]: first observation
- <Any Gitea issue, feedback label, or upstream bug tracker reference>

Open questions for Gitea 1.23

  • act_runner concurrent-job cap: issue #305 — runner saturation under merge burst; needs max_concurrent_jobs cap configured on act_runner
  • Infisical→Gitea secret-sync: issue #307 — eliminate manual secret PUTs by wiring an Infisical cron to the Gitea API
  • PR-create dispatcher race resolution: internal#329 — is there a Gitea fix or config knob to disable the race? File upstream bug if not
  • GITHUB_TOKEN auto-population: internal#325 — is this on the Gitea 1.23 roadmap? If not, the workaround (named secret) is the permanent answer