fix(ci): close 3 chronic Gitea-Actions workflow flakes (closes #88)

Three workflows have been failing on every push to this Gitea repo for
GitHub-shaped reasons that don't translate to act_runner. Surfaced
while landing #84; bundled per `feedback_gitea_actions_migration_audit_pattern`
("bundle per-repo, not per-finding") instead of three separate PRs.

1) handlers-postgres-integration: localhost → 127.0.0.1
   - lib/pq tries to dial localhost → ::1 first; the postgres service
     container only listens on IPv4 → ECONNREFUSED → all
     TestIntegration_* fail. Pin IPv4 to make the job deterministic.

2) pr-guards / disable-auto-merge-on-push: Gitea no-op
   - The previous reusable-workflow caller invoked `gh pr merge
     --disable-auto`, which calls GitHub's GraphQL API. Gitea returns
     HTTP 405 on /api/graphql → step always fails. Inline the step so
     it can detect Gitea (GITEA_ACTIONS=true OR repo URL under
     moleculesai.app) and no-op with a notice. Auto-merge gating is
     moot on Gitea anyway: Gitea has no `--auto` merge primitive for
     this job to act on. The job stays ALWAYS-RUN so branch
     protection's required check still lands SUCCESS (avoids the
     SKIPPED-in-set trap from
     `feedback_branch_protection_check_name_parity`).

3) Harness Replays: cf-proxy nginx.conf via docker `configs:` (not bind)
   - act_runner runs the workflow inside a runner container; runc in
     the docker daemon below resolves bind-mount source paths on the
     OUTER host, not inside the runner. The path
     `/workspace/.../cf-proxy/nginx.conf` is invisible there → "not a
     directory" runc error. Switching to compose `configs:` packages
     the file as content rather than a host bind, sidestepping the
     DinD path-translation gap.
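
Item 1 boils down to swapping the host part of the connection URL. A minimal sketch of that rewrite, assuming a hypothetical helper named `pin_ipv4_host` (not part of this commit, which edits the workflow by hand) and a URL that carries credentials before the host, as the one in the workflow does:

```shell
# pin_ipv4_host URL: print URL with a literal "localhost" host replaced by
# 127.0.0.1. Hypothetical helper for illustration only.
pin_ipv4_host() {
  # Match "@localhost" only when it is followed by ":" or "/", so a host
  # such as "localhost.example.com" is left untouched. URLs without a
  # "user:pass@" part are not handled by this sketch.
  printf '%s\n' "$1" | sed -E 's#@localhost([:/])#@127.0.0.1\1#'
}

pin_ipv4_host 'postgres://postgres:test@localhost:5432/molecule?sslmode=disable'
# prints postgres://postgres:test@127.0.0.1:5432/molecule?sslmode=disable
```

Pinning the address literal removes the resolver from the picture entirely, which is why it is deterministic where `localhost` is not.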

Local validation:
  - YAML parsed clean for all 3 files.
  - cf-proxy nginx.conf: standalone `docker compose run cf-proxy
    nginx -T` reproduced the configs: mount end-to-end and dumped the
    config correctly. The full harness compose still renders via
    `docker compose config`.

Real-CI verification will land on this branch's first push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 0c7f3c8909
commit 87b971a292
3 changed files with 85 additions and 16 deletions


@@ -97,7 +97,7 @@ jobs:
 # Wait for postgres to actually accept connections (the
 # GHA --health-cmd is best-effort but psql can still race).
 for i in {1..15}; do
-  if pg_isready -h localhost -p 5432 -U postgres -q; then break; fi
+  if pg_isready -h 127.0.0.1 -p 5432 -U postgres -q; then break; fi
   echo "waiting for postgres..."; sleep 2
 done
@@ -131,7 +131,7 @@ jobs:
 # not fine once a cross-table atomicity test came in.
 set +e
 for migration in $(ls migrations/*.sql 2>/dev/null | grep -v '\.down\.sql$' | sort); do
-  if psql -h localhost -U postgres -d molecule -v ON_ERROR_STOP=1 \
+  if psql -h 127.0.0.1 -U postgres -d molecule -v ON_ERROR_STOP=1 \
       -f "$migration" >/dev/null 2>&1; then
     echo "✓ $(basename "$migration")"
   else
@@ -145,7 +145,7 @@ jobs:
 # fail if any didn't land — that would be a real regression we
 # want loud.
 for tbl in delegations workspaces activity_logs pending_uploads; do
-  if ! psql -h localhost -U postgres -d molecule -tA \
+  if ! psql -h 127.0.0.1 -U postgres -d molecule -tA \
       -c "SELECT 1 FROM information_schema.tables WHERE table_name = '$tbl'" \
       | grep -q 1; then
     echo "::error::$tbl table missing after migration replay — handler integration tests would be meaningless"
@@ -157,7 +157,14 @@ jobs:
 - if: needs.detect-changes.outputs.handlers == 'true'
   name: Run integration tests
   env:
-    INTEGRATION_DB_URL: postgres://postgres:test@localhost:5432/molecule?sslmode=disable
+    # 127.0.0.1, NOT localhost. On Gitea / act_runner the runner host
+    # has IPv6 enabled, so `localhost` resolves to `::1` first, and
+    # the Postgres service container only listens on IPv4 → lib/pq's
+    # first dial hits ECONNREFUSED. The migration step uses psql -h
+    # localhost which falls back to IPv4 cleanly, so the flake hides
+    # there and surfaces only at test time. Pinning IPv4 makes the
+    # whole job deterministic. (Issue #88, item 3.)
+    INTEGRATION_DB_URL: postgres://postgres:test@127.0.0.1:5432/molecule?sslmode=disable
   run: |
     go test -tags=integration -timeout 5m -v ./internal/handlers/ -run "^TestIntegration_"
@@ -167,5 +174,5 @@ jobs:
     PGPASSWORD: test
   run: |
     echo "::group::delegations table state"
-    psql -h localhost -U postgres -d molecule -c "SELECT * FROM delegations LIMIT 50;" || true
+    psql -h 127.0.0.1 -U postgres -d molecule -c "SELECT * FROM delegations LIMIT 50;" || true
     echo "::endgroup::"
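
As far as the first hunk shows, the readiness loop only breaks out on success; what happens after 15 failed attempts is outside the diff. A sketch of a bounded retry that returns non-zero when the budget runs out, assuming a hypothetical helper named `retry` that is not in the workflow:

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds after each failure. Returns 0 on the first
# success and 1 if every attempt failed. Hypothetical helper.
retry() {
  max="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$max failed; retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Possible use in the workflow step (same pg_isready flags as the diff):
#   retry 15 2 pg_isready -h 127.0.0.1 -p 5432 -U postgres -q
```

With a non-zero return the step can fail at the wait itself, rather than later when the first psql call cannot connect.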


@@ -1,14 +1,25 @@
 name: pr-guards
-# Thin caller that delegates to the molecule-ci reusable guard. Today
-# the guard is just "disable auto-merge when a new commit is pushed
-# after auto-merge was enabled" — added 2026-04-27 after PR #2174
-# auto-merged with only its first commit because the second commit
-# was pushed after the merge queue had locked the PR's SHA.
+# PR-time guards. Today the only guard is "disable auto-merge when a
+# new commit is pushed after auto-merge was enabled" — added 2026-04-27
+# after PR #2174 auto-merged with only its first commit because the
+# second commit was pushed after the merge queue had locked the PR's
+# SHA.
 #
-# When more PR-time guards land in molecule-ci, add them here as
-# additional jobs that share the same pull_request:synchronize
-# trigger.
+# Why this is inlined (not delegated to molecule-ci's reusable
+# workflow): the reusable workflow uses `gh pr merge --disable-auto`,
+# which calls GitHub's GraphQL API. Gitea has no GraphQL endpoint and
+# returns HTTP 405 on /api/graphql, so the job failed on every Gitea
+# PR push since the 2026-05-06 migration. Gitea also has no `--auto`
+# merge primitive that this job could be acting on, so the right
+# behaviour on Gitea is "no-op + green status" — not a 405.
+#
+# Inlining (vs. an `if:` on the `uses:` line) keeps the job ALWAYS
+# running, which matters for branch protection: required-check names
+# need a job that emits SUCCESS terminal state, not SKIPPED. See
+# `feedback_branch_protection_check_name_parity` and `feedback_pr_merge_safety_guards`.
+#
+# Issue #88 item 1.
 on:
   pull_request:
@@ -19,4 +30,34 @@ permissions:
 jobs:
   disable-auto-merge-on-push:
-    uses: molecule-ai/molecule-ci/.github/workflows/disable-auto-merge-on-push.yml@main
+    runs-on: ubuntu-latest
+    steps:
+      # Detect Gitea Actions. act_runner sets GITEA_ACTIONS=true in the
+      # step env on every job. Belt-and-suspenders: also check the repo
+      # url's host, which is independent of any runner-side env config
+      # (covers a future Gitea host where the env var is forgotten).
+      - name: Detect runner host
+        id: host
+        run: |
+          if [[ "${GITEA_ACTIONS:-}" == "true" ]] || [[ "${{ github.server_url }}" == *moleculesai.app* ]] || [[ "${{ github.event.repository.html_url }}" == *moleculesai.app* ]]; then
+            echo "is_gitea=true" >> "$GITHUB_OUTPUT"
+            echo "::notice::Gitea Actions detected — auto-merge gating is not applicable here (Gitea has no --auto merge primitive). Job will no-op."
+          else
+            echo "is_gitea=false" >> "$GITHUB_OUTPUT"
+          fi
+      - name: Disable auto-merge (GitHub only)
+        if: steps.host.outputs.is_gitea != 'true'
+        env:
+          GH_TOKEN: ${{ github.token }}
+          PR: ${{ github.event.pull_request.number }}
+          REPO: ${{ github.repository }}
+          NEW_SHA: ${{ github.sha }}
+        run: |
+          set -eu
+          gh pr merge "$PR" --disable-auto -R "$REPO" || true
+          gh pr comment "$PR" -R "$REPO" --body "🔒 Auto-merge disabled — new commit (\`${NEW_SHA:0:7}\`) pushed after auto-merge was enabled. The merge queue locks SHAs at entry, so subsequent pushes can race. Verify the new commit and re-enable with \`gh pr merge --auto\`."
+      - name: Gitea no-op
+        if: steps.host.outputs.is_gitea == 'true'
+        run: echo "Gitea Actions — auto-merge gating not applicable; no-op (job intentionally green so branch protection's required-check name lands SUCCESS)."
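
The detection step's decision can be pulled into a small function so it is checkable outside a workflow run. A sketch in POSIX sh, assuming a hypothetical function name `is_gitea` and an illustrative host `git.moleculesai.app`; the workflow itself inlines the same test with bash `[[ ... ]]`:

```shell
# is_gitea FLAG URL: succeed when FLAG is "true" (the GITEA_ACTIONS value)
# or when URL contains the moleculesai.app domain. Hypothetical helper.
is_gitea() {
  if [ "$1" = "true" ]; then
    return 0
  fi
  case "$2" in
    *moleculesai.app*) return 0 ;;
  esac
  return 1
}

# Illustrative inputs; the host name below is an assumption.
if is_gitea "${GITEA_ACTIONS:-}" "https://git.moleculesai.app/org/repo"; then
  echo "is_gitea=true"
else
  echo "is_gitea=false"
fi
# prints is_gitea=true
```

Either signal alone is enough, which matches the belt-and-suspenders intent of the step: the URL check still fires when the env var is absent.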


@@ -167,6 +167,18 @@ services:
   # Production shape: same single CF tunnel front-doors every tenant
   # subdomain — the Host header carries the tenant identity, not the
   # routing destination. Local cf-proxy mirrors this exactly.
+  #
+  # nginx.conf delivery: docker compose `configs:` block (not a bind
+  # mount) so the file ships as content packaged by compose, not a
+  # host-path bind that has to be visible to the docker daemon's runc.
+  # Bind mounts break under Gitea's act_runner DinD because runc
+  # resolves the source path on the OUTER docker host (the runner's
+  # host filesystem), not inside the runner container — the path
+  # `/workspace/.../tests/harness/cf-proxy/nginx.conf` is only visible
+  # to the runner, not to the daemon below it. The `configs:` form
+  # uploads the file to the daemon as part of the service definition
+  # and is bind-mount-equivalent at the container level. See issue #88
+  # item 2.
   cf-proxy:
     image: nginx:1.27-alpine
     depends_on:
@@ -174,14 +186,23 @@ services:
         condition: service_healthy
       tenant-beta:
         condition: service_healthy
-    volumes:
-      - ./cf-proxy/nginx.conf:/etc/nginx/nginx.conf:ro
+    configs:
+      - source: cf-proxy-nginx-conf
+        target: /etc/nginx/nginx.conf
+        mode: 0444
     # Bind to 127.0.0.1 only — hardcoded ADMIN_TOKENs make 0.0.0.0
     # exposure unsafe even on a local network.
     ports:
       - "127.0.0.1:8080:8080"
     networks: [harness-net]
+configs:
+  # Defined once at compose level so any future service (e.g. a second
+  # nginx variant for an external-connect smoke test) can reuse the
+  # same source file.
+  cf-proxy-nginx-conf:
+    file: ./cf-proxy/nginx.conf
 networks:
   harness-net:
     name: molecule-harness-net