CI: 3 chronic Gitea-Actions workflow flakes (pr-guards / Harness Replays / handlers-postgres-integration) #88

Closed
opened 2026-05-07 23:35:04 +00:00 by hongming · 0 comments
Owner

Context

Surfaced while landing PR #84 (SaaS plugin install via EIC SSH). The PR's actual code is green, but three workflows still fail on every push regardless of what the PR touches. All three are GitHub-Actions-shaped patterns that don't translate cleanly to Gitea / act_runner. Bundling per feedback_gitea_actions_migration_audit_pattern ("bundle per-repo, not per-finding") instead of three separate issues.

1. pr-guards / disable-auto-merge-on-push — fails on every push

Workflow: .github/workflows/pr-guards.yml (the disable-auto-merge-on-push job).

Log excerpt: gh pr merge --disable-auto … → HTTP 405: 405 Method Not Allowed (https://git.moleculesai.app/api/graphql)

Root cause: the gh CLI uses GitHub's GraphQL API; Gitea has no GraphQL endpoint. The disable-auto-merge intent is a no-op on Gitea anyway, because nothing the workflow touches has a Gitea equivalent of --auto merge.

Proposed fix: add a Gitea-detection short-circuit at the top of the Disable auto-merge step, e.g.:

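# Skip on Gitea: gh's auto-merge calls go through GraphQL, which Gitea does not expose.
# An empty or unset GITHUB_SERVER_URL is treated as Gitea too; :- keeps this safe under set -u.
if [[ "${GH_HOST:-}" == git.moleculesai.app || -z "${GITHUB_SERVER_URL:-}" || "${GITHUB_SERVER_URL:-}" == *moleculesai.app* ]]; then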
  echo 'Gitea — auto-merge gating not applicable; no-op.'
  exit 0
fi

The pr-comment can use Gitea's REST API directly via curl -H "Authorization: token $GH_TOKEN" -X POST .../issues/$PR/comments.
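
A minimal sketch of that REST call, assuming the caller supplies the URL prefix that the issue leaves elided. The names post_pr_comment, json_escape, API_BASE, and DRY_RUN are hypothetical and not part of pr-guards.yml; the JSON escaping only covers backslashes and double quotes, so multi-line comment bodies would need something sturdier.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from pr-guards.yml.
# API_BASE is assumed to hold the elided ".../" prefix of the comments endpoint.

# Escape backslashes and double quotes so the text fits inside a JSON string.
# Newlines and control characters are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: post_pr_comment PR_NUMBER COMMENT_TEXT
post_pr_comment() {
  pc_url="${API_BASE}/issues/$1/comments"
  pc_payload="{\"body\":\"$(json_escape "$2")\"}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print what would be sent instead of calling the API.
    printf 'POST %s %s\n' "$pc_url" "$pc_payload"
    return 0
  fi
  curl -fsS -X POST \
    -H "Authorization: token ${GH_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$pc_payload" \
    "$pc_url"
}
```

Running it once with DRY_RUN=1 in the workflow is a cheap way to confirm the URL and payload before letting it post for real.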

2. Harness Replays / Harness Replays — fails on every commit that touches workspace-server / canvas / tests/harness

Log excerpt: mount src=/workspace/molecule-ai/molecule-core/tests/harness/cf-proxy/nginx.conf, dst=/etc/nginx/nginx.conf: not a directory: Are you trying to mount a directory onto a file (or vice-versa)?

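Root cause: act_runner runs the workflow inside a container; when docker-compose tries to bind-mount tests/harness/cf-proxy/nginx.conf into the cf-proxy container, runc looks for the source path on the OUTER docker host (the runner's host), not inside the runner container. The path doesn't exist there. Docker's usual response to a missing bind source is to create an empty directory in its place, which would explain why the error complains about mounting a directory onto a file.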

Chronic on every recent staging commit that touched workspace-server.

Proposed fix options (smallest first):

  • A. Switch the cf-proxy bind to a docker configs: block (file passed as content, not bind path). Smallest change, no runner reconfig.
  • B. docker cp the file in after up, before the smoke-test step.
  • C. Run the harness in act_runner's --privileged / host mode so nested Docker sees the workspace path. Bigger blast radius.
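
Option B could look roughly like the sketch below. It assumes the compose service is named cf-proxy, that the nginx.conf bind entry has already been removed from the compose file, and that docker compose resolves the harness project from the current directory. It also uses create followed by start rather than a plain up, a variation on the wording above, so that nginx never boots without its config. inject_nginx_conf, NGINX_CONF, and DOCKER are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of option B, under the assumptions stated in the text above.
# DOCKER is a hypothetical override so the sequence can be dry-run with a stub.
DOCKER="${DOCKER:-docker}"
# Source file path as it appears in the failing mount; adjust to the working directory.
NGINX_CONF="${NGINX_CONF:-tests/harness/cf-proxy/nginx.conf}"

inject_nginx_conf() {
  # Create the container without starting it.
  "$DOCKER" compose create cf-proxy || return 1
  # Resolve the container ID (-a also lists containers that are not running).
  inj_cid=$("$DOCKER" compose ps -aq cf-proxy) || return 1
  [ -n "$inj_cid" ] || return 1
  # Copy the file through the Docker API instead of bind-mounting a path
  # that only exists inside the runner container.
  "$DOCKER" cp "$NGINX_CONF" "$inj_cid:/etc/nginx/nginx.conf" || return 1
  # Start the service now that the config is in place.
  "$DOCKER" compose start cf-proxy
}
```

Because docker cp streams the file contents to the daemon, the runner container's filesystem layout stops mattering, which is the same property option A gets from a configs: block.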

3. Handlers Postgres Integration — intermittent IPv6 flake

Log excerpt: ping: dial tcp [::1]:5432: connect: connection refused from every TestIntegration_*.

Root cause: INTEGRATION_DB_URL=postgres://postgres:test@localhost:5432/molecule. On the runner, localhost resolves to ::1 first, but the postgres service container is only bound on IPv4. The lib/pq driver tries IPv6 first → connection refused.

Intermittent (passed on PR #84's 1st commit, failed on the 2nd; passes on most staging commits). Same Gitea / act_runner network setup quirk as the docker-compose mount issue above.

Proposed fix: in .github/workflows/handlers-postgres-integration.yml, change the env to INTEGRATION_DB_URL=postgres://postgres:test@127.0.0.1:5432/molecule?sslmode=disable. This one-line change forces IPv4 and removes the resolution race.
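
As a belt-and-braces measure, a step could also pin the URL at runtime so a stray localhost elsewhere in the config cannot reintroduce the flake. This is only a sketch; force_ipv4_loopback is a hypothetical helper, not something in the workflow today.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pin a Postgres URL to the IPv4 loopback address.
force_ipv4_loopback() {
  # Rewrite only a host of exactly "localhost" that follows the "@" separator
  # and is terminated by ":" (port) or "/" (database path).
  printf '%s\n' "$1" | sed -e 's#@localhost\([:/]\)#@127.0.0.1\1#'
}

# Optional diagnostic: list the addresses the runner resolves for localhost.
# getent ships with glibc and may be missing on some images.
# getent ahosts localhost
```

In a step it would be used as INTEGRATION_DB_URL=$(force_ipv4_loopback "$INTEGRATION_DB_URL"); URLs that already use 127.0.0.1 pass through unchanged.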

Recommended order to ship

  1. 127.0.0.1 swap in handlers-postgres-integration.yml: 1-line fix, eliminates the intermittent flake immediately.
  2. pr-guards Gitea no-op + REST comment: small.
  3. Harness Replays DinD bind-mount: biggest; tests need a docker-config / docker-cp pattern audit across the harness compose.

All three are independent of any code path PR #84 touches. Once these land, overall: success becomes a reliable signal again on backend PRs.

Refs: PR #84, feedback_gitea_actions_migration_audit_pattern, feedback_act_runner_github_server_url.

Reference: molecule-ai/molecule-core#88