Merge branch 'staging' into fix/saas-plugin-install-eic
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 6s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 6s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 19s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 20s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 19s
Harness Replays / detect-changes (pull_request) Successful in 18s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 15s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 15s
CI / Canvas (Next.js) (pull_request) Successful in 16s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 12s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 17s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m13s
Harness Replays / Harness Replays (pull_request) Successful in 2m14s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7m13s
CI / Platform (Go) (pull_request) Successful in 12m8s

claude-ceo-assistant 2026-05-08 01:29:11 +00:00
commit 576166c8c3
7 changed files with 524 additions and 102 deletions

View File

@ -87,7 +87,7 @@ jobs:
run: go mod download
- if: needs.changes.outputs.platform == 'true'
run: go build ./cmd/server
# CLI (molecli) moved to standalone repo: github.com/molecule-ai/molecule-cli
- if: needs.changes.outputs.platform == 'true'
run: go vet ./... || true
- if: needs.changes.outputs.platform == 'true'
@ -235,7 +235,13 @@ jobs:
run: npx vitest run --coverage
- name: Upload coverage summary as artifact
if: needs.changes.outputs.canvas == 'true' && always()
# Pinned to v3 for Gitea act_runner v0.6 compatibility — v4+ uses
# the GHES 3.10+ artifact protocol that Gitea 1.22.x does NOT
# implement, surfacing as `GHESNotSupportedError: @actions/artifact
# v2.0.0+, upload-artifact@v4+ and download-artifact@v4+ are not
# currently supported on GHES`. Drop this pin when Gitea ships
# the v4 protocol (tracked: post-Gitea-1.23 followup).
uses: actions/upload-artifact@c6a366c94c3e0affe28c06c8df20a878f24da3cf # v3.2.2
with:
name: canvas-coverage-${{ github.run_id }}
path: canvas/coverage/
@ -243,8 +249,8 @@ jobs:
if-no-files-found: warn
# MCP Server + SDK removed from CI — now in standalone repos:
# - github.com/molecule-ai/molecule-mcp-server (npm CI)
# - github.com/molecule-ai/molecule-sdk-python (PyPI CI)
# e2e-api job moved to .github/workflows/e2e-api.yml (issue #458).
# It now has workflow-level concurrency (cancel-in-progress: false) so
@ -434,5 +440,5 @@ jobs:
fi
# SDK + plugin validation moved to standalone repo:
# github.com/molecule-ai/molecule-sdk-python
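The pin-by-commit-SHA shape in the upload-artifact step above can be cross-checked against the upstream tag before the pin is ever bumped or dropped. A hedged sketch (plain git, no gh dependency; the tag name is the one claimed in the trailing `# v3.2.2` comment, and whether it still resolves to that SHA is the thing being verified, not assumed):

```bash
# List the upstream refs for the v3.2.2 tag (the peeled ^{} ref is the commit
# an annotated tag points at) and check the pinned SHA appears among them.
git ls-remote --tags https://github.com/actions/upload-artifact 'refs/tags/v3.2.2*' \
  | grep c6a366c94c3e0affe28c06c8df20a878f24da3cf
```

A non-empty match means the comment and the pin still agree; an empty result is the cue to re-verify before trusting the pin.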

View File

@ -139,7 +139,11 @@ jobs:
- name: Upload Playwright report on failure
if: failure() && needs.detect-changes.outputs.canvas == 'true'
# Pinned to v3 for Gitea act_runner v0.6 compatibility — v4+ uses
# the GHES 3.10+ artifact protocol that Gitea 1.22.x does NOT
# implement (see ci.yml upload step for the canonical error
# cite). Drop this pin when Gitea ships the v4 protocol.
uses: actions/upload-artifact@c6a366c94c3e0affe28c06c8df20a878f24da3cf # v3.2.2
with:
name: playwright-report-staging
path: canvas/playwright-report-staging/
@ -147,7 +151,8 @@ jobs:
- name: Upload screenshots on failure
if: failure() && needs.detect-changes.outputs.canvas == 'true'
# Pinned to v3 for Gitea act_runner v0.6 compatibility (see above).
uses: actions/upload-artifact@c6a366c94c3e0affe28c06c8df20a878f24da3cf # v3.2.2
with:
name: playwright-screenshots
path: canvas/test-results/

View File

@ -14,12 +14,42 @@ name: Handlers Postgres Integration
# self-review caught it took 2 minutes to set up and would have caught
# the bug at PR-time.
#
# Why this workflow does NOT use `services: postgres:` (Class B fix)
# ------------------------------------------------------------------
# Our act_runner config has `container.network: host` (operator host
# /opt/molecule/runners/config.yaml), which act_runner applies to BOTH
# the job container AND every service container. With host-net, two
# concurrent runs of this workflow both try to bind 0.0.0.0:5432 — the
# second postgres FATALs with `could not create any TCP/IP sockets:
# Address in use`, and Docker auto-removes it (act_runner sets
# AutoRemove:true on service containers). By the time the migrations
# step runs `psql`, the postgres container is gone, hence
# `Connection refused` then `failed to remove container: No such
# container` at cleanup time.
#
# Per-job `container.network` override is silently ignored by
# act_runner — `--network and --net in the options will be ignored.`
# appears in the runner log. Documented constraint.
#
# So we sidestep `services:` entirely. The job container still uses
# host-net (inherited from runner config; required for cache server
# discovery on the bridge IP 172.18.0.17:42631). We launch a sibling
# postgres on the existing `molecule-monorepo-net` bridge with a
# UNIQUE name per run — `pg-handlers-${RUN_ID}-${RUN_ATTEMPT}` — and
# read its bridge IP via `docker inspect`. A host-net job container
# can reach a bridge-net container directly via the bridge IP (verified
# manually on operator host 2026-05-08).
#
# Trade-offs vs. the original `services:` shape:
# + No host-port collision; N parallel runs share the bridge cleanly
# + `if: always()` cleanup runs even on test-step failure
# - One more step in the workflow (+~3 lines)
# - Requires `molecule-monorepo-net` to exist on the operator host
# (it does; declared in docker-compose.yml + docker-compose.infra.yml)
#
# Class B Hongming-owned CICD red sweep, 2026-05-08.
#
# Cost: ~30s job (postgres pull from cache + go build + 4 tests).
on:
push:
@ -59,20 +89,14 @@ jobs:
name: Handlers Postgres Integration
needs: detect-changes
runs-on: ubuntu-latest
env:
# Unique name per run so concurrent jobs don't collide on the
# bridge network. ${RUN_ID}-${RUN_ATTEMPT} is unique even across
# workflow_dispatch reruns of the same run_id.
PG_NAME: pg-handlers-${{ github.run_id }}-${{ github.run_attempt }}
# Bridge network already exists on the operator host (declared
# in docker-compose.yml + docker-compose.infra.yml).
PG_NETWORK: molecule-monorepo-net
defaults:
run:
working-directory: workspace-server
@ -89,16 +113,57 @@ jobs:
with:
go-version: 'stable'
- if: needs.detect-changes.outputs.handlers == 'true'
name: Start sibling Postgres on bridge network
working-directory: .
run: |
# Sanity: the bridge network must exist on the operator host.
# Hard-fail loud if it doesn't — easier to spot than a silent
# auto-create that diverges from the rest of the stack.
if ! docker network inspect "${PG_NETWORK}" >/dev/null 2>&1; then
echo "::error::Bridge network '${PG_NETWORK}' missing on operator host. Re-run docker-compose.infra.yml or check ops handbook."
exit 1
fi
# If a stale container with the same name exists (rerun on
# the same run_id), wipe it first.
docker rm -f "${PG_NAME}" >/dev/null 2>&1 || true
docker run -d \
--name "${PG_NAME}" \
--network "${PG_NETWORK}" \
--health-cmd "pg_isready -U postgres" \
--health-interval 5s \
--health-timeout 5s \
--health-retries 10 \
-e POSTGRES_PASSWORD=test \
-e POSTGRES_DB=molecule \
postgres:15-alpine >/dev/null
# Read back the bridge IP. Always present immediately after
# `docker run -d` for bridge networks.
PG_HOST=$(docker inspect "${PG_NAME}" \
--format "{{(index .NetworkSettings.Networks \"${PG_NETWORK}\").IPAddress}}")
if [ -z "${PG_HOST}" ]; then
echo "::error::Could not resolve PG_HOST for ${PG_NAME} on ${PG_NETWORK}"
docker logs "${PG_NAME}" || true
exit 1
fi
echo "PG_HOST=${PG_HOST}" >> "$GITHUB_ENV"
echo "INTEGRATION_DB_URL=postgres://postgres:test@${PG_HOST}:5432/molecule?sslmode=disable" >> "$GITHUB_ENV"
echo "Started ${PG_NAME} at ${PG_HOST}:5432"
- if: needs.detect-changes.outputs.handlers == 'true'
name: Apply migrations to Postgres service
env:
PGPASSWORD: test
run: |
# Wait for postgres to actually accept connections. Docker's
# health-cmd handles container-side readiness, but the wire
# to the bridge IP is best-tested with pg_isready directly.
for i in {1..15}; do
if pg_isready -h "${PG_HOST}" -p 5432 -U postgres -q; then break; fi
echo "waiting for postgres at ${PG_HOST}:5432..."; sleep 2
done
# Apply every .up.sql in lexicographic order with
@ -131,7 +196,7 @@ jobs:
# not fine once a cross-table atomicity test came in.
set +e
for migration in $(ls migrations/*.sql 2>/dev/null | grep -v '\.down\.sql$' | sort); do
if psql -h "${PG_HOST}" -U postgres -d molecule -v ON_ERROR_STOP=1 \
-f "$migration" >/dev/null 2>&1; then
echo "✓ $(basename "$migration")"
else
@ -145,7 +210,7 @@ jobs:
# fail if any didn't land — that would be a real regression we
# want loud.
for tbl in delegations workspaces activity_logs pending_uploads; do
if ! psql -h "${PG_HOST}" -U postgres -d molecule -tA \
-c "SELECT 1 FROM information_schema.tables WHERE table_name = '$tbl'" \
| grep -q 1; then
echo "::error::$tbl table missing after migration replay — handler integration tests would be meaningless"
@ -156,23 +221,32 @@ jobs:
- if: needs.detect-changes.outputs.handlers == 'true'
name: Run integration tests name: Run integration tests
env:
# 127.0.0.1, NOT localhost. On Gitea / act_runner the runner host
# has IPv6 enabled, so `localhost` resolves to `::1` first, and
# the Postgres service container only listens on IPv4 → lib/pq's
# first dial hits ECONNREFUSED. The migration step uses psql -h
# localhost which falls back to IPv4 cleanly, so the flake hides
# there and surfaces only at test time. Pinning IPv4 makes the
# whole job deterministic. (Issue #88, item 3.)
INTEGRATION_DB_URL: postgres://postgres:test@127.0.0.1:5432/molecule?sslmode=disable
run: |
# INTEGRATION_DB_URL is exported by the start-postgres step;
# points at the per-run bridge IP, not 127.0.0.1, so concurrent
# workflow runs don't fight over a host-net 5432 port.
go test -tags=integration -timeout 5m -v ./internal/handlers/ -run "^TestIntegration_"
- if: failure() && needs.detect-changes.outputs.handlers == 'true'
name: Diagnostic dump on failure
env:
PGPASSWORD: test
run: |
echo "::group::postgres container status"
docker ps -a --filter "name=${PG_NAME}" --format '{{.Status}} {{.Names}}' || true
docker logs "${PG_NAME}" 2>&1 | tail -50 || true
echo "::endgroup::" echo "::endgroup::"
echo "::group::delegations table state"
psql -h "${PG_HOST}" -U postgres -d molecule -c "SELECT * FROM delegations LIMIT 50;" || true
echo "::endgroup::"
- if: always() && needs.detect-changes.outputs.handlers == 'true'
name: Stop sibling Postgres
working-directory: .
run: |
# always() so containers don't leak when migrations or tests
# fail. The cleanup is best-effort: if the container is
# already gone (e.g. concurrent rerun race), don't fail the job.
docker rm -f "${PG_NAME}" >/dev/null 2>&1 || true
echo "Cleaned up ${PG_NAME}"

View File

@ -1,16 +1,99 @@
name: Retarget main PRs to staging
# Mechanical enforcement of SHARED_RULES rule 8 ("Staging-first
# workflow, no exceptions"). When a bot opens a PR against `main`,
# retarget it to `staging` automatically and leave an explanatory
# comment. Human / CEO-authored PRs (the staging→main promotion
# PRs, etc.) are left alone — they're the authorised exception
# to the rule.
#
# ============================================================
# What this workflow does
# ============================================================
#
# On `pull_request_target` opened/reopened against `main`:
# 1. If the PR head is `staging`, skip (the auto-promote PRs
# MUST stay base=main).
# 2. If the PR author is a bot, retarget the PR base to
# `staging` via Gitea REST `PATCH /pulls/{N}` body
# `{"base":"staging"}`.
# 3. If the retarget returns 422 "pull request already exists
# for base branch 'staging'" (issue #1884 case: another PR
# on the same head already targets staging), close the
# now-redundant main-PR via Gitea REST instead of failing
# red.
# 4. Post an explainer comment on the retargeted PR via
# Gitea REST `POST /issues/{N}/comments`.
#
# ============================================================
# Why Gitea REST (and not `gh api / gh pr close / gh pr comment`)
# ============================================================
#
# Pre-2026-05-06 this workflow used `gh api -X PATCH "repos/{owner}/{repo}/pulls/{N}" -f base=staging`
# plus `gh pr close` and `gh pr comment`. After the GitHub→Gitea
# cutover those calls fail because:
#
# - `gh` CLI defaults to `api.github.com`. Even with `GH_HOST`
# pointing at Gitea, `gh pr close / comment` route through
# GraphQL (`/api/graphql`) which Gitea does not expose.
# Empirical: every `gh pr *` call returns
# `HTTP 405 Method Not Allowed (https://git.moleculesai.app/api/graphql)`
# — same root cause as #65 (auto-sync, fixed in PR #66) and
# #73/#195 (auto-promote, fixed in PR #78).
# - `gh api -X PATCH /pulls/{N}` happens to use a REST path
# that Gitea also has, but the `gh` host-resolution layer
# and pagination/retry logic don't always hit Gitea cleanly,
# and the cost of switching to direct `curl` is one extra
# line of code.
#
# So this workflow uses direct `curl` calls to Gitea REST. No
# `gh` CLI dependency, no GraphQL, no flaky host-resolution.
#
# ============================================================
# Identity + token (anti-bot-ring per saved-memory
# `feedback_per_agent_gitea_identity_default`)
# ============================================================
#
# Pre-fix this workflow used the per-job ephemeral
# `secrets.GITHUB_TOKEN`. On Gitea Actions that token has
# narrow scope and unpredictable cross-PR write capability.
#
# Post-fix: `secrets.AUTO_SYNC_TOKEN` (the `devops-engineer`
# Gitea persona). Same persona used by `auto-sync-main-to-staging.yml`
# (PR #66) and `auto-promote-staging.yml` (PR #78). Token scope:
# `push: true` repo write, sufficient for PR-edit + close + comment.
#
# Why this token does NOT need branch-protection bypass:
# patching a PR's base ref is a PR-level operation that does not
# require push perms on either branch (the PR's own commits stay
# put; only the metadata changes).
#
# ============================================================
# Failure modes & operational notes
# ============================================================
#
# A — PATCH base→staging returns 422 "pull request already exists"
# (issue #1884 case):
# - Detected by string-match on response body. Workflow
# falls through to closing the now-redundant main-PR
# (Gitea REST `PATCH /pulls/{N}` with `state: closed`)
# and posts an explanation comment. Step summary surfaces.
#
# B — `AUTO_SYNC_TOKEN` rotated / wrong scope:
# - First REST call returns 401/403. Step summary surfaces.
# Re-issue token from `~/.molecule-ai/personas/` on the
# operator host and update repo Actions secret.
#
# C — PR was deleted between trigger and run:
# - REST call returns 404. Workflow exits 0 with a notice
# (the rule was already enforced or the PR is gone).
#
# D — author is not actually a bot but the filter mis-fires:
# - Filter is conservative: only triggers on
# `user.type == 'Bot'`, `login` ends with `[bot]`, or
# known bot logins (`molecule-ai[bot]`, `app/molecule-ai`).
# Human PRs slip through unaffected. If a NEW bot login
# starts shipping main-PRs, add it to the filter.
on: on:
pull_request_target: pull_request_target:
@ -24,16 +107,16 @@ jobs:
retarget:
name: Retarget to staging
runs-on: ubuntu-latest
# Only fire for bot-authored PRs. Human CEO PRs (staging→main
# promotion) are intentional and pass through.
#
# Head-ref guard: never retarget a PR whose head IS `staging`
# — those are the auto-promote staging→main PRs (opened by
# `devops-engineer` since PR #78 / #195 fix). Retargeting
# head=staging onto base=staging fails with HTTP 422 "no new
# commits between base 'staging' and head 'staging'", which
# would surface as a noisy red workflow run on every
# auto-promote (caught 2026-05-03 on the GitHub-era PR #2588).
if: >-
github.event.pull_request.head.ref != 'staging'
&& (
@ -41,65 +124,153 @@ jobs:
|| endsWith(github.event.pull_request.user.login, '[bot]')
|| github.event.pull_request.user.login == 'app/molecule-ai'
|| github.event.pull_request.user.login == 'molecule-ai[bot]'
|| github.event.pull_request.user.login == 'devops-engineer'
)
steps:
- name: Retarget PR base to staging via Gitea REST
id: retarget
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
PR_AUTHOR: ${{ github.event.pull_request.user.login }}
# Issue #1884 case: when the bot opens a PR against main
# and there's already another PR on the same head branch
# targeting staging, Gitea's PATCH returns 422 with a
# body mentioning "pull request already exists for base
# branch 'staging'" (the Gitea message wording is
# slightly different from GitHub's; the substring match
# below covers both for forward/back compat).
# The retarget can't proceed — but the right response is
# to close the now-redundant main-PR, not to fail the
# workflow noisily. Detect that specific 422 and close
# instead.
run: |
set -euo pipefail
API="${GITEA_HOST}/api/v1/repos/${REPO}"
AUTH=(-H "Authorization: token ${GITEA_TOKEN}" -H "Accept: application/json")
echo "Retargeting PR #${PR_NUMBER} (author: ${PR_AUTHOR}) from main → staging" echo "Retargeting PR #${PR_NUMBER} (author: ${PR_AUTHOR}) from main → staging"
PATCH_OUTPUT=$(gh api -X PATCH \
"repos/${{ github.repository }}/pulls/${PR_NUMBER}" \ # Curl-status-capture pattern per `feedback_curl_status_capture_pollution`:
-f base=staging \ # http_code via -w to its own scalar, body to a tempfile, set +e/-e
--jq '.base.ref' 2>&1) # bracket so curl's non-zero-on-4xx doesn't pollute the script's exit chain.
PATCH_EXIT=$? BODY_FILE=$(mktemp)
REQ='{"base":"staging"}'
set +e
STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X PATCH -d "${REQ}" \
-o "${BODY_FILE}" -w "%{http_code}" \
"${API}/pulls/${PR_NUMBER}")
CURL_RC=$?
set -e set -e
if [ "$PATCH_EXIT" -eq 0 ]; then
echo "::notice::Retargeted PR #${PR_NUMBER} → staging" if [ "${CURL_RC}" -ne 0 ]; then
echo "outcome=retargeted" >> "$GITHUB_OUTPUT" echo "::error::curl PATCH failed (rc=${CURL_RC})"
exit 0 rm -f "${BODY_FILE}"
exit 1
fi fi
if [ "${STATUS}" = "201" ] || [ "${STATUS}" = "200" ]; then
NEW_BASE=$(jq -r '.base.ref // "?"' < "${BODY_FILE}")
rm -f "${BODY_FILE}"
if [ "${NEW_BASE}" = "staging" ]; then
echo "::notice::Retargeted PR #${PR_NUMBER} → staging"
echo "outcome=retargeted" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "::error::PATCH returned ${STATUS} but base.ref is '${NEW_BASE}', not 'staging'"
exit 1
fi
# Specifically match the 422 duplicate-base/head error so
# any OTHER PATCH failure (auth, deleted PR, etc.) still
# surfaces as a real workflow failure.
BODY=$(cat "${BODY_FILE}" || true)
rm -f "${BODY_FILE}"
if [ "${STATUS}" = "422" ] && echo "${BODY}" | grep -qE "(pull request already exists for base branch 'staging'|already exists.*base.*staging)"; then
echo "::notice::PR #${PR_NUMBER}: duplicate target-staging PR exists on same head — closing this main-PR as redundant." echo "::notice::PR #${PR_NUMBER}: duplicate target-staging PR exists on same head — closing this main-PR as redundant."
# Close the now-redundant main-PR via Gitea REST
# (PATCH state=closed). Post comment explaining
# rationale BEFORE close so the comment lands on the
# PR (commenting on a closed PR works on Gitea, but
# historically caused notification ordering surprises).
CLOSE_BODY_FILE=$(mktemp)
CMT_REQ=$(jq -n '{body:"[retarget-bot] Closing — another PR on the same head branch already targets `staging`. This PR is redundant. See issue #1884 for the rationale."}')
set +e
CMT_STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X POST -d "${CMT_REQ}" \
-o "${CLOSE_BODY_FILE}" -w "%{http_code}" \
"${API}/issues/${PR_NUMBER}/comments")
set -e
if [ "${CMT_STATUS}" != "201" ]; then
echo "::warning::dup-close comment POST returned ${CMT_STATUS}; continuing to close anyway"
cat "${CLOSE_BODY_FILE}" | head -c 300 || true
fi
rm -f "${CLOSE_BODY_FILE}"
CLOSE_REQ='{"state":"closed"}'
CLOSE_RESP=$(mktemp)
set +e
CL_STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X PATCH -d "${CLOSE_REQ}" \
-o "${CLOSE_RESP}" -w "%{http_code}" \
"${API}/pulls/${PR_NUMBER}")
set -e
if [ "${CL_STATUS}" = "201" ] || [ "${CL_STATUS}" = "200" ]; then
echo "::notice::Closed PR #${PR_NUMBER} as redundant"
echo "outcome=closed-as-duplicate" >> "$GITHUB_OUTPUT"
rm -f "${CLOSE_RESP}"
exit 0
fi
echo "::error::Failed to close redundant PR: HTTP ${CL_STATUS}"
cat "${CLOSE_RESP}" | head -c 300 || true
rm -f "${CLOSE_RESP}"
exit 1
fi
echo "::error::Retarget PATCH failed and was NOT a duplicate-base error:"
echo "$PATCH_OUTPUT" >&2 echo "::error::Retarget PATCH failed and was NOT a duplicate-base error: HTTP ${STATUS}"
echo "${BODY}" | head -c 500 >&2
exit 1
- name: Post explainer comment
if: steps.retarget.outputs.outcome == 'retargeted'
env:
GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
GITEA_HOST: ${{ vars.GITEA_HOST || 'https://git.moleculesai.app' }}
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.pull_request.number }}
run: |
set -euo pipefail
API="${GITEA_HOST}/api/v1/repos/${REPO}"
AUTH=(-H "Authorization: token ${GITEA_TOKEN}" -H "Accept: application/json")
# PR comments live on the issue endpoint in Gitea
# (PRs ARE issues — same endpoint, different sub-resources
# for diffs/files/etc.). The body uses jq to safely
# encode the multi-line markdown without shell-quote
# nightmares.
REQ=$(jq -n '{body:"[retarget-bot] This PR was opened against `main` and has been retargeted to `staging` automatically.\n\n**Why:** per [SHARED_RULES rule 8](https://git.moleculesai.app/molecule-ai/molecule-ai-org-template-molecule-dev/src/branch/main/SHARED_RULES.md), all feature work targets `staging` first; the CEO promotes `staging → main` separately.\n\n**What changed:** just the base branch — no code change. CI will re-run against `staging`. If you get merge conflicts, rebase on `staging`.\n\n**If this PR is the CEO`s staging→main promotion:** the Action skipped you (only bot-authored PRs are retargeted, head=staging is also exempted). If you see this comment on your CEO PR, that`s a bug — please tag @hongmingwang."}')
BODY_FILE=$(mktemp)
set +e
STATUS=$(curl -sS "${AUTH[@]}" -H "Content-Type: application/json" \
-X POST -d "${REQ}" \
-o "${BODY_FILE}" -w "%{http_code}" \
"${API}/issues/${PR_NUMBER}/comments")
set -e
if [ "${STATUS}" = "201" ]; then
echo "::notice::Posted explainer comment on PR #${PR_NUMBER}"
else
echo "::warning::Failed to post explainer (HTTP ${STATUS}) — retarget itself succeeded"
cat "${BODY_FILE}" | head -c 300 || true
fi
rm -f "${BODY_FILE}"

View File

@ -7,6 +7,32 @@ export default defineConfig({
test: {
environment: 'node',
exclude: ['e2e/**', 'node_modules/**', '**/dist/**'],
// CI-conditional test timeout (issue #96).
//
// Vitest's 5000ms default is too tight for the first test in any
// file under our CI shape: `npx vitest run --coverage` on the
// self-hosted Gitea Actions Docker runner. The cold-start cost
// (v8 coverage instrumentation init + JSDOM bootstrap + module-
// graph import for @/components/* and @/lib/* + first React
// render) consistently consumes 5-7 seconds for the first
// synchronous test in heavyweight component files
// (ActivityTab.test.tsx, CreateWorkspaceDialog.test.tsx,
// ConfigTab.provider.test.tsx) — even though every subsequent
// test in the same file completes in 100-1500ms.
//
// Empirically the worst observed first-test was 6453ms in a
// single file (CreateWorkspaceDialog). 30000ms gives ~5x
// headroom over that on CI; we still keep 5000ms locally so
// genuine waitFor races / hung promises stay sensitive in dev.
//
// Same vitest pattern documented at:
// https://vitest.dev/config/testtimeout
// https://vitest.dev/guide/coverage#profiling-test-performance
//
// Per-test duration is still emitted to the CI log; if a test
// ever silently approaches 25-30s under this raised ceiling that
// will surface as a duration regression and we revisit.
testTimeout: process.env.CI ? 30000 : 5000,
// Coverage is instrumented but NOT yet a CI gate — first land
// observability so we can see the baseline, then dial in
// thresholds + a hard gate in a follow-up PR (#1815). Today's
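The only switch the timeout config above keys on is `process.env.CI`, so the CI-shaped behaviour can be reproduced locally with the same command the workflow runs. A sketch:

```bash
# Local default: the tight 5000ms per-test ceiling stays in force.
npx vitest run --coverage
# CI-shaped run: picks up the 30000ms ceiling plus the same coverage pass
# the workflow uses, which is where the cold-start cost shows up.
CI=true npx vitest run --coverage
```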

View File

@ -58,8 +58,11 @@ green — proves wire shape end-to-end against a real `hermes gateway run`
subprocess + stub OpenAI-compat LLM. Caught + fixed a real `KeyError`
in upstream `hermes_cli/tools_config.py` (PLATFORMS dict lookup
crashed on plugin platforms) — fix on the patched fork branch
(`molecule-ai/hermes-agent` `feat/platform-adapter-plugins`, commit
`18e4849e`, hosted on Gitea at
`https://git.moleculesai.app/molecule-ai/hermes-agent` — moved from the
suspended `github.com/HongmingWang-Rabbit/hermes-agent`, see
`molecule-ai/internal#72`). Upstream PR #18775 OPEN; CONFLICTING with main.
Not on critical path for our platform — patched fork is what the
workspace image installs.

View File

@ -0,0 +1,137 @@
# Runbook — Handlers Postgres Integration port-collision substrate
**Status:** Resolved 2026-05-08 (PR for class B Hongming-owned CICD red sweep).
## Symptom
`Handlers Postgres Integration` workflow fails on staging push and PRs.
Step `Apply migrations to Postgres service` shows:
```
psql: error: connection to server at "127.0.0.1", port 5432 failed: Connection refused
```
Job-cleanup step further down logs:
```
Cleaning up services for job Handlers Postgres Integration
failed to remove container: Error response from daemon: No such container: <id>
```
…confirming the postgres service container was already gone before
cleanup ran.
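When re-triaging a run like this on the operator host, the container's short life is still visible in the Docker event stream even after auto-remove has deleted it. A hedged sketch (the time window is whatever brackets the failing run):

```bash
# Show create/start/die/destroy events for postgres containers in the last
# hour; an AutoRemove'd service container leaves this trail even though
# `docker ps -a` no longer lists it.
docker events --since 1h --until "$(date +%s)" \
  --filter 'type=container' --filter 'image=postgres:15-alpine' \
  --format '{{.Time}} {{.Action}} {{.Actor.Attributes.name}}'
```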
## Root cause
Our Gitea act_runner (operator host `5.78.80.188`,
`/opt/molecule/runners/config.yaml`) sets:
```yaml
container:
network: host
```
…which act_runner applies to BOTH the job container AND every
`services:` container in a workflow. Multiple workflow instances
running concurrently across the 16 parallel runners each try to bind
postgres on `0.0.0.0:5432`. The first wins; subsequent instances exit
immediately with:
```
LOG: could not bind IPv4 address "0.0.0.0": Address in use
HINT: Is another postmaster already running on port 5432?
FATAL: could not create any TCP/IP sockets
```
act_runner sets `AutoRemove:true` on service containers, so Docker
garbage-collects them as soon as they exit. By the time the migrations
step runs `pg_isready` / `psql`, the container is gone and connection
refused.
Reproduction (operator host):
```bash
docker run --rm -d --name pg-A --network host \
-e POSTGRES_PASSWORD=test postgres:15-alpine
docker run -d --name pg-B --network host \
-e POSTGRES_PASSWORD=test postgres:15-alpine
docker logs pg-B # FATAL: could not create any TCP/IP sockets
```
## Why per-job override doesn't work
The natural fix — per-job `container.network` override — is silently
ignored by act_runner. The runner log emits:
```
--network and --net in the options will be ignored.
```
This is a documented act_runner constraint: container network is a
runner-wide setting, not per-job. Source: gitea/act_runner config docs
+ vegardit/docker-gitea-act-runner issue #7.
Flipping the global `container.network` to `bridge` would break every
other workflow in the repo (cache server discovery,
`molecule-monorepo-net` peer access during integration tests, etc.) —
unacceptable blast radius for a per-test bug.
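The runner-wide setting driving all of this can be confirmed directly on the operator host (path as given in the root-cause section above); a sketch:

```bash
# Show the container block of the act_runner config; the fix in this runbook
# assumes `network: host` stays in place runner-wide.
grep -n -A2 '^container:' /opt/molecule/runners/config.yaml
# expected:
#   container:
#     network: host
```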
## Fix shape
`handlers-postgres-integration.yml` no longer uses `services: postgres:`.
It launches a sibling postgres container manually on the existing
`molecule-monorepo-net` bridge network with a per-run unique name:
```yaml
env:
PG_NAME: pg-handlers-${{ github.run_id }}-${{ github.run_attempt }}
PG_NETWORK: molecule-monorepo-net
steps:
- name: Start sibling Postgres on bridge network
run: |
docker run -d --name "${PG_NAME}" --network "${PG_NETWORK}" \
...
postgres:15-alpine
PG_HOST=$(docker inspect "${PG_NAME}" \
--format "{{(index .NetworkSettings.Networks \"${PG_NETWORK}\").IPAddress}}")
echo "PG_HOST=${PG_HOST}" >> "$GITHUB_ENV"
# … migrations + tests use ${PG_HOST}, not 127.0.0.1 …
- if: always() &&
name: Stop sibling Postgres
run: docker rm -f "${PG_NAME}" || true
```
The host-net job container can reach a bridge-net container via the
bridge IP directly (verified manually, 2026-05-08). Two parallel runs
use different names + different bridge IPs — no collision.
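The "verified manually" claim can be re-checked at any time with two throwaway containers, one on the bridge and one on host network to mirror the job-container shape. A hedged sketch (probe names are illustrative):

```bash
# Sibling postgres on the bridge, same image the workflow uses.
docker run -d --name pg-probe --network molecule-monorepo-net \
  -e POSTGRES_PASSWORD=test postgres:15-alpine
PG_IP=$(docker inspect pg-probe \
  --format '{{(index .NetworkSettings.Networks "molecule-monorepo-net").IPAddress}}')
# Probe it from a host-network container (the act_runner job-container shape),
# retrying briefly while postgres finishes starting.
for i in {1..10}; do
  docker run --rm --network host postgres:15-alpine \
    pg_isready -h "$PG_IP" -p 5432 -U postgres && break
  sleep 2
done
docker rm -f pg-probe
```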
## Future-proofing
Other workflows that hit the same shape (any `services:` with a
fixed-port image) will exhibit the same failure mode under
host-network runner config. Migrate them using this same pattern:
1. Drop the `services:` block.
2. Use `${{ github.run_id }}-${{ github.run_attempt }}` for unique
container name.
3. Launch on `molecule-monorepo-net` (already trusted bridge in
`docker-compose.infra.yml`).
4. Read back the bridge IP via `docker inspect` and export as a step env.
5. `if: always()` cleanup step at the end.
If the count of such workflows grows, factor the substrate logic into a
composite action (`./.github/actions/sibling-postgres`) or a small shared
helper so it lives in one place; a sketch of the helper shape follows.
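A hypothetical shape for that shared helper (the script and function names are illustrative, not an existing path in the repo; the docker commands mirror the workflow steps above):

```bash
#!/usr/bin/env bash
# sibling-postgres.sh — illustrative sketch only.
set -euo pipefail

# start_sibling_pg NAME  → starts postgres on molecule-monorepo-net and
# prints its bridge IP on stdout for the caller to export.
start_sibling_pg() {
  local name="$1" net="molecule-monorepo-net"
  docker rm -f "$name" >/dev/null 2>&1 || true
  docker run -d --name "$name" --network "$net" \
    --health-cmd "pg_isready -U postgres" --health-interval 5s \
    --health-timeout 5s --health-retries 10 \
    -e POSTGRES_PASSWORD=test -e POSTGRES_DB=molecule \
    postgres:15-alpine >/dev/null
  docker inspect "$name" \
    --format "{{(index .NetworkSettings.Networks \"$net\").IPAddress}}"
}

# stop_sibling_pg NAME  → best-effort cleanup, safe from an `if: always()` step.
stop_sibling_pg() { docker rm -f "$1" >/dev/null 2>&1 || true; }
```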
## Related
- Issue #88 (closed by #92): localhost → 127.0.0.1 fix that unmasked
this collision; the IPv6 fix is correct, port collision is the new
layer.
- Issue #94 created `molecule-monorepo-net` + `alpine:latest` as
prereqs.
- Saved memory `feedback_act_runner_github_server_url` documents
another act_runner-vs-GHA divergence (server URL).