chore(ci): migrate all jobs to self-hosted macOS arm64 runner

* chore(ci): migrate all jobs to self-hosted macOS arm64 runner

Switches every job in `ci.yml` and `publish-platform-image.yml` from
`ubuntu-latest` to `[self-hosted, macos, arm64]` to avoid the minute limits
on GitHub-hosted runners. All jobs run on a single Apple-silicon self-hosted
runner registered at the Molecule-AI org level.

Notable adaptations (`services:` containers and Docker-based marketplace
actions only run on Linux runners, so neither works on macOS):

- e2e-api: `services: postgres/redis` replaced with inline `docker run`
  steps. Ports remapped to 15432/16379 to avoid collision with anything
  the host may already expose on the standard ports. Containers are named
  (`molecule-ci-postgres` / `molecule-ci-redis`) and torn down in an
  `if: always()` step. Postgres readiness is still gated on pg_isready
  via `docker exec`.
- shellcheck: `ludeeus/action-shellcheck` is a Docker action, Linux-only.
  Replaced with a direct `shellcheck` invocation (pre-installed on the
  runner) that scans `tests/e2e/*.sh` with `--severity=warning`.
- publish-platform-image: added `docker/setup-qemu-action@v3` and an
  explicit `platforms: linux/amd64` on both `docker/build-push-action`
  invocations. The runner is arm64 but Fly tenant machines pull amd64,
  so QEMU-emulated cross-arch builds are required. GHA cache-from/cache-to
  behavior is unchanged.
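
The readiness gating described for e2e-api follows a poll-until-ready
pattern. A minimal sketch of that pattern as a reusable helper is shown
here; `wait_until` is a hypothetical name that does not appear in the
workflow, and the one-second interval is an assumption that mirrors the
inline loops.

```shell
# wait_until ATTEMPTS CMD [ARGS...]
# Hypothetical helper: runs CMD up to ATTEMPTS times, one second apart.
# Returns 0 on the first success, 1 if CMD never succeeds.
wait_until() {
  wu_attempts=$1
  shift
  wu_i=1
  while [ "$wu_i" -le "$wu_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after ${wu_i} attempt(s)"
      return 0
    fi
    # Sleep only between attempts, not after the last failure.
    if [ "$wu_i" -lt "$wu_attempts" ]; then
      sleep 1
    fi
    wu_i=$((wu_i + 1))
  done
  echo "not ready after ${wu_attempts} attempt(s)" >&2
  return 1
}

# Possible call sites (not executed here):
#   wait_until 30 docker exec "$PG_CONTAINER" pg_isready -U dev
#   wait_until 15 sh -c 'docker exec "$REDIS_CONTAINER" redis-cli ping | grep -q PONG'
```

The Redis call wraps the pipeline in `sh -c` because the helper runs a
single command; this relies on `REDIS_CONTAINER` being exported, which is
the case for job-level `env:` values.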

Runner prereqs (one-time host setup):
- Docker Desktop installed and running (for e2e-api + image publish)
- `shellcheck` on PATH
- `docker` on PATH
- Go / Node / gh / Python are installed via setup-* actions per job
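
The host prerequisites above could be verified once after setup with a
small preflight script. This is a sketch under assumptions: the function
names `missing_tools` and `preflight` are hypothetical, and `docker info`
is used as a stand-in check that the Docker Desktop daemon is reachable.

```shell
# missing_tools NAME...
# Hypothetical helper: prints each NAME that is not found on PATH.
missing_tools() {
  for mt_tool in "$@"; do
    if ! command -v "$mt_tool" >/dev/null 2>&1; then
      echo "$mt_tool"
    fi
  done
}

# preflight
# Checks the CLI tools from the prereq list, then the Docker daemon.
preflight() {
  pf_missing=$(missing_tools docker shellcheck)
  if [ -n "$pf_missing" ]; then
    echo "missing on PATH: $(echo "$pf_missing" | tr '\n' ' ')" >&2
    return 1
  fi
  # `docker info` fails when the client exists but the daemon is not running.
  if ! docker info >/dev/null 2>&1; then
    echo "docker is installed but the daemon is not reachable" >&2
    return 1
  fi
  echo "runner prerequisites look OK"
}
```

Running `preflight` as the runner user would surface a missing tool before
a CI job does.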

* fix(ci): set AGENT_TOOLSDIRECTORY for python-lint on self-hosted runner

setup-python@v5 defaults to /Users/runner/hostedtoolcache, which doesn't
exist on the hongming-claw self-hosted runner. Setting AGENT_TOOLSDIRECTORY
points the action at a writable path under the runner user's home directory.

Fixes the only failing job in CI run 24469156329 on PR #186.
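
Because the fix depends on the tool cache directory being writable, a
one-time host check along these lines may help. It is only a sketch:
`ensure_tool_cache` is a hypothetical helper, and the `$HOME/hostedtoolcache`
fallback in the usage comment is an assumption modelled on the path set in
the workflow.

```shell
# ensure_tool_cache DIR
# Hypothetical helper: creates DIR if needed and confirms the current
# user can write to it. Prints DIR on success.
ensure_tool_cache() {
  etc_dir=$1
  mkdir -p "$etc_dir" || return 1
  if [ ! -w "$etc_dir" ]; then
    echo "tool cache $etc_dir is not writable by $(id -un)" >&2
    return 1
  fi
  echo "$etc_dir"
}

# Possible usage on the runner host (not executed here):
#   ensure_tool_cache "${AGENT_TOOLSDIRECTORY:-$HOME/hostedtoolcache}"
```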

---------

Co-authored-by: Hongming Wang <HongmingWang-Rabbit@users.noreply.github.com>
Hongming Wang 2026-04-15 10:48:27 -07:00 committed by GitHub
parent cdb45a3786
commit aa419477b7
2 changed files with 81 additions and 51 deletions

View File

@@ -9,7 +9,7 @@ on:
 jobs:
   platform-build:
     name: Platform (Go)
-    runs-on: ubuntu-latest
+    runs-on: [self-hosted, macos, arm64]
     defaults:
       run:
         working-directory: platform
@@ -43,7 +43,7 @@ jobs:
   canvas-build:
     name: Canvas (Next.js)
-    runs-on: ubuntu-latest
+    runs-on: [self-hosted, macos, arm64]
     defaults:
       run:
         working-directory: canvas
@@ -59,7 +59,7 @@ jobs:
   mcp-server-build:
     name: MCP Server (Node.js)
-    runs-on: ubuntu-latest
+    runs-on: [self-hosted, macos, arm64]
     defaults:
       run:
         working-directory: mcp-server
@@ -75,37 +75,17 @@ jobs:
   e2e-api:
     name: E2E API Smoke Test
-    runs-on: ubuntu-latest
-    timeout-minutes: 10
-    services:
-      postgres:
-        # Credentials match .env.example (dev:dev) so local reproduction is
-        # identical to CI. POSTGRES_DB matches the default there too.
-        image: postgres:16
-        env:
-          POSTGRES_USER: dev
-          POSTGRES_PASSWORD: dev
-          POSTGRES_DB: molecule
-        ports:
-          - 5432:5432
-        options: >-
-          --health-cmd "pg_isready -U dev"
-          --health-interval 10s
-          --health-timeout 5s
-          --health-retries 5
-      redis:
-        image: redis:7
-        ports:
-          - 6379:6379
-        options: >-
-          --health-cmd "redis-cli ping"
-          --health-interval 10s
-          --health-timeout 5s
-          --health-retries 5
+    runs-on: [self-hosted, macos, arm64]
+    timeout-minutes: 15
+    # `services:` is Linux-only on self-hosted runners — we start postgres
+    # and redis via `docker run` instead. Ports 15432/16379 avoid collision
+    # with anything the host may already have on the standard ports.
     env:
-      DATABASE_URL: postgres://dev:dev@localhost:5432/molecule?sslmode=disable
-      REDIS_URL: redis://localhost:6379
+      DATABASE_URL: postgres://dev:dev@localhost:15432/molecule?sslmode=disable
+      REDIS_URL: redis://localhost:16379
       PORT: "8080"
+      PG_CONTAINER: molecule-ci-postgres
+      REDIS_CONTAINER: molecule-ci-redis
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-go@v5
@@ -113,6 +93,38 @@ jobs:
           go-version: 'stable'
           cache: true
           cache-dependency-path: platform/go.sum
+      - name: Start Postgres (docker)
+        run: |
+          docker rm -f "$PG_CONTAINER" 2>/dev/null || true
+          docker run -d --name "$PG_CONTAINER" \
+            -e POSTGRES_USER=dev \
+            -e POSTGRES_PASSWORD=dev \
+            -e POSTGRES_DB=molecule \
+            -p 15432:5432 \
+            postgres:16
+          for i in $(seq 1 30); do
+            if docker exec "$PG_CONTAINER" pg_isready -U dev >/dev/null 2>&1; then
+              echo "Postgres ready after ${i}s"
+              exit 0
+            fi
+            sleep 1
+          done
+          echo "::error::Postgres did not become ready in 30s"
+          docker logs "$PG_CONTAINER" || true
+          exit 1
+      - name: Start Redis (docker)
+        run: |
+          docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
+          docker run -d --name "$REDIS_CONTAINER" -p 16379:6379 redis:7
+          for i in $(seq 1 15); do
+            if docker exec "$REDIS_CONTAINER" redis-cli ping 2>/dev/null | grep -q PONG; then
+              echo "Redis ready after ${i}s"
+              exit 0
+            fi
+            sleep 1
+          done
+          echo "::error::Redis did not become ready in 15s"
+          exit 1
       - name: Build platform
         working-directory: platform
         run: go build -o platform-server ./cmd/server
@@ -135,17 +147,9 @@ jobs:
           exit 1
       - name: Assert migrations applied
         # Migrations auto-run at platform boot. Fail fast if they silently
-        # didn't — catches future migration-author mistakes (e.g. a new
-        # privileged op Postgres "dev" can't execute) before the E2E run.
-        # Uses docker exec into the service container's own psql — avoids
-        # a 10-20s apt-install step in the runner.
+        # didn't — catches future migration-author mistakes before the E2E run.
         run: |
-          pg_container=$(docker ps --filter "ancestor=postgres:16" --format "{{.ID}}" | head -1)
-          if [ -z "$pg_container" ]; then
-            echo "::error::Could not find postgres service container"
-            exit 1
-          fi
-          tables=$(docker exec "$pg_container" psql -U dev -d molecule -tAc "SELECT count(*) FROM information_schema.tables WHERE table_schema='public' AND table_name='workspaces'")
+          tables=$(docker exec "$PG_CONTAINER" psql -U dev -d molecule -tAc "SELECT count(*) FROM information_schema.tables WHERE table_schema='public' AND table_name='workspaces'")
           if [ "$tables" != "1" ]; then
             echo "::error::Migrations did not apply — 'workspaces' table missing"
             cat platform/platform.log || true
@@ -163,22 +167,31 @@ jobs:
           if [ -f platform/platform.pid ]; then
             kill "$(cat platform/platform.pid)" 2>/dev/null || true
           fi
+      - name: Stop service containers
+        if: always()
+        run: |
+          docker rm -f "$PG_CONTAINER" 2>/dev/null || true
+          docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
   shellcheck:
     name: Shellcheck (E2E scripts)
-    runs-on: ubuntu-latest
+    runs-on: [self-hosted, macos, arm64]
     steps:
       - uses: actions/checkout@v4
       - name: Run shellcheck on tests/e2e/*.sh
-        uses: ludeeus/action-shellcheck@master
-        env:
-          SHELLCHECK_OPTS: --severity=warning
-        with:
-          scandir: tests/e2e
+        # `ludeeus/action-shellcheck` is a Docker action (Linux-only). We rely
+        # on shellcheck being pre-installed on the self-hosted runner instead.
+        run: |
+          if ! command -v shellcheck >/dev/null 2>&1; then
+            echo "::error::shellcheck is not installed on the runner"
+            exit 1
+          fi
+          find tests/e2e -type f -name '*.sh' -print0 \
+            | xargs -0 shellcheck --severity=warning
   canvas-deploy-reminder:
     name: Canvas Deploy Reminder
-    runs-on: ubuntu-latest
+    runs-on: [self-hosted, macos, arm64]
     needs: canvas-build
     # Only fires on direct pushes to main (i.e. after a PR merges).
     # PRs get canvas-build CI but no reminder — no deployment happens on PRs.
@@ -216,7 +229,12 @@ jobs:
   python-lint:
     name: Python Lint & Test
-    runs-on: ubuntu-latest
+    runs-on: [self-hosted, macos, arm64]
+    env:
+      # setup-python@v5 defaults to /Users/runner/hostedtoolcache which does
+      # not exist on the self-hosted runner (user is hongming-claw). Point it
+      # to the runner user's writable directory so Python 3.11 can be cached.
+      AGENT_TOOLSDIRECTORY: /Users/hongming-claw/hostedtoolcache
     defaults:
       run:
         working-directory: workspace-template

View File

@@ -32,11 +32,19 @@ env:
 jobs:
   build-and-push:
-    runs-on: ubuntu-latest
+    runs-on: [self-hosted, macos, arm64]
     steps:
       - name: Checkout
         uses: actions/checkout@v4
+      - name: Set up QEMU
+        # Required on the Apple-silicon self-hosted runner — Fly tenant machines
+        # pull linux/amd64, and buildx needs binfmt handlers in Docker Desktop's
+        # VM to emulate amd64 during the build.
+        uses: docker/setup-qemu-action@v3
+        with:
+          platforms: linux/amd64
       - name: Set up Docker Buildx
         # Buildx enables cache-from/cache-to via GHA cache and multi-arch
         # builds without local docker daemon wrangling.
@@ -75,10 +83,13 @@ jobs:
         # GHCR (or vice versa) — each registry's failure mode is isolated.
         # GHA cache is shared because both steps re-use the same Dockerfile
         # context + build args.
+        # Explicit linux/amd64 target: the runner is Apple-silicon (arm64),
+        # but Fly tenant machines are amd64. QEMU handles the emulation.
         uses: docker/build-push-action@v5
         with:
           context: ./platform
           file: ./platform/Dockerfile
+          platforms: linux/amd64
           push: true
           tags: |
             ${{ env.IMAGE_NAME }}:latest
@@ -99,6 +110,7 @@ jobs:
         with:
           context: ./platform
           file: ./platform/Dockerfile
+          platforms: linux/amd64
           push: true
           tags: |
             ${{ env.FLY_IMAGE_NAME }}:latest