chore(ci): remove dead arm64-darwin self-hosted lanes #2569

Merged
agent-reviewer-cr2 merged 1 commits from chore/remove-dead-arm64-darwin-lanes into main 2026-06-11 04:59:28 +00:00
4 changed files with 2 additions and 334 deletions
-195
View File
@@ -1,195 +0,0 @@
# ci-arm64-advisory — Mac arm64 self-hosted ADVISORY fast-check lane.
#
# === WHY ===
#
# The amd64 Gitea runner pool (molecule-runner-1..20) is queue-contended
# (internal#418). This lane offloads the *genuinely container-independent*
# fast checks (Go build/vet/lint, shellcheck, Python lint) onto the Mac
# arm64 self-hosted runner so developers get a fast arm64 signal WITHOUT
# adding load to the starved amd64 pool — capability-honestly, as an
# additive pilot. Pilot ② of the Mac-CI strategy (CTO-delegated 2026-05-17).
#
# === NON-NEGOTIABLE SAFETY CONTRACT (the prime directive) ===
#
# This lane is **ADVISORY ONLY**. It is provably incapable of hanging a
# merge. Concretely:
#
# 1. It is a SEPARATE workflow file. `ci.yml` is byte-for-byte
# untouched by this PR. The `CI / all-required` aggregator sentinel
# and the five contexts it polls
# (`CI / Detect changes|Platform (Go)|Canvas (Next.js)|
# Shellcheck (E2E scripts)|Python Lint & Test (pull_request)`)
# are unchanged. The canonical required gate stays 100% on the
# existing amd64 pool.
#
# 2. The context this workflow emits is
# `ci-arm64-advisory / fast-checks (pull_request)`. That string is
# DELIBERATELY NOT present in, and this PR does NOT add it to:
# - branch_protections/{main,staging}.status_check_contexts
# (DB-verified pb 86/75 = exactly
# ["CI / all-required (pull_request)",
# "sop-checklist / all-items-acked (pull_request)"])
# - audit-force-merge.yml REQUIRED_CHECKS env
# - ci.yml `all-required` sentinel's hardcoded `required[]` list
# Branch protection therefore never waits on this context. If the
# Mac runner is absent / offline / removed, this workflow's status
# simply never appears — and because nothing requires it, every
# merge proceeds exactly as it does today. There is no path by
# which a missing/red arm64 status blocks a merge.
#
# 3. `continue-on-error: true` on the job — even a genuine arm64-only
# failure (toolchain drift, arch-specific test flake) is surfaced
# as information, never as a merge blocker, for the duration of
# the pilot.
#
# 4. The job carries a `github.event_name` `if:` gate. Beyond its
# functional purpose this also keeps the job OUT of
# `ci-required-drift.py:ci_job_names()` (which excludes
# `github.event_name`/`github.ref`-gated jobs), so the hourly
# ci-required-drift sentinel's F1 ("job not under sentinel needs")
# cannot ever flag this advisory job. F2/F3 are untouched because
# this context is absent from BP and from REQUIRED_CHECKS.
# `lint-bp-context-emit-match` only fails on BP→emitter gaps; an
# emitter without a BP context is explicitly informational there.
#
# === RUNNER TARGETING ===
#
# The Mac runner is `hongming-pc-runner-1`. The bare `self-hosted`
# label is POLLUTED in this Gitea instance: molecule-runner-1..20
# (the contended amd64 pool) also advertise `self-hosted`. Targeting
# bare `self-hosted` would route back onto the very pool we are trying
# to relieve — and onto amd64 hardware. We therefore require an
# AND-set of labels that ONLY the Mac satisfies. `macos-self-hosted`
# is Mac-exclusive (the amd64 pool does not carry it). Until the
# label-install burst (a10862b2) lands `self-hosted`+`macos-self-hosted`
# on the Mac, the runner's current unique label `hongming-pc-laptop`
# is also listed; AND-semantics over the labels a runner advertises
# means a job requiring [self-hosted, macos-self-hosted] can ONLY be
# claimed once the Mac advertises both. If neither label set is yet
# present on the Mac, the workflow stays queued harmlessly and is
# garbage-collected by the normal stale-run reaper — it blocks nothing
# (see safety contract point 2).
#
# === ROLLBACK ===
#
# Delete this single file (`git rm .gitea/workflows/ci-arm64-advisory.yml`)
# and merge. No branch-protection edit, no ci.yml edit, no
# REQUIRED_CHECKS edit is required to roll back, because none were made
# to roll forward. Zero blast radius either direction.
name: ci-arm64-advisory
on:
push:
branches: [main, staging]
pull_request:
branches: [main, staging]
# Per-ref cancel: a newer commit on the same ref supersedes the older
# advisory run. Distinct from ci.yml's `ci-${ref}` group so this lane
# never cancels (or is cancelled by) the canonical required CI.
concurrency:
group: ci-arm64-advisory-${{ github.ref }}
cancel-in-progress: true
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
jobs:
# bp-exempt: advisory arm64 pilot, non-gating by design (internal#418).
fast-checks:
name: fast-checks
# AND-set: only the Mac arm64 runner advertises macos-self-hosted.
# See "RUNNER TARGETING" header note for why bare self-hosted is unsafe.
runs-on: [self-hosted, macos-self-hosted]
# ADVISORY: never blocks. See safety contract point 3. mc#1982
# internal#418 — tracked: arm64 advisory pilot, non-gating by design.
continue-on-error: true
# event_name gate: functional (only meaningful on push/PR) AND keeps
# this job out of ci-required-drift.py:ci_job_names() so F1 can never
# flag it. See safety contract point 4.
if: ${{ github.event_name == 'push' || github.event_name == 'pull_request' }}
timeout-minutes: 20
steps:
- name: Provenance — advisory lane, non-gating
run: |
echo "This is the arm64 ADVISORY fast-check lane."
echo "It does NOT gate merges. Canonical required CI is ci.yml"
echo "on the amd64 pool. Arch: $(uname -m) on $(uname -s)."
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
# ---- Go: build + vet + lint (container-independent: needs only the
# Go toolchain; no amd64 ECR image, no docker-in-job). Race-detector
# unit-test + coverage gates are deliberately NOT duplicated here —
# those stay authoritative on amd64 ci.yml `Platform (Go)`. This lane
# is fast-feedback for the compile/vet/lint surface only. ----
- name: Setup Go
uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
with:
go-version: 'stable'
# cache:false — the self-hosted runner bind-mounts a persistent
# GOCACHE/GOMODCACHE (/var/cache/ci-go-{build,mod}); actions/cache is
# redundant and corrupts it by untarring over the bind mount ("File
# exists" -> "Failed to restore" -> partial cache -> linker/typecheck
# errors on heavy jobs, e.g. test -race link "too many errors" and
# go-arch-lint "without types"). Fleet sweep after the cp ci.yml find.
cache: false
- name: Go build + vet (workspace-server)
working-directory: workspace-server
run: |
go mod download
go build ./cmd/server
go vet ./...
- name: golangci-lint (workspace-server)
working-directory: workspace-server
run: |
go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.12.2
"$(go env GOPATH)/bin/golangci-lint" run --timeout 3m ./...
# ---- Shellcheck (container-independent: shellcheck binary only).
# Mirrors ci.yml `Shellcheck (E2E scripts)` bulk pass scope. ----
- name: Install shellcheck (arm64)
run: |
if ! command -v shellcheck >/dev/null 2>&1; then
echo "shellcheck not preinstalled on this self-hosted runner."
echo "Attempting Homebrew install (Mac arm64)."
brew install shellcheck || {
echo "::warning::shellcheck unavailable on runner; advisory shellcheck skipped."
exit 0
}
fi
shellcheck --version
- name: Shellcheck tests/e2e + infra/scripts
run: |
command -v shellcheck >/dev/null 2>&1 || { echo "skip"; exit 0; }
find tests/e2e infra/scripts -type f -name '*.sh' -print0 \
| xargs -0 shellcheck --severity=warning
# ---- Python lint/compile (container-independent: CPython only).
# Lint + import-compile surface; the authoritative pytest + coverage
# floors stay on amd64 ci.yml `Python Lint & Test`. ----
- name: Setup Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.11'
- name: Python byte-compile (workspace)
working-directory: workspace
run: |
python -m pip install --quiet ruff || true
python -m compileall -q .
if command -v ruff >/dev/null 2>&1; then
ruff check . || echo "::warning::ruff findings (advisory only)"
fi
- name: Advisory summary
if: always()
run: |
{
echo "## arm64 advisory fast-checks complete"
echo ""
echo "This lane is **advisory** — it does not gate merges."
echo "Authoritative required CI remains \`CI / all-required\`"
echo "on the amd64 pool (\`ci.yml\`, unchanged by this PR)."
} >> "$GITHUB_STEP_SUMMARY"
+1 -1
View File
@@ -74,7 +74,7 @@ jobs:
# queue now reads the required status contexts from BRANCH PROTECTION
# (status_check_contexts) so non-required governance reds (qa-review,
# security-review, sop-checklist when not branch-required,
# E2E Chat, Staging SaaS, ci-arm64-advisory) cannot block a merge.
# E2E Chat, Staging SaaS) cannot block a merge.
# If branch protection cannot be enumerated the queue HOLDS
# (fail-closed). REQUIRED_APPROVALS below is only a fallback used when
# branch protection does not specify required_approvals.
+1 -1
View File
@@ -30,7 +30,7 @@ name: lint-setup-go-cache
# (advisory). FOLLOW-UP: after core#2524 merges and main is clean for
# 3 days, flip continue-on-error -> false to make this a hard gate.
# This PR already removes the default-true hits the sweep PR does not
# touch (ci.yml, ci-arm64-advisory.yml, handlers-postgres-integration.yml,
# touch (ci.yml, handlers-postgres-integration.yml,
# weekly-platform-go.yml).
on:
@@ -1,137 +0,0 @@
name: Lint shellcheck (arm64 pilot)
# Mac-CI dual-track pilot (#233). ADDITIVE / NOT REQUIRED.
#
# Validates the arm64 self-hosted lane (no docker.sock, no privileged
# ops) before any required gate moves onto it.
#
# Runner label mapping (2026-05-22 fix): the actual Mac mini runner
# registered in this Gitea ships labels
# ["self-hosted","macos-self-hosted-arm64","arm64-darwin"]
# — no plain `arm64`. The earlier `runs-on: [self-hosted, arm64]`
# could not match any registered runner so every fire of this workflow
# was assigned task_id=0 / runner_id=NULL → Gitea cancelled it. The
# rows showed up as Cancelled in the action status feed (not Failed)
# but the lane never actually ran. Workflow now selects on
# `arm64-darwin` which is the canonical Mac-arm64 label per the
# Mac mini's registration (per internal#494 capability-honest labels).
#
# If we later want to add a Linux-arm64 runner to the same lane, add
# both labels to that runner's registration AND broaden the selector
# here — don't rename `arm64-darwin` (it's Mac-specific by design and
# `feedback_pc2_runner_labels_must_stay_narrow` rule applies).
#
# Pairs with internal#543 (RFC: Mac arm64 multi-arch runner-base) and
# internal#494 (multi-arch runner-base capability-honest labels).
# No paths: filter on purpose (feedback_path_filtered_workflow_cant_be_required).
on:
pull_request:
branches:
- main
- staging
push:
branches:
- main
permissions:
contents: read
jobs:
shellcheck-arm64:
name: shellcheck-arm64 (pilot)
runs-on: [self-hosted, arm64-darwin]
# NOT a required check; safe to sit pending until Mac runner is up.
# If the Mac runner has trouble pulling actions/checkout we fall
# back to a plain git clone (see step 'fallback clone').
timeout-minutes: 10
env:
GITHUB_SERVER_URL: https://git.moleculesai.app
steps:
- name: Identify runner
id: identify
continue-on-error: true
run: |
set -eu
echo "arch=$(uname -m)"
echo "kernel=$(uname -sr)"
echo "shell=$BASH_VERSION"
# Sanity: must actually be arm64. If amd64 sneaks in here,
# the job skips gracefully rather than hard-failing, because
# a mislabelled runner is an ops concern, not a code defect.
# Pilot lane must not make main red (#2146).
case "$(uname -m)" in
aarch64|arm64)
echo "arm64 confirmed"
echo "arm64=true" >> "$GITHUB_OUTPUT"
;;
*)
echo "ERROR: expected arm64, got $(uname -m) — label routing may be wrong"
echo "arm64=false" >> "$GITHUB_OUTPUT"
exit 1
;;
esac
- name: Checkout
if: steps.identify.outputs.arm64 == 'true'
uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Install shellcheck (arm64)
if: steps.identify.outputs.arm64 == 'true'
continue-on-error: true
run: |
set -eu
if command -v shellcheck >/dev/null 2>&1; then
echo "shellcheck already present: $(shellcheck --version | head -1)"
else
# Prefer apt if the runner base ships it; else download the
# correct platform binary (darwin vs linux).
if command -v apt-get >/dev/null 2>&1; then
sudo apt-get update -qq
sudo apt-get install -y --no-install-recommends shellcheck
else
SC_VER=v0.10.0
if [ "$(uname -s)" = "Darwin" ]; then
SC_PKG="shellcheck-${SC_VER}.darwin.aarch64.tar.xz"
else
SC_PKG="shellcheck-${SC_VER}.linux.aarch64.tar.xz"
fi
curl -fsSL "https://github.com/koalaman/shellcheck/releases/download/${SC_VER}/${SC_PKG}" \
| tar -xJf - --strip-components=1
sudo mv shellcheck /usr/local/bin/
fi
fi
shellcheck --version | head -2
- name: Run shellcheck on .gitea/scripts/*.sh
if: steps.identify.outputs.arm64 == 'true'
continue-on-error: true
run: |
set -eu
# Only the scripts we control under .gitea/scripts. Pilot
# scope is intentionally narrow — broaden in a follow-up
# once the lane is proven.
if ! command -v shellcheck >/dev/null 2>&1 || ! shellcheck --version >/dev/null 2>&1; then
echo "WARN: shellcheck not functional — skipping (pilot mode)"
exit 0
fi
# NOTE: macOS ships Bash 3.2 (Apple license), no `mapfile`
# (Bash 4+ builtin). Mac mini runner empirically failed at
# `mapfile: command not found` (run 79275 / task 145654).
# Use the portable `while read` pattern instead — works on
# both Bash 3.2 (macOS) and Bash 4+ (Linux).
TARGETS=()
while IFS= read -r f; do
TARGETS+=("$f")
done < <(find .gitea/scripts -maxdepth 2 -type f -name '*.sh' | sort)
if [ "${#TARGETS[@]}" -eq 0 ]; then
echo "No .sh files found under .gitea/scripts — nothing to check"
exit 0
fi
echo "Checking ${#TARGETS[@]} file(s):"
printf ' %s\n' "${TARGETS[@]}"
# SC1091 = couldn't follow non-constant source; expected for
# CI-time analysis without the full runtime layout.
shellcheck --severity=error --exclude=SC1091 "${TARGETS[@]}"