fix(ci): publish-platform-image keychain + path diagnostics

Every publish-platform-image run since the aa41947 self-hosted runner
migration has been failing with two runner-level issues that the
workflow now works around (keychain) or surfaces clearly (path):

1. "error storing credentials - err: exit status 1, out:
   'User interaction is not allowed. (-25308)'"

   docker/login-action tries to persist the GHCR + Fly tokens in the
   macOS Keychain, but the Mac mini runner runs as a non-interactive
   launchd service without an unlocked desktop session — keychain
   access raises -25308. Fix: set DOCKER_CONFIG to a per-run temp dir
   containing a plain config.json before the login step so credentials
   land in a file, not the keychain. This mirrors a workaround often
   used for Docker logins on headless macOS CI runners.

2. "Unexpected error attempting to determine if executable file
   exists '/usr/local/bin/docker': Error: EACCES: permission denied,
   stat '/usr/local/bin/docker'"

   Not a workflow bug: the runner user cannot stat the Docker binary
   path. The new config-isolation step, which runs before QEMU/buildx
   setup, also prints PATH, `command -v docker`, `docker --version`,
   and `ls -la` on both /usr/local/bin/docker and
   /opt/homebrew/bin/docker. Surfacing these in the log means the next
   failure (if any) shows the actual problem instead of hiding behind
   a cryptic buildx error.
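
For item 1, one way to confirm that the login really wrote to the file
rather than a helper is to inspect the per-run config.json after the
login step. The sketch below is not part of this commit;
`docker_config_store` is a hypothetical helper, and it assumes the
Docker CLI records any credential helper it uses under the
`credsStore` or `credHelpers` key of config.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): classify a Docker config file.
# Assumption: a credential helper in use shows up as "credsStore" or
# "credHelpers" in config.json; plain file storage only has "auths".
docker_config_store() {
  local config_file="$1"
  if [ ! -r "$config_file" ]; then
    echo "missing"   # file absent or unreadable
  elif grep -Eq '"(credsStore|credHelpers)"[[:space:]]*:' "$config_file"; then
    echo "helper"    # a credential helper (for example the keychain) is named
  else
    echo "file"      # credentials, if any, live in the file itself
  fi
}

# Possible use in a step that runs after the login step:
#   docker_config_store "${DOCKER_CONFIG}/config.json"
```

If this reports `helper` after a login, the keychain may still be in
play and the isolation would need another look.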

Does NOT fix the root cause of item 2: that needs the user to SSH into
the Mac mini runner and reinstall / re-permission Docker Desktop
(or switch to Colima/OrbStack). The diagnostic output should show
which of the two paths is broken.
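
When following up on the runner, note that EACCES from stat generally
means a directory somewhere along the path denies search permission to
the calling user, so `ls -la` on the binary alone may fail without
saying which directory is at fault. A minimal sketch of a per-component
check, run as the runner user, follows; `path_chain` and
`diagnose_path` are hypothetical names, not part of the workflow, and a
symlink target (if any) would need the same treatment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for manual debugging on the runner.

# Print the path and each of its ancestors, outermost first, one per line.
path_chain() {
  local path="$1"
  local chain="$path"
  while [ "$path" != "/" ] && [ "$path" != "." ]; do
    path="$(dirname "$path")"
    chain="${path}
${chain}"
  done
  printf '%s\n' "$chain"
}

# Show owner and mode of every component; keep going when one is denied
# so the first failing component is visible in the output.
diagnose_path() {
  local entry
  path_chain "$1" | while IFS= read -r entry; do
    ls -ld "$entry" 2>&1 || true
  done
}

# Possible use:
#   diagnose_path /usr/local/bin/docker
#   diagnose_path /opt/homebrew/bin/docker
```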

The 20+ queued CI runs from `ci.yml` are unrelated to this PR — they
are stuck because the self-hosted runner has severely degraded queue
throughput (runs wait 2+ hours before being picked up). That's a
separate runner-health issue tracked as a user action in the triage
report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hongming Wang 2026-04-15 16:06:28 -07:00
parent 2afd65104d
commit 0b403aeeab

@@ -37,6 +37,30 @@ jobs:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Isolate Docker config (skip keychain)
        # The Mac mini self-hosted runner runs as a non-interactive
        # launchd service; docker/login-action's default credential store
        # is the macOS Keychain, which raises
        #   error storing credentials - err: exit status 1, out:
        #   `User interaction is not allowed. (-25308)`
        # without an unlocked desktop session. Point DOCKER_CONFIG at a
        # per-run temp dir so the login step writes a plain config.json
        # that never touches the keychain. Plus diagnostics: print the
        # docker path so a future EACCES on /usr/local/bin/docker
        # surfaces in the log instead of via a cryptic docker-login
        # failure mid-step.
        shell: bash
        run: |
          set -euo pipefail
          mkdir -p "${RUNNER_TEMP}/docker-config"
          echo '{"auths": {}}' > "${RUNNER_TEMP}/docker-config/config.json"
          echo "DOCKER_CONFIG=${RUNNER_TEMP}/docker-config" >> "${GITHUB_ENV}"
          echo "=== Runner docker diagnostics ==="
          echo "PATH=$PATH"
          command -v docker || echo "(docker not in PATH — the runner is missing the Docker CLI or it's not symlinked to a visible location)"
          docker --version 2>&1 || true
          ls -la /usr/local/bin/docker /opt/homebrew/bin/docker 2>&1 || true
      - name: Set up QEMU
        # Required on the Apple-silicon self-hosted runner — Fly tenant machines
        # pull linux/amd64, and buildx needs binfmt handlers in Docker Desktop's