From 0b403aeeab55330a47329b034ee3baf81bd2e456 Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Wed, 15 Apr 2026 16:06:28 -0700
Subject: [PATCH] fix(ci): publish-platform-image keychain + path diagnostics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Every publish-platform-image run since the aa41947 self-hosted runner
migration has been failing with two runner-level issues that the
workflow now works around (keychain) or surfaces clearly (path):

1. "error storing credentials - err: exit status 1, out: 'User
   interaction is not allowed. (-25308)'"

   docker/login-action tries to persist the GHCR + Fly tokens in the
   macOS Keychain, but the Mac mini runner runs as a non-interactive
   launchd service without an unlocked desktop session, so keychain
   access raises -25308.

   Fix: set DOCKER_CONFIG to a per-run temp dir containing a plain
   config.json before the login step so credentials land in a file,
   not the keychain. This is the same trick the GitHub-hosted macOS
   runners use in docker action examples.

2. "Unexpected error attempting to determine if executable file exists
   '/usr/local/bin/docker': Error: EACCES: permission denied, stat
   '/usr/local/bin/docker'"

   Not a workflow bug: the runner literally can't read the Docker
   binary path. Adds a diagnostic step before QEMU/buildx setup that
   prints: PATH, `command -v docker`, `docker --version`, and `ls -la`
   on both /usr/local/bin/docker and /opt/homebrew/bin/docker.
   Surfacing these in the log means the next failure (if any) shows
   the actual problem instead of hiding behind a cryptic buildx error.

Does NOT fix the root cause of issue 2. That needs the user to SSH
into the Mac mini runner and reinstall / re-permission Docker Desktop
(or switch to Colima/OrbStack). The diagnostic output will tell us
exactly which path is broken.

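For reference, the keychain workaround from issue 1 can be tried
outside CI with a minimal sketch like the one below. It assumes a
local stand-in for RUNNER_TEMP (a mktemp directory, which is not part
of the workflow) and only checks the resulting file. It does not prove
that `docker login` skips the keychain on a given runner, since that
may also depend on which credential helpers the Docker CLI finds.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: outside GitHub Actions RUNNER_TEMP is unset, so fall back
# to a throwaway directory as a hypothetical stand-in.
RUNNER_TEMP="${RUNNER_TEMP:-$(mktemp -d)}"

# Create an isolated Docker config dir with an empty auths map and no
# credsStore key.
mkdir -p "${RUNNER_TEMP}/docker-config"
echo '{"auths": {}}' > "${RUNNER_TEMP}/docker-config/config.json"
export DOCKER_CONFIG="${RUNNER_TEMP}/docker-config"

# Report whether the isolated config names a credential helper.
if grep -q '"credsStore"' "${DOCKER_CONFIG}/config.json"; then
  echo "credsStore present in ${DOCKER_CONFIG}/config.json"
else
  echo "no credsStore configured in ${DOCKER_CONFIG}/config.json"
fi
```

In the workflow the export is replaced by a write to GITHUB_ENV so
that later steps, including the login step, see the same DOCKER_CONFIG.
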
The 20+ queued CI runs from `ci.yml` are unrelated to this PR. They
are stuck because the self-hosted runner has severely degraded queue
throughput (runs wait 2+ hours before being picked up). That's a
separate runner-health issue tracked as a user action in the triage
report.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .github/workflows/publish-platform-image.yml | 24 ++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/.github/workflows/publish-platform-image.yml b/.github/workflows/publish-platform-image.yml
index eed94c3e..6c530584 100644
--- a/.github/workflows/publish-platform-image.yml
+++ b/.github/workflows/publish-platform-image.yml
@@ -37,6 +37,30 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4

+      - name: Isolate Docker config (skip keychain)
+        # The Mac mini self-hosted runner runs as a non-interactive
+        # launchd service; docker/login-action's default credential store
+        # is the macOS Keychain, which raises
+        #     error storing credentials - err: exit status 1, out:
+        #     `User interaction is not allowed. (-25308)`
+        # without an unlocked desktop session. Point DOCKER_CONFIG at a
+        # per-run temp dir so the login step writes a plain config.json
+        # that never touches the keychain. Plus diagnostics: print the
+        # docker path so a future EACCES on /usr/local/bin/docker
+        # surfaces in the log instead of via a cryptic docker-login
+        # failure mid-step.
+        shell: bash
+        run: |
+          set -euo pipefail
+          mkdir -p "${RUNNER_TEMP}/docker-config"
+          echo '{"auths": {}}' > "${RUNNER_TEMP}/docker-config/config.json"
+          echo "DOCKER_CONFIG=${RUNNER_TEMP}/docker-config" >> "${GITHUB_ENV}"
+          echo "=== Runner docker diagnostics ==="
+          echo "PATH=$PATH"
+          command -v docker || echo "(docker not in PATH — the runner is missing the Docker CLI or it's not symlinked to a visible location)"
+          docker --version 2>&1 || true
+          ls -la /usr/local/bin/docker /opt/homebrew/bin/docker 2>&1 || true
+
       - name: Set up QEMU
         # Required on the Apple-silicon self-hosted runner — Fly tenant machines
         # pull linux/amd64, and buildx needs binfmt handlers in Docker Desktop's