molecule-core/.github/workflows/publish-canvas-image.yml
Hongming Wang ca1d5741d5 fix(ci): bypass docker login + macOS Keychain for image publish
Six prior PRs (#273, #319, #322, #341, #484, #486) all kept calling
`docker login` and tried to coerce credsStore via increasingly elaborate
config tricks. None worked. The latest publish-canvas-image and
publish-platform-image runs on main are still failing with:

    error storing credentials - err: exit status 1,
    out: `User interaction is not allowed. (-25308)`

Verified locally on the runner host (2026-04-16): `docker login` on
macOS unconditionally writes credentials to osxkeychain after a
successful login, regardless of the config presented to it.

    # I wrote this:
    { "auths": {}, "credsStore": "", "credHelpers": {} }
    # After `docker login --config <dir> ghcr.io ...` succeeded:
    {
      "auths": { "ghcr.io": {} },        # empty — auth is in Keychain
      "credsStore": "osxkeychain"        # Docker rewrote it back
    }

So `--config` flag, DOCKER_CONFIG env var, credsStore="" etc. all share
the same fate: Docker re-enables osxkeychain after every successful
login. The Mac mini runner is a launchd user agent with a locked
Keychain, so storage fails with -25308.

This PR replaces the `docker login` invocation entirely. We write
`base64(user:pat)` directly into the disposable DOCKER_CONFIG's `auths`
map. `docker/build-push-action@v5` and the daemon honor the auths map
for push without ever calling `docker login`, so the Keychain is never
involved.

Same shape in both workflows:
- publish-canvas-image.yml — single registry (ghcr.io)
- publish-platform-image.yml — two registries (ghcr.io + registry.fly.io)
  Fly username remains literal "x".

Security:
- Token env vars never echoed. Heredoc writes the auth blob via
  `umask 077` (file mode 600). The temp config dir lives under
  RUNNER_TEMP and is reaped at job end.
- Diagnostics preserved (docker version + binary ls + registry keys
  only, no values) so future runner permission regressions remain
  visible without leaking secrets.

Equivalent to closed PR #464 — re-opening because main is still
broken (verified by inspecting the most recent failure). The closing
comment on #464 stated the issue was already addressed by #341, but
it isn't.
2026-04-16 09:25:20 -07:00

142 lines
6.0 KiB
YAML

name: publish-canvas-image
# Builds and pushes the canvas Docker image to GHCR whenever a commit lands
# on main that touches canvas code. Previously canvas changes were visible in
# CI (npm run build passed) but the live container was never updated —
# operators had to manually run `docker compose build canvas` each time.
#
# Mirror of publish-platform-image.yml, adapted for the Next.js canvas layer.
# See that workflow for inline notes on macOS Keychain isolation and QEMU.
on:
push:
branches: [main]
paths:
# Only rebuild when canvas source changes — saves GHA minutes on
# platform-only / docs-only / MCP-only merges.
- 'canvas/**'
- '.github/workflows/publish-canvas-image.yml'
# Manual trigger: use after a non-canvas merge that still needs a fresh
# image (e.g. a Dockerfile change lives outside the canvas/ tree).
workflow_dispatch:
inputs:
platform_url:
description: 'NEXT_PUBLIC_PLATFORM_URL baked into the bundle (default: http://localhost:8080)'
required: false
default: ''
ws_url:
description: 'NEXT_PUBLIC_WS_URL baked into the bundle (default: ws://localhost:8080/ws)'
required: false
default: ''
permissions:
contents: read
packages: write # required to push to ghcr.io/${{ github.repository_owner }}/*
env:
IMAGE_NAME: ghcr.io/molecule-ai/canvas
jobs:
build-and-push:
name: Build & push canvas image
runs-on: [self-hosted, macos, arm64]
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Configure GHCR auth (write auths map; do NOT call docker login)
# `docker login` on macOS unconditionally writes credentials to the
# osxkeychain credential helper, even when DOCKER_CONFIG/config.json
# declares `credsStore: ""` and even when invoked with `--config`.
# Verified locally 2026-04-16 — after a successful login, Docker
# rewrites the same config file to:
# { "auths": { "ghcr.io": {} }, "credsStore": "osxkeychain" }
# i.e. the auth lives in the Keychain, not the config file. The
# Mac mini runner is a launchd user agent with a locked Keychain,
# so storage fails with `User interaction is not allowed (-25308)`.
#
# Six prior PRs (#273, #319, #322, #341, #484, #486) all kept calling
# `docker login` and tried to coerce credsStore — none worked.
# The only reliable fix is to skip `docker login` entirely and write
# the auth string directly. `docker/build-push-action@v5` and the
# daemon honor the `auths` map for push without needing login.
shell: bash
env:
GHCR_USER: ${{ github.actor }}
GHCR_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
set -eu
mkdir -p "${RUNNER_TEMP}/docker-config"
AUTH=$(printf '%s:%s' "${GHCR_USER}" "${GHCR_TOKEN}" | base64)
umask 077
cat > "${RUNNER_TEMP}/docker-config/config.json" <<JSON
{ "auths": { "ghcr.io": { "auth": "${AUTH}" } } }
JSON
echo "DOCKER_CONFIG=${RUNNER_TEMP}/docker-config" >> "${GITHUB_ENV}"
# Diagnostics that don't leak the token.
echo "=== docker ==="
command -v docker || echo "(docker not in PATH)"
docker --version 2>&1 || true
ls -la /usr/local/bin/docker /opt/homebrew/bin/docker 2>&1 || true
echo "=== auths registries (no values) ==="
grep -o '"[a-zA-Z0-9.-]*\.io"' "${RUNNER_TEMP}/docker-config/config.json" || true
- name: Set up QEMU
# Apple-silicon runner building linux/amd64 images for x86 hosts.
uses: docker/setup-qemu-action@v3
with:
platforms: linux/amd64
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Compute tags
id: tags
shell: bash
run: |
echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
- name: Resolve build args
id: build_args
# Priority: workflow_dispatch input > repo secret > hardcoded default.
# NEXT_PUBLIC_* env vars are baked into the JS bundle at build time by
# Next.js — they cannot be changed at runtime without a full rebuild.
# For local docker-compose deployments the defaults (localhost:8080)
# work as-is; production deployments should set CANVAS_PLATFORM_URL
# and CANVAS_WS_URL as repository secrets.
#
# Inputs are passed via env vars (not direct ${{ }} interpolation) to
# prevent shell injection from workflow_dispatch string inputs.
shell: bash
env:
INPUT_PLATFORM_URL: ${{ github.event.inputs.platform_url }}
SECRET_PLATFORM_URL: ${{ secrets.CANVAS_PLATFORM_URL }}
INPUT_WS_URL: ${{ github.event.inputs.ws_url }}
SECRET_WS_URL: ${{ secrets.CANVAS_WS_URL }}
run: |
PLATFORM_URL="${INPUT_PLATFORM_URL:-${SECRET_PLATFORM_URL:-http://localhost:8080}}"
WS_URL="${INPUT_WS_URL:-${SECRET_WS_URL:-ws://localhost:8080/ws}}"
echo "platform_url=${PLATFORM_URL}" >> "$GITHUB_OUTPUT"
echo "ws_url=${WS_URL}" >> "$GITHUB_OUTPUT"
- name: Build & push canvas image to GHCR
uses: docker/build-push-action@v5
with:
context: ./canvas
file: ./canvas/Dockerfile
platforms: linux/amd64
push: true
build-args: |
NEXT_PUBLIC_PLATFORM_URL=${{ steps.build_args.outputs.platform_url }}
NEXT_PUBLIC_WS_URL=${{ steps.build_args.outputs.ws_url }}
tags: |
${{ env.IMAGE_NAME }}:latest
${{ env.IMAGE_NAME }}:sha-${{ steps.tags.outputs.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
labels: |
org.opencontainers.image.source=https://github.com/${{ github.repository }}
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.description=Molecule AI canvas (Next.js 15 + React Flow)