molecule-core

Author	SHA1	Message	Date
Hongming Wang	aa2a283835	fix(ci): explicitly disable osxkeychain credsStore for self-hosted runner #273 tried to fix the macOS Keychain -25308 error by pointing DOCKER_CONFIG at a per-run temp dir with `{"auths": {}}`. That was necessary but not sufficient: Docker on macOS inherits `osxkeychain` as the default credsStore even when config.json doesn't declare one (comes from Docker Desktop's bundled binding), so the login-action still tried to call /usr/local/bin/docker-credential-osxkeychain which fails with -25308 from the non-interactive launchd session. Evidence: after #273, publish-platform-image still failed on every main merge with: error saving credentials: error storing credentials - err: exit status 1, out: `User interaction is not allowed. (-25308)` Fix: write a config.json that explicitly sets `credsStore: ""` and clears `credHelpers`, forcing Docker to store creds in the inline `auths` map of this disposable config.json instead of reaching for the keychain. Also print config.json at diagnostic time so a future regression surfaces in the log instead of at login. No runtime / test impact — this only changes what the runner writes to the workflow's temp DOCKER_CONFIG directory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:20:06 -07:00
Hongming Wang	63934ab487	fix(ci): publish-platform-image keychain + path diagnostics Every publish-platform-image run since the `3ff40c4` self-hosted runner migration has been failing with two runner-level issues that the workflow now works around (keychain) or surfaces clearly (path): 1. "error storing credentials - err: exit status 1, out: 'User interaction is not allowed. (-25308)'" docker/login-action tries to persist the GHCR + Fly tokens in the macOS Keychain, but the Mac mini runner runs as a non-interactive launchd service without an unlocked desktop session — keychain access raises -25308. Fix: set DOCKER_CONFIG to a per-run temp dir containing a plain config.json before the login step so credentials land in a file, not the keychain. This is the same trick the GitHub-hosted macos runners use in docker action examples. 2. "Unexpected error attempting to determine if executable file exists '/usr/local/bin/docker': Error: EACCES: permission denied, stat '/usr/local/bin/docker'" Not a workflow bug — the runner literally can't read the Docker binary path. Adds a diagnostic step before QEMU/buildx setup that prints: PATH, `command -v docker`, `docker --version`, and `ls -la` on both /usr/local/bin/docker and /opt/homebrew/bin/docker. Surfacing these in the log means the next failure (if any) shows the actual problem instead of hiding behind a cryptic buildx error. Does NOT fix the root cause of #2 — that needs the user to SSH into the Mac mini runner and reinstall / re-permission Docker Desktop (or switch to Colima/OrbStack). The diagnostic output will tell us exactly which path is broken. The 20+ queued CI runs from `ci.yml` are unrelated to this PR — they are stuck because the self-hosted runner has severely degraded queue throughput (runs wait 2+ hours before being picked up). That's a separate runner-health issue tracked as a user action in the triage report. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:06:28 -07:00
Hongming Wang	3ff40c4b68	chore(ci): migrate all jobs to self-hosted macOS arm64 runner * chore(ci): migrate all jobs to self-hosted macOS arm64 runner Switches every job in `ci.yml` and `publish-platform-image.yml` from `ubuntu-latest` to `[self-hosted, macos, arm64]` to avoid GitHub-hosted minute rate limits. All jobs run on a single Apple-silicon self-hosted runner registered at the Molecule-AI org level. Notable non-trivial adaptations (macOS runners can't use `services:` and some GHA marketplace actions are Linux-only): - e2e-api: `services: postgres/redis` replaced with inline `docker run` steps. Ports remapped to 15432/16379 to avoid collision with anything the host may already expose on the standard ports. Containers are named (`molecule-ci-postgres` / `molecule-ci-redis`) and torn down in an `if: always()` step. Postgres readiness is still gated on pg_isready via `docker exec`. - shellcheck: `ludeeus/action-shellcheck` is a Docker action, Linux-only. Replaced with a direct `shellcheck` invocation (pre-installed on the runner) that scans `tests/e2e/*.sh` with `--severity=warning`. - publish-platform-image: added `docker/setup-qemu-action@v3` and an explicit `platforms: linux/amd64` on both `docker/build-push-action` invocations. The runner is arm64 but Fly tenant machines pull amd64, so QEMU-emulated cross-arch builds are required. GHA cache-from/cache-to behavior is unchanged. Runner prereqs (one-time host setup): - Docker Desktop installed and running (for e2e-api + image publish) - `shellcheck` on PATH - `docker` on PATH - Go / Node / gh / Python are installed via setup-* actions per job * fix(ci): set AGENT_TOOLSDIRECTORY for python-lint on self-hosted runner setup-python@v5 defaults to /Users/runner/hostedtoolcache which doesn't exist on the hongming-claw self-hosted runner. AGENT_TOOLSDIRECTORY tells the action to use a writable path under the runner user's home directory. Fixes the only failing job in CI run 24469156329 on PR #186. 
--------- Co-authored-by: Hongming Wang <HongmingWang-Rabbit@users.noreply.github.com>	2026-04-15 10:48:27 -07:00
Hongming Wang	6f785f0b5a	fix(ci): revert Fly registry username to 'x' — 'molecule-ai' gets 401 Post-mortem on the failed publish-platform-image run on main (PR #82): Fly's Docker registry requires username EXACTLY equal to "x". My code-review "readability fix" changing it to "molecule-ai" caused every push to return 401 Unauthorized. Verified locally: echo $FLY_API_TOKEN | docker login registry.fly.io -u x --password-stdin → Login Succeeded echo $FLY_API_TOKEN | docker login registry.fly.io -u molecule-ai --password-stdin → 401 Unauthorized Lesson: don't second-guess docs that specify a literal value. Comment now says "MUST be literal 'x'" with a 2026-04-15 verification note to prevent future regressions. Code-review process improvement: when reviewing a change against a vendor API, prefer "preserve exact doc-specified values" over readability suggestions. Logged as a cron-learning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:21:53 -07:00
Hongming Wang	855d423f6c	review: split push steps, runbook for secret rotation, username clarity Addresses PR #82 code review: 🟡×3 + 🔵×5. - Fly registry login username: 'x' → 'molecule-ai' + explanatory comment. - Build & push split into two steps (GHCR / Fly registry) so a single- registry outage can't fail the other. Second step uses 'if: always()' to ensure Fly mirror runs even if GHCR push flakes. - docs/runbooks/saas-secrets.md: full secret map + rotation procedures for every SaaS credential, with danger-case callouts. Documents the coupled FLY_API_TOKEN (lives in GHA secret AND fly secrets — must be rotated in both). - CLAUDE.md: new 'SaaS ops' section linking to the runbook.	2026-04-14 17:09:11 -07:00
Hongming Wang	b811b47334	feat(ci): mirror platform image to registry.fly.io/molecule-tenant Keeps ghcr.io/molecule-ai/platform private (per CEO direction — open- source when full SaaS ships) while still letting the private control plane's Fly provisioner boot tenant machines: Fly auto-authenticates same-org machines against registry.fly.io, no per-tenant pull credentials to wire. Workflow now logs into both GHCR (using built-in GITHUB_TOKEN) and Fly registry (using FLY_API_TOKEN secret) and pushes the same image to four tags total: - ghcr.io/molecule-ai/platform:latest - ghcr.io/molecule-ai/platform:sha-<short> - registry.fly.io/molecule-tenant:latest - registry.fly.io/molecule-tenant:sha-<short> Secret added via `gh secret set FLY_API_TOKEN` on the public repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:05:36 -07:00
Hongming Wang	035287df38	feat(ci): publish-platform-image workflow → ghcr.io/molecule-ai/platform Phase B.2 companion to the private molecule-controlplane provisioner PR. On every push to main that touches platform/**, builds platform/Dockerfile and pushes to GHCR with two tags: - :latest (floating, always main's tip) - :sha-<short-commit> (immutable, pin-friendly) Cache via GitHub Actions cache (cache-from: type=gha). Workflow_dispatch trigger so we can re-publish after a docs-only merge if needed. The private molecule-controlplane sets TENANT_IMAGE=ghcr.io/molecule-ai/platform:<tag> and the provisioner creates each tenant Fly Machine from this image. Staying on the same base image across tenants keeps upgrades atomic. CLAUDE.md updated to document the new workflow in the CI pipeline section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 16:37:49 -07:00
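
The keychain workaround described in commits 63934ab487 and aa2a283835 can be sketched roughly as follows. This is a minimal illustration based only on the commit messages, not the workflow's actual step: the use of `mktemp -d` for the per-run directory and the exact JSON layout are assumptions, and whether an empty `credsStore` is enough to keep Docker away from the keychain is the commit's claim rather than something verified here.

```shell
#!/bin/sh
# Sketch (assumed layout): point Docker at a disposable, per-run config
# directory so that `docker login` does not reach for the macOS Keychain.
set -eu

# Assumption: a fresh temp dir per run; the real workflow may use a
# runner-provided temp path instead.
DOCKER_CONFIG="$(mktemp -d)"
export DOCKER_CONFIG

# Per the commit message: set credsStore to "" and clear credHelpers so
# credentials are meant to land in the inline "auths" map of this file.
cat > "$DOCKER_CONFIG/config.json" <<'EOF'
{
  "auths": {},
  "credsStore": "",
  "credHelpers": {}
}
EOF

# Print the config at diagnostic time so a regression is visible in the
# log before the login step runs.
cat "$DOCKER_CONFIG/config.json"
```

Because the directory is created per run and never reused, any tokens written into `auths` are intended to live only as long as that temp directory does.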

7 Commits
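
The four image tags listed in commit b811b47334 could be derived along these lines. This is a hedged sketch: the 7-character length of the short SHA and the placeholder commit hash are assumptions made for illustration, since the log only says `sha-<short>`.

```shell
#!/bin/sh
# Sketch: build the tag list that the publish workflow pushes to both
# registries (names taken from the commit message).
set -eu

# Assumption: GITHUB_SHA is provided by GitHub Actions; the fallback is a
# placeholder so the snippet also runs locally.
GITHUB_SHA="${GITHUB_SHA:-0123456789abcdef0123456789abcdef01234567}"

# Assumption: "short" means the first 7 characters of the commit SHA.
short_sha="$(printf '%s' "$GITHUB_SHA" | cut -c1-7)"

tags=""
for repo in ghcr.io/molecule-ai/platform registry.fly.io/molecule-tenant; do
  for tag in latest "sha-$short_sha"; do
    # One fully qualified image reference per line.
    tags="${tags}${repo}:${tag}
"
  done
done

printf '%s' "$tags"
```

The `latest` tag floats with the tip of main, while the `sha-` tag is immutable and can be pinned, as described in commit 035287df38.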