Commit Graph

5 Commits

Author SHA1 Message Date
Hongming Wang
3ff40c4b68 chore(ci): migrate all jobs to self-hosted macOS arm64 runner
* chore(ci): migrate all jobs to self-hosted macOS arm64 runner

Switches every job in `ci.yml` and `publish-platform-image.yml` from
`ubuntu-latest` to `[self-hosted, macos, arm64]` to avoid GitHub-hosted
minute rate limits. All jobs run on a single Apple-silicon self-hosted
runner registered at the Molecule-AI org level.

Notable non-trivial adaptations (macOS runners can't use `services:` and
some GHA marketplace actions are Linux-only):

- e2e-api: `services: postgres/redis` replaced with inline `docker run`
  steps. Ports remapped to 15432/16379 to avoid collision with anything
  the host may already expose on the standard ports. Containers are named
  (`molecule-ci-postgres` / `molecule-ci-redis`) and torn down in an
  `if: always()` step. Postgres readiness is still gated on pg_isready
  via `docker exec`.
- shellcheck: `ludeeus/action-shellcheck` is a Docker action, Linux-only.
  Replaced with a direct `shellcheck` invocation (pre-installed on the
  runner) that scans `tests/e2e/*.sh` with `--severity=warning`.
- publish-platform-image: added `docker/setup-qemu-action@v3` and an
  explicit `platforms: linux/amd64` on both `docker/build-push-action`
  invocations. The runner is arm64 but Fly tenant machines pull amd64,
  so QEMU-emulated cross-arch builds are required. GHA cache-from/cache-to
  behavior is unchanged.

Runner prereqs (one-time host setup):
- Docker Desktop installed and running (for e2e-api + image publish)
- `shellcheck` on PATH
- `docker` on PATH
- Go / Node / gh / Python are installed via setup-* actions per job

* fix(ci): set AGENT_TOOLSDIRECTORY for python-lint on self-hosted runner

setup-python@v5 defaults to /Users/runner/hostedtoolcache which doesn't
exist on the hongming-claw self-hosted runner. AGENT_TOOLSDIRECTORY tells
the action to use a writable path under the runner user's home directory.

Fixes the only failing job in CI run 24469156329 on PR #186.

---------

Co-authored-by: Hongming Wang <HongmingWang-Rabbit@users.noreply.github.com>
2026-04-15 10:48:27 -07:00
Hongming Wang
6f785f0b5a fix(ci): revert Fly registry username to 'x' — 'molecule-ai' gets 401
Post-mortem on the failed publish-platform-image run on main (PR #82):

Fly's Docker registry requires username EXACTLY equal to "x". My
code-review "readability fix" changing it to "molecule-ai" caused
every push to return 401 Unauthorized. Verified locally:

  echo $FLY_API_TOKEN | docker login registry.fly.io -u x --password-stdin
  → Login Succeeded

  echo $FLY_API_TOKEN | docker login registry.fly.io -u molecule-ai --password-stdin
  → 401 Unauthorized

Lesson: don't second-guess docs that specify a literal value. Comment
now says "MUST be literal 'x'" with a 2026-04-15 verification note to
prevent future regressions.

Code-review process improvement: when reviewing a change against a
vendor API, prefer "preserve exact doc-specified values" over readability
suggestions. Logged as a cron-learning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:21:53 -07:00
Hongming Wang
855d423f6c review: split push steps, runbook for secret rotation, username clarity
Addresses PR #82 code review: 🟡×3 + 🔵×5.

- Fly registry login username: 'x' → 'molecule-ai' + explanatory comment.
- Build & push split into two steps (GHCR / Fly registry) so a single-
  registry outage can't fail the other. Second step uses 'if: always()'
  to ensure Fly mirror runs even if GHCR push flakes.
- docs/runbooks/saas-secrets.md: full secret map + rotation procedures
  for every SaaS credential, with danger-case callouts. Documents the
  coupled FLY_API_TOKEN (lives in GHA secret AND fly secrets — must be
  rotated in both).
- CLAUDE.md: new 'SaaS ops' section linking to the runbook.
2026-04-14 17:09:11 -07:00
Hongming Wang
b811b47334 feat(ci): mirror platform image to registry.fly.io/molecule-tenant
Keeps ghcr.io/molecule-ai/platform private (per CEO direction — open-
source when full SaaS ships) while still letting the private control
plane's Fly provisioner boot tenant machines: Fly auto-authenticates
same-org machines against registry.fly.io, no per-tenant pull
credentials to wire.

Workflow now logs into both GHCR (using built-in GITHUB_TOKEN) and
Fly registry (using FLY_API_TOKEN secret) and pushes the same image to
four tags total:
- ghcr.io/molecule-ai/platform:latest
- ghcr.io/molecule-ai/platform:sha-<short>
- registry.fly.io/molecule-tenant:latest
- registry.fly.io/molecule-tenant:sha-<short>

Secret added via `gh secret set FLY_API_TOKEN` on the public repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:05:36 -07:00
Hongming Wang
035287df38 feat(ci): publish-platform-image workflow → ghcr.io/molecule-ai/platform
Phase B.2 companion to the private molecule-controlplane provisioner PR.
On every push to main that touches platform/**, builds platform/Dockerfile
and pushes to GHCR with two tags:

- :latest              (floating, always main's tip)
- :sha-<short-commit>  (immutable, pin-friendly)

Cache via GitHub Actions cache (cache-from: type=gha). Workflow_dispatch
trigger so we can re-publish after a docs-only merge if needed.

The private molecule-controlplane sets TENANT_IMAGE=ghcr.io/molecule-ai/platform:<tag>
and the provisioner creates each tenant Fly Machine from this image. Staying
on the same base image across tenants keeps upgrades atomic.

CLAUDE.md updated to document the new workflow in the CI pipeline section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:37:49 -07:00