Merge pull request #82 from Molecule-AI/feat/mirror-to-fly-registry
feat(ci): mirror platform image to registry.fly.io/molecule-tenant
commit 31fca5ea6e
.github/workflows/publish-platform-image.yml
@@ -23,6 +23,12 @@ permissions:
 env:
   # GHCR accepts mixed-case, but most tooling lowercases; keep us consistent.
   IMAGE_NAME: ghcr.io/molecule-ai/platform
+  # Fly registry mirror: tenant machines provisioned by the private
+  # `molecule-controlplane` pull from here (private GHCR image can't be
+  # pulled by Fly machines without auth plumbing we don't want to add).
+  # Fly auto-authenticates same-org machines against registry.fly.io, so
+  # mirroring keeps GHCR private while tenants still boot.
+  FLY_IMAGE_NAME: registry.fly.io/molecule-tenant

 jobs:
   build-and-push:
@@ -43,6 +49,19 @@ jobs:
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}

+      - name: Log in to Fly registry
+        # Fly's registry is entirely token-auth: username is ignored, password
+        # must be a valid FLY_API_TOKEN. We pass "molecule-ai" as a human-
+        # readable placeholder so this step is obvious to future readers.
+        # Rotation: see docs/runbooks/saas-secrets.md. FLY_API_TOKEN lives in
+        # two places (GitHub Actions secret here + `fly secrets` on molecule-cp)
+        # and MUST be updated in both on rotation.
+        uses: docker/login-action@v3
+        with:
+          registry: registry.fly.io
+          username: molecule-ai
+          password: ${{ secrets.FLY_API_TOKEN }}
+
       - name: Compute tags
         id: tags
         # Emit two tags per build: `latest` (floating, always the main tip)
@@ -51,7 +70,11 @@ jobs:
         run: |
           echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"

-      - name: Build & push
+      - name: Build & push to GHCR
+        # Split from the Fly mirror so a registry.fly.io outage doesn't block
+        # GHCR (or vice versa); each registry's failure mode is isolated.
+        # GHA cache is shared because both steps re-use the same Dockerfile
+        # context + build args.
         uses: docker/build-push-action@v5
         with:
           context: ./platform
@@ -66,3 +89,22 @@ jobs:
             org.opencontainers.image.source=https://github.com/${{ github.repository }}
             org.opencontainers.image.revision=${{ github.sha }}
             org.opencontainers.image.description=Molecule AI tenant platform (one instance per org)
+
+      - name: Build & push to Fly registry
+        # Continues even if GHCR push failed: `if: always()` ensures the
+        # private control plane's tenant-image mirror lands regardless of
+        # any GHCR-side flakiness.
+        if: always()
+        uses: docker/build-push-action@v5
+        with:
+          context: ./platform
+          file: ./platform/Dockerfile
+          push: true
+          tags: |
+            ${{ env.FLY_IMAGE_NAME }}:latest
+            ${{ env.FLY_IMAGE_NAME }}:sha-${{ steps.tags.outputs.sha }}
+          cache-from: type=gha
+          labels: |
+            org.opencontainers.image.source=https://github.com/${{ github.repository }}
+            org.opencontainers.image.revision=${{ github.sha }}
+            org.opencontainers.image.description=Molecule AI tenant platform (one instance per org)

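The tag scheme in the workflow diff above can be reproduced locally to see what a given commit would publish to the Fly mirror. This is a minimal sketch, not part of the workflow: `fly_tags` is a hypothetical helper name, the SHA in the example is made up, and the image name and `sha-` prefix are copied from the diff.

```shell
#!/usr/bin/env bash
# Sketch: print the two Fly mirror tags the workflow emits for a commit.
# fly_tags is a hypothetical helper, not part of the repository.
set -eu

FLY_IMAGE_NAME="registry.fly.io/molecule-tenant"

fly_tags() {
  # Floating tag: always points at the most recent build.
  printf '%s:latest\n' "$FLY_IMAGE_NAME"
  # Pinned tag: first 7 characters of the commit SHA, equivalent to the
  # ${GITHUB_SHA::7} truncation in the "Compute tags" step.
  printf '%s:sha-%.7s\n' "$FLY_IMAGE_NAME" "$1"
}

# Example with a made-up 40-character SHA.
fly_tags "0123456789abcdef0123456789abcdef01234567"
```

Pinning tenants to the `sha-` tag rather than `latest` would make it possible to tell which build a machine booted from, at the cost of updating the pin on every release.
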
@@ -14,6 +14,15 @@ overlap / differentiation / terminology-collision notes. Cross-referenced
 from `PLAN.md` and `README.md`; it's the canonical starting point for
 "what else is out there."

+## SaaS ops
+
+When rotating SaaS credentials (Fly / Neon / Upstash / envelope key), read
+**`docs/runbooks/saas-secrets.md`** first. It documents which secrets live
+in multiple places (e.g. `FLY_API_TOKEN` in both GitHub Actions and `fly
+secrets` on `molecule-cp`), the correct rotation order, and danger cases,
+notably `SECRETS_ENCRYPTION_KEY`, which cannot be rotated without a data
+migration until Phase H lands KMS envelope encryption.
+
 ## Agent operating rules (auto-loaded, read first)

 The following are project-level rules that override default behavior. They

docs/runbooks/saas-secrets.md (new file)
@@ -0,0 +1,102 @@
# SaaS secret rotation runbook

This runbook records where each secret lives, why it lives there, and the
**full rotation procedure**, so that a partial update doesn't silently break
production.

## Secret map

| Secret | Location(s) | Purpose |
|---|---|---|
| `FLY_API_TOKEN` | **(a)** `molecule-monorepo` GitHub Actions secret (push image to `registry.fly.io/molecule-tenant`) + **(b)** `fly secrets` on `molecule-cp` app (control plane creates + deletes tenant Fly Machines) | Registry push (a) and any Fly Machines API call (b) |
| `NEON_API_KEY` | `fly secrets` on `molecule-cp` | Create + delete tenant Neon branches |
| `DATABASE_URL` | `fly secrets` on `molecule-cp` | Control-plane Postgres connection (Neon `cool-sea-89357706`) |
| `TENANT_REDIS_URL` | `fly secrets` on `molecule-cp` | Injected into every tenant container as `REDIS_URL` |
| `SECRETS_ENCRYPTION_KEY` | `fly secrets` on `molecule-cp` | AES-256 key wrapping tenant DB/Redis URLs in `org_instances` (provisioner + tenant use this) |
| `GITHUB_TOKEN` | Built-in GitHub Actions token | GHCR push; issued per workflow job, so no manual rotation |

## Coupled secrets: MUST rotate together

`FLY_API_TOKEN` is the one secret duplicated across systems. Updating **only
one** location and then revoking the old token causes **silent** breakage in
whichever location still holds the old token:

- **(a) GHA left stale** (only Fly secrets updated) → image publish workflow fails, but no alert fires; control plane keeps provisioning from the stale `latest` tag.
- **(b) Fly secrets left stale** (only GHA updated) → control plane's Fly API calls start erroring (`401`), tenant provisioning fails, but image publishes keep succeeding so everything *looks* fine on the build side.

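Because a failed publish raises no alert, one way to surface it is to poll the conclusion of the most recent publish run. The sketch below is illustrative only: it assumes `gh` is installed and authenticated, `classify_publish` and `latest_conclusion` are hypothetical helper names, and the status messages are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: turn the latest publish run's conclusion into a one-line status.
set -eu

# Map a workflow-run conclusion to a status line. Hypothetical helper.
classify_publish() {
  case "$1" in
    success) echo "ok: last publish succeeded" ;;
    failure) echo "ALERT: last publish failed (a stale FLY_API_TOKEN is one possible cause)" ;;
    "")      echo "pending: last publish has not finished" ;;
    *)       echo "check: last publish concluded '$1'" ;;
  esac
}

# Fetch the conclusion of the most recent run. Needs network access and
# gh authentication, so it only runs when the script is called with --live.
latest_conclusion() {
  gh run list --repo Molecule-AI/molecule-monorepo \
    --workflow publish-platform-image.yml --limit 1 \
    --json conclusion --jq '.[0].conclusion'
}

if [ "${1:-}" = "--live" ]; then
  classify_publish "$(latest_conclusion)"
fi
```

Run with `--live` from a scheduled job or by hand after a rotation. It covers only location (a); a stale token in (b) shows up as `401` errors in the control plane logs instead.
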
## Rotation procedure: FLY_API_TOKEN

1. Generate a new token:
   ```
   flyctl tokens create deploy --name molecule-cp-rotation-$(date +%Y%m%d)
   ```
2. Update **both** locations (order matters: Fly secrets first, then GHA):
   ```
   # (b) Fly secrets: triggers zero-downtime redeploy
   flyctl secrets set --app molecule-cp FLY_API_TOKEN='FlyV1 fm2_...'

   # (a) GitHub Actions secret: next workflow run uses new token
   echo 'FlyV1 fm2_...' | gh secret set FLY_API_TOKEN --repo Molecule-AI/molecule-monorepo
   ```
3. Verify:
   ```
   # Control plane can reach Fly API:
   curl https://molecule-cp.fly.dev/health
   # Trigger image publish (dispatches workflow, pushes to both registries):
   gh workflow run publish-platform-image.yml --repo Molecule-AI/molecule-monorepo
   gh run list --repo Molecule-AI/molecule-monorepo --workflow publish-platform-image.yml --limit 1
   ```
4. Revoke the old token:
   ```
   flyctl tokens list
   flyctl tokens revoke <id-of-old-token>
   ```

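Step 2 above could be wrapped in a small helper so that both locations are always updated together and in the documented order. This is a sketch under stated assumptions, not a script from the repository: `rotate_fly_token` and the `DRY_RUN` variable are hypothetical, and dry-run mode is the default so that sourcing or testing the helper never touches Fly or GitHub.

```shell
#!/usr/bin/env bash
# Sketch: apply a new FLY_API_TOKEN to both locations, Fly secrets first.
# rotate_fly_token and DRY_RUN are hypothetical names.
set -eu

rotate_fly_token() {
  local token="$1"
  if [ -z "$token" ]; then
    echo "error: empty token" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the plan without revealing the token or touching either system.
    echo "would run: flyctl secrets set --app molecule-cp FLY_API_TOKEN=<redacted>"
    echo "would run: gh secret set FLY_API_TOKEN --repo Molecule-AI/molecule-monorepo"
    return 0
  fi
  # (b) Control plane first; stop here on failure so GitHub is left untouched.
  flyctl secrets set --app molecule-cp "FLY_API_TOKEN=$token" || return 1
  # (a) Then the GitHub Actions secret, passed on stdin.
  printf '%s' "$token" | gh secret set FLY_API_TOKEN --repo Molecule-AI/molecule-monorepo
}
```

A real run would look like `DRY_RUN=0 rotate_fly_token "$NEW_TOKEN"`, followed by the verify and revoke steps from the procedure. The helper does not revoke the old token; that stays a manual step after verification.
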
## Rotation procedure: NEON_API_KEY

1. Create a replacement key in Neon console → Account Settings → API Keys.
2. Update Fly secrets:
   ```
   flyctl secrets set --app molecule-cp NEON_API_KEY='napi_...'
   ```
3. Trigger a test provision (create, then delete):
   ```
   curl -X POST https://molecule-cp.fly.dev/cp/orgs \
     -H 'Content-Type: application/json' \
     -d '{"slug":"keytest-'$(date +%s)'","name":"Rotation test"}'
   # Wait 60s, inspect logs:
   flyctl logs --app molecule-cp --no-tail | tail -30
   # Clean up the test org via DELETE once live
   ```
4. Revoke the old key in the Neon console.

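The inline quoting in the test-provision request is easy to get wrong when edited by hand. One option is to build the JSON body in a helper and keep the generated slug around for cleanup. A minimal sketch, assuming the payload fields shown in the curl example; `test_org_payload` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: build the JSON body for a throwaway test org.
set -eu

# Hypothetical helper; field names are copied from the runbook's curl example.
test_org_payload() {
  printf '{"slug":"keytest-%s","name":"Rotation test"}' "$1"
}

# Print the slug next to the payload so the test org can be found and
# deleted afterwards.
stamp="$(date +%s)"
echo "slug: keytest-$stamp"
test_org_payload "$stamp"
echo
```

The payload can then be passed to the request from step 3 with `-d "$(test_org_payload "$stamp")"`.
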
## Rotation procedure: SECRETS_ENCRYPTION_KEY

**DANGEROUS**: rotating this key will invalidate every encrypted row in
`org_instances.database_url_encrypted` + `redis_url_encrypted`. Every tenant
becomes unreachable until re-provisioned.

Mitigation: we intentionally defer real KMS + key-rotation to Phase H. Until
then, **do not rotate this key unless compromised.** If it is compromised, the
procedure is:

1. Generate a new key: `openssl rand -hex 32`
2. Set the new key on `molecule-cp`.
3. For every row in `org_instances`: re-provision the tenant (creates fresh
   Neon branch + Fly machine). The old encrypted URLs are un-decryptable but
   irrelevant, because we mint fresh ones.
4. Migration to rotate encrypted columns in-place (decrypt-with-old → encrypt-
   with-new) is Phase H work and requires envelope encryption with KMS.

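Since a malformed key would be as disruptive as a rotated one, it may be worth checking the generated value before setting it. `openssl rand -hex 32` prints 32 random bytes as 64 hexadecimal characters, which matches the 256-bit size of an AES-256 key. The check below is a sketch; `is_valid_key` is a hypothetical helper, and it assumes the application expects the key in that hex form.

```shell
#!/usr/bin/env bash
# Sketch: validate the shape of a candidate SECRETS_ENCRYPTION_KEY.
set -eu

# Succeeds only if the argument is exactly 64 hexadecimal characters,
# i.e. 32 bytes (256 bits) in hex. Hypothetical helper.
is_valid_key() {
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 64 ]
}
```

For example, `key="$(openssl rand -hex 32)"` followed by `is_valid_key "$key"` would gate the step that sets the new key on `molecule-cp`.
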
## Rotation procedure: DATABASE_URL (control plane)

The Neon `molecule-cp` project has a stable primary endpoint. Rotate only if:
- Neon forces a migration
- The connection-URI password is leaked

Procedure: regenerate the URI via the Neon API → `flyctl secrets set --app molecule-cp DATABASE_URL=...`.
This is zero-downtime: Fly applies the secret via a rolling restart.

## Emergency contacts

- **Fly**: billing dashboard at fly.io → Support
- **Neon**: console.neon.tech → Support
- **Upstash**: upstash.com → Support
- **GHCR**: github.com/orgs/Molecule-AI (org admins)