diff --git a/docs/runbooks/saas-secrets.md b/docs/runbooks/saas-secrets.md
index 16fd3ad9..249b1120 100644
--- a/docs/runbooks/saas-secrets.md
+++ b/docs/runbooks/saas-secrets.md
@@ -12,6 +12,10 @@ update doesn't silently break production.
 | `DATABASE_URL` | `fly secrets` on `molecule-cp` | Control-plane Postgres connection (Neon `cool-sea-89357706`) |
 | `TENANT_REDIS_URL` | `fly secrets` on `molecule-cp` | Injected into every tenant container as `REDIS_URL` |
 | `SECRETS_ENCRYPTION_KEY` | `fly secrets` on `molecule-cp` | AES-256 key wrapping tenant DB/Redis URLs in `org_instances` (provisioner + tenant use this) |
+| `RESEND_API_KEY` | `fly secrets` on `molecule-cp` | Resend REST API token used by `internal/email.ResendProvider` — GDPR erasure confirmation today; welcome + plan-change emails later. Empty → `DisabledProvider` silently no-ops all sends |
+| `RESEND_FROM_EMAIL` | `fly secrets` on `molecule-cp` | RFC-5322 From line, typically `"Molecule AI "`. Must resolve to a Resend-verified domain or sends fail with `403 domain not verified` |
+| `STRIPE_API_KEY` | `fly secrets` on `molecule-cp` | `sk_live_…` secret key used by `internal/billing.StripeProvider` for customer/subscription/checkout mutations + GDPR Art. 17 cascade |
+| `STRIPE_WEBHOOK_SECRET` | `fly secrets` on `molecule-cp` | `whsec_…` used by `internal/billing.verifySignature` to reject forged webhook calls. Rotated independently from the API key — Stripe treats them as separate secrets |
 | `GITHUB_TOKEN` | Built-in GitHub Actions token | GHCR push; rotated automatically |
 
 ## Coupled secrets — MUST rotate together
@@ -94,9 +98,88 @@ The Neon `molecule-cp` project has a stable primary endpoint. Rotate only if:
 
 Procedure: regenerate URI via Neon API → `flyctl secrets set DATABASE_URL=...`. Zero-downtime (Fly applies secret via rolling restart).
 
+## Rotation procedure — RESEND_API_KEY
+
+Low-blast-radius rotation — the only consumer is the transactional-email
+path and sends fail loudly (the cascade logs `purge confirmation email
+failed`) without breaking user-facing flows.
+
+1. In Resend dashboard → API Keys → create a new key scoped to
+   "molecule-cp production", e.g. name
+   `molecule-cp-rotation-$(date +%Y%m%d)`.
+2. Stage the replacement on Fly (not immediately live):
+   ```
+   flyctl secrets set --app molecule-cp \
+     --stage RESEND_API_KEY='re_...'
+   ```
+   `--stage` holds the secret for the next deploy instead of restarting
+   machines immediately. Skip `--stage` if you want a rolling restart
+   right now.
+3. Redeploy (or wait for the next image publish) — machines pick up the
+   new key.
+4. Trigger a real send to verify: delete a disposable test org via
+   `DELETE /cp/orgs/test-rotate` and confirm the Resend dashboard shows
+   the event in Emails → Logs within a minute.
+5. Revoke the old key in the Resend dashboard.
+
+### Blast-radius note
+
+The GDPR Art. 17 cascade sends a best-effort confirmation email after
+purge succeeds; a failed send is logged but does **not** flip the 204
+response (purge data is already gone). This means a broken
+`RESEND_API_KEY` silently skips confirmation emails — monitor the
+`purge confirmation email failed` log line after any rotation.
+
+### Domain verification
+
+`RESEND_FROM_EMAIL` must come from a Resend-verified domain or every
+send returns `403 domain not verified`. Domain verification lives in
+Resend dashboard → Domains → Add Domain; Resend gives you 3 DNS records
+(SPF, DKIM, DMARC) to add to the DNS provider for `moleculesai.app`.
+**Do not rotate the From address without confirming the new domain is
+verified** — there's no server-side check at deploy time.
+
+## Rotation procedure — STRIPE_API_KEY + STRIPE_WEBHOOK_SECRET
+
+These are independent Stripe secrets. Rotating one does **not** affect
+the other — they can be rotated on separate schedules.
+
+1. Stripe dashboard → Developers → API keys → **Roll key** on the live
+   secret key. Stripe gives you a new `sk_live_…`.
+2. Stage on Fly:
+   ```
+   flyctl secrets set --app molecule-cp \
+     --stage STRIPE_API_KEY='sk_live_...'
+   ```
+3. Redeploy, then verify: hit
+   `https://molecule-cp.fly.dev/cp/billing/checkout` from an authenticated
+   test session and confirm the returned checkout URL redirects to a
+   valid Stripe-hosted page.
+4. Stripe auto-revokes the old key after rolling — no manual revoke
+   step.
+
+For `STRIPE_WEBHOOK_SECRET`:
+
+1. Stripe dashboard → Developers → Webhooks → the molecule-cp endpoint →
+   **Roll secret**.
+2. Stripe shows you BOTH old and new secret for a 24-hour overlap window.
+   Copy the new `whsec_…`.
+3. Stage + deploy on Fly as above.
+4. Inside the overlap window, send a Stripe CLI test event:
+   ```
+   stripe trigger customer.subscription.updated \
+     --forward-to https://molecule-cp.fly.dev/webhooks/stripe
+   ```
+   If the signature-verification layer accepts it (no `400 invalid
+   signature` in Fly logs), the new secret is live.
+5. Wait for the overlap window to expire or click "Delete old secret"
+   in Stripe dashboard.
+
 ## Emergency contacts
 
 - **Fly**: billing dashboard at fly.io → Support
 - **Neon**: console.neon.tech → Support
 - **Upstash**: upstash.com → Support
+- **Resend**: resend.com/dashboard → Help (email-only support, ~24h turnaround)
+- **Stripe**: stripe.com/support → live chat
 - **GHCR**: github.com/orgs/Molecule-AI (org admins)