molecule-core/docs/architecture/secrets-key-custody.md
documentation-specialist 26afbbfdf4
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Failing after 12s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 51s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m20s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m20s
docs(internal): bulk-sed molecule-core .md docs → Gitea (#37 final molecule-core sweep)
Mass-sed across 17 files / 38 active refs in molecule-core .md docs
(README + CONTRIBUTING + docs/architecture/ + docs/blog/ + docs/guides/
+ docs/integrations/ + docs/quickstart.md + scripts/README.md).

Driver: /tmp/sweep_core.py — same pattern set as the
internal-marketing bulk-sed (PR #50). 4 url-substitution patterns +
SKIP_PATTERN preserves /pull/<n> /issues/<n> /commit/<sha>
/releases/... historical refs.

Files NOT touched in this PR:
- docs/workspace-runtime-package.md — owned by molecule-core#15
  (workspace-runtime source-edit per #41). Reverted my bulk-sed of
  that file to avoid merge conflict.
- 2 Go-import-path refs in docs/memory-plugins/testing-your-plugin.md
  (github.com/Molecule-AI/molecule-monorepo/platform/internal/...) —
  Q5 cross-repo Go-module migration territory.
- 1 GitHub Gist link in docs/guides/external-workspace-quickstart.md
  (gist.github.com/molecule-ai/...) — no Gitea equivalent;
  consistent with the same handling in docs#1.

Manual fixes (2):
- docs/blog/2026-04-20-chrome-devtools-mcp-seo/index.md:306 —
  GitHub Discussions (no Gitea equivalent) → issue tracker link
- docs/guides/external-workspace-quickstart.md:218 — tracking-issue
  ?q= query-string url (regex didn't catch) → reformulated text +
  Gitea search-by-query approach

Pattern matches my docs#1 (public docs site) PR + internal#50
(internal/marketing bulk-sed). Standard substitutions:
- https://github.com/Molecule-AI/<repo> → https://git.moleculesai.app/molecule-ai/<repo>
- /blob/<branch>/ + /tree/<branch>/ → /src/branch/<branch>/

Refs: molecule-ai/internal#37, molecule-ai/internal#38
2026-05-07 01:27:50 -07:00

86 lines
5.3 KiB
Markdown

# Secrets Key Custody
How the encryption keys that protect Molecule workspace secrets are managed, where each key lives, and what an attacker who compromises one layer can or cannot read.
This document exists because the platform repo (`workspace-server`) reads `SECRETS_ENCRYPTION_KEY` from its process env, which on its own looks like "encryption-at-rest theater." The full custody chain runs through the control plane (`molecule-controlplane`) where AWS KMS holds the key material at rest. Anyone reading only the platform repo sees half the picture.
## Two modes
The control plane's `internal/crypto.Envelope` ships in two modes, picked at boot from env:
| Mode | Trigger | At-rest format | Recommended for |
|------|---------|----------------|-----------------|
| **KMS envelope** | `KMS_KEY_ARN` set | Per-blob KMS-wrapped DEK + AES-256-GCM ciphertext | Production, multi-tenant SaaS |
| **Static key** | Only `SECRETS_ENCRYPTION_KEY` set | AES-256-GCM with one process-wide key | Dev, self-hosted single-tenant |
`Envelope.Decrypt` is dual-mode — it can read either format on the way out, so a deployment can flip from static-key to KMS envelope without re-encrypting historical rows. Code: `molecule-controlplane/internal/crypto/kms.go`.
## KMS envelope flow
When `KMS_KEY_ARN` is configured, every secret write looks like:
1. CP calls `kms.GenerateDataKey(KeyId=KMS_KEY_ARN, KeySpec=AES_256)` → returns `{Plaintext, CiphertextBlob}`.
2. CP encrypts the secret with AES-256-GCM using `Plaintext` as the key.
3. CP discards `Plaintext` from memory; persists the blob:
```
[0x02 prefix][uint16 BE: encrypted_dek_len][encrypted_dek][nonce(12)][ct+tag]
```
The `0x02` byte distinguishes v2 (KMS-wrapped) blobs from legacy static-key blobs.
4. To read: CP calls `kms.Decrypt(CiphertextBlob)` → recovers the AES key → unwraps the GCM ciphertext.
KMS calls cost ~$0.03 per 10k requests. We do not cache DEKs — provisioning rate is orders below steady-state reads, and not caching keeps key rotation reasoning simple.
## What lives where
| Layer | Key custody | Plaintext key in memory? |
|-------|-------------|--------------------------|
| AWS KMS | KMS-resident, never leaves the HSM | No (hardware) |
| `molecule-controlplane` process | KMS client + IAM role | Briefly per-secret-op only |
| CP database (`database_url_encrypted`, tenant secrets) | KMS-wrapped blobs | Never |
| Per-tenant `workspace-server` env (`SECRETS_ENCRYPTION_KEY`) | Provisioned at tenant boot by CP | Yes, for the tenant's process lifetime |
| Tenant Postgres (`workspace_secrets.value`) | AES-256-GCM with the tenant's key | Never |
The "plaintext in tenant memory" row is the standard envelope-encryption trade-off: a DEK has to be unwrapped somewhere to be used. The blast radius of compromising one tenant's process is one tenant's secrets — not the whole fleet.
## Threat model
| Attacker capability | Can they read tenant secrets? |
|---------------------|-------------------------------|
| Reads CP database backup | No — KMS unwrap requires IAM-scoped `kms:Decrypt` |
| Steals `KMS_KEY_ARN` value | No — ARN alone does nothing without IAM access |
| Compromises CP IAM role | Yes — can `kms:Decrypt` any wrapped DEK |
| Reads tenant Postgres (one tenant) | No — `SECRETS_ENCRYPTION_KEY` lives only in the tenant's own EC2 process env, not in DB |
| Compromises one tenant's EC2 | Yes for that tenant's secrets, no for any other tenant |
| Compromises CP host | Game over (CP can provision arbitrary tenants) |
The two boundaries the design protects:
- **DB-only compromise (incl. backups)** → secrets remain encrypted; attacker needs separate access to either KMS (prod) or CP env (dev).
- **One-tenant compromise** → blast radius limited to that tenant; no cross-tenant key reuse.
## Rotation
- **Tenant key rotation** (per-tenant `SECRETS_ENCRYPTION_KEY`): re-encrypt the tenant's `workspace_secrets` rows under a new key, then swap the env var. Static-key mode requires this for all rotation; KMS mode only requires it on suspected key compromise.
- **KMS CMK rotation**: AWS KMS handles annual automatic rotation of the customer master key. Re-wrapping data keys is unnecessary because each `Decrypt` call routes through the current CMK version automatically (KMS keeps prior versions for decrypt-only).
## Audit / compliance posture
For SOC2 / ISO 27001 / customer security questionnaires:
- **Key custody**: AWS KMS (FIPS 140-2 Level 3 HSM-backed)
- **Key isolation**: per-tenant DEK; no shared keys across tenants
- **Access control**: IAM-scoped `kms:Decrypt`, audited via CloudTrail
- **At-rest encryption**: AES-256-GCM (NIST-approved, authenticated)
- **In-transit encryption**: TLS 1.2+ for KMS, CP-to-tenant, tenant-to-DB
- **Rotation**: AWS-managed CMK rotation annually; manual DEK rotation on incident
## Pointers
- KMS envelope code: [`molecule-controlplane/internal/crypto/kms.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/src/branch/main/internal/crypto/kms.go)
- Static-key fallback: [`molecule-controlplane/internal/crypto/aes.go`](https://git.moleculesai.app/molecule-ai/molecule-controlplane/src/branch/main/internal/crypto/aes.go)
- Tenant secrets handler: [`workspace-server/internal/crypto/aes.go`](../../workspace-server/internal/crypto/aes.go)
- Tenant secrets schema: [database-schema.md](./database-schema.md#workspace_secrets)