External architecture review flagged the SECRETS_ENCRYPTION_KEY env var on the platform as encryption-at-rest theater. The reviewer read only the platform repo and missed that the master key actually lives in AWS KMS at the control plane layer, with envelope encryption wrapping each tenant secret blob. Adds docs/architecture/secrets-key-custody.md as the canonical source of truth for the full chain: - Two-mode envelope (KMS_KEY_ARN vs static-key fallback) - Per-blob AES-256-GCM with KMS-wrapped DEKs - Where each key actually lives (KMS, CP env, tenant env) - Threat model per attacker capability - Rotation story (annual KMS CMK rotation, manual DEK rotation on incident) - Audit posture (SOC2 / ISO 27001 questionnaire bullets) Patches three downstream docs that previously stopped at the env-var level and link them to the new custody doc: - development/constraints-and-rules.md (Rule 11) - architecture/database-schema.md (workspace_secrets paragraph) - architecture/molecule-technical-doc.md (env-vars table) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5.3 KiB
Secrets Key Custody
How the encryption keys that protect Molecule workspace secrets are managed, where each key lives, and what an attacker who compromises one layer can or cannot read.
This document exists because the platform repo (workspace-server) reads SECRETS_ENCRYPTION_KEY from its process env, which on its own looks like "encryption-at-rest theater." The full custody chain runs through the control plane (molecule-controlplane) where AWS KMS holds the key material at rest. Anyone reading only the platform repo sees half the picture.
Two modes
The control plane's internal/crypto.Envelope ships in two modes, picked at boot from env:
| Mode | Trigger | At-rest format | Recommended for |
|---|---|---|---|
| KMS envelope | KMS_KEY_ARN set |
Per-blob KMS-wrapped DEK + AES-256-GCM ciphertext | Production, multi-tenant SaaS |
| Static key | Only SECRETS_ENCRYPTION_KEY set |
AES-256-GCM with one process-wide key | Dev, self-hosted single-tenant |
Envelope.Decrypt is dual-mode — it can read either format on the way out, so a deployment can flip from static-key to KMS envelope without re-encrypting historical rows. Code: molecule-controlplane/internal/crypto/kms.go.
KMS envelope flow
When KMS_KEY_ARN is configured, every secret write looks like:
-
CP calls
kms.GenerateDataKey(KeyId=KMS_KEY_ARN, KeySpec=AES_256)→ returns{Plaintext, CiphertextBlob}. -
CP encrypts the secret with AES-256-GCM using
Plaintextas the key. -
CP discards
Plaintextfrom memory; persists the blob:[0x02 prefix][uint16 BE: encrypted_dek_len][encrypted_dek][nonce(12)][ct+tag]The
0x02byte distinguishes v2 (KMS-wrapped) blobs from legacy static-key blobs. -
To read: CP calls
kms.Decrypt(CiphertextBlob)→ recovers the AES key → unwraps the GCM ciphertext.
KMS calls cost ~$0.03 per 10k requests. We do not cache DEKs — provisioning rate is orders below steady-state reads, and not caching keeps key rotation reasoning simple.
What lives where
| Layer | Key custody | Plaintext key in memory? |
|---|---|---|
| AWS KMS | KMS-resident, never leaves the HSM | No (hardware) |
molecule-controlplane process |
KMS client + IAM role | Briefly per-secret-op only |
CP database (database_url_encrypted, tenant secrets) |
KMS-wrapped blobs | Never |
Per-tenant workspace-server env (SECRETS_ENCRYPTION_KEY) |
Provisioned at tenant boot by CP | Yes, for the tenant's process lifetime |
Tenant Postgres (workspace_secrets.value) |
AES-256-GCM with the tenant's key | Never |
The "plaintext in tenant memory" row is the standard envelope-encryption trade-off: a DEK has to be unwrapped somewhere to be used. The blast radius of compromising one tenant's process is one tenant's secrets — not the whole fleet.
Threat model
| Attacker capability | Can they read tenant secrets? |
|---|---|
| Reads CP database backup | No — KMS unwrap requires IAM-scoped kms:Decrypt |
Steals KMS_KEY_ARN value |
No — ARN alone does nothing without IAM access |
| Compromises CP IAM role | Yes — can kms:Decrypt any wrapped DEK |
| Reads tenant Postgres (one tenant) | No — SECRETS_ENCRYPTION_KEY lives only in the tenant's own EC2 process env, not in DB |
| Compromises one tenant's EC2 | Yes for that tenant's secrets, no for any other tenant |
| Compromises CP host | Game over (CP can provision arbitrary tenants) |
The two boundaries the design protects:
- DB-only compromise (incl. backups) → secrets remain encrypted; attacker needs separate access to either KMS (prod) or CP env (dev).
- One-tenant compromise → blast radius limited to that tenant; no cross-tenant key reuse.
Rotation
- Tenant key rotation (per-tenant
SECRETS_ENCRYPTION_KEY): re-encrypt the tenant'sworkspace_secretsrows under a new key, then swap the env var. Static-key mode requires this for all rotation; KMS mode only requires it on suspected key compromise. - KMS CMK rotation: AWS KMS handles annual automatic rotation of the customer master key. Re-wrapping data keys is unnecessary because each
Decryptcall routes through the current CMK version automatically (KMS keeps prior versions for decrypt-only).
Audit / compliance posture
For SOC2 / ISO 27001 / customer security questionnaires:
- Key custody: AWS KMS (FIPS 140-2 Level 3 HSM-backed)
- Key isolation: per-tenant DEK; no shared keys across tenants
- Access control: IAM-scoped
kms:Decrypt, audited via CloudTrail - At-rest encryption: AES-256-GCM (NIST-approved, authenticated)
- In-transit encryption: TLS 1.2+ for KMS, CP-to-tenant, tenant-to-DB
- Rotation: AWS-managed CMK rotation annually; manual DEK rotation on incident
Pointers
- KMS envelope code:
molecule-controlplane/internal/crypto/kms.go - Static-key fallback:
molecule-controlplane/internal/crypto/aes.go - Tenant secrets handler:
workspace-server/internal/crypto/aes.go - Tenant secrets schema: database-schema.md