rabbitblood 262a52a32c docs(security): document the KMS-rooted custody chain for SECRETS_ENCRYPTION_KEY

External architecture review flagged the SECRETS_ENCRYPTION_KEY env var
on the platform as encryption-at-rest theater. The reviewer read only
the platform repo and missed that the master key actually lives in AWS
KMS at the control plane layer, with envelope encryption wrapping each
tenant secret blob.

Adds docs/architecture/secrets-key-custody.md as the canonical source
of truth for the full chain:

- Two-mode envelope (KMS_KEY_ARN vs static-key fallback)
- Per-blob AES-256-GCM with KMS-wrapped DEKs
- Where each key actually lives (KMS, CP env, tenant env)
- Threat model per attacker capability
- Rotation story (annual KMS CMK rotation, manual DEK rotation on incident)
- Audit posture (SOC2 / ISO 27001 questionnaire bullets)

Patches three downstream docs that previously stopped at the env-var
level and link them to the new custody doc:

- development/constraints-and-rules.md (Rule 11)
- architecture/database-schema.md (workspace_secrets paragraph)
- architecture/molecule-technical-doc.md (env-vars table)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-26 11:29:16 -07:00

5.3 KiB


Secrets Key Custody

How the encryption keys that protect Molecule workspace secrets are managed, where each key lives, and what an attacker who compromises one layer can or cannot read.

This document exists because the platform repo (workspace-server) reads SECRETS_ENCRYPTION_KEY from its process env, which on its own looks like "encryption-at-rest theater." The full custody chain runs through the control plane (molecule-controlplane) where AWS KMS holds the key material at rest. Anyone reading only the platform repo sees half the picture.

Two modes

The control plane's internal/crypto.Envelope ships in two modes, picked at boot from env:

| Mode | Trigger | At-rest format | Recommended for |
|---|---|---|---|
| KMS envelope | `KMS_KEY_ARN` set | Per-blob KMS-wrapped DEK + AES-256-GCM ciphertext | Production, multi-tenant SaaS |
| Static key | Only `SECRETS_ENCRYPTION_KEY` set | AES-256-GCM with one process-wide key | Dev, self-hosted single-tenant |

Envelope.Decrypt is dual-mode — it can read either format on the way out, so a deployment can flip from static-key to KMS envelope without re-encrypting historical rows. Code: molecule-controlplane/internal/crypto/kms.go.
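A minimal sketch of the boot-time choice in the table above, assuming only the precedence the table states: `KMS_KEY_ARN` wins whenever it is set, and `SECRETS_ENCRYPTION_KEY` is the fallback. The `Mode` type, the `selectMode` helper, and the refuse-to-boot behavior when neither variable is set are hypothetical and not taken from `kms.go`; `getenv` is injected so the decision can be tested without touching the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Mode is a hypothetical name for the two Envelope modes in the table above.
type Mode int

const (
	ModeUnset       Mode = iota
	ModeKMSEnvelope      // KMS_KEY_ARN set: per-blob KMS-wrapped DEKs
	ModeStaticKey        // only SECRETS_ENCRYPTION_KEY set: one process-wide key
)

func (m Mode) String() string {
	switch m {
	case ModeKMSEnvelope:
		return "kms-envelope"
	case ModeStaticKey:
		return "static-key"
	default:
		return "unset"
	}
}

// selectMode applies the table's precedence: KMS_KEY_ARN wins whenever it is
// set; otherwise fall back to SECRETS_ENCRYPTION_KEY. Returning an error when
// neither is set is an assumption made for this sketch.
func selectMode(getenv func(string) string) (Mode, error) {
	if getenv("KMS_KEY_ARN") != "" {
		return ModeKMSEnvelope, nil
	}
	if getenv("SECRETS_ENCRYPTION_KEY") != "" {
		return ModeStaticKey, nil
	}
	return ModeUnset, errors.New("neither KMS_KEY_ARN nor SECRETS_ENCRYPTION_KEY is set")
}

func main() {
	mode, err := selectMode(os.Getenv)
	if err != nil {
		fmt.Println("refusing to boot:", err)
		return
	}
	fmt.Println("envelope mode:", mode)
}
```

Because the choice only affects writes (`Decrypt` reads both formats), setting `KMS_KEY_ARN` on an existing static-key deployment changes the format of new blobs without touching old ones.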

KMS envelope flow

When KMS_KEY_ARN is configured, every secret write looks like:

1. CP calls `kms.GenerateDataKey(KeyId=KMS_KEY_ARN, KeySpec=AES_256)`, which returns `{Plaintext, CiphertextBlob}`.
2. CP encrypts the secret with AES-256-GCM, using `Plaintext` as the key.
3. CP discards `Plaintext` from memory and persists the blob:

   ```
   [0x02 prefix][uint16 BE: encrypted_dek_len][encrypted_dek][nonce(12)][ct+tag]
   ```

   The `0x02` prefix byte distinguishes v2 (KMS-wrapped) blobs from legacy static-key blobs.

To read a blob, CP calls `kms.Decrypt` on the embedded `encrypted_dek` (the `CiphertextBlob` from step 1) to recover the AES key, then decrypts and authenticates the GCM ciphertext.
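The write and read paths above can be sketched with the Go standard library, assuming the exact byte layout shown. This is not the code in `kms.go`: `sealV2`, `openV2`, and the `unwrap` callback are hypothetical names, and the caller supplies the plaintext DEK and its wrapped form, which in production would be the `Plaintext` and `CiphertextBlob` returned by `GenerateDataKey`. Zeroing the DEK afterwards is best-effort only, since the expanded AES key schedule and any copies made by the Go runtime are not cleared.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
)

// v2Prefix marks a KMS-wrapped (v2) blob, per the layout above.
const v2Prefix byte = 0x02

// newGCM builds AES-256-GCM (standard 12-byte nonce) from a 32-byte DEK.
func newGCM(dek []byte) (cipher.AEAD, error) {
	if len(dek) != 32 {
		return nil, errors.New("DEK must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(dek)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealV2 encrypts secret under dek and emits
// [0x02][uint16 BE len(encDEK)][encDEK][nonce(12)][ct+tag].
// dek and encDEK stand in for GenerateDataKey's Plaintext and CiphertextBlob.
func sealV2(dek, encDEK, secret []byte) ([]byte, error) {
	if len(encDEK) > 0xFFFF {
		return nil, errors.New("wrapped DEK does not fit the uint16 length field")
	}
	gcm, err := newGCM(dek)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := []byte{v2Prefix}
	out = binary.BigEndian.AppendUint16(out, uint16(len(encDEK)))
	out = append(out, encDEK...)
	out = append(out, nonce...)
	return gcm.Seal(out, nonce, secret, nil), nil // appends ct+tag
}

// openV2 parses a v2 blob, unwraps the DEK via unwrap (a stand-in for
// kms.Decrypt on the embedded CiphertextBlob), then decrypts and
// authenticates the GCM payload.
func openV2(blob []byte, unwrap func(encDEK []byte) ([]byte, error)) ([]byte, error) {
	if len(blob) < 3 || blob[0] != v2Prefix {
		return nil, errors.New("not a v2 blob")
	}
	n := int(binary.BigEndian.Uint16(blob[1:3]))
	body := blob[3:]
	if len(body) < n+12 {
		return nil, errors.New("truncated v2 blob")
	}
	dek, err := unwrap(body[:n])
	if err != nil {
		return nil, err
	}
	// Best-effort discard of the plaintext DEK once this operation ends.
	defer func() {
		for i := range dek {
			dek[i] = 0
		}
	}()
	gcm, err := newGCM(dek)
	if err != nil {
		return nil, err
	}
	rest := body[n:]
	return gcm.Open(nil, rest[:gcm.NonceSize()], rest[gcm.NonceSize():], nil)
}

func main() {
	dek := make([]byte, 32)
	if _, err := rand.Read(dek); err != nil {
		panic(err)
	}
	encDEK := []byte("placeholder-for-kms-ciphertext-blob") // not a real KMS blob
	blob, err := sealV2(dek, encDEK, []byte("example-secret"))
	if err != nil {
		panic(err)
	}
	// Fake KMS: only "unwraps" the blob it handed out, and returns a copy.
	fakeDecrypt := func(b []byte) ([]byte, error) {
		if string(b) != string(encDEK) {
			return nil, errors.New("unknown wrapped DEK")
		}
		return append([]byte(nil), dek...), nil
	}
	plain, err := openV2(blob, fakeDecrypt)
	fmt.Printf("version=%#x secret=%q err=%v\n", blob[0], plain, err)
}
```

In the control plane, `unwrap` would call KMS `Decrypt` under the CP's IAM role; a test can pass a closure that returns a copy of a known DEK, as `main` does.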

KMS calls cost about $0.03 per 10,000 requests. We do not cache DEKs: the provisioning rate is orders of magnitude below steady-state reads, and not caching keeps key-rotation reasoning simple.
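A back-of-the-envelope check of that cost figure. The rate is a parameter rather than a constant because AWS pricing can change and some request types are billed differently; the request volumes in `main` are hypothetical.

```go
package main

import "fmt"

// kmsRequestCostUSD estimates KMS request charges at a given rate in USD per
// 10,000 requests (the doc quotes about $0.03).
func kmsRequestCostUSD(requests int64, usdPer10k float64) float64 {
	return float64(requests) / 10_000 * usdPer10k
}

func main() {
	// Hypothetical monthly CP-side volume of GenerateDataKey + Decrypt calls.
	fmt.Printf("1M requests: $%.2f\n", kmsRequestCostUSD(1_000_000, 0.03))
	// Hypothetical burst of secret writes during a large tenant migration.
	fmt.Printf("50k requests: $%.2f\n", kmsRequestCostUSD(50_000, 0.03))
}
```

At the quoted rate, one million requests is 100 blocks of 10,000, or about $3.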

What lives where

| Layer | Key custody | Plaintext key in memory? |
|---|---|---|
| AWS KMS | KMS-resident, never leaves the HSM | No (hardware) |
| `molecule-controlplane` process | KMS client + IAM role | Briefly, per secret operation only |
| CP database (`database_url_encrypted`, tenant secrets) | KMS-wrapped blobs | Never |
| Per-tenant `workspace-server` env (`SECRETS_ENCRYPTION_KEY`) | Provisioned at tenant boot by CP | Yes, for the tenant's process lifetime |
| Tenant Postgres (`workspace_secrets.value`) | AES-256-GCM with the tenant's key | Never |

The "plaintext in tenant memory" row is the standard envelope-encryption trade-off: a DEK has to be unwrapped somewhere to be used. The blast radius of compromising one tenant's process is one tenant's secrets — not the whole fleet.

Threat model

| Attacker capability | Can they read tenant secrets? |
|---|---|
| Reads CP database backup | No: KMS unwrap requires IAM-scoped `kms:Decrypt` |
| Steals `KMS_KEY_ARN` value | No: the ARN alone does nothing without IAM access |
| Compromises CP IAM role | Yes: can `kms:Decrypt` any wrapped DEK |
| Reads tenant Postgres (one tenant) | No: `SECRETS_ENCRYPTION_KEY` lives only in the tenant's own EC2 process env, not in the DB |
| Compromises one tenant's EC2 | Yes for that tenant's secrets, no for any other tenant |
| Compromises CP host | Game over (CP can provision arbitrary tenants) |

The two boundaries the design protects:

- DB-only compromise (including backups) → secrets remain encrypted; the attacker needs separate access to either KMS (prod) or the CP env (dev).
- One-tenant compromise → blast radius limited to that tenant; no cross-tenant key reuse.

Rotation

- Tenant key rotation (per-tenant `SECRETS_ENCRYPTION_KEY`): re-encrypt the tenant's `workspace_secrets` rows under a new key, then swap the env var. Static-key mode requires this for every rotation; KMS mode requires it only on suspected key compromise.
- KMS CMK rotation: with automatic rotation enabled on the customer master key, AWS KMS rotates its backing key material annually. Re-wrapping data keys is unnecessary: new `GenerateDataKey` calls use the current key material, and KMS retains all prior key material so `Decrypt` can still unwrap DEKs wrapped under older versions.
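The tenant-key rotation step can be sketched as decrypt-with-old, encrypt-with-new over every row. This assumes a hypothetical `workspace_secrets.value` layout of `nonce(12) || ciphertext+tag` and uses an in-memory map in place of the table; the tenant secrets handler listed under Pointers defines the real format, which may differ. `seal`, `open`, and `rotate` are illustrative names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds AES-256-GCM from a 32-byte tenant key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("tenant key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal and open assume a hypothetical row layout: nonce(12) || ct+tag.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

func open(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("row shorter than nonce")
	}
	return gcm.Open(nil, blob[:gcm.NonceSize()], blob[gcm.NonceSize():], nil)
}

// rotate re-encrypts every row from oldKey to newKey. It returns a fresh map
// and changes nothing if any row fails, so the caller can write all rows in
// one transaction before swapping SECRETS_ENCRYPTION_KEY.
func rotate(rows map[string][]byte, oldKey, newKey []byte) (map[string][]byte, error) {
	out := make(map[string][]byte, len(rows))
	for name, blob := range rows {
		plain, err := open(oldKey, blob)
		if err != nil {
			return nil, fmt.Errorf("row %q: %w", name, err)
		}
		sealed, err := seal(newKey, plain)
		if err != nil {
			return nil, fmt.Errorf("row %q: %w", name, err)
		}
		out[name] = sealed
	}
	return out, nil
}

func randomKey() []byte {
	k := make([]byte, 32)
	if _, err := rand.Read(k); err != nil {
		panic(err)
	}
	return k
}

func main() {
	oldKey, newKey := randomKey(), randomKey()
	v, err := seal(oldKey, []byte("example-token"))
	if err != nil {
		panic(err)
	}
	rotated, err := rotate(map[string][]byte{"GITHUB_TOKEN": v}, oldKey, newKey)
	if err != nil {
		panic(err)
	}
	plain, err := open(newKey, rotated["GITHUB_TOKEN"])
	fmt.Printf("%q %v\n", plain, err)
}
```

Because GCM is authenticated, opening a row with the wrong key fails with an error instead of returning garbage. The same check is what makes one tenant's key useless against another tenant's rows.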

Audit / compliance posture

For SOC2 / ISO 27001 / customer security questionnaires:

- Key custody: AWS KMS (FIPS 140-2 Level 3 HSM-backed)
- Key isolation: per-tenant DEK; no shared keys across tenants
- Access control: IAM-scoped `kms:Decrypt`, audited via CloudTrail
- At-rest encryption: AES-256-GCM (NIST-approved, authenticated)
- In-transit encryption: TLS 1.2+ for KMS, CP-to-tenant, tenant-to-DB
- Rotation: AWS-managed CMK rotation annually; manual DEK rotation on incident

Pointers

- KMS envelope code: `molecule-controlplane/internal/crypto/kms.go`
- Static-key fallback: `molecule-controlplane/internal/crypto/aes.go`
- Tenant secrets handler: `workspace-server/internal/crypto/aes.go`
- Tenant secrets schema: database-schema.md
