docs: GDPR Art. 17 erasure runbook
Documents the 4-step hard-delete cascade implemented in molecule-controlplane PR #29 (Stripe → Redis → Infra → DB rows), how to read the org_purges audit table when a purge fails, the 30-day GDPR deadline, and what the cascade deliberately does NOT cover (WorkOS users, LLM provider history, Langfuse traces). Cross-referenced from the "SaaS ops" block in CLAUDE.md so future agents find it when handling erasure requests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
eb6796042b
commit
9b82bce7ef
@ -23,6 +23,13 @@ secrets` on `molecule-cp`), the correct rotation order, and danger cases —
|
||||
notably `SECRETS_ENCRYPTION_KEY`, which cannot be rotated without a data
|
||||
migration until Phase H lands KMS envelope encryption.
|
||||
|
||||
When handling a GDPR erasure request (user asks "delete my org and all
|
||||
my data"), read **`docs/runbooks/gdpr-erasure.md`** first. It explains the
|
||||
4-step cascade in `molecule-controlplane` (Stripe → Redis → Infra → DB
|
||||
rows), how to read the `org_purges` audit table, how to resume a failed
|
||||
purge, and what the cascade deliberately does NOT cover (WorkOS users,
|
||||
LLM provider history, Langfuse traces).
|
||||
|
||||
## Agent operating rules (auto-loaded — read first)
|
||||
|
||||
The following are project-level rules that override default behavior. They
|
||||
|
||||
106
docs/runbooks/gdpr-erasure.md
Normal file
106
docs/runbooks/gdpr-erasure.md
Normal file
@ -0,0 +1,106 @@
|
||||
# GDPR Art. 17 hard-delete cascade
|
||||
|
||||
Operational reference for the "delete my org" flow in `molecule-controlplane`.
|
||||
Skim this before replying to an erasure request, answering a DPA (Data
|
||||
Processing Addendum) audit, or debugging a failed purge.
|
||||
|
||||
## What Art. 17 actually requires
|
||||
|
||||
The EU General Data Protection Regulation, Article 17 ("right to erasure" /
|
||||
"right to be forgotten") says: when a user asks us to delete their personal
|
||||
data, we must do so within 30 days and destroy **every copy** we control —
|
||||
including copies held by our sub-processors (Stripe, Fly, Neon, Upstash,
|
||||
Vercel, WorkOS). Soft-delete is not compliant. A database row with
|
||||
`deleted_at IS NOT NULL` still counts as "data we process" under the GDPR
|
||||
definition.
|
||||
|
||||
## What the cascade actually does
|
||||
|
||||
`DELETE /cp/orgs/:slug` triggers `handlers.executeOrgPurge`, which walks
|
||||
four steps in order:
|
||||
|
||||
| # | Step | Action | Idempotent? |
|
||||
|---|------|--------|-------------|
|
||||
| 1 | `stripe` | List every subscription on `cus_*`, DELETE each, then DELETE the customer record. Stripe retains deleted customers for ~30 days per their internal policy — we do not control that window | Yes (404 → success) |
|
||||
| 2 | `redis` | Upstash REST `scan` + `del` against pattern `<org_slug>:*` | Yes (empty scan = no-op) |
|
||||
| 3 | `infra` | `Provisioner.DeprovisionInstance` → Fly Machine destroy + Neon branch delete + Vercel subdomain removal | Yes at each sub-step |
|
||||
| 4 | `db_rows` | One transaction: `DELETE FROM org_instances` → `DELETE FROM org_members` → `DELETE FROM organizations` | Atomic |
|
||||
|
||||
Each successful step writes `org_purges.last_step`. The orchestrator reads
|
||||
`last_step` on entry and **skips every step at or before it** — so a retried
|
||||
DELETE resumes from the first unfinished step instead of repeating Stripe
|
||||
cancellations or Redis scans.
|
||||
|
||||
The `org_purges` audit row outlives the deleted org on purpose — `org_id` is
|
||||
NOT a foreign key. Auditors or support staff can still answer "when was
|
||||
acme.moleculesai.app deleted and did it succeed?" three months later.
|
||||
|
||||
## When a purge fails mid-cascade
|
||||
|
||||
The API returns `500` with a JSON body:
|
||||
|
||||
```json
|
||||
{
|
||||
"error": "purge cascade failed; retry the request to resume",
|
||||
"purge_id": "<uuid>"
|
||||
}
|
||||
```
|
||||
|
||||
What to do:
|
||||
|
||||
1. **Inspect the audit row** — `SELECT status, last_step, last_error, attempts
|
||||
FROM org_purges WHERE id = '<purge_id>'`. That tells you which step blew
|
||||
up and why.
|
||||
2. **Fix the underlying cause** if it's ours (Stripe API key rotation,
|
||||
Upstash network blip, Fly API 500).
|
||||
3. **Re-issue the DELETE** — the handler picks up from `last_step + 1`. No
|
||||
manual DB surgery is needed in the happy path.
|
||||
4. **If the step that failed is `db_rows`** — the transaction rolled back, so
|
||||
the org is still fully intact. Retry is safe.
|
||||
5. **If the step that failed is `infra`** — check Fly + Neon + Vercel
|
||||
dashboards before retrying. A half-destroyed Fly Machine won't block the
|
||||
retry (DeprovisionInstance is idempotent), but it's worth confirming the
|
||||
resource actually went away.
|
||||
|
||||
## 30-day deadline
|
||||
|
||||
GDPR gives us one calendar month to complete erasure from the request date.
|
||||
The cascade runs synchronously and typically finishes in <15 seconds, so
|
||||
latency is not the concern — **unattended failure** is. If an `org_purges`
|
||||
row sits in `status='failed'` for more than 24h, that's the operator's cue
|
||||
to intervene. A future Phase H task will add a cron that pings Slack when
|
||||
any purge row is older than 48h without hitting `completed`.
|
||||
|
||||
## What this cascade does NOT do
|
||||
|
||||
- **It does not delete WorkOS user records.** WorkOS Users are org-scoped
|
||||
(a user can belong to multiple orgs), and we don't own enough lifecycle
|
||||
signal to decide when to purge the underlying user account. When the last
|
||||
org containing a user is erased, the WorkOS user will be orphaned. Phase
|
||||
H.2 adds a sweep to reconcile.
|
||||
- **It does not delete LLM provider history.** Agent conversations that
|
||||
used OpenAI / Anthropic / OpenRouter may still appear in the provider's
|
||||
own retention window. Our DPAs with those vendors cap that at 30 days; we
|
||||
do not expose a hook to accelerate it.
|
||||
- **It does not delete Langfuse traces** for self-hosted Langfuse. In
|
||||
production we forward traces to Langfuse Cloud which has its own
|
||||
retention policy — check `LANGFUSE_HOST` in the env before claiming
|
||||
compliance.
|
||||
|
||||
## Testing the cascade
|
||||
|
||||
See the test plan in [PR #29](https://github.com/Molecule-AI/molecule-controlplane/pull/29)
|
||||
for the staging checklist. The unit tests cover the orchestrator logic
|
||||
(happy path, resume-from-step, Stripe failure, no-customer); end-to-end
|
||||
proof requires a real Stripe test-mode customer + provisioned Fly Machine
|
||||
because the failure modes that matter are transport errors, not logic.
|
||||
|
||||
## Related
|
||||
|
||||
- `docs/runbooks/saas-secrets.md` — if a cascade fails with "invalid API
|
||||
key" the relevant secret probably rotated
|
||||
- `docs/runbooks/admin-auth.md` — `DELETE /cp/orgs/:slug` is behind
|
||||
session-cookie auth in controlplane, not the workspace bearer-token
|
||||
middleware documented there
|
||||
- `molecule-controlplane/internal/handlers/purge.go` — the orchestrator
|
||||
- `molecule-controlplane/migrations/006_org_purges.*.sql` — audit schema
|
||||
Loading…
Reference in New Issue
Block a user