molecule-core/gdpr-erasure.md at 39074cc4ae2419a393eb37b80307298f68f477a5

Hongming Wang 39074cc4ae chore: final open-source cleanup — binary, stale paths, private refs

- Remove compiled workspace-server/server binary from git
- Fix .gitignore, .gitattributes, .githooks/pre-commit for renamed dirs
- Fix CI workflow path filters (workspace-template → workspace)
- Replace real EC2 IP and personal slug in test_saas_tenant.sh
- Scrub molecule-controlplane references in docs
- Fix stale workspace-template/ paths in provisioner, handlers, tests
- Clean tracked Python cache files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-18 00:38:55 -07:00

5.1 KiB

Raw Blame History

Operational reference for the "delete my org" flow in the private control-plane repo. Skim this before replying to an erasure request, answering a DPA (Data Processing Addendum) audit, or debugging a failed purge.

What Art. 17 actually requires

The EU General Data Protection Regulation, Article 17 ("right to erasure" / "right to be forgotten") says: when a user asks us to delete their personal data, we must do so within 30 days and destroy every copy we control — including copies held by our sub-processors (Stripe, Fly, Neon, Upstash, Vercel, WorkOS). Soft-delete is not compliant. A database row with deleted_at IS NOT NULL still counts as "data we process" under the GDPR definition.

What the cascade actually does

DELETE /cp/orgs/:slug triggers handlers.executeOrgPurge, which walks four steps in order:

#	Step	Action	Idempotent?
1	`stripe`	List every subscription on `cus_*`, DELETE each, then DELETE the customer record. Stripe retains deleted customers for ~30 days per their internal policy — we do not control that window	Yes (404 → success)
2	`redis`	Upstash REST `scan` + `del` against pattern `<org_slug>:*`	Yes (empty scan = no-op)
3	`infra`	`Provisioner.DeprovisionInstance` → Fly Machine destroy + Neon branch delete + Vercel subdomain removal	Yes at each sub-step
4	`db_rows`	One transaction: `DELETE FROM org_instances` → `DELETE FROM org_members` → `DELETE FROM organizations`	Atomic

Each successful step writes org_purges.last_step. The orchestrator reads last_step on entry and skips every step at or before it — so a retried DELETE resumes from the first unfinished step instead of repeating Stripe cancellations or Redis scans.

The org_purges audit row outlives the deleted org on purpose — org_id is NOT a foreign key. Auditors or support staff can still answer "when was acme.moleculesai.app deleted and did it succeed?" three months later.

When a purge fails mid-cascade

The API returns 500 with a JSON body:

{
  "error":    "purge cascade failed; retry the request to resume",
  "purge_id": "<uuid>"
}

What to do:

Inspect the audit row — SELECT status, last_step, last_error, attempts FROM org_purges WHERE id = '<purge_id>'. That tells you which step blew up and why.
Fix the underlying cause if it's ours (Stripe API key rotation, Upstash network blip, Fly API 500).
Re-issue the DELETE — the handler picks up from last_step + 1. No manual DB surgery is needed in the happy path.
If the step that failed is db_rows — the transaction rolled back, so the org is still fully intact. Retry is safe.
If the step that failed is infra — check Fly + Neon + Vercel dashboards before retrying. A half-destroyed Fly Machine won't block the retry (DeprovisionInstance is idempotent), but it's worth confirming the resource actually went away.

30-day deadline

GDPR gives us one calendar month to complete erasure from the request date. The cascade runs synchronously and typically finishes in <15 seconds, so latency is not the concern — unattended failure is. If an org_purges row sits in status='failed' for more than 24h, that's the operator's cue to intervene. A future Phase H task will add a cron that pings Slack when any purge row is older than 48h without hitting completed.

What this cascade does NOT do

It does not delete WorkOS user records. WorkOS Users are org-scoped (a user can belong to multiple orgs), and we don't own enough lifecycle signal to decide when to purge the underlying user account. When the last org containing a user is erased, the WorkOS user will be orphaned. Phase H.2 adds a sweep to reconcile.
It does not delete LLM provider history. Agent conversations that used OpenAI / Anthropic / OpenRouter may still appear in the provider's own retention window. Our DPAs with those vendors cap that at 30 days; we do not expose a hook to accelerate it.
It does not delete Langfuse traces for self-hosted Langfuse. In production we forward traces to Langfuse Cloud which has its own retention policy — check LANGFUSE_HOST in the env before claiming compliance.

Testing the cascade

See the test plan in [PR #29](https://github.com/Molecule-AI/the private control-plane repo/pull/29) for the staging checklist. The unit tests cover the orchestrator logic (happy path, resume-from-step, Stripe failure, no-customer); end-to-end proof requires a real Stripe test-mode customer + provisioned Fly Machine because the failure modes that matter are transport errors, not logic.

docs/runbooks/saas-secrets.md — if a cascade fails with "invalid API key" the relevant secret probably rotated
docs/runbooks/admin-auth.md — DELETE /cp/orgs/:slug is behind session-cookie auth in controlplane, not the workspace bearer-token middleware documented there
the private control-plane repo/internal/handlers/purge.go — the orchestrator
the private control-plane repo/migrations/006_org_purges.*.sql — audit schema

5.1 KiB Raw Blame History

GDPR Art. 17 hard-delete cascade