From a49e828588da7f9595e8377d5d85bccbef7a15b8 Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Mon, 20 Apr 2026 14:31:46 -0700
Subject: [PATCH] docs: strip internal roadmap/followups from public org-api-keys docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The monorepo docs/ tree is ecosystem + user-facing. Internal roadmap
("what we'll build next", priorities, effort estimates) doesn't belong
there — customers reading our docs don't need our backlog in their
face, and we shouldn't signal "feature X is coming" contractually
when it's just a P2 item in internal tracking.

Removes:
- docs/architecture/org-api-keys-followups.md (the whole prioritized
  roadmap). Moved to the internal repo at
  runbooks/org-api-keys-followups.md where it belongs.
- "Follow-up roadmap" section in docs/architecture/org-api-keys.md,
  replaced with a shorter "Known limitations" section that names the
  current constraints (full-admin only, no expiry, no user_id in
  session-minted audit) without speculating on when they change.
- "What's coming" section in docs/guides/org-api-keys.md, replaced
  with "Current limits" that names the same constraints from the
  user's POV.

Public docs now describe the feature as it exists TODAY. Internal
tracking of what comes next lives in Molecule-AI/internal (private).
---
 docs/architecture/org-api-keys-followups.md | 213 --------------------
 docs/architecture/org-api-keys.md           |  27 +--
 docs/guides/org-api-keys.md                 |  10 +-
 3 files changed, 13 insertions(+), 237 deletions(-)
 delete mode 100644 docs/architecture/org-api-keys-followups.md

diff --git a/docs/architecture/org-api-keys-followups.md b/docs/architecture/org-api-keys-followups.md
deleted file mode 100644
index 1712e0aa..00000000
--- a/docs/architecture/org-api-keys-followups.md
+++ /dev/null
@@ -1,213 +0,0 @@
-# Organization API Keys — Follow-up Work
-
-> Tracked improvements to the beta `org_api_tokens` system. Each item
-> has a rationale + sketch implementation + rough effort estimate.
-> Ordered by priority.
-
-## 1. Role scoping (P1 — next after beta signal)
-
-**Problem:** Today every token is full-admin. A token given to a
-simple read-only monitoring script is as dangerous as one given to
-a deploy bot. No way to hand an AI agent a token that lets it read
-workspace state but not nuke the org.
-
-**Proposal:** Add a `role` column to `org_api_tokens`:
-
-```sql
-ALTER TABLE org_api_tokens
-  ADD COLUMN role TEXT NOT NULL DEFAULT 'admin'
-  CHECK (role IN ('admin', 'editor', 'reader'));
-```
-
-- `admin` — current behavior (all AdminAuth routes)
-- `editor` — workspace CRUD + secrets + approvals, but NOT mint/
-  revoke org tokens (closes the self-escalation loop)
-- `reader` — GETs only, no mutations
-
-New middleware wrapper `RequireRole(role)` checks token's row
-against the route's required minimum. Extend AdminAuth to stash
-the resolved role on `c.Set("org_token_role", r)`.
-
-**Effort:** ~200 LOC + migration + UI role-picker in
-`OrgTokensTab.tsx`. Breaking change for existing tokens (default
-to `admin` preserves behavior).
-
-## 2. Per-workspace binding (P1)
-
-**Problem:** An org-admin token that only needs to touch one
-workspace is overkill. AWS IAM equivalent: "this key can only read
-bucket foo".
-
-**Proposal:** Optional `workspace_id` FK on the token. When set,
-AdminAuth + WorkspaceAuth both accept the token ONLY for routes
-scoped to that workspace (`/workspaces//*`). Tokens with
-`workspace_id = NULL` behave as today (full-org).
-
-```sql
-ALTER TABLE org_api_tokens
-  ADD COLUMN workspace_id UUID REFERENCES workspaces(id) ON DELETE CASCADE;
-```
-
-Cascade delete means revoking a workspace revokes its scoped
-tokens automatically. UI adds a workspace dropdown at mint time.
-
-**Effort:** ~250 LOC. Pairs naturally with role scoping.
-
-## 3. Expiry (P2)
-
-**Problem:** Long-lived tokens are a liability. "Mint this key for
-this one deploy and die after 1 hour" is a common ask.
-
-**Proposal:** Optional `expires_at` on the row, enforced in the
-hot-path query:
-
-```sql
-WHERE token_hash = $1 AND revoked_at IS NULL
-  AND (expires_at IS NULL OR expires_at > now())
-```
-
-UI: mint form has "Expires in: [Never / 1h / 1d / 30d]" picker.
-Show time-left on the list view; flag soon-to-expire in amber.
-
-**Effort:** ~80 LOC. Additive; existing tokens have NULL = never.
-
-## 4. Usage metrics (P2)
-
-**Problem:** `last_used_at` is the only observation we have. Users
-want to see what a token is doing — which paths, from which IPs,
-how often — so they can detect anomalies.
-
-**Proposal:** Async counter writes on every successful Validate.
-New table:
-
-```sql
-CREATE TABLE org_api_token_usage (
-  token_id UUID REFERENCES org_api_tokens(id) ON DELETE CASCADE,
-  hour TIMESTAMPTZ NOT NULL, -- truncated to hour
-  request_count BIGINT NOT NULL DEFAULT 0,
-  last_path TEXT,
-  last_ip INET,
-  last_user_agent TEXT,
-  PRIMARY KEY (token_id, hour)
-);
-```
-
-`ON CONFLICT DO UPDATE SET request_count = request_count + 1` —
-atomic counter upserts, one row per token-hour. UI graphs last 30
-days per token.
-
-**Effort:** ~150 LOC + background sweep to prune >90-day rows.
-
-## 5. Rotation webhooks (P3)
-
-**Problem:** When a user revokes a token, integrations using it
-get 401 with no warning. Big ones want "you're about to lose
-access, here's 60s to rotate" signals.
-
-**Proposal:** Soft-revoke tier. Revoke now accepts
-`?drain_seconds=60`. Token enters a `draining` state (still valid
-but a warning header `X-Molecule-Token-Draining: true` is added to
-every response). After drain window, fully revoked.
-
-Alternative / complement: webhook URL on the token. POST to it
-when revoked. Safer because no drain period.
-
-**Effort:** ~200 LOC. Webhook variant requires retry logic +
-delivery audit.
-
-## 6. Capture WorkOS user_id in created_by (P2, quick win)
-
-**Problem:** Today, tokens minted via the canvas UI log
-`created_by: "session"` — we know it was a session but not whose.
-Post-incident review can't link a token back to a user.
-
-**Proposal:** Thread the WorkOS user_id from the session-auth
-verification through to the handler. The CP's
-`/cp/auth/tenant-member` already returns `user_id`; stash it on
-the gin context in `session_auth.go`; handler reads it for
-`created_by`.
-
-```go
-// session_auth.go after successful verify
-c.Set("session_user_id", body.UserID)
-
-// handler
-if v, ok := c.Get("session_user_id"); ok {
-    createdBy = "session:" + v.(string)
-}
-```
-
-**Effort:** ~20 LOC. Unblocks Important follow-up #6 from today's
-code review.
-
-## 7. Mint-rate limit (P3)
-
-**Problem:** A compromised session or admin token could mint
-thousands of org tokens quickly, making forensic cleanup painful.
-
-**Proposal:** Rate limit mint calls per-org: max N tokens per 5 min.
-Existing `middleware/ratelimit` package does exactly this — bind
-the limiter to the mint route with a low ceiling.
-
-**Effort:** ~30 LOC. Do this before #5 — revoke-storms could hit
-the same pattern.
-
-## 8. Audit log (P2)
-
-**Problem:** Token revocation is logged to stdout. That's fine for
-Railway's retention window but ops want a queryable audit log.
-
-**Proposal:** New table `org_token_audit` with (token_id, action,
-actor, occurred_at). Write on mint/revoke. Surface in admin
-diagnostics endpoint.
-
-**Effort:** ~100 LOC + lightweight read API.
-
-## 9. CLI for local development (P3)
-
-**Problem:** Developers running canvas locally can't easily mint
-and use org tokens against their dev tenant because the UI
-requires a WorkOS session.
-
-**Proposal:** `molecli org-token create --name