From 3d7244ab9402cd83de981a5e20614b0e6b7dc35a Mon Sep 17 00:00:00 2001 From: Hongming Wang Date: Mon, 20 Apr 2026 14:11:45 -0700 Subject: [PATCH] feat(auth): org tokens reach /workspaces/:id/* subroutes + docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Extends WorkspaceAuth to accept org API tokens as a valid credential for any workspace sub-route in the org. Previously a user minting an org token could hit admin-surface endpoints (/workspaces, /org/import, etc.) but couldn't reach per-workspace routes like /workspaces/:id/channels — those were gated by WorkspaceAuth which only knew about workspace-scoped tokens. Scope matches the explicit product spec: one org API key can manipulate every workspace in the org. AI agents given a key can read/write channels, tokens, schedules, secrets, tasks across all workspaces. ## WorkspaceAuth tier order 1. ADMIN_TOKEN exact match (break-glass / bootstrap) 2. Org API token (Validate against org_api_tokens) NEW 3. Workspace-scoped token (ValidateToken with :id binding) 4. Same-origin canvas referer Org token tier sits above the per-workspace check so a presenter of an org key doesn't hit the narrower ValidateToken failure path first. Checked with isSameOriginCanvas path unchanged. ## End-to-end verified Minted test token via ADMIN_TOKEN, then with that org token: - GET /workspaces → 200 (list all) - GET /workspaces/ → 200 (detail, admin-only route) - GET /workspaces//channels → 200 (workspace sub-route) - GET /workspaces//tokens → 200 (workspace tokens list) - GET /workspaces/ → 404 workspace not found (routing still scoped correctly) ## Documentation - docs/architecture/org-api-keys.md — design, data model, threat model, security properties - docs/architecture/org-api-keys-followups.md — 10 tracked follow-ups prioritized (role scoping P1, per-workspace binding P1, expiry P2, usage metrics P2, WorkOS user_id capture P2, rotation webhooks P3, mint-rate limit P3, audit log P2, CLI P3, migrate ADMIN_TOKEN to the same table P4) - docs/guides/org-api-keys.md — end-user guide (mint via UI, use in curl/Python/TS/AI agents, session-vs-key comparison) Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/architecture/org-api-keys-followups.md | 213 ++++++++++++++++++ docs/architecture/org-api-keys.md | 167 ++++++++++++++ docs/guides/org-api-keys.md | 140 ++++++++++++ .../internal/middleware/wsauth_middleware.go | 17 +- 4 files changed, 536 insertions(+), 1 deletion(-) create mode 100644 docs/architecture/org-api-keys-followups.md create mode 100644 docs/architecture/org-api-keys.md create mode 100644 docs/guides/org-api-keys.md diff --git a/docs/architecture/org-api-keys-followups.md b/docs/architecture/org-api-keys-followups.md new file mode 100644 index 00000000..1712e0aa --- /dev/null +++ b/docs/architecture/org-api-keys-followups.md @@ -0,0 +1,213 @@ +# Organization API Keys — Follow-up Work + +> Tracked improvements to the beta `org_api_tokens` system. Each item +> has a rationale + sketch implementation + rough effort estimate. +> Ordered by priority. + +## 1. Role scoping (P1 — next after beta signal) + +**Problem:** Today every token is full-admin. A token given to a +simple read-only monitoring script is as dangerous as one given to +a deploy bot. No way to hand an AI agent a token that lets it read +workspace state but not nuke the org. + +**Proposal:** Add a `role` column to `org_api_tokens`: + +```sql +ALTER TABLE org_api_tokens + ADD COLUMN role TEXT NOT NULL DEFAULT 'admin' + CHECK (role IN ('admin', 'editor', 'reader')); +``` + +- `admin` — current behavior (all AdminAuth routes) +- `editor` — workspace CRUD + secrets + approvals, but NOT mint/ + revoke org tokens (closes the self-escalation loop) +- `reader` — GETs only, no mutations + +New middleware wrapper `RequireRole(role)` checks token's row +against the route's required minimum. Extend AdminAuth to stash +the resolved role on `c.Set("org_token_role", r)`. + +**Effort:** ~200 LOC + migration + UI role-picker in +`OrgTokensTab.tsx`. Breaking change for existing tokens (default +to `admin` preserves behavior). + +## 2. Per-workspace binding (P1) + +**Problem:** An org-admin token that only needs to touch one +workspace is overkill. AWS IAM equivalent: "this key can only read +bucket foo". + +**Proposal:** Optional `workspace_id` FK on the token. When set, +AdminAuth + WorkspaceAuth both accept the token ONLY for routes +scoped to that workspace (`/workspaces//*`). Tokens with +`workspace_id = NULL` behave as today (full-org). + +```sql +ALTER TABLE org_api_tokens + ADD COLUMN workspace_id UUID REFERENCES workspaces(id) ON DELETE CASCADE; +``` + +Cascade delete means revoking a workspace revokes its scoped +tokens automatically. UI adds a workspace dropdown at mint time. + +**Effort:** ~250 LOC. Pairs naturally with role scoping. + +## 3. Expiry (P2) + +**Problem:** Long-lived tokens are a liability. "Mint this key for +this one deploy and die after 1 hour" is a common ask. + +**Proposal:** Optional `expires_at` on the row, enforced in the +hot-path query: + +```sql +WHERE token_hash = $1 AND revoked_at IS NULL + AND (expires_at IS NULL OR expires_at > now()) +``` + +UI: mint form has "Expires in: [Never / 1h / 1d / 30d]" picker. +Show time-left on the list view; flag soon-to-expire in amber. + +**Effort:** ~80 LOC. Additive; existing tokens have NULL = never. + +## 4. Usage metrics (P2) + +**Problem:** `last_used_at` is the only observation we have. Users +want to see what a token is doing — which paths, from which IPs, +how often — so they can detect anomalies. + +**Proposal:** Async counter writes on every successful Validate. +New table: + +```sql +CREATE TABLE org_api_token_usage ( + token_id UUID REFERENCES org_api_tokens(id) ON DELETE CASCADE, + hour TIMESTAMPTZ NOT NULL, -- truncated to hour + request_count BIGINT NOT NULL DEFAULT 0, + last_path TEXT, + last_ip INET, + last_user_agent TEXT, + PRIMARY KEY (token_id, hour) +); +``` + +`ON CONFLICT DO UPDATE SET request_count = request_count + 1` — +atomic counter upserts, one row per token-hour. UI graphs last 30 +days per token. + +**Effort:** ~150 LOC + background sweep to prune >90-day rows. + +## 5. Rotation webhooks (P3) + +**Problem:** When a user revokes a token, integrations using it +get 401 with no warning. Big ones want "you're about to lose +access, here's 60s to rotate" signals. + +**Proposal:** Soft-revoke tier. Revoke now accepts +`?drain_seconds=60`. Token enters a `draining` state (still valid +but a warning header `X-Molecule-Token-Draining: true` is added to +every response). After drain window, fully revoked. + +Alternative / complement: webhook URL on the token. POST to it +when revoked. Safer because no drain period. + +**Effort:** ~200 LOC. Webhook variant requires retry logic + +delivery audit. + +## 6. Capture WorkOS user_id in created_by (P2, quick win) + +**Problem:** Today, tokens minted via the canvas UI log +`created_by: "session"` — we know it was a session but not whose. +Post-incident review can't link a token back to a user. + +**Proposal:** Thread the WorkOS user_id from the session-auth +verification through to the handler. The CP's +`/cp/auth/tenant-member` already returns `user_id`; stash it on +the gin context in `session_auth.go`; handler reads it for +`created_by`. + +```go +// session_auth.go after successful verify +c.Set("session_user_id", body.UserID) + +// handler +if v, ok := c.Get("session_user_id"); ok { + createdBy = "session:" + v.(string) +} +``` + +**Effort:** ~20 LOC. Unblocks Important follow-up #6 from today's +code review. + +## 7. Mint-rate limit (P3) + +**Problem:** A compromised session or admin token could mint +thousands of org tokens quickly, making forensic cleanup painful. + +**Proposal:** Rate limit mint calls per-org: max N tokens per 5 min. +Existing `middleware/ratelimit` package does exactly this — bind +the limiter to the mint route with a low ceiling. + +**Effort:** ~30 LOC. Do this before #5 — revoke-storms could hit +the same pattern. + +## 8. Audit log (P2) + +**Problem:** Token revocation is logged to stdout. That's fine for +Railway's retention window but ops want a queryable audit log. + +**Proposal:** New table `org_token_audit` with (token_id, action, +actor, occurred_at). Write on mint/revoke. Surface in admin +diagnostics endpoint. + +**Effort:** ~100 LOC + lightweight read API. + +## 9. CLI for local development (P3) + +**Problem:** Developers running canvas locally can't easily mint +and use org tokens against their dev tenant because the UI +requires a WorkOS session. + +**Proposal:** `molecli org-token create --name