docs: strip internal roadmap/followups from public org-api-keys docs

The monorepo docs/ tree is ecosystem + user-facing. Internal
roadmap ("what we'll build next", priorities, effort estimates)
doesn't belong there — customers reading our docs don't need our
backlog in their face, and we shouldn't signal "feature X is
coming" contractually when it's just a P2 item in internal
tracking.

Removes:
  - docs/architecture/org-api-keys-followups.md (the whole
    prioritized roadmap). Moved to the internal repo at
    runbooks/org-api-keys-followups.md where it belongs.
  - "Follow-up roadmap" section in
    docs/architecture/org-api-keys.md, replaced with a shorter
    "Known limitations" section that names the current constraints
    (full-admin only, no expiry, no user_id in session-minted
    audit) without speculating on when they change.
  - "What's coming" section in docs/guides/org-api-keys.md,
    replaced with "Current limits" that names the same
    constraints from the user's POV.

Public docs now describe the feature as it exists TODAY. Internal
tracking of what comes next lives in Molecule-AI/internal (private).
Hongming Wang 2026-04-20 14:31:46 -07:00
parent 2a0a6153fb
commit a49e828588
3 changed files with 13 additions and 237 deletions

@@ -1,213 +0,0 @@
# Organization API Keys — Follow-up Work
> Tracked improvements to the beta `org_api_tokens` system. Each item
> has a rationale + sketch implementation + rough effort estimate.
> Ordered by priority.
## 1. Role scoping (P1 — next after beta signal)
**Problem:** Today every token is full-admin. A token given to a
simple read-only monitoring script is as dangerous as one given to
a deploy bot. No way to hand an AI agent a token that lets it read
workspace state but not nuke the org.
**Proposal:** Add a `role` column to `org_api_tokens`:
```sql
ALTER TABLE org_api_tokens
ADD COLUMN role TEXT NOT NULL DEFAULT 'admin'
CHECK (role IN ('admin', 'editor', 'reader'));
```
- `admin` — current behavior (all AdminAuth routes)
- `editor` — workspace CRUD + secrets + approvals, but NOT mint/
revoke org tokens (closes the self-escalation loop)
- `reader` — GETs only, no mutations
New middleware wrapper `RequireRole(role)` checks token's row
against the route's required minimum. Extend AdminAuth to stash
the resolved role on `c.Set("org_token_role", r)`.
**Effort:** ~200 LOC + migration + UI role-picker in
`OrgTokensTab.tsx`. Backward compatible for existing tokens: the
`admin` default preserves their current behavior.
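The "required minimum" check behind `RequireRole` can be sketched as a pure function. This is a minimal illustration, not the shipped middleware: the numeric ranks and the `roleAllows` name are assumptions introduced here, and wiring it into the router is left out.

```go
package main

import "fmt"

// roleRank orders the three proposed roles from least to most
// privileged. The numeric ranks are an assumption for this sketch.
var roleRank = map[string]int{"reader": 1, "editor": 2, "admin": 3}

// roleAllows reports whether a token holding tokenRole may call a
// route whose minimum role is required. Unknown roles are rejected.
func roleAllows(tokenRole, required string) bool {
	have, okHave := roleRank[tokenRole]
	need, okNeed := roleRank[required]
	return okHave && okNeed && have >= need
}

func main() {
	fmt.Println(roleAllows("editor", "reader")) // editor may call reader routes
	fmt.Println(roleAllows("reader", "editor")) // reader may not mutate
	fmt.Println(roleAllows("editor", "admin"))  // editor may not mint/revoke tokens
}
```

Under this ordering the mint and revoke routes would require `admin`, which is what closes the self-escalation loop for `editor` tokens.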
## 2. Per-workspace binding (P1)
**Problem:** An org-admin token that only needs to touch one
workspace is overkill. AWS IAM equivalent: "this key can only read
bucket foo".
**Proposal:** Optional `workspace_id` FK on the token. When set,
AdminAuth + WorkspaceAuth both accept the token ONLY for routes
scoped to that workspace (`/workspaces/<id>/*`). Tokens with
`workspace_id = NULL` behave as today (full-org).
```sql
ALTER TABLE org_api_tokens
ADD COLUMN workspace_id UUID REFERENCES workspaces(id) ON DELETE CASCADE;
```
Cascade delete means deleting a workspace also deletes its scoped
tokens automatically. UI adds a workspace dropdown at mint time.
**Effort:** ~250 LOC. Pairs naturally with role scoping.
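A rough sketch of the scope check a workspace-bound token would need. The `inScope` helper is hypothetical, and it assumes the request path has already been cleaned (no `..` segments) before the check runs.

```go
package main

import (
	"fmt"
	"strings"
)

// inScope reports whether a token may be used on path. An empty
// boundWorkspaceID models workspace_id = NULL (a full-org token).
// The path is assumed to be cleaned by the router beforehand.
func inScope(boundWorkspaceID, path string) bool {
	if boundWorkspaceID == "" {
		return true
	}
	prefix := "/workspaces/" + boundWorkspaceID
	// Match the workspace root or anything below it, but not a
	// sibling whose ID merely starts with the same characters.
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

func main() {
	fmt.Println(inScope("abc", "/workspaces/abc/secrets"))  // inside the bound workspace
	fmt.Println(inScope("abc", "/workspaces/abcd/secrets")) // different workspace
	fmt.Println(inScope("", "/org/tokens"))                 // unbound token, full org
}
```

Comparing against `prefix + "/"` rather than the bare prefix is the detail that keeps a token bound to `abc` from matching workspace `abcd`.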
## 3. Expiry (P2)
**Problem:** Long-lived tokens are a liability. "Mint this key for
this one deploy and die after 1 hour" is a common ask.
**Proposal:** Optional `expires_at` on the row, enforced in the
hot-path query:
```sql
WHERE token_hash = $1 AND revoked_at IS NULL
AND (expires_at IS NULL OR expires_at > now())
```
UI: mint form has "Expires in: [Never / 1h / 1d / 30d]" picker.
Show time-left on the list view; flag soon-to-expire in amber.
**Effort:** ~80 LOC. Additive; existing tokens have NULL = never.
## 4. Usage metrics (P2)
**Problem:** `last_used_at` is the only observation we have. Users
want to see what a token is doing — which paths, from which IPs,
how often — so they can detect anomalies.
**Proposal:** Async counter writes on every successful Validate.
New table:
```sql
CREATE TABLE org_api_token_usage (
token_id UUID REFERENCES org_api_tokens(id) ON DELETE CASCADE,
hour TIMESTAMPTZ NOT NULL, -- truncated to hour
request_count BIGINT NOT NULL DEFAULT 0,
last_path TEXT,
last_ip INET,
last_user_agent TEXT,
PRIMARY KEY (token_id, hour)
);
```
Atomic counter upserts via `ON CONFLICT (token_id, hour) DO UPDATE
SET request_count = org_api_token_usage.request_count + 1`, one row
per token-hour. UI graphs last 30 days per token.
**Effort:** ~150 LOC + background sweep to prune >90-day rows.
## 5. Rotation webhooks (P3)
**Problem:** When a user revokes a token, integrations using it
get 401 with no warning. Large integrations want a "you're about
to lose access, here's 60s to rotate" signal.
**Proposal:** Soft-revoke tier. Revoke now accepts
`?drain_seconds=60`. Token enters a `draining` state (still valid
but a warning header `X-Molecule-Token-Draining: true` is added to
every response). After drain window, fully revoked.
Alternative / complement: webhook URL on the token. POST to it
when revoked. Safer because the token stops working immediately,
with no drain period in which a leaked token is still valid.
**Effort:** ~200 LOC. Webhook variant requires retry logic +
delivery audit.
## 6. Capture WorkOS user_id in created_by (P2, quick win)
**Problem:** Today, tokens minted via the canvas UI log
`created_by: "session"` — we know it was a session but not whose.
Post-incident review can't link a token back to a user.
**Proposal:** Thread the WorkOS user_id from the session-auth
verification through to the handler. The CP's
`/cp/auth/tenant-member` already returns `user_id`; stash it on
the gin context in `session_auth.go`; handler reads it for
`created_by`.
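```go
// session_auth.go, after successful verify: stash the WorkOS user ID
c.Set("session_user_id", body.UserID)

// handler: createdBy keeps its existing value if no user ID was stashed
if v, ok := c.Get("session_user_id"); ok {
	// checked assertion so a non-string value cannot panic the handler
	if id, ok := v.(string); ok && id != "" {
		createdBy = "session:" + id
	}
}
```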
**Effort:** ~20 LOC. Unblocks Important follow-up #6 from today's
code review.
## 7. Mint-rate limit (P3)
**Problem:** A compromised session or admin token could mint
thousands of org tokens quickly, making forensic cleanup painful.
**Proposal:** Rate limit mint calls per-org: max N tokens per 5 min.
Existing `middleware/ratelimit` package does exactly this — bind
the limiter to the mint route with a low ceiling.
**Effort:** ~30 LOC. Do this before #5 — revoke-storms could hit
the same pattern.
## 8. Audit log (P2)
**Problem:** Token revocation is logged to stdout. That's fine for
Railway's retention window but ops want a queryable audit log.
**Proposal:** New table `org_token_audit` with (token_id, action,
actor, occurred_at). Write on mint/revoke. Surface in admin
diagnostics endpoint.
**Effort:** ~100 LOC + lightweight read API.
## 9. CLI for local development (P3)
**Problem:** Developers running canvas locally can't easily mint
and use org tokens against their dev tenant because the UI
requires a WorkOS session.
**Proposal:** `molecli org-token create --name <label>` uses
`ADMIN_TOKEN` from env + `MOLECULE_ORG_URL` to mint. Same API,
scripts-friendly.
**Effort:** ~80 LOC in molecli + a line in the docs guide.
## 10. Migrate ADMIN_TOKEN to org_api_tokens table (P4 — long-term)
**Problem:** `ADMIN_TOKEN` as an env var is a special case that
every auth tier has to handle. Once org tokens are feature-
complete (roles, expiry, binding), the env-var token is redundant
and complicates the auth code.
**Proposal:** Bootstrap the tenant by inserting a row labeled
`bootstrap` into `org_api_tokens` at provision time with the
current ADMIN_TOKEN value's hash. Remove the env-var check entirely
from AdminAuth. `ADMIN_TOKEN` becomes just "the initial token that
happens to be stored as a normal row".
Requires: roles + expiry shipped first (bootstrap token needs to
be demarcated as revocable-but-permanent-by-default).
**Effort:** ~150 LOC once prerequisites land.
---
## Tracked issues to file
Each of the above should become a GitHub issue when we're ready to
work on it. One-liner label for the batch: `area:org-api-keys`.
## Non-goals
Explicit list of things we do NOT want to add:
- JWT / signed tokens. Opaque bearers + DB lookup is simpler and
matches every other token type in the system.
- OAuth scopes. We're not a third-party OAuth provider; this is
for internal integrations only.
- IP allow-lists per token. Captured nominally by the usage log
(#4) for detection, but enforcement adds operational friction
(customer VPN changes → all tokens break).

@@ -146,22 +146,13 @@ DELETE /org/tokens/:id revoke; idempotent (404 on already-revoked)
All three behind `AdminAuth`. See `internal/handlers/org_tokens.go`.
## Follow-up roadmap
## Known limitations
See `docs/architecture/org-api-keys-followups.md` for the full
list; headline items:
1. **Role scoping**: split into ADMIN / EDITOR / READER tiers. Then
WORKSPACE-SPECIFIC tokens ("this key can only touch workspace
X"). Aligns with the AWS IAM-style direction the product wants.
2. **Expiry**: optional `expires_at`, enforced in the hot-path
query. Lets users mint short-lived tokens for specific jobs.
3. **Usage metrics**: counter + last-request metadata
(path/ip/user-agent) for the UI so users can see what a token
is actually doing.
4. **Rotation hooks**: webhook-on-revoke so integrations know to
re-mint.
5. **Capture WorkOS user_id in `created_by`** when minted via session
(currently just records "session"). Requires propagating session
identity from the CP's tenant-member check through
`session_auth.go`.
- Every token is full-org admin. Role scoping (admin / editor /
reader) and per-workspace binding are planned but not shipped
today.
- No expiry / TTL. Tokens live until explicitly revoked.
- Tokens minted via canvas session are audited as
`created_by: "session"` without the WorkOS user_id. A specific
user's mint activity can't be attributed from the table alone
until the session identity is captured.

@@ -131,10 +131,8 @@ Full API reference: `docs/api-reference.md`.
Both unlock the same surface; the key is just the non-browser
equivalent.
## What's coming
## Current limits
Scoped roles (READ / WORKSPACE-WRITE / ORG-ADMIN), expiry timers,
per-workspace bindings, and usage metrics are on the roadmap. See
`docs/architecture/org-api-keys-followups.md`. For now every key
is full-admin by design — trading scope granularity for beta
shipping speed.
Every key is full-admin. Scoped roles (read-only / workspace-
write / admin), per-workspace bindings, and expiry are not yet
supported — treat every key as equivalent to being logged in.