Merge pull request #1111 from Molecule-AI/docs/remove-internal-from-public
docs: strip internal roadmap from public org-api-keys docs
commit 078e83f7bc
@@ -1,213 +0,0 @@
# Organization API Keys: Follow-up Work

> Tracked improvements to the beta `org_api_tokens` system. Each item
> has a rationale + sketch implementation + rough effort estimate.
> Ordered by priority.

## 1. Role scoping (P1, next after beta signal)

**Problem:** Today every token is full-admin. A token given to a
simple read-only monitoring script is as dangerous as one given to
a deploy bot. There is no way to hand an AI agent a token that lets
it read workspace state but not nuke the org.

**Proposal:** Add a `role` column to `org_api_tokens`:

```sql
ALTER TABLE org_api_tokens
  ADD COLUMN role TEXT NOT NULL DEFAULT 'admin'
  CHECK (role IN ('admin', 'editor', 'reader'));
```

- `admin`: current behavior (all AdminAuth routes)
- `editor`: workspace CRUD + secrets + approvals, but NOT mint/revoke
  org tokens (closes the self-escalation loop)
- `reader`: GETs only, no mutations

New middleware wrapper `RequireRole(role)` checks the token's role
against the route's required minimum. Extend AdminAuth to stash
the resolved role via `c.Set("org_token_role", r)`.

**Effort:** ~200 LOC + migration + UI role-picker in
`OrgTokensTab.tsx`. Not a breaking change for existing tokens
(defaulting to `admin` preserves behavior).

## 2. Per-workspace binding (P1)

**Problem:** An org-admin token that only needs to touch one
workspace is overkill. AWS IAM equivalent: "this key can only read
bucket foo".

**Proposal:** Optional `workspace_id` FK on the token. When set,
AdminAuth + WorkspaceAuth both accept the token ONLY for routes
scoped to that workspace (`/workspaces/<id>/*`). Tokens with a NULL
`workspace_id` behave as today (full-org).

```sql
ALTER TABLE org_api_tokens
  ADD COLUMN workspace_id UUID REFERENCES workspaces(id) ON DELETE CASCADE;
```

Cascade delete means deleting a workspace removes its scoped
tokens automatically. UI adds a workspace dropdown at mint time.

**Effort:** ~250 LOC. Pairs naturally with role scoping.
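
As a rough illustration of the route check, a bound token could be matched against the request path as follows. `TokenAllowsPath` is a hypothetical helper, and an empty string stands in for a NULL `workspace_id`. The path is cleaned first so that `..` segments cannot escape the bound workspace.

```go
package main

import (
	"path"
	"strings"
)

// TokenAllowsPath reports whether a token bound to boundWorkspaceID may be
// used on reqPath. An empty binding models workspace_id IS NULL (full-org).
func TokenAllowsPath(boundWorkspaceID, reqPath string) bool {
	if boundWorkspaceID == "" {
		return true
	}
	// Normalize "//" and ".." before comparing prefixes.
	clean := path.Clean("/" + reqPath)
	prefix := "/workspaces/" + boundWorkspaceID
	// Require an exact match or a "/" boundary so "abc" does not match "abcd".
	return clean == prefix || strings.HasPrefix(clean, prefix+"/")
}
```

A real implementation would more likely compare the router's parsed `:id` parameter with the token's `workspace_id`. The prefix form is shown only because it is self-contained.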

## 3. Expiry (P2)

**Problem:** Long-lived tokens are a liability. "Mint this key for
this one deploy and die after 1 hour" is a common ask.

**Proposal:** Optional `expires_at` on the row, enforced in the
hot-path query:

```sql
WHERE token_hash = $1 AND revoked_at IS NULL
  AND (expires_at IS NULL OR expires_at > now())
```

UI: mint form has "Expires in: [Never / 1h / 1d / 30d]" picker.
Show time-left on the list view; flag soon-to-expire in amber.

**Effort:** ~80 LOC. Additive; existing tokens have NULL = never.

## 4. Usage metrics (P2)

**Problem:** `last_used_at` is the only observation we have. Users
want to see what a token is doing (which paths, from which IPs,
how often) so they can detect anomalies.

**Proposal:** Async counter writes on every successful Validate.
New table:

```sql
CREATE TABLE org_api_token_usage (
  token_id UUID REFERENCES org_api_tokens(id) ON DELETE CASCADE,
  hour TIMESTAMPTZ NOT NULL, -- truncated to hour
  request_count BIGINT NOT NULL DEFAULT 0,
  last_path TEXT,
  last_ip INET,
  last_user_agent TEXT,
  PRIMARY KEY (token_id, hour)
);
```

`ON CONFLICT (token_id, hour) DO UPDATE SET request_count =
org_api_token_usage.request_count + 1` gives atomic counter upserts,
one row per token-hour. (PostgreSQL requires the conflict target for
`DO UPDATE`, and the unqualified column name on the right-hand side
would be ambiguous.) UI graphs last 30 days per token.

**Effort:** ~150 LOC + background sweep to prune >90-day rows.

## 5. Rotation webhooks (P3)

**Problem:** When a user revokes a token, integrations using it
get 401 with no warning. Larger integrations want "you're about to
lose access, here's 60s to rotate" signals.

**Proposal:** Soft-revoke tier. Revoke now accepts
`?drain_seconds=60`. Token enters a `draining` state (still valid
but a warning header `X-Molecule-Token-Draining: true` is added to
every response). After drain window, fully revoked.

Alternative / complement: webhook URL on the token. POST to it
when revoked. Safer because the token stops working immediately
(no drain period).

**Effort:** ~200 LOC. Webhook variant requires retry logic +
delivery audit.

## 6. Capture WorkOS user_id in created_by (P2, quick win)

**Problem:** Today, tokens minted via the canvas UI log
`created_by: "session"`: we know it was a session but not whose.
Post-incident review can't link a token back to a user.

**Proposal:** Thread the WorkOS user_id from the session-auth
verification through to the handler. The CP's
`/cp/auth/tenant-member` already returns `user_id`; stash it on
the gin context in `session_auth.go`; handler reads it for
`created_by`.

```go
// session_auth.go: after a successful verify, stash the WorkOS user ID.
c.Set("session_user_id", body.UserID)

// handler: attribute the mint to the session user when one is present.
if v, ok := c.Get("session_user_id"); ok {
	// Checked assertion avoids a panic if the stored value is not a string.
	if userID, ok := v.(string); ok && userID != "" {
		createdBy = "session:" + userID
	}
}
```

**Effort:** ~20 LOC. Unblocks Important follow-up #6 from today's
code review.

## 7. Mint-rate limit (P3)

**Problem:** A compromised session or admin token could mint
thousands of org tokens quickly, making forensic cleanup painful.

**Proposal:** Rate limit mint calls per-org: max N tokens per 5 min.
Existing `middleware/ratelimit` package does exactly this: bind
the limiter to the mint route with a low ceiling.

**Effort:** ~30 LOC. Do this before #5, since revoke-storms could hit
the same pattern.

## 8. Audit log (P2)

**Problem:** Token revocation is logged to stdout. That's fine for
Railway's retention window but ops want a queryable audit log.

**Proposal:** New table `org_token_audit` with (token_id, action,
actor, occurred_at). Write on mint/revoke. Surface in admin
diagnostics endpoint.

**Effort:** ~100 LOC + lightweight read API.

## 9. CLI for local development (P3)

**Problem:** Developers running canvas locally can't easily mint
and use org tokens against their dev tenant because the UI
requires a WorkOS session.

**Proposal:** `molecli org-token create --name <label>` uses
`ADMIN_TOKEN` from env + `MOLECULE_ORG_URL` to mint. Same API,
scripts-friendly.

**Effort:** ~80 LOC in molecli + a line in the docs guide.

## 10. Migrate ADMIN_TOKEN to org_api_tokens table (P4, long-term)

**Problem:** `ADMIN_TOKEN` as an env var is a special case that
every auth tier has to handle. Once org tokens are feature-complete
(roles, expiry, binding), the env-var token is redundant
and complicates the auth code.

**Proposal:** Bootstrap the tenant by inserting a row labeled
`bootstrap` into `org_api_tokens` at provision time with the
current ADMIN_TOKEN value's hash. Remove the env-var check entirely
from AdminAuth. `ADMIN_TOKEN` becomes just "the initial token that
happens to be stored as a normal row".

Requires: roles + expiry shipped first (bootstrap token needs to
be demarcated as revocable-but-permanent-by-default).

**Effort:** ~150 LOC once prerequisites land.

---

## Tracked issues to file

Each of the above should become a GitHub issue when we're ready to
work it. One-liner label for the batch: `area:org-api-keys`.

## Non-goals

Explicit list of things we do NOT want to add:

- JWT / signed tokens. Opaque bearers + DB lookup is simpler and
  matches every other token type in the system.
- OAuth scopes. We're not a third-party OAuth provider; this is
  for internal integrations only.
- IP allow-lists per token. Captured nominally by the usage log
  (#4) for detection, but enforcement adds operational friction
  (customer VPN changes → all tokens break).

@@ -146,22 +146,13 @@ DELETE /org/tokens/:id revoke; idempotent (404 on already-revoked)

 All three behind `AdminAuth`. See `internal/handlers/org_tokens.go`.

-## Follow-up roadmap
+## Known limitations

-See `docs/architecture/org-api-keys-followups.md` for the full
-list; headline items:
-
-1. **Role scoping**: split into ADMIN / EDITOR / READER tiers. Then
-   WORKSPACE-SPECIFIC tokens ("this key can only touch workspace
-   X"). Aligns with the AWS IAM-style direction the product wants.
-2. **Expiry**: optional `expires_at`, enforced in the hot-path
-   query. Lets users mint short-lived tokens for specific jobs.
-3. **Usage metrics**: counter + last-request metadata
-   (path/ip/user-agent) for the UI so users can see what a token
-   is actually doing.
-4. **Rotation hooks**: webhook-on-revoke so integrations know to
-   re-mint.
-5. **Capture WorkOS user_id in `created_by`** when minted via session
-   (currently just records "session"). Requires propagating session
-   identity from the CP's tenant-member check through
-   `session_auth.go`.
+- Every token is full-org admin. Role scoping (admin / editor /
+  reader) and per-workspace binding are planned but not shipped
+  today.
+- No expiry / TTL. Tokens live until explicitly revoked.
+- Tokens minted via canvas session are audited as
+  `created_by: "session"` without the WorkOS user_id. A specific
+  user's mint activity can't be attributed from the table alone
+  until the session identity is captured.

@@ -131,10 +131,8 @@ Full API reference: `docs/api-reference.md`.
 Both unlock the same surface; the key is just the non-browser
 equivalent.

-## What's coming
+## Current limits

-Scoped roles (READ / WORKSPACE-WRITE / ORG-ADMIN), expiry timers,
-per-workspace bindings, and usage metrics are on the roadmap. See
-`docs/architecture/org-api-keys-followups.md`. For now every key
-is full-admin by design, trading scope granularity for beta
-shipping speed.
+Every key is full-admin. Scoped roles (read-only / workspace-
+write / admin), per-workspace bindings, and expiry are not yet
+supported; treat every key as equivalent to being logged in.