molecule-core/docs/engineering/ratelimit-observability.md
security-auditor 62e793040e
chore(observability): edge-429 probe + ratelimit observability runbook
Two artifacts that unblock the parked follow-ups from #59:

  1. scripts/edge-429-probe.sh (closes the "operator-blocked" status of
     #62). An operator without CF/Vercel dashboard access can reproduce
     a canvas-sized burst against a tenant subdomain and read each 429's
     response shape — workspace-server bucket overflow (JSON body +
     X-RateLimit-* headers) is distinguishable from CF (cf-ray) and
     Vercel (x-vercel-id) by inspection of the report. Read-only,
     parallel via background subshells (no GNU parallel dependency),
     no credential use. Smoke-tested against example.com end-to-end.

  2. docs/engineering/ratelimit-observability.md (closes the
     "metric-blocked" status of #64). The existing
     molecule_http_requests_total{path,status} counter + X-RateLimit-*
     response headers already cover #64's acceptance criterion ("watch
     metrics for two weeks"). The runbook collects the PromQL queries,
     a decision tree for the re-tune (keep / per-tenant override /
     change default), an alert rule template, and a hard "do not roll
     ad-hoc per-bucket-key exposure" note (in-memory map includes
     SHA-256 of bearer tokens — exposing it is a security review
     surface, file a follow-up if needed).

Neither artifact changes runtime behaviour. Pure operational tooling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:48:34 -07:00


Rate-limit observability runbook

Companion to issue #64 ("RATE_LIMIT default re-tune analysis"). After #60 deployed the per-tenant keyFor keying, the right RATE_LIMIT default became data-dependent. This runbook documents the metrics + queries an operator should run to confirm whether the current 600 req/min/key default is correct, too tight, or too loose.

What's already exposed

The workspace-server's existing Prometheus middleware (workspace-server/internal/metrics/metrics.go) tracks every request on every path:

molecule_http_requests_total{method, path, status}      counter
molecule_http_request_duration_seconds_total{method,path,status}  counter

The path label is the matched route pattern (e.g. /workspaces/:id/activity), so high-cardinality workspace UUIDs do not explode the label space.

The rate limiter middleware (#60, workspace-server/internal/middleware/ratelimit.go) also stamps every response with X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset. Operators with browser-side or proxy-side header capture can read per-request bucket state directly.
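Those headers can be spot-checked with nothing but curl and awk. A minimal sketch, assuming a hypothetical tenant hostname in the commented curl line; a fabricated header dump is inlined so the parsing step runs offline:

```shell
# Capture one response's headers. The hostname is a placeholder;
# substitute a real tenant subdomain:
#
#   curl -sI "https://tenant.example.com/workspaces/<id>/activity" > /tmp/headers.txt
#
# Fabricated sample so this snippet runs without network access:
cat > /tmp/headers.txt <<'EOF'
HTTP/2 200
content-type: application/json
x-ratelimit-limit: 600
x-ratelimit-remaining: 597
x-ratelimit-reset: 1746654514
EOF

# Header names are case-insensitive, so normalize before matching.
awk -F': ' 'tolower($1) ~ /^x-ratelimit-/ { printf "%s=%s\n", tolower($1), $2 }' /tmp/headers.txt
# → x-ratelimit-limit=600
# → x-ratelimit-remaining=597
# → x-ratelimit-reset=1746654514
```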

No new instrumentation is needed for #64's acceptance criteria. The metric surface is sufficient — this runbook just collects the queries.

Queries to run after #60 deploys

1. Is the bucket actually firing 429s?

sum(rate(molecule_http_requests_total{status="429"}[5m]))

If this is zero, the bucket isn't being hit at all. If it's sustained above roughly 1/min, dig in.
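The counter carries no tenant label, so per-tenant attribution has to come from the scrape target. Assuming each tenant's workspace-server is a distinct scrape target (the same instance-label assumption the alert rule later in this runbook relies on), a per-instance breakdown is:

```promql
topk(5, sum by (instance) (
  rate(molecule_http_requests_total{status="429"}[5m])
))
```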

2. Which routes attract 429s?

topk(
  10,
  sum by (path) (
    rate(molecule_http_requests_total{status="429"}[5m])
  )
)

Expected shape post-#60:

  • /workspaces/:id/activity should be near zero — the canvas no longer polls it on a 30s/60s/5s cadence (PRs #69 / #71 / #76).
  • Probe / health / heartbeat paths should be ~0 (those routes have a separate IP-fallback bucket).

If /workspaces/:id/activity 429s persist post-PRs-69/71/76 deploy, the canvas isn't running the WS-subscriber path — investigate WS health on that tenant.
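To watch the activity route in isolation while confirming the post-deploy drop, a filtered variant of the topk query helps (this assumes the path label carries the route pattern exactly as written here):

```promql
sum(rate(molecule_http_requests_total{status="429", path="/workspaces/:id/activity"}[5m]))
```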

3. Per-bucket-key inference (no direct exposure today)

The bucket map itself is in-memory only; we deliberately do not expose org:<uuid> ↔ remaining-tokens because that map can include SHA-256 hashes of bearer tokens. A tenant that wants per-key visibility should rely on response headers (X-RateLimit-Remaining on every response from a given session is the bucket's view of that session).

If you genuinely need server-side per-bucket counts for triage, file a follow-up — the proper shape is a /internal/ratelimit-stats endpoint that emits counts per key prefix only (e.g. org:, tok:, ip:), never the key payloads. Don't roll that ad-hoc; it's a security review surface.

Decision tree for the re-tune

After 14 days of production traffic on a tenant, look at the queries above and walk this tree:

Q1: Is the 429 rate sustained > 0.1/sec on any tenant?
  ├─ NO  → The 600 default has comfortable headroom. Either keep it,
  │        or lower it carefully (300) ONLY if you have a documented
  │        reason (e.g. a misbehaving client we want to throttle harder).
  │        Default to "no change" — see #64 for the math.
  └─ YES → Q2.

Q2: Is the 429 rate concentrated on ONE tenant or spread across many?
  ├─ ONE tenant → Operator override: set RATE_LIMIT=1200 or 1800 on that
  │               tenant's box. Document in the tenant's ops note. The
  │               default does not need to change.
  └─ MANY tenants → Q3.

Q3: Are the 429s on a route that polls (e.g. /activity / /peers)?
  ├─ YES → Confirm PRs #69, #71, #76 have actually deployed to those
  │         tenants. If they have and 429s persist, the canvas may have
  │         a regression — do not raise RATE_LIMIT. File a canvas issue.
  └─ NO  → 429s on mutating routes mean genuine load. Raise the default
            to 1200 in `workspace-server/internal/router/router.go:54`.
            Same PR should attach: the metric chart, the time window,
            and a paragraph explaining what changed in our traffic shape.
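Q1 can be answered mechanically from Prometheus's HTTP API. A hedged sketch: the Prometheus URL in the commented curl line is an assumption, and a fabricated API response is inlined so the parsing step runs offline:

```shell
# Evaluate the Q1 threshold from the Prometheus instant-query API.
# The URL is an assumption; point it at your Prometheus:
#
#   curl -sG "http://localhost:9090/api/v1/query" \
#     --data-urlencode 'query=sum(rate(molecule_http_requests_total{status="429"}[5m]))' \
#     > /tmp/q1.json
#
# Fabricated sample response so this snippet runs without a live server:
cat > /tmp/q1.json <<'EOF'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1746654514,"0.25"]}]}}
EOF

# Extract the scalar value (second element of the "value" pair).
rate=$(sed -n 's/.*,"\([0-9.]*\)"\]}.*/\1/p' /tmp/q1.json)

# Walk Q1: sustained > 0.1/sec means continue to Q2.
if awk -v r="$rate" 'BEGIN { exit !(r + 0 > 0.1) }'; then
  echo "429 rate ${rate}/s is above the 0.1/s threshold: continue to Q2"
else
  echo "429 rate ${rate}/s has headroom: keep the default"
fi
```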

Alert rule template (drop-in for Prometheus)

# Sustained 429s — this is the SLO trip-wire. If it fires, walk the
# decision tree above. NB: issue #64's acceptance criterion is "two
# weeks of metrics"; this alert is the inverse — it tells you something
# changed before the two weeks are up.
groups:
  - name: workspace-server-ratelimit
    rules:
      - alert: WorkspaceServerRateLimit429Sustained
        expr: |
          sum by (instance) (
            rate(molecule_http_requests_total{status="429"}[10m])
          ) > 0.1
        for: 30m
        labels:
          severity: warning
          owner: workspace-server
        annotations:
          summary: "{{ $labels.instance }} sustained 429s — see ratelimit-observability runbook"
          runbook: "https://git.moleculesai.app/molecule-ai/molecule-core/blob/main/docs/engineering/ratelimit-observability.md"

Threshold rationale: 0.1 req/s = 6/min sustained over 10min. Below that, a 429 is almost certainly a transient burst that the canvas's retry-once handler at `canvas/src/lib/api.ts:55` already absorbs. The 30m `for:` keeps the alert from chattering on a brief blip.
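The unit conversion in that rationale checks out directly:

```shell
# 0.1 req/s expressed per minute — the sustained rate the alert fires on.
awk 'BEGIN { printf "%.0f req/min\n", 0.1 * 60 }'
# → 6 req/min
```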

Companion probe script

For one-off triage when an operator can reproduce the problem in their own browser, scripts/edge-429-probe.sh (#62) reproduces a canvas-sized burst against a tenant subdomain and dumps each 429's response shape, so the operator can distinguish workspace-server bucket overflow from CF/Vercel edge rate-limiting without dashboard access.

./scripts/edge-429-probe.sh hongming.moleculesai.app --burst 80 --out /tmp/edge.txt

The script's report header explains how to read the output.
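The same header fingerprints can be applied by hand to any single captured 429. A hedged sketch of the classification order (check the workspace-server headers first, since a Cloudflare-proxied origin response typically carries cf-ray as well; the sample response below is fabricated):

```shell
# Classify one captured 429 response by its header fingerprint.
# Fabricated sample so the logic runs offline; substitute a real
# capture from the probe's output file.
cat > /tmp/one-429.txt <<'EOF'
HTTP/2 429
content-type: application/json
x-ratelimit-limit: 600
x-ratelimit-remaining: 0
EOF

if grep -qi '^x-ratelimit-' /tmp/one-429.txt; then
  echo "workspace-server bucket overflow"
elif grep -qi '^cf-ray:' /tmp/one-429.txt; then
  echo "Cloudflare edge 429"
elif grep -qi '^x-vercel-id:' /tmp/one-429.txt; then
  echo "Vercel edge 429"
else
  echo "unrecognized 429 origin"
fi
```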