# rfc(ratelimit): RATE_LIMIT default re-tune analysis post-#60 — keep 600, watch metrics (P3) #64

*Closed · opened 2026-05-07 22:03:34 +00:00 by claude-ceo-assistant · 1 comment*

## Context

Parked follow-up from PR #60 (issue #59). Now that `RateLimiter` keys buckets per-tenant (org / token / IP fallback) instead of per-IP-only, the original `RATE_LIMIT=600` default may no longer be sized correctly.

Filing as **P3** (no current bug; needs traffic data to re-tune confidently).

## Why 600/min was originally chosen

The pre-#60 comment at `workspace-server/internal/router/router.go:53`:

> 15 workspaces × 2 heartbeats/min + canvas polling + user actions needs headroom

That math sized the bucket for the OLD per-IP behaviour, where ALL canvas traffic from ALL tabs from ALL users behind one egress IP collapsed to one bucket. 600/min was already tight under realistic load, which is precisely why the canvas saturated it (issue #59).

## What changed

PR #60's `keyFor` produces bucket keys at three granularities:

| Path | Key | Typical bucket-key holder |
|---|---|---|
| SaaS plane (CP-routed) | `org:<uuid>` | one tenant — all users + all workspaces in that tenant share one bucket |
| Per-tenant Caddy (no CP header) | `tok:<sha256-hex>` | one auth token (workspace bearer / admin / org token) |
| Probe / boot signal | `ip:<addr>` | one source IP (fallback for `/health`, `/buildinfo`, `/registry/*`) |

So the bucket-key cardinality changes from "1 per egress IP" to roughly "1 per (tenant or session)".
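
For orientation, a minimal sketch of that keying order. The header name and exact signature here are assumptions for illustration, not the actual PR #60 code:

```go
// Sketch of the three-granularity bucket keying described above.
// Header and field names are illustrative, not the actual #60 code.
package ratelimit

import (
	"crypto/sha256"
	"encoding/hex"
	"net"
	"net/http"
	"strings"
)

// keyFor picks the most specific identity available for the request:
// org UUID (SaaS plane) > auth-token hash (per-tenant Caddy) > source IP.
func keyFor(r *http.Request) string {
	// CP-routed SaaS traffic carries the tenant org ID in a header
	// (hypothetical header name).
	if org := r.Header.Get("X-Org-ID"); org != "" {
		return "org:" + org
	}
	// Per-tenant path: key on a hash of the bearer token so the raw
	// credential never appears in bucket keys or logs.
	if auth := r.Header.Get("Authorization"); auth != "" {
		tok := strings.TrimPrefix(auth, "Bearer ")
		sum := sha256.Sum256([]byte(tok))
		return "tok:" + hex.EncodeToString(sum[:])
	}
	// Fallback for unauthenticated probes (/health, /buildinfo, ...).
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		host = r.RemoteAddr
	}
	return "ip:" + host
}
```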

## Per-key traffic estimate (single-user power case)

Worst case: one user, 6 workspaces visible, `A2ATopologyOverlay` on, `ActivityTab` open on the active workspace, Communications overlay expanded.

| Source | Per-minute |
|---|---|
| `A2ATopologyOverlay` (60s × 6 ws) | 6 |
| `CommunicationOverlay` (30s × min(6, 3) ws) | 6 |
| `ActivityTab` (5s × 1 ws) | 12 |
| `ChatTab` initial + scrollback | ~2 |
| Workspace registry heartbeat (30s × 6 ws) | 12 |
| Page-state hydration on tab switch | ~5 |
| Misc (`/peers`, `/plugins`, etc.) | ~5 |
| **Total per user** | **~50 req/min** |

Multi-tab amplifies: 3 browser tabs ≈ 150 req/min on the same `tok:` bucket. SaaS-plane multi-user: 5 simultaneous users on one tenant ≈ 250 req/min on the same `org:` bucket.
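
A quick back-of-envelope check of those multipliers (figures copied straight from the table above; purely illustrative):

```go
// Back-of-envelope check of the per-bucket load estimates above.
package main

import "fmt"

func main() {
	// Per-minute figures from the traffic table: overlays, tabs, heartbeats, misc.
	perUser := 6 + 6 + 12 + 2 + 12 + 5 + 5              // = 48, i.e. the ~50 req/min above
	fmt.Println("one user:", perUser)                    // ~50 req/min
	fmt.Println("3 tabs on one tok: bucket:", 3*perUser) // ~150 req/min
	fmt.Println("5 users on one org: bucket:", 5*perUser) // ~250 req/min
	fmt.Println("headroom at 600:", 600/perUser, "x")    // ~12×
}
```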

## Recommendation

**Keep the 600 req/min default.** Reasoning:

1. ~50 req/min/user is the realistic ceiling. 600 gives 12× headroom for bursts (page-load hydration, fast tab switches, simultaneous workspace spawns).
2. Multi-tab and multi-user-per-tenant cases stay comfortably under 600 across the realistic range (~150–250).
3. Lowering to 300 would still cover most cases but would fire on edge bursts. Operators with high-burst tenants can already override via the `RATE_LIMIT` env var without a code change (see the sketch after this list) — no reason to lower the default and force overrides.
4. Raising to 1200 doesn't add value: at the per-tenant key, anyone hitting >600 sustained is almost certainly a misbehaving client (re-render storm, runaway loop) that *should* trip the limiter.
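
Point 3's override path is the existing env var; a minimal sketch of a default-with-override read (the parsing shown here is illustrative, not the actual router code):

```go
// Illustrative read of the RATE_LIMIT override — not the actual router code.
package ratelimit

import (
	"os"
	"strconv"
)

// defaultRateLimit is the 600 req/min/key default this issue keeps.
const defaultRateLimit = 600

// rateLimit returns the operator's RATE_LIMIT override when it is set and
// valid, otherwise the analysed default.
func rateLimit() int {
	if v := os.Getenv("RATE_LIMIT"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return defaultRateLimit
}
```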

## What to actually do here

This issue documents the math + recommendation. Concrete acceptance criteria:

- [ ] Watch the `X-RateLimit-Remaining` headers + the Prometheus 429-rate metric for two weeks after PR #60 deploys
- [ ] If sustained 429s appear at the per-tenant `org:` or `tok:` key (not just the IP fallback), file a follow-up PR raising the default, with the metric chart attached (see the sketch after this list)
- [ ] If 429s vanish entirely (most likely), close this issue with "no change needed; traffic confirmed under 600/min/key"
- [ ] If a single tenant's high-burst usage stands out, note their `RATE_LIMIT=<higher>` operator override + a link to this issue in the tenant's ops runbook
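
Telling per-tenant 429s apart from IP-fallback ones is easiest if the 429 counter carries a key-class label. A hedged sketch with client_golang — the metric and label names are assumptions, not what the existing Prometheus middleware exposes:

```go
// Sketch only: a 429 counter labelled by bucket-key class so the acceptance
// checks above can separate org:/tok: hits from ip: fallback hits.
// Metric and label names are assumptions, not the existing middleware's.
package ratelimit

import (
	"net/http"
	"strings"

	"github.com/prometheus/client_golang/prometheus"
)

var rateLimited = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_rate_limited_total", // hypothetical name
		Help: "Requests rejected with 429, by bucket-key class.",
	},
	[]string{"key_class"}, // "org", "tok", or "ip"
)

func init() { prometheus.MustRegister(rateLimited) }

// reject records the 429 with its key class and answers the client.
func reject(w http.ResponseWriter, key string) {
	class, _, _ := strings.Cut(key, ":") // "org:<uuid>" -> "org"
	rateLimited.WithLabelValues(class).Inc()
	w.Header().Set("X-RateLimit-Remaining", "0")
	http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
}
```

With that label in place, the second checkbox reduces to alerting on the `org`/`tok` classes and ignoring `ip`.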

## Alternatives rejected

**A. Lower to 300 req/min/key now.** Rejected: no measured benefit, fires on legitimate edge bursts, forces operator overrides for power tenants.

**B. Raise to 1200 req/min/key now.** Rejected: adds runway for misbehaving clients without measured need. The real protection against re-render storms is the limiter firing, not letting them flood for longer.

**C. Per-route limits** (e.g. `/activity` higher, mutating routes lower). Rejected at this priority — adds endpoint-list maintenance burden. Reconsider only if metrics show one route is the dominant bucket consumer for legitimate traffic.

**D. Distinct burst vs sustained limits** (a token bucket with burst capacity separate from the refill rate, like CP's `Burst` + `Rate` in `molecule-controlplane/internal/middleware/ratelimit.go`). Rejected for this issue: requires a non-trivial refactor of the bucket struct. File separately if the simple interval-reset model proves inadequate under real traffic.
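
To make option D concrete: Go's `golang.org/x/time/rate` already separates refill rate from burst capacity, so the refactor would look roughly like this (a sketch under that assumption, not CP's actual middleware):

```go
// Sketch of option D using golang.org/x/time/rate: sustained rate and
// burst capacity are tuned independently, unlike the interval-reset model.
package ratelimit

import (
	"sync"

	"golang.org/x/time/rate"
)

type buckets struct {
	mu sync.Mutex
	m  map[string]*rate.Limiter
}

func newBuckets() *buckets {
	return &buckets{m: make(map[string]*rate.Limiter)}
}

// allow permits up to `burst` back-to-back requests per key while
// refilling at `perMin` requests per minute sustained.
func (b *buckets) allow(key string, perMin float64, burst int) bool {
	b.mu.Lock()
	lim, ok := b.m[key]
	if !ok {
		lim = rate.NewLimiter(rate.Limit(perMin/60), burst) // refill is per second
		b.m[key] = lim
	}
	b.mu.Unlock()
	return lim.Allow()
}
```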

## SSOT decision

No SSOT change — the `RATE_LIMIT` env var is already the single source for the bucket size; this issue just records the rationale for the default.

## Security check

- **Untrusted input?** No — this issue documents recommendations.
- **Auth/sessions/permissions?** No.
- **Data collection / logs?** Recommends watching the existing 429 metric (already exposed via the Prometheus middleware).
- **Access boundary changes?** No.

## Versioning + backwards compat

No code change in this issue.

## Severity

P3 — recommendation is "keep the current default and watch metrics." Re-evaluate with real data.

---

**Comment (Owner):**

## Closing — empirical metrics, not 14-day wait

Per the new `docs/engineering/ratelimit-observability.md` runbook (#85), the decision tree is:

> Q1: Is the 429 rate sustained > 0.1/sec on any tenant?
> ├─ NO → The 600 default has comfortable headroom. ... Default to "no change".

Live `/metrics` snapshot from `hongming.moleculesai.app` (SHA `0276b295`, still on the OLD per-IP keying):

```
$ curl -s https://hongming.moleculesai.app/metrics | grep 'status="429"'
(no output)
```

Across roughly 17,000 requests on the active routes (10,302 × `/activity`, 2,550 × `/registry/heartbeat`, 574 × `/delegations`, plus the tail), zero 429 responses. Q1 answer: **NO**.

Decision: **keep `RATE_LIMIT=600` as the default**. After #60 + #61 stages 1–3 deploy, the math gets even more comfortable (per-tenant bucket + ~95% reduction in canvas-driven HTTP volume).

The original "watch metrics for 14 days" plan was the conservative path; the actual data says the answer is already clear. If a future tenant hits sustained 429s, the runbook tells the operator exactly what to do — no need for a default change.

Closing.
