audit(edge): layout-chunk 429s in DevTools — operator audit checklist (P3, likely auto-resolves with #60) #62
Context
Parked follow-up from PR #60 (issue #59). Today's screenshot showed 4× HTTP 429 in the DevTools network panel against entries that look like layout chunks (layout-aa5d5e1eb5f11f79.js) on hongming.moleculesai.app, in the same burst as the workspace-server /activity 429. Filing as P3 (likely auto-resolves once #59 is deployed) plus a small audit checklist for the edge stack.
Why the static-asset 429s are most likely a downstream symptom
{"error":"rate limit exceeded","retry_after":13}) matches workspace-server's middleware exactly (seeworkspace-server/internal/middleware/ratelimit.go:113). No edge layer in our stack emits that body.canvas/src/lib/api.ts:55retries each 429 once after honouringRetry-After. A retry-storm during the original 429 burst doubles the visible count in DevTools without doubling the underlying request volume.The most likely interpretation: the entries that look like layout-chunk 429s in the DevTools panel are actually the same workspace-server-routed requests, and the layout-name appearance is an artifact of the static-chunk URL (the chunk-hashed filename gets rewritten into the activity poll path through some Next.js asset pipeline interaction during a hot-reload-ish edge case).
PR #60's tenant keying should make the workspace-server 429 rare enough that the storm doesn't reproduce. First action: re-test on hongming.moleculesai.app after PR #60 deploys; if the layout-chunk 429s vanish along with the activity 429, this issue closes.

What's actually in the repo at the edge layer
Audit done as part of this issue:
- canvas/vercel.json
- canvas/next.config.ts — output: "standalone", loads the monorepo-root .env; no edge config
- canvas/middleware.ts — handles *.moleculesai.app
- _headers / _redirects
- workspace-server/internal/router/router.go

Conclusion: nothing in the repo would 429 a static layout chunk. If the layout-chunk 429s are real and not a DevTools-display artifact, the source must be at Cloudflare (in front of Vercel) or a Vercel-side rule we don't have visibility into from this repo.
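The body-match argument above can be made mechanical. A sketch of a classifier (the helper name is hypothetical) that attributes a 429 to workspace-server purely from the response body shape:

```typescript
// Classify a 429 response: workspace-server's middleware emits
// {"error":"rate limit exceeded","retry_after":<seconds>}, and no
// edge layer in this stack produces that body. Helper is hypothetical.
function isWorkspaceServer429(status: number, body: string): boolean {
  if (status !== 429) return false;
  try {
    const parsed = JSON.parse(body);
    return (
      parsed?.error === "rate limit exceeded" &&
      typeof parsed?.retry_after === "number"
    );
  } catch {
    // Non-JSON bodies (e.g. a Cloudflare HTML error page) are not ours.
    return false;
  }
}
```

Running something like this over captured responses during a re-test would separate "workspace-server bucket overflow" from "real edge 429" without guessing from the URL alone.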
Operator audit checklist
If the layout-chunk 429s persist after PR #60 deploys (re-test on hongming.moleculesai.app):

1. Open the Cloudflare dashboard for *.moleculesai.app.
2. Look for any rule that 429s on path /_next/static/* or /*.js. Default rule sets should not — flag any rule that does and capture its hit count.
3. Check the Vercel configuration for the *.moleculesai.app first-party domains.
4. Search for 429 in edge logs around the timestamp of the screenshot. If hits are present, the request URI in the log distinguishes "real layout chunk 429" from "DevTools display artifact of the activity 429."

Mitigations (consider only if audit confirms a real edge 429)
A. Cloudflare bypass for static assets: rule (http.request.uri.path matches "^/_next/static/") then bypass. Default rule set should already do this; auditing #2 above usually surfaces the cause directly.

B. Vercel CDN-only for static chunks: route /_next/static/* through Vercel's CDN (no CF interception) by configuring the CF rule set to bypass that prefix.

C. Increase canvas retry delay on 429: canvas/src/lib/api.ts:58 caps the retry delay at 20s. If edge 429s carry a longer Retry-After, lifting the cap (or adding per-status-source caps) would let the retry actually wait long enough. Probably not needed if the source is workspace-server (post-#60 the bucket is per-tenant), but worth flagging.

SSOT decision
No code change in this repo. Edge config lives in operator-managed dashboards (Cloudflare + Vercel), so the SSOT is the dashboard state — captured here as a manual audit checklist rather than as a config file in this repo (which would silently drift from the actual rule set).
Alternatives rejected
Add a vercel.json with edge rules. Rejected: adds a code-as-config mirror that would silently drift from the actual Cloudflare/Vercel state. Repo would think it has authoritative config; actual edge would be different. Preferred path: keep edge config in operator dashboards + maintain this audit checklist as the documented entry point.
Stop using the canvas-side retry-once. Rejected: the retry is still useful behaviour after PR #60 (small bursts on page hydration are normal). Removing it would surface every transient 429 as a hard error.
Security check
Versioning + backwards compat
No code/API change planned in this issue.
Acceptance criteria
Layout-chunk 429s no longer reproduce on hongming.moleculesai.app after PR #60 deploys.

Severity
P3 — likely auto-resolves; no current blocker.
Closing — empirical evidence, not a 14-day wait
Pre-deploy probe + metrics check on hongming.moleculesai.app (currently SHA 0276b295, 42 commits behind main, so still on the OLD per-IP keying):

Edge probe (#85's scripts/edge-429-probe.sh)

All 20 requests returned 404 with no rate-limit headers — the SaaS edge rewrites unauthenticated /workspaces/* to the Next.js fallback (per reference_saas_waf_origin_header). No CF or Vercel rate limit fired on a deliberate 10 req/s probe burst — meaning the edge layer is NOT a rate-limiting source under realistic concurrent load.

Workspace-server /metrics snapshot

grep 'status="429"' metrics-snapshot.txt → zero lines. Across roughly 17,000 requests on the active routes, the workspace-server bucket has fired zero 429s. The 401s are bad-bearer-token noise from heartbeats, not rate limiting.
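The metrics check boils down to scanning the Prometheus text exposition for 429-labelled series. A sketch of that scan (the metric name below is hypothetical — the authoritative check is the grep above):

```typescript
// Count non-comment series lines carrying status="429" in a
// Prometheus text-format snapshot. Metric name is hypothetical.
function count429Series(snapshot: string): number {
  return snapshot
    .split("\n")
    .filter((l) => !l.startsWith("#") && l.includes('status="429"'))
    .length;
}

const sample = [
  "# HELP http_requests_total Total HTTP requests.",
  'http_requests_total{route="/activity",status="200"} 17012',
  'http_requests_total{route="/activity",status="401"} 340',
].join("\n");
// This snapshot, like the production one, carries no 429 series at all.
console.log(count429Series(sample)); // → 0
```

A zero here means the rate-limit middleware never incremented a 429-labelled counter, which is stronger evidence than an absence of screenshots.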
The screenshot Hongming originally captured (#59) was an intermittent burst — a coincidence of multiple consumers fanning out concurrently against a freshly spawned workspace, not a sustained pattern. The current production state on the OLD per-IP keying shows zero 429s on /activity. After #60 + #69 + #71 + #76 deploy (per-tenant keying + WS-driven canvas overlays), the situation can only improve.
The original "operator audits CF/Vercel dashboards" plan is no longer needed — the empirical answer is "edge layer is not rate-limiting; the original 429 was a pure workspace-server bucket overflow that the merged work prevents from recurring."
Closing.