docs(wildcard-dns): address CEO review — KV cache, WebSocket, proxy trust

Addresses all 4 review points from PR #786:
1. Worker resilience: 3-tier cache (in-memory → KV → CP API) with stale
   fallback so CP outages are invisible to tenants
2. WebSocket proxying: documented upgradeHeader handling, fallback to
   keep Caddy for WS-only if Workers WS is unreliable
3. SG automation: note to auto-update Cloudflare IP ranges, don't hardcode
4. Trusted proxy: X-Forwarded-For / CF-Connecting-IP trust chain documented

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hongming Wang 2026-04-17 10:17:43 -07:00
parent 72285fb03e
commit 49bd2e8f56

View File

@ -70,17 +70,51 @@ The Worker runs on every request to `*.moleculesai.app` that isn't matched
by an explicit DNS record. It:
1. **Extracts the slug** from the `Host` header
2. **Looks up the backend IP** by calling `GET https://api.moleculesai.app/cp/orgs/<slug>/instance`
- Caches the response for 60s in Cloudflare's edge cache (KV or Cache API)
- If the org doesn't exist → 404 page
2. **Looks up the backend IP** using a 3-tier cache strategy:
- **L1: in-memory cache** (60s TTL) — fastest, per-isolate
- **L2: Workers KV** (5 min TTL, stale-while-revalidate) — survives isolate
restarts, shared across all edge locations
- **L3: CP API**`GET https://api.moleculesai.app/cp/orgs/<slug>/instance`
- **Fallback:** if CP is unreachable, serve stale KV entry (any age) rather
than erroring. A 10-minute CP outage is invisible to tenants.
- If the org doesn't exist (404 from CP, no KV entry) → 404 page
- If the org is provisioning (no IP yet) → return a static "provisioning" HTML page
3. **Proxies the request** to `http://<ec2-ip>:8080` (platform) or `:3000` (canvas)
- Route: `/health`, `/workspaces*`, `/registry*`, etc. → `:8080`
- Route: everything else → `:3000`
- Route: `/ws``:8080` with WebSocket upgrade (see WebSocket section below)
- Injects `X-Molecule-Org-Id` header (same as Caddy does today)
- Injects `Origin` header for AdminAuth bypass
- Injects `X-Forwarded-For` with client IP from `CF-Connecting-IP`
- Injects `X-Forwarded-Proto: https`
4. **Returns the response** to the browser with Cloudflare's TLS
#### WebSocket proxying
Cloudflare Workers support WebSocket proxying via the `upgradeHeader` check.
The Worker detects `Upgrade: websocket` on incoming requests and passes them
through to the EC2 backend on `:8080/ws`. The Worker acts as a transparent
tunnel — it does not inspect or buffer WebSocket frames.
```js
// Simplified WebSocket handling in the Worker
if (request.headers.get('Upgrade') === 'websocket') {
return fetch(`http://${backendIp}:8080${url.pathname}`, request);
}
```
If Workers WebSocket proxying proves unreliable in production (frame drops,
idle timeout issues), Phase 33.3 keeps Caddy as a thin WSocket-only reverse
proxy on EC2 instead of removing it entirely.
#### Trusted proxy configuration
The platform's Gin server uses `SetTrustedProxies(nil)` (trust all) by
default. When requests come through the Worker instead of directly, the
platform should trust `CF-Connecting-IP` for the real client IP. In
production, set `TRUSTED_PROXIES` to Cloudflare's published IP ranges
(auto-updated from `https://api.cloudflare.com/client/v4/ips`).
### 3. CP API endpoint: `GET /cp/orgs/:slug/instance`
New public endpoint (no auth — needed by the Worker which has no session):
@ -124,9 +158,15 @@ Worker → EC2 :8080 (platform, direct HTTP)
Worker → EC2 :3000 (canvas, direct HTTP)
```
Caddy can be removed from the EC2 user-data script entirely. The Worker
handles TLS termination + routing. The EC2 security group should allow
inbound HTTP from Cloudflare IPs only (not public).
Caddy can be removed from the EC2 user-data script for HTTP routing. If
WebSocket proxying through Workers proves reliable, Caddy is fully removed.
If not, Caddy stays as a thin WebSocket-only reverse proxy (no TLS, no
HTTP routing — just `/ws``:8080`).
The EC2 security group should allow inbound HTTP from Cloudflare IPs only
(not public). **Automate the IP list** — Cloudflare publishes their ranges
at `https://api.cloudflare.com/client/v4/ips`. Use a Lambda or cron to
update the SG weekly. Do not hardcode the IP ranges.
**Headers injected by Worker** (replaces Caddy's `header_up`):
- `X-Molecule-Org-Id: <org-id>` — for TenantGuard