molecule-core/workspace-server/internal/wsauth
Hongming Wang b040171fa1 perf(wsauth): in-process cache for platform_inbound_secret reads
Heartbeats fire every 60s per workspace and were the dominant caller
of ReadPlatformInboundSecret — one DB SELECT each, purely to redeliver
the same value. For an N-workspace fleet that's N SELECTs/minute of
pure overhead, growing linearly with the fleet (#189).

This adds a sync.Map cache keyed by workspaceID with a 5-minute TTL:

- **Read-through**: cache miss → DB SELECT → populate → return.
- **Write-through**: every IssuePlatformInboundSecret call refreshes
  the cache with the new value before returning, so the lazy-heal mint
  path (readOrLazyHealInboundSecret) doesn't see a stale read of the
  value it just wrote.
- **TTL eviction**: 5 minutes — generous enough that the heartbeat
  hot path hits cache for ~5 reads in a row before re-validating, short
  enough that an out-of-band rotation (operator running
  `UPDATE workspaces SET platform_inbound_secret=...` directly)
  propagates within minutes without requiring a redeploy.
- **Absence not cached**: ErrNoInboundSecret skips the cache write so
  the lazy-heal recovery contract for the column-NULL case
  (readOrLazyHealInboundSecret in workspace_provision_shared.go) keeps
  working.

Memory footprint scales with the number of workspaces seen since
process start (~200 bytes per entry); deleted workspaces leave dead
entries until process restart, which is acceptable because workspace
deletion is a rare, operator-driven action.

Why in-process instead of Redis: workspace-server runs as a single
Railway service today (per memory project_controlplane_ownership);
adding Redis for this single column read would be over-engineering.
The cache is a self-contained, Redis-free upgrade that keeps the same
semantic surface (read returns the latest secret) while collapsing
the heartbeat read storm. If the deployment ever fans out across
replicas, each replica picks up an operator-side rotation within its
own TTL window, with no shared write log needed.

Tests: 5 new cases covering cache hit within TTL, refresh after TTL
(simulating an operator rotation via SQL), write-through on Issue,
absence-not-cached, and Reset clearing all entries. The setupMock
helper in wsauth and the setupTestDB helper in handlers both call
ResetInboundSecretCacheForTesting() at start and on cleanup so that
write-through state from one test doesn't shadow SELECT expectations
in the next.
SetInboundSecretCacheNowForTesting() exposes a deterministic clock
override so the TTL test doesn't sleep.

Task: #189.
2026-05-03 00:04:38 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| architecture_test.go | test(arch): codify 4 module boundaries as architecture tests (#2344) | 2026-04-29 22:12:58 -07:00 |
| platform_inbound_test.go | perf(wsauth): in-process cache for platform_inbound_secret reads | 2026-05-03 00:04:38 -07:00 |
| platform_inbound.go | perf(wsauth): in-process cache for platform_inbound_secret reads | 2026-05-03 00:04:38 -07:00 |
| tokens_test.go | perf(wsauth): in-process cache for platform_inbound_secret reads | 2026-05-03 00:04:38 -07:00 |
| tokens.go | refactor(wsauth): extract lookupTokenByHash to dedup auth predicate across 3 callers | 2026-04-30 03:11:38 -07:00 |