forked from molecule-ai/molecule-core
Heartbeats fire every 60s per workspace and were the dominant caller of ReadPlatformInboundSecret — one DB SELECT each, purely to redeliver the same value. For an N-workspace fleet that's N SELECTs/minute of pure overhead, growing linearly with the fleet (#189). This adds a sync.Map cache keyed by workspaceID with a 5-minute TTL: - **Read-through**: cache miss → DB SELECT → populate → return. - **Write-through**: every IssuePlatformInboundSecret call refreshes the cache with the new value before returning, so the lazy-heal mint path (readOrLazyHealInboundSecret) doesn't see a stale read of the value it just wrote. - **TTL eviction**: 5 minutes — generous enough that the heartbeat hot path hits cache for ~5 reads in a row before re-validating, short enough that an out-of-band rotation (operator running `UPDATE workspaces SET platform_inbound_secret=...` directly) propagates within minutes without requiring a redeploy. - **Absence not cached**: ErrNoInboundSecret skips the cache write so the lazy-heal recovery contract for the column-NULL case (readOrLazyHealInboundSecret in workspace_provision_shared.go) keeps working. Memory footprint is bounded by the active workspace fleet (~200 bytes per entry); deleted workspaces leave dead entries until process restart, acceptable given workspace-deletion is operator-rare. Why in-process instead of Redis: workspace-server runs as a single Railway service today (per memory project_controlplane_ownership); adding Redis for this single column read would be over-engineering. The cache is a self-contained, Redis-free upgrade that keeps the same semantic surface (read returns the latest secret) while collapsing the heartbeat read storm. If the deployment ever fans out across replicas, an operator-side rotation propagates per-replica TTL-bounded without needing a shared write log. Tests: 5 new cases covering cache hit within TTL, refresh after TTL (simulating an operator rotation via SQL), write-through on Issue, absence-not-cached, and Reset clearing all entries. The setupMock helper in wsauth and setupTestDB helper in handlers both call ResetInboundSecretCacheForTesting() at start + cleanup so write-through state from one test doesn't shadow SELECT expectations in the next. SetInboundSecretCacheNowForTesting() exposes a deterministic clock override so the TTL test doesn't sleep. Task: #189. |
||
|---|---|---|
| .. | ||
| architecture_test.go | ||
| platform_inbound_test.go | ||
| platform_inbound.go | ||
| tokens_test.go | ||
| tokens.go | ||