molecule-core

Author	SHA1	Message	Date
Hongming Wang	b040171fa1	perf(wsauth): in-process cache for platform_inbound_secret reads Heartbeats fire every 60s per workspace and were the dominant caller of ReadPlatformInboundSecret — one DB SELECT each, purely to redeliver the same value. For an N-workspace fleet that's N SELECTs/minute of pure overhead, growing linearly with the fleet (#189). This adds a sync.Map cache keyed by workspaceID with a 5-minute TTL: - Read-through: cache miss → DB SELECT → populate → return. - Write-through: every IssuePlatformInboundSecret call refreshes the cache with the new value before returning, so the lazy-heal mint path (readOrLazyHealInboundSecret) doesn't see a stale read of the value it just wrote. - TTL eviction: 5 minutes — generous enough that the heartbeat hot path hits cache for ~5 reads in a row before re-validating, short enough that an out-of-band rotation (operator running `UPDATE workspaces SET platform_inbound_secret=...` directly) propagates within minutes without requiring a redeploy. - Absence not cached: ErrNoInboundSecret skips the cache write so the lazy-heal recovery contract for the column-NULL case (readOrLazyHealInboundSecret in workspace_provision_shared.go) keeps working. Memory footprint is bounded by the active workspace fleet (~200 bytes per entry); deleted workspaces leave dead entries until process restart, acceptable given workspace-deletion is operator-rare. Why in-process instead of Redis: workspace-server runs as a single Railway service today (per memory project_controlplane_ownership); adding Redis for this single column read would be over-engineering. The cache is a self-contained, Redis-free upgrade that keeps the same semantic surface (read returns the latest secret) while collapsing the heartbeat read storm. If the deployment ever fans out across replicas, an operator-side rotation propagates per-replica TTL-bounded without needing a shared write log. 
Tests: 5 new cases covering cache hit within TTL, refresh after TTL (simulating an operator rotation via SQL), write-through on Issue, absence-not-cached, and Reset clearing all entries. The setupMock helper in wsauth and setupTestDB helper in handlers both call ResetInboundSecretCacheForTesting() at start + cleanup so write-through state from one test doesn't shadow SELECT expectations in the next. SetInboundSecretCacheNowForTesting() exposes a deterministic clock override so the TTL test doesn't sleep. Task: #189.	2026-05-03 00:04:38 -07:00
Hongming Wang	64822dac49	refactor(wsauth): extract lookupTokenByHash to dedup auth predicate across 3 callers ValidateToken, WorkspaceFromToken, and ValidateAnyToken each duplicated the same JOIN+WHERE auth predicate: FROM workspace_auth_tokens t JOIN workspaces w ON w.id = t.workspace_id WHERE t.token_hash = $1 AND t.revoked_at IS NULL AND w.status != 'removed' Same drift class as the SaaS provision-mint bug fixed in #2366. A future safety addition (e.g. exclude paused workspaces from auth) had to be applied to all three queries; a partial application would silently re-open one auth path while closing the others. Fix: hoist the predicate into lookupTokenByHash, which projects (id, workspace_id) — the union of fields any caller needs. Each public function picks what it uses: - ValidateToken — needs both (compares workspaceID, updates last_used_at by id) - WorkspaceFromToken — needs workspace_id - ValidateAnyToken — needs id The trivial perf cost of selecting one extra column per call is worth the single-source-of-truth guarantee for the auth predicate. Test mock updates: two upstream test files (a2a_proxy_test, middleware wsauth_middleware_test{,_canvasorbearer_test}) had hand-typed regex matchers and row shapes pinned to the per-function SELECT projection. Updated to the unified shape; behavior is unchanged. All wsauth + middleware + handlers + full-module tests green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 03:11:38 -07:00
Hongming Wang	ca6fc55c8b	fix(a2a_proxy): derive callerID from bearer when X-Workspace-ID absent (#2306) External callers (third-party SDKs, the channel plugin) authenticate purely via bearer and frequently don't set the X-Workspace-ID header. Without this, activity_logs.source_id ends up NULL — breaking the peer_id signal on notifications, the "Agent Comms by peer" canvas tab, and any analytics that breaks down inbound A2A by sender. The bearer is the authoritative caller identity per the wsauth contract (it's what proves who you are); the header is a display/routing hint that must agree with it. So we derive callerID from the bearer's owning workspace whenever the header is absent. The existing validateCallerToken guard fires after this and enforces token-to-callerID binding the same way it always has. Org-token requests are skipped — those grant org-wide access and don't bind to a single workspace, so the canvas-class semantics (callerID="") are preserved. Bearer-resolution failures (revoked, removed workspace) fall through to canvas-class as well, never 401. New wsauth.WorkspaceFromToken exposes the bearer→workspace lookup as a modular interface; mirrors ValidateAnyToken's defense-in-depth JOIN on workspaces.status != 'removed'. Tests: 4 unit tests on WorkspaceFromToken + 3 integration tests on ProxyA2A covering the three observable paths (bearer-derived, org-token skipped, derive-failure fallthrough). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 16:05:56 -07:00
Hongming Wang	479a027e4b	chore: open-source restructure — rename dirs, remove internal files, scrub secrets Renames: - platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat — will update after plugin module republish) - workspace-template/ → workspace/ Removed (moved to separate repos or deleted): - PLAN.md — internal roadmap (move to private project board) - HANDOFF.md, AGENTS.md — one-time internal session docs - .claude/ — gitignored entirely (local agent config) - infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy - org-templates/molecule-dev/ → standalone template repo - .mcp-eval/ → molecule-mcp-server repo - test-results/ — ephemeral, gitignored Security scrubbing: - Cloudflare account/zone/KV IDs → placeholders - Real EC2 IPs → <EC2_IP> in all docs - CF token prefix, Neon project ID, Fly app names → redacted - Langfuse dev credentials → parameterized - Personal runner username/machine name → generic Community files: - CONTRIBUTING.md — build, test, branch conventions - CODE_OF_CONDUCT.md — Contributor Covenant 2.1 All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 00:24:44 -07:00

4 Commits