forked from molecule-ai/molecule-core
9052d2d2ad
13 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
8059fee128 |
fix(tenant-guard): allowlist /registry/register + /registry/heartbeat (#1236)
* fix(security): call redactSecrets before seeding workspace memories (F1085) seedInitialMemories() in workspace_provision.go was inserting template/config memories directly into agent_memories without scrubbing credential patterns. A workspace provisioned from a template containing API keys, tokens, or other secrets would store them in plain text — the same class of issue as #838. Fix: call redactSecrets(workspaceID, content) on the truncated memory content before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400) is preserved — redaction runs after truncation so the size limit still applies. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(workspace_provision): add seedInitialMemories coverage for #1208 Cover the truncate-at-100k boundary (PR #1167, CWE-400) and the redactSecrets call (F1085 / #1132), both identified as untested in #1208. - TestSeedInitialMemories_TruncatesOversizedContent: boundary at exactly 100k, 1 byte over, far over, and well under. Verifies INSERT receives exactly maxMemoryContentLength bytes. - TestSeedInitialMemories_RedactsSecrets: verifies redactSecrets runs before INSERT, regression test for F1085. - TestSeedInitialMemories_InvalidScopeSkipped: invalid scope is silently skipped, no INSERT called. - TestSeedInitialMemories_EmptyMemoriesNil: nil slice is handled without DB calls. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(marketing): Discord adapter launch visual assets (#1209) Squash-merge: Discord adapter launch visual assets (3 PNGs) + social copy. Acceptance: assets on staging. * fix(ci): golangci-lint errcheck failures on staging Suppress errcheck warnings for calls where the return value is safely ignored: - resp.Body.Close() (artifacts/client.go): deferred cleanup — failure to close a response body is non-critical; the defer itself is what matters for connection reuse. - rows.Close() (bundle/exporter.go): deferred cleanup in a loop where rows.Err() already handles query errors. - filepath.Walk (bundle/exporter.go): top-level walk call; errors in sub-directory traversal are handled by the inner callback (which returns nil for err != nil). - broadcaster.RecordAndBroadcast (bundle/importer.go): fire-and-forget event broadcast; errors are logged internally by the broadcaster. - db.DB.ExecContext (bundle/importer.go): best-effort runtime column update; non-critical auxiliary data that the provisioner re-extracts if needed. Fixes: #1143 * test(artifacts): suppress w.Write return values to satisfy errcheck All httptest.ResponseWriter.Write calls in client_test.go now discard the byte count and error return with _, _ = prefix. The Write method is safe to discard in test handlers — httptest.ResponseWriter.Write never returns an error for in-memory buffers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(CI): move changes job off self-hosted runner + add workflow concurrency Cherry-pick from staging PR #1194 for main. Two changes to relieve macOS arm64 runner saturation: 1. `changes` job: runs on ubuntu-latest instead of [self-hosted, macos, arm64]. This job does a plain `git diff` with zero macOS dependencies — moving it off the runner frees a slot immediately on every workflow trigger. 2. Add workflow-level concurrency: concurrency: group: ci-${{ github.ref }}; cancel-in-progress: true Prevents multiple stale in-flight CI runs from queuing on the same ref when new commits arrive. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(security): call redactSecrets before seeding workspace memories (F1085) (#1203) seedInitialMemories() in workspace_provision.go was inserting template/config memories directly into agent_memories without scrubbing credential patterns. A workspace provisioned from a template containing API keys, tokens, or other secrets would store them in plain text — the same class of issue as #838. Fix: call redactSecrets(workspaceID, content) on the truncated memory content before the INSERT. The truncation (maxMemoryContentLength = 100 KiB, CWE-400) is preserved — redaction runs after truncation so the size limit still applies. Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * tick: 2026-04-21 ~03:40Z — CI stalled 59+ min, GH_TOKEN 4th rotation, PR reviews done * fix(tenant-guard): allowlist /registry/register + /registry/heartbeat Final layer of today's stuck-provisioning saga. With the private-IP platform_url fix and the intra-VPC :8080 SG rule in place, workspace EC2s finally reached the tenant on the right port — only to have every POST bounced with a synthetic 404 by TenantGuard. TenantGuard is the SaaS hook that rejects cross-tenant routing. It demands X-Molecule-Org-Id on every request, but CP's workspace user- data doesn't export MOLECULE_ORG_ID (only WORKSPACE_ID, PLATFORM_URL, RUNTIME, PORT), so the runtime can't attach the header. Net effect: every workspace's first heartbeat to /registry/heartbeat was a silent 404, and the workspace sat in 'provisioning' until the platform sweeper timed it out. Allowlist the two workspace-boot paths: - /registry/register — one-shot at runtime startup - /registry/heartbeat — every 30s Both are still gated by wsauth.HasAnyLiveToken (workspaces with a token on file must present it; legacy tokenless workspaces are grandfathered). And the tenant SG already scopes :8080 to the VPC CIDR, so only intra-VPC callers can reach these paths in the first place. The allowlist bypasses cross-org routing, not auth. Follow-up: passing MOLECULE_ORG_ID into the workspace env would let the runtime attach the header and drop this allowlist entry. Tracked separately; not urgent since the multi-layer auth above is already adequate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app> Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com> Co-authored-by: Molecule AI Core-DevOps <core-devops@agents.moleculesai.app> Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app> Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com> |
||
|
|
2575960805 |
fix(errcheck): suppress unchecked resp.Body.Close() across workspace-server (#1229)
Issue #1196: golangci-lint errcheck flags bare resp.Body.Close() calls because Body.Close() can return a non-nil error (e.g. when the server sent fewer bytes than Content-Length). All occurrences fixed: defer resp.Body.Close() → defer func() { _ = resp.Body.Close() }() resp.Body.Close() → _ = resp.Body.Close() 12 files affected across all Go packages — channels, handlers, middleware, provisioner, artifacts, and cmd. The body is already fully consumed at each call site, so the error is always safe to discard. 🤖 Generated with [Claude Code](https://claude.ai) Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app> |
||
|
|
5b5a634b5b |
fix(middleware): set org_id in context after orgtoken.Validate (F1097) (#1232)
PR #1210 added org_api_tokens.org_id but c.Set("org_id", ...) was never called — so orgCallerID() always returns "" and all token callers are denied org-scoped access even within their own org. Fix: after orgtoken.Validate succeeds in AdminAuth, look up the token's org_id column and set it in the gin context. Pre-fix tokens (org_id=NULL) get no org_id in context, which is correct — requireCallerOwnsOrg already denies access for nil org_id. Test: TestAdminAuth_OrgToken_SetsOrgID covers both post-fix tokens (org_id set) and pre-fix tokens (org_id=NULL, not set). Co-authored-by: Molecule AI Infra-SRE <infra-sre@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
|
|
ad28e10bf4 |
fix(org-tokens): rate-limit mint, bound list, correct audit provenance
Addresses the Critical + Important findings from today's code
review of the org API keys feature (PRs #1105-1108).
## Critical-1: rate-limit mint endpoint
Previously POST /org/tokens had no mint-rate limit. A compromised
WorkOS session or leaked bearer could mint thousands of tokens in
seconds, forcing a painful manual cleanup of each one.
Fix: dedicated per-IP token bucket, 10 mints/hour/IP. Legitimate
bursts fit under the ceiling; abuse bounces. List + Delete stay
on the global limiter — they can't be used to generate new
secret material.
## Important-1: HTTP handler integration tests
internal/orgtoken had 9 unit tests; the HTTP layer (org_tokens.go)
had none. Adds org_tokens_test.go covering:
- List happy path + DB error → 500
- Create actor="admin-token" (bootstrap), actor="org-token:<prefix>"
(chained mint), actor="session" (canvas browser path)
- Create name>100 chars → 400
- Create with empty body mints with no name
- Revoke happy path 200, missing id 404, empty id 400
- Plaintext returned in response body and prefix matches first 8 chars
- Warning text present
A regression that breaks the tier-ordering, drops the createdBy
field, or accepts oversized names now fails at CI not prod.
## Important-2: bound List output
List() had no LIMIT — a mint-storm bug or abuse could make the
admin UI slow to render and allocate proportionally. Adds
LIMIT 500 at the SQL layer. 10x realistic ceiling, guardrail
against pathological cases.
## Important-3: audit provenance uses plaintext prefix, not UUID
orgTokenActor() was logging "org-token:<first-8-of-uuid>" which
couldn't be cross-referenced with the UI (which shows first-8
of the plaintext). Users could not correlate "who minted this"
audit entries with the revoke button they're looking at.
Fix: Validate() now returns (id, prefix, error). Middleware
stashes both on the gin context. Handler reads prefix for the
actor string. Audit rows now match UI prefixes exactly.
## Nit: named constants for audit labels
actorOrgTokenPrefix / actorSession / actorAdminToken replace
the hardcoded strings scattered across the handler. Greppable
across log pipelines + audit queries; one place to change if
the format evolves.
## Tests
- internal/orgtoken: 9 existing + 0 new, all still green (updated
signatures for Validate returning prefix).
- internal/handlers/org_tokens_test.go: new — 9 HTTP-layer tests
above. Full gin.Context + sqlmock harness.
- Full `go test ./...` green except one pre-existing
TestGitHubToken_NoTokenProvider flake unrelated to this change
(expects 404, gets 500 — tracked separately).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
3d7244ab94 |
feat(auth): org tokens reach /workspaces/:id/* subroutes + docs
Extends WorkspaceAuth to accept org API tokens as a valid
credential for any workspace sub-route in the org. Previously a
user minting an org token could hit admin-surface endpoints
(/workspaces, /org/import, etc.) but couldn't reach per-workspace
routes like /workspaces/:id/channels — those were gated by
WorkspaceAuth which only knew about workspace-scoped tokens.
Scope matches the explicit product spec: one org API key can
manipulate every workspace in the org. AI agents given a key can
read/write channels, tokens, schedules, secrets, tasks across all
workspaces.
## WorkspaceAuth tier order
1. ADMIN_TOKEN exact match (break-glass / bootstrap)
2. Org API token (Validate against org_api_tokens) NEW
3. Workspace-scoped token (ValidateToken with :id binding)
4. Same-origin canvas referer
Org token tier sits above the per-workspace check so a presenter
of an org key doesn't hit the narrower ValidateToken failure path
first. Checked with isSameOriginCanvas path unchanged.
## End-to-end verified
Minted test token via ADMIN_TOKEN, then with that org token:
- GET /workspaces → 200 (list all)
- GET /workspaces/<id> → 200 (detail, admin-only route)
- GET /workspaces/<id>/channels → 200 (workspace sub-route)
- GET /workspaces/<id>/tokens → 200 (workspace tokens list)
- GET /workspaces/<bad-uuid> → 404 workspace not found
(routing still scoped correctly)
## Documentation
- docs/architecture/org-api-keys.md — design, data model, threat
model, security properties
- docs/architecture/org-api-keys-followups.md — 10 tracked
follow-ups prioritized (role scoping P1, per-workspace binding
P1, expiry P2, usage metrics P2, WorkOS user_id capture P2,
rotation webhooks P3, mint-rate limit P3, audit log P2, CLI
P3, migrate ADMIN_TOKEN to the same table P4)
- docs/guides/org-api-keys.md — end-user guide (mint via UI,
use in curl/Python/TS/AI agents, session-vs-key comparison)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
91187342b4 |
feat(auth): organization-scoped API keys for admin access
Adds user-facing API keys with full-org admin scope. Replaces the
single ADMIN_TOKEN env var with named, revocable, audited tokens
that users can mint/rotate from the canvas UI without ops
intervention.
Designed for the beta growth phase — one token tier (full admin).
Future work will split into scoped roles (admin / workspace-write
/ read-only) and per-workspace bindings. See docs/architecture/
org-api-keys.md for the design + follow-up roadmap.
## Surface
POST /org/tokens mint (plaintext returned once)
GET /org/tokens list live keys (prefix-only)
DELETE /org/tokens/:id revoke (idempotent)
All AdminAuth-gated. Bootstrap path: mint the first token via
ADMIN_TOKEN or canvas session; tokens can mint more tokens after.
## Validation as a new AdminAuth tier (2a)
AdminAuth evaluation order:
Tier 0 lazy-bootstrap fail-open (only when no live tokens AND
no ADMIN_TOKEN env)
Tier 1 verified WorkOS session via /cp/auth/tenant-member
Tier 2a org_api_tokens SELECT — NEW
Tier 2b ADMIN_TOKEN env (bootstrap / CLI break-glass)
Tier 3 any live workspace token (deprecated, only when ADMIN_TOKEN
unset)
Tier 2a runs ONE indexed lookup (partial index on
token_hash WHERE revoked_at IS NULL) + an async last_used_at
bump. No measurable latency cost on the hot path.
## UI
New "Org API Keys" tab in the settings panel. Label field for
human-readable naming. Plaintext shown once + clipboard copy.
Revoke with confirm dialog. Mirrors the existing workspace-
TokensTab flow so users who've used one get the other for free.
## Security properties
- Plaintext never stored. sha256 hash + 8-char display prefix.
- Revocation is immediate: partial index on revoked_at IS NULL
means the next request validates or fails in microseconds.
- created_by audit field captures provenance: "org-token:<short>"
when a token mints another, "session" for browser-UI mints,
"admin-token" for the ADMIN_TOKEN bootstrap path.
- Validate() collapses all failure shapes into ErrInvalidToken
so response-shape can't distinguish "never existed" from
"revoked".
## Tests
- internal/orgtoken: 9 unit tests (hash storage, empty field
null-ing, validation happy path, empty plaintext, unknown hash,
revoked filtering, list ordering, revoke idempotency, has-any-
live short-circuit).
- AdminAuth tier-2a integration covered by existing middleware
tests unchanged (fail-open + bearer paths).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d03f2d47e0 |
fix: close cross-tenant authz + cp_proxy admin-traversal gaps
Addresses three Critical findings from today's code review of the
SaaS-canvas routing stack.
## Critical-1: session verification scoped to the current tenant
session_auth.go previously verified via GET /cp/auth/me, which
only answers "is someone logged in" — NOT "is this user in the
org they're targeting." Every WorkOS-authed user (including folks
who only signed up via app.moleculesai.app with no tenant
relationship) could call /workspaces, /approvals/pending,
/bundles/import, /org/import etc. on ANY tenant they could reach.
Cross-tenant read: user at acme.moleculesai.app could hit
bob.moleculesai.app/workspaces with their cookie and get Bob's
workspaces.
Fix:
- CP gains GET /cp/auth/tenant-member?slug=<slug> which joins
org_members × organizations and only returns member:true when
the authenticated user is actually in that org.
- Tenant sets MOLECULE_ORG_SLUG at boot via user-data.
- session_auth now calls tenant-member (not /me), passing its
own slug. Cache key includes slug so one tenant's cached
positive never satisfies another's check.
## Critical-2: cp_proxy path allowlist (lateral-movement fix)
cp_proxy.go forwarded any /cp/* path upstream with the cookie
and bearer attached. Since /cp/admin/* accepts sessions as one
of its auth tiers, a tenant-authed user could curl
/cp/admin/tenants/other-slug/diagnostics through their tenant
and the CP would honor it — turning any tenant into a lateral
hop into admin surface.
Fix: explicit allowlist of paths the canvas browser bundle
actually needs (/cp/auth, /cp/orgs, /cp/billing, /cp/templates,
/cp/legal). Everything else 404s at the tenant before cookies
leave. Fail-closed: future UI paths require explicit entries.
## Important-1,2: bounded session cache + split positive/negative TTL
Previous sync.Map cache grew unbounded (one entry per unique
Cookie header for process lifetime) and cached failures for 30s,
meaning a 3s CP blip locked users out for the full window.
Fix:
- Bounded map with batch random eviction at cap (10k entries ×
~100 bytes = 1 MB ceiling). Random eviction is O(1)
expected; we don't need precise LRU.
- Periodic sweeper goroutine (2 min) reclaims expired entries
even when they're not re-hit.
- Positive TTL 30s, negative TTL 5s — short negative so CP
flakes self-heal fast.
- Transport errors NOT cached (would otherwise trap every
user during a multi-second upstream outage).
- Cache key = sha256(slug + cookie) so raw session tokens
don't sit in process memory, and cross-tenant isolation is
structural not policy.
## Important-3: TenantGuard /cp/* bypass documented
Added a security note to the bypass explaining why it's safe
only under the current setup (cp_proxy allowlist + tunnel-only
ingress), and what would require revisiting (SG opens :8080
inbound to the VPC).
## Tests
- session_auth_test.go: 12 new tests — empty cookie, missing
slug, no CP, member:true happy path with cache hit, member:
false, 401 upstream, malformed JSON, transport error not
cached, cross-tenant isolation (same cookie different
tenants hit upstream separately), bounded eviction, expired
entries, cache key collision resistance.
- cp_proxy_test.go: new — isCPProxyAllowedPath covers 17
allow/block cases, forwarding preserves Cookie+Auth, Host
rewritten, blocked paths 404 without calling upstream.
All platform tests pass. CP provisioner tests pass after
threading cfg.OrgSlug into the container env.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
03178b4712 |
feat(middleware): AdminAuth accepts CP-verified WorkOS session
Canvas (SaaS tenant UI) runs in the browser and authenticates the
user via a WorkOS session cookie scoped to .moleculesai.app. It
has no bearer token — the token-based ADMIN_TOKEN scheme is for
CLI + server-to-server callers, not end users.
Adds a session-verification tier to AdminAuth that runs BEFORE the
bearer check:
1. If Cookie header present AND CP_UPSTREAM_URL configured →
GET /cp/auth/me upstream with the same cookie. 200 + valid
user_id → grant admin access. Non-200 → fall through.
2. Else (no cookie, or no CP configured, or CP said no) →
existing bearer-only path unchanged.
Positive verifications are cached 30s keyed by the raw Cookie
header, so a burst of canvas admin-page renders doesn't DDoS
the CP. Revocations propagate within that window.
Self-hosted / dev deploys without CP_UPSTREAM_URL: feature
disabled, behavior unchanged. So this is strictly additive for
the SaaS case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
0b8f3239f6 |
fix(middleware): TenantGuard passes through /cp/* to CP proxy
Today's rollout of cp_proxy (PR #1095/1096) mounted /cp/* as a reverse-proxy to the control plane, but the TenantGuard middleware runs first in the global chain and 404s anything that isn't in its exact-path allowlist (/health + /metrics). Every /cp/auth/me fetch from canvas landed on a 40µs 404 before ever reaching the proxy. /cp/* is handled upstream (WorkOS session + admin bearer), so the tenant doesn't need to attach org identity for those paths. Passing them through is correct — matches the design where the tenant platform is a pure transit layer for /cp/*. Verified: /cp/auth/me via tunnel now returns 401 (correct unauth from CP) instead of 404 from TenantGuard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
992e6d3f38 | fix(auth): accept admin token in CanvasOrBearer for viewport PUT | ||
|
|
1e30386aec |
fix(auth): accept admin token in WorkspaceAuth for canvas dashboard
The canvas sends NEXT_PUBLIC_ADMIN_TOKEN on all API calls but per-workspace routes (/activity, /delegations, /traces) use WorkspaceAuth which only accepts per-workspace bearer tokens. This made the canvas dashboard 401 on every workspace detail view. Fix: WorkspaceAuth now accepts the admin token as a fallback after workspace token validation fails. This lets the canvas read all workspace data with a single admin credential. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
481b5cfb1a |
fix(security): C4 — close AdminAuth fail-open race on hosted-SaaS fresh install
Pre-launch review blocker. AdminAuth's Tier-1 fail-open fired whenever the workspace_auth_tokens table was empty — including the window between a hosted tenant EC2 booting and the first workspace being created. In that window, every admin-gated route (POST /org/import, POST /workspaces, POST /bundles/import, etc.) was reachable without a bearer, letting an attacker pre-empt the first real user by importing a hostile workspace into a freshly provisioned instance. Fix: fail-open is now ONLY applied when ADMIN_TOKEN is unset (self- hosted dev with zero auth configured). Hosted SaaS always sets ADMIN_TOKEN at provision time, so the branch never fires in prod and requests with no bearer get 401 even before the first token is minted. Tier-2 / Tier-3 paths unchanged. The old TestAdminAuth_684_FailOpen_AdminTokenSet_NoGlobalTokens test was codifying exactly this bug (asserting 200 on fresh install with ADMIN_TOKEN set). Renamed and flipped to TestAdminAuth_C4_AdminTokenSet_FreshInstall_FailsClosed asserting 401. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d8026347e5 |
chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames: - platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat — will update after plugin module republish) - workspace-template/ → workspace/ Removed (moved to separate repos or deleted): - PLAN.md — internal roadmap (move to private project board) - HANDOFF.md, AGENTS.md — one-time internal session docs - .claude/ — gitignored entirely (local agent config) - infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy - org-templates/molecule-dev/ → standalone template repo - .mcp-eval/ → molecule-mcp-server repo - test-results/ — ephemeral, gitignored Security scrubbing: - Cloudflare account/zone/KV IDs → placeholders - Real EC2 IPs → <EC2_IP> in all docs - CF token prefix, Neon project ID, Fly app names → redacted - Langfuse dev credentials → parameterized - Personal runner username/machine name → generic Community files: - CONTRIBUTING.md — build, test, branch conventions - CODE_OF_CONDUCT.md — Contributor Covenant 2.1 All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |