molecule-core

Author	SHA1	Message	Date
Hongming Wang	c1593dd328	Merge remote-tracking branch 'origin/staging' into feat/bootstrap-failed-and-console-proxy # Conflicts: # workspace-server/internal/handlers/admin_memories_test.go	2026-04-20 17:31:16 -07:00
Hongming Wang	4641151b09	Merge remote-tracking branch 'origin/staging' into feat/bootstrap-failed-and-console-proxy # Conflicts: # workspace-server/internal/router/router.go	2026-04-20 17:25:24 -07:00
Molecule AI Core-DevOps	70d47e2730	fix(security): SSRF URL validation (#1130) + redactSecrets on memory admin endpoints (#1131, #1132) URLs returned from DB and Redis cache (db.GetCachedURL, workspaces.url column) are now validated via validateAgentURL() before any HTTP request is made: - mcpResolveURL (mcp.go): added validateAgentURL() calls on all three return paths (internal cache, Redis cache, DB fallback). - resolveAgentURL (a2a_proxy.go): added validateAgentURL() call before returning agentURL to the A2A dispatcher. validateAgentURL() was extended (registry.go) to resolve DNS hostnames and check each returned IP against the blocklist (private ranges, loopback, cloud-metadata 169.254.0.0/16). "localhost" is allowed by name for local dev. GET /admin/memories/export now applies redactSecrets() to each content field before including it in the JSON response. Pre-SAFE-T1201 memories (stored before redactSecrets was mandatory on writes) no longer leak credentials. POST /admin/memories/import now calls redactSecrets() on content before both the deduplication check and the INSERT. Imported memories with embedded credentials cannot bypass SAFE-T1201 (#838). - admin_memories.go: GET /admin/memories/export + POST /admin/memories/import handler (from PR #1051, with security fixes applied). - admin_memories_test.go: 6 tests covering redactSecrets parity on both endpoints. - registry_test.go: added DNS-lookup test cases for validateAgentURL (F1083). "localhost" allowed by name (preserves existing test); nxdomain blocked. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 00:24:02 +00:00
Hongming Wang	731a9aef6e	feat(platform): bootstrap-failed + console endpoints for CP watcher Workspaces stuck in provisioning used to sit in "starting" for 10min until the sweeper flipped them. The real signal — a runtime crash at EC2 boot — lands on the serial console within seconds but nothing listened. These endpoints close the loop. 1. POST /admin/workspaces/:id/bootstrap-failed The control plane's bootstrap watcher posts here when it spots "RUNTIME CRASHED" in ec2:GetConsoleOutput. Handler: - UPDATEs workspaces SET status='failed' only when status was 'provisioning' (idempotent — a raced online/failed stays put) - Stores the error + log_tail in last_sample_error so the canvas can render the real stack trace, not a generic "timeout" string - Broadcasts WORKSPACE_PROVISION_FAILED with source='bootstrap_watcher' 2. GET /workspaces/:id/console Proxies to CP's new /cp/admin/workspaces/:id/console endpoint so the tenant platform can surface EC2 serial console output without holding AWS credentials. CPProvisioner.GetConsoleOutput is the client; returns 501 in non-CP deployments (docker-compose dev). Both gated by AdminAuth — CP holds the tenant ADMIN_TOKEN that the middleware accepts on its tier 2b branch. Tests cover: happy-path fail, already-transitioned no-op, empty id, log_tail truncation, and the 501 fallback when no CP is wired. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 17:11:34 -07:00
molecule-ai[bot]	517c2f869c	Merge pull request #1053 from Molecule-AI/fix/memory-backup-restore-1051 feat(platform): memory backup/restore for nuke-safe development (#1051)	2026-04-20 23:18:30 +00:00
Hongming Wang	ad28e10bf4	fix(org-tokens): rate-limit mint, bound list, correct audit provenance Addresses the Critical + Important findings from today's code review of the org API keys feature (PRs #1105-1108). ## Critical-1: rate-limit mint endpoint Previously POST /org/tokens had no mint-rate limit. A compromised WorkOS session or leaked bearer could mint thousands of tokens in seconds, forcing a painful manual cleanup of each one. Fix: dedicated per-IP token bucket, 10 mints/hour/IP. Legitimate bursts fit under the ceiling; abuse bounces. List + Delete stay on the global limiter — they can't be used to generate new secret material. ## Important-1: HTTP handler integration tests internal/orgtoken had 9 unit tests; the HTTP layer (org_tokens.go) had none. Adds org_tokens_test.go covering: - List happy path + DB error → 500 - Create actor="admin-token" (bootstrap), actor="org-token:<prefix>" (chained mint), actor="session" (canvas browser path) - Create name>100 chars → 400 - Create with empty body mints with no name - Revoke happy path 200, missing id 404, empty id 400 - Plaintext returned in response body and prefix matches first 8 chars - Warning text present A regression that breaks the tier-ordering, drops the createdBy field, or accepts oversized names now fails at CI not prod. ## Important-2: bound List output List() had no LIMIT — a mint-storm bug or abuse could make the admin UI slow to render and allocate proportionally. Adds LIMIT 500 at the SQL layer. 10x realistic ceiling, guardrail against pathological cases. ## Important-3: audit provenance uses plaintext prefix, not UUID orgTokenActor() was logging "org-token:<first-8-of-uuid>" which couldn't be cross-referenced with the UI (which shows first-8 of the plaintext). Users could not correlate "who minted this" audit entries with the revoke button they're looking at. Fix: Validate() now returns (id, prefix, error). Middleware stashes both on the gin context. Handler reads prefix for the actor string. Audit rows now match UI prefixes exactly. ## Nit: named constants for audit labels actorOrgTokenPrefix / actorSession / actorAdminToken replace the hardcoded strings scattered across the handler. Greppable across log pipelines + audit queries; one place to change if the format evolves. ## Tests - internal/orgtoken: 9 existing + 0 new, all still green (updated signatures for Validate returning prefix). - internal/handlers/org_tokens_test.go: new — 9 HTTP-layer tests above. Full gin.Context + sqlmock harness. - Full `go test ./...` green except one pre-existing TestGitHubToken_NoTokenProvider flake unrelated to this change (expects 404, gets 500 — tracked separately). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:22:38 -07:00
Hongming Wang	91187342b4	feat(auth): organization-scoped API keys for admin access Adds user-facing API keys with full-org admin scope. Replaces the single ADMIN_TOKEN env var with named, revocable, audited tokens that users can mint/rotate from the canvas UI without ops intervention. Designed for the beta growth phase — one token tier (full admin). Future work will split into scoped roles (admin / workspace-write / read-only) and per-workspace bindings. See docs/architecture/org-api-keys.md for the design + follow-up roadmap. ## Surface POST /org/tokens mint (plaintext returned once) GET /org/tokens list live keys (prefix-only) DELETE /org/tokens/:id revoke (idempotent) All AdminAuth-gated. Bootstrap path: mint the first token via ADMIN_TOKEN or canvas session; tokens can mint more tokens after. ## Validation as a new AdminAuth tier (2a) AdminAuth evaluation order: Tier 0 lazy-bootstrap fail-open (only when no live tokens AND no ADMIN_TOKEN env) Tier 1 verified WorkOS session via /cp/auth/tenant-member Tier 2a org_api_tokens SELECT — NEW Tier 2b ADMIN_TOKEN env (bootstrap / CLI break-glass) Tier 3 any live workspace token (deprecated, only when ADMIN_TOKEN unset) Tier 2a runs ONE indexed lookup (partial index on token_hash WHERE revoked_at IS NULL) + an async last_used_at bump. No measurable latency cost on the hot path. ## UI New "Org API Keys" tab in the settings panel. Label field for human-readable naming. Plaintext shown once + clipboard copy. Revoke with confirm dialog. Mirrors the existing workspace-TokensTab flow so users who've used one get the other for free. ## Security properties - Plaintext never stored. sha256 hash + 8-char display prefix. - Revocation is immediate: partial index on revoked_at IS NULL means the next request validates or fails in microseconds. - created_by audit field captures provenance: "org-token:<short>" when a token mints another, "session" for browser-UI mints, "admin-token" for the ADMIN_TOKEN bootstrap path. - Validate() collapses all failure shapes into ErrInvalidToken so response-shape can't distinguish "never existed" from "revoked". ## Tests - internal/orgtoken: 9 unit tests (hash storage, empty field null-ing, validation happy path, empty plaintext, unknown hash, revoked filtering, list ordering, revoke idempotency, has-any-live short-circuit). - AdminAuth tier-2a integration covered by existing middleware tests unchanged (fail-open + bearer paths). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:01:41 -07:00
Hongming Wang	52235aeb27	feat(router): /cp/* reverse-proxy to CP + same-origin canvas fetches Canvas's browser bundle issues fetches to both CP endpoints (/cp/auth/me, /cp/orgs, ...) AND tenant-platform endpoints (/canvas/viewport, /approvals/pending, /org/templates). They share ONE build-time base URL. Baking api.moleculesai.app broke tenant calls with 404; baking the tenant subdomain broke auth. Tried both today and saw exactly one failure mode per attempt. Real fix: same-origin fetches + tenant-side split. Adds: internal/router/cp_proxy.go # /cp/* → CP_UPSTREAM_URL mounted before NoRoute(canvasProxy). Now a tenant serves: /cp/* → reverse-proxy to api.moleculesai.app /canvas/viewport, /approvals/pending, /workspaces/:id/*, /ws, /registry, /metrics → tenant platform (existing handlers) everything else → canvas UI (existing reverse-proxy) Canvas middleware reverts to `connect-src 'self' wss:` for the same-origin path (keeping explicit PLATFORM_URL whitelist as a self-hosted escape hatch when the build-arg is non-empty). CI build-arg flips to NEXT_PUBLIC_PLATFORM_URL="" so the bundle issues relative fetches. Security of cp_proxy: - Cookie + Authorization PRESERVED across the hop (opposite of canvas proxy) — they carry the WorkOS session, which is the whole point. - Host rewritten to upstream so CORS + cookie-domain on the CP side see their own hostname. - Upstream URL validated at construction: must parse, must be http(s), must have a host — misconfig fails closed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:01:40 -07:00
rabbitblood	b1bb5f838a	fix: GitHub token refresh — add WorkspaceAuth path for credential helper (#1068) PR #729 tightened AdminAuth to require ADMIN_TOKEN, breaking the workspace credential helper which called /admin/github-installation-token with a workspace bearer token. Tokens expired after 60 min with no refresh. Fix: Add /workspaces/:id/github-installation-token under WorkspaceAuth so any authenticated workspace can refresh its GitHub token. Keep the admin path as backward-compatible alias. Update molecule-git-token-helper.sh to use the workspace-scoped path when WORKSPACE_ID is set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 08:30:02 -07:00
rabbitblood	c9e4e349b2	Add memory backup/restore endpoints for safe Docker rebuilds (#1051) GET /admin/memories/export returns all agent memories with workspace name mapping. POST /admin/memories/import accepts the same format, resolves workspaces by name, and deduplicates on content+scope. Both endpoints are AdminAuth-gated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 00:29:24 -07:00
Hongming Wang	d8026347e5	chore: open-source restructure — rename dirs, remove internal files, scrub secrets Renames: - platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat — will update after plugin module republish) - workspace-template/ → workspace/ Removed (moved to separate repos or deleted): - PLAN.md — internal roadmap (move to private project board) - HANDOFF.md, AGENTS.md — one-time internal session docs - .claude/ — gitignored entirely (local agent config) - infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy - org-templates/molecule-dev/ → standalone template repo - .mcp-eval/ → molecule-mcp-server repo - test-results/ — ephemeral, gitignored Security scrubbing: - Cloudflare account/zone/KV IDs → placeholders - Real EC2 IPs → <EC2_IP> in all docs - CF token prefix, Neon project ID, Fly app names → redacted - Langfuse dev credentials → parameterized - Personal runner username/machine name → generic Community files: - CONTRIBUTING.md — build, test, branch conventions - CODE_OF_CONDUCT.md — Contributor Covenant 2.1 All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-18 00:24:44 -07:00

11 Commits
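
Commit 91187342b4 states that org API key plaintext is never stored: only a sha256 hash and an 8-character display prefix are kept, and the plaintext is returned once at mint time. The sketch below illustrates that storage shape under stated assumptions. The mintedToken type, the mintToken and hashToken helpers, and the hex token format are all hypothetical and are not taken from internal/orgtoken.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// mintedToken holds the result of a mint. Only Hash and Prefix would be
// persisted; Plaintext is handed to the caller once and then discarded.
type mintedToken struct {
	Plaintext string
	Hash      string // hex-encoded SHA-256 of the plaintext
	Prefix    string // first 8 characters of the plaintext, for display
}

// hashToken returns the hex-encoded SHA-256 digest of a plaintext token.
func hashToken(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// mintToken generates a random token and derives its stored fields.
// Assumption: 32 random bytes, hex-encoded; the real format may differ.
func mintToken() (mintedToken, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return mintedToken{}, err
	}
	plaintext := hex.EncodeToString(raw)
	return mintedToken{
		Plaintext: plaintext,
		Hash:      hashToken(plaintext),
		Prefix:    plaintext[:8],
	}, nil
}

func main() {
	tok, err := mintToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("display prefix:", tok.Prefix)
	fmt.Println("stored hash length:", len(tok.Hash))
}
```

Validating an incoming bearer under this scheme means hashing it and looking up the hash, which fits the single indexed lookup on token_hash that the commit message mentions. Deriving the prefix from the plaintext rather than the row ID is also what lets audit entries line up with what the UI displays, as commit ad28e10bf4 notes.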