molecule-core

Author	SHA1	Message	Date
Molecule AI Backend Engineer	13b8965c99	fix(platform): cap token counts before upsert to prevent NUMERIC overflow (#615) Adversarial or buggy agents can report INT64_MAX token counts via A2A responses. Without clamping, upsertTokenUsage would pass these directly to Postgres NUMERIC(12,6), causing a silent upsert failure that corrupts the workspace's cost accounting. Fix: clamp input_tokens/output_tokens to [0, 10_000_000] before any arithmetic or DB write. 10M tokens/call is well above any real LLM API response; clamped values still produce valid cost rows. Adds 4 regression tests: - TestUpsertTokenUsage_615_CapsInt64Max — INT64_MAX → maxTokensPerCall - TestUpsertTokenUsage_615_CapsNegative — negative → 0 (no DB call) - TestUpsertTokenUsage_615_NormalValuesUnchanged — passthrough for normal counts - TestUpsertTokenUsage_615_ExactlyAtCap — at-cap value accepted unchanged Closes #615 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 06:03:40 +00:00
Molecule AI Backend Engineer	67a9ec8fcb	fix(platform): pin X-Content-Type-Options nosniff + add /orgs API prefix (#614) SecurityHeaders() middleware already sets X-Content-Type-Options: nosniff and X-Frame-Options: DENY globally on every response (issue #151 / PR ~securityheaders). This commit adds the explicit acceptance test that #614 requires and extends the apiPrefixes list to cover the new /orgs allowlist routes from PR #610. Changes: - securityheaders.go: add "/orgs" to apiPrefixes so allowlist routes get the strict CSP (no unsafe-inline) rather than the canvas-tier permissive policy - securityheaders_test.go: TestSecurityHeaders_614_NosniffOnSSEAndAPIEndpoints verifies the header is present on SSE endpoint, /settings/secrets, /events, and /orgs paths; TestIsAPIPath gains /orgs cases Closes #614 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 06:02:18 +00:00
Molecule AI Backend Engineer	cc45f0c0f6	fix(security): remove canvasOriginAllowed from AdminAuth middleware (#623) The Origin header is trivially forgeable by any container on the Docker network. Having canvasOriginAllowed() / isSameOriginCanvas() as auth bypass paths in AdminAuth let any curl/container without a bearer token reach /settings/secrets, /bundles/import, /bundles/export, /events, and all other AdminAuth-gated routes by forging Origin: http://localhost:3000. Fix: remove both Origin bypass branches from AdminAuth. Bearer token is now the only accepted credential. Lazy-bootstrap fail-open (zero tokens → pass-through) is preserved for fresh installs. CanvasOrBearer retains the Origin bypass because it is scoped exclusively to cosmetic routes (PUT /canvas/viewport) where a forged request has zero security impact — worst case is viewport position corruption. Added 3 regression tests: - TestAdminAuth_623_ForgedOrigin_Returns401 - TestAdminAuth_623_ForgedCORSOrigin_Returns401 - TestAdminAuth_623_ValidBearer_WithOrigin_Passes Closes #623, Closes #626 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 06:00:45 +00:00
molecule-ai[bot]	b948f0b140	Merge pull request #610 from Molecule-AI/feat/issue-591-org-plugin-allowlist feat(platform): per-org plugin governance registry (allowlist)	2026-04-17 05:55:27 +00:00
molecule-ai[bot]	9f815e27a1	Merge pull request #602 from Molecule-AI/feat/issue-593-workspace-token-tracking feat(platform): per-workspace token tracking + GET /workspaces/:id/metrics	2026-04-17 05:54:27 +00:00
molecule-ai[bot]	588190a92f	Merge pull request #612 from Molecule-AI/fix/test-token-adminauth fix(security): gate test-token endpoint behind AdminAuth	2026-04-17 05:53:49 +00:00
molecule-ai[bot]	3ecdcf8c6b	Merge pull request #601 from Molecule-AI/feat/issue-590-agui-sse-endpoint feat(platform): AG-UI compatible SSE endpoint for streaming agent events	2026-04-17 05:45:29 +00:00
Molecule AI Backend Engineer	53284c4626	feat(platform): per-org plugin governance registry (#591) Add an org-scoped allowlist table so org admins can restrict which plugins workspace agents are allowed to install. An empty allowlist means allow-all (backward-compatible with existing deployments). • migrations/027_org_plugin_allowlist.{up,down}.sql — new table + unique index on (org_id, plugin_name) • handlers/org_plugin_allowlist.go — resolveOrgID, checkOrgPluginAllowlist (fail-open on DB errors), GetAllowlist, PutAllowlist (atomic tx replace) • handlers/org_plugin_allowlist_test.go — 23 unit tests covering all handler paths, resolveOrgID, and all checkOrgPluginAllowlist branches • handlers/plugins_install.go — allowlist gate between resolveAndStage and deliverToContainer; returns 403 if plugin is blocked • router/router.go — GET/PUT /orgs/:id/plugins/allowlist under AdminAuth All tests pass; go build ./... clean; gosec Issues: 0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 05:40:23 +00:00
Molecule AI Backend Engineer	f60c9df26f	feat(platform): per-workspace token tracking + GET /workspaces/:id/metrics (#593) Migration 026 adds workspace_token_usage table (uuid pk, workspace_id FK with CASCADE, period_start TIMESTAMPTZ, input_tokens, output_tokens, call_count, estimated_cost_usd NUMERIC(12,6), updated_at) with a UNIQUE index on (workspace_id, period_start) for day-granularity upserts. A2A proxy (proxyA2ARequest) now spawns a detached goroutine after each successful call to extractAndUpsertTokenUsage, which: 1. Parses usage.input_tokens / usage.output_tokens from result.usage (JSON-RPC wrapper) with fallback to top-level usage (direct Anthropic). 2. Calls upsertTokenUsage — INSERT ... ON CONFLICT DO UPDATE so multi-call days accumulate correctly. Estimated cost = input×$0.000003 + output×$0.000015 (Claude Sonnet default; adjustable in a later phase). Token tracking never blocks the critical A2A path. New endpoint: GET /workspaces/:id/metrics (wsAuth — WorkspaceAuth bearer bound to :id). Returns: {"input_tokens":N,"output_tokens":N,"total_calls":N, "estimated_cost_usd":"0.000000","period_start":"...","period_end":"..."} 404 if workspace missing. Period is current UTC day. 11 new tests (4 handler + 7 parse-unit); 19/19 packages pass. Closes #593 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 05:29:10 +00:00
Molecule AI Backend Engineer	c2891b5aba	feat(platform): AG-UI compatible SSE endpoint for streaming agent events (#590) - Add in-process SSE subscription mechanism to Broadcaster (SubscribeSSE, deliverToSSE) so both RecordAndBroadcast and BroadcastOnly fan out to SSE subscribers — critical because BroadcastOnly skips Redis pub/sub and would be invisible to a Redis-only subscriber (AGENT_MESSAGE, A2A_RESPONSE, TASK_UPDATED are all BroadcastOnly events). - Add handlers/sse.go: SSEHandler.StreamEvents sets text/event-stream headers, checks workspace existence (404 if missing), subscribes via broadcaster, and wraps each WSMessage in an AG-UI envelope: data: {"type":"<event>","timestamp":<unix_ms>,"data":{...}}\n\n - Register wsAuth.GET("/workspaces/:id/events/stream") behind existing WorkspaceAuth middleware — bearer token bound to :id. - Add 6 tests: Content-Type, initial ping, AG-UI format, workspace filter (cross-workspace events not leaked), 404 on missing workspace, multiple sequential events. All 19 packages pass. Build clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 05:16:51 +00:00
Molecule AI Backend Engineer	3e1e68004d	fix(security): add AdminAuth to /admin/workspaces/:id/test-token route Without middleware, any caller on a non-production instance could mint a bearer token for any workspace UUID with no authentication. AdminAuth is defence-in-depth: on a fresh install (no tokens yet) it is fail-open so the bootstrap path still works; once the first workspace enrolls a token all callers must present a valid bearer. Adds two router-level tests confirming the gate: - TestTestTokenRoute_RequiresAdminAuth_WhenTokensExist → 401 with no header - TestTestTokenRoute_FailOpenOnFreshInstall → 200 (bootstrap path intact) Env-var gating inside GetTestToken is retained as a second layer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 02:48:00 +00:00
Hongming Wang	b0ec35e644	fix(auth): TenantGuard same-origin bypass for EC2 tenant Canvas On EC2 tenant instances, Caddy serves Canvas (:3000) and API (:8080) under the same domain. Canvas makes same-origin requests without X-Molecule-Org-Id or Fly-Replay-Src headers, causing TenantGuard to 404 every API route. - Add isSameOriginCanvas() as tertiary check in TenantGuard — when CANVAS_PROXY_URL is set and Referer/Origin matches Host, pass through. - Enhance isSameOriginCanvas() to also check Origin header (WebSocket upgrade requests send Origin but may not send Referer). - Add 3 new tests: Referer bypass, Origin bypass (WS), inactive without env. Fixes all 404s on /workspaces, /templates, /org/templates, /approvals/pending, /canvas/viewport, and /ws WebSocket on tenant EC2 instances. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 18:22:23 -07:00
molecule-ai[bot]	b1c976a54d	fix(github): refresh installation token when TTL < 10 min (#547) (#567) Root cause: the github-app-auth plugin injects GH_TOKEN + GITHUB_TOKEN into each workspace container's env at provision time (EnvMutator). Those are GitHub App installation tokens with a fixed ~60 min TTL. The plugin has an in-process cache that proactively refreshes 5 min before expiry — but the workspace env is set once at container start and never updated. Any workspace alive >60 min ends up with an expired token. Fix (Option B — on-demand endpoint): pkg/provisionhook: - Add TokenProvider interface: Token(ctx) (token, expiresAt, error) Lives in pkg/ (public) so the github-app-auth plugin can implement it. - Add Registry.FirstTokenProvider() — discovers the first mutator that also satisfies TokenProvider via interface assertion. Safe under concurrent reads (existing RWMutex). platform/internal/handlers/github_token.go: - New GitHubTokenHandler serving GET /admin/github-installation-token - Delegates to the registered TokenProvider (plugin cache — always fresh) - 404 if no GitHub App configured, 500 + [github] prefix log on error - Never logs the token itself platform/internal/handlers/workspace.go: - Add TokenRegistry() getter so the router can wire the handler without coupling to WorkspaceHandler internals platform/internal/router/router.go: - Register GET /admin/github-installation-token under AdminAuth workspace-template/: - scripts/molecule-git-token-helper.sh — git credential helper; calls the platform endpoint on every push/fetch; falls through to next helper (operator PAT) if platform unreachable - entrypoint.sh — configure the credential helper at startup Why Option B over Option A (background goroutine): - The plugin already has its own cache refresh; nothing to refresh here. - Pushing env updates into running containers requires docker exec, which the architecture explicitly rejects (issue #547 "Alternatives"). - Pull-based is stateless, trivially testable, zero extra goroutines. Closes #547 Co-authored-by: Molecule AI DevOps Engineer <devops-engineer@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 00:47:03 +00:00
molecule-ai[bot]	d08f237de9	fix(platform): reject self-delegation to prevent _run_lock deadlock (#570) When a workspace delegated a task to itself, it would acquire _run_lock twice on the same goroutine mutex, blocking permanently. Add an early-return guard in `DelegationHandler.Delegate` that returns HTTP 400 {"error": "self-delegation not permitted"} as soon as sourceID == body.TargetID, before any DB or A2A work is done. Adds TestDelegate_SelfDelegation_Rejected to delegation_test.go. Closes #548 Co-authored-by: Molecule AI Backend Engineer <backend-engineer@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 00:46:20 +00:00
molecule-ai[bot]	a360b64157	fix(platform): persist secrets envelope from POST /workspaces payload (#568) `CreateWorkspacePayload` was missing a `Secrets` field, so any `secrets: { KEY: value }` included in a POST /workspaces body was silently dropped by ShouldBindJSON. Changes: - Add `Secrets map[string]string` field to `CreateWorkspacePayload` - Wrap workspace INSERT in a DB transaction; iterate over secrets, encrypt each value via `crypto.Encrypt`, and upsert into `workspace_secrets` within the same tx — rollback both on any failure - Add `mock.ExpectBegin()`/`mock.ExpectCommit()`/`mock.ExpectRollback()` to all existing Create tests that were missing transaction expectations - Add 3 new tests: WithSecrets_Persists, SecretPersistFails_RollsBack, EmptySecrets_OK Closes #545 Co-authored-by: Molecule AI Backend Engineer <backend-engineer@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 00:46:17 +00:00
Hongming Wang	737dd1999b	fix: restore cp_provisioner.go updated for EC2 backend The CP provisioner calls POST /cp/workspaces/provision which now creates EC2 instances (not Fly Machines). The tenant platform auto-activates this when MOLECULE_ORG_ID is set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:25:43 -07:00
Molecule AI Backend Engineer	a84a33523c	fix(middleware): split CSP by route type — strict for API, permissive for canvas (#450) API routes return JSON and never need 'unsafe-inline' or 'unsafe-eval'. Serving those directives globally defeated the purpose of CSP and gave false security assurance. Canvas-proxied routes (NoRoute → Next.js) keep 'unsafe-inline' because React hydration requires it; 'unsafe-eval' was already absent and is confirmed unnecessary in production builds. Implementation: - Add isAPIPath() helper with an explicit prefix allowlist that mirrors the routes registered in router/router.go - Strict "default-src 'self'" on all /workspaces, /registry, /health, /admin, /metrics, /settings, /bundles, /org, /templates, /plugins, /webhooks, /channels, /ws, /events, /approvals paths - Permissive CSP (unsafe-inline, no unsafe-eval) on canvas/NoRoute paths - 4 new test functions: TestCSPAPIRoutesGetStrictPolicy (covers every prefix + sub-path), TestCSPCanvasRoutesGetPermissivePolicy, and TestIsAPIPath unit test including substring-non-match guard Resolves #450 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 20:26:17 +00:00
rabbitblood	3609b7ab8c	feat(platform): wire github-app-auth plugin for per-installation tokens Integrates github.com/Molecule-AI/molecule-ai-plugin-github-app-auth. When GITHUB_APP_ID is set, the platform constructs a plugin Authenticator at boot and registers it as an EnvMutator on the WorkspaceHandler. Every workspace provision then gets a fresh GITHUB_TOKEN / GH_TOKEN injected from the App's installation token (rotates ~hourly, refresh 5 min before expiry). Verified live this turn: - Platform boot log: `github-app-auth: registered, 1 mutator(s) in chain` - `docker exec ws-<id> gh auth status` → `Logged in as molecule-ai[bot] (GH_TOKEN)` - `gh issue list --repo Molecule-AI/molecule-core` returns real data (Hermes #498/#499/#500 visible from inside a workspace container) ## Changes - platform/go.mod + go.sum: new dep on the plugin - platform/cmd/server/main.go: import + conditional registration (soft-skip when GITHUB_APP_ID is unset for self-hosted/dev) - docker-compose.yml: pass GITHUB_APP_* env + bind-mount private key ## Drive-by .gitignore: exclude /org-templates /plugins /workspace-configs-templates — these dirs are populated locally by clone-manifest.sh from the standalone repos, should never be committed to core. Without this rule my previous git add -A staged 33 embedded git dirs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:52:20 -07:00
Hongming Wang	b6e039cb49	fix: code review findings — dead code, DRY, rate limit, docs 1. Delete fly_provisioner.go — superseded by control plane architecture. Direct Fly provisioning from tenant was intentionally removed. 2. Extract loadWorkspaceSecrets() — shared by Docker + CP provisioner paths. Eliminates 30-line secret-loading duplication. 3. Token rate limit — max 50 active tokens per workspace. Returns 429 if exceeded. Prevents unbounded token creation by compromised client. 4. CLAUDE.md — add GET/POST/DELETE /workspaces/:id/tokens to route table. 5. .env.example — document MOLECULE_ORG_ID and CP_PROVISION_URL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:04:37 -07:00
Hongming Wang	1ea615df4c	feat(platform): auto-detect SaaS tenant → control plane provisioner No env vars to configure. The platform auto-detects the backend: MOLECULE_ORG_ID set → SaaS tenant → control plane provisioner MOLECULE_ORG_ID empty → self-hosted → Docker provisioner The control plane URL defaults to https://api.moleculesai.app (override with CP_PROVISION_URL for testing). No FLY_API_TOKEN on the tenant. Removed: direct Fly provisioner (FlyProvisioner) — all SaaS workspace provisioning goes through the control plane which holds the Fly token and manages billing, quotas, and cleanup. Two backends: CPProvisioner (SaaS) and Docker Provisioner (self-hosted). Closes #494 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 11:50:52 -07:00
Hongming Wang	1949846001	fix(auth): allow nesting + delete from tenant canvas (same-origin) PATCH /workspaces/:id field-level auth for parent_id/tier/runtime required a bearer token, blocking canvas nesting (drag-to-nest). Added IsSameOriginCanvas check so the tenant canvas can update sensitive fields without a bearer. Exported IsSameOriginCanvas from middleware package so workspace.go can call it for the field-level auth path. DELETE /workspaces/:id is behind AdminAuth which already has the same-origin check — if delete still fails, it's a different issue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 11:22:45 -07:00
Hongming Wang	7160d1a1a8	feat(platform): Fly Machines provisioner for SaaS workspace deployment When CONTAINER_BACKEND=flyio, workspaces are provisioned as Fly Machines instead of local Docker containers. This enables workspace deployment on SaaS tenants where no Docker daemon is available. New files: - provisioner/fly_provisioner.go: FlyProvisioner with Start/Stop/IsRunning/Restart/Close via Fly Machines API (api.machines.dev/v1) - FlyRuntimeImages maps runtimes to GHCR image tags Changes: - main.go: select Docker vs Fly based on CONTAINER_BACKEND env var - workspace.go: SetFlyProvisioner() setter, Create checks flyProv first - workspace_provision.go: provisionWorkspaceFly() loads secrets, calls FlyProvisioner.Start, issues auth token for the new machine Env vars for Fly backend: - CONTAINER_BACKEND=flyio (activates Fly provisioner) - FLY_API_TOKEN (Fly deploy token) - FLY_WORKSPACE_APP (Fly app name for workspace machines) - FLY_REGION (default: ord) Closes #494 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 10:51:15 -07:00
Hongming Wang	96b909b8f3	fix: code review findings — token UI, auth hardening, WS dedup 1. Settings panel: wire TokensTab into "API Tokens" tab (was imported but not rendered). Rename "API Keys" → "Secrets", add "API Tokens" tab. Fix docs link → doc.moleculesai.app/docs/tokens. 2. Referer match hardening: require exact host match or trailing slash to prevent evil.com subdomain bypass. Cache CANVAS_PROXY_URL at init time instead of per-request os.Getenv. 3. Extract shared deriveWsBaseUrl() to lib/ws-url.ts — eliminates duplicate 12-line derivation in socket.ts and TerminalTab.tsx. 4. Token list pagination: add ?limit= and ?offset= params (default 50, max 200) to GET /workspaces/:id/tokens. 507/507 canvas tests pass, Go build + vet clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 10:42:26 -07:00
Hongming Wang	c4b56c6c84	fix(auth): allow same-origin canvas requests through WorkspaceAuth on tenant WorkspaceAuth only accepted bearer tokens, blocking the canvas from calling per-workspace routes (restart, config, secrets, chat) on the tenant image where canvas + API share the same origin. Added isSameOriginCanvas() fallback (same check used by AdminAuth): checks Referer matches request Host, gated behind CANVAS_PROXY_URL so only tenant deployments are affected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 10:06:33 -07:00
Hongming Wang	25bd9241d1	fix(tenant): WebSocket URL derivation + AdminAuth same-origin for tenant image Two bugs on the combined tenant image (canvas + API same-origin): 1. WebSocket URL: NEXT_PUBLIC_WS_URL="" (empty string for same-origin) was preserved by ?? operator, producing an invalid WS URL. Now derives from window.location when both env vars are empty. Same fix applied to TerminalTab. 2. AdminAuth blocking canvas: same-origin requests have no Origin header, so neither AdminAuth nor CanvasOrBearer could authenticate the canvas. Added isSameOriginCanvas() that checks Referer against request Host, gated behind CANVAS_PROXY_URL (only active on tenant image). This lets the canvas create/list workspaces, view events, etc. without a bearer token when served from the same Go process. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 08:43:01 -07:00
Hongming Wang	de9f3d179c	feat(platform): token management API + MCP setup + external agent guide 1. Token Management API (closes production gap): - GET /workspaces/:id/tokens — list tokens (prefix + metadata, never plaintext) - POST /workspaces/:id/tokens — create new token (plaintext returned once) - DELETE /workspaces/:id/tokens/:tokenId — revoke specific token - Behind WorkspaceAuth middleware (need existing token to manage tokens) - Tests skip gracefully when no DB available 2. MCP Server Setup: - Fix .mcp.json to use npx @molecule-ai/mcp-server (was referencing non-existent local ./mcp-server/dist/index.js) - Add comprehensive tool→API mapping doc (87 tools across 15 categories) 3. External Agent Registration Guide: - Step-by-step: create workspace, register, heartbeat, A2A messaging - Python (Flask) and Node.js (Express) complete working examples - Communication rules, lifecycle, security, troubleshooting 4. Token Management Guide: - Bootstrap flow, rotation procedure, security properties Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 08:37:42 -07:00
Hongming Wang	206564c90b	Merge pull request #483 from Molecule-AI/fix/platform-modular-template-support fix(platform): unblock org-template imports against modular workspace templates	2026-04-16 07:55:26 -07:00
rabbitblood	ff2394c085	fix(platform): unblock org-template imports against modular workspace templates Two adjacent fixes that surfaced trying to bring the molecule-dev org template back up against the new standalone workspace-template-* repos. 1) handlers/org.go — expand ${VAR} in workspace_dir before validation. The molecule-dev pm/workspace.yaml (and any operator's per-host binding) ships `workspace_dir: ${WORKSPACE_DIR}` so each operator can pick the host path PM bind-mounts. Without expansion the literal "${WORKSPACE_DIR}" string reaches validateWorkspaceDir and fails with "must be an absolute path", aborting the whole org import. Other fields (channel config, prompts) already go through expandWithEnv; workspace_dir was the last hold-out. 2) provisioner/provisioner.go — inject PYTHONPATH=/app for every workspace container. Standalone template Dockerfiles COPY adapter.py to /app and set ENV ADAPTER_MODULE=adapter, but molecule-runtime is a pip console_script entry point so cwd isn't on sys.path automatically. Setting PYTHONPATH here fixes every adapter image at once instead of needing 8 PRs against template repos. Operator override still wins (workspace EnvVars are appended after, so Docker takes the later duplicate). Note: this unblocks the import path but does NOT make claude-code / hermes / etc. boot. The runtime itself has a separate top-level `from adapters import` that breaks against modular templates — tracked at workspace-runtime#1. Tests: TestBuildContainerEnv_InjectsPYTHONPATH + TestBuildContainerEnv_WorkspaceEnvVarsCanOverridePYTHONPATH lock the default + operator-override invariants. expandWithEnv is already covered by TestExpandWithEnv_* — the workspace_dir use site is a one-line call to that primitive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:49:45 -07:00
rabbitblood	e7710d2e6f	feat(channels): Lark / Feishu adapter (outbound webhook + Events API inbound) New ChannelAdapter implementation for Lark (international, open.larksuite.com) and Feishu (China, open.feishu.cn). Both speak the same payload format — only the host differs — so a single adapter covers both. Outbound: POST text to a Custom Bot webhook URL with msg_type:"text". Lark returns 200 OK even when delivery fails — the body's `code` field is the truth. Adapter parses the response and returns a Go error when code != 0 so callers don't think a revoked-webhook send succeeded. Inbound: handles both v1 url_verification (handshake) and v2 event_callback (im.message.receive_v1) shapes. Optional verify_token field — when set, inbound payloads with mismatching tokens are rejected via constant-time compare (#337 class — never raw == against a stored secret). Sender ID resolution prefers user_id → falls back to open_id (open_id is always present; user_id only when the bot has the contacts permission). Non-text message types and non-message events return nil, nil so the receiver responds 200 OK without dispatching. Tests: 23 cases — identity, ValidateConfig (6 sub-cases incl. URL prefix matrix), SendMessage (no URL / invalid prefix / happy-path body shape / api-error-code surfacing), ParseWebhook (handshake + token mismatch + text message + open_id fallback + non-message + non-text + token mismatch + malformed JSON + malformed content + empty text), StartPolling no-op, registry presence. Also: make migration 023 idempotent (ADD COLUMN IF NOT EXISTS) — the platform's migration runner has no schema_migrations tracking table, so every .up.sql replays on every boot. Without IF NOT EXISTS the second boot against an existing volume crashes with "column already exists". Followup issue to be filed for proper migration tracking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:10:58 -07:00
rabbitblood	e08f28c962	feat(platform): provision-time env mutator hook for plugins Add `provisionhook.EnvMutator` extension point so out-of-tree plugins (e.g. github-app-auth, vault-secrets) can inject or override env vars right before container Start, without forking core or piling more provider-specific code into the handlers package. WorkspaceHandler gains an optional `envMutators provisionhook.Registry` wired in via SetEnvMutators during boot. The hook fires after built-in secret loads + per-agent git identity, so plugins can both read what's already there and override anything they own (GIT_AUTHOR_, GITHUB_TOKEN). A nil registry is a no-op via Registry.Run's nil-receiver branch — keeps the hot path a single nil compare and means existing flows stay green even with zero plugins registered. Mutator failure aborts provisioning and marks the workspace failed with the wrapped error in last_sample_error. Failing fast surfaces the cause to the operator instead of letting an agent boot into opaque "git push 401" loops it can never recover from on its own. Tests cover ordered execution, chained env visibility, first-error abort, nil-receiver no-op, nil-mutator drop, registration order, and concurrent register-vs-run safety (-race clean). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 06:47:09 -07:00
Security Auditor	284fb26558	fix(security): YAML-quote skill/prompt names in generateDefaultConfig + opaque file-write errors Closes #460, #461. #460 — YAML injection via unquoted skill/prompt filenames `generateDefaultConfig` extracted skill directory names and prompt file names from user-supplied `body.Files` keys and wrote them directly into YAML list items without quoting: cfg.WriteString(" - " + s + "\n") `validateRelPath` only blocks path traversal (`../`); it does NOT block YAML control characters including newlines. On Linux, filenames can contain newlines, so an attacker with any live workspace bearer token could submit: {"files": {"skills/legit\nruntime: malicious/SKILL.md": "# skill"}} The generated config.yaml would then contain `runtime: malicious` as a top-level YAML key, overriding the runtime for workspaces provisioned from the template. Fix: extract `yamlEscape` as a reusable local from the same `strings.NewReplacer` already used for the `name` field (#221) and apply it to both the `skills:` and `prompt_files:` list items, wrapping each in double-quotes. #461 — Docker error details in ReplaceFiles 500 responses `ReplaceFiles` returned `fmt.Sprintf("failed to write files: %v", err)` in two 500 paths, where `err` comes from Docker API calls and may include internal container names, volume names, and daemon error messages. Fix: log the full error server-side and return a static opaque string to the caller. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 05:40:45 -07:00
Hongming Wang	8b13fff355	fix(test): wrap httptest.ResponseRecorder with CloseNotify for canvas proxy tests httputil.ReverseProxy calls CloseNotify() which httptest.ResponseRecorder doesn't implement. Gin casts the writer, causing a panic. Added a closeNotifyRecorder wrapper with a no-op channel.	2026-04-16 05:40:17 -07:00
Molecule AI Backend Engineer	eec59fe63b	fix(auth): inject fresh bearer token into config volume on every provision (closes #418) Container rebuild or volume wipe caused workspaces to lose /configs/.auth_token. On re-registration the platform returned no auth_token (HasAnyLiveToken==true → no re-issue), leaving the workspace unable to authenticate any subsequent API call. Fix: provisionWorkspaceOpts now calls issueAndInjectToken before Start(). This revokes any existing live tokens (plaintext is irrecoverable from the stored hash, so rotation is the only safe path) and issues a fresh token that is written into cfg.ConfigFiles[".auth_token"]. WriteFilesToContainer delivers it to /configs immediately after ContainerStart, racing safely ahead of the Python adapter's 1-2s startup time. Failure modes are soft: revoke or issue errors skip injection with a warning; provisioning continues and the workspace recovers on the next restart. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 05:26:10 -07:00
Hongming Wang	7fca9723a0	Merge pull request #467 from Molecule-AI/feat/slack-webhook-validation [Backend Engineer] feat(channels): Slack adapter with webhook URL validation (#384)	2026-04-16 05:22:47 -07:00
Hongming Wang	d6e7784f11	Merge pull request #469 from Molecule-AI/feat/per-channel-budget [Backend Engineer] feat(channels): per-channel message budget with 429 enforcement (#368)	2026-04-16 05:22:39 -07:00
Hongming Wang	6c374833b0	Merge pull request #457 from Molecule-AI/fix/issue-451-strip-auth-header-canvas-proxy [Backend Engineer] fix(security): strip Authorization + Cookie in canvas reverse proxy	2026-04-16 05:17:01 -07:00
Hongming Wang	1184232d86	Merge pull request #446 from Molecule-AI/fix/issue-435-registry-error-leak fix(security): suppress raw DB error from /registry/register response	2026-04-16 05:16:57 -07:00
Hongming Wang	370fb151b2	Merge pull request #465 from Molecule-AI/fix/memory-recall-flood-limit [Backend Engineer] fix(memories): hard cap of 50 on recall results (#377)	2026-04-16 05:16:49 -07:00
Hongming Wang	74e4f30216	fix: address all code review findings + remove exposed secrets Code review fixes: - 🟡 #1: Replace python3 with jq in Dockerfile template stages (~50MB → ~2MB) - 🟡 #2: Add clone count verification to scripts/clone-manifest.sh (set -e + expected vs actual count check — fails build if any clone fails) - 🟡 #3: Drop 'unsafe-eval' from CSP (not needed for Next.js production standalone builds, only dev mode). Updated test assertion. - 🟡 #4: Remove broken pyproject.toml from workspace-template/ (it claimed to package as molecule-ai-workspace-runtime but the directory structure didn't match — the real package ships from the standalone repo) - 🔵 #1: Add version-pinning TODO comment to manifest.json - 🔵 #3: Add full repo URLs + test counts for SDK/MCP/CLI/runtime in CLAUDE.md Security (GitGuardian alert): - Removed Telegram bot token (8633739353:AA...) from template-molecule-dev pm/.env — replaced with ${TELEGRAM_BOT_TOKEN} placeholder - Removed Claude OAuth token (sk-ant-oat01-...) from template-molecule-dev root .env — replaced with ${CLAUDE_CODE_OAUTH_TOKEN} placeholder - Both tokens need immediate rotation by the operator Tests: Platform middleware tests updated + all pass.	2026-04-16 05:05:49 -07:00
Molecule AI Backend Engineer	b021f85af9	feat(channels): per-channel message budget with 429 enforcement (#368) Add an optional channel_budget (INTEGER, nullable) to workspace_channels via migration 024. When channel_budget IS NOT NULL and message_count has reached the budget, the Send handler returns 429 {"error":"channel budget exceeded"} and aborts before calling SendOutbound. Implementation details: - Single SELECT query reads both message_count and channel_budget in one round-trip (avoids TOCTOU window between read and write) - Fail-open on DB error: transient failures are logged but don't block sends - Early-return on budget hit is before SendOutbound so message_count cannot be incremented past the limit by a concurrent send that slips through the window (best-effort; atomic enforcement requires DB-level CAS) - NULL channel_budget = unlimited (default, backward-compatible) Migration is idempotent (ADD COLUMN IF NOT EXISTS). Down migration drops the column cleanly. Four sqlmock tests cover: at-limit → 429, above-limit → 429, NULL budget passes through, under-limit passes through. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 11:17:14 +00:00
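The budget gate described in b021f85af9 can be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: `budgetExceeded` and `sendStatus` are hypothetical names, and the single SELECT that reads `message_count` and `channel_budget` is replaced by plain arguments.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// budgetExceeded reports whether a send must be rejected.
// A nil budget models a NULL channel_budget, which means unlimited.
func budgetExceeded(messageCount int, channelBudget *int) bool {
	if channelBudget == nil {
		return false
	}
	return messageCount >= *channelBudget
}

// sendStatus returns the status the Send handler would produce before
// calling SendOutbound. A lookup error fails open: the send proceeds.
func sendStatus(messageCount int, channelBudget *int, lookupErr error) int {
	if lookupErr != nil {
		return http.StatusOK // the real handler would log the error here
	}
	if budgetExceeded(messageCount, channelBudget) {
		return http.StatusTooManyRequests
	}
	return http.StatusOK
}

func main() {
	budget := 10
	fmt.Println(sendStatus(10, &budget, nil))                      // at the limit
	fmt.Println(sendStatus(3, nil, nil))                           // NULL budget
	fmt.Println(sendStatus(99, &budget, errors.New("db timeout"))) // fail-open
}
```

As the commit notes, a check of this shape is best-effort under concurrency; strict enforcement would need a database-level compare-and-set.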
Molecule AI Backend Engineer	68c9b37048	feat(channels): add Slack adapter with webhook URL validation (#384 ) Implement SlackAdapter satisfying the ChannelAdapter interface: - ValidateConfig: rejects any webhook_url that doesn't start with https://hooks.slack.com/ — returns "invalid Slack webhook URL" so the handler surfaces 400 {"error":"invalid config: invalid Slack webhook URL"} - SendMessage: HTTP POST JSON {"text":"..."} to the webhook URL with a 10s timeout; rejects invalid-prefix URLs at send time too (defence in depth) - ParseWebhook: handles both slash-command (form-encoded) and Events API (JSON) payloads; no-ops on url_verification and non-message events - StartPolling: returns nil immediately (Slack doesn't support polling via Incoming Webhooks) Register "slack" in the adapter registry. Twelve unit tests cover Type/DisplayName, happy-path validation, every bad-URL variant (wrong scheme, wrong host, SSRF lookalike, empty string), empty webhook in SendMessage, StartPolling nil return, and registry lookup/listing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 11:14:31 +00:00
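The prefix rule that 68c9b37048 describes for `ValidateConfig` might look something like the sketch below. The function and variable names are assumptions for illustration; only the required prefix and the error text come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// slackWebhookPrefix is the only accepted scheme + host, including the
// trailing slash so lookalike hosts such as hooks.slack.com.evil.example fail.
const slackWebhookPrefix = "https://hooks.slack.com/"

var errInvalidSlackWebhook = errors.New("invalid Slack webhook URL")

// validateSlackWebhookURL (hypothetical name) rejects any URL that does not
// start with the Slack Incoming Webhooks prefix.
func validateSlackWebhookURL(raw string) error {
	if !strings.HasPrefix(raw, slackWebhookPrefix) {
		return errInvalidSlackWebhook
	}
	return nil
}

func main() {
	for _, u := range []string{
		"https://hooks.slack.com/services/T000/B000/XXXX",
		"http://hooks.slack.com/services/T000/B000/XXXX",
		"https://hooks.slack.com.evil.example/services/x",
		"",
	} {
		fmt.Printf("%q -> %v\n", u, validateSlackWebhookURL(u))
	}
}
```

Running the same check again at send time, as the commit does, keeps a config row that was written before validation existed from being used.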
Hongming Wang	8e304e69e8	chore: remove extracted directories, add manifest-driven Docker builds Remove plugins/, workspace-configs-templates/, org-templates/ dirs (now in standalone repos). Add manifest.json listing all 33 repos and scripts/clone-manifest.sh to clone them. Both Dockerfiles now use the manifest script instead of 33 hardcoded git-clone lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 04:13:29 -07:00
Molecule AI Backend Engineer	6fb4b7b282	fix(memories): add hard cap of 50 on recall results (#377) Introduce `memoryRecallMaxLimit = 50` constant and honour the `?limit=N` query parameter in Search. Values above 50 are silently clamped to 50; absent or invalid values default to 50. The LIMIT clause is now a parameterised argument (nextArg pattern) instead of a hardcoded literal. Three sqlmock tests verify the cap, the explicit limit, and the default. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 11:12:35 +00:00
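A small sketch of the clamping rule from 6fb4b7b282, assuming a hypothetical helper `clampRecallLimit` that receives the raw `?limit=N` string. Treating zero and negative values as invalid is an assumption; the commit only says that absent or invalid values default to 50.

```go
package main

import (
	"fmt"
	"strconv"
)

const memoryRecallMaxLimit = 50

// clampRecallLimit parses the raw query value and returns a LIMIT in
// [1, memoryRecallMaxLimit]. Absent, non-numeric, or non-positive input
// (the last is an assumption) falls back to the maximum.
func clampRecallLimit(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return memoryRecallMaxLimit
	}
	if n > memoryRecallMaxLimit {
		return memoryRecallMaxLimit
	}
	return n
}

func main() {
	for _, raw := range []string{"", "abc", "10", "500"} {
		fmt.Printf("limit=%q -> %d\n", raw, clampRecallLimit(raw))
	}
}
```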
Molecule AI Backend Engineer	479b172b25	fix(security): strip Authorization + Cookie headers in canvas reverse proxy (closes #451 ) The canvas proxy was forwarding all headers verbatim to the Next.js process. Workspace bearer tokens sent by agents (e.g. during an A2A call that hit a canvas-side route) could reach unvalidated Next.js handlers and be echoed back to an attacker via an error page or a debug endpoint. Fix: Director now calls Header.Del("Authorization") + Header.Del("Cookie") before forwarding. Non-credential headers (Accept, X-Request-Id, etc.) are unaffected — the strip is surgical. Four unit tests added (strips Authorization, strips Cookie, forwards other headers, strips both simultaneously). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 11:00:43 +00:00
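The header strip in 479b172b25 can be illustrated with the standard library's `httputil.ReverseProxy`. This is a sketch under assumptions: `newCanvasProxy` and `directedHeaders` are hypothetical names, and the target URL is taken from the `CANVAS_PROXY_URL` default mentioned elsewhere in this log.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newCanvasProxy builds a reverse proxy to target that drops credential
// headers before forwarding. The function name is hypothetical.
func newCanvasProxy(target *url.URL) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(target)
	baseDirector := proxy.Director
	proxy.Director = func(req *http.Request) {
		baseDirector(req) // rewrite scheme, host, and path toward the target
		req.Header.Del("Authorization")
		req.Header.Del("Cookie")
	}
	return proxy
}

// directedHeaders runs the Director on a synthetic request and returns the
// headers that would be forwarded upstream.
func directedHeaders() http.Header {
	target, err := url.Parse("http://localhost:3000")
	if err != nil {
		panic(err)
	}
	req := httptest.NewRequest(http.MethodGet, "http://platform:8080/some/page", nil)
	req.Header.Set("Authorization", "Bearer example-token")
	req.Header.Set("Cookie", "session=example")
	req.Header.Set("Accept", "text/html")
	newCanvasProxy(target).Director(req)
	return req.Header
}

func main() {
	h := directedHeaders()
	fmt.Printf("Authorization=%q Cookie=%q Accept=%q\n",
		h.Get("Authorization"), h.Get("Cookie"), h.Get("Accept"))
}
```

Wrapping the default Director keeps the URL rewriting intact and only adds the two deletions, so non-credential headers pass through unchanged.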
Backend Engineer	b0381d656c	fix(security): registry DB errors must not leak raw driver messages (closes #435 ) The Register handler was serialising the raw Go error into the HTTP response: c.JSON(500, gin.H{"error": fmt.Sprintf("failed to register: %v", err)}) PostgreSQL errors wrapped by lib/pq contain table names, constraint names, and driver-version strings — enough for a caller to fingerprint the schema and craft targeted attacks. The error is already logged at full detail with Printf before this line, so callers only need the generic message. Fix: replace the Sprintf with a static "registration failed" string (same pattern the heartbeat and update-card handlers already used). New test: TestRegister_DBErrorResponseIsOpaque verifies the response body is the opaque string and that "sql:", "pq:", and "connection" substrings are absent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 10:34:35 +00:00
Backend Engineer	2451b1acc0	fix(provisioner): rebuild_config flag on restart recovers from destroyed config volume (closes #239 ) When a workspace container AND its /configs Docker volume are both destroyed, the restart handler previously had no recovery path — findTemplateByName searched only the top-level configsDir, which holds workspace-instance dirs (ws-{id[:12]}/), not the role-named org-template source directories. Fix: add `rebuild_config: true` to the POST /workspaces/:id/restart body struct. When set, the handler falls back to searching configsDir/org-templates/ via the existing findTemplateByName logic (which already handles name normalisation and config.yaml name-field matching). The workspace can then self-recover with its own bearer token — no admin intervention required. New helper: resolveOrgTemplate(configsDir, wsName) — pure function, independently tested (4 cases: hit-by-dir, hit-by-config-yaml, no org-templates dir, no match). Usage: curl -X POST -H "Authorization: Bearer $(cat /configs/.auth_token)" \ -d '{"rebuild_config": true}' \ http://platform:8080/workspaces/$WORKSPACE_ID/restart Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 10:34:25 +00:00
Hongming Wang	3e50b95800	Merge pull request #433 from Molecule-AI/feat/externalize-prompts-phase4 feat(org-templates): Phase 4 — atomize each role to <role>/workspace.yaml	2026-04-16 03:19:43 -07:00
Hongming Wang	c545e3a276	Merge pull request #417 from Molecule-AI/feat/memory-checkpoint-reconciliation feat(memory): optimistic-locking via if_match_version on workspace_memory writes	2026-04-16 03:18:09 -07:00
rabbitblood	40a69d6f87	feat(org-templates): Phase 4 — atomize each role to <role>/workspace.yaml Part 4 of 4 — terminal step of the org.yaml scalability refactor. Each role in the molecule-dev template now owns its own workspace.yaml file, colocated with the existing system-prompt.md / initial-prompt.md / idle-prompt.md / schedules/.md. Team files shrink to a leader's own definition plus a list of !include refs. ## Platform change `resolveYAMLIncludes` now uses a TWO-ROOT model: - Path resolution is relative to the INCLUDING file's directory (natural sibling + cousin refs, C-include / Sass @import convention). - Security bound is the ORIGINAL org root (`rootDir`), preserved across all recursion depths. Sibling-dir refs like `../my-role/workspace.yaml` from a team file are now allowed (they stay inside the org template); refs that escape the root still error. Regression coverage: new `TestResolveYAMLIncludes_SiblingDirAccess` reproduces the Phase 4 pattern (team file at `teams/x.yaml` referencing `../<role>/workspace.yaml`) — fails without the fix, passes with. ## Template change Atomized 15 child workspaces across 3 team files: - `teams/research.yaml`: 58 → 30 lines; 3 children now !include refs - `teams/dev.yaml`: 222 → 38 lines; 6 children now !include refs - `teams/marketing.yaml`: 143 → 28 lines; 6 children now !include refs Each role now has `<role>/workspace.yaml` colocated with its prompts. Example `frontend-engineer/` directory: frontend-engineer/ ├── workspace.yaml (24 lines — name/role/tier/canvas/plugins/...) 
├── system-prompt.md (from earlier phases)
├── initial-prompt.md
├── idle-prompt.md
└── (no schedules for this role; if added, schedules/<slug>.md)

## File-size progression across all 4 phases

| State | org.yaml | total `.yaml` in tree |
|---|---:|---:|
| Before (main) | 1801 lines / 108 KB | 1801 / 108 KB (one file) |
| After Phase 1 (#389) | 1687 | 1687 / 101 KB |
| After Phase 2 (#390) | 676 | 676 / 35 KB |
| After Phase 3 (#393) | 114 | 683 (1 + 6 teams) / 33 KB |
| After this PR | 114 | ~698 (1 + 6 + 15 workspace) / 35 KB |

Aggregate size is flat: the decrease came from prompt externalization in Phases 1/2; Phases 3/4 reorganize structure without adding content. The win is readability and ownership:
- Every individual file fits on 1-2 screens.
- Adding a new role is now: create `<role>/` dir, add `workspace.yaml` + `system-prompt.md` + prompts, add ONE `!include` line to the team file. No touching of aggregated mega-YAML.
- Team files can be reviewed + merged independently.

## Tests

All 10 `TestResolveYAMLIncludes_*` tests pass, including the real-template integration test (`TestResolveYAMLIncludes_RealMoleculeDev`) which now walks org.yaml → teams/pm.yaml → teams/research.yaml → ../market-analyst/workspace.yaml and validates the full 21-role tree unmarshals cleanly. Plus all existing `TestResolvePromptRef` + `TestOrgYAML` + `TestInitialPrompt` suites stay green.

## Ops followup

After merging all 4 phases and deploying, the `POST /org/import` endpoint should produce a workspace tree byte-identical to the pre-refactor state. Verify with:

diff <(curl POST /org/import before) <(curl POST /org/import after)

or by spot-checking:
- `/configs/config.yaml` bodies across all 21 workspaces
- `workspace_schedules.prompt` row values

The externalization is lossless: YAML literal to file and back recovers the same string modulo trailing-whitespace normalization.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 03:09:56 -07:00
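The two-root model from 40a69d6f87 (resolve relative to the including file's directory, bound by the original org root) could be sketched as below. `resolveWithinRoot` is a hypothetical helper, not the repository's `resolveInsideRoot`, whose signature is not shown in the log; the sketch assumes Unix-style paths and does not resolve symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveWithinRoot joins ref onto includerDir (the directory of the file
// that contains the !include) and rejects any result outside rootDir.
func resolveWithinRoot(rootDir, includerDir, ref string) (string, error) {
	if ref == "" {
		return "", fmt.Errorf("empty include path")
	}
	absRoot, err := filepath.Abs(rootDir)
	if err != nil {
		return "", err
	}
	candidate, err := filepath.Abs(filepath.Join(includerDir, ref))
	if err != nil {
		return "", err
	}
	rel, err := filepath.Rel(absRoot, candidate)
	if err != nil {
		return "", err
	}
	// A relative path that starts with ".." points above the root.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("include %q escapes the org template root", ref)
	}
	return candidate, nil
}

func main() {
	// Sibling-directory reference from a team file: allowed.
	fmt.Println(resolveWithinRoot("/org", "/org/teams", "../frontend-engineer/workspace.yaml"))
	// Reference that climbs out of the root: rejected.
	fmt.Println(resolveWithinRoot("/org", "/org/teams", "../../etc/passwd"))
}
```

Keeping `rootDir` fixed while `includerDir` changes at each recursion level is what lets `../<role>/workspace.yaml` work from `teams/` without loosening the traversal check.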
Hongming Wang	0c73810121	Merge pull request #404 from Molecule-AI/feat/externalize-prompts-phase3 feat(org-templates): Phase 3 — !include directive + split org.yaml into team files	2026-04-16 03:08:01 -07:00
Hongming Wang	db22b5d853	Merge pull request #413 from Molecule-AI/fix/isrunning-distinguish-notfound fix(provisioner): IsRunning conservative on daemon errors to stop restart cascade	2026-04-16 03:07:54 -07:00
Hongming Wang	1e43e45de7	Merge pull request #402 from Molecule-AI/feat/per-agent-git-identity feat(provisioner): per-agent git identity via GIT_AUTHOR_* env vars	2026-04-16 03:07:50 -07:00
rabbitblood	7debdb1676	fix(tests): CSP test now fragment-matches instead of exact-matches SecurityHeaders middleware widened its CSP to allow Next.js inline scripts + data:/blob: images (platform/internal/middleware/securityheaders.go:44, canvas is reverse-proxied through the gin stack so it needs the permissive policy). The two CSP asserts in securityheaders_test.go still hard-compared against the old tight `default-src 'self'`, so they fail on main as of this afternoon. Fix: assert each expected CSP fragment is PRESENT in the header (substring match) instead of byte-for-byte equality. Test intent is "CSP is set, starts with tight default-src, contains the expected directives" — not "CSP matches this exact string". Future subsource tuning (add a new CDN, bump blob:/data: scope) won't re-break this test. Caught because every PR touching anything in the monorepo currently fails the Platform (Go) CI job on these two asserts. Fixing on a dedicated branch so it can land ahead of every blocked PR in the queue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 02:59:06 -07:00
Hongming Wang	b8cb14f46e	feat(tenant): combined platform + canvas Docker image with reverse proxy Single-container tenant architecture: Go platform (:8080) + Canvas Node.js (:3000) in one Fly machine, with Go's NoRoute handler reverse- proxying non-API routes to the canvas. Browser only talks to :8080. Changes: platform/Dockerfile.tenant — multi-stage build (Go + Node + runtime). Bakes workspace-configs-templates/ + org-templates/ into the image. Build context: repo root. platform/entrypoint-tenant.sh — starts both processes, kills both if either exits. Fly health check on :8080 covers the Go binary; canvas health is implicit (proxy returns 502 if canvas is down). platform/internal/router/canvas_proxy.go — httputil.ReverseProxy that forwards unmatched routes to CANVAS_PROXY_URL (http://localhost:3000). Activated by NoRoute when CANVAS_PROXY_URL env is set. platform/internal/router/router.go — wire NoRoute → canvasProxy when CANVAS_PROXY_URL is present; no-op otherwise (local dev unchanged). platform/internal/middleware/securityheaders.go — relaxed CSP to allow Next.js inline scripts/styles/eval + WebSocket + data: URIs. The strict `default-src 'self'` was blocking all canvas rendering. canvas/src/lib/api.ts — changed `\|\|` to `??` for NEXT_PUBLIC_PLATFORM_URL so empty string means "same-origin" (combined image) instead of falling back to localhost:8080. canvas/src/components/tabs/TerminalTab.tsx — same `??` fix for WS URL. Verified: tenant machine boots, canvas renders, 8 runtime templates + 4 org templates visible, API routes work through the same port. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 02:46:47 -07:00
rabbitblood	73171532a1	feat(memory): optimistic-locking via if_match_version on workspace_memory writes Closes the silent-overwrite hole where two agents racing a read-modify- write on the same memory key left only one agent's update. Relevant for orchestrators (PM, Dev Lead, Marketing Lead) keeping structured running state (delegation-result ledgers, task queues) in memory, and for the ``research-backlog:`` keys that multiple idle loops write in parallel. ## Semantics ### Back-compat path (no if_match_version) Unchanged: ``INSERT ... ON CONFLICT UPDATE`` last-write-wins. Every existing agent tool, every existing ``commit_memory`` call, every existing cron that writes memory — all continue to work with no edit. ### Optimistic-lock path (if_match_version set) 1. Client calls ``GET /memory/:key`` → ``{value, version: V}`` 2. Client modifies value locally 3. Client ``POST /memory {key, value, if_match_version: V}`` 4. Server: ``UPDATE ... WHERE version = V`` + RETURNING new version 5. On match → 200 + ``{version: V+1}`` 6. On mismatch → 409 + ``{expected_version: V, current_version: <actual>}`` 7. Client reads the actual version and retries. ### Create-only marker ``if_match_version: 0`` means "create iff the key doesn't exist yet". Two agents simultaneously seeding a shared key will see exactly one success + one 409 — no silent collision, no duplicate-init work. ### Schema Migration 023 adds ``version BIGINT NOT NULL DEFAULT 1``. Existing rows baseline at 1. New rows start at 1. Every successful write (both paths) increments: ``version = version + 1`` on update, ``1`` on insert. ## Why version, not updated_at ``updated_at`` has second-granularity and can collide between concurrent writers on a fast clock. A monotonic counter is collision-free and more readable in the 409 response body ("expected 5, current is 7 — you missed 2 writes" tells an agent exactly what to re-read). 
## Why ``if_match_version`` and not an ETag header JSON field keeps it in the request body, visible alongside the value payload. Agents assembling requests programmatically don't have to remember to thread a header through their HTTP client wrapper; the existing ``commit_memory`` tool can grow one optional kwarg and match the existing signature shape. ## Tests 11 memory-handler cases covering every path: - GET list / get (with version in response shape) - Set with no version (back-compat upsert, returns new version) - Set with if_match_version match (happy path, increment) - Set with if_match_version mismatch (409 + expected/current fields) - Set with if_match_version=0 on absent key (create-only success) - Set with if_match_version=N on absent key (409 — caller's mental model is wrong) - Bad inputs (missing key, malformed JSON) - Delete happy + error path Full ``go test ./internal/handlers/`` green. ## Follow-up (not in this PR) - Workspace-template tool update: ``commit_memory(content, , if_match_version=None)`` surfaces the new option + on 409 surfaces the current_version so agents can retry without manual re-read. - Named checkpoints table (``workspace_checkpoints``) for durable orchestrator state snapshots. Different concern than per-key locking; separate PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 02:32:46 -07:00
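The write semantics from 73171532a1 can be modelled in memory as follows. This is only a sketch of the version rules: the real implementation uses SQL (`UPDATE ... WHERE version = V`), and `memoryStore`, `set`, and `errVersionConflict` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errVersionConflict = errors.New("version conflict") // maps to HTTP 409

type entry struct {
	value   string
	version int64
}

type memoryStore struct {
	mu   sync.Mutex
	data map[string]entry
}

func newMemoryStore() *memoryStore {
	return &memoryStore{data: map[string]entry{}}
}

// set writes value under key and returns the resulting version.
// ifMatch == nil is the back-compat last-write-wins path; *ifMatch == 0
// means create-only; any other value must equal the stored version.
// On conflict it returns the current version (0 if the key is absent).
func (s *memoryStore) set(key, value string, ifMatch *int64) (int64, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, exists := s.data[key]
	if ifMatch != nil {
		switch {
		case *ifMatch == 0 && exists:
			return cur.version, errVersionConflict
		case *ifMatch != 0 && (!exists || cur.version != *ifMatch):
			return cur.version, errVersionConflict
		}
	}
	next := int64(1) // new rows start at 1
	if exists {
		next = cur.version + 1
	}
	s.data[key] = entry{value: value, version: next}
	return next, nil
}

func main() {
	s := newMemoryStore()
	v1, _ := s.set("ledger", "a", nil)
	stale := v1
	v2, _ := s.set("ledger", "b", &stale)    // matches, increments
	current, err := s.set("ledger", "c", &stale) // stale version
	fmt.Println(v1, v2, current, err)
}
```

A caller that receives the conflict re-reads the value at the reported current version, reapplies its change, and retries.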
rabbitblood	8bf27ae1d0	fix(provisioner): IsRunning conservative on daemon errors to stop restart cascade Root cause of the 2026-04-16 09:10 UTC six-container restart cascade. ## Timeline 09:10:26 — PM sent a batch delegation to 15+ agents (Dev Lead coordinating). 09:10:26-27 — 4 leaders/auditors (Security, RL, BE, DevOps) simultaneously hit "workspace agent unreachable — container restart triggered" even though their containers were running fine. Another 2 (DL, UIUX) tripped in the next few seconds. 09:10:27 — Provisioner stopped + recreated 6 containers in parallel. A2A callers got EOFs, PM's batch coordination stalled. ## Root cause `provisioner.IsRunning` collapsed every ContainerInspect error into `(false, nil)`, including transient Docker daemon hiccups: func IsRunning(...) (bool, error) { info, err := p.cli.ContainerInspect(ctx, name) if err != nil { return false, nil // Container doesn't exist ← MISREAD } return info.State.Running, nil } The comment said "Container doesn't exist" but the error was actually any of: daemon timeout, socket EOF, context deadline, connection refused. Under load (batch delegation fan-out → 15 concurrent HTTP inbound → 15 concurrent Claude Code subprocesses → Docker daemon CPU pressure), ContainerInspect calls started failing transiently. All 6 calls returned `(false, nil)`. Caller `maybeMarkContainerDead` treated `running=false` as "container is dead, restart it" → six parallel restarts. This was exactly the destructive-on-error pattern we keep trying to kill (see #160 SDK-stderr-probe, #318 fail-open classes). ## Fix `IsRunning` now distinguishes NotFound from transient errors: - Legitimately missing container (caller deleted, Docker pruned) → `(false, nil)` — safe to act on; caller marks dead + restarts. - Any other error (daemon timeout, socket issue, context deadline) → `(true, err)` — caller stays on the alive path. 
The transient error is preserved so metrics + logging still see it, but it does NOT trigger the destructive restart branch. `isContainerNotFound` matches on error-message substring — same approach docker/cli uses internally — to avoid pulling in errdefs as a direct dep. Truth table tests in `isrunning_test.go` cover 8 cases: NotFound variants (real + generic), nil, empty, and the 4 transient- error shapes we've actually observed (deadline, EOF, connection-refused, i/o timeout). ## Caller update `maybeMarkContainerDead` in a2a_proxy.go now logs the transient inspect error (was silently discarded via `_`). Visibility without destructiveness. If this error becomes persistent, we'll see it in platform logs rather than diagnosing after another restart cascade. ## Expected impact - Zero restart cascades from the current class of transient inspect errors (EOF, timeout, connection refused). - Dead containers still detected within the A2A layer because an actual stopped container returns NotFound on inspect, and the TTL monitor (180s post #386) catches anything that slips through. - New visibility in platform logs when inspect has trouble — previously silent. Combined with the TTL fix in #386, the defense-in-depth on spurious restart is now: 1. IsRunning only returns false for real NotFound 2. Liveness TTL is 180s, surviving 5+ missed heartbeats 3. A2A proxy 503-Busy path retries with backoff before touching restart logic at all Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 02:21:25 -07:00
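The error classification described in 8bf27ae1d0 might be sketched like this. The matched substrings and the names `inspectError` and `classifyInspect` are assumptions for illustration; the log only says that `isContainerNotFound` matches on an error-message substring.

```go
package main

import (
	"fmt"
	"strings"
)

// inspectError is a stand-in for errors returned by the Docker client.
type inspectError string

func (e inspectError) Error() string { return string(e) }

// isContainerNotFound reports whether err looks like a missing-container
// error. The substrings are assumed, not taken from the repository.
func isContainerNotFound(err error) bool {
	if err == nil {
		return false
	}
	msg := strings.ToLower(err.Error())
	return strings.Contains(msg, "no such container") ||
		strings.Contains(msg, "not found")
}

// classifyInspect maps an inspect result to the (running, err) pair:
// NotFound gives (false, nil); any other error gives (true, err) so the
// caller stays on the alive path while the error remains visible.
func classifyInspect(running bool, inspectErr error) (bool, error) {
	if inspectErr == nil {
		return running, nil
	}
	if isContainerNotFound(inspectErr) {
		return false, nil
	}
	return true, inspectErr
}

func main() {
	fmt.Println(classifyInspect(false, inspectError("Error: No such container: ws-abc")))
	fmt.Println(classifyInspect(false, inspectError("context deadline exceeded")))
	fmt.Println(classifyInspect(true, nil))
}
```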
Hongming Wang	51e3393ec0	fix(ops): bake workspace-configs-templates into platform Docker image Tenant machines were booting with no templates because the Dockerfile only shipped the Go binary + migrations. The canvas showed "0 templates" with an empty picker. Changes: - platform/Dockerfile: build context changed from ./platform to repo root so COPY can reach workspace-configs-templates/ alongside the Go source. COPY paths updated for platform/{go.mod,go.sum,*.go} and platform/migrations/. - .github/workflows/publish-platform-image.yml: context: . (was ./platform), paths trigger now includes workspace-configs-templates/ so template changes rebuild the image. Phase A of the template-registry plan. Phase B adds a DB registry + on-demand fetch for community templates (user pastes GitHub URL at workspace creation time). The baked defaults always ship in the image for zero-config tenant boot. Verified: `docker build -f platform/Dockerfile -t test .` succeeds, `docker run --rm test ls /workspace-configs-templates/` shows all 8 templates (autogen, claude-code-default, crewai, deepagents, gemini-cli, hermes, langgraph, openclaw). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 01:54:47 -07:00
rabbitblood	112c28d885	feat(org-templates): Phase 3 — !include directive + split org.yaml into team files Part 3 of 4 in the scalability refactor. Adds YAML `!include` support to the org importer and splits molecule-dev/org.yaml (676 lines post- Phase 2) into 6 team / role files; top-level org.yaml drops to 114 lines of pure scaffolding. ## Platform changes New `platform/internal/handlers/org_include.go`: - `resolveYAMLIncludes(data, baseDir)` — pre-processes a YAML document, expanding any scalar tagged `!include <path>` with the parsed content of the referenced file. - Path resolution via `resolveInsideRoot` so a crafted `!include ../../etc/passwd` can't escape the org template directory (same defense the existing `files_dir` copy uses). - Nested includes supported: each included file carries its own search root (its directory), so `teams/pm.yaml` with `!include research.yaml` resolves to `teams/research.yaml` — matching the convention of C-include / Sass @import / most package systems. - Cycle detection via visited-set keyed on absolute path; belt-and- braces `maxIncludeDepth = 16` cap in case symlinks or path normalization defeats the set. - Inline-template mode (POST /org/import with raw JSON body, no `dir`) errors cleanly when a file ref is used — can't resolve without a base. Wired into both `ListTemplates` (so /org/templates shows an accurate workspace count after the split) and `Import` (expansion happens before unmarshal into OrgTemplate). 
## Template changes molecule-dev/org.yaml now contains only: - name + description - defaults (runtime, plugins, category_routing, initial_prompt text) - `workspaces: [!include teams/pm.yaml, !include teams/marketing.yaml]` New files: - `teams/pm.yaml` — PM top-level, children are !include refs - `teams/research.yaml` — Research Lead + Market Analyst + Technical Researcher + Competitive Intelligence (inline children) - `teams/dev.yaml` — Dev Lead + FE/BE/DevOps/Security/QA/UIUX (inline) - `teams/marketing.yaml` — Marketing Lead + DevRel/PMM/Content/ Community/SEO/Social (inline) - `teams/documentation-specialist.yaml` — leaf - `teams/triage-operator.yaml` — leaf ## File-size impact \| State \| org.yaml lines \| total config size \| \|---\|---:\|---:\| \| Before (main) \| 1801 \| 108 KB \| \| After Phase 1 (#389) \| 1687 \| 101 KB \| \| After Phase 2 (#390) \| 676 \| 35 KB \| \| After this PR \| 114 \| 4 KB (org.yaml only) \| With the 6 team files (total ~570 lines of structural yaml), every file is now under 230 lines and individually readable without scrolling past a single team's boundaries. ## Tests `platform/internal/handlers/org_include_test.go` — 9 cases: - Flat include (single file, single workspace) - Nested include (file → file → file) - Traversal rejection (`../secret.yaml`, `../../secret.yaml`) - Cycle detection (a↔b) - Empty path error - Missing file error - Inline-template error (baseDir empty) - No-op when YAML has no includes (safety: we always run the preprocessor) - Integration: load the real `org-templates/molecule-dev/org.yaml`, resolve includes, unmarshal into OrgTemplate, verify PM + Marketing Lead are top-level and PM has ≥4 children after expansion. All 9 pass + existing `TestResolvePromptRef` + `TestOrgYAML` suites stay green. ## Ownership implication Each team file can now be owned + reviewed independently. 
When the marketing team adds a 7th role, the diff is in `teams/marketing.yaml` alone — no merge conflicts against PM or research changes in the same review window. Same for the eventual engineer team, security team, etc. ## What's next - Phase 4 (queued): per-workspace atomization. Each role gets `<role>/workspace.yaml`; team files shrink to a list of !include refs. Terminal step in the scalability arc — at that point adding a new role is one new file under `org-templates/molecule-dev/<role>/` plus one line in the team's manifest. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 07:49:56 +00:00
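The cycle detection and depth cap from 112c28d885 can be sketched over an in-memory include graph. Everything here is illustrative: `expandIncludes` is a hypothetical name, the map stands in for files on disk, and the sketch tracks only files currently being expanded (so one file may be included from two branches), which may differ from the repository's visited-set.

```go
package main

import "fmt"

const maxIncludeDepth = 16

// expandIncludes walks the include graph depth-first from path and returns
// files in expansion order. includes maps a file to the files it includes.
func expandIncludes(path string, includes map[string][]string, inProgress map[string]bool, depth int) ([]string, error) {
	if depth > maxIncludeDepth {
		return nil, fmt.Errorf("include depth exceeds %d at %s", maxIncludeDepth, path)
	}
	if inProgress[path] {
		return nil, fmt.Errorf("include cycle detected at %s", path)
	}
	inProgress[path] = true
	defer delete(inProgress, path) // leave the set clean for sibling branches

	order := []string{path}
	for _, child := range includes[path] {
		sub, err := expandIncludes(child, includes, inProgress, depth+1)
		if err != nil {
			return nil, err
		}
		order = append(order, sub...)
	}
	return order, nil
}

func main() {
	tree := map[string][]string{
		"org.yaml":      {"teams/pm.yaml", "teams/marketing.yaml"},
		"teams/pm.yaml": {"teams/research.yaml"},
	}
	fmt.Println(expandIncludes("org.yaml", tree, map[string]bool{}, 0))

	cyclic := map[string][]string{"a.yaml": {"b.yaml"}, "b.yaml": {"a.yaml"}}
	fmt.Println(expandIncludes("a.yaml", cyclic, map[string]bool{}, 0))
}
```

The depth cap is redundant with the set for well-formed paths; as the commit says, it is there in case symlinks or path normalization defeat the key comparison.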
Hongming Wang	29044c3995	fix(#249 ): add /schedules/health endpoint accessible to CanCommunicate peers (#400 ) Rebased cleanly onto current main (resolves the add/add conflicts that blocked CI on PR #374 — the original branch diverged from a pre-repo-bootstrap commit that predated most files). Changes: - schedules.go: add scheduleHealthResponse struct + Health handler (mirrors A2A proxy auth pattern: X-Workspace-ID + CanCommunicate gate) - router.go: register GET /workspaces/:id/schedules/health on r (not wsAuth) so peer agents can query without holding the target workspace's bearer token - schedules_test.go: 7 new tests (missing caller 401, self-call OK, legacy peer grandfathered, non-peer 403, system caller bypass, no prompt exposure, DB error 500) isSystemCaller/validateCallerToken reused from a2a_proxy.go (same package). registry.CanCommunicate import added to schedules.go. Closes #249 Supersedes PR #374 (which could not get CI due to merge conflict) Co-authored-by: PM (Molecule AI) <pm@molecule-ai.internal> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 00:45:30 -07:00
rabbitblood	c12d6436ab	feat(provisioner): per-agent git identity via GIT_AUTHOR_* env vars Every workspace now commits under its own name. Step 3 of the three- step agent-separation plan (platform-level git identity today; GitHub App migration follows as Option 1). ## Problem All 20+ agents in the molecule-dev template (PM, Dev Lead, Research Lead, FE, BE, DevOps, Security, QA, UIUX, Marketing roles, etc.) share a single GITHUB_TOKEN — specifically the CEO's personal PAT. So every commit, PR, and issue across the live repos ends up attributed to HongmingWang-Rabbit. `git log` can't distinguish "which agent wrote this code" from "did the CEO write it"; neither can the authority- verification rule in triage-operator/philosophy.md (rule #3). ## Fix When the provisioner starts a workspace container, it now sets: GIT_AUTHOR_NAME = "Molecule AI <Workspace Name>" GIT_AUTHOR_EMAIL = <slug>@agents.moleculesai.app GIT_COMMITTER_NAME = (same) GIT_COMMITTER_EMAIL = (same) Git prefers these env vars over `git config user.name` / `user.email`, so no per-container git-config step is needed; every commit automatically carries the right authorship. Examples (20 agents, 20 distinct identities): Frontend Engineer → frontend-engineer@agents.moleculesai.app Backend Engineer → backend-engineer@agents.moleculesai.app Product Marketing Manager → product-marketing-manager@agents.moleculesai.app UIUX Designer → uiux-designer@agents.moleculesai.app Domain `agents.moleculesai.app` is deliberate: marks the email as a bot address without resembling a real inbox. ## Operator override preserved `applyAgentGitIdentity` runs AFTER the secret-load loops in `provisionWorkspaceOpts`, but uses `setIfEmpty` so any workspace_secret with the same key wins. Teams that want custom authorship (shared org signing identity, a person-on-the-loop owner) can still set `GIT_AUTHOR_NAME` via /workspaces/:id/secrets and get their value through to git. 
## What this does NOT solve (yet) - PR / issue authorship is still whoever owns GITHUB_TOKEN (the shared PAT). That needs the GitHub App migration (Option 1, next PR). The commit-level split shipped here is the prerequisite: the App path will keep these env vars and just swap the PAT for a short-lived installation token. - Existing containers continue with their pre-fix env (git env vars are baked in at container-create time). Applying is one plain `POST /workspaces/:id/restart` per agent after this merges + deploys — the restart goes through provisionWorkspace which picks up the new injection. ## Tests `agent_git_identity_test.go` — 4 behavior tests + a 10-row slug test: - fills all 4 env vars from a workspace name - operator override via pre-set env is preserved (setIfEmpty semantics) - empty / whitespace workspace name is a no-op (no `unknown@...` emails) - nil map doesn't panic (defensive) - slugify handles spaces / punctuation / edge hyphens / em-dashes All 15 cases pass; platform build clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 00:45:26 -07:00
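The identity injection from c12d6436ab could look roughly like the sketch below. The function names mirror those in the commit message, but the bodies are a reconstruction under assumptions; in particular the slug rules (lowercase, runs of non-alphanumerics collapsed to one hyphen, edge hyphens trimmed) are inferred from the listed examples and tests.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

const agentEmailDomain = "agents.moleculesai.app"

// slugify lowercases name and collapses every run of non-alphanumeric
// characters into a single hyphen, with no leading or trailing hyphen.
func slugify(name string) string {
	var b strings.Builder
	lastHyphen := true // suppresses a leading hyphen
	for _, r := range strings.ToLower(name) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
			lastHyphen = false
		} else if !lastHyphen {
			b.WriteByte('-')
			lastHyphen = true
		}
	}
	return strings.TrimSuffix(b.String(), "-")
}

// setIfEmpty keeps any value already present, so operator secrets win.
func setIfEmpty(env map[string]string, key, value string) {
	if env[key] == "" {
		env[key] = value
	}
}

// applyAgentGitIdentity fills the four git identity variables. A nil map or
// a blank workspace name is a no-op.
func applyAgentGitIdentity(env map[string]string, workspaceName string) {
	name := strings.TrimSpace(workspaceName)
	slug := slugify(name)
	if env == nil || slug == "" {
		return
	}
	author := "Molecule AI " + name
	email := slug + "@" + agentEmailDomain
	setIfEmpty(env, "GIT_AUTHOR_NAME", author)
	setIfEmpty(env, "GIT_AUTHOR_EMAIL", email)
	setIfEmpty(env, "GIT_COMMITTER_NAME", author)
	setIfEmpty(env, "GIT_COMMITTER_EMAIL", email)
}

func main() {
	env := map[string]string{"GIT_AUTHOR_NAME": "Org Signing Identity"}
	applyAgentGitIdentity(env, "Frontend Engineer")
	fmt.Println(env["GIT_AUTHOR_NAME"], env["GIT_AUTHOR_EMAIL"])
}
```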
Hongming Wang	ce0e793673	feat(org-templates): Phase 1 — externalize prompt bodies to sibling files (#389 ) Part 1 of 4 in the scalability refactor. Each role can now keep its initial_prompt / idle_prompt / schedule prompts as sibling .md files under files_dir/; inline YAML literals still work for backwards-compat. ## What changes Platform (org.go importer): - `OrgWorkspace` gains `InitialPromptFile`, `IdlePrompt`, `IdlePromptFile`, `IdleIntervalSeconds`. The idle_* fields were previously dropped by the org importer entirely — struct didn't declare them — which is why engineer idle_prompts never propagated from org.yaml to live /configs (I've been manually docker-cp'ing them in every maintenance cron). - `OrgSchedule` gains `PromptFile`. Hourly/weekly cron prompts are the largest bodies in org.yaml (1-5 KB each) and get resolved at import time just like initial_prompt. - `OrgDefaults` gains the same idle_* + _file fields for org-wide fallback. - New `resolvePromptRef(inline, fileRef, orgBaseDir, filesDir)` helper — the single chokepoint for inline-vs-file resolution. Inline wins when both are set. File refs route through `resolveInsideRoot` so a crafted ref can't escape the org template directory (same traversal defense as files_dir). - `createWorkspaceTree` now injects idle_prompt + idle_interval_seconds into the workspace's config.yaml (previously missing — that's the second half of the idle-prompt propagation bug). Tests:* - `org_prompt_ref_test.go` — 10 cases: inline-wins, file-read-when-empty, both-empty, defaults-level resolution, inline-template mode errors, traversal rejection (via file ref AND via files_dir), missing-file errors, and YAML-unmarshal parsing for each new field. Proof migration: - Documentation Specialist (biggest role at 6.9 KB of prompts) moves from inline YAML to `documentation-specialist/{initial-prompt.md, schedules/daily-docs-sync.md, schedules/weekly-terminology-audit.md}`. 
- org.yaml drops 1801 → 1687 lines (-6.3%) from just this one role. ## Why this matters org.yaml is 108 KB of which 67 KB (62%) is prompt text. At the current 12-role template size that's already unreadable; the marketing + triage- operator additions pushed it to 1801 lines. The 4-phase refactor aims: - Phase 1 (this PR): platform support + 1 role proof. - Phase 2: migrate remaining ~20 roles to file refs. Target: org.yaml at ~600 lines of pure structural scaffolding. - Phase 3: YAML `!include` preprocessor — split org.yaml into teams/{research,dev,marketing,ops}.yaml shards. - Phase 4: per-workspace atomization — each role gets its own workspace.yaml manifest; org.yaml composes them. ## Backwards compatibility - Inline `initial_prompt: \|` / `prompt: \|` / `idle_prompt: \|` all still work. - Missing `prompt_file` refs log + skip the schedule (not fatal) — fail loud so bugs surface during deployment rather than silent-drop. - Inline-template mode (POST /org/import with raw JSON body, no `dir`) errors cleanly when a file ref is used — can't resolve files without a base dir, surface that rather than guessing. ## Test plan - [x] `go build ./...` clean - [x] `go test -run 'TestResolvePromptRef\|TestOrgYAML' ./internal/handlers/` — 10 tests pass - [x] `python -c "yaml.safe_load(...)"` on the edited org.yaml — parses - [ ] Post-merge: deploy platform rebuild, run `POST /org/import` against a fresh workspace, verify Documentation Specialist's /configs/config.yaml contains the initial_prompt body and workspace_schedules rows contain the cron prompts (phantom-success check: grep the actual content, not just the row count). Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 00:32:09 -07:00
Hongming Wang	a9fdbe4185	fix(liveness): raise workspace TTL 60s → 180s to survive Opus synthesis (#386 ) Problem observed 2026-04-16: Research Lead, Dev Lead, Security Auditor, and UIUX Designer were being auto-restarted by the liveness monitor every ~30 minutes, even though their containers were healthy and processing real work. A2A callers (PM, children agents) saw regular EOFs: A2A request to <leader-id> failed: Post http://ws-:8000: EOF Followed in platform logs by: Liveness: workspace <id> TTL expired Auto-restart: restarting <name> (was: offline) Provisioner: stopped and removed container ws- Root cause: the liveness key `ws:{id}` in Redis has a 60s TTL (platform/internal/db/redis.go). The workspace heartbeat loop (workspace-template/heartbeat.py) refreshes it every 30s. That leaves room for exactly ONE missed heartbeat before expiry. A busy Claude Code Opus synthesis can starve the container's asyncio scheduler for 60-120s (the SDK spawns the claude CLI subprocess and blocks until the message-reader yields; the heartbeat coroutine doesn't run during that window). Leaders running 5-minute orchestrator pulses or processing deep delegations routinely hit this. The platform then mistakes a busy-but-healthy container for a dead one, marks it offline, tears it down, and re-provisions — interrupting whatever work was mid- synthesis and generating a cascade of EOF errors on pending A2A calls. Fix: hoist the TTL into a named `LivenessTTL` constant and raise it to 180s. With a 30s heartbeat interval this now tolerates up to ~5 missed beats before expiry — comfortably longer than any realistic Opus stall, while still detecting genuinely-dead containers within 3 minutes. Safety: real crashes are still caught immediately by a2a_proxy's reactive IsRunning() check (maybeMarkContainerDead in a2a_proxy.go:439). That path doesn't depend on TTL; it fires on the first failed forward. 
So this PR only relaxes the "slow but alive" false-positive; dead-container detection is unchanged. Observed impact before fix (2026-04-16 ~06:40–06:49 UTC, 10-minute window, 4 containers affected):

| Container | EOF errors | Forced restart |
|-------------------|-----------:|:--------------:|
| Dev Lead | 5 | yes (06:48) |
| Research Lead | 5 | yes (06:47) |
| Security Auditor | 5 | yes (06:49) |
| UIUX Designer | 4 | no (not yet) |

Expected impact after merge + redeploy: drop to ~0 forced restarts on healthy-busy leaders. If genuinely-stuck agents stop responding, the IsRunning check still catches them on the next A2A forward. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 00:05:45 -07:00
Hongming Wang	0bcebff908	config(org): add Telegram to Dev Lead and Research Lead (#385 ) * feat(adapters): add gemini-cli runtime adapter (closes #332) Adds a `gemini-cli` workspace runtime backed by Google's Gemini CLI (@google/gemini-cli, ~101k ★, Apache 2.0). Mirrors the claude-code adapter pattern: Docker image installs the CLI, CLIAgentExecutor drives the subprocess, A2A MCP tools wire via ~/.gemini/settings.json. Changes: - workspace-template/adapters/gemini_cli/ — new adapter (Dockerfile, adapter.py, __init__.py, requirements.txt); setup() seeds GEMINI.md from system-prompt.md and injects A2A MCP server into settings.json - workspace-template/cli_executor.py — adds gemini-cli to RUNTIME_PRESETS (--yolo flag, -p prompt, --model, GEMINI_API_KEY env auth); adds mcp_via_settings preset flag to skip --mcp-config injection for runtimes that own their own settings file - workspace-configs-templates/gemini-cli/ — default config.yaml + system-prompt.md template - tests/test_adapters.py — adds gemini-cli to expected adapter set - CLAUDE.md — documents new runtime row in the image table Requires: GEMINI_API_KEY global secret. Build: bash workspace-template/build-all.sh gemini-cli Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(provisioner): add gemini-cli to RuntimeImages map Without this entry, POST /workspaces with runtime:gemini-cli falls back to workspace-template:langgraph (wrong image, missing gemini dep) instead of workspace-template:gemini-cli. Every runtime MUST have an entry here. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * config(org): add Telegram to Dev Lead and Research Lead (closes #383) Completes leadership-tier Telegram coverage: PM ✓ DevOps ✓ Security ✓ → Dev Lead ✓ Research Lead ✓ Both roles produce high-value async output (architecture decisions, eco-watch summaries) that was invisible until the user polled the canvas. Same bot_token/chat_id secrets as the other three roles — no new credentials required. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: DevOps Engineer <devops@molecule.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 00:00:10 -07:00
Hongming Wang	52bdadbd6d	fix(security): forward Authorization header in transcript proxy (#405 ) (#380 ) The platform's GET /workspaces/:id/transcript proxy was constructing the outbound request without an Authorization header. The workspace's /transcript endpoint (hardened in #287/#328) fails-closed when the header is absent, so every transcript call in production returned 401 from the workspace. Fix: after WorkspaceAuth validates the incoming bearer token, the handler now forwards it verbatim via req.Header.Set("Authorization", ...). Forwarding is safe — the token has already been validated by the middleware. Tests: - TestTranscript_ForwardsAuthHeader: was t.Skip'd as a bug marker; now active. Verifies the Authorization header reaches the workspace stub. - TestTranscript_NoAuthHeader_PassesThrough: new. Verifies that a missing header produces no synthetic Authorization on the upstream call, and the workspace 401 is faithfully relayed. Identified by QA audit 2026-04-16. Co-authored-by: QA Engineer <qa-engineer@molecule-ai.internal> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 23:38:07 -07:00
PM Bot	e257cd80d4	chore(test): remove dead constants from wsauth_middleware_test.go (#358 ) PR #357 deleted the grace-period tests that used hasLiveTokenQuery and workspaceExistsQuery, but the constants themselves (and the stale comment describing the old HasAnyLiveToken-based dispatch) were not removed. Remove both dead const declarations and update the header comment to reflect the strict-enforcement contract introduced by #357. Closes #358. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 05:02:11 +00:00
Hongming Wang	fa239217a0	fix(security): remove WorkspaceAuth tokenless grace period (#351 ) Severity HIGH. #318 closed the fake-UUID fail-open for WorkspaceAuth but left the grace period intact for real workspaces with no live tokens. Zombie test-artifact workspaces from prior DAST runs still exist in the DB with empty configs and no tokens, so they pass WorkspaceExists=true but HasAnyLiveToken=false — and fell through the grace period, leaking every global-secret key name to any unauthenticated caller on the Docker network. Phase 30.1 shipped months ago; every production workspace has gone through multiple boot cycles and acquired a token since. The "legacy workspaces grandfathered" window no longer serves legitimate traffic. Removing it entirely is the cleanest fix — and does NOT affect registration (which is on /registry/register, outside this middleware's scope). New contract (strict): every /workspaces/:id/* request MUST carry Authorization: Bearer <token-for-this-workspace> Any missing/mismatched/revoked/wrong-workspace bearer → 401. No existence check, no fallback. The wsauth.WorkspaceExists helper is kept in the package for any future caller but no longer used here. Tests: - TestWorkspaceAuth_351_NoBearer_Returns401_NoDBCalls — new, covers fake UUID / zombie / pre-token in one sub-table. Asserts zero DB calls on missing bearer. - Existing C4/C8 + #170 tests updated to drop the stale HasAnyLiveToken sqlmock expectations. - Renamed TestWorkspaceAuth_Issue170_SecretDelete_FailOpen_NoTokens to _NoTokensStillRejected and flipped the assertion from 200 to 401. - Dropped TestWorkspaceAuth_318_ExistsQueryError_Returns500 — the code path it covered no longer exists. Full platform test sweep green. Closes #351 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:52:44 -07:00
Hongming Wang	c5d40b861b	Merge pull request #343 from Molecule-AI/fix/issue-337-webhook-secret-constant-time fix(security): constant-time webhook_secret comparison (#337)	2026-04-15 21:31:02 -07:00
Hongming Wang	50819500f0	fix(security): constant-time webhook_secret comparison (#337 ) Severity LOW. The /webhooks/:type handler compared the Telegram X-Telegram-Bot-Api-Secret-Token header against the decrypted webhook_secret using Go's `!=` operator, which short-circuits on the first mismatched byte. Under low-latency Docker-network conditions an attacker could time response latency byte-by-byte and converge on the real secret, then inject Telegram-formatted messages into any channel. Fix: switch to crypto/subtle.ConstantTimeCompare, whose running time depends on the length of the inputs but not on their contents (it returns 0 immediately when the lengths differ, so at most the secret's length can leak). Same posture as the cdp-proxy token compare in host-bridge (which already used timingSafeEqual). Risk profile over the public internet is low (Telegram webhooks have natural jitter that masks the signal), but the defensive pattern matters for consistency across all secret comparisons. Closes #337 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:23:12 -07:00
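The comparison swap can be sketched as follows. `secretsEqual` is a hypothetical wrapper name; `subtle.ConstantTimeCompare` is the standard-library call named in the commit.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// secretsEqual reports whether the presented header value matches the stored
// secret without branching on the position of the first differing byte.
func secretsEqual(presented, stored string) bool {
	return subtle.ConstantTimeCompare([]byte(presented), []byte(stored)) == 1
}

func main() {
	fmt.Println(secretsEqual("s3cret", "s3cret"))
	fmt.Println(secretsEqual("s3creX", "s3cret"))
}
```

When both sides must also hide their length, a common variant is to compare fixed-size digests of the two values instead of the raw strings.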
Hongming Wang	a205c92428	fix(security): scope PausePollersForToken to requesting workspace (closes #329 ) CI 5/6 pass (E2E cancel = run-supersession pattern). Dev Lead review 04:21: ✅ Approved. Fixes cross-tenant token exposure: PausePollersForToken now scoped to requesting workspace_id via SQL WHERE clause. Closes #329.	2026-04-15 21:22:50 -07:00
Hongming Wang	d85ee97472	fix(security): encrypt channel_config bot_token at rest (closes #319 ) CI fully green. Dev Lead code review: ✅ clean, all read/write paths verified, tests cover round-trip + idempotency + legacy plaintext. Closes #319.	2026-04-15 21:09:34 -07:00
Hongming Wang	5c3aac11e3	fix(security): close WorkspaceAuth fail-open on non-existent workspace IDs (#318 ) CI fully green. Security Audit cycle 15 LGTM. Closes #318. Closes #325.	2026-04-15 21:02:29 -07:00
Hongming Wang	472495c380	Merge pull request #270 from Molecule-AI/feat/workspace-transcript-endpoint feat: GET /workspaces/:id/transcript — live agent session log	2026-04-15 17:55:41 -07:00
airenostars	66b8cbb7fa	fix(transcript): validate workspace URL to prevent SSRF (#272 ) `TranscriptHandler.Get` previously proxied `agent_card->>'url'` directly to the outbound HTTP client with no validation. Since `agent_card` is attacker-writable via /registry/register, a workspace-token holder could point it at cloud metadata (169.254.169.254), link-local ranges, or non-http schemes and pivot the platform container against internal services (IMDS, Redis, Postgres, other containers on the Docker net). Four required fixes per reviewer: 1. `validateWorkspaceURL(u *url.URL)` — runs before `httpClient.Do`: - scheme must be http/https (rejects file://, gopher://, ftp://) - cloud metadata hostname blocklist (GCP + Azure + plain "metadata") - IMDS IP blocklist (169.254.169.254) - IPv4/IPv6 link-local blocklist (169.254/16, fe80::/10, multicast) - IPv6 unique-local fd00::/8 blocklist - loopback + docker.internal still allowed for local dev 2. Query-param allowlist — `target.RawQuery = c.Request.URL.RawQuery` forwarded everything verbatim, letting a caller smuggle params the upstream transcript endpoint didn't intend to expose. Replaced with an allowlist of `since` and `limit`. 3. Sanitized error string — `fmt.Sprintf("workspace unreachable: %v", err)` leaked the actual internal host/IP via `net.OpError`. Now logs the real error server-side and returns a plain "workspace unreachable" to the caller. 4. 10 new regression test cases: - `TestTranscript_Rejects{CloudMetadataIP,NonHTTPScheme,MetadataHostname,LinkLocalIPv6}` exercise the handler end-to-end with each attack URL and assert 400 before the HTTP client fires. - `TestValidateWorkspaceURL` table-drives the validator across localhost/public/docker-internal (allowed) + IMDS/GCP/Azure/file/ gopher/link-local/multicast (rejected). - `TestTranscript_ProxyPropagatesAllowlistedQueryParams` asserts `secret=leak&cmd=rm` is stripped while `since=42&limit=7` pass through. 
Also fixed a pre-existing test bug: `seedWorkspace` was issuing a real SQL Exec against sqlmock with no expectation set, so the prior test helpers silently failed in CI. Replaced with `expectWorkspaceURLLookup` which programs the mock correctly. All 11 tests now pass. Closes #272 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 17:46:55 -07:00
Hongming Wang	cb37aa850c	fix(security): add Referrer-Policy + Permissions-Policy headers (#282 ) Closes #282. CLAUDE.md documented the SecurityHeaders() middleware as setting 6 headers (X-Content-Type-Options, X-Frame-Options, Referrer-Policy, Content-Security-Policy, Permissions-Policy, HSTS) but the implementation only set 4: Referrer-Policy and Permissions-Policy were silently missing. Adds: - Referrer-Policy: strict-origin-when-cross-origin. This prevents browsers from leaking full paths/queries in Referer on cross-origin navigation. Particularly relevant for canvas embeds of Langfuse trace URLs that may contain trace IDs. - Permissions-Policy: camera=(), microphone=(), geolocation=(). This denies sensor access by default. Iframes the canvas embeds (Langfuse trace viewer etc.) can no longer request these without an explicit delegation. Regression tests added to securityheaders_test.go; both headers are now in the same table-driven assertion loop as the other 4, so a future edit that drops them again fails CI loudly. LOW severity: this is defense-in-depth, not a direct exploit path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 16:52:19 -07:00
airenostars	1f22d7df1b	feat: GET /workspaces/:id/transcript — live agent session log Closes #N (issue to be filed) Lets canvas / operators see live tool calls + AI thinking instead of waiting for the high-level activity log to flush. Right now the only way to "look over an agent's shoulder" is `docker exec ws-XXX cat /home/agent/.claude/projects/.../<session>.jsonl`, which: - doesn't work for remote workspaces (Phase 30 / Fly Machines) - requires shell access on the host - has no pagination This PR adds: 1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning `{runtime, supported, lines, cursor, more, source}`. Default returns `supported: false` so non-claude-code runtimes pass through gracefully. 2. `ClaudeCodeAdapter.transcript_lines` override — reads the most- recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the project dir name matches what Claude Code actually writes to. Limit capped at 1000 to prevent OOM. 3. Workspace HTTP route `GET /transcript` — Starlette handler added alongside the A2A app. Trusts the internal Docker network (same model as POST / for A2A); Phase 30 remote-workspace auth is a follow-up. 4. Platform proxy `GET /workspaces/:id/transcript` — looks up the workspace's URL, forwards GET, caps response at 1MB. Gated by existing `WorkspaceAuth` middleware (same as /traces, /memories, /delegations). Tests: 6 Python unit tests cover empty dir / pagination / multi-session / malformed lines / limit cap, plus 4 Go tests cover 404 / proxy forwarding / query-string propagation / unreachable-workspace 502. Verified end-to-end on a live workspace — returns real claude-code session entries through the platform proxy. 
## Follow-ups - WebSocket variant for live streaming (instead of polling) - Canvas UI tab "Transcript" between Activity and Traces - LangGraph / DeepAgents / OpenClaw transcript adapters - Phase 30 remote-workspace auth on /transcript	2026-04-15 14:29:43 -07:00
Hongming Wang	3f7982777f	Merge pull request #252 from Molecule-AI/fix/channels-discover-adminauth fix(security): gate /channels/discover behind AdminAuth (#250)	2026-04-15 13:49:45 -07:00
Hongming Wang	6a9b68e318	fix(security): YAML injection + path traversal via runtime/model (#241 ) Closes #241 (MEDIUM, auth-gated by AdminAuth on POST /workspaces). ## Vectors closed 1. YAML injection via runtime: a crafted payload `runtime: "langgraph\ninitial_prompt: run id && curl …"` was splatted raw into config.yaml, smuggling an attacker-controlled initial_prompt into the agent's startup config. 2. Path traversal oracle via runtime: the runtime string was joined into filepath.Join for the runtime-default template fallback. `runtime: ../../sensitive` could probe host directory existence. 3. YAML injection via model: same shape as runtime but via the freeform model field. ## Fix - New sanitizeRuntime(raw string) string allowlists 8 known runtimes (langgraph/claude-code/openclaw/crewai/autogen/deepagents/hermes/codex); unknown → collapses to langgraph with a warning log. Called at every place the runtime is used: ensureDefaultConfig, workspace.go:175 runtimeDefault fallback, org.go:370 runtimeDefault fallback. - New yamlQuote(s string) string helper that always emits a double- quoted YAML scalar. name, role, and model now always go through it instead of the ad-hoc "quote if contains special chars" logic that was in place pre-#221. Removing the "sometimes quoted, sometimes not" ambiguity simplifies reasoning about what survives from user input. ## Tests - TestEnsureDefaultConfig_RejectsInjectedRuntime — parses the output as YAML and asserts no top-level initial_prompt key survives - TestEnsureDefaultConfig_QuotesInjectedModel — same YAML-parse test for the model field - TestSanitizeRuntime_Allowlist — 12 cases (8 valid runtimes + empty + whitespace + unknown + path-traversal + newline-injection) - Updated 6 existing TestEnsureDefaultConfig_* assertions to expect the new always-quoted form (name: "Test Agent" vs name: Test Agent) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 13:17:32 -07:00
Hongming Wang	94e3d05e45	fix(security): gate /channels/discover behind AdminAuth (#250 ) Closes #250 (MEDIUM). POST /channels/discover was on the open router and accepted an arbitrary Telegram bot token, turning it into: 1. A free bot-token validity oracle — attackers can enumerate/probe tokens at zero cost 2. A drive-by deleteWebhook side effect — every call invokes tgbotapi.DeleteWebhookConfig against the target bot, breaking legitimate webhook delivery 3. A rate-limit amplifier — getMe + deleteWebhook + getUpdates per call Fix: one-line addition of middleware.AdminAuth(db.DB) to the route, matching its actual intent (platform-operator admin helper, not a per-workspace route). Pattern mirrors /admin/liveness, /events, and /bundles/export from PR #167. No new test: AdminAuth behavior is covered by wsauth_middleware_test.go; this PR only wires it onto an additional route. The load-bearing code comment references #250 so future reviewers can't revert without an issue citation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 13:11:22 -07:00
Hongming Wang	ce160aecc7	fix(security): #234 — sanitize source_id spoof log line via %q Closes #234 LOW. The security log I added in PR #228 (code-review follow-up) echoed body.SourceID with %s, which preserves any \n / \r that json.Unmarshal decoded from the attacker's JSON. An authenticated workspace could have injected fake log entries by sending source_id="evil\ntimestamp=FORGED level=INFO msg=fake". Fix: use %q on both body_source_id and c.ClientIP(). Go-quoted string escapes all control characters so multi-line payloads stay on a single log line. One-line fix. Regression test: TestActivityHandler_Report_SourceIDLogInjection exercises the code path with a literal \n in source_id. Assertion is limited to "handler returns 403 cleanly with no panic" because capturing log output in Go tests requires a log.SetOutput swap, which adds noise for little signal vs just reading the test log output (visible when running with -v). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 12:04:26 -07:00
Hongming Wang	6fd13ff037	fix(security): #226 — gate POST /workspaces template/runtime against traversal Closes #226 MEDIUM. WorkspaceHandler.Create joined payload.Template directly into filepath.Join(configsDir, template) without validating it stayed inside configsDir. An attacker posting Template="../../etc" would have the provisioner walk and mount arbitrary host directories into the workspace container. Same fix as #103 (POST /org/import): use the existing resolveInsideRoot helper to reject absolute paths and any ".." that escapes the root. Applied at both call sites in workspace.go: 1. Synchronous runtime detection before DB insert — 400 on bad input 2. Async provisioning goroutine — early return, logs the rejection (belt-and-suspenders; the create path already blocks) No test added inline because the existing resolveInsideRoot suite (org_path_test.go) already covers absolute / traversal / prefix-sibling / empty-path / deep-subpath cases. A duplicate test for the workspace handler wouldn't add signal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 12:00:26 -07:00
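A containment check in the spirit of `resolveInsideRoot` might look like the following. The real helper's signature and behavior are not shown in this log, so `resolveInsideRootSketch`, its error messages, and the choice to reject an empty path are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// resolveInsideRootSketch joins rel onto root and rejects absolute paths and
// any ".." sequence that would land outside root.
func resolveInsideRootSketch(root, rel string) (string, error) {
	if rel == "" {
		return "", errors.New("empty path") // assumption: empty is rejected
	}
	if filepath.IsAbs(rel) {
		return "", fmt.Errorf("absolute path %q not allowed", rel)
	}
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	joined := filepath.Join(absRoot, rel)
	// Comparing via Rel avoids the prefix-sibling trap, where
	// /srv/configs-evil shares a string prefix with /srv/configs.
	r, err := filepath.Rel(absRoot, joined)
	if err != nil || r == ".." || strings.HasPrefix(r, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes root", rel)
	}
	return joined, nil
}

func main() {
	for _, tmpl := range []string{"langgraph", "../../etc", "../configs-evil"} {
		p, err := resolveInsideRootSketch("/srv/configs", tmpl)
		fmt.Printf("%q -> %q, err=%v\n", tmpl, p, err)
	}
}
```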
Hongming Wang	00626a41a5	Merge pull request #224 from Molecule-AI/fix/issue-221-yaml-injection fix(security): sanitize workspace name before YAML interpolation	2026-04-15 11:59:10 -07:00
Hongming Wang	cb0205ed95	fix(security): #221 — quote name as YAML scalar instead of stripping newlines The original fix stripped \n/\r but left the rest in place, then relied on a substring-based test which was over-strict (the escaped fragment still contained the banned substring as bytes). Better approach: emit the name as a double-quoted YAML scalar with all escape sequences (\\, \", \n, \r, \t) handled inline. This is the canonical YAML-safe way to embed user input — no injection possible because every control character is either escaped or rejected by the YAML parser inside the scalar context. Test rewritten to parse the output as YAML and verify: 1. parsed[\"name\"] equals the literal attacker input (payload preserved) 2. no banned top-level keys leaked to the parsed map 3. legitimate default keys (description/version/tier/model) still present Updated the two existing tests that asserted the unquoted name format. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:58:16 -07:00
Hongming Wang	1c0e3565af	Merge branch 'main' into test/issue-217-plugin-pipeline-tests	2026-04-15 11:54:12 -07:00
Hongming Wang	c730f6bc02	Merge branch 'main' into fix/issue-221-yaml-injection	2026-04-15 11:54:10 -07:00
Hongming Wang	410d2493d1	fix(code-review): CanvasOrBearer fall-through, scheduler short(), activity spoof log + 6 new tests Addresses self-review of the 10-PR batch merged earlier this session. Splits the follow-ups into this Go-side PR and a later Python/docs PR. ## Fixes 1. wsauth_middleware.go CanvasOrBearer — invalid bearer now hard-rejects with 401 instead of falling through to the Origin check. Previous code let an attacker with an expired token + matching Origin bypass auth. Empty bearer still falls through to the Origin path (the intended canvas path). 2. scheduler.go short() helper — extracts safe UUID prefix truncation. Pre-existing unsafe [:12] and [:8] slices would panic on workspace IDs shorter than the bound. #115's new skip path had the bounds check; the happy-path log lines did not. One helper, three call sites. 3. activity.go security-event log on source_id spoof — #209 added the 403 but the attempt was invisible to any auditor cron. Stable greppable log line with authed_workspace, body_source_id, client IP. ## New tests - TestShort_helper — bounds-safety regression guard for the helper - TestRecordSkipped_writesSkippedStatus — #115 coverage gap, exercises UPDATE + INSERT via sqlmock - TestRecordSkipped_shortWorkspaceIDNoPanic — short-ID crash regression - TestActivityHandler_Report_SourceIDSpoofRejected — #209 403 path - TestActivityHandler_Report_MatchingSourceIDAccepted — non-spoof path - TestHistory_IncludesErrorDetail — #152 problem B coverage go test -race ./... green locally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:48:25 -07:00
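The bounds-safe truncation behind the `short()` helper can be illustrated as below. The real helper's signature is not shown in this log, so the two-argument form here is an assumption.

```go
package main

import "fmt"

// short returns at most n leading bytes of id. Unlike a bare id[:n] slice,
// it does not panic when id is shorter than n.
func short(id string, n int) string {
	if n < 0 {
		n = 0
	}
	if len(id) <= n {
		return id
	}
	return id[:n]
}

func main() {
	fmt.Println(short("0f8fad5b-d9cb-469f-a165-70867728950e", 8))
	fmt.Println(short("abc", 12))
}
```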
Dev Lead Agent	a3ce767822	test(handlers): add unit test suite for plugins_install_pipeline.go The 13K-line plugins_install_pipeline.go had zero unit tests, making it the highest-regression-risk file in the platform handlers package. New test file covers all testable pure-function and integration paths that do not require a live Docker daemon: validatePluginName (8 cases) - valid names, empty, forward slash, backslash, "..", embedded ".."; path-traversal variants ("../etc", "../../secrets") dirSize (6 cases) - empty dir, single file, multiple files, nested subdirectory, exceeds limit (verifies error mentions "cap"), exactly at limit httpErr / newHTTPErr (3 cases) - Error() contains status code, all relevant HTTP codes preserved, errors.As unwraps through fmt.Errorf %w chains regexpEscapeForAwk (6 cases) - alphanumeric names unchanged, slash escaped, dot escaped, + escaped, full "# Plugin: name /" marker (space not escaped), backslash escaped streamDirAsTar (4 cases) - empty dir yields zero entries, single file round-trips content, nested directory preserves relative path, entries have no absolute or tempdir-leaking paths resolveAndStage via stubResolver (10 cases) - empty source → 400, unknown scheme → 400, happy path (result fields), staged dir cleaned on fetch error, ErrPluginNotFound → 404, DeadlineExceeded → 504, generic error → 502, resolver returns invalid name → 400, local:// path traversal → 400 (pre-Fetch validation) stubResolver implements plugins.SourceResolver as an in-process test double — no network, no filesystem side-effects beyond the staging tempdir that resolveAndStage creates and cleans up. Closes #217 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 18:47:25 +00:00
Dev Lead Agent	afea61ae52	fix(security): sanitize body.Name before YAML interpolation in generateDefaultConfig A crafted workspace name containing a newline (e.g. "x\nmodel: evil") could inject arbitrary YAML keys into the auto-generated config.yaml. Strip \n and \r from the name before interpolation. YAML key injection requires a newline to start a new mapping entry; other characters such as `:` are safe in unquoted scalar values. Adds TestGenerateDefaultConfig_YAMLInjection with three adversarial inputs: bare \n injection, CRLF injection, and multi-key injection. Closes #221 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 18:44:11 +00:00
Hongming Wang	a507961f22	fix(db): #211: migration runner skips .down.sql (stop wiping data on boot) Closes #211 HIGH ops/security. RunMigrations globbed `*.sql`, which matches both `.up.sql` AND `.down.sql`. Alphabetical sort puts "d" before "u", so every platform boot ran the rollback BEFORE the forward migration for any pair starting with migration 018. Net effect: every restart wiped workspace_auth_tokens (the 020 pair), which in turn regressed AdminAuth to its fail-open bootstrap bypass for every route protected by it; the live server was effectively unauthenticated from restart until the next workspace re-registered. Also wiped 018_secrets_encryption_version and 019_workspace_access pairs silently. Fix is a 3-line filter: skip files whose base name ends in `.down.sql`. Down migrations remain on disk for operator-driven rollback via psql, but are never picked up by the auto-run loop. Added unit test against a tmp dir to lock the filter behaviour so this can never regress: stages a mix of legacy plain .sql, matched up/down pairs, asserts only forward files survive. Follow-up (not in this PR): the runner still re-applies every migration on every boot. Migrations must be idempotent. A proper schema_migrations tracking table is tracked as a future cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:24:06 -07:00
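The ordering problem and the filter can be reproduced with plain strings. `forwardMigrations` and the file names are hypothetical; the rule itself (skip base names ending in `.down.sql`) is the one the commit describes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// forwardMigrations keeps .sql files that are not rollbacks and returns them
// in the alphabetical order an auto-run loop would apply them.
func forwardMigrations(names []string) []string {
	var out []string
	for _, name := range names {
		if !strings.HasSuffix(name, ".sql") || strings.HasSuffix(name, ".down.sql") {
			continue
		}
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	files := []string{
		"020_workspace_auth_tokens.up.sql",
		"020_workspace_auth_tokens.down.sql",
		"001_init.sql",
	}
	// Without the filter, sorting puts the rollback ahead of its forward pair.
	unfiltered := append([]string(nil), files...)
	sort.Strings(unfiltered)
	fmt.Println("unfiltered:", unfiltered)
	fmt.Println("filtered:  ", forwardMigrations(files))
}
```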
Hongming Wang	a04f7c288d	fix(security): C2 from #169 — reject spoofed source_id in activity.Report Cherry-picks the one genuinely new fix from #169 after confirming the rest of that PR is already covered on main (C1/C3/C5 by wsAuth group, C6 by #94+#119 SSRF blocklist, C4 ownership by existing WHERE filter). Pre-existing middleware (WorkspaceAuth on /workspaces/:id/* sub-routes) proves the caller owns the :id path param. But the body field source_id was never validated — a workspace authenticated for its own /activity endpoint could still attribute logs to a different workspace by setting source_id=<foreign UUID>. Rejected with 403 now. No schema change, no new middleware. 4-line handler delta. Closes the only real gap in #169; #169 itself will be closed as superseded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:15:08 -07:00
Hongming Wang	2624d28f0c	fix(scheduler): #115 — skip cron fire when workspace is busy Closes #115. The Security Auditor hourly cron (and likely others) hit a ~36% miss rate because the platform's A2A proxy rejected fires with "workspace agent busy — retry after a short backoff" while the agent was still executing the prior audit. That error was recorded as a hard failure and polluted last_error. New behaviour: Before fireSchedule calls into the A2A proxy, it reads workspaces.active_tasks for the target. If >0, it: - Advances next_run_at to the next cron slot (cron keeps ticking) - Bumps run_count - Sets last_status='skipped' + last_error=<reason> - Inserts a cron_run activity_logs row with status='skipped' + error_detail - Broadcasts CRON_SKIPPED for canvas + operators Effect: busy-collision ceases to be an error. The history surface now distinguishes "ran and failed" from "skipped because busy". Operators can tell the difference at a glance, and the liveness view doesn't stall waiting for the next ticker cycle. Pairs with #149 (dedicated heartbeat pulse) and #152 problem B (error_detail surfaced in history) for a coherent scheduler story. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:13:15 -07:00
Hongming Wang	4d7c0ee01d	fix(scheduler): #152 problem B — persist and surface cron error_detail Closes #152 problem B (schedule history API drops error detail). Two tiny changes: 1. scheduler.fireSchedule now writes lastError into activity_logs.error_detail when inserting the cron_run row. Previously the column was left NULL even on failure because the INSERT didn't include it. 2. schedules.History SELECT now reads error_detail and includes it in the JSON response under error_detail. Frontend + audit cron can now display "why did this run fail" instead of just "status=error". No schema change — activity_logs.error_detail already exists from migration 009. This just starts using the column. Problem A of #152 (Research Lead ecosystem-watch 50% error rate on its own) is a separate ops investigation and stays open. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:11:16 -07:00
Hongming Wang	f0dcb81a24	fix(auth): #168 — CanvasOrBearer middleware for PUT /canvas/viewport only Closes #168 by the route-split path from #194's review. #167 put PUT /canvas/viewport behind strict AdminAuth, breaking canvas drag/zoom persist because the canvas uses session cookies not bearer tokens. New narrow middleware CanvasOrBearer: - Accepts a valid bearer (same contract as AdminAuth) OR - Accepts a request whose Origin exactly matches CORS_ORIGINS - Lazy-bootstrap fail-open preserved for fresh installs Applied ONLY to PUT /canvas/viewport. The softer check is acceptable there because viewport corruption is cosmetic-only — worst case a user refreshes the page. This middleware must NOT be used on routes that leak prompts (#165), create resources (#164), or write files (#190) — see #194 review for why. The other canvas-facing routes mentioned in #168 (Events tab, Bundle Export/Import) remain behind strict AdminAuth pending a proper session-cookie-accepting AdminAuth (#168 follow-up for Phase H). 6 new tests cover: bootstrap fail-open, no-creds 401, canvas origin match, wrong origin 401, empty origin rejected, localhost default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:09:16 -07:00
Hongming Wang	7c9192063d	fix(security): #190 — gate POST /templates/import behind AdminAuth Closes #190 (HIGH). The route was registered on the root router with no auth middleware, letting any unauthenticated caller write arbitrary files into configsDir via a crafted template. Same vulnerability class as #164 (bundles/import) and path-traversal risk same as #103 (org/import). One-line gate via the existing wsAdmin pattern. Lazy-bootstrap fail-open preserved for fresh installs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 11:00:49 -07:00
Hongming Wang	74046ca2cf	Merge pull request #187 from Molecule-AI/fix/issue-179-trusted-proxies fix(router): SetTrustedProxies(nil) closes rate-limit bypass via X-Forwarded-For (#179)	2026-04-15 10:55:01 -07:00
Hongming Wang	940a7772c3	Merge branch 'main' into fix/issue-170-secret-delete-auth	2026-04-15 10:54:36 -07:00
Backend Engineer	6edaebca00	fix: require workspace auth on DELETE /secrets/:key (#170) The route wsAuth.DELETE("/secrets/:key", sech.Delete) was already moved inside the WorkspaceAuth group in a prior commit, closing the CWE-306 unauthenticated-delete vector. This commit adds two regression tests to lock that in: - TestWorkspaceAuth_Issue170_SecretDelete_NoBearer_Returns401: workspace with live tokens, no bearer header → 401 (blocks the attack). - TestWorkspaceAuth_Issue170_SecretDelete_FailOpen_NoTokens: workspace with no tokens (bootstrap/legacy) → 200 (fail-open preserved). Mirrors the TestAdminAuth_Issue120_* and TestWorkspaceAuth_C4_C8_* patterns. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 17:42:08 +00:00
Backend Engineer	1ad98be17b	fix(router): call SetTrustedProxies(nil) to close IP-spoofing bypass (#179) Without this call Gin's default trusts all X-Forwarded-For headers, letting any caller rotate their effective IP and bypass per-IP rate limiting. SetTrustedProxies(nil) forces c.ClientIP() to always return the real TCP RemoteAddr. Adds two regression tests: one documenting the pre-fix bypass, one asserting the spoofed header is ignored after the fix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 17:32:54 +00:00
Backend Engineer	3cbeab45ba	fix(security): gate GET /approvals/pending behind AdminAuth (#180) GET /approvals/pending was registered on the open router with no middleware, allowing any unauthenticated caller to enumerate all pending approvals across every workspace on the platform. Fix: add inline middleware.AdminAuth(db.DB) to the route registration, matching the pattern used in PR #167 for bundles, events, and viewport. The three workspace-scoped approvals routes (POST/GET /approvals, POST /approvals/:id/decide) were already correctly behind WorkspaceAuth inside the wsAuth group — no change needed there. Tests: two new regression tests in wsauth_middleware_test.go — TestAdminAuth_Issue180_ApprovalsListing_NoBearer_Returns401 TestAdminAuth_Issue180_ApprovalsListing_FailOpen_NoTokens Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 17:25:09 +00:00
Hongming Wang	ad5e7b88b3	fix(security): #164 + #165 + #166 — gate 6 unauth routes behind AdminAuth CRITICAL (#164): POST /bundles/import — anon callers could create arbitrary workspaces with user-supplied system prompts, plugins, and secrets envelopes. Fixed by gating behind AdminAuth (bundleAdmin group). HIGH (#165): GET /bundles/export/:id — anon UUID probe leaked full system prompts, agent_card, plugins, memory for any workspace. GET /events + GET /events/:workspaceId — anon read of the append-only event log leaked org topology, workspace names, card fragments. Both moved into the same bundleAdmin / eventsAdmin groups. MEDIUM (#166): PUT /canvas/viewport — anon callers could reset shared viewport state. Gated via a scoped viewportAdmin group; GET stays open so canvas bootstraps without a bearer. GET /admin/liveness — operational-intel leak (scheduler cadence reveals work pattern). Inline AdminAuth on the single handler. All 6 routes use the same lazy-bootstrap admin auth the rest of the platform uses: zero-token installs fail-open, once any token exists every request must present a valid bearer. Known follow-up: canvas uses session cookies not bearer tokens (same pattern as #138). In multi-tenant production these canvas features — Events tab, Export/Duplicate, viewport persist — will return 401 once a workspace is token-enrolled. Needs cookie-accepting AdminAuth as a follow-up (tracked as option B in #138 triage discussion); a new issue will be filed for that scope. The security gain from closing #164 CRITICAL outweighs the canvas UX regression for tonight. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 09:52:32 -07:00
Hongming Wang	146f4c781b	Merge pull request #162 from Molecule-AI/fix/issue-138-field-whitelist fix(auth): #138 — field-level authz on PATCH /workspaces/:id (canvas regression fix)	2026-04-15 09:39:22 -07:00

208 Commits