Commit Graph

159 Commits

Author SHA1 Message Date
Molecule AI Backend Engineer
f99d8b0a1a feat(platform): add per-workspace budget_limit field and A2A enforcement (#541)
- Migration 025: ADD COLUMN budget_limit BIGINT DEFAULT NULL and
  monthly_spend BIGINT NOT NULL DEFAULT 0 to workspaces table
- Models: BudgetLimit *int64 in CreateWorkspacePayload;
  MonthlySpend int64 in HeartbeatPayload
- workspace.go: scanWorkspaceRow, workspaceListQuery, Get, Create, and
  Update all handle budget_limit/monthly_spend; budget_limit is gated
  as a sensitiveUpdateField
- registry.go: heartbeat conditionally writes monthly_spend only when
  payload.MonthlySpend > 0 (avoids overwriting with zero)
- a2a_proxy.go: checkWorkspaceBudget() returns 429 when
  monthly_spend >= budget_limit (NULL = no limit; fail-open on DB error)
- Tests: 8 new workspace_budget_test.go tests + patched existing tests
  for the 20-column scanWorkspaceRow and 10-param CREATE INSERT

Field type: BIGINT (int64), units: USD cents (budget_limit=500 = $5.00/month)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:18:41 +00:00
Molecule AI Backend Engineer
f5c9132413 fix(migrations): close 024→026 gap — rename 026→025 token_usage, 027→026 allowlist (#631)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:17:36 +00:00
molecule-ai[bot]
19396fc55a Merge pull request #628 from Molecule-AI/fix/issue-623-adminauth-origin-bypass
Merge gate passed (all 7 gates). Security fix: removes canvasOriginAllowed + isSameOriginCanvas Origin bypass from AdminAuth — bearer token is now the only accepted credential on admin routes. 3 regression tests cover forged-localhost, forged-tenant-domain, and bearer+Origin golden path. Auth PR — CEO explicit approval confirmed in chat. UNSTABLE = known GitHub App token scope gap.
2026-04-17 06:13:33 +00:00
Molecule AI Backend Engineer
4810863a40 fix(security): remove canvasOriginAllowed from AdminAuth middleware (#623)
The Origin header is trivially forgeable by any container on the Docker
network. Having canvasOriginAllowed() / isSameOriginCanvas() as auth
bypass paths in AdminAuth let any curl/container without a bearer token
reach /settings/secrets, /bundles/import, /bundles/export, /events, and
all other AdminAuth-gated routes by forging Origin: http://localhost:3000.

Fix: remove both Origin bypass branches from AdminAuth. Bearer token is
now the only accepted credential. Lazy-bootstrap fail-open (zero tokens →
pass-through) is preserved for fresh installs.

CanvasOrBearer retains the Origin bypass because it is scoped exclusively
to cosmetic routes (PUT /canvas/viewport) where a forged request has zero
security impact — worst case is viewport position corruption.

Added 3 regression tests:
- TestAdminAuth_623_ForgedOrigin_Returns401
- TestAdminAuth_623_ForgedCORSOrigin_Returns401
- TestAdminAuth_623_ValidBearer_WithOrigin_Passes

Closes #623, Closes #626

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:00:45 +00:00
molecule-ai[bot]
652019afcc Merge pull request #610 from Molecule-AI/feat/issue-591-org-plugin-allowlist
feat(platform): per-org plugin governance registry (allowlist)
2026-04-17 05:55:27 +00:00
molecule-ai[bot]
b3963da815 Merge pull request #602 from Molecule-AI/feat/issue-593-workspace-token-tracking
feat(platform): per-workspace token tracking + GET /workspaces/:id/metrics
2026-04-17 05:54:27 +00:00
molecule-ai[bot]
09c761dcdb Merge pull request #612 from Molecule-AI/fix/test-token-adminauth
fix(security): gate test-token endpoint behind AdminAuth
2026-04-17 05:53:49 +00:00
molecule-ai[bot]
3bb3737d5c Merge pull request #601 from Molecule-AI/feat/issue-590-agui-sse-endpoint
feat(platform): AG-UI compatible SSE endpoint for streaming agent events
2026-04-17 05:45:29 +00:00
Molecule AI Backend Engineer
9733317609 feat(platform): per-org plugin governance registry (#591)
Add an org-scoped allowlist table so org admins can restrict which plugins
workspace agents are allowed to install.  An empty allowlist means
allow-all (backward-compatible with existing deployments).

• migrations/027_org_plugin_allowlist.{up,down}.sql — new table + unique
  index on (org_id, plugin_name)
• handlers/org_plugin_allowlist.go — resolveOrgID, checkOrgPluginAllowlist
  (fail-open on DB errors), GetAllowlist, PutAllowlist (atomic tx replace)
• handlers/org_plugin_allowlist_test.go — 23 unit tests covering all
  handler paths, resolveOrgID, and all checkOrgPluginAllowlist branches
• handlers/plugins_install.go — allowlist gate between resolveAndStage and
  deliverToContainer; returns 403 if plugin is blocked
• router/router.go — GET/PUT /orgs/:id/plugins/allowlist under AdminAuth

All tests pass; go build ./... clean; gosec Issues: 0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:40:23 +00:00
Molecule AI Backend Engineer
57f5659f74 feat(platform): per-workspace token tracking + GET /workspaces/:id/metrics (#593)
Migration 026 adds workspace_token_usage table (uuid pk, workspace_id FK with
CASCADE, period_start TIMESTAMPTZ, input_tokens, output_tokens, call_count,
estimated_cost_usd NUMERIC(12,6), updated_at) with a UNIQUE index on
(workspace_id, period_start) for day-granularity upserts.

A2A proxy (proxyA2ARequest) now spawns a detached goroutine after each
successful call to extractAndUpsertTokenUsage, which:
  1. Parses usage.input_tokens / usage.output_tokens from result.usage
     (JSON-RPC wrapper) with fallback to top-level usage (direct Anthropic).
  2. Calls upsertTokenUsage — INSERT ... ON CONFLICT DO UPDATE so multi-
     call days accumulate correctly. Estimated cost = input×$0.000003 +
     output×$0.000015 (Claude Sonnet default; adjustable in a later phase).
  Token tracking never blocks the critical A2A path.

New endpoint: GET /workspaces/:id/metrics (wsAuth — WorkspaceAuth bearer
bound to :id). Returns:
  {"input_tokens":N,"output_tokens":N,"total_calls":N,
   "estimated_cost_usd":"0.000000","period_start":"...","period_end":"..."}
404 if workspace missing. Period is current UTC day.

11 new tests (4 handler + 7 parse-unit); 19/19 packages pass.

Closes #593

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:29:10 +00:00
Molecule AI Backend Engineer
518a09a911 feat(platform): AG-UI compatible SSE endpoint for streaming agent events (#590)
- Add in-process SSE subscription mechanism to Broadcaster (SubscribeSSE,
  deliverToSSE) so both RecordAndBroadcast *and* BroadcastOnly fan out to
  SSE subscribers — critical because BroadcastOnly skips Redis pub/sub and
  would be invisible to a Redis-only subscriber (AGENT_MESSAGE, A2A_RESPONSE,
  TASK_UPDATED are all BroadcastOnly events).
- Add handlers/sse.go: SSEHandler.StreamEvents sets text/event-stream headers,
  checks workspace existence (404 if missing), subscribes via broadcaster, and
  wraps each WSMessage in an AG-UI envelope:
    data: {"type":"<event>","timestamp":<unix_ms>,"data":{...}}\n\n
- Register wsAuth.GET("/workspaces/:id/events/stream") behind existing
  WorkspaceAuth middleware — bearer token bound to :id.
- Add 6 tests: Content-Type, initial ping, AG-UI format, workspace filter
  (cross-workspace events not leaked), 404 on missing workspace, multiple
  sequential events.

All 19 packages pass. Build clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:16:51 +00:00
Molecule AI Backend Engineer
c84171df72 fix(security): add AdminAuth to /admin/workspaces/:id/test-token route
Without middleware, any caller on a non-production instance could mint a
bearer token for any workspace UUID with no authentication. AdminAuth is
defence-in-depth: on a fresh install (no tokens yet) it is fail-open so
the bootstrap path still works; once the first workspace enrolls a token
all callers must present a valid bearer.

Adds two router-level tests confirming the gate:
- TestTestTokenRoute_RequiresAdminAuth_WhenTokensExist → 401 with no header
- TestTestTokenRoute_FailOpenOnFreshInstall → 200 (bootstrap path intact)

Env-var gating inside GetTestToken is retained as a second layer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 02:48:00 +00:00
Hongming Wang
b73da288e3 fix(auth): TenantGuard same-origin bypass for EC2 tenant Canvas
On EC2 tenant instances, Caddy serves Canvas (:3000) and API (:8080) under
the same domain. Canvas makes same-origin requests without X-Molecule-Org-Id
or Fly-Replay-Src headers, causing TenantGuard to 404 every API route.

- Add isSameOriginCanvas() as tertiary check in TenantGuard — when
  CANVAS_PROXY_URL is set and Referer/Origin matches Host, pass through.
- Enhance isSameOriginCanvas() to also check Origin header (WebSocket
  upgrade requests send Origin but may not send Referer).
- Add 3 new tests: Referer bypass, Origin bypass (WS), inactive without env.

Fixes all 404s on /workspaces, /templates, /org/templates, /approvals/pending,
/canvas/viewport, and /ws WebSocket on tenant EC2 instances.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:22:23 -07:00
molecule-ai[bot]
3b5affb0d1 fix(github): refresh installation token when TTL < 10 min (#547) (#567)
Root cause: the github-app-auth plugin injects GH_TOKEN + GITHUB_TOKEN
into each workspace container's env at provision time (EnvMutator). Those
are GitHub App installation tokens with a fixed ~60 min TTL. The plugin
has an in-process cache that proactively refreshes 5 min before expiry —
but the workspace env is set once at container start and never updated.
Any workspace alive >60 min ends up with an expired token.

Fix (Option B — on-demand endpoint):

pkg/provisionhook:
  - Add TokenProvider interface: Token(ctx) (token, expiresAt, error)
    Lives in pkg/ (public) so the github-app-auth plugin can implement it.
  - Add Registry.FirstTokenProvider() — discovers the first mutator that
    also satisfies TokenProvider via interface assertion. Safe under
    concurrent reads (existing RWMutex).

platform/internal/handlers/github_token.go:
  - New GitHubTokenHandler serving GET /admin/github-installation-token
  - Delegates to the registered TokenProvider (plugin cache — always fresh)
  - 404 if no GitHub App configured, 500 + [github] prefix log on error
  - Never logs the token itself

platform/internal/handlers/workspace.go:
  - Add TokenRegistry() getter so the router can wire the handler without
    coupling to WorkspaceHandler internals

platform/internal/router/router.go:
  - Register GET /admin/github-installation-token under AdminAuth

workspace-template/:
  - scripts/molecule-git-token-helper.sh — git credential helper; calls
    the platform endpoint on every push/fetch; falls through to next
    helper (operator PAT) if platform unreachable
  - entrypoint.sh — configure the credential helper at startup

Why Option B over Option A (background goroutine):
  - The plugin already has its own cache refresh; nothing to refresh here.
  - Pushing env updates into running containers requires docker exec, which
    the architecture explicitly rejects (issue #547 "Alternatives").
  - Pull-based is stateless, trivially testable, zero extra goroutines.

Closes #547

Co-authored-by: Molecule AI DevOps Engineer <devops-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:47:03 +00:00
molecule-ai[bot]
21f9e70b75 fix(platform): reject self-delegation to prevent _run_lock deadlock (#570)
When a workspace delegated a task to itself, it would acquire
_run_lock twice on the same goroutine mutex, blocking permanently.

Add an early-return guard in `DelegationHandler.Delegate` that
returns HTTP 400 {"error": "self-delegation not permitted"} as soon
as sourceID == body.TargetID, before any DB or A2A work is done.

Adds TestDelegate_SelfDelegation_Rejected to delegation_test.go.

Closes #548

Co-authored-by: Molecule AI Backend Engineer <backend-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:46:20 +00:00
molecule-ai[bot]
a73de2451c fix(platform): persist secrets envelope from POST /workspaces payload (#568)
`CreateWorkspacePayload` was missing a `Secrets` field, so any
`secrets: { KEY: value }` included in a POST /workspaces body was
silently dropped by ShouldBindJSON.

Changes:
- Add `Secrets map[string]string` field to `CreateWorkspacePayload`
- Wrap workspace INSERT in a DB transaction; iterate over secrets,
  encrypt each value via `crypto.Encrypt`, and upsert into
  `workspace_secrets` within the same tx — rollback both on any failure
- Add `mock.ExpectBegin()`/`mock.ExpectCommit()`/`mock.ExpectRollback()`
  to all existing Create tests that were missing transaction expectations
- Add 3 new tests: WithSecrets_Persists, SecretPersistFails_RollsBack,
  EmptySecrets_OK

Closes #545

Co-authored-by: Molecule AI Backend Engineer <backend-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:46:17 +00:00
Hongming Wang
925da9d6c2 fix: restore cp_provisioner.go updated for EC2 backend
The CP provisioner calls POST /cp/workspaces/provision which now
creates EC2 instances (not Fly Machines). The tenant platform
auto-activates this when MOLECULE_ORG_ID is set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:25:43 -07:00
Molecule AI Backend Engineer
f88f221dfe fix(middleware): split CSP by route type — strict for API, permissive for canvas (#450)
API routes return JSON and never need 'unsafe-inline' or 'unsafe-eval'.
Serving those directives globally defeated the purpose of CSP and gave
false security assurance. Canvas-proxied routes (NoRoute → Next.js) keep
'unsafe-inline' because React hydration requires it; 'unsafe-eval' was
already absent and is confirmed unnecessary in production builds.

Implementation:
- Add isAPIPath() helper with an explicit prefix allowlist that mirrors
  the routes registered in router/router.go
- Strict "default-src 'self'" on all /workspaces, /registry, /health,
  /admin, /metrics, /settings, /bundles, /org, /templates, /plugins,
  /webhooks, /channels, /ws, /events, /approvals paths
- Permissive CSP (unsafe-inline, no unsafe-eval) on canvas/NoRoute paths
- 4 new test functions: TestCSPAPIRoutesGetStrictPolicy (covers every
  prefix + sub-path), TestCSPCanvasRoutesGetPermissivePolicy, and
  TestIsAPIPath unit test including substring-non-match guard

Resolves #450

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 20:26:17 +00:00
rabbitblood
2492f8c806 feat(platform): wire github-app-auth plugin for per-installation tokens
Integrates github.com/Molecule-AI/molecule-ai-plugin-github-app-auth.
When GITHUB_APP_ID is set, the platform constructs a plugin
Authenticator at boot and registers it as an EnvMutator on the
WorkspaceHandler. Every workspace provision then gets a fresh
GITHUB_TOKEN / GH_TOKEN injected from the App's installation token
(rotates ~hourly, refresh 5 min before expiry).

Verified live this turn:
- Platform boot log: `github-app-auth: registered, 1 mutator(s) in chain`
- `docker exec ws-<id> gh auth status` → `Logged in as molecule-ai[bot] (GH_TOKEN)`
- `gh issue list --repo Molecule-AI/molecule-core` returns real data
  (Hermes #498/#499/#500 visible from inside a workspace container)

## Changes
- platform/go.mod + go.sum: new dep on the plugin
- platform/cmd/server/main.go: import + conditional registration
  (soft-skip when GITHUB_APP_ID is unset for self-hosted/dev)
- docker-compose.yml: pass GITHUB_APP_* env + bind-mount private key

## Drive-by
.gitignore: exclude /org-templates /plugins /workspace-configs-templates
— these dirs are populated locally by clone-manifest.sh from the
standalone repos, should never be committed to core. Without this rule
my previous git add -A staged 33 embedded git dirs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:52:20 -07:00
Hongming Wang
8f4d0997c8 fix: code review findings — dead code, DRY, rate limit, docs
1. Delete fly_provisioner.go — superseded by control plane architecture.
   Direct Fly provisioning from tenant was intentionally removed.

2. Extract loadWorkspaceSecrets() — shared by Docker + CP provisioner
   paths. Eliminates 30-line secret-loading duplication.

3. Token rate limit — max 50 active tokens per workspace. Returns 429
   if exceeded. Prevents unbounded token creation by compromised client.

4. CLAUDE.md — add GET/POST/DELETE /workspaces/:id/tokens to route table.

5. .env.example — document MOLECULE_ORG_ID and CP_PROVISION_URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:04:37 -07:00
Hongming Wang
a152342e8c feat(platform): auto-detect SaaS tenant → control plane provisioner
No env vars to configure. The platform auto-detects the backend:

  MOLECULE_ORG_ID set → SaaS tenant → control plane provisioner
  MOLECULE_ORG_ID empty → self-hosted → Docker provisioner

The control plane URL defaults to https://api.moleculesai.app (override
with CP_PROVISION_URL for testing). No FLY_API_TOKEN on the tenant.

Removed: direct Fly provisioner (FlyProvisioner) — all SaaS workspace
provisioning goes through the control plane which holds the Fly token
and manages billing, quotas, and cleanup.

Two backends: CPProvisioner (SaaS) and Docker Provisioner (self-hosted).

Closes #494

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 11:50:52 -07:00
Hongming Wang
3db589770e fix(auth): allow nesting + delete from tenant canvas (same-origin)
PATCH /workspaces/:id field-level auth for parent_id/tier/runtime
required a bearer token, blocking canvas nesting (drag-to-nest).
Added IsSameOriginCanvas check so the tenant canvas can update
sensitive fields without a bearer.

Exported IsSameOriginCanvas from middleware package so workspace.go
can call it for the field-level auth path.

DELETE /workspaces/:id is behind AdminAuth which already has the
same-origin check — if delete still fails, it's a different issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 11:22:45 -07:00
Hongming Wang
c2c80fd269 feat(platform): Fly Machines provisioner for SaaS workspace deployment
When CONTAINER_BACKEND=flyio, workspaces are provisioned as Fly Machines
instead of local Docker containers. This enables workspace deployment
on SaaS tenants where no Docker daemon is available.

New files:
- provisioner/fly_provisioner.go: FlyProvisioner with Start/Stop/
  IsRunning/Restart/Close via Fly Machines API (api.machines.dev/v1)
- FlyRuntimeImages maps runtimes to GHCR image tags

Changes:
- main.go: select Docker vs Fly based on CONTAINER_BACKEND env var
- workspace.go: SetFlyProvisioner() setter, Create checks flyProv first
- workspace_provision.go: provisionWorkspaceFly() loads secrets, calls
  FlyProvisioner.Start, issues auth token for the new machine

Env vars for Fly backend:
- CONTAINER_BACKEND=flyio (activates Fly provisioner)
- FLY_API_TOKEN (Fly deploy token)
- FLY_WORKSPACE_APP (Fly app name for workspace machines)
- FLY_REGION (default: ord)

Closes #494

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:51:15 -07:00
Hongming Wang
54bb543ff7 fix: code review findings — token UI, auth hardening, WS dedup
1. Settings panel: wire TokensTab into "API Tokens" tab (was imported
   but not rendered). Rename "API Keys" → "Secrets", add "API Tokens"
   tab. Fix docs link → doc.moleculesai.app/docs/tokens.

2. Referer match hardening: require exact host match or trailing slash
   to prevent evil.com subdomain bypass. Cache CANVAS_PROXY_URL at
   init time instead of per-request os.Getenv.

3. Extract shared deriveWsBaseUrl() to lib/ws-url.ts — eliminates
   duplicate 12-line derivation in socket.ts and TerminalTab.tsx.

4. Token list pagination: add ?limit= and ?offset= params (default
   50, max 200) to GET /workspaces/:id/tokens.

507/507 canvas tests pass, Go build + vet clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:42:26 -07:00
Hongming Wang
807b4c1b45 fix(auth): allow same-origin canvas requests through WorkspaceAuth on tenant
WorkspaceAuth only accepted bearer tokens, blocking the canvas from
calling per-workspace routes (restart, config, secrets, chat) on the
tenant image where canvas + API share the same origin.

Added isSameOriginCanvas() fallback (same check used by AdminAuth):
checks Referer matches request Host, gated behind CANVAS_PROXY_URL
so only tenant deployments are affected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 10:06:33 -07:00
Hongming Wang
071fb0da88 fix(tenant): WebSocket URL derivation + AdminAuth same-origin for tenant image
Two bugs on the combined tenant image (canvas + API same-origin):

1. WebSocket URL: NEXT_PUBLIC_WS_URL="" (empty string for same-origin)
   was preserved by ?? operator, producing an invalid WS URL. Now derives
   from window.location when both env vars are empty. Same fix applied
   to TerminalTab.

2. AdminAuth blocking canvas: same-origin requests have no Origin header,
   so neither AdminAuth nor CanvasOrBearer could authenticate the canvas.
   Added isSameOriginCanvas() that checks Referer against request Host,
   gated behind CANVAS_PROXY_URL (only active on tenant image). This
   lets the canvas create/list workspaces, view events, etc. without a
   bearer token when served from the same Go process.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:43:01 -07:00
Hongming Wang
3892e4dee1 feat(platform): token management API + MCP setup + external agent guide
1. Token Management API (closes production gap):
   - GET /workspaces/:id/tokens — list tokens (prefix + metadata, never plaintext)
   - POST /workspaces/:id/tokens — create new token (plaintext returned once)
   - DELETE /workspaces/:id/tokens/:tokenId — revoke specific token
   - Behind WorkspaceAuth middleware (need existing token to manage tokens)
   - Tests skip gracefully when no DB available

2. MCP Server Setup:
   - Fix .mcp.json to use npx @molecule-ai/mcp-server (was referencing
     non-existent local ./mcp-server/dist/index.js)
   - Add comprehensive tool→API mapping doc (87 tools across 15 categories)

3. External Agent Registration Guide:
   - Step-by-step: create workspace, register, heartbeat, A2A messaging
   - Python (Flask) and Node.js (Express) complete working examples
   - Communication rules, lifecycle, security, troubleshooting

4. Token Management Guide:
   - Bootstrap flow, rotation procedure, security properties

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:37:42 -07:00
Hongming Wang
40c825b8ed Merge pull request #483 from Molecule-AI/fix/platform-modular-template-support
fix(platform): unblock org-template imports against modular workspace templates
2026-04-16 07:55:26 -07:00
rabbitblood
aeb9308c7d fix(platform): unblock org-template imports against modular workspace templates
Two adjacent fixes that surfaced trying to bring the molecule-dev org
template back up against the new standalone workspace-template-* repos.

1) handlers/org.go — expand ${VAR} in workspace_dir before validation.
   The molecule-dev pm/workspace.yaml (and any operator's per-host
   binding) ships `workspace_dir: ${WORKSPACE_DIR}` so each operator
   can pick the host path PM bind-mounts. Without expansion the literal
   "${WORKSPACE_DIR}" string reaches validateWorkspaceDir and fails with
   "must be an absolute path", aborting the whole org import.
   Other fields (channel config, prompts) already go through expandWithEnv;
   workspace_dir was the last hold-out.

2) provisioner/provisioner.go — inject PYTHONPATH=/app for every
   workspace container. Standalone template Dockerfiles COPY adapter.py
   to /app and set ENV ADAPTER_MODULE=adapter, but molecule-runtime is
   a pip console_script entry point so cwd isn't on sys.path
   automatically. Setting PYTHONPATH here fixes every adapter image at
   once instead of needing 8 PRs against template repos. Operator
   override still wins (workspace EnvVars are appended after, so Docker
   takes the later duplicate).

   Note: this unblocks the import path but does NOT make claude-code /
   hermes / etc. boot. The runtime itself has a separate top-level
   `from adapters import` that breaks against modular templates —
   tracked at workspace-runtime#1.

Tests: TestBuildContainerEnv_InjectsPYTHONPATH +
TestBuildContainerEnv_WorkspaceEnvVarsCanOverridePYTHONPATH lock the
default + operator-override invariants. expandWithEnv is already covered
by TestExpandWithEnv_* — the workspace_dir use site is a one-line call
to that primitive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:49:45 -07:00
rabbitblood
a394ae55c3 feat(channels): Lark / Feishu adapter (outbound webhook + Events API inbound)
New ChannelAdapter implementation for Lark (international, open.larksuite.com)
and Feishu (China, open.feishu.cn). Both speak the same payload format —
only the host differs — so a single adapter covers both.

Outbound: POST text to a Custom Bot webhook URL with msg_type:"text".
Lark returns 200 OK even when delivery fails — the body's `code` field is
the truth. Adapter parses the response and returns a Go error when
code != 0 so callers don't think a revoked-webhook send succeeded.

Inbound: handles both v1 url_verification (handshake) and v2 event_callback
(im.message.receive_v1) shapes. Optional verify_token field — when set,
inbound payloads with mismatching tokens are rejected via constant-time
compare (#337 class — never raw == against a stored secret).

Sender ID resolution prefers user_id → falls back to open_id (open_id is
always present; user_id only when the bot has the contacts permission).
Non-text message types and non-message events return nil, nil so the
receiver responds 200 OK without dispatching.

Tests: 23 cases — identity, ValidateConfig (6 sub-cases incl. URL prefix
matrix), SendMessage (no URL / invalid prefix / happy-path body shape /
api-error-code surfacing), ParseWebhook (handshake + token mismatch +
text message + open_id fallback + non-message + non-text + token mismatch
+ malformed JSON + malformed content + empty text), StartPolling no-op,
registry presence.

Also: make migration 023 idempotent (ADD COLUMN IF NOT EXISTS) — the
platform's migration runner has no schema_migrations tracking table, so
every .up.sql replays on every boot. Without IF NOT EXISTS the second
boot against an existing volume crashes with "column already exists".
Followup issue to be filed for proper migration tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:10:58 -07:00
rabbitblood
4065a7edee feat(platform): provision-time env mutator hook for plugins
Add `provisionhook.EnvMutator` extension point so out-of-tree plugins
(e.g. github-app-auth, vault-secrets) can inject or override env vars
right before container Start, without forking core or piling more
provider-specific code into the handlers package.

WorkspaceHandler gains an optional `envMutators *provisionhook.Registry`
wired in via SetEnvMutators during boot. The hook fires after built-in
secret loads + per-agent git identity, so plugins can both read what's
already there and override anything they own (GIT_AUTHOR_*, GITHUB_TOKEN).

A nil registry is a no-op via Registry.Run's nil-receiver branch — keeps
the hot path a single nil compare and means existing flows stay green
even with zero plugins registered.

Mutator failure aborts provisioning and marks the workspace failed with
the wrapped error in last_sample_error. Failing fast surfaces the cause
to the operator instead of letting an agent boot into opaque "git push
401" loops it can never recover from on its own.

Tests cover ordered execution, chained env visibility, first-error abort,
nil-receiver no-op, nil-mutator drop, registration order, and concurrent
register-vs-run safety (-race clean).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 06:47:09 -07:00
Security Auditor
6672b5f158 fix(security): YAML-quote skill/prompt names in generateDefaultConfig + opaque file-write errors
Closes #460, #461.

**#460 — YAML injection via unquoted skill/prompt filenames**
`generateDefaultConfig` extracted skill directory names and prompt file
names from user-supplied `body.Files` keys and wrote them directly into
YAML list items without quoting:

  cfg.WriteString("  - " + s + "\n")

`validateRelPath` only blocks path traversal (`../`); it does NOT block
YAML control characters including newlines. On Linux, filenames can
contain newlines, so an attacker with any live workspace bearer token
could submit:

  {"files": {"skills/legit\nruntime: malicious/SKILL.md": "# skill"}}

The generated config.yaml would then contain `runtime: malicious` as a
top-level YAML key, overriding the runtime for workspaces provisioned
from the template.

Fix: extract `yamlEscape` as a reusable local from the same
`strings.NewReplacer` already used for the `name` field (#221) and apply
it to both the `skills:` and `prompt_files:` list items, wrapping each
in double-quotes.

**#461 — Docker error details in ReplaceFiles 500 responses**
`ReplaceFiles` returned `fmt.Sprintf("failed to write files: %v", err)`
in two 500 paths, where `err` comes from Docker API calls and may include
internal container names, volume names, and daemon error messages.

Fix: log the full error server-side and return a static opaque string to
the caller.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:40:45 -07:00
Hongming Wang
490a3741a1 fix(test): wrap httptest.ResponseRecorder with CloseNotify for canvas proxy tests
httputil.ReverseProxy calls CloseNotify() which httptest.ResponseRecorder
doesn't implement. Gin casts the writer, causing a panic. Added a
closeNotifyRecorder wrapper with a no-op channel.
2026-04-16 05:40:17 -07:00
Molecule AI Backend Engineer
92a2c84d76 fix(auth): inject fresh bearer token into config volume on every provision (closes #418)
Container rebuild or volume wipe caused workspaces to lose /configs/.auth_token.
On re-registration the platform returned no auth_token (HasAnyLiveToken==true →
no re-issue), leaving the workspace unable to authenticate any subsequent API call.

Fix: provisionWorkspaceOpts now calls issueAndInjectToken before Start(). This
revokes any existing live tokens (plaintext is irrecoverable from the stored hash,
so rotation is the only safe path) and issues a fresh token that is written into
cfg.ConfigFiles[".auth_token"]. WriteFilesToContainer delivers it to /configs
immediately after ContainerStart, racing safely ahead of the Python adapter's
1-2s startup time.

Failure modes are soft: revoke or issue errors skip injection with a warning;
provisioning continues and the workspace recovers on the next restart.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:26:10 -07:00
Hongming Wang
cdc1caf6f9 Merge pull request #467 from Molecule-AI/feat/slack-webhook-validation
[Backend Engineer] feat(channels): Slack adapter with webhook URL validation (#384)
2026-04-16 05:22:47 -07:00
Hongming Wang
a5b77ed7ee Merge pull request #469 from Molecule-AI/feat/per-channel-budget
[Backend Engineer] feat(channels): per-channel message budget with 429 enforcement (#368)
2026-04-16 05:22:39 -07:00
Hongming Wang
84882eca7b Merge pull request #457 from Molecule-AI/fix/issue-451-strip-auth-header-canvas-proxy
[Backend Engineer] fix(security): strip Authorization + Cookie in canvas reverse proxy
2026-04-16 05:17:01 -07:00
Hongming Wang
83cf9d0e5f Merge pull request #446 from Molecule-AI/fix/issue-435-registry-error-leak
fix(security): suppress raw DB error from /registry/register response
2026-04-16 05:16:57 -07:00
Hongming Wang
2eb11d6f2c Merge pull request #465 from Molecule-AI/fix/memory-recall-flood-limit
[Backend Engineer] fix(memories): hard cap of 50 on recall results (#377)
2026-04-16 05:16:49 -07:00
Hongming Wang
510c40089f fix: address all code review findings + remove exposed secrets
Code review fixes:
- 🟡 #1: Replace python3 with jq in Dockerfile template stages (~50MB → ~2MB)
- 🟡 #2: Add clone count verification to scripts/clone-manifest.sh
  (set -e + expected vs actual count check — fails build if any clone fails)
- 🟡 #3: Drop 'unsafe-eval' from CSP (not needed for Next.js production
  standalone builds, only dev mode). Updated test assertion.
- 🟡 #4: Remove broken pyproject.toml from workspace-template/ (it claimed
  to package as molecule-ai-workspace-runtime but the directory structure
  didn't match — the real package ships from the standalone repo)
- 🔵 #1: Add version-pinning TODO comment to manifest.json
- 🔵 #3: Add full repo URLs + test counts for SDK/MCP/CLI/runtime in CLAUDE.md

Security (GitGuardian alert):
- Removed Telegram bot token (8633739353:AA...) from template-molecule-dev
  pm/.env — replaced with ${TELEGRAM_BOT_TOKEN} placeholder
- Removed Claude OAuth token (sk-ant-oat01-...) from template-molecule-dev
  root .env — replaced with ${CLAUDE_CODE_OAUTH_TOKEN} placeholder
- Both tokens need immediate rotation by the operator

Tests: Platform middleware tests updated + all pass.
2026-04-16 05:05:49 -07:00
Molecule AI Backend Engineer
c35dfaf143 feat(channels): per-channel message budget with 429 enforcement (#368)
Add an optional channel_budget (INTEGER, nullable) to workspace_channels
via migration 024. When channel_budget IS NOT NULL and message_count has
reached the budget, the Send handler returns 429 {"error":"channel budget
exceeded"} and aborts before calling SendOutbound.

Implementation details:
- Single SELECT query reads both message_count and channel_budget in one
  round-trip (avoids TOCTOU window between read and write)
- Fail-open on DB error: transient failures log but don't block sends
- Early-return on budget hit is before SendOutbound so message_count
  cannot be incremented past the limit by a concurrent send that slips
  through the window (best-effort; atomic enforcement requires DB-level CAS)
- NULL channel_budget = unlimited (default, backward-compatible)

Migration is idempotent (ADD COLUMN IF NOT EXISTS). Down migration drops
the column cleanly.

Four sqlmock tests cover: at-limit → 429, above-limit → 429, NULL budget
passes through, under-limit passes through.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 11:17:14 +00:00
Molecule AI Backend Engineer
6358f26932 feat(channels): add Slack adapter with webhook URL validation (#384)
Implement SlackAdapter satisfying the ChannelAdapter interface:
- ValidateConfig: rejects any webhook_url that doesn't start with
  https://hooks.slack.com/ — returns "invalid Slack webhook URL" so
  the handler surfaces 400 {"error":"invalid config: invalid Slack webhook URL"}
- SendMessage: HTTP POST JSON {"text":"..."} to the webhook URL with a
  10s timeout; rejects invalid-prefix URLs at send time too (defence in depth)
- ParseWebhook: handles both slash-command (form-encoded) and Events API
  (JSON) payloads; no-ops on url_verification and non-message events
- StartPolling: returns nil immediately (Slack doesn't support polling via
  Incoming Webhooks)

Register "slack" in the adapter registry. Twelve unit tests cover
Type/DisplayName, happy-path validation, every bad-URL variant (wrong scheme,
wrong host, SSRF lookalike, empty string), empty webhook in SendMessage,
StartPolling nil return, and registry lookup/listing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 11:14:31 +00:00
Hongming Wang
d424bd947f chore: remove extracted directories, add manifest-driven Docker builds
Remove plugins/, workspace-configs-templates/, org-templates/ dirs (now
in standalone repos). Add manifest.json listing all 33 repos and
scripts/clone-manifest.sh to clone them. Both Dockerfiles now use the
manifest script instead of 33 hardcoded git-clone lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 04:13:29 -07:00
Molecule AI Backend Engineer
9b2539a042 fix(memories): add hard cap of 50 on recall results (#377)
Introduce `memoryRecallMaxLimit = 50` constant and honour the `?limit=N`
query parameter in Search. Values above 50 are silently clamped to 50;
absent or invalid values default to 50. The LIMIT clause is now a
parameterised argument (nextArg pattern) instead of a hardcoded literal.
Three sqlmock tests verify the cap, the explicit limit, and the default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 11:12:35 +00:00
Molecule AI Backend Engineer
b5dc285080 fix(security): strip Authorization + Cookie headers in canvas reverse proxy (closes #451)
The canvas proxy was forwarding all headers verbatim to the Next.js process.
Workspace bearer tokens sent by agents (e.g. during an A2A call that hit a
canvas-side route) could reach unvalidated Next.js handlers and be echoed back
to an attacker via an error page or a debug endpoint.

Fix: Director now calls Header.Del("Authorization") + Header.Del("Cookie")
before forwarding. Non-credential headers (Accept, X-Request-Id, etc.) are
unaffected — the strip is surgical.

Four unit tests added (strips Authorization, strips Cookie, forwards other
headers, strips both simultaneously).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 11:00:43 +00:00
Backend Engineer
f2b7357b60 fix(security): registry DB errors must not leak raw driver messages (closes #435)
The Register handler was serialising the raw Go error into the HTTP response:
  c.JSON(500, gin.H{"error": fmt.Sprintf("failed to register: %v", err)})

PostgreSQL errors wrapped by lib/pq contain table names, constraint names, and
driver-version strings — enough for a caller to fingerprint the schema and craft
targeted attacks. The error is already logged at full detail with Printf before
this line, so callers only need the generic message.

Fix: replace the Sprintf with a static "registration failed" string (same pattern
the heartbeat and update-card handlers already used).

New test: TestRegister_DBErrorResponseIsOpaque verifies the response body is the
opaque string and that "sql:", "pq:", and "connection" substrings are absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:34:35 +00:00
Backend Engineer
7e971222f4 fix(provisioner): rebuild_config flag on restart recovers from destroyed config volume (closes #239)
When a workspace container AND its /configs Docker volume are both destroyed,
the restart handler previously had no recovery path — findTemplateByName searched
only the top-level configsDir, which holds workspace-instance dirs (ws-{id[:12]}/),
not the role-named org-template source directories.

Fix: add `rebuild_config: true` to the POST /workspaces/:id/restart body struct.
When set, the handler falls back to searching configsDir/org-templates/ via the
existing findTemplateByName logic (which already handles name normalisation and
config.yaml name-field matching). The workspace can then self-recover with its own
bearer token — no admin intervention required.

New helper: resolveOrgTemplate(configsDir, wsName) — pure function, independently
tested (4 cases: hit-by-dir, hit-by-config-yaml, no org-templates dir, no match).

Usage:
  curl -X POST -H "Authorization: Bearer $(cat /configs/.auth_token)" \
       -d '{"rebuild_config": true}' \
       http://platform:8080/workspaces/$WORKSPACE_ID/restart

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 10:34:25 +00:00
Hongming Wang
3a2ea197a2 Merge pull request #433 from Molecule-AI/feat/externalize-prompts-phase4
feat(org-templates): Phase 4 — atomize each role to <role>/workspace.yaml
2026-04-16 03:19:43 -07:00
Hongming Wang
f5843eff4d Merge pull request #417 from Molecule-AI/feat/memory-checkpoint-reconciliation
feat(memory): optimistic-locking via if_match_version on workspace_memory writes
2026-04-16 03:18:09 -07:00
rabbitblood
665f7f6313 feat(org-templates): Phase 4 — atomize each role to <role>/workspace.yaml
Part 4 of 4 — terminal step of the org.yaml scalability refactor. Each
role in the molecule-dev template now owns its own workspace.yaml file,
colocated with the existing system-prompt.md / initial-prompt.md /
idle-prompt.md / schedules/*.md. Team files shrink to a leader's own
definition plus a list of !include refs.

## Platform change

`resolveYAMLIncludes` now uses a TWO-ROOT model:
- Path resolution is relative to the INCLUDING file's directory
  (natural sibling + cousin refs, C-include / Sass @import convention).
- Security bound is the ORIGINAL org root (`rootDir`), preserved across
  all recursion depths. Sibling-dir refs like `../my-role/workspace.yaml`
  from a team file are now allowed (they stay inside the org template);
  refs that escape the root still error.

Regression coverage: new `TestResolveYAMLIncludes_SiblingDirAccess`
reproduces the Phase 4 pattern (team file at `teams/x.yaml` referencing
`../<role>/workspace.yaml`) — fails without the fix, passes with.

## Template change

Atomized 15 child workspaces across 3 team files:
- `teams/research.yaml`: 58 → 30 lines; 3 children now !include refs
- `teams/dev.yaml`: 222 → 38 lines; 6 children now !include refs
- `teams/marketing.yaml`: 143 → 28 lines; 6 children now !include refs

Each role now has `<role>/workspace.yaml` colocated with its prompts.
Example `frontend-engineer/` directory:
  frontend-engineer/
  ├── workspace.yaml        (24 lines — name/role/tier/canvas/plugins/...)
  ├── system-prompt.md      (from earlier phases)
  ├── initial-prompt.md
  ├── idle-prompt.md
  └── (no schedules for this role — but if added, schedules/<slug>.md)

## File-size progression across all 4 phases

| State | org.yaml | total `.yaml` in tree |
|---|---:|---:|
| Before (main) | 1801 lines / 108 KB | 1801 / 108 KB (one file) |
| After Phase 1 (#389) | 1687 | 1687 / 101 KB |
| After Phase 2 (#390) | 676 | 676 / 35 KB |
| After Phase 3 (#393) | 114 | 683 (1 + 6 teams) / 33 KB |
| **After this PR** | **114** | **~698** (1 + 6 + 15 workspace) / 35 KB |

Aggregate size is flat — the decrease came from prompt externalization
in Phases 1/2; Phases 3/4 reorganize structure without adding content.
The win is readability and ownership:
- Every individual file fits on 1-2 screens.
- Adding a new role is now: create `<role>/` dir, add `workspace.yaml`
  + `system-prompt.md` + prompts, add ONE `!include` line to the team
  file. No touching of aggregated mega-YAML.
- Team files can be reviewed + merged independently.

## Tests

All 10 `TestResolveYAMLIncludes_*` tests pass, including the real-template
integration test (`TestResolveYAMLIncludes_RealMoleculeDev`) which now
walks org.yaml → teams/pm.yaml → teams/research.yaml → ../market-analyst/
workspace.yaml and validates the full 21-role tree unmarshals cleanly.

Plus all existing `TestResolvePromptRef` + `TestOrgYAML` + `TestInitialPrompt`
suites stay green.

## Ops followup

After merging all 4 phases and deploying, the `POST /org/import`
endpoint should produce a workspace tree byte-identical to the
pre-refactor state. Verify with:
  diff <(curl POST /org/import before) <(curl POST /org/import after)
or by spot-checking:
  - `/configs/config.yaml` bodies across all 21 workspaces
  - `workspace_schedules.prompt` row values

The externalization is lossless — YAML literal to file and back
recovers the same string modulo trailing-whitespace normalization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 03:09:56 -07:00