Commit Graph

41 Commits

Author SHA1 Message Date
Hongming Wang
35705274c9 fix(code-review): CanvasOrBearer fall-through, scheduler short(), activity spoof log + 6 new tests
Addresses self-review of the 10-PR batch merged earlier this session.
Splits the follow-ups into this Go-side PR and a later Python/docs PR.

## Fixes

1. wsauth_middleware.go CanvasOrBearer — invalid bearer now hard-rejects
   with 401 instead of falling through to the Origin check. Previous code
   let an attacker with an expired token + matching Origin bypass auth.
   Empty bearer still falls through to the Origin path (the intended
   canvas path).

2. scheduler.go short() helper — extracts safe UUID prefix truncation.
   Pre-existing unsafe [:12] and [:8] slices would panic on workspace IDs
   shorter than the bound. #115's new skip path had the bounds check;
   the happy-path log lines did not. One helper, three call sites.

3. activity.go security-event log on source_id spoof — #209 added the
   403 but the attempt was invisible to any auditor cron. Stable
   greppable log line with authed_workspace, body_source_id, client IP.

## New tests

- TestShort_helper — bounds-safety regression guard for the helper
- TestRecordSkipped_writesSkippedStatus — #115 coverage gap, exercises
  UPDATE + INSERT via sqlmock
- TestRecordSkipped_shortWorkspaceIDNoPanic — short-ID crash regression
- TestActivityHandler_Report_SourceIDSpoofRejected — #209 403 path
- TestActivityHandler_Report_MatchingSourceIDAccepted — non-spoof path
- TestHistory_IncludesErrorDetail — #152 problem B coverage

go test -race ./... green locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:48:25 -07:00
Hongming Wang
25bbfd3bfc fix(security): C2 from #169 — reject spoofed source_id in activity.Report
Cherry-picks the one genuinely new fix from #169 after confirming the
rest of that PR is already covered on main (C1/C3/C5 by wsAuth group,
C6 by #94+#119 SSRF blocklist, C4 ownership by existing WHERE filter).

Pre-existing middleware (WorkspaceAuth on /workspaces/:id/* sub-routes)
proves the caller owns the :id path param. But the body field
source_id was never validated — a workspace authenticated for its own
/activity endpoint could still attribute logs to a different workspace
by setting source_id=<foreign UUID>. Rejected with 403 now.

No schema change, no new middleware. 4-line handler delta. Closes the
only real gap in #169; #169 itself will be closed as superseded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:15:08 -07:00
Hongming Wang
ce88a396da fix(scheduler): #152 problem B — persist and surface cron error_detail
Closes #152 problem B (schedule history API drops error detail).

Two tiny changes:

1. scheduler.fireSchedule now writes lastError into activity_logs.error_detail
   when inserting the cron_run row. Previously the column was left NULL even
   on failure because the INSERT didn't include it.

2. schedules.History SELECT now reads error_detail and includes it in the
   JSON response under error_detail. Frontend + audit cron can now display
   "why did this run fail" instead of just "status=error".

No schema change — activity_logs.error_detail already exists from
migration 009. This just starts using the column.

Problem A of #152 (Research Lead ecosystem-watch 50% error rate on its
own) is a separate ops investigation and stays open.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:11:16 -07:00
Hongming Wang
175bc2de50 Merge pull request #162 from Molecule-AI/fix/issue-138-field-whitelist
fix(auth): #138 — field-level authz on PATCH /workspaces/:id (canvas regression fix)
2026-04-15 09:39:22 -07:00
Hongming Wang
cbf46a837b fix(auth): #138 — field-level authz on PATCH /workspaces/:id
Closes #138. #125 moved PATCH /workspaces/:id into the wsAdmin AdminAuth
group to close the #120 unauth vulnerability, but broke canvas drag-
reposition and inline rename because canvas uses session cookies not
bearer tokens. Multi-tenant deployments with any live token would have
seen every canvas PATCH 401.

Option A per #138 triage: PATCH goes back on the open router, but
WorkspaceHandler.Update now enforces field-level authz:

  Cosmetic (no bearer required):
    name, role, x, y, canvas

  Sensitive (bearer required when any live token exists):
    tier          — resource escalation
    parent_id     — A2A hierarchy manipulation
    runtime       — container image swap
    workspace_dir — host bind-mount redirection

Fail-open bootstrap: HasAnyLiveTokenGlobal = 0 → pass-through
(fresh install, pre-Phase-30 upgrade path). Matches the same
lazy-bootstrap contract WorkspaceAuth and AdminAuth use elsewhere.

3 new tests cover all three branches of the matrix (cosmetic
no-bearer, sensitive no-bearer-rejected, sensitive fail-open).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:39:09 -07:00
Hongming Wang
06fed1776a Merge pull request #119 from Molecule-AI/fix/111-112-clean
fix(security+scheduler): IPv6 SSRF gap + scheduler unit tests [supersedes #111, #112]
2026-04-15 09:36:59 -07:00
Hongming Wang
0003f7970a Merge branch 'main' into fix/111-112-clean 2026-04-15 09:36:14 -07:00
Hongming Wang
34beac349e Merge branch 'main' into fix/delete-revokes-tokens 2026-04-15 09:35:44 -07:00
Hongming Wang
ba7064f75c fix(tests): add EXISTS probe mock to 4 WorkspaceUpdate tests
#125 added a SELECT EXISTS guard before WorkspaceHandler.Update applies
any UPDATE so nonexistent workspace IDs return 404 instead of silent
zero-row successes. The 4 existing WorkspaceUpdate_* sqlmock tests
didn't mock the probe, so they broke on main. This was not caught
because CI is blocked by the Actions billing cap.

Adds ExpectQuery for the EXISTS probe to:
- TestWorkspaceUpdate_ParentID
- TestWorkspaceUpdate_NameOnly
- TestWorkspaceUpdate_MultipleFields
- TestWorkspaceUpdate_RuntimeField

TestWorkspaceUpdate_BadJSON doesn't need the fix — it aborts on
c.ShouldBindJSON before reaching the guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:35:08 -07:00
Hongming Wang
51786128ed fix(security): close unauthenticated PATCH /workspaces/:id (#120) + schedule IDOR (#113)
Security fix merging despite CI outage (issue #136 — runner failing since 07:22, all jobs fail in 1-2s with no log output, infrastructure issue confirmed across 28 consecutive runs).

Issue #120 confirmed live by Security Auditor (cycle 3):
  curl -X PATCH .../workspaces/00000000-... -d '{"name":"probe"}' → 200 (no token)

Code reviewed and approved by Security Auditor. Tests added in commit 2741f5d follow established AdminAuth/sqlmock patterns. CI outage is unrelated to these changes.
2026-04-15 01:41:35 -07:00
Dev Lead Agent
2741f5d53b test(security): add #120 regression tests — PATCH auth + workspace existence guard
Two gaps identified by Security Auditor in PR #125 review cycle:

1. handlers_extended_test.go:
   - Fix TestExtended_WorkspaceUpdate: add SELECT EXISTS mock expectation
     so the test correctly reflects the #120 existence guard now running first.
   - Add TestExtended_WorkspaceUpdate_NotFound: verifies PATCH returns 404
     (not 200) for a nonexistent workspace ID — the core #120 behaviour fix.

2. wsauth_middleware_test.go:
   - Add TestAdminAuth_Issue120_PatchWorkspace_NoBearer_Returns401: documents
     the confirmed attack vector (PATCH without token must return 401) and
     asserts AdminAuth is applied to PATCH /workspaces/:id per the router.go change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 08:40:06 +00:00
Dev Lead Agent
590eefb5ae fix(security): #120 PATCH auth + #113 schedule IDOR — close unauthenticated write vectors
Issue #120 (HIGH — immediately exploitable):
  PATCH /workspaces/:id was registered on the root router with no auth
  middleware. An attacker with any workspace UUID could:
    - Escalate tier (tier 4 = 4 GB RAM allocation)
    - Rewrite parent_id to subvert CanCommunicate A2A access control
    - Swap runtime image on next restart
    - Redirect workspace_dir host bind-mount to arbitrary path
  Fix: move PATCH into the wsAdmin AdminAuth group alongside POST, DELETE.
  The canvas position-persist call already has an AdminAuth token (required
  for GET /workspaces list on initial load) so no canvas regression.
  Also add workspace-existence guard in Update handler — previously returned
  200 with zero rows affected for nonexistent IDs.

Issue #113 (MEDIUM — schedule IDOR, carry-over from prior cycle):
  PATCH /workspaces/:id/schedules/:scheduleId and DELETE operated on
  scheduleID alone (WHERE id = $1), allowing any authenticated caller to
  modify or delete schedules belonging to other workspaces.
  Fix: bind workspace_id = c.Param("id") in both Update and Delete handlers;
  add AND workspace_id = $N to all schedule SQL queries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 08:01:22 +00:00
Hongming Wang
ab42968179 Merge pull request #109 from Molecule-AI/feat/issue-101-github-workflow-run
feat(webhooks): #101 — GitHub workflow_run event → DevOps A2A
2026-04-15 00:51:01 -07:00
Security Auditor
7b57f411fc fix(security): close IPv6 SSRF gap in validateAgentURL (C6)
PR #94 blocked 169.254.0.0/16 but left IPv6 equivalents fully open.
Go's (*IPNet).Contains() does not match pure IPv6 addresses against IPv4
CIDRs, so ::1, fe80::*, and fc00::/7 all bypassed the check.

Add three explicit IPv6 entries to blockedRanges:
  - fe80::/10  (IPv6 link-local — cloud metadata analogue)
  - ::1/128    (IPv6 loopback)
  - fc00::/7   (IPv6 ULA — RFC-4193 private)

IPv4-mapped IPv6 (::ffff:169.254.x.x) is already safe: Go normalises
these to IPv4 via To4() before Contains() runs.

Tests: four new cases in TestValidateAgentURL covering all three blocked
IPv6 ranges plus the IPv4-mapped IPv6 auto-normalisation path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:43:23 +00:00
DevOps Engineer
543b895d3f fix(security): revoke workspace tokens on delete (root-cause fix for C1 E2E)
The Delete handler marked workspaces 'removed' but never touched
workspace_auth_tokens.  That left stale live tokens in the table, so
HasAnyLiveTokenGlobal stayed true after the last workspace was deleted.
AdminAuth then blocked the unauthenticated GET /workspaces in the E2E
count-zero assertion with 401, and the previous commit worked around it
by commenting out the assertion.

This commit fixes the root cause:
- workspace.go Delete: batch-revoke auth tokens for all deleted
  workspace IDs (including descendants) immediately after the canvas_layouts
  clean-up, using the same pq.Array pattern as the status update.
- workspace_test.go TestWorkspaceDelete_CascadeWithChildren: add the
  expected UPDATE workspace_auth_tokens SET revoked_at sqlmock expectation.
- tests/e2e/test_api.sh: restore the count=0 post-delete assertion
  (now passes because tokens are revoked → fail-open), capture NEW_TOKEN
  from the re-imported workspace registration for the final cleanup call
  (SUM_TOKEN is revoked after SUM_ID is deleted).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:28:10 +00:00
Hongming Wang
bbeb1a4b8f feat(webhooks): #101 — workflow_run event → DevOps A2A
Closes #101 layer 1: buildGitHubA2APayload now handles workflow_run
events, routing failed CI runs to a workspace via the existing
X-Molecule-Workspace-ID / webhook path. Only completed runs with a
failure/cancelled/timed_out conclusion fan out — success/skipped/neutral
are dropped via errIgnoredGitHubAction.

Surface message is human-readable + includes the run URL so DevOps can
jump straight to the failing job. Metadata carries the full run context
(workflow_name, run_id, run_number, conclusion, head_branch, head_sha,
run_url, trigger_event) for programmatic handling.

4 new tests cover the failure path, success skip, non-completed action
skip, and short-SHA edge case.

Layer 2 (org.yaml wiring for DevOps workspace + GITHUB_WEBHOOK_SECRET
docs) stays as a follow-up PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:25:49 -07:00
Hongming Wang
a7cb00cc8b fix(security): #103 — path-sanitize + admin-gate POST /org/import
Closes #103 (HIGH). Three attack surfaces on the import endpoint —
body.Dir, workspace.Template, workspace.FilesDir — were concatenated
via filepath.Join without validation, letting an unauthenticated
caller probe arbitrary filesystem paths with "../../../etc".

Two layers of defense:
  1. resolveInsideRoot() rejects absolute paths and any relative path
     whose lexically cleaned join escapes the provided root (Abs +
     HasPrefix + separator guard). 6 tests cover happy path, traversal
     attempts, absolute path, empty input, prefix-sibling escape, and
     deep subpath resolution.
  2. Route now runs behind middleware.AdminAuth so an unauthenticated
     attacker can't reach the handler at all once a token exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:18:09 -07:00
Hongming Wang
a60477ed1e Merge pull request #94 from Molecule-AI/fix/c6-loopback-ssrf
fix(security): C6 — block loopback IP literals in /registry/register
2026-04-15 00:15:23 -07:00
Backend Engineer
96253ca8ca fix(security): C6 — extend SSRF blocklist to RFC-1918 private ranges
PR #94 only blocked 127.0.0.0/8 (loopback) and 169.254.0.0/16
(link-local/IMDS). An attacker could still register a workspace with
a URL in any RFC-1918 range (10.x, 172.16–31.x, 192.168.x) and
redirect A2A proxy traffic to internal services.

Block all five reserved ranges in validateAgentURL:
  - 169.254.0.0/16  link-local (IMDS: AWS/GCP/Azure)
  - 127.0.0.0/8     loopback (self-SSRF)
  - 10.0.0.0/8      RFC-1918
  - 172.16.0.0/12   RFC-1918 (includes Docker bridge networks)
  - 192.168.0.0/16  RFC-1918

Agents must use DNS hostnames, not IP literals. The provisioner
still writes 127.0.0.1 URLs via direct SQL UPDATE (CASE guard
preserves those); this blocklist only applies to the /registry/register
request body.

Tests: updated 3 previously-allowed RFC-1918 cases to expect rejection;
added 9 new cases covering range boundaries and the Docker bridge range.
All 22 validateAgentURL subtests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 04:35:05 +00:00
Backend Engineer
602dcb6283 fix(security): C6 — block loopback IP literals in /registry/register
A workspace that self-registers with a 127.0.0.x URL on first INSERT
could redirect A2A proxy traffic back to the platform itself (SSRF).
The previous fix only blocked 169.254.0.0/16 (cloud metadata).

Add 127.0.0.0/8 to validateAgentURL's blocklist. RFC-1918 private
ranges (10.x, 172.16.x, 192.168.x) remain allowed — Docker container
networking depends on them.

Safe because the provisioner writes 127.0.0.1 URLs via direct SQL
UPDATE, not through /registry/register, so the UPSERT CASE that
preserves provisioner URLs is unaffected. Local-dev agents can still
register using "localhost" by name (hostname, not IP literal).

Tests: removed "valid localhost http" case (now correctly rejected),
added "valid localhost name" + three loopback-block assertions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 03:34:14 +00:00
Hongming Wang
3ddd0cffbf Merge pull request #76 from Molecule-AI/fix/issue-24-schedules-db-authoritative
fix(org): DB-authoritative schedules; org/import is additive on template rows (#24)
2026-04-14 14:40:54 -07:00
Hongming Wang
b15e30ccde fix(schedules): backfill legacy rows to 'template' + extract import SQL const
Addresses code-review warnings on PR #76:
- Migration 022 now backfills pre-existing workspace_schedules rows to
  source='template' before flipping NOT NULL + DEFAULT 'runtime'. Legacy
  rows (all seeded via org/import historically) stay refreshable on
  re-import. Down migration drops the CHECK constraint too.
- Extracted the import UPSERT into const orgImportScheduleSQL so the shape
  test asserts against the const directly instead of file-scraping org.go.
  Removed the os.ReadFile helper.
- scheduleResponse.Source gets json:\",omitempty\" so old clients that
  predate the migration don't see an empty string they can't explain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:30:22 -07:00
Hongming Wang
c47898568c fix(org): use yaml.Marshal for category_routing + newline-guard block appends
Addresses code-review warnings on PR #75:
- renderCategoryRoutingYAML now builds yaml.Node + yaml.Marshal, escaping
  YAML-reserved chars in role names correctly (was JSON-as-YAML, fragile on
  unicode line separators).
- New appendYAMLBlock helper guarantees a newline boundary when concatenating
  YAML fragments into config.yaml (category_routing + initial_prompt both
  used to risk merging into the previous line).
- Fixed struct comment (replace-per-key, not UNION).
- Added TestCategoryRouting_EscapesYAMLSpecials and TestAppendYAMLBlock_NewlineGuard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:28:22 -07:00
Hongming Wang
2e9fb51ff9 fix(org): DB-authoritative schedules; org/import is additive on template rows (#24)
Resolves #24 per CEO direction.

DB is source of truth for workspace_schedules. POST /org/import becomes
idempotent — only touches rows it owns (source='template'); runtime-added
schedules (Canvas / API) are preserved across re-imports.

- Migration 022: adds source TEXT NOT NULL DEFAULT 'runtime' CHECK in
  ('template','runtime'); unique index on (workspace_id, name) so the
  org/import upsert can use ON CONFLICT.
- org.go: schedule INSERT becomes
    INSERT ... 'template' ON CONFLICT (workspace_id, name) DO UPDATE
      SET ... WHERE workspace_schedules.source='template'.
  Never DELETEs.
- schedules.go: runtime POST writes 'runtime' explicitly; List handler
  surfaces the source field on the response so Canvas can render badges.
- 3 new unit tests assert source='runtime' default for runtime CRUD,
  the SQL shape contract for org/import (additive + idempotent +
  runtime-preserving + never-DELETE), and List response surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:09:44 -07:00
Hongming Wang
d4140ee244 feat(platform): generic category_routing replaces hardcoded audit dispatch (#51)
Add a category_routing block to org.yaml schema (defaults + per-workspace,
UNION semantics with per-key replace). The merged routing table is rendered
into each workspace's config.yaml at import time.

PM's system prompt loses the hardcoded security/ui/infra → role mapping
from PR #50; instead it reads category_routing from /configs/config.yaml
and delegates to whatever roles the org template lists for the incoming
audit-summary's category. Future org templates ship their own routing
without prompt churn.

Tests: 4 new TestCategoryRouting_* cases covering YAML parse, UNION+drop
semantics, deterministic config.yaml render, and empty-map handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:06:47 -07:00
Hongming Wang
eea64f06ec fix(org): per-workspace plugins UNION with defaults; '!' prefix opts out (#68)
Per-workspace `plugins:` now UNIONS with `defaults.plugins` instead of
replacing. A leading `!` or `-` on a per-workspace entry opts a default
out. Backward-compatible: re-listing defaults still dedupes to the same
list.

Refactored the inline REPLACE logic into a pure helper `mergePlugins`
in org.go so it's unit-testable. Five TestPlugins_* cases added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 13:21:23 -07:00
Hongming Wang
3c7c65ffd3 Merge pull request #64 from Molecule-AI/fix/issue-15-refresh-oauth-on-restart
fix(secrets): auto-refresh global_secrets on workspace restart (#15)
2026-04-14 12:49:19 -07:00
Hongming Wang
a36047f3d8 feat(platform): inject restart context system message (#19 Layer 1)
After a workspace restart (HTTP /restart or programmatic RestartByID) and
re-registration, the platform sends a synthetic A2A message/send to the
workspace containing:
- restart timestamp
- previous session end timestamp + human duration
- env-var keys now available (keys only — never values)

The message is rendered in the format proposed in #19 and marked with
metadata.kind=restart_context so agents can detect and handle it
specifically if they choose.

Skip path: if the workspace doesn't re-register within 30s, log and drop.
The Restart HTTP response is unaffected by delivery success.

Layer 2 (user-defined restart_prompt via config.yaml / org.yaml) is
deferred — tracked as a separate follow-up issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:41:01 -07:00
Hongming Wang
1b432f6ffd fix(secrets): auto-restart workspaces on global secret change (#15)
Global secrets (e.g. CLAUDE_CODE_OAUTH_TOKEN) are injected as container env
vars at Start() time. Until now, rotating one only propagated to a workspace
on the next full restart-from-zero, which manual ops had to drive via a
`POST /workspaces/:id/restart` loop. Tier-3 Claude Code agents hit the
stale-token path first and surfaced as 401s inside the SDK.

Restart-time re-read of global_secrets + workspace_secrets was already
correct in `provisionWorkspaceOpts` — the missing piece was the trigger.
SetGlobal / DeleteGlobal now enqueue RestartByID for every non-paused,
non-removed, non-external workspace that does NOT shadow the key with a
workspace-level override. Matches the existing behaviour of workspace-scoped
`Set` / `Delete`.

Adds two sqlmock-backed tests exercising both branches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:39:00 -07:00
Hongming Wang
4ff65b82c7 fix(provisioner): preserve Claude session directory across restart (#12)
Resolves #12. The claude-code SDK stores conversations in
/root/.claude/sessions/ and Postgres tracks current_session_id, but the
container filesystem was recreated on every restart — next agent message
failed with "No conversation found with session ID: <uuid>".

Add a per-workspace named Docker volume (ws-<id>-claude-sessions) mounted
read-write at /root/.claude/sessions. Gated by runtime=claude-code so
other runtimes don't pay for a path they don't use. Volume is cleaned up
in RemoveVolume alongside the config volume.

Two opt-outs discard the volume before restart for a fresh session:
  - env WORKSPACE_RESET_SESSION=1 on the container
  - POST /workspaces/:id/restart?reset=true (or {"reset": true} body)

Plumbed via new ResetClaudeSession field on WorkspaceConfig +
provisionWorkspaceOpts helper so the flag stays request-scoped (not
persisted on CreateWorkspacePayload).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:45:30 -07:00
Hongming Wang
496dee8e13 feat(platform): GET /admin/workspaces/:id/test-token for E2E (#6)
Adds a gated admin endpoint that mints a fresh workspace bearer token on
demand, eliminating the register-race currently used by
test_comprehensive_e2e.sh (PR #5 follow-up).

- New handler admin_test_token.go: returns 404 unless MOLECULE_ENV != production
  or MOLECULE_ENABLE_TEST_TOKENS=1. Hides route existence in prod (404 not 403).
- Mints via wsauth.IssueToken; logs at INFO without the token itself.
- Verifies workspace exists before minting (missing -> 404, never 500).
- Tests cover prod-hidden, enable-flag-overrides-prod, missing workspace,
  and happy-path + token-validates round trip.
- tests/e2e/_lib.sh gains e2e_mint_test_token helper for downstream adoption.
- CLAUDE.md updated with route + env vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 09:35:26 -07:00
Hongming Wang
602f3ef685 fix(provisioner): stop rogue config-missing restart loop (#17)
Resolves #17.

Part A: scripts/cleanup-rogue-workspaces.sh deletes workspaces whose id
or name starts with known test placeholder prefixes (aaaaaaaa-, etc.)
and force-removes the paired Docker container. Documented in
tests/README.md.

Part B: add a pre-flight check in provisionWorkspace() — when neither a
template path nor in-memory configFiles supplies config.yaml, probe the
existing named volume via a throwaway alpine container. If the volume
lacks config.yaml, mark the workspace status='failed' with a clear
last_sample_error instead of handing it to Docker's unless-stopped
restart policy (which otherwise loops forever on FileNotFoundError).

New pure helper provisioner.ValidateConfigSource + unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:32:58 -07:00
Dev Lead Agent
85be574e4d fix(security): C18 register ownership check, C20 DELETE auth gate
C18 — Workspace URL hijacking (CRITICAL, CONFIRMED LIVE):
POST /registry/register now calls requireWorkspaceToken() before
persisting anything. If the workspace has any live auth tokens, the
caller must supply a valid Bearer token matching that workspace ID.
First registration (no tokens yet) passes through — token is issued
at end of this function (unchanged bootstrap contract). Mirrors the
same pattern already applied to /registry/heartbeat and
/registry/update-card. Attacker POC — overwriting Backend Engineer URL
to http://attacker.example.com:9999/steal — now returns 401.

C20 — Unauthenticated workspace deletion (CRITICAL, CONFIRMED LIVE):
DELETE /workspaces/:id moved from bare router into AdminAuth group.
Any valid workspace bearer token grants access (same fail-open
bootstrap contract as /settings/secrets). Mass-deletion attack chain
(C19 list → C20 delete all) requires auth for the DELETE step.
POST /workspaces (create) also moved to AdminAuth to prevent
unauthenticated workspace creation.

C19 (GET /workspaces topology exposure) deferred — canvas browser
has no bearer token; fix requires canvas service-token refactor.

Tests: 2 new registry tests — C18 bootstrap (no tokens, passes
through and issues token), C18 hijack blocked (has tokens, no
bearer → 401).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 07:38:53 +00:00
Dev Lead Agent
48ba0a1332 fix(security): block SSRF via registry URL validation (C6)
POST /registry/register accepted any URL string and persisted it as
the workspace's A2A endpoint — an attacker could register a workspace
with url=http://169.254.169.254/latest/meta-data/ and cause the platform
to proxy requests to the cloud metadata service when proxying A2A traffic.

Fix: validateAgentURL() helper rejects:
  - empty URL
  - non-http/https schemes (file://, ftp://, etc.)
  - 169.254.0.0/16 link-local IPs (AWS/GCP/Azure IMDS endpoints)
Allows RFC-1918 private ranges (Docker networking uses 172.16-31.x.x).

Adds 12 unit tests covering valid Docker-internal URLs and all SSRF vectors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 06:37:37 +00:00
Hongming Wang
2871409f3d Merge pull request #31 from Molecule-AI/fix/security-cycle5-auth
fix(security): Cycle 5+6 — workspace auth middleware blocks all 16 open criticals
2026-04-13 23:22:10 -07:00
Dev Lead Agent
6c78962a33 fix(security): Cycle 5 — auth middleware, injection hardening, skill sandbox
Fix A — platform/internal/middleware/wsauth_middleware.go (NEW):
  WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on
  ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as
  secrets.Values: workspaces with no live token are grandfathered through.
  Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously.

Fix A — platform/internal/router/router.go:
  Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id)
  and /a2a remain on root router; all other /workspaces/:id/* sub-routes
  moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)).
  CORS AllowHeaders updated to include Authorization so browser/agent callers
  can send the bearer token cross-origin.

Fix B — workspace-template/heartbeat.py:
  _check_delegations(): validate source_id == self.workspace_id before
  accepting a delegation result. Attacker-crafted records with a foreign
  source_id are silently skipped with a WARNING log (injection attempt).
  trigger_msg no longer embeds raw response_preview text; references
  delegation_id + status only — removes the prompt-injection vector.

Fix C — workspace-template/skill_loader/loader.py:
  load_skill_tools(): before exec_module(), verify script is within
  scripts_dir (path traversal guard) and temporarily scrub sensitive env
  vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY,
  WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore
  in finally block. Defence-in-depth even if /plugins auth gate is bypassed.

Fix D — platform/internal/handlers/socket.go:
  HandleConnect(): agent connections (X-Workspace-ID present) validated via
  wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade.
  Canvas clients (no X-Workspace-ID) remain unauthenticated.

Fix D — workspace-template/events.py:
  PlatformEventSubscriber._connect(): include platform_auth bearer token in
  WebSocket upgrade headers alongside X-Workspace-ID.

Fix E — workspace-template/executor_helpers.py:
  recall_memories() and commit_memory() now pass platform_auth bearer token
  in Authorization header so WorkspaceAuth middleware allows access.

Fix F — workspace-template/a2a_client.py:
  send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300,
  write=30, pool=30). Resolves H2 flagged across 5 consecutive audits.

Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to
assert new source_id validation behaviour and allow Authorization header).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 04:44:42 +00:00
Hongming Wang
b773276ba5 refactor(platform): split 981-line plugins.go into per-domain modules
Pure mechanical split — no behavior changes. Groups the PluginsHandler
surface area by responsibility so each file stays focused and readable.

Before: plugins.go — 981 lines, 32 funcs
After:
  plugins.go                   — 194  (struct, constructor, shared helpers)
  plugins_sources.go           —  14  (ListSources)
  plugins_listing.go           — 174  (ListRegistry, ListInstalled,
                                       ListAvailableForWorkspace,
                                       CheckRuntimeCompatibility)
  plugins_install.go           — 276  (Install, Uninstall, Download handlers)
  plugins_install_pipeline.go  — 368  (resolveAndStage, deliverToContainer,
                                       copy/stream tar, CLAUDE.md marker
                                       stripping, dirSize, httpErr,
                                       installRequest/stageResult,
                                       install-layer consts + envx caps)

plugins_test.go (1365 lines) untouched — tests pass unchanged.
go build, go vet, and go test -race ./internal/handlers/... all clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:01:59 -07:00
Hongming Wang
d751420679 test: 100% coverage of extracted helpers + ConfirmDialog singleButton
Follow-up to the quality-fixes-pass2 code review.

## Go: direct unit tests for PR #5 extracted helpers (~47 new tests)

a2a_proxy_test.go:
- resolveAgentURL: cache hit, cache-miss DB hit, not-found, null-URL,
  docker-rewrite guard
- dispatchA2A: build error, canvas timeout, agent timeout, success
- handleA2ADispatchError: context deadline, generic error, build error
- maybeMarkContainerDead: nil-provisioner, runtime=external short-circuits
- logA2AFailure, logA2ASuccess: activity_logs row content + status

delegation_test.go:
- bindDelegateRequest: valid / malformed / bad-UUID
- lookupIdempotentDelegation: no-key / no-match / failed-row-deleted / existing-pending
- insertDelegationRow: insertOK / insertHandledByIdempotent /
  insertTrackingUnavailable
- insertDelegationOutcome: zero-value is insertOutcomeUnknown sentinel

discovery_test.go:
- discoverWorkspacePeer: online / not-found / access-denied + 2 edges
- writeExternalWorkspaceURL: 3 cases
- discoverHostPeer: smoke test documents the unreachable-by-design path

activity_test.go:
- parseSessionSearchParams: defaults + custom limit/offset/q
- buildSessionSearchQuery: no-filters + with-query shapes
- scanSessionSearchRows: empty / single / multiple rows

Package coverage: 56.1% → 57.6%. Every helper extracted in PR #5 is
now at or near 100% line coverage (see PR notes for the 4 remaining
gaps, all blocked on provisioner interface mockability).

## Defensive enum zero-value fix

insertDelegationOutcome now starts with insertOutcomeUnknown=0 as a
sentinel so an un-initialized variable can't silently read as
"success". insertOK, insertHandledByIdempotent, insertTrackingUnavailable
shift to 1/2/3. No caller changes needed.

## Canvas: ConfirmDialog.singleButton test (5 cases)

canvas/src/components/__tests__/ConfirmDialog.test.tsx covers:
- default render (both buttons)
- singleButton hides Cancel
- singleButton: Escape still fires onCancel
- singleButton: backdrop-click still fires onCancel
- singleButton: onConfirm fires on click

vitest total: 352 → 357, all passing.

## Docstring clarity

ConfirmDialog.tsx: expanded singleButton prop comment to explicitly
instruct callers to pass the same handler for onConfirm/onCancel when
using it as an info toast (matches TemplatePalette usage).

## ErrorBoundary clipboard observability

.catch(() => {}) silently swallowed rejections. Now:
.catch((e) => console.warn("clipboard write failed:", e))
so permission-denied / insecure-context failures surface in the console.

## Verification

- go build ./... clean
- go vet ./... clean
- go test -race ./internal/... — all pass
- canvas npm run build — clean
- canvas npm test -- --run — 357/357 pass
- tests/e2e/test_api.sh — 46/62 pass; all 16 failures are pre-existing
  (token-auth enforcement + stale test workspaces + missing Docker
  network). None involve handlers touched in PR #5.
- Manual: platform + canvas running locally, title=Molecule AI,
  /workspaces returns [], /health returns ok. Identified + killed a
  stale Next.js server from the old Starfire-AgentTeam repo that was
  serving the old brand on IPv4 port 3000.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:08:33 -07:00
Hongming Wang
232766d0da chore: address follow-up code review — named enum, singleButton, tests
Post-review fixes on top of the quality-pass-2 branch.

1. delegation.go: replaced insertDelegationRow's (bool, bool) return
   with a typed insertDelegationOutcome enum (insertOK /
   insertHandledByIdempotent / insertTrackingUnavailable). Eliminates
   the positional-boolean decoding the caller had to do. Internal, no
   behavior change.

2. ConfirmDialog.tsx: added singleButton prop. When true, hides the
   Cancel button for single-action info toasts (Esc still dismisses
   via onCancel). TemplatePalette's import notice uses it.

3. ErrorBoundary.tsx: fixed the floating clipboard promise. Added
   .catch(() => {}) so a rejected writeText (permission denied,
   insecure context) doesn't surface as unhandled rejection.

4. a2a_proxy_test.go: added 5 direct unit tests for
   normalizeA2APayload (invalid JSON, wraps-bare, preserves-existing-
   id, preserves-existing-messageId, missing-method). Fills the unit-
   test gap for the helper extracted in the last pass.

Verification:
- go test -race ./internal/handlers/... passes (incl. 5 new tests)
- go build ./... clean
- canvas npm run build clean
- canvas npm test -- --run -> 352/352

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:45:05 -07:00
Hongming Wang
789f568bef chore: quality pass — native dialogs, env sync, Go handler splits
Three parallel cleanups driven by the second code-review pass.

## Native dialogs → ConfirmDialog (7 sites)

Violated the standing feedback_no_native_dialogs rule.

- ChannelsTab: confirm() → ConfirmDialog danger variant with pendingDelete state
- ScheduleTab: window.confirm() → ConfirmDialog danger
- ChatTab: confirm("Restart...") → ConfirmDialog warning (restart is recoverable)
- TemplatePalette: two alert() sites collapsed into a single notice state +
  ConfirmDialog as OK-only info toast
- ErrorBoundary: dropped both window.alert calls entirely. Clipboard-copy
  click is self-evident; console.error already captures the fallback.

## .env.example ↔ Go env var sync

Added 11 previously-undocumented env vars grouped into 6 new sections:

- Platform: PLATFORM_URL, MOLECULE_URL, WORKSPACE_DIR, MOLECULE_ENV
- CORS / rate limiting: CORS_ORIGINS, RATE_LIMIT
- Activity retention: ACTIVITY_RETENTION_DAYS, ACTIVITY_CLEANUP_INTERVAL_HOURS
- Container detection: MOLECULE_IN_DOCKER (moved to dedup)
- Observability: AWARENESS_URL
- Webhooks: GITHUB_WEBHOOK_SECRET
- CLI: MOLECLI_URL

All 21 distinct os.Getenv / envx.* keys (excluding HOME) now documented.
Zero orphans in the other direction.

## Go handler function splits (4 funcs, pure refactor)

No behavior change; same tests pass.

| Function                  | Before | After | Helpers                                                       |
|---------------------------|-------:|------:|---------------------------------------------------------------|
| proxyA2ARequest           |    257 |    56 | resolveAgentURL, normalizeA2APayload, dispatchA2A,            |
|                           |        |       | handleA2ADispatchError, maybeMarkContainerDead,               |
|                           |        |       | logA2AFailure, logA2ASuccess                                  |
| Delegate                  |    127 |    60 | bindDelegateRequest, lookupIdempotentDelegation,              |
|                           |        |       | insertDelegationRow                                           |
| Discover                  |    125 |    40 | discoverWorkspacePeer, writeExternalWorkspaceURL,             |
|                           |        |       | discoverHostPeer                                              |
| SessionSearch             |    109 |    24 | parseSessionSearchParams, buildSessionSearchQuery,            |
|                           |        |       | scanSessionSearchRows                                         |

Preserved exact error semantics, log.Printf calls, status codes, and
response shapes. Introduced a proxyDispatchBuildError sentinel in
a2a_proxy so the orchestrator can distinguish "couldn't build the
request" from "Do() failed" without changing existing branches.

## Verification

- go build ./... clean
- go vet ./... clean
- go test -race ./internal/... — all pass
- canvas npm run build — clean
- canvas npm test -- --run — 352/352 pass
- grep window.confirm|window.alert|window.prompt in canvas/src — 0 matches
- every platform os.Getenv key present in .env.example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:36:30 -07:00
Hongming Wang
24fec62d7f initial commit — Molecule AI platform
Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1)
with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo.

Brand: Starfire → Molecule AI.
Slug: starfire / agent-molecule → molecule.
Env vars: STARFIRE_* → MOLECULE_*.
Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform.
Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent.
DB: agentmolecule → molecule.

History truncated; see public repo for prior commits and contributor
attribution. Verified green: go test -race ./... (platform), pytest
(workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:55:37 -07:00