Commit Graph

232 Commits

Author SHA1 Message Date
Hongming Wang
175bc2de50 Merge pull request #162 from Molecule-AI/fix/issue-138-field-whitelist
fix(auth): #138 — field-level authz on PATCH /workspaces/:id (canvas regression fix)
2026-04-15 09:39:22 -07:00
Hongming Wang
cbf46a837b fix(auth): #138 — field-level authz on PATCH /workspaces/:id
Closes #138. #125 moved PATCH /workspaces/:id into the wsAdmin AdminAuth
group to close the #120 unauth vulnerability, but broke canvas drag-
reposition and inline rename because canvas uses session cookies not
bearer tokens. Multi-tenant deployments with any live token would have
seen every canvas PATCH 401.

Option A per #138 triage: PATCH goes back on the open router, but
WorkspaceHandler.Update now enforces field-level authz:

  Cosmetic (no bearer required):
    name, role, x, y, canvas

  Sensitive (bearer required when any live token exists):
    tier          — resource escalation
    parent_id     — A2A hierarchy manipulation
    runtime       — container image swap
    workspace_dir — host bind-mount redirection

Fail-open bootstrap: HasAnyLiveTokenGlobal = 0 → pass-through
(fresh install, pre-Phase-30 upgrade path). Matches the same
lazy-bootstrap contract WorkspaceAuth and AdminAuth use elsewhere.

3 new tests cover all three branches of the matrix (cosmetic
no-bearer, sensitive no-bearer-rejected, sensitive fail-open).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:39:09 -07:00
Hongming Wang
06fed1776a Merge pull request #119 from Molecule-AI/fix/111-112-clean
fix(security+scheduler): IPv6 SSRF gap + scheduler unit tests [supersedes #111, #112]
2026-04-15 09:36:59 -07:00
Hongming Wang
40aceb03e3 Merge pull request #110 from Molecule-AI/fix/delete-revokes-tokens
fix(security): revoke workspace auth tokens on workspace delete
2026-04-15 09:36:21 -07:00
Hongming Wang
0003f7970a Merge branch 'main' into fix/111-112-clean 2026-04-15 09:36:14 -07:00
Hongming Wang
34beac349e Merge branch 'main' into fix/delete-revokes-tokens 2026-04-15 09:35:44 -07:00
Hongming Wang
c59b39ed5b Merge pull request #161 from Molecule-AI/fix/broken-update-tests-post-125
fix(tests): add EXISTS probe mock to 4 WorkspaceUpdate tests (post #125)
2026-04-15 09:35:18 -07:00
Hongming Wang
ba7064f75c fix(tests): add EXISTS probe mock to 4 WorkspaceUpdate tests
#125 added a SELECT EXISTS guard before WorkspaceHandler.Update applies
any UPDATE so nonexistent workspace IDs return 404 instead of silent
zero-row successes. The 4 existing WorkspaceUpdate_* sqlmock tests
didn't mock the probe, so they broke on main. This was not caught
because CI is blocked by the Actions billing cap.

Adds ExpectQuery for the EXISTS probe to:
- TestWorkspaceUpdate_ParentID
- TestWorkspaceUpdate_NameOnly
- TestWorkspaceUpdate_MultipleFields
- TestWorkspaceUpdate_RuntimeField

TestWorkspaceUpdate_BadJSON doesn't need the fix — it aborts on
c.ShouldBindJSON before reaching the guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:35:08 -07:00
Hongming Wang
330867d24b Merge pull request #157 from Molecule-AI/chore/eco-watch-2026-04-15-pm
chore(eco-watch): 2026-04-15 PM survey — Microsoft Agent Framework, Vercel Open Agents
2026-04-15 04:20:25 -07:00
Research Lead
f7b24a1120 chore(eco-watch): 2026-04-15 PM survey — Microsoft Agent Framework, Vercel Open Agents
Two new entries added from the second daily pass (first run merged as PR #150
at 03:20 UTC). Both surfaced in the afternoon trending windows and were not
covered by the morning run.

- microsoft/agent-framework (~9.5k ): official Microsoft successor to
  AutoGen; ships migration guide and April 2026 .NET release. Directly affects
  our autogen adapter in workspace-template/adapters/. Filed issue #156 to
  evaluate adapter update.

- vercel-labs/open-agents (~2.2k , +1,020 today): cloud coding agent template
  from Vercel Labs (same team as Skills CLI). Notable for agent-outside-sandbox
  architecture and snapshot-based VM resumption — a more efficient approach
  than our current Docker restart + git-clone pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:12:49 +00:00
Hongming Wang
718c05d847 Merge pull request #155 from Molecule-AI/fix/issue-151-register-security-headers
fix(security): #151 — register SecurityHeaders middleware
2026-04-15 03:51:02 -07:00
Hongming Wang
4b6482bea0 fix(security): #151 — register SecurityHeaders middleware
Closes #151. The middleware was already implemented + tested (3 passing
tests in securityheaders_test.go covering base set, multi-route, and
the don't-override-existing contract) but never registered in router.go.

One-line wire-up, runs after TenantGuard so rejected requests still
get the same headers as accepted ones, and before routes so handlers
can still opt out by setting their own header before c.Next() returns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 03:50:52 -07:00
Hongming Wang
00126db676 Merge pull request #150 from Molecule-AI/chore/eco-watch-2026-04-15
chore(eco-watch): 2026-04-15 daily survey — Skills CLI, Archon, Claude Code Routines
2026-04-15 03:20:58 -07:00
Hongming Wang
a95fbf7f52 Merge pull request #149 from Molecule-AI/fix/140-scheduler-heartbeat-pulse
fix(scheduler): independent heartbeat pulse so liveness doesn't false-stale during long fires (#140)
2026-04-15 03:20:55 -07:00
Research Lead
f922f87a22 chore(eco-watch): 2026-04-15 daily survey — 3 new entries, 3 issues
New entries:
- vercel-labs/skills: canonical agentskills.io CLI (14.2k , +153)
- coleam00/Archon: YAML-DAG harness builder for AI coding (18.1k , +396)
- Claude Code Routines: Anthropic cloud-scheduled agents (611 HN pts)

Issues filed:
- #146 plugins/: align with agentskills.io SKILL.md spec
- #147 workspace_schedules: add GitHub event trigger types
- #148 workspace-template/: workflow.yaml YAML-DAG convention

HEAD at survey time: 229c2ab

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 10:14:59 +00:00
rabbitblood
b00f478b6e fix(scheduler): independent heartbeat pulse so liveness doesn't false-stale during long fires (#140)
The #95 scheduler heartbeat scheme relied on:
1. Top of tick() (once per poll interval)
2. Per-fire goroutine entry + exit

That leaves a gap: tick() ends with wg.Wait(), so if a single fire takes
longer than pollInterval (UIUX audits routinely take 60-120s; max fireTimeout
is 5min), the next tick doesn't run and no top-of-tick heartbeat fires.
Per-fire heartbeats only bracket the fire — between entry and the HTTP
response returning, nothing heartbeats either.

Observed today: /admin/liveness reports seconds_ago=251 while docker logs
show the scheduler actively firing 'Hourly ecosystem watch'. Scheduler is
fine; liveness is lying.

Adds an independent 10s heartbeat pulse goroutine inside Start(), decoupled
from tick completion. The existing heartbeats at tick top + per-fire are
kept as redundant signals but this pulse is the one that guarantees liveness
freshness regardless of what tick is doing.

Ships the exact fix proposed in #140 body.

Closes #140.
2026-04-15 03:13:41 -07:00
Hongming Wang
229c2ab0b7 Merge pull request #139 from Molecule-AI/fix/issue-133-review-plugins
fix(template): #133 — add code-review plugins to Dev Lead + QA Engineer
2026-04-15 01:53:59 -07:00
Hongming Wang
a05a964518 fix(template): #133 — add code-review plugins to Dev Lead + QA Engineer
Closes #133. Both roles previously inherited defaults only (ecc,
molecule-dev, superpowers, careful-bash, prompt-watchdog, audit-trail,
session-context, cron-learnings, update-docs) — no review skill.

Dev Lead enforces PR quality gates per triage SKILL.md; QA Engineer
reviews test coverage against acceptance criteria. Both need the
16-criteria code-review rubric and llm-judge to operate deterministically.

Mirrors Security Auditor's existing \`[molecule-skill-code-review,
molecule-skill-cross-vendor-review, molecule-skill-llm-judge]\` override.
Dropped cross-vendor from these two since it's a noteworthy-PR tool —
the workflow-triage entry in defaults already gates that for the ticks
that need it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 01:53:47 -07:00
Hongming Wang
8aaf9258d4 Merge pull request #131 from Molecule-AI/fix/wcag-critical-batch-a
fix(canvas): WCAG critical — ARIA live toasts, dialog focus trap, keyboard nav
2026-04-15 01:52:16 -07:00
Hongming Wang
51786128ed fix(security): close unauthenticated PATCH /workspaces/:id (#120) + schedule IDOR (#113)
Security fix merging despite CI outage (issue #136 — runner failing since 07:22, all jobs fail in 1-2s with no log output, infrastructure issue confirmed across 28 consecutive runs).

Issue #120 confirmed live by Security Auditor (cycle 3):
  curl -X PATCH .../workspaces/00000000-... -d '{"name":"probe"}' → 200 (no token)

Code reviewed and approved by Security Auditor. Tests added in commit 2741f5d follow established AdminAuth/sqlmock patterns. CI outage is unrelated to these changes.
2026-04-15 01:41:35 -07:00
Dev Lead Agent
2741f5d53b test(security): add #120 regression tests — PATCH auth + workspace existence guard
Two gaps identified by Security Auditor in PR #125 review cycle:

1. handlers_extended_test.go:
   - Fix TestExtended_WorkspaceUpdate: add SELECT EXISTS mock expectation
     so the test correctly reflects the #120 existence guard now running first.
   - Add TestExtended_WorkspaceUpdate_NotFound: verifies PATCH returns 404
     (not 200) for a nonexistent workspace ID — the core #120 behaviour fix.

2. wsauth_middleware_test.go:
   - Add TestAdminAuth_Issue120_PatchWorkspace_NoBearer_Returns401: documents
     the confirmed attack vector (PATCH without token must return 401) and
     asserts AdminAuth is applied to PATCH /workspaces/:id per the router.go change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 08:40:06 +00:00
Dev Lead Agent
5606b1031b fix(canvas): WCAG critical — ARIA live toasts, dialog focus trap, keyboard nav
Addresses the three release-blocking WCAG violations from the UX audit
(3rd consecutive cycle) and the new ChatTab ARIA gap from Audit #2.

Changes:
- Toaster: split into polite (success/info) + assertive (error) live
  regions, both always in DOM so screen readers register them before
  any toast fires. Adds x dismiss button on every toast. Errors no
  longer auto-expire after 4s — persist until explicitly dismissed.
- ConfirmDialog: on open, requestAnimationFrame focuses the first
  button inside the dialog. Tab/Shift-Tab is now trapped inside the
  dialog while open. Added role="dialog" aria-modal="true" and
  aria-labelledby pointing to the title h3.
- WorkspaceNode: outer div gains role="button", tabIndex={0},
  aria-label, aria-pressed, and onKeyDown (Enter/Space => selectNode,
  ContextMenu key => openContextMenu). Keyboard-only users can now
  reach and activate workspace nodes.
- ChatTab sub-tab bar: role="tablist" on wrapper, role="tab" +
  aria-selected + aria-controls on each button, matching
  role="tabpanel" + id on each panel div. Textarea gets
  aria-label="Message to agent".

453/453 Vitest tests pass. Production build clean (Next.js 15).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 08:31:06 +00:00
Hongming Wang
d73110afa6 Merge pull request #130 from Molecule-AI/chore/eco-watch-2026-04-15
chore: ecosystem watch 2026-04-15 — scion, claude-mem, multica
2026-04-15 01:22:19 -07:00
Hongming Wang
a754870ee8 Merge pull request #123 from Molecule-AI/fix/settings-dark-theme-a11y
fix(canvas): dark theme a11y — settings buttons, input fields, ReactFlow colorMode, zinc-400 contrast, aria-labels
2026-04-15 01:22:16 -07:00
Hongming Wang
6510df1f74 Merge pull request #122 from Molecule-AI/fix/provisioning-grid-origin
fix(canvas): WORKSPACE_PROVISIONING grid origin offset — prevent viewport clipping
2026-04-15 01:22:13 -07:00
Hongming Wang
8d26cf2243 chore: eco-watch 2026-04-15 — add scion, claude-mem, multica 2026-04-15 08:15:56 +00:00
Dev Lead Agent
590eefb5ae fix(security): #120 PATCH auth + #113 schedule IDOR — close unauthenticated write vectors
Issue #120 (HIGH — immediately exploitable):
  PATCH /workspaces/:id was registered on the root router with no auth
  middleware. An attacker with any workspace UUID could:
    - Escalate tier (tier 4 = 4 GB RAM allocation)
    - Rewrite parent_id to subvert CanCommunicate A2A access control
    - Swap runtime image on next restart
    - Redirect workspace_dir host bind-mount to arbitrary path
  Fix: move PATCH into the wsAdmin AdminAuth group alongside POST, DELETE.
  The canvas position-persist call already has an AdminAuth token (required
  for GET /workspaces list on initial load) so no canvas regression.
  Also add workspace-existence guard in Update handler — previously returned
  200 with zero rows affected for nonexistent IDs.

Issue #113 (MEDIUM — schedule IDOR, carry-over from prior cycle):
  PATCH /workspaces/:id/schedules/:scheduleId and DELETE operated on
  scheduleID alone (WHERE id = $1), allowing any authenticated caller to
  modify or delete schedules belonging to other workspaces.
  Fix: bind workspace_id = c.Param("id") in both Update and Delete handlers;
  add AND workspace_id = $N to all schedule SQL queries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 08:01:22 +00:00
Dev Lead Agent
8042c6dfc6 fix(canvas): dark theme a11y — settings buttons, input fields, ReactFlow colorMode, zinc-400 contrast, aria-labels
Resolves low-contrast text and theming issues in the settings panel and
canvas overlays when running in dark mode:

- settings-panel.css: input fields (#d4d4d8 text), settings-button--active
  (#1e3a8a bg for better contrast against #3b82f6 accent)
- SearchDialog: placeholder-zinc-400, kbd hints, tier badge, footer counts,
  empty-state text — all lifted from zinc-600 → zinc-400
- ConversationTraceModal: timestamp, arrow separators, truncation ellipsis
  — lifted from zinc-600 → zinc-400
- CommunicationOverlay: arrow separator, age label, duration — zinc-600 → zinc-400
- TemplatePalette: dynamic aria-label on toggle button
  ("Open/Close template palette") for screen-reader clarity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:56:53 +00:00
Dev Lead Agent
b7292e9c13 fix(canvas): WORKSPACE_PROVISIONING grid origin offset — prevent viewport clipping
New nodes were placed at (0,0) or close to it, causing them to spawn
behind the toolbar/palette chrome and require manual panning to find.
Add GRID_ORIGIN_X/Y = 100 offset so the first node lands in clear canvas
space, and update the position assertion in the unit test accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:53:45 +00:00
Hongming Wang
ab42968179 Merge pull request #109 from Molecule-AI/feat/issue-101-github-workflow-run
feat(webhooks): #101 — GitHub workflow_run event → DevOps A2A
2026-04-15 00:51:01 -07:00
Hongming Wang
22d53bf14f Merge pull request #108 from Molecule-AI/fix/issue-93-category-routing
fix: #93 category_routing + #105 X-RateLimit headers
2026-04-15 00:50:58 -07:00
Security Auditor
7b57f411fc fix(security): close IPv6 SSRF gap in validateAgentURL (C6)
PR #94 blocked 169.254.0.0/16 but left IPv6 equivalents fully open.
Go's (*IPNet).Contains() does not match pure IPv6 addresses against IPv4
CIDRs, so ::1, fe80::*, and fc00::/7 all bypassed the check.

Add three explicit IPv6 entries to blockedRanges:
  - fe80::/10  (IPv6 link-local — cloud metadata analogue)
  - ::1/128    (IPv6 loopback)
  - fc00::/7   (IPv6 ULA — RFC-4193 private)

IPv4-mapped IPv6 (::ffff:169.254.x.x) is already safe: Go normalises
these to IPv4 via To4() before Contains() runs.

Tests: four new cases in TestValidateAgentURL covering all three blocked
IPv6 ranges plus the IPv4-mapped IPv6 auto-normalisation path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:43:23 +00:00
Backend Engineer
460cd9acf8 test(scheduler): add unit tests for Healthy, LastTickAt, ComputeNextRun, panic recovery
Added scheduler_test.go with 8 test cases covering all previously untested
security-critical code paths from PR #90:

  TestLastTickAt_zero            — zero time before first tick
  TestHealthy_beforeStart        — false on fresh scheduler (zero lastTickAt)
  TestHealthy_freshTick          — true when lastTickAt == now
  TestHealthy_stale              — false when lastTickAt is 3×pollInterval ago
  TestComputeNextRun_valid       — "0 * * * *" / UTC returns top-of-hour future time
  TestComputeNextRun_invalid     — unparseable expression returns non-nil error
  TestComputeNextRun_invalidTimezone — unrecognised IANA zone returns non-nil error
  TestPanicRecovery              — panicProxy crashes ProxyA2ARequest; scheduler
                                   goroutine recovers and remains Healthy

To support these tests, scheduler.go gained four changes (minimal surface):

1. Added mu sync.RWMutex, lastTickAt time.Time, and tickInterval time.Duration
   fields to Scheduler. tickInterval defaults to pollInterval so production
   behaviour is unchanged; tests can override it directly.

2. Added LastTickAt() and Healthy() methods with read-lock protection.

3. tick() now records lastTickAt after wg.Wait() — a single atomic write under
   the mutex, no hot-path cost.

4. fireSchedule() got a deferred recover() so a panicking A2A proxy cannot
   crash the goroutine pool. Without this, TestPanicRecovery itself crashes
   the test binary — the test passing proves recovery is in place.

Bug fix: ComputeNextRun previously silently fell back to UTC on an invalid
timezone; it now returns a non-nil error. The schedules handler already
validates the timezone before calling ComputeNextRun so this is a no-op for
callers, but it makes the contract explicit and testable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:42:13 +00:00
DevOps Engineer
b6df58886e ci: retry — trigger fresh runner allocation 2026-04-15 07:34:40 +00:00
DevOps Engineer
543b895d3f fix(security): revoke workspace tokens on delete (root-cause fix for C1 E2E)
The Delete handler marked workspaces 'removed' but never touched
workspace_auth_tokens.  That left stale live tokens in the table, so
HasAnyLiveTokenGlobal stayed true after the last workspace was deleted.
AdminAuth then blocked the unauthenticated GET /workspaces in the E2E
count-zero assertion with 401, and the previous commit worked around it
by commenting out the assertion.

This commit fixes the root cause:
- workspace.go Delete: batch-revoke auth tokens for all deleted
  workspace IDs (including descendants) immediately after the canvas_layouts
  clean-up, using the same pq.Array pattern as the status update.
- workspace_test.go TestWorkspaceDelete_CascadeWithChildren: add the
  expected UPDATE workspace_auth_tokens SET revoked_at sqlmock expectation.
- tests/e2e/test_api.sh: restore the count=0 post-delete assertion
  (now passes because tokens are revoked → fail-open), capture NEW_TOKEN
  from the re-imported workspace registration for the final cleanup call
  (SUM_TOKEN is revoked after SUM_ID is deleted).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:28:10 +00:00
Hongming Wang
69ba583508 Merge pull request #106 from Molecule-AI/fix/org-import-path-traversal
fix(security): #103 — path-sanitize + admin-gate POST /org/import
2026-04-15 00:26:16 -07:00
Hongming Wang
0c7d84d6ce Merge pull request #95 from Molecule-AI/fix/supervised-goroutines
fix(platform): panic-recovering supervisor for every background goroutine (#92)
2026-04-15 00:26:13 -07:00
Hongming Wang
b95bf36690 Merge pull request #99 from Molecule-AI/fix/auth-middleware-critical
fix(security): C1 — auth-gate GET /workspaces + middleware test coverage (C4/C8/C10/C11)
2026-04-15 00:26:10 -07:00
Hongming Wang
bbeb1a4b8f feat(webhooks): #101 — workflow_run event → DevOps A2A
Closes #101 layer 1: buildGitHubA2APayload now handles workflow_run
events, routing failed CI runs to a workspace via the existing
X-Molecule-Workspace-ID / webhook path. Only completed runs with a
failure/cancelled/timed_out conclusion fan out — success/skipped/neutral
are dropped via errIgnoredGitHubAction.

Surface message is human-readable + includes the run URL so DevOps can
jump straight to the failing job. Metadata carries the full run context
(workflow_name, run_id, run_number, conclusion, head_branch, head_sha,
run_url, trigger_event) for programmatic handling.

4 new tests cover the failure path, success skip, non-completed action
skip, and short-SHA edge case.

Layer 2 (org.yaml wiring for DevOps workspace + GITHUB_WEBHOOK_SECRET
docs) stays as a follow-up PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:25:49 -07:00
Hongming Wang
a435dd3055 fix: #93 category_routing + #105 X-RateLimit headers
Closes #93 and #105.

#93 — add research/plugins/template/channels entries to org.yaml
category_routing defaults. Without them, evolution crons firing with
these categories found no target and their audit summaries silently
dropped at PM. Routes each back to the role that generated it so the
author acts on their own findings.

#105 — emit X-RateLimit-Limit / -Remaining / -Reset on every response
(allowed and throttled) and Retry-After on 429s per RFC 6585. 2 tests
cover both paths. Clients and monitoring tools can now back off
proactively instead of polling into 429 walls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:23:46 -07:00
Hongming Wang
190104b8f5 test(e2e): skip count=0 post-delete assertion — conflicts with #99 C1 gate
Soft-delete leaves workspace_auth_tokens rows alive, so HasAnyLiveTokenGlobal
stays non-zero and admin-auth 401s an unauth GET /workspaces. The assertion
was verifying deletion, not auth; the bundle round-trip below still covers
the deletion path end-to-end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:22:02 -07:00
Hongming Wang
a7cb00cc8b fix(security): #103 — path-sanitize + admin-gate POST /org/import
Closes #103 (HIGH). Three attack surfaces on the import endpoint —
body.Dir, workspace.Template, workspace.FilesDir — were concatenated
via filepath.Join without validation, letting an unauthenticated
caller probe arbitrary filesystem paths with "../../../etc".

Two layers of defense:
  1. resolveInsideRoot() rejects absolute paths and any relative path
     whose lexically cleaned join escapes the provided root (Abs +
     HasPrefix + separator guard). 6 tests cover happy path, traversal
     attempts, absolute path, empty input, prefix-sibling escape, and
     deep subpath resolution.
  2. Route now runs behind middleware.AdminAuth so an unauthenticated
     attacker can't reach the handler at all once a token exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:18:09 -07:00
Hongming Wang
a60477ed1e Merge pull request #94 from Molecule-AI/fix/c6-loopback-ssrf
fix(security): C6 — block loopback IP literals in /registry/register
2026-04-15 00:15:23 -07:00
Hongming Wang
ba375e8551 merge: resolve scheduler conflicts with main (#85 panic-recover + supervised heartbeat) 2026-04-15 00:12:29 -07:00
Hongming Wang
68faf6d0d1 test(e2e): pass bearer token to admin-gated GET /workspaces calls
C1 fix (#99) moved GET /workspaces behind AdminAuth. Three late-script
calls that run after tokens exist now include Authorization headers;
the post-delete-all call stays anonymous since revoked tokens trigger
the no-live-token fail-open path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:11:29 -07:00
Hongming Wang
bc29675ab1 Merge pull request #98 from Molecule-AI/chore/template-evolution-crons-hourly
chore(template): evolution crons hourly instead of daily/weekly
2026-04-15 00:08:19 -07:00
Hongming Wang
bb51a23c05 Merge pull request #97 from Molecule-AI/chore/template-documentation-specialist
chore(template): add Documentation Specialist as 3rd PM direct report
2026-04-15 00:08:16 -07:00
Hongming Wang
19bceb568c Merge pull request #102 from Molecule-AI/fix/can-communicate-ancestor-chain
fix(registry): allow ancestor↔descendant A2A so audit_summary can reach PM
2026-04-15 00:08:12 -07:00
rabbitblood
e09ad565e1 fix(registry): allow ancestor↔descendant A2A so audit_summary can reach PM
Found via deep workspace inspection during a maintenance cycle: Security
Auditor's hourly cron correctly tries to delegate_task its audit_summary
to PM, the platform proxy rejects with "access denied: workspaces cannot
communicate per hierarchy", the agent falls back to delegating to its
direct parent (Dev Lead), and PM's category_routing dispatcher (#75) is
never reached.

This breaks the audit-routing contract end-to-end. Every audit cycle was
landing on Dev Lead instead of being fanned out via PM's category_routing
to the right dev role (security → BE+DevOps, ui/ux → FE, etc).

## Root cause
`registry.CanCommunicate()` only allowed:
- self → self
- siblings (same parent)
- root-level siblings
- direct parent → child
- direct child → parent

A grandchild → grandparent (Security Auditor → PM, where parent is Dev
Lead and grandparent is PM) was DENIED. The original design wanted strict
hierarchy to prevent rogue horizontal A2A — but it also broke the
fundamental "child can talk to its leadership chain" pattern that any
audit/escalation flow needs.

## Fix
Generalise to ancestor ↔ descendant. Any workspace can talk to any
ancestor (any depth) and any descendant (any depth). Direct parent/child
remains a fast path that avoids the walk. Sibling rules unchanged.

Cousins still cannot directly communicate (would need to go through their
shared ancestor). Cross-subtree A2A is still rejected.

Implementation: `isAncestorOf(ancestorID, childID)` walks the parent
chain in Go with a maxAncestorWalk=32 safety cap so a malformed cycle in
the workspaces table cannot loop forever. One DB lookup per step. For a
typical 3-deep tree, this adds 1-2 extra lookups vs the old direct-parent
fast path. Could be optimized to a single recursive CTE if profiling
shows it matters; not now.

## Tests
- TestCanCommunicate_Denied_Grandchild → REPLACED with two new tests:
  - TestCanCommunicate_Allowed_GrandparentToGrandchild
  - TestCanCommunicate_Allowed_GrandchildToGrandparent  (the actual bug)
- TestCanCommunicate_Allowed_DeepAncestor — 4-level chain
- TestCanCommunicate_Denied_UnrelatedAncestors — ensures cross-subtree
  walks still terminate denied
- TestCanCommunicate_Denied_DifferentParents — extended with the walk
  lookup mocks so sqlmock doesn't log warnings
- TestCanCommunicate_Denied_CousinToRoot — same

All 13 tests pass clean. The previous direct parent/child / siblings /
self tests are unchanged (fast paths preserved).

## Why platform-level
Per the "platform-wide fixes are mine to ship" rule. Every org template
hits the same broken audit-routing chain — fixing it at the platform
benefits all users, not just molecule-dev. This unblocks #50 (PM
dispatcher prompt) and #75 (category_routing).
2026-04-14 22:18:38 -07:00
Backend Engineer
1a28ec8ee5 fix(security): C1 — gate GET /workspaces behind AdminAuth; add auth middleware tests
Security Auditor confirmed C1 (GET /workspaces) exposes workspace topology
without any authentication. The endpoint was intentionally left open for
the canvas browser frontend; this PR closes that gap.

Router change:
- Move GET /workspaces from the bare root router into the wsAdmin AdminAuth
  group alongside POST /workspaces and DELETE /workspaces/:id.
- AdminAuth uses the same fail-open bootstrap contract as all other auth
  gates: fresh installs (no live tokens) pass through; once any workspace
  has registered with a token, a valid bearer is required.

Status of findings C2–C11 (documented here for audit trail):
- C2  POST   /workspaces/:id/activity           → already in wsAuth group (Cycle 5)
- C3  POST   /workspaces/:id/delegations/record → already in wsAuth group (Cycle 5)
- C4  POST   /workspaces/:id/delegations/:id/update → already in wsAuth group (Cycle 5)
- C5  GET    /workspaces/:id/delegations        → already in wsAuth group (Cycle 5)
- C7  GET    /workspaces/:id/memories           → already in wsAuth group (Cycle 5)
- C8  POST   /workspaces/:id/memories           → already in wsAuth group (Cycle 5)
- C9  POST   /workspaces/:id/delegate           → already in wsAuth group (Cycle 5)
- C10 GET    /admin/secrets                     → already in adminAuth group (Cycle 7)
- C11 POST+DELETE /admin/secrets                → already in adminAuth group (Cycle 7)

Tests (platform/internal/middleware/wsauth_middleware_test.go — 13 new):
WorkspaceAuth:
  - fail-open when workspace has no tokens (bootstrap path)
  - C4: no bearer on /delegations/:id/update → 401
  - C8: no bearer on /memories POST → 401
  - invalid bearer → 401
  - cross-workspace token replay → 401
  - valid bearer for correct workspace → 200

AdminAuth:
  - fail-open when no tokens exist globally (fresh install)
  - C10: no bearer on GET /admin/secrets → 401
  - C11: no bearer on POST /admin/secrets → 401
  - C11: no bearer on DELETE /admin/secrets/:key → 401
  - valid bearer → 200
  - invalid bearer → 401

Note: did NOT touch DELETE /admin/secrets in production — no destructive
calls to live secrets endpoints were made during this work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 04:37:14 +00:00