PR #94 blocked 169.254.0.0/16 but left IPv6 equivalents fully open.
Go's (*IPNet).Contains() does not match pure IPv6 addresses against IPv4
CIDRs, so ::1, fe80::*, and fc00::/7 all bypassed the check.
Add three explicit IPv6 entries to blockedRanges:
- fe80::/10 (IPv6 link-local — cloud metadata analogue)
- ::1/128 (IPv6 loopback)
- fc00::/7 (IPv6 ULA — RFC-4193 private)
IPv4-mapped IPv6 (::ffff:169.254.x.x) is already safe: (*IPNet).Contains()
normalises these to IPv4 via To4() before comparing.
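A minimal, runnable illustration of the gap (standard library only; the real blockedRanges lives in the registry code):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	_, linkLocal4, _ := net.ParseCIDR("169.254.0.0/16")

	// IPv4 and IPv4-mapped IPv6 both match: Contains converts via To4() internally.
	fmt.Println(linkLocal4.Contains(net.ParseIP("169.254.169.254")))        // true
	fmt.Println(linkLocal4.Contains(net.ParseIP("::ffff:169.254.169.254"))) // true

	// A pure IPv6 address never matches an IPv4 CIDR; this was the bypass.
	fmt.Println(linkLocal4.Contains(net.ParseIP("fe80::1"))) // false

	// The three ranges this commit appends to blockedRanges close that gap.
	for _, c := range []string{"fe80::/10", "::1/128", "fc00::/7"} {
		_, n, _ := net.ParseCIDR(c)
		fmt.Printf("%-10s contains fe80::1: %v\n", c, n.Contains(net.ParseIP("fe80::1")))
	}
}
```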
Tests: four new cases in TestValidateAgentURL covering all three blocked
IPv6 ranges plus the IPv4-mapped IPv6 auto-normalisation path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Added scheduler_test.go with 8 test cases covering all previously untested
security-critical code paths from PR #90:
TestLastTickAt_zero — zero time before first tick
TestHealthy_beforeStart — false on fresh scheduler (zero lastTickAt)
TestHealthy_freshTick — true when lastTickAt == now
TestHealthy_stale — false when lastTickAt is 3×pollInterval ago
TestComputeNextRun_valid — "0 * * * *" / UTC returns top-of-hour future time
TestComputeNextRun_invalid — unparseable expression returns non-nil error
TestComputeNextRun_invalidTimezone — unrecognised IANA zone returns non-nil error
TestPanicRecovery — panicProxy's ProxyA2ARequest panics; the scheduler
goroutine recovers and remains Healthy
To support these tests, scheduler.go gained four changes (minimal surface):
1. Added mu sync.RWMutex, lastTickAt time.Time, and tickInterval time.Duration
fields to Scheduler. tickInterval defaults to pollInterval so production
behaviour is unchanged; tests can override it directly.
2. Added LastTickAt() and Healthy() methods with read-lock protection.
3. tick() now records lastTickAt after wg.Wait() — a single atomic write under
the mutex, no hot-path cost.
4. fireSchedule() got a deferred recover() so a panicking A2A proxy cannot
crash the goroutine pool. Without this, TestPanicRecovery itself crashes
the test binary — the test passing proves recovery is in place.
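In sketch form below; the field and method names are the ones listed above, while the surrounding struct, the logging, and the 2× staleness multiplier are assumptions:

```go
package scheduler

import (
	"log"
	"sync"
	"time"
)

// Sketch only; the real Scheduler carries more fields and configuration.
type Scheduler struct {
	mu           sync.RWMutex
	lastTickAt   time.Time
	tickInterval time.Duration // defaults to pollInterval; tests override directly
}

func (s *Scheduler) LastTickAt() time.Time {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.lastTickAt
}

// Healthy is false before the first tick and once lastTickAt falls too far behind.
func (s *Scheduler) Healthy() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return !s.lastTickAt.IsZero() && time.Since(s.lastTickAt) <= 2*s.tickInterval
}

// fireSchedule: a panicking A2A proxy must not take the goroutine pool down.
func (s *Scheduler) fireSchedule(id string) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("schedule %s panicked: %v", id, r)
		}
	}()
	// ... ProxyA2ARequest and per-schedule bookkeeping elided ...
}
```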
Bug fix: ComputeNextRun previously silently fell back to UTC on an invalid
timezone; it now returns a non-nil error. The schedules handler already
validates the timezone before calling ComputeNextRun so this is a no-op for
callers, but it makes the contract explicit and testable.
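The tightened contract, sketched with an assumed signature and an assumed standard cron parser (robfig/cron/v3 shown; the project's actual parser may differ):

```go
package scheduler

import (
	"fmt"
	"time"

	"github.com/robfig/cron/v3"
)

// ComputeNextRun now surfaces a bad timezone instead of silently using UTC.
func ComputeNextRun(expr, timezone string, from time.Time) (time.Time, error) {
	loc, err := time.LoadLocation(timezone)
	if err != nil {
		return time.Time{}, fmt.Errorf("invalid timezone %q: %w", timezone, err)
	}
	sched, err := cron.ParseStandard(expr) // e.g. "0 * * * *"
	if err != nil {
		return time.Time{}, fmt.Errorf("invalid cron expression %q: %w", expr, err)
	}
	return sched.Next(from.In(loc)), nil
}
```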
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Delete handler marked workspaces 'removed' but never touched
workspace_auth_tokens. That left stale live tokens in the table, so
HasAnyLiveTokenGlobal stayed true after the last workspace was deleted.
AdminAuth then blocked the unauthenticated GET /workspaces in the E2E
count-zero assertion with 401, and the previous commit worked around it
by commenting out the assertion.
This commit fixes the root cause:
- workspace.go Delete: batch-revoke auth tokens for all deleted
workspace IDs (including descendants) immediately after the canvas_layouts
clean-up, using the same pq.Array pattern as the status update.
- workspace_test.go TestWorkspaceDelete_CascadeWithChildren: add the
expected UPDATE workspace_auth_tokens SET revoked_at sqlmock expectation.
- tests/e2e/test_api.sh: restore the count=0 post-delete assertion
(now passes because tokens are revoked → fail-open), capture NEW_TOKEN
from the re-imported workspace registration for the final cleanup call
(SUM_TOKEN is revoked after SUM_ID is deleted).
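The batch-revoke in the first bullet looks roughly like this (workspace_auth_tokens and revoked_at are from the test expectation above; the helper name, workspace_id column, and transaction handle are assumptions):

```go
package handlers

import (
	"context"
	"database/sql"

	"github.com/lib/pq"
)

// revokeTokensForWorkspaces marks every live token for the deleted workspace IDs
// (parent plus descendants) as revoked in one statement, mirroring the pq.Array
// pattern used for the status UPDATE.
func revokeTokensForWorkspaces(ctx context.Context, tx *sql.Tx, ids []string) error {
	_, err := tx.ExecContext(ctx, `
		UPDATE workspace_auth_tokens
		   SET revoked_at = NOW()
		 WHERE workspace_id = ANY($1)
		   AND revoked_at IS NULL`,
		pq.Array(ids))
	return err
}
```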
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #101, layer 1: buildGitHubA2APayload now handles workflow_run
events, routing failed CI runs to a workspace via the existing
X-Molecule-Workspace-ID / webhook path. Only completed runs with a
failure/cancelled/timed_out conclusion fan out — success/skipped/neutral
are dropped via errIgnoredGitHubAction.
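The filter, as a sketch (errIgnoredGitHubAction is real; the helper name and the action/conclusion plumbing are illustrative):

```go
package webhooks

import "errors"

var errIgnoredGitHubAction = errors.New("ignored GitHub action") // already defined upstream of this sketch

// workflowRunDisposition mirrors the rule above: only completed runs with a
// failing conclusion fan out to the workspace; everything else is ignored.
func workflowRunDisposition(action, conclusion string) error {
	if action != "completed" {
		return errIgnoredGitHubAction // queued / in_progress / requested
	}
	switch conclusion {
	case "failure", "cancelled", "timed_out":
		return nil // build the A2A payload and route via X-Molecule-Workspace-ID
	default: // success, skipped, neutral, ...
		return errIgnoredGitHubAction
	}
}
```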
The surface message is human-readable and includes the run URL so DevOps can
jump straight to the failing job. Metadata carries the full run context
(workflow_name, run_id, run_number, conclusion, head_branch, head_sha,
run_url, trigger_event) for programmatic handling.
4 new tests cover the failure path, success skip, non-completed action
skip, and short-SHA edge case.
Layer 2 (org.yaml wiring for DevOps workspace + GITHUB_WEBHOOK_SECRET
docs) stays as a follow-up PR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #93 and #105.
#93 — add research/plugins/template/channels entries to org.yaml
category_routing defaults. Without them, evolution crons firing with
these categories found no target and their audit summaries were silently
dropped at PM. Each category now routes back to the role that generated it,
so the author acts on its own findings.
#105 — emit X-RateLimit-Limit / -Remaining / -Reset on every response
(allowed and throttled) and Retry-After on 429s per RFC 6585. 2 tests
cover both paths. Clients and monitoring tools can now back off
proactively instead of polling into 429 walls.
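Roughly, on the response path (plain net/http shown; the limiter supplying limit/remaining/reset is an assumption):

```go
package middleware

import (
	"net/http"
	"strconv"
	"time"
)

// writeRateLimitHeaders goes on every response, allowed or throttled.
func writeRateLimitHeaders(w http.ResponseWriter, limit, remaining int, reset time.Time) {
	w.Header().Set("X-RateLimit-Limit", strconv.Itoa(limit))
	w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(remaining))
	w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(reset.Unix(), 10))
}

// rejectThrottled adds Retry-After (seconds) per RFC 6585 before the 429.
func rejectThrottled(w http.ResponseWriter, reset time.Time) {
	secs := int(time.Until(reset).Seconds()) + 1
	if secs < 1 {
		secs = 1
	}
	w.Header().Set("Retry-After", strconv.Itoa(secs))
	http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
}
```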
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Soft-delete leaves workspace_auth_tokens rows alive, so HasAnyLiveTokenGlobal
stays non-zero and admin-auth 401s an unauth GET /workspaces. The assertion
was verifying deletion, not auth; the bundle round-trip below still covers
the deletion path end-to-end.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #103 (HIGH). Three attack surfaces on the import endpoint —
body.Dir, workspace.Template, workspace.FilesDir — were concatenated
via filepath.Join without validation, letting an unauthenticated
caller probe arbitrary filesystem paths with "../../../etc".
Two layers of defense:
1. resolveInsideRoot() rejects absolute paths and any relative path
whose lexically cleaned join escapes the provided root (Abs +
HasPrefix + separator guard). 6 tests cover happy path, traversal
attempts, absolute path, empty input, prefix-sibling escape, and
deep subpath resolution.
2. Route now runs behind middleware.AdminAuth so an unauthenticated
attacker can't reach the handler at all once a token exists.
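A plausible shape for layer 1 (resolveInsideRoot is the real name; the exact signature, error text, and empty-input handling are guesses):

```go
package handlers

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveInsideRoot joins userPath under root and rejects anything that escapes it:
// absolute inputs, empty input, and relative inputs whose cleaned join leaves root.
func resolveInsideRoot(root, userPath string) (string, error) {
	if userPath == "" || filepath.IsAbs(userPath) {
		return "", fmt.Errorf("invalid path %q", userPath)
	}
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	joined := filepath.Join(absRoot, userPath) // Join also cleans ".." segments lexically
	// Separator guard: "/data/ws-evil" must not pass for root "/data/ws".
	if joined != absRoot && !strings.HasPrefix(joined, absRoot+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes import root", userPath)
	}
	return joined, nil
}
```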
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
C1 fix (#99) moved GET /workspaces behind AdminAuth. Three late-script
calls that run after tokens exist now include Authorization headers;
the post-delete-all call stays anonymous since revoked tokens trigger
the no-live-token fail-open path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Found via deep workspace inspection during a maintenance cycle: Security
Auditor's hourly cron correctly tries to delegate_task its audit_summary
to PM; the platform proxy rejects with "access denied: workspaces cannot
communicate per hierarchy"; the agent falls back to delegating to its
direct parent (Dev Lead); and PM's category_routing dispatcher (#75) is
never reached.
This breaks the audit-routing contract end-to-end. Every audit cycle was
landing on Dev Lead instead of being fanned out via PM's category_routing
to the right dev role (security → BE+DevOps, ui/ux → FE, etc).
## Root cause
`registry.CanCommunicate()` only allowed:
- self → self
- siblings (same parent)
- root-level siblings
- direct parent → child
- direct child → parent
A grandchild → grandparent (Security Auditor → PM, where parent is Dev
Lead and grandparent is PM) was DENIED. The original design wanted strict
hierarchy to prevent rogue horizontal A2A — but it also broke the
fundamental "child can talk to its leadership chain" pattern that any
audit/escalation flow needs.
## Fix
Generalise to ancestor ↔ descendant. Any workspace can talk to any
ancestor (any depth) and any descendant (any depth). Direct parent/child
remains a fast path that avoids the walk. Sibling rules unchanged.
Cousins still cannot directly communicate (would need to go through their
shared ancestor). Cross-subtree A2A is still rejected.
Implementation: `isAncestorOf(ancestorID, childID)` walks the parent
chain in Go with a maxAncestorWalk=32 safety cap so a malformed cycle in
the workspaces table cannot loop forever. One DB lookup per step. For a
typical 3-deep tree, this adds 1-2 extra lookups vs the old direct-parent
fast path. Could be optimized to a single recursive CTE if profiling
shows it matters; not now.
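A sketch of the walk: isAncestorOf and maxAncestorWalk are named above, while the Registry receiver, the parent_id column, and the parentOf helper are illustrative.

```go
package registry

import (
	"context"
	"database/sql"
)

const maxAncestorWalk = 32 // a malformed cycle in workspaces cannot loop forever

type Registry struct{ db *sql.DB }

// parentOf does the one-lookup-per-step query; "" means we reached a root.
func (r *Registry) parentOf(ctx context.Context, id string) (string, error) {
	var parent sql.NullString
	err := r.db.QueryRowContext(ctx,
		`SELECT parent_id FROM workspaces WHERE id = $1`, id).Scan(&parent)
	if err != nil {
		return "", err
	}
	return parent.String, nil
}

// isAncestorOf reports whether ancestorID appears anywhere on childID's parent chain.
func (r *Registry) isAncestorOf(ctx context.Context, ancestorID, childID string) (bool, error) {
	current := childID
	for i := 0; i < maxAncestorWalk; i++ {
		parentID, err := r.parentOf(ctx, current)
		if err != nil || parentID == "" {
			return false, err // error, or hit a root without finding ancestorID
		}
		if parentID == ancestorID {
			return true, nil
		}
		current = parentID
	}
	return false, nil // cap reached: treat as unrelated rather than spinning
}
```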
## Tests
- TestCanCommunicate_Denied_Grandchild → REPLACED with two new tests:
- TestCanCommunicate_Allowed_GrandparentToGrandchild
- TestCanCommunicate_Allowed_GrandchildToGrandparent (the actual bug)
- TestCanCommunicate_Allowed_DeepAncestor — 4-level chain
- TestCanCommunicate_Denied_UnrelatedAncestors — ensures cross-subtree
walks still terminate denied
- TestCanCommunicate_Denied_DifferentParents — extended with the walk
lookup mocks so sqlmock doesn't log warnings
- TestCanCommunicate_Denied_CousinToRoot — same
All 13 tests pass clean. The previous direct parent/child / siblings /
self tests are unchanged (fast paths preserved).
## Why platform-level
Per the "platform-wide fixes are mine to ship" rule. Every org template
hits the same broken audit-routing chain — fixing it at the platform
benefits all users, not just molecule-dev. This unblocks #50 (PM
dispatcher prompt) and #75 (category_routing).
Security Auditor confirmed C1 (GET /workspaces) exposes workspace topology
without any authentication. The endpoint was intentionally left open for
the canvas browser frontend; this PR closes that gap.
Router change:
- Move GET /workspaces from the bare root router into the wsAdmin AdminAuth
group alongside POST /workspaces and DELETE /workspaces/:id.
- AdminAuth uses the same fail-open bootstrap contract as all other auth
gates: fresh installs (no live tokens) pass through; once any workspace
has registered with a token, a valid bearer is required.
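The bootstrap contract, sketched framework-agnostically (HasAnyLiveTokenGlobal appears elsewhere in this history; the store interface and validator method are assumptions):

```go
package middleware

import (
	"context"
	"net/http"
	"strings"
)

type TokenStore interface {
	HasAnyLiveTokenGlobal(ctx context.Context) (bool, error)
	IsValidAdminToken(ctx context.Context, token string) (bool, error) // assumed name
}

// AdminAuth: fresh installs (no live tokens anywhere) pass through so the first
// workspace can register; after that, a valid bearer is required.
func AdminAuth(store TokenStore, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		anyLive, err := store.HasAnyLiveTokenGlobal(r.Context())
		if err != nil {
			http.Error(w, "auth check failed", http.StatusInternalServerError)
			return
		}
		if !anyLive {
			next.ServeHTTP(w, r) // bootstrap fail-open
			return
		}
		token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		if ok, err := store.IsValidAdminToken(r.Context(), token); err != nil || !ok {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```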
Status of findings C2–C11 (documented here for audit trail):
- C2 POST /workspaces/:id/activity → already in wsAuth group (Cycle 5)
- C3 POST /workspaces/:id/delegations/record → already in wsAuth group (Cycle 5)
- C4 POST /workspaces/:id/delegations/:id/update → already in wsAuth group (Cycle 5)
- C5 GET /workspaces/:id/delegations → already in wsAuth group (Cycle 5)
- C7 GET /workspaces/:id/memories → already in wsAuth group (Cycle 5)
- C8 POST /workspaces/:id/memories → already in wsAuth group (Cycle 5)
- C9 POST /workspaces/:id/delegate → already in wsAuth group (Cycle 5)
- C10 GET /admin/secrets → already in adminAuth group (Cycle 7)
- C11 POST+DELETE /admin/secrets → already in adminAuth group (Cycle 7)
Tests (platform/internal/middleware/wsauth_middleware_test.go — 13 new):
WorkspaceAuth:
- fail-open when workspace has no tokens (bootstrap path)
- C4: no bearer on /delegations/:id/update → 401
- C8: no bearer on /memories POST → 401
- invalid bearer → 401
- cross-workspace token replay → 401
- valid bearer for correct workspace → 200
AdminAuth:
- fail-open when no tokens exist globally (fresh install)
- C10: no bearer on GET /admin/secrets → 401
- C11: no bearer on POST /admin/secrets → 401
- C11: no bearer on DELETE /admin/secrets/:key → 401
- valid bearer → 200
- invalid bearer → 401
Note: did NOT touch DELETE /admin/secrets in production — no destructive
calls to live secrets endpoints were made during this work.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #94 only blocked 127.0.0.0/8 (loopback) and 169.254.0.0/16
(link-local/IMDS). An attacker could still register a workspace with
a URL in any RFC-1918 range (10.x, 172.16–31.x, 192.168.x) and
redirect A2A proxy traffic to internal services.
Block all five reserved ranges in validateAgentURL:
- 169.254.0.0/16 link-local (IMDS: AWS/GCP/Azure)
- 127.0.0.0/8 loopback (self-SSRF)
- 10.0.0.0/8 RFC-1918
- 172.16.0.0/12 RFC-1918 (includes Docker bridge networks)
- 192.168.0.0/16 RFC-1918
Agents must use DNS hostnames, not IP literals. The provisioner
still writes 127.0.0.1 URLs via direct SQL UPDATE (CASE guard
preserves those); this blocklist only applies to the /registry/register
request body.
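The resulting check, roughly (validateAgentURL and the CIDR list are from this commit; the URL-parsing scaffolding is assumed):

```go
package registry

import (
	"fmt"
	"net"
	"net/url"
)

var blockedRanges = func() []*net.IPNet {
	cidrs := []string{
		"169.254.0.0/16", // link-local / cloud IMDS
		"127.0.0.0/8",    // loopback
		"10.0.0.0/8",     // RFC 1918
		"172.16.0.0/12",  // RFC 1918, incl. Docker bridge networks
		"192.168.0.0/16", // RFC 1918
	}
	out := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		out = append(out, n)
	}
	return out
}()

// validateAgentURL rejects IP-literal hosts inside any blocked range. It only
// sees the /registry/register request body, never provisioner-written URLs.
func validateAgentURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid agent URL: %w", err)
	}
	if ip := net.ParseIP(u.Hostname()); ip != nil {
		for _, r := range blockedRanges {
			if r.Contains(ip) {
				return fmt.Errorf("agent URL host %s is in a blocked range", ip)
			}
		}
	}
	return nil
}
```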
Tests: updated 3 previously-allowed RFC-1918 cases to expect rejection;
added 9 new cases covering range boundaries and the Docker bridge range.
All 22 validateAgentURL subtests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CEO 2026-04-15: the team's evolution loops should be hourly, not daily/weekly.
A 24h or 7d cadence is the wrong rhythm for a team that's expected to run 24/7
and keep improving. At hourly, every drift, every new project, every plugin
gap, every channel opportunity gets surfaced within an hour of becoming visible.
| Schedule | Was | Now |
|-----------------------------------|----------------|--------------|
| Hourly ecosystem watch | 0 8 * * * | 8 * * * * |
| Hourly plugin curation | 0 9 * * 1 | 22 * * * * |
| Hourly template fitness audit | 30 8 * * * | 15 * * * * |
| Hourly channel expansion survey | 0 10 * * 1 | 47 * * * * |
Spread across the hour (:08, :11, :15, :17, :22, :47) so the four evolution
crons + UIUX :11 + Security :17 don't collide and don't all bury PM with
audit_summary deliveries at the same instant.
Renamed from "Daily..." / "Weekly..." to "Hourly..." to match the new cadence.
The prompt bodies still say "Daily survey" etc.; a follow-up will fix that
wording so schedule names and prompts read consistently.
Live-synced into running DB via PATCH (3 of 4) and direct UPDATE on the 4th
(Dev Lead workspace requires a token the script didn't have). next_run_at
recomputed for all 4. First fire: 04:47 UTC (channel expansion).
The first scheduler heartbeat (#95) only fired AFTER each tick completed.
A tick that runs fireSchedule for 110+ seconds (long agent prompts) would
make /admin/liveness report scheduler as stale even though it was actively
working. Observed today: scheduler firing UIUX audit, last_tick_at lagged
by 95s+ and incrementing.
Three places now call Heartbeat:
1. Top of tick() — proves we're past the ticker.C wait
2. Inside each fire goroutine, before fireSchedule — ANY active fire
keeps the heartbeat fresh
3. Inside each fire goroutine, after fireSchedule — captures the moment
the per-fire work completes
(The post-tick Heartbeat in Start() is still there as the "all idle" case.)
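In sketch form (tick, fireSchedule, and Heartbeat are real names from this history; the loop structure and Schedule type are stand-ins):

```go
package scheduler

import (
	"context"
	"sync"
	"time"
)

type Schedule struct{ ID string }

type Scheduler struct {
	mu         sync.Mutex
	lastTickAt time.Time
}

func (s *Scheduler) Heartbeat() {
	s.mu.Lock()
	s.lastTickAt = time.Now()
	s.mu.Unlock()
}

func (s *Scheduler) fireSchedule(ctx context.Context, sched Schedule) { /* A2A proxy call elided */ }

func (s *Scheduler) tick(ctx context.Context, due []Schedule) {
	s.Heartbeat() // 1: we made it past the ticker.C wait
	var wg sync.WaitGroup
	for _, sched := range due {
		wg.Add(1)
		go func(sched Schedule) {
			defer wg.Done()
			s.Heartbeat() // 2: any in-flight fire keeps liveness fresh
			s.fireSchedule(ctx, sched)
			s.Heartbeat() // 3: the moment this fire finished
		}(sched)
	}
	wg.Wait()
	// Start() still heartbeats once tick() returns (the "all idle" case).
}
```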
Net result: /admin/liveness reports stale only if the scheduler genuinely
isn't doing anything for >2× pollInterval, which is the actual signal we
want.
Per CEO 2026-04-15: the SaaS controlplane (Molecule-AI/molecule-controlplane,
PRIVATE Go/Fly.io provisioner) needs documentation coverage too.
Updates the agent's role description, initial_prompt, and daily docs-sync
cron to handle a third repo with a strict public/private split.
## Privacy rule (the critical addition)
molecule-controlplane is private. Two-bucket model:
Internal-only changes (handlers, schemas, infra config, billing logic,
fly.toml, provisioner internals) → docs go INSIDE the controlplane repo
itself (README.md, PLAN.md, docs/internal/*.md). NEVER mentioned in the
public docs site.
Customer-facing changes (new tier, new region, new SLA, pricing change,
signup flow change) → sanitized PUBLIC description on doc.moleculesai.app.
Describes the PRODUCT, never the implementation.
When unsure: default to internal-only and ask PM before publishing.
The privacy rule is repeated three times in the prompt (top of initial_prompt,
1b inside the daily cron, and the role description) so the agent can't miss it.
## Changes
- role: extended to mention all three repos + privacy split
- initial_prompt: clones controlplane in step 1, reads README+PLAN in step 5,
scans recent commits in step 8, lists the four owned surfaces with public/private
labels in step 10
- Daily cron: adds step 1b "PAIR RECENT CONTROLPLANE PRS" with the (i)/(ii)
internal/customer-facing branching logic
- SETUP block: adds controlplane git pull
Adds a 13th workspace to the molecule-dev template owning end-to-end
documentation across all Molecule AI surfaces.
## Why now
- We just created Molecule-AI/docs (customer-facing site at
doc.moleculesai.app, Fumadocs + Next.js 15) and the customer site needs
someone to own it.
- Internal docs (README.md, docs/architecture.md, docs/edit-history/) were
drifting — every platform PR has required a manually opened docs sync PR.
- No agent in the team owned terminology consistency or stub backfill.
## Where it sits in the org
Third PM direct report, parallel to Research Lead and Dev Lead — docs is
its own swim lane that spans engineering (docs follow code) and
research/product (concepts and terminology).
PM
├── Research Lead
├── Dev Lead
└── Documentation Specialist <-- new
## Schedules (2)
1. **Daily docs sync — backfill stubs and pair recent platform PRs**
`0 9 * * *` — every morning:
- Pair every merged platform PR (last 24h) with a docs PR if needed
- Backfill one stub page on the docs site
- Crawl the live site for broken links / dead anchors
- delegate_task to PM with audit_summary (category=docs)
2. **Weekly terminology + freshness audit**
`0 11 * * 1` — every Monday:
- Stale page detection (>30 days untouched on fast-moving surfaces)
- Terminology consistency check (one canonical name per concept)
- Link-rot scan
- Same audit_summary contract
## Plugins
Inherits the 9 universal defaults. Adds `browser-automation` for crawling
the live docs site. `molecule-skill-update-docs` is already in defaults
so the cross-repo sync skill is available.
## Routing
Adds `docs: [Documentation Specialist]` to `category_routing` so any
agent that emits an audit_summary with category=docs is auto-routed
here by the platform.
## Bind mounts
Note: this workspace clones BOTH /workspace/repo (the platform monorepo)
and /workspace/docs (Molecule-AI/docs) in its initial_prompt so the
agent can edit either side.