molecule-core

Author	SHA1	Message	Date
Hongming Wang	bf4a0bc87d	Merge pull request #161 from Molecule-AI/fix/broken-update-tests-post-125 fix(tests): add EXISTS probe mock to 4 WorkspaceUpdate tests (post #125)	2026-04-15 09:35:18 -07:00
Hongming Wang	0f5ab7a2c9	fix(tests): add EXISTS probe mock to 4 WorkspaceUpdate tests #125 added a SELECT EXISTS guard before WorkspaceHandler.Update applies any UPDATE so nonexistent workspace IDs return 404 instead of silent zero-row successes. The 4 existing WorkspaceUpdate_* sqlmock tests didn't mock the probe, so they broke on main. This was not caught because CI is blocked by the Actions billing cap. Adds ExpectQuery for the EXISTS probe to: - TestWorkspaceUpdate_ParentID - TestWorkspaceUpdate_NameOnly - TestWorkspaceUpdate_MultipleFields - TestWorkspaceUpdate_RuntimeField TestWorkspaceUpdate_BadJSON doesn't need the fix — it aborts on c.ShouldBindJSON before reaching the guard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 09:35:08 -07:00
Hongming Wang	dafe8274d2	Merge pull request #157 from Molecule-AI/chore/eco-watch-2026-04-15-pm chore(eco-watch): 2026-04-15 PM survey — Microsoft Agent Framework, Vercel Open Agents	2026-04-15 04:20:25 -07:00
Research Lead	c660797fb3	chore(eco-watch): 2026-04-15 PM survey — Microsoft Agent Framework, Vercel Open Agents Two new entries added from the second daily pass (first run merged as PR #150 at 03:20 UTC). Both surfaced in the afternoon trending windows and were not covered by the morning run. - microsoft/agent-framework (~9.5k ⭐): official Microsoft successor to AutoGen; ships migration guide and April 2026 .NET release. Directly affects our autogen adapter in workspace-template/adapters/. Filed issue #156 to evaluate adapter update. - vercel-labs/open-agents (~2.2k ⭐, +1,020 today): cloud coding agent template from Vercel Labs (same team as Skills CLI). Notable for agent-outside-sandbox architecture and snapshot-based VM resumption — a more efficient approach than our current Docker restart + git-clone pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 11:12:49 +00:00
Hongming Wang	3d6ad16a8f	Merge pull request #155 from Molecule-AI/fix/issue-151-register-security-headers fix(security): #151 — register SecurityHeaders middleware	2026-04-15 03:51:02 -07:00
Hongming Wang	30d2d268b5	fix(security): #151 — register SecurityHeaders middleware Closes #151. The middleware was already implemented + tested (3 passing tests in securityheaders_test.go covering base set, multi-route, and the don't-override-existing contract) but never registered in router.go. One-line wire-up, runs after TenantGuard so rejected requests still get the same headers as accepted ones, and before routes so handlers can still opt out by setting their own header before c.Next() returns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 03:50:52 -07:00
Hongming Wang	a004f52778	Merge pull request #150 from Molecule-AI/chore/eco-watch-2026-04-15 chore(eco-watch): 2026-04-15 daily survey — Skills CLI, Archon, Claude Code Routines	2026-04-15 03:20:58 -07:00
Hongming Wang	a426890d92	Merge pull request #149 from Molecule-AI/fix/140-scheduler-heartbeat-pulse fix(scheduler): independent heartbeat pulse so liveness doesn't false-stale during long fires (#140)	2026-04-15 03:20:55 -07:00
Research Lead	d761f99fe0	chore(eco-watch): 2026-04-15 daily survey — 3 new entries, 3 issues New entries: - vercel-labs/skills: canonical agentskills.io CLI (14.2k ⭐, +153) - coleam00/Archon: YAML-DAG harness builder for AI coding (18.1k ⭐, +396) - Claude Code Routines: Anthropic cloud-scheduled agents (611 HN pts) Issues filed: - #146 plugins/: align with agentskills.io SKILL.md spec - #147 workspace_schedules: add GitHub event trigger types - #148 workspace-template/: workflow.yaml YAML-DAG convention HEAD at survey time: `bed2f2f` Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 10:14:59 +00:00
rabbitblood	3e13b727f7	fix(scheduler): independent heartbeat pulse so liveness doesn't false-stale during long fires (#140) The #95 scheduler heartbeat scheme relied on: 1. Top of tick() (once per poll interval) 2. Per-fire goroutine entry + exit That leaves a gap: tick() ends with wg.Wait(), so if a single fire takes longer than pollInterval (UIUX audits routinely take 60-120s; max fireTimeout is 5min), the next tick doesn't run and no top-of-tick heartbeat fires. Per-fire heartbeats only bracket the fire — between entry and the HTTP response returning, nothing heartbeats either. Observed today: /admin/liveness reports seconds_ago=251 while docker logs show the scheduler actively firing 'Hourly ecosystem watch'. Scheduler is fine; liveness is lying. Adds an independent 10s heartbeat pulse goroutine inside Start(), decoupled from tick completion. The existing heartbeats at tick top + per-fire are kept as redundant signals but this pulse is the one that guarantees liveness freshness regardless of what tick is doing. Ships the exact fix proposed in #140 body. Closes #140.	2026-04-15 03:13:41 -07:00
Hongming Wang	bed2f2f78d	Merge pull request #139 from Molecule-AI/fix/issue-133-review-plugins fix(template): #133 — add code-review plugins to Dev Lead + QA Engineer	2026-04-15 01:53:59 -07:00
Hongming Wang	2af943b51d	fix(template): #133 — add code-review plugins to Dev Lead + QA Engineer Closes #133. Both roles previously inherited defaults only (ecc, molecule-dev, superpowers, careful-bash, prompt-watchdog, audit-trail, session-context, cron-learnings, update-docs) — no review skill. Dev Lead enforces PR quality gates per triage SKILL.md; QA Engineer reviews test coverage against acceptance criteria. Both need the 16-criteria code-review rubric and llm-judge to operate deterministically. Mirrors Security Auditor's existing `[molecule-skill-code-review, molecule-skill-cross-vendor-review, molecule-skill-llm-judge]` override. Dropped cross-vendor from these two since it's a noteworthy-PR tool — the workflow-triage entry in defaults already gates that for the ticks that need it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 01:53:47 -07:00
Hongming Wang	e32dd9994f	Merge pull request #131 from Molecule-AI/fix/wcag-critical-batch-a fix(canvas): WCAG critical — ARIA live toasts, dialog focus trap, keyboard nav	2026-04-15 01:52:16 -07:00
Hongming Wang	55827baafa	fix(security): close unauthenticated PATCH /workspaces/:id (#120) + schedule IDOR (#113) Security fix merging despite CI outage (issue #136 — runner failing since 07:22, all jobs fail in 1-2s with no log output, infrastructure issue confirmed across 28 consecutive runs). Issue #120 confirmed live by Security Auditor (cycle 3): curl -X PATCH .../workspaces/00000000-... -d '{"name":"probe"}' → 200 (no token) Code reviewed and approved by Security Auditor. Tests added in commit `76cb7c3` follow established AdminAuth/sqlmock patterns. CI outage is unrelated to these changes.	2026-04-15 01:41:35 -07:00
Dev Lead Agent	76cb7c3760	test(security): add #120 regression tests — PATCH auth + workspace existence guard Two gaps identified by Security Auditor in PR #125 review cycle: 1. handlers_extended_test.go: - Fix TestExtended_WorkspaceUpdate: add SELECT EXISTS mock expectation so the test correctly reflects the #120 existence guard now running first. - Add TestExtended_WorkspaceUpdate_NotFound: verifies PATCH returns 404 (not 200) for a nonexistent workspace ID — the core #120 behaviour fix. 2. wsauth_middleware_test.go: - Add TestAdminAuth_Issue120_PatchWorkspace_NoBearer_Returns401: documents the confirmed attack vector (PATCH without token must return 401) and asserts AdminAuth is applied to PATCH /workspaces/:id per the router.go change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 08:40:06 +00:00
Dev Lead Agent	cf8db07020	fix(canvas): WCAG critical — ARIA live toasts, dialog focus trap, keyboard nav Addresses the three release-blocking WCAG violations from the UX audit (3rd consecutive cycle) and the new ChatTab ARIA gap from Audit #2. Changes: - Toaster: split into polite (success/info) + assertive (error) live regions, both always in DOM so screen readers register them before any toast fires. Adds x dismiss button on every toast. Errors no longer auto-expire after 4s — persist until explicitly dismissed. - ConfirmDialog: on open, requestAnimationFrame focuses the first button inside the dialog. Tab/Shift-Tab is now trapped inside the dialog while open. Added role="dialog" aria-modal="true" and aria-labelledby pointing to the title h3. - WorkspaceNode: outer div gains role="button", tabIndex={0}, aria-label, aria-pressed, and onKeyDown (Enter/Space => selectNode, ContextMenu key => openContextMenu). Keyboard-only users can now reach and activate workspace nodes. - ChatTab sub-tab bar: role="tablist" on wrapper, role="tab" + aria-selected + aria-controls on each button, matching role="tabpanel" + id on each panel div. Textarea gets aria-label="Message to agent". 453/453 Vitest tests pass. Production build clean (Next.js 15). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 08:31:06 +00:00
Hongming Wang	4a65c72860	Merge pull request #130 from Molecule-AI/chore/eco-watch-2026-04-15 chore: ecosystem watch 2026-04-15 — scion, claude-mem, multica	2026-04-15 01:22:19 -07:00
Hongming Wang	5d2777bbcf	Merge pull request #123 from Molecule-AI/fix/settings-dark-theme-a11y fix(canvas): dark theme a11y — settings buttons, input fields, ReactFlow colorMode, zinc-400 contrast, aria-labels	2026-04-15 01:22:16 -07:00
Hongming Wang	a44cd0156a	Merge pull request #122 from Molecule-AI/fix/provisioning-grid-origin fix(canvas): WORKSPACE_PROVISIONING grid origin offset — prevent viewport clipping	2026-04-15 01:22:13 -07:00
Hongming Wang	a7e9d0b824	chore: eco-watch 2026-04-15 — add scion, claude-mem, multica	2026-04-15 08:15:56 +00:00
Dev Lead Agent	3705377a6c	fix(security): #120 PATCH auth + #113 schedule IDOR — close unauthenticated write vectors Issue #120 (HIGH — immediately exploitable): PATCH /workspaces/:id was registered on the root router with no auth middleware. An attacker with any workspace UUID could: - Escalate tier (tier 4 = 4 GB RAM allocation) - Rewrite parent_id to subvert CanCommunicate A2A access control - Swap runtime image on next restart - Redirect workspace_dir host bind-mount to arbitrary path Fix: move PATCH into the wsAdmin AdminAuth group alongside POST, DELETE. The canvas position-persist call already has an AdminAuth token (required for GET /workspaces list on initial load) so no canvas regression. Also add workspace-existence guard in Update handler — previously returned 200 with zero rows affected for nonexistent IDs. Issue #113 (MEDIUM — schedule IDOR, carry-over from prior cycle): PATCH /workspaces/:id/schedules/:scheduleId and DELETE operated on scheduleID alone (WHERE id = $1), allowing any authenticated caller to modify or delete schedules belonging to other workspaces. Fix: bind workspace_id = c.Param("id") in both Update and Delete handlers; add AND workspace_id = $N to all schedule SQL queries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 08:01:22 +00:00
Dev Lead Agent	3df2130458	fix(canvas): dark theme a11y — settings buttons, input fields, ReactFlow colorMode, zinc-400 contrast, aria-labels Resolves low-contrast text and theming issues in the settings panel and canvas overlays when running in dark mode: - settings-panel.css: input fields (#d4d4d8 text), settings-button--active (#1e3a8a bg for better contrast against #3b82f6 accent) - SearchDialog: placeholder-zinc-400, kbd hints, tier badge, footer counts, empty-state text — all lifted from zinc-600 → zinc-400 - ConversationTraceModal: timestamp, arrow separators, truncation ellipsis — lifted from zinc-600 → zinc-400 - CommunicationOverlay: arrow separator, age label, duration — zinc-600 → zinc-400 - TemplatePalette: dynamic aria-label on toggle button ("Open/Close template palette") for screen-reader clarity Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 07:56:53 +00:00
Dev Lead Agent	3b7da330f1	fix(canvas): WORKSPACE_PROVISIONING grid origin offset — prevent viewport clipping New nodes were placed at (0,0) or close to it, causing them to spawn behind the toolbar/palette chrome and require manual panning to find. Add GRID_ORIGIN_X/Y = 100 offset so the first node lands in clear canvas space, and update the position assertion in the unit test accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 07:53:45 +00:00
Hongming Wang	8ba88011b4	Merge pull request #109 from Molecule-AI/feat/issue-101-github-workflow-run feat(webhooks): #101 — GitHub workflow_run event → DevOps A2A	2026-04-15 00:51:01 -07:00
Hongming Wang	7a41d67fa3	Merge pull request #108 from Molecule-AI/fix/issue-93-category-routing fix: #93 category_routing + #105 X-RateLimit headers	2026-04-15 00:50:58 -07:00
Hongming Wang	de6ebe2262	Merge pull request #106 from Molecule-AI/fix/org-import-path-traversal fix(security): #103 — path-sanitize + admin-gate POST /org/import	2026-04-15 00:26:16 -07:00
Hongming Wang	7859d43685	Merge pull request #95 from Molecule-AI/fix/supervised-goroutines fix(platform): panic-recovering supervisor for every background goroutine (#92)	2026-04-15 00:26:13 -07:00
Hongming Wang	f8c1b786ac	Merge pull request #99 from Molecule-AI/fix/auth-middleware-critical fix(security): C1 — auth-gate GET /workspaces + middleware test coverage (C4/C8/C10/C11)	2026-04-15 00:26:10 -07:00
Hongming Wang	958789f4ba	feat(webhooks): #101 — workflow_run event → DevOps A2A Closes #101 layer 1: buildGitHubA2APayload now handles workflow_run events, routing failed CI runs to a workspace via the existing X-Molecule-Workspace-ID / webhook path. Only completed runs with a failure/cancelled/timed_out conclusion fan out — success/skipped/neutral are dropped via errIgnoredGitHubAction. Surface message is human-readable + includes the run URL so DevOps can jump straight to the failing job. Metadata carries the full run context (workflow_name, run_id, run_number, conclusion, head_branch, head_sha, run_url, trigger_event) for programmatic handling. 4 new tests cover the failure path, success skip, non-completed action skip, and short-SHA edge case. Layer 2 (org.yaml wiring for DevOps workspace + GITHUB_WEBHOOK_SECRET docs) stays as a follow-up PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 00:25:49 -07:00
Hongming Wang	2a74a7b11b	fix: #93 category_routing + #105 X-RateLimit headers Closes #93 and #105. #93 — add research/plugins/template/channels entries to org.yaml category_routing defaults. Without them, evolution crons firing with these categories found no target and their audit summaries silently dropped at PM. Routes each back to the role that generated it so the author acts on their own findings. #105 — emit X-RateLimit-Limit / -Remaining / -Reset on every response (allowed and throttled) and Retry-After on 429s per RFC 6585. 2 tests cover both paths. Clients and monitoring tools can now back off proactively instead of polling into 429 walls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 00:23:46 -07:00
Hongming Wang	418a250d54	test(e2e): skip count=0 post-delete assertion — conflicts with #99 C1 gate Soft-delete leaves workspace_auth_tokens rows alive, so HasAnyLiveTokenGlobal stays non-zero and admin-auth 401s an unauth GET /workspaces. The assertion was verifying deletion, not auth; the bundle round-trip below still covers the deletion path end-to-end. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 00:22:02 -07:00
Hongming Wang	4dbf335d7f	fix(security): #103 — path-sanitize + admin-gate POST /org/import Closes #103 (HIGH). Three attack surfaces on the import endpoint — body.Dir, workspace.Template, workspace.FilesDir — were concatenated via filepath.Join without validation, letting an unauthenticated caller probe arbitrary filesystem paths with "../../../etc". Two layers of defense: 1. resolveInsideRoot() rejects absolute paths and any relative path whose lexically cleaned join escapes the provided root (Abs + HasPrefix + separator guard). 6 tests cover happy path, traversal attempts, absolute path, empty input, prefix-sibling escape, and deep subpath resolution. 2. Route now runs behind middleware.AdminAuth so an unauthenticated attacker can't reach the handler at all once a token exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 00:18:09 -07:00
Hongming Wang	80b0ad25ff	Merge pull request #94 from Molecule-AI/fix/c6-loopback-ssrf fix(security): C6 — block loopback IP literals in /registry/register	2026-04-15 00:15:23 -07:00
Hongming Wang	593c7e2984	merge: resolve scheduler conflicts with main (#85 panic-recover + supervised heartbeat)	2026-04-15 00:12:29 -07:00
Hongming Wang	a25daa633f	test(e2e): pass bearer token to admin-gated GET /workspaces calls C1 fix (#99) moved GET /workspaces behind AdminAuth. Three late-script calls that run after tokens exist now include Authorization headers; the post-delete-all call stays anonymous since revoked tokens trigger the no-live-token fail-open path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 00:11:29 -07:00
Hongming Wang	d55362fece	Merge pull request #98 from Molecule-AI/chore/template-evolution-crons-hourly chore(template): evolution crons hourly instead of daily/weekly	2026-04-15 00:08:19 -07:00
Hongming Wang	b669b9f6ee	Merge pull request #97 from Molecule-AI/chore/template-documentation-specialist chore(template): add Documentation Specialist as 3rd PM direct report	2026-04-15 00:08:16 -07:00
Hongming Wang	edcfd615d7	Merge pull request #102 from Molecule-AI/fix/can-communicate-ancestor-chain fix(registry): allow ancestor↔descendant A2A so audit_summary can reach PM	2026-04-15 00:08:12 -07:00
rabbitblood	0653e78262	fix(registry): allow ancestor↔descendant A2A so audit_summary can reach PM Found via deep workspace inspection during a maintenance cycle: Security Auditor's hourly cron correctly tries to delegate_task its audit_summary to PM, the platform proxy rejects with "access denied: workspaces cannot communicate per hierarchy", the agent falls back to delegating to its direct parent (Dev Lead), and PM's category_routing dispatcher (#75) is never reached. This breaks the audit-routing contract end-to-end. Every audit cycle was landing on Dev Lead instead of being fanned out via PM's category_routing to the right dev role (security → BE+DevOps, ui/ux → FE, etc). ## Root cause `registry.CanCommunicate()` only allowed: - self → self - siblings (same parent) - root-level siblings - direct parent → child - direct child → parent A grandchild → grandparent (Security Auditor → PM, where parent is Dev Lead and grandparent is PM) was DENIED. The original design wanted strict hierarchy to prevent rogue horizontal A2A — but it also broke the fundamental "child can talk to its leadership chain" pattern that any audit/escalation flow needs. ## Fix Generalise to ancestor ↔ descendant. Any workspace can talk to any ancestor (any depth) and any descendant (any depth). Direct parent/child remains a fast path that avoids the walk. Sibling rules unchanged. Cousins still cannot directly communicate (would need to go through their shared ancestor). Cross-subtree A2A is still rejected. Implementation: `isAncestorOf(ancestorID, childID)` walks the parent chain in Go with a maxAncestorWalk=32 safety cap so a malformed cycle in the workspaces table cannot loop forever. One DB lookup per step. For a typical 3-deep tree, this adds 1-2 extra lookups vs the old direct-parent fast path. Could be optimized to a single recursive CTE if profiling shows it matters; not now. ## Tests - TestCanCommunicate_Denied_Grandchild → REPLACED with two new tests: - TestCanCommunicate_Allowed_GrandparentToGrandchild - TestCanCommunicate_Allowed_GrandchildToGrandparent (the actual bug) - TestCanCommunicate_Allowed_DeepAncestor — 4-level chain - TestCanCommunicate_Denied_UnrelatedAncestors — ensures cross-subtree walks still terminate denied - TestCanCommunicate_Denied_DifferentParents — extended with the walk lookup mocks so sqlmock doesn't log warnings - TestCanCommunicate_Denied_CousinToRoot — same All 13 tests pass clean. The previous direct parent/child / siblings / self tests are unchanged (fast paths preserved). ## Why platform-level Per the "platform-wide fixes are mine to ship" rule. Every org template hits the same broken audit-routing chain — fixing it at the platform benefits all users, not just molecule-dev. This unblocks #50 (PM dispatcher prompt) and #75 (category_routing).	2026-04-14 22:18:38 -07:00
Backend Engineer	80c2161687	fix(security): C1 — gate GET /workspaces behind AdminAuth; add auth middleware tests Security Auditor confirmed C1 (GET /workspaces) exposes workspace topology without any authentication. The endpoint was intentionally left open for the canvas browser frontend; this PR closes that gap. Router change: - Move GET /workspaces from the bare root router into the wsAdmin AdminAuth group alongside POST /workspaces and DELETE /workspaces/:id. - AdminAuth uses the same fail-open bootstrap contract as all other auth gates: fresh installs (no live tokens) pass through; once any workspace has registered with a token, a valid bearer is required. Status of findings C2–C11 (documented here for audit trail): - C2 POST /workspaces/:id/activity → already in wsAuth group (Cycle 5) - C3 POST /workspaces/:id/delegations/record → already in wsAuth group (Cycle 5) - C4 POST /workspaces/:id/delegations/:id/update → already in wsAuth group (Cycle 5) - C5 GET /workspaces/:id/delegations → already in wsAuth group (Cycle 5) - C7 GET /workspaces/:id/memories → already in wsAuth group (Cycle 5) - C8 POST /workspaces/:id/memories → already in wsAuth group (Cycle 5) - C9 POST /workspaces/:id/delegate → already in wsAuth group (Cycle 5) - C10 GET /admin/secrets → already in adminAuth group (Cycle 7) - C11 POST+DELETE /admin/secrets → already in adminAuth group (Cycle 7) Tests (platform/internal/middleware/wsauth_middleware_test.go — 13 new): WorkspaceAuth: - fail-open when workspace has no tokens (bootstrap path) - C4: no bearer on /delegations/:id/update → 401 - C8: no bearer on /memories POST → 401 - invalid bearer → 401 - cross-workspace token replay → 401 - valid bearer for correct workspace → 200 AdminAuth: - fail-open when no tokens exist globally (fresh install) - C10: no bearer on GET /admin/secrets → 401 - C11: no bearer on POST /admin/secrets → 401 - C11: no bearer on DELETE /admin/secrets/:key → 401 - valid bearer → 200 - invalid bearer → 401 Note: did NOT touch DELETE /admin/secrets in production — no destructive calls to live secrets endpoints were made during this work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 04:37:14 +00:00
Backend Engineer	63e482f05b	fix(security): C6 — extend SSRF blocklist to RFC-1918 private ranges PR #94 only blocked 127.0.0.0/8 (loopback) and 169.254.0.0/16 (link-local/IMDS). An attacker could still register a workspace with a URL in any RFC-1918 range (10.x, 172.16–31.x, 192.168.x) and redirect A2A proxy traffic to internal services. Block all five reserved ranges in validateAgentURL: - 169.254.0.0/16 link-local (IMDS: AWS/GCP/Azure) - 127.0.0.0/8 loopback (self-SSRF) - 10.0.0.0/8 RFC-1918 - 172.16.0.0/12 RFC-1918 (includes Docker bridge networks) - 192.168.0.0/16 RFC-1918 Agents must use DNS hostnames, not IP literals. The provisioner still writes 127.0.0.1 URLs via direct SQL UPDATE (CASE guard preserves those); this blocklist only applies to the /registry/register request body. Tests: updated 3 previously-allowed RFC-1918 cases to expect rejection; added 9 new cases covering range boundaries and the Docker bridge range. All 22 validateAgentURL subtests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 04:35:05 +00:00
rabbitblood	c0142edbce	chore(template): switch evolution crons from daily/weekly to hourly CEO 2026-04-15: the team's evolution loops should be hourly, not daily/weekly. A 24h or 7d cadence is the wrong rhythm for a team that's expected to run 24/7 and keep improving. At hourly, every drift, every new project, every plugin gap, every channel opportunity gets surfaced within an hour of becoming visible. | Schedule | Was | Now | |-----------------------------------|----------------|--------------| | Hourly ecosystem watch | 0 8 * * * | 8 * * * * | | Hourly plugin curation | 0 9 * * 1 | 22 * * * * | | Hourly template fitness audit | 30 8 * * * | 15 * * * * | | Hourly channel expansion survey | 0 10 * * 1 | 47 * * * * | Spread across the hour (:08, :11, :15, :17, :22, :47) so the four evolution crons + UIUX :11 + Security :17 don't collide and don't all bury PM with audit_summary deliveries at the same instant. Renamed from "Daily..." / "Weekly..." to "Hourly..." to match the new cadence and so the prompts (which still say "Daily survey" etc.) read consistently. A follow-up will fix the body wording. Live-synced into running DB via PATCH (3 of 4) and direct UPDATE on the 4th (Dev Lead workspace requires a token the script didn't have). next_run_at recomputed for all 4. First fire: 04:47 UTC (channel expansion).	2026-04-14 21:33:31 -07:00
rabbitblood	101f284e5d	fix(scheduler): heartbeat at tick start + per-fire so liveness reflects work-in-progress The first scheduler heartbeat (#95) only fired AFTER each tick completed. A tick that runs fireSchedule for 110+ seconds (long agent prompts) would make /admin/liveness report scheduler as stale even though it was actively working. Observed today: scheduler firing UIUX audit, last_tick_at lagged by 95s+ and incrementing. Three places now call Heartbeat: 1. Top of tick() — proves we're past the ticker.C wait 2. Inside each fire goroutine, before fireSchedule — ANY active fire keeps the heartbeat fresh 3. Inside each fire goroutine, after fireSchedule — captures the moment the per-fire work completes (The post-tick Heartbeat in Start() is still there as the "all idle" case.) Net result: /admin/liveness reports stale only if the scheduler genuinely isn't doing anything for >2× pollInterval, which is the actual signal we want.	2026-04-14 21:20:06 -07:00
rabbitblood	41e39c2626	chore(template): Documentation Specialist also watches private molecule-controlplane Per CEO 2026-04-15: the SaaS controlplane (Molecule-AI/molecule-controlplane, PRIVATE Go/Fly.io provisioner) needs documentation coverage too. Updates the agent's role description, initial_prompt, and daily docs-sync cron to handle a third repo with a strict public/private split. ## Privacy rule (the critical addition) molecule-controlplane is private. Two-bucket model: Internal-only changes (handlers, schemas, infra config, billing logic, fly.toml, provisioner internals) → docs go INSIDE the controlplane repo itself (README.md, PLAN.md, docs/internal/*.md). NEVER mentioned in the public docs site. Customer-facing changes (new tier, new region, new SLA, pricing change, signup flow change) → sanitized PUBLIC description on doc.moleculesai.app. Describes the PRODUCT, never the implementation. When unsure: default to internal-only and ask PM before publishing. The privacy rule is repeated three times in the prompt (top of initial_prompt, 1b inside the daily cron, and the role description) so the agent can't miss it. ## Changes - role: extended to mention all three repos + privacy split - initial_prompt: clones controlplane in step 1, reads README+PLAN in step 5, scans recent commits in step 8, lists the four owned surfaces with public/private labels in step 10 - Daily cron: adds step 1b "PAIR RECENT CONTROLPLANE PRS" with the (i)/(ii) internal/customer-facing branching logic - SETUP block: adds controlplane git pull	2026-04-14 21:06:41 -07:00
rabbitblood	53fdffd2c5	chore(template): add Documentation Specialist as 3rd PM direct report Adds a 13th workspace to the molecule-dev template owning end-to-end documentation across all Molecule AI surfaces. ## Why now - We just created Molecule-AI/docs (customer-facing site at doc.moleculesai.app, Fumadocs + Next.js 15) and the customer site needs someone to own it. - Internal docs (README.md, docs/architecture.md, docs/edit-history/) were drifting — every platform PR has been opening a docs sync PR manually. - No agent in the team owned terminology consistency or stub backfill. ## Where it sits in the org Third PM direct report, parallel to Research Lead and Dev Lead — docs is its own swim lane that spans engineering (docs follow code) and research/product (concepts and terminology). PM ├── Research Lead ├── Dev Lead └── Documentation Specialist <-- new ## Schedules (2) 1. Daily docs sync — backfill stubs and pair recent platform PRs `0 9 * * *` — every morning: - Pair every merged platform PR (last 24h) with a docs PR if needed - Backfill one stub page on the docs site - Crawl the live site for broken links / dead anchors - delegate_task to PM with audit_summary (category=docs) 2. Weekly terminology + freshness audit `0 11 * * 1` — every Monday: - Stale page detection (>30 days untouched on fast-moving surfaces) - Terminology consistency check (one canonical name per concept) - Link-rot scan - Same audit_summary contract ## Plugins Inherits the 9 universal defaults. Adds `browser-automation` for crawling the live docs site. `molecule-skill-update-docs` is already in defaults so the cross-repo sync skill is available. ## Routing Adds `docs: [Documentation Specialist]` to `category_routing` so any agent that emits an audit_summary with category=docs is auto-routed here by the platform. ## Bind mounts Note: this workspace clones BOTH /workspace/repo (the platform monorepo) and /workspace/docs (Molecule-AI/docs) in its initial_prompt so the agent can edit either side.	2026-04-14 21:03:22 -07:00
Hongming Wang	96d88f42a6	Merge pull request #96 from Molecule-AI/feat/canvas-auth-redirect feat(canvas): AuthGate — redirect anonymous users to cp login	2026-04-14 20:42:12 -07:00
Hongming Wang	aedd3db697	feat(canvas): AuthGate — redirect anonymous users to cp login (Phase F close) Wraps the canvas root so every tenant-subdomain request checks for a valid session and bounces to app.moleculesai.app/cp/auth/login with a return_to pointing back at the current URL. Local dev + vercel preview URLs + apex pass through unchanged. Files: - canvas/src/lib/auth.ts: fetchSession() probes /cp/auth/me (credentials:include for cross-origin cookie); returns Session on 200, null on 401 (anonymous, no throw), throws on 5xx so transient outages don't leak the UI. - canvas/src/lib/auth.ts: redirectToLogin() builds the cp login URL with window.location.href as return_to; CP's isSafeReturnTo check rejects cross-domain bounces. - canvas/src/components/AuthGate.tsx: client component wrapping children. State machine: loading → authenticated | anonymous. In non-SaaS mode (no tenant slug) skips the gate entirely. - canvas/src/app/layout.tsx: wraps the root body in <AuthGate>. Tests: +6 auth.ts (200 / 401 null / 5xx throw / credentials:include / redirectToLogin href + signup variant). Full suite 453 green (was 447). Pairs with molecule-controlplane PR #16 (return_to cookie handshake on the cp side). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 20:37:26 -07:00
rabbitblood	e4535560cf	fix(platform): panic-recovering supervisor for every background goroutine (#92) Yesterday's scheduler-died incident (#85) was one instance of a systemic bug: every long-running goroutine in the platform lacks panic recovery and exposes no liveness signal. In a multi-tenant SaaS deployment, a single tenant's bad data panicking any subsystem takes down the subsystem for every tenant, silently, with all standard health probes still green. That is a scale-of-one sev-1. This PR: 1. Introduces `platform/internal/supervised/` with two primitives: a. RunWithRecover(ctx, name, fn) — runs fn in a recover wrapper. On panic logs the stack + exponential-backoff restart (1s → 2s → 4s → … → 30s cap). On clean return (fn decided to stop) returns. On ctx.Done cancels cleanly. b. Heartbeat(name) + LastTick(name) + Snapshot() + IsHealthy(names, staleThreshold) — shared in-memory liveness registry. Every subsystem calls Heartbeat(name) at the end of each tick so operators can distinguish "goroutine alive and healthy" from "alive but stuck inside a single tick". 2. Wraps every `go X.Start(ctx)` in main.go: - broadcaster.Subscribe (Redis pub/sub relay → WebSocket) - registry.StartLivenessMonitor - registry.StartHealthSweep - scheduler.Start (the one that died yesterday) - channelMgr.Start (Telegram / Slack) 3. Adds `supervised.Heartbeat("scheduler")` inside the scheduler tick loop as the first end-to-end demonstration. Follow-up PRs will add heartbeats to the other four subsystems. 4. Adds `GET /admin/liveness` endpoint returning per-subsystem last_tick_at + seconds_ago. Operators can poll this and alert on any subsystem whose seconds_ago exceeds 2x its cron/tick interval. 5. Unit tests for RunWithRecover (clean return no restart; panic restarts with backoff; ctx cancel stops restart loop) and for the liveness registry. Net new code: ~160 lines + ~100 lines of tests. Refactor of main.go: ~10 line changes. No behavior change on happy path; only lifts what happens on a panic. Closes #92. Supersedes the local recover added to scheduler.go in #90 (kept conceptually, but now via the shared helper).	2026-04-14 20:34:18 -07:00
Backend Engineer	19bdd81ba4	fix(security): C6 — block loopback IP literals in /registry/register A workspace that self-registers with a 127.0.0.x URL on first INSERT could redirect A2A proxy traffic back to the platform itself (SSRF). The previous fix only blocked 169.254.0.0/16 (cloud metadata). Add 127.0.0.0/8 to validateAgentURL's blocklist. RFC-1918 private ranges (10.x, 172.16.x, 192.168.x) remain allowed — Docker container networking depends on them. Safe because the provisioner writes 127.0.0.1 URLs via direct SQL UPDATE, not through /registry/register, so the UPSERT CASE that preserves provisioner URLs is unaffected. Local-dev agents can still register using "localhost" by name (hostname, not IP literal). Tests: removed "valid localhost http" case (now correctly rejected), added "valid localhost name" + three loopback-block assertions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 03:34:14 +00:00
Hongming Wang	c02bfb4257	Merge pull request #90 from Molecule-AI/fix/scheduler-watchdog-recover fix(scheduler): recover from panics + add liveness watchdog (#85)	2026-04-14 20:30:31 -07:00
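
Illustrative note on commit 0653e78262 (ancestor walk): the commit message describes `isAncestorOf(ancestorID, childID)` as walking the parent chain with a `maxAncestorWalk=32` cap so a malformed cycle cannot loop forever. The sketch below is not the repository's code; it only shows the shape of such a bounded walk. The `parentOf` map is a hypothetical in-memory stand-in for the one-DB-lookup-per-step described in the commit, and the workspace IDs are made up for the example.

```go
package main

import "fmt"

// maxAncestorWalk is the safety cap named in the commit message.
const maxAncestorWalk = 32

// parentOf is a hypothetical stand-in for the workspaces table:
// child ID -> parent ID, where "" marks a root workspace.
type parentOf map[string]string

// isAncestorOf reports whether ancestorID appears anywhere on childID's
// parent chain. The loop is bounded, so a cyclic parent chain ends in
// "false" instead of spinning forever.
func isAncestorOf(parents parentOf, ancestorID, childID string) bool {
	current := childID
	for step := 0; step < maxAncestorWalk; step++ {
		parent, ok := parents[current]
		if !ok || parent == "" {
			return false // reached a root (or an unknown ID) without a match
		}
		if parent == ancestorID {
			return true
		}
		current = parent
	}
	return false // cap hit: treat as "not an ancestor"
}

func main() {
	parents := parentOf{
		"pm":               "",
		"dev-lead":         "pm",
		"research-lead":    "pm",
		"security-auditor": "dev-lead",
		// Deliberately malformed cycle to exercise the cap.
		"loop-a": "loop-b",
		"loop-b": "loop-a",
	}
	fmt.Println(isAncestorOf(parents, "pm", "security-auditor"))            // grandparent
	fmt.Println(isAncestorOf(parents, "research-lead", "security-auditor")) // other subtree
	fmt.Println(isAncestorOf(parents, "pm", "loop-a"))                      // cycle, capped
}
```

A communication check in the style the commit describes would presumably call this in both directions (caller as ancestor of target, or target as ancestor of caller) after the self, sibling, and direct parent/child fast paths; that wiring is an assumption here and is left out.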


222 Commits
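
Illustrative note on commit 63e482f05b (SSRF blocklist): the commit message lists five reserved ranges that are rejected when they appear as IP literals in the /registry/register request body, while DNS hostnames remain allowed. The sketch below is one way such a check could look using only the Go standard library. It is not the repository's `validateAgentURL`; the name `checkAgentURL` and the error value are hypothetical, and only the CIDR list comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// blockedCIDRs are the five ranges listed in the commit message.
var blockedCIDRs = []string{
	"169.254.0.0/16", // link-local (cloud metadata)
	"127.0.0.0/8",    // loopback
	"10.0.0.0/8",     // RFC-1918
	"172.16.0.0/12",  // RFC-1918 (includes Docker bridge networks)
	"192.168.0.0/16", // RFC-1918
}

// blockedNets holds the parsed form of blockedCIDRs.
var blockedNets = mustParseCIDRs(blockedCIDRs)

// errBlockedIP is a hypothetical error value for this sketch.
var errBlockedIP = errors.New("agent URL uses a blocked IP literal")

func mustParseCIDRs(cidrs []string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err) // the list is static, so a parse failure is a programming error
		}
		nets = append(nets, n)
	}
	return nets
}

// checkAgentURL rejects URLs whose host is an IP literal inside a blocked
// range. Hostnames (including "localhost") are not resolved and pass through.
func checkAgentURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	ip := net.ParseIP(u.Hostname())
	if ip == nil {
		return nil // not an IP literal: treated as a DNS hostname
	}
	for _, n := range blockedNets {
		if n.Contains(ip) {
			return errBlockedIP
		}
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"http://127.0.0.1:8000",
		"http://172.31.255.255:8000",
		"http://172.32.0.1:8000",
		"http://localhost:8000",
	} {
		fmt.Printf("%-28s blocked=%v\n", raw, checkAgentURL(raw) != nil)
	}
}
```

Because the check never resolves DNS, a hostname that points at a private address would still pass in this sketch; whether the real handler guards against that is not stated in the log.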