User pushed back: the timestamp bug should have been caught by E2E.
Right — my earlier coverage tested the server contract (notify endpoint,
WS broadcast filter) but never the chat-history HYDRATION path. Without
a unit test that froze the wall clock and asserted timestamps came from
created_at, a future refactor could re-introduce the same bug.
This commit:
1. Extracts the per-row → ChatMessage[] mapping out of the closure
inside loadMessagesFromDB into chat/historyHydration.ts. Pure
function, no React dependency, easy to test.
2. Adds 12 vitest cases in __tests__/historyHydration.test.ts covering:
- Timestamp regression (3 tests, with system time frozen to 2030 so
a regression starts producing "2030-…" timestamps and the assertion
fails unmistakably). The third test mirrors the user's screenshot:
two rows with distinct created_at must produce distinct timestamps.
- User-message extraction (text, internal-self filter, null body)
- Agent-message extraction (text, error→system role, file attachments,
null body, body with neither text nor files)
- End-to-end: a single row with both request and response emits
two messages with the same timestamp (the canonical canvas-source
row pattern)
3. The new file-attachment test caught a SECOND latent bug: the helper
was passing `response_body.result ?? response_body` to
extractFilesFromTask, so for the notify-with-attachments shape
`{result: "<text>", parts: [...]}` it handed over the STRING "<text>"
and extractFilesFromTask silently returned []. A chat reload after an
agent attached a file would therefore lose the chips. Fixed by only
unwrapping `result` when it's an object (the task shape) and falling
through to response_body otherwise (the notify shape); see the sketch
below.
ChatTab now imports the helper and the loop body becomes one line:
`messages.push(...activityRowToMessages(a, isInternalSelfMessage))`.
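For reference, a minimal sketch of the extracted helper with both fixes
visible (created_at as the only timestamp source, object-only unwrap of
`result`). The row and message field names here are assumptions drawn
from this commit text, not the real canvas types.
```typescript
interface ActivityRow {
  created_at: string;
  request_body?: { text?: string } | null;
  response_body?: { result?: unknown; parts?: unknown[] } | null;
}
interface ChatMessage {
  role: "user" | "agent";
  text: string;
  timestamp: string; // always created_at for hydrated history
  parts?: unknown[]; // file attachment parts, if any
}

export function activityRowToMessages(row: ActivityRow): ChatMessage[] {
  const msgs: ChatMessage[] = [];
  if (row.request_body?.text) {
    msgs.push({
      role: "user",
      text: row.request_body.text,
      timestamp: row.created_at,
    });
  }
  const body = row.response_body;
  if (body) {
    // Only unwrap `result` when it is an object (the task shape); the
    // notify shape {result: "<text>", parts: [...]} keeps parts at the
    // top level, so unwrapping a string here would drop the attachments.
    const source =
      typeof body.result === "object" && body.result !== null
        ? (body.result as { parts?: unknown[] })
        : body;
    msgs.push({
      role: "agent",
      text: typeof body.result === "string" ? body.result : "",
      timestamp: row.created_at,
      parts: Array.isArray(source.parts) ? source.parts : undefined,
    });
  }
  return msgs;
}
```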
Verification:
- 12/12 historyHydration tests pass
- 1072/1072 full canvas vitest pass (was 1060 before, +12)
- tsc --noEmit clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User flagged that all historical user bubbles render with the same
"now" clock after a chat reload — both messages in the screenshot
showed 9:01:58 PM despite being sent hours apart.
ChatTab.tsx:142 minted user messages with createMessage(...), which
calls new Date().toISOString() — fine for a freshly-typed message,
wrong for hydrated history. Every reload re-stamped all user bubbles
to the render moment, collapsing the visible chronology. The agent
path on line 157 already overrides with a.created_at; mirror that.
One-line fix (spread + override timestamp) plus a comment explaining
why the override is load-bearing so the next refactor doesn't drop it.
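A sketch of the fix, with `createMessage` stubbed as a stand-in for the
real helper (which stamps "now" on every message it mints); the hydrated
row shape is an assumption.
```typescript
// Stand-in for the real createMessage helper: stamps the render moment.
const createMessage = (role: "user" | "agent", text: string) => ({
  role,
  text,
  timestamp: new Date().toISOString(),
});

// Hydration path: spread, then override the timestamp with the row's
// created_at so a reload preserves the original chronology.
export const hydrateUserMessage = (a: { created_at: string; text: string }) => ({
  ...createMessage("user", a.text),
  timestamp: a.created_at, // load-bearing override: do not drop in a refactor
});
```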
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User flagged a leftover "Notify E2E" workspace on the canvas — caused by
an earlier debug run getting SIGPIPE'd before the EXIT trap could fire.
Add an idempotent pre-sweep at the top of the script so the next run
cleans up any prior leftover with the same name. Belt-and-suspenders
with the existing trap; both have to fail for a leak to persist.
Verified:
- Normal run: 14/14 pass, 0 leftovers
- SIGTERM mid-setup: trap fires, 0 leftovers
- Re-run after interruption: pre-sweep + new run both clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local dev mode bypassed workspace auth, so my first push passed locally
but failed CI with HTTP 401 on /notify. The wsAuth-grouped endpoints
(notify, activity, chat/uploads) require Authorization: Bearer in any
non-dev environment. Mint the token via the existing e2e_mint_test_token
helper and thread it through every authenticated curl. Same pattern as
test_api.sh.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User asked to "keep optimizing and comprehensive e2e testings to prove all
works as expected" for the communication path. Adds three layers of coverage
for PR #2130 (agent → user file attachments via send_message_to_user) since
that path has the most user-visible blast radius:
1. Shell E2E (tests/e2e/test_notify_attachments_e2e.sh) — pure platform test,
no workspace container needed. 14 assertions covering: notify text-only
round-trip, notify-with-attachments persists parts[].kind=file in the
shape extractFilesFromTask reads, per-element validation rejects empty
uri/name (regression for the missing gin `dive` bug), and a real
/chat/uploads → /notify URI round-trip when a container is up.
2. Canvas AGENT_MESSAGE handler tests (canvas-events.test.ts +5) — pin the
WebSocket-side filtering that drops malformed attachments, allows
attachments-only bubbles, ignores non-array payloads, and no-ops on
pure-empty events.
3. Persisted response_body shape test (message-parser.test.ts +1) — pins
the {result, parts} contract the chat history loader hydrates on
reload, so refreshing after an agent attachment restores both caption
and download chips.
Also wires the new shell E2E into e2e-api.yml so a regression of this
contract surfaces in CI rather than only in manual runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User flagged two paper cuts in Agent Comms after the grouping PR:
"Delegating to f6f3a023-ab3c-4a69-b101-976028a4a7ec" reads as gibberish
because it's a UUID, and the chat is "one way" with only outbound bubbles
even though peers are clearly responding.
Both fixes are in toCommMessage's delegation branch:
1. Pull text from the actual payload, not the platform's audit-log summary.
- delegate row → request_body.task (the task text the agent sent).
Fallback when missing: "Delegating to <resolved-peer-name>" — never
the raw UUID.
- delegate_result row → response_body.response_preview / .text (the
peer's actual reply). Fallback paths render human-readable status
for queued / failed cases ("Queued — Peer Agent is busy on a prior
task...") instead of platform jargon.
2. delegate_result rows render flow="in" — even though source_id=us
(the platform writes the row on our side), the conversational
direction is peer → us. The chat now shows alternating bubbles
(out: "Build me 10 landing pages" → in: "Done — ZIP at /tmp/...")
instead of one-sided "→ To X" wall.
The WS push handler in this same file now populates request_body /
response_body from the DELEGATION_SENT / DELEGATION_COMPLETE event
payloads (task_preview, response_preview), so live-pushed bubbles use
the same text-extraction path as the GET-on-mount.
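Sketched below is the shape of that delegation branch. The payload
fields (request_body.task, response_body.response_preview / .text) come
from this commit; the surrounding row type and the exact fallback
wording are illustrative assumptions.
```typescript
interface DelegationRow {
  method: "delegate" | "delegate_result";
  peerName: string; // already resolved from the UUID
  request_body?: { task?: string };
  response_body?: { response_preview?: string; text?: string };
}

function delegationText(row: DelegationRow): { flow: "in" | "out"; text: string } {
  if (row.method === "delegate") {
    // Outbound: prefer the task text the agent actually sent; the
    // fallback names the peer, never the raw UUID.
    return {
      flow: "out",
      text: row.request_body?.task ?? `Delegating to ${row.peerName}`,
    };
  }
  // delegate_result: conversationally inbound (peer → us), even though
  // the platform writes the row on our side.
  const reply = row.response_body?.response_preview ?? row.response_body?.text;
  return {
    flow: "in",
    text: reply ?? `Queued — ${row.peerName} is busy on a prior task...`,
  };
}
```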
Tests:
- 4 new in toCommMessage's delegation branch:
- delegate row prefers request_body.task over summary
- delegate row falls back to name-resolved label when task missing
- delegate_result row is INBOUND (flow="in")
- delegate_result queued shows human-readable wait message including
the resolved peer name
- Replaces the previous "delegate row maps text from summary" tests
which encoded the (now-undesirable) platform-summary-as-text behavior.
- All 15 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chronological-only view turns into a noodle once the Director and
N peers exchange more than a few rounds. New layout: a sub-tab bar at
the
top of the panel, with "All" pinned leftmost and one tab per peer
(name + count). Selecting a peer filters the thread to that one
DD↔X conversation; "All" preserves the previous chronological view
as the default.
Tab ordering follows Slack/Linear DM-list convention: most-recent
activity descending, so active conversations rise to the top
without the user scrolling. Counts in parens match Slack's unread
hint pattern (no separate read/unread state — the count is total
in this conversation, computed from the same in-memory message
list the panel already maintains).
Pure-helper extraction: peer-summary derivation lives in
`buildPeerSummary(messages)` so the sort + count logic is unit-
testable without rendering the panel. 5 new tests cover: count
aggregation, most-recent-first ordering, lastTs as max-not-last,
empty input, name-stability when the same peerId carries different
names across messages.
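A minimal sketch of that helper. Only peerId / peerName are taken from
the CommMessage fields named in this commit; the timestamp field and the
summary shape are assumptions.
```typescript
interface CommMessage { peerId: string; peerName: string; ts: number }
interface PeerSummary { peerId: string; peerName: string; count: number; lastTs: number }

export function buildPeerSummary(messages: CommMessage[]): PeerSummary[] {
  const byPeer = new Map<string, PeerSummary>();
  for (const m of messages) {
    const prev = byPeer.get(m.peerId);
    if (prev) {
      prev.count += 1;
      prev.lastTs = Math.max(prev.lastTs, m.ts); // max, not last-in-array
    } else {
      byPeer.set(m.peerId, {
        peerId: m.peerId,
        peerName: m.peerName, // first name seen wins → name stability
        count: 1,
        lastTs: m.ts,
      });
    }
  }
  // Most-recent activity first, so busy conversations rise to the top.
  return [...byPeer.values()].sort((a, b) => b.lastTs - a.lastTs);
}
```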
Keyboard: ArrowLeft/Right cycle peer tabs (matches the existing
My Chat / Agent Comms tab pattern in ChatTab). Auto-prune: if the
selected peer has zero messages after a setMessages update (rare,
e.g. dedupe drops the last bubble), fall back to "All" so the
viewer doesn't see an empty thread.
Frontend-only — no platform / runtime / DB changes. The existing
`peerId` / `peerName` fields on CommMessage already carry every
piece of data the new UI needs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Critical bugs caught in code review of the agent→user attachments PR:
1. **Empty-URI attachments slipped past validation.** Gin's
go-playground/validator does NOT iterate slice elements without
`dive` — verified zero `dive` usage anywhere in workspace-server —
so the inner `binding:"required"` tags on NotifyAttachment.URI/Name
were never enforced. `attachments: [{"uri":"","name":""}]` would
pass validation, broadcast empty-URI chips that render blank in
canvas, AND persist them in activity_logs for every page reload to
re-render. Added explicit per-element validation in Notify (returns
400 with `attachment[i]: uri and name are required`) plus
defence-in-depth in the canvas filter (rejects empty strings, not
just non-strings; sketched after this list).
A 3-case regression test pins the rejection.
2. **Hardcoded application/octet-stream stripped real mime types.**
`_upload_chat_files` always passed octet-stream as the multipart
Content-Type. chat_files.go:Upload reads `fh.Header.Get("Content-Type")`
FIRST and only falls back to extension-sniffing when the header is
empty, so every agent-attached file lost its real type forever,
breaking the canvas's MIME-based icon/preview logic. Now sniff via
`mimetypes.guess_type(path)` and only fall back to octet-stream
when sniffing returns None.
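The canvas-side defence-in-depth filter referenced in item 1, sketched;
the attachment type is an assumption.
```typescript
interface Attachment { uri: string; name: string; mimeType?: string; size?: number }

// Rejects empty strings, not just non-strings, so an
// {"uri":"","name":""} entry that slips past server validation never
// renders a blank chip.
function isRenderableAttachment(a: unknown): a is Attachment {
  if (typeof a !== "object" || a === null) return false;
  const { uri, name } = a as Record<string, unknown>;
  return (
    typeof uri === "string" && uri.length > 0 &&
    typeof name === "string" && name.length > 0
  );
}
```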
Plus three Required nits:
- `sqlmockArgMatcher` was misleading — the closure always returned
true after capture, identical to `sqlmock.AnyArg()` semantics, but
named like a custom matcher. Renamed to `sqlmockCaptureArg(*string)`
so the intent (capture for post-call inspection, not validate via
driver-callback) is unambiguous.
- Test asserted notify call by `await_args_list[1]` index — fragile
to any future _upload_chat_files refactor that adds a pre-flight
POST. Now filter call list by URL suffix `/notify` and assert
exactly one match.
- Added `TestNotify_RejectsAttachmentWithEmptyURIOrName` (3 cases)
covering empty-uri, empty-name, both-empty so the Critical fix
stays defended.
Deferred to follow-up:
- ORDER BY tiebreaker for same-millisecond notifies — pre-existing
risk, not regression.
- Streaming multipart upload — bounded by the platform's 50MB total
cap so RAM ceiling is fixed; switch to streaming if cap rises.
- Symlink rejection — agent UID can already read whatever its
filesystem perms allow via the shell tool; rejecting symlinks
doesn't materially shrink the attack surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the gap where the Director would say "ZIP is ready at /tmp/foo.zip"
in plain text instead of attaching a download chip — the runtime literally
had no API for outbound file attachments. The canvas + platform's
chat-uploads infrastructure already supported the inbound (user → agent)
direction (commit 94d9331c); this PR wires the outbound side.
End-to-end shape:
agent: send_message_to_user("Done!", attachments=["/tmp/build.zip"])
↓ runtime
POST /workspaces/<self>/chat/uploads (multipart)
↓ platform
/workspace/.molecule/chat-uploads/<uuid>-build.zip
→ returns {uri: workspace:/...build.zip, name, mimeType, size}
↓ runtime
POST /workspaces/<self>/notify
{message: "Done!", attachments: [{uri, name, mimeType, size}]}
↓ platform
Broadcasts AGENT_MESSAGE with attachments + persists to activity_logs
with response_body = {result: "Done!", parts: [{kind:file, file:{...}}]}
↓ canvas
WS push: canvas-events.ts adds attachments to agentMessages queue
Reload: ChatTab.loadMessagesFromDB → extractFilesFromTask sees parts[]
Either path → ChatTab renders download chip via existing path
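Written out as TypeScript for illustration (the server side is Go), the
shapes the diagram above moves around; field names follow the diagram,
the interface names themselves are assumptions.
```typescript
interface NotifyAttachment {
  uri: string;      // workspace:/...build.zip
  name: string;
  mimeType: string;
  size: number;
}

// Body of POST /workspaces/<self>/notify
interface NotifyRequest {
  message: string;
  attachments?: NotifyAttachment[];
}

// What the platform persists to activity_logs.response_body, and what
// extractFilesFromTask reads back on chat reload.
interface PersistedResponseBody {
  result: string; // "Done!"
  parts: { kind: "file"; file: NotifyAttachment }[];
}
```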
Files changed:
workspace-server/internal/handlers/activity.go
- NotifyAttachment struct {URI, Name, MimeType, Size}
- Notify body accepts attachments[], broadcasts in payload,
persists as response_body.parts[].kind="file"
canvas/src/store/canvas-events.ts
- AGENT_MESSAGE handler reads payload.attachments, type-validates
each entry, attaches to agentMessages queue
- Skips empty events (was: skipped only when content empty)
workspace/a2a_tools.py
- tool_send_message_to_user(message, attachments=[paths])
- New _upload_chat_files helper: opens each path, multipart POSTs
to /chat/uploads, returns the platform's metadata
- Fail-fast on missing file / upload error — never sends a notify
with a half-rendered attachment chip
workspace/a2a_mcp_server.py
- inputSchema declares attachments param so claude-code SDK
surfaces it to the model
- Defensive filter on the dispatch path (drops non-string entries
if the model sends a malformed payload)
Tests:
- 4 new Python: success path, missing file, upload 5xx, no-attach
backwards compat
- 1 new Go: Notify-with-attachments persists parts[] in
response_body so chat reload reconstructs the chip
Why /tmp paths work even though they're outside the canvas's allowed
roots: the runtime tool reads the bytes locally and re-uploads through
/chat/uploads, which lands the file under /workspace (an allowed root).
The agent can specify any readable path.
Does NOT include: agent → agent file transfer. Different design problem
(cross-workspace download auth: peer would need a credential to call
sender's /chat/download). Tracked as a follow-up under task #114.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
## What was breaking
All three staging e2e workflows' "Teardown safety net" steps
filtered candidate slugs by `f'e2e-...-{today}-...'` where `today`
was computed at safety-net-step time via `datetime.date.today()`.
When a run crossed midnight UTC (start before 00:00, end after),
`today` became the NEXT day, but the slug it created carried the
PRIOR day's date. The filter never matched its own slug → leak.
## Today's incident
E2E Staging Canvas run [24970092066](
https://github.com/Molecule-AI/molecule-core/actions/runs/24970092066):
- started 2026-04-26 23:45:59Z
- created slug `e2e-canvas-20260426-1u8nz3` at 23:59Z
- ended 2026-04-27 00:12:47Z (failure)
- safety-net step ran with `today=20260427`
- filter `e2e-canvas-20260427-` did not match `...20260426-1u8nz3`
- tenant + child workspace EC2 both stayed up
Confirmed via CP staging logs: no DELETE for `1u8nz3` ever issued.
The Playwright globalTeardown didn't fire (test crashed mid-run);
the workflow safety-net was the last line and it missed.
## Fix
All three workflows now sweep BOTH today AND yesterday's UTC dates,
so a run that crosses midnight still matches its own slug:
```python
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
dates = (today.strftime('%Y%m%d'), yesterday.strftime('%Y%m%d'))
prefixes = tuple(f'e2e-canvas-{d}-' for d in dates) # (canvas variant)
```
Per-run-id scoping (saas + canary) is preserved — the prior-day
prefix still includes the run_id, so cross-midnight runs only sweep
their own slugs, not other in-flight runs from yesterday.
## Why two-day window vs. arbitrary lookback
A run can't legitimately last more than 24h on GitHub-hosted
runners (workflow `timeout-minutes` caps; canary=25, e2e-saas=45,
canvas=30). A two-day window is enough to cover any cross-midnight
run without widening the cross-run-cleanup blast radius further.
The `sweep-stale-e2e-orgs.yml` cron (with its 120-min age threshold)
remains the catch-all for anything older that drifts through.
## Test plan
- [x] Manual logic simulation: post-midnight slug matches yesterday's
prefix; same-day still matches; 2-days-ago does NOT match;
production tenant never matches
- [x] All three workflow YAMLs syntactically valid
- [ ] Next cross-midnight run cleans up its own slug
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Addresses github-code-quality finding on PR #2064:
> Comparison between inconvertible types
> Variable 'info' cannot be of type null, but it is compared to
> an expression of type null.
By line 75, `info` has been narrowed to non-null via the
`if (!info) return null;` guard at line 56 — so `open={info !== null}`
always evaluates to `true`. Switch to JSX shorthand `open` for
clarity and to silence the static check.
Behaviorally identical; the modal still opens whenever the parent
renders this component (which only happens with non-null info).
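A compressed sketch of the change; the component and prop names are
stand-ins, only the guard / `open` relationship is the point.
```tsx
import React from "react";

function DetailsModal(props: { open?: boolean; info: { id: string } }) {
  return props.open ? <div>{props.info.id}</div> : null;
}

export function InfoModalWrapper({ info }: { info: { id: string } | null }) {
  if (!info) return null; // narrows `info` to non-null from here on
  // Before: open={info !== null} always evaluated to true, hence the
  // finding. The JSX boolean shorthand says the same thing without the
  // dead comparison.
  return <DetailsModal open info={info} />;
}
```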
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Critical follow-up to PR #2126's review. Two real bugs:
1. **Runtime QUEUED never resolved.** Platform's drain stitch updates
the platform's delegate_result row when a queued delegation finally
completes, but never pushes back to the runtime. The LLM polling
check_delegation_status saw status="queued" forever — combined with
the new docstring guidance ("queued → wait, peer will reply"), the
model would wait indefinitely on a state that never resolves.
Strictly worse than pre-PR behavior where it would have at least
bypassed.
2. **Live updates dead code.** delegation.go writes activity rows by
direct INSERT INTO activity_logs, bypassing the LogActivity helper
that fires ACTIVITY_LOGGED. Adding "delegation" to the canvas's
ACTIVITY_LOGGED filter (PR #2126 first cut) was inert — initial
GET worked, live updates did not.
Fix:
(1) Runtime side, workspace/builtin_tools/delegation.py:
- New `_refresh_queued_from_platform(task_id)` async helper that
pulls /workspaces/<self>/delegations and finds the platform-side
delegate_result row for our task_id.
- check_delegation_status calls _refresh when local status is
QUEUED, so the LLM's poll itself drives state convergence.
- Best-effort: GET failure leaves local state untouched, next
poll retries.
- Docstring updated to reflect the actual behavior ("polls
transparently — keep polling and you'll see the flip").
- 4 new tests cover: QUEUED → completed via refresh; QUEUED →
failed via refresh; refresh keeps QUEUED when platform hasn't
resolved; refresh swallows network errors safely.
(2) Canvas side, AgentCommsPanel.tsx WS push handler:
- Listens for DELEGATION_SENT / DELEGATION_STATUS / DELEGATION_COMPLETE
/ DELEGATION_FAILED in addition to ACTIVITY_LOGGED.
- Each event's payload synthesized into an ActivityEntry shape
so toCommMessage's existing delegation branch maps it. Status
derived: STATUS uses payload.status, COMPLETE → "completed",
FAILED → "failed", SENT → "pending".
- The ACTIVITY_LOGGED branch keeps the "delegation" type accepted
as a no-op-today / future-proof path: if delegation handlers
are ever refactored to call LogActivity, this lights up
automatically without another canvas change.
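A sketch of that synthesis step. The event names and derived statuses
come from this commit; the payload and ActivityEntry field names beyond
task_preview / response_preview / status are assumptions.
```typescript
type DelegationEvent =
  | { type: "DELEGATION_SENT"; payload: { task_preview?: string } }
  | { type: "DELEGATION_STATUS"; payload: { status: string } }
  | { type: "DELEGATION_COMPLETE"; payload: { response_preview?: string } }
  | { type: "DELEGATION_FAILED"; payload: { error?: string } };

// Synthesize an ActivityEntry-like shape so toCommMessage's existing
// delegation branch maps live-pushed events exactly like GET rows.
function toActivityEntry(ev: DelegationEvent) {
  switch (ev.type) {
    case "DELEGATION_SENT":
      return { method: "delegate", status: "pending",
               request_body: { task: ev.payload.task_preview } };
    case "DELEGATION_STATUS":
      return { method: "delegate_result", status: ev.payload.status,
               response_body: {} };
    case "DELEGATION_COMPLETE":
      return { method: "delegate_result", status: "completed",
               response_body: { response_preview: ev.payload.response_preview } };
    case "DELEGATION_FAILED":
      return { method: "delegate_result", status: "failed",
               response_body: { error: ev.payload.error } };
  }
}
```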
Doesn't change: the docstring guidance ("queued → wait, don't bypass")
is now actually load-bearing because the refresh path will deliver
the eventual outcome. Without the refresh, the guidance was a trap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review-feedback follow-up. Pre-fix, A2A_IDLE_TIMEOUT_SECONDS=foo or =-30
fell back to the default with zero log signal — operator sets the wrong
value, sees "no effect," wastes hours debugging "why is my override not
working." Now bad-input cases log a clear message naming the variable,
the bad value, and the default applied.
Refactor: extract parseIdleTimeoutEnv(string) → time.Duration so the
parse logic is unit-testable. defaultIdleTimeoutDuration is a const so
tests reference it without re-deriving the value.
8 new unit tests cover empty / valid / negative / zero / non-numeric /
float / trailing-units inputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two compounding bugs caused the "context canceled" wave on 2026-04-26
(15+ failed user/agent A2A calls in 1hr across 6 workspaces, including
the user's "send it in the chat" message that the director never
received):
1. **a2a_proxy.go:applyIdleTimeout cancels the dispatch after 60s of
broadcaster silence** for the workspace. Resets on any SSE event
for the workspace, fires cancel() if no event arrives in time.
2. **registry.go:Heartbeat broadcast was conditional** —
`if payload.CurrentTask != prevTask`. The runtime POSTs
/registry/heartbeat every 30s, but if current_task hasn't changed
the handler emits ZERO broadcasts. evaluateStatus only broadcasts
on online/degraded transitions — also no-op when steady.
Net: a claude-code agent on a long packaging step or slow tool call
keeps the same current_task for >60s → no broadcasts → idle timer
fires → in-flight request cancelled mid-flight with the "context
canceled" error the user sees in the activity log.
Fix:
(a) Heartbeat handler always emits a `WORKSPACE_HEARTBEAT` BroadcastOnly
event (no DB write — same path as TASK_UPDATED). At the existing 30s
runtime cadence this resets the idle timer twice per minute.
Cost is one in-memory channel send per active SSE subscriber + one
WS hub fan-out per heartbeat — far below any noise floor.
(b) idleTimeoutDuration default bumped 60s → 5min as a safety net for
any future regression where the heartbeat path goes silent (e.g.
runtime crashed mid-request before its next heartbeat). Made
env-overridable via A2A_IDLE_TIMEOUT_SECONDS for ops who want to
tune (canary tests want fail-fast, prod tenants with slow plugins want
longer). Either fix alone closes today's gap; together they are
defence in depth.
The runtime side already POSTs /registry/heartbeat every 30s via
workspace/heartbeat.py — no runtime change needed.
Test: TestHeartbeatHandler_AlwaysBroadcastsHeartbeat pins the property
that an SSE subscriber observes a WORKSPACE_HEARTBEAT broadcast on a
same-task heartbeat (the regression scenario). All 16 existing handler
tests still pass.
Doesn't fix: task #102 (single SDK session bottleneck) — peers will
still queue when busy. But this PR ensures the queue/wait flow
actually completes instead of being killed by the idle timer
mid-wait.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs that compounded into the "Director does the work itself" UX:
1. workspace/builtin_tools/delegation.py: _execute_delegation only
handled HTTP 200 in the response branch. When the peer's a2a-proxy
returned HTTP 202 + {queued: true} (single-SDK-session bottleneck
on the peer), the loop fell through. Two iterations later the
`if "error" in result` check tried to access an unbound `result`,
the coroutine ended quietly, and the delegation stayed at FAILED
with error="None". The LLM checking status saw "failed" + the
platform's "Delegation queued — target at capacity" log line in
chat context, concluded the peer was permanently unavailable, and
bypassed delegation to do the work itself.
Fix: explicit 202+queued branch. Adds DelegationStatus.QUEUED,
marks the local delegation as QUEUED, mirrors to the platform,
and returns cleanly without retrying. The retry loop is for
transient transport errors — queueing is a real ack, not a failure
to retry against (retrying would just re-queue the same task).
check_delegation_status docstring extended with explicit per-status
guidance: pending/in_progress → wait, queued → wait (peer busy on
prior task, reply WILL arrive), completed → use result, failed →
real error in error field; only fall back on failed, never queued.
2. canvas/src/components/tabs/chat/AgentCommsPanel.tsx: filter dropped
every delegation row because it whitelisted only a2a_send /
a2a_receive. activity_type='delegation' rows (written by the
platform's /delegate handler with method='delegate' or
'delegate_result') never reached toCommMessage. User saw "No
agent-to-agent communications yet" while 6+ delegations existed
in the DB.
Fix: include "delegation" in both the initial filter and the
WS push filter, plus a delegation branch in toCommMessage that
maps the row as outbound (always — platform proxies on our behalf)
and uses summary as the primary text source.
Tests:
- 3 new Python tests cover the 202+queued path: status becomes
QUEUED not FAILED; no retry on queued (counted by URL match
against the A2A target since the mock is shared across all
AsyncClient calls); bare 202 without {queued:true} still
falls through to the existing retry-then-FAILED path.
- 3 new TS tests cover the delegation mapper: 'delegate' row
maps as outbound to target with summary text; queued
'delegate_result' preserves status='queued' (load-bearing for
the LLM's wait-vs-bypass decision); missing target_id returns
null instead of rendering a ghost.
Does NOT solve: the underlying single-SDK-session bottleneck that
causes peers to queue in the first place. Tracked as task #102
(parallel SDK sessions per workspace) — real architectural work.
This PR makes the runtime handle the queueing correctly so the LLM
doesn't bail out, and makes the delegations visible in Agent Comms
so operators can see what's happening.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
## What was broken
`canary-staging.yml`'s teardown safety-net step filtered candidate
slugs with `f'e2e-{today}-canary-'`. But `test_staging_full_saas.sh`
emits canary slugs as `e2e-canary-${date}-${RUN_ID_SUFFIX}` — date
SECOND, mode FIRST. Full-mode slugs are the other way around
(`e2e-${date}-${RUN_ID_SUFFIX}`), and the canary workflow seems to
have been copy-pasted from there without re-checking the slug
generator.
Net effect: the safety-net step ran on every cancelled / failed
canary, hit the CP, got the org list, filtered to zero matches,
and exited cleanly. Every cancelled canary EC2 leaked until the
once-an-hour `sweep-stale-e2e-orgs.yml` cron eventually caught it
(120-min default age threshold means ≥1h leak in the worst case).
## Today's incident
Canary run 24966995140 cancelled at 21:03Z. EC2
`tenant-e2e-canary-20260426-canary-24966` still running 1h25m
later, manually terminated by the CEO. Three earlier cancellations
today (16:04Z, 19:26Z, 20:02Z) hit the same gap — visible as the
hourly canary failure pattern in #2090.
## Fix
- Filter prefix corrected to `e2e-canary-${today}-` (mode FIRST,
date SECOND) to match the actual slug emitter.
- Added per-run scoping (`-canary-${GITHUB_RUN_ID}-` suffix) when
GITHUB_RUN_ID is set, mirroring the per-run scoping the
e2e-staging-saas.yml safety net gained after the 2026-04-21
cross-run cleanup incident. This guards against a queued canary's
safety-net step deleting a different canary's in-flight slug when
the queue's `cancel-in-progress: false` lets two runs reach the
teardown step concurrently.
- Added a comment block tracing the bug + the prior incident so
the next maintainer doesn't re-introduce the same mistake.
## Test plan
- [x] Manual trace: today's slug `e2e-canary-20260426-canary-24966...`
now matches `e2e-canary-20260426-canary-24966` prefix
- [x] YAML parses
- [ ] Next canary cancellation cleans up automatically
## Companion PR
The PRIMARY symptom (TLS-timeout failures, not the leaked EC2)
traces to a separate bug in `molecule-controlplane`: tunnel/DNS
creation errors are logged-and-continued rather than failing
provision. PR coming separately.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two paper cuts the fix addresses:
1. nuke-and-rebuild.sh wipes the compose stack but never re-populates
workspace-configs-templates/, org-templates/, or plugins/. Those dirs
are .gitignored — the curated set lives in manifest.json as external
repos cloned via clone-manifest.sh (idempotent). Without that step,
a fresh checkout or a post-deletion run leaves the dirs empty, which
silently hides the entire template palette in Canvas + falls back to
bare default workspace provisioning. Symptom: "Deploy your first
agent" shows zero templates.
2. The existing ws-* container reap was already in the script (good),
but it only fires when this script runs. Folks running `docker compose
down -v` directly leave orphan ws-* containers behind. Documented
that explicitly in the script comment so future readers understand
why those lines are critical.
The fix is just `bash clone-manifest.sh` added to the script.
clone-manifest.sh is idempotent — populated dirs short-circuit, so a
re-nuke on a healthy machine pays only a few stat calls.
scripts/test-nuke-and-rebuild.sh exercises the canonical workflow end-
to-end:
- plants a fake orphan ws-* container, then asserts it gets reaped
- renames the manifest dirs to simulate a fresh checkout, then
asserts they get repopulated
- waits for /health and asserts the platform sees the same template
count on disk as via /configs in the container (catches bind-mount
drift)
- asserts the image-auto-refresh watcher (PR #2114) starts, since
that's load-bearing for the CD chain users now rely on
The test pre-flights port 5432/6379/8080 and exits 0 with a SKIP
message if a non-target compose project is holding them — common when
parallel monorepo checkouts coexist on one Docker daemon.
scripts/ is intentionally outside CI shellcheck per ci.yml comment, but
both files pass `shellcheck --severity=warning` anyway.
Defers but does not solve the runtime root-cause for orphan ws-* after
plain `docker compose down -v`: the orphan-sweeper in the platform only
reaps containers whose workspace row says status='removed', so a wiped
DB → no row → sweeper ignores them. Proper fix needs container labels
keyed to a per-platform-instance UUID so the sweeper can confidently
reap "containers I provisioned that aren't in my DB anymore" without
nuking a sibling platform's containers on a shared daemon. Tracked as
task #109's follow-up; out of scope for this PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2107 bumped the bash-side TLS-readiness deadline in
tests/e2e/test_staging_full_saas.sh from 600s to 900s (15 min) AND
added a diagnostic burst on the fail path so the next failure would
identify the broken layer (DNS / TLS / HTTP). What I missed: the
canary workflow's own timeout-minutes was also 15. So GitHub Actions
killed the job at the 15:00 wall-clock mark BEFORE the bash `fail`
+ diagnostic could fire — every cancellation silent, no failure
comment on #2090, no diagnostic data attached.
Visible in the 21:03 UTC canary run: cancelled at 14:03 step time
(15:18 wall) without ever reaching the diagnostic block.
Bump to 25 min — gives ~10 min headroom over the 15-min bash deadline
for setup (org create + tenant provision + admin token fetch) plus
the diagnostic dump plus teardown. Still tighter than the sibling
staging E2E jobs (20/40/45 min) so a genuine wedge surfaces here
first.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing sweeper only reaps ws-* containers whose workspace row
has status='removed'. That misses the entire wiped-DB case: an
operator does `docker compose down -v` (kills the postgres volume),
the previous platform's ws-* containers keep running, the new
platform boots into an empty workspaces table — first pass finds
zero candidates and those containers leak forever. Symptom users
hit today: 7 ws-* containers from 11h ago, no rows in DB, no
visibility in Canvas, eating CPU + memory.
Fix shape:
1. Provisioner stamps every ws-* container + volume with
`molecule.platform.managed=true`. Without a label, the sweeper
would have to assume any unlabeled ws-* container might belong
to a sibling platform stack on a shared Docker daemon.
2. Provisioner exposes ListManagedContainerIDPrefixes — a label-filter
counterpart to the existing name-filter.
3. Sweeper splits sweepOnce into two independent passes:
- sweepRemovedRows (unchanged behavior; status='removed' only)
- sweepLabeledOrphansWithoutRows (new; labeled containers whose
workspace_id has no row in the table at all)
Each pass has its own short-circuit so an empty result or transient
error in one doesn't block the other — load-bearing because the
wiped-DB pass exists precisely for cases where the removed-row
pass finds nothing.
Safe under multi-platform-on-shared-daemon: only containers carrying
our label get reaped, sibling stacks' containers are invisible to this
pass. (For now the label is a constant string; a future per-instance
UUID layer can refine "ours" further if a real shared-daemon scenario
emerges.)
Migration: existing platforms running pre-PR builds have UNLABELED
ws-* containers. After this lands they continue to NOT be reaped by
the new path (no label = invisible). They'll only be cleaned via
manual intervention or once the operator recreates them — same as
today. No regression.
Tests cover all five branches of the new pass: happy-path reap,
no-reap when row exists, mixed reap-some-keep-some, Docker error
short-circuits cleanly, non-UUID prefixes get filtered before the
SQL query.
Pairs with PR #2122 (script-level fix). Together they close the
orphan-leak path for both `bash scripts/nuke-and-rebuild.sh` users
(handled by the script) AND `docker compose down -v` users (handled
by the runtime).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
Closes the first item from #2071 (Canvas test gaps follow-up):
adds behavioural coverage for the shared template-deploy hook that
both TemplatePalette (sidebar) and EmptyState (welcome grid) drive.
10 cases across 4 buckets:
**Happy path (4):**
- preflight ok → POST /workspaces → onDeployed fires with new id
- caller-supplied canvasCoords flows into the POST body
- default coords fall in [100,500) × [100,400) when canvasCoords omitted
- template.runtime is preferred over the resolveRuntime fallback
(locks the deduped-fallback table contract added in #2061)
**Preflight failures (2):**
- network throw sets error AND clears `deploying` (regression test
for the "stranded button" bug called out in the SUT's inline
comment — drop the try block and you'll fail this test)
- not-ok-with-missing-keys opens the modal without firing POST
**Modal lifecycle (2):**
- 'keys added' click retries POST without re-running preflight
(verifies the executeDeploy / deploy split — preflight call count
stays at 1, POST count goes to 1)
- 'cancel' click closes modal without firing POST
**POST failures (2):**
- Error rejection surfaces the message
- non-Error rejection surfaces the "Deploy failed" fallback
Mocks `@/lib/api`, `@/lib/deploy-preflight`, and `@/components/MissingKeysModal`
(stand-in component exposes the two callbacks as test-id buttons —
the real radix modal is irrelevant to this hook's behavior). Test
file follows the `vi.hoisted` + import-after-mocks pattern from
`canvas/src/app/__tests__/orgs-page.test.tsx`.
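For reference, a sketch of the scaffolding plus one representative case
(the stranded-button regression). The mocked module paths are the ones
listed above; the mocked export names, the hook's import path, and its
return shape are assumptions.
```typescript
import { renderHook, act } from "@testing-library/react";
import { describe, expect, it, vi } from "vitest";

const { preflightMock, postMock } = vi.hoisted(() => ({
  preflightMock: vi.fn(),
  postMock: vi.fn(),
}));
vi.mock("@/lib/deploy-preflight", () => ({ runPreflight: preflightMock }));
vi.mock("@/lib/api", () => ({ post: postMock }));

// Import after the mocks so the hook picks up the mocked modules.
import { useTemplateDeploy } from "../useTemplateDeploy";

describe("useTemplateDeploy", () => {
  it("clears `deploying` when preflight throws (stranded-button regression)", async () => {
    preflightMock.mockRejectedValueOnce(new Error("network down"));
    const { result } = renderHook(() => useTemplateDeploy());
    await act(async () => { await result.current.deploy({ id: "tmpl-1" }); });
    expect(result.current.error).toBeTruthy();
    expect(result.current.deploying).toBe(false);
    expect(postMock).not.toHaveBeenCalled();
  });
});
```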
## Test plan
- [x] All 10 cases pass locally (`vitest run useTemplateDeploy.test.tsx`)
- [x] No changes to the SUT — pure additive coverage
- [ ] CI green
Follow-ups for the rest of #2071 (separate PRs):
- A2AEdge rendering + click-to-select-source
- OrgCancelButton cancel flow + optimistic state
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Molecule-Platform-Evolvement-Manager]
## What was breaking
Two distinct failure modes in `.github/workflows/secret-scan.yml`,
both visible after PR #2115 / #2117 hit the merge queue:
1. **`merge_group` events**: the script reads `github.event.before /
after` to determine BASE/HEAD. Those properties only exist on
`push` events. On `merge_group` events both came back empty, the
script fell through to "no BASE → scan entire tree" mode, and
false-positived on `canvas/src/lib/validation/__tests__/secret-formats.test.ts`
which contains a `ghp_xxxx…` literal as a masking-function fixture.
(Run 24966890424 — exit 1, "matched: ghp_[A-Za-z0-9]{36,}".)
2. **`push` events with shallow clone**: `fetch-depth: 2` doesn't
always cover BASE across true merge commits. When BASE is in the
payload but absent from the local object DB, `git diff` errors
out with `fatal: bad object <sha>` and the job exits 128.
(Run 24966796278 — push at 20:53Z merging #2115.)
## Fixes
- Add a dedicated fetch step for `merge_group.base_sha` (mirrors
the existing pull_request base fetch) so the diff base is in the
object DB before `git diff` runs.
- Move event-specific SHAs into a step `env:` block so the script
uses a clean `case` over `${{ github.event_name }}` instead of
a single `if pull_request / else push` that left merge_group on
the empty branch.
- Add an on-demand fetch for the push-event BASE when it isn't in
the shallow clone, plus a `git cat-file -e` guard before the
diff so we fall through cleanly to the "scan entire tree" path
if the fetch fails (correct, just slower) instead of exiting 128.
## Defense-in-depth
`secret-formats.test.ts` had two literal continuous-string fixtures
(`'ghp_xxxx…'`, `'github_pat_xxxx…'`). The ghp_ one matched the
secret-scan regex. Switched both to the `'prefix_' + 'x'.repeat(N)`
pattern already used elsewhere in the same file — runtime value is
the same, but the literal source text no longer matches the regex
even if the BASE detection ever falls back to tree-scan mode again.
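The fixture pattern, sketched (names and repeat counts here are
illustrative, not the exact fixtures): the runtime value is unchanged,
but the source text no longer contains a contiguous token-shaped
literal for a tree-scan to match.
```typescript
// before: a literal token-shaped string in source, which the tree-scan
// fallback flags even though it is only a masking-function fixture
// after: same runtime value, assembled so the source never matches
const fakeGhpToken = "ghp_" + "x".repeat(36);
const fakePatToken = "github_pat_" + "x".repeat(40);
```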
## Test plan
- [x] No remaining regex matches in the secret-formats.test.ts source
- [x] YAML structure preserved
- [ ] CI passes on this PR's pull_request scan (was already passing)
- [ ] CI passes on this PR's merge_group scan (the new path)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to #2110 (which generalised pruneStaleKeys to Map<string, T>).
Identified by the simplify reviewer on that PR as the only other
in-tree caller of the same shape: `for (const id of map.keys()) { if
(!liveIds.has(id)) map.delete(id); }`.
Net: -3 lines, one less hand-rolled GC loop. No behaviour change —
the helper does exactly what the inline block did.
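For reference, the shape of the shared helper and the call-site change
(signature assumed from the inline loop quoted above).
```typescript
export function pruneStaleKeys<T>(map: Map<string, T>, liveIds: Set<string>): void {
  for (const id of map.keys()) {
    if (!liveIds.has(id)) map.delete(id);
  }
}

// before: for (const id of map.keys()) { if (!liveIds.has(id)) map.delete(id); }
// after:  pruneStaleKeys(map, liveIds);
```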
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify pass on top of #2069 fix:
- Export FALLBACK_POLL_MS from canvas/src/store/socket.ts and import
it as TOMBSTONE_TTL_MS in deleteTombstones.ts. Single source of
truth — tuning one without the other would silently re-open the
hydrate-races-delete window. Required-fix per simplify reviewer.
- Compress deleteTombstones.ts docstring from 30 lines to 10 — keep
the "what + why module-level"; drop the long-form problem
description (issue #2069 carries it).
- Compress canvas.ts call-site comments at removeSubtree (4 lines →
2) and hydrate (2 lines → 2 but tighter).
- Don't reassign the workspaces parameter inside hydrate — use a
const `live` and thread it through the two downstream calls
(computeAutoLayout, buildNodesAndEdges). Same effect, no lint
smell.
- Trim the canvas.test.ts integration-test preamble.
No behaviour change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #2069. removeSubtree dropped a parent + descendants locally
after DELETE returned 200, but a GET /workspaces request that was
IN-FLIGHT before the DELETE completed could land AFTER and hydrate
the store with a stale snapshot — re-introducing the deleted nodes
on the canvas until the next 10s fallback poll corrected it.
New module canvas/src/store/deleteTombstones.ts holds a transient
process-lifetime Map<id, deletedAt>. removeSubtree calls
markDeleted(removedIds); hydrate calls wasRecentlyDeleted(id) to
filter the incoming workspaces. TTL is 10s — matches the WS-fallback
poll cadence so a single round-trip is covered, after which a
legitimately re-imported id flows through normally.
GC happens lazily at every read AND at write time so the map stays
bounded — no separate timer / interval / unmount plumbing.
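A sketch of the module's behavior as described above; the exported
names match the commit text, the internals are illustrative.
```typescript
const TOMBSTONE_TTL_MS = 10_000; // matches the WS-fallback poll cadence
const tombstones = new Map<string, number>(); // id → deletedAt (ms epoch)

function gc(now: number): void {
  for (const [id, deletedAt] of tombstones) {
    if (now - deletedAt > TOMBSTONE_TTL_MS) tombstones.delete(id);
  }
}

export function markDeleted(ids: Iterable<string>, now = Date.now()): void {
  gc(now); // GC at write time keeps the map bounded
  for (const id of ids) tombstones.set(id, now);
}

export function wasRecentlyDeleted(id: string, now = Date.now()): boolean {
  gc(now); // ...and at read time, so no timer / unmount plumbing is needed
  const deletedAt = tombstones.get(id);
  return deletedAt !== undefined && now - deletedAt <= TOMBSTONE_TTL_MS;
}
```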
Tests:
- canvas/src/store/__tests__/deleteTombstones.test.ts: 7 cases
covering immediate flag, never-marked, TTL boundary (9999ms vs
10001ms), GC-on-read, GC-on-write, re-mark resets timestamp,
iterable input.
- canvas/src/store/__tests__/canvas.test.ts: end-to-end "hydrate
cannot resurrect ids that removeSubtree just dropped (#2069)"
exercises the full chain at the store level.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up the GHCR digest watcher added in PR #2114 with no operator
action: just `docker compose up` and the platform self-heals to the
latest workspace-template image within 5 minutes of publish.
Default ON for local dev because that's where the runtime → workspace
iteration loop is tightest. .env.example documents the override knob
for the rare "running a long test that shouldn't be disturbed by a
publish" case.
Co-authored-by: Hongming Wang <hongmingwangalt@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After landing the 1-required-review gate on staging in cycle 24, every
agent-authored PR sits with `REVIEW_REQUIRED` until someone notices.
CODEOWNERS solves the routing half: every changed path matches `*`, so
GitHub auto-requests review from @hongmingwang-moleculeai (the
personal account, separate from the HongmingWang-Rabbit agent
identity). PRs land in the personal account's notification queue
automatically.
The `* @hongmingwang-moleculeai` line is informational (route the
request) rather than enforced — branch protection's
require_code_owner_reviews flag is off, so any approving review still
satisfies the 1-review gate. Flip that on later if you want CODEOWNERS
approval to be the *required* review type.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>