molecule-core

Author	SHA1	Message	Date
Hongming Wang	d10c1a1a36	Merge pull request #2848 from Molecule-AI/feat/2799-phase3-pause-1777961500 feat(handlers): migrate Pause loop to StopWorkspaceAuto — #2799 Phase 3 (closes #2799)	2026-05-05 07:03:31 +00:00
Hongming Wang	61b7755c3c	feat(handlers): migrate Pause loop to StopWorkspaceAuto — #2799 Phase 3 Last open #2799 site. Pause's per-workspace stop call now routes through StopWorkspaceAuto, removing the final inline if-cpProv-else (actually if-h.provisioner) dispatch from workspace_restart.go's restart/pause/resume code paths. Pre-2026-05-05 the Pause loop was: if h.provisioner != nil { h.provisioner.Stop(ctx, ws.id) } Same drift class as #2813 (team-collapse leak) + #2814 (workspace delete leak) — Docker-only stop silently no-ops on SaaS, leaving the EC2 running while the workspace row gets marked paused. Orphan sweeper would catch it eventually but the leak window is real. Pause-specific bookkeeping (mark paused, clear workspace keys, broadcast WORKSPACE_PAUSED) stays inline in the handler; only the "stop the running workload" step delegates. StopWorkspaceAuto's no-backend → no-op semantics match the pre-fix behavior on misconfigured deployments (the bookkeeping still runs). One new source-level pin: TestPauseHandler_UsesStopWorkspaceAuto — gates regression to the inline dispatch shape. This closes #2799 Phase 3. After this PR + #2847 (Phase 2 PR-B) land, workspace_restart.go has no remaining inline if-cpProv-else dispatch in any user-facing code path. The remaining direct backend calls inside the file are in stopForRestart and cpStopWithRetry — both internal helpers that ARE the dispatcher's underlying primitives, not new bypasses. Note: scope was originally tagged "Phase 3 needs PauseWorkspaceAuto verb" in the audit on PR #2843. On closer reading Pause's stop step is identical to Stop — only the bookkeeping is Pause-specific. Reusing StopWorkspaceAuto avoids unnecessary surface and keeps the dispatcher trio (provision/stop/restart) tight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 00:00:16 -07:00
Hongming Wang	21a7e7b0e7	Merge pull request #2847 from Molecule-AI/feat/2799-phase2b-runrestart-cycle-1777960000 feat(handlers): provisionWorkspaceAutoSync + Site 4 migration — #2799 Phase 2 PR-B	2026-05-05 06:53:18 +00:00
Hongming Wang	9a772bf946	feat(handlers): provisionWorkspaceAutoSync + Site 4 migration — #2799 Phase 2 PR-B runRestartCycle's auto-restart cycle (Site 4 from PR #2843's audit) needs synchronous provision dispatch — the outer pending-flag loop in RestartByID relies on returning when the new container is up so the next restart cycle doesn't race the in-flight provision goroutine on its Stop call. Phase 1's provisionWorkspaceAuto wraps each per-backend body in `go func() {...}()` — wrong shape for runRestartCycle's needs. This PR introduces provisionWorkspaceAutoSync as a behavioral mirror that runs in the current goroutine instead. Two helpers, kept identical except for the wrapper: provisionWorkspaceAuto spawns a goroutine and returns immediately; provisionWorkspaceAutoSync blocks until the per-backend body returns. Same backend-selection (CP first, Docker second) + no-backend mark-failed fallback. When one grows a new arm (third backend, retry semantics), the other should too — pinned in the docstring. Site 4 (runRestartCycle) was the only call site that needs sync today. Migrating it removes the last bare if-cpProv-else dispatch in the restart code path's provision half. Three new tests: - TestProvisionWorkspaceAutoSync_RoutesToCPWhenSet - TestProvisionWorkspaceAutoSync_NoBackendMarksFailed - TestRunRestartCycle_UsesProvisionWorkspaceAutoSync (source-level pin) Out of scope (last open #2799 site): Phase 3 — Site 5 (Pause loop). PAUSE doesn't reprovision; needs a new PauseWorkspaceAuto verb. After this PR lands, Pause is the only inline if-cpProv-else dispatch left in workspace_restart.go. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 23:44:54 -07:00
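The sync/async pair can be sketched as one body run either in the caller's goroutine or behind a `go` wrapper. In this sketch, `provisionBody` is a hypothetical stand-in for the per-backend arm. The returned `*sync.WaitGroup` is an addition made only so callers and tests can observe completion. The commit keeps two separate, mirrored helpers; deriving the async one from the sync one, as below, is one alternative way to keep their arms from drifting.

```go
package main

import "sync"

// provisionBody stands in for the per-backend provisioning body
// (CP first, Docker second, mark-failed fallback); hypothetical signature.
type provisionBody func(workspaceID string)

// provisionAutoSync runs the body in the caller's goroutine and returns only
// after it finishes, so a restart loop cannot race the in-flight provision.
func provisionAutoSync(body provisionBody, id string) {
	body(id)
}

// provisionAuto is the fire-and-return variant: the same body, wrapped in a goroutine.
func provisionAuto(body provisionBody, id string) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		provisionAutoSync(body, id)
	}()
	return &wg
}
```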
Hongming Wang	0a90d7ae1a	Merge pull request #2846 from Molecule-AI/feat/2799-phase2-restart-resume-1777958000 feat(handlers): migrate Restart + Resume handlers to dispatchers — #2799 Phase 2 PR-A	2026-05-05 05:15:29 +00:00
Hongming Wang	5b7f4d260b	feat(handlers): migrate Restart + Resume handlers to dispatchers — #2799 Phase 2 PR-A Sites 1+2 (Restart HTTP handler goroutine) and Site 3 (Resume HTTP handler goroutine) now route through RestartWorkspaceAutoOpts / provisionWorkspaceAuto instead of inlining the if-cpProv-else dispatch. Three changes: 1. RestartWorkspaceAutoOpts — new variant of RestartWorkspaceAuto that carries the resetClaudeSession Docker-only flag (issue #12). The bare RestartWorkspaceAuto still exists as a wrapper that calls Opts with false. CP path silently ignores the flag (each EC2 boots fresh — no session state to clear). Mirrors the Provision pair (provisionWorkspace / provisionWorkspaceOpts). 2. Restart handler (Site 1+2) — the inline goroutine `if h.provisioner != nil { Stop } else if h.cpProv != nil { ... }` collapses to `RestartWorkspaceAutoOpts(...)`. Pre-fix the dispatch was Docker-FIRST ordering (a different drift class from the silent-drop bugs PRs #2811/#2824 closed); the dispatcher enforces CP-FIRST. 3. Resume handler (Site 3) — Resume is provision-only (workspace is paused, no live container), so it routes through provisionWorkspaceAuto, not RestartWorkspaceAuto. Inline if-cpProv-else dispatch removed. Two new source-level pins: - TestRestartHandler_UsesRestartWorkspaceAuto - TestResumeHandler_UsesProvisionWorkspaceAuto These prevent regression to the inline dispatch pattern. Out of scope (tracked under #2799): - Site 4 (runRestartCycle) — synchronous coordination model needs a different shape than the fire-and-return dispatchers. PR-B. - Site 5 (Pause loop) — PAUSE doesn't reprovision, needs a new PauseWorkspaceAuto verb. Phase 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:09:12 -07:00
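The Opts/bare pairing described above can be shown with a small sketch. Here `restartOpts`, `backends`, and the returned action strings are all hypothetical. The point is only the shape: CP-first dispatch, a Docker-only flag that the CP arm ignores, and a bare wrapper that defaults the flag to false.

```go
package main

import "fmt"

// restartOpts carries the Docker-only flag alongside the dispatch (hypothetical type).
type restartOpts struct {
	ResetClaudeSession bool
}

// backends records which backends are configured (hypothetical).
type backends struct {
	hasCP, hasDocker bool
}

// restartAutoOpts returns a description of the action taken, dispatching CP-first.
func restartAutoOpts(b backends, id string, opts restartOpts) string {
	switch {
	case b.hasCP:
		// Each EC2 boots fresh, so the session-reset flag is ignored on this arm.
		return fmt.Sprintf("cp restart %s", id)
	case b.hasDocker:
		return fmt.Sprintf("docker restart %s reset=%t", id, opts.ResetClaudeSession)
	default:
		return fmt.Sprintf("mark failed %s", id)
	}
}

// restartAuto is the bare wrapper: same dispatch with the flag defaulted to false.
func restartAuto(b backends, id string) string {
	return restartAutoOpts(b, id, restartOpts{})
}
```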
Hongming Wang	f0fd7b4d9e	Merge pull request #2845 from Molecule-AI/feat/rfc2829-enable-server-side feat(delegations): wire RFC #2829 sweeper + admin routes into platform server	2026-05-05 05:04:11 +00:00
Hongming Wang	7993693cf1	feat(delegations): wire RFC #2829 sweeper + admin routes into platform server Activates the server-side foundation that PRs #2832, #2836, #2837 shipped without wiring (each PR landed dead code on purpose so the review surface stayed tight). ## What this PR wires up 1. router.go — registers the RFC #2829 PR-4 admin endpoints behind AdminAuth: GET /admin/delegations[?status=...&limit=N] GET /admin/delegations/stats 2. cmd/server/main.go — starts the RFC #2829 PR-3 stuck-task sweeper as a supervised goroutine alongside the existing scheduler + hibernation-monitor + image-auto-refresh: go supervised.RunWithRecover(ctx, "delegation-sweeper", delegSweeper.Start) ## What this PR does NOT do - PR-2's DELEGATION_RESULT_INBOX_PUSH flag stays default off — flip happens via env config in a follow-up after staging burn-in. - PR-5's DELEGATION_SYNC_VIA_INBOX flag stays default off — same reason. The two flags are independent; either can be flipped in isolation. - Canvas operator panel UI: this PR exposes the JSON contract; the canvas panel consumes it in a separate canvas PR. ## Coverage 2 new router gate tests in admin_delegations_route_test.go: - List endpoint requires AdminAuth (unauthenticated → 401) - Stats endpoint requires AdminAuth (unauthenticated → 401) Pattern mirrors admin_test_token_route_test.go (the IDOR-fix gate for PR #112). Catches a future router refactor that silently drops AdminAuth — operator dashboard data exposes caller_id, callee_id, and task_preview, none of which should reach unauthenticated callers. Sweeper boots as a no-op until at least one delegation row exists, so this PR is safe to land before PR-5's agent-side cutover sees production traffic. Refs RFC #2829.	2026-05-04 22:00:59 -07:00
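The router gate test pattern ("unauthenticated → 401") can be sketched with the standard library alone. `adminAuth` below is a hypothetical bearer-token middleware, not the project's AdminAuth. The token source, header format, and handler bodies are assumptions. `statusFor` issues an in-process request the way a route gate test would.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"net/http/httptest"
	"strings"
)

// adminAuth is a hypothetical bearer-token gate wrapped around an admin handler.
func adminAuth(token string, next http.Handler) http.Handler {
	const prefix = "Bearer "
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		auth := r.Header.Get("Authorization")
		if token == "" || !strings.HasPrefix(auth, prefix) ||
			subtle.ConstantTimeCompare([]byte(auth[len(prefix):]), []byte(token)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newAdminMux registers both delegation endpoints behind the gate.
func newAdminMux(token string) *http.ServeMux {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{}`))
	})
	mux := http.NewServeMux()
	mux.Handle("/admin/delegations", adminAuth(token, ok))
	mux.Handle("/admin/delegations/stats", adminAuth(token, ok))
	return mux
}

// statusFor issues an in-process GET and returns the response status code.
func statusFor(h http.Handler, path, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}
```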
Hongming Wang	789d705866	Merge pull request #2843 from Molecule-AI/fix/restart-dispatcher-rework-1777956000 feat(handlers): RestartWorkspaceAuto dispatcher — #2799 Phase 1 (re-do of #2835)	2026-05-05 04:48:52 +00:00
Hongming Wang	cb820acbd6	fix(test): pre-register sqlmock for panic-recovered Docker test goroutine	2026-05-04 21:44:31 -07:00
Hongming Wang	52915268b2	Merge pull request #2844 from Molecule-AI/feat/rfc2829-pr5-agent-side-cutover feat(delegations): agent-side cutover — sync delegate uses async+poll path (RFC #2829 PR-5)	2026-05-05 04:35:55 +00:00
Hongming Wang	82e7059e0e	Merge pull request #2842 from Molecule-AI/fix/codex-template-bump-cli-pin fix(external-templates): unpin codex CLI from stale ^0.57	2026-05-05 04:34:14 +00:00
Hongming Wang	5950d4cd81	feat(delegations): agent-side cutover — sync delegate uses async+poll path (RFC #2829 PR-5) Behind feature flag DELEGATION_SYNC_VIA_INBOX (default off). When set, tool_delegate_task no longer holds an HTTP message/send connection through the platform proxy waiting for the callee's reply. Instead: 1. POST /workspaces/<src>/delegate (returns 202 + delegation_id) — platform's executeDelegation goroutine handles A2A dispatch in the background. No client-side timeout dependency on the platform holding a connection open. 2. Poll GET /workspaces/<src>/delegations every 3s for a row with matching delegation_id reaching terminal status (completed/failed). 3. Return the response_preview text on completed; surface the wrapped _A2A_ERROR_PREFIX error on failed (so caller error detection stays unchanged). This closes the bug class that broke Hongming's home hermes on 2026-05-05 ("message/send queued but result not available after 600s timeout" while the callee was actively heartbeating "iteration 14/90"). ## Compatibility Default-off feature flag — flag-off path is byte-identical to the legacy send_a2a_message behavior, pinned by TestFlagOffLegacyPath::test_flag_off_uses_send_a2a_message_not_polling. Idempotency-key derivation matches tool_delegate_task_async (SHA-256 of source:target:task) so a restart-mid-delegation gets the same key and the platform returns the existing delegation_id. ## Recovery on timeout If the polling budget (DELEGATION_TIMEOUT, default 300s) elapses without a terminal status, the error message includes the delegation_id + a "call check_task_status('<id>') to retrieve later" hint. The platform's durable row is still live — work is NOT lost, just the synchronous wait is over. Caller can poll for the result later via the existing check_task_status tool. ## Stack with PR-2 PR-2 added the SERVER-SIDE result-push to the caller's a2a_receive inbox row. PR-5 (this PR) adds the AGENT-SIDE cutover. 
Together they remove the proxy-blocked sync path entirely. PR-2 default-off keeps existing behavior; PR-5 default-off keeps existing behavior. Operators flip both for full effect after staging burn-in. ## Coverage 9 unit tests: - flag off → byte-identical to legacy (send_a2a_message called, _delegate_sync_via_polling NOT called) - dispatch HTTP exception → wrapped error - dispatch non-2xx → wrapped error mentioning HTTP code - dispatch missing delegation_id → wrapped error - completed first poll → response_preview returned - failed status → wrapped error with error_detail - transient poll error → keeps polling, eventually succeeds - deadline exceeded → wrapped timeout error mentions delegation_id + check_task_status hint for recovery - filters by delegation_id (other delegations' rows ignored) All passing locally. CI will run the same suite on a clean env. Refs RFC #2829.	2026-05-04 21:31:11 -07:00
Hongming Wang	1e12ed7e9f	Merge pull request #2833 from Molecule-AI/feat/rfc2829-pr2-result-push-and-sync-cutover feat(delegations): result-push to caller inbox behind feature flag (RFC #2829 PR-2)	2026-05-05 04:30:44 +00:00
Hongming Wang	4f67fe59fb	feat(handlers): RestartWorkspaceAuto dispatcher — #2799 Phase 1 Closes the third silent-drop-on-SaaS class for the restart verb. Two of the three dispatchers were already in place (provisionWorkspaceAuto PR #2811, StopWorkspaceAuto PR #2824); this completes the trio. PR #2835 was an earlier attempt at this work (delivered by a peer agent) that I had to send back for four critical bugs — stop-leg dispatch order inverted, no-backend nil-deref, empty payload (dispatcher unusable by callers), forcing-function tests red-from-day-1. This re-do takes the audit + classification from that work but rebuilds the implementation against the existing dispatcher convention. Phase 1 scope: - RestartWorkspaceAuto in workspace.go — symmetric mirror of provisionWorkspaceAuto + StopWorkspaceAuto. CP-first dispatch order. cpStopWithRetry on the SaaS leg (Restart's "make it alive again" contract justifies the retry that StopWorkspaceAuto's delete-time contract does not). Three-arm shape including a no-backend mark-failed defense-in-depth. - Three new pin tests covering the routing surface: TestRestartWorkspaceAuto_RoutesToCPWhenSet, TestRestartWorkspaceAuto_RoutesToDockerWhenOnlyDocker, TestRestartWorkspaceAuto_NoBackendMarksFailed. Phase 2/3 (deferred, file as follow-up issue): - workspace_restart.go's manual dispatch sites (Restart handler goroutine, Resume handler goroutine, runRestartCycle's inline Stop, Pause loop). Each site has async-context reasoning beyond a fire-and-return dispatcher and needs per-site review. - Pause specifically needs a different verb (PauseWorkspaceAuto) since Pause doesn't reprovision. Why no callers migrated in this PR: the existing call sites in workspace_restart.go all build their `payload` from a synchronous DB read first; rewiring them needs care to preserve that ordering plus the resetClaudeSession + template path resolution that lives in the HTTP handler context. 
Splitting the dispatcher introduction from the migration keeps each PR small and reviewable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 21:30:36 -07:00
Hongming Wang	410275e5af	fix(external-templates): unpin codex CLI from stale ^0.57 `^0.57` only allows 0.57.x — codex CLI is now at 0.128 with breaking API changes between (notably `exec --resume <sid>` → `exec resume <sid>` subcommand). Operators following the snippet today either get a 6-month-old codex with the legacy resume flag, OR install latest manually and discover the daemon previously couldn't drive it. codex-channel-molecule 0.1.2 (just published) handles the new subcommand shape, so operators are best served by always getting the latest codex that the bridge daemon was last validated against. Bump to `@latest`. If a future codex CLI breaks the daemon's invocation again, we ship a new bridge-daemon release rather than asking operators to manage a pin themselves. Test: go test ./internal/handlers/ -run TestExternalTemplates -count=1 → green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 21:27:45 -07:00
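The claim that `^0.57` only allows 0.57.x follows from npm's caret rule. Below 1.0.0, the caret pins the left-most non-zero component, so `^0.57.0` means `>=0.57.0 <0.58.0`. The sketch below models that rule in Go and ignores prerelease tags. The `semver` type is defined here for illustration only.

```go
package main

// semver is a minimal version triple (prerelease tags ignored).
type semver struct{ Major, Minor, Patch int }

func less(a, b semver) bool {
	if a.Major != b.Major {
		return a.Major < b.Major
	}
	if a.Minor != b.Minor {
		return a.Minor < b.Minor
	}
	return a.Patch < b.Patch
}

// caretAllows reports whether v satisfies ^base under npm's caret rule.
func caretAllows(base, v semver) bool {
	if v.Major != base.Major || less(v, base) {
		return false
	}
	switch {
	case base.Major > 0:
		return true // ^1.2.3 := >=1.2.3 <2.0.0
	case base.Minor > 0:
		return v.Minor == base.Minor // ^0.57.0 := >=0.57.0 <0.58.0
	default:
		return v.Minor == 0 && v.Patch == base.Patch // ^0.0.3 := >=0.0.3 <0.0.4
	}
}
```

Under this rule, codex 0.128 can never satisfy `^0.57`, which is why the pin has to move rather than just be refreshed.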
Hongming Wang	1557743ef9	Merge branch 'staging' into feat/rfc2829-pr2-result-push-and-sync-cutover	2026-05-04 21:25:33 -07:00
Hongming Wang	e727b31246	Merge pull request #2841 from Molecule-AI/fix/drift-check-pr-soft-skip-with-warning fix(branch-protection-drift): hard-fail on schedule only — unblock PRs missing the secret	2026-05-05 04:22:52 +00:00
Hongming Wang	ae05f91bd8	Merge pull request #2840 from Molecule-AI/feat/canvas-memory-add-edit-modal feat(canvas/memories): Add + Edit modal for MemoryInspectorPanel	2026-05-05 04:20:51 +00:00
Hongming Wang	c89f17a2aa	fix(branch-protection-drift): hard-fail on schedule only, soft-skip + warn on PR. PR #2834 added a hard-fail when GH_TOKEN_FOR_ADMIN_API is missing on schedule + pull_request + workflow_dispatch. The PR-trigger hard-fail is now blocking every PR in the repo because the secret hasn't been provisioned yet — including the staging→main auto-promote PR (#2831), which has no path to set repo secrets itself. Per feedback_schedule_vs_dispatch_secrets_hardening.md the original concern is automated/silent triggers losing the gate without a human to notice. That concern applies to schedule specifically: - schedule: cron, no human, silent soft-skip = invisible regression → KEEP HARD-FAIL. - pull_request: a human is reviewing the PR diff and will see workflow warnings inline. A PR cannot retroactively drift live state — drift happens between PRs (UI clicks, manual gh api PATCH), which the schedule canary catches. The PR-time gate would only catch typos in apply.sh, which the *_payload unit tests catch more directly. → SOFT-SKIP with a prominent warning. - workflow_dispatch: operator override, may not have configured the secret yet. → SOFT-SKIP with warning. The skip is explicit (SKIP_DRIFT_CHECK=1 surfaced to env, then a step `if:` guard) so it's auditable in the workflow run UI, not silently swallowed. Unblocks #2831 (auto-promote staging→main) + every PR currently behind this check.	2026-05-04 21:20:30 -07:00
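The per-trigger policy can be written as a decision table. This Go sketch is only a model: the real logic lives in workflow YAML (`SKIP_DRIFT_CHECK=1` plus a step `if:` guard). Treating unknown events as a hard failure is an assumption the commit does not state.

```go
package main

// driftAction is what the workflow does when it evaluates the drift check.
type driftAction int

const (
	runCheck driftAction = iota
	hardFail
	softSkipWithWarning
)

// missingSecretPolicy maps the GitHub Actions event name to an action.
func missingSecretPolicy(event string, hasSecret bool) driftAction {
	if hasSecret {
		return runCheck
	}
	switch event {
	case "schedule":
		return hardFail // nobody is watching cron, so a silent skip would hide a lost gate
	case "pull_request", "workflow_dispatch":
		return softSkipWithWarning // a human sees the warning inline
	default:
		return hardFail // unknown triggers fail closed (assumption)
	}
}
```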
Hongming Wang	cbe48c2225	feat(canvas/memories): Add + Edit modal for MemoryInspectorPanel The Memory tab was read-only — users could see and Delete entries but the only path to write was leaving canvas. Adds a + Add button (toolbar, next to Refresh) and an Edit button (per-entry, next to Delete) that share one MemoryEditorDialog. Add: POST /workspaces/:id/memories with {content, scope, namespace} Edit: PATCH /workspaces/:id/memories/:id (sibling endpoint #2838) with only fields that changed; no-op edits short-circuit client-side so we don't waste a redactSecrets + re-embed pass Edit mode locks scope (cross-scope moves go through delete + recreate to keep the GLOBAL audit-log + redact pipeline single-purpose). Tests: 6 cases on the dialog covering POST shape, PATCH-only-diff, no-op short-circuit, empty-content guard, save-error keeps modal open, and namespace+content combined PATCH. Existing 27 MemoryInspectorPanel tests still pass with the new prop wiring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 21:16:35 -07:00
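The dialog itself is canvas code. The "send only changed fields, skip no-op edits" rule is language-neutral, though, and is sketched here in Go. `memoryFields` and `patchBody` are hypothetical names. Scope is left out on purpose because edit mode locks it.

```go
package main

// memoryFields holds the editable fields; scope is absent because edits lock it.
type memoryFields struct {
	Content, Namespace string
}

// patchBody returns only the fields that changed, or nil for a no-op edit so
// the caller can skip the PATCH (and the server-side redact + re-embed pass).
func patchBody(before, after memoryFields) map[string]string {
	body := map[string]string{}
	if after.Content != before.Content {
		body["content"] = after.Content
	}
	if after.Namespace != before.Namespace {
		body["namespace"] = after.Namespace
	}
	if len(body) == 0 {
		return nil
	}
	return body
}
```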
Hongming Wang	b0bcd97781	Merge pull request #2839 from Molecule-AI/fix/status-failed-must-set-error-1777954000 fix(bundle): markFailed sets last_sample_error + AST drift gate (resolves #2632 root cause)	2026-05-05 04:12:38 +00:00
Hongming Wang	56149f8a24	fix(bundle): markFailed sets last_sample_error + AST gate Closes the bug class surfaced by Canvas E2E #2632: a workspace ends up status='failed' with last_sample_error=NULL, and operators (or the E2E poll loop) see the useless "Workspace failed: (no last_sample_error)" with no triage signal. Two pieces: 1. bundle/importer.go markFailed — the UPDATE was setting only status, leaving last_sample_error NULL. Same incident class as the silent-drop bugs in PRs #2811 + #2824, different code path. markProvisionFailed in workspace_provision_shared.go has set the message column for a long time; this writer drifted the convention. Fix: include last_sample_error in the SET clause + the broadcast. 2. AST drift gate (db/workspace_status_failed_message_drift_test.go) — Go AST walk that finds every db.DB.{Exec,Query,QueryRow}Context call whose argument list binds models.StatusFailed and asserts the SQL literal contains last_sample_error. Catches the next caller that drifts the same convention. Verified to FAIL against the bug shape (reverted importer.go temporarily — gate flagged the exact line) and PASS against the fix. Why an AST gate vs a regex: pre-fix attempt with a regex over UPDATE statements flagged status='online' / status='hibernating' / status='removed' UPDATEs as false positives. Walking the AST and only flagging calls that pass the StatusFailed constant eliminates that. Out of scope (filed separately if needed): - The Canvas E2E that surfaced the missing message (#2632) is now a required check on staging via PR #2827. Once this fix lands the next staging push should re-run #2632's failing case and produce a meaningful last_sample_error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 21:08:08 -07:00
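The AST gate can be approximated with the standard `go/parser` and `go/ast` packages. This sketch is simpler than the described test. It matches `ExecContext`/`QueryContext`/`QueryRowContext` on any receiver, not only `db.DB`. It also only inspects string literals passed directly as arguments, so SQL held in a named constant would be missed.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

var dbCalls = map[string]bool{"ExecContext": true, "QueryContext": true, "QueryRowContext": true}

// failedWithoutMessage returns the line numbers of DB calls that bind
// models.StatusFailed but whose SQL literal never mentions last_sample_error.
func failedWithoutMessage(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var bad []int
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || !dbCalls[sel.Sel.Name] {
			return true
		}
		bindsFailed, mentionsMsg := false, false
		for _, arg := range call.Args {
			switch a := arg.(type) {
			case *ast.SelectorExpr: // e.g. models.StatusFailed
				if pkg, ok := a.X.(*ast.Ident); ok && pkg.Name == "models" && a.Sel.Name == "StatusFailed" {
					bindsFailed = true
				}
			case *ast.BasicLit: // the SQL text
				if a.Kind == token.STRING && strings.Contains(a.Value, "last_sample_error") {
					mentionsMsg = true
				}
			}
		}
		if bindsFailed && !mentionsMsg {
			bad = append(bad, fset.Position(call.Pos()).Line)
		}
		return true
	})
	return bad, nil
}
```

Keying on the `StatusFailed` argument, rather than on the SQL text, is what avoids the regex false positives: UPDATEs that set other statuses never bind that constant.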
Hongming Wang	0134353a48	Merge pull request #2838 from Molecule-AI/feat/memories-update-endpoint feat(memories): PATCH /workspaces/:id/memories/:id endpoint for edits	2026-05-05 04:06:01 +00:00
Hongming Wang	aca7d99152	Merge pull request #2837 from Molecule-AI/feat/rfc2829-pr4-operator-dashboard feat(delegations): operator dashboard endpoint over the durable ledger (RFC #2829 PR-4)	2026-05-05 04:01:46 +00:00
Hongming Wang	aec0fb35d2	feat(memories): PATCH /workspaces/:id/memories/:id endpoint for edits Pre-fix the only writes to agent_memories were Commit (POST) and Delete (DELETE). Editing an entry meant delete + recreate, losing the original id and created_at, and (the user-visible reason for filing this) leaving the canvas Memory tab without an Edit button at all. Adds PATCH that accepts either content, namespace, or both — at least one required (empty body 400s; silently no-op'ing would let a buggy client think it succeeded). The full Commit security pipeline is re-run on content edits: - redactSecrets on every scope (#1201 SAFE-T) - GLOBAL [MEMORY → [_MEMORY delimiter escape (#807 SAFE-T) - GLOBAL audit log row mirroring Commit's #767 forensic pattern - re-embed via the configured EmbeddingFunc (skipping would leave the row's vector pointing at the OLD content, silently breaking semantic search) Cross-scope edits (LOCAL→GLOBAL) intentionally NOT supported — that's delete + recreate so the GLOBAL access-control gate (only root workspaces can write GLOBAL) gets re-evaluated cleanly. 7 new sqlmock tests pin: namespace-only, content-only LOCAL, content-only GLOBAL with audit + escape, empty-body 400, empty-content 400, 404 on missing/wrong-workspace memory, no-op 200 with changed=false (and crucially: no UPDATE fires on no-op). Build clean, full handlers test suite (./internal/handlers) passes in 4s. PR-2 (frontend): Add modal + Edit button in MemoryInspectorPanel.tsx will land separately.	2026-05-04 21:00:47 -07:00
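The two 400 cases, "empty body" and "empty content", are easier to tell apart when the request struct uses pointers, because an absent field then differs from one set to "". This sketch assumes exactly that. It also shows the `[MEMORY` → `[_MEMORY` escape as a plain string replacement. The type, error, and function names are hypothetical, and the real #807 escape rule may be more involved.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// patchRequest uses pointers so "field absent" differs from "field set to empty".
type patchRequest struct {
	Content   *string `json:"content"`
	Namespace *string `json:"namespace"`
}

var (
	errEmptyBody    = errors.New("at least one of content or namespace is required")
	errEmptyContent = errors.New("content must not be empty")
)

// parsePatch rejects an empty body and an explicitly empty content field.
func parsePatch(raw []byte) (patchRequest, error) {
	var req patchRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return req, err
	}
	if req.Content == nil && req.Namespace == nil {
		return req, errEmptyBody
	}
	if req.Content != nil && *req.Content == "" {
		return req, errEmptyContent
	}
	return req, nil
}

// escapeGlobalDelimiter neutralises the "[MEMORY" delimiter in GLOBAL content.
func escapeGlobalDelimiter(s string) string {
	return strings.ReplaceAll(s, "[MEMORY", "[_MEMORY")
}
```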
Hongming Wang	b5c0b4d371	Merge pull request #2836 from Molecule-AI/feat/rfc2829-pr3-stuck-task-sweeper feat(delegations): stuck-task sweeper with deadline + heartbeat-staleness rules (RFC #2829 PR-3)	2026-05-05 03:59:16 +00:00
Hongming Wang	2ed4f4fb41	feat(delegations): operator dashboard endpoint over the durable ledger (RFC #2829 PR-4) Two read endpoints over the `delegations` table (PR-1 schema): GET /admin/delegations[?status=in_flight|stuck|failed|completed&limit=N] GET /admin/delegations/stats ## What this gives operators Without this, post-incident investigation requires direct DB access — only the on-call SRE can answer "is workspace X delegating to a wedged callee?". This moves that visibility into the same surface as /admin/queue, /admin/schedules-health, /admin/memories. ## List endpoint Status filter via tight allowlist: - in_flight (default) → status IN (queued, dispatched, in_progress) - stuck → status='stuck' (rows the PR-3 sweeper marked) - failed → status='failed' - completed → status='completed' Unknown status → 400 with the allowlist in the error body. Limit 1..1000, default 100. The status allowlist drives a parameterized IN clause (no string concatenation of user-controlled values into SQL). Result rows expose all the audit-grade fields the dashboard needs: delegation_id, caller_id, callee_id, task_preview, status, last_heartbeat, deadline, result_preview, error_detail, retry_count, created_at, updated_at. Nullable fields use pointer types so JSON omits them when NULL (no false-zero "" for missing values). ## Stats endpoint Zero-fills every known status key (queued, dispatched, in_progress, completed, failed, stuck) so the dashboard summary card doesn't have to handle "missing key vs zero" branching. ## Out of scope (deferred) - "retry this stuck task" mutation: needs the agent-side cutover (RFC #2829 PR-5 plan) before re-fire is safe - p95 / p99 duration aggregates: separate metric exposure, not a row-level read endpoint - Canvas UI: this is the JSON contract; the canvas operator panel consumes it in a follow-up canvas PR ## Wiring NOT wired into the router in this PR — ships separately to keep PR-by-PR review surface tight. 
Wiring will land in the `enable-rfc2829-server-side` follow-up PR alongside the sweeper Start call and the result-push flag flip. ## Coverage 11 unit tests: List (8): - default status=in_flight, IN(queued,dispatched,in_progress) - status=stuck → IN(stuck) - status=failed → IN(failed) - unknown status → 400 with allowlist - negative limit → 400 - over-cap limit → 400 - custom limit accepted + echoed in response - nullable fields populated correctly (pointer-omitempty) Stats (2): - zero-fills missing status keys - empty table → all counts zero Contract pin (1): - statusFilters table shape — every documented key + value pair pinned. Drift catches accidental edits (forward defense). Refs RFC #2829.	2026-05-04 20:58:17 -07:00
Hongming Wang	02b325063b	feat(delegations): stuck-task sweeper with deadline + heartbeat-staleness rules (RFC #2829 PR-3) Periodically scans the `delegations` table (PR-1 schema) for in-flight rows that need terminal action: 1. Deadline-exceeded → marked `failed` with "deadline exceeded by sweeper" 2. Heartbeat-stale (no beat for >10× heartbeat interval) → marked `stuck` ## Why both rules Deadline catches forever-heartbeating wedged agents (the alive-but-not-advancing class — agent loops on heartbeat call inside its main loop). Heartbeat-staleness catches OOM-killed and crashed agents that stop cold without graceful shutdown. Either rule alone misses one of these classes. ## Order matters Deadline is checked first. A deadline-exceeded AND stale row is marked `failed` (operator action: investigate + give up), not `stuck` (operator action: investigate + retry). The semantic difference matters. ## NULL heartbeat is a free pass A delegation that's just been inserted but hasn't emitted its first heartbeat yet is NOT stuck-marked — gives the agent its first beat window. Lets the deadline catch true never-started rows naturally. ## Concurrent-completion safety Sweep races with UpdateStatus on a delegation that just completed: the ledger's terminal forward-only protection (PR-1) returns ErrInvalidTransition, sweeper logs + counts in Errors, the row stays correctly in completed. ## Configuration - DELEGATION_SWEEPER_INTERVAL_S — tick cadence (default 5min) - DELEGATION_STUCK_THRESHOLD_S — heartbeat-staleness threshold (default 10min) Both fall back gracefully on invalid input (typo'd env shouldn't crash startup). Both read at construction time so a long-running process picks up overrides via restart. ## Wiring NOT wired into main.go in this PR — that ships separately so the sweeper can be enabled/disabled independently of the binary upgrade. 
The sweeper is a standalone Sweep(ctx) callable + Start(ctx) ticker loop, both with panic recovery, both indexed-scan-cheap on the partial idx_delegations_inflight_heartbeat from PR-1. ## Coverage 13 unit tests against sqlmock-backed *sql.DB: Sweep semantics (8 tests): - empty in-flight set → clean no-op - deadline → failed - heartbeat-stale → stuck - NULL heartbeat is left alone (first-beat free pass) - healthy row → no-op - both-rule row → marked failed (deadline wins) - mixed set → both rules fire on the right rows - concurrent-completion race → forward-only protection holds Env override parsing (5 tests): - default on missing env - parses positive seconds - falls back on garbage - falls back on negative - constructor picks up overrides; defaults when env unset Refs RFC #2829.	2026-05-04 20:55:13 -07:00
Hongming Wang	43caac911a	Merge pull request #2834 from Molecule-AI/fix/branch-protection-apply-respects-live-state fix(branch-protection): apply.sh respects live state + full-payload drift	2026-05-05 03:54:50 +00:00
Hongming Wang	2e505e7748	fix(branch-protection): apply.sh respects live state + full-payload drift Multi-model review of #2827 caught: the script as-shipped would have silently weakened branch protection on EVERY non-checks dimension the moment anyone ran it. Live staging had enforce_admins=true, dismiss_stale_reviews=false, strict=true, allow_fork_syncing=false, bypass_pull_request_allowances={ HongmingWang-Rabbit + molecule-ai app } Script wrote the opposite for all five. Per memory feedback_dismiss_stale_reviews_blocks_promote.md, the dismiss_stale_reviews flip alone is the load-bearing one — would silently re-block every auto-promote PR (cost user 2.5h once). This PR: 1. apply.sh: per-branch payloads (build_staging_payload / build_main_payload) that codify the deliberate per-branch policy already on the repo, with the script's net contribution being ONLY the new check names (Canvas tabs E2E + E2E API Smoke on staging, Canvas tabs E2E on main). 2. apply.sh: R3 preflight that hits /commits/{sha}/check-runs and asserts every desired check name has at least one historical run on the branch tip. Catches typos like "Canvas Tabs E2E" vs "Canvas tabs E2E" — pre-fix a typo would silently block every PR forever waiting for a context that never emits. Skip via --skip-preflight for genuinely-new workflows whose first run hasn't fired. 3. drift_check.sh: compares the FULL normalised payload (admin, review, lock, conversation, fork-syncing, deletion, force-push) not just the checks list. Pre-fix the drift gate would have missed a UI click that flipped enforce_admins or dismiss_stale_reviews. Drops app_id from the comparison since GH auto-resolves -1 to a specific app id post-write. 4. branch-protection-drift.yml: per memory feedback_schedule_vs_dispatch_secrets_hardening.md — schedule + pull_request triggers HARD-FAIL when GH_TOKEN_FOR_ADMIN_API is missing (silent skip masks the gate disappearing). workflow_dispatch keeps soft-skip for one-off operator runs. 
Verified by running drift_check against live state: pre-fix would have shown 5 destructive drifts on staging + 5 on main. Post-fix shows ONLY the 2 intended additions on staging + 1 on main, which go away after `apply.sh` runs.	2026-05-04 20:52:11 -07:00
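The R3 preflight reduces to a set difference with exact, case-sensitive matching. `apply.sh` does this in shell against the check-runs API. The Go sketch below models only the comparison and assumes the check-run names have already been fetched into a slice.

```go
package main

import "sort"

// missingChecks returns desired check names with no historical run on the
// branch tip. Matching is exact, so "Canvas Tabs E2E" does not satisfy
// "Canvas tabs E2E".
func missingChecks(desired, observedRuns []string) []string {
	seen := make(map[string]bool, len(observedRuns))
	for _, name := range observedRuns {
		seen[name] = true
	}
	var missing []string
	for _, name := range desired {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}
```

A non-empty result is exactly the case the preflight exists to catch: a required context that would never emit and would therefore block every PR.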
Hongming Wang	ae79b9e9fe	feat(delegations): result-push to caller inbox behind feature flag (RFC #2829 PR-2) When a delegation completes (or fails), also write an `activity_type='a2a_receive'` row to the caller's activity_logs so the caller's inbox poller (workspace/inbox.py — `?type=a2a_receive`) surfaces the result to the agent. Why: today the only way the caller agent learns about a delegation result is by holding open an HTTP `message/send` connection through the platform proxy. That connection has a hard timeout (~600s) — a 90-iteration external-runtime task on stream output routinely blows past it, and the result emitted after the timeout lands in /dev/null. (Hongming's home hermes hit this on 2026-05-05 — task was actively heartbeating "iteration 14/90" when the proxy timer fired.) This PR adds the SERVER-SIDE result-push so the result is durably delivered to the caller's inbox queue. The agent-side cutover (replace sync httpx delegation with delegate_task_async + wait_for_message poll) ships in the next PR — once both land, the proxy timeout class is gone. ## Feature flag `DELEGATION_RESULT_INBOX_PUSH=1` enables the push. Default off — staging canary first, flip after RFC #2829 PR-3 (agent-side) lands and proves the round-trip end-to-end. With the flag off, behavior is byte-identical to before this PR (verified by TestUpdateStatus_FlagOff_NoNewSQL). ## Two write sites 1. UpdateStatus handler (POST /workspaces/:id/delegations/:id/update) — agent-initiated delegations report status here 2. executeDelegation goroutine — canvas-initiated delegations (POST /workspaces/:id/delegate) report status from this background coroutine Both paths call `pushDelegationResultToInbox` which is best-effort: an INSERT failure logs but does NOT propagate up. The existing `delegate_result` row in activity_logs (the dashboard view) remains authoritative; the new `a2a_receive` row is purely additive for the inbox-poller to surface. 
## Coverage 6 new tests in delegation_inbox_push_test.go: - flag off → no SQL fired (the rollout-safety contract) - flag on, completed → a2a_receive row with status=ok - flag on, failed → a2a_receive row with status=error + error_detail - UpdateStatus end-to-end (flag on, completed) - UpdateStatus end-to-end (flag on, failed) - UpdateStatus end-to-end (flag off, byte-identical to pre-PR behavior) All 30 existing delegation_test.go tests still pass — flag default off keeps the strict-sqlmock surface unchanged. Refs RFC #2829.	2026-05-04 20:50:46 -07:00
Hongming Wang	b3b9a242d6	Merge pull request #2832 from Molecule-AI/feat/rfc2829-pr1-delegations-table feat(delegations): durable per-task ledger + audit-write helper (RFC #2829 PR-1)	2026-05-05 03:47:06 +00:00
Hongming Wang	ed6dfe01e5	feat(delegations): durable per-task ledger + audit-write helper (RFC #2829 PR-1) Adds the `delegations` table and the DelegationLedger writer that PRs #2-#4 of RFC #2829 build on. Schema-only foundation; no behavior change in this PR. PR-2 wires the ledger into the existing handlers and ships the result-push-to-inbox cutover behind a feature flag. Why a dedicated table when activity_logs already records every delegation event: today, "what is currently in flight for this workspace" is reconstructed by GROUPing activity_logs by delegation_id and ORDER BY created_at DESC. PR-3's stuck-task sweeper needs the query `SELECT delegation_id FROM delegations WHERE status = 'in_progress' AND last_heartbeat < now() - interval '10 minutes'`, which cannot be expressed against the event stream without a window over every (delegation_id, latest event) pair: a planner-killing query at scale. The dedicated table makes the sweeper an indexed scan. Same posture as tenant_resources (PR #2343, memory `reference_tenant_resources_audit`): activity_logs remains the audit-grade source of truth, and delegations is the queryable view for dashboards + sweeper joins. Symmetric writes: both tables are written, and neither blocks orchestration on the other's failure.
Schema highlights: - delegation_id PRIMARY KEY (caller-chosen, so an idempotent retry on restart is a no-op via ON CONFLICT DO NOTHING) - caller_id / callee_id NOT FK: workspace delete must NOT cascade-delete delegation history (audit retention) - status CHECK constraint enforces the lifecycle (queued|dispatched|in_progress|completed|failed|stuck) - last_heartbeat NULL-able; PR-3 sweeper compares it to NOW() - deadline default now()+6h matches the longest-observed legitimate delegation (memory-namespace migrations) and protects against forever-heartbeating wedged agents - Partial index `idx_delegations_inflight_heartbeat` keeps the sweeper hot path tiny (only non-terminal rows) - UNIQUE(caller_id, idempotency_key) WHERE NOT NULL: a natural collision becomes an ON CONFLICT no-op without colliding across callers. DelegationLedger.SetStatus enforces forward-only transitions on terminal states (completed/failed/stuck cannot be revised) as defense-in-depth on the schema CHECK. Same-status replay is a no-op. Missing-row SetStatus is a no-op (a transient inconsistency the next agent retry will heal). Heartbeat updates only in-flight rows; terminal-state delegations are silently skipped. Coverage: - 17 unit tests against sqlmock-backed *sql.DB (Insert happy path, missing-required guards, truncation, lifecycle transitions, terminal forward-only protection, replay no-op, missing-row no-op, empty-input rejection, heartbeat semantics, transition table shape) - Migration roundtrip verified on a real Postgres 15 instance: up creates the expected schema with all 4 indexes + CHECK, down drops everything cleanly. Refs RFC #2829.	2026-05-04 20:43:06 -07:00
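The SetStatus and Heartbeat rules in this commit can be sketched as small pure functions. This is an illustration under assumptions, not the ledger's code: the status names come from the CHECK constraint, but `decide`, `Decision`, and `canHeartbeat` are hypothetical, rejecting a terminal revision with an error is a guess at how "cannot be revised" surfaces, the real transition table may also restrict moves between non-terminal states, and the missing-row case (handled in SQL) is not modeled.

```go
package main

import "fmt"

// Status values mirror the CHECK constraint in the commit message.
type Status string

const (
	Queued     Status = "queued"
	Dispatched Status = "dispatched"
	InProgress Status = "in_progress"
	Completed  Status = "completed"
	Failed     Status = "failed"
	Stuck      Status = "stuck"
)

var known = map[Status]bool{Queued: true, Dispatched: true, InProgress: true, Completed: true, Failed: true, Stuck: true}
var terminal = map[Status]bool{Completed: true, Failed: true, Stuck: true}

// Decision is what a SetStatus call would do (hypothetical).
type Decision string

const (
	Apply  Decision = "apply"
	NoOp   Decision = "noop"
	Reject Decision = "reject"
)

// decide returns the outcome of moving a delegation from cur to next.
func decide(cur, next Status) (Decision, error) {
	if !known[next] {
		return Reject, fmt.Errorf("unknown status %q", next)
	}
	if cur == next {
		return NoOp, nil // same-status replay
	}
	if terminal[cur] {
		return Reject, fmt.Errorf("delegation is %s; terminal states cannot be revised", cur)
	}
	return Apply, nil
}

// canHeartbeat reports whether a heartbeat should touch a row in this state.
func canHeartbeat(s Status) bool { return known[s] && !terminal[s] }

func main() {
	for _, tc := range [][2]Status{{InProgress, Completed}, {Completed, Completed}, {Failed, InProgress}} {
		d, err := decide(tc[0], tc[1])
		fmt.Printf("%s -> %s: %s (err=%v)\n", tc[0], tc[1], d, err)
	}
}
```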
Hongming Wang	ca6e7c39cf	Merge pull request #2830 from Molecule-AI/ux/terminal-tab-external-not-available feat(canvas/terminal): "Not available" banner for runtimes without a TTY	2026-05-05 03:35:52 +00:00
Hongming Wang	ba63f76e10	feat(canvas/terminal): not-available banner for runtimes without a TTY Pre-fix, TerminalTab tried to open /ws/terminal/<id> for every workspace, including external ones (which have no shell endpoint on the workspace-server). The server returned 404, status flipped to "error", and the user saw "Connection failed" with a Reconnect button, which read as a bug when the runtime intentionally has no TTY. Now: when data.runtime is in RUNTIMES_WITHOUT_TERMINAL (currently just "external"), TerminalTab renders a NotAvailablePanel with a big terminal-off icon and a one-line explanation including the runtime name. The xterm + WebSocket dance is skipped entirely: no spurious 404s, no scary error UI, no Reconnect that can't help. The runtime is determined from the data prop now threaded by SidePanel.tsx (existing pattern for ChatTab/ConfigTab/etc). Tests: 4 new in TerminalTab.notAvailable.test.tsx pin: external renders banner with runtime name, external doesn't open WS, claude-code mounts normally (regression cover for the early-return scope), data omitted falls through (back-compat). Build clean. 1258 tests pass.	2026-05-04 20:33:13 -07:00
Hongming Wang	b037d555fa	Merge pull request #2828 from Molecule-AI/docs/abstraction-pattern-1777951500 docs(backends): document Auto-dispatcher SoT pattern + source-level pins (closes #10)	2026-05-05 03:30:56 +00:00
Hongming Wang	62fc25757c	docs(backends): document Auto-dispatcher SoT pattern + source-level pins Closes #10. The 2026-05-05 hongming silent-drop incident shipped because the backends.md parity matrix didn't enforce a "go through the dispatcher" rule — three handlers (TeamHandler.Expand, OrgHandler.createWorkspaceTree, workspace_crud.go's stopAndRemove) silently bypassed routing on SaaS for ~6 months across two distinct verbs. This doc pass: - Adds a "How to dispatch" section that's the canonical answer to "where do I call Start / Stop / Has from?". Names the three dispatchers (provisionWorkspaceAuto, StopWorkspaceAuto, HasProvisioner), their fallbacks, and the allowed exceptions. - Updates the matrix lifecycle rows so every dispatched operation points at the dispatcher source, not the per-backend bodies. - Adds Org-import + Team-collapse rows so the bulk paths are visible to anyone scanning for parity gaps. - Lists the source-level pins (4 of them) under Enforcement so future contributors see them as load-bearing tests, not noise. - Adds a "When you add a NEW dispatch site" section so the next verb (Pause / Hibernate / Snapshot) lands as a dispatcher mirror, not as another bespoke handler that drifts from the existing two. - Refreshes Last audit to 2026-05-05. No code change; doc-only. The SoT abstractions described here landed in PRs #2811 + #2824. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 20:25:10 -07:00
Hongming Wang	a345adacad	Merge pull request #2827 from Molecule-AI/feat/e2e-required-on-staging-1777950000 ci: e2e coverage matrix + branch-protection-as-code (closes #9)	2026-05-05 03:24:54 +00:00
Hongming Wang	7cc1c39c49	ci: e2e coverage matrix + branch-protection-as-code Closes #9. Three pieces, all small: 1. docs/e2e-coverage.md — source of truth for which E2E suites guard which surfaces. Today three were running but informational only on staging; that's how the org-import silent-drop bug shipped without a test catching it pre-merge. Now the matrix shows what's required where + a follow-up note for the two suites that need an always-emit refactor before they can be required. 2. tools/branch-protection/apply.sh — branch protection as code. Lets `staging` and `main` required-checks live in a reviewable shell script instead of UI clicks that get lost between admins. This PR's net change: add `E2E API Smoke Test` and `Canvas tabs E2E` as required on staging. Both already use the always-emit path-filter pattern (no-op step emits SUCCESS when the workflow's paths weren't touched), so making them required can't deadlock unrelated PRs. 3. branch-protection-drift.yml — daily cron + drift_check.sh that compares live protection against apply.sh's desired state. Catches out-of-band UI edits before they drift further. Fails the workflow on mismatch; ops re-runs apply.sh or updates the script. Out of scope (filed as follow-ups): - e2e-staging-saas + e2e-staging-external use plain `paths:` filters and never trigger when paths are unchanged. They need refactoring to the always-emit shape (same as e2e-api / e2e-staging-canvas) before they can be required. - main branch protection mirrors staging here; if main wants the E2E SaaS / External added later, do it in apply.sh and rerun. Operator must apply once after merge: bash tools/branch-protection/apply.sh The drift check picks it up from there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 20:21:59 -07:00
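drift_check.sh is a shell script whose contents aren't shown here; the comparison it performs can be illustrated with a short Go sketch that diffs the desired required-check names against the live ones. `diffChecks` is hypothetical, the desired list uses only the two check names this commit adds (apply.sh's full list is longer), and the live list in `main` is invented for the example. Fetching real branch protection is out of scope.

```go
package main

import (
	"fmt"
	"sort"
)

// diffChecks compares desired vs live required-check names.
// missing: desired but not live; extra: live but not desired. Duplicates are ignored.
func diffChecks(desired, live []string) (missing, extra []string) {
	want := make(map[string]bool, len(desired))
	for _, c := range desired {
		want[c] = true
	}
	have := make(map[string]bool, len(live))
	for _, c := range live {
		have[c] = true
	}
	for c := range want {
		if !have[c] {
			missing = append(missing, c)
		}
	}
	for c := range have {
		if !want[c] {
			extra = append(extra, c)
		}
	}
	// Map iteration order is random; sort for stable, reviewable output.
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra
}

func main() {
	desired := []string{"E2E API Smoke Test", "Canvas tabs E2E"} // partial list
	live := []string{"E2E API Smoke Test", "Some UI-added check"} // invented for the example
	missing, extra := diffChecks(desired, live)
	if len(missing) > 0 || len(extra) > 0 {
		fmt.Printf("drift: missing=%q extra=%q\n", missing, extra)
		return // a CI job would exit non-zero here
	}
	fmt.Println("no drift")
}
```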
Hongming Wang	111c3d2c01	Merge pull request #2825 from Molecule-AI/feat/configtab-drop-skills-tools-section feat(ConfigTab): drop Skills/Tools tag inputs, give Prompt Files its own section	2026-05-05 03:05:10 +00:00
Hongming Wang	46d79a3e3b	Merge pull request #2824 from Molecule-AI/fix/stop-workspace-auto-saas-1777945000 fix(provision): StopWorkspaceAuto mirror — close SaaS EC2-leak class	2026-05-05 03:05:09 +00:00
Hongming Wang	2198f92dcb	Merge pull request #2823 from Molecule-AI/feat/codex-tab-pypi-install feat(external-templates): codex tab uses plain pip install	2026-05-05 03:03:08 +00:00
Hongming Wang	beab899501	feat(ConfigTab): drop Skills/Tools tag inputs, give Prompt Files its own section User feedback (2026-05-04 conversation): > "Skills and Tools are having their own tab as plugin, and Prompt > Files are in the file system which can be directly edited. Am I > missing something?" > "Tools should be merged into plugin then, and for prompt files... it > should be in another section than in skill& tools" The "Skills & Tools" section in ConfigTab had three TagList inputs: - Skills: managed via the dedicated SkillsTab (per-workspace skill folders) — duplicate UI affordance - Tools: managed via the Plugins tab (install a plugin → its tools become available) — duplicate UI affordance - Prompt Files: load order for system-prompt files — semantically unrelated to skills/tools Drop the Skills + Tools inputs. Move Prompt Files into its own section with explanatory copy that names the auto-loaded files (system-prompt.md, CLAUDE.md, AGENTS.md) and points users at the Files tab for actual editing. Schema fields `config.skills` and `config.tools` are KEPT (load-bearing for runtime skill loading + tool registry); only the inline editor goes away. Operators who need to edit them can still use the Raw YAML toggle. Tests: - New ConfigTab.sections.test.tsx with 4 cases: 1. "Skills & Tools" section title is gone 2. Skills tag input is absent 3. Tools tag input is absent 4. Prompt Files section exists with explanatory copy Sibling ConfigTab tests (hermes, provider) all still pass (20/20). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 20:02:05 -07:00
Hongming Wang	b851cfc813	Merge pull request #2822 from Molecule-AI/fix/files-eic-ssh-warning fix(files-eic): silence ssh known-hosts warning that 500'd Hermes config load	2026-05-05 03:02:00 +00:00
Hongming Wang	3cb72b1df0	Merge pull request #2821 from Molecule-AI/auto-sync/main-80e4b9ac chore: sync main → staging (auto, ff to `80e4b9ac`)	2026-05-04 20:04:16 -07:00
Hongming Wang	11c9ed2a46	fix(provision): StopWorkspaceAuto mirror — close SaaS EC2-leak class Closes #2813 (team-collapse) and #2814 (workspace delete). Two leaks, one class. Both call sites had the same shape pre-fix: if h.provisioner != nil { h.provisioner.Stop(ctx, wsID) } On SaaS, where h.provisioner (Docker) is nil and h.cpProv is set, that gate evaluates false and the EC2 keeps running. The workspace gets marked removed in the DB; the EC2 lives on until the orphan sweeper catches it. Same drift class as PR #2811's org-import provision bug: a Docker-only check on what should be a both-backend operation. Confirmed in production: PR #2811's verification step deleted a test workspace and the EC2 stayed running until I terminated it manually. Fix: WorkspaceHandler.StopWorkspaceAuto(ctx, wsID), a symmetric mirror of provisionWorkspaceAuto. CP first, Docker second, no-op when neither is wired (a workspace nobody is running can't be stopped; that's a no-op, not a failure, distinct from provision's mark-failed contract). Three call-site changes: - team.go:208 (Collapse) → h.wh.StopWorkspaceAuto(ctx, childID) - workspace_crud.go:432 (stopAndRemove) → h.StopWorkspaceAuto(...); RemoveVolume stays Docker-only behind an explicit gate since CP-managed workspaces have no host-bind volumes - TeamHandler.provisioner field + NewTeamHandler's *Provisioner param removed as dead code (Stop was the only call site) Volume cleanup separation is intentional: the abstraction is "stop the running workload," not "tear down all state." Callers that need volume cleanup keep their `if h.provisioner != nil { RemoveVolume }` gate AFTER the Stop call.
Tests: - TestStopWorkspaceAuto_RoutesToCPWhenSet — SaaS path - TestStopWorkspaceAuto_RoutesToDockerWhenOnlyDocker — self-hosted - TestStopWorkspaceAuto_NoBackendIsNoOp — pins the contract distinction from provisionWorkspaceAuto's mark-failed - TestNoCallSiteCallsBareStop — source-level pin against `.provisioner.Stop(` / `.cpProv.Stop(` outside the dispatcher, per-backend bodies, restart helper, and the Docker-daemon-direct short-lived-container path. Strips Go comments before substring match so archaeology in code comments doesn't trip the gate. - Verified: pin FAILS against the buggy shape (workspace_crud.go reversion); team.go reversion compile-fails because the field is gone — even stronger than the test. Out of scope (tracked under #2799): - workspace_restart.go's manual if-cpProv-else dispatch with retry semantics tuned for the restart hot path. Functionally equivalent + wraps cpStopWithRetry, so it's not the bug class this PR closes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 20:00:23 -07:00
Hongming Wang	c0bfd19b9e	feat(external-templates): codex tab uses plain pip install for bridge daemon `codex-channel-molecule` 0.1.0 is now on PyPI, so operators no longer need the `git+https://...` URL workaround. Verified: `pip install codex-channel-molecule` from a clean venv installs the wheel and the `codex-channel-molecule --help` console script runs. PyPI: https://pypi.org/project/codex-channel-molecule/0.1.0/ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:58:56 -07:00
Hongming Wang	e0f9434eaf	fix(files-eic): silence ssh known-hosts warning that 500'd Hermes config load GET /workspaces/:id/files/config.yaml on hongming.moleculesai.app's Hermes workspace returned 500 with body: ssh cat: exit status 1 (Warning: Permanently added '[127.0.0.1]:37951' (ED25519) to the list of known hosts.) Root cause: ssh emits the "Permanently added" notice on every fresh tunnel connection, even with UserKnownHostsFile=/dev/null (that prevents persistence, not the warning). It lands on stderr, fooling readFileViaEIC's classifier: if len(out) == 0 && stderr.Len() == 0 { return nil, os.ErrNotExist } return nil, fmt.Errorf("ssh cat: %w (%s)", runErr, ...) stderr was non-empty (the warning), so we returned the wrapped error → 500 from the HTTP layer instead of 404. Fix: add `-o LogLevel=ERROR` to BOTH writeFileViaEIC and readFileViaEIC ssh invocations. Silences info+warning while keeping real auth/tunnel errors visible (those emit at ERROR level). Test: TestSSHArgs_LogLevelErrorBothSites pins the flag in both blocks. Mutation-tested: stripping the flag from one site fails the gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 19:58:49 -07:00
molecule-ai[bot]	80e4b9ac9a	Merge pull request #2820 from Molecule-AI/staging staging → main: auto-promote `daefdd2`	2026-05-04 19:53:08 -07:00


4260 Commits