molecule-core

Author	SHA1	Message	Date
Hongming Wang	210a26d31a	refactor(workspace): extract memory tools from a2a_tools.py to a2a_tools_memory.py (RFC #2873 iter 4c) Third slice of the a2a_tools.py split (stacked on iter 4b). Owns the two persistent-memory MCP tools: * tool_commit_memory — write to /workspaces/:id/memories with RBAC + GLOBAL-scope tier-zero enforcement * tool_recall_memory — search /workspaces/:id/memories with RBAC a2a_tools.py shrinks from 609 → 508 LOC (−101). Both handlers depend ONLY on a2a_tools_rbac (iter 4a), a2a_client, and the platform's /memories endpoint — no entanglement with delegation or messaging. Side-effects of the layered architecture: a2a_tools_memory's import contract is "depends on a2a_tools_rbac, never on a2a_tools" — the kitchen-sink module is for back-compat re-exports only. A test pins this so a future refactor that re-introduces `from a2a_tools import …` fails in CI. Tests: * 49 patch sites in TestToolCommitMemory + TestToolRecallMemory retargeted from `a2a_tools.{_check_memory_, _is_root_workspace, httpx.AsyncClient}` to `a2a_tools_memory.…` because the call sites moved. test_a2a_tools_memory.py adds 4 new tests (alias drift gate + import-contract + a2a_tools-side re-export). 117 tests total (77 impl + 28 rbac + 8 delegation + 4 memory), all green. Refs RFC #2873.	2026-05-05 09:50:39 -07:00
Hongming Wang	be18b9c8f9	fix(tests): retarget remaining a2a_tools delegation patches to a2a_tools_delegation CI caught two test files I missed in the original iter 4b retarget: test_a2a_multi_workspace.py + test_delegation_sync_via_polling.py patch a2a_tools.{discover_peer, send_a2a_message, _delegate_sync_via_polling, httpx.AsyncClient} but those call sites moved to a2a_tools_delegation in this PR. 17 patch sites retargeted; 30 tests now green. Refs RFC #2873 iter 4b.	2026-05-05 09:50:30 -07:00
Hongming Wang	2227a14b1e	fix(build): add a2a_tools_delegation to TOP_LEVEL_MODULES drift gate Iter 4b's new module needs the rewrite-list entry. Stacked on iter 4a which already added a2a_tools_rbac. Refs RFC #2873 iter 4b.	2026-05-05 05:01:04 -07:00
Hongming Wang	e72f9ad107	refactor(workspace): extract delegation handlers from a2a_tools.py to a2a_tools_delegation.py (RFC #2873 iter 4b) Second slice of the a2a_tools.py split (stacked on iter 4a). Owns the three delegation MCP tools + the RFC #2829 PR-5 sync-via-polling helper they share: * tool_delegate_task — synchronous delegation * tool_delegate_task_async — fire-and-forget * tool_check_task_status — poll the platform's /delegations log * _delegate_sync_via_polling — durable async + poll for terminal status * _SYNC_POLL_INTERVAL_S / _SYNC_POLL_BUDGET_S constants a2a_tools.py shrinks from 915 → 609 LOC (−306). Stacked on iter 4a's RBAC extraction; uses `from a2a_tools_rbac import auth_headers_for_heartbeat` as its auth-header source. The lazy `from a2a_tools import report_activity` inside tool_delegate_task breaks the circular-import cycle (a2a_tools imports the delegation re-exports at module-load; delegation handler needs report_activity at CALL time). A dedicated test pins this contract. Tests: * 77 existing test_a2a_tools_impl.py tests pass after retargeting 20 patch sites in TestToolDelegateTask + TestToolDelegateTaskAsync + TestToolCheckTaskStatus from `a2a_tools.foo` to `a2a_tools_delegation.foo` (foo ∈ {discover_peer, send_a2a_message, httpx.AsyncClient}). The patches need to target the new module because that's where the call sites live now. * test_a2a_tools_delegation.py adds 8 new tests: - 6 alias drift gates (`a2a_tools.tool_delegate_task is …`) - 2 import-contract tests (no top-level circular dep + a2a_tools surfaces every delegation symbol) - 1 sync-poll budget invariant 113 tests total (77 impl + 28 rbac + 8 delegation), all green. Refs RFC #2873.	2026-05-05 05:00:52 -07:00
Hongming Wang	17aec22f9b	fix(build): add a2a_tools_rbac to TOP_LEVEL_MODULES drift gate Iter 4a's new module needs to be in the rewrite list so the wheel ships its imports prefixed correctly. Caught by 'PR-built wheel + import smoke'. Refs RFC #2873 iter 4a.	2026-05-05 05:00:47 -07:00
Hongming Wang	0c461eb9f1	refactor(workspace): extract RBAC helpers from a2a_tools.py to a2a_tools_rbac.py (RFC #2873 iter 4a) First slice of the a2a_tools.py (991 LOC) split — single-concern module for the workspace's RBAC + auth-header layer: * _ROLE_PERMISSIONS canonical table * _get_workspace_tier * _check_memory_write_permission * _check_memory_read_permission * _is_root_workspace * _auth_headers_for_heartbeat a2a_tools.py shrinks from 991 → 915 LOC. Internal call sites (15 references) work unchanged because the bare names are re-imported at module-level — Python's local-then-module name resolution still finds them in a2a_tools's namespace, so existing tests' patch("a2a_tools._foo", …) keeps working. The RBAC layer can now evolve independently of the 18 tool handlers. Adding a new role or capability action touches one file, not the kitchen-sink module. Tests: * 77 existing test_a2a_tools_impl.py pass unchanged. * test_a2a_tools_rbac.py adds 28 focused tests: - 6 alias drift-gate tests (`_foo is rbac.foo`) - 4 get_workspace_tier env+config branches - 2 is_root_workspace tier branches - 6 check_memory_write_permission roles + override branches - 3 check_memory_read_permission scenarios - 3 auth_headers_for_heartbeat platform_auth branches - 4 ROLE_PERMISSIONS table invariants * Direct coverage for the helper module (was previously only exercised through 991-LOC tool-handler tests). Refs RFC #2873.	2026-05-05 04:43:16 -07:00
Hongming Wang	243f9bc2b1	Merge pull request #2877 from Molecule-AI/feat/poll-mode-chat-upload-phase1 feat(rfc): poll-mode chat upload — phase 1 platform staging layer	2026-05-05 11:32:10 +00:00
Hongming Wang	43bf94a07c	fix(chat-uploads): align poll-mode activity rows with inbox poll filter The workspace inbox poller filters `GET /workspaces/:id/activity?type=a2a_receive` — writing rows with `activity_type=chat_upload_receive` would be silently invisible to it. Switch the poll-mode upload-staging handler to write `activity_type=a2a_receive` with `method=chat_upload_receive` as the discriminator. Same shape as A2A's `tasks/send` vs `message/send` method split; the workspace-side handler (Phase 2) routes by `method`, not activity_type. Pinned with `TestPollUpload_ActivityRowDiscriminator` — sqlmock WithArgs on positions 2 (activity_type) and 5 (method) so a refactor that flips activity_type back to a custom value gets a red test instead of a runtime "poller saw nothing" silent break.	2026-05-05 04:29:07 -07:00
Hongming Wang	55f5c0b0ff	Merge pull request #2876 from Molecule-AI/refactor/shell-e2e-tmp-cleanup test(e2e): plug /tmp scratch leaks + add CI lint gate (RFC #2873 iter 2)	2026-05-05 11:24:43 +00:00
Hongming Wang	86fdaad111	feat(rfc): poll-mode chat upload — phase 1 platform staging layer External-runtime workspaces (registered via molecule connect, behind NAT, no public callback URL) currently see HTTP 422 "workspace has no callback URL" on every chat file upload. The only escape is to wrap the laptop in ngrok / Cloudflare tunnel + re-register push-mode — a tax that shouldn't exist for a one-line use case. This phase introduces the platform-side staging layer that lets canvas → external workspace uploads ride the same poll loop the inbox already uses for text messages. Architecture (mirrors inbox poll, SSOT principle): Canvas POST /chat/uploads (multipart) ↓ delivery_mode=poll Platform: chat_files.uploadPollMode ↓ pendinguploads.Storage.Put + LogActivity(chat_upload_receive) Workspace's existing inbox poller picks up the activity row (Phase 2) Workspace fetches: GET /workspaces/:id/pending-uploads/:fid/content Workspace acks: POST /workspaces/:id/pending-uploads/:fid/ack Pieces in this PR: * Migration 20260505100000 — pending_uploads table; partial indexes on unacked + expires_at for the workspace fetch + Phase 3 sweep hot paths. No FK to workspaces (audit retention), 24h hard TTL. * internal/pendinguploads — Storage interface + Postgres impl. Bytes inline (bytea) today; the interface lets a future PR replace with S3 (RFC #2789) by swapping one constructor. 100% test coverage on the Postgres impl via sqlmock-pinned SQL. * handlers.PendingUploadsHandler — GET /content + POST /ack endpoints. wsAuth-gated; cross-workspace bleed protection via per-row workspace_id check (token leak from A can't read B's pending bytes). Handler tests pin happy path + every 4xx/5xx mapping including cross-workspace + race-with-sweep. * chat_files.go — Upload poll-mode branch behind WithPendingUploads builder. Push-mode unchanged (regression-tested). Multipart parse + per-file sanitize + storage.Put + activity_logs row per file. 
* SanitizeFilename — Go mirror of workspace/internal_chat_uploads.py sanitize_filename. Tests pin parity case-by-case so canvas-emitted URIs stay identical regardless of which path handles the upload. * Comprehensive logging — every state transition (staged, fetch, ack, error) emits a structured log line with workspace_id + file_id + size + sanitized name. Phase 3 metrics will hook these. The pendinguploads.Storage wiring is opt-in (WithPendingUploads on ChatFilesHandler) so a binary deployed without the migration keeps the pre-existing 422 behavior — no boot-order coupling between code roll and schema roll. Phase 2 (separate PR): workspace inbox extension — inbox_uploads.py fetches via the GET endpoint, writes to /workspace/.molecule/chat-uploads/, acks, and rewrites the URI from platform-pending: → workspace: so the agent's existing send-attachments path needs no changes. Phase 3: GC sweep + dashboards. Phase 4: poll-mode E2E on staging. Tests: * 100% coverage on pendinguploads (sqlmock-pinned SQL drift gate). * Functional 100% on new handler code (uncovered branches are documented defensive duplicates: uuid re-parse, multipart Open error, Writer.Write fail — none reproducible in unit tests). * Push-mode + NULL delivery_mode regression tests pin no behavior change for existing workspaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 04:22:24 -07:00
Hongming Wang	6125700c39	test(e2e): plug /tmp scratch leaks in 3 shell E2E tests + add CI lint gate (RFC #2873 iter 2) Three shell E2E tests created scratch files via `mktemp` but never deleted them on early exit (assertion failure, SIGINT, errexit). Each CI run leaked ~10-100 KB of /tmp into the runner; over ~200 runs/week that's up to ~20 MB of accumulated cruft. ## Files - test_chat_attachments_e2e.sh — was missing both trap and rm; added per-run TMPDIR_E2E with `trap rm -rf … EXIT INT TERM`. - test_notify_attachments_e2e.sh — had a `cleanup()` for the workspace but didn't include the TMPF; only an unconditional `rm -f` at the bottom (line 233) which doesn't fire on early exit. Extended cleanup() to also rm the scratch + dropped the redundant trailing rm. - test_chat_attachments_multiruntime_e2e.sh — `round_trip()` function had per-call `rm -f` only on the success path; failure paths leaked. Switched to script-level TMPDIR_E2E + trap; per-call rm dropped (the trap handles every return path including SIGINT). Pattern: `mktemp -d -t prefix-XXX` for the dir, `mktemp <full-template>` for files (portable across BSD/macOS + GNU coreutils — `-p` is GNU-only and breaks Mac local-dev runs). ## Regression gate New `tests/e2e/lint_cleanup_traps.sh` asserts every `.sh` that calls `mktemp` also has a `trap … EXIT` line in the file. Wired into the existing Shellcheck (E2E scripts) CI step. Verified locally: passes on the fixed state, fails-loud when one of the 3 fixes is reverted.
## Verification - shellcheck --severity=warning clean on all 4 touched files - lint_cleanup_traps.sh passes on the post-fix tree (6 mktemp users, all have EXIT trap) - Negative test: revert one fix → lint exits 1 with file:line + suggested fix pattern in the error message (CI-grokkable ::error file=… annotation) - Trap fires on SIGTERM mid-run (smoke-tested on macOS BSD mktemp) - Trap fires on `exit 1` (smoke-tested) ## Bars met (7-axis) - SSOT: trap pattern documented in lint message (one rule, one fix) - Cleanup: this IS the cleanup hygiene fix - 100% coverage: lint catches future regressions across all `tests/e2e/*.sh` files, not just the 3 fixed today - File-split: N/A (no files split) - Plugin / abstract / modular: N/A (test infra, not product code) Iteration 2 of RFC #2873.	2026-05-05 04:21:26 -07:00
Hongming Wang	89ee8e4d04	Merge pull request #2874 from Molecule-AI/refactor/default-model-for-runtime-ssot refactor(models): consolidate per-runtime model defaults to SSOT (RFC #2873 iter 1)	2026-05-05 11:15:41 +00:00
Hongming Wang	26e2e97006	refactor(models): consolidate per-runtime model defaults to SSOT (RFC #2873 iter 1) Two call sites — workspace_provision.go:537 and org_import.go:54 — duplicated the same `if runtime == "claude-code"` branch deciding the default model when the operator/agent didn't supply one. They were copy-pasted; nothing prevented them from drifting silently. Extract to `models.DefaultModel(runtime string) string`. Both call sites now route through the helper. New runtimes need one entry in DefaultModel + one assertion in TestDefaultModel — pre-fix it required two source edits + an audit. Foundation for the future `RuntimeConfig` interface (RFC #2873 + task #231): once we add `ProvisioningTimeout()`, `CapabilitiesSupported()` etc., the helper expands to per-runtime structs and `DefaultModel` becomes one method on the interface. ## Coverage 15 unit tests pinning the exact contract: - claude-code → "sonnet" - 9 other known runtimes → universal default - empty + unknown → universal default (matches pre-refactor fallthrough) - case-sensitivity preserved (CLAUDE-CODE → universal default) Plus invariant test: `DefaultModel` never returns "" — protects against a future "return early on unknown" regression that would silently break workspace creation. ## Verification - go build ./... clean - 15 model unit tests pass - existing handler tests untouched (no behavior change at call sites) - identical output to pre-refactor for every input First iteration of the OSS-shape refactor program. Each PR meets all 7 bars (plugin/abstract/modular/SSOT/coverage/cleanup/file-split). Refs RFC #2873.	2026-05-05 04:12:37 -07:00
Hongming Wang	ec574f3d4b	Merge pull request #2871 from Molecule-AI/fix/runtime-prbuild-compat-concurrency-event-1777975000 fix(ci): include event_name in runtime-prbuild-compat concurrency group	2026-05-05 11:05:38 +00:00
Hongming Wang	42f2ea3f4f	fix(ci): include event_name in runtime-prbuild-compat concurrency group Every staging push run for the last 4 SHAs was cancelled by the matching pull_request run because both fired into the same concurrency group: group: ${{ github.workflow }}-${{ ...sha }} Same SHA → same group → cancel-in-progress=true means the second arrival cancels the first. Empirically the push run lost the race; staging branch-protection then saw a CANCELLED required check and the auto-promote chain stalled. Fix: include github.event_name in the group key. push and pull_request runs for the same SHA now hash to different groups, both complete, both report SUCCESS to branch protection. Pattern of the bug: 10:46 sha=1e8d7ae1 ev=pull_request conclusion=success 10:46 sha=1e8d7ae1 ev=push conclusion=cancelled 10:45 sha=ecf5f6fb ev=pull_request conclusion=success 10:45 sha=ecf5f6fb ev=push conclusion=cancelled 10:28 sha=471dff25 ev=pull_request conclusion=success 10:28 sha=471dff25 ev=push conclusion=cancelled 10:12 sha=9e678ccd ev=pull_request conclusion=success 10:12 sha=9e678ccd ev=push conclusion=cancelled Same drift class as the 2026-04-28 auto-promote-staging incident (memory: feedback_concurrency_group_per_sha.md) — globally-scoped groups silently cancel runs in matched-SHA scenarios. This is the only workflow in .github/workflows/ that uses the narrow per-sha shape without event_name. Others either don't use concurrency at all, or use ${{ github.ref }} which is event- neutral. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 04:01:20 -07:00
Hongming Wang	e0e9201142	Merge pull request #2870 from Molecule-AI/ci/handlers-pg-apply-all-migrations ci(handlers-pg): apply all migrations with skip-on-error + sanity check	2026-05-05 10:51:26 +00:00
Hongming Wang	90d202c80a	ci(handlers-pg): apply all migrations with skip-on-error + sanity check (#320) Previous workflow applied only 049_delegations.up.sql — fragile to future migrations that touch the delegations table or any other handlers/-tested table. Operator would have to remember to update the workflow's psql -f line per migration. New behavior: loop every .up.sql in lexicographic order, apply each with ON_ERROR_STOP=1 + per-migration result captured. Failed migrations are SKIPPED rather than blocking the suite — handles the historical migrations (017_memories_fts_namespace, 042_a2a_queue, etc.) that depend on tables since renamed/dropped and can't replay from scratch. Migrations that DO succeed land their tables, which is sufficient for the integration tests in handlers/. Sanity gate at the end: if the delegations table is missing after the replay, hard-fail with a loud error. That catches a real regression where 049 itself becomes broken (e.g., schema rename), separate from the historical-broken-migration noise above. Per-migration log line ("✓" or "⊘ skipped") makes it easy to spot when a migration that SHOULD have replayed didn't. Verified locally: full migration chain runs, 049 lands, all 7 integration tests pass against the chained-migration DB. Closes #320.	2026-05-05 03:48:43 -07:00
Hongming Wang	1e8d7ae17c	Merge pull request #2869 from Molecule-AI/test/rfc2829-tighten-sweeper-assertions test(delegations): tighten integration-test assertions + integrationDB doc	2026-05-05 10:42:31 +00:00
Hongming Wang	ecf5f6fbf3	Merge pull request #2868 from Molecule-AI/feat/org-import-idempotency feat(org-import): make createWorkspaceTree idempotent (#2859, Phase 3 of #2857)	2026-05-05 10:40:55 +00:00
Hongming Wang	fcdf79774d	test(delegations): tighten integration-test assertions + integrationDB doc (#321) Three small follow-ups from #2866 self-review: 1. TestIntegration_Sweeper_StaleHeartbeatIsMarkedStuck — assert strings.Contains(errDet, "no heartbeat for") instead of != "". The original "non-empty" check passes for any error_detail value; if a future regression swaps the message format, the test wouldn't catch it. Pin the production format string explicitly. 2. TestIntegration_Sweeper_DeadlineExceededIsMarkedFailed — drop the redundant `last_heartbeat = now()` write. The sweeper checks deadline FIRST (the stronger statement) and short-circuits before evaluating heartbeat staleness, so the heartbeat field is irrelevant for that test path. 3. integrationDB doc comment now warns explicitly that the helper is NOT t.Parallel()-safe — it hot-swaps the package-level mdb.DB and restores via t.Cleanup. If a future contributor adds t.Parallel() to one of these tests they race on the global. Comment makes the constraint discoverable instead of a debugging surprise. All 7 integration tests still pass against real Postgres locally.	2026-05-05 03:39:22 -07:00
Hongming Wang	d6337a1ae9	feat(org-import): make createWorkspaceTree idempotent (Phase 3 of #2857) OrgHandler.Import was non-idempotent — every call INSERTed a fresh row for every workspace in the tree, regardless of whether matching workspaces already existed. Calling /org/import twice with the same template duplicated the entire tree. This was a bigger leak source than TeamHandler.Expand (deleted in PR #2856). tenant-hongming accumulated 72 distinct child workspaces in 4 days entirely from repeated org-template spawns of the same template — the (tier × runtime) matrix in the audit data was the template's static shape, multiplied by spawn count. Fix: route through a new lookupExistingChild helper before INSERT. Skip-if-exists semantics by default: - Match on (parent_id, name) using `IS NOT DISTINCT FROM` so NULL parents (root workspaces) are included. - Ignore status='removed' rows so collapsed teams or deleted workspaces don't block re-import. - Recursion still runs on the existing id so partial-match templates (parent exists, some children missing) backfill correctly instead of either no-op'ing the whole subtree or duplicating the existing children. - Result entries for skipped nodes carry skipped:true so callers (canvas Import preflight modal) can surface "5 of 7 already existed, 2 created." The recursion that walked ws.Children is extracted into recurseChildrenForImport so both the create-path and the skip-path share one implementation — no duplicated grid math, no two paths to keep in sync. Note: replace_if_exists semantics (re-roll: stop+delete old, create new) are deferred. Skip-if-exists alone closes the leak; re-roll is a later UX decision for the canvas Import preflight modal. Tests: - 4 sqlmock cases on lookupExistingChild: not-found, found, nil-parent (the IS NOT DISTINCT FROM NULL trick), DB-error propagates (must fail fast — silent fallback to INSERT is the failure mode the helper exists to prevent).
- 1 source-level AST gate (per memory feedback_behavior_based_ast_gates.md): pins that h.lookupExistingChild( appears BEFORE INSERT INTO workspaces in org_import.go. If a future refactor reintroduces the un-checked INSERT, the gate fails. Verified load-bearing by removing the call — build fails (helper symbol gone). go vet ./... clean. go test ./internal/handlers/ -count 1 — all green (4.2s, no regression on existing OrgImport / Provision / Team tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 03:37:49 -07:00
Hongming Wang	471dff25e9	Merge pull request #2866 from Molecule-AI/test/rfc2829-sweeper-integration-coverage test(delegations): extend Postgres integration suite with sweeper coverage	2026-05-05 10:23:50 +00:00
Hongming Wang	3d2a50e2a2	test(delegations): extend integration suite with sweeper coverage (3 tests) Real-Postgres tests for the RFC #2829 PR-3 sweeper. Validates: - Deadline-exceeded rows are marked failed with the expected error_detail - Stale-heartbeat in-flight rows are marked stuck (uses DELEGATION_STUCK_THRESHOLD_S env override for deterministic timing) - Healthy rows (fresh heartbeat + future deadline) are not touched — no false-positive against well-behaved delegations These extend the gate added in the previous commit so the workflow catches sweeper regressions, not just ledger-write ones. All 7 integration tests now pass; CI workflow runs them all.	2026-05-05 03:20:19 -07:00
Hongming Wang	9e678ccd5e	Merge pull request #2863 from Molecule-AI/fix/expand-removal-followup fix(canvas/tests): pin Expand-to-Team absence with literal assertion	2026-05-05 10:07:53 +00:00
Hongming Wang	191ef3be91	fix(canvas/tests): pin Expand-to-Team absence with literal assertion Multi-model review of #2862 caught a non-load-bearing assertion: the test used `expect(labels).not.toContain(expect.stringMatching(...))` to claim the "Expand to Team" right-click item is gone. But vitest's toContain uses Object.is/===, so asymmetric matchers like expect.stringMatching are plain objects that never === any string — the assertion silently passed for ANY string array, including arrays that DID contain "Expand to Team". The test would have green-lit the unfixed code. Switch to the literal substring shape the rest of this file already uses (see lines 175/183/254 — labels.some((l) => l.includes(...))). Verified the new assertion is load-bearing: 1. Reintroduced `{ label: "Expand to Team", ... }` into the childless-workspace branch of ContextMenu.tsx 2. Ran the test — failed at the new assertion line as expected 3. Reverted the regression — test passes again Net diff: replaces one broken expect with one correct expect + a WHY-comment noting the toContain/asymmetric-matcher gotcha so the next reader (or test writer) doesn't reintroduce the same shape. Per memory feedback_assert_exact_not_substring.md: pin assertions that fail on the old code path; this assertion never fired even on the bug it was written to catch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 03:05:17 -07:00
Hongming Wang	25fd6b021d	Merge pull request #2862 from Molecule-AI/chore/remove-canvas-expand-button chore(canvas): remove Expand-to-Team right-click button (#2858, Phase 2 of #2857)	2026-05-05 09:56:59 +00:00
Hongming Wang	a959feae84	Merge branch 'staging' into chore/remove-canvas-expand-button	2026-05-05 02:52:28 -07:00
Hongming Wang	c661ea4cd3	Merge pull request #2861 from Molecule-AI/fix/rfc2829-result-preview-ordering-and-integration-gate fix(delegations): preserve result_preview + add real-Postgres integration gate	2026-05-05 09:51:30 +00:00
Hongming Wang	49027af419	chore(canvas): remove Expand-to-Team right-click button (#2858) Pairs with PR #2856 which removed the backend POST /workspaces/:id/expand route. With the backend gone, the canvas right-click "Expand to Team" button hits a 404. Remove the button and its callback. ContextMenu.tsx: - Delete handleExpand callback (8 lines) - Drop the "Expand to Team" item from the childless-workspace menu array; childless workspaces now only show the regular actions (Extract from Team / Export Bundle / Duplicate / Pause / Restart / Delete). Toolbar.tsx: - Drop "expand," from the right-click help-text shortcut. ContextMenu.keyboard.test.tsx — two new pinning cases: - "'Expand to Team' menu item is gone (childless workspace)" — asserts the label literal is absent + the regular actions (Delete, Restart) are still present. - "'Collapse Team' is still present when the workspace HAS children" — sanity that the parent-with-children menu (Arrange Children / Collapse Team / Zoom to Team) didn't regress. How users create children now: the existing + New Workspace dialog (CreateWorkspaceDialog.tsx) already has a parent picker. No new UI needed — every workspace can be a parent via the regular Create flow with parent_id set. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 02:51:13 -07:00
Hongming Wang	4c9f12258d	fix(delegations): preserve result_preview through completion + add real-Postgres integration gate Two-part PR: ## Fix: result_preview was lost on completion Self-review of #2854 caught a real bug. SetStatus has a same-status replay no-op; the order of calls in `executeDelegation` completion + `UpdateStatus` completed branch clobbered the preview field: 1. updateDelegationStatus(completed, "") fires 2. inner recordLedgerStatus(completed, "", "") → SetStatus transitions dispatched → completed with preview="" 3. outer recordLedgerStatus(completed, "", responseText) → SetStatus reads current=completed, status=completed → SAME-STATUS NO-OP, never writes responseText → preview lost Confirmed against real Postgres (see integration test). Strict-sqlmock unit tests passed because they pin SQL shape, not row state. Fix: call the WITH-PREVIEW recordLedgerStatus FIRST, then updateDelegationStatus. The inner call becomes the no-op (correctly preserves the row written by the outer call). Same gap fixed in UpdateStatus handler — body.ResponsePreview was never landing in the ledger because updateDelegationStatus's nested SetStatus(completed, "", "") fired first. ## Gate: real-Postgres integration tests + CI workflow The unit-test-only workflow that shipped #2854 was the root cause. Adding two layers of defense: 1. workspace-server/internal/handlers/delegation_ledger_integration_test.go — `//go:build integration` tag, requires INTEGRATION_DB_URL env var. 4 tests: * ResultPreviewPreservedThroughCompletion (regression gate for the bug above — fires the production call sequence in fixed order and asserts row.result_preview matches) * ResultPreviewBuggyOrderIsLost (DIAGNOSTIC: confirms the same-status no-op contract works as designed; if SetStatus's semantics ever change, this test fires) * FailedTransitionCapturesErrorDetail (failure-path symmetry) * FullLifecycle_QueuedToDispatchedToCompleted (forward-only + happy path) 2. 
.github/workflows/handlers-postgres-integration.yml — required check on staging branch protection. Spins postgres:15 service container, applies the delegations migration, runs `go test -tags=integration` against the live DB. Always-runs + per-step gating on path filter (handlers/wsauth/migrations) so the required-check name is satisfied on PRs that don't touch relevant code. Local dev workflow (file header documents this): docker run --rm -d --name pg -e POSTGRES_PASSWORD=test -p 55432:5432 postgres:15-alpine psql ... < workspace-server/migrations/049_delegations.up.sql INTEGRATION_DB_URL="postgres://postgres:test@localhost:55432/molecule?sslmode=disable" \ go test -tags=integration ./internal/handlers/ -run "^TestIntegration_" ## Why this matters Per memory `feedback_mandatory_local_e2e_before_ship`: backend PRs MUST verify against real Postgres before claiming done. sqlmock pins SQL shape; only a real DB can verify row state. The workflow makes this gate mandatory rather than optional.	2026-05-05 02:47:52 -07:00
Hongming Wang	da46bdeded	Merge pull request #2826 from Molecule-AI/feat/canvas-chat-lazy-load-history feat(canvas/chat): lazy-load history — 10 newest on mount, 20 per scroll-up batch	2026-05-05 09:44:29 +00:00
Hongming Wang	d890fd9a3f	Merge pull request #2856 from Molecule-AI/chore/remove-team-expand-handler chore(workspace-server): remove TeamHandler.Expand bulk-create handler	2026-05-05 09:42:51 +00:00
Hongming Wang	ec1f21922c	chore(workspace-server): remove TeamHandler.Expand bulk-create handler Every workspace can have children via the regular CreateWorkspace flow with parent_id set, so a separate handler that bulk-creates from config.yaml's sub_workspaces (and was non-idempotent — calling it twice duplicated the team) earned its way out. "Team" is just the state of having children; expanding/collapsing is purely a canvas-side visual action that toggles the `collapsed` column via PATCH. The non-idempotency directly caused tenant-hongming's vCPU starvation: 72 distinct child workspaces accumulated in 4 days, ~14 leaked EC2s (50 of 64 vCPU consumed by stale teams), every Canvas tabs E2E retry flaking on RunInstances VcpuLimitExceeded. What stays: - TeamHandler.Collapse — still useful; stops + removes children via StopWorkspaceAuto. Reachable from the canvas Collapse Team button. (Note: that button currently calls PATCH /workspaces/:id, not the Collapse endpoint — that's a separate reachability question for later.) - findTemplateDirByName helper — kept in team.go pending a relocate decision; no in-package consumers after Expand. - The four other paths that create child workspaces continue to work unchanged: regular POST /workspaces with parent_id, OrgHandler.Import (recursive tree), Bundle import, scripts. What goes: - POST /workspaces/:id/expand route (router.go) - TeamHandler.Expand method (team.go: ~130 lines) - 4 TestTeamExpand_* sqlmock tests (team_test.go) - TestTeamExpand_UsesAutoNotDirectDockerPath AST gate (workspace_provision_auto_test.go) — pinned a code path that no longer exists; the generic TestNoCallSiteCallsDirectProvisionerExceptAuto gate still covers the architectural intent for any future caller. 
Follow-up PRs: - canvas/ContextMenu.tsx: drop the "Expand to Team" right-click button + handleExpand callback; users create children via the regular + New Workspace dialog with the parent picker (already supported) - OrgHandler.Import idempotency (skip-if-exists OR replace_if_exists) — same bug class as the deleted Expand, but on the bulk-tree path - One-off cleanup script for tenant-hongming's 72 stale workspaces Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 02:39:13 -07:00
Hongming Wang	ca61213578	Merge pull request #2853 from Molecule-AI/refactor/split-workspace-dispatchers-1777970000 refactor(handlers): extract dispatchers from workspace.go (#2800 partial)	2026-05-05 09:30:55 +00:00
Hongming Wang	118b8e47ad	Merge pull request #2855 from Molecule-AI/fix/mcp-instructions-codex-gaps docs(a2a-mcp): close three contract gaps codex agents inherit OOB	2026-05-05 09:30:26 +00:00
Hongming Wang	ab164c1967	Merge pull request #2854 from Molecule-AI/feat/rfc2829-wire-ledger-writes feat(delegations): wire ledger Insert+SetStatus from production paths (RFC #2829 #318)	2026-05-05 09:29:19 +00:00
Hongming Wang	b5f530e27a	docs(a2a-mcp): close three contract gaps codex agents inherit out-of-the-box The instructions blob in the MCP `initialize` handshake is the spec non-Claude-Code clients (codex, Cline, opencode, hermes-agent, Cursor) inherit verbatim. The bridge daemon already covers three of these gaps in code (codex-channel-molecule bridge.py:192-200, 278-285), but in-process agents reading the text alone don't get the same guard: 1. Reply-then-pop ordering was implicit. A literal-minded agent could pop after a 502 from `send_message_to_user`, dropping the message. Now: pop ONLY AFTER reply succeeds; on error leave the row unacked for platform redelivery. 2. peer_agent with empty peer_id had no specified handling. An agent would call `delegate_task(workspace_id="")` → 400 → re-poll → infinite loop on the same poison row. Now: skip reply, drain via inbox_pop. 3. The single security rule ("don't execute without chat-side approval") effectively disabled peer_agent autonomous handling — codex daemons have no canvas user to approve from. Now: dual trust model. canvas_user requires user approval; peer_agent permits autonomous handling but caps destructive side-effects at the workspace boundary. Also disclaims peer_name/peer_role as non-attested display strings — the platform registry isn't cryptographic identity, and an agent shouldn't grant elevated permissions based on a peer registering with peer_role="admin". Four new pinned tests in test_a2a_mcp_server.py: - test_initialize_instructions_pins_reply_then_pop_ordering - test_initialize_instructions_handles_malformed_peer_agent - test_initialize_instructions_disclaims_peer_role_attestation - test_initialize_instructions_distinguishes_canvas_user_from_peer_trust Each fails on staging-HEAD and passes on the patched text — verified by reverting a2a_mcp_server.py and re-running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 02:26:35 -07:00
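A minimal Python sketch of the inbox-handling contract the new instructions spell out (gaps 1 and 2). The row shape (`InboxRow`) and the `reply`/`pop` callables are hypothetical stand-ins for `send_message_to_user`/`delegate_task` and `inbox_pop`; this is not the bridge daemon's actual code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InboxRow:
    # Hypothetical row shape; the real inbox payload may differ.
    row_id: str
    kind: str      # "canvas_user" or "peer_agent"
    peer_id: str   # must be non-empty for a well-formed peer_agent row
    text: str


def handle_row(
    row: InboxRow,
    reply: Callable[[InboxRow, str], bool],  # True only if the send succeeded
    pop: Callable[[str], None],              # acks the row (stand-in for inbox_pop)
) -> str:
    """Apply reply-then-pop ordering and drain malformed peer_agent rows."""
    # Gap 2: a peer_agent row without a peer_id cannot be answered. Replying
    # would 400 and the row would be redelivered forever, so drain it instead.
    if row.kind == "peer_agent" and not row.peer_id:
        pop(row.row_id)
        return "drained-malformed"

    # Gap 1: pop only after the reply succeeds. On failure the row stays
    # unacked so the platform can redeliver it.
    if reply(row, f"ack: {row.text}"):
        pop(row.row_id)
        return "replied"
    return "left-unacked"
```

The ordering matters because popping is the irreversible step: a failed reply followed by a pop loses the message, while a failed reply without a pop only costs a redelivery.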
Hongming Wang	44bb35a926	feat(delegations): wire ledger Insert+SetStatus from production code paths (RFC #2829 #318) PR-1 shipped the `delegations` table + `DelegationLedger` helper. PR-3 wired the sweeper. PR-4 wired the dashboard. But no PR ever wired `ledger.Insert` from a production code path — the table stayed empty, the sweeper had nothing to sweep, the dashboard had nothing to show. This PR closes that gap. Behind feature flag `DELEGATION_LEDGER_WRITE=1` (default off), the legacy activity_logs writes are mirrored to the durable ledger: - insertDelegationRow → ledger.Insert (queued) - updateDelegationStatus → ledger.SetStatus on every status transition - executeDelegation completion path → ledger.SetStatus(completed, result_preview) for the result preview that activity_logs already stores in response_body - Record handler → ledger.Insert + ledger.SetStatus(dispatched) so agent-initiated delegations land in the same table ## Why a flag The legacy flow has ~30 strict-sqlmock tests pinning exactly which SQL statements fire per handler. Adding ledger writes always-on would force adding ExpectExec stanzas to each. Flag-off keeps all 30 green without churn; flag-on lets operators populate the table in staging to feed the sweeper + dashboard once the agent-side cutover (RFC #2829 PR-5) has proven the round-trip end-to-end. Default off → byte-identical to pre-#318 behavior. ## Status vocabulary mapping activity_logs uses a freer status vocabulary than the ledger's CHECK constraint allows. updateDelegationStatus is called with values like "received" that the ledger doesn't accept; the wiring filters via a switch to only forward known-good values, skipping anything else. Record's first activity_logs row is `dispatched` but the ledger's Insert path requires `queued` as initial state. Insert as queued first; the very next SetStatus(..., dispatched) promotes it on the same row.
## Coverage 8 wiring tests (delegation_ledger_writes_test.go): - flag off → no SQL fired (rollout safety contract) - flag on → INSERT + UPDATE fire as expected - flag rejects loose truthy values (true/yes/on/TRUE) as well as 0 — only "1" is the on signal, matching PR-2 + PR-5 conventions - terminal-state replay swallows ErrInvalidTransition (legacy is authoritative; ledger replay error is not a delegation failure) All 30 existing delegation_test.go tests still pass — flag default off keeps the strict-sqlmock surface unchanged. Refs RFC #2829.	2026-05-05 02:26:06 -07:00
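The wiring rules above (strict "1" flag, status filtering, swallowed replay errors) can be sketched as follows. The real code is Go; this Python rendering is illustrative only. `LEDGER_STATUSES` is an assumed subset limited to the three statuses the commit names, and `FakeLedger`'s one-way ordering is a hypothetical stand-in for the ledger's real transition rules.

```python
import os


class InvalidTransition(Exception):
    """Stand-in for the Go ledger's ErrInvalidTransition."""


# Assumed subset: the real set is whatever the ledger's CHECK constraint
# allows. "received" is deliberately absent (legacy-only vocabulary).
LEDGER_STATUSES = frozenset({"queued", "dispatched", "completed"})


def ledger_write_enabled(env=None) -> bool:
    # Only the exact string "1" turns the flag on; "true", "yes", "on",
    # "TRUE" and "0" all leave it off.
    env = os.environ if env is None else env
    return env.get("DELEGATION_LEDGER_WRITE") == "1"


class FakeLedger:
    """In-memory ledger that only allows forward status moves."""

    ORDER = ("queued", "dispatched", "completed")

    def __init__(self):
        self.rows = {}

    def insert(self, delegation_id):
        self.rows[delegation_id] = "queued"  # Insert always starts at queued

    def set_status(self, delegation_id, status):
        current = self.rows[delegation_id]
        if self.ORDER.index(status) <= self.ORDER.index(current):
            raise InvalidTransition(f"{current} -> {status}")
        self.rows[delegation_id] = status


def mirror_status(ledger, delegation_id, status, env=None):
    """Mirror an activity_logs status change into the ledger, best effort."""
    if not ledger_write_enabled(env):
        return "flag-off"            # no ledger writes at all
    if status not in LEDGER_STATUSES:
        return "skipped"             # filtered, like the Go switch
    try:
        ledger.set_status(delegation_id, status)
    except InvalidTransition:
        return "replay-ignored"      # activity_logs stays authoritative
    return "written"
```

The Record path maps onto `insert()` followed immediately by `mirror_status(..., "dispatched")`, which is how a row that activity_logs first sees as `dispatched` still enters the ledger as `queued`.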
Hongming Wang	024ef260db	refactor(handlers): extract dispatchers from workspace.go (#2800 partial) workspace.go was 950 lines after the dispatcher work in PRs #2811 + #2824 + #2843 + #2846 + #2847 + #2848 + #2850. This extracts the 6 SoT dispatcher helpers into a new workspace_dispatchers.go so the file is the architectural unit it deserves to be (one place for "how do we route a workspace lifecycle verb to a backend?"). Moved (no body changes — pure cut + paste with imports): - HasProvisioner (gate accessor) - provisionWorkspaceAuto (async provision) - provisionWorkspaceAutoSync (sync provision, runRestartCycle's path) - StopWorkspaceAuto (stop dispatcher) - RestartWorkspaceAuto (restart wrapper) - RestartWorkspaceAutoOpts (restart with resetClaudeSession) workspace.go shrinks from 950 → 735 lines and now holds: - WorkspaceHandler struct + constructor - SetCPProvisioner / SetEnvMutators - Create / List / Get / scanWorkspaceRow - HTTP handler glue workspace_dispatchers.go is 255 lines and holds the dispatcher trio + sync variant + gate accessor + a header docblock summarizing the history (PRs that added each helper) and the source-level pin tests that gate against drift. Source-level pin tests updated: - TestNoCallSiteCallsDirectProvisionerExceptAuto: workspace_dispatchers.go added to allowlist (the dispatcher IS the place that calls per-backend bodies directly). - TestNoCallSiteCallsBareStop: same. - TestNoBareBothNilCheck / TestOrgImportGate_UsesHasProvisionerNotBareField: no change — they were source-pinning specific files, not all callers. Build clean, vet clean, full test suite passes (1742 / 0 in workspace, all Go test packages green). Out of scope (#2800 has more): - workspace_provision.go (869 lines) split into Docker + CP halves — files would still be 400+ each, marginal value. Defer until a third backend lands and the symmetry breaks. - Splitting Create / List / Get into per-handler files — they're short and tightly coupled to the struct; keep co-located. 
Closes #2800 partial. Filing a follow-up issue if/when workspace.go or workspace_provision.go grows past 800 lines again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 02:24:49 -07:00
Hongming Wang	d175d0c4c1	Merge branch 'staging' into feat/canvas-chat-lazy-load-history	2026-05-05 02:22:38 -07:00
Hongming Wang	d21ac991c1	Merge pull request #2852 from Molecule-AI/feat/external-rotate-credentials feat(external): credential rotation + re-show instruction modal (#319)	2026-05-05 09:01:44 +00:00
Hongming Wang	c85783fbee	docs(workspace): point recovery hint at /external/rotate (not the never-shipped /tokens) Self-review of #2852: the inline comment on the IssueToken-failed branch still referenced POST /workspaces/:id/tokens, which never shipped. The recovery path that did ship in #2852 is POST /workspaces/:id/external/rotate. Update the hint so the next operator who hits this failure mode finds the right endpoint.	2026-05-05 01:58:43 -07:00
Hongming Wang	b375252dc8	feat(external): credential rotation + re-show instruction modal (#319) External workspaces (runtime=external) lose their workspace_auth_token the moment the create modal closes — the token is unrecoverable from any later DB read. Operators who lost their copy or want to respond to a suspected leak had no recovery path short of recreating the workspace (which also breaks cross-workspace delegation links + memory namespace). This PR adds two endpoints + a Config-tab section that surfaces them: POST /workspaces/:id/external/rotate Revokes any prior live tokens, mints a fresh one, returns the same ExternalConnectionInfo payload Create returns. Old credentials stop working immediately — the previously-paired agent will fail auth on its next heartbeat (~20s). GET /workspaces/:id/external/connection Returns the connect block with auth_token="". For the operator who just needs to re-find PLATFORM_URL / WORKSPACE_ID / one of the snippets without invalidating the live agent. Both reject runtime ≠ external with 400 + a hint pointing at /restart for non-external runtimes (which mints AND injects into the container). ## Why a flag isn't needed The endpoints are purely additive — Create's behavior is unchanged. Existing external workspaces don't see anything different until an operator clicks the new buttons. ## DRY refactor Extracted BuildExternalConnectionPayload() in external_connection.go as the single source of truth for the connect payload shape. Create, Rotate, and GetExternalConnection all call it. Adding a snippet once makes all three endpoints emit it. Trims trailing slash on platform_url so no double-slash sneaks into registry_endpoint. ## Canvas ExternalConnectionSection mounts in ConfigTab when runtime=external.
Two buttons: - "Show connection info" (read-only) — fetches GET /external/connection - "Rotate credentials" (destructive) — confirm dialog explains the impact, then POST /external/rotate Both reuse the existing ExternalConnectModal so operators don't have to learn a second snippet UX. ## Coverage 10 Go tests: - Rotate happy path (revoke + mint order, payload shape, broadcast event) - Rotate refuses non-external runtimes (400 with restart hint) - Rotate 404 on unknown workspace + 400 on empty id - GetExternalConnection happy path (auth_token="", same payload shape) - GetExternalConnection refuses non-external + 404 on unknown - BuildExternalConnectionPayload — placeholder substitution + trailing slash trimming + blank-token contract 6 canvas tests: - both action buttons render - "Show" calls GET /external/connection and opens modal - "Rotate" opens confirm dialog before firing POST - Cancel dismisses without rotating - Confirm POSTs and opens modal with returned token - API failures surface as visible error chips Migration: existing external workspaces gain new abilities; no data migration. The DRY refactor preserves byte-identical Create response shape (8 ConfigTab tests + all existing handler tests still pass). Closes #319.	2026-05-05 01:55:27 -07:00
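The single-source-of-truth payload builder can be sketched like this. The real BuildExternalConnectionPayload is Go; this Python version is a hypothetical illustration. The field names, the `env` snippet template, and the `/registry` path are all assumptions. It shows the two contracts the tests pin: trailing slashes are trimmed before any URL is joined, and the read-only view passes an empty token.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalConnectionInfo:
    # Field names are illustrative; the real Go struct may differ.
    platform_url: str
    workspace_id: str
    auth_token: str
    registry_endpoint: str
    snippets: dict = field(default_factory=dict)


# Hypothetical snippet template; the real templates live in external_connection.go.
SNIPPET_TEMPLATES = {
    "env": "PLATFORM_URL={platform_url}\nWORKSPACE_ID={workspace_id}\nAUTH_TOKEN={auth_token}",
}


def build_external_connection_payload(platform_url, workspace_id, auth_token=""):
    """Build the connect payload shared by Create, Rotate and the read-only view.

    Create and Rotate pass a freshly minted token; the read-only connection
    endpoint passes auth_token="" so the live token is never re-exposed.
    """
    base = platform_url.rstrip("/")  # trim trailing slashes before joining paths
    values = {"platform_url": base, "workspace_id": workspace_id, "auth_token": auth_token}
    return ExternalConnectionInfo(
        platform_url=base,
        workspace_id=workspace_id,
        auth_token=auth_token,
        registry_endpoint=f"{base}/registry",  # hypothetical path
        snippets={name: tpl.format(**values) for name, tpl in SNIPPET_TEMPLATES.items()},
    )
```

Because every endpoint goes through one builder, a new entry in the template map shows up in all three responses without touching the handlers.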
Hongming Wang	3d226a2c68	Merge pull request #2851 from Molecule-AI/feat/peer-metadata-cache-evict-2482-1777967000 perf(a2a): bound + LRU-evict _peer_metadata cache (#2482)	2026-05-05 08:41:32 +00:00
Hongming Wang	da6d319c48	perf(a2a): bound + LRU-evict _peer_metadata cache (#2482) Pre-fix _peer_metadata was an unbounded dict — a workspace receiving from N distinct peers across its lifetime accumulated entries indefinitely (~100 bytes × N). Not crash-class at typical scale (10K peers ≈ 1 MB) but unbounded. The TTL-at-read pattern bounded staleness but did nothing for memory. Fix: hand-rolled LRU on top of OrderedDict. No new dependency. - _PEER_METADATA_MAXSIZE = 1024 (issue's recommended bound) - _peer_metadata_get(canon) — read + LRU touch (move to MRU) - _peer_metadata_set(canon, value) — write + evict-if-over-maxsize - All production reads/writes route through the helpers - _peer_metadata_lock guards the OrderedDict ops so concurrent background-enrichment workers (#2484) don't race the LRU invariant Why hand-rolled vs cachetools: - No new dep. workspace/ has 0 cache libraries today; adding one for ~30 lines is negative leverage. - The TTL is enforced at the call site (existing pattern); only the size cap + LRU is new. cachetools.TTLCache fuses the two, which would force a refactor of every caller's TTL check. - The size + lock are simple enough that a future swap-in of cachetools is mechanical if needs evolve. Why maxsize matters more than ttl (issue's framing): A runaway poller that touches new peer_ids every push would still grow within a single TTL window — TTL eviction only fires at read time. The size cap fires immediately on insert, regardless of read pattern. Three new tests: - test_peer_metadata_set_evicts_lru_when_at_maxsize - test_peer_metadata_get_promotes_to_lru_head - test_peer_metadata_set_replaces_existing_entry_in_place 1742 passed / 0 failed locally (78 new + 1664 existing). Closes #2482. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 01:39:07 -07:00
Hongming Wang	76e9656a7b	Merge pull request #2850 from Molecule-AI/feat/enrich-off-poller-2484-1777965000 perf(a2a): move enrichment GET off the inbox poller thread (#2484)	2026-05-05 08:30:20 +00:00
Hongming Wang	35017c5452	perf(a2a): move enrichment GET off the inbox poller thread (#2484) The inbox poller's notification callback called the synchronous enrich_peer_metadata on every push, blocking the poller for up to 2s × N uncached peers per poll batch. Push delivery latency was gated on registry RTT — exactly what PR #2471's negative-cache patch was trying to avoid amplifying. Fix: cache-first nonblocking path with a tiny background worker pool. enrich_peer_metadata_nonblocking(peer_id): - Cache hit (fresh, within TTL): return cached record immediately - Cache miss / stale: return None, schedule background fetch via ThreadPoolExecutor The first push from a new peer arrives metadata-light (bare peer_id); the next push within the 5-min TTL hits the warm cache and gets full name/role. Acceptable trade-off because the channel-envelope enrichment is a UX nicety, not a correctness invariant — and the cold-cache window per peer is bounded to one push. Defenses: - In-flight gate (_enrich_in_flight) — N concurrent pushes for the same uncached peer schedule exactly ONE worker, not N. Without this, a chatty peer's first burst of pushes would amplify into parallel registry GETs — the exact DoS-on-self pattern the negative cache was meant to rate-limit. - Lazy executor init — most test fixtures + short-lived CLI invocations never need it; only the long-running molecule-mcp path actually fires background work. - Daemon-style threads via thread_name_prefix; executor never blocks process exit. Tests: - test_enrich_peer_metadata_nonblocking_cache_hit_returns_immediately - test_enrich_peer_metadata_nonblocking_cache_miss_schedules_fetch - test_enrich_peer_metadata_nonblocking_coalesces_duplicate_pushes - test_enrich_peer_metadata_nonblocking_invalid_peer_id_returns_none Plus updates to the existing test_envelope_enrichment_* suite that asserted synchronous behavior — they now drain the in-flight set via _wait_for_enrichment_inflight_for_testing before checking cache state.
Existing synchronous enrich_peer_metadata is unchanged — Phase B (#2790) schema↔dispatcher drift gate + the negative-cache contract from PR #2471 still apply. The nonblocking variant is purely additive. 1739 passed, 0 failed locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 01:24:42 -07:00
Hongming Wang	d10c1a1a36	Merge pull request #2848 from Molecule-AI/feat/2799-phase3-pause-1777961500 feat(handlers): migrate Pause loop to StopWorkspaceAuto — #2799 Phase 3 (closes #2799)	2026-05-05 07:03:31 +00:00
Hongming Wang	61b7755c3c	feat(handlers): migrate Pause loop to StopWorkspaceAuto — #2799 Phase 3 Last open #2799 site. Pause's per-workspace stop call now routes through StopWorkspaceAuto, removing the final inline if-cpProv-else (actually if-h.provisioner) dispatch from workspace_restart.go's restart/pause/resume code paths. Pre-2026-05-05 the Pause loop was: if h.provisioner != nil { h.provisioner.Stop(ctx, ws.id) } Same drift class as #2813 (team-collapse leak) + #2814 (workspace delete leak) — Docker-only stop silently no-ops on SaaS, leaving the EC2 running while the workspace row gets marked paused. Orphan sweeper would catch it eventually but the leak window is real. Pause-specific bookkeeping (mark paused, clear workspace keys, broadcast WORKSPACE_PAUSED) stays inline in the handler; only the "stop the running workload" step delegates. StopWorkspaceAuto's no-backend → no-op semantics match the pre-fix behavior on misconfigured deployments (the bookkeeping still runs). One new source-level pin: TestPauseHandler_UsesStopWorkspaceAuto — gates regression to the inline dispatch shape. This closes #2799 Phase 3. After this PR + #2847 (Phase 2 PR-B) land, workspace_restart.go has no remaining inline if-cpProv-else dispatch in any user-facing code path. The remaining direct backend calls inside the file are in stopForRestart and cpStopWithRetry — both internal helpers that ARE the dispatcher's underlying primitives, not new bypasses. Note: scope was originally tagged "Phase 3 needs PauseWorkspaceAuto verb" in the audit on PR #2843. On closer reading Pause's stop step is identical to Stop — only the bookkeeping is Pause-specific. Reusing StopWorkspaceAuto avoids unnecessary surface and keeps the dispatcher trio (provision/stop/restart) tight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 00:00:16 -07:00
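A minimal Python sketch of the dispatcher shape described above; the real StopWorkspaceAuto is Go. `Handler`, `cp_provisioner`, `docker_provisioner` and the CP-before-Docker preference are assumptions for illustration. It shows the two properties the commit relies on: Pause delegates only the stop step, and its bookkeeping runs whether a backend is configured or not.

```python
from typing import Optional, Protocol


class Stopper(Protocol):
    def stop(self, workspace_id: str) -> None: ...


class Handler:
    """Hypothetical stand-in for WorkspaceHandler with two optional backends."""

    def __init__(self, cp_provisioner: Optional[Stopper] = None,
                 docker_provisioner: Optional[Stopper] = None):
        self.cp_provisioner = cp_provisioner          # SaaS / EC2 backend
        self.docker_provisioner = docker_provisioner  # local Docker backend
        self.events = []

    def stop_workspace_auto(self, workspace_id: str) -> str:
        # Single dispatch point: callers never choose a backend themselves.
        if self.cp_provisioner is not None:
            self.cp_provisioner.stop(workspace_id)
            return "cp"
        if self.docker_provisioner is not None:
            self.docker_provisioner.stop(workspace_id)
            return "docker"
        return "noop"  # no backend configured: nothing to stop

    def pause(self, workspace_id: str) -> str:
        backend = self.stop_workspace_auto(workspace_id)
        # Pause-specific bookkeeping stays in the handler and runs either way.
        self.events.append(("WORKSPACE_PAUSED", workspace_id))
        return backend
```

A Docker-only check such as `if self.docker_provisioner is not None` inside `pause` would silently skip the stop on a CP-only deployment, which is the leak class the commit closes.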
Hongming Wang	21a7e7b0e7	Merge pull request #2847 from Molecule-AI/feat/2799-phase2b-runrestart-cycle-1777960000 feat(handlers): provisionWorkspaceAutoSync + Site 4 migration — #2799 Phase 2 PR-B	2026-05-05 06:53:18 +00:00


4310 Commits