PR #2906 bundled memory-plugin-postgres as a startup-gated sidecar in
both tenant entrypoints. Plugin migrations include
`CREATE EXTENSION IF NOT EXISTS vector`, which fails on the harness's
plain postgres:15-alpine (no pgvector preinstalled). The 30s health
gate then aborts container boot and Harness Replays fails.
Detected on auto-promote PR #2914 — Harness Replays job:
Container harness-tenant-alpha-1 Error
Container harness-tenant-beta-1 Error
dependency failed to start: container harness-tenant-alpha-1 exited (1)
The harness doesn't exercise memory features, so the simplest fix is
to use the documented escape hatch the sidecar entrypoint already
ships (MEMORY_PLUGIN_DISABLE=1) — applied to both alpha and beta
tenants in compose.yml. The alternative, switching the harness
postgres images to pgvector/pgvector:pg15, is deferred until the
harness needs to verify memory paths.
Refs PR #2906. Unblocks #2914 (auto-promote staging→main).
Multi-model retrospective review of #2901 found three Critical gaps:
1. (#2910 PR-B) template_import.go:79 wrote `tier: 3` hardcoded into
generated config.yaml. On SaaS this defeated the T4 default at the
create-handler layer — a config-less template import landed at T3
regardless of POST /workspaces' computed default. The 4th
default-tier site #2901 missed.
2. (#2910 PR-A) #2901 claimed `go test ... all green` but added zero
new tests. Existing structural-pin tests caught dispatch-layer
drift but said nothing about tier-default drift. A future refactor
that flips DefaultTier() to always return 3 would ship green.
3. (#2910 PR-E) org_import.go fallback returned T2 on self-hosted
while workspace.go returned T3. Internally consistent ("bulk vs
interactive defaults") but undocumented same-name-different-value
drift.
Fix:
- TemplatesHandler.NewTemplatesHandler now takes `wh *WorkspaceHandler`
(nil-tolerant for read-only callers). Import + ReplaceFiles compute
tier via h.wh.DefaultTier() and pass it to generateDefaultConfig.
generateDefaultConfig gains a `tier int` parameter (bounds-checked;
invalid input falls back to T3; see the sketch after the test list).
- org_import.go fallback lifts to h.workspace.DefaultTier() — single
source of truth shared with Create + Templates so a future
tier-default change sweeps every entry point at once.
- New saas_default_tier_test.go pinning:
TestIsSaaS_TrueWhenCPProvWired
TestIsSaaS_FalseWhenOnlyDocker
TestDefaultTier_SaaS_IsT4
TestDefaultTier_SelfHosted_IsT3
TestGenerateDefaultConfig_RespectsTierParam
TestGenerateDefaultConfig_SelfHostedTierT3
TestGenerateDefaultConfig_OutOfRangeFallsBackToT3
- Existing template_import_test.go tests + chat_files_test.go +
security_regression_test.go updated to thread the new tier param /
wh constructor arg through their NewTemplatesHandler calls. Their
pre-#2910 assertion of `tier: 3` is preserved (now passes because
the test caller passes `3` explicitly), so no regression.
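A minimal sketch of the threading described above (the helper name
importTier and the 1-4 bounds are illustrative; the nil-tolerance and
T3 fallback are per this PR):

    package handlers

    import "fmt"

    type WorkspaceHandler struct{} // stub for the sketch

    func (h *WorkspaceHandler) DefaultTier() int { return 4 } // stub

    type TemplatesHandler struct{ wh *WorkspaceHandler }

    // Nil-tolerant: read-only callers construct TemplatesHandler
    // without a WorkspaceHandler and still get a sane default.
    func (h *TemplatesHandler) importTier() int {
        if h.wh == nil {
            return 3
        }
        return h.wh.DefaultTier()
    }

    // Takes the caller-computed tier instead of hardcoding `tier: 3`;
    // out-of-range input falls back to T3.
    func generateDefaultConfig(tier int) string {
        if tier < 1 || tier > 4 {
            tier = 3
        }
        return fmt.Sprintf("tier: %d\n", tier)
    }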
go vet ./... clean. go test ./internal/handlers/ -count 1 — all
green (4.2s).
Deferred to separate follow-ups (per #2910 plan):
- PR-C: MOLECULE_DEPLOYMENT_MODE explicit deployment-mode signal
(closes the IsSaaS()=cpProv!=nil structural fragility)
- PR-D: Host iptables IMDS block + IMDSv2 hop-limit (paired with
molecule-controlplane EC2-IAM-scope audit)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of PR #2906 flagged: defaultListenAddr was ":9100" — binds
on every container interface. Inside today's deployment that's moot
(no host port mapping, platform talks over loopback) but it's not
least-privilege. A future Dockerfile edit that publishes the port,
a misconfigured Fly machine, or a future cross-host plugin topology
would expose an unauth'd memory store.
Loopback is the right baseline. Operators with a multi-host topology
already override via MEMORY_PLUGIN_LISTEN_ADDR — that path is unchanged.
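A sketch of the default-plus-override shape (env var name from this
PR; the struct and helper names are illustrative):

    package main

    import "os"

    type config struct{ ListenAddr string }

    // Loopback unless the operator explicitly overrides; multi-host
    // topologies keep working via MEMORY_PLUGIN_LISTEN_ADDR.
    func loadConfig() config {
        addr := os.Getenv("MEMORY_PLUGIN_LISTEN_ADDR")
        if addr == "" {
            addr = "127.0.0.1:9100" // was ":9100" (every interface)
        }
        return config{ListenAddr: addr}
    }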
Tests:
* TestLoadConfig_DefaultListenAddrIsLoopback pins the new default.
* TestLoadConfig_ListenAddrEnvOverride pins the override path so
operators relying on it don't break.
* TestLoadConfig_MissingDatabaseURL covers the existing fail-fast.
No prior unit tests existed for loadConfig — boot_e2e_test.go always
sets MEMORY_PLUGIN_LISTEN_ADDR explicitly, so the default was never
exercised by tests. This PR adds that coverage.
Refs RFC #2728. Hardening follow-up to PR #2906.
Adds TestINSERTworkspacesAllowlist: walks every non-test .go in this
package, finds funcs containing an `INSERT INTO workspaces (` SQL
literal, and pins the result against an explicit allowlist with the
safety mechanism named per entry (condensed sketch below).
New entries fail the build until a reviewer adds them — forcing the
question "what makes this INSERT idempotent?" at PR-review time, not
after the next bulk-create leak (the shape that produced 72 stale
child workspaces in tenant-hongming over 4 days).
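A condensed sketch of the gate (the real test resolves the enclosing
function via go/ast; this shows the allowlist shape only):

    package handlers

    import (
        "os"
        "path/filepath"
        "strings"
        "testing"
    )

    func TestINSERTworkspacesAllowlist(t *testing.T) {
        allow := map[string]string{ // file -> safety mechanism
            "org_import.go": "lookup-then-insert via lookupExistingChild",
            "registry.go":   "ON CONFLICT (id) DO UPDATE",
            "workspace.go":  "single POST, server-generated UUID",
        }
        files, err := filepath.Glob("*.go")
        if err != nil {
            t.Fatal(err)
        }
        for _, f := range files {
            if strings.HasSuffix(f, "_test.go") {
                continue
            }
            src, err := os.ReadFile(f)
            if err != nil {
                t.Fatal(err)
            }
            if strings.Contains(string(src), "INSERT INTO workspaces (") {
                if _, ok := allow[f]; !ok {
                    t.Errorf("%s: unlisted INSERT INTO workspaces; name its idempotency mechanism in the allowlist", f)
                }
            }
        }
    }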
Pairs with TestCreateWorkspaceTree_CallsLookupBeforeInsert (the
behavior pin for the one bulk path today). Together:
- this test catches "did a new function start inserting?"
- that test catches "did the existing bulk path drop its idempotency check?"
Both fire immediately when drift happens.
Current allowlist (3 entries):
- org_import.go:createWorkspaceTree → lookup-then-insert via
lookupExistingChild (#2868 phase 3, also pinned by the sibling AST
gate from #2895)
- registry.go:Register → ON CONFLICT (id) DO UPDATE (idempotent by
primary key — external workspace upsert)
- workspace.go:Create → single-workspace POST /workspaces, server-
generated UUID, no iteration
Verified via mutation: dropping a synthetic tempBulkLeakTest with an
unsafe loop+INSERT into the package fails the gate with a clear
diagnostic pointing at the file + function. Restoring the tree
returns the gate to green.
Memory: feedback_assert_exact_not_substring.md (verify tightened test
FAILS on bug shape) — mutation proof done locally.
RFC #2867 class 1. Class 2 (Prometheus gauge for ec2_instance
duplicates) + class 3 (structured logging on workspace create) are
follow-up PRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves four of six findings from the retrospective code review of Phases
1–4 (poll-mode chat upload). Bundled because every change is in the
platform's pending_uploads layer or the multi-file handler that reads it.
Findings resolved:
1. Important — Sweep query lacked an index for the acked-retention OR-arm.
The Phase 1 partial indexes are both `WHERE acked_at IS NULL`, so the
`(acked_at IS NOT NULL AND acked_at < retention)` half of the WHERE
clause seq-scanned the table on every cycle. Add a complementary
partial index on `acked_at WHERE acked_at IS NOT NULL` so both arms
of the disjunction are index-covered. Disjoint from the existing two
indexes (no row matches both predicates), so write amplification is
bounded to ~one index entry per terminal-state row.
2. Important — uploadPollMode partial-failure left orphans. The previous
per-file Put loop committed rows 1..K-1 and then errored on row K with
no compensation, so a client retry would double-insert the survivors.
Refactor the handler into three explicit phases (pre-validate +
read-into-memory, single atomic PutBatch, per-file activity row) and
add Storage.PutBatch with all-or-nothing transaction semantics
(sketched after this findings list).
3. FYI — pendinguploads.StartSweeperWithInterval was exported only for
tests. Move it to lower-case startSweeperWithInterval and expose the
test seam through pendinguploads/export_test.go (Go convention; the
shim file is stripped from the production binary at build time).
4. Nit — multipart Content-Type was passed verbatim into pending_uploads
rows and re-served on /content. Add safeMimetype which strips
parameters, rejects CR/LF/control bytes, and coerces malformed shapes
to application/octet-stream. The eventual GET /content response can no
longer be header-split via a crafted Content-Type on the multipart
(safeMimetype is sketched after the test list below).
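A sketch of the all-or-nothing shape (signature and columns
abbreviated; pre-validation runs before BeginTx so rejected input
never opens a transaction):

    package pendinguploads

    import (
        "context"
        "database/sql"
        "errors"
    )

    type File struct {
        ID, Name string
        Data     []byte
    }

    type Storage struct{ db *sql.DB }

    // PutBatch inserts every file or none: any per-row error rolls the
    // whole batch back, so a client retry cannot double-insert rows
    // 1..K-1 that survived a failure on row K.
    func (s *Storage) PutBatch(ctx context.Context, files []File) error {
        for _, f := range files { // phase 1: validate, no Tx yet
            if f.Name == "" {
                return errors.New("pendinguploads: empty filename")
            }
        }
        tx, err := s.db.BeginTx(ctx, nil)
        if err != nil {
            return err
        }
        for _, f := range files { // phase 2: single atomic batch
            if _, err := tx.ExecContext(ctx,
                `INSERT INTO pending_uploads (id, name, bytes) VALUES ($1, $2, $3)`,
                f.ID, f.Name, f.Data); err != nil {
                tx.Rollback() // no partial commit
                return err
            }
        }
        return tx.Commit()
    }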
Comprehensive tests:
- 10 PutBatch unit tests (sqlmock): happy path, empty input, all four
pre-validation rejection paths, BeginTx error, per-row error +
Rollback (no Commit), first-row error, Commit error.
- 4 new PutBatch integration tests (real Postgres): all-rows-commit
happy path with COUNT(*) verification, atomic-rollback no-leak via
a NUL-byte filename that lib/pq rejects mid-batch, oversize
short-circuit no-Tx, idx_pending_uploads_acked existence + partial
predicate via pg_indexes (planner-shape-independent).
- 3 new chat_files_poll tests: atomic rollback on second-file oversize,
atomic rollback on PutBatch error, mimetype CRLF/NUL/parameter
sanitization (8 sub-cases).
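A sketch of the safeMimetype behavior from finding 4 (the exact rule
set is condensed):

    package handlers

    import "mime"

    // Strips parameters, rejects control bytes (CR/LF included), and
    // coerces anything malformed to application/octet-stream so a
    // crafted Content-Type cannot be replayed into a response header.
    func safeMimetype(ct string) string {
        const fallback = "application/octet-stream"
        for _, r := range ct {
            if r < 0x20 || r == 0x7f {
                return fallback
            }
        }
        mediaType, _, err := mime.ParseMediaType(ct) // drops parameters
        if err != nil {
            return fallback
        }
        return mediaType
    }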
The two remaining review findings (inbox_uploads.fetch_and_stage blocks
the poll loop synchronously; two httpx Clients per row) are Python-side
and ship in Phase 5b once this lands on staging.
Test-only export pattern via export_test.go, atomic pre-validation
discipline (validate before Tx), and behavior-based (not name-based)
test assertions follow the standing project conventions.
Closes the gap between the merged Memory v2 code (PR #2757 wired the
client into main.go) and operator activation. Without this PR an
operator wanting to flip MEMORY_V2_CUTOVER=true had to provision a
separate memory-plugin service and point MEMORY_PLUGIN_URL at it —
extra ops surface for what the design intends to be a built-in.
What ships:
* Both Dockerfile + Dockerfile.tenant build the
cmd/memory-plugin-postgres binary into /memory-plugin.
* Entrypoints spawn the plugin in the background on :9100 BEFORE
starting the main server; wait up to 30s for /v1/health to return
200; abort boot loud if it doesn't (better to crash-loop than to
silently route cutover traffic against a dead plugin).
* Default env: MEMORY_PLUGIN_DATABASE_URL=$DATABASE_URL (share the
existing tenant Postgres — plugin's `memory_namespaces` /
`memory_records` tables coexist with platform schema, no
conflicts), MEMORY_PLUGIN_LISTEN_ADDR=:9100.
* MEMORY_PLUGIN_DISABLE=1 escape hatch for operators running the
plugin externally on a separate host.
* Platform image: plugin runs as the `platform` user (not root) via
su-exec — matches the privilege boundary the main server already
drops to. Tenant image already starts as `canvas` so the plugin
inherits non-root automatically.
What stays operator-controlled:
* MEMORY_V2_CUTOVER is NOT auto-set. Behavior change for existing
deployments: zero. The wiring at workspace-server/internal/memory/
wiring/wiring.go skips building the plugin client until the
operator opts in, so the running sidecar is a no-op for traffic
until then.
* MEMORY_PLUGIN_URL is NOT auto-set either, for the same reason —
setting it implies cutover-active intent. Operators set both on
staging first, verify a live commit/recall round-trip (closes
pending task #292), then promote to production.
Operator activation steps after this PR ships:
1. Verify pgvector extension is available on the target Postgres
(the plugin's first migration runs CREATE EXTENSION IF NOT
EXISTS vector). Railway's managed Postgres ships pgvector
available; some self-hosted operators may need to enable it.
2. Redeploy the workspace-server with this image.
3. Set MEMORY_PLUGIN_URL=http://localhost:9100 + MEMORY_V2_CUTOVER=true
in the environment (staging first).
4. Watch boot logs for "memory-plugin: ✅ sidecar healthy" and the
wiring.go cutover messages; do a live commit_memory + recall_memory
round-trip via the canvas Memory tab to verify.
5. Promote to production once staging holds for a sweep window.
Refs RFC #2728. Closes the dormant-plugin gap noted in task #294.
Reported: "right now when chat box opens it opens in the middle, but
it should be at the end of conversation."
Root cause: ChatTab.tsx:548 fires `bottomRef.scrollIntoView({ behavior:
"smooth" })` on every messages-update. On initial mount with N
messages already loaded, the smooth-scroll triggers a ~300ms animation
that any concurrent React re-render (agent push landing, theme
toggle, sidepanel resize) interrupts mid-flight, leaving the user
stuck somewhere in the middle of the conversation.
Fix: track first-mount via hasInitialScrollRef. Use behavior:"instant"
for the initial jump (deterministic, no animation interruption), then
smooth for subsequent appends (the new-message-landing visual stays).
The ref flips on the first messages.length > 0 transition, so:
- Initial open of chat tab: instant jump to bottom ✓
- New agent message arrives: smooth scroll into view ✓
- Workspace switch (ChatTab remounts): fresh hasInitialScrollRef, gets
instant again ✓
- loadOlder prepend: anchor-restore path unchanged, still pins user's
reading position ✓
Test plan:
- pnpm test --run ChatTab.lazyHistory.test.tsx → 8 pass (existing
lazy-history tests untouched)
- npx tsc --noEmit clean
- Manual on hongming.moleculesai.app: open a busy chat (mac laptop,
~50 messages), confirm view lands at the latest bubble, not mid-
scroll. Switch to another workspace + back → instant again.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestStartSweeper_RecordsMetricsOnError flaked on every CI rerun under
race detection: `error counter delta = 0, want 1`. Root cause is a
race between two goroutines, not a bug in the production sweeper.
The fake `fakeSweepStorage.Sweep` signals `cycleDone` from a defer,
which runs BEFORE Sweep's return value is received by `sweepOnce`;
that received return is what triggers the metric increment.
On slow CI hosts the test goroutine wins the read after `waitForCycle`
unblocks and BEFORE StartSweeper's goroutine has called
`metrics.PendingUploadsSweepError`, so the asserted delta is 0 even
though the metric WILL be 1 a few ms later.
Adds a polling assert helper, `waitForMetricDelta`, that closes the
race deterministically without timing-based sleeps (sketched below):
- TestStartSweeper_RecordsMetricsOnError uses waitForMetricDelta to
wait for the error counter to settle at 1.
- TestStartSweeper_RecordsMetricsOnSuccess uses it on the success
counters (acked, expired) so the error-stayed-zero assertion
reads after StartSweeper has fully processed the cycle.
- waitForCycle keeps its current shape but documents the caveat in
its comment so future tests don't repeat the assumption.
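A sketch of the helper's shape (the read function and 2s budget are
illustrative; the point is bounded polling instead of a sleep tuned
to CI host speed):

    package pendinguploads

    import (
        "testing"
        "time"
    )

    func waitForMetricDelta(t *testing.T, read func() float64, want float64) {
        t.Helper()
        deadline := time.Now().Add(2 * time.Second)
        for time.Now().Before(deadline) {
            if read() == want {
                return // metric settled at the expected delta
            }
            time.Sleep(time.Millisecond)
        }
        t.Fatalf("metric delta = %v, want %v after wait budget", read(), want)
    }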
Verified: `go test ./internal/pendinguploads/ -race -count 5` passes
all 9 tests across 5 iterations cleanly.
Per memory feedback_question_test_when_unexpected.md: the
"delta=0, want=1" failure looked like a real production bug at first
glance, but instrumented inspection showed the metric DOES increment,
just AFTER the test's read. The fix is the test's wait shape, not
the sweeper.
Unblocks every PR currently broken by this flake (#2898 hit it on
two consecutive CI runs; staging-merged PRs from earlier today
(#2877/#2881/#2885/#2886) introduced the test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported every SaaS workspace defaults to T2 (Standard). Four
sites quietly disagreed on the default:
- canvas CreateWorkspaceDialog (line 126): isSaaS ? 4 : 3 ← only correct one
- canvas EmptyState "Create blank": tier: 2 ← hardcoded
- workspace.go POST /workspaces: tier = 3 ← not SaaS-aware
- org_import.go createWorkspaceTree: tier = 2 (fallback) ← not SaaS-aware
So a user clicking "+ New Workspace" via the dialog got T4 on SaaS,
but a user clicking "Create blank" on the empty canvas got T2, and an
agent POSTing /workspaces directly got T3. Same tenant, three different
tiers depending on entry point.
Fix:
1. WorkspaceHandler.IsSaaS() and DefaultTier() helpers
(workspace_dispatchers.go; sketched after this list).
IsSaaS() := h.cpProv != nil — single source of truth for "are we
SaaS" across the file. DefaultTier() returns 4 on SaaS, 3 on
self-hosted. SaaS rationale: each workspace runs on its own sibling
EC2 so the per-workspace tier boundary is a Docker resource limit
on the only container present — no neighbour to protect from. T4
matches the boundary.
2. workspace.go now defaults tier via h.DefaultTier() instead of
hardcoded T3.
3. org_import.go fallback (when neither ws.tier nor defaults.tier set)
becomes SaaS-aware: T4 on SaaS, T2 on self-hosted (preserve the
existing safe-shared-Docker-daemon default for self-hosted org
imports).
4. canvas EmptyState "Create blank" stops sending tier:2 in the body
and lets the backend pick — single source of truth in the backend.
Eliminates the third disagreement.
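A sketch of the two helpers (cpProv's concrete type is stubbed):

    package handlers

    type controlplaneProvisioner interface{} // stub for the sketch

    type WorkspaceHandler struct{ cpProv controlplaneProvisioner }

    // Single source of truth for "are we SaaS": the controlplane
    // provisioner is only wired in SaaS deployments.
    func (h *WorkspaceHandler) IsSaaS() bool { return h.cpProv != nil }

    // T4 on SaaS (per-workspace sibling EC2, no neighbour to protect),
    // T3 on self-hosted.
    func (h *WorkspaceHandler) DefaultTier() int {
        if h.IsSaaS() {
            return 4
        }
        return 3
    }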
Test plan:
- go vet ./... clean
- go test ./internal/handlers/ -count 1 — all green (4.3s)
- npx tsc --noEmit on canvas — clean
- Staging E2E (after deploy): create a fresh workspace via canvas
empty-state on hongming.moleculesai.app, confirm tier=4 on the
workspace details panel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI caught two test files I missed in the original iter 4b retarget:
test_a2a_multi_workspace.py + test_delegation_sync_via_polling.py
patch a2a_tools.{discover_peer, send_a2a_message, _delegate_sync_via_polling,
httpx.AsyncClient} but those call sites moved to a2a_tools_delegation
in this PR. 17 patch sites retargeted; 30 tests now green.
Refs RFC #2873 iter 4b.
The previous TestCreateWorkspaceTree_CallsLookupBeforeInsert used
bytes.Index("INSERT INTO workspaces"), a prefix match that also hits
INSERT INTO workspaces_audit and gives no structural guard against
other workspace_* lookalikes (workspace_secrets, workspace_channels).
RFC #2872 cited this as a silent
false-pass mode: a future refactor that adds an audit-table INSERT
literal earlier in source than the real workspaces INSERT would
make the gate point at the wrong target.
Replaces the byte-search with a go/ast walk + a regex that requires
`\s*\(` after `workspaces` — distinguishes the real target from
prefix lookalikes.
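The tightened matcher, as a sketch (the go/ast walk that scopes it to
function bodies is elided):

    package handlers

    import "regexp"

    // Requiring `(` after the table name rejects workspaces_audit and
    // the other prefix lookalikes.
    var workspacesInsertRE = regexp.MustCompile(`INSERT INTO workspaces\s*\(`)

    func firstWorkspacesInsertPos(src string) int {
        loc := workspacesInsertRE.FindStringIndex(src)
        if loc == nil {
            return -1
        }
        return loc[0]
    }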
Adds three discriminating tests:
- TestWorkspacesInsertRE_RejectsLookalikes — pins the regex against
9 sql shapes (real, raw-string-literal, audit-shadow, workspace_*
prefixes, canvas_layouts, UPDATE/SELECT, comments).
- TestGate_FailsWhenLookupAfterInsert — synthesizes Go source where
the lookup is positioned AFTER the workspaces INSERT, asserts the
helper returns lookupPos > insertPos (which the production gate
flags via t.Errorf). Proves the gate isn't vestigial.
- TestGate_IgnoresAuditTableShadow — synthesizes source with an
audit-table INSERT BEFORE the lookup + real INSERT, asserts the
tightened regex correctly walks past the shadow and finds the
real INSERT.
Also extracts findLookupAndWorkspacesInsertPos as a helper so the
gate logic can be exercised against synthetic source, not only
against the real org_import.go.
Memory: feedback_assert_exact_not_substring.md (verify tightened
test FAILS on old code) — TestGate_FailsWhenLookupAfterInsert is
the failing-on-bug-shape proof.
Closes the silent-false-pass mode of #2872 Important-1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 closes out the rollout — strict-sqlmock unit tests pin which
SQL fires, but they cannot detect bugs that depend on the actual row
state after the SQL runs. Real-Postgres integration tests catch:
- the Sweep CTE depends on Postgres' make_interval function and
the table's CHECK constraints; sqlmock would happily accept a
hand-written SQL literal that Postgres rejects at runtime.
- a wrong WHERE predicate on the partial idx_pending_uploads_unacked
index only surfaces at real-query-plan time.
- subtle predicate drift (e.g. a WHERE clause that filters by
acked_at IS NOT NULL but uses BETWEEN incorrectly).
Test cases:
- PutGetAckRoundTrip: the full happy path — Put, Get, MarkFetched,
Ack, idempotent re-Ack, Get-after-Ack returns ErrNotFound.
- Sweep_DeletesAckedAfterRetention: row not eligible at retention=1h
immediately after Ack; deleted at retention=0.
- Sweep_DeletesExpiredUnacked: backdated expires_at exercises the
unacked-and-expired branch of the WHERE clause.
- Sweep_DeletesBothCategoriesInOneCycle: three rows (acked, expired,
fresh); a single Sweep deletes the first two and leaves the third.
- PutEnforcesSizeCap: ErrTooLarge above MaxFileBytes.
- GetIgnoresExpiredAndAcked: Get's filter predicate (skip expired,
skip acked) matches the actual row state in the table.
Run path:
- locally via the file-header docker incantation.
- CI runs on every PR/push that touches handlers/** OR migrations/**
(.github/workflows/handlers-postgres-integration.yml).
Second slice of the a2a_tools.py split (stacked on iter 4a). Owns the
three delegation MCP tools + the RFC #2829 PR-5 sync-via-polling
helper they share:
* tool_delegate_task — synchronous delegation
* tool_delegate_task_async — fire-and-forget
* tool_check_task_status — poll the platform's /delegations log
* _delegate_sync_via_polling — durable async + poll for terminal status
* _SYNC_POLL_INTERVAL_S / _SYNC_POLL_BUDGET_S constants
a2a_tools.py shrinks from 915 → 609 LOC (−306). Stacked on iter 4a's
RBAC extraction; uses `from a2a_tools_rbac import auth_headers_for_heartbeat`
as its auth-header source.
The lazy `from a2a_tools import report_activity` inside tool_delegate_task
breaks the circular-import cycle (a2a_tools imports the delegation
re-exports at module-load; delegation handler needs report_activity at
CALL time). A dedicated test pins this contract.
Tests:
* 77 existing test_a2a_tools_impl.py tests pass after retargeting
20 patch sites in TestToolDelegateTask + TestToolDelegateTaskAsync +
TestToolCheckTaskStatus from `a2a_tools.foo` to
`a2a_tools_delegation.foo` (foo ∈ {discover_peer, send_a2a_message,
httpx.AsyncClient}). The patches need to target the new module
because that's where the call sites live now.
* test_a2a_tools_delegation.py adds 8 new tests:
- 6 alias drift gates (`a2a_tools.tool_delegate_task is …`)
- 2 import-contract tests (no top-level circular dep + a2a_tools
surfaces every delegation symbol)
- 1 sync-poll budget invariant
113 tests total (77 impl + 28 rbac + 8 delegation), all green.
Refs RFC #2873.
Iter 4a's new module needs to be in the rewrite list so the wheel
ships its imports prefixed correctly. Caught by 'PR-built wheel +
import smoke'.
Refs RFC #2873 iter 4a.
The iter-3 split created mcp_heartbeat / mcp_inbox_pollers /
mcp_workspace_resolver but the wheel build's drift-gate check at
scripts/build_runtime_package.py:TOP_LEVEL_MODULES wasn't updated.
Without this fix the wheel ships those modules un-rewritten, so
their imports of platform_auth / configs_dir / etc. break at
runtime. Caught by the 'PR-built wheel + import smoke' check.
Refs RFC #2873 iter 3.
Phase 3 of the poll-mode chat upload rollout. Stacked atop Phase 2.
The platform's pending_uploads table grows once-per-uploaded-file with
no built-in cleanup. Phase 1's hard TTL (expires_at default 24h) makes
expired rows un-fetchable but doesn't actually delete them; Phase 1's
ack stamps acked_at but leaves the row indefinitely. Without a sweep
the table grows unbounded across normal traffic.
This PR adds:
- `Storage.Sweep(ctx, ackRetention)` — a single round-trip CTE that
deletes acked rows past their retention window plus unacked rows
past expires_at. Returns `(acked, expired)` deletion counts so
Phase 3 dashboards can spot the stuck-fetch pattern (high expired,
low acked) vs healthy churn.
- `pendinguploads.StartSweeper(ctx, storage, ackRetention)` —
background goroutine that calls Sweep every 5 minutes (default).
Runs once immediately on startup so a platform restart cleans up
any rows that became eligible while we were down (loop sketched below).
- Prometheus counters `molecule_pending_uploads_swept_total` with
`outcome={acked,expired,error}` labels. Wired into the existing
`/metrics` endpoint.
- Wired from cmd/server/main.go via supervised.RunWithRecover —
one transient panic doesn't take the platform down with it.
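A sketch of the loop (interval and wiring condensed; the real version
increments the outcome-labelled counters):

    package pendinguploads

    import (
        "context"
        "log"
        "time"
    )

    type SweepStorage interface {
        Sweep(ctx context.Context, ackRetention time.Duration) (acked, expired int64, err error)
    }

    // One sweep immediately (cleans up rows that became eligible while
    // the platform was down), then one per tick until ctx is cancelled.
    func StartSweeper(ctx context.Context, storage SweepStorage, ackRetention time.Duration) {
        go func() {
            ticker := time.NewTicker(5 * time.Minute) // default SweepInterval
            defer ticker.Stop()
            for {
                if acked, expired, err := storage.Sweep(ctx, ackRetention); err != nil {
                    log.Printf("pending-uploads sweep: %v", err) // outcome=error
                } else {
                    log.Printf("pending-uploads sweep: acked=%d expired=%d", acked, expired)
                }
                select {
                case <-ctx.Done():
                    return
                case <-ticker.C:
                }
            }
        }()
    }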
Defaults:
- SweepInterval = 5m (matches the dashboard refresh cadence)
- DefaultAckRetention = 1h (gives the workspace at-least-once retry
headroom in case it processed but failed to write the file before
crashing)
Test coverage: 100% on storage_test.go (extended with sweepSQL pin +
six Sweep test cases including negative-retention clamp + zero-retention
immediate-delete + DB error wrapping) and sweeper_test.go (ticker-driven
+ ctx-cancel + nil-storage + transient-error-doesn't-crash + metric
counter assertions).
Closes the third of four phases tracked on the parent RFC; phase 4 is
the staging E2E test.
Closes #2865 (split-B of the #2669 root-cause stack).
The phantom-busy sweep in workspace-server/internal/scheduler/scheduler.go
already logs each row reset, but no aggregate metric surfaces "how often
is this firing." A regression that causes high reset rates (e.g.
controlplane#481's missing env vars, or future drift in the workspace
runtime's task-lifecycle accounting) only surfaces when users complain.
Fix: counter exposed at /metrics as molecule_phantom_busy_resets_total,
incremented from sweepPhantomBusy after each row whose active_tasks
was reset. Same shape as existing molecule_websocket_connections_active.
Operator-side dashboard: alert when daily phantom-busy reset count
> 0.5% of active workspaces. Today's steady-state is near-zero; any
increase is a regression signal.
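A sketch of the counter wiring (promauto registration shown; the
production code registers on the existing /metrics registry):

    package scheduler

    import (
        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
    )

    var phantomBusyResets = promauto.NewCounter(prometheus.CounterOpts{
        Name: "molecule_phantom_busy_resets_total",
        Help: "Workspaces whose active_tasks was reset by the phantom-busy sweep.",
    })

    // Called from sweepPhantomBusy after each row reset; Inc is atomic,
    // which is what the race test pins.
    func trackPhantomBusyReset() { phantomBusyResets.Inc() }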
Tests:
- TestTrackPhantomBusyReset_IncrementsCounter
- TestTrackPhantomBusyReset_RaceFreeUnderConcurrentWrites (50×200
concurrent writes; tests atomic invariant)
- TestHandler_ExposesPhantomBusyResetsCounter (asserts HELP + TYPE
+ value lines in Prometheus text format)
- TestHandler_PhantomBusyResetsZeroByDefault (fresh-process 0
contract — prevents a future refactor from accidentally dropping
the metric from /metrics)
Race-detector clean. Vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of the a2a_tools.py (991 LOC) split — single-concern module
for the workspace's RBAC + auth-header layer:
* _ROLE_PERMISSIONS canonical table
* _get_workspace_tier
* _check_memory_write_permission
* _check_memory_read_permission
* _is_root_workspace
* _auth_headers_for_heartbeat
a2a_tools.py shrinks from 991 → 915 LOC. Internal call sites (15
references) work unchanged because the bare names are re-imported at
module level, so name lookups inside a2a_tools still resolve in the
module's own namespace and existing tests'
patch("a2a_tools._foo", …) keeps working.
The RBAC layer can now evolve independently of the 18 tool handlers.
Adding a new role or capability action touches one file, not the
kitchen-sink module.
Tests:
* 77 existing test_a2a_tools_impl.py pass unchanged.
* test_a2a_tools_rbac.py adds 28 focused tests:
- 6 alias drift-gate tests (`_foo is rbac.foo`)
- 4 get_workspace_tier env+config branches
- 2 is_root_workspace tier branches
- 6 check_memory_write_permission roles + override branches
- 3 check_memory_read_permission scenarios
- 3 auth_headers_for_heartbeat platform_auth branches
- 4 ROLE_PERMISSIONS table invariants
* Direct coverage for the helper module (was previously only
exercised through 991-LOC tool-handler tests).
Refs RFC #2873.
The drift gate in build_runtime_package.py rejects any workspace/*.py
module not listed in TOP_LEVEL_MODULES — it would ship un-rewritten
and break wheel imports. Add inbox_uploads (introduced in this PR)
to the list.
Workspace-side fetcher for the platform-staged chat uploads written by
phase 1. Stacked atop feat/poll-mode-chat-upload-phase1.
Wire shape — the platform writes one activity_logs row per uploaded
file with `activity_type=a2a_receive`, `method=chat_upload_receive`,
and a `request_body={file_id, name, mimeType, size, uri}` carrying
the synthetic `platform-pending:<wsid>/<fid>` URI.
Workspace-side flow (new module workspace/inbox_uploads.py):
1. Fetch via GET /workspaces/:id/pending-uploads/:file_id/content
2. Stage to /workspace/.molecule/chat-uploads/<32-hex>-<sanitized>
(same on-disk shape as internal_chat_uploads.py — agent-side
URI resolvers see no contract change)
3. POST /workspaces/:id/pending-uploads/:file_id/ack
4. Cache `platform-pending: → workspace:` so the eventual chat
message that REFERENCES the upload (separate, later activity row)
gets URI-rewritten before the agent sees it.
Inbox poller extension (workspace/inbox.py):
- is_chat_upload_row(row) discriminator on `method`
- upload-receive rows trigger fetch_and_stage and are NOT enqueued
as InboxMessages (they're side-effect rows, not chat messages)
- cursor advances past them regardless of fetch outcome — a
permanent /content failure must not stall the cursor and block
real chat traffic
- message_from_activity calls rewrite_request_body to swap
platform-pending: URIs to local workspace: URIs in subsequent
chat messages' file parts. Cache miss leaves the URI untouched
so the agent surfaces an unresolvable URI rather than the inbox
silently dropping the part.
Filename sanitization mirrors workspace-server/internal/handlers
/chat_files.go::SanitizeFilename and workspace/internal_chat_uploads
.py::sanitize_filename — pinned by the existing parity test suites.
Coverage: 100% on inbox_uploads.py; the inbox.py extension is fully
covered by three new tests in test_inbox.py (skip-from-queue,
cursor-advance-past-broken-fetch, URI-rewrite ordering).
Splits the standalone molecule-mcp wrapper into three single-concern
modules per the OSS-shape refactor program:
* mcp_heartbeat.py — register POST + heartbeat loop + auth-failure
escalation + inbound-secret persistence
* mcp_workspace_resolver.py — single + multi-workspace env validation
+ on-disk token-file read + operator-help printer
* mcp_inbox_pollers.py — activate inbox singleton + spawn one daemon
poller per workspace
mcp_cli.py becomes a 193-LOC orchestrator: validates env, calls each
module's helpers, hands off to a2a_mcp_server.cli_main. The console-
script entry molecule-mcp = molecule_runtime.mcp_cli:main is preserved.
Back-compat aliases (mcp_cli._build_agent_card, _heartbeat_loop,
_resolve_workspaces, etc.) re-export the new modules' authoritative
functions so existing tests + wheel_smoke.py + any downstream caller
keeps working unchanged. A new test file pins each alias as the
exact same callable (drift gate via `is`).
Tests:
* 62 existing test_mcp_cli.py + test_mcp_cli_multi_workspace.py
pass against the split.
* Two heartbeat-loop persist tests + the auth-escalation caplog
setup updated to target mcp_heartbeat (the module where the loop
body now lives) instead of mcp_cli (still works through aliases
for direct calls, but Python's name resolution inside the loop
body uses the new module's namespace).
* test_mcp_cli_split.py adds 11 new tests: alias drift gate +
inbox-poller single + multi-workspace branches + degraded
inbox-import logging path (none of those existed before).
Refs RFC #2873.
The workspace inbox poller filters
`GET /workspaces/:id/activity?type=a2a_receive` — writing rows with
`activity_type=chat_upload_receive` would be silently invisible to it.
Switch the poll-mode upload-staging handler to write
`activity_type=a2a_receive` with `method=chat_upload_receive` as the
discriminator. Same shape as A2A's `tasks/send` vs `message/send` method
split; the workspace-side handler (Phase 2) routes by `method`, not
activity_type.
Pinned with `TestPollUpload_ActivityRowDiscriminator` — sqlmock
WithArgs on positions 2 (activity_type) and 5 (method) so a refactor
that flips activity_type back to a custom value gets a red test
instead of a runtime "poller saw nothing" silent break.
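A condensed sketch of the pin (column order and the handler call are
illustrative; the real test drives uploadPollMode and pins positions
2 and 5 exactly):

    package handlers

    import (
        "testing"

        "github.com/DATA-DOG/go-sqlmock"
    )

    func TestPollUpload_ActivityRowDiscriminator(t *testing.T) {
        db, mock, err := sqlmock.New()
        if err != nil {
            t.Fatal(err)
        }
        defer db.Close()

        // Pin activity_type (arg 2) and method (arg 5); leave the rest free.
        mock.ExpectExec("INSERT INTO activity_logs").
            WithArgs(sqlmock.AnyArg(), "a2a_receive", sqlmock.AnyArg(),
                sqlmock.AnyArg(), "chat_upload_receive").
            WillReturnResult(sqlmock.NewResult(1, 1))

        // ... exercise the poll-mode upload handler against db here ...

        if err := mock.ExpectationsWereMet(); err != nil {
            t.Error(err)
        }
    }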
External-runtime workspaces (registered via molecule connect, behind
NAT, no public callback URL) currently see HTTP 422 "workspace has no
callback URL" on every chat file upload. The only escape is to wrap the
laptop in ngrok / Cloudflare tunnel + re-register push-mode — a tax
that shouldn't exist for a one-line use case.
This phase introduces the platform-side staging layer that lets
canvas → external workspace uploads ride the same poll loop the inbox
already uses for text messages.
Architecture (mirrors inbox poll, SSOT principle):
Canvas POST /chat/uploads (multipart)
↓ delivery_mode=poll
Platform: chat_files.uploadPollMode
↓ pendinguploads.Storage.Put + LogActivity(chat_upload_receive)
Workspace's existing inbox poller picks up the activity row (Phase 2)
Workspace fetches: GET /workspaces/:id/pending-uploads/:fid/content
Workspace acks: POST /workspaces/:id/pending-uploads/:fid/ack
Pieces in this PR:
* Migration 20260505100000 — pending_uploads table; partial indexes
on unacked + expires_at for the workspace fetch + Phase 3 sweep
hot paths. No FK to workspaces (audit retention), 24h hard TTL.
* internal/pendinguploads — Storage interface + Postgres impl. Bytes
inline (bytea) today; the interface lets a future PR replace with
S3 (RFC #2789) by swapping one constructor (interface sketched after
this list). 100% test coverage on the Postgres impl via sqlmock-pinned
SQL.
* handlers.PendingUploadsHandler — GET /content + POST /ack endpoints.
wsAuth-gated; cross-workspace bleed protection via per-row
workspace_id check (token leak from A can't read B's pending bytes).
Handler tests pin happy path + every 4xx/5xx mapping including
cross-workspace + race-with-sweep.
* chat_files.go — Upload poll-mode branch behind WithPendingUploads
builder. Push-mode unchanged (regression-tested). Multipart parse
+ per-file sanitize + storage.Put + activity_logs row per file.
* SanitizeFilename — Go mirror of workspace/internal_chat_uploads.py
sanitize_filename. Tests pin parity case-by-case so canvas-emitted
URIs stay identical regardless of which path handles the upload.
* Comprehensive logging — every state transition (staged, fetch,
ack, error) emits a structured log line with workspace_id +
file_id + size + sanitized name. Phase 3 metrics will hook these.
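A sketch of the Storage seam (field and method names condensed from
the flow above; Phase 3 later adds Sweep):

    package pendinguploads

    import (
        "context"
        "time"
    )

    type Upload struct {
        WorkspaceID string
        FileID      string
        Name        string
        Mimetype    string
        ExpiresAt   time.Time // 24h hard TTL
        Bytes       []byte    // inline bytea today; S3 later (RFC #2789)
    }

    // Swapping the Postgres impl for S3 means one new constructor.
    type Storage interface {
        Put(ctx context.Context, u Upload) error
        Get(ctx context.Context, workspaceID, fileID string) (Upload, error)
        MarkFetched(ctx context.Context, workspaceID, fileID string) error
        Ack(ctx context.Context, workspaceID, fileID string) error
    }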
The pendinguploads.Storage wiring is opt-in (WithPendingUploads on
ChatFilesHandler) so a binary deployed without the migration keeps the
pre-existing 422 behavior — no boot-order coupling between code roll
and schema roll.
Phase 2 (separate PR): workspace inbox extension — inbox_uploads.py
fetches via the GET endpoint, writes to /workspace/.molecule/chat-
uploads/, acks, and rewrites the URI from platform-pending: → workspace:
so the agent's existing send-attachments path needs no changes.
Phase 3: GC sweep + dashboards. Phase 4: poll-mode E2E on staging.
Tests:
* 100% coverage on pendinguploads (sqlmock-pinned SQL drift gate).
* Functional 100% on new handler code (uncovered branches are
documented defensive duplicates: uuid re-parse, multipart Open
error, Writer.Write fail — none reproducible in unit tests).
* Push-mode + NULL delivery_mode regression tests pin no behavior
change for existing workspaces.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three shell E2E tests created scratch files via `mktemp` but never
deleted them on early exit (assertion failure, SIGINT, errexit). Each
CI run leaked ~10-100 KB of /tmp into the runner; over ~200 runs/week
that's 20+ MB of accumulated cruft.
## Files
- **test_chat_attachments_e2e.sh** — was missing both trap and rm;
added per-run TMPDIR_E2E with `trap rm -rf … EXIT INT TERM`.
- **test_notify_attachments_e2e.sh** — had a `cleanup()` for the
workspace but didn't include the TMPF; only an unconditional
`rm -f` at the bottom (line 233) which doesn't fire on early exit.
Extended cleanup() to also rm the scratch + dropped the redundant
trailing rm.
- **test_chat_attachments_multiruntime_e2e.sh** — `round_trip()`
function had per-call `rm -f` only on the success path; failure
paths leaked. Switched to script-level TMPDIR_E2E + trap; per-call
rm dropped (the trap handles every return path including SIGINT).
Pattern: `mktemp -d -t prefix-XXX` for the dir, `mktemp <full-template>`
for files (portable across BSD/macOS + GNU coreutils — `-p` is
GNU-only and breaks Mac local-dev runs).
## Regression gate
New `tests/e2e/lint_cleanup_traps.sh` asserts every `*.sh` that calls
`mktemp` also has a `trap … EXIT` line in the file. Wired into the
existing Shellcheck (E2E scripts) CI step. Verified locally: passes
on the fixed state, fails-loud when one of the 3 fixes is reverted.
## Verification
- shellcheck --severity=warning clean on all 4 touched files
- lint_cleanup_traps.sh passes on the post-fix tree (6 mktemp users,
all have EXIT trap)
- Negative test: revert one fix → lint exits 1 with file:line +
suggested fix pattern in the error message (CI-grokkable
::error file=… annotation)
- Trap fires on SIGTERM mid-run (smoke-tested on macOS BSD mktemp)
- Trap fires on `exit 1` (smoke-tested)
## Bars met (7-axis)
- SSOT: trap pattern documented in lint message (one rule, one fix)
- Cleanup: this IS the cleanup hygiene fix
- 100% coverage: lint catches future regressions across all
`tests/e2e/*.sh` files, not just the 3 fixed today
- File-split: N/A (no files split)
- Plugin / abstract / modular: N/A (test infra, not product code)
Iteration 2 of RFC #2873.
Two call sites — workspace_provision.go:537 and org_import.go:54 —
duplicated the same `if runtime == "claude-code"` branch deciding
the default model when the operator/agent didn't supply one. They
were copy-pasted; nothing prevented them from drifting silently.
Extract to `models.DefaultModel(runtime string) string`. Both call
sites now route through the helper. New runtimes need one entry
in DefaultModel + one assertion in TestDefaultModel — pre-fix it
required two source edits + an audit.
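A sketch of the helper (the claude-code case is from the coverage
list below; the universal default's value is not specified here, so
the constant is a placeholder):

    package models

    const universalDefault = "<universal-default-model>" // placeholder

    // Unknown and empty runtimes fall through to the universal default,
    // matching the pre-refactor behavior; matching is case-sensitive.
    // Never returns "" (pinned by the invariant test).
    func DefaultModel(runtime string) string {
        switch runtime {
        case "claude-code":
            return "sonnet"
        default:
            return universalDefault
        }
    }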
Foundation for the future `RuntimeConfig` interface (RFC #2873 +
task #231): once we add `ProvisioningTimeout()`, `CapabilitiesSupported()`
etc., the helper expands to per-runtime structs and `DefaultModel`
becomes one method on the interface.
## Coverage
15 unit tests pinning the exact contract:
- claude-code → "sonnet"
- 9 other known runtimes → universal default
- empty + unknown → universal default (matches pre-refactor fallthrough)
- case-sensitivity preserved (CLAUDE-CODE → universal default)
Plus invariant test: `DefaultModel` never returns "" — protects
against a future "return early on unknown" regression that would
silently break workspace creation.
## Verification
- go build ./... clean
- 15 model unit tests pass
- existing handler tests untouched (no behavior change at call sites)
- identical output to pre-refactor for every input
First iteration of the OSS-shape refactor program. Each PR meets all
7 bars (plugin/abstract/modular/SSOT/coverage/cleanup/file-split).
Refs RFC #2873.
Every staging push run for the last 4 SHAs was cancelled by the
matching pull_request run because both fired into the same
concurrency group:
group: ${{ github.workflow }}-${{ ...sha }}
Same SHA → same group → cancel-in-progress=true means the second
arrival cancels the first. Empirically the push run lost the race;
staging branch-protection then saw a CANCELLED required check and
the auto-promote chain stalled.
Fix: include github.event_name in the group key. push and
pull_request runs for the same SHA now hash to different groups,
both complete, both report SUCCESS to branch protection.
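With the fix the group key gains the event name (SHA expression
abbreviated exactly as above):

group: ${{ github.workflow }}-${{ github.event_name }}-${{ ...sha }}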
Pattern of the bug:
10:46 sha=1e8d7ae1 ev=pull_request conclusion=success
10:46 sha=1e8d7ae1 ev=push conclusion=cancelled
10:45 sha=ecf5f6fb ev=pull_request conclusion=success
10:45 sha=ecf5f6fb ev=push conclusion=cancelled
10:28 sha=471dff25 ev=pull_request conclusion=success
10:28 sha=471dff25 ev=push conclusion=cancelled
10:12 sha=9e678ccd ev=pull_request conclusion=success
10:12 sha=9e678ccd ev=push conclusion=cancelled
Same drift class as the 2026-04-28 auto-promote-staging incident
(memory: feedback_concurrency_group_per_sha.md) — globally-scoped
groups silently cancel runs in matched-SHA scenarios.
This is the only workflow in .github/workflows/ that uses the
narrow per-sha shape without event_name. Others either don't use
concurrency at all, or use ${{ github.ref }} which is event-
neutral.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous workflow applied only 049_delegations.up.sql — fragile to
future migrations that touch the delegations table or any other
handlers/-tested table. The operator would have to remember to update
the workflow's psql -f line for every new migration.
New behavior: loop every .up.sql in lexicographic order, apply each
with ON_ERROR_STOP=1 + per-migration result captured. Failed migrations
are SKIPPED rather than blocking the suite — handles the historical
migrations (017_memories_fts_namespace, 042_a2a_queue, etc.) that
depend on tables since renamed/dropped and can't replay from scratch.
Migrations that DO succeed land their tables, which is sufficient for
the integration tests in handlers/.
Sanity gate at the end: if the delegations table is missing after the
replay, hard-fail with a loud error. That catches a real regression
where 049 itself becomes broken (e.g., schema rename), separate from
the historical-broken-migration noise above.
Per-migration log line ("✓" or "⊘ skipped") makes it easy to spot
when a migration that SHOULD have replayed didn't.
Verified locally: full migration chain runs, 049 lands, all 7
integration tests pass against the chained-migration DB.
Closes #320.