Covers the user-visible flow that Phase 1-5b shipped (RFC #2891):
register a poll-mode workspace, POST a multi-file /chat/uploads request,
verify
the activity feed shows one chat_upload_receive row per file, fetch the
bytes via /pending-uploads/:fid/content, ack each row, and confirm a
post-ack fetch returns 404. Also pins cross-workspace bleed protection
(workspace B's bearer on A's URL → 401, B's URL with A's file_id →
404) and the file_id-UUID-parse 400 path.
23 assertions, all green against a local platform stack (Postgres + Redis +
platform-server) that matches the e2e-api.yml CI recipe verbatim.
Why a new script instead of extending test_poll_mode_e2e.sh: that
script tests A2A short-circuit + since_id cursor semantics; this one
tests the chat-upload path. They share zero handler code on the
platform side and would dilute each other's failure messages if
combined.
Why not the bearerless-401 strict-mode assertion: the platform's
wsauth fail-opens for bearerless requests when MOLECULE_ENV=development
(see middleware/devmode.go). The CI workflow doesn't set that var, but
some local-dev .env files do — the assertion would flap by environment
without testing the poll-mode upload contract. The middleware's own
unit tests cover strict-mode 401.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2906 spawned the sidecar unconditionally on every tenant boot. The
plugin's first migration runs `CREATE EXTENSION vector` which fails
on tenant Postgres without pgvector preinstalled — every staging
tenant redeploy aborted at the 30s health gate. CP fail-fast kept
running tenants on the prior image (no outage), but the new image
was DOA.
Caught on staging redeploy 2026-05-05 19:23 with
`pq: extension "vector" is not available`.
Fix: only spawn the sidecar when the operator has flipped the cutover
flag — `MEMORY_V2_CUTOVER=true` OR `MEMORY_PLUGIN_URL` is set.
* Aligns the entrypoint to the same opt-in posture wiring.go already
uses (it skips building the client when MEMORY_PLUGIN_URL is empty).
* Until cutover, the sidecar isn't even running — no migration, no
health gate, no boot-time pgvector dependency.
* Operators activating cutover already redeploy with the new env
vars set; that's when the sidecar starts. By definition they've
verified pgvector is available before flipping.
* MEMORY_PLUGIN_DISABLE=1 escape hatch preserved; harness fix #2915
becomes belt-and-suspenders (still respected).
Both Dockerfile and entrypoint-tenant.sh updated. Behavior change for
existing deployments: zero (cutover env vars still unset → sidecar
still inert, but now also not running).
Refs RFC #2728. Hotfix for #2906; supersedes the migration-path
fragility class (the sidecar isn't doing migrations on tenants that
won't use it).
The deadline contract was incomplete: wait_all logged the timeout but
close() then called executor.shutdown(wait=True), which blocked on
the leaked workers — undoing the user-facing timeout. The inbox poll
loop would stall indefinitely on a hung /content fetch instead of
returning to chat-message processing.
Fix: wait_all now flips self._timed_out and cancels queued (not-yet-
started) futures; close() reads that flag and switches to
shutdown(wait=False, cancel_futures=True) on the timeout path.
Currently-running workers can't be interrupted by Python's threading
model, but they're now detached daemons whose blocking httpx call
no longer gates the next poll.
Healthy path (no timeout) keeps the existing drain-and-wait so a
still-queued ack POST isn't dropped mid-write.
Two new tests pin both legs of the contract end-to-end:
- close-after-timeout-doesn't-block: hung worker, wait_all(0.05s)
fires the timeout, close() returns in <1s instead of waiting ~5s
for the worker to come back.
- close-without-timeout-still-drains: 2 slow workers, wait_all
completes cleanly, close() drains both ack POSTs.
Resolves the BatchFetcher timeout-cancellation finding from the
post-merge five-axis review of Phase 5b.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds internal/provlog with a single Event(name, fields) helper that
emits JSON-tagged single-line records to the standard logger. Five
boundary sites instrumented for #2867:
provision.start — workspace_dispatchers.go (sync + async)
provision.skip_existing — org_import.go idempotency hit
provision.ec2_started — cp_provisioner.go after RunInstances
provision.ec2_stopped — cp_provisioner.go after TerminateInstances ack
restart.pre_stop — workspace_restart.go before Stop dispatch
These pair with the existing human-prose log.Printf lines (kept). The
new records are grep+jq friendly so a future log-aggregation pipeline
can reconstruct per-workspace provision timelines without parsing the
operator messages — this is the "and debug loggers so it dont happen
again" half of the leak-prevention work.
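A minimal sketch of the helper's shape, assuming the `evt` line prefix
and the exact fallback wording (the in-repo internal/provlog is
authoritative):

    package provlog

    import (
        "encoding/json"
        "log"
    )

    // Event emits one single-line, evt-prefixed JSON record on the
    // standard logger. A nil fields map is tolerated; a marshal failure
    // falls back to a hand-built record so the event boundary survives.
    func Event(name string, fields map[string]any) {
        rec := map[string]any{"event": name}
        for k, v := range fields { // ranging over a nil map is a no-op
            rec[k] = v
        }
        b, err := json.Marshal(rec)
        if err != nil {
            log.Printf(`evt {"event":%q,"marshal_error":%q}`, name, err.Error())
            return
        }
        log.Printf("evt %s", b)
    }

A call site then reads provlog.Event("provision.start",
map[string]any{"workspace_id": id, "sync": true}) (field names
illustrative).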
Tests:
- provlog: emits evt-prefixed JSON, nil-tolerant, marshal-error
fallback preserves event boundary, single-line output pinned.
- handlers: provlog_emit_test.go pins three call-site contracts:
provisionWorkspaceAutoSync emits provision.start with sync=true,
stopForRestart emits restart.pre_stop with backend=cp on SaaS,
and backend=none when both backends are nil.
The field taxonomy is a convenience for ops, not a contract — payload can grow
additively without breaking callers. Behavior gate is the event name +
boundary location, per feedback_behavior_based_ast_gates.md.
Refs #2867 (PR-D structured logging at provisioning boundaries)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-model retrospective review of #2856 (Phase 1 Expand removal)
flagged that TeamHandler.Collapse is unreachable from the canvas UI:
the "Collapse Team" button calls PATCH /workspaces/:id { collapsed }
(visual flag toggle on canvas_layouts), NOT POST /workspaces/:id/collapse.
The destructive POST route — which stops EC2s, marks children removed,
and deletes layouts — has zero UI callers (verified via grep across
canvas/, scripts/, and the MCP tool registry; only docs referenced it).
Two semantically different operations had been sharing the word
"Collapse":
- Visual collapse (canvas) → PATCH { collapsed: true }. Hides
children visually. Reversible. UI-only.
- Destructive collapse (POST /collapse) → Stops + marks removed.
Irreversible. No caller.
Deleting the destructive one + its supporting machinery:
- workspace-server/internal/handlers/team.go (entirely)
- workspace-server/internal/handlers/team_test.go (entirely)
- POST /collapse route + teamh init in router.go
- findTemplateDirByName helper (zero non-test callers after Expand
was deleted in #2856; package-private so no out-of-package consumers)
- NewTeamHandler constructor (no callers after route removed)
Plus stale doc references (the most dangerous was the MCP wrapper
mapping in mcp-server-setup.md — anyone generating MCP tool wrappers
from that table was wiring a 404):
- docs/agent-runtime/team-expansion.md (deleted entirely — whole
guide taught the deleted flow)
- docs/api-reference.md (dropped two team.go rows)
- docs/api-protocol/platform-api.md (dropped /expand + /collapse
rows)
- docs/architecture/molecule-technical-doc.md (dropped /expand +
/collapse rows)
- docs/guides/mcp-server-setup.md (dropped expand_team +
collapse_team MCP wrapper mappings)
- docs/glossary.md (dropped "(org template expand_team)"
parenthetical)
- docs/frontend/canvas.md (dropped broken link to deleted
team-expansion.md)
Kept: docs/architecture/backends.md mention of "TeamHandler.Expand
(#2367) bypassed routing on Start" — correct historical context for
the AST gate's existence, no live route reference.
Visual-collapse path unaffected:
canvas/src/components/ContextMenu.tsx:227 → api.patch — unchanged
canvas/src/components/WorkspaceNode.tsx:128 → api.patch — unchanged
go vet ./... clean. go test ./internal/handlers/ -count 1 — all green
(4.3s, no regression).
Net: -388/+10 = ~378 lines removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2906 shipped the binary at /memory-plugin without the migrations
directory. The plugin's runMigrations() resolved a relative path
`cmd/memory-plugin-postgres/migrations` that exists in the build
context but NOT in the runtime image. Every staging tenant boot
failed with:
memory-plugin-postgres: migrate: read migrations dir
"cmd/memory-plugin-postgres/migrations": open
cmd/memory-plugin-postgres/migrations: no such file or directory
memory-plugin: ❌ /v1/health never returned 200 after 30s
— aborting boot
Caught on the staging redeploy fleet job after #2906 merged. Tenants
stayed on the old image (CP redeploy correctly fail-fasted) but the
new image was broken.
Fix: `//go:embed migrations/*.up.sql` bundles the migrations into
the binary at build time. No filesystem path dependency at runtime.
* `embed.FS` carries the .up.sql files inside the binary itself.
* runMigrations() reads from migrationsFS by default;
MEMORY_PLUGIN_MIGRATIONS_DIR override path preserved for operators
shipping custom migrations.
* Names sorted alphabetically — pinned by a test so a future
`002_*.up.sql` is guaranteed to run after `001_*.up.sql`.
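Roughly the resulting shape (identifier names per the text above; the
rest is an assumption):

    package main

    import (
        "embed"
        "fmt"
        "io/fs"
        "sort"
    )

    //go:embed migrations/*.up.sql
    var migrationsFS embed.FS

    // applyEmbeddedMigrations applies the embedded .up.sql files in
    // sorted name order, so 001_* always runs before a future 002_*.
    // The real runMigrations also honors MEMORY_PLUGIN_MIGRATIONS_DIR.
    func applyEmbeddedMigrations(apply func(name, sql string) error) error {
        names, err := fs.Glob(migrationsFS, "migrations/*.up.sql")
        if err != nil {
            return err
        }
        sort.Strings(names)
        for _, name := range names {
            b, err := migrationsFS.ReadFile(name)
            if err != nil {
                return fmt.Errorf("read embedded migration %s: %w", name, err)
            }
            if err := apply(name, string(b)); err != nil {
                return err
            }
        }
        return nil
    }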
Tests:
* TestMigrationsEmbedded_ContainsCreateTable — pins that the embed
pattern matched files AND those files contain CREATE TABLE
(catches both empty-pattern and wrong-files-embedded).
* TestRunMigrationsFromEmbed_OrderingIsAlphabetic — pins sorted
application order.
Verified locally: `go build` succeeds, binary 9.3MB,
`strings` shows the embedded SQL.
Refs RFC #2728. Hotfix for #2906.
Lint nit from the review bot — _drain_uploads() runs and the function
immediately advances to the cursor save + return, so the local
reassignment is dead code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_start_poller_thread_is_daemon spawned a daemon thread with no stop
mechanism — the leaked thread kept polling every 10ms for ~50ms past the
test's scope, while the test's patched httpx.Client mock was still
active. Later tests that re-patched httpx.Client and asserted call
counts on fetch_and_stage / Client construction had their assertions
inflated by the leaked thread's iterations.
Symptoms: test_poll_once_skips_chat_upload_row_from_queue saw
fetch_and_stage called twice instead of once on Python 3.11 CI;
test_batch_fetcher_owns_client_when_not_supplied saw two Client
constructions instead of one in the full local suite. Both surfaced
only after Phase 5b's BatchFetcher refactor changed the timing window
that allowed the leaked thread to fire mid-test.
Fix: extend start_poller_thread with an optional stop_event kwarg
(backward compatible — production callers pass None and rely on the
daemon flag for process-exit cleanup). The test now signals + joins
on stop_event before exiting scope, so the thread is gone before any
later test patches httpx.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2906 bundled memory-plugin-postgres as a startup-gated sidecar in
both tenant entrypoints. Plugin migrations include
`CREATE EXTENSION IF NOT EXISTS vector` which fails on the harness's
plain postgres:15-alpine (no pgvector preinstalled). The 30s health
gate then aborts container boot and Harness Replays fails.
Detected on auto-promote PR #2914 — Harness Replays job:
Container harness-tenant-alpha-1 Error
Container harness-tenant-beta-1 Error
dependency failed to start: container harness-tenant-alpha-1 exited (1)
The harness doesn't exercise memory features, so the simplest fix is
to use the documented escape hatch the sidecar entrypoint already
ships (MEMORY_PLUGIN_DISABLE=1) — applied to both alpha and beta
tenants in compose.yml. The alternative would be switching the harness
postgres images to pgvector/pgvector:pg15; that's deferred until the
harness needs to verify memory paths.
Refs PR #2906. Unblocks #2914 (auto-promote staging→main).
Multi-model retrospective review of #2901 found three Critical gaps:
1. (#2910 PR-B) template_import.go:79 hardcoded `tier: 3` into the
generated config.yaml. On SaaS this defeated the T4 default at the
create-handler layer — a config-less template import landed at T3
regardless of POST /workspaces' computed default. This was the fourth
default-tier site, the one #2901 missed.
2. (#2910 PR-A) #2901 claimed `go test ... all green` but added zero
new tests. Existing structural-pin tests caught dispatch-layer
drift but said nothing about tier-default drift. A future refactor
that flips DefaultTier() to always return 3 would ship green.
3. (#2910 PR-E) org_import.go fallback returned T2 on self-hosted
while workspace.go returned T3. Internally consistent ("bulk vs
interactive defaults") but undocumented same-name-different-value
drift.
Fix:
- TemplatesHandler.NewTemplatesHandler now takes `wh *WorkspaceHandler`
(nil-tolerant for read-only callers). Import + ReplaceFiles compute
tier via h.wh.DefaultTier() and pass it to generateDefaultConfig.
generateDefaultConfig gets a `tier int` parameter (bounds-checked,
invalid input falls back to T3; sketched after this list).
- org_import.go fallback lifts to h.workspace.DefaultTier() — single
source of truth shared with Create + Templates so a future
tier-default change sweeps every entry point at once.
- New saas_default_tier_test.go pinning:
TestIsSaaS_TrueWhenCPProvWired
TestIsSaaS_FalseWhenOnlyDocker
TestDefaultTier_SaaS_IsT4
TestDefaultTier_SelfHosted_IsT3
TestGenerateDefaultConfig_RespectsTierParam
TestGenerateDefaultConfig_SelfHostedTierT3
TestGenerateDefaultConfig_OutOfRangeFallsBackToT3
- Existing template_import_test.go tests + chat_files_test.go +
security_regression_test.go updated to thread the new tier param /
wh constructor arg through their NewTemplatesHandler calls. Their
pre-#2910 assertion of `tier: 3` is preserved (now passes because
the test caller passes `3` explicitly), so no regression.
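The bounds-check contract from the first bullet, sketched (signature and
yaml shape are assumptions; tiers taken as 1..4):

    package handlers

    import "fmt"

    // generateDefaultConfig renders a template's default config.yaml with
    // a caller-computed tier (h.wh.DefaultTier() at the Import and
    // ReplaceFiles call sites). Out-of-range input falls back to T3.
    func generateDefaultConfig(name string, tier int) string {
        if tier < 1 || tier > 4 {
            tier = 3
        }
        return fmt.Sprintf("name: %s\ntier: %d\n", name, tier)
    }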
go vet ./... clean. go test ./internal/handlers/ -count 1 — all
green (4.2s).
Deferred to separate follow-ups (per #2910 plan):
- PR-C: MOLECULE_DEPLOYMENT_MODE explicit deployment-mode signal
(closes the IsSaaS()=cpProv!=nil structural fragility)
- PR-D: Host iptables IMDS block + IMDSv2 hop-limit (paired with
molecule-controlplane EC2-IAM-scope audit)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of PR #2906 flagged: defaultListenAddr was ":9100" — binds
on every container interface. Inside today's deployment that's moot
(no host port mapping, platform talks over loopback) but it's not
least-privilege. A future Dockerfile edit that publishes the port,
a misconfigured Fly machine, or a future cross-host plugin topology
would expose an unauth'd memory store.
Loopback is the right baseline. Operators with a multi-host topology
already override via MEMORY_PLUGIN_LISTEN_ADDR — that path is unchanged.
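The shape of the change, sketched (the exact loopback string is an
assumption; the env var and override behavior are from the text):

    package main

    import "os"

    // defaultListenAddr binds loopback only; publishing the port or a
    // cross-host plugin topology requires an explicit operator override.
    const defaultListenAddr = "127.0.0.1:9100"

    func listenAddr() string {
        if v := os.Getenv("MEMORY_PLUGIN_LISTEN_ADDR"); v != "" {
            return v // operator override path, unchanged
        }
        return defaultListenAddr
    }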
Tests:
* TestLoadConfig_DefaultListenAddrIsLoopback pins the new default.
* TestLoadConfig_ListenAddrEnvOverride pins the override path so
operators relying on it don't break.
* TestLoadConfig_MissingDatabaseURL covers the existing fail-fast.
No prior unit tests existed for loadConfig — boot_e2e_test.go always
sets MEMORY_PLUGIN_LISTEN_ADDR explicitly, so the default was never
exercised by tests. This PR adds that coverage.
Refs RFC #2728. Hardening follow-up to PR #2906.
Resolves the two remaining findings from the Phase 1-4 retrospective
review (the Python-side counterparts to phase 5a):
1. Important — inbox_uploads.fetch_and_stage blocked the inbox poll
loop synchronously per row. A user dragging 4 files into chat at
once would stall the poller for 4× per-fetch latency before the
chat message reached the agent. Add BatchFetcher: a thread-pool
wrapper (default 4 workers) that submits fetches concurrently and
exposes wait_all() as the barrier the inbox loop calls before
processing the chat-message row that references the uploads.
The drain barrier is the correctness invariant: rewrite_request_body
must observe a populated URI cache when it walks the chat-message
row's parts. _poll_once now drains the BatchFetcher inline before
the first non-upload row, AND at end-of-batch (case: batch contains
only upload rows; the corresponding chat message arrives in a later
poll, but the future-poll-races-current-fetch race is closed).
2. Nit — fetch_and_stage created two httpx.Client instances per row
(one for GET /content, one for POST /ack). Refactor so a single
client serves both calls. When called from BatchFetcher, the
batch-shared client serves every row's GET + ack — so the second
fetch reuses the TCP+TLS connection the first one set up.
Comprehensive tests:
- 13 new inbox_uploads tests:
- fetch_and_stage with supplied client: zero httpx.Client
constructions, GET+POST through the same client, caller's client
not closed (lifecycle owned by caller).
- fetch_and_stage without supplied client: exactly one
httpx.Client constructed (was 2 pre-fix), closed on the way out.
- BatchFetcher: 3 rows × 120ms complete in parallel in < 250ms
(vs. ~360ms serial), URI cache hot when wait_all returns,
per-row failure isolation, single-client reuse across all
submits, idempotent close, submit-after-close raises,
owned-vs-supplied client lifecycle, no-op wait_all on empty
batch, graceful httpx-missing degradation.
- 3 new inbox tests:
- poll_once drains uploads before processing the chat-message row
(in-place mutation of row['request_body'] proves the URI was
rewritten BEFORE message_from_activity returned).
- poll_once with only upload rows still drains at end-of-batch.
- poll_once with no upload rows never constructs a BatchFetcher
(zero overhead on the no-upload happy path).
133 total inbox + inbox_uploads tests pass; 0 regressions.
Closes the chat-upload poll-mode-perf gap end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds TestINSERTworkspacesAllowlist: walks every non-test .go in this
package, finds funcs containing an `INSERT INTO workspaces (` SQL
literal, and pins the result against an explicit allowlist with the
safety mechanism named per entry.
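A compressed sketch of the gate (the real test also records the named
safety mechanism per entry and prints a richer diagnostic):

    package handlers

    import (
        "go/ast"
        "go/parser"
        "go/token"
        "path/filepath"
        "strings"
        "testing"
    )

    func TestINSERTworkspacesAllowlist(t *testing.T) {
        allowlist := map[string]bool{ // file:func reviewed as idempotent
            "org_import.go:createWorkspaceTree": true,
            "registry.go:Register":              true,
            "workspace.go:Create":               true,
        }
        files, err := filepath.Glob("*.go")
        if err != nil {
            t.Fatal(err)
        }
        fset := token.NewFileSet()
        for _, file := range files {
            if strings.HasSuffix(file, "_test.go") {
                continue
            }
            f, err := parser.ParseFile(fset, file, nil, 0)
            if err != nil {
                t.Fatalf("parse %s: %v", file, err)
            }
            for _, decl := range f.Decls {
                fn, ok := decl.(*ast.FuncDecl)
                if !ok || !hasWorkspacesInsert(fn) {
                    continue
                }
                if key := file + ":" + fn.Name.Name; !allowlist[key] {
                    t.Errorf("%s inserts into workspaces but is not "+
                        "allowlisted; what makes this INSERT idempotent?", key)
                }
            }
        }
    }

    // hasWorkspacesInsert reports whether any string literal in the
    // function contains the bulk-create-prone SQL prefix.
    func hasWorkspacesInsert(fn *ast.FuncDecl) bool {
        found := false
        ast.Inspect(fn, func(n ast.Node) bool {
            lit, ok := n.(*ast.BasicLit)
            if ok && lit.Kind == token.STRING &&
                strings.Contains(lit.Value, "INSERT INTO workspaces (") {
                found = true
            }
            return !found
        })
        return found
    }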
A new INSERT site fails the build until a reviewer adds it to the
allowlist — forcing the question "what makes this INSERT idempotent?"
at PR-review time, not after the next bulk-create leak (the shape that
produced 72 stale child workspaces in tenant-hongming over 4 days).
Pairs with TestCreateWorkspaceTree_CallsLookupBeforeInsert (the
behavior pin for the one bulk path today). Together:
- this test catches "did a new function start inserting?"
- that test catches "did the existing bulk path drop its idempotency check?"
Both fire immediately when drift happens.
Current allowlist (3 entries):
- org_import.go:createWorkspaceTree → lookup-then-insert via
lookupExistingChild (#2868 phase 3, also pinned by the sibling AST
gate from #2895)
- registry.go:Register → ON CONFLICT (id) DO UPDATE (idempotent by
primary key — external workspace upsert)
- workspace.go:Create → single-workspace POST /workspaces, server-
generated UUID, no iteration
Verified via mutation: dropping a synthetic tempBulkLeakTest with an
unsafe loop+INSERT into the package fails the gate with a clear
diagnostic pointing at the file + function. Restoring the tree
returns the gate to green.
Memory: feedback_assert_exact_not_substring.md (verify tightened test
FAILS on bug shape) — mutation proof done locally.
RFC #2867 class 1. Class 2 (Prometheus gauge for ec2_instance
duplicates) + class 3 (structured logging on workspace create) are
follow-up PRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves four of six findings from the retrospective code review of Phases
1–4 (poll-mode chat upload). Bundled because every change is in the
platform's pending_uploads layer or the multi-file handler that reads it.
Findings resolved:
1. Important — Sweep query lacked an index for the acked-retention OR-arm.
The Phase 1 partial indexes are both `WHERE acked_at IS NULL`, so the
`(acked_at IS NOT NULL AND acked_at < retention)` half of the WHERE
clause seq-scanned the table on every cycle. Add a complementary
partial index on `acked_at WHERE acked_at IS NOT NULL` so both arms
of the disjunction are index-covered. Disjoint from the existing two
indexes (no row matches both predicates), so write amplification is
bounded to ~one index entry per terminal-state row.
2. Important — uploadPollMode partial-failure left orphans. The previous
per-file Put loop committed rows 1..K-1 and then errored on row K with
no compensation, so a client retry would double-insert the survivors.
Refactor the handler into three explicit phases (pre-validate +
read-into-memory, single atomic PutBatch, per-file activity row) and
add Storage.PutBatch with all-or-nothing transaction semantics.
3. FYI — pendinguploads.StartSweeperWithInterval was exported only for
tests. Move it to lower-case startSweeperWithInterval and expose the
test seam through pendinguploads/export_test.go (Go convention; the
shim file is stripped from the production binary at build time).
4. Nit — multipart Content-Type was passed verbatim into pending_uploads
rows and re-served on /content. Add safeMimetype which strips
parameters, rejects CR/LF/control bytes, and coerces malformed shapes
to application/octet-stream. The eventual GET /content response can no
longer be header-split via a crafted Content-Type on the multipart.
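safeMimetype, roughly (names from the list above; the exact rejection
rules are an assumption):

    package handlers

    import (
        "mime"
        "strings"
    )

    // safeMimetype normalizes a client-supplied multipart Content-Type
    // before it is stored and later re-served on GET /content: parameters
    // are dropped, any control byte (CR, LF, NUL, ...) forces the
    // fallback, and malformed input is coerced to application/octet-stream.
    func safeMimetype(ct string) string {
        const fallback = "application/octet-stream"
        for _, r := range ct {
            if r < 0x20 || r == 0x7f {
                return fallback
            }
        }
        mediaType, _, err := mime.ParseMediaType(ct)
        if err != nil || !strings.Contains(mediaType, "/") {
            return fallback
        }
        return mediaType // ParseMediaType already stripped the parameters
    }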
Comprehensive tests:
- 10 PutBatch unit tests (sqlmock): happy path, empty input, all four
pre-validation rejection paths, BeginTx error, per-row error +
Rollback (no Commit), first-row error, Commit error.
- 4 new PutBatch integration tests (real Postgres): all-rows-commit
happy path with COUNT(*) verification, atomic-rollback no-leak via
a NUL-byte filename that lib/pq rejects mid-batch, oversize
short-circuit no-Tx, idx_pending_uploads_acked existence + partial
predicate via pg_indexes (planner-shape-independent).
- 3 new chat_files_poll tests: atomic rollback on second-file oversize,
atomic rollback on PutBatch error, mimetype CRLF/NUL/parameter
sanitization (8 sub-cases).
The two remaining review findings (inbox_uploads.fetch_and_stage blocks
the poll loop synchronously; two httpx Clients per row) are Python-side
and ship in Phase 5b once this lands on staging.
Test-only export pattern via export_test.go, atomic pre-validation
discipline (validate before Tx), and behavior-based (not name-based)
test assertions follow the standing project conventions.
Closes the gap between the merged Memory v2 code (PR #2757 wired the
client into main.go) and operator activation. Without this PR an
operator wanting to flip MEMORY_V2_CUTOVER=true had to provision a
separate memory-plugin service and point MEMORY_PLUGIN_URL at it —
extra ops surface for what the design intends to be a built-in.
What ships:
* Both Dockerfile + Dockerfile.tenant build the
cmd/memory-plugin-postgres binary into /memory-plugin.
* Entrypoints spawn the plugin in the background on :9100 BEFORE
starting the main server; wait up to 30s for /v1/health to return
200; abort boot loudly if it doesn't (better to crash-loop than to
silently route cutover traffic against a dead plugin).
* Default env: MEMORY_PLUGIN_DATABASE_URL=$DATABASE_URL (share the
existing tenant Postgres — plugin's `memory_namespaces` /
`memory_records` tables coexist with platform schema, no
conflicts), MEMORY_PLUGIN_LISTEN_ADDR=:9100.
* MEMORY_PLUGIN_DISABLE=1 escape hatch for operators running the
plugin externally on a separate host.
* Platform image: plugin runs as the `platform` user (not root) via
su-exec — matches the privilege boundary the main server already
drops to. Tenant image already starts as `canvas` so the plugin
inherits non-root automatically.
What stays operator-controlled:
* MEMORY_V2_CUTOVER is NOT auto-set. Behavior change for existing
deployments: zero. The wiring at workspace-server/internal/memory/
wiring/wiring.go skips building the plugin client until the
operator opts in, so the running sidecar is a no-op for traffic
until then.
* MEMORY_PLUGIN_URL is NOT auto-set either, for the same reason —
setting it implies cutover-active intent. Operators set both on
staging first, verify a live commit/recall round-trip (closes
pending task #292), then promote to production.
Operator activation steps after this PR ships:
1. Verify pgvector extension is available on the target Postgres
(the plugin's first migration runs CREATE EXTENSION IF NOT
EXISTS vector). Railway's managed Postgres ships pgvector
available; some self-hosted operators may need to enable it.
2. Redeploy the workspace-server with this image.
3. Set MEMORY_PLUGIN_URL=http://localhost:9100 + MEMORY_V2_CUTOVER=true
in the environment (staging first).
4. Watch boot logs for "memory-plugin: ✅ sidecar healthy" and the
wiring.go cutover messages; do a live commit_memory + recall_memory
round-trip via the canvas Memory tab to verify.
5. Promote to production once staging holds for a sweep window.
Refs RFC #2728. Closes the dormant-plugin gap noted in task #294.
Reported: "right now when chat box opens it opens in the middle, but
it should be at the end of conversation."
Root cause: ChatTab.tsx:548 fires `bottomRef.scrollIntoView({ behavior:
"smooth" })` on every messages-update. On initial mount with N
messages already loaded, the smooth-scroll triggers a ~300ms animation
that any concurrent React re-render (agent push landing, theme
toggle, sidepanel resize) interrupts mid-flight, leaving the user
stuck somewhere in the middle of the conversation.
Fix: track first-mount via hasInitialScrollRef. Use behavior:"instant"
for the initial jump (deterministic, no animation interruption), then
smooth for subsequent appends (the new-message-landing visual stays).
The ref flips on the first messages.length > 0 transition, so:
- Initial open of chat tab: instant jump to bottom ✓
- New agent message arrives: smooth scroll into view ✓
- Workspace switch (ChatTab remounts): fresh hasInitialScrollRef, gets
instant again ✓
- loadOlder prepend: anchor-restore path unchanged, still pins user's
reading position ✓
Test plan:
- pnpm test --run ChatTab.lazyHistory.test.tsx → 8 pass (existing
lazy-history tests untouched)
- npx tsc --noEmit clean
- Manual on hongming.moleculesai.app: open a busy chat (mac laptop,
~50 messages), confirm view lands at the latest bubble, not mid-
scroll. Switch to another workspace + back → instant again.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestStartSweeper_RecordsMetricsOnError flaked on every CI rerun under
race detection: `error counter delta = 0, want 1`. Root cause is a
race between two goroutines, not a bug in the production sweeper.
The fake `fakeSweepStorage.Sweep` signals `cycleDone` from inside its
deferred return — that happens BEFORE Sweep's return value is
received by `sweepOnce`, which is what triggers the metric increment.
On slow CI hosts the test goroutine wins the read after `waitForCycle`
unblocks and BEFORE StartSweeper's goroutine has called
`metrics.PendingUploadsSweepError`, so the asserted delta is 0 even
though the metric WILL be 1 a few ms later.
Adds a polling assert helper, `waitForMetricDelta`, that closes the
race deterministically without timing-based sleeps:
- TestStartSweeper_RecordsMetricsOnError uses waitForMetricDelta to
wait for the error counter to settle at 1.
- TestStartSweeper_RecordsMetricsOnSuccess uses it on the success
counters (acked, expired) so the error-stayed-zero assertion
reads after StartSweeper has fully processed the cycle.
- waitForCycle keeps its current shape but documents the caveat in
its comment so future tests don't repeat the assumption.
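The helper's shape, roughly (the signature and the 2s ceiling are
assumptions):

    package pendinguploads

    import (
        "testing"
        "time"
    )

    // waitForMetricDelta polls until the counter has moved by at least
    // `want` from `base`, failing after a hard ceiling. Counters only
    // move up, so "at least want" means the sweeper goroutine has
    // recorded the increment and the assertion can no longer race
    // cycleDone.
    func waitForMetricDelta(t *testing.T, read func() float64, base, want float64) {
        t.Helper()
        deadline := time.Now().Add(2 * time.Second)
        for time.Now().Before(deadline) {
            if read()-base >= want {
                return
            }
            time.Sleep(5 * time.Millisecond)
        }
        t.Fatalf("metric delta = %v, want %v", read()-base, want)
    }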
Verified: `go test ./internal/pendinguploads/ -race -count 5` passes
all 9 tests across 5 iterations cleanly.
Per memory feedback_question_test_when_unexpected.md: the
"delta=0, want=1" failure looked like a real production bug at first
glance, but instrumented inspection showed the metric DOES increment,
just AFTER the test's read. The fix is the test's wait shape, not
the sweeper.
Unblocks every PR currently broken by this flake (#2898 hit it on
two consecutive CI runs; staging-merged PRs from earlier today
(#2877/#2881/#2885/#2886) introduced the test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported every SaaS workspace defaults to T2 (Standard). Four
call sites picked the default, and only one of them was right:
- canvas CreateWorkspaceDialog (line 126): isSaaS ? 4 : 3 ← only correct one
- canvas EmptyState "Create blank": tier: 2 ← hardcoded
- workspace.go POST /workspaces: tier = 3 ← not SaaS-aware
- org_import.go createWorkspaceTree: tier = 2 (fallback) ← not SaaS-aware
So a user clicking "+ New Workspace" via the dialog got T4 on SaaS,
but a user clicking "Create blank" on the empty canvas got T2, and an
agent POSTing /workspaces directly got T3. Same tenant, three different
tiers depending on entry point.
Fix:
1. WorkspaceHandler.IsSaaS() and DefaultTier() helpers (workspace_dispatchers.go).
IsSaaS() := h.cpProv != nil — single source of truth for "are we
SaaS" across the file. DefaultTier() returns 4 on SaaS, 3 on
self-hosted (sketched after this list). SaaS rationale: each workspace
runs on its own sibling
EC2 so the per-workspace tier boundary is a Docker resource limit
on the only container present — no neighbour to protect from. T4
matches the boundary.
2. workspace.go now defaults tier via h.DefaultTier() instead of
hardcoded T3.
3. org_import.go fallback (when neither ws.tier nor defaults.tier set)
becomes SaaS-aware: T4 on SaaS, T2 on self-hosted (preserving the
existing safe-shared-Docker-daemon default for self-hosted org
imports).
4. canvas EmptyState "Create blank" stops sending tier:2 in the body
and lets the backend pick — single source of truth in the backend.
Eliminates the third disagreement.
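The two helpers sketched, with the handler struct reduced to the single
field that matters here:

    package handlers

    type WorkspaceHandler struct {
        cpProv any // control-plane provisioner; nil on self-hosted
        // ... the real handler has many more fields
    }

    // IsSaaS is the single source of truth for "are we SaaS": the
    // control-plane provisioner is only wired in that deployment shape.
    func (h *WorkspaceHandler) IsSaaS() bool {
        return h.cpProv != nil
    }

    // DefaultTier is what a workspace gets when the caller doesn't pick:
    // T4 on SaaS (the workspace is alone on its sibling EC2, so the tier
    // boundary is just a Docker limit), T3 on self-hosted.
    func (h *WorkspaceHandler) DefaultTier() int {
        if h.IsSaaS() {
            return 4
        }
        return 3
    }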
Test plan:
- go vet ./... clean
- go test ./internal/handlers/ -count 1 — all green (4.3s)
- npx tsc --noEmit on canvas — clean
- Staging E2E (after deploy): create a fresh workspace via canvas
empty-state on hongming.moleculesai.app, confirm tier=4 on the
workspace details panel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI caught two test files I missed in the original iter 4b retarget:
test_a2a_multi_workspace.py + test_delegation_sync_via_polling.py
patch a2a_tools.{discover_peer, send_a2a_message, _delegate_sync_via_polling,
httpx.AsyncClient} but those call sites moved to a2a_tools_delegation
in this PR. 17 patch sites retargeted; 30 tests now green.
Refs RFC #2873 iter 4b.
The previous TestCreateWorkspaceTree_CallsLookupBeforeInsert used
bytes.Index("INSERT INTO workspaces"), which also prefix-matches
INSERT INTO workspaces_audit and can't tell the real target apart from
the other workspace_* lookalikes. RFC #2872 cited this as a silent
false-pass mode: a future refactor that adds an audit-table INSERT
literal earlier in source than the real workspaces INSERT would
make the gate point at the wrong target.
Replaces the byte-search with a go/ast walk + a regex that requires
`\s*\(` after `workspaces` — distinguishes the real target from
prefix lookalikes.
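The tightened pattern, roughly (the in-repo version may differ in
anchoring or case handling):

    package handlers

    import "regexp"

    // workspacesInsertRE only matches the real target: `workspaces` must
    // be followed by optional whitespace and an opening paren, so the
    // audit and workspace_* lookalike tables fall out of consideration.
    var workspacesInsertRE = regexp.MustCompile(`INSERT\s+INTO\s+workspaces\s*\(`)

INSERT INTO workspaces_audit (...) fails the check because `_` follows
`workspaces` directly, so `\s*\(` cannot match.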
Adds three discriminating tests:
- TestWorkspacesInsertRE_RejectsLookalikes — pins the regex against
9 sql shapes (real, raw-string-literal, audit-shadow, workspace_*
prefixes, canvas_layouts, UPDATE/SELECT, comments).
- TestGate_FailsWhenLookupAfterInsert — synthesizes Go source where
the lookup is positioned AFTER the workspaces INSERT, asserts the
helper returns lookupPos > insertPos (which the production gate
flags via t.Errorf). Proves the gate isn't vestigial.
- TestGate_IgnoresAuditTableShadow — synthesizes source with an
audit-table INSERT BEFORE the lookup + real INSERT, asserts the
tightened regex correctly walks past the shadow and finds the
real INSERT.
Also extracts findLookupAndWorkspacesInsertPos as a helper so the
gate logic can be exercised against synthetic source, not only
against the real org_import.go.
Memory: feedback_assert_exact_not_substring.md (verify tightened
test FAILS on old code) — TestGate_FailsWhenLookupAfterInsert is
the failing-on-bug-shape proof.
Closes the silent-false-pass mode of #2872 Important-1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 closes out the rollout — strict-sqlmock unit tests pin which
SQL fires, but they cannot detect bugs that depend on the actual row
state after the SQL runs. Real-Postgres integration tests catch what
sqlmock cannot:
- the Sweep CTE depends on Postgres' make_interval function and the
table's CHECK constraints; sqlmock would happily accept a
hand-written SQL literal that Postgres rejects at runtime.
- a wrong WHERE predicate on the partial idx_pending_uploads_unacked
index only shows up at real-query-plan time.
- subtle predicate drift (e.g. a WHERE clause that filters by
acked_at IS NOT NULL but uses BETWEEN incorrectly).
Test cases:
- PutGetAckRoundTrip: the full happy path — Put, Get, MarkFetched,
Ack, idempotent re-Ack, Get-after-Ack returns ErrNotFound.
- Sweep_DeletesAckedAfterRetention: row not eligible at retention=1h
immediately after Ack; deleted at retention=0.
- Sweep_DeletesExpiredUnacked: backdated expires_at exercises the
unacked-and-expired branch of the WHERE clause.
- Sweep_DeletesBothCategoriesInOneCycle: three rows (acked, expired,
fresh); a single Sweep deletes the first two and leaves the third.
- PutEnforcesSizeCap: ErrTooLarge above MaxFileBytes.
- GetIgnoresExpiredAndAcked: Get filters predicate matches expected
row state in the table.
Run path:
- locally via the file-header docker incantation.
- CI runs on every PR/push that touches handlers/** OR migrations/**
(.github/workflows/handlers-postgres-integration.yml).
Second slice of the a2a_tools.py split (stacked on iter 4a). Owns the
three delegation MCP tools + the RFC #2829 PR-5 sync-via-polling
helper they share:
* tool_delegate_task — synchronous delegation
* tool_delegate_task_async — fire-and-forget
* tool_check_task_status — poll the platform's /delegations log
* _delegate_sync_via_polling — durable async + poll for terminal status
* _SYNC_POLL_INTERVAL_S / _SYNC_POLL_BUDGET_S constants
a2a_tools.py shrinks from 915 → 609 LOC (−306). Stacked on iter 4a's
RBAC extraction; uses `from a2a_tools_rbac import auth_headers_for_heartbeat`
as its auth-header source.
The lazy `from a2a_tools import report_activity` inside tool_delegate_task
breaks the circular-import cycle (a2a_tools imports the delegation
re-exports at module-load; delegation handler needs report_activity at
CALL time). A dedicated test pins this contract.
Tests:
* 77 existing test_a2a_tools_impl.py tests pass after retargeting
20 patch sites in TestToolDelegateTask + TestToolDelegateTaskAsync +
TestToolCheckTaskStatus from `a2a_tools.foo` to
`a2a_tools_delegation.foo` (foo ∈ {discover_peer, send_a2a_message,
httpx.AsyncClient}). The patches need to target the new module
because that's where the call sites live now.
* test_a2a_tools_delegation.py adds 8 new tests:
- 6 alias drift gates (`a2a_tools.tool_delegate_task is …`)
- 2 import-contract tests (no top-level circular dep + a2a_tools
surfaces every delegation symbol)
- 1 sync-poll budget invariant
113 tests total (77 impl + 28 rbac + 8 delegation), all green.
Refs RFC #2873.
Iter 4a's new module needs to be in the rewrite list so the wheel
ships its imports prefixed correctly. Caught by the 'PR-built wheel +
import smoke' check.
Refs RFC #2873 iter 4a.
The iter-3 split created mcp_heartbeat / mcp_inbox_pollers /
mcp_workspace_resolver but the wheel build's drift-gate check at
scripts/build_runtime_package.py:TOP_LEVEL_MODULES wasn't updated.
Without this fix the wheel ships those modules un-rewritten, so
their imports of platform_auth / configs_dir / etc. break at
runtime. Caught by the 'PR-built wheel + import smoke' check.
Refs RFC #2873 iter 3.
Phase 3 of the poll-mode chat upload rollout. Stacked atop Phase 2.
The platform's pending_uploads table grows by one row per uploaded file
with no built-in cleanup. Phase 1's hard TTL (expires_at default 24h)
makes expired rows un-fetchable but doesn't actually delete them;
Phase 1's ack stamps acked_at but leaves the row around indefinitely.
Without a sweep the table grows without bound under normal traffic.
This PR adds:
- `Storage.Sweep(ctx, ackRetention)` — a single round-trip CTE that
deletes acked rows past their retention window plus unacked rows
past expires_at. Returns `(acked, expired)` deletion counts so
Phase 3 dashboards can spot the stuck-fetch pattern (high expired,
low acked) vs healthy churn.
- `pendinguploads.StartSweeper(ctx, storage, ackRetention)` —
background goroutine that calls Sweep every 5 minutes (default).
Runs once immediately on startup so a platform restart cleans up
any rows that became eligible while we were down.
- Prometheus counters `molecule_pending_uploads_swept_total` with
`outcome={acked,expired,error}` labels. Wired into the existing
`/metrics` endpoint.
- Wired from cmd/server/main.go via supervised.RunWithRecover —
one transient panic doesn't take the platform down with it.
Defaults:
- SweepInterval = 5m (matches the dashboard refresh cadence)
- DefaultAckRetention = 1h (gives the workspace at-least-once retry
headroom in case it processed but failed to write the file before
crashing)
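The loop, sketched (Sweep's exact return signature is an assumption;
the real sweepOnce records the Prometheus counters where the log lines
sit here, and main.go wraps the whole thing in supervised.RunWithRecover):

    package pendinguploads

    import (
        "context"
        "log"
        "time"
    )

    const (
        SweepInterval       = 5 * time.Minute
        DefaultAckRetention = time.Hour
    )

    type sweepStorage interface {
        Sweep(ctx context.Context, ackRetention time.Duration) (acked, expired int64, err error)
    }

    // StartSweeper sweeps once immediately (rows that became eligible
    // while the platform was down), then on every tick until ctx is
    // cancelled. A failed cycle is recorded, never fatal.
    func StartSweeper(ctx context.Context, s sweepStorage, ackRetention time.Duration) {
        go func() {
            ticker := time.NewTicker(SweepInterval)
            defer ticker.Stop()
            sweepOnce(ctx, s, ackRetention)
            for {
                select {
                case <-ctx.Done():
                    return
                case <-ticker.C:
                    sweepOnce(ctx, s, ackRetention)
                }
            }
        }()
    }

    func sweepOnce(ctx context.Context, s sweepStorage, ackRetention time.Duration) {
        acked, expired, err := s.Sweep(ctx, ackRetention)
        if err != nil {
            log.Printf("pending_uploads sweep error: %v", err) // outcome="error"
            return
        }
        log.Printf("pending_uploads sweep: acked=%d expired=%d", acked, expired)
    }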
Test coverage: 100%, via storage_test.go (extended with a sweepSQL pin +
six Sweep test cases including negative-retention clamp + zero-retention
immediate-delete + DB error wrapping) and sweeper_test.go (ticker-driven
+ ctx-cancel + nil-storage + transient-error-doesn't-crash + metric
counter assertions).
Closes the third of four phases tracked on the parent RFC; phase 4 is
the staging E2E test.
Closes #2865 (split-B of the #2669 root-cause stack).
The phantom-busy sweep in workspace-server/internal/scheduler/scheduler.go
already logs each row reset, but no aggregate metric surfaces "how often
is this firing." A regression that causes high reset rates (e.g.
controlplane#481's missing env vars, or future drift in the workspace
runtime's task-lifecycle accounting) only surfaces when users complain.
Fix: counter exposed at /metrics as molecule_phantom_busy_resets_total,
incremented from sweepPhantomBusy after each row whose active_tasks
was reset. Same shape as existing molecule_websocket_connections_active.
Operator-side dashboard: alert when daily phantom-busy reset count
> 0.5% of active workspaces. Today's steady-state is near-zero; any
increase is a regression signal.
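A sketch using the stock client_golang registration; the in-repo
metrics package may wire the same counter by hand, but the /metrics
output is the same text format either way:

    package metrics

    import (
        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
    )

    // phantomBusyResets counts rows whose active_tasks the scheduler's
    // phantom-busy sweep had to reset. Steady state is near-zero, so any
    // sustained increase is a regression signal worth alerting on.
    var phantomBusyResets = promauto.NewCounter(prometheus.CounterOpts{
        Name: "molecule_phantom_busy_resets_total",
        Help: "Rows reset by the scheduler's phantom-busy sweep.",
    })

    // TrackPhantomBusyReset is called from sweepPhantomBusy after each
    // reset row; client_golang counters are safe for concurrent Inc calls.
    func TrackPhantomBusyReset() {
        phantomBusyResets.Inc()
    }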
Tests:
- TestTrackPhantomBusyReset_IncrementsCounter
- TestTrackPhantomBusyReset_RaceFreeUnderConcurrentWrites (50×200
concurrent writes; tests atomic invariant)
- TestHandler_ExposesPhantomBusyResetsCounter (asserts HELP + TYPE
+ value lines in Prometheus text format)
- TestHandler_PhantomBusyResetsZeroByDefault (fresh-process 0
contract — prevents a future refactor from accidentally dropping
the metric from /metrics)
Race-detector clean. Vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>