Commit Graph

4347 Commits

Author SHA1 Message Date
Hongming Wang
ae22a55675
Merge pull request #2909 from Molecule-AI/feat/poll-mode-chat-upload-phase5b
feat(poll-upload): phase 5b — concurrent BatchFetcher + httpx client reuse
2026-05-05 12:05:58 -07:00
Hongming Wang
81e83c05b7 fix(inbox): drop unused batch_fetcher = None after end-of-batch drain
Lint nit from review bot — _drain_uploads() runs and the function
immediately advances to the cursor save + return, so the local
re-assignment is dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:56:54 -07:00
Hongming Wang
5b5eacbb29 test(inbox): clean up daemon poller thread to prevent test cross-talk
test_start_poller_thread_is_daemon spawned a daemon thread with no stop
mechanism — the leaked thread polled every 10ms with the test's patched
httpx.Client mock STILL ACTIVE for ~50ms after the test scope. Later
tests that re-patched httpx.Client + asserted call counts on
fetch_and_stage / Client construction got their assertions inflated by
the leaked thread's iterations.

Symptoms: test_poll_once_skips_chat_upload_row_from_queue saw
fetch_and_stage called twice instead of once on Python 3.11 CI;
test_batch_fetcher_owns_client_when_not_supplied saw two Client
constructions instead of one in the full local suite. Both surfaced
only after Phase 5b's BatchFetcher refactor changed the timing window
that allowed the leaked thread to fire mid-test.

Fix: extend start_poller_thread with an optional stop_event kwarg
(backward compatible — production callers pass None and rely on the
daemon flag for process-exit cleanup). The test now signals + joins
on stop_event before exiting scope, so the thread is gone before any
later test patches httpx.
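
A minimal sketch of the backward-compatible shape (signature and the
daemon flag follow the commit; the loop body and fake_poll are
illustrative):

    import threading
    import time

    def start_poller_thread(poll_once, interval_s=0.01, stop_event=None):
        """Spawn the daemon poller; stop_event is optional and test-only."""
        def _run():
            # Production callers pass stop_event=None and rely on the
            # daemon flag for process-exit cleanup.
            while stop_event is None or not stop_event.is_set():
                poll_once()
                time.sleep(interval_s)
        thread = threading.Thread(target=_run, daemon=True)
        thread.start()
        return thread

    # In the test:
    stop = threading.Event()
    thread = start_poller_thread(fake_poll, stop_event=stop)
    # ... assertions ...
    stop.set()
    thread.join(timeout=1.0)   # thread gone before the next test patches httpx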

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:47:14 -07:00
Hongming Wang
c8fca1467e
Merge pull request #2915 from Molecule-AI/fix/harness-disable-memory-plugin
fix(harness): disable memory-plugin sidecar in harness tenants
2026-05-05 18:45:43 +00:00
Hongming Wang
7c8b81c6eb fix(harness): disable memory-plugin sidecar in harness tenants
PR #2906 bundled memory-plugin-postgres as a startup-gated sidecar in
both tenant entrypoints. Plugin migrations include
`CREATE EXTENSION IF NOT EXISTS vector`, which fails on the harness's
plain postgres:15-alpine (no pgvector preinstalled). The 30s health
gate then aborts container boot and Harness Replays fails.

Detected on auto-promote PR #2914 — Harness Replays job:
  Container harness-tenant-alpha-1  Error
  Container harness-tenant-beta-1   Error
  dependency failed to start: container harness-tenant-alpha-1 exited (1)

The harness doesn't exercise memory features, so the simplest fix is
to use the documented escape hatch the sidecar entrypoint already
ships (MEMORY_PLUGIN_DISABLE=1) — applied to both alpha and beta
tenants in compose.yml. Alternative would be switching the harness
postgres images to pgvector/pgvector:pg15, deferred until the
harness wants to verify memory paths.

Refs PR #2906. Unblocks #2914 (auto-promote staging→main).
2026-05-05 11:42:20 -07:00
Hongming Wang
fc1c45789e
Merge pull request #2912 from Molecule-AI/feat/saas-default-hardening-2910
feat(saas): close 4th default-tier site + lift org_import asymmetry + tests (#2910)
2026-05-05 18:42:19 +00:00
Hongming Wang
e3a18ed8e8
Merge pull request #2911 from Molecule-AI/fix/memory-plugin-bind-loopback
fix(memory-plugin): bind to 127.0.0.1 by default
2026-05-05 18:38:35 +00:00
Hongming Wang
9f551319d2 feat(saas): close 4th default-tier site + lift org_import asymmetry + tests (#2910)
Multi-model retrospective review of #2901 found three Critical gaps:

1. (#2910 PR-B) template_import.go:79 wrote `tier: 3` hardcoded into
   generated config.yaml. On SaaS this defeated the T4 default at the
   create-handler layer — a config-less template import landed at T3
   regardless of POST /workspaces' computed default. This was the 4th
   default-tier site, the one #2901 missed.

2. (#2910 PR-A) #2901 claimed `go test ... all green` but added zero
   new tests. Existing structural-pin tests caught dispatch-layer
   drift but said nothing about tier-default drift. A future refactor
   that flips DefaultTier() to always return 3 would ship green.

3. (#2910 PR-E) org_import.go fallback returned T2 on self-hosted
   while workspace.go returned T3. Internally consistent ("bulk vs
   interactive defaults") but undocumented same-name-different-value
   drift.

Fix:

- TemplatesHandler.NewTemplatesHandler now takes `wh *WorkspaceHandler`
  (nil-tolerant for read-only callers). Import + ReplaceFiles compute
  tier via h.wh.DefaultTier() and pass it to generateDefaultConfig.
  generateDefaultConfig gets a `tier int` parameter (bounds-checked,
  invalid input falls back to T3).

- org_import.go fallback lifts to h.workspace.DefaultTier() — single
  source of truth shared with Create + Templates so a future
  tier-default change sweeps every entry point at once.

- New saas_default_tier_test.go pinning:
    TestIsSaaS_TrueWhenCPProvWired
    TestIsSaaS_FalseWhenOnlyDocker
    TestDefaultTier_SaaS_IsT4
    TestDefaultTier_SelfHosted_IsT3
    TestGenerateDefaultConfig_RespectsTierParam
    TestGenerateDefaultConfig_SelfHostedTierT3
    TestGenerateDefaultConfig_OutOfRangeFallsBackToT3

- Existing template_import_test.go tests + chat_files_test.go +
  security_regression_test.go updated to thread the new tier param /
  wh constructor arg through their NewTemplatesHandler calls. Their
  pre-#2910 assertion of `tier: 3` is preserved (now passes because
  the test caller passes `3` explicitly), so no regression.

go vet ./... clean. go test ./internal/handlers/ -count 1 — all
green (4.2s).

Deferred to separate follow-ups (per #2910 plan):
- PR-C: MOLECULE_DEPLOYMENT_MODE explicit deployment-mode signal
  (closes the IsSaaS()=cpProv!=nil structural fragility)
- PR-D: Host iptables IMDS block + IMDSv2 hop-limit (paired with
  molecule-controlplane EC2-IAM-scope audit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:38:22 -07:00
Hongming Wang
1052f8bdb0 fix(memory-plugin): bind to 127.0.0.1 by default
Self-review of PR #2906 flagged: defaultListenAddr was ":9100" — binds
on every container interface. Inside today's deployment that's moot
(no host port mapping, platform talks over loopback) but it's not
least-privilege. A future Dockerfile edit that publishes the port,
a misconfigured Fly machine, or a future cross-host plugin topology
would expose an unauth'd memory store.

Loopback is the right baseline. Operators with a multi-host topology
already override via MEMORY_PLUGIN_LISTEN_ADDR — that path is unchanged.

Tests:
  * TestLoadConfig_DefaultListenAddrIsLoopback pins the new default.
  * TestLoadConfig_ListenAddrEnvOverride pins the override path so
    operators relying on it don't break.
  * TestLoadConfig_MissingDatabaseURL covers the existing fail-fast.

No prior unit tests existed for loadConfig — boot_e2e_test.go always
sets MEMORY_PLUGIN_LISTEN_ADDR explicitly, so the default was never
exercised by tests. This PR adds that coverage.

Refs RFC #2728. Hardening follow-up to PR #2906.
2026-05-05 11:35:24 -07:00
Hongming Wang
30fb507165 feat(poll-upload): phase 5b — concurrent BatchFetcher + httpx client reuse
Resolves the two remaining findings from the Phase 1-4 retrospective
review (the Python-side counterparts to phase 5a):

1. Important — inbox_uploads.fetch_and_stage blocked the inbox poll
   loop synchronously per row. A user dragging 4 files into chat at
   once would stall the poller for 4× per-fetch latency before the
   chat message reached the agent. Add BatchFetcher: a thread-pool
   wrapper (default 4 workers) that submits fetches concurrently and
   exposes wait_all() as the barrier the inbox loop calls before
   processing the chat-message row that references the uploads.

   The drain barrier is the correctness invariant: rewrite_request_body
   must observe a populated URI cache when it walks the chat-message
   row's parts. _poll_once now drains the BatchFetcher inline before
   the first non-upload row, AND at end-of-batch (case: batch contains
   only upload rows; the corresponding chat message arrives in a later
   poll, but the future-poll-races-current-fetch race is closed).

2. Nit — fetch_and_stage created two httpx.Client instances per row
   (one for GET /content, one for POST /ack). Refactor so a single
   client serves both calls. When called from BatchFetcher, the
   batch-shared client serves every row's GET + ack — so the second
   fetch reuses the TCP+TLS handshake from the first.
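
A rough sketch of the BatchFetcher shape described above: thread-pool
submits, a wait_all() drain barrier, and one shared httpx.Client per
batch (assumes the existing fetch_and_stage(row, client=...) from
inbox_uploads; internals are illustrative):

    import concurrent.futures
    import httpx

    class BatchFetcher:
        def __init__(self, max_workers=4, client=None):
            self._pool = concurrent.futures.ThreadPoolExecutor(max_workers)
            self._owns_client = client is None
            # One TCP+TLS session serves every row's GET /content + POST /ack.
            self._client = client if client is not None else httpx.Client()
            self._futures = []

        def submit(self, row):
            self._futures.append(
                self._pool.submit(fetch_and_stage, row, client=self._client))

        def wait_all(self):
            # Drain barrier: _poll_once calls this before the chat-message
            # row so rewrite_request_body sees a populated URI cache.
            for fut in self._futures:
                try:
                    fut.result()
                except Exception:
                    pass   # per-row failure isolation; the real code logs it
            self._futures.clear()

        def close(self):
            self._pool.shutdown(wait=True)
            if self._owns_client:
                self._client.close()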

Comprehensive tests:

- 13 new inbox_uploads tests:
  - fetch_and_stage with supplied client: zero httpx.Client
    constructions, GET+POST through the same client, caller's client
    not closed (lifecycle owned by caller).
  - fetch_and_stage without supplied client: exactly one
    httpx.Client constructed (was 2 pre-fix), closed on the way out.
  - BatchFetcher: 3 rows × 120ms = parallel completion < 250ms
    (vs. ~360ms serial), URI cache hot when wait_all returns,
    per-row failure isolation, single-client reuse across all
    submits, idempotent close, submit-after-close raises,
    owned-vs-supplied client lifecycle, no-op wait_all on empty
    batch, graceful httpx-missing degradation.

- 3 new inbox tests:
  - poll_once drains uploads before processing the chat-message row
    (in-place mutation of row['request_body'] proves the URI was
    rewritten BEFORE message_from_activity returned).
  - poll_once with only upload rows still drains at end-of-batch.
  - poll_once with no upload rows never constructs a BatchFetcher
    (zero overhead on the no-upload happy path).

133 total inbox + inbox_uploads tests pass; 0 regressions.

Closes the chat-upload poll-mode-perf gap end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:26:55 -07:00
Hongming Wang
5334d60de4
Merge pull request #2898 from Molecule-AI/2867-workspaces-insert-allowlist
test(handlers): allowlist INSERT INTO workspaces sites (#2867 class 1)
2026-05-05 18:18:19 +00:00
Hongming Wang
d6c0227e3f
Merge pull request #2906 from Molecule-AI/feat/memory-plugin-sidecar-bundle
feat(memory-v2): bundle memory-plugin-postgres as in-image sidecar
2026-05-05 18:16:57 +00:00
Hongming Wang
27db090d3d
Merge pull request #2907 from Molecule-AI/feat/poll-mode-chat-upload-phase5a
feat(poll-upload): phase 5a — atomic batch insert + acked-index + mime hardening
2026-05-05 11:16:56 -07:00
Hongming Wang
0f25f6de97 test(handlers): allowlist INSERT INTO workspaces sites — close bulk-create regression class (#2867 class 1)
Adds TestINSERTworkspacesAllowlist: walks every non-test .go in this
package, finds funcs containing an `INSERT INTO workspaces (` SQL
literal, and pins the result against an explicit allowlist with the
safety mechanism named per entry.

New entries fail the build until a reviewer adds them — forcing the
question "what makes this INSERT idempotent?" at PR-review time, not
after the next bulk-create leak (the shape that produced 72 stale
child workspaces in tenant-hongming over 4 days).

Pairs with TestCreateWorkspaceTree_CallsLookupBeforeInsert (the
behavior pin for the one bulk path today). Together:
- this test catches "did a new function start inserting?"
- that test catches "did the existing bulk path drop its idempotency check?"

Both fire immediately when drift happens.

Current allowlist (3 entries):
- org_import.go:createWorkspaceTree → lookup-then-insert via
  lookupExistingChild (#2868 phase 3, also pinned by the sibling AST
  gate from #2895)
- registry.go:Register → ON CONFLICT (id) DO UPDATE (idempotent by
  primary key — external workspace upsert)
- workspace.go:Create → single-workspace POST /workspaces, server-
  generated UUID, no iteration

Verified via mutation: dropping a synthetic tempBulkLeakTest with an
unsafe loop+INSERT into the package fails the gate with a clear
diagnostic pointing at the file + function. Restoring the tree
returns the gate to green.

Memory: feedback_assert_exact_not_substring.md (verify tightened test
FAILS on bug shape) — mutation proof done locally.

RFC #2867 class 1. Class 2 (Prometheus gauge for ec2_instance
duplicates) + class 3 (structured logging on workspace create) are
follow-up PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:15:16 -07:00
Hongming Wang
9991057ad1 feat(poll-upload): phase 5a — atomic batch insert + acked-index + mime hardening
Resolves four of six findings from the retrospective code review of Phases
1–4 (poll-mode chat upload). Bundled because every change is in the
platform's pending_uploads layer or the multi-file handler that reads it.

Findings resolved:

1. Important — Sweep query lacked an index for the acked-retention OR-arm.
   The Phase 1 partial indexes are both `WHERE acked_at IS NULL`, so the
   `(acked_at IS NOT NULL AND acked_at < retention)` half of the WHERE
   clause seq-scanned the table on every cycle. Add a complementary
   partial index on `acked_at WHERE acked_at IS NOT NULL` so both arms
   of the disjunction are index-covered. Disjoint from the existing two
   indexes (no row matches both predicates), so write amplification is
   bounded to ~one index entry per terminal-state row.

2. Important — uploadPollMode partial-failure left orphans. The previous
   per-file Put loop committed rows 1..K-1 and then errored on row K with
   no compensation, so a client retry would double-insert the survivors.
   Refactor the handler into three explicit phases (pre-validate +
   read-into-memory, single atomic PutBatch, per-file activity row) and
   add Storage.PutBatch with all-or-nothing transaction semantics.

3. FYI — pendinguploads.StartSweeperWithInterval was exported only for
   tests. Move it to lower-case startSweeperWithInterval and expose the
   test seam through pendinguploads/export_test.go (Go convention; the
   shim file is stripped from the production binary at build time).

4. Nit — multipart Content-Type was passed verbatim into pending_uploads
   rows and re-served on /content. Add safeMimetype which strips
   parameters, rejects CR/LF/control bytes, and coerces malformed shapes
   to application/octet-stream. The eventual GET /content response can no
   longer be header-split via a crafted Content-Type on the multipart.
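
Finding 4's coercion is the usual sanitize-or-fallback shape; a Python
sketch of the same idea (the project's safeMimetype is Go, and the exact
token set here is illustrative):

    import re

    _MIME_SHAPE = re.compile(r"[0-9A-Za-z][0-9A-Za-z!#$&^_.+-]*"
                             r"/[0-9A-Za-z][0-9A-Za-z!#$&^_.+-]*")

    def safe_mimetype(raw: str) -> str:
        # Strip parameters ("; charset=..."), then coerce anything empty,
        # control-byte-carrying, or not shaped like type/subtype.
        base = raw.split(";", 1)[0].strip()
        if not base or any(ord(c) < 0x20 or ord(c) == 0x7f for c in base):
            return "application/octet-stream"
        if not _MIME_SHAPE.fullmatch(base):
            return "application/octet-stream"
        return base

    assert safe_mimetype("text/plain; charset=utf-8") == "text/plain"
    assert safe_mimetype("text/html\r\nSet-Cookie: x") == "application/octet-stream"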

Comprehensive tests:

- 10 PutBatch unit tests (sqlmock): happy path, empty input, all four
  pre-validation rejection paths, BeginTx error, per-row error +
  Rollback (no Commit), first-row error, Commit error.
- 4 new PutBatch integration tests (real Postgres): all-rows-commit
  happy path with COUNT(*) verification, atomic-rollback no-leak via
  a NUL-byte filename that lib/pq rejects mid-batch, oversize
  short-circuit no-Tx, idx_pending_uploads_acked existence + partial
  predicate via pg_indexes (planner-shape-independent).
- 3 new chat_files_poll tests: atomic rollback on second-file oversize,
  atomic rollback on PutBatch error, mimetype CRLF/NUL/parameter
  sanitization (8 sub-cases).

The two remaining review findings (inbox_uploads.fetch_and_stage blocks
the poll loop synchronously; two httpx Clients per row) are Python-side
and ship in Phase 5b once this lands on staging.

Test-only export pattern via export_test.go, atomic pre-validation
discipline (validate before Tx), and behavior-based (not name-based)
test assertions follow the standing project conventions.
2026-05-05 11:10:13 -07:00
Hongming Wang
b89a49ec93 feat(memory-v2): bundle memory-plugin-postgres as in-image sidecar
Closes the gap between the merged Memory v2 code (PR #2757 wired the
client into main.go) and operator activation. Without this PR an
operator wanting to flip MEMORY_V2_CUTOVER=true had to provision a
separate memory-plugin service and point MEMORY_PLUGIN_URL at it —
extra ops surface for what the design intends to be a built-in.

What ships:
  * Both Dockerfile + Dockerfile.tenant build the
    cmd/memory-plugin-postgres binary into /memory-plugin.
  * Entrypoints spawn the plugin in the background on :9100 BEFORE
    starting the main server; wait up to 30s for /v1/health to return
    200; abort boot loudly if it doesn't (better to crash-loop than to
    silently route cutover traffic against a dead plugin).
  * Default env: MEMORY_PLUGIN_DATABASE_URL=$DATABASE_URL (share the
    existing tenant Postgres — plugin's `memory_namespaces` /
    `memory_records` tables coexist with platform schema, no
    conflicts), MEMORY_PLUGIN_LISTEN_ADDR=:9100.
  * MEMORY_PLUGIN_DISABLE=1 escape hatch for operators running the
    plugin externally on a separate host.
  * Platform image: plugin runs as the `platform` user (not root) via
    su-exec — matches the privilege boundary the main server already
    drops to. Tenant image already starts as `canvas` so the plugin
    inherits non-root automatically.

What stays operator-controlled:
  * MEMORY_V2_CUTOVER is NOT auto-set. Behavior change for existing
    deployments: zero. The wiring at workspace-server/internal/memory/
    wiring/wiring.go skips building the plugin client until the
    operator opts in, so the running sidecar is a no-op for traffic
    until then.
  * MEMORY_PLUGIN_URL is NOT auto-set either, for the same reason —
    setting it implies cutover-active intent. Operators set both on
    staging first, verify a live commit/recall round-trip (closes
    pending task #292), then promote to production.

Operator activation steps after this PR ships:
  1. Verify pgvector extension is available on the target Postgres
     (the plugin's first migration runs CREATE EXTENSION IF NOT
     EXISTS vector). Railway's managed Postgres ships pgvector
     available; some self-hosted operators may need to enable it.
  2. Redeploy the workspace-server with this image.
  3. Set MEMORY_PLUGIN_URL=http://localhost:9100 + MEMORY_V2_CUTOVER=true
     in the environment (staging first).
  4. Watch boot logs for "memory-plugin:  sidecar healthy" and the
     wiring.go cutover messages; do a live commit_memory + recall_memory
     round-trip via the canvas Memory tab to verify.
  5. Promote to production once staging holds for a sweep window.

Refs RFC #2728. Closes the dormant-plugin gap noted in task #294.
2026-05-05 11:10:11 -07:00
Hongming Wang
f5613bf099
Merge pull request #2902 from Molecule-AI/fix-pendinguploads-sweeper-test-race
test(pendinguploads): close cycleDone-vs-metric-record race in sweeper tests
2026-05-05 18:02:21 +00:00
Hongming Wang
9bd2a2c45f
Merge pull request #2903 from Molecule-AI/fix/chat-tab-initial-scroll-bottom
fix(canvas/chat): instant-scroll to bottom on first mount
2026-05-05 17:50:42 +00:00
Hongming Wang
a489ee1a7c fix(canvas/chat): instant-scroll to bottom on first mount
Reported: "right now when chat box opens it opens in the middle, but
it should be at the end of conversation."

Root cause: ChatTab.tsx:548 fires `bottomRef.scrollIntoView({ behavior:
"smooth" })` on every messages-update. On initial mount with N
messages already loaded, the smooth-scroll triggers a ~300ms animation
that any concurrent React re-render (agent push landing, theme
toggle, sidepanel resize) interrupts mid-flight, leaving the user
stuck somewhere in the middle of the conversation.

Fix: track first-mount via hasInitialScrollRef. Use behavior:"instant"
for the initial jump (deterministic, no animation interruption), then
smooth for subsequent appends (the new-message-landing visual stays).

The ref flips on the first messages.length > 0 transition, so:
- Initial open of chat tab: instant jump to bottom ✓
- New agent message arrives: smooth scroll into view ✓
- Workspace switch (ChatTab remounts): fresh hasInitialScrollRef, gets
  instant again ✓
- loadOlder prepend: anchor-restore path unchanged, still pins user's
  reading position ✓

Test plan:
- pnpm test --run ChatTab.lazyHistory.test.tsx → 8 pass (existing
  lazy-history tests untouched)
- npx tsc --noEmit clean
- Manual on hongming.moleculesai.app: open a busy chat (mac laptop,
  ~50 messages), confirm view lands at the latest bubble, not mid-
  scroll. Switch to another workspace + back → instant again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:47:32 -07:00
Hongming Wang
c79ba05ed5 test(pendinguploads): close cycleDone-vs-metric-record race in sweeper tests
TestStartSweeper_RecordsMetricsOnError flaked on every CI rerun under
race detection: `error counter delta = 0, want 1`. Root cause is a
race between two goroutines, not a bug in the production sweeper.

The fake `fakeSweepStorage.Sweep` signals `cycleDone` from inside its
deferred return — that happens BEFORE Sweep's return value is
received by `sweepOnce`, which is what triggers the metric increment.
On slow CI hosts the test goroutine wins the read after `waitForCycle`
unblocks and BEFORE StartSweeper's goroutine has called
`metrics.PendingUploadsSweepError`, so the asserted delta is 0 even
though the metric WILL be 1 a few ms later.

Adds a polling assert helper, `waitForMetricDelta`, that closes the
race deterministically without timing-based sleeps:

- TestStartSweeper_RecordsMetricsOnError uses waitForMetricDelta to
  wait for the error counter to settle at 1.
- TestStartSweeper_RecordsMetricsOnSuccess uses it on the success
  counters (acked, expired) so the error-stayed-zero assertion
  reads after StartSweeper has fully processed the cycle.
- waitForCycle keeps its current shape but documents the caveat in
  its comment so future tests don't repeat the assumption.
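
The helper is the standard poll-until-settled assertion; its shape in
Python (the project's waitForMetricDelta is Go; this rendering is
illustrative only):

    import time

    def wait_for_metric_delta(read_metric, baseline, want,
                              timeout_s=2.0, step_s=0.005):
        """Poll until read_metric() - baseline reaches want, else fail.

        Replaces a fixed sleep: returns as soon as the background
        goroutine has recorded the metric, fails loudly after timeout_s.
        """
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if read_metric() - baseline == want:
                return
            time.sleep(step_s)
        raise AssertionError(
            f"metric delta = {read_metric() - baseline}, want {want}")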

Verified: `go test ./internal/pendinguploads/ -race -count 5` passes
all 9 tests across 5 iterations cleanly.

Per memory feedback_question_test_when_unexpected.md: the
"delta=0, want=1" failure looked like a real production bug at first
glance, but instrumented inspection showed the metric DOES increment,
just AFTER the test's read. The fix is the test's wait shape, not
the sweeper.

Unblocks every PR currently broken by this flake (#2898 hit it on
two consecutive CI runs; staging-merged PRs from earlier today
(#2877/#2881/#2885/#2886) introduced the test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:46:17 -07:00
Hongming Wang
6470e5f41b
Merge pull request #2887 from Molecule-AI/refactor/a2a-tools-delegation-extract-rfc2873-iter4b
refactor(workspace): extract delegation handlers from a2a_tools.py (RFC #2873 iter 4b)
2026-05-05 17:40:40 +00:00
Hongming Wang
aa560c0314
Merge pull request #2901 from Molecule-AI/feat/saas-default-t4
feat(saas): default new workspaces to T4 on SaaS, T3 self-hosted
2026-05-05 17:34:08 +00:00
Hongming Wang
7644e82f2f feat(saas): default new workspaces to T4 on SaaS, T3 self-hosted
User reported every SaaS workspace defaults to T2 (Standard). Three
sites quietly disagreed on the default:

- canvas CreateWorkspaceDialog (line 126): isSaaS ? 4 : 3   ← only correct one
- canvas EmptyState "Create blank":      tier: 2            ← hardcoded
- workspace.go POST /workspaces:         tier = 3           ← not SaaS-aware
- org_import.go createWorkspaceTree:     tier = 2 (fallback) ← not SaaS-aware

So a user clicking "+ New Workspace" via the dialog got T4 on SaaS,
but a user clicking "Create blank" on the empty canvas got T2, and an
agent POSTing /workspaces directly got T3. Same tenant, three different
tiers depending on entry point.

Fix:

1. WorkspaceHandler.IsSaaS() and DefaultTier() helpers (workspace_dispatchers.go).
   IsSaaS() := h.cpProv != nil — single source of truth for "are we
   SaaS" across the file. DefaultTier() returns 4 on SaaS, 3 on
   self-hosted. SaaS rationale: each workspace runs on its own sibling
   EC2 so the per-workspace tier boundary is a Docker resource limit
   on the only container present — no neighbour to protect from. T4
   matches the boundary.

2. workspace.go now defaults tier via h.DefaultTier() instead of
   hardcoded T3.

3. org_import.go fallback (when neither ws.tier nor defaults.tier set)
   becomes SaaS-aware: T4 on SaaS, T2 on self-hosted (preserve the
   existing safe-shared-Docker-daemon default for self-hosted org
   imports).

4. canvas EmptyState "Create blank" stops sending tier:2 in the body
   and lets the backend pick — single source of truth in the backend.
   Eliminates the third disagreement.
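
Item 1's single source of truth boils down to a two-line decision; an
illustrative Python rendering of the Go helpers (cp_prov stands in for
h.cpProv):

    def is_saas(cp_prov) -> bool:
        # "Control-plane provisioner wired" is the current SaaS signal.
        return cp_prov is not None

    def default_tier(cp_prov) -> int:
        # T4 on SaaS: each workspace runs on its own sibling EC2, so the
        # tier boundary is a Docker limit on the only container present.
        # T3 on self-hosted shared-Docker deployments.
        return 4 if is_saas(cp_prov) else 3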

Test plan:
- go vet ./... clean
- go test ./internal/handlers/ -count 1 — all green (4.3s)
- npx tsc --noEmit on canvas — clean
- Staging E2E (after deploy): create a fresh workspace via canvas
  empty-state on hongming.moleculesai.app, confirm tier=4 on the
  workspace details panel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:30:22 -07:00
Hongming Wang
abba16beb4
Merge pull request #2883 from Molecule-AI/refactor/a2a-tools-rbac-extract-rfc2873-iter4a
refactor(workspace): extract RBAC helpers from a2a_tools.py (RFC #2873 iter 4a)
2026-05-05 16:59:36 +00:00
Hongming Wang
9c752e0673
Merge pull request #2879 from Molecule-AI/refactor/mcp-cli-split-rfc2873-iter3
refactor(workspace): split mcp_cli.py into focused modules (RFC #2873 iter 3)
2026-05-05 16:58:05 +00:00
Hongming Wang
be18b9c8f9 fix(tests): retarget remaining a2a_tools delegation patches to a2a_tools_delegation
CI caught two test files I missed in the original iter 4b retarget:
test_a2a_multi_workspace.py + test_delegation_sync_via_polling.py
patch a2a_tools.{discover_peer, send_a2a_message, _delegate_sync_via_polling,
httpx.AsyncClient} but those call sites moved to a2a_tools_delegation
in this PR. 17 patch sites retargeted; 30 tests now green.

Refs RFC #2873 iter 4b.
2026-05-05 09:50:30 -07:00
Hongming Wang
2cb1b26512
Merge pull request #2895 from Molecule-AI/2872-workspaces-unique-parent-name
test(org-import): tighten idempotency gate AST → discriminate workspaces vs lookalikes (#2872 Imp-1)
2026-05-05 16:03:26 +00:00
Hongming Wang
48d1945269 test(org-import): tighten AST gate to discriminate workspaces vs lookalikes (#2872 Imp-1)
The previous TestCreateWorkspaceTree_CallsLookupBeforeInsert used
bytes.Index("INSERT INTO workspaces"), which prefix-matches
INSERT INTO workspaces_audit, INSERT INTO workspace_secrets, and
INSERT INTO workspace_channels. RFC #2872 cited this as a silent
false-pass mode: a future refactor that adds an audit-table INSERT
literal earlier in source than the real workspaces INSERT would
make the gate point at the wrong target.

Replaces the byte-search with a go/ast walk + a regex that requires
`\s*\(` after `workspaces` — distinguishes the real target from
prefix lookalikes.
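
The discriminating regex is small; the same pattern in Python (the gate
itself is Go + go/ast; this only shows the regex shape):

    import re

    # The real target is always followed by a column list; prefix
    # lookalikes (workspaces_audit, workspace_secrets, ...) are not.
    WORKSPACES_INSERT_RE = re.compile(r"INSERT\s+INTO\s+workspaces\s*\(")

    assert WORKSPACES_INSERT_RE.search("INSERT INTO workspaces (id, parent_id)")
    assert not WORKSPACES_INSERT_RE.search("INSERT INTO workspaces_audit (ws_id)")
    assert not WORKSPACES_INSERT_RE.search("UPDATE workspaces SET name = $1")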

Adds three discriminating tests:
- TestWorkspacesInsertRE_RejectsLookalikes — pins the regex against
  9 sql shapes (real, raw-string-literal, audit-shadow, workspace_*
  prefixes, canvas_layouts, UPDATE/SELECT, comments).
- TestGate_FailsWhenLookupAfterInsert — synthesizes Go source where
  the lookup is positioned AFTER the workspaces INSERT, asserts the
  helper returns lookupPos > insertPos (which the production gate
  flags via t.Errorf). Proves the gate isn't vestigial.
- TestGate_IgnoresAuditTableShadow — synthesizes source with an
  audit-table INSERT BEFORE the lookup + real INSERT, asserts the
  tightened regex correctly walks past the shadow and finds the
  real INSERT.

Also extracts findLookupAndWorkspacesInsertPos as a helper so the
gate logic can be exercised against synthetic source, not only
against the real org_import.go.

Memory: feedback_assert_exact_not_substring.md (verify tightened
test FAILS on old code) — TestGate_FailsWhenLookupAfterInsert is
the failing-on-bug-shape proof.

Closes the silent-false-pass mode of #2872 Important-1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:32:56 -07:00
Hongming Wang
bbec4cfcfb
Merge pull request #2886 from Molecule-AI/feat/poll-mode-chat-upload-phase4
test(rfc): poll-mode chat upload — phase 4 real-Postgres integration
2026-05-05 12:13:17 +00:00
Hongming Wang
e50799bc29 test(rfc): poll-mode chat upload — phase 4 real-Postgres integration
Phase 4 closes out the rollout — strict-sqlmock unit tests pin which
SQL fires, but they cannot detect bugs that depend on the actual row
state after the SQL runs. Real-Postgres integration tests catch:

  - the Sweep CTE depends on Postgres' make_interval function and
    the table's CHECK constraints; sqlmock would happily accept a
    hand-written SQL literal that Postgres rejects at runtime.
  - the partial idx_pending_uploads_unacked index only catches a
    wrong WHERE predicate at real-query-plan time.
  - subtle predicate drift (e.g. a WHERE clause that filters by
    acked_at IS NOT NULL but uses BETWEEN incorrectly).

Test cases:

  - PutGetAckRoundTrip: the full happy path — Put, Get, MarkFetched,
    Ack, idempotent re-Ack, Get-after-Ack returns ErrNotFound.
  - Sweep_DeletesAckedAfterRetention: row not eligible at retention=1h
    immediately after Ack; deleted at retention=0.
  - Sweep_DeletesExpiredUnacked: backdated expires_at exercises the
    unacked-and-expired branch of the WHERE clause.
  - Sweep_DeletesBothCategoriesInOneCycle: three rows (acked, expired,
    fresh); a single Sweep deletes the first two and leaves the third.
  - PutEnforcesSizeCap: ErrTooLarge above MaxFileBytes.
  - GetIgnoresExpiredAndAcked: Get's filter predicate matches the
    expected row state in the table.

Run path:
  - locally via the file-header docker incantation.
  - CI runs on every PR/push that touches handlers/** OR migrations/**
    (.github/workflows/handlers-postgres-integration.yml).
2026-05-05 05:04:41 -07:00
Hongming Wang
07839580a0
Merge pull request #2885 from Molecule-AI/feat/poll-mode-chat-upload-phase3
feat(rfc): poll-mode chat upload — phase 3 GC sweep + observability
2026-05-05 05:04:19 -07:00
Hongming Wang
2227a14b1e fix(build): add a2a_tools_delegation to TOP_LEVEL_MODULES drift gate
Iter 4b's new module needs the rewrite-list entry. Stacked on iter 4a
which already added a2a_tools_rbac.

Refs RFC #2873 iter 4b.
2026-05-05 05:01:04 -07:00
Hongming Wang
e72f9ad107 refactor(workspace): extract delegation handlers from a2a_tools.py to a2a_tools_delegation.py (RFC #2873 iter 4b)
Second slice of the a2a_tools.py split (stacked on iter 4a). Owns the
three delegation MCP tools + the RFC #2829 PR-5 sync-via-polling
helper they share:

  * tool_delegate_task — synchronous delegation
  * tool_delegate_task_async — fire-and-forget
  * tool_check_task_status — poll the platform's /delegations log
  * _delegate_sync_via_polling — durable async + poll for terminal status
  * _SYNC_POLL_INTERVAL_S / _SYNC_POLL_BUDGET_S constants

a2a_tools.py shrinks from 915 → 609 LOC (−306). Stacked on iter 4a's
RBAC extraction; uses `from a2a_tools_rbac import auth_headers_for_heartbeat`
as its auth-header source.

The lazy `from a2a_tools import report_activity` inside tool_delegate_task
breaks the circular-import cycle (a2a_tools imports the delegation
re-exports at module-load; delegation handler needs report_activity at
CALL time). A dedicated test pins this contract.
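
Schematically, the cycle break is only about where the import sits
(bodies elided; module names from the commit):

    # a2a_tools.py: imports the delegation re-exports at module load,
    #   from a2a_tools_delegation import tool_delegate_task, ...

    # a2a_tools_delegation.py
    def tool_delegate_task(*args, **kwargs):
        # Lazy on purpose: a module-level "from a2a_tools import
        # report_activity" would close the circle while a2a_tools is
        # still mid-import of this module.
        from a2a_tools import report_activity
        ...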

Tests:
  * 77 existing test_a2a_tools_impl.py tests pass after retargeting
    20 patch sites in TestToolDelegateTask + TestToolDelegateTaskAsync +
    TestToolCheckTaskStatus from `a2a_tools.foo` to
    `a2a_tools_delegation.foo` (foo ∈ {discover_peer, send_a2a_message,
    httpx.AsyncClient}). The patches need to target the new module
    because that's where the call sites live now.
  * test_a2a_tools_delegation.py adds 8 new tests:
    - 6 alias drift gates (`a2a_tools.tool_delegate_task is …`)
    - 2 import-contract tests (no top-level circular dep + a2a_tools
      surfaces every delegation symbol)
    - 1 sync-poll budget invariant

113 tests total (77 impl + 28 rbac + 8 delegation), all green.

Refs RFC #2873.
2026-05-05 05:00:52 -07:00
Hongming Wang
17aec22f9b fix(build): add a2a_tools_rbac to TOP_LEVEL_MODULES drift gate
Iter 4a's new module needs to be in the rewrite list so the wheel
ships its imports prefixed correctly. Caught by 'PR-built wheel +
import smoke'.

Refs RFC #2873 iter 4a.
2026-05-05 05:00:47 -07:00
Hongming Wang
8388144098 fix(build): add iter-3 mcp_* modules to TOP_LEVEL_MODULES drift gate
The iter-3 split created mcp_heartbeat / mcp_inbox_pollers /
mcp_workspace_resolver but the wheel build's drift-gate check at
scripts/build_runtime_package.py:TOP_LEVEL_MODULES wasn't updated.
Without this fix the wheel ships those modules un-rewritten, so
their imports of platform_auth / configs_dir / etc. break at
runtime. Caught by the 'PR-built wheel + import smoke' check.

Refs RFC #2873 iter 3.
2026-05-05 05:00:29 -07:00
Hongming Wang
a327d207da feat(rfc): poll-mode chat upload — phase 3 GC sweep + observability
Phase 3 of the poll-mode chat upload rollout. Stacked atop Phase 2.

The platform's pending_uploads table grows once-per-uploaded-file with
no built-in cleanup. Phase 1's hard TTL (expires_at default 24h) makes
expired rows un-fetchable but doesn't actually delete them; Phase 1's
ack stamps acked_at but leaves the row indefinitely. Without a sweep
the table grows unbounded across normal traffic.

This PR adds:

- `Storage.Sweep(ctx, ackRetention)` — a single round-trip CTE that
  deletes acked rows past their retention window plus unacked rows
  past expires_at. Returns `(acked, expired)` deletion counts so
  Phase 3 dashboards can spot the stuck-fetch pattern (high expired,
  low acked) vs healthy churn.
- `pendinguploads.StartSweeper(ctx, storage, ackRetention)` —
  background goroutine that calls Sweep every 5 minutes (default).
  Runs once immediately on startup so a platform restart cleans up
  any rows that became eligible while we were down.
- Prometheus counters `molecule_pending_uploads_swept_total` with
  `outcome={acked,expired,error}` labels. Wired into the existing
  `/metrics` endpoint.
- Wired from cmd/server/main.go via supervised.RunWithRecover —
  one transient panic doesn't take the platform down with it.

Defaults:
  - SweepInterval = 5m (matches the dashboard refresh cadence)
  - DefaultAckRetention = 1h (gives the workspace at-least-once retry
    headroom in case it processed but failed to write the file before
    crashing)
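
The run-once-immediately-then-tick loop, as a Python sketch (the
project's StartSweeper is a Go goroutine; names and defaults follow the
commit, the loop body is illustrative):

    import threading

    SWEEP_INTERVAL_S = 5 * 60          # matches the dashboard refresh cadence
    DEFAULT_ACK_RETENTION_S = 60 * 60  # at-least-once retry headroom

    def start_sweeper(sweep, stop_event, interval_s=SWEEP_INTERVAL_S):
        """sweep() deletes eligible rows, returns (acked, expired) counts."""
        def _run():
            # Sweeping before the first wait means a restart immediately
            # cleans up rows that became eligible while we were down.
            while not stop_event.is_set():
                try:
                    acked, expired = sweep()   # feed the swept_total counters
                except Exception:
                    pass   # count as outcome=error; don't kill the loop
                stop_event.wait(interval_s)
        threading.Thread(target=_run, daemon=True).start()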

Test coverage: 100% on storage_test.go (extended with sweepSQL pin +
six Sweep test cases including negative-retention clamp + zero-retention
immediate-delete + DB error wrapping) and sweeper_test.go (ticker-driven
+ ctx-cancel + nil-storage + transient-error-doesn't-crash + metric
counter assertions).

Closes the third of four phases tracked on the parent RFC; phase 4 is
the staging E2E test.
2026-05-05 05:00:13 -07:00
Hongming Wang
529c3f3922
Merge pull request #2884 from Molecule-AI/feat/phantom-busy-reset-metric-2865-1777976000
feat(metrics): add molecule_phantom_busy_resets_total counter (#2865)
2026-05-05 11:50:28 +00:00
Hongming Wang
c778b62202 feat(metrics): add molecule_phantom_busy_resets_total counter (#2865)
Closes #2865 (split-B of the #2669 root-cause stack).

The phantom-busy sweep in workspace-server/internal/scheduler/scheduler.go
already logs each row reset, but no aggregate metric surfaces "how often
is this firing." A regression that causes high reset rates (e.g.
controlplane#481's missing env vars, or future drift in the workspace
runtime's task-lifecycle accounting) only surfaces when users complain.

Fix: counter exposed at /metrics as molecule_phantom_busy_resets_total,
incremented from sweepPhantomBusy after each row whose active_tasks
was reset. Same shape as existing molecule_websocket_connections_active.
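
Illustratively, the same metric shape with the Python client (the
server-side counter is Go; reset_active_tasks stands in for the existing
per-row sweep step):

    from prometheus_client import Counter

    PHANTOM_BUSY_RESETS = Counter(
        "molecule_phantom_busy_resets",   # client appends "_total"
        "Rows whose phantom-busy active_tasks count was reset by the sweep",
    )

    def sweep_phantom_busy(rows):
        for row in rows:
            reset_active_tasks(row)    # existing per-row reset + log line
            PHANTOM_BUSY_RESETS.inc()  # one increment per reset row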

Operator-side dashboard: alert when daily phantom-busy reset count
> 0.5% of active workspaces. Today's steady-state is near-zero; any
increase is a regression signal.

Tests:
  - TestTrackPhantomBusyReset_IncrementsCounter
  - TestTrackPhantomBusyReset_RaceFreeUnderConcurrentWrites (50×200
    concurrent writes; tests atomic invariant)
  - TestHandler_ExposesPhantomBusyResetsCounter (asserts HELP + TYPE
    + value lines in Prometheus text format)
  - TestHandler_PhantomBusyResetsZeroByDefault (fresh-process 0
    contract — prevents a future refactor from accidentally dropping
    the metric from /metrics)

Race-detector clean. Vet clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:45:24 -07:00
Hongming Wang
d80bffe3e3
Merge pull request #2881 from Molecule-AI/feat/poll-mode-chat-upload-phase2
feat(rfc): poll-mode chat upload — phase 2 workspace inbox extension
2026-05-05 11:44:36 +00:00
Hongming Wang
0c461eb9f1 refactor(workspace): extract RBAC helpers from a2a_tools.py to a2a_tools_rbac.py (RFC #2873 iter 4a)
First slice of the a2a_tools.py (991 LOC) split — single-concern module
for the workspace's RBAC + auth-header layer:

  * _ROLE_PERMISSIONS canonical table
  * _get_workspace_tier
  * _check_memory_write_permission
  * _check_memory_read_permission
  * _is_root_workspace
  * _auth_headers_for_heartbeat

a2a_tools.py shrinks from 991 → 915 LOC. Internal call sites (15
references) work unchanged because the bare names are re-imported at
module-level — Python's local-then-module name resolution still
finds them in a2a_tools's namespace, so existing tests'
patch("a2a_tools._foo", …) keeps working.
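
Concretely, the compatibility trick is a module-level aliased re-import
(exact source-side names are an assumption inferred from the `_foo is
rbac.foo` drift gates below):

    # a2a_tools.py: keep owning the bare names after the split.
    from a2a_tools_rbac import (
        get_workspace_tier as _get_workspace_tier,
        check_memory_write_permission as _check_memory_write_permission,
        is_root_workspace as _is_root_workspace,
        auth_headers_for_heartbeat as _auth_headers_for_heartbeat,
    )

    # Existing tests keep passing unchanged: mock.patch swaps the attribute
    # on a2a_tools, and the internal call sites resolve the bare name
    # through this same module namespace.
    #   unittest.mock.patch("a2a_tools._check_memory_write_permission", ...)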

The RBAC layer can now evolve independently of the 18 tool handlers.
Adding a new role or capability action touches one file, not the
kitchen-sink module.

Tests:
  * 77 existing test_a2a_tools_impl.py pass unchanged.
  * test_a2a_tools_rbac.py adds 28 focused tests:
    - 6 alias drift-gate tests (`_foo is rbac.foo`)
    - 4 get_workspace_tier env+config branches
    - 2 is_root_workspace tier branches
    - 6 check_memory_write_permission roles + override branches
    - 3 check_memory_read_permission scenarios
    - 3 auth_headers_for_heartbeat platform_auth branches
    - 4 ROLE_PERMISSIONS table invariants
  * Direct coverage for the helper module (was previously only
    exercised through 991-LOC tool-handler tests).

Refs RFC #2873.
2026-05-05 04:43:16 -07:00
Hongming Wang
86015412eb build(runtime): register inbox_uploads in TOP_LEVEL_MODULES
The drift gate in build_runtime_package.py rejects any workspace/*.py
module not listed in TOP_LEVEL_MODULES — otherwise it would ship
un-rewritten and break wheel imports. Add inbox_uploads (introduced in
this PR) to the list.
2026-05-05 04:41:07 -07:00
Hongming Wang
f81813f708 feat(rfc): poll-mode chat upload — phase 2 workspace inbox extension
Workspace-side fetcher for the platform-staged chat uploads written by
phase 1. Stacked atop feat/poll-mode-chat-upload-phase1.

Wire shape — the platform writes one activity_logs row per uploaded
file with `activity_type=a2a_receive`, `method=chat_upload_receive`,
and a `request_body={file_id, name, mimeType, size, uri}` carrying
the synthetic `platform-pending:<wsid>/<fid>` URI.

Workspace-side flow (new module workspace/inbox_uploads.py):

  1. Fetch via GET /workspaces/:id/pending-uploads/:file_id/content
  2. Stage to /workspace/.molecule/chat-uploads/<32-hex>-<sanitized>
     (same on-disk shape as internal_chat_uploads.py — agent-side
     URI resolvers see no contract change)
  3. POST /workspaces/:id/pending-uploads/:file_id/ack
  4. Cache `platform-pending: → workspace:` so the eventual chat
     message that REFERENCES the upload (separate, later activity row)
     gets URI-rewritten before the agent sees it.
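
Steps 1-4 above collapse to one small function; a simplified sketch
(endpoint paths and on-disk shape from the commit; the staging-name
derivation, the workspace: URI format, and sanitize_filename are
illustrative assumptions, and error handling is omitted):

    import hashlib
    import pathlib
    import httpx

    STAGE_DIR = pathlib.Path("/workspace/.molecule/chat-uploads")
    _uri_cache = {}   # platform-pending:<wsid>/<fid>  ->  workspace: URI

    def fetch_and_stage(base_url, wsid, upload, client: httpx.Client):
        fid = upload["file_id"]
        name = sanitize_filename(upload["name"])
        # 1. fetch the platform-staged bytes
        resp = client.get(
            f"{base_url}/workspaces/{wsid}/pending-uploads/{fid}/content")
        resp.raise_for_status()
        # 2. stage to the same on-disk shape internal_chat_uploads.py uses
        digest = hashlib.sha256(fid.encode()).hexdigest()[:32]
        dest = STAGE_DIR / f"{digest}-{name}"
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(resp.content)
        # 3. ack so the platform can sweep the row after retention
        client.post(f"{base_url}/workspaces/{wsid}/pending-uploads/{fid}/ack")
        # 4. remember the rewrite for the chat message that references it
        _uri_cache[upload["uri"]] = f"workspace:{dest.name}"
        return dest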

Inbox poller extension (workspace/inbox.py):

  - is_chat_upload_row(row) discriminator on `method`
  - upload-receive rows trigger fetch_and_stage and are NOT enqueued
    as InboxMessages (they're side-effect rows, not chat messages)
  - cursor advances past them regardless of fetch outcome — a
    permanent /content failure must not stall the cursor and block
    real chat traffic
  - message_from_activity calls rewrite_request_body to swap
    platform-pending: URIs to local workspace: URIs in subsequent
    chat messages' file parts. Cache miss leaves the URI untouched
    so the agent surfaces an unresolvable URI rather than the inbox
    silently dropping the part.

Filename sanitization mirrors
workspace-server/internal/handlers/chat_files.go::SanitizeFilename and
workspace/internal_chat_uploads.py::sanitize_filename — pinned by the
existing parity test suites.

Coverage: 100% on inbox_uploads.py; the inbox.py extension is fully
covered by three new tests in test_inbox.py (skip-from-queue,
cursor-advance-past-broken-fetch, URI-rewrite ordering).
2026-05-05 04:39:02 -07:00
Hongming Wang
28ef75d25e refactor(workspace): split mcp_cli.py (626 LOC) into focused modules (RFC #2873 iter 3)
Splits the standalone molecule-mcp wrapper into three single-concern
modules per the OSS-shape refactor program:

  * mcp_heartbeat.py — register POST + heartbeat loop + auth-failure
    escalation + inbound-secret persistence
  * mcp_workspace_resolver.py — single + multi-workspace env validation
    + on-disk token-file read + operator-help printer
  * mcp_inbox_pollers.py — activate inbox singleton + spawn one daemon
    poller per workspace

mcp_cli.py becomes a 193-LOC orchestrator: validates env, calls each
module's helpers, hands off to a2a_mcp_server.cli_main. The console-
script entry molecule-mcp = molecule_runtime.mcp_cli:main is preserved.

Back-compat aliases (mcp_cli._build_agent_card, _heartbeat_loop,
_resolve_workspaces, etc.) re-export the new modules' authoritative
functions so existing tests + wheel_smoke.py + any downstream caller
keeps working unchanged. A new test file pins each alias as the
exact same callable (drift gate via `is`).

Tests:
  * 62 existing test_mcp_cli.py + test_mcp_cli_multi_workspace.py
    pass against the split.
  * Two heartbeat-loop persist tests + the auth-escalation caplog
    setup updated to target mcp_heartbeat (the module where the loop
    body now lives) instead of mcp_cli (still works through aliases
    for direct calls, but Python's name resolution inside the loop
    body uses the new module's namespace).
  * test_mcp_cli_split.py adds 11 new tests: alias drift gate +
    inbox-poller single + multi-workspace branches + degraded
    inbox-import logging path (none of those existed before).

Refs RFC #2873.
2026-05-05 04:33:06 -07:00
Hongming Wang
243f9bc2b1
Merge pull request #2877 from Molecule-AI/feat/poll-mode-chat-upload-phase1
feat(rfc): poll-mode chat upload — phase 1 platform staging layer
2026-05-05 11:32:10 +00:00
Hongming Wang
43bf94a07c fix(chat-uploads): align poll-mode activity rows with inbox poll filter
The workspace inbox poller filters
`GET /workspaces/:id/activity?type=a2a_receive` — writing rows with
`activity_type=chat_upload_receive` would be silently invisible to it.

Switch the poll-mode upload-staging handler to write
`activity_type=a2a_receive` with `method=chat_upload_receive` as the
discriminator. Same shape as A2A's `tasks/send` vs `message/send` method
split; the workspace-side handler (Phase 2) routes by `method`, not
activity_type.
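
On the workspace side (Phase 2) the routing then becomes a one-line
check on `method` rather than on activity_type; illustratively:

    def is_chat_upload_row(row: dict) -> bool:
        # The poll filter already restricts to activity_type=a2a_receive;
        # method discriminates the upload-receive side-effect rows.
        return row.get("method") == "chat_upload_receive"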

Pinned with `TestPollUpload_ActivityRowDiscriminator` — sqlmock
WithArgs on positions 2 (activity_type) and 5 (method) so a refactor
that flips activity_type back to a custom value gets a red test
instead of a runtime "poller saw nothing" silent break.
2026-05-05 04:29:07 -07:00
Hongming Wang
55f5c0b0ff
Merge pull request #2876 from Molecule-AI/refactor/shell-e2e-tmp-cleanup
test(e2e): plug /tmp scratch leaks + add CI lint gate (RFC #2873 iter 2)
2026-05-05 11:24:43 +00:00
Hongming Wang
86fdaad111 feat(rfc): poll-mode chat upload — phase 1 platform staging layer
External-runtime workspaces (registered via molecule connect, behind
NAT, no public callback URL) currently see HTTP 422 "workspace has no
callback URL" on every chat file upload. The only escape is to wrap the
laptop in ngrok / Cloudflare tunnel + re-register push-mode — a tax
that shouldn't exist for a one-line use case.

This phase introduces the platform-side staging layer that lets
canvas → external workspace uploads ride the same poll loop the inbox
already uses for text messages.

Architecture (mirrors inbox poll, SSOT principle):
  Canvas POST /chat/uploads (multipart)
      ↓ delivery_mode=poll
  Platform: chat_files.uploadPollMode
      ↓ pendinguploads.Storage.Put + LogActivity(chat_upload_receive)
  Workspace's existing inbox poller picks up the activity row (Phase 2)
  Workspace fetches: GET /workspaces/:id/pending-uploads/:fid/content
  Workspace acks:    POST /workspaces/:id/pending-uploads/:fid/ack

Pieces in this PR:
  * Migration 20260505100000 — pending_uploads table; partial indexes
    on unacked + expires_at for the workspace fetch + Phase 3 sweep
    hot paths. No FK to workspaces (audit retention), 24h hard TTL.
  * internal/pendinguploads — Storage interface + Postgres impl. Bytes
    inline (bytea) today; the interface lets a future PR replace with
    S3 (RFC #2789) by swapping one constructor. 100% test coverage on
    the Postgres impl via sqlmock-pinned SQL.
  * handlers.PendingUploadsHandler — GET /content + POST /ack endpoints.
    wsAuth-gated; cross-workspace bleed protection via per-row
    workspace_id check (token leak from A can't read B's pending bytes).
    Handler tests pin happy path + every 4xx/5xx mapping including
    cross-workspace + race-with-sweep.
  * chat_files.go — Upload poll-mode branch behind WithPendingUploads
    builder. Push-mode unchanged (regression-tested). Multipart parse
    + per-file sanitize + storage.Put + activity_logs row per file.
  * SanitizeFilename — Go mirror of workspace/internal_chat_uploads.py
    sanitize_filename. Tests pin parity case-by-case so canvas-emitted
    URIs stay identical regardless of which path handles the upload.
  * Comprehensive logging — every state transition (staged, fetch,
    ack, error) emits a structured log line with workspace_id +
    file_id + size + sanitized name. Phase 3 metrics will hook these.

The pendinguploads.Storage wiring is opt-in (WithPendingUploads on
ChatFilesHandler) so a binary deployed without the migration keeps the
pre-existing 422 behavior — no boot-order coupling between code roll
and schema roll.

Phase 2 (separate PR): workspace inbox extension — inbox_uploads.py
fetches via the GET endpoint, writes to /workspace/.molecule/chat-
uploads/, acks, and rewrites the URI from platform-pending: → workspace:
so the agent's existing send-attachments path needs no changes.
Phase 3: GC sweep + dashboards. Phase 4: poll-mode E2E on staging.

Tests:
  * 100% coverage on pendinguploads (sqlmock-pinned SQL drift gate).
  * Functional 100% on new handler code (uncovered branches are
    documented defensive duplicates: uuid re-parse, multipart Open
    error, Writer.Write fail — none reproducible in unit tests).
  * Push-mode + NULL delivery_mode regression tests pin no behavior
    change for existing workspaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:22:24 -07:00
Hongming Wang
6125700c39 test(e2e): plug /tmp scratch leaks in 3 shell E2E tests + add CI lint gate (RFC #2873 iter 2)
Three shell E2E tests created scratch files via `mktemp` but never
deleted them on early exit (assertion failure, SIGINT, errexit). Each
CI run leaked ~10-100 KB of /tmp into the runner; over ~200 runs/week
that's 20+ MB of accumulated cruft.

## Files

- **test_chat_attachments_e2e.sh** — was missing both trap and rm;
  added per-run TMPDIR_E2E with `trap rm -rf … EXIT INT TERM`.
- **test_notify_attachments_e2e.sh** — had a `cleanup()` for the
  workspace but didn't include the TMPF; only an unconditional
  `rm -f` at the bottom (line 233) which doesn't fire on early exit.
  Extended cleanup() to also rm the scratch + dropped the redundant
  trailing rm.
- **test_chat_attachments_multiruntime_e2e.sh** — `round_trip()`
  function had per-call `rm -f` only on the success path; failure
  paths leaked. Switched to script-level TMPDIR_E2E + trap; per-call
  rm dropped (the trap handles every return path including SIGINT).

Pattern: `mktemp -d -t prefix-XXX` for the dir, `mktemp <full-template>`
for files (portable across BSD/macOS + GNU coreutils — `-p` is
GNU-only and breaks Mac local-dev runs).

## Regression gate

New `tests/e2e/lint_cleanup_traps.sh` asserts every `*.sh` that calls
`mktemp` also has a `trap … EXIT` line in the file. Wired into the
existing Shellcheck (E2E scripts) CI step. Verified locally: passes
on the fixed state, fails-loud when one of the 3 fixes is reverted.

## Verification

  - shellcheck --severity=warning clean on all 4 touched files
  - lint_cleanup_traps.sh passes on the post-fix tree (6 mktemp users,
    all have EXIT trap)
  - Negative test: revert one fix → lint exits 1 with file:line +
    suggested fix pattern in the error message (CI-grokkable
    ::error file=… annotation)
  - Trap fires on SIGTERM mid-run (smoke-tested on macOS BSD mktemp)
  - Trap fires on `exit 1` (smoke-tested)

## Bars met (7-axis)

  - SSOT: trap pattern documented in lint message (one rule, one fix)
  - Cleanup: this IS the cleanup hygiene fix
  - 100% coverage: lint catches future regressions across all
    `tests/e2e/*.sh` files, not just the 3 fixed today
  - File-split: N/A (no files split)
  - Plugin / abstract / modular: N/A (test infra, not product code)

Iteration 2 of RFC #2873.
2026-05-05 04:21:26 -07:00
Hongming Wang
89ee8e4d04
Merge pull request #2874 from Molecule-AI/refactor/default-model-for-runtime-ssot
refactor(models): consolidate per-runtime model defaults to SSOT (RFC #2873 iter 1)
2026-05-05 11:15:41 +00:00
Hongming Wang
26e2e97006 refactor(models): consolidate per-runtime model defaults to SSOT (RFC #2873 iter 1)
Two call sites — workspace_provision.go:537 and org_import.go:54 —
duplicated the same `if runtime == "claude-code"` branch deciding
the default model when the operator/agent didn't supply one. They
were copy-pasted; nothing prevented them from drifting silently.

Extract to `models.DefaultModel(runtime string) string`. Both call
sites now route through the helper. New runtimes need one entry
in DefaultModel + one assertion in TestDefaultModel — pre-fix it
required two source edits + an audit.
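
An illustrative Python rendering of the helper's contract (the real
helper is Go; the universal default's value is a placeholder):

    _UNIVERSAL_DEFAULT = "<universal-default-model>"   # placeholder
    _PER_RUNTIME = {"claude-code": "sonnet"}

    def default_model(runtime: str) -> str:
        # Empty, unknown, or differently-cased runtimes fall through to
        # the universal default (pre-refactor behaviour); never returns "".
        return _PER_RUNTIME.get(runtime, _UNIVERSAL_DEFAULT)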

Foundation for the future `RuntimeConfig` interface (RFC #2873 +
task #231): once we add `ProvisioningTimeout()`, `CapabilitiesSupported()`
etc., the helper expands to per-runtime structs and `DefaultModel`
becomes one method on the interface.

## Coverage

15 unit tests pinning the exact contract:
  - claude-code → "sonnet"
  - 9 other known runtimes → universal default
  - empty + unknown → universal default (matches pre-refactor fallthrough)
  - case-sensitivity preserved (CLAUDE-CODE → universal default)

Plus invariant test: `DefaultModel` never returns "" — protects
against a future "return early on unknown" regression that would
silently break workspace creation.

## Verification

  - go build ./... clean
  - 15 model unit tests pass
  - existing handler tests untouched (no behavior change at call sites)
  - identical output to pre-refactor for every input

First iteration of the OSS-shape refactor program. Each PR meets all
7 bars (plugin/abstract/modular/SSOT/coverage/cleanup/file-split).

Refs RFC #2873.
2026-05-05 04:12:37 -07:00