The deadline contract was incomplete: wait_all logged the timeout but
close() then called executor.shutdown(wait=True), which blocked on
the leaked workers — undoing the user-facing timeout. The inbox poll
loop would stall indefinitely on a hung /content fetch instead of
returning to chat-message processing.
Fix: wait_all now flips self._timed_out and cancels queued (not-yet-
started) futures; close() reads that flag and switches to
shutdown(wait=False, cancel_futures=True) on the timeout path.
Currently-running workers can't be interrupted under Python's
threading model, but they're now detached daemons whose blocking
httpx call no longer gates the next poll.
The healthy path (no timeout) keeps the existing drain-and-wait so a
still-queued ack POST isn't dropped mid-write.
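A minimal sketch of the contract (only wait_all, close, _timed_out,
and the shutdown flags come from this commit; the attribute names and
everything else are assumptions):

    import logging
    from concurrent.futures import ThreadPoolExecutor, wait

    log = logging.getLogger(__name__)

    class BatchFetcher:  # sketch: deadline-contract methods only
        def __init__(self, max_workers=4):
            self._executor = ThreadPoolExecutor(max_workers=max_workers)
            self._futures = []
            self._timed_out = False

        def wait_all(self, timeout=None):
            _done, not_done = wait(self._futures, timeout=timeout)
            if not_done:
                log.warning("timed out; %d fetches busy", len(not_done))
                self._timed_out = True
                for fut in not_done:
                    fut.cancel()  # only drops futures not yet started
                return False
            return True

        def close(self):
            if self._timed_out:
                # Timeout path: don't block on hung workers.
                self._executor.shutdown(wait=False, cancel_futures=True)
            else:
                # Healthy path: drain so a queued ack POST completes.
                self._executor.shutdown(wait=True)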
Two new tests pin both legs of the contract end-to-end (the first is
sketched after the list):
- close-after-timeout-doesn't-block: hung worker, wait_all(0.05s)
fires the timeout, close() returns in <1s instead of waiting ~5s
for the worker to come back.
- close-without-timeout-still-drains: 2 slow workers, wait_all
completes cleanly, close() drains both ack POSTs.
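A hypothetical shape of the first test; the private-attribute access
and wait_all's boolean return are assumptions:

    import threading
    import time

    def test_close_after_timeout_does_not_block():
        release = threading.Event()
        fetcher = BatchFetcher(max_workers=1)
        # stand-in for a hung /content fetch; the real test stubs httpx
        fetcher._futures.append(fetcher._executor.submit(release.wait))
        assert fetcher.wait_all(timeout=0.05) is False  # timeout fires
        start = time.monotonic()
        fetcher.close()                 # must not wait for the worker
        assert time.monotonic() - start < 1.0
        release.set()                   # let the detached worker exit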
Resolves the BatchFetcher timeout-cancellation finding from the
post-merge five-axis review of Phase 5b.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the two remaining findings from the Phase 1-4 retrospective
review (the Python-side counterparts to Phase 5a):
1. Important — inbox_uploads.fetch_and_stage blocked the inbox poll
loop synchronously per row. A user dragging 4 files into chat at
once would stall the poller for 4× the per-fetch latency before the
chat message reached the agent. Add BatchFetcher (sketched after
this list): a thread-pool wrapper (default 4 workers) that submits
fetches concurrently and exposes wait_all() as the barrier the inbox
loop calls before processing the chat-message row that references
the uploads.
The drain barrier is the correctness invariant: rewrite_request_body
must observe a populated URI cache when it walks the chat-message
row's parts. _poll_once now drains the BatchFetcher inline before
the first non-upload row, and again at end-of-batch to cover the
case where a batch contains only upload rows: the chat message that
references them arrives in a later poll, and the end-of-batch drain
closes the race between that future poll and the current fetches.
2. Nit — fetch_and_stage created two httpx.Client instances per row
(one for GET /content, one for POST /ack). Refactor so a single
client serves both calls. When called from BatchFetcher, the
batch-shared client serves every row's GET + ack, so later fetches
reuse the pooled TCP+TLS connection instead of repeating the
handshake.
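A sketch of the wrapper under assumptions: the client= kwarg name and
attribute names are guesses; fetch_and_stage, the 4-worker default,
and the shared-client reuse come from this commit.

    from concurrent.futures import ThreadPoolExecutor, wait

    import httpx

    from workspace.inbox_uploads import fetch_and_stage

    class BatchFetcher:
        def __init__(self, max_workers=4):
            self._executor = ThreadPoolExecutor(max_workers=max_workers)
            self._client = httpx.Client()  # shared: every GET + ack
            self._futures = []

        def submit(self, row):
            self._futures.append(
                self._executor.submit(
                    fetch_and_stage, row, client=self._client
                )
            )

        def wait_all(self, timeout=None):
            wait(self._futures, timeout=timeout)  # inbox-loop barrier

        def close(self):
            self._executor.shutdown(wait=True)
            self._client.close()

_poll_once holds one of these per batch and calls wait_all() both
before the first non-upload row and once more at end-of-batch.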
Comprehensive tests:
- 13 new inbox_uploads tests:
- fetch_and_stage with supplied client: zero httpx.Client
constructions, GET+POST through the same client, caller's client
not closed (lifecycle owned by caller).
- fetch_and_stage without supplied client: exactly one
httpx.Client constructed (was 2 pre-fix), closed on the way out.
- BatchFetcher: 3 rows × 120ms each complete in parallel in <250ms
(vs. ~360ms serial; test shape sketched after this list), URI cache
hot when wait_all returns, per-row failure isolation, single-client
reuse across all submits, idempotent close, submit-after-close
raises, owned-vs-supplied client lifecycle, no-op wait_all on empty
batch, graceful httpx-missing degradation.
- 3 new inbox tests:
- poll_once drains uploads before processing the chat-message row
(in-place mutation of row['request_body'] proves the URI was
rewritten BEFORE message_from_activity returned).
- poll_once with only upload rows still drains at end-of-batch.
- poll_once with no upload rows never constructs a BatchFetcher
(zero overhead on the no-upload happy path).
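A hypothetical shape of the timing test; the fixture and stubbing are
invented, only the 120ms/250ms/~360ms numbers come from the list
above:

    import time

    def test_three_rows_complete_in_parallel(stub_slow_fetch):
        fetcher = BatchFetcher()               # default 4 workers
        start = time.monotonic()
        for row in stub_slow_fetch.rows(3):    # each fetch sleeps 120ms
            fetcher.submit(row)
        fetcher.wait_all()
        assert time.monotonic() - start < 0.25  # ~0.36s if serial
        fetcher.close()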
133 total inbox + inbox_uploads tests pass; 0 regressions.
Closes the chat-upload poll-mode-perf gap end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workspace-side fetcher for the platform-staged chat uploads written
by Phase 1. Stacked atop feat/poll-mode-chat-upload-phase1.
Wire shape — the platform writes one activity_logs row per uploaded
file with `activity_type=a2a_receive`, `method=chat_upload_receive`,
and a `request_body={file_id, name, mimeType, size, uri}` carrying
the synthetic `platform-pending:<wsid>/<fid>` URI.
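For illustration only (every field value below is invented):

    {
      "file_id": "f_123",
      "name": "report.pdf",
      "mimeType": "application/pdf",
      "size": 48213,
      "uri": "platform-pending:ws_42/f_123"
    }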
Workspace-side flow (new module workspace/inbox_uploads.py; sketched
after the list):
1. Fetch via GET /workspaces/:id/pending-uploads/:file_id/content
2. Stage to /workspace/.molecule/chat-uploads/<32-hex>-<sanitized>
(same on-disk shape as internal_chat_uploads.py — agent-side
URI resolvers see no contract change)
3. POST /workspaces/:id/pending-uploads/:file_id/ack
4. Cache `platform-pending: → workspace:` so the eventual chat
message that REFERENCES the upload (separate, later activity row)
gets URI-rewritten before the agent sees it.
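A sketch of the flow under assumptions: the endpoints, staging-path
shape, and cache step come from the list above; the client/uri_cache
kwargs, the token_hex prefix derivation, the sanitize_filename
import, the workspace: URI form, and the pre-configured base_url are
all guesses.

    import secrets

    import httpx

    from workspace.internal_chat_uploads import sanitize_filename

    def fetch_and_stage(row, client=None, uri_cache=None):
        body = row["request_body"]
        ws_id, file_id = (
            body["uri"].removeprefix("platform-pending:").split("/", 1)
        )
        owned = client is None
        client = client or httpx.Client()  # assumed base_url set
        try:
            resp = client.get(
                f"/workspaces/{ws_id}/pending-uploads/{file_id}/content"
            )
            resp.raise_for_status()
            dest = (
                "/workspace/.molecule/chat-uploads/"
                f"{secrets.token_hex(16)}-{sanitize_filename(body['name'])}"
            )
            with open(dest, "wb") as f:
                f.write(resp.content)
            client.post(
                f"/workspaces/{ws_id}/pending-uploads/{file_id}/ack"
            )
            if uri_cache is not None:
                uri_cache[body["uri"]] = f"workspace:{dest}"  # step 4
        finally:
            if owned:
                client.close()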
Inbox poller extension (workspace/inbox.py; discriminator and rewrite
sketched after the list):
- is_chat_upload_row(row) discriminator on `method`
- upload-receive rows trigger fetch_and_stage and are NOT enqueued
as InboxMessages (they're side-effect rows, not chat messages)
- cursor advances past them regardless of fetch outcome — a
permanent /content failure must not stall the cursor and block
real chat traffic
- message_from_activity calls rewrite_request_body to swap
platform-pending: URIs to local workspace: URIs in subsequent
chat messages' file parts. Cache miss leaves the URI untouched
so the agent surfaces an unresolvable URI rather than the inbox
silently dropping the part.
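A sketch of the discriminator and the rewrite; the parts/file shape
of request_body is an assumption, while the method value and the
cache-miss behavior come from the bullets above.

    def is_chat_upload_row(row) -> bool:
        return row.get("method") == "chat_upload_receive"

    def rewrite_request_body(body, uri_cache):
        for part in body.get("parts", []):
            file_ref = part.get("file") or {}
            uri = file_ref.get("uri", "")
            if uri.startswith("platform-pending:"):
                # Cache miss leaves the URI untouched: the agent then
                # surfaces an unresolvable URI instead of the inbox
                # silently dropping the part.
                file_ref["uri"] = uri_cache.get(uri, uri)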
Filename sanitization mirrors workspace-server/internal/handlers
/chat_files.go::SanitizeFilename and workspace/internal_chat_uploads
.py::sanitize_filename — pinned by the existing parity test suites.
Coverage: 100% on inbox_uploads.py; the inbox.py extension is fully
covered by three new tests in test_inbox.py (skip-from-queue,
cursor-advance-past-broken-fetch, URI-rewrite ordering).