molecule-core/canvas/src
Hongming Wang 892de784b3 fix: review-driven hardening of wedge detector + idle timeout + progress feed
Bundle review of pieces 1/2/3 surfaced two critical issues plus a
handful of required + optional fixes. All addressed.

Critical:

1. Migration 043 was missing 'paused' and 'hibernated' from the
   workspace_status enum. Both are real production statuses written
   by workspace_restart.go (lines 283 and 406), introduced by
   migration 029_workspace_hibernation. The original `USING
   status::workspace_status` cast would have errored mid-transaction
   on any production DB containing those values. Added both. Also
   added `SET LOCAL lock_timeout = '5s'` so the migration aborts
   instead of stalling the workspace fleet behind a slow SELECT.

2. The chat activity-feed window kept only 8 lines, and a single
   multi-tool turn (Read 5 files + Grep + Bash + Edit + delegate)
   easily flushed older context before the user could read it.
   Extracted appendActivityLine to chat/activityLog.ts with a
   20-line window AND consecutive-duplicate collapse (same tool
   on the same target twice in a row is noise, not new progress).
   5 unit tests pin the behavior.

Required:

3. The SDK wedge flag was sticky-only — a single transient
   Control-request-timeout from a flaky network blip locked the
   workspace into degraded for the whole process lifetime, even
   when the next query() would have succeeded. Added
   _clear_sdk_wedge_on_success(), called from _run_query's success
   path. The next heartbeat after a working query reports
   runtime_state empty and the platform recovers the workspace to
   online without a manual restart. New regression test.

4. _report_tool_use now sets target_id = WORKSPACE_ID for self-
   actions, matching the convention other self-logged activity
   rows use. DB consumers joining on target_id see a well-defined
   value instead of NULL.

Optional taken:

5. Tightened _WEDGE_ERROR_PATTERNS from "control request timeout"
   to "control request timeout: initialize" — suffix-anchored so a
   future SDK error on an in-flight tool-call control message
   doesn't get misclassified as the unrecoverable post-init wedge.

6. Dropped the redundant "context canceled" substring fallback in
   isUpstreamBusyError. errors.Is(err, context.Canceled) is the
   typed check; the substring would also match healthy client-side
   aborts, which we don't want classified as upstream-busy.

Verified: 1010 canvas tests + 64 Python tests + full Go suite pass;
migration applies cleanly on dev DB with all 8 enum values; reverse
migration restores TEXT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 08:43:10 -07:00
..
__tests__ fix(canvas): include NEXT_PUBLIC_PLATFORM_URL in CSP connect-src 2026-04-20 12:55:03 -07:00
app merge(staging): resolve conflicts + fix 7 test regressions on top of #2061 2026-04-24 13:50:39 -07:00
components fix: review-driven hardening of wedge detector + idle timeout + progress feed 2026-04-25 08:43:10 -07:00
hooks feat(canvas+org): env preflight, EmptyState parity, shared useTemplateDeploy hook 2026-04-24 15:15:33 -07:00
lib feat(canvas+org): env preflight, EmptyState parity, shared useTemplateDeploy hook 2026-04-24 15:15:33 -07:00
store fix(dotenv,socket): review-driven hardening of .env loader + WS poll 2026-04-24 21:09:18 -07:00
stores initial commit — Molecule AI platform 2026-04-13 11:55:37 -07:00
styles feat(canvas+platform): chat attachments, model selection, deploy/delete UX 2026-04-24 13:27:51 -07:00
types feat(canvas): audit trail visualization panel (issue #753) 2026-04-17 16:03:28 +00:00
middleware.ts feat(router): /cp/* reverse-proxy to CP + same-origin canvas fetches 2026-04-20 13:01:40 -07:00