fix(canvas): keep desktop take-control connected (auto-reconnect + lease renewal) #2216

Merged
hongming merged 5 commits from fix/desktop-takecontrol-reconnect-renewal into main 2026-06-04 07:00:02 +00:00
Owner

Problem

Desktop "take control" (noVNC) disconnects every ~5 minutes and never recovers.

Root cause (client half — see molecule-controlplane#522 for the websockify keepalive)

DisplayTab had two gaps:

  1. No reconnect. On an unclean websocket drop the noVNC client just showed "Desktop stream disconnected" and the effect only re-ran when sessionUrl changed — so a single idle/network blip left the user permanently disconnected.
  2. No lease renewal. acquireControl posts ttl_seconds: 300 only on click; the display-control lock is a 300s lease with no client heartbeat, so the lock (and the expires_at-bound session token) silently expired mid-session.

Fix

  • Auto-reconnect on unclean disconnect (bounded backoff, max 10) using the freshest signed URL; a clean disconnect (user released) does not reconnect.
  • Renew the lease on a 120s timer while controlling — re-acquire as the same holder extends the lock server-side (the existing ON CONFLICT ... controlled_by = EXCLUDED path). The fresh session URL is kept in a ref for reconnects without swapping the live stream's sessionUrl (which would needlessly reconnect the desktop).

Tests

13 passed — added "auto-reconnects after an unclean disconnect but not a clean one"; existing acquire/release/clipboard/observe tests unchanged.

Companion: molecule-controlplane#522 adds websockify --heartbeat 30 (prevents the idle drop in the first place). Together: the stream stays alive through idle, and recovers from any drop.

## Problem Desktop "take control" (noVNC) disconnects every ~5 minutes and never recovers. ## Root cause (client half — see molecule-controlplane#522 for the websockify keepalive) `DisplayTab` had two gaps: 1. **No reconnect.** On an unclean websocket drop the noVNC client just showed "Desktop stream disconnected" and the effect only re-ran when `sessionUrl` changed — so a single idle/network blip left the user permanently disconnected. 2. **No lease renewal.** `acquireControl` posts `ttl_seconds: 300` only on click; the display-control lock is a 300s lease with no client heartbeat, so the lock (and the `expires_at`-bound session token) silently expired mid-session. ## Fix - **Auto-reconnect** on unclean disconnect (bounded backoff, max 10) using the freshest signed URL; a *clean* disconnect (user released) does not reconnect. - **Renew the lease** on a 120s timer while controlling — re-acquire as the same holder extends the lock server-side (the existing `ON CONFLICT ... controlled_by = EXCLUDED` path). The fresh session URL is kept in a ref for reconnects **without** swapping the live stream's `sessionUrl` (which would needlessly reconnect the desktop). ## Tests `13 passed` — added "auto-reconnects after an unclean disconnect but not a clean one"; existing acquire/release/clipboard/observe tests unchanged. Companion: **molecule-controlplane#522** adds `websockify --heartbeat 30` (prevents the idle drop in the first place). Together: the stream stays alive through idle, and recovers from any drop.
hongming added 2 commits 2026-06-04 06:00:20 +00:00
test(canvas): cover take-control auto-reconnect on unclean disconnect
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
Harness Replays / detect-changes (pull_request) Successful in 2s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 2s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 2s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 2s
gate-check-v3 / gate-check (pull_request_target) Successful in 4s
qa-review / approved (pull_request_target) Failing after 4s
sop-checklist / review-refire (pull_request_target) Has been skipped
security-review / approved (pull_request_target) Failing after 3s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / Platform (Go) (pull_request) Successful in 1s
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
sop-tier-check / tier-check (pull_request_target) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 55s
CI / Canvas (Next.js) (pull_request) Successful in 6m19s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 5s
5a41dc330c
hongming added 1 commit 2026-06-04 06:00:20 +00:00
test(canvas): cover take-control auto-reconnect on unclean disconnect
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 5s
E2E Chat / detect-changes (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 4s
Harness Replays / detect-changes (pull_request) Successful in 2s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 2s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 2s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 2s
gate-check-v3 / gate-check (pull_request_target) Successful in 4s
qa-review / approved (pull_request_target) Failing after 4s
sop-checklist / review-refire (pull_request_target) Has been skipped
security-review / approved (pull_request_target) Failing after 3s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / Platform (Go) (pull_request) Successful in 1s
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
sop-tier-check / tier-check (pull_request_target) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 2s
Harness Replays / Harness Replays (pull_request) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 55s
CI / Canvas (Next.js) (pull_request) Successful in 6m19s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 5s
5a41dc330c
hongming added 1 commit 2026-06-04 06:14:08 +00:00
hongming added 1 commit 2026-06-04 06:14:08 +00:00
test(canvas): assert reconnect dials with a freshly-minted token
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
CI / Python Lint & Test (pull_request) Successful in 3s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Chat / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 14s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
Harness Replays / detect-changes (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request_target) Successful in 5s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
CI / Platform (Go) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Harness Replays / Harness Replays (pull_request) Successful in 1s
E2E Chat / E2E Chat (pull_request) Successful in 20s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 58s
sop-checklist / review-refire (pull_request_target) Has been skipped
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
sop-tier-check / tier-check (pull_request_target) Successful in 21s
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_review) Has been skipped
sop-tier-check / tier-check (pull_request_review) Successful in 4s
security-review / approved (pull_request_target) Refired via /security-recheck by unknown
CI / Canvas (Next.js) (pull_request) Successful in 6m23s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 7s
qa-review / approved (pull_request_target) Refired via /qa-recheck by unknown
293bfb6abe
Author
Owner

Independent adversarial review done; one BLOCKER found + fixed.

The reviewer flagged a real correctness bug in the first cut: the reconnect reused a cached signed session URL, whose token is only ~300s — so a reconnect after the token's window (especially after any swallowed renewal failure) would 401 and never recover, reproducing the exact symptom this PR fixes.

Fix (pushed): reconnect now re-acquires a fresh lease+token first (reacquireSession() — shared by the renewal timer and the reconnect path) rather than dialing with a possibly-stale ref. The renewal generation-guard is now consistent with acquireControl (rides requestGeneration). Added a test asserting the reconnect dials with a freshly-minted token (signed2), not the stale original (signed) — the previous test would have passed even with the bug.

13 passed, eslint clean (0 errors), tsc clean for DisplayTab. Verified the noVNC connect/disconnect event + clean flag assumptions against @novnc/novnc@1.7.0 source.

**Independent adversarial review done; one BLOCKER found + fixed.** The reviewer flagged a real correctness bug in the first cut: the reconnect reused a **cached** signed session URL, whose token is only ~300s — so a reconnect after the token's window (especially after any swallowed renewal failure) would 401 and never recover, reproducing the exact symptom this PR fixes. **Fix (pushed):** reconnect now **re-acquires a fresh lease+token first** (`reacquireSession()` — shared by the renewal timer and the reconnect path) rather than dialing with a possibly-stale ref. The renewal generation-guard is now consistent with `acquireControl` (rides `requestGeneration`). Added a test asserting the reconnect dials with a **freshly-minted** token (`signed2`), not the stale original (`signed`) — the previous test would have passed even with the bug. `13 passed`, eslint clean (0 errors), tsc clean for DisplayTab. Verified the noVNC `connect`/`disconnect` event + `clean` flag assumptions against `@novnc/novnc@1.7.0` source.
hongming added the tier:low label 2026-06-04 06:16:35 +00:00
core-security approved these changes 2026-06-04 06:18:07 +00:00
Dismissed
core-security left a comment
Member

Security approve. Client-only canvas change (display take-control reconnect + lease renewal). No new secrets, no auth-boundary change — reuses the existing signed display-session token + the existing acquire endpoint (same-holder lease extension). Independent adversarial review done; the stale-token reconnect BLOCKER it found is fixed (reconnect re-acquires a fresh token). 13 tests pass.

Security approve. Client-only canvas change (display take-control reconnect + lease renewal). No new secrets, no auth-boundary change — reuses the existing signed display-session token + the existing acquire endpoint (same-holder lease extension). Independent adversarial review done; the stale-token reconnect BLOCKER it found is fixed (reconnect re-acquires a fresh token). 13 tests pass.
core-qa approved these changes 2026-06-04 06:18:52 +00:00
Dismissed
core-qa left a comment
Member

QA approve. 13 canvas tests pass incl reconnect-uses-fresh-token + clean-disconnect-no-reconnect; lint clean; tsc clean for DisplayTab. noVNC event assumptions verified vs @novnc/novnc 1.7.0. Companion CP#522 keepalive already deployed to prod.

QA approve. 13 canvas tests pass incl reconnect-uses-fresh-token + clean-disconnect-no-reconnect; lint clean; tsc clean for DisplayTab. noVNC event assumptions verified vs @novnc/novnc 1.7.0. Companion CP#522 keepalive already deployed to prod.
Author
Owner

/qa-recheck

/qa-recheck
Author
Owner

/security-recheck

/security-recheck
Author
Owner

/qa-recheck

/qa-recheck
hongming added 1 commit 2026-06-04 06:52:55 +00:00
fix(canvas): stop requesting server-side resize (x11vnc rejects it; console spam)
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 3s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 1s
CI / Python Lint & Test (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 6s
Lint forbidden tenant-env keys / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
qa-review / approved (pull_request_target) Failing after 6s
sop-checklist / review-refire (pull_request_target) Has been skipped
gate-check-v3 / gate-check (pull_request_target) Successful in 6s
sop-checklist / all-items-acked (pull_request) [info tier:low] acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, l
sop-checklist / na-declarations (pull_request) N/A: (none)
CI / Platform (Go) (pull_request) Successful in 2s
sop-checklist / all-items-acked (pull_request_target) Successful in 5s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
Harness Replays / detect-changes (pull_request) Successful in 14s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 14s
E2E API Smoke Test / detect-changes (pull_request) Successful in 17s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
E2E Chat / detect-changes (pull_request) Successful in 18s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 18s
sop-tier-check / tier-check (pull_request_target) Successful in 6s
Harness Replays / Harness Replays (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 2s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
security-review / approved (pull_request_review) Has been skipped
qa-review / approved (pull_request_review) Has been skipped
security-review / approved (pull_request_target) Refired via /security-recheck by unknown
sop-tier-check / tier-check (pull_request_review) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 53s
CI / Canvas (Next.js) (pull_request) Successful in 6m14s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 1s
audit-force-merge / audit (pull_request_target) Successful in 9s
4f26ed69c3
hongming dismissed core-security's review 2026-06-04 06:52:55 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

hongming dismissed core-qa's review 2026-06-04 06:52:55 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

core-security approved these changes 2026-06-04 06:53:21 +00:00
core-security left a comment
Member

Security re-approve on new head (added resizeSession=false to quiet the x11vnc resize-prohibited console spam; no security surface change).

Security re-approve on new head (added resizeSession=false to quiet the x11vnc resize-prohibited console spam; no security surface change).
core-qa approved these changes 2026-06-04 06:53:23 +00:00
core-qa left a comment
Member

QA re-approve on new head. 13 tests pass; resizeSession=false quiets resize spam.

QA re-approve on new head. 13 tests pass; resizeSession=false quiets resize spam.
Author
Owner

/qa-recheck

/qa-recheck
Author
Owner

/security-recheck

/security-recheck
hongming merged commit 8186baf902 into main 2026-06-04 07:00:02 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2216