fix(workspace): check error on external workspace status update #2045

Closed
core-be wants to merge 14 commits from fix/http-defaultclient-auth-paths into staging
Member
No description provided.
core-be changed target branch from main to staging 2026-06-01 03:40:13 +00:00
core-be changed target branch from staging to main 2026-06-01 23:16:54 +00:00
core-be changed target branch from main to staging 2026-06-01 23:23:06 +00:00
core-be changed target branch from staging to main 2026-06-02 04:10:04 +00:00
core-be changed target branch from main to staging 2026-06-02 04:10:12 +00:00
core-be added 14 commits 2026-06-02 04:10:12 +00:00
- PR #3029: approve with nit (CP orphan sweeper + registry prefix).
- PR #3033: approve with blocker — README undercounts runtimes by 1
  because #3029 adds codex but #3033 claims 8 runtimes and omits it.
- All tests pass locally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces http.DefaultClient in MCP bridge (mcp_tools.go) and CP config
fetch (cp_config.go) with dedicated clients that have transport-level
timeouts. Eliminates a fragility class where global mutable client +
missing context timeout could hang forever on dead workspaces or slow CP.

- mcpHTTPClient: 5s dial, 30s response header, 5s TLS handshake
- cp_config fetch: 10s client timeout matching context deadline

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two production code paths used context-less sql.DB methods:

1. buildProvisionerConfig (workspace_provision.go:263) called db.DB.QueryRow
   instead of QueryRowContext(ctx, ...) — losing request cancellation.

2. queryPeerMaps (discovery.go:307) called db.DB.Query instead of
   QueryContext(ctx, ...) and did not accept a context parameter. Updated
   signature and all 4 call sites in Peers handler to pass ctx.

No behavioral change for happy path; fixes context-timeout hygiene so
slow queries can be cancelled when the HTTP client disconnects.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
workspace.go BeginTx had explicit Rollback() on every error path but
no deferred safety net. A panic (or future refactor adding an early
return) would leak the tx hold. Add defer tx.Rollback() immediately
after BeginTx — harmless no-op after Commit, critical safety net on
unexpected exits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two QueryContext calls in restart_context used explicit rows.Close()
after the loop. If a panic or early return occurred during iteration,
the underlying connection would leak back to the pool with an active
result set. Switch to defer rows.Close() inside each if-block for
 RAII-style cleanup.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The async goroutine that processes inbound webhooks had no recover().
A panic inside HandleInbound (e.g., from the A2A proxy or adapter layer)
would crash the entire platform process. Add defer recover() with
workspace-scoped logging so the goroutine exits cleanly and the process
stays alive.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add panic recovery to goroutines that are NOT wrapped by supervised.RunWithRecover:
- session_auth.go: init() cache sweeper (process-level background task)
- ratelimit.go + mcp_ratelimit.go: bucket cleanup tickers
- bundle/importer.go: fire-and-forget provision start during import
- plugins_install.go: delayed restart after plugin uninstall
- terminal.go: WebSocket bridge goroutines (stdout, PTY, stdin)
- a2a_proxy.go: SSE idle watcher

A panic in any of these would previously crash the process or terminate
a subsystem silently. Now they log and swallow the panic, preserving
process stability.
Prevents bounded timer accumulation during the 30s workspace-online wait.
Each time.After created a timer that couldn't be GC'd until it fired;
with ~60 iterations this was minor but unnecessary.
Prevents timer accumulation in long-running loops:
- restart_context.go: poll loop uses NewTicker instead of time.After
- supervised.go: backoff sleep uses NewTimer with proper Stop
- telegram.go: poll loop uses NewTimer for both retryAfter and poll interval

Each time.After created a timer that couldn't be GC'd until it fired.
In long-running goroutines this caused bounded but unnecessary memory
growth.
Add error checking + logging to previously ignored db.ExecContext and
broadcaster.RecordAndBroadcast calls in bundle importer. Improves
observability when provision failures or URL updates fail silently.
Apply unchecked-error fixes from errcheck audit (#1062):
- channels/manager.go: log db.ExecContext and broadcaster errors in HandleInbound/SendOutbound; return json.Unmarshal errors in loadChannel
- channels/telegram.go: log bot.Send errors for callback ack and edit message
- handlers/approvals.go: log broadcaster, db.QueryRowContext, and db.ExecContext errors

Improves observability when secondary operations fail silently.
fix(workspace): check error on external workspace status update
cascade-list-drift-gate / check (pull_request) Waiting to run
Check migration collisions / Migration version collision check (pull_request) Waiting to run
MCP Stdio Transport Regression / MCP stdio with regular-file stdout (pull_request) Waiting to run
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Waiting to run
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Waiting to run
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Waiting to run
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Waiting to run
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Waiting to run
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Waiting to run
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Waiting to run
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Waiting to run
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Waiting to run
publish-runtime-autobump / pr-validate (pull_request) Waiting to run
publish-runtime-autobump / bump-and-tag (pull_request) Waiting to run
review-check-tests / review-check.sh regression tests (pull_request) Waiting to run
Runtime Pin Compatibility / PyPI-latest install + import smoke (pull_request) Waiting to run
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Waiting to run
sop-checklist / review-refire (pull_request) Waiting to run
audit-force-merge / audit (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 13s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 2m12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 16s
gate-check-v3 / gate-check (pull_request) Successful in 18s
qa-review / approved (pull_request) Successful in 11s
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request) Successful in 18s
sop-checklist / all-items-acked (pull_request) Successful in 12s
sop-tier-check / tier-check (pull_request) Successful in 15s
CI / Canvas Deploy Reminder (pull_request) Has been cancelled
E2E API Smoke Test / E2E API Smoke Test (pull_request) Has been cancelled
E2E Chat / E2E Chat (pull_request) Has been cancelled
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Has been cancelled
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Has been cancelled
Harness Replays / Harness Replays (pull_request) Has been cancelled
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Has been cancelled
E2E Chat / detect-changes (pull_request) Has been cancelled
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Has been cancelled
Handlers Postgres Integration / detect-changes (pull_request) Has been cancelled
Harness Replays / detect-changes (pull_request) Has been cancelled
Runtime PR-Built Compatibility / detect-changes (pull_request) Has been cancelled
CI / all-required (pull_request) Has been cancelled
CI / Detect changes (pull_request) Has been cancelled
CI / Platform (Go) (pull_request) Has been cancelled
CI / Canvas (Next.js) (pull_request) Has been cancelled
CI / Shellcheck (E2E scripts) (pull_request) Has been cancelled
CI / Python Lint & Test (pull_request) Has been cancelled
E2E API Smoke Test / detect-changes (pull_request) Has been cancelled
a7c3f6b7ed
ci: remove unused canvasUserMessage type to fix lint on staging
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
CI / Detect changes (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Failing after 49s
CI / all-required (pull_request) Failing after 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 39s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Chat / detect-changes (pull_request) Successful in 13s
CI / Platform (Go) (pull_request) Successful in 6m9s
CI / Canvas Deploy Reminder (pull_request) Has been cancelled
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Harness Replays / detect-changes (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 7m15s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 10s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m26s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
qa-review / approved (pull_request_target) Successful in 10s
security-review / approved (pull_request_target) Successful in 8s
Harness Replays / Harness Replays (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m55s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 31s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m42s
E2E Chat / E2E Chat (pull_request) Failing after 7m58s
sop-checklist / review-refire (pull_request_target) Has been skipped
gate-check-v3 / gate-check (pull_request_target) Successful in 3s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
sop-tier-check / tier-check (pull_request_target) Successful in 3s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Failing after 1m10s
audit-force-merge / audit (pull_request_target) Has been skipped
85f8d78ac0
internal/handlers/a2a_proxy_helpers.go:412 had an unused struct that
causes golangci-lint `unused` failure on every PR targeting staging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
core-be closed this pull request 2026-06-02 04:39:00 +00:00
Some required checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
CI / Detect changes (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Failing after 49s
CI / all-required (pull_request) Failing after 6s
Required
Details
CI / Shellcheck (E2E scripts) (pull_request) Successful in 39s
E2E API Smoke Test / detect-changes (pull_request) Successful in 10s
E2E Chat / detect-changes (pull_request) Successful in 13s
CI / Platform (Go) (pull_request) Successful in 6m9s
CI / Canvas Deploy Reminder (pull_request) Has been cancelled
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Harness Replays / detect-changes (pull_request) Successful in 6s
CI / Python Lint & Test (pull_request) Successful in 7m15s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 10s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m26s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
qa-review / approved (pull_request_target) Successful in 10s
security-review / approved (pull_request_target) Successful in 8s
Harness Replays / Harness Replays (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m55s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 31s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m42s
E2E Chat / E2E Chat (pull_request) Failing after 7m58s
sop-checklist / review-refire (pull_request_target) Has been skipped
gate-check-v3 / gate-check (pull_request_target) Successful in 3s
sop-checklist / all-items-acked (pull_request) acked: 0/7 — missing: comprehensive-testing, local-postgres-e2e, staging-smoke, +4 — body-unfilled: comprehensive-testing, local-postgres-e2
Required
Details
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request_target) Successful in 3s
sop-tier-check / tier-check (pull_request_target) Successful in 3s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Failing after 1m10s
audit-force-merge / audit (pull_request_target) Has been skipped

Pull request closed

Sign in to join this conversation.
No Reviewers
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#2045