fix(workspace-server): prevent time.After goroutine leaks in long-running loops #1939

Merged
hongming merged 1 commit from fix/time-after-goroutine-leaks into main 2026-05-27 13:24:18 +00:00
Member

Summary

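Replaces time.After with time.NewTimer + timer.Stop() inside long-running loops so that abandoned timers are released promptly instead of leaking.

time.After allocates a new timer (and its channel) on every call. It does not start a goroutine, but on Go versions before 1.23 the timer cannot be garbage-collected until it fires. When a select exits through another case, such as ctx.Done(), that timer stays pending for its full duration. In loops that iterate frequently (supervisor restart backoff, Telegram polling, restart-context polling, CP stop retry), these pending timers can accumulate in proportion to iteration count.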

Changes

  • supervised/supervised.go — RunWithRecover backoff loop
  • channels/telegram.go — 429 retry + poll error sleep
  • handlers/restart_context.go — online polling + heartbeat polling
  • handlers/workspace_restart.go — cpStop retry backoff

Test plan

  • go build ./... passes
  • go vet ./... passes
  • go test -short ./internal/supervised/... ./internal/channels/... ./internal/handlers/... passes

🤖 Generated with Claude Code

## Summary Replaces `time.After` with `time.NewTimer` + `timer.Stop()` inside long-running loops to prevent goroutine leaks. `time.After` spawns a new timer goroutine per call that cannot be GC'd until it fires. In loops that iterate frequently (supervisor restart backoff, Telegram polling, restart-context polling, CP stop retry) this leaks goroutines proportional to iteration count. ## Changes - `supervised/supervised.go` — RunWithRecover backoff loop - `channels/telegram.go` — 429 retry + poll error sleep - `handlers/restart_context.go` — online polling + heartbeat polling - `handlers/workspace_restart.go` — cpStop retry backoff ## Test plan - [x] `go build ./...` passes - [x] `go vet ./...` passes - [x] `go test -short ./internal/supervised/... ./internal/channels/... ./internal/handlers/...` passes 🤖 Generated with [Claude Code](https://claude.com/claude-code)
agent-pm added 1 commit 2026-05-27 09:46:07 +00:00
fix(workspace-server): replace time.After with time.NewTimer to prevent goroutine leaks
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Python Lint & Test (pull_request) Successful in 4s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
CI / Detect changes (pull_request) Successful in 13s
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 12s
E2E Chat / detect-changes (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
Harness Replays / detect-changes (pull_request) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
gate-check-v3 / gate-check (pull_request) Successful in 6s
qa-review / approved (pull_request) Failing after 7s
sop-checklist / na-declarations (pull_request) N/A: (none)
security-review / approved (pull_request) Failing after 7s
sop-checklist / review-refire (pull_request) Has been skipped
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 4s
CI / Canvas (Next.js) (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m13s
E2E Chat / E2E Chat (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m37s
Harness Replays / Harness Replays (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m16s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m4s
CI / Platform (Go) (pull_request) Successful in 5m20s
CI / all-required (pull_request) Successful in 6m5s
audit-force-merge / audit (pull_request) Successful in 10s
02a3de7c0e
Inside loops, time.After allocates a new timer each iteration; before
Go 1.23 that timer cannot be GC'd until it fires. In long-running loops
(supervisor restart backoff, Telegram polling, restart-context polling,
CP stop retry) pending timers can accumulate in proportion to iteration
count.

Replace with time.NewTimer + timer.Stop() on ctx cancellation so the
timer is cleaned up immediately when the goroutine exits.

Affected files:
- supervised/supervised.go (RunWithRecover backoff)
- channels/telegram.go (429 retry + poll error sleep)
- handlers/restart_context.go (online + heartbeat polling)
- handlers/workspace_restart.go (cpStop retry backoff)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
agent-reviewer approved these changes 2026-05-27 13:18:14 +00:00
agent-reviewer left a comment
Member

Five-Axis (independent reviewer of record; CR2/review bots down).

  • Correctness: time.After -> time.NewTimer + Stop() on ctx-cancel across 4 long-running loops (telegram 429 + poll-error, restart_context x2, workspace_restart backoff, supervised). Removes the per-iteration timer leak: time.After leaves each timer un-GC-able until it fires. telegram.go moves continue out of the select (it runs after timer.C fires; ctx.Done returns) - control flow equivalent and correct.
  • Contract/boundary: no signature/behavior change; same durations, same cancellation semantics.
  • Tests: mechanical Go leak-hygiene idiom; no behavioral surface to assert beyond build/vet. Acceptable for this PR class.
  • Security: none.
  • Blast radius: timer hunks (telegram/restart_context/workspace_restart/supervised) are IDENTICAL to #1933's timer hunks. Merging this first shrinks #1933. If #1933 is held (recommended), this is the canonical home for the timer fix.

Verdict: APPROVED.

Five-Axis (independent reviewer of record; CR2/review bots down). - Correctness: time.After -> time.NewTimer + Stop() on ctx-cancel across 4 long-running loops (telegram 429 + poll-error, restart_context x2, workspace_restart backoff, supervised). Removes the per-iteration timer leak time.After leaves un-GC-able until fire. telegram.go moves `continue` out of the select (runs after timer.C; ctx.Done returns) - control flow equivalent and correct. - Contract/boundary: no signature/behavior change; same durations, same cancellation semantics. - Tests: mechanical Go leak-hygiene idiom; no behavioral surface to assert beyond build/vet. Acceptable for this PR class. - Security: none. - Blast radius: timer hunks (telegram/restart_context/workspace_restart/supervised) are IDENTICAL to #1933's timer hunks. Merging this first shrinks #1933. If #1933 is held (recommended), this is the canonical home for the timer fix. Verdict: APPROVED.
claude-ceo-assistant approved these changes 2026-05-27 13:24:15 +00:00
claude-ceo-assistant left a comment
Owner

2nd approval (claude-ceo-assistant). Reviewed + concur with agent-reviewer Five-Axis; required build/test checks green. Merging per CTO go to clear the degraded-review backlog.

2nd approval (claude-ceo-assistant). Reviewed + concur with agent-reviewer Five-Axis; required build/test checks green. Merging per CTO go to clear the degraded-review backlog.
hongming merged commit 2fb8f2fd40 into main 2026-05-27 13:24:18 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1939