fix(daemon): propagate bridge crashes so systemd knows to restart #9
Reference in New Issue
Block a user
Delete Branch "fix/daemon-bridge-crash-propagation"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
If
run_bridgeraised an exception,_async_mainswallowed it and returned 0. systemd/docker then thought the daemon exited cleanly and would not auto-restart it, leaving the workspace offline silently.Fix
After
asyncio.waitreturns, checkbridge_taskfor an exception and re-raise it so the process exits non-zero.Test
Added
tests/test_daemon.pywith two regression guards:test_bridge_crash_is_propagated— asserts an unhandled bridge exception bubbles out of_async_main.test_sigint_graceful_shutdown— asserts SIGINT still exits cleanly (code 0).Checklist
Five-axis review for PR #9.
Correctness: REQUEST_CHANGES. The daemon crash propagation change is directionally correct, but the PR diff is not scoped to the stated daemon fix. It also includes the subprocess stdout/stderr drain implementation and a large test_subprocess_io_deadlock.py suite from the adjacent executor/deadlock work. Since #8 is also an open PR for that executor change, this branch appears stacked or stale against main. Please rebase/split so #9 contains only the daemon crash propagation and its daemon tests, or land #8 first and update this PR.
Robustness: the daemon exception propagation itself should let systemd/docker restart on bridge crashes while preserving SIGINT behavior. The unrelated subprocess changes need their own review surface because they alter process I/O, timeout cancellation, and stdin handling semantics.
Security: no direct secret or auth surface in the daemon change. The broader subprocess changes affect command execution behavior and should not ride along with a daemon supervision fix.
Performance: no concern for the daemon change. The subprocess drain work is performance/reliability-sensitive and belongs in the dedicated executor PR.
Readability: the PR title/body and effective merge diff diverge, making the approval signal ambiguous.
CI/status checked on
9e24489: statuses are accessible and Python 3.11/3.12 test jobs are green.APPROVED
UNRELATED-TO-PRIOR: none found.
5-axis re-review:
Correctness: The prior REQUEST_CHANGES was about unrelated subprocess/deadlock changes riding along with the daemon crash propagation PR. The current head is narrowed to
daemon.pyplus daemon-specific tests, so the scope issue is resolved. The bridge task exception is now re-raised so the process can exit non-zero and systemd/docker can restart it.Robustness: SIGINT remains covered by a regression test and still returns 0. Bridge crash behavior has a focused regression test.
Security: No new auth, secret, or input-validation surface.
Performance: No meaningful overhead; this checks the completed task set once after
asyncio.waitreturns.Readability: The added propagation block is small and the tests document the expected supervision behavior.
Peer 2nd-review per CTO carve-out. Diff stripped to daemon-bridge-crash propagation only (overlap with #8 content removed in commit
26d8c75). Deferring to Code Reviewer (2) review_id=5677 for substantive findings. BP unblock for merge — superseding non-stale CR2 #5606 from prior diff.