molecule-core

Hongming Wang 46daae1ffb	fix(provision): entry log + panic recovery on workspace provision goroutines	2026-05-01 19:14:20 -07:00

Issue #2486: 7 claude-code workspaces stuck in provisioning produced NONE of the four documented exit-path log lines in provisionWorkspaceCP: neither prepare-failed, nor start-failed, nor persist-instance-id-failed, nor success. Operators couldn't tell whether the goroutine ran at all.

Add an entry log at the top of provisionWorkspaceOpts + provisionWorkspaceCP so a missing entry distinguishes "goroutine never started" from "started but exited via an unlogged path."

Add logProvisionPanic at the same defer site so a panic inside either provisioner doesn't (a) crash the whole workspace-server process, taking every other tenant workspace with it, and (b) silently leave the row in `provisioning` until the 10-min sweeper fires. The recover persists status='failed' with a sanitized panic-class message via a fresh 10s context (the goroutine's own ctx may have been the one panicking).

Tests pin three contracts:
- no-op when no panic (otherwise every successful provision emits a spurious log line)
- recovers + persists failed status on panic, with stack trace
- defense-in-depth: if the persist itself fails, log it instead of leaving the operator with a recovered-panic log but no row

Regression-injected by neutering the recover() body; all three tests fail until the recover + UPDATE path is restored.

This is observability + resilience only, not a root-cause fix for #2486. The actual silent-drop class still needs reproduction once the tenant is on a build that includes this entry log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
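The entry-log plus panic-recovery pattern described in the commit message above could look roughly like the sketch below. This is not the repository's code: `statusStore`, `memStore`, `recoverProvisionPanic`, and `runProvision` are hypothetical names introduced for illustration, and the real `logProvisionPanic` may differ in signature and in how it sanitizes the message. The sketch only assumes what the message states: a deferred recover, a stack-trace log, and a status write through a fresh 10-second context.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"runtime/debug"
	"time"
)

// statusStore is a hypothetical stand-in for whatever persists workspace status.
type statusStore interface {
	MarkFailed(ctx context.Context, workspaceID, reason string) error
}

// memStore is an in-memory statusStore used only to exercise the sketch.
type memStore struct{ failed map[string]string }

func (m *memStore) MarkFailed(_ context.Context, id, reason string) error {
	m.failed[id] = reason
	return nil
}

// recoverProvisionPanic must be deferred directly, because recover() only
// stops a panic when called by the deferred function itself.
func recoverProvisionPanic(store statusStore, workspaceID string) {
	r := recover()
	if r == nil {
		return // no panic: stay silent so successful provisions emit no extra log line
	}
	log.Printf("provision: recovered panic workspace=%s type=%T\n%s",
		workspaceID, r, debug.Stack())

	// Use a fresh context: the goroutine's own ctx may be cancelled or unusable.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Persist only the panic's type, not its value, as a crude sanitization.
	reason := fmt.Sprintf("provision panicked (%T)", r)
	if err := store.MarkFailed(ctx, workspaceID, reason); err != nil {
		// Defense in depth: never leave a recovered panic without a trace of the failed write.
		log.Printf("provision: persisting failed status failed workspace=%s err=%v",
			workspaceID, err)
	}
}

// runProvision logs entry first, so a missing entry line means the goroutine
// never started, then runs the work under the recovery guard.
func runProvision(store statusStore, workspaceID string, work func()) {
	log.Printf("provision: entry workspace=%s", workspaceID)
	defer recoverProvisionPanic(store, workspaceID)
	work()
}
```

A caller would start it as `go runProvision(store, id, work)`. Because the panic is recovered inside the goroutine, it no longer takes the whole process down, and the row is marked failed immediately rather than waiting for a sweeper.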
artifacts
buildinfo	feat(deploy): verify each tenant /buildinfo matches published SHA after redeploy	2026-04-30 10:55:08 -07:00
bundle	refactor(workspace-status): typed constants + AST-based drift gate	2026-04-30 10:41:41 -07:00
channels
crypto
db	refactor(workspace-status): catch missed literal in workspace_bootstrap.go + add literal-drift gate	2026-04-30 10:51:01 -07:00
envx
events	test(handlers): introduce events.EventEmitter interface (#1814 partial)	2026-04-26 09:05:52 -07:00
handlers	fix(provision): entry log + panic recovery on workspace provision goroutines	2026-05-01 19:14:20 -07:00
imagewatch	feat(workspace-server): GHCR digest watcher closes runtime CD chain (#2114)	2026-04-26 13:36:26 -07:00
metrics
middleware	fix(tenant-guard): allowlist /buildinfo so redeploy verifier can reach it	2026-04-30 12:54:51 -07:00
models	refactor(workspace-status): typed constants + AST-based drift gate	2026-04-30 10:41:41 -07:00
orgtoken
plugins
provisioner	refactor(handlers): widen WorkspaceHandler.provisioner to LocalProvisionerAPI interface (#2369)	2026-04-30 09:18:16 -07:00
registry	refactor(workspace-status): typed constants + AST-based drift gate	2026-04-30 10:41:41 -07:00
router	feat(workspace-server): PUT /provider endpoint for explicit LLM provider (#196)	2026-04-30 22:25:48 -07:00
scheduler	feat(runtime): native_scheduler skip — primitive #3 of 6	2026-04-26 22:47:00 -07:00
supervised
ws
wsauth	refactor(wsauth): extract lookupTokenByHash to dedup auth predicate across 3 callers	2026-04-30 03:11:38 -07:00