molecule-core

Hongming Wang 3cdb67f27e	2026-05-06 16:43:33 -07:00

fix(workspace-server): CP orphan sweeper closes deprovision split-write race (#2989)

The deprovision path marks `workspaces.status='removed'` BEFORE calling the controlplane DELETE. If that CP call fails (transient 5xx, network hiccup, AWS provider error), the DB row stays at 'removed' with `instance_id` populated and there's no retry, so the EC2 lives forever. 9 prod orphans accumulated over 3 days under this bug.

Adds a SaaS-mode counterpart to the existing Docker `orphan_sweeper`:

- 60s tick (matches the Docker sweeper cadence)
- LIMIT 100 per cycle so a sustained CP outage drains over multiple cycles without blowing the request timeout
- Re-issues `cpProv.Stop` for any workspace at status='removed' with a non-NULL `instance_id`. Stop is idempotent (AWS terminate on already-terminated is a no-op; CP's Deprovision tolerates already-deleted DNS) so retries are safe.
- On Stop success, NULLs `instance_id` so the next cycle skips the row.
- On Stop failure, leaves `instance_id` populated for next cycle.

The existing Docker sweeper is gated on `prov != nil`; the new sweeper is gated on `cpProv != nil`. SaaS tenants get exactly one of the two, self-hosted tenants get the Docker one: no overlap.

Why this shape over option A (CP-first ordering) or B (durable outbox): the existing inline path already returns a loud 500 to the user when CP fails. The only missing piece is automatic retry, which a 60s sweeper provides without protocol changes, new tables, or new workers. ~30 LOC of production code vs. ~400 for an outbox. RFC discussion in #2989 comment chain.

Tests:

- 9 unit tests covering happy path, Stop failure, UPDATE failure, multiple orphans (one-fails-others-still-process), DB query error, nil-DB defense, nil-reaper short-circuit, and the boot-immediate-then-tick cadence contract.
- Mutation-tested: status='running' substitution and removed-UPDATE-block both fail at least one test.

Out of scope:

- Backfilling the 9 named orphans: they'll heal automatically on the first sweep cycle after this lands; no manual cleanup needed.
- Long-term durable-outbox architecture: separate RFC.
artifacts	chore: sync staging to main — 1188 commits, 5 conflicts resolved (#1743)	2026-04-23 18:30:18 +00:00
buildinfo	feat(deploy): verify each tenant /buildinfo matches published SHA after redeploy	2026-04-30 10:55:08 -07:00
bundle	refactor(events): migrate 18 files to typed EventType constants (RFC #2945 PR-B-1)	2026-05-05 19:05:03 -07:00
channels	refactor(events): migrate 18 files to typed EventType constants (RFC #2945 PR-B-1)	2026-05-05 19:05:03 -07:00
crypto	chore: open-source restructure — rename dirs, remove internal files, scrub secrets	2026-04-18 00:24:44 -07:00
db	fix(bundle): markFailed sets last_sample_error + AST gate	2026-05-04 21:08:08 -07:00
envx	chore: open-source restructure — rename dirs, remove internal files, scrub secrets	2026-04-18 00:24:44 -07:00
events	feat(events): typed EventType registry — single source of truth for WS event names (RFC #2945 PR-B)	2026-05-05 16:25:38 -07:00
handlers	feat(provisioner): env-driven RegistryPrefix() for workspace template images (#6)	2026-05-06 14:23:01 -07:00
imagewatch	feat(workspace-server): GHCR digest watcher closes runtime CD chain (#2114)	2026-04-26 13:36:26 -07:00
memory	fix(textutil): SSOT for rune-safe string truncation, fix 3 audit-gap bugs	2026-05-05 23:01:21 -07:00
messagestore	feat(messagestore): MessageStore interface + Postgres impl (RFC #2945 PR-D)	2026-05-05 23:38:14 -07:00
metrics	feat(rfc): poll-mode chat upload — phase 3 GC sweep + observability	2026-05-05 05:00:13 -07:00
middleware	fix(tenant-guard): allowlist /buildinfo so redeploy verifier can reach it	2026-04-30 12:54:51 -07:00
models	refactor(models): consolidate per-runtime model defaults to SSOT (RFC #2873 iter 1)	2026-05-05 04:12:37 -07:00
orgtoken	fix: F1085 rm scope concat + GH#756 ValidateToken terminal guard + CI test fixes	2026-04-24 07:16:54 +00:00
pendinguploads	fix(chat-uploads): activity rows commit atomically with PutBatch	2026-05-05 21:34:28 -07:00
plugins	chore: open-source restructure — rename dirs, remove internal files, scrub secrets	2026-04-18 00:24:44 -07:00
provisioner	feat(provisioner): env-driven RegistryPrefix() for workspace template images (#6)	2026-05-06 14:23:01 -07:00
provlog	feat(workspace-server): structured logging at provisioning boundaries	2026-05-05 12:30:11 -07:00
registry	fix(workspace-server): CP orphan sweeper closes deprovision split-write race (#2989)	2026-05-06 16:43:33 -07:00
router	feat(messagestore): MessageStore interface + Postgres impl (RFC #2945 PR-D)	2026-05-05 23:38:14 -07:00
scheduler	fix(textutil): SSOT for rune-safe string truncation, fix 3 audit-gap bugs	2026-05-05 23:01:21 -07:00
supervised	chore: open-source restructure — rename dirs, remove internal files, scrub secrets	2026-04-18 00:24:44 -07:00
textutil	fix(textutil): SSOT for rune-safe string truncation, fix 3 audit-gap bugs	2026-05-05 23:01:21 -07:00
ws	chore: open-source restructure — rename dirs, remove internal files, scrub secrets	2026-04-18 00:24:44 -07:00
wsauth	perf(wsauth): in-process cache for platform_inbound_secret reads	2026-05-03 00:04:38 -07:00