feat(plugins): atomic install — stage→snapshot→swap→marker (docker path) #120
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "feat/plugin-atomic-install"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Closes core#114 for the docker (local-OSS) path. Replaces the prior single
docker.CopyToContainerwrite (no atomicity, no marker, no rollback) with a 4-step dance: STAGE → SNAPSHOT → SWAP → MARKER. EIC (SaaS) path tracked as a follow-up.The 4-step dance
/configs/plugins/.staging/<name>.<ts>//configs/plugins/.previous/<name>.<ts>//configs/plugins/<name>//configs/plugins/<name>/.complete<ts>is UTC second-precision (20260508T141530Z) so concurrent reinstalls don't collide.Rollback
Tests (5 new, all green)
Full handler test suite (
go test ./internal/handlers/ -count=1) green, no regressions.Phase 4 self-review (five-axis)
Correctness: Required (addressed) — swap-failure path mvs previous back to live; if even rollback fails, both errors are wrapped and surfaced. Marker-write failure is content-landed-but-unmarked (log, don't rollback the new content).
Readability: No finding —
installVersionstruct's path methods make the layout obvious from one place; tar walker extracted from inline filepath.Walk for testability.Architecture: No finding — composes existing
execAsRoot+docker.CopyToContainerprimitives. OldcopyPluginToContainerretained forDownload()callers — single-responsibility per function. No new deps.Security: No finding — symlinks still skipped during tar walk; marker writes use
path.Join, no user input touches the path.Performance: No finding — adds ~4 docker exec calls per install (mkdir, mv-snapshot, mv-swap, touch). Each ~50-100ms; install was already seconds-scale, rounds to noise.
What's NOT in this PR (filed as follow-ups separately)
<plugin>/.complete-missing dirs (the marker WRITE is here; the marker CHECK is a separate change in the runtime adapter).previous/*older than N days (current code GCs the immediately-prior snapshot inline; system-wide cleanup is its own concern)Refs
.completemarker — now feasible)Closes molecule-core#114 for the docker (local-OSS) path. EIC (SaaS) path tracked as a follow-up — same shape, different exec primitives (ssh vs docker exec); shipping both in one PR doubles the test surface. THE FOUR-STEP DANCE 1. STAGE — docker.CopyToContainer extracts tar into /configs/plugins/.staging/<name>.<ts>/ 2. SNAPSHOT — if /configs/plugins/<name>/ exists, mv to /configs/plugins/.previous/<name>.<ts>/ 3. SWAP — atomic mv staging → live (single rename(2)) 4. MARKER — touch /configs/plugins/<name>/.complete Workspace-side plugin loaders should refuse to load any plugin dir without .complete (separate small change, not in this PR — the marker write is the necessary precursor; consumer side is a follow-up so existing-content plugins don't break before they're re-installed). ROLLBACK - Stage failure: rm -rf staging dir; live untouched - Snapshot failure: rm -rf staging dir; live untouched (no rename happened) - Swap failure with snapshot present: mv previous back to live - Swap failure (no snapshot): rm -rf staging; live (which never existed) stays absent - Marker failure: content already in place, log loudly with manual recovery hint (touch <plugin>/.complete) — don't roll back since the new content is what we wanted, just unmarked GC Best-effort delete of previous-version snapshot after successful marker write. Failures non-fatal — next install or a separate sweeper reclaims. Sweeper for stale .previous/* across reboots is follow-up scope. CONCURRENCY Each install gets a unique stamp (UTC second precision), so two concurrent reinstalls land in distinct staging dirs and the second swap simply overwrites the first's live result. The atomicity is per-install, not cross-install — by design (the platform serializes POST /workspaces/:id/plugins via Go-side semaphore upstream of this code, so cross-install collisions don't reach here). CHANGES + plugins_atomic.go — installVersion + atomicCopyToContainer + plugins_atomic_tar.go — tarWalk/tarHostDirWithPrefix helpers + plugins_atomic_test.go — 5 unit tests (paths, stamp shape, tar happy path, symlink-skip, prefix normalization). All green. ~ plugins_install_pipeline.go::deliverToContainer — swap copyPluginToContainer call to atomicCopyToContainer Old copyPluginToContainer is retained (still called by Download()) so this PR is purely additive on the install path; no public API change. PHASE 4 SELF-REVIEW (FIVE-AXIS) Correctness: Required (addressed) — swap-failure rollback writes mv of previous back to live before returning the error; if rollback itself fails, we wrap both errors and surface the combined fault. Marker-write failure is treated as content-landed-but-unmarked (LOG, don't roll back the new content). Readability: No finding — installVersion path methods make the /staging/.previous/live/marker layout obvious from one struct. tarWalk extracted from the inline filepath.Walk in plugins_install_pipeline.go for testability. Architecture: No finding — atomicCopyToContainer composes existing execAsRoot / docker.CopyToContainer primitives; no new dependencies. Old copyPluginToContainer kept for Download() — single responsibility per function. Security: No finding — symlinks still skipped during tar walk (defense vs hostile plugin escaping its own dir). Marker writes use composeable path.Join, no user input touches the path. Performance: No finding — adds ~3 docker exec calls per install (mkdir, mv-snapshot, mv-swap, touch — actually 4) on top of the one CopyToContainer. Each exec ~50-100ms in practice; install end-to-end was already seconds-scale, this rounds to noise. REFS molecule-core#114 — this issue Companion: molecule-core#112 (hot-reload classifier — depends on .complete marker) Companion: molecule-core#113 (version subscription — uses install machinery) EIC follow-up: separate issue to be filed for SaaS path parity Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Phase-4 review in PR body; tests green; rollback paths covered.