feat(plugins): atomic install — stage → snapshot → swap → rollback #114

Closed
opened 2026-05-08 14:52:59 +00:00 by claude-ceo-assistant · 1 comment

Problem

Plugin install today writes directly into <config_path>/plugins/<name>/. If the install fails partway through (network glitch mid-fetch, a single file write hits an error, manifest validation fails after some files already landed), the workspace is left with a half-written plugin. Rollback today means manually re-installing the previous version, but we keep no record of which version that was.

For Reno-Stars: a botched update could brick their workspace's plugin until manual ops intervention.

Proposed approach

Stage→swap→rollback pattern, mirroring the .complete cache-marker pattern from the !external resolver work (PR #107 hardening):

  1. Stage — fetch new plugin into <config_path>/plugins/.staging/<name>.<sha>/.
  2. Validate — fully validate manifest, schema, file integrity, runtime compatibility BEFORE touching the live dir.
  3. Snapshot — copy current <plugins>/<name>/ → <plugins>/.previous/<name>.<old-sha>/ (atomic rename + symlink).
  4. Swap — rename staged dir into place; write .complete marker only after.
  5. Rollback — if Step 4 or any post-swap validation fails, swap previous-snapshot back. If even rollback fails, log loudly + leave workspace in known-bad state for operator (better than silent corruption).
  6. GC — periodic sweeper deletes .previous/<name>.<sha>/ snapshots older than N days.
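
A minimal sketch of steps 1 to 5, assuming POSIX rename semantics and that staging, live, and snapshot dirs share one filesystem. Everything named here is hypothetical and not taken from PR #120: `atomic_install`, the `fetch` and `validate` callbacks, passing `old_sha` in as a parameter, and storing the sha inside `.complete`. The snapshot is taken by renaming the live dir aside, which is one reading of "atomic rename"; the symlink part is not modeled.

```python
import os
import shutil
from pathlib import Path
from typing import Callable


def atomic_install(plugins: Path, name: str, new_sha: str, old_sha: str,
                   fetch: Callable[[Path], None],
                   validate: Callable[[Path], None]) -> None:
    """Hypothetical stage -> validate -> snapshot -> swap -> rollback flow."""
    staged = plugins / ".staging" / f"{name}.{new_sha}"
    live = plugins / name
    snapshot = plugins / ".previous" / f"{name}.{old_sha}"

    # Steps 1-2: fetch and validate entirely inside the staging dir.
    if staged.exists():
        shutil.rmtree(staged)
    staged.mkdir(parents=True)
    try:
        fetch(staged)
        validate(staged)
    except Exception:
        # Live dir has not been touched yet; just discard the staged copy.
        shutil.rmtree(staged, ignore_errors=True)
        raise

    # Step 3: move the current version aside (assumption: rename, not copy).
    had_live = live.exists()
    if had_live:
        snapshot.parent.mkdir(parents=True, exist_ok=True)
        if snapshot.exists():
            shutil.rmtree(snapshot)
        os.rename(live, snapshot)

    # Step 4: swap the staged dir into place, marker written last.
    try:
        os.rename(staged, live)
        (live / ".complete").write_text(new_sha)
    except Exception:
        # Step 5: restore the snapshot. If this rename also fails, the
        # error propagates so the operator sees the known-bad state.
        shutil.rmtree(live, ignore_errors=True)
        if had_live:
            os.rename(snapshot, live)
        raise
```

Renaming the live dir aside leaves a brief window with no live dir between steps 3 and 4; a copy-based snapshot would avoid that at the cost of extra I/O.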

Acceptance criteria

  • Install path uses staging dir for fetch + validation
  • Swap is atomic (tmp dir → rename), with .complete marker written last
  • Previous-version snapshot kept for rollback
  • Failure on Step 2 (validate): live dir untouched
  • Failure on Step 4 (swap): previous snapshot restored automatically
  • Tests: simulated mid-fetch failure, mid-swap failure, validate-fail
  • Cleanup sweeper for old snapshots
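
The cleanup sweeper could look roughly like the sketch below. The function name `sweep_snapshots`, the `max_age_days` parameter (standing in for the unspecified "N days"), and the use of directory mtime as the age signal are all assumptions; a real sweeper might record the snapshot time explicitly instead.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def sweep_snapshots(plugins: Path, max_age_days: float,
                    now: Optional[float] = None) -> List[Path]:
    """Delete .previous/<name>.<sha>/ dirs older than max_age_days (hypothetical)."""
    previous = plugins / ".previous"
    if not previous.is_dir():
        return []
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400
    removed = []
    for snap in sorted(previous.iterdir()):
        # Assumption: directory mtime approximates snapshot age.
        if snap.is_dir() and snap.stat().st_mtime < cutoff:
            shutil.rmtree(snap)
            removed.append(snap)
    return removed
```

Returning the removed paths makes the sweep easy to log and to assert on in tests.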

Out of scope

  • Cross-plugin atomicity (if installing 5 plugins, each is independent)
  • Persistent rollback log table (file system snapshots are sufficient for 1 generation back)

Refs

  • !external resolver .complete marker pattern (PR #107 hardening) — same shape
  • molecule-core#TBD — version subscription (#4)
  • Reno-Stars safety concern
Author
Owner

Done — PR #120 merged. EIC (SaaS) path follow-up filed mentally; will surface as a new issue when scope is clear (depends on what we observe from prod docker path soak first).
Reference: molecule-ai/molecule-core#114