Closes molecule-core#114 for the docker (local-OSS) path.
EIC (SaaS) path tracked as a follow-up — same shape, different
exec primitives (ssh vs docker exec); shipping both in one PR
doubles the test surface.
THE FOUR-STEP DANCE
1. STAGE — docker.CopyToContainer extracts tar into
/configs/plugins/.staging/<name>.<ts>/
2. SNAPSHOT — if /configs/plugins/<name>/ exists, mv to
/configs/plugins/.previous/<name>.<ts>/
3. SWAP — atomic mv staging → live (single rename(2))
4. MARKER — touch /configs/plugins/<name>/.complete
Workspace-side plugin loaders should refuse to load any plugin dir
without .complete (separate small change, not in this PR — the marker
write is the necessary precursor; consumer side is a follow-up so
existing-content plugins don't break before they're re-installed).
ROLLBACK
- Stage failure: rm -rf staging dir; live untouched
- Snapshot failure: rm -rf staging dir; live untouched (no rename happened)
- Swap failure with snapshot present: mv previous back to live
- Swap failure (no snapshot): rm -rf staging; live (which never
existed) stays absent
- Marker failure: content already in place, log loudly with manual
recovery hint (touch <plugin>/.complete) — don't roll back since
the new content is what we wanted, just unmarked
GC
Best-effort delete of previous-version snapshot after successful
marker write. Failures non-fatal — next install or a separate
sweeper reclaims. Sweeper for stale .previous/* across reboots is
follow-up scope.
CONCURRENCY
Each install gets a UTC stamp with second precision, so two
reinstalls that start in different seconds land in distinct staging
dirs and the second swap replaces the first's live result. Two
installs of the same plugin within the same second would share a
stamp; the platform serializes POST /workspaces/:id/plugins via a
Go-side semaphore upstream of this code, so cross-install collisions
don't reach here. The atomicity is per-install, not cross-install,
by design.
CHANGES
+ plugins_atomic.go — installVersion + atomicCopyToContainer
+ plugins_atomic_tar.go — tarWalk/tarHostDirWithPrefix helpers
+ plugins_atomic_test.go — 5 unit tests (paths, stamp shape,
tar happy path, symlink-skip, prefix
normalization). All green.
~ plugins_install_pipeline.go::deliverToContainer — swap
copyPluginToContainer call to atomicCopyToContainer
Old copyPluginToContainer is retained (still called by Download()) so
this PR is purely additive on the install path; no public API change.
PHASE 4 SELF-REVIEW (FIVE-AXIS)
Correctness: Required (addressed) — on swap failure, rollback moves
previous back to live before returning the error; if the rollback
itself fails, the returned error reports both faults (swap wrapped
via %w, rollback formatted via %v).
Marker-write failure is treated as content-landed-but-unmarked
(LOG, don't roll back the new content).
Readability: No finding — installVersion path methods make the
/staging/.previous/live/marker layout obvious from one struct.
tarWalk extracted from the inline filepath.Walk in
plugins_install_pipeline.go for testability.
Architecture: No finding — atomicCopyToContainer composes existing
execAsRoot / docker.CopyToContainer primitives; no new dependencies.
Old copyPluginToContainer kept for Download() — single responsibility
per function.
Security: No finding — symlinks still skipped during tar walk
(defense vs hostile plugin escaping its own dir). Marker writes
use composable path.Join, no user input touches the path.
Performance: No finding — adds up to 6 docker exec calls per install
(mkdir, test -d, mv-snapshot, mv-swap, touch, rm -rf of the snapshot)
on top of the one CopyToContainer. Each exec ~50-100ms in practice;
install end-to-end was already seconds-scale, this rounds to noise.
REFS
molecule-core#114 — this issue
Companion: molecule-core#112 (hot-reload classifier — depends on .complete marker)
Companion: molecule-core#113 (version subscription — uses install machinery)
EIC follow-up: separate issue to be filed for SaaS path parity
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
208 lines
8.0 KiB
Go
package handlers

// plugins_atomic.go — atomic install pattern for plugin delivery into a
// running workspace container. Closes molecule-core#114.
//
// Replaces the prior "tar + docker.CopyToContainer to /configs/plugins/<name>"
// single-step write (no atomicity, no marker, no rollback) with a 4-step
// dance:
//
//  1. STAGE — extract tar into /configs/plugins/.staging/<name>.<ts>/
//  2. SNAPSHOT — if /configs/plugins/<name>/ exists, mv to .previous/<name>.<ts>/
//  3. SWAP — mv /configs/plugins/.staging/<name>.<ts>/ → /configs/plugins/<name>/
//  4. MARKER — touch /configs/plugins/<name>/.complete
//
// If the swap fails after a snapshot was taken, we attempt a best-effort
// rollback by mv-ing the previous snapshot back into place. A marker-write
// failure is not rolled back: the new content stays live, just unmarked.
// The .complete marker is the canonical "this install is fully landed"
// signal — workspace-side plugin loaders should refuse to load a plugin
// dir without it.
//
// Scope: docker path only (workspace running as a local container). The
// SaaS path (deliverViaEIC, SSH-into-EC2) is unchanged in this PR; tracked
// as a follow-up. The same stage-then-swap shape applies but the exec
// primitives differ (ssh vs docker exec), and shipping both paths in one
// PR doubles the test surface.

import (
	"bytes"
	"context"
	"fmt"
	"path"
	"strings"
	"time"

	"github.com/docker/docker/api/types/container"
)

const (
	pluginsRoot       = "/configs/plugins"
	pluginsStagingDir = "/configs/plugins/.staging"
	pluginsPrevDir    = "/configs/plugins/.previous"
	completeMarker    = ".complete"
)

// installVersion identifies one install attempt: the plugin name plus a
// second-precision UTC timestamp suffix. Used to namespace the staging dir
// and any snapshot of the previous version, so reinstalls of the same
// plugin that start in different seconds never share a path. Same-second
// collisions are left to the upstream serialization of installs.
type installVersion struct {
	plugin string
	stamp  string // e.g. 20260508T141530Z
}

func newInstallVersion(plugin string) installVersion {
	return installVersion{
		plugin: plugin,
		stamp:  time.Now().UTC().Format("20060102T150405Z"),
	}
}

// stagedPath is the container path where the new content lands during fetch.
// e.g. /configs/plugins/.staging/molecule-skill-foo.20260508T141530Z
func (v installVersion) stagedPath() string {
	return path.Join(pluginsStagingDir, v.plugin+"."+v.stamp)
}

// previousPath is where the prior live version is moved before swap.
// e.g. /configs/plugins/.previous/molecule-skill-foo.20260508T141530Z
func (v installVersion) previousPath() string {
	return path.Join(pluginsPrevDir, v.plugin+"."+v.stamp)
}

// livePath is the destination after swap.
// e.g. /configs/plugins/molecule-skill-foo
func (v installVersion) livePath() string {
	return path.Join(pluginsRoot, v.plugin)
}

// markerPath is the .complete file inside the live dir written last.
func (v installVersion) markerPath() string {
	return path.Join(v.livePath(), completeMarker)
}

// atomicCopyToContainer does a stage→snapshot→swap→marker install of a
// host-side staged plugin tree into a running container's
// /configs/plugins/<name>/. Returns nil on success.
//
// On swap failure, best-effort rollback restores the previous snapshot to
// the live path and the swap error is returned wrapped; if the rollback
// also fails, the returned error reports both. On marker-write failure the
// new content stays in place and the error carries a manual recovery hint.
func (h *PluginsHandler) atomicCopyToContainer(
	ctx context.Context, containerName, hostDir, pluginName string,
) error {
	v := newInstallVersion(pluginName)

	// Step 0a: ensure staging + previous root dirs exist (idempotent).
	if _, err := h.execAsRoot(ctx, containerName, []string{
		"mkdir", "-p", pluginsStagingDir, pluginsPrevDir,
	}); err != nil {
		return fmt.Errorf("atomic install: mkdir staging/previous: %w", err)
	}

	// Step 0b: tar the host content with a path prefix that lands it in the
	// staging dir — NOT directly into the live name. The prefix has no
	// leading "/" because docker.CopyToContainer extracts paths relative
	// to the dstPath argument we pass below.
	stagedRel := strings.TrimPrefix(v.stagedPath(), "/")
	tarBuf, err := tarHostDirWithPrefix(hostDir, stagedRel)
	if err != nil {
		return fmt.Errorf("atomic install: tar host dir: %w", err)
	}

	// Step 1: STAGE — extract tar into /configs/plugins/.staging/<name>.<ts>/
	if err := h.docker.CopyToContainer(ctx, containerName, "/", &tarBuf,
		container.CopyToContainerOptions{}); err != nil {
		// Best-effort: clean up any partial staging extract before returning.
		_, _ = h.execAsRoot(ctx, containerName, []string{
			"rm", "-rf", v.stagedPath(),
		})
		return fmt.Errorf("atomic install: copy to container: %w", err)
	}

	// Step 2: SNAPSHOT — if a live version exists, move it aside.
	// `test -d` exits 0 if the dir exists, non-zero otherwise; the helper
	// returns a non-nil error in the non-zero case which we treat as
	// "no previous version" rather than a real failure.
	snapshotted := false
	if _, err := h.execAsRoot(ctx, containerName, []string{
		"test", "-d", v.livePath(),
	}); err == nil {
		if _, err := h.execAsRoot(ctx, containerName, []string{
			"mv", v.livePath(), v.previousPath(),
		}); err != nil {
			// Snapshot failure: roll back the staged extract before failing.
			_, _ = h.execAsRoot(ctx, containerName, []string{
				"rm", "-rf", v.stagedPath(),
			})
			return fmt.Errorf("atomic install: snapshot previous version: %w", err)
		}
		snapshotted = true
	}

	// Step 3: SWAP — atomic rename of the staged dir into the live name.
	// `mv` on the same filesystem is a single rename(2), atomic at the FS level.
	if _, err := h.execAsRoot(ctx, containerName, []string{
		"mv", v.stagedPath(), v.livePath(),
	}); err != nil {
		// Swap failure: roll back if we had a snapshot.
		if snapshotted {
			if _, rbErr := h.execAsRoot(ctx, containerName, []string{
				"mv", v.previousPath(), v.livePath(),
			}); rbErr != nil {
				return fmt.Errorf("atomic install: swap failed AND rollback failed: swap=%w, rollback=%v", err, rbErr)
			}
		}
		// Best-effort cleanup of the still-staged dir.
		_, _ = h.execAsRoot(ctx, containerName, []string{
			"rm", "-rf", v.stagedPath(),
		})
		return fmt.Errorf("atomic install: swap to live path: %w", err)
	}

	// Step 4: MARKER — touch .complete inside the live dir as the last write.
	// Workspace-side plugin loaders treat a plugin dir without this marker
	// as half-installed and skip it (or surface a clear error to the
	// operator instead of loading a possibly-partial tree).
	if _, err := h.execAsRoot(ctx, containerName, []string{
		"touch", v.markerPath(),
	}); err != nil {
		// Marker write failure with the new content already in place is a
		// weird state — content is fine on disk, but the plugin loader
		// will refuse to use it. Log loudly; do NOT roll back, since the
		// content is the latest, just unmarked. Operator can manually
		// `touch <plugin>/.complete` to recover.
		return fmt.Errorf("atomic install: write .complete marker (content landed but unmarked, manual recovery: touch %s): %w", v.markerPath(), err)
	}

	// Step 5: GC — best-effort delete the previous snapshot. Failures here
	// just leave a directory; not load-bearing for correctness, the next
	// install or a separate sweeper will reclaim the space.
	if snapshotted {
		_, _ = h.execAsRoot(ctx, containerName, []string{
			"rm", "-rf", v.previousPath(),
		})
	}

	return nil
}

// tarHostDirWithPrefix walks hostDir and writes a tar to a buffer with
// every entry's name prefixed by `prefix`. Mirrors the prior streaming
// shape used in copyPluginToContainer but with a configurable prefix
// (the prior version hardcoded "plugins/<name>/"; we use a full
// staging path so the extracted layout is the staging dir directly).
//
// Symlinks are skipped — same posture as streamDirAsTar elsewhere in
// this package. Skipping prevents a hostile plugin from injecting a
// symlink that, post-extract, points outside the plugin's own dir.
func tarHostDirWithPrefix(hostDir, prefix string) (bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := newTarWriter(&buf)
	if err := tarWalk(hostDir, prefix, tw); err != nil {
		_ = tw.Close()
		return bytes.Buffer{}, err
	}
	// Close explicitly before returning: Close flushes the tar trailer into
	// buf, and a deferred Close would run only after the return value had
	// been copied. Assumes newTarWriter returns an archive/tar *Writer.
	if err := tw.Close(); err != nil {
		return bytes.Buffer{}, fmt.Errorf("atomic install: close tar writer: %w", err)
	}
	return buf, nil
}