Some checks failed
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 4m46s
CI / Detect changes (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
CI / Canvas (Next.js) (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
E2E API Smoke Test / detect-changes (pull_request) Successful in 4s
Harness Replays / detect-changes (pull_request) Successful in 5s
Retarget main PRs to staging / Retarget to staging (pull_request) Has been skipped
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 4s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 53s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 44s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m21s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m28s
Harness Replays / Harness Replays (pull_request) Failing after 43s
CI / Platform (Go) (pull_request) Successful in 3m19s
Two coupled fixes for molecule-core#10 (plugin install 503 vs status=online split-state): 1. SSOT for "is this workspace's container running" — `findRunningContainer` in plugins.go used to carry its own copy of `cli.ContainerInspect`, which collapsed transient daemon errors into the same `""` return as a genuinely-stopped container. Healthsweep's `Provisioner.IsRunning` handled the same input correctly (defensive). Promote the inspect logic to `provisioner.RunningContainerName`, route both consumers through it. Transient errors get a distinct log line on the plugins side so triage doesn't confuse a flaky daemon with a stopped container. 2. Runtime-aware Install/Uninstall — `runtime='external'` workspaces have no local container; push-install via docker exec is meaningless. They pull plugins via the download endpoint instead (Phase 30.3). Without a guard they fell through to `findRunningContainer` and 503'd with a misleading "container not running." Add an early 422 with a hint pointing at the download endpoint. The two fixes are independent: (1) preserves correctness when the SSOT helper is later modified; (2) eliminates the persistent split-state on the 5 external persona-agent workspaces in this DB (and on tenant deployments hitting the same shape). * `internal/provisioner/provisioner.go` — new `RunningContainerName(ctx, cli, id) (string, error)` with three documented outcomes (running / stopped / transient). `Provisioner.IsRunning` now wraps it; behavior preserved. * `internal/handlers/plugins.go` — `findRunningContainer` shimmed onto `RunningContainerName`; new `isExternalRuntime(id)` predicate. * `internal/handlers/plugins_install.go` — Install + Uninstall reject external runtimes with 422 + hint, before the source-fetch step. * `internal/handlers/plugins_install_external_test.go` — 5 cases: external→422, uninstall-external→422, container-backed-falls-through, no-runtime-lookup-fails-open, lookup-error-fails-open. * `internal/handlers/plugins_findrunning_ssot_test.go` — two AST gates pin the SSOT routing so future PRs can't silently re-introduce the parallel impl. Mutation-tested: reverting either consumer to a direct `ContainerInspect` makes the gate fail. Refs: molecule-core#10 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
305 lines
11 KiB
Go
305 lines
11 KiB
Go
package handlers
|
|
|
|
import (
|
|
"archive/tar"
|
|
"bytes"
|
|
"compress/gzip"
|
|
"context"
|
|
"errors"
|
|
"fmt"
|
|
"log"
|
|
"net/http"
|
|
"os"
|
|
"time"
|
|
|
|
"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
|
|
"github.com/Molecule-AI/molecule-monorepo/platform/internal/envx"
|
|
"github.com/Molecule-AI/molecule-monorepo/platform/internal/wsauth"
|
|
"github.com/gin-gonic/gin"
|
|
)
|
|
|
|
// Install handles POST /workspaces/:id/plugins — installs a plugin.
|
|
//
|
|
// Body: {"source": "<scheme>://<spec>"}
|
|
//
|
|
// - {"source": "local://my-plugin"} → install from platform registry
|
|
// - {"source": "github://owner/repo"} → install from GitHub
|
|
// - {"source": "github://owner/repo#v1.2.0"} → pinned ref
|
|
// - {"source": "clawhub://sonoscli@1.2.0"} → when a ClawHub resolver is registered
|
|
//
|
|
// The shape of the plugin (agentskills.io format, MCP server, DeepAgents
|
|
// sub-agent, …) is orthogonal and handled by the per-runtime adapter
|
|
// inside the workspace at startup.
|
|
func (h *PluginsHandler) Install(c *gin.Context) {
|
|
workspaceID := c.Param("id")
|
|
// External-runtime guard (molecule-core#10): push-install via docker
|
|
// exec is meaningless for `runtime='external'` workspaces — they have
|
|
// no local container. Reject early with a hint pointing at the
|
|
// pull-mode endpoint, instead of falling through to a misleading
|
|
// "container not running" 503 from findRunningContainer.
|
|
if h.isExternalRuntime(workspaceID) {
|
|
c.JSON(http.StatusUnprocessableEntity, gin.H{
|
|
"error": "plugin install via push is not supported for external runtimes",
|
|
"hint": "external workspaces pull plugins via GET /workspaces/:id/plugins/:name/download",
|
|
})
|
|
return
|
|
}
|
|
// Cap the JSON body so a pathological POST can't exhaust parser memory.
|
|
bodyMax := envx.Int64("PLUGIN_INSTALL_BODY_MAX_BYTES", defaultInstallBodyMaxBytes)
|
|
c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, bodyMax)
|
|
|
|
// Bound the whole install (fetch + copy) so a slow/malicious source
|
|
// can't tie up an HTTP handler goroutine indefinitely. Overridable
|
|
// via PLUGIN_INSTALL_FETCH_TIMEOUT (duration string, e.g. "10m").
|
|
timeout := envx.Duration("PLUGIN_INSTALL_FETCH_TIMEOUT", defaultInstallFetchTimeout)
|
|
ctx, cancel := context.WithTimeout(c.Request.Context(), timeout)
|
|
defer cancel()
|
|
|
|
var req installRequest
|
|
if err := c.ShouldBindJSON(&req); err != nil {
|
|
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid request body"})
|
|
return
|
|
}
|
|
|
|
result, err := h.resolveAndStage(ctx, req)
|
|
if err != nil {
|
|
var he *httpErr
|
|
if errors.As(err, &he) {
|
|
c.JSON(he.Status, he.Body)
|
|
return
|
|
}
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": "plugin install failed"})
|
|
return
|
|
}
|
|
defer os.RemoveAll(result.StagedDir)
|
|
|
|
// Org plugin allowlist gate (#591).
|
|
// If the workspace's org has a non-empty allowlist, the plugin must be
|
|
// on it. An empty allowlist means allow-all (backward compat).
|
|
if blocked, reason := checkOrgPluginAllowlist(ctx, workspaceID, result.PluginName); blocked {
|
|
c.JSON(http.StatusForbidden, gin.H{"error": reason})
|
|
return
|
|
}
|
|
|
|
if err := h.deliverToContainer(ctx, workspaceID, result); err != nil {
|
|
var he *httpErr
|
|
if errors.As(err, &he) {
|
|
c.JSON(he.Status, he.Body)
|
|
return
|
|
}
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": "plugin deliver failed"})
|
|
return
|
|
}
|
|
|
|
log.Printf("Plugin install: %s via %s → workspace %s (restarting)", result.PluginName, result.Source.Scheme, workspaceID)
|
|
c.JSON(http.StatusOK, gin.H{
|
|
"status": "installed",
|
|
"plugin": result.PluginName,
|
|
"source": result.Source.Raw(),
|
|
})
|
|
}
|
|
|
|
// Uninstall handles DELETE /workspaces/:id/plugins/:name — removes a plugin.
|
|
func (h *PluginsHandler) Uninstall(c *gin.Context) {
|
|
workspaceID := c.Param("id")
|
|
pluginName := c.Param("name")
|
|
ctx := c.Request.Context()
|
|
|
|
// Mirror Install's external-runtime guard (molecule-core#10) so the
|
|
// two endpoints reject the same shape with the same message.
|
|
if h.isExternalRuntime(workspaceID) {
|
|
c.JSON(http.StatusUnprocessableEntity, gin.H{
|
|
"error": "plugin uninstall via docker exec is not supported for external runtimes",
|
|
"hint": "external workspaces manage their own plugin directory; remove it locally",
|
|
})
|
|
return
|
|
}
|
|
|
|
if err := validatePluginName(pluginName); err != nil {
|
|
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid plugin name"})
|
|
return
|
|
}
|
|
|
|
containerName := h.findRunningContainer(ctx, workspaceID)
|
|
if containerName == "" {
|
|
c.JSON(http.StatusServiceUnavailable, gin.H{"error": "workspace container not running"})
|
|
return
|
|
}
|
|
|
|
// Read the plugin's manifest BEFORE deletion to learn which skill dirs
|
|
// it owns, so we can clean them out of /configs/skills/ and avoid the
|
|
// auto-restart re-mounting them. Issue #106.
|
|
skillNames := h.readPluginSkillsFromContainer(ctx, containerName, pluginName)
|
|
|
|
// 1. Strip plugin's rule/fragment markers from CLAUDE.md (mirrors
|
|
// AgentskillsAdaptor.uninstall lines 184-188). Best-effort: if
|
|
// the user edited CLAUDE.md, our marker stays untouched.
|
|
h.stripPluginMarkersFromMemory(ctx, containerName, pluginName)
|
|
|
|
// 2. Remove copied skill dirs declared in the plugin's plugin.yaml.
|
|
for _, skill := range skillNames {
|
|
if err := validatePluginName(skill); err != nil {
|
|
// Defensive: a malformed skill name in plugin.yaml shouldn't
|
|
// turn into a path-traversal exec. Just skip it.
|
|
log.Printf("Plugin uninstall: skipping invalid skill name %q in %s: %v", skill, pluginName, err)
|
|
continue
|
|
}
|
|
_, _ = h.execAsRoot(ctx, containerName, []string{
|
|
"rm", "-rf", "/configs/skills/" + skill,
|
|
})
|
|
}
|
|
|
|
// 3. Delete the plugin directory itself (as root to handle file ownership).
|
|
_, err := h.execAsRoot(ctx, containerName, []string{
|
|
"rm", "-rf", "/configs/plugins/" + pluginName,
|
|
})
|
|
if err != nil {
|
|
log.Printf("Plugin uninstall: failed to remove %s from %s: %v", pluginName, workspaceID, err)
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to remove plugin"})
|
|
return
|
|
}
|
|
|
|
// Verify deletion before restart
|
|
h.execInContainer(ctx, containerName, []string{"sync"})
|
|
|
|
// Auto-restart (small delay to ensure fs writes are flushed)
|
|
if h.restartFunc != nil {
|
|
go func() {
|
|
time.Sleep(2 * time.Second)
|
|
h.restartFunc(workspaceID)
|
|
}()
|
|
}
|
|
|
|
log.Printf("Plugin uninstall: %s from workspace %s (restarting)", pluginName, workspaceID)
|
|
c.JSON(http.StatusOK, gin.H{
|
|
"status": "uninstalled",
|
|
"plugin": pluginName,
|
|
})
|
|
}
|
|
|
|
// Download handles GET /workspaces/:id/plugins/:name/download?source=<scheme://spec>
|
|
//
|
|
// Phase 30.3 — stream the named plugin as a gzipped tarball so remote
|
|
// agents can pull and unpack locally. Replaces the Docker-exec install
|
|
// path for `runtime='external'` workspaces.
|
|
//
|
|
// The `source` query parameter is optional. When omitted we default to
|
|
// `local://<name>` (the platform's curated registry). When set, any
|
|
// registered scheme works — `github://owner/repo`, future `clawhub://…`,
|
|
// etc. — which lets a workspace install plugins from upstream repos
|
|
// without the platform pre-staging them.
|
|
//
|
|
// Auth: requires the workspace's bearer token (same shape as 30.2). A
|
|
// plugin tarball often ships rule text + skill files that reference
|
|
// internal APIs, so we prefer fail-closed on DB errors to prevent a
|
|
// hiccup from turning this into an unauth'd download endpoint.
|
|
func (h *PluginsHandler) Download(c *gin.Context) {
|
|
workspaceID := c.Param("id")
|
|
pluginName := c.Param("name")
|
|
ctx := c.Request.Context()
|
|
|
|
if err := validatePluginName(pluginName); err != nil {
|
|
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid plugin name"})
|
|
return
|
|
}
|
|
|
|
// Auth gate — workspace token required (fail-closed on DB errors).
|
|
hasLive, hlErr := wsauth.HasAnyLiveToken(ctx, db.DB, workspaceID)
|
|
if hlErr != nil {
|
|
log.Printf("wsauth: plugin.Download HasAnyLiveToken(%s) failed: %v", workspaceID, hlErr)
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": "auth check failed"})
|
|
return
|
|
}
|
|
if hasLive {
|
|
tok := wsauth.BearerTokenFromHeader(c.GetHeader("Authorization"))
|
|
if tok == "" {
|
|
c.JSON(http.StatusUnauthorized, gin.H{"error": "missing workspace auth token"})
|
|
return
|
|
}
|
|
if err := wsauth.ValidateToken(ctx, db.DB, workspaceID, tok); err != nil {
|
|
c.JSON(http.StatusUnauthorized, gin.H{"error": "invalid workspace auth token"})
|
|
return
|
|
}
|
|
}
|
|
|
|
// Resolve source — default to local://<name> when caller doesn't
|
|
// specify. This is the common case: pulling a platform-curated
|
|
// plugin by its canonical name.
|
|
source := c.Query("source")
|
|
if source == "" {
|
|
source = "local://" + pluginName
|
|
}
|
|
|
|
// Reuse the existing install-layer bounds so download shares
|
|
// fetch-timeout, body limits, and staged-dir size caps with Install.
|
|
timeout := envx.Duration("PLUGIN_INSTALL_FETCH_TIMEOUT", defaultInstallFetchTimeout)
|
|
fetchCtx, cancel := context.WithTimeout(ctx, timeout)
|
|
defer cancel()
|
|
|
|
result, err := h.resolveAndStage(fetchCtx, installRequest{Source: source})
|
|
if err != nil {
|
|
var he *httpErr
|
|
if errors.As(err, &he) {
|
|
c.JSON(he.Status, he.Body)
|
|
return
|
|
}
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": "plugin download failed"})
|
|
return
|
|
}
|
|
defer os.RemoveAll(result.StagedDir)
|
|
|
|
// Sanity: resolved plugin name must match the URL path param.
|
|
// Resolvers can return a plugin.yaml-derived name that differs
|
|
// from the URL segment; reject the mismatch rather than ship a
|
|
// tarball labeled "foo" that actually contains plugin "bar".
|
|
if result.PluginName != pluginName {
|
|
c.JSON(http.StatusBadRequest, gin.H{
|
|
"error": fmt.Sprintf("source resolved to plugin %q but URL requested %q", result.PluginName, pluginName),
|
|
"resolved_name": result.PluginName,
|
|
"requested_name": pluginName,
|
|
})
|
|
return
|
|
}
|
|
|
|
// Buffer the full tar.gz before writing any response bytes. This lets
|
|
// us emit a clean 5xx if tar packing fails — previously, a partial
|
|
// stream surfaced as HTTP 200 + truncated body, which made remote
|
|
// agents fail at unpack time with cryptic gzip errors instead of
|
|
// distinguishing "platform borked" from "network glitch".
|
|
//
|
|
// Plugin sizes are bounded by PLUGIN_INSTALL_MAX_DIR_BYTES (default
|
|
// 100 MiB) which `resolveAndStage` already validated — buffering at
|
|
// that scale is acceptable. If we ever raise the cap above ~500 MiB,
|
|
// switch to a temp file backed io.ReadSeeker and use http.ServeContent.
|
|
var buf bytes.Buffer
|
|
gz := gzip.NewWriter(&buf)
|
|
tw := tar.NewWriter(gz)
|
|
if err := streamDirAsTar(result.StagedDir, tw); err != nil {
|
|
log.Printf("plugin.Download: tar pack failed for %s: %v", pluginName, err)
|
|
c.JSON(http.StatusInternalServerError, gin.H{
|
|
"error": "failed to pack plugin",
|
|
"plugin": pluginName,
|
|
})
|
|
return
|
|
}
|
|
if err := tw.Close(); err != nil {
|
|
log.Printf("plugin.Download: tar close failed for %s: %v", pluginName, err)
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to finalize tar"})
|
|
return
|
|
}
|
|
if err := gz.Close(); err != nil {
|
|
log.Printf("plugin.Download: gzip close failed for %s: %v", pluginName, err)
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to finalize gzip"})
|
|
return
|
|
}
|
|
|
|
c.Header("Content-Type", "application/gzip")
|
|
c.Header("Content-Disposition", fmt.Sprintf(`attachment; filename="%s.tar.gz"`, pluginName))
|
|
c.Header("Content-Length", fmt.Sprintf("%d", buf.Len()))
|
|
c.Header("X-Plugin-Name", pluginName)
|
|
c.Header("X-Plugin-Source", result.Source.Raw())
|
|
if _, err := c.Writer.Write(buf.Bytes()); err != nil {
|
|
log.Printf("plugin.Download: response write failed for %s: %v", pluginName, err)
|
|
}
|
|
}
|