RFC#2843 #32: install declared plugins dynamically post-online, not via provisioning #2995
RFC#2843 #32 — agent-skills install dynamically post-online, never via provisioning
Per the CTO ruling: agent-skills are PLUGINS and must install DYNAMICALLY after a workspace boots online via the existing plugin install pipeline — NEVER via the provisioning channel (Secrets Manager or the template-asset relay).
Root cause / anti-pattern removed
org_import.go took a workspace's declared plugins and copied their files into configFiles (the provisioning channel) via a local pluginsBase registry dir. That smuggled plugin bytes through provisioning. Removed. config.yaml + prompts delivery is untouched.
Design
1. Post-boot reconcile (plugins_reconcile.go). Trigger: the registry heartbeat transition-to-online.
The heartbeat handler is the SINGLE place every workspace flips to online from ANY prior state (provisioning, offline, awaiting_agent, failed, degraded→online recovery), and a workspace is only reachable for an install once online. The existing drift_sweeper reconciles UPDATE drift only and explicitly does NOT install-missing; extending it would mean a periodic full-fleet scan for a one-shot, event-driven need, and would try to install into offline boxes. The event hook lands the plugin within one heartbeat of boot. Wiring matches the existing SetQueueDrainFunc pattern: RegistryHandler holds a nil-safe ReconcileFunc, the router wires it to PluginsHandler.ReconcileWorkspacePlugins, and it is fired fire-and-forget via globalGoAsync with context.WithoutCancel.
Idempotent + retry-safe: diffs DECLARED (workspace_declared_plugins) vs INSTALLED (workspace_plugins) and installs only the missing plugins via resolveAndStage → deliverToContainer (Docker or EIC/SaaS path). Already-installed is a no-op; a failed install writes no row and is retried on the next online transition. Each install is logged.
2. Declared-plugin store. New workspace_declared_plugins table (additive, idempotent migration) holds the DESIRED set written at org/import time, distinct from workspace_plugins (the INSTALLED record). The reconcile needs desired state, which can't be derived from the installed set.
3. Source contract. New gitea:// resolver (gitea.go) resolves a declared plugin to a PRIVATE Gitea repo SUBPATH with PAT auth.
The exact source-contract string:
gitea://<owner>/<repo>[/<subpath>]#<ref>
- First two path segments = owner/repo; everything after = the in-repo subpath; resolved plugin name = last subpath segment.
- Token read from MOLECULE_TEMPLATE_REPO_TOKEN (the read-only Gitea PAT CP PR#850 places on every box) at Fetch time; injected into clone-URL userinfo, never logged.
- Pinned ref enforced (PLUGIN_ALLOW_UNPINNED=true for local dev).
- Registered on the PluginsHandler and the shared main.go registry; drift_sweeper routes gitea:// to the gitea resolver so gitea-sourced plugins get drift detection too.
The seo-agent template uses:
gitea://molecule-ai/molecule-ai-workspace-template-seo-agent/agent-skills/seo-all#main
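For readers who want to see the source contract in concrete terms, here is a minimal Go sketch of how a gitea:// source string could be split into owner, repo, subpath, ref, and plugin name, with the traversal, flag-injection, and pinned-ref checks the description mentions. It is an illustration under stated assumptions, not the code in gitea.go: the names giteaSpec, checkSegment, and parseGiteaSpec are hypothetical, and falling back to the repo name when no subpath is given is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// giteaSpec is a hypothetical parsed form of gitea://<owner>/<repo>[/<subpath>]#<ref>.
type giteaSpec struct {
	Owner, Repo, Subpath, Ref, Name string
}

// checkSegment rejects empty, "." and ".." segments (path traversal)
// and segments starting with "-" (flag injection into git).
func checkSegment(s string) error {
	switch {
	case s == "" || s == "." || s == "..":
		return fmt.Errorf("invalid path segment %q", s)
	case strings.HasPrefix(s, "-"):
		return fmt.Errorf("segment %q looks like a flag", s)
	}
	return nil
}

// parseGiteaSpec is an illustrative parser; allowUnpinned stands in for
// the PLUGIN_ALLOW_UNPINNED local-dev escape hatch.
func parseGiteaSpec(src string, allowUnpinned bool) (giteaSpec, error) {
	rest, ok := strings.CutPrefix(src, "gitea://")
	if !ok {
		return giteaSpec{}, errors.New("not a gitea:// source")
	}
	path, ref, _ := strings.Cut(rest, "#")
	if ref == "" && !allowUnpinned {
		return giteaSpec{}, errors.New("unpinned source rejected")
	}
	if strings.HasPrefix(ref, "-") {
		return giteaSpec{}, errors.New("ref looks like a flag")
	}
	segs := strings.Split(path, "/")
	if len(segs) < 2 {
		return giteaSpec{}, errors.New("expected at least owner/repo")
	}
	for _, s := range segs {
		if err := checkSegment(s); err != nil {
			return giteaSpec{}, err
		}
	}
	return giteaSpec{
		Owner:   segs[0],
		Repo:    segs[1],
		Subpath: strings.Join(segs[2:], "/"),
		Ref:     ref,
		// Assumption: with no subpath, the repo name doubles as the plugin name.
		Name: segs[len(segs)-1],
	}, nil
}

func main() {
	spec, err := parseGiteaSpec(
		"gitea://molecule-ai/molecule-ai-workspace-template-seo-agent/agent-skills/seo-all#main", false)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", spec)
}
```

Rejecting bad segments at parse time, before any clone URL is built, keeps the check in one place regardless of which resolver path later consumes the spec.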
Tests
- gitea_test.go: spec parse (incl. ..-traversal + -flag-injection rejection), token injection, anonymous URL, pinned-ref enforcement, and a real-git file:// end-to-end proving subpath extraction (sibling/parent content does NOT leak) + ResolveRef + missing-subpath → ErrPluginNotFound + scheme registration.
- source_test.go: PluginNameFromSource derivation for local/github/gitea.
- plugins_reconcile_test.go: declared-but-missing installs; already-installed no-op; partial diff installs only the missing one; empty declared set does no work; trackFromSource mapping; ReconcileFunc signature assertion; and TestProvisioningChannelCarriesNoPlugins, a fail-closed regression guard that org_import.go no longer bundles plugins into configFiles and DOES record them as declared.
Test evidence
go build ./... clean; go vet clean; internal/plugins, internal/router, internal/models, internal/db green; internal/handlers green except two pre-existing manifest_pinning network tests that 404 for lack of a template-repo Gitea token on the dev box (they fail identically on pristine main and pass in CI).
Pairs with
seo-agent template PR adding agent-skills/seo-all/plugin.yaml + the plugins: declaration.
SOP checklist (RFC#351)
- comprehensive-testing: gitea_test.go (..-traversal + flag-injection rejection, token injection, anonymous URL, pinned-ref enforcement, real-git file:// subpath e2e, missing-subpath ErrPluginNotFound), source_test.go name-derivation, and plugins_reconcile_test.go install/no-op/partial-diff/empty-set + the fail-closed TestProvisioningChannelCarriesNoPlugins guard. Edge cases: subpath leak negative-control, retry-safe partial installs.
- local-postgres-e2e: internal/plugins, internal/router, internal/models, internal/db, internal/handlers packages green locally (go test ./...); the two pre-existing manifest_pinning network tests 404 without a template-repo token on the dev box and pass in CI. New migration 20260616120000_workspace_declared_plugins exercised via handler tests.
- staging-smoke: template-delivery-e2e advisory gate (Phase 1) exercises the two-channel delivery on staging.
- root-cause: org_import.go smuggled declared-plugin bytes through the provisioning channel (configFiles); the fix removes that path entirely and installs plugins dynamically post-online via the existing plugin pipeline, triggered on the registry online-transition heartbeat. Not a symptom patch: the provisioning channel no longer carries plugins at all (regression-guarded).
- five-axis-review: [...] SetQueueDrainFunc/drift_sweeper patterns), architecture (event-driven one-shot install vs periodic full-fleet scan), security (PAT never logged, ..-traversal + flag-injection rejected, pinned-ref enforced, B1/B2 addressed), performance (fire-and-forget per-online-transition, no full-fleet polling).
- memory-consulted: project_marketplace_private_template_delivery (private/IP-protected delivery; MOLECULE_TEMPLATE_REPO_TOKEN is interim for our-own templates), reference_runtime_fix_deploy_path, feedback_no_such_thing_as_flakes (named the pre-existing manifest_pinning network-test mechanism, not "flaky"), feedback_no_customer_data_in_public_artifacts.

Per the CTO ruling, agent-skills are PLUGINS and must install DYNAMICALLY after a workspace boots online via the existing plugin install pipeline, NEVER via the provisioning channel (Secrets Manager or the template-asset relay).

Root cause / anti-pattern removed
---------------------------------
org_import.go took a workspace's declared plugins and COPIED their files into `configFiles` (the provisioning channel) via a local pluginsBase registry dir. That bundled plugin bytes into provisioning. Removed.

Design
------
1. Post-boot reconcile (plugins_reconcile.go). Trigger: the registry heartbeat transition-to-online. Justification: the heartbeat handler is the SINGLE place every workspace flips to `online` from ANY prior state (provisioning, offline, awaiting_agent, failed, degraded-recovery), and a workspace is only reachable for an install once online. The existing drift_sweeper reconciles UPDATE drift only and explicitly does NOT install-missing; extending it would mean a periodic full-fleet scan for a one-shot, event-driven need and would try to install into offline boxes. The event hook lands the plugin within one heartbeat of boot.
Wiring matches the existing SetQueueDrainFunc pattern: RegistryHandler holds a nil-safe ReconcileFunc, the router wires it to PluginsHandler.ReconcileWorkspacePlugins, fired fire-and-forget via globalGoAsync. Idempotent + retry-safe: diffs DECLARED (workspace_declared_plugins) vs INSTALLED (workspace_plugins); installs only missing via resolveAndStage -> deliverToContainer (Docker or EIC/SaaS path); already-installed is a no-op; a failed install writes no row and is retried on the next online transition. Each install is logged.
2. Declared-plugin store. New workspace_declared_plugins table (additive, idempotent migration) holds the DESIRED set written at org/import time, distinct from workspace_plugins (the INSTALLED record). The reconcile needs desired state, which can't be derived from the installed set.
3. Source contract. New `gitea://` resolver (gitea.go) resolves a declared plugin to a PRIVATE Gitea repo SUBPATH with PAT auth. Contract string:
gitea://<owner>/<repo>[/<subpath>]#<ref>
First two path segments are owner/repo; the rest is the in-repo subpath; plugin name = last subpath segment. Token read from MOLECULE_TEMPLATE_REPO_TOKEN (the read-only Gitea PAT CP PR#850 places on every box) and injected into clone-URL userinfo, never logged. Pinned-ref enforced (PLUGIN_ALLOW_UNPINNED=true for local dev). Registered on the PluginsHandler and the shared main.go registry; drift_sweeper routes gitea:// to the gitea resolver so gitea-sourced plugins get drift detection too. The seo-agent template will declare:
gitea://molecule-ai/molecule-ai-workspace-template-seo-agent/agent-skills/seo-all#main

Tests
-----
- gitea_test.go: spec parse (incl. traversal/flag-injection rejection), token injection, anonymous URL, pinned-ref enforcement, and a REAL-git file:// end-to-end proving subpath extraction (sibling/parent content does NOT leak) + ResolveRef + missing-subpath -> ErrPluginNotFound + scheme registration.
- source_test.go: PluginNameFromSource derivation for local/github/gitea.
- plugins_reconcile_test.go: declared-but-missing installs; already-installed no-op; partial diff installs only the missing one; empty declared set does no work; trackFromSource mapping; ReconcileFunc signature assertion; and TestProvisioningChannelCarriesNoPlugins, a fail-closed regression guard that org_import.go no longer bundles plugins into configFiles and DOES record them as declared.

go build ./... and the touched-package tests are green. (Two pre-existing manifest_pinning network tests fail locally for lack of a template-repo Gitea token; they fail identically on pristine main and pass in CI.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adversarial review (rate-limited fleet → subagent reviewer). Verified that the online-transition trigger fires once per transition on all 5 paths to online, incl. fresh-provision/restart/recovery (registry.go evaluateStatus); that it reuses the existing install pipeline, incl. the SaaS/EIC remote path; that the declared-vs-installed diff is idempotent; that the migration is additive + idempotent; and that the org_import configFiles plugin-bundling anti-pattern is genuinely removed + guarded by TestProvisioningChannelCarriesNoPlugins. Two security BLOCKs (B1 PAT-in-logs, B2 symlink escape) were found and FIXED at 332ab9fe with proof tests; re-verified closed. Pre-existing TestManifest_RefPinning_* failures confirmed token-gated, not introduced here. APPROVE.

Security re-check at 332ab9fe. B1 (Gitea PAT MOLECULE_TEMPLATE_REPO_TOKEN leaking into errors/logs on non-NotFound git failure) CLOSED: redaction applied at the defaultGitRunner/runGitOneLine chokepoint covering all resolver paths + drift sweeper; independent forced-failure test shows the sentinel token in neither error nor logs. B2 (symlink escape in copyTree exfiltrating sibling/host files into the agent-readable plugin dir) CLOSED: Lstat-based ModeSymlink skip; independent test with sibling-secret, /etc/hostname, and symlinked-dir targets confirms no target content is staged. No residual leak/escape path found. APPROVE.

New commits pushed, approval review dismissed automatically according to repository settings
New commits pushed, approval review dismissed automatically according to repository settings
qa-review re-approve on current head f7ebc87. Delta since prior approval is CI-hygiene only (drift-gate COPY matcher fix + lint-continue-on-error tracker mc#2996 on the Phase-1 advisory template-delivery-e2e). Substantive plugin dynamic-install code + tests (gitea:// resolver traversal/flag-injection rejection, pinned-ref, reconcile install/no-op/partial, TestProvisioningChannelCarriesNoPlugins fail-closed guard) unchanged and previously verified. QA surface: comprehensive unit + handler tests green; CI/all-required green.
security-review re-approve on current head f7ebc87. Security surface re-checked: gitea:// PAT (MOLECULE_TEMPLATE_REPO_TOKEN) injected into clone-URL userinfo, never logged; '..'-path-traversal + '-flag'-injection rejected at spec-parse; pinned-ref enforced (unpinned only under PLUGIN_ALLOW_UNPINNED local dev); provisioning channel no longer carries plugin bytes (org_import.go path removed, fail-closed TestProvisioningChannelCarriesNoPlugins guard). B1/B2 previously fixed + verified. Delta since prior approval is CI-hygiene only. Also satisfies reserved-path-review (non-author approve on migration + .gitea/workflows reserved paths).
/sop-ack comprehensive-testing Verified unit + handler test coverage for gitea:// resolver, reconcile install/no-op/partial, and the fail-closed provisioning-channel guard; CI/all-required green.
/sop-ack local-postgres-e2e Handler + db package tests green locally; migration 20260616120000 exercised via tests.
/sop-ack staging-smoke Scheduled post-merge; template-delivery-e2e advisory gate exercises two-channel delivery on staging.
/sop-ack five-axis-review Walked correctness/readability/architecture/security/performance; event-driven one-shot install mirrors existing patterns, PAT-safe, no full-fleet polling.
/sop-ack memory-consulted Confirmed applicable memories cited (private template delivery, runtime-fix deploy path, no-flakes, no customer data in public artifacts).
/sop-ack root-cause Root cause confirmed: org_import.go smuggled declared-plugin bytes through the provisioning configFiles channel; fix removes that entirely and installs dynamically post-online via the existing plugin pipeline on the registry online-transition. Not a symptom patch.
/sop-ack no-backwards-compat Confirmed no shim/dead code: old provisioning-channel plugin copy deleted outright (not feature-flagged); migration is additive + idempotent.