molecule-core

Author	SHA1	Message	Date
Molecule AI Core-BE	4c78001186	fix(pendinguploads): accept done channel in StartSweeperWithIntervalForTest Fixes a build failure where the TickerFiresAdditionalCycles test called StartSweeperWithIntervalForTest with 5 arguments (ctx, store, ackRetention, interval, done) but the export only accepted 4. Also fixes a pre-existing vet error in org_external.go: a no-op `append(gitArgs(...))` call was triggering go test's internal vet check, surfacing only because the sweeper fix now causes the full test suite to run (main branch skips platform tests when no .go files change, completing in 10s vs 14min for the full suite). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 21:15:49 +00:00
Molecule AI Core-BE	f0021d630a	fix(pendinguploads): use 100ms ticker in TickerFiresAdditionalCycles test TestStartSweeperWithInterval_TickerFiresAdditionalCycles was flaky on loaded CI runners because it called StartSweeperForTest, which passes SweepInterval (5 minutes) as the ticker interval. The test expects ≥2 cycles in a 2-second window, but a 5-minute ticker fires 0-1 times under CPU contention, causing "waited 2s for 2 sweep cycles, got 1". Fix: call StartSweeperWithIntervalForTest directly with a 100ms ticker interval, which is the intended test-harness pattern (per the export_test comment). The done-channel teardown (cancel + <-done) is preserved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 21:15:49 +00:00
Molecule AI Core-BE	36c0a662f0	fix(org): convert map[string]string to map[string]struct{} before IsSatisfied call loadWorkspaceEnv returns map[string]string but EnvRequirement.IsSatisfied expects map[string]struct{}. Without this conversion the Go compiler rejects the call, causing CI / Platform (Go) to fail. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 21:15:49 +00:00
Molecule AI Integration Tester	e8af1df261	fix(org): add per-workspace RequiredEnv preflight check (#232) Before returning 201 on /org/import, verify that every RequiredEnv declared at the workspace level is covered by either: (a) a global secret key (already validated by the existing preflight) (b) a key present in the workspace's .env files (org root .env + per-workspace <files_dir>/.env), matching the resolution order used by createWorkspaceTree at runtime Previously, collectOrgEnv correctly walked all tmpl.Workspaces[].RequiredEnv and added them to the global preflight check, but loadConfiguredGlobalSecretKeys only checked global_secrets. Workspace-specific .env files are injected into workspace_secrets AFTER the 201 response, so an unsatisfied per-workspace RequiredEnv returned 201 and the workspace came up NOT CONFIGURED — breaking on every LLM call with no signal to the operator. Changes: - org_import.go: add PerWorkspaceUnsatisfied struct + collectPerWorkspaceUnsatisfied (mirrors createWorkspaceTree's three-source .env resolution stack) - org.go: after the global preflight block, call collectPerWorkspaceUnsatisfied if orgBaseDir != ""; return 412 with per-workspace details before creating any workspaces - org_workspace_required_env_test.go: 8 unit tests covering global coverage, .env coverage, missing keys, any-of groups, nested children, empty orgBaseDir, and multiple workspaces Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 21:15:49 +00:00
Molecule AI Infra-SRE	b95a20bb9e	fix(provisioner): fix type mismatch in checkTool seam checkToolOnPath must match the checkTool func(tool string) error signature in LocalBuildOptions — Go does not allow assigning a function with (string, error) returns to a func(string) error variable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 18:45:39 +00:00
Molecule AI Infra-SRE	6f0001d04c	fix(provisioner): fail-fast pre-flight check for docker+git in local-build mode Before reaching the clone/build cold path, check that both `docker` and `git` are on PATH. Previously, a missing `docker` would produce a cryptic "exec: docker: executable file not found" from deep inside the docker-has-tag or docker-build call. Now the error surfaces immediately with: local-build: "docker" not found on PATH — local-build mode requires both docker and git; either install them, or set MOLECULE_IMAGE_REGISTRY so local-build is bypassed The check runs before the cache-hit fast path too, since docker is used for image inspect + tag even on a cache hit. Adds checkTool seam to LocalBuildOptions so tests can inject a stub (no-op in makeTestOpts; two new tests exercise the missing-tool path). Fixes issue #529 option B. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 18:32:05 +00:00
core-be	952bfb3ca2	fix(workspace): replace asyncio.get_event_loop().run_until_complete with asyncio.run() (#307) (#498) Co-authored-by: core-be <core-be@agents.moleculesai.app> Co-committed-by: core-be <core-be@agents.moleculesai.app>	2026-05-11 15:37:34 +00:00
Molecule AI Core-BE	aa49dbc728	fix(handlers): add rows.Err() checks after rows.Next() loops Add deferred error checks following rows.Next() iteration in: - ListDelegations (delegation.go): log on error, continue serving results - org import reconcile orphan query (org.go): log + append to reconcileErrs Fixes the rows.Err() gap identified in the delegated rows.Err() check PR (#302, closed; replaced by this PR). Two additional files already had the check (activity.go, memories.go) — pattern applied consistently here. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 06:15:42 +00:00
Molecule AI Core-BE	706df19b43	[core-be-agent] fix(security#321): CWE-22 path traversal guards in loadWorkspaceEnv Two vulnerable call sites confirmed on origin/main: 1. org_helpers.go:loadWorkspaceEnv (line 101): filesDir from untrusted org YAML joined directly with orgBaseDir without traversal guard. A malicious filesDir like "../../../etc" escapes the org root and reads arbitrary files. 2. org_import.go:createWorkspaceTree (line 494): same pattern directly in the env-loading block — not covered by staging-targeted PR #345. Fix (both locations): call resolveInsideRoot(orgBaseDir, filesDir) before filepath.Join. On traversal detection, org_helpers.go returns an empty map (caller contract); org_import.go silently skips the workspace .env override (matches existing template-resolution pattern in the same function). Tests: org_helpers_test.go — 3 cases covering traversal rejection, workspace-override happy path, and empty filesDir edge case. Closes: molecule-core#362, molecule-core#321 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 03:34:55 +00:00
Molecule AI Core-BE	d67c3da13e	fix(platform): A2A proxy ResponseHeaderTimeout 60s -> 180s default, env-configurable	2026-05-11 02:09:06 +00:00
claude-ceo-assistant	65f9df24b8	Merge branch 'main' into fix/external-connection-user-facing-urls	2026-05-10 11:37:44 +00:00
core-be	a355b6f0ad	fix(workspace-server): emit Gitea/PyPI URLs for external user instructions (RFC #229 P2-5) The Molecule-AI GitHub org was suspended 2026-05-06; canonical SCM is now git.moleculesai.app. external_connection.go was still emitting github.com URLs in operator-facing copy-paste blocks, breaking external-agent onboarding silently. Per-site decisions (8 emit sites in 1 file): - L124 (channel template doc comment): swap source-of-truth comment to Gitea host. - L137 /plugin marketplace add Molecule-AI/...: swap to explicit Gitea HTTPS URL form. End-to-end-verified path per internal#37 § 1.A. - L138 /plugin install molecule@molecule-mcp-claude-channel: marketplace name is molecule-channel (per remote .claude-plugin/marketplace.json), not the repo name. Fix to molecule@molecule-channel. - L157 --channels plugin:molecule@molecule-mcp-claude-channel: same marketplace-name fix. - L179 user-facing GitHub URL: swap to Gitea. - L261 pip install git+https://github.com/Molecule-AI/molecule-sdk-python: not on PyPI; swap to git+https://git.moleculesai.app/molecule-ai/... - L310 hermes-channel doc comment: swap source-of-truth comment. - L339 pip install git+https://github.com/Molecule-AI/hermes-channel-molecule: not on PyPI; swap to Gitea. - L369 issue-tracker URL: swap to Gitea. Verification: - molecule-ai-workspace-runtime, codex-channel-molecule are on PyPI (200); no swap needed for those pip lines (they were already package-name form). - molecule-mcp-claude-channel, molecule-sdk-python, hermes-channel-molecule are NOT on PyPI; swapped to git+https://git.moleculesai.app/molecule-ai/ form. All three repos are public on Gitea (default branch main) and serve git-upload-pack unauthenticated (verified curl 200 against /info/refs?service=git-upload-pack). - Third-party github URLs (gin import, openai/codex, NousResearch/ hermes-agent upstream issue trackers, npm @openai/codex) intentionally preserved. 
Adds TestExternalTemplates_NoBrokenMoleculeAIGitHubURLs regression guard to prevent the same broken URLs from re-emerging on future template edits. go vet / go build / existing TestExternal* — all clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 04:23:46 -07:00
core-be	0846ebc1f6	fix(workspace-server): respect MOLECULE_IMAGE_REGISTRY in imagewatch + admin_workspace_images (RFC #229 P2-4) Two surfaces in workspace-server hardcoded `ghcr.io` and silently bypassed the `MOLECULE_IMAGE_REGISTRY` env override that flips every other image operation to the configured private mirror (e.g. AWS ECR in production): 1. internal/imagewatch/watch.go — image-auto-refresh polled `https://ghcr.io/v2/...` and `https://ghcr.io/token` directly. Post- suspension, with the platform pointed at ECR, the watcher silently stopped seeing digest changes (every poll either 404'd or hung on a registry it has no business talking to). 2. internal/handlers/admin_workspace_images.go — Docker Engine auth payload pinned `serveraddress: "ghcr.io"`, so when the operator sets `MOLECULE_IMAGE_REGISTRY=…ecr…/molecule-ai` the engine matched the wrong credential entry on every authenticated pull. Fix: extract `provisioner.RegistryHost()` returning the host portion of `RegistryPrefix()` (e.g. `ghcr.io` ← `ghcr.io/molecule-ai`, or `004947743811.dkr.ecr.us-east-2.amazonaws.com` ← the ECR mirror prefix), and route both surfaces through it. Default behavior is unchanged for OSS users on GHCR. Tests - New `TestRegistryHost_SplitsHostFromOrgPath` and `TestRegistryHost_NeverEmpty` pin the helper across GHCR / ECR / self-hosted Gitea / bare-host edge cases. - New `TestGHCRAuthHeader_RespectsRegistryEnv` asserts the Docker auth payload's `serveraddress` follows MOLECULE_IMAGE_REGISTRY (and never leaks the org-path suffix). - New `TestRemoteDigest_RegistryHostFollowsEnv` stands up an httptest server, points MOLECULE_IMAGE_REGISTRY at it, and confirms both the token endpoint and the manifest HEAD land there — i.e. the full image- watch loop respects the env override end-to-end. Both new tests were verified to FAIL on the pre-fix code path before the helper was wired in, so a future revert can't silently re-introduce the bug. 
Out of scope (followup needed) ECR uses `aws ecr get-authorization-token` (SigV4 + basic-auth) instead of GHCR's `/token?service=…&scope=…` flow. This PR makes the URL host- configurable; the bearer-token negotiation in `fetchPullToken` still speaks the GHCR flavor. On ECR with `IMAGE_AUTO_REFRESH=true`, the watcher will now fail loudly at the token fetch (logged per tick) rather than silently hitting ghcr.io. Operators on ECR should keep IMAGE_AUTO_REFRESH=false until ECR auth is wired — tracked as a separate task. Net effect of this PR alone is strictly better than pre-fix: fail-loud > silent-broken. Refs: RFC #229 P2-4 tier:low Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 04:21:27 -07:00
claude-ceo-assistant	bc555aeb45	Merge pull request 'fix(provisioner): export MOLECULE_MODEL canonical env + read it first; drop stray brace in delegation_test.go' (#286) from fix/molecule-model-env-go into main	2026-05-10 10:52:22 +00:00
hongming-pc2	9b930d8e39	fix(provisioner): export MOLECULE_MODEL (canonical model env) + read it first; drop stray brace in delegation_test.go internal#226 follow-up #1. `molecule_runtime.config` resolves the picked model as `MOLECULE_MODEL` > `MODEL` > (legacy) `MODEL_PROVIDER` (#280) — this side of the boundary now matches: - applyRuntimeModelEnv reads `MOLECULE_MODEL` ahead of `MODEL` / `MODEL_PROVIDER`, and exports BOTH `MOLECULE_MODEL` and `MODEL` (the latter kept for back-compat with everything that already reads `os.environ["MODEL"]`). So a workspace whose secrets carry `MOLECULE_MODEL` (the unambiguous name) is honoured, and the `MODEL_PROVIDER` misnomer — which got set to provider slugs ("minimax") and even runtime names ("claude-code") — is the lowest- priority fallback, exactly as on the runtime side. - the resolution-order comment is updated to flag MODEL_PROVIDER as the legacy-and-misleadingly-named var. Also drops a stray trailing `}` in delegation_test.go (committed in `97768272` "test(delegation): add isDeliveryConfirmedSuccess helper") that made `internal/handlers` fail to parse — one of the things keeping the package from compiling for tests. Tests: TestApplyRuntimeModelEnv_SetsUniversalMODELForAllRuntimes extended to assert MOLECULE_MODEL mirrors MODEL on every case, plus two new cases (MOLECULE_MODEL env fallback; MOLECULE_MODEL beats MODEL_PROVIDER). Could not run `go test ./internal/handlers/` locally — the package is still blocked behind `internal/plugins` `SourceResolver` redeclaration (the #248 plugin-router/resolver refactor, Core-BE's lane); CI validates once that lands. The applyRuntimeModelEnv change is mechanical (same shape as the existing `MODEL` handling) — reviewer please eyeball. Companion: molecule-core#280 (runtime config.py side), molecule-ai-workspace-template-claude-code#14 (CLI-stream-error surfacing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 03:11:41 -07:00
Molecule AI · core-lead	cc4d7fc2c1	Merge branch 'main' into fix/offsec-001-error-message-scrubbing	2026-05-10 10:01:43 +00:00
Molecule AI Core Platform Lead	9e3d420363	[core-lead-agent] fix(core#228): cascade fixes for PluginResolver — make main compile PR #256 introduced PluginResolver to break the SourceResolver redeclaration deadlock, but missed three downstream call-sites that left main uncompilable: 1. plugins/drift_sweeper.go: PluginResolver.Resolve was declared returning PluginResolver (recursive). Registry.Resolve returns the production SourceResolver from source.go, so Registry didn't satisfy PluginResolver. Fix: Resolve returns SourceResolver. Add compile-time assertion that Registry satisfies PluginResolver so any future signature drift fails the build instead of router wiring. 2. plugins/drift_sweeper_test.go: stubResolver was still declared with the old SourceResolver shape AND asserted against SourceResolver — the assertion failed because stubResolver lacks Scheme()/Fetch(). Fix: stub is a PluginResolver; assertion targets PluginResolver. Drop the unused "database/sql" import that fails go vet. 3. router/router.go: - The `70f84823` reorder moved the plgh init block above its dockerCli dependency (line 538 used; line 594 declared). Moved the dockerCli declaration up so it's available where used; replaced the orphaned declaration in the terminal block with a comment. - Setup's pluginResolver param was typed plugins.SourceResolver — wrong for plugins.Registry (Registry is not a per-scheme resolver). Retyped to plugins.PluginResolver, which Registry actually satisfies. - Removed the broken `plgh.WithSourceResolver(pluginResolver)` call — WithSourceResolver expects a per-scheme SourceResolver, not a PluginResolver/registry. plgh has its own internal default registry (github+local) from NewPluginsHandler, so dropping the call is functionally a no-op vs the broken state. Kept the param so the drift sweeper (main.go) can share scheme enumeration when needed. 4. 
go.sum: add the content hash entry for go.moleculesai.app/plugin/ gh-identity/pluginloader (only the /go.mod hash was present, breaking `go build ./cmd/server`). Verified locally: go build ./... ✓ go vet ./... ✓ (only pre-existing org_external append warning) go test ./internal/plugins/... ✓ go test ./internal/router/... ✓ 6 pre-existing handler test failures (TestExecuteDelegation_, TestHandleDiagnose_*) are orthogonal — they did not run before because the package didn't compile. Out of scope for this fix; tracking separately. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 09:46:35 +00:00
Molecule AI Infra-SRE	7d1a189f2e	fix(mcp): scrub err.Error() from JSON-RPC error messages (OFFSEC-001) Replace all three err.Error() leaks in mcp.go with constant strings, consistent with the same fix applied to 22 other files in PRs #1193/1206/1219/#168. - Call handler (line ~329): "parse error: " + err.Error() → "parse error" - dispatchRPC params unmarshal (line ~417): "invalid params: " + err.Error() → "invalid parameters" - dispatchRPC tool call (line ~422): err.Error() → "tool call failed" + log.Printf server-side for forensics Routes protected by WorkspaceAuth (C1) and MCPRateLimiter (C2) — this is defence-in-depth per OFFSEC-001 / #259. Tests added: - TestMCPHandler_Call_MalformedJSON_ReturnsConstantParseError - TestMCPHandler_dispatchRPC_InvalidParams_ReturnsConstantMessage - TestMCPHandler_dispatchRPC_UnknownTool_ReturnsConstantMessage - TestMCPHandler_dispatchRPC_InvalidParams_ArrayInsteadOfObject Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 09:01:51 +00:00
Molecule AI Core-BE	70f8482399	fix(core#248): reorder router.go plugin init before drift handler — plgh ordering fix Plgh was referenced at line 505 before it was created at line 632, causing "undefined: plgh" on main. Moved the entire Plugins block to before the drift handler block. No functional change to registered routes — only declaration order. Combined with `d88a320f` (SourceResolver→PluginResolver rename, SSRF guard placement, and test regressions) this makes main fully compile again. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 08:08:09 +00:00
Molecule AI Core-BE	d88a320f0c	fix: resolve SourceResolver naming conflict, SSRF guard placement, and multiple test regressions - plugins/drift_sweeper.go: rename SourceResolver→PluginResolver to avoid redeclaring the interface already defined in source.go (core#228) - handlers/workspace.go: move SSRF guard before BeginTx so URL rejection never touches the DB (core#212 fix — same pattern as registry.go:324) - handlers/restart_signals.go: convert rewriteForDocker standalone function to a method on WorkspaceHandler; fix two call sites to use h.rewriteForDocker - handlers/plugins.go: change Sources() return type from plugins.SourceResolver to pluginSources (the narrow interface satisfied by Registry) - handlers/admin_plugin_drift.go: remove unused "context" import - handlers/delegation_test.go: remove stray closing brace - handlers/restart_signals_test.go: rewrite with correct miniredis v2 API (mr.Get takes context, mr.Set requires TTL), resolveURLTestWrapper embedding pattern, and corrected Redis key handling - handlers/workspace_test.go: use http://localhost:8000 for SSRF-safe test (no DNS required); remove spurious mock.ExpectExec for Redis CacheURL call Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 06:05:11 +00:00
Molecule AI Core-DevOps	4474ddc189	fix(workspace): add SSRF validation before writing external workspace URL Issue #212: POST /workspaces with runtime=external and a URL wrote the URL directly to the DB without validateAgentURL checking (the same check that registry.go:324 applies to the heartbeat path). An attacker with AdminAuth could register a workspace URL at a cloud metadata endpoint (169.254.169.254) and exfiltrate IAM credentials when the platform fires pre-restart drain signals. Changes: - workspace.go: add validateAgentURL(payload.URL) guard before the UPDATE at line 386. 400 on unsafe URL, no DB write occurs. - workspace_test.go: add 3 regression tests: - TestWorkspaceCreate_ExternalURL_SSRFSafe: safe public URL → 201 - TestWorkspaceCreate_ExternalURL_SSRFMetadataBlocked: 169.254.169.254 → 400 - TestWorkspaceCreate_ExternalURL_SSRFLoopbackBlocked: 127.0.0.1 → 400 Both unsafe tests assert zero DB calls (the handler rejects before any transaction). Ref: issue #212. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 02:30:18 +00:00
Molecule AI Core-BE	d0126662c7	docs: cycle report 2026-05-10 Cycle summary: - Assigned: core#125 (feat: preserve in-flight A2A messages across restart) - Implemented: Phase 1 of #125 — pre-restart drain signal - Opened: PR #207 - Reviewed: PR #140 (static-token fallback, approved) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 01:15:07 +00:00
Molecule AI Core-BE	ada1008012	feat(plugins): plugin drift detector + queue + admin apply endpoint (#123) ## Summary Adds the version-subscription drift detection and operator-apply workflow for per-workspace plugin tracking (core#113). ## Components Migration (`20260510000000_plugin_drift_queue`): - Adds `installed_sha` column to `workspace_plugins` — records the commit SHA installed so the drift sweeper can compare against upstream. - Creates `plugin_update_queue` table with status: pending | applied | dismissed. - Adds partial unique index to prevent duplicate pending rows per (workspace_id, plugin_name). GithubResolver (`github.go`): - `LastFetchSHA` field + `LastSHA()` getter — populated by `Fetch` after a successful shallow clone (captured before `.git` is stripped). Used by the install pipeline to seed `installed_sha`. - `ResolveRef(ctx, spec)` method — resolves a plugin spec to its full commit SHA using `git fetch --depth=1 + git rev-parse`. Used by the drift sweeper to get the current upstream SHA for a tracked ref (tag:vX.Y.Z, tag:latest, sha:…, or bare branch). Drift sweeper (`plugins/drift_sweeper.go`): - Periodic sweep every 1h: SELECTs rows where `tracked_ref != 'none' AND installed_sha IS NOT NULL`, resolves upstream SHA, queues drift if different. - `ListPendingUpdates()` — reads pending queue rows for the admin endpoint. - `ApplyDriftUpdate()` — marks entry applied (idempotent). - ctx.Err() guard on ticker arm to avoid post-shutdown work. Install pipeline (`plugins_install_pipeline.go`, `plugins_tracking.go`, `plugins_install.go`): - `stageResult.InstalledSHA` field — carries the SHA from Fetch to the DB. - `recordWorkspacePluginInstall` now accepts and stores `installed_sha`. - `deleteWorkspacePluginRow` — removes tracking row on uninstall so a stale SHA doesn't prevent the next install from creating a fresh row. - Both Docker and EIC uninstall paths call `deleteWorkspacePluginRow`. 
Admin endpoints (`handlers/admin_plugin_drift.go`): - `GET /admin/plugin-updates-pending` — list all pending drift entries. - `POST /admin/plugin-updates/:id/apply` — re-installs plugin from source_raw (re-fetching the same tracked ref), records the new SHA, marks entry applied, triggers workspace restart. Idempotent (already-applied returns 200). Router wiring (`router.go`, `cmd/server/main.go`): - Plugin registry created in main.go and shared between PluginsHandler and drift sweeper. - `router.Setup` accepts optional `pluginResolver` param. - `PluginsHandler.Sources()` export for the sweeper wiring pattern. ## Tests - `plugins/github_test.go` — `ResolveRef` coverage (invalid spec, git error, not-found mapping, no-panic for all ref shapes). - `plugins/drift_sweeper_test.go` — `ResolveRef` happy path, stub resolver interface compliance. - `handlers/admin_plugin_drift_test.go` — ListPending (empty, non-empty, DB error), Apply (not found, already applied, already dismissed, workspace_plugins missing). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 00:39:50 +00:00
Molecule AI Core-DevOps	e29b166f60	fix(test): poll error counter to 0 before asserting in RecordsMetricsOnSuccess Race-detector CI runs (-race) slow goroutines enough that a prior sweeper goroutine (e.g. TestStartSweeper_TransientErrorDoesNotCrashLoop) can still be running and incrementing pendingUploadsSweepErrors after metricDelta() captures its baseline, but before the success-path sweeper records its success metrics. The test then reads deltaError=1 instead of 0. Fix: add waitForMetricDelta(t, deltaError, 0, 2*time.Second) before the assertion, matching the polling pattern already used in the error-path test (TestStartSweeper_RecordsMetricsOnError). This ensures the error counter has settled before we assert on it. Fixes molecule-core#22. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 23:27:19 +00:00
Molecule AI Core Platform Lead	bd0a52a9a1	merge main into infra/fix-issue-151: keep PR #183 root-skip wording in local_test.go	2026-05-09 22:51:03 +00:00
Molecule AI Core-DevOps	eaf7dbb7c4	fix(handlers): auto-restart workspace after file write/delete/replace PUT /workspaces/:id/files and DELETE /workspaces/:id/files updated the config volume but never restarted the container, so the running agent continued serving stale file content from its in-memory cache. The SecretsHandler already had this pattern (issue #15); TemplatesHandler was missing it. Fix: after every successful write/delete in WriteFile, DeleteFile, and ReplaceFiles, call h.wh.RestartByID(workspaceID) asynchronously, guarded by h.wh != nil (nil-tolerant for callers that only use read-only surfaces). The RestartByID coalescing gate prevents thundering-herd on concurrent requests. Fixes #151. Fixes #87 (duplicate effort closed — core-be also filed #183). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 22:43:27 +00:00
Molecule AI Core Platform Lead	ede4551c73	Merge remote-tracking branch 'origin/main' into trig-185	2026-05-09 22:41:01 +00:00
Molecule AI Core-BE	2077cf4054	[core-be-agent] fix(pendinguploads/test): correct sweeper test isolation Issue #86: TestStartSweeper_RecordsMetricsOnSuccess fails in full-suite. Root cause: two cooperating bugs in the sweeper test harness. 1. Sweeper loop called sweepOnce after ctx cancellation (double-increment). When ctx was cancelled the loop's select received ctx.Done(), called sweepOnce with the cancelled ctx, storage.Sweep returned context error, and metrics.PendingUploadsSweepError() incremented the error counter a SECOND time before the loop exited. Subsequent tests captured a polluted error baseline and their deltaError assertions failed. 2. Tests called defer cancel() without waiting for the goroutine to exit. The goroutine could still be blocked on Sweep (waiting for the next ticker's C channel) when the next test called metricDelta(). If the goroutine's Sweep returned during the next test's measurement window, the shared metric counters mutated mid-baseline. Fix (production code): - Guard the ticker arm: if ctx.Err() != nil, continue instead of calling sweepOnce. This prevents the post-cancellation sweep from running. Fix (test harness): - startSweeperWithInterval gains a done chan struct{} parameter. When the loop exits the channel is closed exactly once. - StartSweeperForTest starts the goroutine and returns the done channel, allowing tests to drain it with <-done after cancel() — guaranteeing the goroutine has fully terminated before the next test's baseline. All 8 sweeper tests now use StartSweeperForTest and drain the done channel before returning, ensuring stable metric baselines across the full suite. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 22:30:28 +00:00
Molecule AI Core-DevOps	e65633bf15	fix(test): skip TestLocalResolver_BubblesUpCopyFailure when uid==0 os.Chmod(dst, 0o555) silently passes when os.Geteuid() == 0 because root bypasses POSIX permission checks. A previous attempt to use a symlink to /dev/full also fails: Go's os.MkdirAll resolves the symlink during path traversal and the kernel allows mkdir("/dev/full") as a device-table entry — io.Copy to /dev/full then succeeds with 0 bytes written and returns nil. The honest, consistent fix mirrors TestLocalResolver_CopyFileSourceUnreadable: skip when running as root. The write-failure propagation logic is exercised correctly in non-root CI environments. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 22:22:44 +00:00
Molecule AI Core-BE	e3ea8ff74a	[core-be-agent] fix(plugins/test): skip TestLocalResolver_BubblesUpCopyFailure when running as root Fixes issue #87: the test sets chmod(dst, 0o555) to make the destination read-only and asserts the copy fails. On Linux, root bypasses filesystem permissions and can write to 0o555 directories, so the copy succeeds when running as root and the assertion fails. Fix: check os.Getuid() == 0 at the start of the test and skip with a clear message. Mirrors the existing skip in TestLocalResolver_CopyFileSourceUnreadable (line 175) which already handles the same root-bypass issue for unreadable source files. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 22:21:35 +00:00
Molecule AI Core Platform Lead	97768272a3	test(delegation): add isDeliveryConfirmedSuccess helper + 10-case table test [core-lead-agent] Closes the regression-test gap on PR #170 (Core-BE's fix for #159 retry-storm). Original PR shipped the inline conditional without a unit test; this commit: 1. Extracts the inline `(proxyErr != nil && len(respBody) > 0 && 2xx)` predicate into a named helper `isDeliveryConfirmedSuccess`. Same behavior; the call site now reads `if isDeliveryConfirmedSuccess(...)`. 2. Adds `TestIsDeliveryConfirmedSuccess` — 10-case table test covering: - The new branch (2xx + body + transport error → recover as success): status=200, status=299, status=200+min-body - Each precondition failing in isolation: * nil proxyErr → false (no decision) * empty/nil body → false (no work to recover) * 4xx/5xx/3xx body → false (agent-signalled failure or redirect) * <200 status → false (not 2xx) Test-pattern mirrors the existing `TestIsTransientProxyError_Retries...` and `TestIsQueuedProxyResponse` table tests in the same file — same file-local mock-error pattern, no new test infra.	2026-05-09 22:12:04 +00:00
Molecule AI Core-BE	21a5c31b85	[core-be-agent] fix: Treat delivery-confirmed proxy errors as delegation success Two-part fix for issue #159 — successful delegation responses were rendered as error banners: PART 1 — a2a_proxy.go: When io.ReadAll fails mid-stream (e.g., TCP connection drops after the agent sent its 200 OK response), the prior code returned (0, nil, BadGateway) discarding both the HTTP status code and any partial body bytes already received. Fix: return (resp.StatusCode, respBody, error) so callers can inspect what was delivered even when the body read failed. PART 2 — delegation.go: New condition in executeDelegation after the transient-error retry block: if proxyErr != nil && len(respBody) > 0 && status >= 200 && status < 300 { goto handleSuccess } When proxyA2ARequest returns a delivery-confirmed error (status 2xx + non-empty partial body), route to success instead of failure. This prevents the retry-storm pattern where the canvas shows "error" with a Restart-workspace suggestion even though the delegation actually completed and the response is available. Regression tests (delegation_test.go): - TestExecuteDelegation_DeliveryConfirmedProxyError_TreatsAsSuccess: server sends 200 + partial body then closes; second attempt succeeds. Verifies the new condition fires for delivery-confirmed 2xx responses. - TestExecuteDelegation_ProxyErrorNon2xx_RemainsFailed: server sends 500 + partial body then closes. Verifies non-2xx routes to failure. - TestExecuteDelegation_ProxyErrorEmptyBody_RemainsFailed: server returns 502 Bad Gateway (empty body, transient). Verifies empty-body errors still route to failure (condition len(respBody) > 0 guards it). - TestExecuteDelegation_CleanProxyResponse_Unchanged: clean 200 OK. Verifies baseline (proxyErr == nil path) is unaffected. Fixes issue #159. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 22:11:54 +00:00
Molecule AI Core-BE	c9cf240751	[core-be-agent] fix(template_import): Remove silent template-dir fallback in ReplaceFiles offline path When the workspace container is offline and writeViaEphemeral fails (docker unavailable), ReplaceFiles previously fell back to writing to the host-side template directory. This silently returned 200 with "source: template" while the file change was invisible after restart because the restart handler reads from the Docker volume, not the template dir (issue #151). Now returns 503 Service Unavailable with a message telling the caller to retry after the workspace starts. The ephemeral write path is the only correct mechanism for offline-container updates. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 21:58:34 +00:00
Molecule AI Core-BE	7079d4ba01	[core-be-agent] fix: Treat delivery-confirmed proxy errors as delegation success When proxyA2ARequest returns an error but we have a non-empty response body with a 2xx status code, the agent completed the work successfully. The error is a delivery/transport error (e.g., connection reset after response was received). Previously, executeDelegation would mark these as "failed" even though the work was done, causing: - Retry storms (canvas suggests restart, user retries) - "error" rendering in canvas even though result is available - Data loss risk from unnecessary restarts Now we check for valid response data before marking as failed. Fixes issue #159. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 21:52:09 +00:00
Molecule AI Core Platform Lead	ea8ac4f023	Merge remote-tracking branch 'origin/main' into tech-debt/rename-net	2026-05-09 21:19:28 +00:00
Molecule AI Core Platform Lead	ad89173f0f	Merge remote-tracking branch 'origin/main' into tech-debt/rename-net	2026-05-09 21:18:46 +00:00
Molecule AI Core Platform Lead	7090eab0d5	fix(workspace-server): sanitize err.Error() leaks in CascadeDelete and OrgImport [core-lead-agent] Closes Core-Security audit finding (2026-05-09 audit cycle, MEDIUM): 1. workspace-server/internal/handlers/workspace_crud.go:335 `DELETE /workspaces/:id` returned `err.Error()` verbatim in the 500 body, leaking wrapped lib/pq driver strings (schema column names, index hints) to HTTP clients. Replaced with sanitized message; raw error already logged server-side via the existing log.Printf immediately above. 2. workspace-server/internal/handlers/org.go:610 `OrgImport` echoed the user-supplied `body.Dir` verbatim in the 404 "org template not found: %s" response. Path traversal is already blocked by resolveInsideRoot earlier in the handler, but echoing raw input back lets a client probe filesystem layout (404-with-echo vs. 400-from-resolve is itself a signal). Dropped the input from the client-facing message; preserved full context in a new log.Printf (orgFile path + the requested body.Dir) for operator triage. Both fixes preserve operator-side diagnostics (logs unchanged in content, only client-facing JSON sanitized). No behavior change for legitimate clients — error type, status code, and JSON shape all stay the same. Tier: low. Defensive hardening only; reduces info-disclosure surface without altering control-flow or auth gates.	2026-05-09 21:01:40 +00:00
Molecule AI Core-DevOps	252f8d0c47	tech-debt: rename molecule-monorepo-net -> molecule-core-net Renames Docker network across all code, configs, scripts, and docs. Per issue #93: the network was named molecule-monorepo-net as a holdover from when the repo was called molecule-monorepo. The canonical repo name is now molecule-core, so the network should be molecule-core-net. Files changed: - docker-compose.yml, docker-compose.infra.yml: network definition - infra/scripts/setup.sh: docker network create - scripts/nuke-and-rebuild.sh: docker network rm - workspace-server/internal/provisioner/provisioner.go: DefaultNetwork - All comments/docs: updated wording Acceptance: grep -rn 'molecule-monorepo-net' returns zero matches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 20:51:48 +00:00
Molecule AI Core-FE	e8f521011f	fix(mcp): write delegation activity row so canvas Agent Comms shows task text MCP delegate_task and delegate_task_async bypassed the delegation activity lifecycle entirely — no activity_log row was written for MCP-initiated delegations. As a result the canvas Agent Comms tab rendered outbound delegations as bare "Delegation dispatched" events with no task body. Fix: insert a delegation row (mirroring insertDelegationRow from delegation.go) before the A2A call so the canvas can show the task text. The sync tool updates status to 'dispatched' after the HTTP call; the async tool inserts with 'dispatched' directly (goroutine won't update). Closes #158. Closes #49 (partial — addresses the canvas-display gap; full lifecycle parity requires DelegationWriter extraction, tracked separately). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 20:44:06 +00:00
claude-ceo-assistant	b3041c13d3	fix(org-import): emit started event after YAML parse so name is populated The org.import.started event was firing immediately after request body bind, before the YAML at body.Dir was loaded. Result: payload.name was "" whenever the caller passed `dir` (the common path — the canvas and all live imports use dir, not inline template). Three started rows already in the local platform's structure_events have empty name. Fix: move the started emit (and importStart timestamp) to after the YAML unmarshal / inline-template fallthrough, where tmpl.Name is guaranteed populated. Bonus: pre-parse error returns (invalid body, traversal-rejected dir, file-not-found, YAML expansion fail, YAML unmarshal fail, neither dir nor template provided) no longer emit an orphan started row — every started is now guaranteed a paired completed/failed. Verified live against running platform: re-imported molecule-dev-only, new started row in structure_events carries "Molecule AI Dev Team (dev-only)" instead of "". Tests: full handler suite green (`go test ./internal/handlers/`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 16:25:24 -07:00
claude-ceo-assistant	bfefcb315b	refactor(handlers): Delete() delegates to CascadeDelete helper Drops ~150 lines of duplicated cascade logic from the Delete HTTP handler — workspace_crud.go's CascadeDelete (added in PR #137) and Delete() were running the same #73 race-guard sequence (status update → canvas_layouts → tokens → schedules → container stop → broadcast), just with Delete() inlined and CascadeDelete owning the OrgImport reconcile path. CascadeDelete now returns the descendant id list (was: count) so Delete() can drive the optional ?purge=true hard-delete against the same set the cascade just touched. Net diff: workspace_crud.go shrinks from ~270 lines in Delete() to ~75 lines (parse + 409 confirm gate + CascadeDelete call + stop-error 500 + purge block + 200 response). Behavior identical — same SQL ordering, same #73 race guard, same response shapes. Three sqlmock tests for the 0-children case gained one extra ExpectQuery for the recursive-CTE descendants scan (the old inline code skipped that query when len(children)==0; CascadeDelete walks unconditionally — returns 0 rows, same end state, one extra cheap query). Tests: full handler suite green (`go test ./internal/handlers/`). Live-tested against the running local platform: DELETE on a fake workspace returns `{"cascade_deleted":0,"status":"removed"}`, fleet of 9 workspaces preserved, refactored handler matches the prior wire-shape exactly. Tracked as the PR #137 follow-up tech-debt item. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 15:47:51 -07:00
claude-ceo-assistant	3de51faa19	fix(org-import): reconcile mode + audit-event emission Closes the additive-import zombie bug — re-running /org/import with a tree shape that reparents same-named roles left the prior workspace online because lookupExistingChild's dedupe is parent-scoped (different parent_id → "different" workspace). Caught 2026-05-08 after a dev-tree re-import left 8 orphans co-existing with the new tree on canvas until manual cascade-delete. Three layers in this PR: - mode="reconcile" on /org/import — after the import loop, online workspaces whose name matches an imported name but whose id isn't in the result set are cascade-deleted. Default mode "" / "merge" preserves existing additive behavior. Empty-set guards prevent accidental "delete everything" if either array comes up empty. - WorkspaceHandler.CascadeDelete extracted as a callable helper from the existing Delete HTTP handler so OrgImport's reconcile path shares the same teardown sequence (#73 race guard, container stop, volume removal, token revocation, schedule disable, event broadcast). The HTTP Delete handler still inlines the same logic; deduplication tracked as tech-debt follow-up. - emitOrgEvent(structure_events) records org.import.started + org.import.completed with mode, created/skipped/reconcile_removed counts, duration_ms, error. Replaces the lost-on-restart stdout-only log shape for an audit-trail surface that's queryable by SQL. Closes the "what happened at 20:13?" debugging gap that motivated this fix. Verified live against the local platform: cascade-delete on an old tree's removed root cleared 8 surviving orphans; mode="reconcile" with a freshly-INSERTed fake orphan removed exactly the fake; idempotent re-run of reconcile is a no-op (0 removed, no errors); structure_events captures every started+completed pair with full payload. 7 new unit tests (walkOrgWorkspaceNames flat/nested/spawning:false/ empty-name; emitOrgEvent success + DB-error-swallow; errString). 
Full handler suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 15:04:47 -07:00
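As a rough illustration of the reconcile rule described in the commit above (online workspaces whose name was imported but whose id is not in the result set are removed, with empty-set guards), here is a minimal Go sketch. The `workspace` type, the `reconcileRemovals` name, and the map-based inputs are hypothetical; the real handler works against the database and calls CascadeDelete.

```go
package main

import "fmt"

// workspace is a hypothetical, trimmed-down view of a workspace row.
type workspace struct {
	ID     string
	Name   string
	Status string
}

// reconcileRemovals returns the IDs of online workspaces that share a name
// with an imported workspace but were not produced by this import.
// If either set is empty it returns nothing, so a bad import can never
// select every workspace for deletion.
func reconcileRemovals(existing []workspace, importedNames, resultIDs map[string]bool) []string {
	if len(importedNames) == 0 || len(resultIDs) == 0 {
		return nil // empty-set guard
	}
	var stale []string
	for _, ws := range existing {
		if ws.Status != "online" {
			continue
		}
		if importedNames[ws.Name] && !resultIDs[ws.ID] {
			stale = append(stale, ws.ID)
		}
	}
	return stale
}

func main() {
	existing := []workspace{
		{ID: "old-1", Name: "Core Lead", Status: "online"}, // orphan from a prior import
		{ID: "new-1", Name: "Core Lead", Status: "online"}, // created by this import
		{ID: "x-9", Name: "Unrelated", Status: "online"},   // name not in the template
	}
	imported := map[string]bool{"Core Lead": true}
	result := map[string]bool{"new-1": true}
	fmt.Println(reconcileRemovals(existing, imported, result))
}
```

Running the same call a second time after the orphan is gone yields an empty list, which matches the idempotent re-run the commit reports.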
claude-ceo-assistant	6f861926bd	Merge pull request 'fix(workspace_provision): preserve MODEL secret over MODEL_PROVIDER slug on restart' (#136) from fix/preserve-model-secret-on-restart into main	2026-05-08 21:31:50 +00:00
claude-ceo-assistant	15c5f32491	fix(workspace_provision): preserve MODEL secret over MODEL_PROVIDER slug on restart Phase 4 follow-up to template-claude-code PR #9 (2026-05-08 dev-tree wedge). Pre-fix: applyRuntimeModelEnv unconditionally overwrote envVars["MODEL"] with the MODEL_PROVIDER slug whenever payload.Model was empty (the restart path). This silently wiped the operator's explicit per-persona MODEL secret on every restart. Symptom: dev-tree workspaces booted correctly on first /org/import (the envVars map was populated direct from the persona env file with both MODEL=MiniMax-M2.7-highspeed and MODEL_PROVIDER=minimax), then on the next Restart the MODEL secret got clobbered to literal "minimax" — a provider slug, not a valid model id — and the workspace template's adapter failed to match any registry prefix, fell through to providers[0] (anthropic-oauth), and wedged at SDK initialize. Fix: resolution order in applyRuntimeModelEnv is now: 1. payload.Model (caller passed the canvas-picked model id verbatim) 2. envVars["MODEL"] (workspace_secret persisted from persona env) 3. envVars["MODEL_PROVIDER"] (legacy canvas Save+Restart shape) Tests ----- TestApplyRuntimeModelEnv_PersonaEnvMODELSecretPreserved — locks in the new resolution order with four cases: - MODEL secret wins over MODEL_PROVIDER slug (persona-env shape) - MODEL secret wins even when same as MODEL_PROVIDER - MODEL absent → fall back to MODEL_PROVIDER (legacy shape) - Both absent → no MODEL set (no-op) Existing TestApplyRuntimeModelEnv_SetsUniversalMODELForAllRuntimes continues to pass — fix is strictly additive on the precedence chain.	2026-05-08 14:31:14 -07:00
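The three-step resolution order in the commit above can be sketched in Go as follows. `resolveModel` is a hypothetical stand-in, not the body of applyRuntimeModelEnv; it returns the chosen value rather than mutating the env map so the precedence is easy to read.

```go
package main

import "fmt"

// resolveModel picks the MODEL value using the precedence described in the
// commit: payload model, then the MODEL secret, then the MODEL_PROVIDER slug.
// The boolean is false when none of the three sources is set (no-op case).
// Name and signature are assumptions made for this sketch.
func resolveModel(payloadModel string, envVars map[string]string) (string, bool) {
	if payloadModel != "" {
		return payloadModel, true
	}
	if m := envVars["MODEL"]; m != "" {
		return m, true
	}
	if p := envVars["MODEL_PROVIDER"]; p != "" {
		return p, true
	}
	return "", false
}

func main() {
	// Restart path: payload model is empty, persona env carries both keys.
	env := map[string]string{
		"MODEL":          "MiniMax-M2.7-highspeed",
		"MODEL_PROVIDER": "minimax",
	}
	model, ok := resolveModel("", env)
	fmt.Println(model, ok) // prints "MiniMax-M2.7-highspeed true"
}
```

With the pre-fix behaviour the same restart would have produced the bare slug "minimax", which is the clobbering the commit describes.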
claude-ceo-assistant	9b5e89bb42	Merge pull request 'feat(org-import): add spawning:false field to skip workspace + descendants' (#135) from feat/org-import-spawning-false into main	2026-05-08 21:20:56 +00:00
claude-ceo-assistant	b91da1ab77	feat(org-import): add spawning:false field to skip workspace + descendants Lets a workspace declare it (and its entire subtree) should be skipped during /org/import. Pointer-typed `*bool` so we distinguish "explicitly false" from "unset" (default = spawn). ## Use case The dev-tree org template ships the full role taxonomy (Dev Lead with Core Platform / Controlplane / App & Docs / Infra / SDK Leads, each with their own engineering / QA / security / UI-UX children — 27 personas total in a single import). Some setups need a smaller set: - Local dev on a memory-constrained machine - Demo / smoke runs that don't need the full org breathing - Customer trials starting with leadership-only before fan-out Pre-fix the only options were: - Edit the canonical template (mutates shared state) - Author a parallel slimmer template (duplicates structure) - Manual workspace deprovision after full import (wasteful — already paid the docker pull / build cost) `spawning: false` is the per-workspace knob that solves this without touching the canonical template structure. ## Semantics - Unset: workspace spawns (current behaviour, no migration) - `spawning: true`: explicitly spawns (same as unset) - `spawning: false`: workspace is skipped AND every descendant is skipped. The guard sits BEFORE any side effect in createWorkspaceTree — no DB row, no docker provision, no children recursion. A false-spawning subtree is genuinely a no-op except for the log line. countWorkspaces still counts the subtree (so /org/templates numbers reflect the full structure). ## Stage A — verified Local dev-only template that wraps teams/dev.yaml (Dev Lead) with children:[] cleared on the 5 sub-team yaml files, plus 3 floater personas (Release Manager / Integration Tester / Fullstack Engineer). /org/import returned 9 workspaces. Drop-in: same result via `spawning: false` on each sub-tree root in the future. ## Stage B — N/A Pure additive feature on the org-template handler. 
No SaaS deploy chain implications. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 14:20:14 -07:00
claude-ceo-assistant	c3596d6271	fix(org-import): use ws.FilesDir as persona-dir lookup, add docker-cli-buildx to dev image ## org_import.go — persona env injection root-cause fix The Phase-3 fix from earlier today (`feedback/per-agent-gitea-identity-default`) introduced loadPersonaEnvFile to inject persona-specific creds into workspace_secrets on /org/import. It passed `ws.Role` as the persona-dir lookup key, but in our dev-tree org.yaml shape `role:` carries the multi-line descriptive text the agent reads from its prompt ("Engineering planning and team coordination — leads Core Platform, Controlplane, ..."), while `files_dir:` holds the short slug (`core-lead`, `dev-lead`, etc.) matching `~/.molecule-ai/personas/<files_dir>/env`. isSafeRoleName silently rejected the multi-word role text → no persona env loaded → every imported workspace booted with zero workspace_secrets rows → no ANTHROPIC / CLAUDE_CODE / MINIMAX auth in the container env → claude_agent_sdk wedged on `query.initialize()` with a 60s control-request timeout. After the fix, /org/import on the dev tree (27 personas) populates 8 workspace_secrets per workspace (Gitea identity + MODEL/MODEL_PROVIDER + provider-specific token), 5 of 6 leads boot online, and the remaining wedges trace to a separate runtime-template-repo bug (workspace-template-claude-code's claude_sdk_executor.py doesn't dispatch on MODEL_PROVIDER=minimax — filed separately). ## Dockerfile.dev — docker-cli + docker-cli-buildx Without these, every claude-code/tier-2 workspace POST fails-fast: - docker-cli alone produces `exec: "docker": executable file not found` - docker-cli alone (no buildx) fails on `docker build` with `ERROR: BuildKit is enabled but the buildx component is missing or broken` Both packages are now installed in the dev image; verified with `docker exec molecule-core-platform-1 docker buildx version`. 
## Stage A verified Local /org/import dev-only path: 27 workspaces created, all 27 receive persona env injection (8 secrets each — Gitea identity + provider creds). Lead workspaces (claude-code-OAuth tier) boot online. ## Stage B — N/A Local-dev-only path (docker-compose.dev.yml + dev image). Tenant EC2 provisioning uses Dockerfile.tenant (untouched). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 13:50:46 -07:00
claude-ceo-assistant	72b0d4b1ab	feat(plugins): workspace_plugins tracking table — version-subscription foundation Closes core#113 partial. Adds the DB foundation for the version-subscription model. Drift detection + queue + admin apply endpoint are follow-up scope (separate PR; filed as a new issue). WHY THIS PR ONLY GETS US PART-WAY Plugin install state today is filesystem-only — '/configs/plugins/<name>/' inside the container. There's no DB record of 'plugin X installed at workspace W from source S, tracking ref T'. That makes drift detection impossible: nothing to compare upstream tags against. This PR adds the table + the install-endpoint hook that writes to it. With baseline tags now on every plugin (post internal#92), the table starts collecting tracked-ref values immediately on the next install. The actual drift-check job + queue + apply endpoint layer on top. WHAT THIS ADDS workspace_plugins table: workspace_id FK → workspaces(id) ON DELETE CASCADE plugin_name canonical name from plugin.yaml source_raw full source URL the install used tracked_ref 'none' | 'tag:vX.Y.Z' | 'tag:latest' | 'sha:<full>' installed_at, updated_at installRequest gains optional 'track' field (defaults to 'none'). Install handler upserts the workspace_plugins row after delivery succeeds. DB write failure is logged but doesn't fail the install (the plugin IS in the container; surfacing 500 misleads the caller). validateTrackedRef enforces the closed set of accepted shapes: 'none' | 'tag:<non-empty>' | 'sha:<non-empty>' Bare values like 'latest' / 'main' / version-strings without prefix are rejected — the drift detector keys on prefix to know what kind of resolution to do.
WHAT THIS DOES NOT ADD (filed separately) - Drift detector job (cron / on-demand) that scans 'WHERE tracked_ref != none' rows and queues updates on upstream drift - plugin_update_queue table (separate migration once detector lands) - GET /admin/plugin-updates-pending and POST .../apply endpoints - Tier-aware apply (core#115 — composes here) PHASE 4 SELF-REVIEW (FIVE-AXIS) Correctness: No finding — install endpoint behavior unchanged for callers that don't pass 'track'. DB write is best-effort + logged on failure. validateTrackedRef rejects ambiguous bare strings. Readability: No finding — separate file plugins_tracking.go isolates the new concern; install handler delta is a single 4-line block. Architecture: No finding — additive table; existing schema untouched. Migration 20260508160000_* uses the timestamp-prefixed convention. Security: No finding — INSERT params via placeholders (no string interpolation). validateTrackedRef rejects unexpected shapes before the column constraint would. Performance: No finding — one extra ExecContext per install. Install is already seconds-scale (network fetch + tar + docker exec); rounds to noise. TESTS (1 new, all green) TestValidateTrackedRef — pin closed set + structural validators REFS core#113 — this issue (foundation only; drift+queue+apply = follow-up) internal#92, internal#93 — plugin/template baseline tags (now exists for tracking) core#114 — atomic install (this PR composes — no atomicity regression) core#115 — canary tier filter (will key off the same DB foundation) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 08:52:35 -07:00
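The closed set of tracked-ref shapes described in the commit above ('none', 'tag:<non-empty>', 'sha:<non-empty>') can be checked with a short Go function. This is a sketch under the assumption that the 'none' default is applied by the caller before validation; the real validateTrackedRef may apply further structural checks that the commit does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// validateTrackedRef accepts "none", "tag:<non-empty>" and "sha:<non-empty>".
// Bare values such as "latest" or "main" are rejected because the prefix is
// what tells a later drift check how to resolve the ref.
// Error wording is illustrative only.
func validateTrackedRef(ref string) error {
	if ref == "none" {
		return nil
	}
	for _, prefix := range []string{"tag:", "sha:"} {
		if strings.HasPrefix(ref, prefix) {
			if strings.TrimPrefix(ref, prefix) == "" {
				return fmt.Errorf("tracked ref %q has an empty value after %q", ref, prefix)
			}
			return nil
		}
	}
	return fmt.Errorf("tracked ref %q must be none, tag:<value> or sha:<value>", ref)
}

func main() {
	for _, ref := range []string{"none", "tag:v1.2.3", "tag:latest", "sha:abc123", "latest", "main", "tag:"} {
		fmt.Printf("%-12q -> %v\n", ref, validateTrackedRef(ref))
	}
}
```

Rejecting bare strings up front keeps ambiguous values out of the `tracked_ref` column, so anything stored there can be dispatched on its prefix alone.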
claude-ceo-assistant	249e760fbd	feat(plugins): hot-reload classifier — skip restart on SKILL-content-only updates Closes molecule-core#112. Composes with #114 (atomic install). Before issuing restartFunc, classify the diff between staged and live: - skill-content-only: only **/SKILL.md content changed → skip restart (Claude Code re-reads SKILL.md on each Skill invocation; no in-memory cache) - cold: anything else → restartFunc as before (hooks/settings load at session start; plugin.yaml is structural; added/removed files require a fresh load) DETECTION - Hash every regular file in staged tree (host filesystem, sha256) - Hash every regular file in live tree (in-container via docker exec sh -c 'cd <livePath> && find . -type f -print0 | xargs -0 sha256sum') - .complete marker dropped from comparison (mtime varies install-to- install; including it would force-cold every reinstall) - File added/removed → cold - File content differs but isn't SKILL.md → cold - All differences are SKILL.md basenames → skill-content-only DEFAULTS COLD - First install (no live tree) → cold - Live tree read failure → cold (conservative; never hot-reload speculatively) - Symlinks skipped during hash (same posture as tar walker) PHASE 4 SELF-REVIEW Correctness: No finding — all error paths default to cold; never falsely classify as skill-content-only. The .complete drop is a deliberate exception (the marker is bookkeeping, not content). Readability: No finding — single-purpose helpers (hashLocalTree, hashContainerTree, isSkillMarkdown, shQuote) each do one thing. The classifier itself reads as 'compare set, then walk diff with isSkillMarkdown gate.' Architecture: No finding — composes existing execAsRoot primitive; new helpers in plugins_classifier.go don't touch any other handler. Old behavior unchanged when live read fails.
Security: No finding — shQuote single-quotes any non-trivial path, pluginName comes from validatePluginName-validated source, and the docker exec command takes the path as a single arg (xargs -0 handles binary-safe path delimiting). Symlinks skipped. Performance: No finding — adds two tree walks (host + container) per install. Container walk is one docker exec call returning sha256 lines; for typical plugins (~10-50 files) round-trip is ~100ms. Versus the saved ~5-10s of restart on a hot-reloadable update, this is a clear win. TESTS (4 new, all green; full handler suite green) TestIsSkillMarkdown — basename match, case-sensitive TestHashLocalTree_StableHash — re-hash same dir = same map TestHashLocalTree_SymlinkSkipped — hostile link doesn't poison classifier TestShQuote — quoting boundary for shell injection safety REFS molecule-core#112 — this issue molecule-core#114 — atomic install (.complete marker added there) Reno-Stars iteration safety (Hongming 2026-05-08) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 08:26:05 -07:00
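The detection rules listed in the commit above can be sketched as a comparison of two path-to-hash maps. Everything except the `isSkillMarkdown` name is hypothetical here: the sketch assumes both trees are already hashed with normalized relative paths, that the `.complete` marker is identified by basename, and that a nil live map stands for "no live tree". The commit does not say how two identical trees are classified; this sketch lets them fall through to skill-content-only because there is nothing to reload.

```go
package main

import (
	"fmt"
	"path"
)

type updateClass string

const (
	classCold             updateClass = "cold"
	classSkillContentOnly updateClass = "skill-content-only"
)

// isSkillMarkdown matches on the exact, case-sensitive basename.
func isSkillMarkdown(p string) bool { return path.Base(p) == "SKILL.md" }

// withoutMarker copies the tree, dropping the .complete bookkeeping file.
func withoutMarker(tree map[string]string) map[string]string {
	out := make(map[string]string, len(tree))
	for p, h := range tree {
		if path.Base(p) != ".complete" {
			out[p] = h
		}
	}
	return out
}

// classify returns cold unless every differing file is a SKILL.md.
func classify(staged, live map[string]string) updateClass {
	if live == nil {
		return classCold // first install or unreadable live tree
	}
	s, l := withoutMarker(staged), withoutMarker(live)
	if len(s) != len(l) {
		return classCold // a file was added or removed
	}
	for p, stagedHash := range s {
		liveHash, ok := l[p]
		if !ok {
			return classCold // same count but different file set
		}
		if stagedHash != liveHash && !isSkillMarkdown(p) {
			return classCold // non-SKILL.md content changed
		}
	}
	return classSkillContentOnly
}

func main() {
	live := map[string]string{"plugin.yaml": "a1", "skills/x/SKILL.md": "b1", ".complete": "m1"}
	staged := map[string]string{"plugin.yaml": "a1", "skills/x/SKILL.md": "b2", ".complete": "m2"}
	fmt.Println(classify(staged, live)) // prints "skill-content-only"
}
```

Every early return is the cold path, so a mistake in the inputs can only cost an unnecessary restart, never a skipped one.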
claude-ceo-assistant	3e96184d6f	Merge pull request 'feat(plugins): atomic install — stage→snapshot→swap→marker (docker path)' (#120) from feat/plugin-atomic-install into main	2026-05-08 15:23:31 +00:00


699 Commits