c1a94deabc
25 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
| e8af1df261 |
fix(org): add per-workspace RequiredEnv preflight check (#232)
Before returning 201 on /org/import, verify that every RequiredEnv
declared at the workspace level is covered by either:
(a) a global secret key (already validated by the existing preflight)
(b) a key present in the workspace's .env files (org root .env +
per-workspace <files_dir>/.env), matching the resolution order
used by createWorkspaceTree at runtime
Previously, collectOrgEnv correctly walked all
tmpl.Workspaces[].RequiredEnv and added them to the global preflight
check, but loadConfiguredGlobalSecretKeys only checked global_secrets.
Workspace-specific .env files are injected into workspace_secrets AFTER
the 201 response, so an unsatisfied per-workspace RequiredEnv returned
201 and the workspace came up NOT CONFIGURED — breaking on every LLM
call with no signal to the operator.
Changes:
- org_import.go: add PerWorkspaceUnsatisfied struct +
collectPerWorkspaceUnsatisfied (mirrors createWorkspaceTree's
three-source .env resolution stack)
- org.go: after the global preflight block, call
collectPerWorkspaceUnsatisfied if orgBaseDir != ""; return 412
with per-workspace details before creating any workspaces
- org_workspace_required_env_test.go: 8 unit tests covering global
coverage, .env coverage, missing keys, any-of groups, nested
children, empty orgBaseDir, and multiple workspaces
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|||
| aa49dbc728 |
fix(handlers): add rows.Err() checks after rows.Next() loops
Add deferred error checks following rows.Next() iteration in: - ListDelegations (delegation.go): log on error, continue serving results - org import reconcile orphan query (org.go): log + append to reconcileErrs Fixes the rows.Err() gap identified in the delegated rows.Err() check PR (#302, closed; replaced by this PR). Two additional files already had the check (activity.go, memories.go) — pattern applied consistently here. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
|||
|
|
7090eab0d5 |
fix(workspace-server): sanitize err.Error() leaks in CascadeDelete and OrgImport
[core-lead-agent] Closes Core-Security audit finding (2026-05-09 audit cycle, MEDIUM): 1. workspace-server/internal/handlers/workspace_crud.go:335 `DELETE /workspaces/:id` returned `err.Error()` verbatim in the 500 body, leaking wrapped lib/pq driver strings (schema column names, index hints) to HTTP clients. Replaced with sanitized message; raw error already logged server-side via the existing log.Printf immediately above. 2. workspace-server/internal/handlers/org.go:610 `OrgImport` echoed the user-supplied `body.Dir` verbatim in the 404 "org template not found: %s" response. Path traversal is already blocked by resolveInsideRoot earlier in the handler, but echoing raw input back lets a client probe filesystem layout (404-with-echo vs. 400-from-resolve is itself a signal). Dropped the input from the client-facing message; preserved full context in a new log.Printf (orgFile path + the requested body.Dir) for operator triage. Both fixes preserve operator-side diagnostics (logs unchanged in content, only client-facing JSON sanitized). No behavior change for legitimate clients — error type, status code, and JSON shape all stay the same. Tier: low. Defensive hardening only; reduces info-disclosure surface without altering control-flow or auth gates. |
||
|
|
b3041c13d3 |
fix(org-import): emit started event after YAML parse so name is populated
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
CI / Detect changes (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
Harness Replays / detect-changes (pull_request) Successful in 7s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
CI / Python Lint & Test (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Successful in 59s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m45s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m53s
CI / Platform (Go) (pull_request) Successful in 2m51s
The org.import.started event was firing immediately after request body bind, before the YAML at body.Dir was loaded. Result: payload.name was "" whenever the caller passed `dir` (the common path — the canvas and all live imports use dir, not inline template). Three started rows already in the local platform's structure_events have empty name. Fix: move the started emit (and importStart timestamp) to after the YAML unmarshal / inline-template fallthrough, where tmpl.Name is guaranteed populated. Bonus: pre-parse error returns (invalid body, traversal-rejected dir, file-not-found, YAML expansion fail, YAML unmarshal fail, neither dir nor template provided) no longer emit an orphan started row — every started is now guaranteed a paired completed/failed. Verified live against running platform: re-imported molecule-dev-only, new started row in structure_events carries "Molecule AI Dev Team (dev-only)" instead of "". Tests: full handler suite green (`go test ./internal/handlers/`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bfefcb315b |
refactor(handlers): Delete() delegates to CascadeDelete helper
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 2s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 2s
pr-guards / disable-auto-merge-on-push (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
CI / Detect changes (pull_request) Successful in 13s
E2E API Smoke Test / detect-changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 13s
Harness Replays / detect-changes (pull_request) Successful in 13s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 13s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
CI / Python Lint & Test (pull_request) Successful in 41s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 41s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1m4s
CI / Canvas (Next.js) (pull_request) Successful in 1m3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Failing after 1m5s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3m47s
CI / Platform (Go) (pull_request) Successful in 5m18s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Has been cancelled
Drops ~150 lines of duplicated cascade logic from the Delete HTTP handler — workspace_crud.go's CascadeDelete (added in PR #137) and Delete() were running the same #73 race-guard sequence (status update → canvas_layouts → tokens → schedules → container stop → broadcast), just with Delete() inlined and CascadeDelete owning the OrgImport reconcile path. CascadeDelete now returns the descendant id list (was: count) so Delete() can drive the optional ?purge=true hard-delete against the same set the cascade just touched. Net diff: workspace_crud.go shrinks from ~270 lines in Delete() to ~75 lines (parse + 409 confirm gate + CascadeDelete call + stop-error 500 + purge block + 200 response). Behavior identical — same SQL ordering, same #73 race guard, same response shapes. Three sqlmock tests for the 0-children case gained one extra ExpectQuery for the recursive-CTE descendants scan (the old inline code skipped that query when len(children)==0; CascadeDelete walks unconditionally — returns 0 rows, same end state, one extra cheap query). Tests: full handler suite green (`go test ./internal/handlers/`). Live-tested against the running local platform: DELETE on a fake workspace returns `{"cascade_deleted":0,"status":"removed"}`, fleet of 9 workspaces preserved, refactored handler matches the prior wire-shape exactly. Tracked as the PR #137 follow-up tech-debt item. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
3de51faa19 |
fix(org-import): reconcile mode + audit-event emission
All checks were successful
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 2s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 8s
Harness Replays / detect-changes (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 10s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
CI / Python Lint & Test (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 34s
CI / Canvas (Next.js) (pull_request) Successful in 57s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 56s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 1m1s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m22s
Harness Replays / Harness Replays (pull_request) Successful in 2m59s
CI / Platform (Go) (pull_request) Successful in 3m20s
Closes the additive-import zombie bug — re-running /org/import with a tree shape that reparents same-named roles left the prior workspace online because lookupExistingChild's dedupe is parent-scoped (different parent_id → "different" workspace). Caught 2026-05-08 after a dev-tree re-import left 8 orphans co-existing with the new tree on canvas until manual cascade-delete. Three layers in this PR: - mode="reconcile" on /org/import — after the import loop, online workspaces whose name matches an imported name but whose id isn't in the result set are cascade-deleted. Default mode "" / "merge" preserves existing additive behavior. Empty-set guards prevent accidental "delete everything" if either array comes up empty. - WorkspaceHandler.CascadeDelete extracted as a callable helper from the existing Delete HTTP handler so OrgImport's reconcile path shares the same teardown sequence (#73 race guard, container stop, volume removal, token revocation, schedule disable, event broadcast). The HTTP Delete handler still inlines the same logic; deduplication tracked as tech-debt follow-up. - emitOrgEvent(structure_events) records org.import.started + org.import.completed with mode, created/skipped/reconcile_removed counts, duration_ms, error. Replaces the lost-on-restart stdout-only log shape for an audit-trail surface that's queryable by SQL. Closes the "what happened at 20:13?" debugging gap that motivated this fix. Verified live against the local platform: cascade-delete on an old tree's removed root cleared 8 surviving orphans; mode="reconcile" with a freshly-INSERTed fake orphan removed exactly the fake; idempotent re-run of reconcile is a no-op (0 removed, no errors); structure_events captures every started+completed pair with full payload. 7 new unit tests (walkOrgWorkspaceNames flat/nested/spawning:false/ empty-name; emitOrgEvent success + DB-error-swallow; errString). Full handler suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b91da1ab77 |
feat(org-import): add spawning:false field to skip workspace + descendants
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 11s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 11s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 11s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 24s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 36s
cascade-list-drift-gate / check (pull_request) Successful in 35s
E2E API Smoke Test / detect-changes (pull_request) Successful in 36s
CI / Detect changes (pull_request) Successful in 39s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 27s
branch-protection drift check / Branch protection drift (pull_request) Successful in 45s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 47s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 37s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 58s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 57s
Harness Replays / detect-changes (pull_request) Successful in 50s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 29s
CI / Python Lint & Test (pull_request) Successful in 33s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 56s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 30s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 2m5s
Harness Replays / Harness Replays (pull_request) Failing after 1m37s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4m54s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6m49s
CI / Platform (Go) (pull_request) Successful in 9m13s
CI / Canvas (Next.js) (pull_request) Failing after 11m30s
CI / Canvas Deploy Reminder (pull_request) Has been cancelled
Lets a workspace declare it (and its entire subtree) should be skipped during /org/import. Pointer-typed `*bool` so we distinguish "explicitly false" from "unset" (default = spawn). ## Use case The dev-tree org template ships the full role taxonomy (Dev Lead with Core Platform / Controlplane / App & Docs / Infra / SDK Leads, each with their own engineering / QA / security / UI-UX children — 27 personas total in a single import). Some setups need a smaller set: - Local dev on a memory-constrained machine - Demo / smoke runs that don't need the full org breathing - Customer trials starting with leadership-only before fan-out Pre-fix the only options were: - Edit the canonical template (mutates shared state) - Author a parallel slimmer template (duplicates structure) - Manual workspace deprovision after full import (wasteful — already paid the docker pull / build cost) `spawning: false` is the per-workspace knob that solves this without touching the canonical template structure. ## Semantics - Unset: workspace spawns (current behaviour, no migration) - `spawning: true`: explicitly spawns (same as unset) - `spawning: false`: workspace is skipped AND every descendant is skipped. The guard sits BEFORE any side effect in createWorkspaceTree — no DB row, no docker provision, no children recursion. A false-spawning subtree is genuinely a no-op except for the log line. countWorkspaces still counts the subtree (so /org/templates numbers reflect the full structure). ## Stage A — verified Local dev-only template that wraps teams/dev.yaml (Dev Lead) with children:[] cleared on the 5 sub-team yaml files, plus 3 floater personas (Release Manager / Integration Tester / Fullstack Engineer). /org/import returned 9 workspaces. Drop-in: same result via `spawning: false` on each sub-tree root in the future. ## Stage B — N/A Pure additive feature on the org-template handler. No SaaS deploy chain implications. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
3bc7749e84 |
feat(org-import): make provision concurrency configurable via env
Org-import was hard-capped at 3 concurrent workspace provisions (#1084), calibrated for Docker-mode workspaces where each provision was a docker-run. Now that workspaces are EC2 instances, AWS RunInstances parallelises happily and the artificial cap of 3 makes a 7-workspace org-import take 3-4× longer than necessary (3 batches × ~70s/provision ≈ 4 min wall time when AWS could absorb all 7 in parallel for ~70s). This PR makes the cap configurable via MOLECULE_PROVISION_CONCURRENCY: unset → 3 (Docker-mode default, unchanged) "0" → effectively unlimited (SaaS / EC2 backend; AWS rate-limit + vCPU quota are the real backpressure) N>0 → exactly N N<0 → fall back to default 3 + warning log garbage → fall back to default 3 + warning log The "0 = unlimited" mapping is the user-facing convention requested for SaaS deployments — operators don't have to pick an arbitrary large number. Implementation hands off 1<<20 internally so the channel-based semaphore stays a no-op without infinite-buffer risk. Test coverage (org_provision_concurrency_test.go, 6 cases / 15 subtests): - unset → default - "0" → large unlimited cap - positive integer exact (1, 5, 10, 50) - negative → default + warning - non-numeric → default + warning - whitespace-trimmed (" 7 " → 7) Boot-time log line confirms the resolved cap so an operator can verify their env is being honored without re-deploying. Does NOT address the separate 600s "never registered" timeout the user also reported during org-import — that's filed as molecule-core#2793 for proper investigation (parallel-provision contention, network routing, register-retry budget, or container-start failure are all candidates and need live SSM capture to bisect). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
80c612d987 |
fix(org-import): remove force=true bypass of required-env preflight
The pre-#2290 \`force: true\` flag on POST /org/import skipped the required-env preflight, letting orgs import without their declared required keys (e.g. ANTHROPIC_API_KEY). The ux-ab-lab incident: that import path was used, the org shipped without ANTHROPIC_API_KEY in global_secrets, and every workspace 401'd on the first LLM call. Per #2290 picks (C/remove/both): - Q1=C: template-derived required_env (no schema change — already the existing aggregation via collectOrgEnv). - Q2=remove: drop the bypass entirely. The seed/dev-org flow that legitimately needs to skip becomes a separate dry-run-import path with its own audit trail, not a permission bypass. - Q3=block-at-import-only: provision-time drift logging is a follow-up; for this PR, blocking at import is the gate. Surface change: - Force field removed from POST /org/import request body. - 412 \"suggestion\" text drops the \"or pass force=true\" guidance. - Legacy callers sending {\"force\": true} are silently tolerated (Go's json.Unmarshal drops unknown fields), so no client-side breakage; the bypass effect is just gone. Audited callers in this repo: - canvas/src/components/TemplatePalette.tsx — never sends force. - scripts/post-rebuild-setup.sh — never sends force. - Only external tooling sent force=true. Those callers must now set the global secret via POST /settings/secrets before importing. Adds TestOrgImport_ForceFieldRemoved as a structural pin: if a future change re-adds Force to the body struct, the test fails and forces an explicit reckoning with the #2290 rationale. Closes #2290 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
4e6f6bf0f3 | merge: sync staging into feat/wire-max-concurrent-from-template-1408 | ||
|
|
4bcfc64e25 |
chore(simplify): drop verbose comments + introduce DefaultMaxConcurrentTasks const
Simplify pass on top of the wire-up commit: - New const models.DefaultMaxConcurrentTasks = 1; handlers and tests reference the symbol so the schema-default mirror lives in one place. - Strip 5 multi-line comments that narrated what the code does. - Drop the duplicate field-rationale on OrgWorkspace; the one on CreateWorkspacePayload is canonical. - Drop test-side positional comments that would silently lie if columns get reordered. Pure cleanup; no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
ad5295cd8a |
feat(workspaces): wire max_concurrent_tasks from template config.yaml (#1408)
Phase 4 of #1408 (active_tasks counter). Runtime increment/decrement, schema column (037), and scheduler enforcement (scheduler.go:312) already shipped — but the write path from template config.yaml + direct API was missing, so every workspace silently fell through to the schema default of 1. Leaders that set max_concurrent_tasks: 3 in their org template were getting 1 anyway, defeating the entire feature for the use case it was built for (cron-vs-A2A contention on PM/lead workspaces). - OrgWorkspace gains MaxConcurrentTasks (yaml + json tags) - CreateWorkspacePayload gains MaxConcurrentTasks (json tag) - Both INSERTs now write the column unconditionally; 0/omitted payload value falls back to 1 (schema default mirror) so the wire stays single-shape — no forked column list / goto. - Existing Create-handler test mocks updated to expect the 11th arg. - New TestWorkspaceCreate_MaxConcurrentTasksOverride locks the payload→DB propagation for the leader case (value=3). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
ad73a56db1 |
feat(env-preflight): support any_of OR groups (e.g. API_KEY OR OAUTH_TOKEN)
Extends the org-import env preflight so a template can declare an
alternative: satisfy ANY one member to pass. Motivated by the
Claude-family node case where either ANTHROPIC_API_KEY or
CLAUDE_CODE_OAUTH_TOKEN unlocks the agent — forcing both was wrong.
Server (workspace-server):
- New EnvRequirement union type with custom YAML + JSON
(un)marshaling. Accepts scalar (strict) or {any_of: [...]} in
both on-disk org.yaml and inline POST /org/import bodies.
- collectOrgEnv now returns []EnvRequirement. Dedups groups by
sorted-member signature. "Strict wins" pruning drops any-of
groups that mention a name already declared strictly (same
tier and cross-tier).
- Import preflight uses EnvRequirement.IsSatisfied — scalar =
exact match, group = any member present.
- Empty any_of: [] rejected at parse time (never-satisfiable).
- 14 handler tests (6 updated for the union shape, 8 new
covering any-of satisfaction, dedup, strict-dominates-group,
cross-tier pruning, invalid-member filtering, YAML round-trip,
and empty-any-of rejection).
Canvas:
- EnvRequirement = string | {any_of: string[]} with envReqMembers,
envReqSatisfied, envReqKey helpers.
- OrgImportPreflightModal renders strict rows and any-of groups
via a new AnyOfEnvGroup sub-component: "Configure any one"
banner, per-member input, ✓-satisfied indicator, and dimmed
siblings once any member is configured so the user can still
switch providers.
- TemplatePalette.OrgTemplate.required_env / recommended_env
retyped to EnvRequirement[]; passthrough to the modal
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5adc8a74d5 |
feat(canvas+org): env preflight, EmptyState parity, shared useTemplateDeploy hook
Builds on #2061. Three internally-cohesive sub-features; easiest to read in order. ## 1. Org-level env preflight Server - `OrgTemplate` + `OrgWorkspace` gain `required_env: string[]` and `recommended_env: string[]` YAML fields. - `GET /org/templates` walks the tree and returns the tree-union (deduped, sorted) of both. `collectOrgEnv` dedup prefers required when the same key is declared at both tiers. - `POST /org/import` preflights against `global_secrets` WHERE `octet_length(encrypted_value) > 0` (empty-value rows used to be counted as "configured" and the per-container preflight still failed at start time). 412 Precondition Failed + `missing_env` list when required keys are absent. `force=true` bypasses with an audit log line. DB lookup failure now returns 500 (was: silent fall-through that defeated the guard). Env-var NAMES validated against `^[A-Z][A-Z0-9_]{0,127}$` so a malicious template can't ship pathological names into the UI or DB. Canvas - New `OrgImportPreflightModal`: red "Required" section (blocking) and yellow "Recommended" section (non-blocking, import stays enabled, shows live missing-count next to the Import button). - Per-key password input → `PUT /settings/secrets` → strike-through on save. Functional `setDrafts` throughout (no stale-closure clobbers on rapid successive saves). `useEffect` seed keyed on a sorted-join string signature so a parent re-render with a new array identity doesn't clobber typed inputs. - `TemplatePalette.handleImport` branches: zero env declarations → straight to import; any declarations → fetch configured global secret keys, open the modal. Tests (Go): `TestCollectOrgEnv_*` (5) cover union-across-levels, required-wins-over-recommended (including same-struct), dedup, empty, invalid-name rejection. ## 2. EmptyState parity with TemplatePalette The "Deploy your first agent" grid used to call `POST /workspaces` with no preflight while the sidebar palette ran `checkDeploySecrets` + `MissingKeysModal` first. Same template deployed two different ways → first-run users saw containers boot in `failed` state without guidance. Now both surfaces share one preflight + modal handshake. EmptyState's previous `interface Template` dropped `runtime`, `models`, and `required_env` — silently discarding exactly the fields the preflight needs. `Template` now lives in `deploy-preflight.ts` and is imported from there by both surfaces. ## 3. useTemplateDeploy hook With the preflight + modal wiring now duplicated across EmptyState + TemplatePalette + (going forward) any third surface, extracted the pattern into `canvas/src/hooks/useTemplateDeploy.tsx`: const { deploy, deploying, error, modal } = useTemplateDeploy({ canvasCoords: ..., // optional, default random onDeployed: (id) => ..., }); Closes three drift surfaces that the duplication had created: - `resolveRuntime` id→runtime fallback table (moved to `deploy-preflight.ts`). EmptyState had a narrower fallback that would have silently disagreed with the palette on any future id needing a non-identity mapping. - `checkDeploySecrets` call signature. One owner. - `MissingKeysModal` JSX wiring. One owner. Narrow try/catch around `checkDeploySecrets` so a preflight network failure clears `deploying` and surfaces via `setError` instead of stranding the button forever. `modal: ReactNode` (not a `renderModal()` function) — the previous memoization bought nothing since consumers called it inline every render. Named `MissingKeysInfo` interface for the state shape. ## 4. Viewport auto-fit user-pan gate fix During org deploy the canvas was meant to pan+zoom to follow each arriving workspace (`molecule:fit-deploying-org` event → debounced fitView). In practice the fit stayed stuck on wherever the first fit landed. Root cause: React Flow v12 fires `onMoveEnd` with a truthy `event` at the END of a programmatic `fitView` animation. The original "respect-user-pan" gate stamped `userPannedAtRef` in `onMoveEnd`, so our own fit completing looked like a user pan, and every subsequent auto-fit short-circuited for the rest of the deploy. Fix: stop trusting `onMoveEnd` for user-intent detection. Register explicit `wheel` + `pointerdown` listeners on `document` with capture phase and `target.closest('.react-flow__pane')` filter. Capture-phase immunity to `stopPropagation`; pane-filter rejects toolbar / modal / side-panel clicks (the old `window` fallback caught those). `onMoveEnd` simplified to only drive the debounced viewport save. Also: fit event dispatched on root arrivals (not just children), so the canvas centers on the just-landed root immediately instead of waiting ~2s for the first child. Animation 600ms → 400ms so successive per-arrival fits don't pile up visually. End-state fit stays at 1200ms — intentional asymmetry ("settling" vs "tracking"), documented in code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
425df5e5a9 |
merge(staging): resolve conflicts + fix 7 test regressions on top of #2061
- Merge origin/staging into fix/canvas-multilevel-layout-ux. 18 files
auto-merged (mostly canvas/tabs/chat and workspace-server handlers
the earlier DIRTY marker was stale relative to current staging).
- Fix 7 test failures surfaced by the merge:
1. Canvas.pan-to-node.test.tsx — mockGetIntersectingNodes was
inferred as vi.fn(() => never[]); mockReturnValueOnce of a node
object failed type check. Explicit return-type annotation.
2. Canvas.pan-to-node.test.tsx + Canvas.a11y.test.tsx — Canvas.tsx
reads deletingIds.size (new multilevel-layout state). Both mock
stores lacked deletingIds; added new Set<string>() to each.
3. canvas-batch-partial-failure.test.ts — makeWS() built a wire-
format WorkspaceData (snake_case, with x/y/uptime_seconds). The
store's node.data is now WorkspaceNodeData (camelCase, no wire-
only fields). Rewrote makeWS to produce WorkspaceNodeData and
updated 5 call-site casts. No assertions changed.
4. ConfigTab.hermes.test.tsx — two tests pinned pre-#2061 behavior
that the PR intentionally inverts:
a. "shows hermes-specific info banner" — RUNTIMES_WITH_OWN_CONFIG
now contains only {"external"}, so the banner is no longer
shown for hermes. Inverted assertion: now pins ABSENCE of
the banner, with a comment noting the inversion.
b. "config.yaml runtime wins over DB" — priority reversed:
DB is now authoritative so the tier-on-node badge matches
the form. Inverted scenario: DB=hermes + yaml=crewai →
form shows hermes. Switched test's DB runtime off langgraph
because the dropdown collapses langgraph into an empty-
valued "default" option that would hide the win signal.
- No production code changed — this commit is staging merge + test
realignment only. 953/953 canvas tests pass. tsc --noEmit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
94d9331c76 |
feat(canvas+platform): chat attachments, model selection, deploy/delete UX
Session's accumulated UX work across frontend and platform. Reviewable in four logical sections — diff is large but internally cohesive (each section fixes a gap the next one depends on). ## Chat attachments — user ↔ agent file round trip - New POST /workspaces/:id/chat/uploads (multipart, 50 MB total / 25 MB per file, UUID-prefixed storage under /workspace/.molecule/chat-uploads/). - New GET /workspaces/:id/chat/download with RFC 6266 filename escaping and binary-safe io.CopyN streaming. - Canvas: drag-and-drop onto chat pane, pending-file pills, per-message attachment chips with fetch+blob download (anchor navigation can't carry auth headers). - A2A flow carries FileParts end-to-end; hermes template executor now consumes attachments via platform helpers. ## Platform attachment helpers (workspace/executor_helpers.py) Every runtime's executor routes through the same helpers so future runtimes inherit attachment awareness for free: - extract_attached_files — resolve workspace:/file:///bare URIs, reject traversal, skip non-existent. - build_user_content_with_files — manifest for non-image files, multi-modal list (text + image_url) for images. Respects MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision adapter hangs on base64 payloads (MiniMax M2.7). - collect_outbound_files — scans agent reply for /workspace/... paths, stages each into chat-uploads/ (download endpoint whitelist), emits as FileParts in the A2A response. - ensure_workspace_writable — called at molecule-runtime startup so non-root agents can write /workspace without each template having to chmod in its Dockerfile. Hermes template executor + langgraph (a2a_executor.py) + claude-code (claude_sdk_executor.py) all adopt the helpers. ## Model selection & related platform fixes - PUT /workspaces/:id/model — was 404'ing, so canvas "Save" silently lost the model choice. Stores into workspace_secrets (MODEL_PROVIDER), auto-restarts via RestartByID. - applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"] so Restart propagates the stored model to HERMES_DEFAULT_MODEL without needing the caller to rehydrate payload.Model. - ConfigTab Tier dropdown now reads from workspaces row, not the (stale) config.yaml — fixes "badge shows T3, form shows T2". ## ChatTab & WebSocket UX fixes - Send button no longer locks after a dropped TASK_COMPLETE — `sending` no longer initializes from data.currentTask. - A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s; the previous default aborted fetches while the server was still replying, producing "agent may be unreachable" on success. - socket.ts: disposed flag + reconnectTimer cancellation + handler detachment fix zombie-WebSocket in React StrictMode. - Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' — the adaptor's purpose IS the form, banner was contradictory. - workspace_provision.go auto-recovery: try <runtime>-default AND bare <runtime> for template path (hermes lives at the bare name). ## Org deploy/delete animation (theme-ready CSS) - styles/theme-tokens.css — design tokens (durations, easings, colors). Light theme overrides by setting only the deltas. - styles/org-deploy.css — animation classes + keyframes, every value references a token. prefers-reduced-motion respected. - Canvas projects node.draggable=false onto locked workspaces (deploying children AND actively-deleting ids) — RF's authoritative drag lock; useDragHandlers retains a belt-and- braces check. - Organ cancel button (red pulse pill on root during deploy) cascades via existing DELETE /workspaces/:id?confirm=true. - Auto fit-view after each arrival, debounced 500 ms so rapid sibling arrivals coalesce into one fit (previous per-event fit made the viewport lurch continuously). - Auto-fit respects user-pan — onMoveEnd stamps a user-pan timestamp only when event !== null (ignores programmatic fitView) so auto-fits don't self-cancel. - deletingIds store slice + useOrgDeployState merge gives the delete flow the same dim + non-draggable treatment as deploy. - Platform-level classNames.ts shared by canvas-events + useCanvasViewport (DRY'd 3 copies of split/filter/join). ## Server payload change - org_import.go WORKSPACE_PROVISIONING broadcast now includes parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas renders the child at the right parent-nested slot without doing any absolute-position walk. createWorkspaceTree signature gains relX, relY alongside absX, absY; both call sites updated. ## Tests - workspace/tests/test_executor_helpers.py — 11 new cases covering URI resolution (including traversal rejection), attached-file extraction (both Part shapes), manifest-only vs multi-modal content, large-image skip, outbound staging, dedup, and ensure_workspace_writable (chmod 777 + non-root tolerance). - workspace-server chat_files_test.go — upload validation, Content-Disposition escaping, filename sanitisation. - workspace-server secrets_test.go — SetModel upsert, empty clears, invalid UUID rejection. - tests/e2e/test_chat_attachments_e2e.sh — round-trip against a live hermes workspace. - tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static plumbing check + round-trip across hermes/langgraph/claude-code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d0080b0e98 |
feat(org): log loud when org-template dir is a half-clone
Audit 2026-04-24 case: org-templates/molecule-dev/ contained only .git/ (working tree wiped). ListTemplates silently skipped the directory and the molecule-dev template silently disappeared from the Canvas palette. No log trail; CEO discovered hours later when looking for the registry listing manually. This commit adds a one-line log warning when a directory under orgDir has a .git/ subdir but no org.yaml/.yml — that's almost always a manifest clone that got truncated. The warning includes the recovery command (`git checkout main -- .`) so operators can self-fix without re-cloning. Doesn't change the response behavior — the directory is still skipped to keep ListTemplates a fail-soft endpoint. Just makes the failure visible in `docker logs platform`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8c80175cd8 |
fix(canvas): subtree-aware layout + org-import reliability + UX polish
Five tightly-related fixes surfaced while stress-testing org-template
imports (Legal Team, Molecule Company, etc.) on a running control plane:
1) Org import was silently failing — INSERT wrote `collapsed` into the
`workspaces` table but that column lives on `canvas_layouts`
(005_canvas_layouts.sql). Every import returned 207 with 0 rows
created, which `api.post` treated as success → green "Imported"
toast + empty canvas. Moved the write to canvas_layouts; updated
the workspace_crud PATCH path to UPSERT there too; refreshed the
test mock. Added a client-side assertion that throws on
2xx-with-`error`-body so future partial-failures surface a red
toast rather than lying about success.
2) Multi-level nested layout was collision-prone: children that were
themselves parents (CTO → Dev Lead → 6 engineers) got the same
leaf-sized grid slot as leaf siblings and clipped into each other.
Added post-order `sizeOfSubtree` + sibling-size-aware
`childSlotInGrid` on both the Go server and the TS client (kept in
sync). `buildNodesAndEdges` now uses subtree sizes for both parent
dimensions and the rescue heuristic. `setCollapsed` on expand now
reads each child's actual rendered width/height instead of the
leaf-count formula — a regression test covers the CTO/Dev Lead
scenario.
3) Provisioning-timeout banner was unusable during large imports: a
30-workspace tree triggered 27 simultaneous "stuck" warnings 2
minutes in (server paces + provision concurrency = 3 guarantee tail
items legitimately wait longer). Scaled threshold with concurrent
count (base + 45s per queue slot beyond concurrency) and added a
Dismiss (×) button per banner.
4) Auto pan-and-zoom on org ready: after the last workspace flips out
of `provisioning`, canvas now fitView's with a 1.2s animation,
0.25 padding, `maxZoom: 0.8` and `minZoom: 0.25`. Without the zoom
caps fitView was hitting the component's maxZoom=2 on small trees
and zooming in instead of out.
5) Toolbar was visually busy: `+ N sub` count wrapped onto a second
row on narrow viewports; status dot and workspace total were in
separate border-delimited cells. Merged into one segment with
`whitespace-nowrap`; A2A / Audit / Search / Help collapsed to
icon-only 28px buttons with tooltip + aria-label (Figma/Linear
pattern). Stop All / Restart Pending keep text — they're urgent.
Also:
- `api.{get,post,...}` accept an optional `{ timeoutMs }` so callers
that hit intentionally-slow endpoints (org import paces 2s between
siblings) don't trip the 15s default and report false aborts.
- `WorkspaceNode` clamps role text to 2 lines so verbose descriptions
don't unboundedly grow card height and break the grid.
- `PARENT_HEADER_PADDING` bumped 44→130 to clear name + runtime +
2-line role + the currentTask banner that appears during the
initial-prompt phase.
Tests: 930 canvas tests + full Go handler suite pass. Added
regressions for (i) 207 partial-success surfacing as throw, and
(ii) setCollapsed sizing with nested-parent children.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
96cc4b0c42 |
fix(quickstart): wire up template/plugin registry via manifest.json
The Canvas template palette was empty on a fresh clone because
`workspace-configs-templates/`, `org-templates/`, and `plugins/` are
gitignored and nothing populated them. The registry already exists —
`manifest.json` at repo root lists every curated
`workspace-template-*`, `org-template-*`, and `plugin-*` repo, and
`scripts/clone-manifest.sh` clones them — but the step was absent
from the README and setup.sh, so new users never ran it.
### What this commit does
**1. `setup.sh` runs `clone-manifest.sh` automatically** (once).
After starting the Docker network but before booting infra, iterate
`manifest.json` and clone any workspace_templates / org_templates /
plugins that aren't already populated. Idempotent — subsequent
runs skip dirs that have content. Requires `jq`; when jq is missing
the step prints a clear install hint and skips (doesn't fail).
**2. `clone-manifest.sh` is idempotent.** Before running `git clone`,
check whether the target directory already exists and is non-empty —
skip if so. Lets `setup.sh` rerun safely without forcing the operator
to delete already-cloned template repos.
**3. `ListTemplates` logs the reason it skips a template.** The
handler previously swallowed `resolveYAMLIncludes` errors with
`continue`, so a broken template showed up as an empty palette with
no log trail. Now the include-expansion and yaml.Unmarshal failure
paths both emit a descriptive `log.Printf` — the exact message that
made the stale `org-templates/molecule-dev/` snapshot debuggable:
ListTemplates: skipping molecule-dev — !include expansion failed:
!include "core-platform.yaml" at line 25: open .../teams/
core-platform.yaml: no such file or directory
**4. Remove the in-tree `org-templates/molecule-dev/` snapshot** (170
files). Matches the explicit intent of prior commit
`bfec9e53` — "remove org-templates/molecule-dev/ — standalone repo
is source of truth". A later "full staging snapshot" re-added a
partial copy that had `!include` references to 7 role files that
never existed in the snapshot (`core-platform.yaml`,
`controlplane.yaml`, `app-docs.yaml`, `infra.yaml`, `sdk.yaml`,
`release-manager/workspace.yaml`, `integration-tester/workspace.yaml`).
`clone-manifest.sh` repopulates it fresh from
`Molecule-AI/molecule-ai-org-template-molecule-dev`.
.gitignore exception for `molecule-dev/` is dropped accordingly
— the whole `/org-templates/*` tree is now gitignored, symmetric
with `/plugins/` and `/workspace-configs-templates/`.
**5. Doc updates** (README, README.zh-CN, CONTRIBUTING) mention `jq`
as a prerequisite and describe what setup.sh now does.
### Verification
On a fresh-nuked DB with the updated branch:
1. `bash infra/scripts/setup.sh` — cleanly clones 33/33 manifest
repos (20 plugins, 8 workspace_templates, 5 org_templates), then
boots infra. Second run skips all 33 (idempotent).
2. `go run ./cmd/server` — "Applied 41 migrations", :8080 healthy.
3. `curl http://localhost:8080/org/templates` returns 4 templates
(was `[]`):
- Free Beats All
- MeDo Smoke Test
- Molecule AI Worker Team (Gemini)
- Reno Stars Agent Team
4. `bash tests/e2e/test_api.sh` — 61/61 pass.
5. `npx vitest run` in canvas — 902/902 pass.
6. `shellcheck infra/scripts/setup.sh` — clean.
### SaaS parity
All changes are local-dev surface. `setup.sh`, `clone-manifest.sh`,
and the local `org-templates/` directory aren't part of the CP
provisioner path — SaaS tenant machines get their templates via
Dockerfile layers or CP-side provisioning, not `clone-manifest.sh`.
The `ListTemplates` log addition is harmless either way (replaces a
silent `continue` with a `log.Printf + continue`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
64ccf8e179
|
fix: CWE-78 rm scope, go vet failures, delegation idempotency
* refactor: split 4 oversized handler files into focused sub-files - org.go (1099 lines) → org.go + org_import.go + org_helpers.go - mcp.go (1001 lines) → mcp.go + mcp_tools.go - workspace.go (934 lines) → workspace.go + workspace_crud.go - a2a_proxy.go (825 lines) → a2a_proxy.go + a2a_proxy_helpers.go No functional changes — same package, same exports, same tests. All files stay under 635 lines. Note: isSafeURL and isPrivateOrMetadataIP are duplicated between mcp_tools.go and a2a_proxy_helpers.go — this is a pre-existing issue from the original mcp.go and a2a_proxy.go, not introduced by this split. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(runtime+scheduler): increment/decrement active_tasks counter (refs #1386) * docs(tutorials): add Self-Hosted AI Agents guide — Docker, Fly Machines, bare metal * docs: add Remote Agents feature + Phase 30 blog links to docs index * docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted * docs(api-ref): add workspace file copy API reference (#1281) Documents TemplatesHandler.copyFilesToContainer (container_files.go): - Endpoint overview: PUT /workspaces/:id/files/*path - Parameter descriptions for all four function parameters - CWE-22 path traversal protection (PRs #1267/1270/1271) - Defense-in-depth: validateRelPath at handler + archive boundary - Full error code table (400/404/500) - curl example with success and path-traversal rejection cases Also covers: writeViaEphemeral routing, findContainer fallback, allowed roots allow-list, and related links to platform-api.md. Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(security): CWE-78/CWE-22 — block shell injection in deleteViaEphemeral (#1310) ## Summary Issue #1273: deleteViaEphemeral interpolated filePath directly into rm command, enabling both shell injection (CWE-78) and path traversal (CWE-22) attacks. ## Changes 1. Added validateRelPath(filePath) guard before constructing the rm command. validateRelPath blocks absolute paths and ".." traversal sequences. 2. Changed Cmd from "/configs/"+filePath (string interpolation) to []string{"rm", "-rf", "/configs", filePath} (exec form). This eliminates shell injection entirely — filePath is a plain argument, never interpreted as shell code. ## Security properties - validateRelPath: blocks "../" and absolute paths before they reach Docker - Exec form: filePath cannot inject shell metacharacters even if validation is somehow bypassed - "/configs" as separate arg: rm has exactly two arguments, no room for injected args Closes #1273. Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app> * fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in a2a_proxy.go (#1292) (#1302) * fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in mcp.go and a2a_proxy.go Issue #1042: 3 CodeQL SSRF findings across mcp.go and a2a_proxy.go. staging already ships the fix (PRs #1147, #1154 → merged); main did not include it. - mcp.go: add isSafeURL() + isPrivateOrMetadataIP() helpers; validate agentURL before outbound calls in mcpCallTool (line ~529) and toolDelegateTaskAsync (line ~607) - a2a_proxy.go: add identical isSafeURL() + isPrivateOrMetadataIP() helpers; call isSafeURL() before dispatchA2A in resolveAgentURL() (blocks finding #1 at line 462) - mcp_test.go: 19 new tests covering all blocked URL patterns: file://, ftp://, 127.0.0.1, ::1, 169.254.169.254, 10.x.x.x, 172.16.x.x, 192.168.x.x, empty hostname, invalid URL, isPrivateOrMetadataIP across all private/CGNAT/metadata ranges 1. URL scheme enforcement — http/https only 2. IP literal blocking — loopback, link-local, RFC-1918, CGNAT, doc/test ranges 3. DNS hostname resolution — blocks internal hostnames resolving to private IPs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ci-blocker): remove duplicate isSafeURL/isPrivateOrMetadataIP from mcp.go Issue #1292: PR #1274 duplicated isSafeURL + isPrivateOrMetadataIP in mcp.go — both functions already exist on main at lines 829 and 876. Kept the mcp.go definitions (the originals) and removed the 70-line duplicate appended at end of file. a2a_proxy.go functions are unchanged — they serve the same purpose via a separate code path. * fix: remove orphaned commit-text lines from a2a_proxy.go Three lines from the PR/commit title were accidentally baked into the file during the rebase from #1274 to #1302, causing a Go syntax error (a bare string literal at statement level followed by dangling braces). Deletion restores: } return agentURL, nil } Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app> Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app> * fix(canvas/test): patch test regressions from PR #1243 + proximity hitbox fix (#1313) * fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled With cancel-in-progress: false, pending CI runs accumulate in the ci-staging concurrency group. New pushes create queued runs, but GitHub dispatches multiple runs for the same SHA instead of replacing the pending one. All runs get stuck/cancelled before completing. Reverting to cancel-in-progress: true restores CI operation — runs that are superseded are cancelled, freeing the concurrency slot for the new run to proceed. Runner availability (ubuntu-latest dispatch stall) is a separate infra issue tracked independently. * fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043) Tar header names were built from raw map keys without validation. A malicious server-side caller could embed "../" in a file name to escape the destPath volume mount (/configs) and write files outside the intended directory. Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks before using it in the tar header, then join with destPath for the archive header. Also guard parent-directory creation against traversal. Closes #1043. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix Two regressions introduced by PR #1243 (fix issue #1207): 1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test expected only `{id, name}`. Added `hasChildren: false` to the assertion. 2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)` without `act()`. With fake timers, `setState` (synchronous) is flushed by `advanceTimersByTimeAsync`, but the React state update it triggers is a microtask — so the test saw stale render. Wrapping in `act(async () => { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain before assertions run. All 813 vitest tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas): add 100px proximity threshold to drag-to-nest detection Fixes #1052 — previously, getIntersectingNodes() returned any node whose bounding box overlapped the dragged node, regardless of actual pixel distance. On a sparse canvas this triggered the "Nest Workspace" dialog even when the dragged node was nowhere near any target. The fix adds an on-node-drag proximity filter: only nodes within 100px (center-to-center) of the dragged node are eligible as nest targets. Distance is computed as squared Euclidean to avoid the sqrt overhead in the hot drag path. Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring and confirming the regression is addressed in Canvas.tsx. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com> Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas): add ?? 0 guard for optional budget_used in progressPct (#1324) (#1327) * fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled With cancel-in-progress: false, pending CI runs accumulate in the ci-staging concurrency group. New pushes create queued runs, but GitHub dispatches multiple runs for the same SHA instead of replacing the pending one. All runs get stuck/cancelled before completing. Reverting to cancel-in-progress: true restores CI operation — runs that are superseded are cancelled, freeing the concurrency slot for the new run to proceed. Runner availability (ubuntu-latest dispatch stall) is a separate infra issue tracked independently. * fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043) Tar header names were built from raw map keys without validation. A malicious server-side caller could embed "../" in a file name to escape the destPath volume mount (/configs) and write files outside the intended directory. Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks before using it in the tar header, then join with destPath for the archive header. Also guard parent-directory creation against traversal. Closes #1043. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix Two regressions introduced by PR #1243 (fix issue #1207): 1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test expected only `{id, name}`. Added `hasChildren: false` to the assertion. 2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)` without `act()`. With fake timers, `setState` (synchronous) is flushed by `advanceTimersByTimeAsync`, but the React state update it triggers is a microtask — so the test saw stale render. Wrapping in `act(async () => { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain before assertions run. All 813 vitest tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas): add 100px proximity threshold to drag-to-nest detection Fixes #1052 — previously, getIntersectingNodes() returned any node whose bounding box overlapped the dragged node, regardless of actual pixel distance. On a sparse canvas this triggered the "Nest Workspace" dialog even when the dragged node was nowhere near any target. The fix adds an on-node-drag proximity filter: only nodes within 100px (center-to-center) of the dragged node are eligible as nest targets. Distance is computed as squared Euclidean to avoid the sqrt overhead in the hot drag path. Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring and confirming the regression is addressed in Canvas.tsx. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas): add ?? 0 guard for optional budget_used in progressPct Fixes #1324 — TypeScript strict mode flags budget.budget_used as possibly undefined in the progressPct ternary, even though the outer condition checks budget_limit > 0. Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0% when the backend returns a partial shape (provisioning-stuck workspaces). Also adds a test covering the undefined-budget_used case with the progress bar aria-valuenow and fill width both at 0%. Closes #1324. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com> Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas): add ?? 0 guard for optional budget_used in progressPct (issue #1324) (#1329) * fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled With cancel-in-progress: false, pending CI runs accumulate in the ci-staging concurrency group. New pushes create queued runs, but GitHub dispatches multiple runs for the same SHA instead of replacing the pending one. All runs get stuck/cancelled before completing. Reverting to cancel-in-progress: true restores CI operation — runs that are superseded are cancelled, freeing the concurrency slot for the new run to proceed. Runner availability (ubuntu-latest dispatch stall) is a separate infra issue tracked independently. * fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043) Tar header names were built from raw map keys without validation. A malicious server-side caller could embed "../" in a file name to escape the destPath volume mount (/configs) and write files outside the intended directory. Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks before using it in the tar header, then join with destPath for the archive header. Also guard parent-directory creation against traversal. Closes #1043. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix Two regressions introduced by PR #1243 (fix issue #1207): 1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test expected only `{id, name}`. Added `hasChildren: false` to the assertion. 2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)` without `act()`. With fake timers, `setState` (synchronous) is flushed by `advanceTimersByTimeAsync`, but the React state update it triggers is a microtask — so the test saw stale render. Wrapping in `act(async () => { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain before assertions run. All 813 vitest tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas): add 100px proximity threshold to drag-to-nest detection Fixes #1052 — previously, getIntersectingNodes() returned any node whose bounding box overlapped the dragged node, regardless of actual pixel distance. On a sparse canvas this triggered the "Nest Workspace" dialog even when the dragged node was nowhere near any target. The fix adds an on-node-drag proximity filter: only nodes within 100px (center-to-center) of the dragged node are eligible as nest targets. Distance is computed as squared Euclidean to avoid the sqrt overhead in the hot drag path. Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring and confirming the regression is addressed in Canvas.tsx. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(canvas): add ?? 0 guard for optional budget_used in progressPct Fixes #1324 — TypeScript strict mode flags budget.budget_used as possibly undefined in the progressPct ternary, even though the outer condition checks budget_limit > 0. Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0% when the backend returns a partial shape (provisioning-stuck workspaces). Also adds a test covering the undefined-budget_used case with the progress bar aria-valuenow and fill width both at 0%. Closes #1324. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com> Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(platform): unblock SaaS workspace registration end-to-end Every workspace in the cross-EC2 SaaS provisioning shape was failing registration, heartbeat, or A2A routing. Four distinct blockers sat between "EC2 is up" and "agent responds"; three are platform-side and fixed here (the fourth is in the CP user-data, separate PR). 1. SSRF validator blocked RFC-1918 (registry.go + mcp.go) validateAgentURL and isPrivateOrMetadataIP rejected 172.16.0.0/12, which contains the AWS default VPC range (172.31.x.x) that every sibling workspace EC2 registers from. Registration returned 400 and the 10-min provision sweep flipped status to failed. RFC-1918 + IPv6 ULA are now gated behind saasMode(); link-local (169.254/16), loopback, IPv6 metadata (fe80::/10, ::1), and TEST-NET stay blocked unconditionally in both modes. saasMode() resolution order: 1. MOLECULE_DEPLOY_MODE=saas|self-hosted (explicit operator flag) 2. MOLECULE_ORG_ID presence (legacy implicit signal, kept for back-compat so existing deployments don't need a config change) isPrivateOrMetadataIP now actually checks IPv6 — previously it returned false on any non-IPv4 input, which would let a registered [::1] or [fe80::...] URL bypass the SSRF check entirely. 2. Orphan auth-token minting (workspace_provision.go) issueAndInjectToken mints a token and stuffs it into cfg.ConfigFiles[".auth_token"]. The Docker provisioner writes that file into the /configs volume — the CP provisioner ignores it (only cfg.EnvVars crosses the wire). Result: live token in DB, no plaintext on disk, RegistryHandler.requireWorkspaceToken 401s every /registry/register attempt because the workspace is no longer in the "no live token → bootstrap-allowed" state. Now no-ops in SaaS mode; the register handler already mints on first successful register and returns the plaintext in the response body for the runtime to persist locally. Also removes the redundant wsauth.IssueToken call at the bottom of provisionWorkspaceCP, which created the same orphan-token pattern a second time. 3. Compaction artefacts (bundle/importer.go, handlers/org_tokens.go, scheduler.go, workspace_provision.go) Four pre-existing compile errors on main from an earlier session's code truncation: missing tuple destructuring on ExecContext / redactSecrets / orgTokenActor, missing close-brace in Scheduler.fireSchedule's panic recovery. All one-line mechanical fixes; without them the binary would not build. Tests ----- ssrf_test.go adds: * TestSaasMode — covers the env resolution ladder (explicit flag wins over legacy signal, case-insensitive, whitespace tolerant) * TestIsPrivateOrMetadataIP_SaaSMode — asserts RFC-1918 + IPv6 ULA flip to allowed, metadata/loopback/TEST-NET still blocked * TestIsPrivateOrMetadataIP_IPv6 — regression guard for the old "returns false for all IPv6" behaviour Follow-up issue for CP-sourced workspace_id attestation will be filed separately — closes the residual intra-VPC SSRF + token-race windows the SaaS-mode relaxation introduces. Verified end-to-end today on workspace 6565a2e0 (hermes runtime, OpenAI provider) — agent returned "PONG" in 1.4s after register → heartbeat → A2A proxy → runtime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(runtime+scheduler): increment/decrement active_tasks + max_concurrent (#1408) Runtime (shared_runtime.py): - set_current_task now increments active_tasks on task start, decrements on completion (was binary 0/1) - Counter never goes below 0 (max(0, n-1)) - Pushes heartbeat immediately on BOTH increment and decrement (#1372) Scheduler (scheduler.go): - Reads max_concurrent_tasks from DB (default 1, backward compatible) - Skips cron only when active_tasks >= max_concurrent_tasks (was > 0) - Leaders can be configured with max_concurrent_tasks > 1 to accept A2A delegations while a cron runs Platform: - Added max_concurrent_tasks column to workspaces (migration 037) - Workspace model + list/get queries include the new field - API exposes max_concurrent_tasks in workspace JSON Config.yaml support (future): runtime_config.max_concurrent_tasks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(review): address 3 critical issues from code review 1. BLOCKER: executor_helpers.py now uses increment/decrement too (was still binary 0/1, stomping the counter for CLI + SDK executors) 2. BUG: asymmetric getattr defaults fixed — both paths use default 0 (was 0 on increment, 1 on decrement) 3. UX: current_task preserved when active_tasks > 0 on decrement (was clearing task description even when other tasks still running) 4. Scheduler polling loop re-reads max_concurrent_tasks on each poll (was using stale value from initial query) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com> Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com> Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app> Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app> Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app> Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app> Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com> * docs: workspace files API reference, skill catalog, and links * docs: fix secrets endpoint path across docs The workspace secrets endpoint is `/workspaces/:id/secrets`, not `/secrets/values`. This was wrong in quickstart.md (Path 2: Remote Agent) and workspace-runtime.md (registration flow example and comparison table). The external-agent-registration guide already had the correct path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: fix broken blog cross-link in skills-vs-bundled-tools post Link path had an extra `/docs/` segment: `/docs/blog/...` instead of `/blog/...`. Nextra resolves blog posts directly under `/blog/<slug>`, not under `/docs/blog/`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: add skill-catalog.md guide Linked from the skills-vs-bundled-tools blog post as a reference for TTS/image-generation/web-search skills. The blog promises "install directly via the CLI" with a skill catalog — this page fills that promise by documenting available skill types, install commands, version management, custom skill authoring, and removal. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted * docs(api-ref): add workspace file copy API reference Documents TemplatesHandler.copyFilesToContainer (container_files.go): - Endpoint overview: PUT /workspaces/:id/files/*path - Parameter descriptions for all four function parameters - CWE-22 path traversal protection (PRs #1267/1270/1271) - Defense-in-depth: validateRelPath at handler + archive boundary - Full error code table (400/404/500) - curl example with success and path-traversal rejection cases Also covers: writeViaEphemeral routing, findContainer fallback, allowed roots allow-list, and related links to platform-api.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com> * fix(handlers): add saasMode() gating to isPrivateOrMetadataIP in a2a_proxy_helpers.go Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it unconditionally blocks RFC-1918 addresses, regressing the fix in commits |
||
|
|
35ccda1091 |
fix(security): replace err.Error() with generic messages in handler responses (#1193)
Replace all c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
calls across 22 handler files with context-appropriate generic messages
to prevent internal error strings (DB details, validation messages,
file paths) leaking into API responses.
Pattern established:
- ShouldBindJSON failures → "invalid request body" (or "invalid delegation request")
- Validation failures → "invalid workspace ID", "invalid path", etc.
- Server-side errors still logged, only generic message returned to client
References: Security finding from Audit #125 (Stripe key leak via err.Error())
Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|
|
762b38fa30 |
fix(org-import): limit concurrent Docker provisioning to 3 (#1084)
The org import fired all workspace provisioning goroutines concurrently, overwhelming Docker when creating 39+ containers. Containers timed out, leaving workspaces stuck in 'provisioning' with no schedules or hooks. Fix: - Add provisionConcurrency=3 semaphore limiting concurrent Docker ops - Increase workspaceCreatePacingMs from 50ms to 2000ms between siblings - Pass semaphore through createWorkspaceTree recursion With 39 workspaces at 3 concurrent + 2s pacing, import takes ~30s instead of timing out. Each workspace gets its full template: schedules, hooks, settings, hierarchy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
ff7ac87b97 |
feat: seed initial memories from org template and create payload (#1050)
Add MemorySeed model and initial_memories support at three levels: - POST /workspaces payload: seed memories on workspace creation - org.yaml workspace config: per-workspace initial_memories with defaults fallback - org.yaml global_memories: org-wide GLOBAL scope memories seeded on the first root workspace during import Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
39074cc4ae |
chore: final open-source cleanup — binary, stale paths, private refs
- Remove compiled workspace-server/server binary from git - Fix .gitignore, .gitattributes, .githooks/pre-commit for renamed dirs - Fix CI workflow path filters (workspace-template → workspace) - Replace real EC2 IP and personal slug in test_saas_tenant.sh - Scrub molecule-controlplane references in docs - Fix stale workspace-template/ paths in provisioner, handlers, tests - Clean tracked Python cache files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
d8026347e5 |
chore: open-source restructure — rename dirs, remove internal files, scrub secrets
Renames: - platform/ → workspace-server/ (Go module path stays as "platform" for external dep compat — will update after plugin module republish) - workspace-template/ → workspace/ Removed (moved to separate repos or deleted): - PLAN.md — internal roadmap (move to private project board) - HANDOFF.md, AGENTS.md — one-time internal session docs - .claude/ — gitignored entirely (local agent config) - infra/cloudflare-worker/ → Molecule-AI/molecule-tenant-proxy - org-templates/molecule-dev/ → standalone template repo - .mcp-eval/ → molecule-mcp-server repo - test-results/ — ephemeral, gitignored Security scrubbing: - Cloudflare account/zone/KV IDs → placeholders - Real EC2 IPs → <EC2_IP> in all docs - CF token prefix, Neon project ID, Fly app names → redacted - Langfuse dev credentials → parameterized - Personal runner username/machine name → generic Community files: - CONTRIBUTING.md — build, test, branch conventions - CODE_OF_CONDUCT.md — Contributor Covenant 2.1 All Dockerfiles, CI workflows, docker-compose, railway.toml, render.yaml, README, CLAUDE.md updated for new directory names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |