Commit Graph

218 Commits

Author SHA1 Message Date
Hongming Wang
2c3eccf9d6 test(auth): provide window.location.pathname in redirectToLogin mocks
The pathname.startsWith() loop-break added to redirectToLogin needs
pathname on the mock Location object; tests were supplying only href.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:16:22 -07:00
rabbitblood
b360a4353f fix(auth): redirect to app.moleculesai.app for login, not tenant subdomain
Tenant subdomains (hongmingwang.moleculesai.app) proxy to EC2 platform
which has no /cp/auth/* routes. Auth UI lives on app.moleculesai.app.

Added getAuthOrigin() that detects SaaS tenant hosts and redirects to
the app subdomain for login/signup. Non-SaaS hosts (localhost, dev)
fall back to PLATFORM_URL as before.

[Molecule-Platform-Evolvement-Manager]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 11:16:22 -07:00
rabbitblood
6730c7713d fix(auth): redirect to login on 401 from any API call
When session credentials expire mid-use, ALL API calls return 401.
Previously this threw a generic error that crashed the UI with no
recovery path. Now the API client intercepts 401 and redirects to
login once (via redirectToLogin which already guards against loops).

Combined with the AuthGate /cp/auth/* path guard, this gives the
correct behavior: credentials lost → redirect to login → user logs
in → return_to sends them back.

[Molecule-Platform-Evolvement-Manager]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 11:16:22 -07:00
rabbitblood
edc42b2893 fix(auth): break infinite redirect loop on /cp/auth/login
AuthGate redirected anonymous users to /cp/auth/login?return_to=<url>,
but the login page itself triggered AuthGate, which redirected again
with double-encoded return_to. Each redirect added another encoding
layer until the URL exceeded 431 (Request Header Fields Too Large).

Two guards:
1. redirectToLogin() returns early if already on /cp/auth/* path
2. AuthGate skips redirect check entirely for /cp/auth/* paths

[Molecule-Platform-Evolvement-Manager]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 11:16:22 -07:00
Hongming Wang
dc476153c1 Merge remote-tracking branch 'origin/staging' into promote/main-to-staging-2026-04-23
# Conflicts:
#	canvas/src/components/__tests__/ContextMenu.keyboard.test.tsx
2026-04-23 09:50:16 -07:00
8f7808642a fix(test): add getState to useCanvasStore mock in ContextMenu keyboard test
PR #1781 introduced useCanvasStore.getState() call in ContextMenu.tsx
(line 169) but the existing Vitest mock for useCanvasStore in the keyboard
test file lacked a getState method, causing:
  TypeError: useCanvasStore.getState is not a function

Fix: attach getState: () => mockStore to the mock using Object.assign
so the static method is available alongside the selector fn.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 16:43:08 +00:00
Hongming Wang
47dc72c6b3 chore: promote main → staging (52 commits, 2 conflicts resolved)
Brings the staging branch up to date with main's feature-fix stream so
every staging-targeted PR stops tripping on pre-existing rot. Before
this merge, staging had 30+ compile + test failures from fix PRs that
landed on main but never reached staging — primarily #1755's panic-
cascade + schema-drift alignments.

After this merge the handlers package goes from 30+ fails → 2 pre-
existing nil-docker test panics (TestCopyFilesToContainer_CWE22_
RejectsTraversal + TestDeleteViaEphemeral_F1085_RejectsTraversal),
both authored on staging and broken before this promotion. Tracked
separately; not a merge regression.

## Conflicts resolved

1. docs/marketing/campaigns/discord-adapter-announcement/announcement.md
   — deleted on main (9d0d213: "move sensitive strategy + research to
   internal repo"), modified on staging. Deletion wins: marketing
   content moved out of the public monorepo per that commit's intent.
   The content lives in the internal repo.

2. workspace-server/internal/handlers/container_files.go — staging's
   rmTarget version kept. Main's version had `Cmd: []string{"rm",
   "-rf", "/configs/" + filePath}` which concatenates raw filePath
   AFTER the prefix-check on rmTarget, defeating the path-traversal
   guard (a "../etc/passwd" input passes validation but the rm cmd
   then traverses). Staging's `Cmd: []string{"rm", "-rf", rmTarget}`
   uses the validated path. Keeping staging's more-secure variant.

## Includes build unblockers from #1769 / #1782
- terminal.go: malformed handleLocalConnect repaired
- terminal_test.go: missing braces in TestHandleConnect_RoutesToLocal
- workspace_crud.go: unused imports + duplicate strField block
- container_files_test.go: duplicate contains() removed (uses the one
  in workspace_provision_test.go, same package)

## Verification
- go build ./...  clean
- go vet ./...  clean
- go test -race ./... — 18/20 packages green; 2 test panics in
  internal/handlers are pre-existing on staging (documented above)
2026-04-23 08:51:01 -07:00
Hongming Wang
68ee76c6b7 fix(canvas): add getState to useCanvasStore mock in ContextMenu keyboard test
ContextMenu.tsx reads parent-workspace children via
useCanvasStore.getState().nodes.filter(...) — a direct .getState()
call, not the selector-calling form. The existing vi.mock exposed
only the selector form, so rendering crashed with
"TypeError: useCanvasStore.getState is not a function".

Restructure the vi.mock factory to return Object.assign(fn, {
getState: () => mockStore }) so both call shapes resolve. Factory body
builds the function locally because vi.mock hoists above outer-scope
variable declarations and can't reference `mockStore` via closure.

Verified: all 15 tests in the file pass after the change.

Unblocks the Canvas (Next.js) CI check on PR #1743 (staging→main sync).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 01:49:34 -07:00
Hongming Wang
d4cead5002 chore: extract ContextMenu Zustand fix + a2a_proxy local-docker SSRF bypass + workspace-server Dockerfile GID entrypoint
Three small, non-overlapping fixes extracted from closed PR #1664:

1. canvas/src/components/ContextMenu.tsx — Replace the useMemo-over-nodes
   pattern with a hashed-boolean selector (s.nodes.some(...)) so Zustand's
   useSyncExternalStore snapshot comparison is stable. Resolves React
   error #185 (infinite render loop). Moves the child-node list derivation
   into the delete handler via getState() so the render path no longer
   allocates a fresh array.

2. workspace-server/internal/handlers/a2a_proxy.go — Allow the
   Docker-bridge hostname path (ws-<id>:8000) to skip the SSRF guard in
   local-docker mode. Gated on !saasMode() so SaaS deployments keep the
   full private-IP blocklist (a remote workspace registration can't claim
   a ws-* hostname and reach a sensitive VPC IP).

3. workspace-server/Dockerfile — Add entrypoint.sh that discovers the
   docker.sock GID at boot and adds the platform user to that group, then
   exec's su-exec to drop privileges. Lets the platform container reach
   the host docker socket without running as root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 20:00:16 -07:00
molecule-ai[bot]
9d076b9c4d
Merge pull request #1684 from Molecule-AI/fix/missing-keys-modal-a11y-v2
fix(canvas/a11y): MissingKeysModal — backdrop aria-hidden, decorative SVGs, form labels
2026-04-23 02:54:46 +00:00
5157f80d19 fix(canvas): add type=button to ApprovalBanner action buttons (bug #1669)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 02:15:52 +00:00
Hongming Wang
e08ea7b5ba fix(canvas): require hermes model at create + send to CP (fixes silent Anthropic 401)
Root cause of the hermes 401 "Invalid API key" on SaaS workspaces:

  1. CreateWorkspaceDialog never sent `model` in the /workspaces POST
  2. Tenant/CP plumbed through a valid (provider, API key) but empty MODEL
  3. Workspace install.sh ran with HERMES_DEFAULT_MODEL unset
  4. derive-provider.sh saw no slug → PROVIDER="auto"
  5. Hermes fell back to its compiled-in default (Anthropic via
     OpenAI-compat adapter)
  6. User's MINIMAX_API_KEY was present but irrelevant — hermes tried
     Anthropic with it → 401

Fix:

- Extend HERMES_PROVIDERS with `defaultModel` + `models` (suggestion
  list). Each provider ships with a known-good default so the trap
  is physically impossible to hit with the new form.
- Add a required Model input to the Hermes panel, auto-populated
  from the provider's defaultModel when the provider changes (only
  if the user hasn't typed their own slug yet).
- Datalist surfaces additional model suggestions per provider so
  users can pick a different size (e.g. M2.7-highspeed) without
  typing the whole slug.
- handleCreate validates hermesModel is non-empty, sends as `model`
  in the POST body alongside the secrets block.
- useEffect guard avoids clobbering a user-typed custom slug when
  they toggle providers back and forth.

Existing 19 a11y tests still pass (non-SaaS path unchanged, four-tier
picker still renders, arrow-key nav still wraps).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:59:49 -07:00
Hongming Wang
66de81fbfa
Merge pull request #1689 from Molecule-AI/refactor/strip-secret-service-dropdown
refactor(secrets): strip Service dropdown from Add-Key form
2026-04-22 18:46:02 -07:00
Hongming Wang
0574e7c1d0 feat(canvas): add T4 tier (full-host access); SaaS default T4
Following feedback that T4 — not T3 — is the full-access tier:

- Non-SaaS picker now shows all four tiers: T1 Sandboxed, T2 Standard,
  T3 Privileged, T4 Full Access. Four-column grid.
- SaaS picker stays single-option but now locks to T4 (was T3). Every
  SaaS workspace gets a dedicated EC2 VM, which is unambiguously the
  "full host" case — T3 (privileged container) was a category mismatch.
- Default tier on SaaS is 4 (was 3). CP provisioner already supports
  tier 4 (t3.large / 80 GB). TIER_CONFIG already has T4's amber color.

Tests updated for the four-tier picker: wrap tests now go T4 ↔ T1, and
the selection/tabIndex tests cover the fourth button.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:17:13 -07:00
382238daa3 test(canvas): relax setPendingDelete assertion to use expect.objectContaining
Staging added hasChildren/children fields to workspace store shape.
Test assertion updated to use objectContaining to avoid false negatives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 00:59:38 +00:00
66c6b83ab2 test(canvas): add ActivityTab and MissingKeysModal component tests
- ActivityTab.test.tsx: 27 tests covering filter bar (aria-pressed states,
  API reload), loading/error/empty states, ActivityRow content (type badges,
  method, duration_ms, summary, error styling), A2A flow indicators,
  auto-refresh Live/Paused toggle, refresh button, activity count

- MissingKeysModal.component.test.tsx: 25 tests covering visibility,
  ARIA semantics (role=dialog, aria-modal, aria-labelledby), content,
  keyboard (Escape, Enter), save flow (disabled/.../Saved/error), Add Keys
  & Deploy gate, Cancel + backdrop click, Open Settings button

- MissingKeysModal.test.tsx: refactored to preflight logic only (7 tests);
  component rendering now covered in component test file

863 tests passing (+3 net).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 00:58:56 +00:00
Hongming Wang
8b1af9708c feat(canvas): default tier T3 and hide T1/T2 on SaaS
On SaaS every workspace gets its own EC2 VM — the Docker-sandbox
distinction between T1 (sandboxed), T2 (standard Docker), and T3
(full host access) doesn't apply. A SaaS workspace is always a
dedicated VM, which is "full access" by construction. Showing T1/T2
in that UI is a category error: users pick a sandbox level that has
no effect on the actual EC2 machine they get.

Changes:
- tenant.ts: export isSaaSTenant() — returns true when canvas is
  served at <slug>.moleculesai.app (SSR-safe: false on server)
- CreateWorkspaceDialog: when isSaaSTenant(), render only the T3
  option, default tier=3, grid collapses to a single column. Label
  gets a " — dedicated VM" hint so the user knows what they're
  getting. On self-hosted the full T1/T2/T3 picker is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:02:48 -07:00
Hongming Wang
d956164812 refactor(secrets): strip Service dropdown from Add-Key form
The Add-Key form used to open with a required Service dropdown
(GitHub / Anthropic / OpenRouter / Other) that gated everything
else. The dropdown did no persistent work — the secret store only
cares about (key_name, value); the Service label was never saved
anywhere. It also suffered registry drift: today we support ~22
hermes-dispatched providers (MiniMax, Gemini, DeepSeek, Kimi, Qwen,
NVIDIA, etc.); only 3 had entries. Everyone else landed in "Other"
with no downside beyond the mandatory click.

Replaces it with:

1. Key-name <datalist> autocomplete sourced from new
   KEY_NAME_SUGGESTIONS in lib/services.ts — 26 entries covering
   common infra keys + every hermes-supported provider.

2. inferGroup(keyName) derives classification at render time,
   matching what the store already does in getGrouped(). No
   behaviour change for list grouping.

3. Provider docs link renders inline only when inferGroup
   recognises the name. For 'custom' keys we stay quiet — no
   false-structure prompt.

4. Test-connection button still available when the inferred group
   supports it AND the value is format-valid. Same providers as
   before.

SERVICES registry preserved for LIST rendering + test routing.

Result: two fields instead of three. One fewer decision. Provider-
agnostic by design — new providers work the moment someone types
their canonical env var name; no UI code change per provider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 16:41:43 -07:00
Hongming Wang
f6e6a64ba9 fix(canvas): forward-port dynamic runtime dropdown from staging (PR #1526)
PR #1526 shipped the /templates registry + canvas dynamic Runtime /
Model / Required-Env fields on 2026-04-22 — but merged into the
staging branch, not main. The staging→main promotion PR #1496 has
been open unmerged for a while with 1172 commits divergence, so
prod (which builds from main) still carries the old hardcoded
dropdown.

Symptom seen on hongmingwang.moleculesai.app today:

- New Hermes Agent workspace (template declares runtime: hermes) loads
  Config tab → Runtime dropdown shows "LangGraph (default)" because
  there's no <option value="hermes"> in the hardcoded list; it falls
  back to empty-value silently.
- Model field is a plain TextInput with static placeholder
  "e.g. anthropic:claude-sonnet-4-6" — should be a combobox populated
  from the selected runtime's models[].
- Required Env Vars is a TagList with static placeholder
  "e.g. CLAUDE_CODE_OAUTH_TOKEN" — should auto-populate from the
  selected model's required_env.
- Net effect: "Save & Deploy" sends empty model + empty env to the
  provisioner → workspace instant-fails.

This PR cherry-picks the exact three files from PR #1526 (#359dc61
on staging) forward to main, without pulling the other 1171
commits:

- canvas/src/components/tabs/ConfigTab.tsx
  - RuntimeOption interface + FALLBACK_RUNTIME_OPTIONS (hermes,
    gemini-cli included)
  - useEffect fetches /templates and populates runtimeOptions
    dynamically
  - dropdown renders from runtimeOptions (no hardcoded list)
  - Model becomes a combobox with datalist of available models
    per selected runtime
  - Required Env Vars auto-populates from the selected model's
    required_env on model change

- workspace-server/internal/handlers/templates.go
  - /templates endpoint returns [{id, name, runtime, models}] with
    per-template models registry (id, name, required_env)

- workspace-server/internal/handlers/templates_test.go
  - Tests for runtime+models parsing and legacy top-level model
    fallback

The canvas Runtime dropdown now resolves "hermes" correctly;
Model dropdown shows the models[] from the hermes template; Env
auto-populates with HERMES_API_KEY (or whichever model selected).

Verified locally:
  - workspace-server builds clean
  - Template handler tests pass: TestTemplatesList_RuntimeAndModelsRegistry,
    TestTemplatesList_LegacyTopLevelModel, TestTemplatesList_NonexistentDir

Follow-up: the staging→main promotion gap (#1496) is the
underlying process issue. Either merge that PR or adopt a policy
of landing fixes directly on main (as several PRs have today).
Files here were chosen minimally to avoid pulling unrelated staging
changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:28:38 -07:00
airenostars
7a89704b6e
fix(build): add missing fmt import + fix canvas Dockerfile GID (#1487)
* docs(canary-release): flag as aspirational; link to current state

The canary-release.md doc describes the pipeline as if the fleet is
running — referring to AWS account 004947743811 and a configured
MoleculeStagingProvisioner role. Reality as of 2026-04-22: no canary
tenants are provisioned, the 3 GH Actions secrets are empty, and
canary-verify.yml has failed 7/7 times in a row.

Added a top-of-doc ⚠️ state note that:

1. Clarifies this is intended design, not deployed reality.
2. Notes the AWS account ID is historical / unverified.
3. Explains that merges currently rely on manual promote-latest.
4. Cross-links to molecule-controlplane/docs/canary-tenants.md for
   the Phase 1 work that's shipped, the Phase 2 stand-up plan, and
   the "should we even do this now?" decision framework.
5. Asks whoever lands Phase 2 to reconcile the two docs.

No behaviour change — doc-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build): add missing fmt import in a2a_proxy.go, fix canvas Dockerfile GID

- a2a_proxy.go: missing "fmt" import caused build failure (8 undefined
  references at lines 743-775). Likely dropped during a recent merge.
- canvas/Dockerfile: GID 1000 already in use in node base image.
  Changed to dynamic group/user creation with fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
2026-04-22 21:10:58 +00:00
116526bff3 fix(canvas/a11y): orgs/page.tsx — form labels, error announcements, checkout banner
- CreateOrgForm: replace bare <span> labels with <label htmlFor> + input id
  (WCAG 1.3.1 — programmatic label association); add aria-describedby hint for slug field
- Error state: add role=alert on error <p> (WCAG 4.1.3 — Status Messages)
- CheckoutBanner: add role=status + aria-live=polite (WCAG 4.1.3);
  restore decorative ✓ with aria-hidden=true

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 21:06:20 +00:00
d6dbf23172 test(canvas/a11y): add WCAG 2.1 accessibility tests for ConsoleModal and DeleteCascadeConfirmDialog
ConsoleModal: role=dialog, aria-modal, aria-labelledby, backdrop aria-hidden, error role=alert, accessible button names
DeleteCascadeConfirmDialog: role=dialog, aria-modal, aria-labelledby, backdrop aria-hidden, SVG aria-hidden, disabled state, keyboard interactions (Escape, Enter), accessible names

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:39:48 +00:00
8bb0fe70ff fix(canvas/a11y): DeleteCascadeConfirmDialog backdrop aria-hidden (WCAG 4.1.2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:36:05 +00:00
a322dd0056 fix(canvas/a11y): unaudited components — backdrop/semantic a11y gaps
- ConsoleModal.tsx: backdrop div aria-hidden; error div role=alert (WCAG 4.1.2)
- ProvisioningTimeout.tsx: warning SVG aria-hidden; cancel-dialog backdrop aria-hidden (WCAG 4.1.2)
- TermsGate.tsx: backdrop aria-hidden; dialog role=dialog+aria-modal+aria-labelledby; error role=alert
- TopBar.tsx: replace non-semantic role=banner div with <header>; logo emoji aria-hidden
- FilesToolbar.tsx: aria-label on select dropdown; aria-label on all icon buttons (New, Upload, Export, Clear, Refresh, file input)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 20:07:49 +00:00
c6e7ccb289 fix(canvas/a11y): MissingKeysModal — backdrop aria-hidden, decorative SVGs
- Backdrop div: add aria-hidden="true" so screen readers skip it (WCAG 4.1.2)
- Warning triangle SVG (header): add aria-hidden="true" (decorative icon)
- Saved-badge checkmark SVG: add aria-hidden="true" (decorative icon)
- Add MissingKeysModal.a11y.test.tsx: 14 tests covering role=dialog,
  aria-modal, aria-labelledby, backdrop aria-hidden, SVG aria-hidden,
  focus-on-open (WCAG 2.4.3), Escape key handler (WCAG 2.1.2),
  accessible button names

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 19:40:18 +00:00
e211a25ccd fix(canvas/a11y): dialog aria-modal, icon-button labels, focus management
- CookieConsent.tsx: add aria-modal="true" (WCAG 2.1.1)
- ConsoleModal.tsx: add useRef + requestAnimationFrame focus management on open
- ConversationTraceModal.tsx: remove redundant aria-describedby={undefined}
- FileTree.tsx: add aria-label to directory/file delete buttons (WCAG 4.1.2)
- FileEditor.tsx: add aria-label to download button (WCAG 4.1.2)
- ScheduleTab.tsx: add aria-label to Run Now, Edit, Delete icon buttons
- form-inputs.tsx: add aria-label to tag removal button

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 19:03:00 +00:00
Molecule AI Fullstack (floater)
ea5e018f76 Merge main into staging to sync 2026-04-22 18:15:52 +00:00
molecule-ai[bot]
6bd1691446
Merge pull request #1594 from Molecule-AI/fix/canvas-a11y-clean
fix(canvas/a11y): aria-hidden on decorative SVGs + MissingKeysModal semantics
2026-04-22 18:11:12 +00:00
236158d4a4 fix(canvas/a11y): add aria-hidden to decorative SVGs + MissingKeysModal semantics
- DeleteCascadeConfirmDialog: aria-hidden on warning triangle SVG (button
  already has adjacent text content; icon is purely decorative)
- Toolbar: aria-hidden on 4 decorative SVGs (stop-all, restart-pending,
  search, help) — buttons all have aria-label/aria-expanded/text
- MissingKeysModal: role="dialog" aria-modal="true" aria-labelledby on
  container, id="missing-keys-title" on heading, requestAnimationFrame
  focus management via useRef (replaces autoFocus={index===0})
- CreateWorkspaceDialog: remove redundant aria-describedby={undefined}

WCAG 2.1 SC 1.1.1 — screen readers skip purely-presentational icons.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 17:40:43 +00:00
Hongming Wang
5f96a832e7 fix(canvas): drop node:20-alpine default user before creating canvas uid 1000
publish-canvas-image has been failing on every main push since 2026-04-21
at `addgroup -g 1000 canvas` because node:20-alpine already ships a `node`
user/group at uid/gid 1000. Same collision workspace-server/Dockerfile.tenant
already fixes with `deluser --remove-home node` before `addgroup`.

Copying that pattern here so the workflow goes green again and canvas images
publish to ghcr. No runtime behaviour change — canvas still runs as non-root
uid 1000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 09:42:02 -07:00
Hongming Wang
359dc615e9
fix(canvas+templates): fetch runtime dropdown from /templates registry (#1526)
* fix(canvas+templates): fetch runtime dropdown from /templates registry

Canvas hardcoded 6 runtime options, drifting from manifest.json which
already registers hermes + gemini-cli as first-class workspace templates.
A Hermes workspace had runtime=hermes in its DB row but Config showed
"LangGraph (default)" — the HTML select fell back to its first option
because "hermes" wasn't listed, and saving would clobber the runtime
back to empty.

Now:
- GET /templates returns the runtime field from each cloned template's
  config.yaml (previously dropped on the floor)
- ConfigTab fetches /templates on mount, dedupes non-empty runtimes, and
  renders them as <option>s. Falls back to the static list if the fetch
  fails (offline, older backend), so the control never renders empty.

Adding a template to manifest.json now flows through automatically — no
canvas PR required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(canvas+templates): model + required-env suggestions from template

Extends the dropdown fix so Model and Required Env also flow from
the template registry instead of being free-form fields the user
has to remember.

Template config.yaml now declares:

  runtime_config:
    model: <default>
    models:
      - id: nous-hermes-3-70b
        name: Nous Hermes 3 70B (Nous Portal)
        required_env: [HERMES_API_KEY]
      - id: nousresearch/hermes-3-llama-3.1-70b
        name: Hermes 3 70B (via OpenRouter)
        required_env: [OPENROUTER_API_KEY]

Platform: GET /templates now returns runtime + model + models[] per
template (was previously dropping runtime + ignoring runtime_config).

Canvas:
- Runtime dropdown built from /templates (was hardcoded 6 options)
- Model input becomes a datalist combobox; free-form input still
  allowed since model names rotate faster than templates
- Required Env Vars default to the selected model's required_env,
  labelled "(suggested)" so the user knows it's template-driven
- Everything falls back to a static list when /templates is
  unreachable, so offline editing still works

Follow-up: add models[] to the other 7 template repos (claude-code,
crewai, autogen, deepagents, openclaw, gemini-cli, langgraph). This
PR updates the platform + canvas; the Hermes template config update
goes in a separate PR against its own repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(canvas): commit required_env on model change; add backend tests

Review turned up that the \"Required Env Vars (suggested)\" display
was cosmetic-only — users picking a different model saw the new
env suggestion in the TagList, but the values never made it into
state, so Save serialized an empty (or stale) required_env and the
workspace ran with the wrong auth check.

Canvas fixes:
- Model input onChange now commits the matched modelSpec's required_env
  to state — but only when the prior required_env was empty or matched
  the previous modelSpec's list (i.e. user hadn't manually edited).
  User-typed envs always win.
- Dropped the display-only fallback in TagList values; shows only what's
  actually in state.
- New \"Template suggests X, Apply\" hint button covers the edge case
  where state and template differ (existing workspace whose required_env
  lags the template's current recommendation).
- datalist option key now includes index so template authors shipping
  duplicate model ids don't trigger a silent React key collision.
- Small arraysEqual helper.

Backend tests:
- TestTemplatesList_RuntimeAndModelsRegistry — asserts /templates
  response carries runtime + models[] with per-model required_env.
- TestTemplatesList_LegacyTopLevelModel — asserts older templates with
  top-level model: still surface correctly, with empty Models[].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:07:46 +00:00
Hongming Wang
e88ab70251 fix(canvas): stop infinite re-render on ContextMenu mount
ContextMenu's children selector ran .filter() inside the Zustand
hook, returning a brand-new array reference on every render.
useSyncExternalStore under the hood compares snapshots with
Object.is — a new array always differs, so React kept scheduling
re-renders, hit the 50-update depth cap, and crashed with minified
error #185.

Observed as "Application error: a client-side exception" on every
SaaS tenant once a session cookie resolved. Caught in dev mode
where the build emits the clear warning:

  The result of getSnapshot should be cached to avoid an infinite loop
      at ContextMenu (src/components/ContextMenu.tsx:26:34)

Fix: select the stable nodes array once, derive children via
useMemo outside the store subscription. Same output, no new
reference per render.

Manually verified: dev bundle served through a cloudflared tunnel
to a live tenant, ContextMenu component mounts cleanly, remaining
console errors are all unrelated (localhost API 401s from the dev
server pointing at its own origin).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:47:32 -07:00
molecule-ai[bot]
64ccf8e179
fix: CWE-78 rm scope, go vet failures, delegation idempotency
* refactor: split 4 oversized handler files into focused sub-files

- org.go (1099 lines) → org.go + org_import.go + org_helpers.go
- mcp.go (1001 lines) → mcp.go + mcp_tools.go
- workspace.go (934 lines) → workspace.go + workspace_crud.go
- a2a_proxy.go (825 lines) → a2a_proxy.go + a2a_proxy_helpers.go

No functional changes — same package, same exports, same tests.
All files stay under 635 lines.

Note: isSafeURL and isPrivateOrMetadataIP are duplicated between
mcp_tools.go and a2a_proxy_helpers.go — this is a pre-existing issue
from the original mcp.go and a2a_proxy.go, not introduced by this split.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(runtime+scheduler): increment/decrement active_tasks counter (refs #1386)

* docs(tutorials): add Self-Hosted AI Agents guide — Docker, Fly Machines, bare metal

* docs: add Remote Agents feature + Phase 30 blog links to docs index

* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted

* docs(api-ref): add workspace file copy API reference (#1281)

Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases

Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.

Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): CWE-78/CWE-22 — block shell injection in deleteViaEphemeral (#1310)

## Summary
Issue #1273: deleteViaEphemeral interpolated filePath directly into
rm command, enabling both shell injection (CWE-78) and path traversal
(CWE-22) attacks.

## Changes
1. Added validateRelPath(filePath) guard before constructing the rm command.
   validateRelPath blocks absolute paths and ".." traversal sequences.
2. Changed Cmd from "/configs/"+filePath (string interpolation) to
   []string{"rm", "-rf", "/configs", filePath} (exec form). This
   eliminates shell injection entirely — filePath is a plain argument,
   never interpreted as shell code.

## Security properties
- validateRelPath: blocks "../" and absolute paths before they reach Docker
- Exec form: filePath cannot inject shell metacharacters even if validation
  is somehow bypassed
- "/configs" as separate arg: rm has exactly two arguments, no room for
  injected args

Closes #1273.

Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>

* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in a2a_proxy.go (#1292) (#1302)

* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in mcp.go and a2a_proxy.go

Issue #1042: 3 CodeQL SSRF findings across mcp.go and a2a_proxy.go.
staging already ships the fix (PRs #1147, #1154 → merged); main did not include it.

- mcp.go: add isSafeURL() + isPrivateOrMetadataIP() helpers; validate
  agentURL before outbound calls in mcpCallTool (line ~529) and
  toolDelegateTaskAsync (line ~607)
- a2a_proxy.go: add identical isSafeURL() + isPrivateOrMetadataIP()
  helpers; call isSafeURL() before dispatchA2A in resolveAgentURL()
  (blocks finding #1 at line 462)
- mcp_test.go: 19 new tests covering all blocked URL patterns:
  file://, ftp://, 127.0.0.1, ::1, 169.254.169.254, 10.x.x.x,
  172.16.x.x, 192.168.x.x, empty hostname, invalid URL,
  isPrivateOrMetadataIP across all private/CGNAT/metadata ranges

1. URL scheme enforcement — http/https only
2. IP literal blocking — loopback, link-local, RFC-1918, CGNAT, doc/test ranges
3. DNS hostname resolution — blocks internal hostnames resolving to private IPs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci-blocker): remove duplicate isSafeURL/isPrivateOrMetadataIP from mcp.go

Issue #1292: PR #1274 duplicated isSafeURL + isPrivateOrMetadataIP in
mcp.go — both functions already exist on main at lines 829 and 876.
Kept the mcp.go definitions (the originals) and removed the 70-line
duplicate appended at end of file. a2a_proxy.go functions are
unchanged — they serve the same purpose via a separate code path.

* fix: remove orphaned commit-text lines from a2a_proxy.go

Three lines from the PR/commit title were accidentally baked into the
file during the rebase from #1274 to #1302, causing a Go syntax error
(a bare string literal at statement level followed by dangling braces).

Deletion restores:
  }
  return agentURL, nil
}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>

* fix(canvas/test): patch test regressions from PR #1243 + proximity hitbox fix (#1313)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (#1324) (#1327)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct

Fixes #1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.

Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.

Closes #1324.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (issue #1324) (#1329)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct

Fixes #1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.

Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.

Closes #1324.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(platform): unblock SaaS workspace registration end-to-end

Every workspace in the cross-EC2 SaaS provisioning shape was failing
registration, heartbeat, or A2A routing. Four distinct blockers sat
between "EC2 is up" and "agent responds"; three are platform-side and
fixed here (the fourth is in the CP user-data, separate PR).

1. SSRF validator blocked RFC-1918 (registry.go + mcp.go)
   validateAgentURL and isPrivateOrMetadataIP rejected 172.16.0.0/12,
   which contains the AWS default VPC range (172.31.x.x) that every
   sibling workspace EC2 registers from. Registration returned 400 and
   the 10-min provision sweep flipped status to failed. RFC-1918 +
   IPv6 ULA are now gated behind saasMode(); link-local (169.254/16),
   loopback, IPv6 metadata (fe80::/10, ::1), and TEST-NET stay blocked
   unconditionally in both modes.

   saasMode() resolution order:
     1. MOLECULE_DEPLOY_MODE=saas|self-hosted (explicit operator flag)
     2. MOLECULE_ORG_ID presence (legacy implicit signal, kept for
        back-compat so existing deployments don't need a config change)

   isPrivateOrMetadataIP now actually checks IPv6 — previously it
   returned false on any non-IPv4 input, which would let a registered
   [::1] or [fe80::...] URL bypass the SSRF check entirely.

2. Orphan auth-token minting (workspace_provision.go)
   issueAndInjectToken mints a token and stuffs it into
   cfg.ConfigFiles[".auth_token"]. The Docker provisioner writes that
   file into the /configs volume — the CP provisioner ignores it
   (only cfg.EnvVars crosses the wire). Result: live token in DB, no
   plaintext on disk, RegistryHandler.requireWorkspaceToken 401s every
   /registry/register attempt because the workspace is no longer in
   the "no live token → bootstrap-allowed" state. Now no-ops in SaaS
   mode; the register handler already mints on first successful
   register and returns the plaintext in the response body for the
   runtime to persist locally.

   Also removes the redundant wsauth.IssueToken call at the bottom of
   provisionWorkspaceCP, which created the same orphan-token pattern
   a second time.

3. Compaction artefacts (bundle/importer.go, handlers/org_tokens.go,
   scheduler.go, workspace_provision.go)
   Four pre-existing compile errors on main from an earlier session's
   code truncation: missing tuple destructuring on ExecContext /
   redactSecrets / orgTokenActor, missing close-brace in
   Scheduler.fireSchedule's panic recovery. All one-line mechanical
   fixes; without them the binary would not build.

Tests
-----
ssrf_test.go adds:
  * TestSaasMode — covers the env resolution ladder (explicit flag
    wins over legacy signal, case-insensitive, whitespace tolerant)
  * TestIsPrivateOrMetadataIP_SaaSMode — asserts RFC-1918 + IPv6 ULA
    flip to allowed, metadata/loopback/TEST-NET still blocked
  * TestIsPrivateOrMetadataIP_IPv6 — regression guard for the old
    "returns false for all IPv6" behaviour

Follow-up issue for CP-sourced workspace_id attestation will be filed
separately — closes the residual intra-VPC SSRF + token-race windows
the SaaS-mode relaxation introduces.

Verified end-to-end today on workspace 6565a2e0 (hermes runtime, OpenAI
provider) — agent returned "PONG" in 1.4s after register → heartbeat →
A2A proxy → runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(runtime+scheduler): increment/decrement active_tasks + max_concurrent (#1408)

Runtime (shared_runtime.py):
- set_current_task now increments active_tasks on task start, decrements
  on completion (was binary 0/1)
- Counter never goes below 0 (max(0, n-1))
- Pushes heartbeat immediately on BOTH increment and decrement (#1372)

Scheduler (scheduler.go):
- Reads max_concurrent_tasks from DB (default 1, backward compatible)
- Skips cron only when active_tasks >= max_concurrent_tasks (was > 0)
- Leaders can be configured with max_concurrent_tasks > 1 to accept
  A2A delegations while a cron runs

Platform:
- Added max_concurrent_tasks column to workspaces (migration 037)
- Workspace model + list/get queries include the new field
- API exposes max_concurrent_tasks in workspace JSON

Config.yaml support (future): runtime_config.max_concurrent_tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(review): address 3 critical issues from code review

1. BLOCKER: executor_helpers.py now uses increment/decrement too
   (was still binary 0/1, stomping the counter for CLI + SDK executors)

2. BUG: asymmetric getattr defaults fixed — both paths use default 0
   (was 0 on increment, 1 on decrement)

3. UX: current_task preserved when active_tasks > 0 on decrement
   (was clearing task description even when other tasks still running)

4. Scheduler polling loop re-reads max_concurrent_tasks on each poll
   (was using stale value from initial query)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>

* docs: workspace files API reference, skill catalog, and links

* docs: fix secrets endpoint path across docs

The workspace secrets endpoint is `/workspaces/:id/secrets`, not
`/secrets/values`. This was wrong in quickstart.md (Path 2: Remote Agent)
and workspace-runtime.md (registration flow example and comparison table).
The external-agent-registration guide already had the correct path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: fix broken blog cross-link in skills-vs-bundled-tools post

Link path had an extra `/docs/` segment: `/docs/blog/...` instead of
`/blog/...`. Nextra resolves blog posts directly under `/blog/<slug>`,
not under `/docs/blog/`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add skill-catalog.md guide

Linked from the skills-vs-bundled-tools blog post as a reference
for TTS/image-generation/web-search skills. The blog promises
"install directly via the CLI" with a skill catalog — this page
fills that promise by documenting available skill types, install
commands, version management, custom skill authoring, and removal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted

* docs(api-ref): add workspace file copy API reference

Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases

Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>

* fix(handlers): add saasMode() gating to isPrivateOrMetadataIP in a2a_proxy_helpers.go

Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.

The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
  self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): Discord adapter Day 2 Reddit + HN community copy

* fix(tests): supply *events.Broadcaster pointer to captureBroadcaster

Cannot use *captureBroadcaster as *events.Broadcaster when the struct
embeds events.Broadcaster as a value — must initialize as a named field.

Fixes go vet error in workspace_provision_test.go:
  cannot use broadcaster (*captureBroadcaster) as *events.Broadcaster value

* Merge pull request #1429 from fix/canvas-tooltip-clear-timer

Without this, a 400ms setTimeout from onFocus/onMouseEnter that fires
after onBlur will re-show a tooltip the user just dismissed. The
setShow(false) in onBlur closes the tooltip immediately but leaves the
timer pending — Tab-blur followed by timer-fire would re-show it.

Fix: add clearTimeout(timerRef.current) at the top of onBlur, mirroring
the pattern already used in onMouseLeave and onFocus.

Refs: PR #1367 (a11y keyboard support — this was a pre-existing gap)

Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): add missing children:[] to setPendingDelete expectation (#1426)

PR #1252 (cascade-delete UX) updated setPendingDelete to pass a
children array for cascade-warning rendering. The keyboard-a11y test
assertion was not updated to match.

Test: clicking 'Delete' hoists state to the store and closes the menu

Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): add children:[] to setPendingDelete + \&apos; entity fix (closes #1380) (#1427)

* ci: retry — trigger fresh runner allocation

* fix(canvas/test): add children:[] to setPendingDelete assertion

setPendingDelete now includes children:[] (PR #1383 extended the
pendingDelete type). The keyboard accessibility test at line 225 used
exact object matching which omitted the new field, causing a failure
after staging merged #1383.

Issue: #1380

* fix(canvas): replace &apos; HTML entity with straight apostrophe

JSX does not entity-decode &apos; — it renders the literal text
"&apos;" instead of "'".  Found at line 157 (payment confirmed) and
line 321 (empty org list).  Replaced with a straight apostrophe,
which JSX handles correctly.

Ref: issue #1375
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Merge pull request #1430 from fix/1421-saas-ssrf-helpers

Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.

The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
  self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)

Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(P0): CWE-22 path traversal in copyFilesToContainer + ContextMenu test

Issue #1434 — CWE-22 Path Traversal Regression:
PR #1280 (dc218212) correctly used cleaned path in tar header.
PR #1363 (e9615af) regressed to using uncleaned `name`.
Fix: use `clean` in filepath.Join AND add defence-in-depth escape check.

Issue #1422 — ContextMenu Test Regression:
PR #1340 expanded pendingDelete store type to include `children:[]`.
Test assertion missing the field — add `children:[]` to match.

Note: ssrf.go created (shared isSafeURL/isPrivateOrMetadataIP) to
prepare for the handler-split refactor fix — current branch has no
build error, but the shared file will prevent regression when PR #1363
is merged. isSafeURL/isPrivateOrMetadataIP retained in both files
for now to avoid breaking callers while the split is finalized.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: resolve 3 go vet failures + add idempotency_key to delegate_task_async

- workspace_provision_test.go: add missing mock := setupTestDB(t) to
  TestSeedInitialMemories_Truncation — mock was referenced but never
  declared, causing "undefined: mock" vet error
- orgtoken/tokens_test.go: discard unused orgID return value with _ in
  Validate call — "declared and not used" vet error
- a2a_tools.py: delegate_task_async now sends idempotency_key (SHA-256
  of workspace_id + task) to POST /workspaces/:id/delegate, fixing
  duplicate task execution when an agent restarts mid-delegation (#1456)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: airenostars <airenostars@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Molecule AI Community Manager <community-manager@agents.moleculesai.app>
Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
2026-04-21 18:22:30 +00:00
molecule-ai[bot]
38e9eba59a
fix(P0): CWE-22 path traversal in copyFilesToContainer + ContextMenu test
Issue #1434 — CWE-22 Path Traversal Regression:
PR #1280 (dc218212) correctly used cleaned path in tar header.
PR #1363 (e9615af) regressed to using uncleaned `name`.
Fix: use `clean` in filepath.Join AND add defence-in-depth escape check.

Issue #1422 — ContextMenu Test Regression:
PR #1340 expanded pendingDelete store type to include `children:[]`.
Test assertion missing the field — add `children:[]` to match.

Note: ssrf.go created (shared isSafeURL/isPrivateOrMetadataIP) to
prepare for the handler-split refactor fix — current branch has no
build error, but the shared file will prevent regression when PR #1363
is merged. isSafeURL/isPrivateOrMetadataIP retained in both files
for now to avoid breaking callers while the split is finalized.

Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 16:56:47 +00:00
Hongming Wang
a14cf863d1
Merge pull request #1445 from Molecule-AI/fix/tenant-dockerfile-uid-conflict
fix(tenant-image): remove node user so canvas uid 1000 can be created
2026-04-21 08:58:09 -07:00
e9615af169 Merge origin/main into staging: resolve conflicts with main's test + security fixes
Conflicts resolved (took main's versions):
- canvas/src/app/__tests__/orgs-page.test.tsx (act() wrappers, PR #1350)
- canvas/src/components/Canvas.tsx (100px proximity threshold, PR #1357)
- canvas/src/components/__tests__/ContextMenu.keyboard.test.tsx (hasChildren fix)
- workspace-server/internal/handlers/container_files.go (CWE-22/CWE-78 fixes, PRs #1281/#1310)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:25:42 +00:00
Hongming Wang
6bd674e412 fix(e2e): CP DELETE /cp/admin/tenants body uses 'confirm', not 'confirm_token'
Verified against live staging: the admin endpoint returns 400 'confirm
field must equal the URL slug' when the body key is 'confirm_token'.
Every workflow's safety-net teardown step + the main harness + the
Playwright teardown all had the wrong key. Fixed all six call sites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 04:50:28 -07:00
Hongming Wang
d7193dfa34 feat(e2e): pivot to admin-bearer-only auth + add sanity self-check workflow
Reduces required secret surface from 2 (session cookie + admin token)
to 1 (admin token). Pairs with molecule-controlplane#202 which adds:
  - POST /cp/admin/orgs    — server-to-server org creation
  - GET /cp/admin/orgs/:slug/admin-token — per-tenant bearer fetch

With those endpoints live, CI doesn't need to scrape a browser WorkOS
session cookie. CP admin bearer (Railway CP_ADMIN_API_TOKEN) drives
provision + tenant-token retrieval + teardown through a single
credential.

Changes
-------
  test_staging_full_saas.sh: admin bearer for provision/teardown,
    fetched per-tenant token drives all tenant API calls. Added
    E2E_INTENTIONAL_FAILURE=1 toggle that poisons the tenant token
    after provisioning so the teardown path gets exercised when the
    happy-path isn't.

  canvas/e2e/staging-setup.ts: same pivot; exports STAGING_TENANT_TOKEN
    instead of STAGING_SESSION_COOKIE.
  canvas/e2e/staging-tabs.spec.ts: context.setExtraHTTPHeaders with
    Authorization: Bearer on every page request, no cookie handling.

  All three workflows (e2e-staging-saas, canary-staging,
    e2e-staging-canvas): drop MOLECULE_STAGING_SESSION_COOKIE env +
    verification step. One secret to set.

  NEW e2e-staging-sanity.yml: weekly Mon 06:00 UTC. Runs the harness
    with E2E_INTENTIONAL_FAILURE=1 and inverts the pass condition —
    rc=1 is green, rc=0 (unexpected success) or rc=4 (leak) open a
    priority-high issue labelled e2e-safety-net. This is the
    answer to 'how do we know the teardown path still works when
    nothing else has failed recently.'

STAGING_SAAS_E2E.md refreshed: single-secret setup, sanity workflow
documented, canvas workflow added to the coverage matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 04:34:11 -07:00
Hongming Wang
f4700858ac feat(e2e): canary + canvas Playwright workflows; delegation mechanics
Three additions on top of 187a9bf:

1. Canary (.github/workflows/canary-staging.yml)
   30-min cron that runs the full-SaaS harness in E2E_MODE=canary: one
   hermes workspace + one A2A PONG + teardown. ~8-min wall clock vs
   ~20-min for the full run.
   Alerting is self-contained: opens a single 'Canary failing' issue on
   first failure, comments on subsequent failures (no issue spam),
   auto-closes the issue on the next green run. Labels: canary-staging,
   bug. Safety-net teardown step sweeps e2e-YYYYMMDD-canary-* orgs
   tagged today so a runner cancel can't leak EC2.

2. Canvas Playwright (canvas/e2e/staging-*.ts + playwright.staging.config.ts
   + .github/workflows/e2e-staging-canvas.yml)
   staging-setup.ts provisions a fresh org + hermes workspace (same
   lifecycle as the bash harness, just in TypeScript). staging-tabs.spec.ts
   clicks through all 13 workspace-panel tabs (chat, activity, details,
   skills, terminal, config, schedule, channels, files, memory, traces,
   events, audit) and asserts each renders without crashing and without
   'Failed to load' error toasts. Known SaaS gaps (Files empty, Terminal
   disconnects, Peers 401) are documented in #1369 and whitelisted so
   they don't fail the test — the gate is 'no hard crash', not 'no
   issues'.
   staging-teardown.ts deletes the org via DELETE /cp/admin/tenants/:slug.
   playwright.staging.config.ts separates staging from local tests so
   pnpm test in dev doesn't try to provision against staging. Retries=2
   and timeouts are longer; workers=1 because the setup provisions one
   shared workspace. Workflow uploads HTML report + screenshots on
   failure for 14 days.

3. Delegation mechanics (tests/e2e/test_staging_full_saas.sh section 10)
   Parent → child proxy test: POST /workspaces/CHILD/a2a with
   X-Source-Workspace-Id=PARENT and verify the child responds + child
   activity log captures PARENT as source. Intentionally LLM-free: the
   mechanics regression is what matters; prompt-driven delegation
   correctness belongs in canvas-driven tests.
   Also reorders teardown step to 11/11 since delegation is 10/11.

Mode gating:
   E2E_MODE=canary -> skips child workspace, HMA memory, peers,
   activity, delegation (steps 6, 9, 10 no-op). Full-lifecycle still
   runs every piece. Validated both paths via 'bash -n' syntax check
   after each edit.

Secrets requirement unchanged (same two secrets as 187a9bf):
  MOLECULE_STAGING_SESSION_COOKIE, MOLECULE_STAGING_ADMIN_TOKEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 04:15:10 -07:00
molecule-ai[bot]
00bd73f8c8
fix(canvas): a11y fixes + budget_used TypeScript guard + orgs-page test fix (#1367)
* fix(canvas/a11y): mark StatusDot as aria-hidden — decorative element

StatusDot is purely decorative; the status is already conveyed via
aria-label on parent elements (WorkspaceNode, SidePanel header, etc.).
Marking it aria-hidden="true" prevents screen readers from announcing
the empty div as "img" with no alt text.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): guard budget_used optional field with ?? 0 in progress calc

TypeScript error in CI: 'budget.budget_used' is possibly 'undefined'
when used in the progress percentage calculation. The field is
optional per BudgetData interface, so ?? 0 is the correct guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/a11y): Tooltip keyboard focus support + ARIA role

- Add role="tooltip" + unique id so assistive tech can find tooltip content
- Add aria-describedby on trigger so screen readers announce tooltip text
- Add onFocus/onBlur handlers so keyboard users (Tab navigation) can see
  tooltips that mouse users see on hover

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): restore advanceTimersByTime pattern in orgs-page error test

waitFor() + fake timers (vi.useFakeTimers in beforeEach) cause race
conditions: the 5s polling timeout fires before React state updates flush.
Restores the established pattern used by all other tests in this file:
advanceTimersByTimeAsync(50) + runAllTimersAsync().
Also removes the now-unused waitFor import.

Ref: PRs #1360, #1345
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 11:08:24 +00:00
molecule-ai[bot]
bde456a893 feat(canvas/e2e): add Playwright test for context-menu → delete confirm flow (#1344)
Issue #1138: Add Playwright E2E for context-menu → delete confirm flow.

The unit test (ContextMenu.keyboard.test.tsx) only exercises the store
setter — it can't catch the portal/race bug from PR #1133 where the
portal-rendered ConfirmDialog was closed by the menu's outside-click
handler before onConfirm fired.

This E2E test covers:
- Right-click workspace node → context menu opens
- Click Delete → ConfirmDialog appears (not swallowed)
- Click Confirm → dialog closes, node disappears, DELETE /workspaces/:id fires
- Click Cancel → dialog closes, node remains

Requires: platform on :8080, canvas on :3000.

Closes #1138.

Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
2026-04-21 08:11:48 +00:00
molecule-ai[bot]
f2e4f71fee fix(canvas/test): restore waitFor in orgs-page error test + add getState mock (#1341)
Issue #1268: orgs-page error state test — replace vi.advanceTimersByTimeAsync(50)
with waitFor polling. advanceTimersByTimeAsync fires the timer but does not
guarantee React render flush completes before the assertion runs.

Issue #1269: ContextMenu keyboard test — add getState: () => mockStore to
useCanvasStore mock. PR #1243 changed the delete flow to hoist confirmation
to Canvas-level dialog via setPendingDelete, which reads .nodes via
useCanvasStore.getState() — the mock was missing getState.

Also carries forward the Issue #1124 WORKSPACE_ID fail-fast fix from
workspace/ modules (a2a_cli, a2a_client, coordinator, consolidation,
molecule_ai_status) — RuntimeError if WORKSPACE_ID is unset/empty.

Co-authored-by: Molecule AI Core Platform Lead <core-platform-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 07:52:15 +00:00
molecule-ai[bot]
b21b3d163f fix(canvas): add ?? 0 guard for optional budget_used in progressPct (#1324) (#1327)
* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct

Fixes #1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.

Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.

Closes #1324.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 07:21:27 +00:00
molecule-ai[bot]
45715aa8a5 fix(canvas/test): patch test regressions from PR #1243 + proximity hitbox fix (#1313)
* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 07:06:57 +00:00
molecule-ai[bot]
c0d5e528a4 fix(canvas): cascade-delete UX — require checkbox before Delete All (#1314)
* fix(canvas/test): restore test regressions from PR #1243

PR #1243 introduced two regressions in the canvas vitest suite:

1. ContextMenu.keyboard.test.tsx: the setPendingDelete call now
   passes `{hasChildren, id, name}` (not just `{id, name}`). Updated
   the keyboard-a11y test assertion to match the new store shape.

2. orgs-page.test.tsx: mockFetch.mockResolvedValueOnce() returned a
   plain object that didn't match the two-argument (url, options)
   call signature used by the component's fetch wrapper. Switched to
   mockImplementationOnce returning a rejected Promise — matching
   real fetch's rejection contract — and added runAllTimersAsync after
   advanceTimersByTimeAsync(50) to flush React state updates.

54 test files · 813 tests · all passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): replace bounding-box intersection with distance threshold for nest detection

ReactFlow's getIntersectingNodes uses bounding-box overlap detection, which
fires the drag-over state whenever any part of two nodes' position rectangles
overlap — even when the dragged node is far from the target. This made the
"Nest Workspace" dialog appear from large distances.

Fix: scan all nodes on each drag tick and set dragOverNodeId to the closest
node within NEST_PROXIMITY_THRESHOLD (150 px, center-to-center). This matches
the intuitive behavior: nest only when the node is actually dropped near another.

Constants:
- NEST_PROXIMITY_THRESHOLD = 150px (~60% of a collapsed node's width)
- DEFAULT_NODE_WIDTH = 245px (mid-range of min/max node widths)
- DEFAULT_NODE_HEIGHT = 110px

Also removed the unused getIntersectingNodes import (was causing duplicate
identifier error when both onNodeDrag and the zoom handler called useReactFlow
in the same component scope).

Closes #1052.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): cascade-delete UX — show child count and require checkbox before Delete All

Issue #1137: with ?confirm=true always sent, a single confirmation silently
cascades — a team lead with 20 children gets nuked on one click.

Changes:
- store/canvas.ts: pendingDelete type now includes children: {id, name}[]
- ContextMenu.tsx: passes child list to setPendingDelete on Delete click
- DeleteCascadeConfirmDialog.tsx: new component — shows child names, a
  cascade warning, and requires the operator to tick a checkbox before
  Delete All activates. Disabled by default; only enables after checkbox.
- Canvas.tsx: conditionally renders DeleteCascadeConfirmDialog for
  hasChildren workspaces, or plain ConfirmDialog for leaf workspaces.
  confirmDelete requires cascadeConfirmChecked=true when hasChildren.
- ContextMenu.keyboard.test.tsx: updated setPendingDelete assertion to
  include children:[] (no children in the test fixture).

813 tests pass.

Closes #1137.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 07:06:45 +00:00
molecule-ai[bot]
04c3bc6eb1 fix(canvas): cascade-delete UX — warn before deleting workspace with children (PR #1252)
- Store: pendingDelete now carries `hasChildren: boolean` (computed from
  nodes.some(parentId === nodeId))
- ContextMenu: passes hasChildren into setPendingDelete
- Canvas: dialog title changes to "Delete Workspace and Children" with
  ⚠️ message when hasChildren; confirms with "Delete All"

Refs: #1137

Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
2026-04-21 03:25:12 +00:00
molecule-ai[bot]
221d8b2384 fix(canvas): guard undefined lastErrorRate and period dates in metrics (PR #1250)
- DetailsTab: use `(data.lastErrorRate ?? 0)` instead of bare multiply to
  prevent NaN% when the field is absent on pre-provisioning workspaces.
- WorkspaceUsage: make formatPeriod accept optional start/end strings;
  return "—" for undefined so the usage period shows blank rather than
  "Invalid Date" for provisioning/partial workspaces.

Refs: #1139

Co-authored-by: Molecule AI Fullstack (floater) <fullstack-floater@agents.moleculesai.app>
2026-04-21 03:22:17 +00:00
molecule-ai[bot]
883cb7ebc3 fix(canvas): eliminate flaky timer state between orgs-page tests (#1207) (#1243)
Root cause: tests used try/finally { vi.useRealTimers() / vi.useFakeTimers() }
back-and-forth. When any test's finally-block called vi.useFakeTimers(),
subsequent tests inherited fake timer state causing 50ms real setTimeouts
to not fire and mockFetch to accumulate calls across test boundaries.

Fix: consolidate timer management to beforeEach/afterEach hooks.
- beforeEach: vi.useFakeTimers() — all tests start from known fake state
- afterEach: cleanup() + vi.useRealTimers() — restore real timers for next test
- Individual tests: use vi.advanceTimersByTimeAsync(50) instead of real setTimeout

Also removed duplicate afterEach(cleanup()) and unused waitFor import.

Closes #1207.

Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 03:07:28 +00:00
molecule-ai[bot]
014295d57f fix(issue-1207): eliminate orgs-page test flakiness (#1235)
* fix(auth): F1094 — requireCallerOwnsOrg reads org_id not created_by (#1200)

Root cause: requireCallerOwnsOrg (org_plugin_allowlist.go:116) was
reading org_api_tokens.created_by to determine caller's org workspace
ID. But created_by is a provenance label ("session", "admin-token",
"org-token:<prefix>") — never a UUID. The equality check
callerOrg != targetOrgID always failed → every org-token caller
got 403 on /orgs/:id/plugins/allowlist routes.

Fix:
- Migration 036: adds org_id UUID column (nullable) to org_api_tokens
  with index. Existing pre-migration tokens get org_id=NULL → deny
  by default (safer than cross-org access).
- orgtoken.Issue: takes new orgID param; stores in org_id column.
- orgtoken.OrgIDByTokenID: new helper reads org_id for a token ID.
  Returns ("", nil) for NULL/unanchored tokens.
- requireCallerOwnsOrg: now calls OrgIDByTokenID instead of reading
  created_by. Pre-migration tokens with org_id=NULL get callerOrg=""
  → denied (safer).
- orgTokenActor (org_tokens.go): returns (createdBy, orgID) pair.
  Token minted via another org token gets its org_id set at mint time.
  Session/ADMIN_TOKEN callers get orgID="".
- orgtoken.Token struct: adds OrgID field for list display.
- orgtoken.List: selects org_id alongside other columns.
- Updated existing tests for new Issue signature.
- Added 10 regression tests covering: happy path, unanchored denial,
  cross-org denial, session bypass, DB error denial.

🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): replace err.Error() leaks with prod-safe messages (#1206)

- workspace_provision.go: provisionWorkspace, provisionWorkspaceCP —
  replaced 7 err.Error() calls with "provisioning failed" in both
  Broadcast payloads and last_sample_error DB column. Full error
  preserved in server-side log.Printf.

- plugins_install_pipeline.go: resolveAndStage — replaced 5 err.Error()
  calls with generic messages:
    "invalid plugin source"
    "plugin source not supported"
    "invalid plugin name"
    "staged plugin exceeds size limit"
    "plugin manifest integrity check failed"

Risk mitigated: DB errors (pq: connection refused, pq: deadlock),
OS errors, and internal paths no longer leak in HTTP JSON responses
or WebSocket broadcasts.

Added regression tests (workspace_provision_test.go):
  - TestProvisionWorkspace_NoInternalErrorsInBroadcast
  - TestProvisionWorkspaceCP_NoInternalErrorsInBroadcast
  - TestResolveAndStage_NoInternalErrorsInHTTPErr

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(F1089): log panic-recovery UPDATE errors in scheduler

The panic defer blocks in tick() and fireSchedule() now capture
and log errors from the db.DB.ExecContext call that advances next_run_at
after a panic. Previously, a DB failure during panic recovery was
silent — the log line for the panic itself appeared but any subsequent
UPDATE failure was invisible, risking unnoticed scheduler drift.

context.Background() was already used (F1089 comment in place); this
commit adds the missing error capture + log.Printf on exec failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(issue-1207): eliminate orgs-page test flakiness

Three root causes addressed:

1. Duplicate afterEach blocks (lines 97-103) — two identical
   afterEach(() => { cleanup(); }) blocks collapsed to one.

2. Fake-timer isolation gap — if a polling test failed before its
   finally-block ran, vi.useFakeTimers() persisted globally. The next
   non-polling test's setTimeout(50) then hung indefinitely (fake
   timers don't advance without vi.advanceTimersByTime), causing
   waitFor/async timeouts. Fixed by calling vi.useRealTimers()
   unconditionally in beforeEach (guaranteed clean slate) and
   afterEach (even when a test fails before its own finally).

3. mockFetch.callHistory now cleared via mockReset() in beforeEach,
   preventing "expected 2 calls but got N" failures from carry-over
   between polling tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 02:47:25 +00:00
molecule-ai[bot]
4e39201d76 fix(canvas): show toast when clipboard API unavailable in ConsoleModal (#1199) (#1231)
Use explicit navigator.clipboard check instead of optional chaining so
the no-op case is handled explicitly. When clipboard API is unavailable
(non-HTTPS context) show a toast: "Copy requires HTTPS — please select
and copy manually". Production is always HTTPS so this only affects
local dev with http:// canvas.

Closes #1199.

Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 02:45:29 +00:00