Commit Graph

6 Commits

Author SHA1 Message Date
e8f521011f fix(mcp): write delegation activity row so canvas Agent Comms shows task text
Some checks failed
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Successful in 1s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Successful in 1s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 8s
E2E API Smoke Test / detect-changes (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
sop-tier-check / tier-check (pull_request) Failing after 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 12s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 11s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 4s
Harness Replays / detect-changes (pull_request) Successful in 12s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Failing after 27s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 35s
Harness Replays / Harness Replays (pull_request) Failing after 43s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 53s
CI / Platform (Go) (pull_request) Successful in 3m17s
CI / Canvas (Next.js) (pull_request) Successful in 4m3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5m55s
CI / Python Lint & Test (pull_request) Successful in 7m29s
audit-force-merge / audit (pull_request) Successful in 5s
MCP delegate_task and delegate_task_async bypassed the delegation activity
lifecycle entirely — no activity_log row was written for MCP-initiated
delegations. As a result the canvas Agent Comms tab rendered outbound
delegations as bare "Delegation dispatched" events with no task body.

Fix: insert a delegation row (mirroring insertDelegationRow from
delegation.go) before the A2A call so the canvas can show the task text.
The sync tool updates status to 'dispatched' after the HTTP call; the
async tool inserts with 'dispatched' directly (goroutine won't update).

Closes #158.
Closes #49 (partial — addresses the canvas-display gap; full lifecycle
parity requires DelegationWriter extraction, tracked separately).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 20:44:06 +00:00
Hongming Wang
d99b3f2aec refactor(handlers): consolidate Notify + MCP send_message_to_user through AgentMessageWriter (RFC #2945 PR-A)
Pre-RFC-#2945 the broadcast + activity_log INSERT for "agent → user
chat" was duplicated across two handlers — activity.go's Notify (HTTP
/notify) and mcp_tools.go's toolSendMessageToUser (MCP tools/call).
The duplication is exactly what produced the reno-stars production
data-loss regression (PR #2944): the persistence-half fix landed for
one handler and silently lagged for the other for months, dropping
every long-form external-agent message on reload.

PR #2944 added the missing INSERT to mcp_tools.go and a forward-
looking AST gate. This PR removes the duplication at the source.

What changes
------------

NEW: workspace-server/internal/handlers/agent_message_writer.go
- AgentMessageWriter struct + NewAgentMessageWriter ctor.
- Send(ctx, workspaceID, message, attachments) error: workspace
  lookup → broadcast WS AGENT_MESSAGE → INSERT activity_logs.
- ErrWorkspaceNotFound for the lookup-miss path so callers can
  return 404 / JSON-RPC error cleanly.
- Best-effort persistence: INSERT failure logs only, returns nil so
  the broadcast success isn't undone (matches previous behavior in
  both call sites — pinned by test).
- Takes events.EventEmitter (interface) so tests can substitute a
  capturing fake without nil-panicking inside hub.Broadcast.

UPDATED: activity.go:Notify
- Replaced ~75 lines of inline broadcast+INSERT with a 12-line
  call to AgentMessageWriter.Send.
- Attachment shape conversion (NotifyAttachment → AgentMessageAttachment)
  is local to the HTTP handler; the writer's API doesn't import the
  HTTP-binding-tagged type.

UPDATED: mcp_tools.go:toolSendMessageToUser
- Replaced ~40 lines (the post-#2944 broadcast+INSERT pair) with a
  6-line call to the writer.
- Attachments is nil today because the MCP tool args don't expose
  attachments yet. When the schema adds it, build the slice and
  pass through; the writer half is ready.

Tests
-----

agent_message_writer_test.go (8 tests, comprehensive):
- TestAgentMessageWriter_Send_Success_NoAttachments — happy path,
  pins JSON `{"result":"hi"}`.
- TestAgentMessageWriter_Send_Success_WithAttachments — pins file
  parts shape (kind=file, file.{uri,name,mimeType,size}). Uses a
  jsonMatcher that decodes + asserts via predicate (tolerant of
  map key ordering, exact on shape).
- TestAgentMessageWriter_Send_WorkspaceNotFound — pins
  ErrWorkspaceNotFound + asserts NO broadcast NO INSERT.
- TestAgentMessageWriter_Send_DBInsertFailureStillReturnsNil — pins
  best-effort persistence contract.
- TestAgentMessageWriter_Send_PreviewTruncation — pins ≤80-char
  preview + ellipsis (Ryan's onboarding-friction report would have
  bloated activity_logs.summary by 2KB without this).
- TestAgentMessageWriter_Send_BroadcastsAgentMessageEvent — pins WS
  event name + payload shape via capturingEmitter.
- TestAgentMessageWriter_Send_OmitsAttachmentsKeyWhenEmpty — pins
  the "no key when nil" wire contract.

The existing AST gate from #2944
(TestAgentMessageBroadcastsArePersisted) still holds: any future
function emitting AGENT_MESSAGE without an INSERT fails the test.
With the writer in place that's now redundant — both producers go
through it — but the gate is cheap to keep as defense-in-depth.

Verified: go vet clean; all writer + caller tests pass; existing
TestNotify_* + TestMCPHandler_SendMessage_* + the AST gate all green.

Refs RFC #2945, PR #2944.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:29:42 -07:00
Hongming Wang
cdfc9f743f fix(mcp): persist send_message_to_user pushes to activity_log (reno-stars data loss)
Reported on production tenant reno-stars: an external claude-code agent
(CEO Ryan PC workspace) sent a long-form message via send_message_to_user;
the user saw it live in the chat panel but it vanished after a refresh.
Confirmed via direct production query — the message is NOT in
activity_logs at all (only short test pings around it are persisted).

Root cause: there are TWO server-side handlers for send_message_to_user:

  1. HTTP `/workspaces/:id/notify` (activity.go:Notify) — broadcasts WS
     AND inserts a row into activity_logs. This is the path the
     in-container runtime's tool_send_message_to_user calls.

  2. MCP-bridge `tools/call name=send_message_to_user`
     (mcp_tools.go:toolSendMessageToUser) — broadcasts WS only,
     **never persisted**. This is the path EXTERNAL agents using
     molecule-mcp's send_message_to_user tool route through.

The persistence fix landed for path 1 months ago but was never mirrored
on path 2. External agents — exactly the case in reno-stars/CEO Ryan PC
— have been silently losing every long-form notification on reload.

Fix: mirror the activity.go INSERT shape inside toolSendMessageToUser:

  INSERT INTO activity_logs
    (workspace_id, activity_type, method, summary, response_body, status)
  VALUES ($1, 'a2a_receive', 'notify', $2, $3::jsonb, 'ok')

Same wire shape as /notify so the canvas's chat-history hydration
(`type=a2a_receive&source=canvas`) treats both writers identically.
Errors are log-only — broadcast already succeeded, persistence failure
shouldn't block the tool response (matches /notify behavior; downside
is the same data-loss-on-DB-error risk, surfaced via log.Printf).

Tests
-----

- `TestMCPHandler_SendMessageToUser_PersistsToActivityLog` — pins both
  the workspace-name lookup AND the INSERT shape. Regex-matches
  `'a2a_receive'` + `'notify'` literals so a future refactor that
  changes activity_type or method breaks the test loud, not silently
  re-introducing the data-loss bug.
- Updated newMCPHandler to use newTestBroadcaster() (real ws.Hub) —
  events.NewBroadcaster(nil) crashes inside hub.Broadcast in the
  send_message_to_user path. Same shape every other handler test uses.

Verified `go test ./internal/handlers/ -run TestMCPHandler_SendMessage`
green; full vet clean.

Refs reno-stars production incident 2026-05-05.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:47:48 -07:00
Hongming Wang
290e6dfdc3 Memory v2 PR-6: backward-compat shim — legacy tools route to v2
Builds on merged PR-1..5. Adds the bridge that lets legacy
commit_memory / recall_memory tools route through the v2 plugin path
when MEMORY_PLUGIN_URL is wired, otherwise fall through to the
existing DB-backed code unchanged.

What ships:
  * handlers/mcp_tools_memory_legacy_shim.go — translation helpers:
      scopeToWritableNamespace, scopeToReadableNamespaces,
      commitMemoryLegacyShim, recallMemoryLegacyShim,
      namespaceKindToLegacyScope
  * handlers/mcp_tools.go — toolCommitMemory + toolRecallMemory now
    delegate to the shim when memv2 is wired

Translation:
  commit:  LOCAL  → workspace:<self>
           TEAM   → team:<root>     (resolver picks at runtime)
           empty  → defaults to LOCAL (preserves legacy default)
           GLOBAL → still rejected at MCP bridge (C3 preserved)
  recall:  LOCAL  → search restricted to workspace:<self>
           TEAM   → workspace:<self> + team:<root>
           empty  → all readable (matches v2 default behavior)
           GLOBAL → blocked at MCP bridge (C3 preserved)

Response shapes are preserved exactly:
  commit: {"id":"...","scope":"LOCAL"|"TEAM"} — agents see no diff
  recall: [{"id":"...","content":"...","scope":"LOCAL"|...,"created_at":"..."}, ...]
  org-namespace memories get the same [MEMORY id=... scope=ORG ns=...]
  prefix as v2 search; legacy scope label comes back as "GLOBAL"

Operational rollout:
  * Today: MEMORY_PLUGIN_URL unset on most operators → legacy DB path
  * After PR-7 backfill: operators set MEMORY_PLUGIN_URL → all writes
    flow through plugin transparently
  * After PR-8 cutover: dual-write removed, plugin is the only path
  * After PR-9 (~60 days later): legacy tool entries dropped entirely

Coverage: 100% on every helper, 100% on recallMemoryLegacyShim,
94.7% on commitMemoryLegacyShim. The 1 uncovered line is a defensive
guard against a v2-response-parse error that's unreachable when the
v2 tool is operating correctly (it always returns valid JSON).

Edge cases pinned:
  * scope translation for every legacy value + invalid scope
  * resolver error propagation
  * plugin error propagation
  * GLOBAL still blocked
  * default-scope fallback (LOCAL)
  * empty content rejected
  * No-op when v2 unwired (legacy SQL path exercised via sqlmock)
  * org-namespace memory wrap on recall + GLOBAL scope label round-trip
  * No-results returns "No memories found." (legacy message preserved)
2026-05-04 08:01:41 -07:00
Hongming Wang
73464a21dd
fix(restart): support SaaS control-plane provisioner (unblocks Platform Go build too) (#1512)
Squash-merge fix/restart (PR #1512): remove SSRF helpers from a2a_proxy_helpers.go since ssrf.go on main now owns these functions, resolving duplicate symbol build failures. Author: HongmingWang-Rabbit. Approved by molecule-ai. Mergeable, UNSTABLE (likely due to pending head branch changes).
2026-04-21 22:56:01 +00:00
molecule-ai[bot]
64ccf8e179
fix: CWE-78 rm scope, go vet failures, delegation idempotency
* refactor: split 4 oversized handler files into focused sub-files

- org.go (1099 lines) → org.go + org_import.go + org_helpers.go
- mcp.go (1001 lines) → mcp.go + mcp_tools.go
- workspace.go (934 lines) → workspace.go + workspace_crud.go
- a2a_proxy.go (825 lines) → a2a_proxy.go + a2a_proxy_helpers.go

No functional changes — same package, same exports, same tests.
All files stay under 635 lines.

Note: isSafeURL and isPrivateOrMetadataIP are duplicated between
mcp_tools.go and a2a_proxy_helpers.go — this is a pre-existing issue
from the original mcp.go and a2a_proxy.go, not introduced by this split.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(runtime+scheduler): increment/decrement active_tasks counter (refs #1386)

* docs(tutorials): add Self-Hosted AI Agents guide — Docker, Fly Machines, bare metal

* docs: add Remote Agents feature + Phase 30 blog links to docs index

* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted

* docs(api-ref): add workspace file copy API reference (#1281)

Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases

Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.

Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): CWE-78/CWE-22 — block shell injection in deleteViaEphemeral (#1310)

## Summary
Issue #1273: deleteViaEphemeral interpolated filePath directly into
rm command, enabling both shell injection (CWE-78) and path traversal
(CWE-22) attacks.

## Changes
1. Added validateRelPath(filePath) guard before constructing the rm command.
   validateRelPath blocks absolute paths and ".." traversal sequences.
2. Changed Cmd from "/configs/"+filePath (string interpolation) to
   []string{"rm", "-rf", "/configs", filePath} (exec form). This
   eliminates shell injection entirely — filePath is a plain argument,
   never interpreted as shell code.

## Security properties
- validateRelPath: blocks "../" and absolute paths before they reach Docker
- Exec form: filePath cannot inject shell metacharacters even if validation
  is somehow bypassed
- "/configs" as separate arg: rm has exactly two arguments, no room for
  injected args

Closes #1273.

Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>

* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in a2a_proxy.go (#1292) (#1302)

* fix(security): backport SSRF defence (CWE-918) to main — isSafeURL in mcp.go and a2a_proxy.go

Issue #1042: 3 CodeQL SSRF findings across mcp.go and a2a_proxy.go.
staging already ships the fix (PRs #1147, #1154 → merged); main did not include it.

- mcp.go: add isSafeURL() + isPrivateOrMetadataIP() helpers; validate
  agentURL before outbound calls in mcpCallTool (line ~529) and
  toolDelegateTaskAsync (line ~607)
- a2a_proxy.go: add identical isSafeURL() + isPrivateOrMetadataIP()
  helpers; call isSafeURL() before dispatchA2A in resolveAgentURL()
  (blocks finding #1 at line 462)
- mcp_test.go: 19 new tests covering all blocked URL patterns:
  file://, ftp://, 127.0.0.1, ::1, 169.254.169.254, 10.x.x.x,
  172.16.x.x, 192.168.x.x, empty hostname, invalid URL,
  isPrivateOrMetadataIP across all private/CGNAT/metadata ranges

1. URL scheme enforcement — http/https only
2. IP literal blocking — loopback, link-local, RFC-1918, CGNAT, doc/test ranges
3. DNS hostname resolution — blocks internal hostnames resolving to private IPs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci-blocker): remove duplicate isSafeURL/isPrivateOrMetadataIP from mcp.go

Issue #1292: PR #1274 duplicated isSafeURL + isPrivateOrMetadataIP in
mcp.go — both functions already exist on main at lines 829 and 876.
Kept the mcp.go definitions (the originals) and removed the 70-line
duplicate appended at end of file. a2a_proxy.go functions are
unchanged — they serve the same purpose via a separate code path.

* fix: remove orphaned commit-text lines from a2a_proxy.go

Three lines from the PR/commit title were accidentally baked into the
file during the rebase from #1274 to #1302, causing a Go syntax error
(a bare string literal at statement level followed by dangling braces).

Deletion restores:
  }
  return agentURL, nil
}

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>

* fix(canvas/test): patch test regressions from PR #1243 + proximity hitbox fix (#1313)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (#1324) (#1327)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct

Fixes #1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.

Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.

Closes #1324.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct (issue #1324) (#1329)

* fix(ci): revert cancel-in-progress to true — ubuntu-runner dispatch stalled

With cancel-in-progress: false, pending CI runs accumulate in the
ci-staging concurrency group. New pushes create queued runs, but
GitHub dispatches multiple runs for the same SHA instead of replacing
the pending one. All runs get stuck/cancelled before completing.

Reverting to cancel-in-progress: true restores CI operation — runs
that are superseded are cancelled, freeing the concurrency slot for
the new run to proceed.

Runner availability (ubuntu-latest dispatch stall) is a separate
infra issue tracked independently.

* fix(security): validate tar header names in copyFilesToContainer — CWE-22 path traversal (#1043)

Tar header names were built from raw map keys without validation. A malicious
server-side caller could embed "../" in a file name to escape the destPath
volume mount (/configs) and write files outside the intended directory.

Fix: validate each name with filepath.Clean + IsAbs + HasPrefix("..") checks
before using it in the tar header, then join with destPath for the archive
header. Also guard parent-directory creation against traversal.

Closes #1043.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): patch regressed tests from PR #1243 orgs-page flakiness fix

Two regressions introduced by PR #1243 (fix issue #1207):

1. **ContextMenu.keyboard.test.tsx** — `setPendingDelete` now receives
   `{id, name, hasChildren}` (cascade-delete UX, PR #1252), but the test
   expected only `{id, name}`. Added `hasChildren: false` to the assertion.

2. **orgs-page.test.tsx** — 10 tests awaited `vi.advanceTimersByTimeAsync(50)`
   without `act()`. With fake timers, `setState` (synchronous) is flushed by
   `advanceTimersByTimeAsync`, but the React state update it triggers is a
   microtask — so the test saw stale render. Wrapping in `act(async () =>
   { await vi.advanceTimersByTimeAsync(50); })` ensures microtasks drain
   before assertions run.

All 813 vitest tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add 100px proximity threshold to drag-to-nest detection

Fixes #1052 — previously, getIntersectingNodes() returned any node whose
bounding box overlapped the dragged node, regardless of actual pixel
distance. On a sparse canvas this triggered the "Nest Workspace" dialog
even when the dragged node was nowhere near any target.

The fix adds an on-node-drag proximity filter: only nodes within 100px
(center-to-center) of the dragged node are eligible as nest targets.
Distance is computed as squared Euclidean to avoid the sqrt overhead in
the hot drag path.

Added two tests to Canvas.pan-to-node.test.tsx covering the mock wiring
and confirming the regression is addressed in Canvas.tsx.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas): add ?? 0 guard for optional budget_used in progressPct

Fixes #1324 — TypeScript strict mode flags budget.budget_used as
possibly undefined in the progressPct ternary, even though the
outer condition checks budget_limit > 0.

Fix: use nullish coalescing (budget_used ?? 0) so progress shows 0%
when the backend returns a partial shape (provisioning-stuck
workspaces). Also adds a test covering the undefined-budget_used
case with the progress bar aria-valuenow and fill width both at 0%.

Closes #1324.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(platform): unblock SaaS workspace registration end-to-end

Every workspace in the cross-EC2 SaaS provisioning shape was failing
registration, heartbeat, or A2A routing. Four distinct blockers sat
between "EC2 is up" and "agent responds"; three are platform-side and
fixed here (the fourth is in the CP user-data, separate PR).

1. SSRF validator blocked RFC-1918 (registry.go + mcp.go)
   validateAgentURL and isPrivateOrMetadataIP rejected 172.16.0.0/12,
   which contains the AWS default VPC range (172.31.x.x) that every
   sibling workspace EC2 registers from. Registration returned 400 and
   the 10-min provision sweep flipped status to failed. RFC-1918 +
   IPv6 ULA are now gated behind saasMode(); link-local (169.254/16),
   loopback, IPv6 metadata (fe80::/10, ::1), and TEST-NET stay blocked
   unconditionally in both modes.

   saasMode() resolution order:
     1. MOLECULE_DEPLOY_MODE=saas|self-hosted (explicit operator flag)
     2. MOLECULE_ORG_ID presence (legacy implicit signal, kept for
        back-compat so existing deployments don't need a config change)

   isPrivateOrMetadataIP now actually checks IPv6 — previously it
   returned false on any non-IPv4 input, which would let a registered
   [::1] or [fe80::...] URL bypass the SSRF check entirely.

2. Orphan auth-token minting (workspace_provision.go)
   issueAndInjectToken mints a token and stuffs it into
   cfg.ConfigFiles[".auth_token"]. The Docker provisioner writes that
   file into the /configs volume — the CP provisioner ignores it
   (only cfg.EnvVars crosses the wire). Result: live token in DB, no
   plaintext on disk, RegistryHandler.requireWorkspaceToken 401s every
   /registry/register attempt because the workspace is no longer in
   the "no live token → bootstrap-allowed" state. Now no-ops in SaaS
   mode; the register handler already mints on first successful
   register and returns the plaintext in the response body for the
   runtime to persist locally.

   Also removes the redundant wsauth.IssueToken call at the bottom of
   provisionWorkspaceCP, which created the same orphan-token pattern
   a second time.

3. Compaction artefacts (bundle/importer.go, handlers/org_tokens.go,
   scheduler.go, workspace_provision.go)
   Four pre-existing compile errors on main from an earlier session's
   code truncation: missing tuple destructuring on ExecContext /
   redactSecrets / orgTokenActor, missing close-brace in
   Scheduler.fireSchedule's panic recovery. All one-line mechanical
   fixes; without them the binary would not build.

Tests
-----
ssrf_test.go adds:
  * TestSaasMode — covers the env resolution ladder (explicit flag
    wins over legacy signal, case-insensitive, whitespace tolerant)
  * TestIsPrivateOrMetadataIP_SaaSMode — asserts RFC-1918 + IPv6 ULA
    flip to allowed, metadata/loopback/TEST-NET still blocked
  * TestIsPrivateOrMetadataIP_IPv6 — regression guard for the old
    "returns false for all IPv6" behaviour

Follow-up issue for CP-sourced workspace_id attestation will be filed
separately — closes the residual intra-VPC SSRF + token-race windows
the SaaS-mode relaxation introduces.

Verified end-to-end today on workspace 6565a2e0 (hermes runtime, OpenAI
provider) — agent returned "PONG" in 1.4s after register → heartbeat →
A2A proxy → runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(runtime+scheduler): increment/decrement active_tasks + max_concurrent (#1408)

Runtime (shared_runtime.py):
- set_current_task now increments active_tasks on task start, decrements
  on completion (was binary 0/1)
- Counter never goes below 0 (max(0, n-1))
- Pushes heartbeat immediately on BOTH increment and decrement (#1372)

Scheduler (scheduler.go):
- Reads max_concurrent_tasks from DB (default 1, backward compatible)
- Skips cron only when active_tasks >= max_concurrent_tasks (was > 0)
- Leaders can be configured with max_concurrent_tasks > 1 to accept
  A2A delegations while a cron runs

Platform:
- Added max_concurrent_tasks column to workspaces (migration 037)
- Workspace model + list/get queries include the new field
- API exposes max_concurrent_tasks in workspace JSON

Config.yaml support (future): runtime_config.max_concurrent_tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(review): address 3 critical issues from code review

1. BLOCKER: executor_helpers.py now uses increment/decrement too
   (was still binary 0/1, stomping the counter for CLI + SDK executors)

2. BUG: asymmetric getattr defaults fixed — both paths use default 0
   (was 0 on increment, 1 on decrement)

3. UX: current_task preserved when active_tasks > 0 on decrement
   (was clearing task description even when other tasks still running)

4. Scheduler polling loop re-reads max_concurrent_tasks on each poll
   (was using stale value from initial query)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>

* docs: workspace files API reference, skill catalog, and links

* docs: fix secrets endpoint path across docs

The workspace secrets endpoint is `/workspaces/:id/secrets`, not
`/secrets/values`. This was wrong in quickstart.md (Path 2: Remote Agent)
and workspace-runtime.md (registration flow example and comparison table).
The external-agent-registration guide already had the correct path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: fix broken blog cross-link in skills-vs-bundled-tools post

Link path had an extra `/docs/` segment: `/docs/blog/...` instead of
`/blog/...`. Nextra resolves blog posts directly under `/blog/<slug>`,
not under `/docs/blog/`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add skill-catalog.md guide

Linked from the skills-vs-bundled-tools blog post as a reference
for TTS/image-generation/web-search skills. The blog promises
"install directly via the CLI" with a skill catalog — this page
fills that promise by documenting available skill types, install
commands, version management, custom skill authoring, and removal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): update Phase 30 brief — Action 5 complete, docs/index.md update noted

* docs(api-ref): add workspace file copy API reference

Documents TemplatesHandler.copyFilesToContainer (container_files.go):
- Endpoint overview: PUT /workspaces/:id/files/*path
- Parameter descriptions for all four function parameters
- CWE-22 path traversal protection (PRs #1267/1270/1271)
- Defense-in-depth: validateRelPath at handler + archive boundary
- Full error code table (400/404/500)
- curl example with success and path-traversal rejection cases

Also covers: writeViaEphemeral routing, findContainer fallback,
allowed roots allow-list, and related links to platform-api.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>

* fix(handlers): add saasMode() gating to isPrivateOrMetadataIP in a2a_proxy_helpers.go

Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.

The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
  self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(marketing): Discord adapter Day 2 Reddit + HN community copy

* fix(tests): supply *events.Broadcaster pointer to captureBroadcaster

Cannot use *captureBroadcaster as *events.Broadcaster when the struct
embeds events.Broadcaster as a value — must initialize as a named field.

Fixes go vet error in workspace_provision_test.go:
  cannot use broadcaster (*captureBroadcaster) as *events.Broadcaster value

* Merge pull request #1429 from fix/canvas-tooltip-clear-timer

Without this, a 400ms setTimeout from onFocus/onMouseEnter that fires
after onBlur will re-show a tooltip the user just dismissed. The
setShow(false) in onBlur closes the tooltip immediately but leaves the
timer pending — Tab-blur followed by timer-fire would re-show it.

Fix: add clearTimeout(timerRef.current) at the top of onBlur, mirroring
the pattern already used in onMouseLeave and onFocus.

Refs: PR #1367 (a11y keyboard support — this was a pre-existing gap)

Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): add missing children:[] to setPendingDelete expectation (#1426)

PR #1252 (cascade-delete UX) updated setPendingDelete to pass a
children array for cascade-warning rendering. The keyboard-a11y test
assertion was not updated to match.

Test: clicking 'Delete' hoists state to the store and closes the menu

Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(canvas/test): add children:[] to setPendingDelete + \&apos; entity fix (closes #1380) (#1427)

* ci: retry — trigger fresh runner allocation

* fix(canvas/test): add children:[] to setPendingDelete assertion

setPendingDelete now includes children:[] (PR #1383 extended the
pendingDelete type). The keyboard accessibility test at line 225 used
exact object matching which omitted the new field, causing a failure
after staging merged #1383.

Issue: #1380

* fix(canvas): replace &apos; HTML entity with straight apostrophe

JSX does not entity-decode &apos; — it renders the literal text
"&apos;" instead of "'".  Found at line 157 (payment confirmed) and
line 321 (empty org list).  Replaced with a straight apostrophe,
which JSX handles correctly.

Ref: issue #1375
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Merge pull request #1430 from fix/1421-saas-ssrf-helpers

Issue #1421 / #1401: PR #1363 (handler split) moved isPrivateOrMetadataIP
into a2a_proxy_helpers.go but kept the OLD pre-SaaS version — it
unconditionally blocks RFC-1918 addresses, regressing the fix in
commits 1125a02 / cf10733.

The A2A proxy path now has the same SaaS-gated logic as registry.go:
- Cloud metadata (169.254/16, fe80::/10, ::1) always blocked in both modes
- RFC-1918 (10/8, 172.16/12, 192.168/16) + IPv6 ULA (fc00::/7) blocked in
  self-hosted, allowed in SaaS cross-EC2 mode
- IPv6 addresses now properly checked (previous version returned false for all)

Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(P0): CWE-22 path traversal in copyFilesToContainer + ContextMenu test

Issue #1434 — CWE-22 Path Traversal Regression:
PR #1280 (dc218212) correctly used cleaned path in tar header.
PR #1363 (e9615af) regressed to using uncleaned `name`.
Fix: use `clean` in filepath.Join AND add defence-in-depth escape check.

Issue #1422 — ContextMenu Test Regression:
PR #1340 expanded pendingDelete store type to include `children:[]`.
Test assertion missing the field — add `children:[]` to match.

Note: ssrf.go created (shared isSafeURL/isPrivateOrMetadataIP) to
prepare for the handler-split refactor fix — current branch has no
build error, but the shared file will prevent regression when PR #1363
is merged. isSafeURL/isPrivateOrMetadataIP retained in both files
for now to avoid breaking callers while the split is finalized.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: resolve 3 go vet failures + add idempotency_key to delegate_task_async

- workspace_provision_test.go: add missing mock := setupTestDB(t) to
  TestSeedInitialMemories_Truncation — mock was referenced but never
  declared, causing "undefined: mock" vet error
- orgtoken/tokens_test.go: discard unused orgID return value with _ in
  Validate call — "declared and not used" vet error
- a2a_tools.py: delegate_task_async now sends idempotency_key (SHA-256
  of workspace_id + task) to POST /workspaces/:id/delegate, fixing
  duplicate task execution when an agent restarts mid-delegation (#1456)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: airenostars <airenostars@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>
Co-authored-by: Hongming Wang <hongmingwangrabbit@gmail.com>
Co-authored-by: Molecule AI Technical Writer <technical-writer@agents.moleculesai.app>
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-BE <core-be@agents.moleculesai.app>
Co-authored-by: Molecule AI SDK Lead <sdk-lead@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Hongming Wang <hongmingwang.rabbit@users.noreply.github.com>
Co-authored-by: Molecule AI Community Manager <community-manager@agents.moleculesai.app>
Co-authored-by: Molecule AI App-FE <app-fe@agents.moleculesai.app>
Co-authored-by: Molecule AI Core-QA <core-qa@agents.moleculesai.app>
Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Molecule AI Core-UIUX <core-uiux@agents.moleculesai.app>
Co-authored-by: Molecule AI Dev Lead <dev-lead@agents.moleculesai.app>
2026-04-21 18:22:30 +00:00