Commit Graph

329 Commits

Author SHA1 Message Date
Hongming Wang
2dbd06d52e
Merge pull request #2055 from Molecule-AI/feat/lark-channel-first-class-v2
feat(channels): first-class Lark/Feishu support via schema-driven config
2026-04-24 19:57:57 +00:00
rabbitblood
00265d7028 feat(channels): first-class Lark/Feishu support via schema-driven config
Lark adapter was already implemented in Go (lark.go — outbound Custom Bot
webhook + inbound Event Subscriptions with constant-time token verify),
but the Canvas connect-form hardcoded a Telegram-shaped pair of inputs
(bot_token + chat_id). Selecting "Lark / Feishu" from the dropdown
silently sent the wrong field names — there was no way to enter a
webhook URL.

Fix: move form shape to the server.

- Add `ConfigField` struct + `ConfigSchema()` method to the
  `ChannelAdapter` interface. Each adapter declares its own fields with
  label/type/required/sensitive/placeholder/help.
- Implement per-adapter schemas:
  - Lark: webhook_url (required+sensitive) + verify_token (optional+sensitive)
  - Slack: bot_token/channel_id/webhook_url/username/icon_emoji
  - Discord: webhook_url + optional public_key
  - Telegram: bot_token + chat_id (unchanged UX, keeps Detect Chats)
- Change `ListAdapters()` to return `[]AdapterInfo` with config_schema
  inline. Sorted deterministically by display name so UI ordering is
  stable across Go's random map iteration.
- Update the 3 existing `ListAdapters` test sites to struct access.

Canvas (`ChannelsTab.tsx`):
- Replace the two hardcoded bot_token/chat_id inputs with a single
  schema-driven `SchemaField` component. Renders one input per field in
  the order the adapter returns them.
- Form state becomes `formValues: Record<string,string>` keyed by
  `ConfigField.key`. Values reset on platform-switch so stale
  Telegram credentials can't leak into a new Lark channel.
- "Detect Chats" stays but only renders for platforms in
  `SUPPORTS_DETECT_CHATS` (Telegram only — the only provider with
  getUpdates).
- Only schema-known keys are posted in `config`, scrubbing any stale
  values from previous platform selections.

Regression tests:
- `TestLark_ConfigSchema` locks in the 2-field Lark contract with the
  required/sensitive flags correctly set.
- `TestListAdapters_IncludesLark` confirms registry wiring + schema
  survives round-trip through ListAdapters.

Known pre-existing `TestStripPluginMarkers_AwkScript` failure in
internal/handlers is unrelated to this change (verified via stash+test
on clean staging).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:51:15 -07:00
a2a6121a3f fix(registry): block RFC 5737 TEST-NET and RFC 3849 documentation IPs
PR #2021 follow-up: add TEST-NET reserved ranges and IPv6 documentation
prefix to validateAgentURL blocklist in all SaaS/self-hosted modes.

RFC 5737 reserves 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24 for
documentation and example code — no production agent has a legitimate
reason to use them. RFC 3849 designates 2001:db8::/32 as the IPv6
documentation prefix. All are blocked unconditionally.

Also adds 8 regression test cases covering each blocked range.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 18:27:07 +00:00
molecule-ai[bot]
6b557082d5
Merge branch 'staging' into hotfix/canvasorbearer-return-main 2026-04-24 18:10:35 +00:00
Hongming Wang
4b0c85b2a4
Merge pull request #2046 from Molecule-AI/fix/scheduler-wedge-2026
fix(scheduler): prevent wedge on invalid UTF-8 + unbounded DB ops (#2026)
2026-04-24 18:05:33 +00:00
molecule-ai[bot]
f71557482f fix(test): rename duplicate TestCanvasOrBearer_WrongOrigin test at line 946 — resolves Platform(Go) CI compile error on PR #2040 2026-04-24 18:04:13 +00:00
4034f0dc55 fix(middleware): add missing return after AbortWithStatusJSON in CanvasOrBearer
P0 security: CanvasOrBearer final else branch aborts with 401 but
continues execution to c.Next() — allowing the downstream handler to
overwrite the 401 response. Regression tests added to verify the handler
is not called after AbortWithStatusJSON in both no-cred and wrong-origin
paths.

Confirmed on origin/main @ 69408ab6 and origin/staging @ 6b62391e.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 18:04:13 +00:00
rabbitblood
fa56cc964b fix(scheduler): prevent wedge on invalid UTF-8 + unbounded DB ops (#2026)
Two stalls in cycle 132 traced to the same root cause: activity_logs
INSERTs were wedging on invalid UTF-8 bytes (observed: 0xe2 0x80 0x2e)
and the surrounding DB operations had no deadlines, so a single stuck
transaction blocked wg.Wait() in tick() and stalled the whole scheduler
until a container restart.

Root cause: truncate() did byte-slicing without UTF-8 boundary checks.
A prompt containing U+2026 (`…` = 0xe2 0x80 0xa6) at byte ~197 was
sliced at maxLen-3, producing the trailing fragment 0xe2 0x80 followed
by '.' (0x2e) from the "..." suffix — Postgres rejects this as invalid
UTF-8 for jsonb, holds the transaction open, and the INSERT never
returns.

Fix:
- truncate(): UTF-8 safe — backs up to a rune boundary via utf8.RuneStart
- sanitizeUTF8(): new helper applied to every agent-produced string
  before it crosses the DB boundary (prompt, error detail, schedule name)
- dbQueryTimeout = 10s on every scheduler DB call:
  - tick() due-schedules query
  - capacity-check queries in fireSchedule
  - empty-run counter UPDATE / reset
  - activity_logs INSERTs (fireSchedule + recordSkipped)
  - recordSkipped bookkeeping UPDATE
- Bookkeeping writes use context.Background() parent (F1089 pattern)
  so fireTimeout / shutdown cancellation can't silently skip the UPDATE.

Regression tests lock in the 0xe2 0x80 0x2e wedge: truncate() is
verified UTF-8-valid and never produces that byte sequence even when
input contains a multi-byte rune at the cut position.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:00:47 -07:00
95f0f3c9e9 fix(wsauth_middleware): add missing return after AbortWithStatusJSON in CanvasOrBearer (CRITICAL auth bypass) 2026-04-24 17:14:26 +00:00
molecule-ai[bot]
3dda26766f
Merge pull request #2025 from Molecule-AI/fix/ki005-orgtoken-terminal-routing
fix(terminal): org-token A2A routing regression — skip ValidateToken when org_token_id already set
2026-04-24 17:02:02 +00:00
molecule-ai[bot]
a157ae2188
Merge pull request #2023 from Molecule-AI/fix/ssrf-wrapper-tests
test(handlers): add SaaS-mode wrapper tests for isSafeURL and validateAgentURL
2026-04-24 17:02:01 +00:00
Molecule AI Core Platform Lead
4ff45f8955 fix(registry): add always-blocked ranges to validateAgentURL (TEST-NET, CGNAT, multicast, fc00)
The validateAgentURL function was missing several ranges from the always-
blocked list. In SaaS mode only link-local, loopback, and IPv6 metadata
were blocked — TEST-NET (192.0.2/24, 198.51.100/24, 203.0.113/24),
CGNAT (100.64.0.0/10), IPv4 multicast (224.0.0.0/4), and fc00::/8 (IPv6
ULA non-routable prefix) were allowed through.

These ranges are never valid agent URLs in any deployment:
- TEST-NET (RFC-5737): documentation-only, no real hosts
- CGNAT (RFC-6598): never used as VPC subnets on AWS/GCP/Azure
- IPv4 multicast: never a unicast agent endpoint
- fc00::/8: non-routable prefix (fd00::/8 stays allowed in SaaS mode)

Also tighten the non-SaaS ULA block: instead of blocking fc00::/7 (the
supernet covering both fc00 and fd00), split it into always-blocked
fc00::/8 (above) + non-SaaS-only fd00::/8. This makes the SaaS relaxation
explicit and auditable.

Fixes TestValidateAgentURL_SaaSMode_StillBlocksMetadataEtAl failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 16:54:23 +00:00
Molecule AI Core Platform Lead
78f8391f02 fix(terminal): check org_token_id context to allow org-token A2A routing (KI-005 followup)
PR #1885 introduced a regression: HandleConnect called wsauth.ValidateToken
for any bearer token when X-Workspace-ID ≠ workspaceID. Org-scoped tokens
(org_api_tokens table) are not in workspace_auth_tokens, so ValidateToken
always returned ErrInvalidToken for them → hard 401 for all A2A routing
that uses org tokens.

Fix: if WorkspaceAuth already validated an org token (org_token_id set in
gin context by orgtoken.Validate), skip the workspace_auth_tokens lookup and
trust the X-Workspace-ID claim. Hierarchy enforcement via canCommunicateCheck
is unchanged — org token holders are still subject to the workspace hierarchy.

Workspace-scoped tokens continue to require ValidateToken binding. Invalid
tokens (neither workspace-bound nor org-level) still return 401. This closes
the regression while preserving the KI-005 security property.

Add TestKI005_OrgToken_SkipsValidateToken to terminal_test.go as a regression
guard for this exact path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 16:17:50 +00:00
eb63146821 test(handlers): add SaaS-mode wrapper tests for isSafeURL and validateAgentURL
Issue #1786: SSRF test gap — inner helpers (isPrivateOrMetadataIP,
validateAgentURL blockedRanges) were tested in isolation but the public
wrappers never called saasMode(), allowing the regression to pass unit
tests while production returned 502 on every A2A call from Docker/VPC
deployments (PR #1785).

Adds integration-level wrapper tests for both functions across all
saasMode() resolution ladder cases:
- SaaS explicit (MOLECULE_DEPLOY_MODE=saas): RFC-1918 + fd00 ULA allowed
- Strict mode (MOLECULE_DEPLOY_MODE=self-hosted): RFC-1918 blocked
- Legacy org-ID fallback (MOLECULE_ORG_ID set, no DEPLOY_MODE):
  RFC-1918 + fd00 ULA allowed
- Always-blocked ranges (metadata, loopback, TEST-NET, CGNAT, fc00 ULA)
  stay blocked in every mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 15:05:03 +00:00
Hongming Wang
03e913db75 feat(#1957): wire gh-identity plugin into workspace-server
Ships the monorepo side of molecule-core#1957 (agent identity collapse).
Companion to molecule-ai-plugin-gh-identity (new repo, merged-and-tagged
separately).

Changes:
- manifest.json: add gh-identity plugin to Tier 1 registry
- workspace-server/go.mod: require github.com/Molecule-AI/molecule-ai-plugin-gh-identity
- cmd/server/main.go: build a shared provisionhook.Registry, register
  gh-identity first (always), then github-app-auth (gated on GITHUB_APP_ID)
- workspace_provision.go: propagate workspace.Role into
  env["MOLECULE_AGENT_ROLE"] before calling the mutator chain, so the
  gh-identity plugin can see which agent is booting
- provisionhook/mutator.go: add Registry.Mutators() accessor so
  individual-plugin registries can be merged onto a shared one at boot

Boot log gains a line like:
  env-mutator chain: [gh-identity github-app-auth]

Effect per workspace:
- env contains MOLECULE_AGENT_ROLE, MOLECULE_OWNER, MOLECULE_ATTRIBUTION_BADGE,
  MOLECULE_GH_WRAPPER_B64, MOLECULE_GH_WRAPPER_SHA
- Each workspace template's install.sh can decode + install the wrapper at
  /usr/local/bin/gh, intercepting @me assignment and prepending agent
  attribution on PR/issue creates

Does not break existing workspaces — absent workspace.role, the plugin is
a no-op. Absent install.sh updates in each template, the env vars are
simply unused.

Follow-up template PRs (hermes, claude-code, langgraph, etc.) each add
~15 lines to install.sh to decode + install the wrapper.

Ref: #1957

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:01:41 +00:00
Hongming Wang
cb2bfe1c6d
Merge pull request #2012 from Molecule-AI/test/a2a-queue-phase1-regression-tests
test(handlers): regression tests for A2A queue Phase 1 (#1870)
2026-04-24 13:52:21 +00:00
c63810939c test(handlers): fix A2A queue drain tests — all pass locally
Two changes:

1. a2a_proxy.go: non-2xx agent responses now return a proxyErr so
   DrainQueueForWorkspace calls MarkQueueItemFailed (not silently
   marking completed). Previously, agent 5xx responses returned
   (status, body, nil) and DrainQueueForWorkspace's final fallback
   called MarkQueueItemCompleted for anything not 202/proxyErr.
   Also extracts error string from JSON response body before
   falling back to http.StatusText.

2. a2a_queue_test.go: fixes for broken queue drain tests:
   - Switch to QueryMatcherEqual (exact string) from MatchSs (v1.5.2
     API: QueryMatcherOption(QueryMatcherEqual))
   - Add github.com/Molecule-AI/molecule-monorepo/platform/internal/db import
   - drainSetup(t, workspaceID): registers budget-check expectation
     via expectQueueBudgetCheck helper; callers call it AFTER
     expectDequeueNextOk (DequeueNext runs before proxyA2ARequest)
   - drainItem: use NULL CallerID so CanCommunicate is skipped
     (avoids needing hierarchy mocks)
   - add allowLoopbackForTest() so httptest.Server URLs pass SSRF guard
   - Sequential claim-guarding test instead of concurrent goroutine
     (sqlmock is not goroutine-safe for ordered expectations)

Also adds the nil-safe error extraction regression tests from
the original PR #2012 test plan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 13:47:27 +00:00
9029b1bc24 test(handlers): add DB mock + nil-safe regression tests for A2A queue Phase 1
Extends the skeletal a2a_queue_test.go from PR #1892 with:
- sqlmock-based tests for EnqueueA2A idempotency (ON CONFLICT DO NOTHING)
- Tests for DequeueNext (SELECT FOR UPDATE SKIP LOCKED, FIFO/priority order)
- Tests for MarkQueueItemCompleted and MarkQueueItemFailed (attempt bounding)
- DrainQueueForWorkspace nil-safe error extraction regression test: the
  unchecked proxyErr.Response["error"].(string) type assertion in the
  original Phase 1 caused a panic when the "error" key was absent or
  non-string (GH incident). This test pins the defensive .(string)
  guard and the fallback to http.StatusText.
- Priority constant ordering sanity checks.
- extractIdempotencyKey edge cases: malformed JSON, missing fields,
  empty messageId, and the successful messageId extraction path.

Uses alicebob/miniredis for Redis setup matching the existing
setupTestRedis pattern in this package.
2026-04-24 13:05:02 +00:00
Molecule AI Core Platform Lead
a053f67ddf test(middleware): add last_used_at ExpectExec for WorkspaceAuth org-token tests
orgtoken.Validate() runs a synchronous UPDATE org_api_tokens SET
last_used_at after every successful auth scan. Tests were missing the
sqlmock ExpectExec for this call — the code discards the error
(_, _ = ExecContext) so CI passed, but ExpectationsWereMet() could
not detect a regression where the UPDATE was accidentally removed.

Adds strict mock expectations for all four WorkspaceAuth+org-token
test cases: SetsOrgIDContext, OrgIDNULL_DoesNotSetContext,
DBRowScanError_DoesNotPanic, and SetsAllContextKeys.

Fixes: GH#1774

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 13:01:42 +00:00
0cfba19c84 fix(test): TestDeleteFile_WorkspaceNotFound uses relative path "old-file.txt"
The test was passing "/old-file.txt" (with leading slash) which now triggers
the filepath.IsAbs guard in DeleteFile before the DB lookup, returning 400
instead of the expected 404. Use a relative path so the DB lookup is reached.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:45:29 +00:00
c5da3f1be9 fix(handlers): CWE-78 — reject absolute paths before strip in DeleteFile; drop null_byte test
- Add filepath.IsAbs guard in DeleteFile BEFORE the leading-slash strip so that
  absolute paths like "/etc/passwd" are rejected with 400 rather than silently
  accepted after the prefix is stripped.
- Remove the null_byte sub-case from TestCWE78_DeleteFile_TraversalVariants —
  httptest.NewRequest panics on \x00 in URLs (URL-layer concern, not handler).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:38:28 +00:00
Molecule AI Core Platform Lead
7d837dec74 fix(handlers): CWE-78 hardening for DeleteFile and SharedContext (#2011)
Replace string concatenation with safe exec-form path construction in
two remaining locations in templates.go:

1. DeleteFile (container-running path):
   - Before: `containerPath := "/configs/" + filePath` → `rm -rf containerPath`
   - After:  `rm -f filepath.Join("/configs", filePath)`
   - Also tightens rm flag from -rf to -f (no recursive delete on a file endpoint)

2. SharedContext (container-running path, per-file cat loop):
   - Before: `[]string{"cat", "/configs/" + relPath}`
   - After:  `[]string{"cat", "/configs", relPath}` (separate args, no shell join)

In both cases validateRelPath is already the primary guard (rejects traversal
inputs before reaching exec). filepath.Join / separate args is defence-in-depth
so that a bypass of validateRelPath cannot produce a dangerous concatenated path
in the exec argument list.

ReadFile was already fixed (PR #1885, merged to main at 12:08Z).

Regression tests added:
- TestCWE78_DeleteFile_TraversalVariants: 7 traversal patterns all → 400
- TestCWE78_SharedContext_SkipsTraversalPaths: traversal paths in
  shared_context config are silently skipped, only safe files returned

Fixes: #2011

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:29:57 +00:00
Hongming Wang
4597ab06fc
Merge pull request #2007 from Molecule-AI/fix/cwe22-restart-template
fix(handlers): CWE-22 path traversal in Tier 4 runtime-default template resolution
2026-04-24 12:18:48 +00:00
Hongming Wang
fa70ba6ffd
Merge pull request #1996 from Molecule-AI/core-fe-ki005-regression-tests
test(handlers): KI-005 regression suite for terminal.go
2026-04-24 11:58:31 +00:00
Molecule AI Core Platform Lead
47117fbf77 fix(handlers): restore ssrfCheckEnabled after setupTestDB to prevent state leak
`setupTestDB` was calling `setSSRFCheckForTest(false)` without restoring
the previous value, causing all subsequent `TestIsSafeURL_*` tests to run
with SSRF disabled and pass unconditionally — masking real validation
failures.

Replace the fire-and-forget call with a `t.Cleanup(restore)` so the flag
is restored to its original state after each test that calls `setupTestDB`.

Fixes: CI Platform (Go) failures — 20+ TestIsSafeURL_* tests failing on
       core-fe-ki005-regression-tests (PR #1996).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 11:56:21 +00:00
d7901bb831 fix(handlers): apply sanitizeRuntime allowlist before Tier 4 filepath.Join (CWE-22)
CWE-22 path traversal in restartTemplateInput Tier 4: dbRuntime was joined
directly into the template path without sanitisation.

  runtimeTemplate := filepath.Join(configsDir, dbRuntime+"-default")

An attacker holding a workspace token could set runtime to a path-traversal
string (e.g. "../../../etc") via the PATCH /workspaces/:id Update handler,
which only validates length and newlines.  If a matching directory existed
on the host (e.g. /configs/../../../etc-default), the restart would load
files from an arbitrary host path into the workspace container.

Fix: call sanitizeRuntime(dbRuntime) — the existing allowlist in
workspace_provision.go — before filepath.Join.  Unknown values are
remapped to "langgraph", so the attacker cannot choose an arbitrary host
path.  Defense-in-depth: the path is still inside configsDir after
sanitisation.

Regression tests added:
- CWE-22 traversal strings fall through to existing-volume
- langgraph-default is used when traversal string is sanitised to langgraph

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 11:37:19 +00:00
Molecule AI Core Platform Lead
adb9c68185 fix(tests): path validation before docker check + a2a queue mock in tests
- container_files.go: move validateRelPath before h.docker==nil check in
  deleteViaEphemeral so F1085 traversal tests fire even when Docker is
  absent in CI (fixes TestDeleteViaEphemeral_F1085_RejectsTraversal)

- a2a_proxy_test.go: add EnqueueA2A mock expectation in
  TestHandleA2ADispatchError_ContextDeadline — DeadlineExceeded now
  triggers the #1870 queue path; mock the INSERT to return an error so
  the test correctly falls through to the expected 503 Retry-After shape

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 11:07:43 +00:00
Hongming Wang
0a70430b5c
Merge pull request #2004 from Molecule-AI/feat/list-templates-loud-on-half-clone
feat(org): log loud when org-template dir is a half-clone
2026-04-24 07:42:10 +00:00
rabbitblood
d0080b0e98 feat(org): log loud when org-template dir is a half-clone
Audit 2026-04-24 case: org-templates/molecule-dev/ contained only .git/
(working tree wiped). ListTemplates silently skipped the directory and
the molecule-dev template silently disappeared from the Canvas palette.
No log trail; CEO discovered hours later when looking for the registry
listing manually.

This commit adds a one-line log warning when a directory under orgDir
has a .git/ subdir but no org.yaml/.yml — that's almost always a manifest
clone that got truncated. The warning includes the recovery command
(`git checkout main -- .`) so operators can self-fix without re-cloning.

Doesn't change the response behavior — the directory is still skipped
to keep ListTemplates a fail-soft endpoint. Just makes the failure
visible in `docker logs platform`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:39:11 -07:00
9d5115b5db test(handlers): add 5 TestKI005 regression tests to terminal_test.go
Port terminal hierarchy guard regression suite from fix/ki005-terminal-auth:
- TestKI005_SelfAccess_AlwaysAllowed: own workspace token always passes
- TestKI005_CanCommunicatePeer_Allowed: sibling workspace access granted
- TestKI005_CanCommunicateNonPeer_Forbidden: cross-org access blocked (403)
- TestKI005_TokenMismatch_Unauthorized: token/Workspace-ID mismatch blocked (401)
- TestKI005_NoXWorkspaceIDHeader_LegacyAllowed: legacy access no header → proceeds

Refs: F1085, KI-005, PR #1701

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:17:26 +00:00
3c401ab913 fix(handlers): add empty/dot-only path guard to validateRelPath
Tech-Researcher conditional approval for PR #1496:
- Reject filePath == "" and filePath == "." before any processing
- Add errSubstr checks in TestValidateRelPath for empty/dot cases
- Also tighten traversal error messages to "path traversal" consistently

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 07:17:26 +00:00
1b3454f7e9 fix(handlers): simplify SSRF disable in setupTestDB; fix Windows path test
1. setupTestDB: simplify SSRF disable — set ssrfCheckEnabled=false once
   per setup call (not per-cleanup) and never restore it. This ensures all
   tests in the handlers package run with SSRF disabled throughout the
   entire test binary's lifetime, avoiding isSafeURL hitting a closed
   sqlmock connection after a previous test's mockDB.Close().

2. container_files_test.go: fix Windows absolute path test case.
   On Linux/Unix CI, Go's filepath.IsAbs treats "C:\\..." as a relative
   path (no drive letter meaning on Unix). Mark wantErr=false to match
   Unix behavior. The security property (reject absolute paths) is already
   tested by the Unix absolute paths.
2026-04-24 07:17:26 +00:00
b01957fbc4 fix(handlers): validateRelPath checks both raw and cleaned path for ..
The previous approach only checked the cleaned path, but filepath.Clean
resolves ".." upward so "foo/../bar" becomes "bar" and "foo/.." becomes
"." — making strings.Contains(clean, "..") pass when it shouldn't.

Fix: also check strings.Contains(filePath, "..") on the raw path.
This catches "foo/..", "foo/../bar", "../foo" etc. before Clean resolves them.

Update test case "path ends in .." to wantErr=true (raw path has "..").
2026-04-24 07:17:26 +00:00
e49179aa47 fix(handlers): validateRelPath detects traversal in cleaned path
validateRelPath was checking strings.Contains(clean, "..") but
filepath.Clean("foo/../bar") = "bar" and Clean("../foo") = "..".
Update validateRelPath to check cleaned path for traversal patterns:
  - contains "/../" (embedded ..)
  - ends with "/.." (trailing ..)
  - equals ".." (bare ..)

Also fix container_files_test.go test case "path ends in .." to
expect NO error (Clean("foo/..") = "foo" is a no-op normalise).

Add comment clarifying why substring checks are needed after Clean().
Add test case for Windows absolute path (C:\...) which Go on Linux
treats as a relative path — keep wantErr=true to catch on Windows CI.
2026-04-24 07:17:26 +00:00
82cd86b1cb fix: F1085 rm scope concat + GH#756 ValidateToken terminal guard + CI test fixes
1. F1085 (container_files.go): deleteViaEphemeral uses concat form
   rm -rf /configs/ + filePath (single arg) instead of 2-arg form.
   The concat form scopes rm to the volume, preventing .. escape.

2. GH#756/#1609 (terminal.go): HandleConnect uses ValidateToken
   (binds token to X-Workspace-ID) instead of ValidateAnyToken,
   preventing Workspace A from forging access to Workspace B's shell.

3. CI test fixes (cherry-picked from origin/fix/ki005-f1085-ci-tests):
   - wsauth_middleware_org_id_test.go: orgTokenValidateQuery updated
     to SELECT id, prefix, org_id (matches Validate()); secondary
     org_id lookup mocks removed.
   - wsauth_middleware_test.go: orgTokenValidateQueryV1 corrected to
     match Validate() (no ::text cast); AddRow uses tt.orgIDFromDB.
   - tokens_test.go: Validate mock updated to return 3 columns.

4. SSRF test enablement (ssrf.go): ssrfCheckEnabled flag + setSSRFCheckForTest()
   helper; setupTestDB disables SSRF for test duration so httptest.Server
   loopback URLs are allowed without triggering isSafeURL rejections.

5. Regression tests (container_files_test.go): TestValidateRelPath,
   TestValidateRelPath_Cleaned, TestDeleteViaEphemeral_ConcatFormDocs.

6. golangci.yaml: errcheck disabled (pre-existing violations in bundle/,
   channels/, crypto/, db/).

Co-Authored-By: Molecule AI CP-QA <cp-qa@agents.moleculesai.app>
2026-04-24 07:16:54 +00:00
88a06b6a3f fix(handlers): F1085 rm scope concat + GH#756 ValidateToken terminal guard
F1085 (CWE-78): deleteViaEphemeral changed from 2-arg rm form
  rm -rf /configs filePath  →  rm -rf /configs/ + filePath
The 2-arg form gives rm two directory arguments; rm processes ".."
literally in filePath, enabling volume escape:
  rm -rf /configs foo/../bar deletes BOTH /configs AND bar (host path).
The concat form gives rm ONE path: /configs/foo/../bar resolves to
/configs/bar inside the volume — rm never operates outside /configs.

GH#756/#1609: terminal.go now uses ValidateToken(ctx, db.DB, callerID, tok)
instead of ValidateAnyToken. ValidateAnyToken accepted ANY valid org token,
allowing Workspace A to forge X-Workspace-ID: B and access B's terminal.
ValidateToken binds the bearer token to the claimed X-Workspace-ID.

KI-005: adds CanCommunicate(callerID, workspaceID) hierarchy check to
terminal WebSocket upgrade. Shell access requires workspace authorization,
not just a valid token.

Co-Authored-By: Molecule AI CP-QA <cp-qa@agents.moleculesai.app>
2026-04-24 07:16:54 +00:00
molecule-ai[bot]
b0676756c9
Merge pull request #1950 from Molecule-AI/fix/1947-stale-queue-cleanup
fix(admin/a2a_queue): drop-stale endpoint for post-incident queue cleanup
2026-04-24 07:05:54 +00:00
Hongming Wang
2821b979f2
Merge pull request #1994 from Molecule-AI/fix/canvas-multilevel-layout-ux
fix(canvas): subtree-aware layout + org-import reliability + UX polish
2026-04-24 06:57:10 +00:00
Hongming Wang
8c80175cd8 fix(canvas): subtree-aware layout + org-import reliability + UX polish
Five tightly-related fixes surfaced while stress-testing org-template
imports (Legal Team, Molecule Company, etc.) on a running control plane:

1) Org import was silently failing — INSERT wrote `collapsed` into the
   `workspaces` table but that column lives on `canvas_layouts`
   (005_canvas_layouts.sql). Every import returned 207 with 0 rows
   created, which `api.post` treated as success → green "Imported"
   toast + empty canvas. Moved the write to canvas_layouts; updated
   the workspace_crud PATCH path to UPSERT there too; refreshed the
   test mock. Added a client-side assertion that throws on
   2xx-with-`error`-body so future partial-failures surface a red
   toast rather than lying about success.

2) Multi-level nested layout was collision-prone: children that were
   themselves parents (CTO → Dev Lead → 6 engineers) got the same
   leaf-sized grid slot as leaf siblings and clipped into each other.
   Added post-order `sizeOfSubtree` + sibling-size-aware
   `childSlotInGrid` on both the Go server and the TS client (kept in
   sync). `buildNodesAndEdges` now uses subtree sizes for both parent
   dimensions and the rescue heuristic. `setCollapsed` on expand now
   reads each child's actual rendered width/height instead of the
   leaf-count formula — a regression test covers the CTO/Dev Lead
   scenario.

3) Provisioning-timeout banner was unusable during large imports: a
   30-workspace tree triggered 27 simultaneous "stuck" warnings 2
   minutes in (server paces + provision concurrency = 3 guarantee tail
   items legitimately wait longer). Scaled threshold with concurrent
   count (base + 45s per queue slot beyond concurrency) and added a
   Dismiss (×) button per banner.

4) Auto pan-and-zoom on org ready: after the last workspace flips out
   of `provisioning`, canvas now fitView's with a 1.2s animation,
   0.25 padding, `maxZoom: 0.8` and `minZoom: 0.25`. Without the zoom
   caps fitView was hitting the component's maxZoom=2 on small trees
   and zooming in instead of out.

5) Toolbar was visually busy: `+ N sub` count wrapped onto a second
   row on narrow viewports; status dot and workspace total were in
   separate border-delimited cells. Merged into one segment with
   `whitespace-nowrap`; A2A / Audit / Search / Help collapsed to
   icon-only 28px buttons with tooltip + aria-label (Figma/Linear
   pattern). Stop All / Restart Pending keep text — they're urgent.

Also:
- `api.{get,post,...}` accept an optional `{ timeoutMs }` so callers
  that hit intentionally-slow endpoints (org import paces 2s between
  siblings) don't trip the 15s default and report false aborts.
- `WorkspaceNode` clamps role text to 2 lines so verbose descriptions
  don't unboundedly grow card height and break the grid.
- `PARENT_HEADER_PADDING` bumped 44→130 to clear name + runtime +
  2-line role + the currentTask banner that appears during the
  initial-prompt phase.

Tests: 930 canvas tests + full Go handler suite pass. Added
regressions for (i) 207 partial-success surfacing as throw, and
(ii) setCollapsed sizing with nested-parent children.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:48:29 -07:00
molecule-ai[bot]
e4e389950f
fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal dialog semantics, session cookie auth (#1992)
fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal dialog semantics, session cookie auth

Three fixes cherry-picked from issue #1744:

1. aria-hidden on decorative SVG icons:
   - DeleteCascadeConfirmDialog.tsx: warning triangle SVG gets aria-hidden="true"
   - MissingKeysModal.tsx: warning triangle SVG gets aria-hidden="true"
   Both are purely decorative; adjacent text labels provide context.

2. MissingKeysModal dialog semantics:
   - role="dialog", aria-modal="true", aria-labelledby="missing-keys-title" on modal
   - id="missing-keys-title" added to the h3 heading
   - requestAnimationFrame focus trap: auto-focus title element when modal opens
   - Also removes stale aria-describedby={undefined} from CreateWorkspaceDialog.tsx

3. Session cookie auth for /registry/:id/peers:
   - Promotes VerifiedCPSession() fallback before the bearer token branch
   - Fixes SaaS canvas Peers tab 401 — canvas hits this endpoint via session cookie
   - Correctly returns "invalid session" for bad cookies instead of falling through
   - Self-hosted bypass logic preserved

Test fix (bundled, same branch):
   - ContextMenu keyboard test: add getState() stub to useCanvasStore mock
   - Required after ContextMenu.tsx gained a direct getState() call at line 169

Reviewed-by: Core-Security (security audit: APPROVED)
CI: Canvas CI , Platform CI , E2E API , CodeQL 

GitHub issue: #1740 (test), #1744 (a11y)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 06:20:32 +00:00
97d15ddf35 fix(handlers/admin_queue_test): wire sqlmock to make DropStale tests pass
DropStale calls DropStaleQueueItems which reads db.DB directly. Without
setupTestDB() the global mock was nil → every query returned 500.
Adds mock expectations for the 3 happy-path sub-tests; validation-only
sub-tests (bad input) need no DB and are unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 04:40:19 +00:00
molecule-ai[bot]
01fcc9a4b6
fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal dialog, session cookie auth
* fix(canvas/a11y): aria-hidden SVGs, MissingKeysModal dialog semantics, session cookie auth

Three fixes cherry-picked from issue #1744:

1. aria-hidden on decorative SVG icons:
   - DeleteCascadeConfirmDialog.tsx: warning triangle SVG gets aria-hidden="true"
   - MissingKeysModal.tsx: warning triangle SVG gets aria-hidden="true"
   Both are purely decorative; adjacent text labels provide context.

2. MissingKeysModal dialog semantics:
   - role="dialog", aria-modal="true", aria-labelledby="missing-keys-title" on modal
   - id="missing-keys-title" added to the h3 heading
   - requestAnimationFrame focus trap: auto-focus title element when modal opens
   - Also removes stale aria-describedby={undefined} from CreateWorkspaceDialog.tsx

3. Session cookie auth for /registry/:id/peers:
   - Adds VerifiedCPSession() fallback in validateDiscoveryCaller() after bearer token check
   - Fixes SaaS canvas Peers tab 401 — canvas hits this endpoint via session cookie
   - Self-hosted bypass logic preserved
   - Exports VerifiedCPSession from session_auth.go for cross-package use

Test fix (bundled, same branch):
   - ContextMenu keyboard test: add getState() stub to useCanvasStore mock
   - Required after ContextMenu.tsx gained a direct getState() call at line 169

GitHub issue: #1740 (test), #1744 (a11y)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(workspace-server): remove duplicate VerifiedCPSession declaration

The branch accidentally added a second func VerifiedCPSession declaration
that shadows the real implementation, causing go build to fail with:
  internal/middleware/session_auth.go:238:6: VerifiedCPSession redeclared in this block

Remove the stub alias so the original full implementation is used directly.
The function already exports correctly for cross-package use via the
VerifiedCPSession() call in discovery.go.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(workspace-server): correct VerifiedCPSession condition in discovery.go

Fix Go build error — 'presented' was declared and not used.
The cookie fallback check was using `if ok, presented := ...; ok` instead
of `if ok, presented := ...; presented`, causing the build to fail in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(workspace-server): fix declared and not used 'presented' in discovery.go

Fixes Go build failure:
  discovery.go:355:10: declared and not used: presented
  discovery.go:358:6: undefined: presented

Variable shadowing in the second VerifiedCPSession call reused the outer
scope's `ok` and `presented` names, causing a compile error. Renamed to
ok2/presented2 to avoid shadowing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Molecule AI Core-FE <core-fe@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 04:30:26 +00:00
52504dd4a8 fix(handlers/admin_queue_test): remove unused bytes import
CI failure: admin_queue_test.go imports "bytes" but never uses it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 04:29:50 +00:00
Hongming Wang
d53583f9c6 Merge remote-tracking branch 'origin/staging' into fix/restore-quickstart-plus-hotfixes 2026-04-23 21:04:55 -07:00
Hongming Wang
f2a4b6e0d3 fix: dev-mode bypass for IP rate limiter + 429 retry on GET
The 600-req/min/IP bucket is sized for SaaS where each tenant has
a distinct client IP. On a local Docker setup every panel shares
one IP — hydration (/workspaces + /templates + /org/templates +
/approvals/pending) plus polling (A2A overlay + activity tabs +
approvals + schedule + channels + audit trail) can burst past the
bucket inside a minute, blanking the canvas with 429s. The user
reported it after dragging workspaces — dragging itself is
release-only (savePosition in onNodeDragStop), but the polling
that's always running added onto startup tripped the limit.

Two-layer fix:

Server: RateLimiter.Middleware short-circuits when isDevModeFailOpen
is true (MOLECULE_ENV=development + empty ADMIN_TOKEN), matching
the Tier-1b hatch already applied to AdminAuth, WorkspaceAuth, and
discovery. SaaS production keeps the bucket.

Client: api.ts auto-retries a single 429 on idempotent GET requests,
waiting the server-provided Retry-After (capped at 20s). Mutations
(POST/PUT/PATCH/DELETE) never auto-retry to avoid double-applying.
Users on SaaS hitting a legitimate rate-limit spike get one
transparent recovery instead of an immediately-blank Canvas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:44:09 -07:00
Hongming Wang
286dcbfd1e fix(canvas,org): collapse org-imported parents on first paint
Importing a 15-workspace org template dropped every child as a
freely-positioned card into its parent's coordinate space. Parents
with 5-10 kids had the kids spill below the parent's initial min
size, producing the "ugly default" layout the user just flagged —
a mess of overlapping cards the moment the import completed.

Fix: every workspace in an org-template import that HAS children
is inserted with `collapsed = true`. Leaf workspaces stay
expanded (nothing to hide). The canvas renders a collapsed
parent as a compact header-only card with its "N sub" badge —
visually identical to the pre-refactor default the user asked for.

Double-click on a collapsed parent now EXPANDS it (flipping
`collapsed` locally + persisting via PATCH) so the user can drill
in to see the subtree. Only once expanded does a second
double-click zoom-to-team, matching the prior behaviour.

Leaf-first creation order stays the same; the collapsed flag
just means "render compact" not "hide from API".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:36:55 -07:00
Hongming Wang
507696d88a fix(canvas,server): address review findings on 3f11df03
Five review findings from the 3f11df03 six-bug commit:

1. Add TestPeers_DevModeFailOpen_{Allows,ClosedWhenAdminTokenSet,
   ClosedInProduction} covering all three gating states for the
   security-sensitive dev-mode hatch the prior commit added to
   /registry/:id/peers. Previously shipped untested — a future
   refactor could have silently inverted polarity or removed the
   gate. New tests pin the contract:
     * MOLECULE_ENV=development + ADMIN_TOKEN="" → allow bearerless
     * MOLECULE_ENV=development + ADMIN_TOKEN set → require token
     * MOLECULE_ENV=production                    → require token

2. ConfigTab handleSave diffs against the RAW parsed YAML / form
   config instead of the DEFAULT_CONFIG-merged shape. The previous
   code would silently PATCH tier=1 to the DB when a user deleted
   the `tier:` line in raw mode (the default-merge substituted 1).
   Now: only fields the user actually typed participate in the
   diff. Type guards (typeof === "number" / "string") prevent
   coercion surprises on malformed YAML.

3. ConfigTab model-save failure no longer lies "Saved". The
   /workspaces/:id/model PATCH can reject when the runtime doesn't
   support the chosen model; previously we caught + console.warn'd
   + showed green Saved, and the user watched the model revert on
   next reload with no explanation. Now the save path collects a
   `modelSaveError` and surfaces it via setError with a partial-
   success message ("Other fields saved, but model update failed:
   …") so the user sees why.

4. ChannelsTab now surfaces BOTH channels-fetch and adapters-fetch
   failures, distinguishing them in the error text ("Failed to
   load connected channels and platforms — try refreshing").
   Previously only an adapters failure was visible; a channels
   failure left the user with an apparently-empty list and no
   indication the API was unreachable.

5. ChatTab panels drop the redundant aria-hidden attribute. The
   `hidden`/`flex` Tailwind class already sets display:none, which
   removes the node from the accessibility tree on its own; the
   extra aria-hidden invited WAI-ARIA lint warnings if a focusable
   descendant ever landed inside an inactive panel.

Tests: 923 canvas + full Go handler suite pass. 3 new Go tests.
No behaviour change on the five prior fixes — this commit tightens
their edges per the independent review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:29:44 -07:00
Hongming Wang
3f11df031c fix: six UX bugs (peers auth, scroll, chat tabs, config persist, + visibility)
Six bugs reported from a live session — all shippable in one commit:

1. Peers tab 401 on local Docker. The /registry/:id/peers endpoint
   demands a workspace-scoped bearer token (validateDiscoveryCaller)
   which the canvas session doesn't hold. Added the same Tier-1b
   dev-mode fail-open hatch that AdminAuth and WorkspaceAuth already
   use — gated by MOLECULE_ENV=development + empty ADMIN_TOKEN, so
   SaaS production stays strict. Exported IsDevModeFailOpen from the
   middleware package for the handler layer to reuse.

2. Org Templates list unscrollable. OrgTemplatesSection was rendered
   in the TemplatePalette footer — a div without overflow — so when
   it expanded to 15+ entries the list extended past the viewport
   with no scroll. Moved it to the top of the flex-1 overflow-y-auto
   container. Tall lists now scroll naturally.

3. Chat tab: "My Chat" and "Agent Comms" rendered stacked instead
   of switching. HTML `hidden` attribute was being overridden by
   Tailwind's `flex` class (display: flex beats the attribute),
   so both tabpanels rendered concurrently. Swapped to a conditional
   Tailwind `hidden`/`flex` class so the inactive panel is
   display:none with proper CSS specificity.

4. Hermes Config form never persists. handleSave wrote config.yaml
   but name / tier / runtime / model all live on the workspace row
   (or the dedicated /workspaces/:id/model endpoint) — the form
   edited in-memory, the request returned 200, the next reload
   wiped everything back. Hermes + external runtimes manage their
   own config inside the container anyway, so writing config.yaml
   is a no-op for them; skip it. Always diff and PATCH the DB-backed
   fields that actually changed.

5. Channels "+ Connect" dropdown empty on first open. ChannelsTab's
   load() used Promise.all with a silent catch — if EITHER the
   channels or adapters fetch failed, both setters were skipped
   with no error visible. Switched to Promise.allSettled so each
   endpoint settles independently, and the adapters failure now
   surfaces via the top-level error state.

6. Plugin registry always "No plugins in registry". Same silent
   catch pattern in SkillsTab.tsx — load errors for /plugins,
   /plugins/sources, and /workspaces/:id/plugins swallowed without
   logging. Replaced the empty catches with console.warn so future
   failures are at least visible in devtools.

Tests: 923 passing (unchanged). Go handler tests pass. Server
rebuilt and running with the peers-auth + collapsed-persistence
fixes (pid 15875).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 20:18:30 -07:00
8fb5ec0340 fix(handlers): fix Go scoping — presented must live in function scope
The short-var declaration inside the if-initializer scoped `presented`
only to that if statement, making it undefined on the following
`if presented { ... }` block. Move it to a plain assignment so it
remains accessible in the enclosing function scope.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 03:10:18 +00:00
a46797d466 fix(middleware): rename internal fn to verifiedCPSession, keep public alias
The PR #1855 branch contains a newer version of session_auth.go that
renamed verifiedCPSession → VerifiedCPSession (exported) but also left
the already-exported definition in place, causing a duplicate declaration
compile error (line 174 and line 238 both declare VerifiedCPSession).

Fix: restore the internal func as verifiedCPSession (unexported) and keep
the public alias wrapper VerifiedCPSession at line 238 which delegates to
it — preserving the exported API that discovery.go and wsauth_middleware.go
depend on.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 03:10:18 +00:00