Schema-driven ChannelsTab renders no inputs when config_schema is
absent — the test's bare {type, display_name} mock didn't match the
real API shape, so every getByLabelText("Bot Token") call failed.
Mock now mirrors GET /channels/adapters with the Telegram schema
(bot_token password + chat_id text) so the a11y assertions run
against the actual rendered form.
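A sketch of the corrected mock, assuming the field keys shown here
(bot_token password + chat_id text per the Telegram schema above; the
exact attribute names are illustrative):

    const telegramAdapter = {
      type: "telegram",
      display_name: "Telegram",
      config_schema: [
        { key: "bot_token", label: "Bot Token", type: "password",
          required: true, sensitive: true },
        { key: "chat_id", label: "Chat ID", type: "text", required: true },
      ],
    };
    // getByLabelText("Bot Token") now resolves against the input the
    // schema-driven form actually renders.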
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prior state: compliance.mode default was "" (fully off) and no template
in the repo set it explicitly — so prompt-injection detection, PII
redaction, and agency-limit checks were silently disabled on every
live workspace, despite the machinery being present in
workspace/builtin_tools/compliance.py.
This was surfaced during a 2026-04-24 review of the A2A inbound path:
a2a_executor.py gates three security checks on
_compliance_cfg.mode == "owasp_agentic"
and the default config never matched it, so every A2A message skipped
all three checks.
Fix: default is now owasp_agentic + prompt_injection=detect. Detect mode
logs injection attempts as audit events without blocking — no UX cost,
just visibility. Operators who want stricter enforcement set
`prompt_injection: block` per workspace. Operators who genuinely want
compliance fully off can set `mode: ""` (not recommended; documented).
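A sketch of the per-workspace knobs, assuming compliance settings nest
under a `compliance` key in the workspace YAML (the key placement is an
assumption; the values are the ones described above):

    compliance:
      mode: "owasp_agentic"        # new default; checks enabled
      prompt_injection: "detect"   # default: log attempts as audit events
      # prompt_injection: "block"  # opt-in: actively reject attempts
      # mode: ""                   # full opt-out (not recommended)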
Changes:
- ComplianceConfig.mode default: "" → "owasp_agentic"
- Yaml parser fallback default: "" → "owasp_agentic" (must match dataclass)
- Docstring updated with rationale + opt-out snippet
Tests: 66/66 test_compliance.py + test_a2a_executor.py pass. 19/19
test_config.py pass. The one test asserting compliance_mode == ""
covers the "config load failed" fallback path (distinct from the
default-config path) and is correctly left unchanged.
Security posture improvement: prompt-injection detection is now always
on for every workspace created after this ships, with zero behavior
change for legitimate inputs. Block mode remains an opt-in when an
operator wants to actively reject injection attempts rather than just
log them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lark adapter was already implemented in Go (lark.go — outbound Custom Bot
webhook + inbound Event Subscriptions with constant-time token verify),
but the Canvas connect-form hardcoded a Telegram-shaped pair of inputs
(bot_token + chat_id). Selecting "Lark / Feishu" from the dropdown
silently sent the wrong field names — there was no way to enter a
webhook URL.
Fix: move the form shape to the server.
- Add a `ConfigField` struct + `ConfigSchema()` method to the
  `ChannelAdapter` interface; each adapter declares its own fields with
  label/type/required/sensitive/placeholder/help (a sketch follows this
  list).
- Implement per-adapter schemas:
- Lark: webhook_url (required+sensitive) + verify_token (optional+sensitive)
- Slack: bot_token/channel_id/webhook_url/username/icon_emoji
- Discord: webhook_url + optional public_key
- Telegram: bot_token + chat_id (unchanged UX, keeps Detect Chats)
- Change `ListAdapters()` to return `[]AdapterInfo` with config_schema
  inline, sorted deterministically by display name so UI ordering stays
  stable despite Go's randomized map iteration order.
- Update the 3 existing `ListAdapters` test sites to struct access.
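A minimal sketch of the server-side shapes, assuming JSON-tagged
structs feed config_schema directly and using a hypothetical
LarkAdapter receiver name:

    type ConfigField struct {
        Key         string `json:"key"`
        Label       string `json:"label"`
        Type        string `json:"type"` // e.g. "text" or "password"
        Required    bool   `json:"required"`
        Sensitive   bool   `json:"sensitive"`
        Placeholder string `json:"placeholder,omitempty"`
        Help        string `json:"help,omitempty"`
    }

    type ChannelAdapter interface {
        ConfigSchema() []ConfigField
        // ... existing send/receive methods elided
    }

    // Lark's two-field contract (per TestLark_ConfigSchema):
    func (a *LarkAdapter) ConfigSchema() []ConfigField {
        return []ConfigField{
            {Key: "webhook_url", Label: "Webhook URL", Type: "text",
                Required: true, Sensitive: true},
            {Key: "verify_token", Label: "Verify Token", Type: "password",
                Sensitive: true},
        }
    }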
Canvas (`ChannelsTab.tsx`):
- Replace the two hardcoded bot_token/chat_id inputs with a single
  schema-driven `SchemaField` component that renders one input per
  field, in the order the adapter returns them (sketched after this
  list).
- Form state becomes `formValues: Record<string,string>` keyed by
`ConfigField.key`. Values reset on platform-switch so stale
Telegram credentials can't leak into a new Lark channel.
- "Detect Chats" stays but only renders for platforms in
`SUPPORTS_DETECT_CHATS` (Telegram only — the only provider with
getUpdates).
- Only schema-known keys are posted in `config`, scrubbing any stale
values from previous platform selections.
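A sketch of the schema-driven input and the scrubbed POST body (the
component internals here are illustrative, not the actual ChannelsTab
code):

    function SchemaField({ field, value, onChange }: {
      field: { key: string; label: string; type: string;
               required?: boolean; sensitive?: boolean; placeholder?: string };
      value: string;
      onChange: (v: string) => void;
    }) {
      return (
        <label>
          {field.label}
          <input
            type={field.sensitive || field.type === "password" ? "password" : "text"}
            required={field.required}
            placeholder={field.placeholder}
            value={value}
            onChange={(e) => onChange(e.target.value)}
          />
        </label>
      );
    }

    // Only schema-known keys are posted; stale keys from a previous
    // platform selection simply drop out:
    const config = Object.fromEntries(
      schema.map((f) => [f.key, formValues[f.key] ?? ""]),
    );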
Regression tests:
- `TestLark_ConfigSchema` locks in the 2-field Lark contract with the
required/sensitive flags correctly set.
- `TestListAdapters_IncludesLark` confirms registry wiring + schema
survives round-trip through ListAdapters.
Known pre-existing `TestStripPluginMarkers_AwkScript` failure in
internal/handlers is unrelated to this change (verified via stash+test
on clean staging).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Preparation for a "hundreds of runtimes" plugin ecosystem. Keeping the
runtime-specific UX knobs inline inside ProvisioningTimeout scales
badly — every new runtime would require editing a component rather than
just adding a table entry. Other components (create-workspace dialog,
workspace card tooltips, etc.) will want the same runtime metadata.
Changes:
- New file `canvas/src/lib/runtimeProfiles.ts` owns:
  * `RuntimeProfile` type — structural shape with every field optional,
    so new runtimes can partially fill it without breaking consumers.
* `DEFAULT_RUNTIME_PROFILE` — 2-min default floor (docker-fast).
* `RUNTIME_PROFILES` — named overrides (currently: hermes 12 min).
* `WorkspaceRuntimeOverrides` — interface for server-provided
per-workspace overrides, so operators can tune via template
manifest / workspace metadata without a canvas release.
  * `getRuntimeProfile()` — resolver with overrides → profile → default
    priority.
  * `provisionTimeoutForRuntime()` — convenience wrapper (both sketched
    after this list).
- `ProvisioningTimeout.tsx` now delegates to the profile module.
`DEFAULT_PROVISION_TIMEOUT_MS` re-exported for legacy test importers.
- Tests: 16/16 (up from 9 before the first fix). Adds pinning for:
* overrides > profile > default priority chain
* "every entry in RUNTIME_PROFILES resolves to a number" contract
* backward-compat export
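A sketch of the resolver chain (the exported names are the ones listed
above; the field inside RuntimeProfile is an assumption):

    export type RuntimeProfile = { provisionTimeoutMs?: number };
    export type WorkspaceRuntimeOverrides = Partial<RuntimeProfile>;

    export const DEFAULT_RUNTIME_PROFILE: Required<RuntimeProfile> = {
      provisionTimeoutMs: 120_000, // docker-fast floor
    };

    export const RUNTIME_PROFILES: Record<string, RuntimeProfile> = {
      // WHY: hermes cold-boots in 8-13 min (source build + Playwright + Chromium)
      hermes: { provisionTimeoutMs: 720_000 },
    };

    export function getRuntimeProfile(
      runtime: string,
      overrides?: WorkspaceRuntimeOverrides,
    ): Required<RuntimeProfile> {
      // Priority: overrides, then named profile, then default.
      return { ...DEFAULT_RUNTIME_PROFILE, ...RUNTIME_PROFILES[runtime], ...overrides };
    }

    export const provisionTimeoutForRuntime = (runtime: string): number =>
      getRuntimeProfile(runtime).provisionTimeoutMs;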
Adding a new slow runtime is now one table entry in
`canvas/src/lib/runtimeProfiles.ts` with a mandatory `WHY` comment.
Moving to server-driven profiles later is a ~10-line change (the
resolver already threads WorkspaceRuntimeOverrides through).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hermes workspaces cold-boot in 8-13 min (ripgrep + ffmpeg + node22 +
hermes-agent source build + Playwright + Chromium ~300MB). The canvas's
hardcoded 2-min "Provisioning Timeout" warning fired while the install
was still in progress and told users their workspace was "stuck". Users
hit Retry, triggering fresh cold boots and cancelling healthy
workspaces.
User-facing symptom (reported 2026-04-24 18:35Z): hermes workspace showed
"has been provisioning for 3m 15s — it may have encountered an issue"
with Retry + Cancel buttons, while the EC2 was installing node_modules.
Fix:
- Keep DEFAULT_PROVISION_TIMEOUT_MS = 120_000 (2min) — correct for fast
docker runtimes (claude-code, langgraph, crewai) where cold boot is
30-90s.
- Add RUNTIME_TIMEOUT_OVERRIDES_MS = { hermes: 720_000 } (12min).
Aligns with tests/e2e/test_staging_full_saas.sh's
PROVISION_TIMEOUT_SECS=900 (15min) so UI warns shortly before the
backend itself gives up.
- New timeoutForRuntime() resolves the base timeout; the check-timeouts
  interval now does a per-node lookup so a mixed batch (1 hermes +
  2 langgraph) uses the right threshold for each (sketched after this
  list).
- timeoutMs prop is now optional. Undefined → per-runtime lookup; a
number → forces a single threshold for every workspace (tests use this
for deterministic behavior).
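A sketch of the per-node resolution inside the interval (the node shape
and helper names here are hypothetical):

    const now = Date.now();
    for (const node of provisioningNodes) {
      // An explicit timeoutMs prop wins (tests use this); otherwise
      // resolve the threshold from the node's own runtime.
      const budget = timeoutMs ?? timeoutForRuntime(node.runtime);
      if (now - node.provisionStartedAt > budget) {
        markPossiblyStuck(node); // surfaces the warning + Retry/Cancel
      }
    }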
Tests: 4 new cases pinning the runtime-aware resolution, including a
guard that catches future regressions that would weaken hermes's budget.
Existing tests unchanged (they import DEFAULT_PROVISION_TIMEOUT_MS which
still exports 120_000).
13/13 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmbeddedTeam was defined in WorkspaceNode.tsx but had no call site —
TeamMemberChip (which is called directly) covers the same rendering
responsibility. The function was stranded after a prior refactor and
was flagged by github-code-quality on PR #1989 (merged 2026-04-24T14:09Z
without this cleanup because the token died before push).
Removes 25 lines of dead code. MAX_NESTING_DEPTH is kept — it is used
by TeamMemberChip at line 498.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ships the monorepo side of molecule-core#1957 (agent identity collapse).
Companion to molecule-ai-plugin-gh-identity (new repo, merged-and-tagged
separately).
Changes:
- manifest.json: add gh-identity plugin to Tier 1 registry
- workspace-server/go.mod: require github.com/Molecule-AI/molecule-ai-plugin-gh-identity
- cmd/server/main.go: build a shared provisionhook.Registry, register
gh-identity first (always), then github-app-auth (gated on GITHUB_APP_ID)
- workspace_provision.go: propagate workspace.Role into
env["MOLECULE_AGENT_ROLE"] before calling the mutator chain, so the
gh-identity plugin can see which agent is booting
- provisionhook/mutator.go: add Registry.Mutators() accessor so
  individual-plugin registries can be merged onto a shared one at boot
  (a sketch follows this list)
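A sketch of the boot wiring (registry construction and method names
other than Mutators() are assumptions):

    shared := provisionhook.NewRegistry()
    // gh-identity registers first, unconditionally:
    for _, m := range ghidentity.Registry().Mutators() {
        shared.Register(m)
    }
    // github-app-auth only when the app is configured:
    if os.Getenv("GITHUB_APP_ID") != "" {
        for _, m := range ghappauth.Registry().Mutators() {
            shared.Register(m)
        }
    }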
Boot log gains a line like:
env-mutator chain: [gh-identity github-app-auth]
Effect per workspace:
- env contains MOLECULE_AGENT_ROLE, MOLECULE_OWNER, MOLECULE_ATTRIBUTION_BADGE,
MOLECULE_GH_WRAPPER_B64, MOLECULE_GH_WRAPPER_SHA
- Each workspace template's install.sh can decode + install the wrapper at
/usr/local/bin/gh, intercepting @me assignment and prepending agent
attribution on PR/issue creates
Does not break existing workspaces — absent workspace.role, the plugin is
a no-op. Absent install.sh updates in each template, the env vars are
simply unused.
Follow-up template PRs (hermes, claude-code, langgraph, etc.) each add
~15 lines to install.sh to decode + install the wrapper.
Ref: #1957
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2021 follow-up: add TEST-NET reserved ranges and IPv6 documentation
prefix to validateAgentURL blocklist in all SaaS/self-hosted modes.
RFC 5737 reserves 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24 for
documentation and example code — no production agent has a legitimate
reason to use them. RFC 3849 designates 2001:db8::/32 as the IPv6
documentation prefix. All are blocked unconditionally.
Also adds 8 regression test cases covering each blocked range.
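A sketch of the regression shape, covering a representative subset of
the 8 cases and assuming validateAgentURL takes the URL string and
returns an error for blocked hosts:

    for _, blocked := range []string{
        "http://192.0.2.1/agent",     // TEST-NET-1
        "http://198.51.100.7/agent",  // TEST-NET-2
        "http://203.0.113.200/agent", // TEST-NET-3
        "http://[2001:db8::1]/agent", // IPv6 documentation prefix
    } {
        if err := validateAgentURL(blocked); err == nil {
            t.Errorf("expected %s to be blocked in every mode", blocked)
        }
    }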
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- wsauth_middleware: add missing return after AbortWithStatusJSON in
CanvasOrBearer final else branch (CRITICAL auth bypass)
- restart_template: apply sanitizeRuntime before filepath.Join to
prevent CWE-22 path traversal via dbRuntime field
P0 security: CanvasOrBearer final else branch aborts with 401 but
continues execution to c.Next() — allowing the downstream handler to
overwrite the 401 response. Regression tests added to verify the handler
is not called after AbortWithStatusJSON in both no-cred and wrong-origin
paths.
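A sketch of the bug class (gin; the helper names are hypothetical):

    func CanvasOrBearer() gin.HandlerFunc {
        return func(c *gin.Context) {
            if validCanvasOrigin(c) || validBearer(c) {
                c.Next()
                return
            }
            c.AbortWithStatusJSON(http.StatusUnauthorized,
                gin.H{"error": "unauthorized"})
            return // the missing return: Abort* only flags the context,
            // it does not exit the current function, so without this
            // the rest of the middleware body keeps executing
        }
    }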
Confirmed on origin/main @ 69408ab6 and origin/staging @ 6b62391e.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If asyncio.CancelledError arrived during the heartbeat HTTP push inside
set_current_task() (the increment call), the code raised before entering
the try/finally block in _execute_locked. The finally block never ran,
so active_tasks stayed at 1 forever. Every subsequent heartbeat reported
active_tasks=1, the server saw active_tasks < max_concurrent_tasks as
false (1 < 1), and DrainQueueForWorkspace never fired. Queued A2A
requests were permanently stuck.
Fix: move set_current_task(increment) to be the FIRST statement inside
the try block, not before it. set_current_task's synchronous portion
(heartbeat.active_tasks mutation) still runs unconditionally; only the
optional HTTP push can be cancelled. The finally block now always runs
and always decrements active_tasks back to 0.
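A sketch of the ordering fix (method names simplified from the real
executors; the argument shape is an assumption):

    async def _execute_locked(self, task):
        try:
            # FIRST statement inside try: the synchronous active_tasks
            # increment always lands, and if CancelledError hits the
            # optional heartbeat HTTP push mid-await, finally still runs.
            await self.set_current_task(task)   # increment + best-effort push
            return await self._run(task)
        finally:
            await self.set_current_task(None)   # decrement back to 0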
Affected executors: claude_sdk_executor, cli_executor, a2a_executor.
hermes_executor is not affected (does not call set_current_task).
Root cause of today's "active_tasks: 1 + queue drain never triggers"
P1 pattern across three workspaces.
All 167 executor tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two stalls in cycle 132 traced to the same root cause: activity_logs
INSERTs were wedging on invalid UTF-8 bytes (observed: 0xe2 0x80 0x2e)
and the surrounding DB operations had no deadlines, so a single stuck
transaction blocked wg.Wait() in tick() and stalled the whole scheduler
until a container restart.
Root cause: truncate() did byte-slicing without UTF-8 boundary checks.
A prompt containing U+2026 (`…` = 0xe2 0x80 0xa6) at byte ~197 was
sliced at maxLen-3, producing the trailing fragment 0xe2 0x80 followed
by '.' (0x2e) from the "..." suffix — Postgres rejects this as invalid
UTF-8 for jsonb, holds the transaction open, and the INSERT never
returns.
Fix:
- truncate(): UTF-8 safe — backs up to a rune boundary via
  utf8.RuneStart (sketched after this list)
- sanitizeUTF8(): new helper applied to every agent-produced string
before it crosses the DB boundary (prompt, error detail, schedule name)
- dbQueryTimeout = 10s on every scheduler DB call:
- tick() due-schedules query
- capacity-check queries in fireSchedule
- empty-run counter UPDATE / reset
- activity_logs INSERTs (fireSchedule + recordSkipped)
- recordSkipped bookkeeping UPDATE
- Bookkeeping writes use context.Background() parent (F1089 pattern)
so fireTimeout / shutdown cancellation can't silently skip the UPDATE.
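A sketch of the boundary-safe cut, with sanitizeUTF8 shown as one
plausible strings.ToValidUTF8 wrapper (exact signatures are
assumptions):

    func truncate(s string, maxLen int) string {
        if len(s) <= maxLen {
            return s
        }
        cut := maxLen - 3 // room for the "..." suffix
        // Back up to a rune boundary so a multi-byte rune such as U+2026
        // (0xe2 0x80 0xa6) is dropped whole instead of leaving 0xe2 0x80.
        for cut > 0 && !utf8.RuneStart(s[cut]) {
            cut--
        }
        return s[:cut] + "..."
    }

    func sanitizeUTF8(s string) string {
        // Replace invalid bytes before the string crosses the jsonb boundary.
        return strings.ToValidUTF8(s, "\uFFFD")
    }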
Regression tests lock in the 0xe2 0x80 0x2e wedge: truncate() is
verified UTF-8-valid and never produces that byte sequence even when
input contains a multi-byte rune at the cut position.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The validateAgentURL function was missing several ranges from the always-
blocked list. In SaaS mode only link-local, loopback, and IPv6 metadata
were blocked — TEST-NET (192.0.2/24, 198.51.100/24, 203.0.113/24),
CGNAT (100.64.0.0/10), IPv4 multicast (224.0.0.0/4), and fc00::/8 (IPv6
ULA non-routable prefix) were allowed through.
These ranges are never valid agent URLs in any deployment:
- TEST-NET (RFC-5737): documentation-only, no real hosts
- CGNAT (RFC-6598): never used as VPC subnets on AWS/GCP/Azure
- IPv4 multicast: never a unicast agent endpoint
- fc00::/8: non-routable prefix (fd00::/8 stays allowed in SaaS mode)
Also tighten the non-SaaS ULA block: instead of blocking fc00::/7 (the
supernet covering both fc00 and fd00), split it into always-blocked
fc00::/8 (above) + non-SaaS-only fd00::/8. This makes the SaaS relaxation
explicit and auditable.
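A sketch of the always-blocked list in net/netip form (the real code's
representation may differ):

    var alwaysBlocked = []netip.Prefix{
        netip.MustParsePrefix("192.0.2.0/24"),    // TEST-NET-1 (RFC 5737)
        netip.MustParsePrefix("198.51.100.0/24"), // TEST-NET-2
        netip.MustParsePrefix("203.0.113.0/24"),  // TEST-NET-3
        netip.MustParsePrefix("100.64.0.0/10"),   // CGNAT (RFC 6598)
        netip.MustParsePrefix("224.0.0.0/4"),     // IPv4 multicast
        netip.MustParsePrefix("fc00::/8"),        // IPv6 ULA, non-routable half
    }

    func isAlwaysBlocked(ip netip.Addr) bool {
        for _, p := range alwaysBlocked {
            if p.Contains(ip.Unmap()) { // treat 4-in-6 addresses as IPv4
                return true
            }
        }
        return false
    }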
Fixes TestValidateAgentURL_SaaSMode_StillBlocksMetadataEtAl failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #1885 introduced a regression: HandleConnect called wsauth.ValidateToken
for any bearer token when X-Workspace-ID ≠ workspaceID. Org-scoped tokens
(org_api_tokens table) are not in workspace_auth_tokens, so ValidateToken
always returned ErrInvalidToken for them → hard 401 for all A2A routing
that uses org tokens.
Fix: if WorkspaceAuth already validated an org token (org_token_id set in
gin context by orgtoken.Validate), skip the workspace_auth_tokens lookup and
trust the X-Workspace-ID claim. Hierarchy enforcement via canCommunicateCheck
is unchanged — org token holders are still subject to the workspace hierarchy.
Workspace-scoped tokens continue to require ValidateToken binding. Invalid
tokens (neither workspace-bound nor org-level) still return 401. This closes
the regression while preserving the KI-005 security property.
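A sketch of the gate (the gin context key and ValidateToken signature
are assumptions based on this message):

    // WorkspaceAuth already validated an org token (orgtoken.Validate
    // set org_token_id in the gin context):
    if v, ok := c.Get("org_token_id"); ok && v != nil {
        // Org tokens are not in workspace_auth_tokens; trust the
        // X-Workspace-ID claim here. canCommunicateCheck below still
        // enforces the workspace hierarchy.
    } else if err := wsauth.ValidateToken(c.Request.Context(), token, headerWorkspaceID); err != nil {
        c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "invalid token"})
        return
    }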
Add TestKI005_OrgToken_SkipsValidateToken to terminal_test.go as a regression
guard for this exact path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Issue #1786: SSRF test gap — the inner helpers (isPrivateOrMetadataIP,
validateAgentURL blockedRanges) were tested in isolation, but nothing
exercised the public wrappers' saasMode() resolution, so the PR #1785
regression passed unit tests while production returned 502 on every
A2A call from Docker/VPC deployments.
Adds integration-level wrapper tests for both functions across all
saasMode() resolution ladder cases (one case sketched after this list):
- SaaS explicit (MOLECULE_DEPLOY_MODE=saas): RFC-1918 + fd00 ULA allowed
- Strict mode (MOLECULE_DEPLOY_MODE=self-hosted): RFC-1918 blocked
- Legacy org-ID fallback (MOLECULE_ORG_ID set, no DEPLOY_MODE):
RFC-1918 + fd00 ULA allowed
- Always-blocked ranges (metadata, loopback, TEST-NET, CGNAT, fc00 ULA)
stay blocked in every mode
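One ladder case as a sketch (t.Setenv needs Go 1.17+; the
validateAgentURL signature is an assumption):

    t.Setenv("MOLECULE_DEPLOY_MODE", "saas")
    if err := validateAgentURL("http://10.0.0.5:8080/a2a"); err != nil {
        t.Errorf("SaaS mode should allow RFC-1918: %v", err)
    }

    t.Setenv("MOLECULE_DEPLOY_MODE", "self-hosted")
    if err := validateAgentURL("http://10.0.0.5:8080/a2a"); err == nil {
        t.Error("strict mode must block RFC-1918")
    }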
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause of PR #1981 E2E failures (step 7 timeout):
- hermes-agent install from NousResearch (Node 22 tarball + Python
deps from source) + gateway health wait takes 15-25 min on staging
- install.sh runs BEFORE molecule-runtime launches, blocking heartbeats
- bootstrap-watcher fires at 5 min (cp#245) → workspace=failed
- workspace never recovers because molecule-runtime never starts in time
Fix: increase WS_DEADLINE from 1200s (20 min) to 1800s (30 min) to
give hermes cold-boot enough runway. Also bump job timeout-minutes
from 30 → 45 to accommodate the longer wait.
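The two knobs, sketched (their exact locations in the script and
workflow are not shown here):

    WS_DEADLINE=1800         # seconds; was 1200: hermes cold boot needs runway
    # and in the workflow job definition (YAML):
    #   timeout-minutes: 45  # was 30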
Medium-term: fix cp#245 (bootstrap-watcher hermes deadline too short)
in molecule-controlplane to reduce false-failed noise.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two changes:
1. a2a_proxy.go: non-2xx agent responses now return a proxyErr so
DrainQueueForWorkspace calls MarkQueueItemFailed (not silently
marking completed). Previously, agent 5xx responses returned
(status, body, nil) and DrainQueueForWorkspace's final fallback
called MarkQueueItemCompleted for anything not 202/proxyErr.
Also extracts error string from JSON response body before
falling back to http.StatusText.
2. a2a_queue_test.go: fixes for broken queue drain tests:
- Switch to QueryMatcherEqual (exact string) from MatchSs (v1.5.2
API: QueryMatcherOption(QueryMatcherEqual))
- Add github.com/Molecule-AI/molecule-monorepo/platform/internal/db import
- drainSetup(t, workspaceID): registers budget-check expectation
via expectQueueBudgetCheck helper; callers call it AFTER
expectDequeueNextOk (DequeueNext runs before proxyA2ARequest)
- drainItem: use NULL CallerID so CanCommunicate is skipped
(avoids needing hierarchy mocks)
- add allowLoopbackForTest() so httptest.Server URLs pass SSRF guard
- Sequential claim-guarding test instead of concurrent goroutine
(sqlmock is not goroutine-safe for ordered expectations)
Also adds the nil-safe error extraction regression tests from
the original PR #2012 test plan.
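A sketch of the nil-safe extraction on the non-2xx path (proxyErr's
field names are assumptions):

    if resp.StatusCode < 200 || resp.StatusCode >= 300 {
        msg := http.StatusText(resp.StatusCode) // fallback
        var parsed map[string]any
        if json.Unmarshal(body, &parsed) == nil {
            // comma-ok assertion: no panic when "error" is absent or non-string
            if s, ok := parsed["error"].(string); ok && s != "" {
                msg = s
            }
        }
        return resp.StatusCode, body, &proxyErr{Status: resp.StatusCode, Message: msg}
    }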
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extends the skeletal a2a_queue_test.go from PR #1892 with:
- sqlmock-based tests for EnqueueA2A idempotency (ON CONFLICT DO
  NOTHING; sketched below)
- Tests for DequeueNext (SELECT FOR UPDATE SKIP LOCKED, FIFO/priority order)
- Tests for MarkQueueItemCompleted and MarkQueueItemFailed (attempt bounding)
- DrainQueueForWorkspace nil-safe error extraction regression test: the
unchecked proxyErr.Response["error"].(string) type assertion in the
original Phase 1 caused a panic when the "error" key was absent or
non-string (GH incident). This test pins the defensive .(string)
guard and the fallback to http.StatusText.
- Priority constant ordering sanity checks.
- extractIdempotencyKey edge cases: malformed JSON, missing fields,
empty messageId, and the successful messageId extraction path.
Uses alicebob/miniredis for Redis setup matching the existing
setupTestRedis pattern in this package.
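A sketch of the EnqueueA2A idempotency case from the first bullet above
(query fragment and argument order are assumptions):

    mock.ExpectExec("INSERT INTO a2a_queue").
        WithArgs(workspaceID, idempotencyKey, sqlmock.AnyArg()).
        WillReturnResult(sqlmock.NewResult(1, 1)) // first enqueue inserts
    mock.ExpectExec("INSERT INTO a2a_queue").
        WithArgs(workspaceID, idempotencyKey, sqlmock.AnyArg()).
        WillReturnResult(sqlmock.NewResult(0, 0)) // duplicate: ON CONFLICT DO NOTHING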
orgtoken.Validate() runs a synchronous UPDATE org_api_tokens SET
last_used_at after every successful auth scan. Tests were missing the
sqlmock ExpectExec for this call — the code discards the error
(_, _ = ExecContext) so CI passed, but ExpectationsWereMet() could
not detect a regression where the UPDATE was accidentally removed.
Adds strict mock expectations for all four WorkspaceAuth+org-token
test cases: SetsOrgIDContext, OrgIDNULL_DoesNotSetContext,
DBRowScanError_DoesNotPanic, and SetsAllContextKeys.
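A sketch of the added expectation (the argument order is an
assumption):

    mock.ExpectExec("UPDATE org_api_tokens SET last_used_at").
        WithArgs(sqlmock.AnyArg(), tokenID).
        WillReturnResult(sqlmock.NewResult(0, 1))
    // ... exercise WorkspaceAuth with the org token ...
    if err := mock.ExpectationsWereMet(); err != nil {
        t.Fatalf("last_used_at UPDATE was not issued: %v", err)
    }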
Fixes: GH#1774
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TeamMemberChip used MAX_NESTING_DEPTH to cap recursive sub-agent
rendering at depth 3, but the constant was never declared — causing
a TypeScript build error ('Cannot find name MAX_NESTING_DEPTH') that
blocked Canvas CI on PR #1989.
Add the constant above EmbeddedTeam with a doc comment explaining its
purpose (guards against circular parentId cycles + readability cap).
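The added constant, as a sketch (value per the depth-3 cap above):

    /** Caps recursive sub-agent rendering depth: guards against
     *  circular parentId cycles and keeps deeply nested teams readable. */
    const MAX_NESTING_DEPTH = 3;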
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>