Introduces the UX A/B Lab org template — a 7-agent cell for rapid
landing-page variant generation. The template is also the first
consumer of the new any_of env schema (ANTHROPIC_API_KEY OR
CLAUDE_CODE_OAUTH_TOKEN), so it doubles as an end-to-end fixture
for that feature.
Canvas tree (all claude-code / sonnet):
Design Director
├── UX Researcher
├── Visual Designer
├── React Engineer
├── Deploy Engineer
├── A11y + SEO Auditor ← WCAG AA + canonical/noindex gate
└── Perf Auditor ← Core Web Vitals gate
Template files live in their own standalone repo
(Molecule-AI/molecule-ai-org-template-ux-ab-lab, to be published);
this change adds the manifest.json entry so fresh clones + CI
populate the template via scripts/clone-manifest.sh.
Tests:
- TestOrgTemplate_ClaudeAnyOfAuthPreflight — parses the exact
required_env / recommended_env shape the template ships with
via inline YAML (not on-disk, since org-templates/ is
gitignored in this monorepo) and verifies either member
alternative satisfies the preflight.
SEO safety built into the auditor's system prompt:
- One canonical variant; all others canonicalise to it.
- noindex, follow on non-canonical variants.
- Sitemap contains only the canonical URL.
- No robots.txt disallow (blocked pages can't emit canonical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the org-import env preflight so a template can declare an
alternative: satisfy ANY one member to pass. Motivated by the
Claude-family node case where either ANTHROPIC_API_KEY or
CLAUDE_CODE_OAUTH_TOKEN unlocks the agent — forcing both was wrong.
Server (workspace-server):
- New EnvRequirement union type with custom YAML + JSON
(un)marshaling. Accepts scalar (strict) or {any_of: [...]} in
both on-disk org.yaml and inline POST /org/import bodies.
- collectOrgEnv now returns []EnvRequirement. Dedups groups by
sorted-member signature. "Strict wins" pruning drops any-of
groups that mention a name already declared strictly (same
tier and cross-tier).
- Import preflight uses EnvRequirement.IsSatisfied — scalar =
exact match, group = any member present.
- Empty any_of: [] rejected at parse time (never-satisfiable).
- 14 handler tests (6 updated for the union shape, 8 new
covering any-of satisfaction, dedup, strict-dominates-group,
cross-tier pruning, invalid-member filtering, YAML round-trip,
and empty-any-of rejection).
Canvas:
- EnvRequirement = string | {any_of: string[]} with envReqMembers,
envReqSatisfied, envReqKey helpers.
- OrgImportPreflightModal renders strict rows and any-of groups
via a new AnyOfEnvGroup sub-component: "Configure any one"
banner, per-member input, ✓-satisfied indicator, and dimmed
siblings once any member is configured so the user can still
switch providers.
- TemplatePalette.OrgTemplate.required_env / recommended_env
retyped to EnvRequirement[]; passthrough to the modal
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 5adc8a74 (part of this PR) intentionally made
molecule:fit-deploying-org fire for root-level workspaces too — it
used to only fire for children, which meant a standalone create
didn't center the viewport until the first child arrived ~2s later.
The existing regression test still expected ONLY the
molecule:pan-to-node event for a new root, so it started failing
with "expected length 1, got 2". The product behavior is correct
(centering on the root immediately is better UX); the test was
pinning the old single-dispatch shape.
Fix: assert BOTH events fire, each with the right detail payload,
so a future regression that drops either one (or duplicates) trips
the test. Single-test update, no production code change. 953/953
canvas tests pass locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds on #2061. Three internally-cohesive sub-features; easiest to
read in order.
## 1. Org-level env preflight
Server
- `OrgTemplate` + `OrgWorkspace` gain `required_env: string[]` and
`recommended_env: string[]` YAML fields.
- `GET /org/templates` walks the tree and returns the tree-union
(deduped, sorted) of both. `collectOrgEnv` dedup prefers required
when the same key is declared at both tiers.
- `POST /org/import` preflights against `global_secrets` WHERE
`octet_length(encrypted_value) > 0` (empty-value rows used to be
counted as "configured" and the per-container preflight still
failed at start time). 412 Precondition Failed + `missing_env`
list when required keys are absent. `force=true` bypasses with
an audit log line. DB lookup failure now returns 500 (was:
silent fall-through that defeated the guard). Env-var NAMES
validated against `^[A-Z][A-Z0-9_]{0,127}$` so a malicious
template can't ship pathological names into the UI or DB.
Canvas
- New `OrgImportPreflightModal`: red "Required" section (blocking)
and yellow "Recommended" section (non-blocking, import stays
enabled, shows live missing-count next to the Import button).
- Per-key password input → `PUT /settings/secrets` → strike-through
on save. Functional `setDrafts` throughout (no stale-closure
clobbers on rapid successive saves). `useEffect` seed keyed on a
sorted-join string signature so a parent re-render with a new
array identity doesn't clobber typed inputs.
- `TemplatePalette.handleImport` branches: zero env declarations →
straight to import; any declarations → fetch configured global
secret keys, open the modal.
Tests (Go): `TestCollectOrgEnv_*` (5) cover union-across-levels,
required-wins-over-recommended (including same-struct), dedup,
empty, invalid-name rejection.
## 2. EmptyState parity with TemplatePalette
The "Deploy your first agent" grid used to call `POST /workspaces`
with no preflight while the sidebar palette ran
`checkDeploySecrets` + `MissingKeysModal` first. Same template
deployed two different ways → first-run users saw containers boot
in `failed` state without guidance. Now both surfaces share one
preflight + modal handshake.
EmptyState's previous `interface Template` dropped `runtime`,
`models`, and `required_env` — silently discarding exactly the
fields the preflight needs. `Template` now lives in
`deploy-preflight.ts` and is imported from there by both surfaces.
## 3. useTemplateDeploy hook
With the preflight + modal wiring now duplicated across
EmptyState + TemplatePalette + (going forward) any third surface,
extracted the pattern into `canvas/src/hooks/useTemplateDeploy.tsx`:
const { deploy, deploying, error, modal } = useTemplateDeploy({
canvasCoords: ..., // optional, default random
onDeployed: (id) => ...,
});
Closes three drift surfaces that the duplication had created:
- `resolveRuntime` id→runtime fallback table (moved to
`deploy-preflight.ts`). EmptyState had a narrower fallback that
would have silently disagreed with the palette on any future id
needing a non-identity mapping.
- `checkDeploySecrets` call signature. One owner.
- `MissingKeysModal` JSX wiring. One owner.
Narrow try/catch around `checkDeploySecrets` so a preflight network
failure clears `deploying` and surfaces via `setError` instead of
stranding the button forever. `modal: ReactNode` (not a
`renderModal()` function) — the previous memoization bought
nothing since consumers called it inline every render. Named
`MissingKeysInfo` interface for the state shape.
## 4. Viewport auto-fit user-pan gate fix
During org deploy the canvas was meant to pan+zoom to follow each
arriving workspace (`molecule:fit-deploying-org` event → debounced
fitView). In practice the fit stayed stuck on wherever the first
fit landed.
Root cause: React Flow v12 fires `onMoveEnd` with a truthy `event`
at the END of a programmatic `fitView` animation. The original
"respect-user-pan" gate stamped `userPannedAtRef` in `onMoveEnd`,
so our own fit completing looked like a user pan, and every
subsequent auto-fit short-circuited for the rest of the deploy.
Fix: stop trusting `onMoveEnd` for user-intent detection. Register
explicit `wheel` + `pointerdown` listeners on `document` with
capture phase and `target.closest('.react-flow__pane')` filter.
Capture-phase immunity to `stopPropagation`; pane-filter rejects
toolbar / modal / side-panel clicks (the old `window` fallback
caught those). `onMoveEnd` simplified to only drive the debounced
viewport save.
Also: fit event dispatched on root arrivals (not just children),
so the canvas centers on the just-landed root immediately instead
of waiting ~2s for the first child. Animation 600ms → 400ms so
successive per-arrival fits don't pile up visually. End-state fit
stays at 1200ms — intentional asymmetry ("settling" vs
"tracking"), documented in code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Python scoping rule: any name assigned anywhere in a function body
is local for the entire body. The outbound-files block at ~L442
had `from a2a.types import ... Part ...`, which made `Part` a local
name throughout the execute() function. The astream_events loop at
L358 — which runs BEFORE that import — then raised:
UnboundLocalError: cannot access local variable 'Part' where it
is not associated with a value
Every streaming A2A reply died with "Agent error: cannot access
local variable 'Part' where it is not associated with a value"
instead of the actual agent text. 5 tests caught it:
- test_streaming_plain_string_content
- test_streaming_anthropic_content_blocks
- test_non_stream_events_ignored
- test_core_execute_on_chat_model_end_captures_last_ai_message
- test_core_execute_pii_redaction_when_pii_found
Fix: drop `Part` from the function-scope import (it is already
imported at module level on line 42) and leave a comment pinning
the rationale so a future refactor doesn't re-introduce the shadow.
All 43 test_a2a_executor tests pass locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Merge origin/staging into fix/canvas-multilevel-layout-ux. 18 files
auto-merged (mostly canvas/tabs/chat and workspace-server handlers
the earlier DIRTY marker was stale relative to current staging).
- Fix 7 test failures surfaced by the merge:
1. Canvas.pan-to-node.test.tsx — mockGetIntersectingNodes was
inferred as vi.fn(() => never[]); mockReturnValueOnce of a node
object failed type check. Explicit return-type annotation.
2. Canvas.pan-to-node.test.tsx + Canvas.a11y.test.tsx — Canvas.tsx
reads deletingIds.size (new multilevel-layout state). Both mock
stores lacked deletingIds; added new Set<string>() to each.
3. canvas-batch-partial-failure.test.ts — makeWS() built a wire-
format WorkspaceData (snake_case, with x/y/uptime_seconds). The
store's node.data is now WorkspaceNodeData (camelCase, no wire-
only fields). Rewrote makeWS to produce WorkspaceNodeData and
updated 5 call-site casts. No assertions changed.
4. ConfigTab.hermes.test.tsx — two tests pinned pre-#2061 behavior
that the PR intentionally inverts:
a. "shows hermes-specific info banner" — RUNTIMES_WITH_OWN_CONFIG
now contains only {"external"}, so the banner is no longer
shown for hermes. Inverted assertion: now pins ABSENCE of
the banner, with a comment noting the inversion.
b. "config.yaml runtime wins over DB" — priority reversed:
DB is now authoritative so the tier-on-node badge matches
the form. Inverted scenario: DB=hermes + yaml=crewai →
form shows hermes. Switched test's DB runtime off langgraph
because the dropdown collapses langgraph into an empty-
valued "default" option that would hide the win signal.
- No production code changed — this commit is staging merge + test
realignment only. 953/953 canvas tests pass. tsc --noEmit clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Session's accumulated UX work across frontend and platform. Reviewable
in four logical sections — diff is large but internally cohesive
(each section fixes a gap the next one depends on).
## Chat attachments — user ↔ agent file round trip
- New POST /workspaces/:id/chat/uploads (multipart, 50 MB total /
25 MB per file, UUID-prefixed storage under
/workspace/.molecule/chat-uploads/).
- New GET /workspaces/:id/chat/download with RFC 6266 filename
escaping and binary-safe io.CopyN streaming.
- Canvas: drag-and-drop onto chat pane, pending-file pills,
per-message attachment chips with fetch+blob download (anchor
navigation can't carry auth headers).
- A2A flow carries FileParts end-to-end; hermes template executor
now consumes attachments via platform helpers.
## Platform attachment helpers (workspace/executor_helpers.py)
Every runtime's executor routes through the same helpers so future
runtimes inherit attachment awareness for free:
- extract_attached_files — resolve workspace:/file:///bare URIs,
reject traversal, skip non-existent.
- build_user_content_with_files — manifest for non-image files,
multi-modal list (text + image_url) for images. Respects
MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision
adapter hangs on base64 payloads (MiniMax M2.7).
- collect_outbound_files — scans agent reply for /workspace/...
paths, stages each into chat-uploads/ (download endpoint
whitelist), emits as FileParts in the A2A response.
- ensure_workspace_writable — called at molecule-runtime startup
so non-root agents can write /workspace without each template
having to chmod in its Dockerfile.
Hermes template executor + langgraph (a2a_executor.py) + claude-code
(claude_sdk_executor.py) all adopt the helpers.
## Model selection & related platform fixes
- PUT /workspaces/:id/model — was 404'ing, so canvas "Save"
silently lost the model choice. Stores into workspace_secrets
(MODEL_PROVIDER), auto-restarts via RestartByID.
- applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"]
so Restart propagates the stored model to HERMES_DEFAULT_MODEL
without needing the caller to rehydrate payload.Model.
- ConfigTab Tier dropdown now reads from workspaces row, not the
(stale) config.yaml — fixes "badge shows T3, form shows T2".
## ChatTab & WebSocket UX fixes
- Send button no longer locks after a dropped TASK_COMPLETE —
`sending` no longer initializes from data.currentTask.
- A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s;
the previous default aborted fetches while the server was still
replying, producing "agent may be unreachable" on success.
- socket.ts: disposed flag + reconnectTimer cancellation + handler
detachment fix zombie-WebSocket in React StrictMode.
- Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' —
the adaptor's purpose IS the form, banner was contradictory.
- workspace_provision.go auto-recovery: try <runtime>-default AND
bare <runtime> for template path (hermes lives at the bare name).
## Org deploy/delete animation (theme-ready CSS)
- styles/theme-tokens.css — design tokens (durations, easings,
colors). Light theme overrides by setting only the deltas.
- styles/org-deploy.css — animation classes + keyframes, every
value references a token. prefers-reduced-motion respected.
- Canvas projects node.draggable=false onto locked workspaces
(deploying children AND actively-deleting ids) — RF's
authoritative drag lock; useDragHandlers retains a belt-and-
braces check.
- Organ cancel button (red pulse pill on root during deploy)
cascades via existing DELETE /workspaces/:id?confirm=true.
- Auto fit-view after each arrival, debounced 500 ms so rapid
sibling arrivals coalesce into one fit (previous per-event
fit made the viewport lurch continuously).
- Auto-fit respects user-pan — onMoveEnd stamps a user-pan
timestamp only when event !== null (ignores programmatic
fitView) so auto-fits don't self-cancel.
- deletingIds store slice + useOrgDeployState merge gives the
delete flow the same dim + non-draggable treatment as deploy.
- Platform-level classNames.ts shared by canvas-events +
useCanvasViewport (DRY'd 3 copies of split/filter/join).
## Server payload change
- org_import.go WORKSPACE_PROVISIONING broadcast now includes
parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas
renders the child at the right parent-nested slot without doing
any absolute-position walk. createWorkspaceTree signature gains
relX, relY alongside absX, absY; both call sites updated.
## Tests
- workspace/tests/test_executor_helpers.py — 11 new cases
covering URI resolution (including traversal rejection),
attached-file extraction (both Part shapes), manifest-only
vs multi-modal content, large-image skip, outbound staging,
dedup, and ensure_workspace_writable (chmod 777 + non-root
tolerance).
- workspace-server chat_files_test.go — upload validation,
Content-Disposition escaping, filename sanitisation.
- workspace-server secrets_test.go — SetModel upsert, empty
clears, invalid UUID rejection.
- tests/e2e/test_chat_attachments_e2e.sh — round-trip against
a live hermes workspace.
- tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static
plumbing check + round-trip across hermes/langgraph/claude-code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Schema-driven ChannelsTab renders no inputs when config_schema is
absent — the test's bare {type, display_name} mock mismatched the
real API shape and every getByLabelText("Bot Token") failed.
Mock now mirrors GET /channels/adapters with the Telegram schema
(bot_token password + chat_id text) so the a11y assertions run
against the actual rendered form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prior state: compliance.mode default was "" (fully off) and no template
in the repo set it explicitly — so prompt-injection detection, PII
redaction, and agency-limit checks were silently disabled on every
live workspace, despite the machinery being present in
workspace/builtin_tools/compliance.py.
This was surfaced during a 2026-04-24 review of the A2A inbound path:
a2a_executor.py gates three security checks on
_compliance_cfg.mode == "owasp_agentic"
and default config never matches, so every A2A message skipped all three.
Fix: default is now owasp_agentic + prompt_injection=detect. Detect mode
logs injection attempts as audit events without blocking — no UX cost,
just visibility. Operators who want stricter enforcement set
`prompt_injection: block` per workspace. Operators who genuinely want
compliance fully off can set `mode: ""` (not recommended; documented).
Changes:
- ComplianceConfig.mode default: "" → "owasp_agentic"
- Yaml parser fallback default: "" → "owasp_agentic" (must match dataclass)
- Docstring updated with rationale + opt-out snippet
Tests: 66/66 test_compliance.py + test_a2a_executor.py pass. 19/19
test_config.py pass. The one test asserting compliance_mode == "" is
for the "config load failed" fallback path (different from the default
config path) — correctly unchanged.
Security posture improvement: prompt-injection detection is now always
on for every workspace created after this ships, with zero behavior
change for legitimate inputs. Block mode remains an opt-in when an
operator wants to actively reject injection attempts rather than just
log them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lark adapter was already implemented in Go (lark.go — outbound Custom Bot
webhook + inbound Event Subscriptions with constant-time token verify),
but the Canvas connect-form hardcoded a Telegram-shaped pair of inputs
(bot_token + chat_id). Selecting "Lark / Feishu" from the dropdown
silently sent the wrong field names — there was no way to enter a
webhook URL.
Fix: move form shape to the server.
- Add `ConfigField` struct + `ConfigSchema()` method to the
`ChannelAdapter` interface. Each adapter declares its own fields with
label/type/required/sensitive/placeholder/help.
- Implement per-adapter schemas:
- Lark: webhook_url (required+sensitive) + verify_token (optional+sensitive)
- Slack: bot_token/channel_id/webhook_url/username/icon_emoji
- Discord: webhook_url + optional public_key
- Telegram: bot_token + chat_id (unchanged UX, keeps Detect Chats)
- Change `ListAdapters()` to return `[]AdapterInfo` with config_schema
inline. Sorted deterministically by display name so UI ordering is
stable across Go's random map iteration.
- Update the 3 existing `ListAdapters` test sites to struct access.
Canvas (`ChannelsTab.tsx`):
- Replace the two hardcoded bot_token/chat_id inputs with a single
schema-driven `SchemaField` component. Renders one input per field in
the order the adapter returns them.
- Form state becomes `formValues: Record<string,string>` keyed by
`ConfigField.key`. Values reset on platform-switch so stale
Telegram credentials can't leak into a new Lark channel.
- "Detect Chats" stays but only renders for platforms in
`SUPPORTS_DETECT_CHATS` (Telegram only — the only provider with
getUpdates).
- Only schema-known keys are posted in `config`, scrubbing any stale
values from previous platform selections.
Regression tests:
- `TestLark_ConfigSchema` locks in the 2-field Lark contract with the
required/sensitive flags correctly set.
- `TestListAdapters_IncludesLark` confirms registry wiring + schema
survives round-trip through ListAdapters.
Known pre-existing `TestStripPluginMarkers_AwkScript` failure in
internal/handlers is unrelated to this change (verified via stash+test
on clean staging).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Preparation for a "hundreds of runtimes" plugin ecosystem. Keeping the
runtime-specific UX knobs in-line inside ProvisioningTimeout scales badly
— every new runtime would require editing a component, not just adding a
table entry. Other components (create-workspace dialog, workspace card
tooltips, etc.) will want the same runtime metadata.
Changes:
- New file `canvas/src/lib/runtimeProfiles.ts` owns:
* `RuntimeProfile` type — structural shape, every field optional so
new runtimes can partially-fill without breaking consumers.
* `DEFAULT_RUNTIME_PROFILE` — 2-min default floor (docker-fast).
* `RUNTIME_PROFILES` — named overrides (currently: hermes 12 min).
* `WorkspaceRuntimeOverrides` — interface for server-provided
per-workspace overrides, so operators can tune via template
manifest / workspace metadata without a canvas release.
* `getRuntimeProfile()` — resolver with
overrides → profile → default priority.
* `provisionTimeoutForRuntime()` — convenience wrapper.
- `ProvisioningTimeout.tsx` now delegates to the profile module.
`DEFAULT_PROVISION_TIMEOUT_MS` re-exported for legacy test importers.
- Tests: 16/16 (up from 9 before the first fix). Adds pinning for:
* overrides > profile > default priority chain
* "every entry in RUNTIME_PROFILES resolves to a number" contract
* backward-compat export
Adding a new slow runtime is now one table entry in
`canvas/src/lib/runtimeProfiles.ts` with a mandatory `WHY` comment.
Moving to server-driven profiles later is a ~10-line change (the
resolver already threads WorkspaceRuntimeOverrides through).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hermes workspaces cold-boot in 8-13 min (ripgrep + ffmpeg + node22 +
hermes-agent source build + Playwright + Chromium ~300MB). The canvas's
2-min hardcoded "Provisioning Timeout" warning fired at ~2min and told
users their workspace was "stuck" while it was still mid-install. Users
hit Retry, triggering fresh cold boots and cancelling healthy workspaces.
User-facing symptom (reported 2026-04-24 18:35Z): hermes workspace showed
"has been provisioning for 3m 15s — it may have encountered an issue"
with Retry + Cancel buttons, while the EC2 was installing node_modules.
Fix:
- Keep DEFAULT_PROVISION_TIMEOUT_MS = 120_000 (2min) — correct for fast
docker runtimes (claude-code, langgraph, crewai) where cold boot is
30-90s.
- Add RUNTIME_TIMEOUT_OVERRIDES_MS = { hermes: 720_000 } (12min).
Aligns with tests/e2e/test_staging_full_saas.sh's
PROVISION_TIMEOUT_SECS=900 (15min) so UI warns shortly before the
backend itself gives up.
- New timeoutForRuntime() resolves the base; per-node lookup in the
check-timeouts interval so a mixed batch (1 hermes + 2 langgraph) uses
the right threshold for each.
- timeoutMs prop is now optional. Undefined → per-runtime lookup; a
number → forces a single threshold for every workspace (tests use this
for deterministic behavior).
Tests: 4 new cases pinning the runtime-aware resolution, including a
guard that catches future regressions that would weaken hermes's budget.
Existing tests unchanged (they import DEFAULT_PROVISION_TIMEOUT_MS which
still exports 120_000).
13/13 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmbeddedTeam was defined in WorkspaceNode.tsx but had no call site —
TeamMemberChip (which is called directly) covers the same rendering
responsibility. The function was stranded after a prior refactor and
was flagged by github-code-quality on PR #1989 (merged 2026-04-24T14:09Z
without this cleanup because the token died before push).
Removes 25 lines of dead code. MAX_NESTING_DEPTH is kept — it is used
by TeamMemberChip at line 498.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ships the monorepo side of molecule-core#1957 (agent identity collapse).
Companion to molecule-ai-plugin-gh-identity (new repo, merged-and-tagged
separately).
Changes:
- manifest.json: add gh-identity plugin to Tier 1 registry
- workspace-server/go.mod: require github.com/Molecule-AI/molecule-ai-plugin-gh-identity
- cmd/server/main.go: build a shared provisionhook.Registry, register
gh-identity first (always), then github-app-auth (gated on GITHUB_APP_ID)
- workspace_provision.go: propagate workspace.Role into
env["MOLECULE_AGENT_ROLE"] before calling the mutator chain, so the
gh-identity plugin can see which agent is booting
- provisionhook/mutator.go: add Registry.Mutators() accessor so
individual-plugin registries can be merged onto a shared one at boot
Boot log gains a line like:
env-mutator chain: [gh-identity github-app-auth]
Effect per workspace:
- env contains MOLECULE_AGENT_ROLE, MOLECULE_OWNER, MOLECULE_ATTRIBUTION_BADGE,
MOLECULE_GH_WRAPPER_B64, MOLECULE_GH_WRAPPER_SHA
- Each workspace template's install.sh can decode + install the wrapper at
/usr/local/bin/gh, intercepting @me assignment and prepending agent
attribution on PR/issue creates
Does not break existing workspaces — absent workspace.role, the plugin is
a no-op. Absent install.sh updates in each template, the env vars are
simply unused.
Follow-up template PRs (hermes, claude-code, langgraph, etc.) each add
~15 lines to install.sh to decode + install the wrapper.
Ref: #1957
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2021 follow-up: add TEST-NET reserved ranges and IPv6 documentation
prefix to validateAgentURL blocklist in all SaaS/self-hosted modes.
RFC 5737 reserves 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24 for
documentation and example code — no production agent has a legitimate
reason to use them. RFC 3849 designates 2001:db8::/32 as the IPv6
documentation prefix. All are blocked unconditionally.
Also adds 8 regression test cases covering each blocked range.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- wsauth_middleware: add missing return after AbortWithStatusJSON in
CanvasOrBearer final else branch (CRITICAL auth bypass)
- restart_template: apply sanitizeRuntime before filepath.Join to
prevent CWE-22 path traversal via dbRuntime field
P0 security: CanvasOrBearer final else branch aborts with 401 but
continues execution to c.Next() — allowing the downstream handler to
overwrite the 401 response. Regression tests added to verify the handler
is not called after AbortWithStatusJSON in both no-cred and wrong-origin
paths.
Confirmed on origin/main @ 69408ab6 and origin/staging @ 6b62391e.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If asyncio.CancelledError arrived during the heartbeat HTTP push inside
set_current_task() (the increment call), the code raised before entering
the try/finally block in _execute_locked. The finally block never ran,
so active_tasks stayed at 1 forever. Every subsequent heartbeat reported
active_tasks=1, the server saw active_tasks < max_concurrent_tasks as
false (1 < 1), and DrainQueueForWorkspace never fired. Queued A2A
requests were permanently stuck.
Fix: move set_current_task(increment) to be the FIRST statement inside
the try block, not before it. set_current_task's synchronous portion
(heartbeat.active_tasks mutation) still runs unconditionally; only the
optional HTTP push can be cancelled. The finally block now always runs
and always decrements active_tasks back to 0.
Affected executors: claude_sdk_executor, cli_executor, a2a_executor.
hermes_executor is not affected (does not call set_current_task).
Root cause of today's "active_tasks: 1 + queue drain never triggers"
P1 pattern across three workspaces.
All 167 executor tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two stalls in cycle 132 traced to the same root cause: activity_logs
INSERTs were wedging on invalid UTF-8 bytes (observed: 0xe2 0x80 0x2e)
and the surrounding DB operations had no deadlines, so a single stuck
transaction blocked wg.Wait() in tick() and stalled the whole scheduler
until a container restart.
Root cause: truncate() did byte-slicing without UTF-8 boundary checks.
A prompt containing U+2026 (`…` = 0xe2 0x80 0xa6) at byte ~197 was
sliced at maxLen-3, producing the trailing fragment 0xe2 0x80 followed
by '.' (0x2e) from the "..." suffix — Postgres rejects this as invalid
UTF-8 for jsonb, holds the transaction open, and the INSERT never
returns.
Fix:
- truncate(): UTF-8 safe — backs up to a rune boundary via utf8.RuneStart
- sanitizeUTF8(): new helper applied to every agent-produced string
before it crosses the DB boundary (prompt, error detail, schedule name)
- dbQueryTimeout = 10s on every scheduler DB call:
- tick() due-schedules query
- capacity-check queries in fireSchedule
- empty-run counter UPDATE / reset
- activity_logs INSERTs (fireSchedule + recordSkipped)
- recordSkipped bookkeeping UPDATE
- Bookkeeping writes use context.Background() parent (F1089 pattern)
so fireTimeout / shutdown cancellation can't silently skip the UPDATE.
Regression tests lock in the 0xe2 0x80 0x2e wedge: truncate() is
verified UTF-8-valid and never produces that byte sequence even when
input contains a multi-byte rune at the cut position.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of PR #1981 E2E failures (step 7 timeout):
- hermes-agent install from NousResearch (Node 22 tarball + Python
deps from source) + gateway health wait takes 15-25 min on staging
The validateAgentURL function was missing several ranges from the always-
blocked list. In SaaS mode only link-local, loopback, and IPv6 metadata
were blocked — TEST-NET (192.0.2/24, 198.51.100/24, 203.0.113/24),
CGNAT (100.64.0.0/10), IPv4 multicast (224.0.0.0/4), and fc00::/8 (IPv6
ULA non-routable prefix) were allowed through.
These ranges are never valid agent URLs in any deployment:
- TEST-NET (RFC-5737): documentation-only, no real hosts
- CGNAT (RFC-6598): never used as VPC subnets on AWS/GCP/Azure
- IPv4 multicast: never a unicast agent endpoint
- fc00::/8: non-routable prefix (fd00::/8 stays allowed in SaaS mode)
Also tighten the non-SaaS ULA block: instead of blocking fc00::/7 (the
supernet covering both fc00 and fd00), split it into always-blocked
fc00::/8 (above) + non-SaaS-only fd00::/8. This makes the SaaS relaxation
explicit and auditable.
Fixes TestValidateAgentURL_SaaSMode_StillBlocksMetadataEtAl failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #1885 introduced a regression: HandleConnect called wsauth.ValidateToken
for any bearer token when X-Workspace-ID ≠ workspaceID. Org-scoped tokens
(org_api_tokens table) are not in workspace_auth_tokens, so ValidateToken
always returned ErrInvalidToken for them → hard 401 for all A2A routing
that uses org tokens.
Fix: if WorkspaceAuth already validated an org token (org_token_id set in
gin context by orgtoken.Validate), skip the workspace_auth_tokens lookup and
trust the X-Workspace-ID claim. Hierarchy enforcement via canCommunicateCheck
is unchanged — org token holders are still subject to the workspace hierarchy.
Workspace-scoped tokens continue to require ValidateToken binding. Invalid
tokens (neither workspace-bound nor org-level) still return 401. This closes
the regression while preserving the KI-005 security property.
Add TestKI005_OrgToken_SkipsValidateToken to terminal_test.go as a regression
guard for this exact path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Issue #1786: SSRF test gap — inner helpers (isPrivateOrMetadataIP,
validateAgentURL blockedRanges) were tested in isolation but the public
wrappers never called saasMode(), allowing the regression to pass unit
tests while production returned 502 on every A2A call from Docker/VPC
deployments (PR #1785).
Adds integration-level wrapper tests for both functions across all
saasMode() resolution ladder cases:
- SaaS explicit (MOLECULE_DEPLOY_MODE=saas): RFC-1918 + fd00 ULA allowed
- Strict mode (MOLECULE_DEPLOY_MODE=self-hosted): RFC-1918 blocked
- Legacy org-ID fallback (MOLECULE_ORG_ID set, no DEPLOY_MODE):
RFC-1918 + fd00 ULA allowed
- Always-blocked ranges (metadata, loopback, TEST-NET, CGNAT, fc00 ULA)
stay blocked in every mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ships the monorepo side of molecule-core#1957 (agent identity collapse).
Companion to molecule-ai-plugin-gh-identity (new repo, merged-and-tagged
separately).
Changes:
- manifest.json: add gh-identity plugin to Tier 1 registry
- workspace-server/go.mod: require github.com/Molecule-AI/molecule-ai-plugin-gh-identity
- cmd/server/main.go: build a shared provisionhook.Registry, register
gh-identity first (always), then github-app-auth (gated on GITHUB_APP_ID)
- workspace_provision.go: propagate workspace.Role into
env["MOLECULE_AGENT_ROLE"] before calling the mutator chain, so the
gh-identity plugin can see which agent is booting
- provisionhook/mutator.go: add Registry.Mutators() accessor so
individual-plugin registries can be merged onto a shared one at boot
Boot log gains a line like:
env-mutator chain: [gh-identity github-app-auth]
Effect per workspace:
- env contains MOLECULE_AGENT_ROLE, MOLECULE_OWNER, MOLECULE_ATTRIBUTION_BADGE,
MOLECULE_GH_WRAPPER_B64, MOLECULE_GH_WRAPPER_SHA
- Each workspace template's install.sh can decode + install the wrapper at
/usr/local/bin/gh, intercepting @me assignment and prepending agent
attribution on PR/issue creates
Does not break existing workspaces — absent workspace.role, the plugin is
a no-op. Absent install.sh updates in each template, the env vars are
simply unused.
Follow-up template PRs (hermes, claude-code, langgraph, etc.) each add
~15 lines to install.sh to decode + install the wrapper.
Ref: #1957
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of PR #1981 E2E failures (step 7 timeout):
- hermes-agent install from NousResearch (Node 22 tarball + Python
deps from source) + gateway health wait takes 15-25 min on staging
- install.sh runs BEFORE molecule-runtime launches, blocking heartbeats
- bootstrap-watcher fires at 5 min (cp#245) → workspace=failed
- workspace never recovers because molecule-runtime never starts in time
Fix: increase WS_DEADLINE from 1200s (20 min) to 1800s (30 min) to
give hermes cold-boot enough runway. Also bump job timeout-minutes
from 30 → 45 to accommodate the longer wait.
Medium-term: fix cp#245 (bootstrap-watcher hermes deadline too short)
in molecule-controlplane to reduce false-failed noise.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>