Commit Graph

2952 Commits

Author SHA1 Message Date
Hongming Wang
c159d85eb5 fix(a2a): review-driven hardening — prefix-anchored type check, error_detail cap, shared hint module
Three required fixes from the bundle review of 391e1872:

1. workspace/a2a_client.py: substring `type_name in msg` could miss
   the diagnostic prefix when an exception's message embedded a
   different class name mid-string (e.g. `OSError("see ConnectionError
   below")` → printed as plain msg, type lost). Switched to a
   prefix-anchored check (`msg.startswith(f"{type_name}:")` etc.) so
   the type label is always added when not already at the start of
   the message.

2. workspace/a2a_tools.py: `activity_logs.error_detail` is unbounded
   TEXT on the platform (handlers/activity.go does not validate
   length). A buggy or hostile peer could stream arbitrarily large
   error messages into the caller's activity log. Cap at 4096 chars
   at the producer — comfortably above any real exception traceback,
   well below an obvious-DoS threshold.

3. New regression test for JSON-RPC `code=0` — pins the
   `code is not None` semantics so the code is preserved in the
   detail rather than collapsing into the no-code path. Code=0 is
   not valid per the spec, but a malformed peer can still emit it
   and we want it visible for diagnosis.

Plus one optional taken: extracted the A2A-error → hint mapping into
canvas/src/components/tabs/chat/a2aErrorHint.ts. The two prior copies
(AgentCommsPanel.inferCauseHint + ActivityTab.inferA2AErrorHint) had
already drifted — Activity tab gained `not found`/`offline` cases the
chat panel never picked up, AgentCommsPanel handled empty-input
explicitly while Activity didn't. The shared module is the merged
superset, with 10 unit tests pinning each named pattern + the
"most specific first" ordering (Claude SDK wedge wins over generic
timeout).

Skipped (per analysis):
- Unicode-naive 120-char slice — Python str[:N] slices on code
  points, not bytes. Safe.
- Nested [A2A_ERROR] confusion — non-issue per reviewer; outer
  prefix winning still produces a structured render.
- MessagePreview + JsonBlock dual render on errors — intentional
  drilldown; raw JSON is below the fold for operators who need it.
- console.warn dedup — refetches don't happen per-event so spam
  risk is low.
- str(data)[:200] materialization — A2A response bodies aren't
  typically MB-sized.

Verified: 1005 canvas tests pass (10 new hint tests); 10 Python
send_a2a_message tests pass (1 new for code=0); tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 23:47:44 -07:00
Hongming Wang
391e187281 fix(a2a,canvas): make delivery failures comprehensive instead of "[A2A_ERROR] "
Symptom: Activity tab and Agent Comms surfaced bare "[A2A_ERROR] "
(prefix + nothing) for failed delegations. Operator had no signal
to act on — no exception type, no target, no hint about what went
wrong, no next step. Fix is in three layers.

1. workspace/a2a_client.py — every error path now produces an
   actionable detail string:

   - except branch: some httpx exceptions (RemoteProtocolError,
     ConnectionReset variants) stringify to "". Pre-fix the catch
     was `f"{_A2A_ERROR_PREFIX}{e}"` → bare prefix. Now falls back
     to `<TypeName> (no message — likely connection reset or silent
     timeout)` and always appends `[target=<url>]` for traceability
     in chained delegations.
   - JSON-RPC error branch: previously dropped error.code on the
     floor and printed "unknown" when message was missing. Now
     surfaces both, including the well-defined "JSON-RPC error
     with no message (code=N)" path.
   - "neither result nor error" branch: pre-fix returned
     str(payload) which the canvas rendered as a successful
     response block. Now tagged as A2A_ERROR with a payload
     snippet so downstream UI routes through the error path.

2. workspace/a2a_tools.py — tool_delegate_task now passes
   error_detail (the stripped error message) through to the
   activity-log POST. The platform's activity_logs.error_detail
   column is the canvas's red error chip source; populating it
   makes the failure visible in the row header without the user
   having to expand into raw response_body JSON. The summary line
   also gets a 120-char prefix of the cause so the collapsed row
   reads "React Engineer failed: ConnectionResetError: ... [target=...]"
   instead of "React Engineer failed".

3. canvas/src/components/tabs/ActivityTab.tsx — MessagePreview
   now detects [A2A_ERROR]-prefixed bodies and renders a
   structured error block (red chip, stripped detail, cause hint)
   instead of the previous gray text-block that showed the literal
   "[A2A_ERROR]" string. inferA2AErrorHint mirrors the patterns
   from AgentCommsPanel.inferCauseHint so the same symptom reads
   the same way in both surfaces (Claude SDK init wedge → restart
   workspace; timeout → busy/stuck; connection-reset → transient
   blip then check logs).

Tests: 9 send_a2a_message tests pass (including a new regression
test for the empty-stringifying-exception case that the user
reported); 995 canvas tests pass; tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 23:40:05 -07:00
Hongming Wang
54f7c75c81 fix(canvas): make AgentCommsPanel load failures observable
Reported symptom: canvas edges show "1 call · just now" between two
agents, but the Agent Comms tab for the source workspace renders
"No agent-to-agent communications yet" — even though
GET /workspaces/<id>/activity?source=agent&limit=50 returns a2a_send
+ a2a_receive rows.

Confirmed via curl that the API does return the rows the panel
should map. The panel's load handler was the suspect, but it had:

  .catch(() => setLoading(false))

which swallowed every failure path — network errors, JSON parse,
ANY throw inside the .then body — without leaving a single trace in
the console. The panel just sat on its empty state and gave the user
zero signal to act on. (And by extension, gave us nothing to debug
remotely either.)

Two changes:

1. Wrap the per-row `toCommMessage` call in a try/catch so one
   malformed activity row (unexpected request_body shape, etc.)
   doesn't throw out of the for-loop and skip the
   setMessages(msgs) line. Previously the panel would silently
   drop the entire batch when ANY row failed to parse.

2. Replace the bare `.catch(() => setLoading(false))` with a
   logging variant. Now a future "panel stuck empty" report comes
   with `AgentCommsPanel: load activity failed <err>` or
   `AgentCommsPanel: failed to map activity row {...}` in the
   console — diagnosable instead of opaque.

Behavior on the happy path is unchanged (5 existing tests still
pass; tsc clean). This is purely defensive: it makes the failure
path visible so the next stuck-empty report can be root-caused
instead of guessed at.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 23:27:50 -07:00
Hongming Wang
28911ded40 fix(canvas): split shared autoFitTimerRef so settle + tracking fits don't cross-cancel
Bundle-level review caught an implicit coupling in useCanvasViewport
between two distinct fit effects:

  - settle fit: 1200ms one-shot when provisioning transitions to zero
    (deploy just finished — settle on the whole org once)
  - tracking fit: 500ms debounced per molecule:fit-deploying-org event
    (track the org's bounds as children land during the deploy)

Both effects shared a single autoFitTimerRef, so each one's
clearTimeout call could silently cancel the other's pending fit.
Today's behavior happened to land in the right order out of luck —
the tracking handler fires per-arrival during the deploy, then the
settle effect arms after the last child completes. But nothing in
the code enforces that ordering; a future refactor that, say,
fires the settle effect from the same event sequence as the
tracking timer (mid-deploy status flicker) would silently drop the
settle fit because the tracking timer's clearTimeout ran last.

Splitting into settleFitTimerRef + trackingFitTimerRef makes the
two effects fully independent. Cleanup clears both. Tests still pass
(995/995); the refactor is mechanical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 23:19:02 -07:00
Hongming Wang
e0f338e8ae fix(canvas): plug timer leak + optimistic-install semantics in SkillsTab
Three review-driven fixes plus regression coverage for the bugs
landed in 176b703d / deedb5ef:

1. clearTimeout the prior reload handle before scheduling a new one in
   both installFromSource and handleUninstall. Two installs within the
   PLUGIN_RELOAD_DELAY_MS window (15s) used to queue two
   loadInstalled() calls; the unmount cleanup only cleared the latest
   handle, and the second reconciliation could overwrite a still-
   correct optimistic state with a stale snapshot mid-restart.

2. Drop `setInstalledLoaded(true)` from the optimistic block. That
   flag's contract is "the initial GET has succeeded at least once" —
   it gates the auto-expand-registry effect. A user installing a
   custom-source plugin BEFORE the initial fetch returned would flip
   the gate prematurely, the auto-expand would never fire, and a
   followup loadInstalled racing with the optimistic write could
   overwrite our entry with [] mid-restart.

3. Don't force `supported_on_runtime: true` on the optimistic record.
   The "inert on this runtime" badge in the row renders on the value
   `=== false`. Forcing true would hide the badge for 15s if the user
   installed a plugin that doesn't actually support the workspace's
   runtime; the real value lands at refetch. Leaving the field
   undefined keeps the badge neutral until reconciliation arrives.

Plus a behavioral test (SkillsTab.install.test.tsx) that asserts:
  - the install POST URL contains the workspaceId (not "undefined")
  - the row's "Install" button is replaced by the green "Installed"
    tag synchronously after POST resolves, without advancing any
    timer — locks in the optimistic-update contract so a future
    refactor can't silently regress it.

995 canvas tests pass (2 new); tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 22:47:46 -07:00
Hongming Wang
deedb5eff6 fix(canvas): optimistic plugin install so the UI flips to "Installed" instantly
After clicking Install, the button reverted from "Installing..." → "Install"
the moment the POST returned, then sat there for ~15s before the green
"Installed" tag appeared. The 15s gap is PLUGIN_RELOAD_DELAY_MS — we
delay the GET /workspaces/:id/plugins refetch to wait for the workspace
to restart (the listing handler returns [] while the container is
restarting because findRunningContainer comes up empty).

Uninstall already does optimistic local-state mutation (line 244 prior
to this commit) so the green tag → install button transition is
instant. Install was the inconsistent half — push the registry entry
into `installed` immediately after POST returns 200 and let the
delayed refetch reconcile.

The optimistic record uses the registry entry's metadata (name,
version, description, tags, runtimes, skills) and sets
supported_on_runtime=true. If reconciliation later disagrees (server
filter, install actually failed at the runtime layer), the refetch
overwrites the local record. Worst case is a brief 15s window where
we show "Installed" for a plugin that won't load — same window the
user previously experienced as "stuck on Install button" — but flipped
to the correct expected state.

Custom-source installs (github://, etc.) don't have a registry entry
to use, so they keep the old behavior of waiting for the refetch. Most
users install from the registry list in the UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 22:41:51 -07:00
Hongming Wang
176b703dbc fix(canvas): plugin install POSTed to /workspaces/undefined/plugins
SkillsTab read \`data.id\` from its props and used the value to build
two API URLs:
  POST   /workspaces/\${data.id}/plugins
  DELETE /workspaces/\${data.id}/plugins/\${pluginName}

But \`data\` is the React Flow node.data blob (WorkspaceNodeData) —
the workspace id lives on \`node.id\`, NOT on \`node.data\`. WorkspaceNodeData
extends \`Record<string, unknown>\`, which makes \`data.id\` type-check
silently as \`unknown\` instead of erroring. So every install/uninstall
hit \`/workspaces/undefined/plugins\`, the server's not-found path
returned 503 "workspace container not running" (misleading — the real
issue was the bogus URL), and the user got a confusing toast.

Every other tab in SidePanel takes \`workspaceId={selectedNodeId}\` as
an explicit prop. SkillsTab was the lone outlier, presumably because
"data has all the fields I need" is the obvious-looking shortcut that
TypeScript can't catch through the index-signature interface.

Fix: make \`workspaceId\` an explicit prop on SkillsTab, drop the
\`data.id\` reads, thread the prop from SidePanel like the other tabs.
Test fixture updated to pass it.

Verified: 993 canvas tests pass; tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 22:36:35 -07:00
Hongming Wang
ee429cfee7 fix(canvas,dotenv): review-driven hardening of fit gate + parser parity
Independent code review surfaced two required documentation fixes and
one growth-correctness gap. All addressed here.

Auto-fit gate (useCanvasViewport):

The previous "subtree-grew-by-count" check missed the delete-then-add
case: subtree of 6 → delete one → 5 → a different child arrives → 6
again. A length-only comparison reads no growth and the fit is
skipped, leaving the new node off-screen. Switched to an id-set
membership snapshot so any brand-new id forces the fit even when the
count is unchanged.

The gate logic is now extracted as a pure exported function
`shouldFitGrowing(currentIds, prevIds, userPannedAt, lastAutoFitAt)`
so the regression-prone decision can be unit-tested in isolation
without standing up React Flow + DOM event refs. 8 cases cover:
first-fit, empty-prior, brand-new id, status-update with user pan,
no-pan-ever, pan-before-last-fit, delete-then-add same length, and
shrink-only with user pan.

Parser parity (dotenv.go + next.config.ts):

Existing-env semantics were undocumented in both parsers. Both now
explicitly note that an explicitly-set empty string (`KEY=` from the
parent shell) counts as "set" — the file value does NOT backfill —
matching the Go (os.LookupEnv) and Node (`process.env[k] !==
undefined`) primitives.

`export ` prefix uses a literal space; `export\tFOO=bar` is
intentionally rejected. Added the same comment in both parsers
to lock in this parity invariant since the commit message claims
"if one parser changes, the other has to."

Skipped (per analysis):
- Drag-pan respect for left-click drag-pan during deploy. The
  growth-check safety net means any pan gets overridden on the
  next arrival anyway, which is the desired behavior for the
  "watch the org deploy" use case. After deploy completes, no
  more fit-deploying-org events fire so drag-pan works freely.
- Map cleanup for lastFitSubtreeIdsRef. Per-tab session, UUID
  keys, tiny entries — not worth the cleanup hook.

993 canvas tests pass (8 new); Go dotenv tests pass; tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 22:23:51 -07:00
Hongming Wang
e900a773ac fix(canvas): keep tracking org bounds during deploy after first fit
Symptom: org import zoomed to fit the parent + first child, then froze
at that framing while the remaining children kept materialising
off-screen. The user had to manually pan/zoom to see the new arrivals.

Two stacked bugs in useCanvasViewport's deploy-time auto-fit:

1. The user-pan-respect gate stamps userPannedAtRef on EVERY
   pointerdown that lands inside .react-flow__pane. That fires for
   ordinary clicks (deselect, click-near-a-card, modal-close-bubble
   from the import dialog) — not just for actual pan gestures. One
   accidental pre-import click was enough to lock out every fit for
   the rest of the deploy. Wheel is the canonical unambiguous
   pan/zoom signal; drop pointerdown.

2. Even with a real pan during deploy, when more children land the
   org's bounds grow and the user has lost context — the new
   arrivals are off-screen and the deploy is the primary thing they
   want to watch right now. The guard had no growth awareness, so
   one pan cancelled all follow-up fits unconditionally. Now we
   track the subtree size at the last fit (per root), and if the
   current subtree is larger we force the fit through regardless of
   the user-pan timestamp. When the subtree size hasn't changed
   (status updates on already-positioned nodes), the user-pan
   respect still applies — so post-deploy exploration isn't
   yanked back.

The Map keyed by root id supports back-to-back imports of different
orgs without one's growth count blocking the other's first fit.

985 canvas tests pass; tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 21:37:54 -07:00
Hongming Wang
ec7ecd5461 fix(canvas): load monorepo .env in next.config so WS connects in dev
Symptom: spawn animation missing on org import. Workspaces appeared in
their final positions all at once instead of materialising one-by-one.

Root cause: the WS pill said "Reconnecting" forever because the canvas
was trying to connect to ws://localhost:3000/ws — its own port, where
Next.js dev doesn't serve a WebSocket — instead of the platform's
ws://localhost:8080/ws.

Why: deriveWsBaseUrl() falls back to window.location when
NEXT_PUBLIC_WS_URL is unset. Next.js auto-loads .env from the project
root only — and the canonical NEXT_PUBLIC_WS_URL /
NEXT_PUBLIC_PLATFORM_URL live in the monorepo root .env, alongside the
Go platform's MOLECULE_ENV / DATABASE_URL. Without an extra
canvas/.env.local copy (which would still be a per-developer manual
step), the canvas dev server starts blind to those vars.

Fix: next.config.ts now walks upward from __dirname looking for the
monorepo root (same workspace-server/go.mod sentinel the platform's
dotenv loader uses) and merges the root .env into process.env BEFORE
Next.js compiles. Existing env wins over file values, so docker
runs / CI / explicit exports still dominate.

The parser is a TypeScript mirror of workspace-server/cmd/server/
dotenv.go's parseDotEnvLine — same rules (export prefix, quotes,
inline comments, BOM) so a single .env line behaves identically across
both processes. If one parser changes, the other has to.

Production unaffected: `output: "standalone"` bakes resolved env into
the build, the workspace-server sentinel isn't shipped in deploy
artifacts, and the existing-env-wins rule means container env
dominates anywhere this file is consulted at runtime.

Verified: canvas dev startup log now shows
"[next.config] loaded 49 vars from /Users/.../molecule-core/.env";
served bundle has the correct ws://localhost:8080/ws URL; WS pill
flips to "Connected" after a hard refresh and per-workspace spawn
animations fire on the next org import as expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 21:29:05 -07:00
Hongming Wang
4014513b94 fix(dotenv): empty value with inline comment was returning the comment
The repo's own .env contains lines like
  CONFIGS_DIR=                   # Path to workspace-configs-templates/...
where the value is empty + an inline comment. The pre-fix parser:
  1. v = "                   # Path to ..."
  2. TrimLeft → "# Path to ..."
  3. Inline-comment loop looked for " #" or "\t#" — neither matches
     because the leading whitespace is gone.
  4. Returned the comment text as the value.

Result: os.Setenv("CONFIGS_DIR", "# Path to ...") clobbered the auto-
discovery fallback. The TemplatesHandler then opened the comment as
a directory, ReadDir errored silently, and GET /templates returned
[]. Canvas's Templates panel showed "No templates found in
workspace-configs-templates/" even though 8 valid templates existed
on disk.

Fix: strip leading whitespace from the value FIRST, then run a
position-aware comment scan that treats `#` as a comment marker iff
it's at the start of the (trimmed) value or preceded by whitespace.
A bare `#` mid-value (e.g. `KEY=token#fragment`) still survives.

Quoted-value handling moved above the comment scan so
`KEY="value # not"` keeps the `#` as part of the value — pulled the
quote-detection into the same TrimLeft-then-check shape as the bare
path. The unterminated-quote case still falls through to bare-value
handling.

Three regression tests added covering the exact .env line that
broke (`CONFIGS_DIR=    # ...`), spaces-only with comment, and tab-
only with comment.

Verified end-to-end: GET /templates now returns all 8 templates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 21:17:21 -07:00
Hongming Wang
9a223afba1 fix(dotenv,socket): review-driven hardening of .env loader + WS poll
Independent code review surfaced three required fixes and one cheap
optional one. All addressed here.

dotenv parser:
- `export FOO=bar` was parsed as key `"export FOO"` (with embedded
  space) and silently os.Setenv'd, so a developer pasting from a
  direnv `.envrc` would get junk vars. Now strips the prefix.
- Quoted values weren't unwrapped: `FOO="hello world"` produced value
  `"hello world"` with literal quotes. Now strips one matched pair of
  surrounding `"` or `'`. Inside a quoted value `#` is part of the
  value, not a comment marker (matches godotenv convention).
- UTF-8 BOM at file start (Windows editors) would have produced a
  first key like U+FEFF + "FOO". Now stripped via TrimPrefix.

dotenv loader:
- findDotEnv()'s upward walk would happily pick up `~/.env` or a
  sibling-repo `.env` if the binary was run from `~/Documents/other-
  project/`. Real foot-gun on shared dev boxes. Now gated on a
  monorepo sentinel: the candidate directory must contain
  `workspace-server/go.mod`. Falls through to "no .env found" (=
  pre-fix behavior) when the sentinel is absent.

socket fallback poll:
- startFallbackPoll() previously fired only on onclose, so the very
  first connect attempt — when onclose hasn't fired yet because we
  never had a successful onopen — left the canvas with no HTTP poll
  for the duration of the failing handshake (Chrome can hold a
  SYN-SENT WebSocket open ~75s before giving up). Now also called at
  the top of connect(); the timer-already-running guard makes it a
  no-op when one cycle later onclose calls it again.

Test coverage added: export prefix, single+double quoted values, hash
inside quotes preserved, unterminated quote falls back to bare value,
CRLF stripping locked in, BOM stripping, and a sentinel-rejection
regression test that creates a temp .env with no workspace-server
sibling and asserts findDotEnv refuses to load it.

Verified: 985 canvas tests + 30 dotenv subtests + 4 dotenv integration
tests all pass; tsc clean; rebuilt platform from monorepo root with
stripped env still loads .env (49 vars) and /workspaces returns 200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 21:09:18 -07:00
Hongming Wang
21db85d691 fix(canvas): cascade delete locally so children disappear without WS
Deleting a parent on a wedged WS used to leave the child cards on
the canvas as orphaned roots until the user manually refreshed.

Why: Canvas.tsx and DetailsTab.tsx both called `removeNode(parentId)`
after `DELETE /workspaces/:id?confirm=true` returned 200. `removeNode`
deliberately re-parents children rather than cascading — it relies on
the per-descendant WORKSPACE_REMOVED WS events the platform emits as
part of the cascade to drop each child individually. When the WS is
unhealthy those events never arrive, so the local store keeps the
children alive (now re-parented to root since their actual parent is
gone).

Fix: new `removeSubtree(rootId)` action on the canvas store mirrors
the server-side cascade — drops the root + every descendant + every
incident edge in one atomic set(). Both delete call sites now use it.
The WS events still arrive when WS is healthy and become idempotent
no-ops because the nodes are already gone.

Why a new action instead of changing removeNode: removeNode's
re-parenting behavior is correct for non-cascading flows (drag-out,
manual node detach in the future). Adding a sibling action keeps
both call shapes available rather than forcing every caller to opt
out of cascade.

6 new unit tests cover root cascade, mid-level cascade, leaf
no-op-cascade, selection clearing across the subtree, selection
preservation outside the subtree, and edge cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:51:09 -07:00
Hongming Wang
f8c900909e fix(platform): auto-load .env from CWD on startup
Local dev runs (`/tmp/molecule-server` after `go build`) used to 401 on
/workspaces the moment the DB had any workspace token in it: the binary
inherited a bare shell env with no MOLECULE_ENV, so AdminAuth's dev
fail-open branch (gated on MOLECULE_ENV=development) didn't fire.

The repo's .env already has MOLECULE_ENV=development plus DATABASE_URL,
REDIS_URL, ADMIN_TOKEN=, etc. Until now you had to `set -a && source
.env` in the launching shell — a paper cut, but worse, it's a paper
cut in EVERY automated dev workflow (IDE run configs, integration
test harnesses, the smoke-test loop in this branch's manual testing).

Fix: cmd/server now walks upward from CWD looking for a .env (capped
at 6 levels) and merges KEY=VALUE pairs into os.Environ before any
other code reads env. Already-set vars win over file values, so
docker run -e / CI exports / `KEY=val ./binary` still dominate — only
unset keys get filled in.

Why no godotenv dep: the format we use is plain KEY=VALUE with `#`
comments, no interpolation, no quoting (verified against the live
.env: 49 kv lines, zero references to ${...} or `export`). A 30-line
parser is auditable and avoids supply-chain surface.

Why it's safe in production: Dockerfile doesn't COPY .env into the
image and .env is gitignored, so prod containers have no .env on
disk to load — the function's findDotEnv() loop finds nothing and
returns silently. If an operator deliberately drops one in, the
existing-env-wins rule means container-injected env still dominates.

Verified by booting `env -i HOME=$HOME PATH=$PATH /tmp/molecule-server`
from the repo root with a stripped env: log shows
".env: /Users/.../molecule-core/.env — loaded 49, 0 already set" and
/workspaces returns 200 instead of 401.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:33:28 -07:00
Hongming Wang
0b4dfbd121 fix(canvas): suppress stale provisioning banners + add WS-down HTTP fallback poll
Two related fixes for the case where the canvas thinks workspaces are
stuck provisioning when they're actually online:

1. ProvisioningTimeout banners now gate on wsStatus === "connected".
   While the WS is in connecting/disconnected state, the local
   "provisioning" status reflects the last event received before the
   drop — workspaces may have transitioned to online minutes ago. The
   8m timeout was firing against frozen state and showing a wall of
   yellow warnings on already-online workspaces.

2. Socket layer now starts a 10s rehydrate poll when the WS goes
   unhealthy (onclose) and stops it on onopen/disconnect. The
   reconnect attempts continue in parallel; whichever recovers first
   wins. rehydrate()'s existing dedup gate prevents the open-time
   rehydrate from racing with a fallback poll. Without this the
   store could stay frozen for minutes while WS exponential backoff
   chewed through retries.

Plus the previously-uncommitted TemplatePalette flushSync change so
the import modal unmounts synchronously before doImport runs (otherwise
React batches the close with the import's setState prefix and the
modal backdrop hides the spawn animation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:22:15 -07:00
Hongming Wang
1d71b4e9e5 fix(canvas): bundle of UX hardening — modals, position stability, error UX, paste
Single-themed bundle of fixes accumulated while polishing the canvas
chat / agent-comms / plugins / position flows. Each piece is small;
the connective tissue is "things observable from the canvas right
panel and the org-deploy flow that surprised real users".

UI / composer
  - Legend: add close X + persisted-localStorage state + reopener
    pill; default open for first-time users.
  - SidePanel: rename "Skills" tab label → "Plugins" (single-line;
    internal panelTab enum value, component name, and store keys
    unchanged).
  - SkillsTab: registry tri-state UI (loading / error / empty) with
    actionable Retry button + 10s explicit fetch timeout. Handle
    AbortSignal.timeout's DOMException by name (TimeoutError /
    AbortError) — Chromium's "signal timed out" message wouldn't
    match the prior naive /timeout/ regex. Reset mountedRef on every
    mount: pre-existing StrictMode dev-mode bug where cleanup-only
    `current = false` was never re-set, permanently wedging every
    `if (mountedRef.current) setX(...)` guard and producing a
    "Loading…" panel that never resolved on hard refresh.
  - ChatTab: paste-image-from-clipboard via onPaste handler; unique
    monotonic-counter filenames so same-second pastes don't collide
    on name+size dedup. mime→ext map avoids `image/svg+xml`-style
    raw extensions on synthesised filenames. Bypasses the
    DataTransfer constructor so Safari < 14.1 / older Edge work.
  - ChatTab: drop stuck error toast when the WS path already
    delivered the agent reply but the HTTP path errored late
    (sendingFromAPIRef gate now covers the .catch() handler).
  - ChatTab: filter heartbeat-style internal self-messages from the
    My Chat tab so historical rows with source_id=NULL don't
    surface as user-typed input.
  - Modal portals: OrgImportPreflightModal + MissingKeysModal
    (ProviderPickerModal + AllKeysModal) now createPortal to
    document.body and clamp max-h to 80vh. Escapes the ancestor
    containing block (TemplatePalette's fixed+filtered sidebar
    re-anchored descendants' position:fixed to itself, hiding
    modals behind workspace cards). MissingKeysModal bumped to
    z-[60] for stack ordering when both modals are open.
  - OrgImportPreflightModal saveOne: ref-based microtask-safe
    in-flight gate replaces the brittle "set startValue inside a
    setState updater and read on the next line" pattern (React 18
    doesn't guarantee functional updaters run synchronously; that
    path strands `saving:true` and never calls createSecret). Same
    useRef pattern guards SkillsTab.loadRegistry against concurrent
    fires and Fast-Refresh-stranded promises; force=true parameter
    on retry click bypasses the gate.

Agent comms
  - AgentCommsPanel: derive UI-facing `flow` field instead of using
    activity_type-derived direction. Self-logged a2a_receive rows
    (source_id == workspace_id, what the agent runtime writes to log
    its own outbound delegation replies) now correctly render as
    OUTBOUND with → arrow + right-justified bubble. Previously they
    rendered "← From Self" with Restart pointing at THIS workspace.
  - AgentCommsPanel: error rows replace the unactionable
    "X failed [A2A_ERROR]" body with banner + underlying-error
    code-block + cause-hint (matched on Claude Code SDK init wedge,
    deadline-exceeded, agent-thrown exception, empty-error) +
    Restart [peer] / Open [peer] action buttons.
  - AgentCommsPanel: render text bodies through ReactMarkdown +
    remark-gfm so multi-part replies (tables, code) render properly.

Multi-part text extractor
  - extractReplyText (live A2A response in ChatTab) and
    extractResponseText (chat history loader in message-parser):
    now COLLECT from every source — top-level parts, parts.root.text,
    and artifacts — joined with "\n". Previous "first source wins"
    silently dropped multi-part replies (Hermes summary+detail,
    Claude Code long-form table). Tests cover joined-from-parts,
    joined-from-artifacts, joined-from-both.

Position stability
  - canvas-topology.buildNodesAndEdges: auto-rescue heuristic now
    accepts currentParentSizes map; uses max(initial min, currently
    grown) for the bbox check. Fixes "child jumps to weird location
    after 30s" — the periodic socket health-check rehydrate
    (silenceSec > 30) was rebuilding nodes from scratch, and the
    rescue's reliance on grid-derived initial size false-flagged
    children the user dragged into the user-grown area.
  - canvas.hydrate: pass live measured dimensions from the existing
    store into buildNodesAndEdges.
  - socket.RehydrateDedup: pure exported helper class that gates
    rehydrate calls. Two states — in-flight (in-flight Promise reused
    by concurrent callers) + post-completion window (1.5s, returns
    Promise.resolve()). Initialised with -Infinity so first call
    always passes the gate. Wired into ReconnectingSocket.rehydrate.

A2A edges
  - New A2AEdge custom React Flow edge component portals its label
    out of the SVG layer via EdgeLabelRenderer so labels (a) render
    above workspace cards instead of being hidden behind them and
    (b) accept clicks. Click selects source + switches panel to
    Activity, but only on a NEW selection (preserves current tab on
    re-click of an already-selected source).
  - buildA2AEdges output tagged type:"a2a"; edgeTypes wired in
    Canvas.tsx.

Tests
  - 14 new vitest cases across 4 files (964 → 978 passing):
    OrgImportPreflightModal saveOne single-fire / double-click,
    any-of rendering; AgentCommsPanel toCommMessage flow derivation
    in all four shapes; canvas-topology rescue respects-grown /
    rescues-genuine-drift / fallback-without-live-size; socket
    RehydrateDedup gate behaviour; message-parser multi-part
    response extraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:54:43 -07:00
Hongming Wang
65b531acf6 fix(workspace): tag self-originated A2A POSTs with X-Workspace-ID
Workspace runtime fired four classes of A2A request to the platform
without the X-Workspace-ID header that identifies the source
workspace: heartbeat self-messages, initial_prompt, idle-loop fires,
and peer-to-peer A2A from runtime tools. The platform's a2a_receive
logger keys source_id off that header — without it, every such row
was written with source_id=NULL, which the canvas's My Chat tab
filters as ?source=canvas (i.e. "user typed this") and rendered the
internal triggers as if the human user had sent them. The
"Delegation results are ready..." heartbeat trigger was visible to
end users in the chat history; delegate_task A2A calls between agents
were misclassified the same way.

Centralise the header construction in a new platform_auth helper
self_source_headers(workspace_id) that returns auth_headers() PLUS
{X-Workspace-ID: <id>}. Apply it to:

  - heartbeat.py self-message (refactored from inline header dict)
  - main.py initial_prompt POST
  - main.py idle_prompt POST
  - a2a_client.py send_a2a_message (peer A2A from runtime)
  - builtin_tools/a2a_tools.py delegate_task (was missing ALL headers)

Tests:
  - test_heartbeat.py asserts the X-Workspace-ID header is set on
    the self-message POST.
  - test_a2a_tools_module.py asserts the same on delegate_task POSTs;
    FakeClient.post mocks updated to accept the headers kwarg.

Production effect lands the moment workspace containers are rebuilt
with this code; existing rows in activity_logs keep their NULL
source_id (legacy data). The canvas-side filter (#follow-up)
covers the historical-rows case until backfill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:54:43 -07:00
Hongming Wang
01c417828d chore: re-trigger CI — staging CP has SetMaxIdleConns(0) fix from CP#259 2026-04-24 19:06:18 -07:00
Hongming Wang
560172968f chore: re-trigger CI — staging CP has CP#257 orgs UPDATE fix now 2026-04-24 16:45:16 -07:00
Hongming Wang
a7eb071e35 feat(org-templates): add ux-ab-lab + manifest entry + schema smoke test
Introduces the UX A/B Lab org template — a 7-agent cell for rapid
landing-page variant generation. The template is also the first
consumer of the new any_of env schema (ANTHROPIC_API_KEY OR
CLAUDE_CODE_OAUTH_TOKEN), so it doubles as an end-to-end fixture
for that feature.

Canvas tree (all claude-code / sonnet):

  Design Director
  ├── UX Researcher
  ├── Visual Designer
  ├── React Engineer
  ├── Deploy Engineer
  ├── A11y + SEO Auditor     ← WCAG AA + canonical/noindex gate
  └── Perf Auditor           ← Core Web Vitals gate

Template files live in their own standalone repo
(Molecule-AI/molecule-ai-org-template-ux-ab-lab, to be published);
this change adds the manifest.json entry so fresh clones + CI
populate the template via scripts/clone-manifest.sh.

Tests:
  - TestOrgTemplate_ClaudeAnyOfAuthPreflight — parses the exact
    required_env / recommended_env shape the template ships with
    via inline YAML (not on-disk, since org-templates/ is
    gitignored in this monorepo) and verifies either member
    alternative satisfies the preflight.

SEO safety built into the auditor's system prompt:
  - One canonical variant; all others canonicalise to it.
  - noindex, follow on non-canonical variants.
  - Sitemap contains only the canonical URL.
  - No robots.txt disallow (blocked pages can't emit canonical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:22:14 -07:00
Hongming Wang
ad73a56db1 feat(env-preflight): support any_of OR groups (e.g. API_KEY OR OAUTH_TOKEN)
Extends the org-import env preflight so a template can declare an
alternative: satisfy ANY one member to pass. Motivated by the
Claude-family node case where either ANTHROPIC_API_KEY or
CLAUDE_CODE_OAUTH_TOKEN unlocks the agent — forcing both was wrong.

Server (workspace-server):
  - New EnvRequirement union type with custom YAML + JSON
    (un)marshaling. Accepts scalar (strict) or {any_of: [...]} in
    both on-disk org.yaml and inline POST /org/import bodies.
  - collectOrgEnv now returns []EnvRequirement. Dedups groups by
    sorted-member signature. "Strict wins" pruning drops any-of
    groups that mention a name already declared strictly (same
    tier and cross-tier).
  - Import preflight uses EnvRequirement.IsSatisfied — scalar =
    exact match, group = any member present.
  - Empty any_of: [] rejected at parse time (never-satisfiable).
  - 14 handler tests (6 updated for the union shape, 8 new
    covering any-of satisfaction, dedup, strict-dominates-group,
    cross-tier pruning, invalid-member filtering, YAML round-trip,
    and empty-any-of rejection).

Canvas:
  - EnvRequirement = string | {any_of: string[]} with envReqMembers,
    envReqSatisfied, envReqKey helpers.
  - OrgImportPreflightModal renders strict rows and any-of groups
    via a new AnyOfEnvGroup sub-component: "Configure any one"
    banner, per-member input, ✓-satisfied indicator, and dimmed
    siblings once any member is configured so the user can still
    switch providers.
  - TemplatePalette.OrgTemplate.required_env / recommended_env
    retyped to EnvRequirement[]; passthrough to the modal
    unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:16:25 -07:00
Hongming Wang
f995b90a85 test(canvas-events): expect both pan-to-node AND fit-deploying-org on NEW root provision
Commit 5adc8a74 (part of this PR) intentionally made
molecule:fit-deploying-org fire for root-level workspaces too — it
used to only fire for children, which meant a standalone create
didn't center the viewport until the first child arrived ~2s later.

The existing regression test still expected ONLY the
molecule:pan-to-node event for a new root, so it started failing
with "expected length 1, got 2". The product behavior is correct
(centering on the root immediately is better UX); the test was
pinning the old single-dispatch shape.

Fix: assert BOTH events fire, each with the right detail payload,
so a future regression that drops either one (or duplicates) trips
the test. Single-test update, no production code change. 953/953
canvas tests pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:55:52 -07:00
Hongming Wang
5adc8a74d5 feat(canvas+org): env preflight, EmptyState parity, shared useTemplateDeploy hook
Builds on #2061. Three internally-cohesive sub-features; easiest to
read in order.

## 1. Org-level env preflight

Server
- `OrgTemplate` + `OrgWorkspace` gain `required_env: string[]` and
  `recommended_env: string[]` YAML fields.
- `GET /org/templates` walks the tree and returns the tree-union
  (deduped, sorted) of both. `collectOrgEnv` dedup prefers required
  when the same key is declared at both tiers.
- `POST /org/import` preflights against `global_secrets` WHERE
  `octet_length(encrypted_value) > 0` (empty-value rows used to be
  counted as "configured" and the per-container preflight still
  failed at start time). 412 Precondition Failed + `missing_env`
  list when required keys are absent. `force=true` bypasses with
  an audit log line. DB lookup failure now returns 500 (was:
  silent fall-through that defeated the guard). Env-var NAMES
  validated against `^[A-Z][A-Z0-9_]{0,127}$` so a malicious
  template can't ship pathological names into the UI or DB.

Canvas
- New `OrgImportPreflightModal`: red "Required" section (blocking)
  and yellow "Recommended" section (non-blocking, import stays
  enabled, shows live missing-count next to the Import button).
- Per-key password input → `PUT /settings/secrets` → strike-through
  on save. Functional `setDrafts` throughout (no stale-closure
  clobbers on rapid successive saves). `useEffect` seed keyed on a
  sorted-join string signature so a parent re-render with a new
  array identity doesn't clobber typed inputs.
- `TemplatePalette.handleImport` branches: zero env declarations →
  straight to import; any declarations → fetch configured global
  secret keys, open the modal.

Tests (Go): `TestCollectOrgEnv_*` (5) cover union-across-levels,
required-wins-over-recommended (including same-struct), dedup,
empty, invalid-name rejection.

## 2. EmptyState parity with TemplatePalette

The "Deploy your first agent" grid used to call `POST /workspaces`
with no preflight while the sidebar palette ran
`checkDeploySecrets` + `MissingKeysModal` first. Same template
deployed two different ways → first-run users saw containers boot
in `failed` state without guidance. Now both surfaces share one
preflight + modal handshake.

EmptyState's previous `interface Template` dropped `runtime`,
`models`, and `required_env` — silently discarding exactly the
fields the preflight needs. `Template` now lives in
`deploy-preflight.ts` and is imported from there by both surfaces.

## 3. useTemplateDeploy hook

With the preflight + modal wiring now duplicated across
EmptyState + TemplatePalette + (going forward) any third surface,
extracted the pattern into `canvas/src/hooks/useTemplateDeploy.tsx`:

  const { deploy, deploying, error, modal } = useTemplateDeploy({
    canvasCoords: ...,   // optional, default random
    onDeployed: (id) => ...,
  });

Closes three drift surfaces that the duplication had created:
- `resolveRuntime` id→runtime fallback table (moved to
  `deploy-preflight.ts`). EmptyState had a narrower fallback that
  would have silently disagreed with the palette on any future id
  needing a non-identity mapping.
- `checkDeploySecrets` call signature. One owner.
- `MissingKeysModal` JSX wiring. One owner.

Narrow try/catch around `checkDeploySecrets` so a preflight network
failure clears `deploying` and surfaces via `setError` instead of
stranding the button forever. `modal: ReactNode` (not a
`renderModal()` function) — the previous memoization bought
nothing since consumers called it inline every render. Named
`MissingKeysInfo` interface for the state shape.

## 4. Viewport auto-fit user-pan gate fix

During org deploy the canvas was meant to pan+zoom to follow each
arriving workspace (`molecule:fit-deploying-org` event → debounced
fitView). In practice the fit stayed stuck on wherever the first
fit landed.

Root cause: React Flow v12 fires `onMoveEnd` with a truthy `event`
at the END of a programmatic `fitView` animation. The original
"respect-user-pan" gate stamped `userPannedAtRef` in `onMoveEnd`,
so our own fit completing looked like a user pan, and every
subsequent auto-fit short-circuited for the rest of the deploy.

Fix: stop trusting `onMoveEnd` for user-intent detection. Register
explicit `wheel` + `pointerdown` listeners on `document` with
capture phase and `target.closest('.react-flow__pane')` filter.
Capture-phase immunity to `stopPropagation`; pane-filter rejects
toolbar / modal / side-panel clicks (the old `window` fallback
caught those). `onMoveEnd` simplified to only drive the debounced
viewport save.

Also: fit event dispatched on root arrivals (not just children),
so the canvas centers on the just-landed root immediately instead
of waiting ~2s for the first child. Animation 600ms → 400ms so
successive per-arrival fits don't pile up visually. End-state fit
stays at 1200ms — intentional asymmetry ("settling" vs
"tracking"), documented in code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:15:33 -07:00
Hongming Wang
a34121d451 fix(a2a_executor): remove shadowing local Part import that broke streaming
Python scoping rule: any name assigned anywhere in a function body
is local for the entire body. The outbound-files block at ~L442
had `from a2a.types import ... Part ...`, which made `Part` a local
name throughout the execute() function. The astream_events loop at
L358 — which runs BEFORE that import — then raised:

  UnboundLocalError: cannot access local variable 'Part' where it
  is not associated with a value

Every streaming A2A reply died with "Agent error: cannot access
local variable 'Part' where it is not associated with a value"
instead of the actual agent text. 5 tests caught it:
  - test_streaming_plain_string_content
  - test_streaming_anthropic_content_blocks
  - test_non_stream_events_ignored
  - test_core_execute_on_chat_model_end_captures_last_ai_message
  - test_core_execute_pii_redaction_when_pii_found

Fix: drop `Part` from the function-scope import (it is already
imported at module level on line 42) and leave a comment pinning
the rationale so a future refactor doesn't re-introduce the shadow.

All 43 test_a2a_executor tests pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:21:04 -07:00
Hongming Wang
425df5e5a9 merge(staging): resolve conflicts + fix 7 test regressions on top of #2061
- Merge origin/staging into fix/canvas-multilevel-layout-ux. 18 files
  auto-merged (mostly canvas/tabs/chat and workspace-server handlers
  the earlier DIRTY marker was stale relative to current staging).

- Fix 7 test failures surfaced by the merge:

  1. Canvas.pan-to-node.test.tsx — mockGetIntersectingNodes was
     inferred as vi.fn(() => never[]); mockReturnValueOnce of a node
     object failed type check. Explicit return-type annotation.

  2. Canvas.pan-to-node.test.tsx + Canvas.a11y.test.tsx — Canvas.tsx
     reads deletingIds.size (new multilevel-layout state). Both mock
     stores lacked deletingIds; added new Set<string>() to each.

  3. canvas-batch-partial-failure.test.ts — makeWS() built a wire-
     format WorkspaceData (snake_case, with x/y/uptime_seconds). The
     store's node.data is now WorkspaceNodeData (camelCase, no wire-
     only fields). Rewrote makeWS to produce WorkspaceNodeData and
     updated 5 call-site casts. No assertions changed.

  4. ConfigTab.hermes.test.tsx — two tests pinned pre-#2061 behavior
     that the PR intentionally inverts:

       a. "shows hermes-specific info banner" — RUNTIMES_WITH_OWN_CONFIG
          now contains only {"external"}, so the banner is no longer
          shown for hermes. Inverted assertion: now pins ABSENCE of
          the banner, with a comment noting the inversion.

       b. "config.yaml runtime wins over DB" — priority reversed:
          DB is now authoritative so the tier-on-node badge matches
          the form. Inverted scenario: DB=hermes + yaml=crewai →
          form shows hermes. Switched test's DB runtime off langgraph
          because the dropdown collapses langgraph into an empty-
          valued "default" option that would hide the win signal.

- No production code changed — this commit is staging merge + test
  realignment only. 953/953 canvas tests pass. tsc --noEmit clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:50:39 -07:00
Hongming Wang
94d9331c76 feat(canvas+platform): chat attachments, model selection, deploy/delete UX
Session's accumulated UX work across frontend and platform. Reviewable
in four logical sections — diff is large but internally cohesive
(each section fixes a gap the next one depends on).

## Chat attachments — user ↔ agent file round trip

- New POST /workspaces/:id/chat/uploads (multipart, 50 MB total /
  25 MB per file, UUID-prefixed storage under
  /workspace/.molecule/chat-uploads/).
- New GET /workspaces/:id/chat/download with RFC 6266 filename
  escaping and binary-safe io.CopyN streaming.
- Canvas: drag-and-drop onto chat pane, pending-file pills,
  per-message attachment chips with fetch+blob download (anchor
  navigation can't carry auth headers).
- A2A flow carries FileParts end-to-end; hermes template executor
  now consumes attachments via platform helpers.

## Platform attachment helpers (workspace/executor_helpers.py)

Every runtime's executor routes through the same helpers so future
runtimes inherit attachment awareness for free:
- extract_attached_files — resolve workspace:/file:///bare URIs,
  reject traversal, skip non-existent.
- build_user_content_with_files — manifest for non-image files,
  multi-modal list (text + image_url) for images. Respects
  MOLECULE_DISABLE_IMAGE_INLINING for providers whose vision
  adapter hangs on base64 payloads (MiniMax M2.7).
- collect_outbound_files — scans agent reply for /workspace/...
  paths, stages each into chat-uploads/ (download endpoint
  whitelist), emits as FileParts in the A2A response.
- ensure_workspace_writable — called at molecule-runtime startup
  so non-root agents can write /workspace without each template
  having to chmod in its Dockerfile.

Hermes template executor + langgraph (a2a_executor.py) + claude-code
(claude_sdk_executor.py) all adopt the helpers.

## Model selection & related platform fixes

- PUT /workspaces/:id/model — was 404'ing, so canvas "Save"
  silently lost the model choice. Stores into workspace_secrets
  (MODEL_PROVIDER), auto-restarts via RestartByID.
- applyRuntimeModelEnv falls back to envVars["MODEL_PROVIDER"]
  so Restart propagates the stored model to HERMES_DEFAULT_MODEL
  without needing the caller to rehydrate payload.Model.
- ConfigTab Tier dropdown now reads from workspaces row, not the
  (stale) config.yaml — fixes "badge shows T3, form shows T2".

## ChatTab & WebSocket UX fixes

- Send button no longer locks after a dropped TASK_COMPLETE —
  `sending` no longer initializes from data.currentTask.
- A2A POST timeout 15 s → 120 s. LLM turns routinely exceed 15 s;
  the previous default aborted fetches while the server was still
  replying, producing "agent may be unreachable" on success.
- socket.ts: disposed flag + reconnectTimer cancellation + handler
  detachment fix zombie-WebSocket in React StrictMode.
- Hermes Config tab: RUNTIMES_WITH_OWN_CONFIG drops 'hermes' —
  the adaptor's purpose IS the form, banner was contradictory.
- workspace_provision.go auto-recovery: try <runtime>-default AND
  bare <runtime> for template path (hermes lives at the bare name).

## Org deploy/delete animation (theme-ready CSS)

- styles/theme-tokens.css — design tokens (durations, easings,
  colors). Light theme overrides by setting only the deltas.
- styles/org-deploy.css — animation classes + keyframes, every
  value references a token. prefers-reduced-motion respected.
- Canvas projects node.draggable=false onto locked workspaces
  (deploying children AND actively-deleting ids) — RF's
  authoritative drag lock; useDragHandlers retains a belt-and-
  braces check.
- Organ cancel button (red pulse pill on root during deploy)
  cascades via existing DELETE /workspaces/:id?confirm=true.
- Auto fit-view after each arrival, debounced 500 ms so rapid
  sibling arrivals coalesce into one fit (previous per-event
  fit made the viewport lurch continuously).
- Auto-fit respects user-pan — onMoveEnd stamps a user-pan
  timestamp only when event !== null (ignores programmatic
  fitView) so auto-fits don't self-cancel.
- deletingIds store slice + useOrgDeployState merge gives the
  delete flow the same dim + non-draggable treatment as deploy.
- Platform-level classNames.ts shared by canvas-events +
  useCanvasViewport (DRY'd 3 copies of split/filter/join).

## Server payload change

- org_import.go WORKSPACE_PROVISIONING broadcast now includes
  parent_id + parent-RELATIVE x/y (slotX/slotY) so the canvas
  renders the child at the right parent-nested slot without doing
  any absolute-position walk. createWorkspaceTree signature gains
  relX, relY alongside absX, absY; both call sites updated.

## Tests

- workspace/tests/test_executor_helpers.py — 11 new cases
  covering URI resolution (including traversal rejection),
  attached-file extraction (both Part shapes), manifest-only
  vs multi-modal content, large-image skip, outbound staging,
  dedup, and ensure_workspace_writable (chmod 777 + non-root
  tolerance).
- workspace-server chat_files_test.go — upload validation,
  Content-Disposition escaping, filename sanitisation.
- workspace-server secrets_test.go — SetModel upsert, empty
  clears, invalid UUID rejection.
- tests/e2e/test_chat_attachments_e2e.sh — round-trip against
  a live hermes workspace.
- tests/e2e/test_chat_attachments_multiruntime_e2e.sh — static
  plumbing check + round-trip across hermes/langgraph/claude-code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:27:51 -07:00
Hongming Wang
2dbd06d52e
Merge pull request #2055 from Molecule-AI/feat/lark-channel-first-class-v2
feat(channels): first-class Lark/Feishu support via schema-driven config
2026-04-24 19:57:57 +00:00
rabbitblood
998cd03265 fix(tabs-a11y): mock config_schema on adapter response
Schema-driven ChannelsTab renders no inputs when config_schema is
absent — the test's bare {type, display_name} mock mismatched the
real API shape and every getByLabelText("Bot Token") failed.

Mock now mirrors GET /channels/adapters with the Telegram schema
(bot_token password + chat_id text) so the a11y assertions run
against the actual rendered form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 12:04:51 -07:00
molecule-ai[bot]
92a0c0073d
Merge pull request #2058 from Molecule-AI/chore/canvas-node22-upgrade
chore(canvas): upgrade node:20-alpine → node:22-alpine
2026-04-24 19:04:25 +00:00
molecule-ai[bot]
17f29e874a
Merge pull request #2029 from Molecule-AI/fix/canvas-a11y-tabs-v2
fix(canvas/a11y): add type=button to tab toolbar and settings buttons
2026-04-24 19:01:24 +00:00
molecule-ai[bot]
02406ea823
Merge pull request #2024 from Molecule-AI/fix/gh-identity-plugin-role-env-v2
feat(#1957): wire gh-identity plugin into workspace-server
2026-04-24 19:01:22 +00:00
Hongming Wang
fc2e6150d3
Merge pull request #2056 from Molecule-AI/fix/compliance-default-owasp-agentic
fix(compliance): flip default mode to owasp_agentic (detect-only)
2026-04-24 18:56:00 +00:00
molecule-ai[bot]
58745145cb
Merge pull request #2038 from Molecule-AI/hotfix/audit34-to-main
hotfix: Audit #34 fixes to main
2026-04-24 18:55:39 +00:00
1e5fc48acb chore(canvas): upgrade node:20-alpine → node:22-alpine
Node.js 20 reaches EOL 2026-09 and actions/checkout@v4 emits
Node.js 20 deprecation warnings on GitHub Actions (Node 24 forced
2026-06-02). Next.js 15.1 is fully compatible with Node 22.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 18:54:30 +00:00
Hongming Wang
9af058b82d fix(compliance): flip default mode to owasp_agentic (detect-only)
Prior state: compliance.mode default was "" (fully off) and no template
in the repo set it explicitly — so prompt-injection detection, PII
redaction, and agency-limit checks were silently disabled on every
live workspace, despite the machinery being present in
workspace/builtin_tools/compliance.py.

This was surfaced during a 2026-04-24 review of the A2A inbound path:
a2a_executor.py gates three security checks on
  _compliance_cfg.mode == "owasp_agentic"
and default config never matches, so every A2A message skipped all three.

Fix: default is now owasp_agentic + prompt_injection=detect. Detect mode
logs injection attempts as audit events without blocking — no UX cost,
just visibility. Operators who want stricter enforcement set
`prompt_injection: block` per workspace. Operators who genuinely want
compliance fully off can set `mode: ""` (not recommended; documented).

Changes:
- ComplianceConfig.mode default: "" → "owasp_agentic"
- Yaml parser fallback default: "" → "owasp_agentic" (must match dataclass)
- Docstring updated with rationale + opt-out snippet

Tests: 66/66 test_compliance.py + test_a2a_executor.py pass. 19/19
test_config.py pass. The one test asserting compliance_mode == "" is
for the "config load failed" fallback path (different from the default
config path) — correctly unchanged.

Security posture improvement: prompt-injection detection is now always
on for every workspace created after this ships, with zero behavior
change for legitimate inputs. Block mode remains an opt-in when an
operator wants to actively reject injection attempts rather than just
log them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:52:09 -07:00
Hongming Wang
04e60e7303
Merge pull request #2052 from Molecule-AI/fix/canvas-provisioning-timeout-runtime-aware
fix(canvas): runtime-aware provisioning-timeout threshold (hermes 12min vs default 2min)
2026-04-24 18:51:46 +00:00
rabbitblood
00265d7028 feat(channels): first-class Lark/Feishu support via schema-driven config
Lark adapter was already implemented in Go (lark.go — outbound Custom Bot
webhook + inbound Event Subscriptions with constant-time token verify),
but the Canvas connect-form hardcoded a Telegram-shaped pair of inputs
(bot_token + chat_id). Selecting "Lark / Feishu" from the dropdown
silently sent the wrong field names — there was no way to enter a
webhook URL.

Fix: move form shape to the server.

- Add `ConfigField` struct + `ConfigSchema()` method to the
  `ChannelAdapter` interface. Each adapter declares its own fields with
  label/type/required/sensitive/placeholder/help.
- Implement per-adapter schemas:
  - Lark: webhook_url (required+sensitive) + verify_token (optional+sensitive)
  - Slack: bot_token/channel_id/webhook_url/username/icon_emoji
  - Discord: webhook_url + optional public_key
  - Telegram: bot_token + chat_id (unchanged UX, keeps Detect Chats)
- Change `ListAdapters()` to return `[]AdapterInfo` with config_schema
  inline. Sorted deterministically by display name so UI ordering is
  stable across Go's random map iteration.
- Update the 3 existing `ListAdapters` test sites to struct access.

Canvas (`ChannelsTab.tsx`):
- Replace the two hardcoded bot_token/chat_id inputs with a single
  schema-driven `SchemaField` component. Renders one input per field in
  the order the adapter returns them.
- Form state becomes `formValues: Record<string,string>` keyed by
  `ConfigField.key`. Values reset on platform-switch so stale
  Telegram credentials can't leak into a new Lark channel.
- "Detect Chats" stays but only renders for platforms in
  `SUPPORTS_DETECT_CHATS` (Telegram only — the only provider with
  getUpdates).
- Only schema-known keys are posted in `config`, scrubbing any stale
  values from previous platform selections.

Regression tests:
- `TestLark_ConfigSchema` locks in the 2-field Lark contract with the
  required/sensitive flags correctly set.
- `TestListAdapters_IncludesLark` confirms registry wiring + schema
  survives round-trip through ListAdapters.

Known pre-existing `TestStripPluginMarkers_AwkScript` failure in
internal/handlers is unrelated to this change (verified via stash+test
on clean staging).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:51:15 -07:00
Hongming Wang
0b237ed9dd refactor(canvas): extract runtime profiles to @/lib/runtimeProfiles
Preparation for a "hundreds of runtimes" plugin ecosystem. Keeping the
runtime-specific UX knobs in-line inside ProvisioningTimeout scales badly
— every new runtime would require editing a component, not just adding a
table entry. Other components (create-workspace dialog, workspace card
tooltips, etc.) will want the same runtime metadata.

Changes:

- New file `canvas/src/lib/runtimeProfiles.ts` owns:
  * `RuntimeProfile` type — structural shape, every field optional so
    new runtimes can partially-fill without breaking consumers.
  * `DEFAULT_RUNTIME_PROFILE` — 2-min default floor (docker-fast).
  * `RUNTIME_PROFILES` — named overrides (currently: hermes 12 min).
  * `WorkspaceRuntimeOverrides` — interface for server-provided
    per-workspace overrides, so operators can tune via template
    manifest / workspace metadata without a canvas release.
  * `getRuntimeProfile()` — resolver with
    overrides → profile → default priority.
  * `provisionTimeoutForRuntime()` — convenience wrapper.

- `ProvisioningTimeout.tsx` now delegates to the profile module.
  `DEFAULT_PROVISION_TIMEOUT_MS` re-exported for legacy test importers.

- Tests: 16/16 (up from 9 before the first fix). Adds pinning for:
  * overrides > profile > default priority chain
  * "every entry in RUNTIME_PROFILES resolves to a number" contract
  * backward-compat export

Adding a new slow runtime is now one table entry in
`canvas/src/lib/runtimeProfiles.ts` with a mandatory `WHY` comment.
Moving to server-driven profiles later is a ~10-line change (the
resolver already threads WorkspaceRuntimeOverrides through).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:48:39 -07:00
molecule-ai[bot]
1a27370e7b
Merge pull request #2051 from Molecule-AI/fix/canvas-embeddedteam-removal-and-canvasorbearer-return
refactor(canvas): remove unused EmbeddedTeam component from WorkspaceNode
2026-04-24 18:47:16 +00:00
Hongming Wang
9597d262ca fix(canvas): runtime-aware provisioning-timeout threshold
Hermes workspaces cold-boot in 8-13 min (ripgrep + ffmpeg + node22 +
hermes-agent source build + Playwright + Chromium ~300MB). The canvas's
2-min hardcoded "Provisioning Timeout" warning fired at ~2min and told
users their workspace was "stuck" while it was still mid-install. Users
hit Retry, triggering fresh cold boots and cancelling healthy workspaces.

User-facing symptom (reported 2026-04-24 18:35Z): hermes workspace showed
"has been provisioning for 3m 15s — it may have encountered an issue"
with Retry + Cancel buttons, while the EC2 was installing node_modules.

Fix:
- Keep DEFAULT_PROVISION_TIMEOUT_MS = 120_000 (2min) — correct for fast
  docker runtimes (claude-code, langgraph, crewai) where cold boot is
  30-90s.
- Add RUNTIME_TIMEOUT_OVERRIDES_MS = { hermes: 720_000 } (12min).
  Aligns with tests/e2e/test_staging_full_saas.sh's
  PROVISION_TIMEOUT_SECS=900 (15min) so UI warns shortly before the
  backend itself gives up.
- New timeoutForRuntime() resolves the base; per-node lookup in the
  check-timeouts interval so a mixed batch (1 hermes + 2 langgraph) uses
  the right threshold for each.
- timeoutMs prop is now optional. Undefined → per-runtime lookup; a
  number → forces a single threshold for every workspace (tests use this
  for deterministic behavior).

Tests: 4 new cases pinning the runtime-aware resolution, including a
guard that catches future regressions that would weaken hermes's budget.
Existing tests unchanged (they import DEFAULT_PROVISION_TIMEOUT_MS which
still exports 120_000).

13/13 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:46:09 -07:00
molecule-ai[bot]
345dc9c2b4
Merge pull request #2033 from Molecule-AI/fix/validateagenturl-testnet-blocklist
fix(registry): block RFC 5737 TEST-NET and RFC 3849 documentation IPs
2026-04-24 18:42:18 +00:00
molecule-ai[bot]
312af5a94a
Merge pull request #2020 from Molecule-AI/fix/gh-identity-plugin-role-env
feat(#1957): wire gh-identity plugin into workspace-server
2026-04-24 18:42:14 +00:00
Molecule AI Core Platform Lead
49fc97e6e4 refactor(canvas): remove unused EmbeddedTeam component from WorkspaceNode
EmbeddedTeam was defined in WorkspaceNode.tsx but had no call site —
TeamMemberChip (which is called directly) covers the same rendering
responsibility. The function was stranded after a prior refactor and
was flagged by github-code-quality on PR #1989 (merged 2026-04-24T14:09Z
without this cleanup because the token died before push).

Removes 25 lines of dead code. MAX_NESTING_DEPTH is kept — it is used
by TeamMemberChip at line 498.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 18:30:36 +00:00
Hongming Wang
40cfc55784 feat(#1957): wire gh-identity plugin into workspace-server
Ships the monorepo side of molecule-core#1957 (agent identity collapse).
Companion to molecule-ai-plugin-gh-identity (new repo, merged-and-tagged
separately).

Changes:
- manifest.json: add gh-identity plugin to Tier 1 registry
- workspace-server/go.mod: require github.com/Molecule-AI/molecule-ai-plugin-gh-identity
- cmd/server/main.go: build a shared provisionhook.Registry, register
  gh-identity first (always), then github-app-auth (gated on GITHUB_APP_ID)
- workspace_provision.go: propagate workspace.Role into
  env["MOLECULE_AGENT_ROLE"] before calling the mutator chain, so the
  gh-identity plugin can see which agent is booting
- provisionhook/mutator.go: add Registry.Mutators() accessor so
  individual-plugin registries can be merged onto a shared one at boot

Boot log gains a line like:
  env-mutator chain: [gh-identity github-app-auth]

Effect per workspace:
- env contains MOLECULE_AGENT_ROLE, MOLECULE_OWNER, MOLECULE_ATTRIBUTION_BADGE,
  MOLECULE_GH_WRAPPER_B64, MOLECULE_GH_WRAPPER_SHA
- Each workspace template's install.sh can decode + install the wrapper at
  /usr/local/bin/gh, intercepting @me assignment and prepending agent
  attribution on PR/issue creates

Does not break existing workspaces — absent workspace.role, the plugin is
a no-op. Absent install.sh updates in each template, the env vars are
simply unused.

Follow-up template PRs (hermes, claude-code, langgraph, etc.) each add
~15 lines to install.sh to decode + install the wrapper.

Ref: #1957

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:28:18 +00:00
a2a6121a3f fix(registry): block RFC 5737 TEST-NET and RFC 3849 documentation IPs
PR #2021 follow-up: add TEST-NET reserved ranges and IPv6 documentation
prefix to validateAgentURL blocklist in all SaaS/self-hosted modes.

RFC 5737 reserves 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24 for
documentation and example code — no production agent has a legitimate
reason to use them. RFC 3849 designates 2001:db8::/32 as the IPv6
documentation prefix. All are blocked unconditionally.

Also adds 8 regression test cases covering each blocked range.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 18:27:07 +00:00
molecule-ai[bot]
f5d44eba8c
Merge pull request #2048 from Molecule-AI/fix/active-tasks-cancellation-stuck-2026
fix(executors): active_tasks stuck at 1 under CancelledError — queue drain blocked (#2026)
2026-04-24 18:17:03 +00:00
molecule-ai[bot]
90def3f3b9
Merge pull request #2040 from Molecule-AI/hotfix/canvasorbearer-return-main
hotfix(middleware): P0 — add missing return after AbortWithStatusJSON in CanvasOrBearer
2026-04-24 18:16:05 +00:00
f11b1703f0 hotfix(wsauth+restart_template): CanvasOrBearer return + CWE-22 path traversal guard
- wsauth_middleware: add missing return after AbortWithStatusJSON in
  CanvasOrBearer final else branch (CRITICAL auth bypass)
- restart_template: apply sanitizeRuntime before filepath.Join to
  prevent CWE-22 path traversal via dbRuntime field
2026-04-24 18:12:07 +00:00
molecule-ai[bot]
6b557082d5
Merge branch 'staging' into hotfix/canvasorbearer-return-main 2026-04-24 18:10:35 +00:00
Hongming Wang
4b0c85b2a4
Merge pull request #2046 from Molecule-AI/fix/scheduler-wedge-2026
fix(scheduler): prevent wedge on invalid UTF-8 + unbounded DB ops (#2026)
2026-04-24 18:05:33 +00:00