molecule-core

Author	SHA1	Message	Date
Molecule AI Community Manager	84f676f85c	docs(community): Phase 34 Discord-style community announcement Community announcement for Phase 34 GA (April 30, 2026). Four features: Tool Trace, Platform Instructions, Partner API Keys, SaaS Federation v2. Discord-format, ~550 words, community-native tone. Addresses Molecule-AI/molecule-core#1836. Co-Authored-By: Claude Community Manager <noreply@anthropic.com>	2026-04-24 01:44:26 +00:00
Molecule AI Community Manager	899eeabacf	docs(community): Phase 34 Discord-style community announcement Community announcement for Phase 34 GA (April 30, 2026). Four features: Tool Trace, Platform Instructions, Partner API Keys, SaaS Federation v2. Discord-format, ~550 words, community-native tone. Addresses Molecule-AI/molecule-core#1836. Co-Authored-By: Claude Community Manager <noreply@anthropic.com>	2026-04-24 01:44:26 +00:00
Molecule AI Community Manager	9bc24f7ee6	docs(community): Phase 34 launch content — Reddit/HN/Discord posts + FAQ Phase 34 GA: April 30, 2026. Four launch files: - phase34-reddit-post.md: r/MachineLearning self-post, tool_trace-led, ~400w - phase34-hn-post.md: Show HN title + body + first-reply technical comment - phase34-discord-announcement.md: @devs ping, bullet-point feature summary - phase34-community-faq.md: top-10 pre-brief for DevRel + Support. Partner name placeholder "Acme Corp" — swap when PM confirms. Co-Authored-By: Claude Community Manager <noreply@anthropic.com>	2026-04-24 01:44:26 +00:00
Molecule AI Plugin-Dev	61c5f8ad9a	feat(plugin): implement MCPServerAdaptor (issue #847) Rule-of-three threshold met: 4 plugin proposals (molecule-firecrawl #512, molecule-github-mcp #520, molecule-browser-use #553, mcp-connector #573) all independently shipped the same mcpServers-adapter pattern. Adds MCPServerAdaptor to builtins.py — plugins wrapping an MCP server now declare `from plugins_registry.builtins import MCPServerAdaptor as Adaptor` in their per-runtime adapter file. The adaptor: - Merges mcpServers from settings-fragment.json into <configs>/.claude/settings.json (deep-merge so multiple plugins' servers coexist). - Optionally ships skills/rules/setup.sh via AgentskillsAdaptor delegation. - On uninstall: removes skills/rules but intentionally leaves mcpServers entries in settings.json (users may share configs with other tools or have manually curated entries). Also fixes _deep_merge_hooks: non-hook top-level keys that are dicts (e.g. mcpServers) are now deep-merged with existing values instead of being skipped via setdefault. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 01:42:13 +00:00
Hongming Wang	d359390f83	fix(canvas): parent auto-fit sizing + rescue out-of-bounds children Two playability bugs in the new flat-cards layout: 1. On first load or fresh org import a parent had no explicit width or height, so children whose stored position sat inside their (eventual) parent's rectangle rendered visually outside the smaller default parent box. Compute a parent starting size in canvas-topology: • 2-column grid of child-default footprints + header/side padding • Grows per child count (2→1 row, 3-4→2 rows, etc.) and stamp it onto the Node's width/height so the first paint already contains every child. 2. If a child's stored relative position actually falls outside the parent's computed bounds (legacy org-imports at 0,0, pre-refactor absolute coordinates, manually-nudged rows), assign that child a deterministic default grid slot inside the parent instead. Runtime cascade: added growParentsToFitChildren to onNodesChange so when the user drags or resizes a child past the parent's current bounds, the parent grows to contain it (+padding). Miro/FigJam-style frame auto-fit — grow-only, never shrinks under the user's manual resize. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:29:04 -07:00
Hongming Wang	cc194f0b7e	refactor(canvas): flat workspace cards with React Flow native parenting Every workspace now renders as a first-class card on the canvas regardless of parent_id. The old "parent card contains mini TeamMember chips" layout is gone — if B is parented to A, B renders as a full card inside A's coordinate space using React Flow's `parentId` binding, so moving A carries B along and children have the same detail + actions as root cards. Details: - canvas-topology.ts: topologically sort parents before children (React Flow ordering requirement), compute each child's RF-native parentId + relative position on load. DB keeps absolute x/y; the abs→rel conversion happens here, reverse translation in Canvas.onNodeDragStop before savePosition PATCHes the DB. - WorkspaceNode.tsx: delete the EmbeddedTeam + TeamMemberChip blocks, simplify the size classes, and add NodeResizer (visible when selected) so users can drag any edge/corner to grow or shrink. Parent cards default to a larger min size so nested children have breathing room. - Canvas.tsx drop targeting rewritten: bounds-based hit test against each node's measured absolute bbox, deepest match wins. Fixes two prior bugs at once — dropping onto Claude Code with a nested same-named Hermes no longer picks the wrong node, and the target can now be a nested workspace when that's where the pointer actually released. - canvas.ts nestNode + removeNode: translate position between old and new parent's absolute origin on nest/unnest so the card doesn't jump, and re-point the RF `parentId` alongside `data.parentId` on reparent. - Tests: hidden-flag assertions replaced with parentId checks; obsolete TeamMemberChip a11y/eject tests deleted (the UI component no longer exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:18:44 -07:00
Hongming Wang	1265bcbec6	Merge pull request #1921 from Molecule-AI/fix/1877-token-rotation-race fix(#1877): close token-rotation race on restart — Option A+Option B	2026-04-23 17:51:13 -07:00
Hongming Wang	8a07cf4035	fix(canvas): skip already-nested workspaces as drop targets Dragging one workspace onto another could pick a nested child as the "nearest" drop target instead of the visible parent card the user actually hovered. The effect: dropping a free-floating Hermes Agent onto a Claude Code Agent that already had a Hermes Agent nested inside showed "Move 'Hermes Agent' inside 'Hermes Agent'?" — the confirmation referenced the nested same-named child, not Claude Code. Why: getIntersectingNodes returns every overlapping node, including hidden=true children that render inside their parent's card. The parent and child share bounding boxes, so the child often "won" the nearest-distance check. Filter them out at the source: a node that's already got a parentId (or is hidden) is never a valid top-level drop target. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:49:01 -07:00
Molecule AI Dev Lead	9cd4e06a78	feat(ci): run E2E Staging Canvas on staging branch pushes Add `staging` to push/pull_request branches in e2e-staging-canvas.yml so the auto-promote gate check (`--event push --branch staging`) can find a completed run for this workflow. Without this, the E2E Staging Canvas gate is structurally impossible to satisfy from staging pushes. Mirrors what PR #1891 does for e2e-api.yml — completing the two-part fix for the auto-promote gate gap (issue tracking: auto-promote blocked because both E2E gate workflows only fired on main). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 17:47:51 -07:00
molecule-ai[bot]	946dc574cf	feat(ci): run E2E API smoke test on staging branch Adds branches: [main, staging] to e2e-api.yml triggers so the auto-promote workflow can see E2E API status on staging SHA. Without this, the promoter gate for E2E API always reports missing and auto-promotion is permanently blocked.	2026-04-23 17:47:47 -07:00
Molecule AI Core-BE	88c929875e	fix(#1877): nil provisioner guard in issueAndInjectToken Fix panic in TestIssueAndInjectToken_HappyPath where h.provisioner is nil (the handler was created without a real provisioner in unit tests). Add nil guard so the pre-write step is skipped gracefully — token is still injected into ConfigFiles as before, and the runtime-side 401 retry handles any race. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 17:47:18 -07:00
Molecule AI Core-BE	b5e2142c46	fix(#1877): close token-rotation race on restart — Option A+Option B combined Platform side (Option B): - provisioner.go: add WriteAuthTokenToVolume() — writes .auth_token to the Docker named volume BEFORE ContainerStart using a throwaway alpine container, eliminating the race window where a restarted container could read a stale token before WriteFilesToContainer writes the new one. - workspace_provision.go: call WriteAuthTokenToVolume() in issueAndInjectToken as a best-effort pre-write before the container starts. Runtime side (Option A): - heartbeat.py: on HTTPStatusError 401 from /registry/heartbeat, call refresh_cache() to force re-read of /configs/.auth_token from disk, then retry the heartbeat once. Fall through to normal failure tracking if the retry also fails. - platform_auth.py: add refresh_cache() which discards the in-process _cached_token and calls get_token() to re-read from disk. Together these eliminate the >1 consecutive 401 window described in issue #1877. Pre-write (B) is the primary fix; runtime retry (A) is the self-healing fallback for any residual race. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 17:47:18 -07:00
Hongming Wang	9ce8d97448	test: regression guard for #1738 — cp-provisioner uses real instance_id Pins the fix-invariants from PR #1738 (merged 2026-04-23) against regression. Pre-fix, `CPProvisioner.Stop` and `IsRunning` both passed the workspace UUID as the `instance_id` query param: `url := fmt.Sprintf("%s/cp/workspaces/%s?instance_id=%s", baseURL, workspaceID, workspaceID)` (the last argument should be the real i-* ID). AWS rejected the call downstream with InvalidInstanceID.Malformed, which orphaned the EC2 instance, and the next provision hit InvalidGroup.Duplicate on the leftover SG — full Save & Restart cascade failure. ## Tests added - TestStop_UsesRealInstanceIDNotWorkspaceUUID: stub resolveInstanceID to return an i-* ID, assert the CP request's instance_id query param carries that i-* value (not the workspace UUID). - TestStop_NoInstanceIDSkipsCPCall: empty DB lookup → no CP call at all (idempotent). Guards against re-introducing the "call CP with '' and let AWS reject" footgun. - TestIsRunning_UsesRealInstanceIDNotWorkspaceUUID: mirror for the /cp/workspaces/:id/status path — same bug shape. All 3 pass on current staging (which has the fix). Reverting either Stop or IsRunning to the pre-#1738 shape causes these to fail loudly. Extends molecule-core#1902's regression suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:45:13 -07:00
Hongming Wang	5ebe6ccb33	test: regression guards for 2026-04-23 hermes + CP bug wave Three complementary regression tests for the chain of P0s fixed today. Each targets a specific bug class that reached production, and will fire loudly if any of them regress. ## 1. E2E A2A assertion enhancements (tests/e2e/test_staging_full_saas.sh) The existing A2A check looked for "error\|exception" in the response text, which was too broad and missed the actual error patterns we hit. Now matches each known error class individually with a diagnostic fail message pointing at the exact bug: - "[hermes-agent error 401]" → hermes #12 (API_SERVER_KEY) - "hermes-agent unreachable" → gateway process died - "model_not_found" → hermes #13 (model prefix) - "Encrypted content is not supported" → hermes #14 (api_mode) - "Unknown provider" → bridge PROVIDER misconfig Also asserts the response contains the PONG token the prompt asked for — catches silent-truncation/echo regressions. ## 2. Hermes install.sh bridge shell harness (tools/test-hermes-bridge.sh) 4 scenarios, 16 assertions total, all offline (no docker, no network): - openai-bridge-happy: OPENAI_API_KEY + openai/gpt-4o → provider=custom, model="gpt-4o" (prefix stripped), api_mode=chat_completions - operator-custom-wins: explicit HERMES_CUSTOM_* → bridge skipped - openrouter-not-touched: OPENROUTER_API_KEY → provider=openrouter, slug kept - non-prefixed-model: bare "gpt-4o" → prefix-strip is a no-op Runs in <1s, can be wired into template-hermes CI. Pins the exact config.yaml shape — any drift in derive-provider.sh or the bridge if-block breaks a test. ## 3. Canvas ConfigTab hermes tests (ConfigTab.hermes.test.tsx) 5 vitest cases covering the #1894 bugs: - Runtime loads from workspace metadata when config.yaml missing - "No config.yaml found" red error hidden for hermes - Hermes info banner shown instead - Langgraph workspace still sees the red error (regression-guard the other way) - config.yaml runtime wins over workspace metadata when present ## Running bash tools/test-hermes-bridge.sh # 16 assertions cd canvas && npx vitest run src/components/tabs/__tests__/ConfigTab.hermes.test.tsx # 5 cases # E2E enhancements ride on the existing staging E2E workflow ## Not yet covered (tracked in #1900) CP admin delete-tenant EC2 cascade, cp-provisioner instance_id lookup (#1738), purge audit SQL mismatch (#241), and pq prepared-statement cache collision (#242). These are in-controlplane-repo concerns — separate PR with CP-side sqlmock + integration tests. Closes items in #1900. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:45:13 -07:00
Hongming Wang	307b5b5408	Merge pull request #1930 from Molecule-AI/fix/e2e-hermes-boot-timeout fix(e2e): hermes cold-boot tolerance — 20min deadline + treat failed as transient	2026-04-23 17:44:50 -07:00
Hongming Wang	7356cf8d3a	fix(chat): clear sending spinner when any path delivers the reply Two latent bugs kept the "Processing with Claude Code..." timer ticking after the agent had already answered: 1. The A2A_RESPONSE store handler wrote into agentMessages[workspaceId] (no prefix) but ChatTab's "clear sending" effect subscribed to agentMessages["a2a:" + workspaceId]. Keys never matched — the effect was dead code from day one. Removed the dead subscription and moved the setSending(false) into the pendingAgentMsgs effect so any reply delivered via a WS push (Claude Code SDK, Hermes's send_message_to_user) also closes the spinner. 2. Added an activity-log fallback: when the platform emits a successful a2a_receive ACTIVITY_LOGGED for this workspace, clear sending and stop the timer. That covers the "runtime answered but we never saw the store message" case Claude Code exhibited tonight — the HTTP request can stay in flight while the SDK already pushed its reply. Symmetric a2a_receive error path also clears sending and surfaces the error message, so a runtime-side failure no longer hangs the UI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:43:30 -07:00
Hongming Wang	b3da0b29c5	fix(e2e): hermes cold-boot tolerance — 20min deadline + treat failed as transient Today's E2E run 24864011116 timed out at 10 min waiting for the workspace to come online. Hermes cold-boot measured 13 min on the same day's apt mirror (my manual repro on 18.217.175.225). The original 10 min deadline was roughly 2x too tight. Also: the `failed` branch was a hard fail, but bootstrap-watcher (cp#245) marks workspace=failed at 5 min if install.sh hasn't finished yet. Heartbeat then transitions failed → online around 10-13 min. Before this fix, the E2E bailed at the failed read and missed a recovery that was seconds away. ## Changes - Deadline: 10 min → 20 min (hermes worst-case 15 + slack) - `failed` status: now tolerated as transient; loop logs once then keeps polling. Only hard-fails at the final deadline. - Added transition logging (`WS_LAST_STATUS`) so CI output shows the provisioning → failed → online flow instead of silent polling. ## Why not fix cp#245 instead Both should be fixed. cp#245 (bootstrap-watcher deadline) is the root cause; this E2E fix is defense in depth. When cp#245 lands, the `failed` transient log will stop firing but the rest of the logic still protects against other slow-apt-day spikes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:42:52 -07:00
Hongming Wang	9813d2905b	Merge pull request #1897 from Molecule-AI/fix/restore-quickstart-plus-hotfixes fix(quickstart): restore 5 dropped commits from #1871 + live-test hotfixes	2026-04-23 17:40:43 -07:00
Hongming Wang	1c60869e1e	Merge remote-tracking branch 'origin/staging' into fix/restore-quickstart-plus-hotfixes # Conflicts: # .gitignore	2026-04-23 17:38:08 -07:00
Hongming Wang	18ebb1d7bf	fix(server): remove 60s A2A client timeout + correct file-read cat args Two bugs surfaced while testing Claude Code + OAuth deploys: 1. A2A proxy: a2aClient had a 60s Client.Timeout "safety net" that defeated the per-request context deadlines the code otherwise sets (canvas = 5m, agent-to-agent = 30m). Claude Code's first-token cold start over OAuth takes 30-60s, so every first "hi" into a fresh claude-code workspace returned 503 at exactly the 1m mark. Removed the Client.Timeout — the context deadline now governs as documented in the adjacent comment. 2. Files tab: ReadFile ran `cat <rootPath> <filePath>` as two args to cat. `cat /home agent/turtle_draw.py` tries to read the rootPath directory (errors "Is a directory") and then resolves the filePath relative to the container cwd, which is not guaranteed to equal rootPath. Result: the file-content pane stayed blank even though the file listed fine. Join into a single path before exec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:25:53 -07:00
Hongming Wang	d812c28431	Merge pull request #1932 from Molecule-AI/chore/sync-staging-to-main-followup chore: sync staging → main (follow-up: 9 commits since #1913)	2026-04-23 17:25:07 -07:00
Hongming Wang	e337efe974	fix(canvas): propagate runtime through WORKSPACE_PROVISIONING event The side-panel runtime pill read "unknown" for newly-deployed workspaces because canvas-events.ts created the node from WORKSPACE_PROVISIONING payload — and the payload only carried name + tier. No refetch filled the gap during provisioning, so the user saw "RUNTIME unknown" on the card even though the DB row had the real runtime set. Includes runtime in every WORKSPACE_PROVISIONING emitter: * handlers/workspace.go — initial create * handlers/workspace_restart.go — explicit restart, auto-restart, and crash-recovery resume loop * handlers/org_import.go — multi-workspace org imports Canvas-side: canvas-events.ts reads payload.runtime when creating the node; the provisioning test asserts the pill value is populated before any refetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:17:49 -07:00
Hongming Wang	dc50a1c775	refactor(canvas): data-drive provider picker from template config.yaml The MissingKeysModal's provider list was hardcoded in deploy-preflight.ts as RUNTIME_PROVIDERS — a per-runtime map that duplicated what each template repo already declares in its config.yaml. That meant adding a new provider required changes in two places, and the UI could drift out of sync with the actual template (e.g. when a template adds a MiniMax or Kimi model, the picker wouldn't know). The single source of truth for "which env vars does this workspace need" is each template's config.yaml: * `runtime_config.models[].required_env` — per-model key list * `runtime_config.required_env` — runtime-level AND list Go /templates already returned `models`. This change: * Adds `required_env` alongside `models` on templateSummary so the canvas receives the full picture. * Rewrites deploy-preflight.ts to derive ProviderChoice[] from a template object via `providersFromTemplate(template)`: - groups `models[]` by unique required_env tuple - falls back to runtime_config.required_env when models is empty - decorates labels with model counts (e.g. "OpenRouter (14 models)") * `checkDeploySecrets(template, workspaceId?)` now takes a template object instead of a runtime string. Any-provider satisfaction still short-circuits preflight to ok=true. * MissingKeysModal receives `providers` directly; no more lookups. * TemplatePalette threads `template.models` + `template.required_env` into the preflight. Side effects: * Claude Code's dual-auth (OAuth token OR Anthropic API key) now surfaces as two picker options — its config.yaml already declared both, the UI just wasn't reading them. * Hermes picker now shows 8 provider options (Nous, OpenRouter, Anthropic, Gemini, DeepSeek, GLM, Kimi, Kilocode) instead of the hand-picked 3, matching its 35-model reality. 
Removed the legacy RUNTIME_PROVIDERS / RUNTIME_REQUIRED_KEYS / getRequiredKeys / findMissingKeys exports; MissingKeysModal.test.tsx deleted (its coverage is subsumed by the new template-driven deploy-preflight.test.ts). 58 modal-adjacent tests pass; full canvas suite 919 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:07:15 -07:00
Hongming Wang	3456bf79a7	Merge pull request #1931 from Molecule-AI/chore/remove-internal-content-from-monorepo chore: remove internal content + add hard CI gate (CEO directive 2026-04-23)	2026-04-23 17:04:29 -07:00
rabbitblood	427b764f58	chore: remove internal content + add hard CI gate (CEO directive 2026-04-23) This monorepo is public. Internal content (positioning, competitive briefs, sales playbooks, PMM/press drip, draft campaigns) belongs in Molecule-AI/internal — never here. ## What this PR removes /research/ (3 competitive briefs) /marketing/ (45 files: assets, audio, community, copy, demos, devrel, drip, pmm, press, sales) /docs/marketing/ (31 draft campaign / blog / brief files) comment-1172.json + comment-1173.json test-pmm-temp.txt tick-reflections-temp.md 83 files removed, 7,141 lines deleted from the public tree (going forward — historical commits remain visible in this repo's git log). ## Companion: internal repo absorption Molecule-AI/internal PR `chore/migrate-monorepo-internal-content-2026-04-23` absorbs all 79 files into `from-monorepo-2026-04-23/` for curator triage into the existing internal/marketing/ tree. Bulk-dump avoids file-collision on overlapping subdirs (audio, devrel, pmm). ## Three-layer enforcement so this can't recur 1. .gitignore — blocks `git add` of /research, /marketing, /docs/marketing, /comment-*.json, *-temp.{md,txt}, /test-pmm-*, /tick-reflections-* 2. .github/workflows/block-internal-paths.yml — CI hard gate. Fails any PR that adds a forbidden path. Cannot be silently bypassed. 3. docs/internal-content-policy.md — canonical decision tree for agents and humans. Linked from the CI failure message. A separate PR on molecule-ai-org-template-molecule-dev updates SHARED_RULES to teach every agent role to write internal content directly to Molecule-AI/internal via gh repo clone + commit + PR (the prevention-at-source layer; this PR is the mechanical backstop). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:58:28 -07:00
Hongming Wang	958eec3a7d	Merge pull request #1929 from Molecule-AI/chore/remove-org-templates chore: remove org-templates/molecule-dev — standalone repo is source of truth	2026-04-23 16:46:55 -07:00
Hongming Wang	a8f41a57ea	chore: remove org-templates/molecule-dev — standalone repo is source of truth Reverts the `.gitignore` checkin-exception for molecule-dev that let it creep back on every main↔staging sync. Keeping this dir in core meant: - 800KB of template files shipping with every monorepo clone - Confusion about which copy is canonical (this one vs the standalone Molecule-AI/molecule-ai-org-template-dev repo) - Merge churn — `0506e0c` re-added it against #6e6de39's removal intent just by taking 'theirs' in a conflict resolution All org-templates now live in their own repos, fetched via scripts/clone-manifest.sh when needed locally. molecule-dev has no special status; it's the same shape as every other org template. The .gitignore rule is now a simple `/org-templates/` with no exceptions, matching the rule structure already used for `/plugins/` and `/workspace-configs-templates/`. Future conflict resolutions can't re-add by accident because git won't track anything under that path. User flagged this at session start 2026-04-23 ('org-templates should only exist as standalone template repo'). Fixing for real this time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:44:18 -07:00
Hongming Wang	c5bcd7298c	Merge remote-tracking branch 'origin/staging' into fix/restore-quickstart-plus-hotfixes # Conflicts: # workspace-server/internal/handlers/ssrf.go	2026-04-23 16:42:41 -07:00
Hongming Wang	baa7e1531f	feat(canvas): provider-picker MissingKeysModal for multi-provider runtimes Runtimes like Hermes and LangGraph accept any one of several LLM provider keys (OpenRouter OR OpenAI OR Anthropic OR Nous-native). Before this change, the missing-keys modal treated all supported providers as simultaneously required — a fresh user on Hermes was asked for three parallel API keys when any one suffices. Introduces RUNTIME_PROVIDERS in deploy-preflight.ts as the canonical per-runtime provider list (label, envVar, note). checkDeploySecrets now returns all alternatives as missingKeys when nothing is configured, so the modal can offer a picker. MissingKeysModal dispatches between two render paths: * ProviderPickerModal — radio list of supported providers, a single env input for the chosen one. Saving that one key satisfies the preflight. Activated whenever the runtime has ≥2 provider choices. * AllKeysModal — legacy parallel-inputs UX, all keys must be saved before deploy. Kept for single-provider runtimes (claude-code, gemini-cli) and callers that pass unrelated-key lists. Dual-mode preserves the pre-existing contract for every caller while fixing the multi-provider UX. All 930 canvas vitest tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:41:09 -07:00
Hongming Wang	03b56fa5af	fix(canvas): collapse Org Templates section by default in palette The TemplatePalette's Org Templates section rendered all cards inline, each ~120 px tall (name + description + "Import org" button). With 4 org templates on disk that's ~500 px of drawer height — the individual workspace templates at the top (AutoGen / LangGraph / Hermes / …) got pushed off-screen, which is the exact complaint from the test session ("templates still 90% org, cant even see normal workspace template"). Collapsed the Org Templates section by default. The header now toggles with an ▶ caret and shows the count ("Org Templates (4)"). Clicking expands to reveal the full card list; clicking again collapses. Persists only within a session — fresh mounts start collapsed so the primary deploy path stays visible. Individual workspace templates are the usual starting point (pick a runtime, deploy one agent), while org templates are a heavier "deploy this whole pre-built team" action. Making the second expandable matches the relative frequency. - `TemplatePalette.tsx::OrgTemplatesSection` — added `expanded` state (default false), wrapped the cards in `{expanded && …}`, turned the header into a toggle button with `aria-expanded` + `aria-controls`. - `__tests__/OrgTemplatesSection.test.tsx` — 3 new rendering tests: collapsed-by-default (cards absent), click expands (cards appear), click again collapses (cards gone). Mocks /org/templates with a 2-entry response so the count assertion is stable. Full canvas vitest: 930/930 pass (up from 927). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:24:49 -07:00
Hongming Wang	50ae33e8b3	Merge pull request #1885 from Molecule-AI/fix/ki005-security-clean [P0] fix(security): F1085/KI-005/CWE-78 — clean rebase onto staging	2026-04-23 16:11:03 -07:00
Hongming Wang	b4719ad070	fix(canvas): Legend avoids TemplatePalette + silence WS handshake races ### Two unrelated but small UI fixes surfaced while testing the Canvas 1. Legend hidden under the open TemplatePalette. Legend is `fixed bottom-6 left-4 z-30`. TemplatePalette's drawer (when open) is `fixed top-0 left-0 w-[280px] z-30` — same z-index, same left-edge column. The Legend overlapped the palette's bottom 180 px. Published the palette-open state to the canvas store so the Legend can shift right (to `left-[296px]` — 280 px palette + 16 px gap) while the palette is open, animated via a 200 ms `transition-[left]` to match the palette's slide. Closes cleanly back to `left-4` when the palette is dismissed. Files: - `store/canvas.ts` — added `templatePaletteOpen` + `setTemplatePaletteOpen`. - `TemplatePalette.tsx` — calls `setTemplatePaletteOpen(open)` on every open/close transition via a new useEffect. - `Legend.tsx` — reads the flag and swaps `left-4` <-> `left-[296px]`. 2. "WebSocket is closed before the connection is established" spam. Two components (`ChatTab`, `AgentCommsPanel`) open their own short-lived WebSocket to tail the ACTIVITY_LOGGED stream. Their cleanup path called `ws.close()` unconditionally, which trips a browser console warning when React StrictMode re-runs the effect in dev and the handshake hasn't completed yet. Confirmed via DevTools console on the running canvas. Added a `closeWebSocketGracefully(ws)` helper in `lib/ws-close.ts`: - OPEN / CLOSING → close immediately (normal path). - CONNECTING → defer close to the 'open' listener so the browser sees a full handshake. Also wires an 'error' listener that cancels the queued close if the handshake fails (no double-close). - CLOSED → no-op. Both consumers now call the helper in their useEffect cleanup. Silences the warning without changing observable behaviour. 
### Tests `canvas/src/lib/__tests__/ws-close.test.ts` — 5 cases with a fake WebSocket covering each readyState branch plus the error-before-open cancellation path. Full vitest suite: 927/927 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:03:01 -07:00
Hongming Wang	255fd3c192	Merge branch 'staging' into fix/ki005-security-clean	2026-04-23 16:01:01 -07:00
Hongming Wang	5eb5e38c59	fix(canvas): re-centre Toolbar on canvas area when SidePanel is open When a workspace is selected the SidePanel (fixed, right-0, z-50) opens from the right edge and covers the right third of the viewport. The Toolbar at the top was positioned `fixed top-3 left-1/2 -translate-x-1/2 z-20` — centred on the full viewport, not the remaining canvas area. Consequence: the right half of the Toolbar (Audit / Search / Help / Settings) was hidden behind the panel as soon as the user clicked any workspace. Fix: publish the live SidePanel width to the canvas store and read it in Toolbar. When a node is selected, shift the Toolbar LEFT by `sidePanelWidth / 2` so its centre lines up with the middle of the remaining canvas area. Animated via a 200 ms `transition-[margin-left]` to match the SidePanel's own slide-in easing. - `store/canvas.ts` — added `sidePanelWidth` + `setSidePanelWidth`. Default 480 (matches SIDEPANEL_DEFAULT_WIDTH). - `SidePanel.tsx` — calls `setSidePanelWidth(width)` on every width change so the store stays in sync with localStorage. - `Toolbar.tsx` — reads `sidePanelWidth`, applies a negative `marginLeft` style when `selectedNodeId` is non-null. - `SidePanel.tabs.test.tsx` — added `setSidePanelWidth: vi.fn()` to the mocked store state so SidePanel's new useEffect has a callable to invoke. 18 previously-passing tests now pass again. No visual regression when no workspace is selected — the toolbar stays in its original centred position. SaaS canvas unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:57:12 -07:00
Hongming Wang	6faea202b9	fix(a2a-queue): nil-safe drain + 202-requeue handling (followup to #1893) (#1896) * fix(a2a-queue): nil-safe error extraction in DrainQueueForWorkspace + handle 202-requeue The drain path called `proxyErr.Response["error"].(string)` without a comma-ok assertion. When proxyErr.Response had no "error" key (which happens in the 202-Accepted-queued branch I added in the same PR — that response is {"queued": true, "queue_id": ..., "queue_depth": ...}), the type assertion panicked and killed the platform process. The platform was down 25 minutes today before this was diagnosed. Fleet went from 30 real outputs/15min → 0 events. Two fixes here: 1. Treat 202 Accepted from the inner proxyA2ARequest as "re-queued" (target was busy AGAIN). Mark THIS attempt completed; the new queue row will be drained on the next heartbeat tick. Don't propagate as failure. 2. Defensive type-assertion when reading the error string. Falls back to http.StatusText, then a generic "unknown drain dispatch error" so the queue still gets a non-empty error_detail for ops debugging. Now the drain path can never panic on a malformed proxy response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(a2a-queue): return (202, body, nil) so callers see queued-as-success Cycle 53 found callers logging 45× 'delegation failed: proxy a2a error' even though the queue's drain stats showed 48 completions in the same window. Investigation: my busy-error path returned `return http.StatusAccepted, nil, &proxyA2AError{Status: 202, Response: ...}`. The non-nil proxyA2AError is the failure signal. Even with status=202, callers' `if proxyErr != nil` branch fires and logs the request as failed. The 202 status was meaningless — the response body was nil too, so the caller never even saw the queue_id/depth metadata. 
Fix: return a success-shaped result so callers do NOT enter the error branch: `respBody, _ := json.Marshal(gin.H{"queued": true, "queue_id": qid, ...})` followed by `return http.StatusAccepted, respBody, nil`. Net effect: queue continues to absorb busy-errors (working since #1893), AND callers correctly record the dispatch as queued-success rather than failed. Closes the cycle 53 misclassification that was making the queue look ineffective on activity_logs counts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: molecule-ai[bot] <276602405+molecule-ai[bot]@users.noreply.github.com>	2026-04-23 22:55:43 +00:00
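The drain fix in #1896 combines two rules. A 202 from the inner proxy call means "re-queued, not failed". The error string must be read defensively, falling back to the status text and then to a generic message. Here is a minimal TypeScript analogue of that decision logic, assuming a simplified shape for the proxy result. The real code is Go, and the names `classifyDrainAttempt`, `ProxyError` and `STATUS_TEXT` are hypothetical. This sketch also treats an empty `error` string as missing, which is an assumption.

```typescript
// TypeScript analogue (hypothetical) of the Go drain classification in #1896.
type ProxyResponse = Record<string, unknown> | undefined;

interface ProxyError {
  status: number;
  response: ProxyResponse; // e.g. {"queued": true, ...} has no "error" key
}

// Tiny stand-in for Go's http.StatusText; only a few codes are covered.
const STATUS_TEXT: Record<number, string> = {
  202: "Accepted",
  500: "Internal Server Error",
  502: "Bad Gateway",
  503: "Service Unavailable",
};

type DrainOutcome =
  | { kind: "completed" }
  | { kind: "requeued" } // attempt marked completed; the new row drains next tick
  | { kind: "failed"; errorDetail: string };

function classifyDrainAttempt(status: number, err: ProxyError | null): DrainOutcome {
  // Target was busy again: not a failure.
  if (status === 202) return { kind: "requeued" };
  if (err === null) return { kind: "completed" };
  // Comma-ok equivalent: only trust "error" when it really is a string.
  const raw = err.response?.["error"];
  if (typeof raw === "string" && raw !== "") return { kind: "failed", errorDetail: raw };
  // Fallback chain: status text, then a generic non-empty error_detail.
  return { kind: "failed", errorDetail: STATUS_TEXT[err.status] ?? "unknown drain dispatch error" };
}
```

Because the panic-prone path now returns a value on every branch, a malformed response can only lower the quality of `errorDetail`. It can never crash the caller.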
molecule-ai[bot]	254db21f6a	fix(ci): handle both module path formats in coverage-gate path-strip The sed stripping only handled platform/workspace-server/... paths, but go tool cover may emit platform/internal/... paths (without workspace-server/). When the pattern doesn't match, rel retains the full package import path and the allowlist grep -qxF fails to find the short entry (e.g. internal/handlers/tokens.go). Add a second substitution to strip the platform/ prefix as a fallback so both path formats normalize to the same allowlist-relative form.	2026-04-23 22:49:51 +00:00
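The coverage-gate fix applies two substitutions in order: strip through `platform/workspace-server/` if present, otherwise strip through `platform/`. The result is then matched exactly against the allowlist, as `grep -qxF` does. Below is a hypothetical TypeScript re-statement of that normalisation. The function names and the `example.com/molecule-core/` module prefix used in the usage lines are illustrative assumptions, not the real CI script.

```typescript
// Hypothetical re-statement of the two-step sed path strip in the coverage gate.
function toAllowlistPath(coverPath: string): string {
  // Primary form: .../platform/workspace-server/internal/...
  const stripped = coverPath.replace(/^.*?platform\/workspace-server\//, "");
  if (stripped !== coverPath) return stripped;
  // Fallback form: .../platform/internal/... (no workspace-server/ segment)
  return coverPath.replace(/^.*?platform\//, "");
}

// grep -qxF semantics: fixed string, whole-line (exact) match.
function isAllowlisted(coverPath: string, allowlist: readonly string[]): boolean {
  return allowlist.includes(toAllowlistPath(coverPath));
}

// Usage: both emitted forms normalise to the same short entry.
const allowlist = ["internal/handlers/tokens.go"];
console.log(isAllowlisted("example.com/molecule-core/platform/workspace-server/internal/handlers/tokens.go", allowlist)); // true
console.log(isAllowlisted("example.com/molecule-core/platform/internal/handlers/tokens.go", allowlist)); // true
```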
Molecule AI Content Marketer	a95e0b363f	docs(blog + assets): MCP Server List blog post + OG image — v2 from staging blog: re-staged from origin/fix/chrome-devtools-mcp-tutorial assets: OG image (1200×630, dark tech, MCP teal) + og_image path fix (was: /2026-04-21-mcp-server-list-og.png, which does not exist; now: /assets/blog/2026-04-20-mcp-server-list/og.png) Branch: origin/staging baseline (no conflicts) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 22:48:15 +00:00
Molecule AI Documentation Specialist	a14e361c18	fix(blog): remove fake /org/tokens/:id/logs endpoint reference The monitoring section referenced GET /org/tokens/:id/logs which does not exist. The org token API only exposes List/Create/Revoke (GET/POST/DELETE /org/tokens). Per-token activity logs via API are a planned feature, not yet built. Fixes: molecule-core#1914 - Replaced fake curl example with Canvas Activity Log path - Added roadmap note: per-token activity logs via API (planned) - Updated footer to include per-token activity logs on roadmap - Kept the operational guidance (monitor call patterns, revoke if suspicious) since the principle is correct even if the API is TBD	2026-04-23 22:38:59 +00:00
Hongming Wang	a0ac72f725	test(canvas): update a11y tests for T3 default tier CreateWorkspaceDialog.a11y.test.tsx's two tier-button tests assumed T1 was the default selection. After the previous commit flipped the non-SaaS default to T3, the radio group's default-selected button changed accordingly. Updated: - "tier buttons have role=radio and aria-checked reflects selection" — T3 is now `aria-checked="true"`, T1 is the "unselected" foil we click to verify the flip. - "selected radio has tabIndex=0, others have tabIndex=-1" — T3 is the tabindex=0 member now. The roving-tabIndex and ArrowDown / ArrowRight tests further down the file start by explicitly clicking/focusing T1 or T2, so they're unaffected by the default change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:37:23 -07:00
Hongming Wang	69408ab61a	Merge pull request #1913 from Molecule-AI/sync/staging-to-main-2026-04-23-final chore: sync staging → main (post 2026-04-23 bug wave, conflicts resolved)	2026-04-23 15:36:30 -07:00
Hongming Wang	2baaa977c7	feat(quickstart): default new agents to T3 (Privileged) Default tier for a newly-created workspace was T1 (Sandboxed) on self-hosted and T4 (Full Access) on SaaS. Real work needs at minimum a read_write workspace mount + Docker daemon access — that's T3 ("Privileged") per the tier ladder in CreateWorkspaceDialog. The user-visible consequence was that clicking "Deploy" on almost any template landed in a sandbox that couldn't actually run the agent's tooling until the user knew to bump the tier manually. ### Changes Platform (Go) — default tier flipped from 1→3 in two places so API callers (Canvas, molecli, org import) all get the same default: - `handlers/workspace.go`: `POST /workspaces` default when `tier` is omitted from the request body. - `handlers/template_import.go`: `generateDefaultConfig` writes `tier: 3` into the auto-generated `config.yaml` for bundle imports that don't declare one. Canvas — `CreateWorkspaceDialog.tsx` self-hosted form default flipped from T1→T3. SaaS stays at T4 (each SaaS workspace runs on its own sibling EC2, so the shared-blast-radius reasoning doesn't apply and we can safely go a tier higher). ### Tests Updated every sqlmock assertion that anchored on the old `tier=1` default: - `handlers_test.go::TestWorkspaceCreate` — default-path INSERT now expects `3`. - `handlers_additional_test.go::TestWorkspaceCreate_WithParentID` — same. - `workspace_test.go::TestWorkspaceCreate_DBInsertError` / `TestWorkspaceCreate_WithSecrets_Persists` — same. - `workspace_test.go::TestWorkspaceCreate_TemplateDefaults*` — same (current handler semantics ignore the template's `tier:` field and fall through to the default; kept tests faithful to the implementation, left a comment flagging the latent inconsistency). - `workspace_budget_test.go::TestWorkspaceBudget_Create_WithLimit` — same. - `template_import_test.go::TestGenerateDefaultConfig` — asserts `tier: 3` now. All `go test -race ./internal/handlers/` pass. 
Canvas `CreateWorkspaceDialog` tests don't assert the default tier (they only reference `tier` as prop data on stub workspaces) so no test update needed on that side. ### SaaS parity Zero behaviour change on hosted SaaS. The Go-side default only fires when the Canvas (or any caller) omits `tier` from the request body. The SaaS Canvas explicitly passes `tier: 4` from the CreateWorkspaceDialog `isSaaS ? 4 : 3` branch, so the Go default never runs on a SaaS request. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:34:22 -07:00
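The SaaS-parity argument in the T3 commit rests on two separate defaulting layers. The Canvas always sends an explicit tier. The server default applies only when `tier` is absent. Here is a minimal sketch of that split. The names `dialogDefaultTier`, `effectiveTier` and `CreateWorkspaceBody` are hypothetical, and the real Go handler may represent "omitted" differently, for example as a zero value.

```typescript
// Hypothetical sketch of the two tier-defaulting layers described above.
type Tier = 1 | 2 | 3 | 4;

// Canvas layer: mirrors the `isSaaS ? 4 : 3` branch in CreateWorkspaceDialog.
function dialogDefaultTier(isSaaS: boolean): Tier {
  return isSaaS ? 4 : 3;
}

// Server layer: POST /workspaces falls back to T3 only when tier is omitted.
interface CreateWorkspaceBody {
  name: string;
  tier?: Tier;
}

function effectiveTier(body: CreateWorkspaceBody): Tier {
  return body.tier ?? 3;
}

// SaaS Canvas always sends tier 4, so the server default never fires there.
console.log(effectiveTier({ name: "demo", tier: dialogDefaultTier(true) })); // 4
console.log(effectiveTier({ name: "demo" })); // 3 (API caller omitted tier)
```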
Hongming Wang	72158a0e96	Merge remote-tracking branch 'origin/main' into sync/staging-to-main-2026-04-23-final # Conflicts: # docs/ecosystem-watch.md # docs/marketing/battlecard/phase-34-partner-api-keys-battlecard.md # docs/marketing/launches/pr-1533-ec2-instance-connect-ssh.md	2026-04-23 15:32:49 -07:00
Hongming Wang	30ed7ba0b9	Merge pull request #1898 from Molecule-AI/fix/config-tab-runtime-model-hermes fix(canvas/config): load runtime+model from workspace metadata + hide misleading config.yaml error for hermes	2026-04-23 15:16:53 -07:00
molecule-ai[bot]	6c5bfe7cbf	Merge branch 'staging' into docs/saas-federation-tutorial	2026-04-23 22:13:11 +00:00
molecule-ai[bot]	371c9d4a81	Merge branch 'staging' into content-marketer/phase34-launch-post-v2	2026-04-23 22:12:09 +00:00
molecule-ai[bot]	b0198631e3	Merge branch 'staging' into content/a2a-v1-deep-dive	2026-04-23 22:11:37 +00:00
molecule-ai[bot]	70ff4252a8	Merge branch 'staging' into fix/config-tab-runtime-model-hermes	2026-04-23 22:11:06 +00:00
Hongming Wang	19cd5c9f4b	test(router): set ADMIN_TOKEN in TestTestTokenRoute_RequiresAdminAuth_WhenTokensExist The test asserts that AdminAuth rejects an unauthenticated request to the test-token route once any workspace token exists in the DB. It sets MOLECULE_ENV=development to enable the handler's gate. After this branch's AdminAuth Tier-1b hatch (middleware/devmode.go), MOLECULE_ENV=development + empty ADMIN_TOKEN becomes the explicit fail-open signal for local dev — so the request correctly passes AdminAuth and falls through to the handler, which then 500s on an unmocked DB lookup instead of the expected 401. The security property the test is protecting (no bearer → 401 when tokens exist) corresponds to the SaaS configuration where ADMIN_TOKEN is always set. Setting ADMIN_TOKEN in the test suppresses the dev-mode hatch and reaches AdminAuth's Tier-2 bearer check, which correctly aborts 401 with "admin auth required". No production behaviour change — the test is now verifying the path that actually runs in production (MOLECULE_ENV=production + ADMIN_TOKEN set). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:03:34 -07:00
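The test fix above depends on the order in which AdminAuth evaluates its gates. The dev-mode hatch (MOLECULE_ENV=development with an empty ADMIN_TOKEN) fails open first. Only otherwise does the bearer check run and abort with 401. Here is a simplified TypeScript sketch of that ordering. The function `adminAuthDecision` is hypothetical. It deliberately omits the empty-DB / token-existence tier and any other tiers the real Go middleware has. A real implementation should also compare tokens in constant time.

```typescript
// Simplified, hypothetical model of AdminAuth's Tier-1b hatch vs Tier-2 bearer check.
interface AdminAuthEnv {
  moleculeEnv: string; // MOLECULE_ENV
  adminToken: string; // ADMIN_TOKEN ("" when unset)
}

type AuthDecision =
  | { allow: true; via: "dev-hatch" | "bearer" }
  | { allow: false; status: 401; error: string };

function adminAuthDecision(env: AdminAuthEnv, authorization: string | undefined): AuthDecision {
  // Tier-1b: explicit local-dev fail-open signal.
  if (env.moleculeEnv === "development" && env.adminToken === "") {
    return { allow: true, via: "dev-hatch" };
  }
  // Tier-2: bearer check (plain equality here; real code should be constant-time).
  if (env.adminToken !== "" && authorization === `Bearer ${env.adminToken}`) {
    return { allow: true, via: "bearer" };
  }
  return { allow: false, status: 401, error: "admin auth required" };
}
```

Setting `adminToken` in the test input is what moves a no-bearer request from the first branch to the final 401, which is the production path the test now protects.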
Hongming Wang	06273b11ef	fix(canvas/config): load runtime+model from workspace metadata + hide misleading config.yaml error for hermes Canvas Config tab had 3 bugs visible on hermes workspaces (#1894): 1. Runtime dropdown showed "LangGraph (default)" even when the workspace's actual runtime was hermes — because the form only loaded runtime from config.yaml, and hermes doesn't use the platform's config.yaml template. 2. Model field was empty for the same reason. 3. "No config.yaml found" error appeared on hermes workspaces despite everything being fine — hermes manages its own config at ~/.hermes/config.yaml on the workspace host. Worse, clicking Save with the empty form would silently flip `runtime` back from `hermes` to `LangGraph (default)`. ## Fix - loadConfig now always fetches workspace metadata (runtime + model) via GET /workspaces/:id and GET /workspaces/:id/model BEFORE attempting the config.yaml fetch. These act as the source of truth for runtime and model when config.yaml doesn't set them. - RUNTIMES_WITH_OWN_CONFIG set lists runtimes that manage their own config outside the platform template (hermes, external). For these: - Missing config.yaml is NOT an error — no red banner shown. - An informational gray banner tells the user where to edit the runtime's config (e.g. "edit ~/.hermes/config.yaml via Terminal tab or the hermes CLI" for hermes). Closes #1894. Verified 2026-04-23 on user's hongmingwang tenant which runs hermes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:58:36 -07:00
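The Config-tab fix layers two sources. Workspace metadata supplies runtime and model whenever config.yaml does not set them. For runtimes in `RUNTIMES_WITH_OWN_CONFIG`, a missing config.yaml downgrades from an error to an informational banner. Here is a minimal sketch of that resolution, assuming the metadata and config.yaml have already been fetched and parsed. The function name, types and banner wording are hypothetical; only the set name and its members come from the commit.

```typescript
// Hypothetical sketch of how loadConfig's results could drive the form state.
const RUNTIMES_WITH_OWN_CONFIG: ReadonlySet<string> = new Set(["hermes", "external"]);

interface WorkspaceMeta {
  runtime: string; // from GET /workspaces/:id
  model: string; // from GET /workspaces/:id/model
}

interface ParsedConfig {
  runtime?: string;
  model?: string;
}

type Banner = { kind: "error" | "info"; text: string } | null;

interface ConfigFormState {
  runtime: string;
  model: string;
  banner: Banner;
}

function resolveConfigForm(meta: WorkspaceMeta, config: ParsedConfig | null): ConfigFormState {
  // config.yaml wins when it sets a field; metadata is the fallback source of truth.
  const runtime = config?.runtime ?? meta.runtime;
  const model = config?.model ?? meta.model;
  if (config !== null) return { runtime, model, banner: null };
  // No config.yaml: expected for runtimes that manage their own config.
  if (RUNTIMES_WITH_OWN_CONFIG.has(runtime)) {
    return { runtime, model, banner: { kind: "info", text: `${runtime} manages its own config on the workspace host` } };
  }
  return { runtime, model, banner: { kind: "error", text: "No config.yaml found" } };
}
```

Because the form is seeded from metadata, saving a hermes workspace with no config.yaml keeps `runtime: "hermes"` instead of reverting to the LangGraph default.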
Hongming Wang	de99a22ffc	fix(quickstart): hotfixes discovered during live testing session Five additional breakages surfaced while testing the restored stack end-to-end (spin up Hermes template → click node → open side panel → configure secrets → send chat). Each fix is narrowly scoped and has matching unit or e2e tests so they don't regress. ### 1. SSRF defence blocked loopback A2A on self-hosted Docker handlers/ssrf.go was rejecting `http://127.0.0.1:<port>` workspace URLs as loopback, so POST /workspaces/:id/a2a returned 502 on every Canvas chat send in local-dev. The provisioner on self-hosted Docker publishes each container's A2A port on 127.0.0.1:<ephemeral> — that's the only reachable address for the platform-on-host path. Added `devModeAllowsLoopback()` — allows loopback only when MOLECULE_ENV ∈ {development, dev}. SaaS (MOLECULE_ENV=production) continues to block loopback; every other blocked range (metadata 169.254/16, TEST-NET, CGNAT, link-local) stays blocked in dev mode. Tests: 5 new tests in ssrf_test.go covering dev-mode loopback, dev-mode short-alias ("dev"), production still blocks loopback, dev-mode still blocks every other range, and a 9-case table test of the predicate with case/whitespace/typo variants. ### 2. canvas/src/lib/api.ts: 401 → login redirect broke localhost Every 401 called `redirectToLogin()` which navigates to `/cp/auth/login`. That route exists only on SaaS (mounted by the cp_proxy when CP_UPSTREAM_URL is set). On localhost it 404s — users landed on a blank "404 page not found" instead of seeing the actual error they should fix. Gated the redirect on the SaaS-tenant slug check: on <slug>.moleculesai.app, redirect unchanged; on any non-SaaS host (localhost, LAN IP, reserved subdomains like app.moleculesai.app), throw a real error so the calling component can render a retry affordance. 
Tests: 4 new vitest cases in a dedicated api-401.test.ts (needs jsdom for window.location.hostname) — SaaS redirects, localhost throws, LAN hostname throws, reserved apex throws. ### 3. SecretsSection rendered a hardcoded key list config/secrets-section.tsx shipped a fixed COMMON_KEYS list (Anthropic / OpenAI / Google / SERP / Model Override) regardless of what the workspace's template actually needed. A Hermes workspace declaring MINIMAX_API_KEY in required_env got five irrelevant slots and nothing for the key it actually needed. Made the slot list template-driven via a new `requiredEnv?: string[]` prop passed down from ConfigTab. Added `KNOWN_LABELS` for well-known names and `humanizeKeyName` to turn arbitrary SCREAMING_SNAKE_CASE into a readable label (e.g. MINIMAX_API_KEY → "Minimax API Key"). Acronyms (API, URL, ID, SDK, MCP, LLM, AI) stay uppercase. Legacy fallback preserved when required_env is empty. Tests: 8 new vitest cases covering known-label lookup, humanise fallback, acronym preservation, deduplication, and both fallback paths. ### 4. Confusing placeholder in Required Env Vars field The TagList in ConfigTab labelled "Required Env Vars (from template)" is a DECLARATION field — stores variable names. The placeholder "e.g. CLAUDE_CODE_OAUTH_TOKEN" suggested that, but users naturally typed the value of their API key into the field instead. The actual values go in the Secrets section further down the tab. Relabelled to "Required Env Var Names (from template)", changed the placeholder to "variable NAME (e.g. ANTHROPIC_API_KEY) — not the value", and added a one-line helper below pointing to Secrets. ### 5. Agent chat replies rendered 2-3 times Three delivery paths can fire for a single agent reply — HTTP response to POST /a2a, A2A_RESPONSE WS event, and a send_message_to_user WS push. Paths 2↔3 were already guarded by `sendingFromAPIRef`; path 1 had no guard. 
Hermes emits both the reply body AND a send_message_to_user with the same text, which manifested as duplicate bubbles with identical timestamps. Added `appendMessageDeduped(prev, msg, windowMs = 3000)` in chat/types.ts — dedupes on (role, content) within a 3s window. Threaded into all three setMessages call sites. The window is short enough that legitimate repeat messages ("hi", "hi") from a real user/agent a few seconds apart still render. Tests: 8 new vitest cases covering empty history, different content, duplicate within window, different roles, window elapsed, stale match, malformed timestamps, and custom window. ### 6. New end-to-end regression test tests/e2e/test_dev_mode.sh — 7 HTTP assertions that run against a live platform with MOLECULE_ENV=development and catch regressions on all the dev-mode escape hatches in a single pass: AdminAuth (empty DB + after-token), WorkspaceAuth (/activity, /delegations), AdminAuth on /approvals/pending, and the populated /org/templates response. Shellcheck-clean. ### Test sweep - `go test -race ./internal/handlers/ ./internal/middleware/ ./internal/provisioner/` — all pass - `npx vitest run` in canvas — 922/922 pass (up from 902) - `shellcheck --severity=warning infra/scripts/setup.sh tests/e2e/test_dev_mode.sh` — clean - `bash tests/e2e/test_dev_mode.sh` — 7/7 pass against a live platform + populated template registry ### SaaS parity Every relaxation remains conditional on MOLECULE_ENV=development. Production tenants run MOLECULE_ENV=production (enforced by the secrets-encryption strict-init path) and always set ADMIN_TOKEN, so none of these code paths fire on hosted SaaS. Behaviour on real tenants is byte-for-byte unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:57:18 -07:00
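Item 2 of the quickstart hotfix commit gates the 401 login redirect on the SaaS-tenant slug check. It redirects on `<slug>.moleculesai.app` and throws everywhere else, including reserved subdomains such as `app.moleculesai.app`. Here is a minimal sketch of such a check. `isSaaSTenantHost` and `handleUnauthorized` are hypothetical names. The reserved set lists only `app` because that is the only one the commit names, and requiring a single-label slug is an assumption.

```typescript
// Hypothetical sketch of the SaaS-tenant gate for the 401 → login redirect.
const SAAS_APEX = "moleculesai.app";
// Only "app" is named in the commit; the real reserved list may be longer.
const RESERVED_SUBDOMAINS: ReadonlySet<string> = new Set(["app"]);

function isSaaSTenantHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (!host.endsWith("." + SAAS_APEX)) return false; // localhost, LAN IPs, bare apex
  const slug = host.slice(0, -(SAAS_APEX.length + 1));
  // Assumption: a tenant slug is a single DNS label.
  return slug.length > 0 && !slug.includes(".") && !RESERVED_SUBDOMAINS.has(slug);
}

function handleUnauthorized(hostname: string, redirectToLogin: () => void): void {
  if (isSaaSTenantHost(hostname)) {
    redirectToLogin(); // /cp/auth/login only exists on SaaS tenants
    return;
  }
  // Non-SaaS host: surface a real error so the caller can render a retry affordance.
  throw new Error("401 Unauthorized");
}
```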
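Item 3 of the quickstart hotfix commit builds secret slots from the template's `required_env`. Labels come from `KNOWN_LABELS`, with `humanizeKeyName` as the fallback, and the acronyms API, URL, ID, SDK, MCP, LLM and AI stay uppercase. Here is a minimal sketch of that labelling. The `KNOWN_LABELS` entry, the `LEGACY_KEYS` fallback list and the `secretSlots` helper are hypothetical stand-ins; the commit does not list their real contents.

```typescript
// Hypothetical sketch of template-driven secret slot labelling.
const ACRONYMS: ReadonlySet<string> = new Set(["API", "URL", "ID", "SDK", "MCP", "LLM", "AI"]);

// Illustrative entry only: shows why a lookup beats humanising ("Openai").
const KNOWN_LABELS = new Map<string, string>([["OPENAI_API_KEY", "OpenAI API Key"]]);

// Illustrative stand-in for the legacy COMMON_KEYS fallback.
const LEGACY_KEYS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"];

// SCREAMING_SNAKE_CASE → "Title Case", keeping known acronyms uppercase.
function humanizeKeyName(name: string): string {
  return name
    .split("_")
    .filter((part) => part.length > 0)
    .map((part) => {
      const upper = part.toUpperCase();
      if (ACRONYMS.has(upper)) return upper;
      return upper.charAt(0) + upper.slice(1).toLowerCase();
    })
    .join(" ");
}

function labelFor(name: string): string {
  return KNOWN_LABELS.get(name) ?? humanizeKeyName(name);
}

// Deduplicate declared names; fall back to the legacy list when none are declared.
function secretSlots(requiredEnv: string[] | undefined): { key: string; label: string }[] {
  const keys = requiredEnv && requiredEnv.length > 0 ? Array.from(new Set(requiredEnv)) : LEGACY_KEYS;
  return keys.map((key) => ({ key, label: labelFor(key) }));
}

console.log(labelFor("MINIMAX_API_KEY")); // Minimax API Key
```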
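Item 5 of the quickstart hotfix commit suppresses a reply that arrives twice: once through the HTTP response and once through a WS push. `appendMessageDeduped(prev, msg, windowMs = 3000)` drops a message whose (role, content) matches one already present within the window. Here is a minimal sketch with that signature. The `ChatMessage` shape (ISO-8601 `timestamp`) is an assumption. Using an absolute time difference and appending on unparsable timestamps are design choices of this sketch, not confirmed behaviour.

```typescript
// Hypothetical message shape; the real chat/types.ts type may differ.
interface ChatMessage {
  role: "user" | "agent" | "system";
  content: string;
  timestamp: string; // ISO 8601
}

function appendMessageDeduped(prev: ChatMessage[], msg: ChatMessage, windowMs = 3000): ChatMessage[] {
  const t = Date.parse(msg.timestamp);
  // Malformed timestamp: cannot prove it is a duplicate, so keep it.
  if (Number.isNaN(t)) return [...prev, msg];
  const isDuplicate = prev.some((p) => {
    if (p.role !== msg.role || p.content !== msg.content) return false;
    const pt = Date.parse(p.timestamp);
    // abs(): the HTTP and WS paths may deliver in either order.
    return !Number.isNaN(pt) && Math.abs(t - pt) <= windowMs;
  });
  // Returning the same array reference lets setMessages skip a re-render.
  return isDuplicate ? prev : [...prev, msg];
}
```

A short window keeps genuine repeats (a user typing "hi" twice, several seconds apart) while collapsing the near-simultaneous copies from parallel delivery paths.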

3663 Commits