Mirrors the pattern hermes-channel-molecule uses (line 256). Drops
the broken `pip install codex-channel-molecule`, which would 404.
A PyPI publish workflow is a separate piece of work — until then,
git+https install is the path operators get.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The codex tab in the External Connect modal had an "outbound-tools-only
first cut" caveat — operators got the MCP wiring for codex calling
platform tools, but there was no documented inbound path. Canvas
messages couldn't wake an idle codex session.
That gap is now filled by codex-channel-molecule
(github.com/Molecule-AI/codex-channel-molecule), shipped today as the
codex counterpart to hermes-channel-molecule. The daemon long-polls
the platform inbox, runs `codex exec --resume <session>` per inbound
message, captures the assistant reply, routes it back via
send_message_to_user / delegate_task, and acks the inbox row.
Per-thread session continuity is persisted to disk, so daemon restarts
don't lose conversation context.
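The daemon's per-message flow can be sketched as a single orchestration step. This is an illustrative shape only — the function names (poll_inbox, run_codex, deliver, ack) are hypothetical, not the real codex-channel-molecule API:

```python
# Hypothetical sketch of the bridge daemon's per-batch loop. Callables are
# injected so the orchestration is testable without a live platform:
#   poll_inbox() -> list of (row_id, session_id, text) inbound messages
#   run_codex(session_id, text) -> assistant reply (codex exec --resume)
#   deliver(reply) -> route back via send_message_to_user / delegate_task
#   ack(row_id) -> mark the inbox row handled
def bridge_once(poll_inbox, run_codex, deliver, ack):
    handled = 0
    for row_id, session_id, text in poll_inbox():
        reply = run_codex(session_id, text)
        deliver(reply)
        ack(row_id)          # ack only after the reply is routed back
        handled += 1
    return handled
```

Acking after delivery (not before) is what keeps a crash mid-batch from silently dropping a canvas message.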
This commit:
- Updates externalCodexTemplate to include `pip install
codex-channel-molecule` (step 1) and a foreground `nohup
codex-channel-molecule` invocation (step 3) using the same env-var
contract as the MCP server (WORKSPACE_ID + PLATFORM_URL +
MOLECULE_WORKSPACE_TOKEN).
- Adds a "Canvas messages don't wake codex" common-issues entry to the
TAB_HELP codex section pointing at the bridge daemon log.
- Updates the doc comment to record the upstream deprecation path:
when openai/codex#17543 lands, the bridge becomes redundant and the
wired MCP server delivers push natively.
Verified: TestExternalTemplates_NoMoleculeOrgIDPlaceholder still
passes (no MOLECULE_ORG_ID re-introduction); full handlers suite
green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The codex / openclaw / hermes-channel snippets each instructed operators
to set `MOLECULE_ORG_ID = "<your org id>"`. The molecule_runtime MCP
subprocess these snippets spawn never reads MOLECULE_ORG_ID — that
env var is consumed only by workspace-server's TenantGuard
middleware, server-side, on the tenant box itself (set by the control
plane via user-data on provision).
External operator → tenant calls pass TenantGuard via the
isSameOriginCanvas path (Origin matches Host), with auth via Bearer
token + X-Workspace-ID. The universal_mcp snippet — which calls into
the same molecule_runtime — has always (correctly) omitted
MOLECULE_ORG_ID; this brings codex / openclaw / hermes-channel into
line.
Symptom that caught it: an external codex CLI session, after pasting
the codex-tab snippet, surfaced "MOLECULE_ORG_ID is still set to
'<your org id>'" as an unresolved blocker — the agent reasonably
treated the placeholder as required setup, but the operator has no
value to fill in.
Pinned with a structural test
(TestExternalTemplates_NoMoleculeOrgIDPlaceholder) so the placeholder
can't drift back across all six external-tab templates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MEMORY_V2_CUTOVER=true gates the admin export/import path on the v2
plugin, but the cutoverActive() check in admin_memories.go silently
returns false when the plugin isn't wired:
    func (h *AdminMemoriesHandler) cutoverActive() bool {
        if os.Getenv(envMemoryV2Cutover) != "true" {
            return false
        }
        return h.plugin != nil && h.resolver != nil
    }
Two operator misconfigs hit the silent-fallback path:
1. MEMORY_V2_CUTOVER=true set, MEMORY_PLUGIN_URL unset
→ wiring.Build returns nil → handler stays on legacy SQL path
→ operator sees no error, assumes cutover is live, but every
request still writes the legacy table.
2. MEMORY_V2_CUTOVER=true set, MEMORY_PLUGIN_URL set, but plugin
unreachable at boot
→ wiring.Build still returns the bundle (intentional — circuit
breaker handles ongoing unavailability), but every cutover
write quietly falls back via the breaker.
→ only signal: legacy table keeps growing.
Both are exactly the "structurally invisible until prod" failure
mode; the only real-world detection today is "notice the legacy
table is still being written to," which no operator will check.
Add loud, distinctive WARN log lines at Build() time for both
shapes. Boot logs are operator-visible, so a half-config is
immediately obvious without needing dashboards.
Tests:
* 4 new (cutover+no-URL → warn, neither set → silent,
  cutover+probe-fail → loud warn, probe-fail-without-cutover → quiet
  generic)
* 6 existing (still pass; pin no-warning-on-happy-path)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Org-import called h.workspace.provisionWorkspace directly — the same
silent-drop bug that bit TeamHandler.Expand on 2026-05-04 (see the
workspace.go:121-125 comment + #2486). Symptom on SaaS: every
claude-code workspace
sat in "provisioning" until the 600s sweeper marked it failed with
"container started but never called /registry/register" — because no
container ever existed; the goroutine returned silently when the Docker
provisioner field was nil.
User reproduced 2026-05-04 ~22:30Z importing a 7-workspace template on
the hongming prod tenant. Tenant CP logs (queried live via SSM) showed
ZERO "Provisioner: goroutine entered" or "CPProvisioner: goroutine
entered" lines for any of the 7 failed workspace UUIDs in the 60min
window — confirming the goroutine never ran past line 384 of
org_import.go because provisionWorkspace returned early in SaaS mode.
The fix is one line: replace h.workspace.provisionWorkspace with
h.workspace.provisionWorkspaceAuto. Auto is the single source of
truth for backend selection (workspace.go:130) — picks CP-mode when
h.cpProv is wired, Docker-mode when h.provisioner is wired, returns
false when neither.
Also adds a generic source-level gate
(TestNoCallSiteCallsDirectProvisionerExceptAuto) so the next caller
can't repeat the pattern. It walks every non-test .go file in
handlers/ and fails if any direct call to provisionWorkspace( or
provisionWorkspaceCP( appears outside the dispatcher's own definition
file.
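The gate's walk-and-grep shape is simple enough to sketch. The real check is a Go test over handlers/; this Python stand-in (with hypothetical names) shows the same structural invariant:

```python
import os
import re

# Illustrative sketch of the source-level gate: flag any direct call to
# provisionWorkspace( / provisionWorkspaceCP( in non-test .go files
# outside the dispatcher's own definition file. The real gate is a Go
# test; file and function names here are assumptions for the sketch.
DIRECT_CALL = re.compile(r"\.provisionWorkspace(CP)?\(")

def find_violations(root, dispatcher_file="workspace.go"):
    violations = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            if not name.endswith(".go") or name.endswith("_test.go"):
                continue
            if name == dispatcher_file:
                continue  # the dispatcher may call its own backends
            path = os.path.join(dirpath, name)
            with open(path) as f:
                for lineno, line in enumerate(f, 1):
                    if DIRECT_CALL.search(line):
                        violations.append((path, lineno))
    return violations
```

Note the regex requires `(` immediately after the name, so calls to provisionWorkspaceAuto pass through untouched.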
The gate currently allows workspace_restart.go, which has its own
manual if-h.cpProv-else dispatch — functionally equivalent to Auto,
so not this bug class, but architectural duplication; a follow-up is
filed for proper de-dup.
Test plan:
- TestOrgImport_UsesAutoNotDirectDockerPath: pin the org_import.go
call site
- TestNoCallSiteCallsDirectProvisionerExceptAuto: generic gate against
future drift
- TestTeamExpand_UsesAutoNotDirectDockerPath (existing): symmetric for
team.go
All 3 + the rest of the handler suite pass.
Closes #2486
Pairs with: PR #2794 (configurable provision concurrency) which made
it possible to bisect concurrency-vs-routing as the cause
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The §9c "Memory KV Edit round-trip" gate (added in #2787) captured the
expected-409 status code via:
    $(tenant_call ... -w "%{http_code}" || echo "000")
tenant_call uses CURL_COMMON which carries --fail-with-body. On the
expected 409, curl exits 22; the `|| echo "000"` then fires and
appends "000" to the captured stdout — yielding "409000" instead of
"409", failing the gate even though the contract was satisfied.
Caught on PR #2792's first E2E run (status got "409000"). Has been
silently failing the staging-SaaS E2E since #2787 merged earlier
today; nothing else surfaced it because the workflow is informational,
not required.
Fix: route -w into its own tempfile so curl's exit code can't pollute
the captured stdout. Wrap with set +e/-e so the 22 doesn't trip the
outer pipeline. Same shape as the §7c gate fix that PR #2779/#2783
landed for the same class of bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes part of #2790 (Phase A). The Python total floor at 86% (set in
workspace/pytest.ini, issue #1817) averages over ~6000 lines, so a
single MCP-critical file could regress to ~50% with no CI complaint as
long as other modules compensate. This is the same distribution gap
that #1823 closed Go-side: total floor passes while a critical handler
sits at 0%.
Added gates for these five files (per-file floor 75%):
- workspace/a2a_mcp_server.py — MCP dispatcher (PR #2766 / #2771)
- workspace/mcp_cli.py — molecule-mcp standalone CLI entry
- workspace/a2a_tools.py — workspace-scoped tool implementations
- workspace/inbox.py — multi-workspace inbox + per-workspace cursors
- workspace/platform_auth.py — per-workspace token resolver
These handle multi-tenant routing, auth tokens, and inbox dispatch.
Risk shape mirrors Go-side tokens*/secrets* — a 0%/50% file here is
exactly where the PR #2766 dispatcher bug class slips through without
a structural test.
Floor 75% is strictly additive — current actuals 80-96% (measured
2026-05-04). No existing PR fails. Ratchet plan in COVERAGE_FLOOR.md
target 90% by 2026-08-04.
Implementation: pytest already writes .coverage; new step emits a JSON
view scoped to the critical files via `coverage json --include="*name"`,
then jq extracts each file's percent_covered. Exact key match by
basename so workspace/builtin_tools/a2a_tools.py (a different 100%
file) doesn't shadow workspace/a2a_tools.py.
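The per-file floor check reduces to a small lookup over the coverage JSON. The real CI step does this with jq; this Python sketch assumes the shape of coverage.py's `coverage json` report and uses exact path keys so a same-basename file can't shadow the intended one:

```python
# Sketch of the per-file floor check over a coverage-json-shaped report:
#   {"files": {"<path>": {"summary": {"percent_covered": <float>}}}}
# Matching by exact path key is what keeps
# workspace/builtin_tools/a2a_tools.py from shadowing workspace/a2a_tools.py.
def files_below_floor(report, critical_paths, floor):
    failures = []
    for path in critical_paths:
        pct = report["files"][path]["summary"]["percent_covered"]
        if pct < floor:
            failures.append((path, pct))
    return failures
```

With the actuals described above, floor=75 yields no failures while floor=81 trips on the 80% file — the same check the commit verified locally.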
Verified locally with the actual coverage data:
- floor=75 → 0 failures (matches current state)
- floor=81 → 1 failure (a2a_tools.py at 80%) — proves the gate trips
Pairs with PR #2791 (Phase B — schema↔dispatcher AST drift gate). Phase
C (molecule-mcp e2e harness) remains the largest piece in #2790.
YAML validated locally before commit per
feedback_validate_yaml_before_commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Org-import was hard-capped at 3 concurrent workspace provisions (#1084),
calibrated for Docker-mode workspaces where each provision was a
docker-run. Now that workspaces are EC2 instances, AWS RunInstances
parallelises happily and the artificial cap of 3 makes a 7-workspace
org-import take 3-4× longer than necessary (3 batches × ~70s/provision
≈ 4 min wall time when AWS could absorb all 7 in parallel for ~70s).
This PR makes the cap configurable via MOLECULE_PROVISION_CONCURRENCY:
unset → 3 (Docker-mode default, unchanged)
"0" → effectively unlimited (SaaS / EC2 backend; AWS rate-limit
+ vCPU quota are the real backpressure)
N>0 → exactly N
N<0 → fall back to default 3 + warning log
garbage → fall back to default 3 + warning log
The "0 = unlimited" mapping is the user-facing convention requested for
SaaS deployments — operators don't have to pick an arbitrary large
number. Implementation hands off 1<<20 internally so the channel-based
semaphore stays a no-op without infinite-buffer risk.
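The env-var contract above can be pinned as a small resolver. The real implementation is Go; this Python sketch just restates the documented mapping (names are illustrative):

```python
DEFAULT_CAP = 3        # Docker-mode default, unchanged
UNLIMITED = 1 << 20    # large enough that the semaphore is a no-op

# Sketch of the MOLECULE_PROVISION_CONCURRENCY mapping; returns the
# resolved cap plus an optional warning string for the boot log.
def resolve_provision_cap(raw):
    if raw is None:
        return DEFAULT_CAP, None                       # unset
    try:
        n = int(raw.strip())                           # whitespace-trimmed
    except ValueError:
        return DEFAULT_CAP, "warn: non-numeric value, using default"
    if n == 0:
        return UNLIMITED, None                         # 0 -> unlimited
    if n < 0:
        return DEFAULT_CAP, "warn: negative value, using default"
    return n, None                                     # N>0 -> exactly N
```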
Test coverage (org_provision_concurrency_test.go, 6 cases / 15 subtests):
- unset → default
- "0" → large unlimited cap
- positive integer exact (1, 5, 10, 50)
- negative → default + warning
- non-numeric → default + warning
- whitespace-trimmed (" 7 " → 7)
Boot-time log line confirms the resolved cap so an operator can verify
their env is being honored without re-deploying.
Does NOT address the separate 600s "never registered" timeout the user
also reported during org-import — that's filed as molecule-core#2793
for proper investigation (parallel-provision contention, network
routing, register-retry budget, or container-start failure are all
candidates and need live SSM capture to bisect).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent → child knowledge sharing previously lived behind a `shared_context`
list in config.yaml: at boot, every child workspace HTTP-fetched its parent's
listed files via GET /workspaces/:id/shared-context and prepended them as
a "## Parent Context" block. That paid the full transfer cost on every
boot regardless of whether the agent needed it, created a single-parent
SPOF, had no team or org scope, and broke if the parent was unreachable.
Replace with memory v2's team:<id> namespace: agents call recall_memory
on demand. For large blob-shaped artefacts see RFC #2789 (platform-owned
shared file storage).
Removed:
- workspace/coordinator.py: get_parent_context()
- workspace/prompt.py: parent_context arg + injection block
- workspace/adapter_base.py: import + call + arg pass
- workspace/config.py: shared_context field + parser entry
- workspace-server/internal/handlers/templates.go: SharedContext handler
- workspace-server/internal/router/router.go: GET /shared-context route
- canvas/src/components/tabs/ConfigTab.tsx: Shared Context tag input
- canvas/src/components/tabs/config/form-inputs.tsx: schema field + default
- canvas/src/components/tabs/config/yaml-utils.ts: serializer entry
- 6 tests pinning the removed behavior; 5 doc references
Added regression gates so any reintroduction is loud:
- workspace/tests/test_prompt.py: build_system_prompt must NOT emit
"## Parent Context"
- workspace/tests/test_config.py: legacy YAML key loads cleanly but
shared_context attr must NOT exist on WorkspaceConfig
- tests/e2e/test_staging_full_saas.sh §9d: GET /shared-context must NOT
return 200 against a live tenant
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes part of #2790 (Phase B). Prevents a recurrence of the PR #2766 →
PR #2771 cycle: PR #2766 added ``source_workspace_id`` to four tools'
``input_schema`` and tool implementations, but the dispatcher in
``a2a_mcp_server.handle_tool_call`` silently dropped the kwarg for
``commit_memory`` / ``recall_memory`` / ``chat_history`` /
``get_workspace_info``. Schema lied; LLMs populated the param; every
call fell back to ``WORKSPACE_ID``, defeating multi-tenant isolation.
Existing dispatcher tests asserted return-value substrings (``"working"
in result``) instead of kwarg flow, so the bug shipped to main and was
only caught by re-reviewing post-merge.
This change adds an AST-driven gate. For every ToolSpec in
platform_tools.registry.TOOLS, the gate finds the matching
``elif name == "<tool>"`` arm in a2a_mcp_server.py and asserts that
every property declared in input_schema.properties is read by an
``arguments.get("<property>", ...)`` call inside that arm. A new schema
field the dispatcher forgets to forward fails CI loudly.
Three tests:
- test_every_dispatch_arm_reads_every_schema_property: main drift gate.
Walks registry, matches dispatch arms by name, diffs declared vs
read keys.
- test_dispatch_arms_reach_every_registered_tool: inverse direction.
A registered tool with no dispatch arm is "Unknown tool" at runtime,
even though docs/wrappers/schema all advertise it. Catches PRs that
add a ToolSpec but forget the dispatcher.
- test_drift_gate_self_check_finds_known_arms: pin the AST parser. If
handle_tool_call is refactored into a different shape (dict dispatch,
registry-driven, etc.) and _load_dispatch_arms returns {}, the main
gate vacuously passes — this self-check makes that failure mode
explicit by requiring 12 known arms to be discovered.
Verified the gate catches the PR #2766 bug: stripping
``source_workspace_id=arguments.get(...)`` from the commit_memory arm
fails the gate with a descriptive error pointing at the missing kwarg
and referencing the prior incident. Restored → 3 tests pass.
Suite: 1733 passed (was 1730 + 3 new), 3 skipped, 2 xfailed.
Why AST, not runtime invocation: the runtime mock-based tests in
test_a2a_mcp_server.py already assert kwargs flow correctly for four
explicitly-tested tools. This gate is cheaper (~1ms), catches new
properties before someone has to remember the runtime test, and runs
as a structural invariant.
Phase A (Python coverage floor) and Phase C (molecule-mcp e2e harness)
remain in #2790 as separate follow-ups.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Memory tab supported only Add+Delete. Correcting an entry meant
deleting and re-adding, losing the row's version counter and any
concurrent-write guard the agent depends on.
Now: per-row Edit button reveals an inline editor (value textarea +
TTL). Save POSTs to the existing /memory upsert endpoint with
if_match_version pinned to the entry's current version. On 409 the
UI surfaces a retry hint and reloads.
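The if_match_version contract the UI relies on can be modeled in a few lines. This is an illustrative in-memory sketch of optimistic locking, not the real /memory endpoint's implementation:

```python
class VersionConflict(Exception):
    """Stale if_match_version; maps to HTTP 409 in the real endpoint."""

# Hypothetical in-memory model of the upsert contract: a conditional
# write succeeds only when the caller's pinned version matches the
# row's current version, and every successful write bumps the version.
class MemoryStore:
    def __init__(self):
        self._rows = {}  # key -> (value, version)

    def get(self, key):
        return self._rows.get(key)

    def upsert(self, key, value, if_match_version=None):
        current = self._rows.get(key)
        if if_match_version is not None:
            if current is None or current[1] != if_match_version:
                raise VersionConflict(key)   # concurrent write won
        version = (current[1] + 1) if current else 1
        self._rows[key] = (value, version)
        return version
```

Delete-and-re-add resets the version counter to 1, which is exactly why the old workflow defeated this guard.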
Tests:
- 11 vitest cases covering pre-fill (JSON vs string), payload shape
(parsed JSON, fallback to plain text, TTL inclusion/omission),
cancel, 409 retry path, generic error path, and the no-version
back-compat case.
- E2E gate 9c in test_staging_full_saas.sh: seed → GET version →
conditional update → assert new value → stale-version POST must
409. Pins the optimistic-locking contract end-to-end on staging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix WriteFile (templates.go:436) had an `instance_id != ""` branch
that dispatched to writeFileViaEIC (SSH through EC2 Instance Connect),
but ReadFile (templates.go:362) skipped that branch entirely. ReadFile
always tried `findContainer` (which only works for local-Docker
workspaces, not SaaS EC2-per-workspace ones) and fell through to
`resolveTemplateDir` (which returns the seed template, not the
persisted workspace state).
Net effect on production: every Canvas Config tab open against a
SaaS workspace returned 404 "No config.yaml found" because GET
couldn't see what PUT had written. Visible to users after PR #2781
("show-misconfigured-state") surfaced the 404 as an error UX.
Caught by the synth-E2E 7c gate's GET-back assertion, but
misdiagnosed as a "test bug" and the GET assertion was dropped in
PR #2783 (rather than fixed at the source). This PR closes the loop:
1. New `readFileViaEIC` helper in template_files_eic.go that mirrors
writeFileViaEIC's SSH-via-EIC dance and runs `sudo -n cat <path>`.
Returns os.ErrNotExist on missing file (cat exits 1 with empty
stdout under `2>/dev/null`) so the handler maps it cleanly to 404.
2. ReadFile dispatch now mirrors WriteFile's: when `instance_id` is
non-empty, use readFileViaEIC; otherwise fall through to the
local-Docker / template-dir path.
3. ReadFile's DB query expanded to also select instance_id + runtime
(was just name). Three sqlmock-based tests updated to match the
new column shape; the existing local-Docker fallback path stays
green by passing instance_id="" in the mock rows.
Follow-up (separate PR): the synth-E2E 7c gate should restore the
GET-back marker assertion now that the read/write paths are unified.
That'll also catch any future Files API regression in the round-trip.
This PR doesn't touch the gate to keep the scope tight.
Verification:
- go build ./... clean
- full handlers test suite green (0.4s for ReadFile subset; 5.8s
full)
- The 3 ReadFile sqlmock tests still cover the local-Docker fallback
(instance_id=""); SaaS EIC dispatch is covered by the upcoming
re-enabled synth-E2E 7c GET assertion (deferred to follow-up)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the curl parse fix in #2779, the gate started reliably catching a
DIFFERENT bug than it was designed for: the Files API's PUT and GET
hit different paths/hosts and don't see each other's writes.
PUT /workspaces/<id>/files/config.yaml
→ template_files_eic.go writeFileViaEIC
→ SSH-as-ubuntu through EIC tunnel into the workspace EC2
→ `sudo install -D /dev/stdin /configs/config.yaml`
→ Lands at host:/configs on the workspace EC2 (correct: bind-
mounted into the workspace container)
GET /workspaces/<id>/files/config.yaml
→ templates.go ReadFile
→ `findContainer` looks for a docker container ON THE
PLATFORM-TENANT HOST (not the workspace EC2)
→ Workspace containers don't run on platform-tenant; this returns
empty
→ Fallback: read from h.resolveTemplateDir(wsName) on the
platform-tenant host — i.e., the seed template directory, not
the persisted workspace config
So the GET reliably returns the original template config, not what
PUT just wrote. The user-facing Save & Restart still works because
the container reads /configs/config.yaml directly via bind-mount —
the asymmetry only bites the gate.
This is a separate latent bug worth its own task: unify the Files
API read/write path (likely: ReadFile should also use SSH-EIC to the
workspace EC2 for instance-backed workspaces, mirroring WriteFile).
Tracked separately.
For now, drop the GET-back assertion and keep just the PUT-200
check. The PUT-200 still catches today's bug class (#2769 EACCES on
/opt/configs would have failed PUT with 500). When the read/write
paths are unified, restore the marker check.
Verification:
- bash -n clean
- The PUT-200 check would have caught PR #2769's bug (500 EACCES)
- The dropped GET-back check would not have prevented today's user
bug (PR #2769 was caught by the user, not by the gate, and the
gate only existed afterward)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes molecule-controlplane#467 (issue filed against CP, but resolution
landed canvas-side because the workspace-server ALREADY returns the
agent_card JSONB blob with configuration_status / configuration_error
fields populated by molecule-core PR #2756). No CP-side change needed —
the gap was the canvas's blindness to those fields.
Before this PR, a workspace whose adapter.setup() failed (typically
missing/rotated LLM credential) appeared identical to a healthy one in
the canvas tile: green "Online" status, no error indication. The
operator had to dig into workspace logs to discover the env var to set.
This PR surfaces the state via the existing status-pill UX:
1. STATUS_CONFIG gains a "not_configured" entry — amber dot/glow,
"Not configured" label. Distinct from "online" (emerald) and
"failed" (red) — the workspace is reachable, it just needs config.
2. canvas-topology exposes getConfigurationStatus / getConfigurationError
helpers — strict equality on the JSONB field so unknown values
pass through as null instead of crashing the tile renderer.
3. WorkspaceNode derives an `effectiveStatus` that overrides
data.status with "not_configured" when (status === "online" AND
agent_card.configuration_status === "not_configured"). The override
only applies on top of "online" — a genuinely offline / failed /
provisioning workspace keeps its existing treatment.
4. The configuration_error string surfaces in two places: the tile's
aria-label (screen reader access) + a truncated preview row at the
bottom of the tile (same visual as the existing "degraded error
preview" — mirrors the established pattern for in-tile error
surfacing).
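The override rule in point 3 is a pure function. The real derivation is TypeScript in WorkspaceNode; this Python sketch just pins the rule (names illustrative):

```python
# Sketch of the effectiveStatus derivation: "not_configured" overrides
# only a healthy "online" status. Offline / failed / provisioning keep
# their existing treatment, and unknown configuration_status values are
# ignored rather than crashing the tile renderer.
def effective_status(status, configuration_status):
    if status == "online" and configuration_status == "not_configured":
        return "not_configured"
    return status
```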
Test coverage: 11 new in canvas-topology-configuration-status.test.ts.
Each helper is covered for the happy path, missing fields, defensive
ignores of unknown values, and an end-to-end "stale ready overrides
old error" guard.
Once this lands + canvas redeploys, operators see "Not configured:
Neither OPENAI_API_KEY nor MINIMAX_API_KEY is set" right on the
workspace tile instead of a confused-looking green "online" workspace
that silently 503s every JSON-RPC request.
Pairs with: molecule-core PR #2756 (decouple agent-card from setup),
#2775 (boot_routes pin), #2778 (secret_redactor)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first version of the config.yaml round-trip gate (PR #2773)
captured curl output with `-w '\n%{http_code}\n'` and parsed via
`tail -n 2 | head -n 1`. That broke because bash's $(...) strips the
trailing newline, leaving only 2 lines in the captured value:
line 1: <response body>
line 2: <status code>
`tail -n 2 | head -n 1` then returned line 1 (the body), not the
status code. The gate misreported 200-with-JSON-body responses as
"PUT returned <body>" and failed the canary post-merge at 22:06 UTC.
Fix: write body to a tempfile via `-o "$PUT_TMP"` and use
`-w '%{http_code}'` as the sole stdout. Status code is now
unambiguously the captured value, body is read separately from the
tempfile. No newline-counting heuristic needed.
Verification:
- bash -n clean
- shellcheck clean on the modified block
- Will be exercised by the next continuous-synth-e2e firing
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2756 piped adapter.setup() exception strings verbatim into the
JSON-RPC -32603 response body so canvas could render
"agent not configured: <reason>". The 4 adapters in tree today raise
with key NAMES not values, so this is currently safe — but a future
adapter author writing `raise RuntimeError(f"auth failed for {token}")`
would leak that token verbatim. Issue #2760 flagged the risk; this PR
closes it.
workspace/secret_redactor.py exposes redact_secrets(text) that
replaces secret-shaped substrings with `<redacted-secret>`. Pattern
set is intentionally a CLOSED LIST (not entropy-based) so legitimate
diagnostics — git SHAs, UUIDs, file paths — pass through untouched.
Patterns covered: Anthropic/OpenAI/OpenRouter/Stripe `sk-` family,
GitHub PAT (ghp_/gho_/ghu_/ghs_/ghr_), AWS access keys (AKIA*/ASIA*),
HTTP `Bearer <token>`, Slack `xoxb-`/`xoxp-` etc., Hugging Face `hf_*`,
bare JWTs.
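A closed-list redactor has a simple shape. The sketch below covers an illustrative subset of the patterns named above — the real secret_redactor.py pattern set is broader, and these regexes are assumptions, not the shipped ones:

```python
import re

# Illustrative closed-list redactor. Deliberately NOT entropy-based, so
# legitimate diagnostics (git SHAs, UUIDs, file paths) pass untouched.
_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),            # sk- key family
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}"),       # GitHub PATs
    re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),      # AWS access keys
    re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{16,}"),  # HTTP bearer tokens
    re.compile(r"\bxox[bp]-[A-Za-z0-9-]{10,}"),        # Slack tokens
    re.compile(r"\bhf_[A-Za-z0-9]{20,}"),              # Hugging Face
]

def redact_secrets(text):
    if not text:
        return text                # None/empty pass-through
    for pat in _PATTERNS:
        text = pat.sub("<redacted-secret>", text)
    return text
```

A 40-char hex git SHA matches none of these prefixes, which is the false-positive guarantee an entropy heuristic can't give.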
Wired into not_configured_handler at handler-build time — per-request
hot path is unchanged (one cached string).
Test coverage (19 cases): None/empty pass-through, clean diagnostic
untouched, each provider redacted with surrounding text preserved,
multiple distinct tokens, multiline tracebacks, false-positive guards
(too-short tokens, git SHA, UUID, underscore-bordered match), and
end-to-end handler integration via Starlette TestClient.
Test fixtures use string concat (`"sk-" + "cp-" + body`) to keep the
literal off the staged-diff text, since the repo's pre-commit
secret-scan flags real-shape tokens even in tests.
`secret_redactor` registered in TOP_LEVEL_MODULES (drift gate).
Closes #2760
Pairs with: PR #2756, PR #2775
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2756's contract — card route always mounted regardless of
adapter.setup() outcome — lived inline in main.py's `# pragma: no cover`
boot sequence. A future refactor that re-coupled the two would have
silently bypassed PR #2756 and shipped the original "stuck booting
forever" UX again, with no pytest catching it.
This change extracts route assembly into workspace/boot_routes.py's
build_routes(card, executor, adapter_error) and pins the contract with
6 integration tests using Starlette's TestClient:
- test_card_route_serves_200_when_adapter_ready: happy path
- test_card_route_serves_200_when_adapter_failed: misconfigured boot,
card still 200, skill stubs survive
- test_jsonrpc_returns_503_when_no_executor: full -32603 envelope with
the adapter_error in error.data
- test_jsonrpc_returns_503_with_generic_when_no_error_string: fallback
reason for the rare case main.py reaches this branch without one
- test_card_route_does_not_depend_on_executor: direct PR #2756
regression guard — both branches MUST mount the card route
- test_executor_present_does_not_mount_not_configured_handler: sanity
that a healthy workspace doesn't return -32603 to every request
Conftest stubs extended with a2a.server.routes / request_handlers
classes so the tests work under the existing a2a-mock infra (pattern
matches the AgentCard/AgentSkill stubs added for PR #2765).
main.py now calls build_routes; the inline if/else is gone. Same
production behaviour, cleaner shape, regression-proof.
Heavy a2a-sdk imports inside build_routes() are lazy (deferred to the
executor-only branch) so tests that only exercise the not-configured
path don't pull DefaultRequestHandler / InMemoryTaskStore.
card_helpers + boot_routes registered in TOP_LEVEL_MODULES (build
drift gate would have caught the missing entry on the wheel-publish
smoke).
All 18 related tests pass (test_boot_routes.py: 6, test_card_helpers.py:
6, test_not_configured_handler.py: 6).
Closes #2761
Pairs with: PR #2756 (decouple agent-card from setup),
PR #2765 (defensive isolation of enrichment + transcript)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today's user-visible bug ("PUT /workspaces/<id>/files/config.yaml: 500
… install: cannot create directory '/opt/configs': Permission denied",
fixed in #2769) shipped to production and was caught only when an
operator opened the Canvas Config tab and clicked Save & Restart on
a claude-code workspace. Two compounding root causes:
1. Path-map fall-through: claude-code wasn't in
workspaceFilePathPrefix, so it fell through to the /opt/configs
default — a path the workspace EC2 doesn't have (cloud-init only
creates /configs).
2. Permission: /configs is root-owned, but the SSH-as-ubuntu install
command had no sudo prefix, so the write would have failed with
EACCES even with the right path.
The synth E2E provisions a fresh workspace every cron firing but
never PUTs a file via the Files API. So neither failure mode could
fail the canary.
Add a new step 7c (between terminal-diagnose and A2A) that:
- PUTs a known marker into config.yaml on each provisioned workspace
- GETs it back and asserts the marker is present
- Fails with an actionable message that names the likely class of
regression (path map vs permission) so the next operator doesn't
have to re-discover today's debugging path
The marker includes the run ID so stale state from a prior canary
can't false-pass.
Why round-trip (not just PUT-and-200): a 200 from PUT only proves the
SSH install succeeded somewhere on disk; the GET-back proves the file
landed at the path the runtime actually reads from (i.e., that the
host:/configs → container:/configs bind-mount sees it). Without the
GET, a future bug that writes to a non-bind-mounted host path would
silently no-op from the runtime's POV but pass the gate.
Deferred (separate PR, requires AWS-creds wiring): a parallel gate
that runs aws ec2 describe-instances against the workspace EC2 and
asserts the attached IamInstanceProfile.Arn — it would directly catch
the #466 IAM profile gap class. Punted because it needs
aws-actions/configure-aws-credentials added to continuous-synth-e2e.yml
plus a read-only IAM role provisioned on the AWS side. Tracked as
task #301.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of merged PR #2766 (multi-workspace MCP routing) revealed a
silent gap: PR #2766 added the ``source_workspace_id`` parameter to
``tool_commit_memory`` / ``tool_recall_memory`` / ``tool_chat_history``
/ ``tool_get_workspace_info`` AND advertised it in the registry's input
schemas, but the MCP server's dispatch arms in ``a2a_mcp_server.py``
were never updated to forward ``arguments["source_workspace_id"]`` to
those four tools.
Result: the schema lied. The LLM saw ``source_workspace_id`` as a valid
tool parameter, could correctly populate it from the inbound message's
``arrival_workspace_id``, but the dispatcher dropped it on the floor and
every memory commit / recall / chat-history fetch silently fell back to
the module-level ``WORKSPACE_ID``. The cross-tenant leak that PR #2766
was meant to prevent is NOT prevented for these four tools without this
follow-up.
Why the existing dispatcher tests didn't catch it:
the tests asserted return-value strings (``"working" in result``) but
never asserted what arguments the inner tool was called with. So the
dispatcher could ignore any kwarg and the tests would still pass.
Fix:
1. Wire ``source_workspace_id=arguments.get("source_workspace_id") or None``
into the four dispatch arms, mirroring the pattern already used for
``delegate_task`` / ``delegate_task_async`` / ``check_task_status`` /
``list_peers``.
2. Add five tests in ``test_a2a_mcp_server.py`` that assert the inner
tool was awaited with the exact source_workspace_id kwarg
(``assert_awaited_once_with(..., source_workspace_id="ws-X")``) —
substring-on-result tests can't catch this class of bug.
3. Add a fallback test ensuring single-workspace operators (no
source_workspace_id key) get ``source_workspace_id=None`` — pinning
the documented None contract over an accidental empty-string forward.
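The assertion style in points 2-3 is the whole point of the fix, and it can be sketched with stdlib mocks. The dispatch() helper below is a stand-in for the real dispatch arm, not the shipped code:

```python
import asyncio
from unittest.mock import AsyncMock

# Sketch of the kwarg-flow test style. A substring-on-result assertion
# passes even when a kwarg is dropped; asserting the await arguments is
# what catches this bug class.
async def dispatch(tool, arguments):
    # Mirrors the forwarding pattern: absent or empty key -> explicit None.
    return await tool(
        key=arguments.get("key"),
        source_workspace_id=arguments.get("source_workspace_id") or None,
    )

tool = AsyncMock(return_value="ok")
asyncio.run(dispatch(tool, {"key": "k1", "source_workspace_id": "ws-X"}))
tool.assert_awaited_once_with(key="k1", source_workspace_id="ws-X")

tool.reset_mock()
asyncio.run(dispatch(tool, {"key": "k1"}))   # single-workspace caller
tool.assert_awaited_once_with(key="k1", source_workspace_id=None)
```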
Suite: 1705 passed (was 1700 + 5 new), 3 skipped, 2 xfailed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of the user-visible 500 ("install: cannot create directory
'/opt/configs': Permission denied") on PUT
/workspaces/<id>/files/config.yaml:
1. Path map fall-through. claude-code wasn't in workspaceFilePathPrefix,
so resolveWorkspaceFilePath returned the default `/opt/configs/...`.
That directory doesn't exist on the workspace EC2 — cloud-init in
provisioner/userdata_containerized.go runs `mkdir -p /configs` only.
Even if the SSH write had succeeded at /opt/configs, the docker
container's bind-mount is host:/configs → container:/configs,
so the file would have been invisible to the runtime.
2. /configs ownership. cloud-init runs as root, so /configs is
root-owned. The SSH-as-ubuntu install command can't write into it
without sudo. Hermes wasn't affected because its base path
(/home/ubuntu/.hermes) is ubuntu-owned.
Two-line fix:
- Add `claude-code: /configs` to the runtime → base-path map and flip
the default fall-through from `/opt/configs` to `/configs`. Leave the
pre-existing langgraph/external entries pointing at /opt/configs
pending a migration audit (no user report on those today, and
flipping them would silently relocate any files those runtimes
already wrote).
- Prefix the remote install command with `sudo -n` so the write
succeeds under the standard EC2 ubuntu/passwordless-sudo posture.
`-n` (non-interactive) ensures clean failure if that ever changes,
rather than a hang waiting for a password prompt.
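The resulting runtime → base-path resolution is a map lookup with a flipped default. The real map (workspaceFilePathPrefix) is Go; this Python sketch restates the post-fix behavior with illustrative entries:

```python
# Sketch of the runtime -> base-path map after the fix. langgraph /
# external deliberately stay on /opt/configs pending the migration
# audit described above.
PATH_PREFIX = {
    "claude-code": "/configs",
    "langgraph": "/opt/configs",
    "external": "/opt/configs",
}
DEFAULT_PREFIX = "/configs"   # fall-through flipped from /opt/configs

def resolve_workspace_file_path(runtime, rel_path):
    base = PATH_PREFIX.get(runtime.lower(), DEFAULT_PREFIX)
    return base + "/" + rel_path
```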
Tests:
- TestResolveWorkspaceFilePath_KnownRuntimes adds claude-code +
CLAUDE-CODE coverage and updates the empty/unknown default cases
to expect /configs. The langgraph/external rows stay green
(unchanged values), confirming the scope of the rename.
Verification:
- go build ./... clean
- go test ./internal/handlers/ green
- The user-reported bug
(PUT /workspaces/57fb7043-79a0-4a53-ae4a-efb39deb457f/files/config.yaml
→ 500 EACCES on /opt/configs) is the failure mode this fix addresses
on both axes (path + sudo).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>