Commit Graph

290 Commits

Author SHA1 Message Date
7f90630f98 fix(tests): correct test_sanitize_agent_error_stderr_and_exc assertion
The test expected the exception class to be hidden when stderr is provided,
but the implementation always uses the exc type as the tag. Fix the
assertion to match actual (correct) behavior: ValueError is in the tag,
stderr is the body. Also add a check that we don't fall back to the
generic "workspace logs" form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 22:59:41 +00:00
a92839e39a ci: verify publish-runtime pipeline end-to-end (internal#327)
Some checks failed
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 17s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 24s
publish-runtime-autobump / pr-validate (pull_request) Successful in 1m4s
CI / Detect changes (pull_request) Successful in 1m12s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m21s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m23s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m23s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m15s
gate-check-v3 / gate-check (pull_request) Successful in 42s
qa-review / approved (pull_request) Failing after 22s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Platform (Go) (pull_request) Successful in 12s
security-review / approved (pull_request) Failing after 24s
CI / Canvas (Next.js) (pull_request) Successful in 12s
sop-tier-check / tier-check (pull_request) Successful in 25s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 16s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 13s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 3m10s
audit-force-merge / audit (pull_request) Successful in 30s
CI / Python Lint & Test (pull_request) Successful in 7m57s
CI / all-required (pull_request) Successful in 5s
Marker file triggers workspace/** path filter on publish-runtime-autobump.yml,
exercising the full runtime publish pipeline after publish-runtime-bot
provisioning + stale-tag resolution.

Acceptance: bump-and-tag green, tag exists, publish-runtime.yml green,
PyPI updated, 9 template repos updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 20:26:55 +00:00
Molecule AI Core Platform Lead
6d5fd6be3e fix(workspace): wrap delegate_task return with sanitize_a2a_result (CWE-117, closes #537)
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 18s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 17s
CI / Detect changes (pull_request) Successful in 49s
qa-review / approved (pull_request) Failing after 19s
security-review / approved (pull_request) Failing after 19s
gate-check-v3 / gate-check (pull_request) Failing after 34s
E2E API Smoke Test / detect-changes (pull_request) Successful in 56s
sop-tier-check / tier-check (pull_request) Successful in 17s
publish-runtime-autobump / pr-validate (pull_request) Successful in 47s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m0s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 47s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 50s
CI / Platform (Go) (pull_request) Successful in 12s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 11s
CI / Canvas (Next.js) (pull_request) Successful in 13s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 20s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 23s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 22s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
audit-force-merge / audit (pull_request) Successful in 18s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m53s
CI / Python Lint & Test (pull_request) Successful in 7m36s
Issue #537: builtin_tools/a2a_tools.py:72 returns peer-sourced text from
delegate_task() without OFFSEC-003 sanitization. Sibling regression to #491 / #492
in a different code path (google-adk delegation surface).

Fix: import sanitize_a2a_result from _sanitize_a2a and wrap all 4 peer-controlled
return sites in delegate_task() — parts[0].text path, empty-parts str(result) path,
fallback str(result) path, and the error message path.

Closes #537.
2026-05-11 19:09:18 +00:00
389613bb95 fix(tests): correct assert in test_sanitize_agent_error_stderr_and_exc
Some checks failed
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 17s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 21s
publish-runtime-autobump / pr-validate (pull_request) Successful in 50s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m3s
sop-tier-check / tier-check (pull_request) Successful in 17s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m3s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m4s
CI / Detect changes (pull_request) Successful in 1m9s
gate-check-v3 / gate-check (pull_request) Failing after 24s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 55s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 11s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 10s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Canvas (Next.js) (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
audit-force-merge / audit (pull_request) Successful in 22s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m41s
CI / Python Lint & Test (pull_request) Successful in 7m25s
The exc class IS the tag when stderr is provided:
  "Agent error (ValueError): rate limit exceeded"

Fixes the incorrect assertion added in PR #517.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 18:21:19 +00:00
6a2a5a6018 fix(workspace): include ~1KB sanitized stderr in A2A error responses
Adds an optional `stderr` parameter to sanitize_agent_error(). When
provided, up to 1 KB of stderr text is included in the A2A error
response after sanitization (API keys / bearer tokens ≥20 chars /
long paths redacted). The existing generic form is preserved when
stderr is absent. Updates both the main a2a_executor and the google-adk
adapter.

Closes: roadmap item — SDK executor stderr swallowing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 18:21:19 +00:00
50319b69f2 fix(workspace): patch enrich_peer_metadata directly in test
All checks were successful
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 44s
E2E API Smoke Test / detect-changes (pull_request) Successful in 47s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 40s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 11s
sop-tier-check / tier-check (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 27s
CI / Platform (Go) (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 28s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 6s
audit-force-merge / audit (pull_request) Successful in 11s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m7s
CI / Python Lint & Test (pull_request) Successful in 6m58s
test_blocks_until_inflight_completes used patch("a2a_client.httpx.Client")
to mock the HTTP call, but httpx.Client is created inside the background
worker thread AFTER the patch context manager exits — the executor thread
was created before the patch, so it uses the original httpx module.

The httpx patch approach fails reliably when running with
test_envelope_enrichment_fetches_on_cache_miss (different httpx patch,
different peer ID, same executor thread pool). Fix: directly replace
enrich_peer_metadata on the module so the replacement is visible to the
background worker regardless of thread creation timing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 17:25:46 +00:00
1380bf0907 fix(a2a): add cache-first check to enrich_peer_metadata_nonblocking
All checks were successful
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 15s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 59s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m1s
CI / Platform (Go) (pull_request) Successful in 8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 1m6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 1m7s
CI / Canvas (Next.js) (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m11s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 20s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m16s
CI / Python Lint & Test (pull_request) Successful in 6m54s
audit-force-merge / audit (pull_request) Successful in 15s
enrich_peer_metadata_nonblocking (a2a_client.py) never checked the
_peer_metadata cache before scheduling a background fetch — it always
returned None and always fired the executor thread pool. The docstring
promised "cache hit: return the cached record" but the code did not
implement it.

Fix: add the same TTL-check that enrich_peer_metadata uses before
scheduling the worker. On a warm cache hit the function now returns
immediately without touching the in-flight set or the executor.

Closes the remaining 5 test failures in test_a2a_mcp_server.py on main
that were not covered by PR #508's test-assertions fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 16:59:54 +00:00
ec20cd04ba fix(workspace): update 3 test assertions for OFFSEC-003 boundary wrapping (PR #477)
Some checks failed
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 7s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 10s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 14s
sop-tier-check / tier-check (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 36s
E2E API Smoke Test / detect-changes (pull_request) Successful in 40s
CI / Platform (Go) (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 44s
audit-force-merge / audit (pull_request) Successful in 15s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 44s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
CI / Canvas (Next.js) (pull_request) Successful in 9s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 46s
CI / Python Lint & Test (pull_request) Failing after 6m44s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m13s
PR #477 added _A2A_BOUNDARY_START/END wrapping to tool_delegate_task's
success path. Three tests in test_delegation_sync_via_polling.py were
still asserting exact raw strings and broke:

  test_flag_off_uses_send_a2a_message_not_polling
  test_queued_sentinel_triggers_polling_fallback
  test_non_queued_send_result_does_not_trigger_fallback

Fix: check for boundary markers + inner content instead of exact match.
Import _A2A_BOUNDARY_START/END from _sanitize_a2a in the affected
test methods.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 16:29:31 +00:00
40ca44aa4d chore(workspace): remove unused imports and f-string prefixes
Some checks failed
audit-force-merge / audit (pull_request) Successful in 5s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
sop-tier-check / tier-check (pull_request) Successful in 7s
CI / Detect changes (pull_request) Successful in 12s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
CI / Python Lint & Test (pull_request) Failing after 6m20s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 12s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 12s
CI / Platform (Go) (pull_request) Successful in 2s
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1m33s
- test_a2a_tools_delegation.py: remove unused `import os`
- test_a2a_tools_impl.py: remove unused `import sys` and `import pytest`
- test_a2a_sanitization.py: remove unused `import pytest` and fix
  two f-strings with no placeholders (extra `f` prefix)

All 27 related tests still pass.
2026-05-11 16:10:17 +00:00
92f3a17a17 test(workspace): add 17-case coverage for enrich_peer_metadata + nonblocking + worker (#502)
Some checks failed
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 24s
E2E API Smoke Test / detect-changes (push) Successful in 25s
Handlers Postgres Integration / detect-changes (push) Successful in 24s
CI / Python Lint & Test (push) Failing after 6m53s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 22s
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Failing after 4m40s
CI / Platform (Go) (push) Successful in 6s
main-red-watchdog / watchdog (push) Successful in 25s
CI / Canvas (Next.js) (push) Successful in 6s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 8s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 5m30s
CI / Shellcheck (E2E scripts) (push) Successful in 3s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 5s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 6s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 9s
Block internal-flavored paths / Block forbidden paths (push) Successful in 10s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 2m10s
publish-runtime-autobump / autobump-and-tag (push) Failing after 46s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 7s
CI / Detect changes (push) Successful in 23s
Co-authored-by: Molecule AI · core-be <core-be@agents.moleculesai.app>
Co-committed-by: Molecule AI · core-be <core-be@agents.moleculesai.app>
2026-05-11 15:56:25 +00:00
7b783aa2ed fix(workspace): poll activity_logs for a2a_proxy delegation results (closes #354) (#501)
Some checks failed
CI / Python Lint & Test (push) Has been cancelled
Block internal-flavored paths / Block forbidden paths (push) Successful in 7s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 6s
CI / Detect changes (push) Successful in 19s
E2E API Smoke Test / detect-changes (push) Successful in 21s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 21s
Handlers Postgres Integration / detect-changes (push) Successful in 20s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 20s
CI / Shellcheck (E2E scripts) (push) Successful in 4s
CI / Platform (Go) (push) Successful in 5s
CI / Canvas (Next.js) (push) Successful in 5s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 5s
publish-runtime-autobump / autobump-and-tag (push) Failing after 41s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 1m47s
Co-authored-by: Molecule AI · core-be <core-be@agents.moleculesai.app>
Co-committed-by: Molecule AI · core-be <core-be@agents.moleculesai.app>
2026-05-11 15:53:05 +00:00
952bfb3ca2 fix(workspace): replace asyncio.get_event_loop().run_until_complete with asyncio.run() (#307) (#498)
Some checks failed
Block internal-flavored paths / Block forbidden paths (push) Successful in 18s
Harness Replays / detect-changes (push) Failing after 18s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (push) Successful in 17s
Harness Replays / Harness Replays (push) Has been skipped
publish-workspace-server-image / build-and-push (push) Failing after 16s
CI / Detect changes (push) Successful in 1m26s
E2E API Smoke Test / detect-changes (push) Successful in 1m17s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 1m19s
Handlers Postgres Integration / detect-changes (push) Successful in 1m12s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 18s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 11s
publish-runtime-autobump / autobump-and-tag (push) Failing after 1m19s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 47s
CI / Canvas (Next.js) (push) Successful in 11s
CI / Shellcheck (E2E scripts) (push) Successful in 8s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E Staging External Runtime / E2E Staging External Runtime (push) Successful in 5m40s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 3m9s
E2E API Smoke Test / E2E API Smoke Test (push) Failing after 5m31s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 6m21s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 19s
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Failing after 23s
CI / Python Lint & Test (push) Failing after 7m38s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 4m36s
CI / Platform (Go) (push) Has been cancelled
Co-authored-by: core-be <core-be@agents.moleculesai.app>
Co-committed-by: core-be <core-be@agents.moleculesai.app>
2026-05-11 15:37:34 +00:00
3d73fb1a72 Merge branch 'main' into sre/fix-test-polling-sanitization
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
sop-tier-check / tier-check (pull_request) Successful in 10s
CI / Detect changes (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 15s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 14s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 15s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 1s
CI / Platform (Go) (pull_request) Successful in 1s
CI / Canvas (Next.js) (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
audit-force-merge / audit (pull_request) Successful in 4s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 1m48s
CI / Python Lint & Test (pull_request) Failing after 6m31s
2026-05-11 15:28:34 +00:00
d7de4afad4 fix: TestPollingPathSanitization regression — 3 bugs, correct assertions
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 15s
CI / Detect changes (pull_request) Successful in 38s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 36s
E2E API Smoke Test / detect-changes (pull_request) Successful in 40s
sop-tier-check / tier-check (pull_request) Successful in 17s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 39s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 38s
CI / Platform (Go) (pull_request) Successful in 7s
CI / Canvas (Next.js) (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 7s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m0s
CI / Python Lint & Test (pull_request) Failing after 6m36s
Three bugs introduced in PR #477:
1. fake_discover(ws_id) missing source_workspace_id kwarg — discover_peer
   signature is (target_id, source_workspace_id=None).
2. Direct attribute assignment (d._delegate_sync_via_polling = ...)
   does not replace module-level 'from module import name' bindings
   resolved at call time; must use monkeypatch.setattr.
3. Assertions checked for [A2A_RESULT_FROM_PEER] but the polling path
   uses _A2A_BOUNDARY_START/END — _A2A_RESULT_FROM_PEER is added by
   send_a2a_message (messaging path), not by _delegate_sync_via_polling.

Additionally: monkeypatch.setenv("DELEGATION_SYNC_VIA_INBOX", "1") forces
the polling code path so the test exercises the correct logic regardless
of environment defaults.

Closes #495.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 15:22:16 +00:00
c4dcfbb089 fix(workspace): default PLATFORM_URL to host.docker.internal in all modules (#475)
Some checks failed
Block internal-flavored paths / Block forbidden paths (push) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 7s
CI / Detect changes (push) Successful in 13s
E2E API Smoke Test / detect-changes (push) Successful in 14s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 14s
Handlers Postgres Integration / detect-changes (push) Successful in 15s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 14s
CI / Platform (Go) (push) Successful in 4s
CI / Canvas (Next.js) (push) Successful in 5s
CI / Shellcheck (E2E scripts) (push) Successful in 4s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 5s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 6s
publish-runtime-autobump / autobump-and-tag (push) Failing after 33s
CI / Python Lint & Test (push) Failing after 1m13s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 1m33s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 4m49s
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-committed-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
2026-05-11 15:17:53 +00:00
635a42745a fix(workspace): OFFSEC-003 — separate sanitize vs. wrap, fix tool_delegate_task (#477)
Some checks failed
Block internal-flavored paths / Block forbidden paths (push) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 7s
CI / Detect changes (push) Successful in 14s
E2E API Smoke Test / detect-changes (push) Successful in 15s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 15s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 16s
Handlers Postgres Integration / detect-changes (push) Successful in 17s
CI / Platform (Go) (push) Successful in 4s
CI / Canvas (Next.js) (push) Successful in 4s
CI / Shellcheck (E2E scripts) (push) Successful in 4s
CI / Canvas Deploy Reminder (push) Has been skipped
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 6s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 4s
publish-runtime-autobump / autobump-and-tag (push) Failing after 37s
CI / Python Lint & Test (push) Failing after 1m15s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Successful in 1m35s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 2s
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Failing after 5s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Failing after 5m17s
ci-required-drift / drift (push) Failing after 51s
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-committed-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
2026-05-11 15:10:25 +00:00
4a7e1bd988 refactor(workspace): extract idle-loop pending-check guard for direct unit-testing
Follows up on #432 (merged). Extracts _check_delegation_results_pending()
from the inline guard in _run_idle_loop() so tests can call the real
production function directly via patch(builtins.open, ...).

Fixes #401: the previous test used a mirror copy of the guard logic,
which risks drifting from the production implementation over time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 10:49:40 +00:00
318e0ad742 fix(workspace): skip idle prompt when delegation results are pending (#381) (#432)
Some checks failed
CI / Detect changes (push) Waiting to run
CI / Platform (Go) (push) Blocked by required conditions
CI / Canvas (Next.js) (push) Blocked by required conditions
CI / Shellcheck (E2E scripts) (push) Blocked by required conditions
CI / Canvas Deploy Reminder (push) Blocked by required conditions
CI / Python Lint & Test (push) Blocked by required conditions
E2E API Smoke Test / E2E API Smoke Test (push) Blocked by required conditions
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Blocked by required conditions
Handlers Postgres Integration / Handlers Postgres Integration (push) Blocked by required conditions
Runtime PR-Built Compatibility / PR-built wheel + import smoke (push) Blocked by required conditions
Block internal-flavored paths / Block forbidden paths (push) Successful in 13s
E2E API Smoke Test / detect-changes (push) Successful in 1m12s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 1m16s
Handlers Postgres Integration / detect-changes (push) Successful in 1m13s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 18s
Runtime PR-Built Compatibility / detect-changes (push) Successful in 1m3s
publish-runtime-autobump / autobump-and-tag (push) Failing after 1m34s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Has started running
Co-authored-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
Co-committed-by: Molecule AI Infra-Runtime-BE <infra-runtime-be@agents.moleculesai.app>
2026-05-11 09:30:32 +00:00
39db2e6d73 fix(workspace): complete OFFSEC-003 fix — promote full sanitization to main
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 16s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 20s
sop-tier-check / tier-check (pull_request) Successful in 20s
CI / Detect changes (pull_request) Successful in 59s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 1m3s
E2E API Smoke Test / detect-changes (pull_request) Successful in 1m8s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 58s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 57s
audit-force-merge / audit (pull_request) Successful in 20s
CI / Platform (Go) (pull_request) Successful in 9s
CI / Canvas (Next.js) (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 9s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 10s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 9s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m29s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Python Lint & Test (pull_request) Failing after 7m40s
Promotes the complete OFFSEC-003 boundary-marker sanitization from staging
to main, including:

- _delegate_sync_via_polling: sanitize response_preview and error strings
  before returning (OFFSEC-003 polling-path fix from PR #417).
- tool_check_task_status JSON endpoint: sanitize summary + response_preview
  in both the task_id filter path and the list path.
- tool_delegate_task non-polling path: preserve main's existing
  sanitize_a2a_result(result) wrapper (staging accidentally removed it).

Closes #418.

Co-Authored-By: Molecule AI · core-be <core-be@agents.moleculesai.app>
2026-05-11 08:51:45 +00:00
93b7d9a88a fix(a2a_tools): add comment + test coverage for string-form error handling in delegate_task
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 8s
sop-tier-check / tier-check (pull_request) Manual override — infra#241 duplicate runner fails immediately. PR only adds comment + tests to a2a_tools.py. core-qa APPROVED.
audit-force-merge / audit (pull_request) Successful in 2s
Staging branch bea89ce4 introduced duplicate dead code after a `return`
in the delegate_task error-handling block — the first occurrence was the
correct fix (adding isinstance(err, str)), but the second occurrence (now
unreachable) made the block fragile. Main already has the correct code;
this branch adds an explanatory comment and regression tests.

The non-tool delegate_task() in a2a_tools.py uses httpx.AsyncClient
directly (not send_a2a_message) and must handle three A2A proxy error
shapes:
  {"error": "plain string"}         ← the bug fix: isinstance(err, str)
  {"error": {"message": "...", ...}} ← pre-existing path
  {"error": {"nested": "object"}}    ← falls through to str(err)

Adds TestDelegateTaskDirect:
  test_string_form_error_returns_error_message  — regression for AttributeError
  test_dict_form_error_returns_error_message    — pre-existing path still works
  test_success_returns_result_text               — happy path still works

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 05:51:48 +00:00
99f3cf7c8f [core-be-agent] fix(#354): wire delegation-results consumer into a2a executor
Close the A2A delegation auto-resume gap.

Root cause: heartbeat.py's _check_delegations already writes completed
delegation rows to DELEGATION_RESULTS_FILE and sends a self-message to
wake the agent. executor_helpers.read_delegation_results() was defined to
atomically consume that file, but a2a_executor._core_execute() never
called it — so delegation results were written but the agent never saw
them.

Fix: call read_delegation_results() at the top of _core_execute() and
prepend the results to the user input context so the agent can act on
them without an explicit check_task_status call. The Temporal durable
workflow path is also covered because it calls _core_execute() directly.

Test: two new cases — delegation results injected when file exists;
user input passed through unchanged when file is empty.

Closes molecule-core#354.
2026-05-11 02:49:32 +00:00
3eb3609b0c test(workspace): add queue_id-absence and push-vs-poll distinction tests
Incorporates valuable extra coverage from fullstack-engineer's PR #336:
- test_push_queued_missing_queue_id_still_parsed: queue_id is optional,
  absence must not break parsing
- test_push_queued_is_distinct_from_poll_queued: both envelope shapes
  parse correctly and independently, with correct delivery_mode values

Also adds push_queued_no_queue_id fixture and regression gate entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 02:47:21 +00:00
0a9b66a3ed fix(workspace): push-mode Queued returns delivery_mode="push" (not silent default "poll")
Bug: a2a_response.py:197 returned Queued(method=method) without passing
delivery_mode, silently defaulting to "poll" for push-mode busy-queue
responses. Callers branching on v.delivery_mode would mis-identify push-mode
responses as poll-mode, causing wrong dispatch logic.

Fix: pass delivery_mode="push" explicitly in the push-mode branch.

Tests: add push_queued_full/notify/no_method fixtures and 4 test cases
asserting delivery_mode="push" for all three envelope shapes. Also add
adversarial {"queued": "yes"} and {"queued": False} → Malformed guards.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 02:47:21 +00:00
a205099652 fix(security): OFFSEC-003 — boundary-marker escape + shared sanitizer
Root cause (from infra-lead PR#7 review id=724):
Sanitization in PR#7 wrapped peer text in [A2A_RESULT_FROM_PEER]
markers, but the markers themselves were not escaped — a malicious
peer could inject "[/A2A_RESULT_FROM_PEER]" to close the trust
boundary early, making subsequent text appear inside the trusted zone.

Fix:
- Create workspace/_sanitize_a2a.py (leaf module, no circular import
  risk) with shared sanitize_a2a_result() + _escape_boundary_markers()
- _escape_boundary_markers() escapes boundary open/close markers in the
  raw peer text before wrapping (primary security control)
- Defense-in-depth: also escapes SYSTEM/OVERRIDE/INSTRUCTIONS/IGNORE
  ALL/YOU ARE NOW patterns (secondary, per PR#7 design intent)
- Update a2a_tools_delegation.py: import from _sanitize_a2a; wrap
  tool_delegate_task return and tool_check_task_status response_preview
- Add 15 tests covering boundary escape, injection patterns, integration
  shapes (workspace/tests/test_a2a_sanitization.py)

Follow-up (non-blocking, noted in PR#7 infra-lead review):
- Deduplicate if a2a_tools.py also wraps (currently handled in
  delegation module only — callers get sanitized output regardless)
- tool_check_task_status: consider sanitizing 'summary' field too

Closes: molecule-ai/molecule-ai-workspace-runtime#7 (wrong-repo PR
that this supersedes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 02:16:09 +00:00
5ecec3f253 Merge pull request 'fix(a2a): reject delegate_task to your own workspace ID (self-deadlock guard)' (#291) from fix/self-delegation-guard into main
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 5s
2026-05-10 10:53:18 +00:00
f58a11d171 Merge pull request 'fix(runtime): MODEL_PROVIDER env is misnamed — accept MODEL/MOLECULE_MODEL, deprecate legacy name' (#280) from fix/model-provider-misnomer into main
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 7s
2026-05-10 10:52:40 +00:00
31ed137b74 fix(a2a): reject delegate_task / delegate_task_async to your own workspace ID
Some checks failed
sop-tier-check / tier-check (pull_request) Failing after 9s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 10s
audit-force-merge / audit (pull_request) Successful in 5s
Self-delegation deadlocks: the sending turn holds `_run_lock`, the receive
handler waits for the same lock, the A2A request 30s-times-out, and the
whole cycle is wasted (the Dev Lead system prompt warns agents off this by
hand — "Never delegate_task to your own workspace ID … there is no peer who
is also you"). The platform/runtime had no guard. Now both
`tool_delegate_task` and `tool_delegate_task_async` early-return an
actionable error when `workspace_id == effective_source` (`source_workspace_id
or _peer_to_source[target] or WORKSPACE_ID`) — before `discover_peer`, so no
network round-trip is wasted either. A genuinely different target (incl.
another of a multi-workspace agent's own registered workspaces) is
unaffected.

Tests: tests/test_a2a_tools_delegation.py — new TestSelfDelegationGuard (4
cases: rejects own ID; rejects when source_workspace_id explicitly == target;
async path rejects; a different target passes the guard through to
discover_peer). `pytest tests/test_a2a_tools_delegation.py` → 12 passed.
(tests/test_a2a_tools_impl.py's TestToolDelegateTask* suite is red on this
PC2/Windows checkout — same on `main` without this change; httpx-mock infra,
not this PR — CI validates on Linux.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 03:46:59 -07:00
fe1b3d9a82 Merge branch 'main' into fix/a2a-tools-and-workflow-cleanup
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 24s
sop-tier-check / tier-check (pull_request) Successful in 25s
audit-force-merge / audit (pull_request) Successful in 17s
2026-05-10 10:12:50 +00:00
e647efe7c5 fix(a2a): handle string error in a2a_tools.py + remove dead staging trigger
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 6s
sop-tier-check / tier-check (pull_request) Successful in 38s
Two-part fix from PR #268 (ported by Integration Tester after PR #268
was closed without merge):

PART 1 — workspace/builtin_tools/a2a_tools.py: Fixes AttributeError
when platform returns a plain string as the error field. Before:
  data["error"].get("message")  ← crashes if error is a string
After:
  isinstance(err, dict) → err.get("message")
  isinstance(err, str)  → use err directly
  otherwise              → str(err)

Also guards result.get("parts") against non-dict result.
Includes fix for issue #279: empty-parts regression where
{"parts": []} returned "(no text)" instead of str(result).

PART 2 — .gitea/workflows/ and .github/workflows/
publish-workspace-server-image.yml: Removed dead "staging" branch
trigger. Trunk-based migration (2026-05-08) removed the staging branch
but the workflow triggers were not updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 09:52:36 +00:00
2ba3af5330 fix(runtime): MODEL_PROVIDER env is misnamed — accept MODEL/MOLECULE_MODEL, deprecate the legacy name
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 17s
sop-tier-check / tier-check (pull_request) Failing after 16s
audit-force-merge / audit (pull_request) Successful in 8s
`molecule_runtime.config.load_config` read the `MODEL_PROVIDER` env var as
the *picked model id* — despite the name, it never carried the provider
(that's `LLM_PROVIDER` / the YAML `provider:` field). So `claude-code`,
`minimax`, and `opus` were all "valid" values for a var named
MODEL_PROVIDER. That footgun bit the dev-team rollout (2026-05-10): the
lead persona env files set `MODEL=claude-opus-4-7` (the intended model)
*and* `MODEL_PROVIDER=claude-code` (mistaking it for "the runtime"); the
loader picked up MODEL_PROVIDER → the claude CLI got `--model claude-code`
→ 404 on every turn, surfaced only as "Command failed with exit code 1"
with empty stderr (the real error is in the stream-json stdout, swallowed
by the SDK's placeholder). The 22 IC workspaces "worked" only because
their `MODEL_PROVIDER=minimax` happened to fuzzy-match on MiniMax's side —
they were actually running `--model minimax`, not `MiniMax-M2.7-highspeed`.

New precedence in `_picked_model_from_env`: `MOLECULE_MODEL` (canonical,
unambiguous) > `MODEL` (the obviously-correct name, already plumbed by
workspace-server's applyRuntimeModelEnv) > `MODEL_PROVIDER` (legacy —
still honored so canvas Save+Restart, the secret-mint path, and existing
persona env files keep working, but if it's the only one set we log a
one-time deprecation pointing at the misnomer) > the YAML `model:` field.
Applied at both the top-level `model` and `runtime_config.model`
resolution sites; semantics are otherwise unchanged. Bonus: workspaces
that already set `MODEL` correctly now get exactly that model instead of
whatever fuzzy-match the upstream did with the provider slug.

Tests: 5 new cases in test_config.py (MODEL beats MODEL_PROVIDER;
MOLECULE_MODEL beats MODEL; MODEL overrides YAML; legacy MODEL_PROVIDER
still resolves + warns; no warning when MODEL is set) + an autouse
fixture that clears MODEL*/resets the warn-latch so resolution is
deterministic regardless of the CI env or test order. `pytest
tests/test_config.py` — 66 passed; the config-importing suites
(test_preflight, test_skills_loader) — 129 passed.

Companion: molecule-dev-department PR #10 fixes the six dev-team lead
`workspace.yaml`s from `model: MiniMax-M2.7` to `model: opus`. Follow-ups
(not in scope here): plumb `MOLECULE_MODEL` from applyRuntimeModelEnv and
the canvas; strip `MODEL`/`MODEL_PROVIDER` from the operator-host persona
env files once the org-template `model:` field is authoritative end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 02:38:14 -07:00
736d9959bc fix(a2a): handle push-mode queue envelope in response parser
All checks were successful
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 46s
sop-tier-check / tier-check (pull_request) Successful in 11s
When a push-mode workspace (one with a public URL) is at capacity, the
platform queues the delegation request and returns:

    {"queued": true, "message": "...", "queue_depth": N, "queue_id": "..."}

The existing SSOT parser (a2a_response.py) only handled the poll-mode
envelope (status=queued + delivery_mode=poll). Push-mode queue
responses fell through to Malformed, causing send_a2a_message to log a
warning and return an error — even though delivery was actually queued
successfully.

Fix: add handling for data.get("queued") is True as a Queued variant
with delivery_mode="push". Checked before the poll-mode envelope so the
two cases are mutually exclusive.

Fixes observed 2026-05-10: platform returning push-mode queue
envelopes to Integration Tester when Release Manager workspace was at
capacity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 09:28:51 +00:00
7ae3ee786f feat(workspace): add static .github-token fallback to git credential helper
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Failing after 4s
Adds a 4th fallback step to the token chain (cache > API > env > static)
so workspace git/gh operations survive a platform outage without requiring
a restart or platform-side fix. Addresses the 2026-05-08 incident where
every workspace lost git+gh auth simultaneously when the
/github-installation-token endpoint returned 500.

Operator places a PAT in ${CONFIGS_DIR:-/configs}/.github-token
(no root needed — /configs is agent-writable). Both _fetch_token
(git credential helper path) and _refresh_gh (gh CLI daemon path)
gain the static fallback so git and gh both recover post-incident.

Pure additive — existing cache > API > env chain is unchanged.
Empty static file is rejected (whitespace-stripped before use).
Static path never writes the cache, so the API recovers transparently
on the next refresh cycle when it comes back online.

Ref: issue #140.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 02:17:22 +00:00
1492b40b38 ci(docker): pin base image digests in all Dockerfiles
Some checks failed
sop-tier-check / tier-check (pull_request) Failing after 28s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 37s
Pins all FROM image tags to exact SHA256 digests for reproducible
builds. Without digest pinning, a registry push of a new image to the
same tag can silently change the layer content between builds — a
supply-chain risk especially for prod-deployed images.

Pinned images (7 Dockerfiles):
- golang:1.25-alpine → sha256:c4ea15b... (workspace-server/Dockerfile,
  Dockerfile.dev, Dockerfile.tenant, tests/harness/cp-stub/Dockerfile)
- alpine:3.20 → sha256:c64c687c... (workspace-server/Dockerfile,
  tests/harness/cp-stub/Dockerfile)
- node:20-alpine → sha256:afdf982... (workspace-server/Dockerfile.tenant)
- node:22-alpine → sha256:cb15fca... (canvas/Dockerfile)
- python:3.11-slim → sha256:e78299e... (workspace/Dockerfile)
- nginx:1.27-alpine → sha256:62223d6... (tests/harness/cf-proxy/Dockerfile)

Note: docker-compose.yml service images (postgres, redis, clickhouse,
litellm, ollama) are intentionally left on major-version tags — those
are runtime-pulled and updated regularly for local-dev ergonomics.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 23:56:39 +00:00
76ac5a88dc [core-be-agent]
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Failing after 4s
fix(tests): clear platform_auth cache before each test

Fixes issue #160: workspace tests fail when MOLECULE_WORKSPACE_TOKEN
is set in the environment.

The bug: platform_auth._cached_token is populated at module import or
first get_token() call and persists for the process lifetime. Tests
that use monkeypatch.delenv("MOLECULE_WORKSPACE_TOKEN") to simulate "no
token in env" were failing because delenv removes the env var but not
the module-level cache — subsequent get_token() calls returned the
stale cached value.

Fix: add a function-scoped autouse fixture in conftest.py that calls
platform_auth.clear_cache() before every test. The import is inside the
fixture to avoid collection-time import issues when platform_auth is
not yet available.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 22:16:11 +00:00
57aedec1a3 fix(tests): isolate token resolution from real .auth_token on disk
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 3s
sop-tier-check / tier-check (pull_request) Failing after 4s
Issue #160: workspace tests fail when MOLECULE_WORKSPACE_TOKEN is set in
the test environment (or when /configs/.auth_token exists on disk, as it
does in a container CI runner).

Root cause:
- test_resolve_token_returns_none_when_missing: monkeypatch.delenv()
  removes the env var, but _resolve_token() falls through to
  configs_dir.resolve()/.auth_token which exists in the container.
- Multi-workspace tests: clear_cache() resets _cached_token, but
  get_token() immediately re-reads /configs/.auth_token and caches
  the real token before the env var is even checked.

Fix:
- test_mcp_doctor: patch configs_dir.resolve() to return a bare tmp_path
  so the disk-file fallback finds nothing.
- Multi-workspace tests: patch platform_auth._token_file() to return a
  non-existent path (via tmp_path) alongside clear_cache(), ensuring
  the env var wins as intended.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 21:55:29 +00:00
252f8d0c47 tech-debt: rename molecule-monorepo-net -> molecule-core-net
Some checks failed
sop-tier-check / tier-check (pull_request) Failing after 4s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
Renames Docker network across all code, configs, scripts, and docs.

Per issue #93: the network was named molecule-monorepo-net as a holdover
from when the repo was called molecule-monorepo. The canonical repo name is
now molecule-core, so the network should be molecule-core-net.

Files changed:
- docker-compose.yml, docker-compose.infra.yml: network definition
- infra/scripts/setup.sh: docker network create
- scripts/nuke-and-rebuild.sh: docker network rm
- workspace-server/internal/provisioner/provisioner.go: DefaultNetwork
- All comments/docs: updated wording

Acceptance: grep -rn 'molecule-monorepo-net' returns zero matches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 20:51:48 +00:00
6193f67bc0 fix(workspace): set git user.name/email from $GITEA_USER at boot (#156)
All checks were successful
Secret scan / Scan diff for credential-shaped strings (push) Successful in 4s
Closes #155.

[FORCE-MERGE AUDIT — §SOP-7]
- Approver: hongming (Gitea PR review APPROVED 2026-05-09T20:27:01Z)
- Chat-go: explicit go in conversation transcript ~20:39 UTC after Hongming clicked approve
- Bypassed: required status checks (all pending forever — likely runner pickup issue, separate from this PR's correctness)
- Audit channel: orchestrator force-merge log + this commit message

Next: workspace runtime image rebuilds via publish-runtime.yml; new workspaces pick up persistent persona git identity.
2026-05-09 20:36:58 +00:00
orchestrator
a4fc04189c fix(workspace): set git user.name/email from $GITEA_USER at boot
Some checks failed
branch-protection drift check / Branch protection drift (pull_request) Successful in 10s
Check merge_group trigger on required workflows / Required workflows have merge_group trigger (pull_request) Successful in 11s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
cascade-list-drift-gate / check (pull_request) Successful in 18s
Check migration collisions / Migration version collision check (pull_request) Successful in 23s
CI / Detect changes (pull_request) Successful in 24s
E2E API Smoke Test / detect-changes (pull_request) Successful in 24s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 24s
pr-guards / disable-auto-merge-on-push (pull_request) Failing after 7s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 22s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 16s
Harness Replays / detect-changes (pull_request) Successful in 25s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 20s
sop-tier-check / tier-check (pull_request) Failing after 21s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 38s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 26s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 41s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m46s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 13s
Harness Replays / Harness Replays (pull_request) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 1m31s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 4m8s
CI / Python Lint & Test (pull_request) Successful in 8m54s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / Canvas (Next.js) (pull_request) Failing after 10m21s
CI / Platform (Go) (pull_request) Successful in 13m8s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 22m59s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 23m26s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 23m31s
audit-force-merge / audit (pull_request) Successful in 4s
Closes #155.

Without this, every commit from a workspace booted via the standard
provisioner lands with an empty `user.name`/`user.email` and Gitea
attributes the work to whichever PAT pushed (typically the founder's
`claude-ceo-assistant`), instead of the persona that actually authored
the commit. That's the same fingerprint pattern that got us suspended
on GitHub 2026-05-06.

GITEA_USER is already injected per-workspace by the provisioner from
workspace_secrets (verified: 8/8 Core-* workspaces have it set,
correctly-named, on operator + local). Boot picks it up unconditionally;
falls through cleanly if unset (e.g. legacy boxes without persona
identity wiring).

Email uses `bot.moleculesai.app` so agent commits are visually distinct
from human-authored commits in Gitea history. The `gitconfig` copy from
`/root/.gitconfig` to `/home/agent/.gitconfig` is now unconditional —
previously it was nested inside the `molecule-git-token-helper.sh`
block, which meant the per-persona identity wouldn't propagate to the
agent user when the helper was unavailable.

Also added an inline note that the github.com credential-helper block
is post-suspension legacy. Full removal tracked under #171; this PR
deliberately doesn't touch it (smaller blast radius).

Tested: docker exec sets the same config in 8 running Core-* workspaces
locally and they pick up correct identity for `git config -l`. Will
reset when those containers restart, hence this PR for the persistent
fix.
2026-05-09 12:52:17 -07:00
documentation-specialist
bd145dcec6 docs(workspace-runtime): migrate github.com refs at source so mirror inherits Gitea links (internal#41)
Some checks failed
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 5s
E2E API Smoke Test / detect-changes (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 6s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
CI / Platform (Go) (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 4s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 8s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 5s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Failing after 12s
CI / Python Lint & Test (pull_request) Failing after 12s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Failing after 11s
CodeQL / Analyze (${{ matrix.language }}) (go) (pull_request) Failing after 41s
CodeQL / Analyze (${{ matrix.language }}) (javascript-typescript) (pull_request) Failing after 1m18s
CodeQL / Analyze (${{ matrix.language }}) (python) (pull_request) Failing after 1m21s
The molecule-ai-workspace-runtime mirror is regenerated on every
runtime-v* tag from this monorepo's workspace/. Per saved memory
reference_runtime_repo_is_mirror_only, mirror-guard rejects direct
PRs to the mirror; edit at source.

Source-side files that propagate to the mirror's published README +
read by users of the in-monorepo workspace-runtime docs:

- scripts/build_runtime_package.py (the README generator):
  * line 281 README_TEMPLATE: 'Shared workspace runtime for Molecule
    AI' link → Gitea
  * line 399 doc-link to workspace-runtime-package.md → Gitea path
    (with /src/branch/main/ shape)
  LEFT AS-IS (per Q3 audit-trail decision):
  * lines 379, 392 historical issue cross-refs (#2936, #2937)

- workspace/build-all.sh:5 — comment block linking to template-*
  repos. Migrated to Gitea path-shape.

- docs/workspace-runtime-package.md:
  * lines 101-108 adapter→repo table (8 templates, all PUBLIC on
    Gitea) — Gitea URLs
  * line 247 starter-repo link — substituted host + added inline
    note that starter doesn't survive the suspension migration
    (recreation pending; cross-link to this issue)
  * line 259 generic git clone command for new templates → Gitea
  * line 289 second starter mention — same handling as 247

Files NOT touched in this PR:
- workspace/ Python source code (.py files) — those use github
  paths in docstrings + a few log strings; fix bundled with the
  cross-repo Go-module-style migration (per #37 Q5 + parked
  follow-ups).
- 'Writing a new adapter' section's `gh repo create` command (line
  254-256) — gh CLI doesn't talk to Gitea (per #45 parked follow-up).
- 'Writing a new adapter' section's ghcr.io image ref (line 276) —
  per #46 ghcr→ECR migration (separate concern).

After this PR merges to staging + a runtime-v* tag is pushed, the
mirror's published README will inherit the Gitea link. Until then
the mirror's README continues to reference github.com/Molecule-AI
(stale but historical-marker-correct since the mirror existed
pre-suspension).

Refs: molecule-ai/internal#41, molecule-ai/internal#37,
molecule-ai/internal#38, molecule-ai/internal#42,
molecule-ai/internal#45, molecule-ai/internal#46
2026-05-07 00:48:04 -07:00
Hongming Wang
166ad20cd7 test(e2e): Phase 3.5 — wheel parser classifies real server response (#2967)
Previously Phase 3 only checked the workspace-server's poll-mode short-circuit
emit shape ({"status":"queued","delivery_mode":"poll","method":"..."}); the
matching client-side classification was tested in isolation against fixture
dicts in test_a2a_response.py.

This phase closes the loop by piping the actual on-the-wire response from a
real workspace-server back through the wheel's a2a_response.parse() and
asserting it classifies as the Queued variant with the right method +
delivery_mode. A regression in EITHER the server emit shape OR the client
parser will now fail this E2E, eliminating the gap that allowed the original
"unexpected response shape" production bug to ship despite green unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:31:45 -07:00
Hongming Wang
8b9f809966 fix(a2a): SSOT response parser — handle poll-mode queued envelope (#2967)
Introduce ``workspace/a2a_response.py`` as the single source of truth for
the wire shapes the workspace-server proxy can return at
``/workspaces/<id>/a2a``:

  * ``Result``    — JSON-RPC success
  * ``Error``     — JSON-RPC error or platform-level error (with
                    restart-in-progress metadata when present)
  * ``Queued``    — poll-mode short-circuit envelope: the platform
                    queued the message into the target's inbox, the
                    target will fetch via /activity poll
  * ``Malformed`` — anything the parser can't classify (logged at
                    WARNING so a future server change is loud)

``send_a2a_message`` (in ``a2a_client.py``) now dispatches via
``a2a_response.parse(data)`` instead of inline ``"result" in data`` /
``"error" in data`` sniffing. The Queued variant returns a new
``_A2A_QUEUED_PREFIX`` sentinel so callers can distinguish "delivered
async, no synchronous reply" from both success-with-text and failure.

reno-stars production data caught two intermittent failures that
both reduced to the same root cause:

  1. **File transfer announce silently failed** — when CEO Ryan PC
     (poll-mode external molecule-mcp) sent the harmi.zip
     announcement to Reno Stars Business Intelligent (also poll-mode
     external), ``send_a2a_message`` saw the platform's poll-queued
     envelope ``{"status":"queued","delivery_mode":"poll","method":"..."}``,
     didn't recognize it as the synthetic delivery-acknowledgement
     it is, and returned ``[A2A_ERROR] unexpected response shape``.
     The agent fell back to a chunk-shipping path; receiver did get
     the file but operator-facing logs showed a failure that didn't
     actually fail.

  2. **Duplicated agent comm** — same bug, inverted direction. d76
     delegated to 67d, send_a2a_message returned the unexpected-shape
     error, delegate_task wrapped it as DELEGATION FAILED, the calling
     agent retried with sharper wording, the recipient saw the same
     request twice and self-reported "二次请求 — 我先不执行".

External molecule-mcp standalone runtimes are inherently poll-mode
(they have no public URL), so every external↔external A2A pair was
hitting this on every send. The pre-fix client only handled JSON-RPC
``result``/``error`` keys and treated the queued envelope (which has
neither) as malformed. RFC #2339 PR 2 added the queued envelope on
the server side; the client never caught up.

When ``send_a2a_message`` returns the ``_A2A_QUEUED_PREFIX`` sentinel,
``tool_delegate_task`` now transparently falls back to
``_delegate_sync_via_polling`` (RFC #2829 PR-5's durable
``/delegate`` + ``/delegations`` polling path, which DOES work for
poll-mode peers because the platform's executeDelegation goroutine
writes to the inbox queue and the result row arrives when the target
picks it up + replies). The agent gets a real synchronous reply
instead of the empty queued sentinel.

  * ``test_a2a_response.py`` — 62 tests, **100% line coverage** on
    the parser (verified via ``coverage run --source=a2a_response``).
    Includes adversarial-input fuzzing across ~25 pathological
    payloads — parser must never raise.
  * ``test_a2a_client.py::TestSendA2AMessagePollMode`` — 4 tests for
    the new Queued/Error wiring in ``send_a2a_message``.
  * ``test_delegation_sync_via_polling.py::TestPollModeAutoFallback``
    — 3 tests for the auto-fallback in ``tool_delegate_task``,
    including negative cases (push-mode reply must NOT trigger
    fallback; genuine error must NOT silently retry).
  * **Verified all new tests FAIL on pre-fix source** by stashing
    a2a_client.py + a2a_tools_delegation.py and re-running — 5
    failures including ImportError for the missing
    ``_A2A_QUEUED_PREFIX``.

Per the operator-debuggability directive:

  * INFO at every Queued classification (expected variant; operator
    sees normal poll-mode-peer queueing in log stream).
  * INFO at the auto-fallback decision in ``tool_delegate_task``
    so a future operator can correlate "send returned queued →
    falling back to polling path" without reading the source.
  * WARNING at every Malformed classification (server contract
    drift; operator MUST see this immediately).
  * Existing transient-retry WARNING preserved.

  * Mirror Go-side typed model in workspace-server. The wire shape
    is documented in ``a2a_response.py``'s module docstring with
    file:line pointers to the canonical emitters; a future PR can
    introduce ``models/a2a_response.go`` without changing wire
    behavior. The fixture corpus in ``test_a2a_response.py`` is
    designed so a one-sided edit breaks CI.
  * ``send_message_to_user`` and ``chat_upload_receive`` use a
    different endpoint (``/notify``) and aren't affected by this
    bug; their parsing stays unchanged.

  * 135 tests pass across ``test_a2a_response.py`` +
    ``test_a2a_client.py`` + ``test_delegation_sync_via_polling.py``
    + ``test_a2a_tools_impl.py``.
  * ``coverage run --source=a2a_response -m pytest`` reports 100%
    line coverage with 0 missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:21:28 -07:00
Hongming Wang
146c0e7c60 fix(a2a-client): recognize poll-mode 'queued' envelope (#2967)
workspace-server's a2a_proxy poll-mode short-circuit returns

    {status: "queued", delivery_mode: "poll", method: <a2a_method>}

when the peer has no URL to dispatch to (poll-mode peers, including
every external molecule-mcp standalone runtime). The bare
send_a2a_message parser only knew about JSON-RPC {result, error}
keys, so this envelope fell through to the "unexpected response shape"
error path. Two production symptoms on the reno-stars tenant traced
to it:

1. File transfer logged as failed when it actually succeeded —
   operator-facing logs showed an A2A_ERROR but the receiving
   workspace did get the chunked file via the agent's fallback path.
2. delegate_task retried after the false failure → peer received
   duplicate delegations → conversation got confused, the second
   peer self-diagnosed in a notify ("⚠️ Peer 二次请求 — 我先不执行").

Add a third branch to the parser, BETWEEN the existing JSON-RPC
{result, error} cases and the catch-all "unexpected" fallback. The
queued envelope is delivery-acknowledged-but-pending-consumption —
not an error — so it returns a clean success string the agent can
render as a normal outcome. The success string includes "queued"
and "poll" so an operator scanning logs sees the routing path
without parsing JSON.

Defensive: the new branch only fires when BOTH status="queued" AND
delivery_mode="poll" are present. A partial envelope (one key
missing) still falls through to the catch-all, so a future server
bug that emits a malformed shape gets surfaced instead of silently
swallowed.

Tests:
- test_poll_queued_envelope_returns_success_string — pins the canonical
  envelope returns a non-error string. Discriminating: verified to FAIL
  on old code (returned [A2A_ERROR] string), PASS on new.
- test_poll_queued_envelope_with_other_method — pins the parser doesn't
  hardcode message/send. Discriminating: also FAILS on old code.
- test_status_queued_without_poll_mode_still_falls_through — pins both
  keys are required (defensive against future server bugs).

12 existing tests in TestSendA2AMessage still pass — no regression.

Scope: hotfix for the bare send_a2a_message path. The full SSOT
typed-A2AResponse refactor (#158-#163, parents under #2967) covers the
broader vocabulary alignment between Go server and Python client. This
PR ends the production symptoms now without preempting that work.
2026-05-05 16:58:48 -07:00
Hongming Wang
2652ea8342 fix(mcp-doctor): heartbeat (idempotent) instead of register (UPSERT)
Self-review caught after #2954 landed: check_register() POSTed to
/registry/register with agent_card.name="doctor-probe". The endpoint
is an UPSERT, so the doctor probe overwrites the workspace's actual
agent_card metadata until the real agent's next register call. An
operator running `molecule-mcp doctor` against a live workspace
would see their canvas briefly display "doctor-probe" as the agent
name — invisible production-disruption.

Switches to POST /registry/heartbeat. heartbeat only updates
last_heartbeat_at (and clears awaiting_agent if needed) — the same
work a normal molecule-mcp boot does every 20s in steady state, so
the doctor's extra heartbeat is indistinguishable from background
traffic.

Function renamed check_register → check_token_auth to match what
it actually does. check_register kept as back-compat alias so any
external test/import still resolves.

Also unified the duplicated token-resolution paths into a single
_resolve_token() returning (value, source_label). Pre-fix:
check_register and _resolve_token_summary read env in parallel
ladders — a future env-var addition would have to touch both.

New tests:
  - test_check_token_auth_uses_heartbeat_endpoint: mocks urlopen,
    asserts the URL ends in /registry/heartbeat AND does NOT
    contain /registry/register. Pins the load-bearing invariant
    so a future refactor can't silently re-route through register.
  - test_resolve_token_returns_value_and_label_for_env: pins the
    consolidated resolver returns both pieces of info from the
    same source-decision.
  - test_resolve_token_returns_none_when_missing: missing-env
    happy path.

Verification:
  - 13/13 tests pass (10 existing + 3 new)
  - Manual stripped-env run still renders 4 FAIL + 2 WARN with
    actionable hints, exit 1.

Refs molecule-core#2934 item 6 (doctor side-effect fix-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 16:11:08 -07:00
Hongming Wang
f01f374072 feat(mcp): add molecule-mcp doctor onboarding diagnostic
Closes #2934 item 6 — the deferred follow-up from Ryan's onboarding-
friction report. Quote: "this single command would have saved me
30 of the 45 minutes."

When push delivery fails or the install half-works, the operator
today has no signal — they hand-grep the Claude Code binary or
chase the `from versions: none` red herring. Doctor renders six
checks in one screen with concrete next-step suggestions:

  1. Python version    >=3.11? (wheel's pin)
  2. Wheel install     molecule-ai-workspace-runtime importable +
                        version surfaced
  3. PATH for binary   `molecule-mcp` resolves on PATH; if not,
                        prints the resolved user-site bin dir to
                        add (or recommends pipx)
  4. Env vars          PLATFORM_URL + WORKSPACE_ID + token (env or
                        *_FILE or .auth_token)
  5. Platform reach    GET ${PLATFORM_URL}/healthz returns 2xx
  6. Registry register POST /registry/register with the resolved
                        token returns 2xx — end-to-end auth check

Each line: `[OK|WARN|FAIL] <label>: <status>` plus a `next:` hint
when not OK. ANSI colors auto-disable on non-TTY / NO_COLOR.

Exit code: 0 on all-OK or only-WARN, 1 on any FAIL — scriptable
from CI install-checks.

## Files

`workspace/mcp_doctor.py`  (new) — six check functions + `run()`
                                   entry point. Uses urllib (stdlib)
                                   so doctor works even on a partial
                                   install where `requests` is missing.

`workspace/mcp_cli.py`             Subcommand dispatch:
                                     molecule-mcp doctor   → mcp_doctor.run()
                                     molecule-mcp --help   → usage banner
                                     molecule-mcp          → server (unchanged)

`workspace/tests/test_mcp_doctor.py`  (new) — 10 tests covering each
                                       check's pass/fail/skip path
                                       plus the end-to-end exit-code
                                       contract on a stripped env.

`scripts/build_runtime_package.py`    Adds `mcp_doctor` to
                                       TOP_LEVEL_MODULES so the
                                       wheel ships the new module.

## Out of scope (deferred follow-ups)
- Claude Code-specific checks (parse ~/.claude.json, verify each
  MCP entry is plugin-sourced + dev-channels flag set). That's a
  separate Claude-Code-shaped doctor; lives in the channel plugin.
- Automated remediation. Doctor is diagnostic — tells the operator
  what's wrong + how to fix it, doesn't apply changes.

## Verification
  - python -m pytest tests/test_mcp_doctor.py -v   → 10/10 PASS
  - python -m pytest tests/test_mcp_cli*.py        → 67/67 PASS
    (existing CLI suite still green; subcommand dispatch added
    before env-validation, doesn't disturb the server-boot path)
  - manual: `molecule-mcp doctor` on a stripped env renders 4 FAIL
    + 2 WARN + exit code 1, with each `next:` hint actionable

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:44:36 -07:00
Hongming Wang
3b7ed9cf53
Merge pull request #2946 from Molecule-AI/fix/onboarding-followup-2934
mcp: surface specific TOKEN_FILE errors + link follow-ups (#2934)
2026-05-05 22:19:21 +00:00
Hongming Wang
da9061c131 mcp: surface specific TOKEN_FILE errors + link follow-ups (#2934)
Self-review of #2935 turned up two real defects:

1. Stale README issue references — the build_runtime_package.py
   README template said "(issue #2934 follow-up)" twice, but the
   marketplace-plugin and `doctor` items now have dedicated tracking
   issues. Updated to point at #2936 and #2937 respectively.

2. Silent fallthrough on broken MOLECULE_WORKSPACE_TOKEN_FILE — when
   an operator EXPLICITLY pointed TOKEN_FILE at a path that didn't
   exist / wasn't readable / was blank / contained internal whitespace,
   the resolver silently returned the generic "set one of these three
   vars" error. That's exactly the silent failure mode #2934 flagged
   ("a new user has no chance"). Refactor `_read_token_from_file_env`
   to return `(token, error)`; surface the SPECIFIC failure when the
   operator's intent was clearly the file path. Skip the CONFIGS_DIR
   fallback in that case so the operator's config bug isn't masked
   by a different source happening to work.

Adds 2 renames + 2 new tests in test_mcp_cli_split.py:
  - test_missing_file_returns_specific_error (asserts "does not exist")
  - test_empty_file_returns_specific_error (asserts "is empty")
  - test_multi_line_file_rejected (asserts "internal whitespace")
  - test_token_file_error_skips_configs_dir_fallback (asserts a valid
    CONFIGS_DIR/.auth_token does NOT silently rescue a broken
    TOKEN_FILE)

All 81 mcp_cli + mcp_cli_multi_workspace + mcp_cli_split tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:07:15 -07:00
Hongming Wang
c4807a930d
Merge pull request #2940 from Molecule-AI/refactor/a2a-tools-inbox-extract-rfc2873-iter4e
refactor(workspace): extract inbox tools from a2a_tools.py (RFC #2873 iter 4e)
2026-05-05 21:58:32 +00:00
Hongming Wang
475da5b64c refactor(workspace): extract inbox tools from a2a_tools.py (RFC #2873 iter 4e)
Continues the OSS-shape refactor. After iters 4a-4d (rbac, delegation,
memory, messaging) the only behavior left in ``a2a_tools.py`` was
``report_activity`` plus three thin inbox-tool wrappers and the
``_enrich_inbound_for_agent`` helper. This iter extracts the inbox
slice to ``a2a_tools_inbox.py`` so the kitchen-sink module shrinks
from 280 LOC to ~165 LOC of imports + report_activity + back-compat
re-export blocks.

Extracted symbols:
  - ``_INBOX_NOT_ENABLED_MSG`` (sentinel)
  - ``_enrich_inbound_for_agent`` (poll-path peer enrichment helper)
  - ``tool_inbox_peek``
  - ``tool_inbox_pop``
  - ``tool_wait_for_message``

Re-exports (`from a2a_tools_inbox import …`) preserve the public
``a2a_tools.tool_inbox_*`` surface so existing tests + call sites
continue to resolve unchanged.

New tests in test_a2a_tools_inbox_split.py:
  1. **Drift gate (5)** — every previously-public symbol on a2a_tools
     is the EXACT same object as a2a_tools_inbox.foo (`is`, not `==`),
     catches a future "wrap with logging" refactor that silently loses
     existing test coverage.
  2. **Import contract (1)** — a2a_tools_inbox does NOT eagerly import
     a2a_tools at module load. Pins the layered architecture: the
     extracted slice depends on ``inbox`` + a lazy ``a2a_client``
     import, never on the kitchen-sink that re-exports it.
  3. **_enrich_inbound_for_agent branches (5)** — peer_id-empty
     (canvas_user) returns dict unchanged; missing peer_id key same;
     a2a_client unavailable (test harness, partial install) degrades
     gracefully with a bare envelope; registry hit populates
     peer_name + peer_role + agent_card_url; registry miss still
     surfaces agent_card_url (constructable from peer_id alone).

The full timeout-clamp / validation / JSON-shape behavior matrix for
the three wrappers stays in test_a2a_tools_inbox_wrappers.py — those
tests pass identically against both the alias and the underlying impl.

Wiring updates:
  - ``scripts/build_runtime_package.py``: add ``a2a_tools_inbox`` to
    ``TOP_LEVEL_MODULES`` so it ships in the runtime wheel and the
    drift gate doesn't fail the next publish.
  - ``.github/workflows/ci.yml``: add ``a2a_tools_inbox.py`` to
    ``CRITICAL_FILES`` so the 75% MCP/inbox/auth per-file floor
    applies — this is now where the inbox-delivery code actually
    lives.
2026-05-05 14:28:58 -07:00
Hongming Wang
1ad107cc15
Merge pull request #2935 from Molecule-AI/fix/onboarding-friction-2934
fix(onboarding): address Claude Code MCP onboarding friction (#2934)
2026-05-05 21:25:57 +00:00
Hongming Wang
01deeb36cf fix(onboarding): address Claude Code MCP onboarding friction (#2934)
Ryan's bug report (#2934) walked through ~45 min of debugging a stock
external-runtime install. This PR fixes the four items he flagged that
have a small surface, and stubs out the larger ones for follow-up.

Fixed in this PR
================

#1 — Python floor disclosure (README in publish bundle)
  Add an explicit "Requires Python ≥3.11" section that calls out the
  cryptic "Could not find a version that satisfies the requirement"
  failure mode; recommend `pipx install` over `pip install` so the
  binary lands on PATH automatically; show the explicit `pip install
  --user` alternative with the PATH caveat.

#3 — MOLECULE_WORKSPACE_TOKEN_FILE support (mcp_workspace_resolver.py)
  Add a third resolution step between the inline env var and the
  in-container CONFIGS_DIR fallback. Operators can write the bearer to
  a 0600 file (e.g. ~/.config/molecule/token) and point
  MOLECULE_WORKSPACE_TOKEN_FILE at it, keeping the secret out of
  ~/.zsh_history and out of plaintext in MCP-host configs like
  ~/.claude.json. Inline TOKEN still wins on conflict so rotation flows
  are predictable. README documents the safer option as the
  recommended path. 6 new tests pin every leg (file resolves, inline
  wins, missing/empty file falls through, blank env unset-equivalent,
  help text advertises it).

#4 — Push delivery 3-condition gating (README in publish bundle)
  Document that real-time push on Claude Code requires (a) the server
  to declare experimental.claude/channel (we do), (b) the server to be
  marketplace-plugin-sourced (operators must scaffold their own until
  the official marketplace lands — see #2934 follow-up), and (c) the
  --dangerously-load-development-channels flag on the claude
  invocation. Until any of the three is in place, delivery silently
  falls back to poll mode with no diagnostic. The README now says all
  of this explicitly so a new operator doesn't grep the binary for
  channel_enable to figure it out.

#8 — serverInfo.name mismatch (a2a_mcp_server.py)
  The server reported `serverInfo.name = "a2a-delegation"` while
  operators register it as `molecule` (the name in `claude mcp add
  molecule …`). Harmless on tool routing today but matters for any
  future Claude Code allowlist that gates push by hardcoded server
  name. Renamed to "molecule" with an inline comment explaining the
  invariant.

Deferred (separate issues to track)
===================================

#2 — covered transitively by #1's pipx recommendation; no separate fix.
#5 — `moleculesai/claude-code-plugin` marketplace repo (substantial new
     repo work; the README references it as a documented follow-up).
#6 — `molecule-mcp doctor` subcommand (substantial new CLI surface;
     mentioned in the README's push-vs-poll section as the planned
     diagnostic for silent push fallback).
#7 — `--dangerously-load-development-channels` rename — not in our
     control; that's Claude Code's flag.

Tests
=====
164/164 mcp_cli + a2a_mcp_server tests pass locally
(WORKSPACE_ID=00000000-0000-0000-0000-000000000001 pytest …) including
6 new TestTokenFileEnv cases. Wheel builds successfully via
scripts/build_runtime_package.py with the new README markers verified
in the output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:19:09 -07:00