molecule-core/workspace/tests
Hongming Wang e87a9c3858 fix(a2a): auto-retry transient transport errors in send_a2a_message
Three different intermittent failures observed during a single
manual-test session — RemoteProtocolError, ReadTimeout, ConnectError —
each surfaced as a "Failed to deliver to <peer>" error chip in the
canvas Agent Comms panel even though the next attempt would have
succeeded (verified by direct probes from the same source workspace
to the same peer). The error message even told the user "Usually a
transient network blip — retry once," but it left the retry to a
human reading the error message.

Auto-retry inside send_a2a_message itself: up to 5 attempts (1
initial + 4 retries) with exponential backoff (1s, 2s, 4s, 8s,
16s-capped), each backoff jittered ±25% to break sync across
siblings. Cumulative wall-clock capped at 600s by
_DELEGATE_TOTAL_BUDGET_S so a string of 5×300s ReadTimeouts can't
make the caller wait 25 minutes — once the deadline elapses, retries
stop even if attempts remain.

Retry only on transport-layer transients:
  - ConnectError / ConnectTimeout (peer's listening socket not ready)
  - RemoteProtocolError (peer closed TCP without writing — observed
    when a peer's prior in-flight Claude SDK session aborted)
  - ReadError / WriteError (network blip on Docker bridge)
  - ReadTimeout (peer wrote no response in 300s)

Application-level errors are NOT retried — they're deterministic and
retrying just wastes wall-clock:
  - HTTP 4xx (peer rejected the request format)
  - JSON parse failures (peer returned garbage)
  - JSON-RPC error in response body (peer's runtime errored cleanly)
  - Programmer-bug exceptions (ValueError, etc.)

8 new tests pin the contract:
  - retry succeeds after 2 RemoteProtocolErrors
  - retry succeeds after 1 ConnectError
  - all 5 attempts fail → returns formatted last-error
  - capped at exactly _DELEGATE_MAX_ATTEMPTS (regression cover for
    "did someone bump the constant accidentally?")
  - JSON-RPC error response NOT retried (1 attempt only)
  - non-httpx exception NOT retried (programmer bugs stay loud)
  - total budget caps the loop even if attempts remain
  - backoff schedule grows exponentially with ±25% jitter

Refactor: extracted _format_a2a_error() so the success and exhausted
paths share one error-formatting routine. _delegate_backoff_seconds()
is a pure function so the schedule is unit-testable without monkey-
patching asyncio.sleep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 13:52:01 -07:00
..
adapters chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
__init__.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
conftest.py test: update a2a.helpers mock to export new_text_message 2026-04-27 05:34:28 -07:00
test_a2a_cli.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_a2a_client.py fix(a2a): auto-retry transient transport errors in send_a2a_message 2026-04-27 13:52:01 -07:00
test_a2a_executor.py feat(workspace): migrate a2a-sdk from 0.3.x to 1.0.0 (KI-009) (#1974) 2026-04-24 04:43:17 +00:00
test_a2a_mcp_server.py feat(tools): tighten send_message_to_user description to forbid pasting URLs in body 2026-04-27 01:13:11 -07:00
test_a2a_tools_impl.py feat(tools): borrow hermes-style discipline — error/summary caps + sharper MCP descriptions 2026-04-26 23:25:54 -07:00
test_a2a_tools_module.py fix(workspace): tag self-originated A2A POSTs with X-Workspace-ID 2026-04-24 19:54:43 -07:00
test_agent.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_agents_md.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_approval.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_audit_ledger.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_audit.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_awareness_client_full.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_compliance.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_config.py refactor(test_config): parametrize the 3 yaml-default cases (simplify on #2085) 2026-04-26 02:03:59 -07:00
test_consolidation.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_coordinator_parent.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_coordinator_routing.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_delegation.py fix(delegation): lazy-refresh QUEUED state from platform; live DELEGATION_* events 2026-04-26 16:05:04 -07:00
test_events.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_executor_helpers.py fix(runtime): replace remaining /app/ legacy paths in agent prompts + docstrings 2026-04-27 11:22:00 -07:00
test_gh_wrapper.sh chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_governance.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_heartbeat_runtime_metadata.py fix(test): drop unused MagicMock import in test_heartbeat_runtime_metadata 2026-04-26 22:58:21 -07:00
test_heartbeat.py fix(workspace): tag self-originated A2A POSTs with X-Workspace-ID 2026-04-24 19:54:43 -07:00
test_hitl.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_jsonrpc_wire_role_format.py fix(runtime): use lowercase wire role for v0.3 JSON-RPC compat layer 2026-04-27 12:40:11 -07:00
test_main_initial_prompt.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_mcp_memory.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_memory.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_molecule_ai_status.py test(runtime): update molecule_ai_status test for renamed error prefix 2026-04-27 11:48:05 -07:00
test_namespaces.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_openclaw_adapter.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_platform_auth.py chore: final open-source cleanup — binary, stale paths, private refs 2026-04-18 00:38:55 -07:00
test_plugins_builtins.py feat(plugin): implement MCPServerAdaptor (issue #847) 2026-04-24 01:42:13 +00:00
test_plugins_registry.py chore: final open-source cleanup — binary, stale paths, private refs 2026-04-18 00:38:55 -07:00
test_plugins.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_pre_stop.py feat(workspace): pre-stop serialization for pause/resume (closes #1386) 2026-04-21 12:40:44 +00:00
test_preflight.py feat(preflight): replace SUPPORTED_RUNTIMES static list with adapter discovery 2026-04-27 00:44:51 -07:00
test_prompt.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_routing_policy.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_runtime_capabilities.py feat(runtime): adapter-declared idle_timeout_override end-to-end 2026-04-26 22:38:01 -07:00
test_runtime_wedge.py chore(workspace): drop claude_sdk_executor — Phase 2 of #87 2026-04-27 00:52:55 -07:00
test_safe_env.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_sandbox.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_secret_redact.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_security_scan.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_skills_loader.py feat(skills): per-skill runtime compatibility (#119, hermes pattern) 2026-04-27 01:57:43 -07:00
test_skills_watcher.py test(skills): make watcher test fakes accept current_runtime kwarg 2026-04-27 02:04:26 -07:00
test_snapshot_scrub.py feat(workspace): snapshot secret scrubber (closes #823) 2026-04-19 00:32:42 -07:00
test_telemetry.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_temporal_workflow.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_transcript_auth.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00
test_watcher.py chore: open-source restructure — rename dirs, remove internal files, scrub secrets 2026-04-18 00:24:44 -07:00