Compare commits

...

8 Commits

Author SHA1 Message Date
f08c9de7a0 fix(workspace): replace _run() with @pytest.mark.asyncio in test_a2a_tools_inbox_wrappers
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 2s
sop-tier-check / tier-check (pull_request) Successful in 33s
audit-force-merge / audit (pull_request) Has been skipped
Replace the `_run(coro) = asyncio.get_event_loop().run_until_complete(coro)`
pattern with proper `@pytest.mark.asyncio` async test methods.

Root cause (issue #307): `_run()` creates a nested event loop that bypasses
pytest-asyncio's lifecycle management. With `asyncio_mode = auto`, the
auto-awaitification + nested loop corrupts event loop state when conftest.py
fixtures (from test_a2a_executor.py etc.) run first — coroutines are never
executed and RuntimeWarnings fire. The tests pass in isolation (no prior
fixtures) but fail in the full suite (14/14 failures, 1959 passed, exit code 1).

Fix: convert all 14 test methods to `async def` with `@pytest.mark.asyncio`
decorators and `await` calls. pytest-asyncio now owns the event loop lifecycle
for these tests, eliminating the nested-loop corruption entirely.

Note: #272 (sqlalchemy missing) and #271 (MODEL_PROVIDER test isolation) were
already resolved at time of investigation — sqlalchemy is in requirements.txt
and _clean_model_env fixture already handles MODEL_PROVIDER cleanup.
2026-05-10 15:25:45 +00:00
a3c9f0b717 Merge pull request 'ci: pin GitHub Actions by SHA instead of mutable tags (staging sync)' (#276) from ci/staging-sha-pinning into staging
Some checks failed
Secret scan / Scan diff for credential-shaped strings (push) Failing after 2s
2026-05-10 14:03:05 +00:00
de9f46ea30 Merge pull request '[release-blocker] fix(ci): retry git clone in clone-manifest.sh (publish-workspace-server-image OOM flake)' (#298) from fix/publish-workspace-server-ci-clone-manifest-retry into staging
Some checks are pending
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
2026-05-10 12:44:35 +00:00
7ff5622a42 [infra-lead-agent] fix(ci): retry git clone in clone-manifest.sh (publish-workspace-server-image flake)
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 1s
sop-tier-check / tier-check (pull_request) Failing after 1s
audit-force-merge / audit (pull_request) Failing after 2s
The publish-workspace-server-image / build-and-push job clones the full
manifest (~36 repos) serially in the "Pre-clone manifest deps" step on a
memory-constrained Gitea Actions runner. Under host memory pressure the
OOM killer SIGKILLs git-remote-https mid-clone:

  cloning .../molecule-ai-plugin-molecule-skill-code-review.git ...
  error: git-remote-https died of signal 9
  fatal: the remote end hung up unexpectedly
    Failure - Main Pre-clone manifest deps
  exitcode '128': failure

Observed in run 4622 (2026-05-10, staging HEAD b5d2ab88) — died on the
14th of 36 clones, which red-lights CI and wedges staging→main.

Wrap each `git clone` in clone-manifest.sh with bounded retry + backoff
(3 attempts, 3s/6s), wiping any partial checkout between tries. A single
transient SIGKILL / network blip no longer fails the whole tenant image
rebuild. Benefits every caller of the script (publish-workspace-server-image,
harness-replays, Dockerfile builds, local quickstart).

This is a mitigation; the durable fix is more runner RAM/swap on the
operator host — tracked separately with Infra-SRE.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:58:09 +00:00
bea89ce4e9 fix(a2a): handle string-form errors in delegate_task
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 14s
sop-tier-check / tier-check (pull_request) Failing after 7s
audit-force-merge / audit (pull_request) Failing after 5s
The A2A proxy can return three error shapes:
  {"error": "plain string"}
  {"error": {"message": "...", "code": ...}}
  {"error": {"message": {"nested": "object"}}}   ← value at .message is a string

builtin_tools/a2a_tools.py:72 called data["error"].get("message")
without guarding against error being a string, which raised:
  AttributeError: 'str' object has no attribute 'get'

This broke every delegation attempt through the legacy a2a_tools path
(the LangChain-wrapped version used by adapter templates). The
SSOT parser a2a_response.py already handled string errors; the
legacy inline sniffer in a2a_tools.py did not.

Fix: branch on isinstance(err, dict/str/other) before calling .get().

Also update both publish-workflow files to remove the dead
`staging` branch trigger — trunk-based migration (PR #109,
2026-05-08) removed the staging branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:39:32 +00:00
14f05b5a64 chore: restore manifest.json after trigger test 2026-05-10 11:38:34 +00:00
7caee806df chore: trigger publish workflow [Integration Tester 2026-05-10T08:45Z] 2026-05-10 11:38:34 +00:00
a914f675a4 chore: staging trigger commit from Integration Tester 2026-05-10 11:38:34 +00:00
6 changed files with 102 additions and 43 deletions

View File

@ -32,11 +32,9 @@ on:
- '.gitea/workflows/publish-workspace-server-image.yml'
workflow_dispatch:
# Serialize per-branch so two rapid staging pushes don't race the same
# :staging-latest tag retag. Allow staging and main to run in parallel
# (different GITHUB_REF → different concurrency group) since they
# produce different :staging-<sha> tags and last-write-wins on
# :staging-latest is acceptable across branches.
# Serialize per-branch so two rapid main pushes don't race the same
# :staging-latest tag retag. Allow parallel runs as they produce
# different :staging-<sha> tags and last-write-wins on :staging-latest.
#
# cancel-in-progress: false → in-flight builds finish; the next push's
# build queues. This avoids a partially-pushed image.

1
.staging-trigger Normal file
View File

@ -0,0 +1 @@
staging trigger

View File

@ -44,3 +44,4 @@
{"name": "mock-bigorg", "repo": "molecule-ai/molecule-ai-org-template-mock-bigorg", "ref": "main"}
]
}
// Triggered by Integration Tester at 2026-05-10T08:52Z

View File

@ -37,6 +37,50 @@ PLUGINS_DIR="${4:?Missing plugins dir}"
EXPECTED=0
CLONED=0
# clone_one_with_retry — clone a single repo, retrying on transient failure.
#
# Why: the publish-workspace-server-image (and harness-replays) CI jobs
# clone the full manifest (~36 repos) serially on a memory-constrained
# Gitea Actions runner. Under host memory pressure the OOM killer
# occasionally SIGKILLs git-remote-https mid-clone:
#
# error: git-remote-https died of signal 9
# fatal: the remote end hung up unexpectedly
#
# (observed in publish-workspace-server-image run 4622 on 2026-05-10 — the
# job died on the 14th of 36 clones, which wedged staging→main). One
# transient SIGKILL / network blip would otherwise fail the whole tenant
# image rebuild. Retrying after a short backoff lets the pressure subside.
# The durable fix is more runner RAM/swap (tracked with Infra-SRE); this
# just stops a single flake from being release-blocking.
#
# Args: <target_dir> <name> <clone_url> <display_url> <ref>
clone_one_with_retry() {
local tdir="$1" name="$2" url="$3" display="$4" ref="$5"
local attempt=1 max_attempts=3 backoff
while : ; do
# A killed attempt can leave a partial directory behind; git clone
# refuses a non-empty target, so wipe it before each try.
rm -rf "$tdir/$name"
if [ "$ref" = "main" ]; then
if git clone --depth=1 -q "$url" "$tdir/$name"; then return 0; fi
else
if git clone --depth=1 -q --branch "$ref" "$url" "$tdir/$name"; then return 0; fi
fi
if [ "$attempt" -ge "$max_attempts" ]; then
echo "::error::clone failed after ${max_attempts} attempts: ${display}" >&2
return 1
fi
backoff=$((attempt * 3)) # 3s, then 6s
echo " ⚠ clone attempt ${attempt}/${max_attempts} failed for ${display} — retrying in ${backoff}s" >&2
sleep "$backoff"
attempt=$((attempt + 1))
done
}
clone_category() {
local category="$1"
local target_dir="$2"
@ -82,11 +126,7 @@ clone_category() {
fi
echo " cloning $display_url -> $target_dir/$name (ref=$ref)"
if [ "$ref" = "main" ]; then
git clone --depth=1 -q "$clone_url" "$target_dir/$name"
else
git clone --depth=1 -q --branch "$ref" "$clone_url" "$target_dir/$name"
fi
clone_one_with_retry "$target_dir" "$name" "$clone_url" "$display_url" "$ref"
CLONED=$((CLONED + 1))
i=$((i + 1))
done

View File

@ -77,6 +77,16 @@ async def delegate_task(workspace_id: str, task: str) -> str:
return str(result) if isinstance(result, str) else "(no text)"
elif "error" in data:
err = data["error"]
# Handle both string-form errors ("error": "some string")
# and object-form errors ("error": {"message": "...", "code": ...}).
msg = ""
if isinstance(err, dict):
msg = err.get("message", "")
elif isinstance(err, str):
msg = err
else:
msg = str(err)
return f"Error: {msg}"
msg = ""
if isinstance(err, dict):
msg = err.get("message", "")

View File

@ -15,7 +15,6 @@ The wrappers are ~40 LOC of glue. The full delivery behavior
"""
from __future__ import annotations
import asyncio
import json
from unittest.mock import MagicMock, patch
@ -29,24 +28,22 @@ def _require_workspace_id(monkeypatch):
yield
def _run(coro):
return asyncio.get_event_loop().run_until_complete(coro)
# ---------------------------------------------------------------------------
# tool_inbox_peek
# ---------------------------------------------------------------------------
class TestToolInboxPeek:
def test_returns_not_enabled_when_state_none(self):
@pytest.mark.asyncio
async def test_returns_not_enabled_when_state_none(self):
import a2a_tools
with patch("inbox.get_state", return_value=None):
out = _run(a2a_tools.tool_inbox_peek())
out = await a2a_tools.tool_inbox_peek()
assert "not enabled" in out
def test_returns_json_array_of_messages(self):
@pytest.mark.asyncio
async def test_returns_json_array_of_messages(self):
import a2a_tools
msg1 = MagicMock()
@ -58,20 +55,21 @@ class TestToolInboxPeek:
fake_state.peek.return_value = [msg1, msg2]
with patch("inbox.get_state", return_value=fake_state):
out = _run(a2a_tools.tool_inbox_peek(limit=5))
out = await a2a_tools.tool_inbox_peek(limit=5)
# peek limit is forwarded
fake_state.peek.assert_called_once_with(limit=5)
parsed = json.loads(out)
assert len(parsed) == 2
assert parsed[0]["activity_id"] == "a1"
def test_non_int_limit_falls_back_to_10(self):
@pytest.mark.asyncio
async def test_non_int_limit_falls_back_to_10(self):
import a2a_tools
fake_state = MagicMock()
fake_state.peek.return_value = []
with patch("inbox.get_state", return_value=fake_state):
_run(a2a_tools.tool_inbox_peek(limit="garbage")) # type: ignore[arg-type]
await a2a_tools.tool_inbox_peek(limit="garbage") # type: ignore[arg-type]
fake_state.peek.assert_called_once_with(limit=10)
@ -81,49 +79,54 @@ class TestToolInboxPeek:
class TestToolInboxPop:
def test_returns_not_enabled_when_state_none(self):
@pytest.mark.asyncio
async def test_returns_not_enabled_when_state_none(self):
import a2a_tools
with patch("inbox.get_state", return_value=None):
out = _run(a2a_tools.tool_inbox_pop("act-1"))
out = await a2a_tools.tool_inbox_pop("act-1")
assert "not enabled" in out
def test_rejects_empty_activity_id(self):
@pytest.mark.asyncio
async def test_rejects_empty_activity_id(self):
import a2a_tools
fake_state = MagicMock()
with patch("inbox.get_state", return_value=fake_state):
out = _run(a2a_tools.tool_inbox_pop(""))
out = await a2a_tools.tool_inbox_pop("")
assert "activity_id is required" in out
fake_state.pop.assert_not_called()
def test_rejects_non_str_activity_id(self):
@pytest.mark.asyncio
async def test_rejects_non_str_activity_id(self):
import a2a_tools
fake_state = MagicMock()
with patch("inbox.get_state", return_value=fake_state):
out = _run(a2a_tools.tool_inbox_pop(123)) # type: ignore[arg-type]
out = await a2a_tools.tool_inbox_pop(123) # type: ignore[arg-type]
assert "activity_id is required" in out
fake_state.pop.assert_not_called()
def test_returns_removed_true_when_popped(self):
@pytest.mark.asyncio
async def test_returns_removed_true_when_popped(self):
import a2a_tools
fake_state = MagicMock()
fake_state.pop.return_value = MagicMock() # truthy = something was removed
with patch("inbox.get_state", return_value=fake_state):
out = _run(a2a_tools.tool_inbox_pop("act-7"))
out = await a2a_tools.tool_inbox_pop("act-7")
parsed = json.loads(out)
assert parsed == {"removed": True, "activity_id": "act-7"}
fake_state.pop.assert_called_once_with("act-7")
def test_returns_removed_false_when_unknown(self):
@pytest.mark.asyncio
async def test_returns_removed_false_when_unknown(self):
import a2a_tools
fake_state = MagicMock()
fake_state.pop.return_value = None
with patch("inbox.get_state", return_value=fake_state):
out = _run(a2a_tools.tool_inbox_pop("act-missing"))
out = await a2a_tools.tool_inbox_pop("act-missing")
parsed = json.loads(out)
assert parsed == {"removed": False, "activity_id": "act-missing"}
@ -134,25 +137,28 @@ class TestToolInboxPop:
class TestToolWaitForMessage:
def test_returns_not_enabled_when_state_none(self):
@pytest.mark.asyncio
async def test_returns_not_enabled_when_state_none(self):
import a2a_tools
with patch("inbox.get_state", return_value=None):
out = _run(a2a_tools.tool_wait_for_message(timeout_secs=1.0))
out = await a2a_tools.tool_wait_for_message(timeout_secs=1.0)
assert "not enabled" in out
def test_timeout_payload_when_no_message(self):
@pytest.mark.asyncio
async def test_timeout_payload_when_no_message(self):
import a2a_tools
fake_state = MagicMock()
fake_state.wait.return_value = None
with patch("inbox.get_state", return_value=fake_state):
out = _run(a2a_tools.tool_wait_for_message(timeout_secs=0.1))
out = await a2a_tools.tool_wait_for_message(timeout_secs=0.1)
parsed = json.loads(out)
assert parsed["timeout"] is True
assert parsed["timeout_secs"] == 0.1
def test_returns_message_when_delivered(self):
@pytest.mark.asyncio
async def test_returns_message_when_delivered(self):
import a2a_tools
msg = MagicMock()
@ -160,37 +166,40 @@ class TestToolWaitForMessage:
fake_state = MagicMock()
fake_state.wait.return_value = msg
with patch("inbox.get_state", return_value=fake_state):
out = _run(a2a_tools.tool_wait_for_message(timeout_secs=2.0))
out = await a2a_tools.tool_wait_for_message(timeout_secs=2.0)
parsed = json.loads(out)
assert parsed["activity_id"] == "a-9"
def test_timeout_clamped_to_300(self):
@pytest.mark.asyncio
async def test_timeout_clamped_to_300(self):
import a2a_tools
fake_state = MagicMock()
fake_state.wait.return_value = None
with patch("inbox.get_state", return_value=fake_state):
_run(a2a_tools.tool_wait_for_message(timeout_secs=99999))
await a2a_tools.tool_wait_for_message(timeout_secs=99999)
# Whatever wait was called with, it must not exceed 300
passed = fake_state.wait.call_args.args[0]
assert passed == 300.0
def test_timeout_clamped_to_zero_floor(self):
@pytest.mark.asyncio
async def test_timeout_clamped_to_zero_floor(self):
import a2a_tools
fake_state = MagicMock()
fake_state.wait.return_value = None
with patch("inbox.get_state", return_value=fake_state):
_run(a2a_tools.tool_wait_for_message(timeout_secs=-5))
await a2a_tools.tool_wait_for_message(timeout_secs=-5)
passed = fake_state.wait.call_args.args[0]
assert passed == 0.0
def test_non_numeric_timeout_falls_back_to_60(self):
@pytest.mark.asyncio
async def test_non_numeric_timeout_falls_back_to_60(self):
import a2a_tools
fake_state = MagicMock()
fake_state.wait.return_value = None
with patch("inbox.get_state", return_value=fake_state):
_run(a2a_tools.tool_wait_for_message(timeout_secs="garbage")) # type: ignore[arg-type]
await a2a_tools.tool_wait_for_message(timeout_secs="garbage") # type: ignore[arg-type]
passed = fake_state.wait.call_args.args[0]
assert passed == 60.0