Files
molecule-core/tests/e2e/test_a2a_e2e.sh
hongming-ceo-delegated 3269e93216
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 11s
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 22s
CI / Python Lint & Test (pull_request) Successful in 16s
CI / Detect changes (pull_request) Successful in 38s
E2E API Smoke Test / detect-changes (pull_request) Successful in 22s
E2E Chat / detect-changes (pull_request) Successful in 13s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
Harness Replays / detect-changes (pull_request) Successful in 5s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 12s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 3s
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 34s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Failing after 1m7s
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m14s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m11s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 8s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m24s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m3s
gate-check-v3 / gate-check (pull_request) Successful in 5s
qa-review / approved (pull_request) Failing after 8s
security-review / approved (pull_request) Failing after 12s
verify-providers-gen / Regenerate providers artifact and fail on drift (pull_request) Successful in 27s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 6s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m25s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Failing after 4m31s
CI / Canvas (Next.js) (pull_request) Successful in 6s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 17s
E2E Chat / E2E Chat (pull_request) Successful in 4s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 8s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 2m12s
Harness Replays / Harness Replays (pull_request) Successful in 4s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2m16s
CI / Platform (Go) (pull_request) Successful in 6m10s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
CI / all-required (pull_request) Successful in 13m28s
test(e2e): add real-completion + per-provider liveness + byok-routing A2A gate (#1994 follow-on)
The A2A e2e historically asserted only response SHAPE (test_a2a_e2e.sh
checked '"kind":"text"' only). A broken agent returns its error AS a
text part -- {"kind":"text","text":"Agent error (Exception) ..."} --
which STILL matches the shape check, so it PASSED on a fully broken
agent. That is why the 2026-05-2x drained-key / byok-misroute failures
(agents-team PM + reno marketing erroring on every LLM call) sailed
through CI. "Channel returns text shape" is not "agent completed an LLM
round-trip."

Adds, ADDITIVELY (no existing assertion weakened or removed):

- tests/e2e/lib/completion_assert.sh -- reusable gates:
  * a2a_assert_real_completion: deterministic known-answer round-trip;
    asserts CONTAINS the expected token AND NOT an error-as-text marker
    (Agent error / Exception / error result / MISSING_BYOK_CREDENTIAL).
  * provider_liveness_matrix + offered_platform_models_for_runtime:
    per-offered-provider cheap (max_tokens:4) probe; the offered set is
    read from the providers.yaml SSOT (runtimes.<rt>.providers[platform]
    .models) -- not a hardcoded list -- so the matrix tracks the SSOT.
  * assert_byok_not_platform_proxy: #1994 regression guard -- a
    byok-resolving workspace must NOT resolve platform_managed (reads the
    same derived resolver GET /admin/workspaces/:id/llm-billing-mode the
    provision strip gate uses).

- tests/e2e/test_staging_full_saas.sh (the live-agent lane, MiniMax
  primary): new stanzas 8b (PINEAPPLE known-answer, the core gate),
  8c (byok-routing guard), 8d (SSOT-driven per-provider liveness matrix).

- tests/e2e/test_a2a_e2e.sh: added check_no_error_as_text on Echo + SEO
  replies so the brief's literal shape-only example now FAILS on an
  error-as-text payload.

- tests/e2e/test_completion_assert_unit.sh: offline fail-direction proof
  (16 cases) that the negative gates are load-bearing -- error-as-text
  MUST fail, platform_managed MUST trip the #1994 guard. Wired into
  ci.yml "Run E2E bash unit tests (no live infra)" (required, per-PR +
  main). e2e-staging-saas.yml paths filter extended to re-trigger the
  live lane on lib changes.

No #1994 fix code touched -- tests/e2e + workflow wiring only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 18:58:13 +00:00

251 lines
9.4 KiB
Bash

#!/usr/bin/env bash
set -euo pipefail
BASE="${BASE:-http://localhost:8080}"
PASS=0
FAIL=0
TIMEOUT="${A2A_TIMEOUT:-120}" # seconds per A2A call (override via A2A_TIMEOUT env var)
# shellcheck source=_lib.sh
source "$(dirname "$0")/_lib.sh"
# molecule-core#1995 (#1994 follow-on): real-completion assertion helpers.
# Adds a NEGATIVE error-as-text check on top of the shape checks below, so a
# broken agent that returns its error AS a text part
# ({"kind":"text","text":"Agent error (Exception) ..."}) — which STILL
# matches the shape check `"kind":"text"` — now FAILS instead of passing.
# shellcheck source=lib/completion_assert.sh
source "$(dirname "$0")/lib/completion_assert.sh"
# check_no_error_as_text <desc> <agent_text>
# Additive negative gate: PASS only if the agent text carries NO
# error-as-text marker (Agent error / Exception / error result /
# MISSING_BYOK_CREDENTIAL). Uses the same scanner as the staging
# real-completion gate so the trap is closed consistently across lanes.
check_no_error_as_text() {
local desc="$1"
local text="$2"
local hit
if hit=$(a2a_completion_error_marker "$text"); then
echo "FAIL: $desc"
echo " agent returned an error-AS-text payload (matched '$hit') — a broken"
echo " agent that surfaces its error as a text part is NOT a real reply."
echo " got: $(echo "$text" | head -3)"
FAIL=$((FAIL + 1))
else
echo "PASS: $desc"
PASS=$((PASS + 1))
fi
}
check() {
local desc="$1"
local expected="$2"
local actual="$3"
if echo "$actual" | grep -qF -- "$expected"; then
echo "PASS: $desc"
PASS=$((PASS + 1))
else
echo "FAIL: $desc"
echo " expected to contain: $expected"
echo " got: $(echo "$actual" | head -5)"
FAIL=$((FAIL + 1))
fi
}
a2a_send() {
local ws_id="$1"
local text="$2"
curl -s --max-time "$TIMEOUT" -X POST "$BASE/workspaces/$ws_id/a2a" \
-H "Content-Type: application/json" \
-d "{
\"method\": \"message/send\",
\"params\": {
\"message\": {
\"role\": \"user\",
\"parts\": [{\"type\": \"text\", \"text\": \"$text\"}]
}
}
}"
}
echo "=== A2A End-to-End Tests (Free Model: google/gemini-2.5-flash) ==="
echo ""
# --- Setup: find or create workspaces ---
ECHO_ID=$(curl -s "$BASE/workspaces" | python3 -c "
import sys, json
ws = json.load(sys.stdin)
for w in ws:
if w['name'] == 'Echo Agent' and w['status'] == 'online':
print(w['id']); break
else:
print('')
")
SEO_ID=$(curl -s "$BASE/workspaces" | python3 -c "
import sys, json
ws = json.load(sys.stdin)
for w in ws:
if w['name'] == 'SEO Agent' and w['status'] == 'online':
print(w['id']); break
else:
print('')
")
if [ -z "$ECHO_ID" ] || [ -z "$SEO_ID" ]; then
echo "ERROR: Need both Echo Agent and SEO Agent online. Found echo=$ECHO_ID seo=$SEO_ID"
exit 1
fi
echo "Echo Agent: $ECHO_ID"
echo "SEO Agent: $SEO_ID"
echo ""
# ========================================
# Test 1: Basic message/send — Echo Agent
# ========================================
echo "--- Test 1: Basic message/send ---"
R=$(a2a_send "$ECHO_ID" "Say hello back")
check "JSON-RPC response has result" '"result"' "$R"
check "Response has agent role" '"role":"agent"' "$R"
check "Response has text part" '"kind":"text"' "$R"
TEXT=$(echo "$R" | python3 -c "import sys,json; r=json.load(sys.stdin); print(r['result']['parts'][0]['text'][:200])" 2>/dev/null || echo "PARSE_ERROR")
# Negative gate (#1994): the text part must not BE an error.
check_no_error_as_text "Echo reply is not an error-as-text payload" "$TEXT"
echo " Agent said: $TEXT"
echo ""
# ========================================
# Test 2: Basic message/send — SEO Agent
# ========================================
echo "--- Test 2: SEO Agent responds ---"
R=$(a2a_send "$SEO_ID" "What SEO skills do you have?")
check "SEO agent responds" '"result"' "$R"
check "SEO response has text" '"kind":"text"' "$R"
TEXT=$(echo "$R" | python3 -c "import sys,json; r=json.load(sys.stdin); print(r['result']['parts'][0]['text'][:200])" 2>/dev/null || echo "PARSE_ERROR")
# Negative gate (#1994): a broken SEO agent that returns "Agent error
# (Exception) ..." AS text still matches the `"kind":"text"` shape check
# above — THAT is the gap that let drained-key/byok-misroute failures pass
# CI. This makes that case FAIL.
check_no_error_as_text "SEO reply is not an error-as-text payload" "$TEXT"
echo " SEO Agent said: $TEXT"
echo ""
# ========================================
# Test 3: JSON-RPC envelope wrapping
# ========================================
echo "--- Test 3: Auto JSON-RPC envelope wrapping ---"
# Send bare method+params (no jsonrpc/id fields) — proxy should wrap it
R=$(curl -s --max-time "$TIMEOUT" -X POST "$BASE/workspaces/$ECHO_ID/a2a" \
-H "Content-Type: application/json" \
-d '{"method":"message/send","params":{"message":{"role":"user","parts":[{"type":"text","text":"Bare request test"}]}}}')
check "Bare request gets wrapped and works" '"result"' "$R"
echo ""
# ========================================
# Test 4: Full JSON-RPC 2.0 envelope
# ========================================
echo "--- Test 4: Full JSON-RPC 2.0 envelope ---"
R=$(curl -s --max-time "$TIMEOUT" -X POST "$BASE/workspaces/$ECHO_ID/a2a" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":"custom-id-123","method":"message/send","params":{"message":{"role":"user","parts":[{"type":"text","text":"Full envelope test"}]}}}')
check "Full envelope returns result" '"result"' "$R"
check "Preserves custom request ID" '"id":"custom-id-123"' "$R"
echo ""
# ========================================
# Test 5: Invalid method returns error
# ========================================
echo "--- Test 5: Invalid method ---"
R=$(curl -s --max-time 10 -X POST "$BASE/workspaces/$ECHO_ID/a2a" \
-H "Content-Type: application/json" \
-d '{"method":"nonexistent/method","params":{}}')
check "Invalid method returns JSON-RPC error" '"error"' "$R"
check "Error code is method not found" '-32601' "$R"
echo ""
# ========================================
# Test 6: Offline workspace returns error
# ========================================
echo "--- Test 6: Offline workspace ---"
# Create a workspace but don't provision it
R=$(curl -s -X POST "$BASE/workspaces" -H "Content-Type: application/json" -d '{"name":"Offline Test","tier":1,"runtime":"external","external":true}')
OFFLINE_ID=$(echo "$R" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
R=$(curl -s --max-time 10 -X POST "$BASE/workspaces/$OFFLINE_ID/a2a" \
-H "Content-Type: application/json" \
-d '{"method":"message/send","params":{"message":{"role":"user","parts":[{"type":"text","text":"test"}]}}}')
check "Offline workspace returns error" '"error"' "$R"
# Clean up
e2e_delete_workspace "$OFFLINE_ID" "Offline Test"
echo ""
# ========================================
# Test 7: Nonexistent workspace returns 404
# ========================================
echo "--- Test 7: Nonexistent workspace ---"
R=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 -X POST "$BASE/workspaces/00000000-0000-0000-0000-000000000000/a2a" \
-H "Content-Type: application/json" \
-d '{"method":"message/send","params":{"message":{"role":"user","parts":[{"type":"text","text":"test"}]}}}')
check "Nonexistent workspace returns 404" "404" "$R"
echo ""
# ========================================
# Test 8: Multi-turn conversation
# ========================================
echo "--- Test 8: Multi-turn conversation ---"
R1=$(a2a_send "$ECHO_ID" "My name is Alice. Remember that.")
check "Turn 1 succeeds" '"result"' "$R1"
R2=$(a2a_send "$ECHO_ID" "What is my name?")
check "Turn 2 succeeds" '"result"' "$R2"
TEXT2=$(echo "$R2" | python3 -c "import sys,json; r=json.load(sys.stdin); print(r['result']['parts'][0]['text'])" 2>/dev/null || echo "PARSE_ERROR")
echo " Turn 2 response: $(echo "$TEXT2" | head -3)"
echo ""
# ========================================
# Test 9: Long input handling
# ========================================
echo "--- Test 9: Long input ---"
LONG_TEXT=$(python3 -c "print('This is a test sentence. ' * 50)")
R=$(a2a_send "$ECHO_ID" "$LONG_TEXT")
check "Long input returns result" '"result"' "$R"
echo ""
# ========================================
# Test 10: Peers can discover each other
# ========================================
echo "--- Test 10: Peer discovery ---"
R=$(curl -s "$BASE/registry/$ECHO_ID/peers")
check "Echo sees SEO as peer" 'SEO Agent' "$R"
R=$(curl -s "$BASE/registry/$SEO_ID/peers")
check "SEO sees Echo as peer" 'Echo Agent' "$R"
echo ""
# ========================================
# Test 11: Agent card reflects skills
# ========================================
echo "--- Test 11: Agent cards ---"
R=$(curl -s "$BASE/workspaces/$ECHO_ID")
check "Echo agent has agent_card" '"agent_card"' "$R"
check "Echo has skills" '"skills"' "$R"
R=$(curl -s "$BASE/workspaces/$SEO_ID")
check "SEO agent has agent_card" '"agent_card"' "$R"
check "SEO has skills" '"skills"' "$R"
echo ""
# ========================================
# Test 12: Heartbeat updates
# ========================================
echo "--- Test 12: Heartbeat ---"
sleep 2
R=$(curl -s "$BASE/workspaces/$ECHO_ID")
UPTIME=$(echo "$R" | python3 -c "import sys,json; print(json.load(sys.stdin)['uptime_seconds'])")
check "Echo has uptime > 0" "true" "$([ "$UPTIME" -gt 0 ] 2>/dev/null && echo true || echo false)"
echo " Echo uptime: ${UPTIME}s"
echo ""
# ========================================
# Summary
# ========================================
echo "=== Results: $PASS passed, $FAIL failed ==="
exit $FAIL