From 924c3ca9dc1b4d706beb9691e42729954cca8a87 Mon Sep 17 00:00:00 2001
From: core-qa
Date: Fri, 15 May 2026 23:58:49 -0700
Subject: [PATCH] test(e2e): add LOCAL backend for the peer-visibility MCP gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR #1298 added the peer-visibility gate but staging-only. Per the
standing rule that the local prod-mimic stack must run a MANDATORY
local-Postgres E2E BEFORE staging E2E
(feedback_local_must_mimic_production,
feedback_mandatory_local_e2e_before_ship,
feedback_local_test_before_staging_e2e), peer-visibility must also run
locally so regressions are caught fast/cheap instead of late on cold EC2.

- Factor the byte-identical assertion core out of
  test_peer_visibility_mcp_staging.sh into
  tests/e2e/lib/peer_visibility_assert.sh::pv_assert_runtime. It drives
  the literal JSON-RPC tools/call name=list_peers envelope to
  POST /workspaces/:id/mcp via each workspace's OWN bearer through the
  real WorkspaceAuth + MCPRateLimiter chain, with the same anti-proxy /
  anti-native-fallback guarantees. NOT a proxy: no registry row, /health,
  heartbeat, or GET /registry/:id/peers. Only provisioning differs per
  backend.
- Refactor the staging script to source the shared lib (assertion
  byte-identical; provisioning/teardown/exit-codes unchanged).
- Add tests/e2e/test_peer_visibility_mcp_local.sh: local docker-compose
  backend — POST /workspaces directly, e2e_mint_test_token for the MCP
  bearer (same model test_priority_runtimes_e2e.sh / test_api.sh use, no
  new credential flow), wait online, run the shared assertion, scoped
  per-workspace teardown only (feedback_cleanup_after_each_test,
  feedback_never_run_cluster_cleanup_tests_on_live_platform).
  bash-3.2-safe (no associative arrays) so it runs on local macOS dev
  boxes too.
- Wire a peer-visibility-local job into e2e-peer-visibility.yml,
  bootstrapped exactly like e2e-api.yml's proven E2E API Smoke Test
  (per-run container names + ephemeral ports, go build, background
  platform-server). Runs on PR + push (local boot is minutes, not the
  30+ min cold-EC2 path), so peer-visibility is part of the local gate
  that fires before the staging E2E. Its OWN non-required status context
  `E2E Peer Visibility (local)` — non-required-by-design like the
  staging job, HONEST gate with NO continue-on-error mask
  (feedback_fix_root_not_symptom); flip-to-required tracked at #1296 via
  the bp-required: pending directive.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .gitea/workflows/e2e-peer-visibility.yml      | 182 +++++++++-
 tests/e2e/lib/peer_visibility_assert.sh       | 165 +++++++++
 tests/e2e/test_peer_visibility_mcp_local.sh   | 328 ++++++++++++++++++
 tests/e2e/test_peer_visibility_mcp_staging.sh | 103 +-----
 4 files changed, 684 insertions(+), 94 deletions(-)
 create mode 100644 tests/e2e/lib/peer_visibility_assert.sh
 create mode 100755 tests/e2e/test_peer_visibility_mcp_local.sh

diff --git a/.gitea/workflows/e2e-peer-visibility.yml b/.gitea/workflows/e2e-peer-visibility.yml
index f7b13f161..55318f7cb 100644
--- a/.gitea/workflows/e2e-peer-visibility.yml
+++ b/.gitea/workflows/e2e-peer-visibility.yml
@@ -52,6 +52,30 @@ name: E2E Peer Visibility (literal MCP list_peers)
 # flip-to-required-ready (mirrors e2e-staging-saas.yml's proven shape;
 # real EC2-provisioning E2E is push/dispatch/cron only — it is 30+ min
 # and cannot run per-PR-update).
+#
+# LOCAL BACKEND (added 2026-05-15 — feedback_local_must_mimic_production,
+# feedback_mandatory_local_e2e_before_ship, feedback_local_test_before_
+# staging_e2e)
+# --------------------------------------------------------------------
+# The standing rule is that the local prod-mimic stack runs a MANDATORY
+# local-Postgres E2E BEFORE staging E2E. A staging-only peer-visibility
+# gate caught regressions late + expensively (cold EC2). The
+# `peer-visibility-local` job below runs the SAME byte-identical
+# assertion (tests/e2e/lib/peer_visibility_assert.sh) against the local
+# docker-compose stack — built + booted exactly like e2e-api.yml's
+# proven E2E API Smoke Test job (ephemeral pg/redis ports, go build,
+# background platform-server). It runs on PR + push (local boot is
+# minutes, not the 30+ min cold-EC2 path), so peer-visibility is part of
+# the local gate that fires before the staging E2E.
+#
+# It is its OWN non-required status context `E2E Peer Visibility (local)`
+# — same non-required-by-design decision as the staging job (red until
+# Hermes-401 #162 / OpenClaw-never-online #165 land; flip-to-required
+# tracked at molecule-core#1296). It is an HONEST gate: NO
+# continue-on-error mask (feedback_fix_root_not_symptom). It is kept a
+# distinct context (not folded into e2e-api.yml's required `E2E API
+# Smoke Test`) precisely so a deliberately-RED-today gate cannot wedge
+# the required local-E2E job or any unrelated merge.
 
 on:
   push:
@@ -65,6 +89,8 @@ on:
       - 'workspace/a2a_mcp_server.py'
       - 'workspace/platform_tools/registry.py'
       - 'tests/e2e/test_peer_visibility_mcp_staging.sh'
+      - 'tests/e2e/test_peer_visibility_mcp_local.sh'
+      - 'tests/e2e/lib/peer_visibility_assert.sh'
       - '.gitea/workflows/e2e-peer-visibility.yml'
   pull_request:
     branches: [main]
@@ -77,6 +103,8 @@ on:
       - 'workspace/a2a_mcp_server.py'
       - 'workspace/platform_tools/registry.py'
       - 'tests/e2e/test_peer_visibility_mcp_staging.sh'
+      - 'tests/e2e/test_peer_visibility_mcp_local.sh'
+      - 'tests/e2e/lib/peer_visibility_assert.sh'
       - '.gitea/workflows/e2e-peer-visibility.yml'
   workflow_dispatch:
   schedule:
@@ -108,16 +136,160 @@ jobs:
     timeout-minutes: 5
     steps:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-      - name: Validate driving script
+      - name: Validate driving scripts + shared assertion lib
         run: |
+          bash -n tests/e2e/lib/peer_visibility_assert.sh
+          echo "lib/peer_visibility_assert.sh — bash syntax OK"
           bash -n tests/e2e/test_peer_visibility_mcp_staging.sh
           echo "test_peer_visibility_mcp_staging.sh — bash syntax OK"
-          echo "Real fresh-provision MCP list_peers E2E runs on push to"
+          bash -n tests/e2e/test_peer_visibility_mcp_local.sh
+          echo "test_peer_visibility_mcp_local.sh — bash syntax OK"
+          echo "Staging fresh-provision MCP list_peers E2E runs on push to"
           echo "main / workflow_dispatch / daily cron (30+ min EC2 boot)."
+          echo "The LOCAL backend runs in the peer-visibility-local job"
+          echo "below on this same PR (local docker-compose stack)."
 
-  # Real gate: provisions a throwaway org + sibling-per-runtime, drives
-  # the LITERAL list_peers MCP call per runtime, asserts 200 + expected
-  # peer set, then scoped teardown. push(main)/dispatch/cron only.
+  # LOCAL gate: same byte-identical assertion against the local prod-mimic
+  # docker-compose stack — the MANDATORY local-E2E that must run BEFORE
+  # the staging E2E (feedback_mandatory_local_e2e_before_ship,
+  # feedback_local_test_before_staging_e2e). Bootstrap mirrors
+  # e2e-api.yml's proven E2E API Smoke Test job (per-run container names +
+  # ephemeral host ports so concurrent host-network act_runner runs don't
+  # collide; go build; background platform-server). Its OWN non-required
+  # status context `E2E Peer Visibility (local)` — non-required-by-design
+  # exactly like the staging job (red until #162/#165 land;
+  # flip-to-required tracked at molecule-core#1296). HONEST gate, NO
+  # continue-on-error mask (feedback_fix_root_not_symptom). Runs on PR +
+  # push (local boot is minutes, not the 30+ min cold-EC2 path).
+  # bp-required: pending #1296
+  peer-visibility-local:
+    name: E2E Peer Visibility (local)
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    env:
+      # Per-run names + ephemeral ports — same collision-avoidance as
+      # e2e-api.yml (host-network act_runner; feedback_act_runner_*).
+      PG_CONTAINER: pg-e2e-pv-${{ github.run_id }}-${{ github.run_attempt }}
+      REDIS_CONTAINER: redis-e2e-pv-${{ github.run_id }}-${{ github.run_attempt }}
+      # LLM keys so hermes/openclaw can actually boot. The local script
+      # SKIPs (not fails) any runtime whose key is absent, so a partially
+      # keyed CI env still exercises whatever it can.
+      CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.E2E_CLAUDE_CODE_OAUTH_TOKEN }}
+      E2E_MINIMAX_API_KEY: ${{ secrets.MOLECULE_STAGING_MINIMAX_API_KEY }}
+      E2E_ANTHROPIC_API_KEY: ${{ secrets.MOLECULE_STAGING_ANTHROPIC_API_KEY }}
+      E2E_OPENAI_API_KEY: ${{ secrets.MOLECULE_STAGING_OPENAI_API_KEY }}
+      PV_RUNTIMES: "hermes openclaw claude-code"
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+      - uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
+        with:
+          go-version: 'stable'
+          cache: true
+          cache-dependency-path: workspace-server/go.sum
+      - name: Pre-pull alpine + ensure provisioner network
+        run: |
+          docker pull alpine:latest >/dev/null
+          docker network create molecule-core-net >/dev/null 2>&1 || true
+          echo "alpine:latest pre-pulled; molecule-core-net ensured."
+      - name: Start Postgres (docker, ephemeral port)
+        run: |
+          docker rm -f "$PG_CONTAINER" 2>/dev/null || true
+          docker run -d --name "$PG_CONTAINER" \
+            -e POSTGRES_USER=dev -e POSTGRES_PASSWORD=dev -e POSTGRES_DB=molecule \
+            -p 0:5432 postgres:16 >/dev/null
+          PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
+          [ -n "$PG_PORT" ] || PG_PORT=$(docker port "$PG_CONTAINER" 5432/tcp | head -1 | awk -F: '{print $NF}')
+          if [ -z "$PG_PORT" ]; then
+            echo "::error::Could not resolve host port for $PG_CONTAINER"
+            docker logs "$PG_CONTAINER" || true; exit 1
+          fi
+          echo "DATABASE_URL=postgres://dev:dev@127.0.0.1:${PG_PORT}/molecule?sslmode=disable" >> "$GITHUB_ENV"
+          for i in $(seq 1 30); do
+            docker exec "$PG_CONTAINER" pg_isready -U dev >/dev/null 2>&1 && { echo "Postgres ready after ${i}s"; exit 0; }
+            sleep 1
+          done
+          echo "::error::Postgres did not become ready in 30s"; docker logs "$PG_CONTAINER" || true; exit 1
+      - name: Start Redis (docker, ephemeral port)
+        run: |
+          docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
+          docker run -d --name "$REDIS_CONTAINER" -p 0:6379 redis:7 >/dev/null
+          REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | awk -F: '/^0\.0\.0\.0:/ {print $2; exit}')
+          [ -n "$REDIS_PORT" ] || REDIS_PORT=$(docker port "$REDIS_CONTAINER" 6379/tcp | head -1 | awk -F: '{print $NF}')
+          if [ -z "$REDIS_PORT" ]; then
+            echo "::error::Could not resolve host port for $REDIS_CONTAINER"
+            docker logs "$REDIS_CONTAINER" || true; exit 1
+          fi
+          echo "REDIS_URL=redis://127.0.0.1:${REDIS_PORT}" >> "$GITHUB_ENV"
+          for i in $(seq 1 15); do
+            docker exec "$REDIS_CONTAINER" redis-cli ping 2>/dev/null | grep -q PONG && { echo "Redis ready after ${i}s"; exit 0; }
+            sleep 1
+          done
+          echo "::error::Redis did not become ready in 15s"; docker logs "$REDIS_CONTAINER" || true; exit 1
+      - name: Build platform
+        working-directory: workspace-server
+        run: go build -o platform-server ./cmd/server
+      - name: Pick platform port
+        run: |
+          PLATFORM_PORT=$(python3 - <<'PY'
+          import socket
+          with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+              s.bind(("127.0.0.1", 0))
+              print(s.getsockname()[1])
+          PY
+          )
+          echo "PORT=${PLATFORM_PORT}" >> "$GITHUB_ENV"
+          echo "BASE=http://127.0.0.1:${PLATFORM_PORT}" >> "$GITHUB_ENV"
+          echo "Platform host port: ${PLATFORM_PORT}"
+      - name: Kill stale platform-server before start
+        run: |
+          killed=0
+          for pid in $(grep -l "platform-serve" /proc/[0-9]*/comm 2>/dev/null); do
+            kpid="${pid%/comm}"; kpid="${kpid##*/}"
+            cmdline=$(cat "/proc/${kpid}/cmdline" 2>/dev/null | tr '\0' ' ')
+            if echo "$cmdline" | grep -q "platform-server"; then
+              echo "Killing stale platform-server pid ${kpid}"
+              kill "$kpid" 2>/dev/null || true; killed=$((killed + 1))
+            fi
+          done
+          [ "$killed" -gt 0 ] && sleep 2 || true
+          echo "stale-kill done ($killed killed)"
+      - name: Start platform (background)
+        working-directory: workspace-server
+        run: |
+          ./platform-server > platform.log 2>&1 &
+          echo $! > platform.pid
+      - name: Wait for /health
+        run: |
+          for i in $(seq 1 30); do
+            curl -sf "$BASE/health" > /dev/null && { echo "Platform up after ${i}s"; exit 0; }
+            sleep 1
+          done
+          echo "::error::Platform did not become healthy in 30s"
+          cat workspace-server/platform.log || true; exit 1
+      - name: Run LOCAL fresh-provision peer-visibility E2E (literal MCP list_peers)
+        # HONEST gate — NO continue-on-error. Red today (Hermes-401 #162 /
+        # OpenClaw-never-online #165 not yet fixed); green when they land.
+        # Non-required-by-design via its distinct status context until the
+        # molecule-core#1296 flip-to-required.
+        run: bash tests/e2e/test_peer_visibility_mcp_local.sh
+      - name: Dump platform log on failure
+        if: failure()
+        run: cat workspace-server/platform.log || true
+      - name: Stop platform
+        if: always()
+        run: |
+          if [ -f workspace-server/platform.pid ]; then
+            kill "$(cat workspace-server/platform.pid)" 2>/dev/null || true
+          fi
+      - name: Stop service containers
+        if: always()
+        run: |
+          docker rm -f "$PG_CONTAINER" 2>/dev/null || true
+          docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
+
+  # Real STAGING gate: provisions a throwaway org + sibling-per-runtime,
+  # drives the LITERAL list_peers MCP call per runtime, asserts 200 +
+  # expected peer set, then scoped teardown. push(main)/dispatch/cron only.
   peer-visibility:
     name: E2E Peer Visibility
     runs-on: ubuntu-latest
diff --git a/tests/e2e/lib/peer_visibility_assert.sh b/tests/e2e/lib/peer_visibility_assert.sh
new file mode 100644
index 000000000..9c21fbbfd
--- /dev/null
+++ b/tests/e2e/lib/peer_visibility_assert.sh
@@ -0,0 +1,165 @@
+# shellcheck shell=bash
+# Shared peer-visibility assertion core — runtime/backend-AGNOSTIC.
+#
+# WHY THIS FILE EXISTS
+# --------------------
+# The peer-visibility gate (PR #1298) was staging-only. Per the standing
+# rule that the local prod-mimic stack must run a MANDATORY local-Postgres
+# E2E BEFORE staging E2E (memory: feedback_local_must_mimic_production,
+# feedback_mandatory_local_e2e_before_ship, feedback_local_test_before_
+# staging_e2e), peer-visibility must also run against the local stack.
+#
+# The ASSERTION must be byte-identical between local and staging — only
+# provisioning differs. So the literal MCP `list_peers` call + every
+# anti-proxy / anti-native-fallback guarantee lives HERE, sourced by both
+# tests/e2e/test_peer_visibility_mcp_staging.sh (staging/CP backend) and
+# tests/e2e/test_peer_visibility_mcp_local.sh (local docker-compose
+# backend). If this assertion ever diverges between the two, that is the
+# bug — keep it in one place.
+#
+# THIS IS NOT A PROXY. pv_assert_runtime issues the byte-for-byte
+# JSON-RPC `tools/call name=list_peers` envelope to `POST
+# /workspaces/:id/mcp` using the workspace's OWN bearer token, through
+# the real WorkspaceAuth + MCPRateLimiter middleware chain — the exact
+# call mcp_molecule_list_peers makes from a canvas agent. It does NOT
+# read a registry row, /health, the heartbeat table, or
+# GET /registry/:id/peers.
+#
+# Contract:
+#   pv_assert_runtime \
+#
+#
+#     staging: the X-Molecule-Org-Id header value.
+#     local: "" (the local single-tenant stack does
+#     not gate on the org header; the header
+#     is simply omitted when empty).
+#     every provisioned workspace id (parent + every
+#     runtime sibling). The expected peer set for this
+#     runtime is every id in here EXCEPT .
+#
+# Sets the global PV_VERDICT to one of:
+#   OK
+#   FAIL(http=)
+#   FAIL(native-fallback)
+#   FAIL(rpc=)
+#   FAIL(peers=)
+#   FAIL(unknown)
+# Returns 0 when PV_VERDICT=OK, 1 otherwise. Never exits — the caller
+# owns aggregation + the gate exit code (10 = regression reproduced).
+#
+# The literal JSON-RPC envelope. Identical to what
+# workspace/platform_tools/registry.py's mcp_molecule_list_peers emits.
+PV_RPC_BODY='{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_peers","arguments":{}}}'
+
+pv_assert_runtime() {
+  local rt="$1" wid="$2" wtok="$3" base_url="$4" org_id="$5" all_ws_ids="$6"
+
+  # Expected peer set = every OTHER provisioned workspace, excluding the
+  # caller itself. Byte-identical selection to the original staging script.
+  local expect_ids
+  expect_ids=$(echo "$all_ws_ids" | tr ' ' '\n' | grep -v "^${wid}$" | grep -v '^$')
+
+  # X-Molecule-Org-Id only when the backend supplies one (staging multi-
+  # tenant). Local single-tenant omits it — the same WorkspaceAuth +
+  # MCPRateLimiter chain still runs; only the tenant-routing header differs.
+  local org_header=()
+  if [ -n "$org_id" ]; then
+    org_header=(-H "X-Molecule-Org-Id: $org_id")
+  fi
+
+  local resp http_code body
+  set +e
+  resp=$(curl -sS -X POST "$base_url/workspaces/$wid/mcp" \
+    -H "Authorization: Bearer $wtok" \
+    ${org_header[@]+"${org_header[@]}"} \
+    -H "Content-Type: application/json" \
+    -d "$PV_RPC_BODY" \
+    -o /tmp/pv_mcp_body.json -w "%{http_code}" 2>/dev/null)
+  set -e
+  http_code="$resp"
+  body=$(cat /tmp/pv_mcp_body.json 2>/dev/null || echo '')
+
+  echo "--- $rt (ws=$wid) ---"
+  echo " HTTP $http_code"
+  echo " body: $(echo "$body" | head -c 600)"
+
+  # (1) HTTP 200 — a 401 (WorkspaceAuth reject, the Hermes symptom) fails here.
+  if [ "$http_code" != "200" ]; then
+    echo " ✗ $rt: list_peers MCP call returned HTTP $http_code (expected 200)"
+    PV_VERDICT="FAIL(http=$http_code)"
+    return 1
+  fi
+
+  # (2) JSON-RPC result present, not an error object; expected sibling IDs
+  #     present; not a native-sessions fallback. Byte-identical to the
+  #     original staging script's inline python.
+  local parse
+  parse=$(echo "$body" | python3 -c "
+import sys, json
+expect = set(filter(None, '''$expect_ids'''.split()))
+try:
+    d = json.load(sys.stdin)
+except Exception as e:
+    print('PARSE_ERROR:' + str(e)); sys.exit(0)
+if isinstance(d, dict) and d.get('error') is not None:
+    print('RPC_ERROR:' + json.dumps(d['error'])[:200]); sys.exit(0)
+res = d.get('result') if isinstance(d, dict) else None
+if res is None:
+    print('NO_RESULT'); sys.exit(0)
+# MCP tools/call result shape: {content:[{type:text,text:''}]}
+text = ''
+if isinstance(res, dict):
+    for c in res.get('content', []):
+        if c.get('type') == 'text':
+            text += c.get('text', '')
+text_l = text.lower()
+# Native-sessions fallback signature (the OpenClaw symptom): the agent
+# answered from its own runtime session list, not the platform peer set.
+if 'sessions_list' in text_l or 'no platform peers' in text_l or 'native session' in text_l:
+    print('NATIVE_FALLBACK:' + text[:200]); sys.exit(0)
+# The expected sibling IDs must literally appear in the returned peer text.
+found = sorted(i for i in expect if i in text)
+missing = sorted(expect - set(found))
+if not expect:
+    print('NO_EXPECTED_PEERS_CONFIGURED'); sys.exit(0)
+if missing:
+    print('MISSING_PEERS:found=%d/%d missing=%s' % (len(found), len(expect), ','.join(m[:8] for m in missing)))
+    sys.exit(0)
+print('OK:found=%d/%d' % (len(found), len(expect)))
+" 2>/dev/null)
+
+  case "$parse" in
+    OK:*)
+      echo " ✓ $rt: list_peers returned 200 and contains all expected peers ($parse)"
+      PV_VERDICT="OK"
+      return 0
+      ;;
+    NATIVE_FALLBACK:*)
+      echo " ✗ $rt: list_peers fell back to NATIVE sessions — sees no platform peers ($parse)"
+      PV_VERDICT="FAIL(native-fallback)"
+      return 1
+      ;;
+    RPC_ERROR:*|NO_RESULT|PARSE_ERROR:*)
+      echo " ✗ $rt: list_peers MCP call did not return a usable result ($parse)"
+      PV_VERDICT="FAIL(rpc=$parse)"
+      return 1
+      ;;
+    MISSING_PEERS:*)
+      echo " ✗ $rt: list_peers returned 200 but peer set is wrong/empty ($parse)"
+      PV_VERDICT="FAIL(peers=$parse)"
+      return 1
+      ;;
+    NO_EXPECTED_PEERS_CONFIGURED)
+      # Caller bug, not a runtime regression — surface loudly so a
+      # mis-wired backend can't mint a false green.
+      echo " ✗ $rt: no expected peers were configured for this caller"
+      PV_VERDICT="FAIL(rpc=NO_EXPECTED_PEERS_CONFIGURED)"
+      return 1
+      ;;
+    *)
+      echo " ✗ $rt: unexpected verdict '$parse'"
+      PV_VERDICT="FAIL(unknown)"
+      return 1
+      ;;
+  esac
+}
diff --git a/tests/e2e/test_peer_visibility_mcp_local.sh b/tests/e2e/test_peer_visibility_mcp_local.sh
new file mode 100755
index 000000000..6fc454a3b
--- /dev/null
+++ b/tests/e2e/test_peer_visibility_mcp_local.sh
@@ -0,0 +1,328 @@
+#!/usr/bin/env bash
+# LOCAL E2E — fresh-provision peer-visibility gate via the LITERAL MCP path.
+#
+# WHY THIS EXISTS
+# ---------------
+# tests/e2e/test_peer_visibility_mcp_staging.sh (PR #1298) codified the
+# literal user-facing peer-visibility path — but staging-only. The
+# standing rule is that the local prod-mimic stack runs a MANDATORY
+# local-Postgres E2E BEFORE staging E2E (memory:
+# feedback_local_must_mimic_production, feedback_mandatory_local_e2e_
+# before_ship, feedback_local_test_before_staging_e2e,
+# feedback_real_subprocess_test_for_boot_path). A staging-only gate means
+# regressions are caught late and expensively on EC2. This is the LOCAL
+# backend: same byte-identical assertion, local docker-compose stack.
+#
+# THE ASSERTION IS NOT A PROXY and is BYTE-IDENTICAL to staging — it is
+# the SAME tests/e2e/lib/peer_visibility_assert.sh::pv_assert_runtime that
+# the staging script calls. It issues the byte-for-byte JSON-RPC
+# `tools/call name=list_peers` envelope to `POST /workspaces/:id/mcp`
+# using each workspace's OWN bearer token, through the real WorkspaceAuth
+# + MCPRateLimiter middleware chain — the exact call
+# mcp_molecule_list_peers makes from a canvas agent. It does NOT read a
+# registry row, /health, the heartbeat table, or GET /registry/:id/peers.
+#
+# Only PROVISIONING differs from staging:
+#   - staging: POST /cp/admin/orgs (cold EC2 tenant) + per-tenant admin
+#     token + each workspace's auth_token from the POST /workspaces resp.
+#   - local: POST /workspaces directly against the local stack
+#     (BASE, default http://localhost:8080), MCP bearer minted via
+#     GET /admin/workspaces/:id/test-token (e2e_mint_test_token —
+#     deterministic, gated by MOLECULE_ENV != production). Same model
+#     every other local E2E (test_priority_runtimes_e2e.sh,
+#     test_api.sh) already uses; no new credential/provision flow.
+#
+# It is written to FAIL on today's broken Hermes/OpenClaw behavior and go
+# green only when the in-flight root-cause fixes (Hermes-401 #162,
+# OpenClaw-never-online/MCP-wiring #165) actually land — same gate
+# semantics + exit codes as the staging script. NON-required by design
+# until then (flip-to-required tracked at molecule-core#1296), and NOT
+# masked with continue-on-error (feedback_fix_root_not_symptom).
+#
+# Required env: none (local stack only).
+# Optional env:
+#   BASE                         default http://localhost:8080
+#   PV_RUNTIMES                  space list; default "hermes openclaw claude-code"
+#   E2E_PROVISION_TIMEOUT_SECS   per-workspace online budget; default 900
+#                                (hermes cold apt+uv is the slow path locally)
+#   E2E_KEEP_WS                  1 → skip teardown (local debugging only)
+# LLM provider keys (a workspace boots only if its provider key is set;
+# a runtime whose key is absent is SKIPPED, not failed — a partially
+# keyed local env must not false-fail the gate):
+#   CLAUDE_CODE_OAUTH_TOKEN      claude-code
+#   E2E_MINIMAX_API_KEY          hermes/openclaw (MiniMax, preferred)
+#   E2E_ANTHROPIC_API_KEY        hermes/openclaw (direct Anthropic)
+#   E2E_OPENAI_API_KEY           hermes/openclaw (OpenAI)
+#
+# Exit codes (match the staging script):
+#   0   every runtime under test saw its peers via the literal MCP call
+#   1   generic failure
+#   3   a workspace never reached online within the budget
+#   10  peer-visibility regression reproduced (the gate firing as designed)
+
+set -uo pipefail
+
+source "$(dirname "$0")/_lib.sh"
+# Byte-identical assertion shared with the staging backend.
+# shellcheck source=tests/e2e/lib/peer_visibility_assert.sh
+source "$(dirname "$0")/lib/peer_visibility_assert.sh"
+
+PV_RUNTIMES="${PV_RUNTIMES:-hermes openclaw claude-code}"
+PROVISION_TIMEOUT_SECS="${E2E_PROVISION_TIMEOUT_SECS:-900}"
+NAME_PREFIX="PV-Local-$$-$(date +%H%M%S)"
+
+log() { echo "[$(date +%H:%M:%S)] $*"; }
+ok() { echo "[$(date +%H:%M:%S)] ✅ $*"; }
+
+CREATED_WSIDS=()
+
+# ─── Scoped teardown ───────────────────────────────────────────────────
+# Deletes ONLY the workspaces THIS run created (tracked in CREATED_WSIDS),
+# one DELETE /workspaces/:id?confirm=true each. NEVER e2e_cleanup_all_
+# workspaces / any blanket sweep — honors feedback_cleanup_after_each_test
+# and feedback_never_run_cluster_cleanup_tests_on_live_platform (a local
+# stack can still be shared with other concurrent local E2E).
+teardown() {
+  local rc=$?
+  set +e
+  if [ "${E2E_KEEP_WS:-0}" = "1" ]; then
+    echo ""
+    log "[teardown] E2E_KEEP_WS=1 — leaving ${#CREATED_WSIDS[@]} ws for debugging (REMEMBER TO DELETE)"
+    exit $rc
+  fi
+  echo ""
+  log "[teardown] deleting ${#CREATED_WSIDS[@]} workspace(s) this run created (scoped)"
+  for wid in ${CREATED_WSIDS[@]+"${CREATED_WSIDS[@]}"}; do
+    [ -n "$wid" ] || continue
+    curl -s -X DELETE "$BASE/workspaces/$wid?confirm=true" >/dev/null 2>&1 || true
+  done
+  exit $rc
+}
+trap teardown EXIT INT TERM
+
+# Pre-sweep workspaces a prior crashed run of THIS script left behind
+# (name prefix match only — never a blanket delete). The trap fires on
+# normal exit, but a kill -9 / SIGPIPE can bypass it.
+PRIOR=$(curl -s "$BASE/workspaces" | python3 -c '
+import json, sys
+try:
+    print(" ".join(w["id"] for w in json.load(sys.stdin) if w.get("name","").startswith("PV-Local-")))
+except Exception:
+    pass
+' 2>/dev/null)
+for _wid in $PRIOR; do
+  log "Pre-sweeping prior PV-Local workspace: $_wid"
+  curl -s -X DELETE "$BASE/workspaces/$_wid?confirm=true" >/dev/null 2>&1 || true
+done
+
+# ─── Local-stack preflight ─────────────────────────────────────────────
+log "0/5 local stack preflight: $BASE/health"
+if ! curl -fsS "$BASE/health" -m 5 >/dev/null 2>&1; then
+  echo "::error::Local stack not healthy at $BASE/health — bring it up (make up) before this gate. Infra, not a workspace bug (feedback_fix_root_not_symptom)." >&2
+  exit 1
+fi
+# admin/test-token is the local MCP-bearer mint path; it 404s in
+# production. If it is off, this gate cannot drive the literal call.
+if ! curl -fsS "$BASE/admin/workspaces/preflight-probe/test-token" -m 5 >/dev/null 2>&1; then
+  # A 404 here is EITHER "no such ws" (fine — endpoint is enabled) OR the
+  # endpoint is disabled (MOLECULE_ENV=production). Distinguish by body.
+  PROBE=$(curl -s "$BASE/admin/workspaces/preflight-probe/test-token" -m 5 2>/dev/null)
+  if echo "$PROBE" | grep -qi 'production\|disabled\|not found.*endpoint'; then
+    echo "::error::GET /admin/workspaces/:id/test-token disabled (MOLECULE_ENV=production?). Cannot mint a local MCP bearer." >&2
+    exit 1
+  fi
+fi
+ok " local stack healthy"
+
+# ─── Resolve per-runtime provisioning secrets ──────────────────────────
+# Mirrors test_priority_runtimes_e2e.sh / test_staging_full_saas.sh's
+# provider-key chain. A runtime whose key is absent is SKIPPED (not
+# failed) so a partially keyed local env doesn't false-fail the gate.
+runtime_secrets() {
+  local rt="$1"
+  case "$rt" in
+    claude-code)
+      [ -n "${CLAUDE_CODE_OAUTH_TOKEN:-}" ] || { echo ""; return 1; }
+      python3 -c "import json,os;print(json.dumps({'CLAUDE_CODE_OAUTH_TOKEN':os.environ['CLAUDE_CODE_OAUTH_TOKEN']}))"
+      ;;
+    hermes|openclaw)
+      if [ -n "${E2E_MINIMAX_API_KEY:-}" ]; then
+        python3 -c "import json,os;k=os.environ['E2E_MINIMAX_API_KEY'];print(json.dumps({'ANTHROPIC_BASE_URL':'https://api.minimax.io/anthropic','ANTHROPIC_AUTH_TOKEN':k,'MINIMAX_API_KEY':k}))"
+      elif [ -n "${E2E_ANTHROPIC_API_KEY:-}" ]; then
+        python3 -c "import json,os;k=os.environ['E2E_ANTHROPIC_API_KEY'];print(json.dumps({'ANTHROPIC_API_KEY':k}))"
+      elif [ -n "${E2E_OPENAI_API_KEY:-}" ]; then
+        python3 -c "import json,os;k=os.environ['E2E_OPENAI_API_KEY'];print(json.dumps({'OPENAI_API_KEY':k,'OPENAI_BASE_URL':'https://api.openai.com/v1','MODEL_PROVIDER':'openai:gpt-4o','HERMES_INFERENCE_PROVIDER':'custom','HERMES_CUSTOM_BASE_URL':'https://api.openai.com/v1','HERMES_CUSTOM_API_KEY':k,'HERMES_CUSTOM_API_MODE':'chat_completions'}))"
+      else
+        echo ""; return 1
+      fi
+      ;;
+    *)
+      # Unknown runtime: provision with empty secrets and let the stack
+      # decide (kept permissive so PV_RUNTIMES can be widened later).
+      echo "{}"
+      ;;
+  esac
+}
+
+# Block until $1 reaches one of $2 (space-separated), or $3 sec elapse.
+wait_for_status() {
+  local wsid="$1" want="$2" budget="$3" start=$SECONDS last=""
+  while [ $((SECONDS - start)) -lt "$budget" ]; do
+    local s
+    s=$(curl -s "$BASE/workspaces/$wsid" | python3 -c 'import json,sys
+try:
+    d=json.load(sys.stdin); w=d.get("workspace") if isinstance(d.get("workspace"),dict) else d; print(w.get("status",""))
+except Exception:
+    print("")' 2>/dev/null || echo "")
+    [ "$s" != "$last" ] && { log " $wsid → ${s:-}"; last="$s"; }
+    for w in $want; do [ "$s" = "$w" ] && { echo "$s"; return 0; }; done
+    sleep 5
+  done
+  echo "$last"
+  return 1
+}
+
+# ─── 1. Provision parent (claude-code) + one sibling per runtime ───────
+# Same topology as the staging script: a claude-code parent plus one
+# sibling per runtime under test, so each runtime should see all others.
+log "1/5 provisioning parent (claude-code) + one sibling per runtime under test..."
+
+PARENT_SECRETS=$(runtime_secrets claude-code) || PARENT_SECRETS=""
+if [ -z "$PARENT_SECRETS" ]; then
+  # Parent still needs to exist as a peer target even without an LLM key;
+  # it never has to answer list_peers itself (it is excluded from the
+  # caller set), so an empty-secrets claude-code shell is sufficient.
+  PARENT_SECRETS="{}"
+fi
+P_RESP=$(curl -s -X POST "$BASE/workspaces" -H "Content-Type: application/json" \
+  -d "{\"name\":\"${NAME_PREFIX}-parent\",\"runtime\":\"claude-code\",\"tier\":3,\"secrets\":$PARENT_SECRETS}")
+PARENT_ID=$(echo "$P_RESP" | python3 -c 'import json,sys;print(json.load(sys.stdin).get("id",""))' 2>/dev/null)
+if [ -z "$PARENT_ID" ]; then
+  echo "::error::parent create failed: $(echo "$P_RESP" | head -c 300)" >&2
+  exit 1
+fi
+CREATED_WSIDS+=("$PARENT_ID")
+log " PARENT_ID=$PARENT_ID"
+
+# NOTE: no `declare -A` — this script must also run on a local macOS dev
+# box (bash 3.2, no associative arrays) per feedback_local_must_mimic_
+# production. WS_IDS / VERDICT are kept as newline-delimited "rtval"
+# maps with tiny get/set helpers (portable to bash 3.2+ AND ubuntu CI).
+WS_IDS_MAP=""
+VERDICT_MAP=""
+_map_set() { # _map_set
+  local __m="$1" __k="$2" __v="$3" __cur
+  eval "__cur=\$$__m"
+  __cur=$(printf '%s' "$__cur" | grep -v "^${__k}	" || true)
+  if [ -n "$__cur" ]; then
+    eval "$__m=\$(printf '%s\n%s\t%s' \"\$__cur\" \"\$__k\" \"\$__v\")"
+  else
+    eval "$__m=\$(printf '%s\t%s' \"\$__k\" \"\$__v\")"
+  fi
+}
+_map_get() { # _map_get -> stdout value (empty if absent)
+  local __m="$1" __k="$2" __cur
+  eval "__cur=\$$__m"
+  printf '%s\n' "$__cur" | awk -F'\t' -v k="$__k" '$1==k {print $2; exit}'
+}
+
+ALL_WS_IDS="$PARENT_ID"
+ACTIVE_RUNTIMES=""
+for rt in $PV_RUNTIMES; do
+  SEC=$(runtime_secrets "$rt") || SEC=""
+  if [ -z "$SEC" ]; then
+    log " SKIP $rt — no provider key in env (partially-keyed local env; not a failure)"
+    continue
+  fi
+  R=$(curl -s -X POST "$BASE/workspaces" -H "Content-Type: application/json" \
+    -d "{\"name\":\"${NAME_PREFIX}-$rt\",\"runtime\":\"$rt\",\"tier\":2,\"parent_id\":\"$PARENT_ID\",\"secrets\":$SEC}")
+  WID=$(echo "$R" | python3 -c 'import json,sys;print(json.load(sys.stdin).get("id",""))' 2>/dev/null)
+  if [ -z "$WID" ]; then
+    echo "::error::$rt workspace create failed: $(echo "$R" | head -c 300)" >&2
+    exit 1
+  fi
+  _map_set WS_IDS_MAP "$rt" "$WID"
+  CREATED_WSIDS+=("$WID")
+  ALL_WS_IDS="$ALL_WS_IDS $WID"
+  ACTIVE_RUNTIMES="$ACTIVE_RUNTIMES $rt"
+  log " $rt → $WID"
+done
+ACTIVE_RUNTIMES="$(echo "$ACTIVE_RUNTIMES" | xargs)"
+
+if [ -z "$ACTIVE_RUNTIMES" ]; then
+  echo "::error::No runtime had a provider key set — cannot run the local peer-visibility gate. Set CLAUDE_CODE_OAUTH_TOKEN and/or E2E_MINIMAX_API_KEY (or ANTHROPIC/OPENAI)." >&2
+  exit 1
+fi
+
+# ─── 2. Wait for the parent online (it is a peer target) ───────────────
+log "2/5 waiting for parent online (peer target)..."
+PF=$(wait_for_status "$PARENT_ID" "online" "$PROVISION_TIMEOUT_SECS") || true
+if [ "$PF" != "online" ]; then
+  echo "::error::parent ($PARENT_ID) never reached online (last=$PF) within ${PROVISION_TIMEOUT_SECS}s" >&2
+  exit 3
+fi
+ok " parent online"
+
+# ─── 3. Wait for every sibling online ──────────────────────────────────
+# A runtime that never comes online locally is itself a finding: it
+# reproduces the openclaw-never-online class (#165) on the local stack.
+log "3/5 waiting for all siblings online (up to ${PROVISION_TIMEOUT_SECS}s each — cold boot)..."
+REGRESSED=0
+ONLINE_RUNTIMES=""
+for rt in $ACTIVE_RUNTIMES; do
+  wid="$(_map_get WS_IDS_MAP "$rt")"
+  S=$(wait_for_status "$wid" "online" "$PROVISION_TIMEOUT_SECS") || true
+  if [ "$S" != "online" ]; then
+    echo " ✗ $rt ($wid): never reached online (last=$S) — reproduces the never-online class locally"
+    _map_set VERDICT_MAP "$rt" "FAIL(never-online:last=$S)"
+    REGRESSED=1
+    continue
+  fi
+  ok " $rt online"
+  ONLINE_RUNTIMES="$ONLINE_RUNTIMES $rt"
+done
+
+# ─── 4. THE GATE — literal mcp_molecule_list_peers via POST /:id/mcp ────
+# Shared, byte-identical assertion. Local passes "" for the org id (the
+# single-tenant local stack does not gate on X-Molecule-Org-Id); the
+# literal MCP call + every anti-proxy / anti-native-fallback guarantee is
+# the SAME code the staging backend runs.
+log "4/5 driving the LITERAL list_peers MCP call per online runtime..." +echo "" +for rt in $ONLINE_RUNTIMES; do + wid="$(_map_get WS_IDS_MAP "$rt")" + WTOK=$(e2e_mint_test_token "$wid" 2>/dev/null || true) + if [ -z "$WTOK" ]; then + echo "--- $rt (ws=$wid) ---" + echo " ✗ $rt: could not mint a local MCP bearer (admin/test-token) — cannot drive the literal call" + _map_set VERDICT_MAP "$rt" "FAIL(no-bearer)" + REGRESSED=1 + echo "" + continue + fi + PV_VERDICT="" + pv_assert_runtime "$rt" "$wid" "$WTOK" "$BASE" "" "$ALL_WS_IDS" || REGRESSED=1 + _map_set VERDICT_MAP "$rt" "$PV_VERDICT" + echo "" +done + +# ─── 5. Summary + honest gate exit ───────────────────────────────────── +echo "=== SUMMARY — LOCAL fresh-provision peer-visibility (literal MCP list_peers) ===" +for rt in $ACTIVE_RUNTIMES; do + _v="$(_map_get VERDICT_MAP "$rt")" + printf ' %-14s %s\n' "$rt" "${_v:-NO_RUN}" +done +echo "" + +if [ "$REGRESSED" -ne 0 ]; then + echo "✗ GATE FAILED (LOCAL) — at least one runtime cannot see its peers via" + echo " the literal mcp_molecule_list_peers call on the local prod-mimic" + echo " stack. This is the SAME user-facing failure the proxy signals were" + echo " hiding, reproduced locally (far faster than EC2). Expected RED until" + echo " the Hermes-401 (#162) + OpenClaw-never-online/MCP-wiring (#165)" + echo " root-cause fixes land; goes green only when they actually do." + exit 10 +fi + +ok "GATE PASSED (LOCAL) — every runtime under test sees its platform peers via the literal MCP call." 
+exit 0 diff --git a/tests/e2e/test_peer_visibility_mcp_staging.sh b/tests/e2e/test_peer_visibility_mcp_staging.sh index 44bb35aa3..3600e1b23 100755 --- a/tests/e2e/test_peer_visibility_mcp_staging.sh +++ b/tests/e2e/test_peer_visibility_mcp_staging.sh @@ -64,6 +64,13 @@ set -uo pipefail +# The literal MCP list_peers assertion lives in the shared, backend- +# agnostic lib so it is BYTE-IDENTICAL between this staging backend and +# the local docker-compose backend (tests/e2e/test_peer_visibility_mcp_ +# local.sh). Only provisioning/teardown differs per backend. +# shellcheck source=tests/e2e/lib/peer_visibility_assert.sh +source "$(dirname "${BASH_SOURCE[0]}")/lib/peer_visibility_assert.sh" + CP_URL="${MOLECULE_CP_URL:-https://staging-api.moleculesai.app}" ADMIN_TOKEN="${MOLECULE_ADMIN_TOKEN:?MOLECULE_ADMIN_TOKEN required — Railway staging CP_ADMIN_API_TOKEN}" RUN_ID_SUFFIX="${E2E_RUN_ID:-$(date +%H%M%S)-$$}" @@ -259,101 +266,19 @@ done # through WorkspaceAuth + MCPRateLimiter. log "6/6 driving the LITERAL list_peers MCP call per runtime..." echo "" -RPC_BODY='{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_peers","arguments":{}}}' REGRESSED=0 declare -A VERDICT for rt in $PV_RUNTIMES; do wid="${WS_IDS[$rt]}" wtok="${WS_TOKENS[$rt]}" - # The expected peer set = every OTHER provisioned workspace (parent + - # the sibling runtimes), excluding the caller itself. 
- EXPECT_IDS=$(echo "$ALL_WS_IDS" | tr ' ' '\n' | grep -v "^${wid}$" | grep -v '^$') - - set +e - RESP=$(curl -sS -X POST "$TENANT_URL/workspaces/$wid/mcp" \ - -H "Authorization: Bearer $wtok" \ - -H "X-Molecule-Org-Id: $ORG_ID" \ - -H "Content-Type: application/json" \ - -d "$RPC_BODY" \ - -o /tmp/pv_mcp_body.json -w "%{http_code}" 2>/dev/null) - set -e - HTTP_CODE="$RESP" - BODY=$(cat /tmp/pv_mcp_body.json 2>/dev/null || echo '') - - echo "--- $rt (ws=$wid) ---" - echo " HTTP $HTTP_CODE" - echo " body: $(echo "$BODY" | head -c 600)" - - # (1) HTTP 200 — a 401 (WorkspaceAuth reject, the Hermes symptom) fails here. - if [ "$HTTP_CODE" != "200" ]; then - echo " ✗ $rt: list_peers MCP call returned HTTP $HTTP_CODE (expected 200)" - VERDICT[$rt]="FAIL(http=$HTTP_CODE)" - REGRESSED=1 - continue - fi - - # (2) JSON-RPC result present, not an error object. - PARSE=$(echo "$BODY" | python3 -c " -import sys, json -expect = set(filter(None, '''$EXPECT_IDS'''.split())) -try: - d = json.load(sys.stdin) -except Exception as e: - print('PARSE_ERROR:' + str(e)); sys.exit(0) -if isinstance(d, dict) and d.get('error') is not None: - print('RPC_ERROR:' + json.dumps(d['error'])[:200]); sys.exit(0) -res = d.get('result') if isinstance(d, dict) else None -if res is None: - print('NO_RESULT'); sys.exit(0) -# MCP tools/call result shape: {content:[{type:text,text:''}]} -text = '' -if isinstance(res, dict): - for c in res.get('content', []): - if c.get('type') == 'text': - text += c.get('text', '') -text_l = text.lower() -# Native-sessions fallback signature (the OpenClaw symptom): the agent -# answered from its own runtime session list, not the platform peer set. -if 'sessions_list' in text_l or 'no platform peers' in text_l or 'native session' in text_l: - print('NATIVE_FALLBACK:' + text[:200]); sys.exit(0) -# The expected sibling IDs must literally appear in the returned peer text. 
-found = sorted(i for i in expect if i in text) -missing = sorted(expect - set(found)) -if not expect: - print('NO_EXPECTED_PEERS_CONFIGURED'); sys.exit(0) -if missing: - print('MISSING_PEERS:found=%d/%d missing=%s' % (len(found), len(expect), ','.join(m[:8] for m in missing))) - sys.exit(0) -print('OK:found=%d/%d' % (len(found), len(expect))) -" 2>/dev/null) - - case "$PARSE" in - OK:*) - echo " ✓ $rt: list_peers returned 200 and contains all expected peers ($PARSE)" - VERDICT[$rt]="OK" - ;; - NATIVE_FALLBACK:*) - echo " ✗ $rt: list_peers fell back to NATIVE sessions — sees no platform peers ($PARSE)" - VERDICT[$rt]="FAIL(native-fallback)" - REGRESSED=1 - ;; - RPC_ERROR:*|NO_RESULT|PARSE_ERROR:*) - echo " ✗ $rt: list_peers MCP call did not return a usable result ($PARSE)" - VERDICT[$rt]="FAIL(rpc=$PARSE)" - REGRESSED=1 - ;; - MISSING_PEERS:*) - echo " ✗ $rt: list_peers returned 200 but peer set is wrong/empty ($PARSE)" - VERDICT[$rt]="FAIL(peers=$PARSE)" - REGRESSED=1 - ;; - *) - echo " ✗ $rt: unexpected verdict '$PARSE'" - VERDICT[$rt]="FAIL(unknown)" - REGRESSED=1 - ;; - esac + # Byte-identical assertion via the shared lib. Staging passes ORG_ID as + # the X-Molecule-Org-Id header value; the literal MCP call + every + # anti-proxy / anti-native-fallback guarantee is the SAME code the + # local backend runs. + PV_VERDICT="" + pv_assert_runtime "$rt" "$wid" "$wtok" "$TENANT_URL" "$ORG_ID" "$ALL_WS_IDS" || REGRESSED=1 + VERDICT[$rt]="$PV_VERDICT" echo "" done -- 2.52.0