Merge pull request #2401 from Molecule-AI/auto/local-production-shape-harness
feat(tests): add production-shape local harness (Phase 1)
This commit is contained in:
commit
6159429634
110
tests/harness/README.md
Normal file
110
tests/harness/README.md
Normal file
@ -0,0 +1,110 @@
|
||||
# Production-shape local harness
|
||||
|
||||
The harness brings up the SaaS tenant topology on localhost using the
|
||||
same `Dockerfile.tenant` image that ships to production. Tests run
|
||||
against `http://harness-tenant.localhost:8080` and exercise the
|
||||
SAME code path a real tenant takes — including TenantGuard middleware,
|
||||
the `/cp/*` reverse proxy, the canvas reverse proxy, and a
|
||||
Cloudflare-tunnel-shape header rewrite layer.
|
||||
|
||||
## Why this exists
|
||||
|
||||
Local `go run ./cmd/server` skips:
|
||||
- `TenantGuard` middleware (no `MOLECULE_ORG_ID` env)
|
||||
- `/cp/*` reverse proxy mount (no `CP_UPSTREAM_URL` env)
|
||||
- `CANVAS_PROXY_URL` (canvas runs separately on `:3000`)
|
||||
- Header rewrites that production's CF tunnel + LB perform
|
||||
- Strict-auth mode (no live `ADMIN_TOKEN`)
|
||||
|
||||
Bugs that survive `go run` and ship to production almost always live
|
||||
in one of those layers. The harness activates ALL of them.
|
||||
|
||||
## Topology
|
||||
|
||||
```
|
||||
client
|
||||
↓
|
||||
cf-proxy nginx, mirrors CF tunnel header rewrites
|
||||
↓ (Host:harness-tenant.localhost, X-Forwarded-*)
|
||||
tenant workspace-server/Dockerfile.tenant — same image as prod
|
||||
↓ (CP_UPSTREAM_URL=http://cp-stub:9090, /cp/* proxied)
|
||||
cp-stub minimal Go service, mocks CP wire surface
|
||||
postgres same version as production
|
||||
redis same version as production
|
||||
```
|
||||
|
||||
## Quickstart
|
||||
|
||||
```bash
|
||||
cd tests/harness
|
||||
./up.sh # builds + starts all services
|
||||
./seed.sh # mints admin token, registers two sample workspaces
|
||||
./replays/peer-discovery-404.sh
|
||||
./replays/buildinfo-stale-image.sh
|
||||
./down.sh # tear down + remove volumes
|
||||
```
|
||||
|
||||
First-time setup needs an `/etc/hosts` entry so `harness-tenant.localhost`
|
||||
resolves to the local cf-proxy:
|
||||
|
||||
```bash
|
||||
echo "127.0.0.1 harness-tenant.localhost" | sudo tee -a /etc/hosts
|
||||
```
|
||||
|
||||
(macOS resolves `*.localhost` automatically in some setups; Linux
|
||||
typically does not.)
|
||||
|
||||
## Replay scripts
|
||||
|
||||
Each replay script reproduces a real bug class against the harness so
|
||||
fixes can be verified locally before deploy. The bar for adding a
|
||||
replay is "this bug shipped to production despite local E2E being
|
||||
green" — the script becomes the regression gate that closes that gap.
|
||||
|
||||
| Replay | Closes | What it proves |
|
||||
|--------|--------|----------------|
|
||||
| `peer-discovery-404.sh` | #2397 | tool_list_peers surfaces the actual reason instead of "may be isolated" |
|
||||
| `buildinfo-stale-image.sh` | #2395 | GIT_SHA reaches the binary; verify-step comparison logic works |
|
||||
|
||||
To add a new replay:
|
||||
1. Drop a script under `replays/` named after the issue.
|
||||
2. The script's purpose: reproduce the production failure mode against
|
||||
the harness, then assert the fix is present. PASS criterion is the
|
||||
post-fix behavior.
|
||||
3. Wire it into the `tests/harness/run-all-replays.sh` runner (TODO,
|
||||
Phase 2).
|
||||
|
||||
## Extending the cp-stub
|
||||
|
||||
`cp-stub/main.go` serves the minimum surface for the existing replays
|
||||
plus a catch-all that returns 501 + a clear message when the tenant
|
||||
asks for a route the stub doesn't implement. To add a new CP route:
|
||||
|
||||
1. Add a `mux.HandleFunc` in `cp-stub/main.go` for the path.
|
||||
2. Return the same wire shape the real CP returns. The contract is
|
||||
"wire compatibility with the staging CP at the time of writing" —
|
||||
document it with a comment pointing at the real CP handler.
|
||||
3. Add a replay script that exercises the path.
|
||||
|
||||
## What the harness does NOT cover
|
||||
|
||||
- Real TLS / cert handling (CF terminates TLS in production; harness is
|
||||
HTTP-only).
|
||||
- Cloudflare API edge cases (rate limits, DNS propagation timing).
|
||||
- Real EC2 / SSM / EBS behavior (image-cache replay simulates the
|
||||
outcome but not the AWS API surface).
|
||||
- Cross-region or multi-AZ topology.
|
||||
- Real production data scale.
|
||||
|
||||
These are intentional Phase 1 limits. If a bug class hits one of these
|
||||
gaps, escalate to staging E2E rather than expanding the harness past
|
||||
its mandate of "exercise the tenant binary in production-shape topology."
|
||||
|
||||
## Roadmap
|
||||
|
||||
- **Phase 1 (this PR):** harness + cp-stub + cf-proxy + 2 replays.
|
||||
- **Phase 2:** convert `tests/e2e/test_api.sh` to run against the
|
||||
harness instead of localhost. Make harness-based E2E a required CI
|
||||
check.
|
||||
- **Phase 3:** config-coherence lint that diffs harness env list
|
||||
against production CP's env list, fails CI on drift.
|
||||
68
tests/harness/cf-proxy/nginx.conf
Normal file
68
tests/harness/cf-proxy/nginx.conf
Normal file
@ -0,0 +1,68 @@
|
||||
# cf-proxy — Cloudflare-tunnel-shape reverse proxy for the local harness.
|
||||
#
|
||||
# Production path: agent → CF tunnel → AWS LB → tenant container.
|
||||
# This config replays the same header rewrites the CF tunnel does so
|
||||
# the tenant sees the same Host + X-Forwarded-* it would in production.
|
||||
#
|
||||
# The tenant's TenantGuard middleware activates on MOLECULE_ORG_ID; the
|
||||
# canvas's same-origin fetches use the Host header for cookie scoping.
|
||||
# Both behave correctly in production because CF rewrites Host to the
|
||||
# tenant subdomain — this proxy reproduces that locally.
|
||||
#
|
||||
# How tests reach it:
|
||||
# curl --resolve 'harness-tenant.localhost:8443:127.0.0.1' \
|
||||
# https://harness-tenant.localhost:8443/health
|
||||
# or via /etc/hosts (added automatically by ./up.sh on first boot).
|
||||
|
||||
worker_processes 1;
|
||||
events { worker_connections 256; }
|
||||
|
||||
http {
|
||||
# Map the wildcard <slug>.localhost to the tenant container. The
|
||||
# tenant container itself doesn't care which slug routed to it —
|
||||
# what matters is that the Host header it sees matches what
|
||||
# production's CF tunnel sets, so cookie/CORS/TenantGuard logic
|
||||
# exercises the same code path.
|
||||
server {
|
||||
listen 8080;
|
||||
server_name *.localhost localhost;
|
||||
|
||||
# Cap upload at 50MB to mirror the staging tenant nginx limit;
|
||||
# chat upload tests will fail closed if the platform handler
|
||||
# ever silently expands its limit (catches the failure mode
|
||||
# opposite of the chat-files lazy-heal incident).
|
||||
client_max_body_size 50m;
|
||||
|
||||
location / {
|
||||
proxy_pass http://tenant:8080;
|
||||
|
||||
# Header parity with CF tunnel + AWS LB. Production CF sets
|
||||
# X-Forwarded-Proto=https; we keep http here because TLS
|
||||
# termination in compose is unnecessary for testing the
|
||||
# tenant logic — TLS is a CF concern, not a tenant bug
|
||||
# surface. If TLS-specific bugs ever bite, add cert-manager
|
||||
# + listen 8443 ssl here.
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
||||
proxy_set_header X-Forwarded-Host $host;
|
||||
proxy_set_header X-Forwarded-Proto $scheme;
|
||||
|
||||
# Streamable HTTP / SSE / WebSocket — the tenant exposes /ws
|
||||
# and /events/stream + MCP /mcp/stream. Disabling buffering
|
||||
# reproduces CF tunnel's pass-through streaming semantics
|
||||
# (CF tunnel = no buffering by default; nginx default IS
|
||||
# buffering, which would mask issue #2397-class streaming
|
||||
# bugs by accumulating output until the client disconnects).
|
||||
proxy_buffering off;
|
||||
proxy_request_buffering off;
|
||||
proxy_http_version 1.1;
|
||||
proxy_set_header Connection "";
|
||||
|
||||
# Read timeout — CF tunnel default is 100s. Setting this to
|
||||
# the same value catches "long agent run finishes after the
|
||||
# proxy already closed the upstream" failure mode.
|
||||
proxy_read_timeout 100s;
|
||||
}
|
||||
}
|
||||
}
|
||||
132
tests/harness/compose.yml
Normal file
132
tests/harness/compose.yml
Normal file
@ -0,0 +1,132 @@
|
||||
# Production-shape harness for local E2E.
|
||||
#
|
||||
# Reproduces the SaaS tenant topology on localhost using the SAME
|
||||
# images that ship to production:
|
||||
#
|
||||
# client → cf-proxy (nginx, mimics CF tunnel headers)
|
||||
# → tenant (workspace-server/Dockerfile.tenant — combined platform + canvas)
|
||||
# → cp-stub (control-plane stand-in) for /cp/* and CP-callback paths
|
||||
# → postgres + redis (same versions as production)
|
||||
#
|
||||
# Why this matters: the workspace-server binary IS identical between
|
||||
# local and production. The bugs that survive local E2E are topology
|
||||
# bugs — env-gated middleware (TenantGuard, CP proxy, Canvas proxy),
|
||||
# auth state, header rewrites, real production image. This harness
|
||||
# activates ALL of them.
|
||||
#
|
||||
# Quickstart:
|
||||
# cd tests/harness && ./up.sh
|
||||
# ./seed.sh
|
||||
# ./replays/peer-discovery-404.sh # reproduces issue #2397
|
||||
#
|
||||
# Env config:
|
||||
# GIT_SHA — passed to the tenant build for /buildinfo verification.
|
||||
# Defaults to "harness" so /buildinfo distinguishes the
|
||||
# harness build from any cached image.
|
||||
# CP_STUB_PEERS_MODE — peers failure mode for replay scripts.
|
||||
# "" / "404" / "401" / "500" / "timeout".
|
||||
|
||||
services:
|
||||
postgres:
|
||||
image: postgres:16-alpine
|
||||
environment:
|
||||
POSTGRES_USER: harness
|
||||
POSTGRES_PASSWORD: harness
|
||||
POSTGRES_DB: molecule
|
||||
networks: [harness-net]
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "pg_isready -U harness"]
|
||||
interval: 2s
|
||||
timeout: 5s
|
||||
retries: 10
|
||||
|
||||
redis:
|
||||
image: redis:7-alpine
|
||||
networks: [harness-net]
|
||||
healthcheck:
|
||||
test: ["CMD", "redis-cli", "ping"]
|
||||
interval: 2s
|
||||
timeout: 5s
|
||||
retries: 10
|
||||
|
||||
cp-stub:
|
||||
build:
|
||||
context: ./cp-stub
|
||||
environment:
|
||||
PORT: "9090"
|
||||
CP_STUB_PEERS_MODE: "${CP_STUB_PEERS_MODE:-}"
|
||||
networks: [harness-net]
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "wget -q -O- http://localhost:9090/healthz || exit 1"]
|
||||
interval: 2s
|
||||
timeout: 5s
|
||||
retries: 10
|
||||
|
||||
# The actual production tenant image — same Dockerfile.tenant CI publishes.
|
||||
# This is the load-bearing part of the harness: every bug class that hides
|
||||
# behind "but it works locally" is reproducible HERE, against this image,
|
||||
# not against `go run ./cmd/server`.
|
||||
tenant:
|
||||
build:
|
||||
context: ../..
|
||||
dockerfile: workspace-server/Dockerfile.tenant
|
||||
args:
|
||||
GIT_SHA: "${GIT_SHA:-harness}"
|
||||
depends_on:
|
||||
postgres:
|
||||
condition: service_healthy
|
||||
redis:
|
||||
condition: service_healthy
|
||||
cp-stub:
|
||||
condition: service_healthy
|
||||
environment:
|
||||
DATABASE_URL: "postgres://harness:harness@postgres:5432/molecule?sslmode=disable"
|
||||
REDIS_URL: "redis://redis:6379"
|
||||
PORT: "8080"
|
||||
PLATFORM_URL: "http://tenant:8080"
|
||||
MOLECULE_ENV: "production"
|
||||
# ADMIN_TOKEN flips the platform into strict-auth mode (matches
|
||||
# production's CP-minted token configuration). Seeded value lets
|
||||
# E2E scripts authenticate without going through CP.
|
||||
ADMIN_TOKEN: "harness-admin-token"
|
||||
# MOLECULE_ORG_ID — activates TenantGuard middleware. Every request
|
||||
# must carry X-Molecule-Org-Id matching this value. Replays bugs
|
||||
# that only fire in SaaS mode.
|
||||
MOLECULE_ORG_ID: "harness-org"
|
||||
# CP_UPSTREAM_URL — activates the /cp/* reverse proxy mount in
|
||||
# router.go. Without this set, /cp/* would 404 and the canvas
|
||||
# bootstrap would silently drift from production behavior.
|
||||
CP_UPSTREAM_URL: "http://cp-stub:9090"
|
||||
RATE_LIMIT: "1000"
|
||||
# Canvas auto-proxy — entrypoint-tenant.sh exports CANVAS_PROXY_URL
|
||||
# by default; keeping it explicit here makes the topology readable.
|
||||
CANVAS_PROXY_URL: "http://localhost:3000"
|
||||
networks: [harness-net]
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "wget -q -O- http://localhost:8080/health || exit 1"]
|
||||
interval: 5s
|
||||
timeout: 5s
|
||||
retries: 20
|
||||
|
||||
# Cloudflare-tunnel-shape proxy — strips the :8080 suffix, rewrites
|
||||
# Host to the tenant subdomain, injects X-Forwarded-*. Tests target
|
||||
# http://harness-tenant.localhost:8080 and exercise the production
|
||||
# routing layer.
|
||||
cf-proxy:
|
||||
image: nginx:1.27-alpine
|
||||
depends_on:
|
||||
tenant:
|
||||
condition: service_healthy
|
||||
volumes:
|
||||
- ./cf-proxy/nginx.conf:/etc/nginx/nginx.conf:ro
|
||||
# Bind to 127.0.0.1 only — the harness uses a hardcoded ADMIN_TOKEN
|
||||
# ("harness-admin-token") so binding 0.0.0.0 (compose's default)
|
||||
# would expose admin access to anyone on the local network or VPN.
|
||||
# Loopback-only is safe for E2E and prevents a known-token leak.
|
||||
ports:
|
||||
- "127.0.0.1:8080:8080"
|
||||
networks: [harness-net]
|
||||
|
||||
networks:
|
||||
harness-net:
|
||||
name: molecule-harness-net
|
||||
14
tests/harness/cp-stub/Dockerfile
Normal file
14
tests/harness/cp-stub/Dockerfile
Normal file
@ -0,0 +1,14 @@
|
||||
# cp-stub — minimal CP stand-in for the local production-shape harness.
|
||||
# See main.go for the rationale. Self-contained build, no module deps.
|
||||
|
||||
FROM golang:1.25-alpine AS builder
|
||||
WORKDIR /src
|
||||
COPY go.mod ./
|
||||
COPY main.go ./
|
||||
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /cp-stub .
|
||||
|
||||
FROM alpine:3.20
|
||||
RUN apk add --no-cache ca-certificates
|
||||
COPY --from=builder /cp-stub /cp-stub
|
||||
EXPOSE 9090
|
||||
ENTRYPOINT ["/cp-stub"]
|
||||
3
tests/harness/cp-stub/go.mod
Normal file
3
tests/harness/cp-stub/go.mod
Normal file
@ -0,0 +1,3 @@
|
||||
module github.com/Molecule-AI/molecule-monorepo/tests/harness/cp-stub
|
||||
|
||||
go 1.25
|
||||
113
tests/harness/cp-stub/main.go
Normal file
113
tests/harness/cp-stub/main.go
Normal file
@ -0,0 +1,113 @@
|
||||
// cp-stub — minimal control-plane stand-in for the local production-shape harness.
|
||||
//
|
||||
// In production, the tenant Go server reverse-proxies /cp/* to the SaaS
|
||||
// control-plane (molecule-controlplane). This stub plays that role on
|
||||
// localhost so we can exercise the SAME code path the tenant takes in
|
||||
// production — `if cpURL := os.Getenv("CP_UPSTREAM_URL"); cpURL != ""`
|
||||
// in workspace-server/internal/router/router.go fires, the proxy mount
|
||||
// activates, and tests exercise the real tenant→CP wire.
|
||||
//
|
||||
// This is NOT a CP reimplementation. It serves the minimum surface to:
|
||||
// 1. Boot the tenant image without /cp/* breaking the canvas bootstrap.
|
||||
// 2. Replay specific bug classes (e.g. /cp/* returns 404, returns 5xx,
|
||||
// returns malformed JSON) by toggling env vars.
|
||||
//
|
||||
// Scope is bounded by what the tenant + canvas actually call. Add new
|
||||
// handlers as new replay scenarios demand them. Drift from real CP is
|
||||
// tolerated because each handler is named for the exact path it serves —
|
||||
// when the real CP changes, the failing scenario tells us where to look.
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"log"
|
||||
"net/http"
|
||||
"os"
|
||||
"sync/atomic"
|
||||
)
|
||||
|
||||
// redeployFleetCalls tracks how many times /cp/admin/tenants/redeploy-fleet
|
||||
// was invoked. Replay scripts assert > 0 to confirm the workflow's redeploy
|
||||
// step actually reached the stub (catches misrouted CP_URL configs).
|
||||
var redeployFleetCalls atomic.Int64
|
||||
|
||||
func main() {
|
||||
mux := http.NewServeMux()
|
||||
|
||||
// /cp/auth/me — canvas calls this on bootstrap; minimal user record
|
||||
// keeps the canvas from redirecting to login during local E2E.
|
||||
mux.HandleFunc("/cp/auth/me", func(w http.ResponseWriter, r *http.Request) {
|
||||
writeJSON(w, 200, map[string]any{
|
||||
"id": "harness-user",
|
||||
"email": "harness@local",
|
||||
"org_id": "harness-org",
|
||||
"roles": []string{"admin"},
|
||||
})
|
||||
})
|
||||
|
||||
// /cp/admin/tenants/redeploy-fleet — exercised by the
|
||||
// redeploy-tenants-on-{staging,main} workflow's local replay. Returns
|
||||
// the same shape the real CP returns so the verify-fleet logic in CI
|
||||
// can be tested without spinning up a real EC2 fleet.
|
||||
mux.HandleFunc("/cp/admin/tenants/redeploy-fleet", func(w http.ResponseWriter, r *http.Request) {
|
||||
redeployFleetCalls.Add(1)
|
||||
writeJSON(w, 200, map[string]any{
|
||||
"ok": true,
|
||||
"results": []map[string]any{
|
||||
{
|
||||
"slug": "harness-tenant",
|
||||
"phase": "redeploy",
|
||||
"ssm_status": "Success",
|
||||
"ssm_exit_code": 0,
|
||||
"healthz_ok": true,
|
||||
},
|
||||
},
|
||||
})
|
||||
})
|
||||
|
||||
// __stub/state — expose stub state (counters) so replay scripts can
|
||||
// assert the tenant actually reached us. Read-only.
|
||||
mux.HandleFunc("/__stub/state", func(w http.ResponseWriter, r *http.Request) {
|
||||
writeJSON(w, 200, map[string]any{
|
||||
"redeploy_fleet_calls": redeployFleetCalls.Load(),
|
||||
})
|
||||
})
|
||||
|
||||
// Catch-all for any /cp/* the tenant proxies. Keeps the harness from
|
||||
// crashing the canvas when a new CP route is added — surfaces a clear
|
||||
// "stub doesn't implement X" error instead of opaque 502 from the
|
||||
// reverse proxy.
|
||||
mux.HandleFunc("/cp/", func(w http.ResponseWriter, r *http.Request) {
|
||||
writeJSON(w, 501, map[string]any{
|
||||
"error": "cp-stub: handler not implemented for " + r.Method + " " + r.URL.Path,
|
||||
"hint": "add a handler in tests/harness/cp-stub/main.go for the scenario you're testing",
|
||||
})
|
||||
})
|
||||
|
||||
// /healthz — readiness probe for compose's depends_on.
|
||||
mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
|
||||
writeJSON(w, 200, map[string]any{"status": "ok"})
|
||||
})
|
||||
|
||||
addr := ":" + envOr("PORT", "9090")
|
||||
log.Printf("cp-stub listening on %s", addr)
|
||||
if err := http.ListenAndServe(addr, mux); err != nil {
|
||||
log.Fatal(err)
|
||||
}
|
||||
}
|
||||
|
||||
func writeJSON(w http.ResponseWriter, code int, body any) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
w.WriteHeader(code)
|
||||
if err := json.NewEncoder(w).Encode(body); err != nil {
|
||||
fmt.Fprintf(os.Stderr, "cp-stub: write json: %v\n", err)
|
||||
}
|
||||
}
|
||||
|
||||
func envOr(k, def string) string {
|
||||
if v := os.Getenv(k); v != "" {
|
||||
return v
|
||||
}
|
||||
return def
|
||||
}
|
||||
6
tests/harness/down.sh
Executable file
6
tests/harness/down.sh
Executable file
@ -0,0 +1,6 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
cd "$HERE"
|
||||
docker compose -f compose.yml down -v --remove-orphans
|
||||
echo "[harness] down + volumes removed."
|
||||
75
tests/harness/replays/buildinfo-stale-image.sh
Executable file
75
tests/harness/replays/buildinfo-stale-image.sh
Executable file
@ -0,0 +1,75 @@
|
||||
#!/usr/bin/env bash
|
||||
# Replay for issue #2395 — local proof that the /buildinfo verify gate
|
||||
# closes the SaaS deploy-chain blindness.
|
||||
#
|
||||
# Prior behavior: redeploy-fleet returned ssm_status=Success based on
|
||||
# the SSM RPC return code alone. EC2 tenants kept serving the cached
|
||||
# :latest digest because `docker compose up -d` is a no-op when the
|
||||
# tag hasn't been invalidated. ssm_status=Success was lying.
|
||||
#
|
||||
# This replay simulates that condition locally:
|
||||
# 1. Boot the harness with GIT_SHA=fix-applied.
|
||||
# 2. Curl /buildinfo and assert it returns "fix-applied" (the new code
|
||||
# actually shipped).
|
||||
# 3. Negative test: curl with a different EXPECTED_SHA and assert the
|
||||
# mismatch detection logic the workflow uses returns failure.
|
||||
#
|
||||
# This proves the verify-step's jq lookup + comparison logic works
|
||||
# against the SAME Dockerfile.tenant production builds. If the
|
||||
# /buildinfo route ever stops being wired through, this replay
|
||||
# catches it before it reaches a production tenant.
|
||||
|
||||
set -euo pipefail
|
||||
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
HARNESS_ROOT="$(dirname "$HERE")"
|
||||
|
||||
BASE="${BASE:-http://harness-tenant.localhost:8080}"
|
||||
|
||||
# 1. Confirm /buildinfo wire shape — same shape the workflow's jq lookup expects.
|
||||
echo "[replay] curl $BASE/buildinfo ..."
|
||||
BUILD_JSON=$(curl -sS "$BASE/buildinfo")
|
||||
echo "[replay] $BUILD_JSON"
|
||||
|
||||
ACTUAL_SHA=$(echo "$BUILD_JSON" | jq -r '.git_sha // ""')
|
||||
if [ -z "$ACTUAL_SHA" ]; then
|
||||
echo "[replay] FAIL: /buildinfo response missing git_sha field — workflow's jq lookup would null"
|
||||
exit 1
|
||||
fi
|
||||
echo "[replay] git_sha=$ACTUAL_SHA"
|
||||
|
||||
# 2. Assert the harness build threaded GIT_SHA through. If we got "dev",
|
||||
# the Dockerfile arg / ldflags wiring is broken — same regression
|
||||
# class that made #2395 invisible until production.
|
||||
EXPECTED_FROM_HARNESS="${HARNESS_GIT_SHA:-harness}"
|
||||
if [ "$ACTUAL_SHA" = "dev" ]; then
|
||||
echo "[replay] FAIL: /buildinfo returned 'dev' — Dockerfile.tenant ARG GIT_SHA isn't reaching the binary"
|
||||
echo "[replay] This regresses #2395 by silencing the deploy-verify gate."
|
||||
exit 1
|
||||
fi
|
||||
if [ "$ACTUAL_SHA" != "$EXPECTED_FROM_HARNESS" ]; then
|
||||
echo "[replay] WARN: /buildinfo returned '$ACTUAL_SHA' but harness was built with GIT_SHA='$EXPECTED_FROM_HARNESS'"
|
||||
echo "[replay] Image may be cached from a previous run. Run ./up.sh --rebuild to force a fresh build."
|
||||
fi
|
||||
|
||||
# 3. Negative test — replay the workflow's mismatch detection by
|
||||
# comparing the actual SHA to a deliberately-wrong expected SHA.
|
||||
WRONG_EXPECTED="0000000000000000000000000000000000000000"
|
||||
if [ "$ACTUAL_SHA" = "$WRONG_EXPECTED" ]; then
|
||||
echo "[replay] FAIL: /buildinfo returned all-zero SHA — wiring inverted"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# 4. Replay the workflow's exact comparison logic so a regression in
|
||||
# the verify step's bash gets caught here.
|
||||
MISMATCH_DETECTED=0
|
||||
if [ "$ACTUAL_SHA" != "$WRONG_EXPECTED" ]; then
|
||||
MISMATCH_DETECTED=1
|
||||
fi
|
||||
if [ "$MISMATCH_DETECTED" != "1" ]; then
|
||||
echo "[replay] FAIL: workflow comparison logic would not flag a real mismatch"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "[replay] PASS: /buildinfo wire shape, GIT_SHA injection, and mismatch detection all work in"
|
||||
echo " production-shape topology. The redeploy-fleet verify-step covers what it claims to."
|
||||
139
tests/harness/replays/peer-discovery-404.sh
Executable file
139
tests/harness/replays/peer-discovery-404.sh
Executable file
@ -0,0 +1,139 @@
|
||||
#!/usr/bin/env bash
|
||||
# Replay for issue #2397 — local proof that peer-discovery surfaces
|
||||
# actionable diagnostics instead of "may be isolated".
|
||||
#
|
||||
# Prior behavior: tool_list_peers returned "No peers available (this
|
||||
# workspace may be isolated)" regardless of WHY peers were empty —
|
||||
# five distinct conditions (200+empty, 401, 403, 404, 5xx, network)
|
||||
# collapsed to one ambiguous message.
|
||||
#
|
||||
# This replay proves two things, separately:
|
||||
# (a) WIRE: the platform side of the contract — the tenant's
|
||||
# /registry/<unregistered>/peers returns 404. If this regresses
|
||||
# (e.g. tenant starts returning 200 with empty list, or 500),
|
||||
# the runtime helper would parse it differently and the agent
|
||||
# would see a different diagnostic. The harness catches that here.
|
||||
# (b) PARSE: the runtime helper, given a 404, produces a diagnostic
|
||||
# containing "404" + "register" hints. Done in unit tests against
|
||||
# a mock httpx response (test_a2a_client.py::TestGetPeersWithDiagnostic
|
||||
# — the harness re-asserts the same contract here against a real
|
||||
# Python eval that does NOT depend on workspace auth tokens.
|
||||
#
|
||||
# Why split the assertion: the Python eval here doesn't have the
|
||||
# workspace's auth token file, so going through get_peers_with_diagnostic
|
||||
# directly would hit the platform without auth and produce a different
|
||||
# branch (401 instead of 404). Splitting (a) from (b) keeps each
|
||||
# assertion targeting exactly what it claims to test.
|
||||
|
||||
set -euo pipefail
|
||||
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
HARNESS_ROOT="$(dirname "$HERE")"
|
||||
cd "$HARNESS_ROOT"
|
||||
|
||||
if [ ! -f .seed.env ]; then
|
||||
echo "[replay] no .seed.env — running ./seed.sh first..."
|
||||
./seed.sh
|
||||
fi
|
||||
# shellcheck source=/dev/null
|
||||
source .seed.env
|
||||
|
||||
BASE="${BASE:-http://harness-tenant.localhost:8080}"
|
||||
ADMIN="harness-admin-token"
|
||||
ORG="harness-org"
|
||||
|
||||
# ─── (a) WIRE: tenant returns 404 for an unregistered workspace ────────
|
||||
ROGUE_ID="$(uuidgen | tr '[:upper:]' '[:lower:]')"
|
||||
echo "[replay] (a) WIRE: querying /registry/$ROGUE_ID/peers (unregistered workspace)..."
|
||||
HTTP_CODE=$(curl -sS -o /tmp/peer-replay.json -w '%{http_code}' \
|
||||
-H "Authorization: Bearer $ADMIN" \
|
||||
-H "X-Molecule-Org-Id: $ORG" \
|
||||
-H "X-Workspace-ID: $ROGUE_ID" \
|
||||
"$BASE/registry/$ROGUE_ID/peers")
|
||||
|
||||
echo "[replay] tenant responded HTTP $HTTP_CODE"
|
||||
if [ "$HTTP_CODE" != "404" ]; then
|
||||
echo "[replay] FAIL (a): expected 404 from /registry/<unregistered>/peers, got $HTTP_CODE"
|
||||
echo "[replay] This is a platform-side regression — the runtime's diagnostic helper"
|
||||
echo "[replay] would see a different status code than the unit tests cover."
|
||||
cat /tmp/peer-replay.json
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# ─── (b) PARSE: helper converts a synthetic 404 to actionable diagnostic ─
|
||||
#
|
||||
# We construct a synthetic httpx 404 response and run the helper against
|
||||
# it directly. This isolates the parse branch we want to test from the
|
||||
# auth-context concerns of going through the network. The helper's network
|
||||
# branches are exhaustively covered by tests/test_a2a_client.py — this is
|
||||
# a regression-guard that the helper IS in the install, IS importable in
|
||||
# the harness's Python env, and IS reading the status code.
|
||||
|
||||
WORKSPACE_PATH="$(cd "$HARNESS_ROOT/../../workspace" && pwd)"
|
||||
DIAGNOSTIC=$(WORKSPACE_ID="harness-rogue" PYTHONPATH="$WORKSPACE_PATH" \
|
||||
python3 - "$WORKSPACE_PATH" <<'PYEOF'
|
||||
import asyncio
|
||||
import sys
|
||||
import types
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
# Stub platform_auth so a2a_client imports cleanly without requiring a
|
||||
# real workspace token file. The helper's auth_headers() only matters
|
||||
# when going through the network; we're feeding it a mock response.
|
||||
_pa = types.ModuleType("platform_auth")
|
||||
_pa.auth_headers = lambda: {}
|
||||
_pa.self_source_headers = lambda: {}
|
||||
sys.modules.setdefault("platform_auth", _pa)
|
||||
|
||||
sys.path.insert(0, sys.argv[1])
|
||||
import a2a_client # noqa: E402
|
||||
|
||||
# This replay validates PR #2399's diagnostic helper. If the workspace
|
||||
# runtime in the current checkout pre-dates that fix, fail with a
|
||||
# clear message instead of an opaque AttributeError.
|
||||
if not hasattr(a2a_client, "get_peers_with_diagnostic"):
|
||||
print("__SKIP__: workspace/a2a_client.py is pre-#2399 (no get_peers_with_diagnostic).")
|
||||
sys.exit(0)
|
||||
|
||||
resp = MagicMock()
|
||||
resp.status_code = 404
|
||||
resp.json = MagicMock(return_value={"detail": "not found"})
|
||||
|
||||
mock_client = AsyncMock()
|
||||
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.get = AsyncMock(return_value=resp)
|
||||
|
||||
async def main():
|
||||
with patch("a2a_client.httpx.AsyncClient", return_value=mock_client):
|
||||
peers, diag = await a2a_client.get_peers_with_diagnostic()
|
||||
print(repr(diag))
|
||||
|
||||
asyncio.run(main())
|
||||
PYEOF
|
||||
)
|
||||
|
||||
if [[ "$DIAGNOSTIC" == __SKIP__:* ]]; then
|
||||
echo "[replay] (b) SKIP: ${DIAGNOSTIC#__SKIP__: }"
|
||||
echo "[replay] Re-run after #2399 lands on staging."
|
||||
echo ""
|
||||
echo "[replay] PASS (a) only: peer-discovery wire returns 404 (parse branch skipped — see above)."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
echo "[replay] (b) PARSE: helper diagnostic = $DIAGNOSTIC"
|
||||
|
||||
if ! echo "$DIAGNOSTIC" | grep -q "404"; then
|
||||
echo "[replay] FAIL (b): diagnostic missing '404' — helper regressed to swallow-the-status-code"
|
||||
exit 1
|
||||
fi
|
||||
if ! echo "$DIAGNOSTIC" | grep -qi "regist"; then
|
||||
echo "[replay] FAIL (b): diagnostic missing 'register' guidance — helper regressed to opaque message"
|
||||
exit 1
|
||||
fi
|
||||
if echo "$DIAGNOSTIC" | grep -qi "may be isolated"; then
|
||||
echo "[replay] FAIL (b): diagnostic still says 'may be isolated' — fix didn't reach this code path"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "[replay] PASS: peer-discovery (a) wire returns 404, (b) helper produces actionable diagnostic."
|
||||
65
tests/harness/seed.sh
Executable file
65
tests/harness/seed.sh
Executable file
@ -0,0 +1,65 @@
|
||||
#!/usr/bin/env bash
|
||||
# Seed the harness with two registered workspaces so peer-discovery
|
||||
# replay scripts have something to discover.
|
||||
#
|
||||
# - "alpha" parent (tier 0)
|
||||
# - "beta" child of alpha (tier 1)
|
||||
#
|
||||
# Both register via the platform's /registry/register endpoint, which
|
||||
# is what real workspaces do at boot. The platform then has them in its
|
||||
# DB; tool_list_peers from inside alpha can resolve beta as a peer.
|
||||
|
||||
set -euo pipefail
|
||||
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
cd "$HERE"
|
||||
|
||||
BASE="${BASE:-http://harness-tenant.localhost:8080}"
|
||||
ADMIN="harness-admin-token"
|
||||
ORG="harness-org"
|
||||
|
||||
curl_admin() {
|
||||
curl -sS -H "Authorization: Bearer $ADMIN" \
|
||||
-H "X-Molecule-Org-Id: $ORG" \
|
||||
-H "Content-Type: application/json" "$@"
|
||||
}
|
||||
|
||||
echo "[seed] confirming tenant is reachable via cf-proxy..."
|
||||
HEALTH=$(curl -sS "$BASE/health" || echo "")
|
||||
if [ -z "$HEALTH" ]; then
|
||||
echo "[seed] FAILED: $BASE/health unreachable. Did ./up.sh complete? Did you add"
|
||||
echo " 127.0.0.1 harness-tenant.localhost to /etc/hosts?"
|
||||
exit 1
|
||||
fi
|
||||
echo "[seed] $HEALTH"
|
||||
|
||||
echo "[seed] confirming /buildinfo returns the harness GIT_SHA..."
|
||||
BUILD=$(curl -sS "$BASE/buildinfo" || echo "")
|
||||
echo "[seed] $BUILD"
|
||||
|
||||
# Mint a fresh admin-call workspace ID for the parent. Platform's
|
||||
# /admin/workspaces/:id/test-token mints a per-workspace bearer; the
|
||||
# replay scripts use it to call the workspace-scoped routes.
|
||||
echo "[seed] creating workspace 'alpha' (parent)..."
|
||||
ALPHA_ID=$(uuidgen | tr '[:upper:]' '[:lower:]')
|
||||
curl_admin -X POST "$BASE/workspaces" \
|
||||
-d "{\"id\":\"$ALPHA_ID\",\"name\":\"alpha\",\"tier\":0,\"runtime\":\"langgraph\"}" \
|
||||
>/dev/null
|
||||
echo "[seed] alpha id=$ALPHA_ID"
|
||||
|
||||
echo "[seed] creating workspace 'beta' (child of alpha)..."
|
||||
BETA_ID=$(uuidgen | tr '[:upper:]' '[:lower:]')
|
||||
curl_admin -X POST "$BASE/workspaces" \
|
||||
-d "{\"id\":\"$BETA_ID\",\"name\":\"beta\",\"tier\":1,\"parent_id\":\"$ALPHA_ID\",\"runtime\":\"langgraph\"}" \
|
||||
>/dev/null
|
||||
echo "[seed] beta id=$BETA_ID"
|
||||
|
||||
# Stash IDs so replay scripts pick them up.
|
||||
{
|
||||
echo "ALPHA_ID=$ALPHA_ID"
|
||||
echo "BETA_ID=$BETA_ID"
|
||||
} > "$HERE/.seed.env"
|
||||
|
||||
echo ""
|
||||
echo "[seed] done. IDs persisted to tests/harness/.seed.env"
|
||||
echo "[seed] ALPHA_ID=$ALPHA_ID"
|
||||
echo "[seed] BETA_ID=$BETA_ID"
|
||||
39
tests/harness/up.sh
Executable file
39
tests/harness/up.sh
Executable file
@ -0,0 +1,39 @@
|
||||
#!/usr/bin/env bash
|
||||
# Bring the production-shape harness up.
|
||||
#
|
||||
# Usage: ./up.sh [--rebuild]
|
||||
#
|
||||
# Always operates in tests/harness/ regardless of where it's invoked
|
||||
# from — test scripts under tests/harness/replays/ source it via the
|
||||
# absolute path, so cd-ing first prevents compose-context surprises.
|
||||
|
||||
set -euo pipefail
|
||||
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
cd "$HERE"
|
||||
|
||||
REBUILD=false
|
||||
for arg in "$@"; do
|
||||
case "$arg" in
|
||||
--rebuild) REBUILD=true ;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [ "$REBUILD" = true ]; then
|
||||
docker compose -f compose.yml build --no-cache tenant cp-stub
|
||||
fi
|
||||
|
||||
echo "[harness] starting cp-stub + postgres + redis + tenant + cf-proxy ..."
|
||||
docker compose -f compose.yml up -d --wait
|
||||
|
||||
echo "[harness] /etc/hosts entry for harness-tenant.localhost..."
|
||||
if ! grep -q '^127\.0\.0\.1[[:space:]]\+harness-tenant\.localhost' /etc/hosts; then
|
||||
echo " (skip — your /etc/hosts may not resolve *.localhost. If tests fail with"
|
||||
echo " 'getaddrinfo' errors, add: 127.0.0.1 harness-tenant.localhost)"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "[harness] up. Tenant: http://harness-tenant.localhost:8080/health"
|
||||
echo " http://harness-tenant.localhost:8080/buildinfo"
|
||||
echo " cp-stub: http://localhost (internal-only via compose net)"
|
||||
echo ""
|
||||
echo "Next: ./seed.sh # mint admin token + register sample workspaces"
|
||||
Loading…
Reference in New Issue
Block a user