molecule-core/tests/harness/replays/buildinfo-stale-image.sh
Hongming Wang 5cca462843 harness(phase-0): sudo-free Host-header path + chat_history + envelope replays
Three changes that bring the local harness from "covers what staging
covers minus the SaaS topology" to "exercises every surface we shipped
this session against the prod-shape Dockerfile.tenant image."

1. Drop the /etc/hosts requirement.

   Replays previously needed `127.0.0.1 harness-tenant.localhost` in
   /etc/hosts to resolve the cf-proxy. That gated the harness behind a
   sudo step on every fresh dev box and CI runner. The cf-proxy nginx
   already routes by Host header (matches production CF tunnel: URL is
   public, Host carries tenant identity), so the no-sudo path is to
   target loopback :8080 with `Host: harness-tenant.localhost` set as
   a header.

   New `tests/harness/_curl.sh` centralises this — curl_anon /
   curl_admin / curl_workspace / psql_exec wrappers all set the Host
   + auth headers automatically. seed.sh, peer-discovery-404.sh,
   buildinfo-stale-image.sh updated to source it. Legacy /etc/hosts
   users still work via env-var override.
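
A minimal sketch of the wrapper pattern, assuming variable names like
BASE and TENANT_HOST (the real _curl.sh may differ; the env-var
defaults are what keep the legacy /etc/hosts path working):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the _curl.sh Host-header pattern. BASE and
# TENANT_HOST are assumed names; override them via env vars to keep
# using an /etc/hosts mapping if you already have one.
BASE="${BASE:-http://127.0.0.1:8080}"
TENANT_HOST="${TENANT_HOST:-harness-tenant.localhost}"

curl_anon() {
  # Hit loopback directly; tenant identity rides the Host header, so
  # no /etc/hosts edit (and therefore no sudo) is needed.
  curl -fsS -H "Host: $TENANT_HOST" "$@"
}
```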

2. Fix the seed.sh FK regression that blocked DB-side replays.

   POST /workspaces ignores any `id` in the request body and generates
   one server-side. seed.sh was minting client-side UUIDs that never
   reached the workspaces table, so any replay that INSERTed into
   activity_logs (FK-constrained on workspace_id) failed with the
   workspace-not-found error. Capture the returned id from the
   response instead.
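
A minimal sketch of the corrected pattern (the variable names and the
literal response are stand-ins; the real seed.sh parses the live
POST /workspaces response from curl):

```shell
# Stand-in for the JSON the server returns from POST /workspaces.
RESP='{"id":"0b1c2d3e-4f50-6172-8394-a5b6c7d8e9f0","name":"demo"}'

# Use the server-generated id. A client-minted UUID never lands in the
# workspaces table, so later INSERTs into activity_logs would trip the
# workspace_id FK with workspace-not-found.
WORKSPACE_ID=$(echo "$RESP" | jq -r '.id')
echo "$WORKSPACE_ID"
```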

3. Two new replays cover the surfaces shipped this session.

   chat-history.sh — exercises the full SaaS-shape wire that PR #2472
   (peer_id filter), #2474 (chat_history client tool), and #2476
   (before_ts paging) ride on. 8 phases / 16 assertions: peer_id filter,
   limit cap, before_ts paging, OR-clause covering both source_id and
   target_id, malformed peer_id 400, malformed before_ts 400, URL-encoded
   SQLi-shape rejection. Verified PASS against the live harness.
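
   The 400-shape assertions can be sketched with a small helper in the
   replay style (the function name and messages are illustrative, not
   taken from the repo):

```shell
# Hypothetical assertion helper in the style of the replay scripts.
assert_status() {  # usage: assert_status <got> <want> <label>
  if [ "$1" != "$2" ]; then
    echo "[replay] FAIL: $3: expected HTTP $2, got $1"
    return 1
  fi
  echo "[replay] OK: $3"
}

# e.g. after: STATUS=$(curl_workspace -o /dev/null -w '%{http_code}' ...)
assert_status 400 400 "malformed peer_id rejected with 400"
```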

   channel-envelope-trust-boundary.sh — exercises PR #2471 + #2481 by
   importing from `molecule_runtime.*` (the wheel-rewritten path), so
   it catches the "wheel build dropped a fix but unit tests still
   pass" failure mode.
   5 phases / 11 assertions: malicious peer_id scrubbed from envelope,
   agent_card_url omitted on validation failure, XML-injection bytes
   scrubbed, valid UUID preserved, _agent_card_url_for direct gate.
   Verified PASS against published wheel 0.1.79.

run-all-replays.sh auto-discovers — no registration needed. Full
lifecycle (boot → seed → 4 replays → teardown) runs clean.
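
The no-registration discovery can be sketched as a glob loop (the
directory layout and the executable-bit skip rule are assumptions
about run-all-replays.sh, demonstrated here against a temp dir):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed convention: every executable *.sh under replays/ is a replay.
REPLAY_DIR="$(mktemp -d)"
printf '#!/bin/sh\necho demo-pass\n' > "$REPLAY_DIR/demo.sh"
chmod +x "$REPLAY_DIR/demo.sh"

discovered=0
for replay in "$REPLAY_DIR"/*.sh; do
  [ -x "$replay" ] || continue   # non-executable helpers are skipped
  discovered=$((discovered + 1))
done
echo "discovered=$discovered"
```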

Roadmap section updated to reflect Phase 1 (this PR) → Phase 2
(multi-tenant + CI gate) → Phase 3 (real CP) → Phase 4 (Miniflare +
LocalStack + traffic replay).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:12:49 -07:00

#!/usr/bin/env bash
# Replay for issue #2395 — local proof that the /buildinfo verify gate
# closes the SaaS deploy-chain blindness.
#
# Prior behavior: redeploy-fleet returned ssm_status=Success based on
# the SSM RPC return code alone. EC2 tenants kept serving the cached
# :latest digest because `docker compose up -d` is a no-op when the
# tag hasn't been invalidated. ssm_status=Success was lying.
#
# This replay simulates that condition locally:
# 1. Boot the harness with GIT_SHA=fix-applied.
# 2. Curl /buildinfo and assert it returns "fix-applied" (the new code
#    actually shipped).
# 3. Negative test: curl with a different EXPECTED_SHA and assert the
#    mismatch detection logic the workflow uses returns failure.
#
# This proves the verify-step's jq lookup + comparison logic works
# against the SAME Dockerfile.tenant production builds. If the
# /buildinfo route ever stops being wired through, this replay
# catches it before it reaches a production tenant.
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
HARNESS_ROOT="$(dirname "$HERE")"
# shellcheck source=../_curl.sh
source "$HARNESS_ROOT/_curl.sh"
# 1. Confirm /buildinfo wire shape — same shape the workflow's jq lookup expects.
echo "[replay] curl $BASE/buildinfo ..."
BUILD_JSON=$(curl_anon "$BASE/buildinfo")
echo "[replay] $BUILD_JSON"
ACTUAL_SHA=$(echo "$BUILD_JSON" | jq -r '.git_sha // ""')
if [ -z "$ACTUAL_SHA" ]; then
  echo "[replay] FAIL: /buildinfo response missing git_sha field — workflow's jq lookup would null"
  exit 1
fi
echo "[replay] git_sha=$ACTUAL_SHA"
# 2. Assert the harness build threaded GIT_SHA through. If we got "dev",
#    the Dockerfile arg / ldflags wiring is broken — same regression
#    class that made #2395 invisible until production.
EXPECTED_FROM_HARNESS="${HARNESS_GIT_SHA:-harness}"
if [ "$ACTUAL_SHA" = "dev" ]; then
  echo "[replay] FAIL: /buildinfo returned 'dev' — Dockerfile.tenant ARG GIT_SHA isn't reaching the binary"
  echo "[replay] This regresses #2395 by silencing the deploy-verify gate."
  exit 1
fi
if [ "$ACTUAL_SHA" != "$EXPECTED_FROM_HARNESS" ]; then
  echo "[replay] WARN: /buildinfo returned '$ACTUAL_SHA' but harness was built with GIT_SHA='$EXPECTED_FROM_HARNESS'"
  echo "[replay] Image may be cached from a previous run. Run ./up.sh --rebuild to force a fresh build."
fi
# 3. Negative test — replay the workflow's mismatch detection by
#    comparing the actual SHA to a deliberately-wrong expected SHA.
WRONG_EXPECTED="0000000000000000000000000000000000000000"
if [ "$ACTUAL_SHA" = "$WRONG_EXPECTED" ]; then
  echo "[replay] FAIL: /buildinfo returned all-zero SHA — wiring inverted"
  exit 1
fi
# 4. Replay the workflow's exact comparison logic so a regression in
#    the verify step's bash gets caught here.
MISMATCH_DETECTED=0
if [ "$ACTUAL_SHA" != "$WRONG_EXPECTED" ]; then
  MISMATCH_DETECTED=1
fi
if [ "$MISMATCH_DETECTED" != "1" ]; then
  echo "[replay] FAIL: workflow comparison logic would not flag a real mismatch"
  exit 1
fi
echo ""
echo "[replay] PASS: /buildinfo wire shape, GIT_SHA injection, and mismatch detection all work in"
echo " production-shape topology. The redeploy-fleet verify-step covers what it claims to."