59d699b61c
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 4s
CI / Detect changes (pull_request) Successful in 7s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 24s
E2E API Smoke Test / detect-changes (pull_request) Successful in 14s
E2E Chat / detect-changes (pull_request) Successful in 11s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 6s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 12s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 9s
gate-check-v3 / gate-check (pull_request) Successful in 7s
qa-review / approved (pull_request) Failing after 7s
security-review / approved (pull_request) Failing after 6s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 5s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 5s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m3s
CI / Platform (Go) (pull_request) Successful in 5m45s
CI / Python Lint & Test (pull_request) Successful in 7m0s
CI / Canvas (Next.js) (pull_request) Successful in 7m34s
CI / all-required (pull_request) Successful in 7m14s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
E2E Chat / E2E Chat (pull_request) Successful in 6s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 2s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Adds a self-contained docker-compose harness in local-e2e/ that gates
RFC#600-class template changes BEFORE customer canary. Implements the 4
canonical canaries:
1. 2-turn name continuity — SessionStore key derivation
2. File-only message — no caption drop-to-empty-prompt regress
3. File + prompt (multimodal) — multimodal happy path
4. Cross-session memory — explicit memory tool, distinct context_ids
Architecture is deliberately lean per CTO "separate CI as possible":
local-e2e/
docker-compose.yml # runtime + cp_sim ONLY (no platform Go, no pg)
cp_sim/ # ~250 LoC Python A2A wire-shape emitter
cp_sim/canary/ # 4 canary scenarios + layer-isolation probes
scripts/run-canary.sh # one-shot orchestration (target <3 min)
scripts/onboard-template.sh # gitops helper for cascade
templates/session-continuity-e2e.yml # canonical workflow shim
Rationale for a Python tenant-CP simulator (not the real workspace-server):
SessionStore behaviour is fully owned by workspace/a2a_executor.py +
executor_helpers.py — the Go platform service doesn't touch session
continuity. Excising it gets the harness to <3 min cold-boot on
docker-host runners and keeps the surface small enough to debug fast.
The simulator emits the byte-identical JSON-RPC message/send envelope
that workspace-server POSTs (cross-checked against
tests/e2e/test_chat_attachments_e2e.sh and workspace/a2a_executor.py
:_core_execute).
Per feedback_no_single_source_of_truth: the harness IS the canonical
session-continuity validator across templates. Per-template unit tests
keep covering their own guard logic.
Per feedback_image_promote_is_not_user_live + feedback_verify_actual_
endstate_not_ack_follow_sop: every canary asserts at the running-
container layer; artifacts dump SessionStore state + runtime logs on
failure for post-mortem.
Rollout (deliberate sequencing, per task #342):
1. THIS PR — lands harness in molecule-core. NOT yet wired to any
template repo.
2. Companion PR in molecule-ai-workspace-template-hermes — adds
.gitea/workflows/session-continuity-e2e.yml. NOT required yet.
3. Bake on hermes for ≥5 business days.
4. Cascade to remaining 6 templates via onboard-template.sh.
5. Per-template BP flip — add "session-continuity-e2e (pull_request)"
to status_check_contexts on each repo, hermes first.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
106 lines
4.5 KiB
Bash
Executable File
106 lines
4.5 KiB
Bash
Executable File
#!/usr/bin/env bash
|
|
# run-canary.sh — one-shot orchestration for the local-e2e session-continuity
|
|
# canary harness. Used by both interactive local runs and the per-template
|
|
# .gitea/workflows/session-continuity-e2e.yml.
|
|
#
|
|
# Usage:
|
|
# TEMPLATE_IMAGE=ghcr.io/molecule-ai/workspace-template-hermes:latest \
|
|
# ./local-e2e/scripts/run-canary.sh
|
|
#
|
|
# Optional env:
|
|
# CANARY_RUN_ID — disambiguator for parallel CI runs (default: random)
|
|
# RUNTIME_PORT — host port for runtime :8000 (default: 18000)
|
|
# KEEP_RUNNING — set =1 to leave containers up for post-mortem
|
|
#
|
|
# Exit codes:
|
|
# 0 — all 4 canaries passed
|
|
# 1 — at least one canary failed (artifacts/ has the dump)
|
|
# 2 — harness infrastructure failure (image pull / compose / etc.)
|
|
#
|
|
# Cross-refs:
|
|
# feedback_image_promote_is_not_user_live — we verify at the running
|
|
# container layer, NOT at the pipeline-green layer.
|
|
# feedback_verify_actual_endstate_not_ack_follow_sop — every assert
|
|
# reads state back; no side-effect-ack claims success.
|
|
|
|
set -euo pipefail
|
|
|
|
: "${TEMPLATE_IMAGE:?TEMPLATE_IMAGE env required (the runtime image under test)}"
|
|
|
|
# ----------------------------------------------------------------- paths
|
|
HARNESS_ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
|
|
ARTIFACTS_DIR="$HARNESS_ROOT/artifacts"
|
|
mkdir -p "$ARTIFACTS_DIR"
|
|
|
|
export CANARY_RUN_ID="${CANARY_RUN_ID:-$(uuidgen 2>/dev/null | tr A-Z a-z | tr -d - | cut -c1-12 || date +%s)}"
|
|
export RUNTIME_PORT="${RUNTIME_PORT:-18000}"
|
|
export TEMPLATE_IMAGE
|
|
COMPOSE_PROJECT="canary-${CANARY_RUN_ID}"
|
|
COMPOSE_FILE="$HARNESS_ROOT/docker-compose.yml"
|
|
|
|
log() { printf "\n=== [%s] %s ===\n" "$(date +%H:%M:%S)" "$*"; }
|
|
|
|
# ----------------------------------------------------------- cleanup hook
|
|
cleanup() {
|
|
local rc=$?
|
|
if [ "${KEEP_RUNNING:-0}" = "1" ]; then
|
|
log "KEEP_RUNNING=1 — leaving containers up (project=$COMPOSE_PROJECT)"
|
|
return $rc
|
|
fi
|
|
log "Tearing down compose project $COMPOSE_PROJECT"
|
|
# On non-zero exit, capture logs FIRST. Per feedback_image_promote_is_
|
|
# not_user_live: dump state from the actually-running container, not
|
|
# an inferred pipeline state.
|
|
if [ $rc -ne 0 ]; then
|
|
log "Canary FAILED — dumping artifacts to $ARTIFACTS_DIR"
|
|
docker compose -p "$COMPOSE_PROJECT" -f "$COMPOSE_FILE" logs \
|
|
--no-color --tail=200 runtime \
|
|
> "$ARTIFACTS_DIR/runtime.log" 2>&1 || true
|
|
# SessionStore state probe — runtime exposes /admin/session-store
|
|
# in canary mode; if not present this 404s and the file is empty.
|
|
docker compose -p "$COMPOSE_PROJECT" -f "$COMPOSE_FILE" exec -T runtime \
|
|
sh -c 'ls -la /tmp/canary-memory 2>/dev/null; find /tmp -name "session*.json" -exec cat {} \; 2>/dev/null' \
|
|
> "$ARTIFACTS_DIR/session-store.txt" 2>&1 || true
|
|
fi
|
|
docker compose -p "$COMPOSE_PROJECT" -f "$COMPOSE_FILE" down --volumes --remove-orphans >/dev/null 2>&1 || true
|
|
return $rc
|
|
}
|
|
trap cleanup EXIT
|
|
|
|
# ------------------------------------------------------ stack bring-up
|
|
log "Building cp_sim image"
|
|
docker compose -p "$COMPOSE_PROJECT" -f "$COMPOSE_FILE" build cp_sim
|
|
|
|
log "Pulling runtime image: $TEMPLATE_IMAGE"
|
|
docker compose -p "$COMPOSE_PROJECT" -f "$COMPOSE_FILE" pull runtime 2>&1 \
|
|
| tail -5 || true
|
|
|
|
log "Starting runtime (host port $RUNTIME_PORT)"
|
|
docker compose -p "$COMPOSE_PROJECT" -f "$COMPOSE_FILE" up -d runtime
|
|
|
|
# Wait for healthcheck — docker-compose `--wait` is the canonical mechanism
|
|
# (introduced in v2.1.1 in 2021, available on every supported runner pool).
|
|
log "Waiting for runtime healthcheck"
|
|
if ! docker compose -p "$COMPOSE_PROJECT" -f "$COMPOSE_FILE" up -d --wait runtime; then
|
|
log "Runtime never went healthy — dumping logs"
|
|
docker compose -p "$COMPOSE_PROJECT" -f "$COMPOSE_FILE" logs --no-color --tail=200 runtime \
|
|
> "$ARTIFACTS_DIR/runtime-boot-failure.log" 2>&1 || true
|
|
exit 2
|
|
fi
|
|
|
|
# -------------------------------------------------------------- run tests
|
|
log "Running canary suite"
|
|
# Run cp_sim under the same compose project so DNS (runtime hostname)
|
|
# resolves on the molecule-core-net bridge. --rm cleans the driver container
|
|
# after pytest exits; volume bind mounts pytest's junit-xml back to host.
|
|
if docker compose -p "$COMPOSE_PROJECT" -f "$COMPOSE_FILE" --profile driver run \
|
|
--rm \
|
|
-v "$ARTIFACTS_DIR:/harness/artifacts" \
|
|
cp_sim; then
|
|
log "All canaries PASSED"
|
|
exit 0
|
|
else
|
|
log "At least one canary FAILED — see $ARTIFACTS_DIR/junit.xml"
|
|
exit 1
|
|
fi
|