forked from molecule-ai/molecule-core
The auto-promote-staging.yml gate-check (line 99) treats "workflow didn't run" as failure. Path-filtered triggers on E2E API Smoke Test and E2E Staging Canvas meant a platform-only or test-only push to staging (say, the prior PR #2201, which only touched tests/e2e/test_staging_full_saas.sh) never triggered the canvas workflow, and auto-promote saw `missing/none`, marked all_green=false, and aborted. Same class for any push that doesn't touch the gate's watched paths. Dead-lock by design, never noticed because the gate was new.

Fix per Design B (always-run + fast-skip):

- Drop `paths:` from the push/pull_request triggers on both gate workflows. The workflow now always fires on every staging+main push/PR.
- Add a `detect-changes` job using `dorny/paths-filter@v3` that decides whether to do real work, scoped to the same paths the trigger filter used to watch.
- Real work job (e2e-api / playwright) gates on `needs: detect-changes; if: needs.detect-changes.outputs.X == 'true'`.
- Add a sibling `no-op` job that runs when the filter output is false, emitting `::notice::… no-op pass`. The workflow run's conclusion is `success` either way, so auto-promote sees green and proceeds.

Manual `workflow_dispatch` and the weekly canvas `schedule` short-circuit detect-changes to always-run: those triggers exist precisely to exercise the suite and shouldn't be silently no-op'd.

Why this approach over making auto-promote-staging smarter:

The alternative (Design A, considered + rejected) was to teach auto-promote-staging to read each gate's `paths:` filter and treat "no run because filter excluded the commit" as conditional pass. That couples auto-promote to other workflows' YAML schema and breaks silently if a gate is renamed or its filter changes. Design B keeps the auto-promote contract simple ("each gate emits success") and makes each gate self-describing: adding a new gate doesn't require touching auto-promote.

Cost: ~10-30s of runner overhead per gate per push for the no-op when paths don't match. Negligible vs the alternative of dead-locked auto-promote chains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
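
The always-run + fast-skip decision described in the commit message can be sketched as a standalone shell function. This is an illustrative sketch only, not the code in either workflow: the function name `decide_run` and its two positional arguments are hypothetical, and unlike the workflow step it normalizes any non-`true` filter output to `false`. It assumes the event name and the paths-filter output are passed in as plain strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository) that mirrors the
# detect-changes decision: manual and scheduled runs always do real work,
# every other event defers to the paths-filter result.
decide_run() {
  local event_name="$1"     # e.g. push, pull_request, workflow_dispatch
  local filter_output="$2"  # "true" when watched paths changed

  case "$event_name" in
    workflow_dispatch|schedule)
      # These triggers exist to exercise the suite, so never no-op them.
      echo "true"
      ;;
    *)
      # Assumption: anything other than the literal "true" means "skip".
      if [ "$filter_output" = "true" ]; then
        echo "true"
      else
        echo "false"
      fi
      ;;
  esac
}

decide_run workflow_dispatch false   # prints: true
decide_run push false                # prints: false
```

In both cases the workflow itself still concludes with `success`; the value only selects whether the real job or the no-op job runs.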
167 lines
6.3 KiB
YAML
name: E2E API Smoke Test
# Extracted from ci.yml so workflow-level concurrency can protect this job
# from run-level cancellation (issue #458).
#
# Trigger model (changed 2026-04-28; see auto-promote gap below):
#
# This workflow always FIRES on push/pull_request to staging+main, but
# only does real work when paths under `workspace-server/`,
# `tests/e2e/`, or this workflow file changed. The detect-changes job
# uses dorny/paths-filter to decide; the e2e-api job runs only if
# changes match. Otherwise the no-op job emits success so the workflow
# always produces a `completed/success` run record.
#
# Why: auto-promote-staging.yml's gate-check (line 99) treats "workflow
# didn't run" as failure, which dead-locked any platform-only or
# test-only push to staging that didn't touch workspace-server paths.
# Dropping the path filter on the trigger and gating real work
# internally guarantees the workflow always emits a result that the
# auto-promote chain can read. Same pattern applied to
# e2e-staging-canvas.yml in the same PR.

on:
  push:
    branches: [main, staging]
  pull_request:
    branches: [main, staging]
  workflow_dispatch:

concurrency:
  group: e2e-api-${{ github.ref }}
  cancel-in-progress: false

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.decide.outputs.api }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'workspace-server/**'
              - 'tests/e2e/**'
              - '.github/workflows/e2e-api.yml'
      - id: decide
        # Always run real work for manual dispatch: no diff context to
        # filter against and ops dispatching this expects the suite to
        # actually exercise the platform.
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            echo "api=true" >> "$GITHUB_OUTPUT"
          else
            echo "api=${{ steps.filter.outputs.api }}" >> "$GITHUB_OUTPUT"
          fi

  no-op:
    needs: detect-changes
    if: needs.detect-changes.outputs.api != 'true'
    runs-on: ubuntu-latest
    steps:
      - run: |
          echo "No workspace-server / tests/e2e / workflow changes; E2E API gate satisfied without running tests."
          echo "::notice::E2E API Smoke Test no-op pass (paths filter excluded this commit)."

  e2e-api:
    needs: detect-changes
    if: needs.detect-changes.outputs.api == 'true'
    name: E2E API Smoke Test
    runs-on: ubuntu-latest
    timeout-minutes: 15
    env:
      DATABASE_URL: postgres://dev:dev@localhost:15432/molecule?sslmode=disable
      REDIS_URL: redis://localhost:16379
      PORT: "8080"
      PG_CONTAINER: molecule-ci-postgres
      REDIS_CONTAINER: molecule-ci-redis
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: 'stable'
          cache: true
          cache-dependency-path: workspace-server/go.sum
      - name: Start Postgres (docker)
        run: |
          docker rm -f "$PG_CONTAINER" 2>/dev/null || true
          docker run -d --name "$PG_CONTAINER" -e POSTGRES_USER=dev -e POSTGRES_PASSWORD=dev -e POSTGRES_DB=molecule -p 15432:5432 postgres:16
          for i in $(seq 1 30); do
            if docker exec "$PG_CONTAINER" pg_isready -U dev >/dev/null 2>&1; then
              echo "Postgres ready after ${i}s"
              exit 0
            fi
            sleep 1
          done
          echo "::error::Postgres did not become ready in 30s"
          docker logs "$PG_CONTAINER" || true
          exit 1
      - name: Start Redis (docker)
        run: |
          docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true
          docker run -d --name "$REDIS_CONTAINER" -p 16379:6379 redis:7
          for i in $(seq 1 15); do
            if docker exec "$REDIS_CONTAINER" redis-cli ping 2>/dev/null | grep -q PONG; then
              echo "Redis ready after ${i}s"
              exit 0
            fi
            sleep 1
          done
          echo "::error::Redis did not become ready in 15s"
          docker logs "$REDIS_CONTAINER" || true
          exit 1
      - name: Build platform
        working-directory: workspace-server
        run: go build -o platform-server ./cmd/server
      - name: Start platform (background)
        working-directory: workspace-server
        run: |
          ./platform-server > platform.log 2>&1 &
          echo $! > platform.pid
      - name: Wait for /health
        run: |
          for i in $(seq 1 30); do
            if curl -sf http://localhost:8080/health > /dev/null; then
              echo "Platform up after ${i}s"
              exit 0
            fi
            sleep 1
          done
          echo "::error::Platform did not become healthy in 30s"
          cat workspace-server/platform.log || true
          exit 1
      - name: Assert migrations applied
        run: |
          tables=$(docker exec "$PG_CONTAINER" psql -U dev -d molecule -tAc "SELECT count(*) FROM information_schema.tables WHERE table_schema='public' AND table_name='workspaces'")
          if [ "$tables" != "1" ]; then
            echo "::error::Migrations did not apply"
            cat workspace-server/platform.log || true
            exit 1
          fi
          echo "Migrations OK"
      - name: Run E2E API tests
        run: bash tests/e2e/test_api.sh
      - name: Run notify-with-attachments E2E
        run: bash tests/e2e/test_notify_attachments_e2e.sh
      - name: Run priority-runtimes E2E (claude-code + hermes; skips when keys absent)
        # Validates the test script itself runs cleanly even with no LLM
        # keys (both phases skip gracefully). The wire-real coverage with
        # actual keys runs in canary-staging.yml + e2e-staging-saas.yml.
        run: bash tests/e2e/test_priority_runtimes_e2e.sh
      - name: Dump platform log on failure
        if: failure()
        run: cat workspace-server/platform.log || true
      - name: Stop platform
        if: always()
        run: |
          if [ -f workspace-server/platform.pid ]; then
            kill "$(cat workspace-server/platform.pid)" 2>/dev/null || true
          fi
      - name: Stop service containers
        if: always()
        run: |
          docker rm -f "$PG_CONTAINER" 2>/dev/null || true
          docker rm -f "$REDIS_CONTAINER" 2>/dev/null || true