Post-deploy verification for staging tenant images. Runs against the canary fleet after each publish-workspace-server-image build — catches auto-update breakage (a la today's E2E current_task drift) before it propagates to the prod tenant fleet that auto-pulls :latest every 5 min. scripts/canary-smoke.sh iterates a space-sep list of canary base URLs (paired with their ADMIN_TOKENs) and checks: - /admin/liveness reachable with admin bearer (tenant boot OK) - /workspaces list responds (wsAuth + DB path OK) - /memories/commit + /memories/search round-trip (encryption + scrubber) - /events admin read (AdminAuth C4 path) - /admin/liveness without bearer returns 401 (C4 fail-closed regression) .github/workflows/canary-verify.yml runs after publish succeeds: - 6-min sleep (tenant auto-updater pulls every 5 min) - bash scripts/canary-smoke.sh with secrets pulled from repo settings - on failure: writes a Step Summary flagging that :latest should be rolled back to prior known-good digest Phase 3 follow-up will split the publish workflow so only :staging-<sha> ships initially, and canary-verify's green gate is what promotes :staging-<sha> → :latest. This commit lays the test gate alone so we have something running against tenants immediately. Secrets to set in GitHub repo settings before this workflow can run: - CANARY_TENANT_URLS (space-sep list) - CANARY_ADMIN_TOKENS (same order as URLs) - CANARY_CP_SHARED_SECRET (matches staging CP PROVISION_SHARED_SECRET) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
57 lines
2.1 KiB
YAML
57 lines
2.1 KiB
YAML
name: canary-verify
|
|
|
|
# Runs the canary smoke suite against the staging canary tenant fleet
|
|
# after a new workspace-server image lands on :latest. On failure,
|
|
# alerts via a GitHub Actions summary — follow-up PR will add:
|
|
# - :staging-<sha> intermediate tag published BY publish workflow
|
|
# - retag :staging-<sha> → :latest ONLY when this workflow is green
|
|
# - Telegram/Slack notifier on red
|
|
# For now this exists as the test gate itself so we can catch
|
|
# auto-update breakage before it reaches the prod tenant fleet
|
|
# (which auto-pulls :latest every 5 min).
|
|
|
|
on:
|
|
workflow_run:
|
|
workflows: ["publish-workspace-server-image"]
|
|
types: [completed]
|
|
workflow_dispatch:
|
|
|
|
permissions:
|
|
contents: read
|
|
actions: read
|
|
|
|
jobs:
|
|
canary-smoke:
|
|
# Skip when the upstream workflow failed — no image to test against.
|
|
if: ${{ github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch' }}
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- name: Checkout
|
|
uses: actions/checkout@v4
|
|
|
|
- name: Wait for canary tenants to pick up new image
|
|
# Tenant auto-updater runs every 5 min. Sleep 6 min to give every
|
|
# canary time to pull + restart. Cheaper than polling.
|
|
run: sleep 360
|
|
|
|
- name: Run canary smoke suite
|
|
env:
|
|
CANARY_TENANT_URLS: ${{ secrets.CANARY_TENANT_URLS }}
|
|
CANARY_ADMIN_TOKENS: ${{ secrets.CANARY_ADMIN_TOKENS }}
|
|
CANARY_CP_BASE_URL: https://staging-api.moleculesai.app
|
|
CANARY_CP_SHARED_SECRET: ${{ secrets.CANARY_CP_SHARED_SECRET }}
|
|
run: bash scripts/canary-smoke.sh
|
|
|
|
- name: Summary on failure
|
|
if: ${{ failure() }}
|
|
run: |
|
|
{
|
|
echo "## Canary smoke FAILED"
|
|
echo
|
|
echo "The staging canary tenant fleet failed its post-deploy smoke suite."
|
|
echo "The :latest tag on ghcr.io/molecule-ai/platform should be rolled back"
|
|
echo "to the prior known-good digest until this is resolved."
|
|
echo
|
|
echo "See job log above for the specific failed assertions."
|
|
} >> "$GITHUB_STEP_SUMMARY"
|