896afc5bd7
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 8s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 15s
E2E API Smoke Test / detect-changes (pull_request) Successful in 9s
E2E Chat / detect-changes (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 12s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 7s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m33s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 4s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 3s
CI / Platform (Go) (pull_request) Successful in 4m34s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m17s
CI / Canvas (Next.js) (pull_request) Successful in 5m43s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 4s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m21s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m20s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 10s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 4s
gate-check-v3 / gate-check (pull_request) Failing after 4s
qa-review / approved (pull_request) Failing after 4s
security-review / approved (pull_request) Failing after 3s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 5s
sop-checklist / review-refire (pull_request) Has been skipped
sop-tier-check / tier-check (pull_request) Successful in 4s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m14s
CI / Python Lint & Test (pull_request) Successful in 6m58s
CI / all-required (pull_request) Successful in 6m56s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 7s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 2s
E2E Chat / E2E Chat (pull_request) Successful in 3s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
audit-force-merge / audit (pull_request) Successful in 3s
The ECR registry triplet (`account.dkr.ecr.region.amazonaws.com`, currently `153263036946.dkr.ecr.us-east-2.amazonaws.com`) is hardcoded in every publish/verify workflow across 4+ repos, so switching AWS accounts or regions means touching every workflow.

Refactor each affected workflow's `env:` block to source the triplet from `vars.ECR_REGISTRY`, with the current prod-account literal as a bootstrap fallback. Once the org-level variable is set, the fallback becomes dead code and an account/region migration is a one-line change at the org level instead of N PRs. The pattern mirrors `vars.CP_URL || 'https://api.moleculesai.app'`, already in use in molecule-core's staging-verify.yml and redeploy-tenants-on-main.yml and proven to work on Gitea 1.22.6.

Constraints honored:
- No cross-repo `uses:` (blocked on 1.22.6 per feedback_gitea_cross_repo_uses_blocked).
- No new admin-required setup (the CTO can set the org-level var later without touching these workflows again).
- Zero functional change today (fallback literal == current hardcoded value), so the in-flight cascade (publish → ECR → redeploy-fleet) is unaffected.
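In GitHub Actions expression syntax (which the PR notes already works on Gitea 1.22.6 for `vars.CP_URL`), `a || b` returns `a` when it is truthy and `b` otherwise, and an unset variable reads as an empty string, which is falsy. A rough bash analog is `${VAR:-default}`. The sketch below illustrates that fallback and the `${IMAGE_NAME%%/*}` expansion the build steps use to recover the registry host; `ECR_REGISTRY_VAR`, `resolve_image`, and the second account/region are hypothetical placeholders, not names from the workflow.

```shell
#!/usr/bin/env bash
# Shell analog of the workflow expression
#   ${{ vars.ECR_REGISTRY || '153263036946.dkr.ecr.us-east-2.amazonaws.com' }}
# ECR_REGISTRY_VAR is a hypothetical stand-in for the org-level variable.
# ${VAR:-default} falls back when VAR is unset OR empty, mirroring how `||`
# treats an unset variable (empty string, hence falsy).
set -euo pipefail

DEFAULT_ECR_REGISTRY='153263036946.dkr.ecr.us-east-2.amazonaws.com'

# Build a full image name for a repository path under the resolved registry.
resolve_image() {
  local repo_path="$1"
  local registry="${ECR_REGISTRY_VAR:-$DEFAULT_ECR_REGISTRY}"
  printf '%s/%s\n' "$registry" "$repo_path"
}

# Today: variable not set, so the prod-account literal is used.
unset ECR_REGISTRY_VAR
IMAGE_NAME="$(resolve_image molecule-ai/platform)"
echo "$IMAGE_NAME"

# The build steps recover the registry host by deleting the longest suffix
# that starts with '/', as in ECR_REGISTRY="${IMAGE_NAME%%/*}".
echo "${IMAGE_NAME%%/*}"

# After a migration: set the variable once and every image name follows.
# 111122223333 / eu-west-1 are placeholder values for illustration only.
ECR_REGISTRY_VAR='111122223333.dkr.ecr.eu-west-1.amazonaws.com'
resolve_image molecule-ai/platform-tenant
```

Because an empty value also triggers the fallback, setting the org variable to an empty string would silently keep the prod literal. That matches the `||` behavior, but it is worth checking during a migration.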
371 lines
18 KiB
YAML
name: publish-workspace-server-image

# Gitea Actions port of .github/workflows/publish-workspace-server-image.yml.
#
# Ported 2026-05-10 (issue #228). Key differences from the GitHub version:
# - Gitea Actions reads .gitea/workflows/, not .github/workflows/
# - Dropped `environment:` declarations — Gitea Actions does not support
#   named environments (used by GitHub OIDC token gates)
# - Replaced `github.ref_name` (GitHub-only) with `${GITHUB_REF#refs/heads/}`
#   — Gitea Actions exposes GITHUB_REF in the same format as GitHub Actions
# - docker/setup-buildx-action and aws-actions/configure-aws-credentials are
#   GitHub Marketplace actions; they are installed by Gitea Actions runners and
#   work identically here
# - All other variables (GITHUB_SHA, GITHUB_REPOSITORY, GITHUB_OUTPUT,
#   secrets.*) use the same syntax as GitHub Actions
#
# Image tags produced:
#   :staging-<sha> — per-commit digest, stable for canary verify
#   :staging-latest — tracks most recent build on this branch
#
# Production auto-deploy:
#   After both platform and tenant images are pushed, deploy-production waits
#   for strict required push contexts on the same SHA to go green, then
#   calls the production CP redeploy-fleet endpoint with target_tag=
#   staging-<sha>. Set repo variable or secret PROD_AUTO_DEPLOY_DISABLED=true
#   to stop production rollout while keeping image publishing enabled.
#
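# ECR target (default; overridable via vars.ECR_REGISTRY, see env: below):
#   153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/*
# Required secrets: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AUTO_SYNC_TOKEN
#   (deploy-production additionally requires CP_ADMIN_API_TOKEN)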
#
# mc#711: Docker daemon not accessible on ubuntu-latest runner (molecule-canonical-1
# shows client-only in `docker info` — daemon not running). DinD mount is present but
# daemon doesn't respond. Fix: add diagnostic step showing socket info so ops can
# identify which runners have a live daemon. If no daemon is available, the job
# fails fast with actionable output rather than silent deep failure.

on:
  push:
    branches: [main]
  workflow_dispatch:

# No `concurrency:` block here. Gitea 1.22.6 can cancel queued runs despite
# `cancel-in-progress: false`; that is not acceptable for a workflow with a
# production deploy job. Per-SHA image tags are immutable, and staging-latest is
# best-effort last-writer-wins metadata.
#
# 2026-05-20 retrigger: run #86994 on mc#1589 merge sha 0f0f1ba2 failed at
# setup-buildx-action with EACCES on PC2 WSL publish runner — the runner's
# DOCKER_CONFIG=/home/hongming/.docker-ecr/ dir didn't have a buildx/certs
# subdir writable by the container's UID 1001. Hot-patched the dir perms;
# this chore push retriggers the workflow. Proper fix (per-runner
# DOCKER_CONFIG owned by 1001, internal#597 --env HOME=/home/runner pattern)
# is tracked as a CI-hygiene follow-up — not in scope here.

permissions:
  contents: read
  packages: write

env:
  # SSOT-Instance-10 (#333): ECR registry triplet (account.dkr.ecr.region.amazonaws.com)
  # sourced from org/repo var `ECR_REGISTRY` with the current prod-account literal as
  # bootstrap fallback. When the org var is set, the fallback becomes dead code and
  # switching accounts/regions is a one-line change at the org level (instead of
  # touching every workflow). Pattern mirrors `vars.CP_URL || 'literal'` already in
  # use in this repo's staging-verify.yml (and in deploy-production's CP_URL below).
  IMAGE_NAME: ${{ vars.ECR_REGISTRY || '153263036946.dkr.ecr.us-east-2.amazonaws.com' }}/molecule-ai/platform
  TENANT_IMAGE_NAME: ${{ vars.ECR_REGISTRY || '153263036946.dkr.ecr.us-east-2.amazonaws.com' }}/molecule-ai/platform-tenant

jobs:
  build-and-push:
    # Dedicated publish/release lane (internal#462 / #394 / #399). This
    # is a post-merge ship job (on: push:main) — it must NOT FIFO-compete
    # with PR required-CI on the shared pool (PR#1350's prod image build
    # was delayed ~25min this way). The `publish` label resolves ONLY to
    # the reserved molecule-runner-publish-* sub-pool (config.publish.yaml,
    # OUTSIDE the managed 1..20 range) so a merged fix's image build
    # starts immediately while PR-CI keeps the general pool.
    runs-on: publish
    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      # Health check: verify Docker daemon is accessible before attempting any
      # build steps. This fails loudly at step 1 when the runner's docker.sock
      # is inaccessible rather than silently continuing where `docker build`
      # fails deep in the process with a cryptic ECR auth error.
      - name: Verify Docker daemon access
        run: |
          set -euo pipefail
          echo "::group::Docker daemon health check"
          echo "Runner: ${HOSTNAME:-unknown}"
          docker_info="$(docker info 2>&1)" || {
            echo "::error::Docker daemon is not accessible at /var/run/docker.sock"
            echo "::error::Runner: ${HOSTNAME:-unknown}"
            printf '%s\n' "${docker_info}"
            echo "::error::Check: (1) daemon is running, (2) runner user is in docker group, (3) sock permissions are 660+"
            exit 1
          }
          printf '%s\n' "${docker_info}" | sed -n '1,5p'
          echo "Docker daemon OK"
          echo "::endgroup::"

      # Pre-clone manifest deps before docker build.
      #
      # Why: workspace-template-* repos on Gitea are private. The pre-fix
      # Dockerfile.tenant ran `git clone` inside an in-image stage with no
      # auth path — every CI build failed. We clone in the trusted CI
      # context where AUTO_SYNC_TOKEN is available and Dockerfile.tenant
      # just COPYs from .tenant-bundle-deps/.
      #
      # Token: AUTO_SYNC_TOKEN is the devops-engineer persona PAT.
      # clone-manifest.sh embeds it as basic-auth for the clones, then
      # strips .git dirs — the token never enters the image.
      - name: Pre-clone manifest deps
        env:
          MOLECULE_GITEA_TOKEN: ${{ secrets.AUTO_SYNC_TOKEN }}
        run: |
          set -euo pipefail
          mkdir -p .tenant-bundle-deps
          # Strip whole-line `//` (JSON5-style) comments before jq parsing — Integration Tester appends
          # `// Triggered by ...` which breaks `jq` in clone-manifest.sh.
          sed '/^[[:space:]]*\/\//d' manifest.json > .manifest-stripped.json
          bash scripts/clone-manifest.sh \
            .manifest-stripped.json \
            .tenant-bundle-deps/workspace-configs-templates \
            .tenant-bundle-deps/org-templates \
            .tenant-bundle-deps/plugins
          ws_count=$(find .tenant-bundle-deps/workspace-configs-templates -mindepth 1 -maxdepth 1 -type d | wc -l)
          org_count=$(find .tenant-bundle-deps/org-templates -mindepth 1 -maxdepth 1 -type d | wc -l)
          plugins_count=$(find .tenant-bundle-deps/plugins -mindepth 1 -maxdepth 1 -type d | wc -l)
          echo "Cloned: ws=$ws_count org=$org_count plugins=$plugins_count"

      - name: Compute tags
        id: tags
        run: |
          echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"

      # Build + push platform image (inline ECR auth — mirrors the operator-host
      # approach; credentials come from GITHUB_SECRET_AWS_ACCESS_KEY_ID /
      # GITHUB_SECRET_AWS_SECRET_ACCESS_KEY in Gitea Actions).
      # docker buildx bake / build required for `imagetools inspect` digest
      # capture in the CP pin-update step (RFC internal#229 §X step 4 PR-1).
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0

      - name: Build & push platform image to ECR (staging-<sha> + staging-latest)
        env:
          IMAGE_NAME: ${{ env.IMAGE_NAME }}
          TAG_SHA: staging-${{ steps.tags.outputs.sha }}
          TAG_LATEST: staging-latest
          GIT_SHA: ${{ github.sha }}
          REPO: ${{ github.repository }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: us-east-2
        run: |
          set -euo pipefail
          ECR_REGISTRY="${IMAGE_NAME%%/*}"
          aws ecr get-login-password --region us-east-2 | \
            docker login --username AWS --password-stdin "${ECR_REGISTRY}"
          # REPO (github.repository) is already "owner/name", so no extra org prefix.
          docker buildx build \
            --file ./workspace-server/Dockerfile \
            --build-arg GIT_SHA="${GIT_SHA}" \
            --label "org.opencontainers.image.source=https://git.moleculesai.app/${REPO}" \
            --label "org.opencontainers.image.revision=${GIT_SHA}" \
            --label "org.opencontainers.image.created=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            --label "molecule.workflow.run_id=${GITHUB_RUN_ID}" \
            --tag "${IMAGE_NAME}:${TAG_SHA}" \
            --tag "${IMAGE_NAME}:${TAG_LATEST}" \
            --push .

      # Build + push tenant image (Go platform + Next.js canvas in one image).
      - name: Build & push tenant image to ECR (staging-<sha> + staging-latest)
        env:
          TENANT_IMAGE_NAME: ${{ env.TENANT_IMAGE_NAME }}
          TAG_SHA: staging-${{ steps.tags.outputs.sha }}
          TAG_LATEST: staging-latest
          GIT_SHA: ${{ github.sha }}
          REPO: ${{ github.repository }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: us-east-2
        run: |
          set -euo pipefail
          ECR_REGISTRY="${TENANT_IMAGE_NAME%%/*}"
          aws ecr get-login-password --region us-east-2 | \
            docker login --username AWS --password-stdin "${ECR_REGISTRY}"
          # REPO (github.repository) is already "owner/name", so no extra org prefix.
          docker buildx build \
            --file ./workspace-server/Dockerfile.tenant \
            --build-arg NEXT_PUBLIC_PLATFORM_URL= \
            --build-arg GIT_SHA="${GIT_SHA}" \
            --label "org.opencontainers.image.source=https://git.moleculesai.app/${REPO}" \
            --label "org.opencontainers.image.revision=${GIT_SHA}" \
            --label "org.opencontainers.image.created=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            --label "molecule.workflow.run_id=${GITHUB_RUN_ID}" \
            --tag "${TENANT_IMAGE_NAME}:${TAG_SHA}" \
            --tag "${TENANT_IMAGE_NAME}:${TAG_LATEST}" \
            --push .

  # bp-exempt: production deploy side-effect; merge is gated by CI / all-required and this job waits for push CI before acting.
  deploy-production:
    name: Production auto-deploy
    needs: build-and-push
    if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
    # Publish/release lane (internal#462) — production deploy of a merged
    # fix; reserved capacity, never queued behind PR-CI.
    runs-on: publish
    timeout-minutes: 75
    env:
      CP_URL: ${{ vars.PROD_CP_URL || 'https://api.moleculesai.app' }}
      CP_ADMIN_API_TOKEN: ${{ secrets.CP_ADMIN_API_TOKEN }}
      GITEA_HOST: git.moleculesai.app
      GITEA_TOKEN: ${{ secrets.PROD_AUTO_DEPLOY_CONTROL_TOKEN || secrets.AUTO_SYNC_TOKEN }}
      PROD_AUTO_DEPLOY_DISABLED: ${{ vars.PROD_AUTO_DEPLOY_DISABLED || secrets.PROD_AUTO_DEPLOY_DISABLED || '' }}
      PROD_AUTO_DEPLOY_CANARY_SLUG: ${{ vars.PROD_AUTO_DEPLOY_CANARY_SLUG || 'hongming' }}
      PROD_AUTO_DEPLOY_SOAK_SECONDS: ${{ vars.PROD_AUTO_DEPLOY_SOAK_SECONDS || '60' }}
      PROD_AUTO_DEPLOY_BATCH_SIZE: ${{ vars.PROD_AUTO_DEPLOY_BATCH_SIZE || '3' }}
      PROD_AUTO_DEPLOY_DRY_RUN: ${{ vars.PROD_AUTO_DEPLOY_DRY_RUN || '' }}
      PROD_ALLOW_NON_PROD_CP_URL: ${{ vars.PROD_ALLOW_NON_PROD_CP_URL || '' }}
    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Build deploy plan
        id: plan
        run: |
          set -euo pipefail
          python3 .gitea/scripts/prod-auto-deploy.py plan > "$RUNNER_TEMP/prod-auto-deploy-plan.json"
          jq . "$RUNNER_TEMP/prod-auto-deploy-plan.json"
          enabled="$(jq -r '.enabled' "$RUNNER_TEMP/prod-auto-deploy-plan.json")"
          echo "enabled=$enabled" >> "$GITHUB_OUTPUT"
          if [ "$enabled" != "true" ]; then
            reason="$(jq -r '.disabled_reason' "$RUNNER_TEMP/prod-auto-deploy-plan.json")"
            echo "::notice::Production auto-deploy disabled: $reason"
            {
              echo "## Production auto-deploy skipped"
              echo ""
              echo "Reason: \`$reason\`"
            } >> "$GITHUB_STEP_SUMMARY"
            exit 0
          fi
          if [ -z "${CP_ADMIN_API_TOKEN:-}" ]; then
            echo "::error::CP_ADMIN_API_TOKEN secret is required for production auto-deploy."
            exit 1
          fi
          if [ -z "${GITEA_TOKEN:-}" ]; then
            echo "::error::PROD_AUTO_DEPLOY_CONTROL_TOKEN or AUTO_SYNC_TOKEN secret is required so production deploy can wait for green CI."
            exit 1
          fi

      - name: Self-test production deploy helper
        if: ${{ steps.plan.outputs.enabled == 'true' }}
        run: |
          set -euo pipefail
          python3 -m pip install --quiet 'pytest==9.0.2' 'PyYAML==6.0.2'
          python3 -m pytest .gitea/scripts/tests/test_prod_auto_deploy.py -q
          python3 .gitea/scripts/lint-workflow-yaml.py --workflow-dir .gitea/workflows

      - name: Wait for green main CI on this SHA
        if: ${{ steps.plan.outputs.enabled == 'true' }}
        run: |
          set -euo pipefail
          python3 .gitea/scripts/prod-auto-deploy.py wait-ci

      - name: Call production CP redeploy-fleet
        if: ${{ steps.plan.outputs.enabled == 'true' }}
        run: |
          set -euo pipefail
          python3 .gitea/scripts/prod-auto-deploy.py assert-enabled
          PLAN="$RUNNER_TEMP/prod-auto-deploy-plan.json"
          TARGET_TAG="$(jq -r '.target_tag' "$PLAN")"
          BODY="$(jq -c '.body' "$PLAN")"

          echo "POST $CP_URL/cp/admin/tenants/redeploy-fleet"
          echo " target_tag: $TARGET_TAG"
          echo " body: $BODY"

          HTTP_RESPONSE="$RUNNER_TEMP/prod-redeploy-response.json"
          HTTP_CODE_FILE="$RUNNER_TEMP/prod-redeploy-http-code.txt"
          set +e
          curl -sS -o "$HTTP_RESPONSE" -w '%{http_code}' \
            -m 1200 \
            -H "Authorization: Bearer $CP_ADMIN_API_TOKEN" \
            -H "Content-Type: application/json" \
            -X POST "$CP_URL/cp/admin/tenants/redeploy-fleet" \
            -d "$BODY" > "$HTTP_CODE_FILE"
          set -e

          HTTP_CODE="$(cat "$HTTP_CODE_FILE" 2>/dev/null || echo "000")"
          [ -z "$HTTP_CODE" ] && HTTP_CODE="000"
          echo "HTTP $HTTP_CODE"
          jq '{ok, result_count: (.results // [] | length)}' "$HTTP_RESPONSE" || true

          {
            echo "## Production auto-deploy"
            echo ""
            echo "**Commit:** \`${GITHUB_SHA:0:7}\`"
            echo "**Target tag:** \`$TARGET_TAG\`"
            echo "**HTTP:** $HTTP_CODE"
            echo ""
            echo "### Per-tenant result"
            echo ""
            echo "| Slug | Phase | SSM Status | Exit | Healthz | Error present |"
            echo "|------|-------|------------|------|---------|---------------|"
            jq -r '.results[]? | "| \(.slug) | \(.phase) | \(.ssm_status // "-") | \(.ssm_exit_code) | \(.healthz_ok) | \((.error // "") != "") |"' "$HTTP_RESPONSE" || true
          } >> "$GITHUB_STEP_SUMMARY"

          if [ "$HTTP_CODE" != "200" ]; then
            echo "::error::redeploy-fleet returned HTTP $HTTP_CODE"
            exit 1
          fi
          OK="$(jq -r '.ok' "$HTTP_RESPONSE")"
          if [ "$OK" != "true" ]; then
            echo "::error::redeploy-fleet reported ok=false; production rollout halted."
            exit 1
          fi

      - name: Verify reachable tenants report this SHA
        if: ${{ steps.plan.outputs.enabled == 'true' }}
        env:
          TENANT_DOMAIN: moleculesai.app
        run: |
          set -euo pipefail
          RESP="$RUNNER_TEMP/prod-redeploy-response.json"
          mapfile -t SLUGS < <(jq -r '.results[]? | .slug' "$RESP")
          if [ ${#SLUGS[@]} -eq 0 ]; then
            echo "::error::No tenants returned from redeploy-fleet; refusing to mark production deploy verified."
            exit 1
          fi

          STALE_COUNT=0
          UNREACHABLE_COUNT=0
          UNHEALTHY_COUNT=0
          for slug in "${SLUGS[@]}"; do
            healthz_ok="$(jq -r --arg slug "$slug" '.results[]? | select(.slug == $slug) | .healthz_ok' "$RESP" | tail -1)"
            if [ "$healthz_ok" != "true" ]; then
              echo "::error::$slug did not report healthz_ok=true in redeploy-fleet response."
              UNHEALTHY_COUNT=$((UNHEALTHY_COUNT + 1))
              continue
            fi
            url="https://${slug}.${TENANT_DOMAIN}/buildinfo"
            body="$(curl -sS --max-time 30 --retry 3 --retry-delay 5 --retry-connrefused "$url" || true)"
            actual="$(echo "$body" | jq -r '.git_sha // ""' 2>/dev/null || echo "")"
            if [ -z "$actual" ]; then
              echo "::error::$slug did not return /buildinfo after deploy."
              UNREACHABLE_COUNT=$((UNREACHABLE_COUNT + 1))
              continue
            fi
            if [ "$actual" != "$GITHUB_SHA" ]; then
              echo "::error::$slug is stale: actual=${actual:0:7}, expected=${GITHUB_SHA:0:7}"
              STALE_COUNT=$((STALE_COUNT + 1))
            else
              echo "$slug: ${actual:0:7}"
            fi
          done

          {
            echo ""
            echo "### Buildinfo verification"
            echo ""
            echo "Expected SHA: \`${GITHUB_SHA:0:7}\`"
            echo "Tenants checked: ${#SLUGS[@]}"
            echo "Stale tenants: $STALE_COUNT"
            echo "Unhealthy tenants: $UNHEALTHY_COUNT"
            echo "Unreachable tenants: $UNREACHABLE_COUNT"
          } >> "$GITHUB_STEP_SUMMARY"

          if [ "$STALE_COUNT" -gt 0 ] || [ "$UNHEALTHY_COUNT" -gt 0 ] || [ "$UNREACHABLE_COUNT" -gt 0 ]; then
            exit 1
          fi
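In the workflow file above, both build steps still pass `--region us-east-2` to `aws ecr get-login-password` and set `AWS_DEFAULT_REGION: us-east-2`. Pointing `vars.ECR_REGISTRY` at a registry in another region would therefore still request the login token from us-east-2, whereas AWS's documented login pattern pairs the `--region` value with the registry's own region. The following is a minimal sketch, not part of this PR, of deriving the region from the registry host. It assumes the standard `<account>.dkr.ecr.<region>.amazonaws.com` host shape, and the helper name `ecr_region_from_host` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): derive the AWS region from an ECR
# registry host of the form <account>.dkr.ecr.<region>.amazonaws.com.
set -euo pipefail

ecr_region_from_host() {
  local host="$1"
  local -a parts
  # Split on '.': [0]=account [1]=dkr [2]=ecr [3]=region [4]=amazonaws [5]=com
  IFS='.' read -r -a parts <<< "$host"
  if [ "${#parts[@]}" -lt 6 ] || [ "${parts[1]}" != "dkr" ] || [ "${parts[2]}" != "ecr" ]; then
    echo "not an ECR registry host: $host" >&2
    return 1
  fi
  printf '%s\n' "${parts[3]}"
}

# Same derivation the build steps already use for the registry host.
IMAGE_NAME='153263036946.dkr.ecr.us-east-2.amazonaws.com/molecule-ai/platform'
ECR_REGISTRY="${IMAGE_NAME%%/*}"
ECR_REGION="$(ecr_region_from_host "$ECR_REGISTRY")"
echo "$ECR_REGION"

# In the workflow, this value could replace the hardcoded region, e.g.:
#   aws ecr get-login-password --region "$ECR_REGION" | \
#     docker login --username AWS --password-stdin "$ECR_REGISTRY"
```

Failing loudly on a host that does not match the expected shape keeps a mistyped `ECR_REGISTRY` value from silently falling through to a default region.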