Adds the missing Upptime-format aggregator step that was lost in the Upptime → custom-probe migration (post-2026-05-06 GitHub suspension).

Changes:
- scripts/aggregate.py (NEW): Python script that reads history/<slug>.jsonl, computes rolling uptime% and response-time aggregates, writes history/<slug>.yml (latest status) and history/summary.json (day/week/month/year per-site aggregates)
- .github/workflows/uptime-probe.yml: adds "Aggregate probe results" step between probe run and commit; ensures .yml and summary.json are regenerated on every probe tick

Immediate effect: fixes false-positive "down" status on Canvas pricing and legal routes (stuck at 404 from 2026-04-19); refreshes all rolling uptime aggregates to reflect current probe data.

See: molecule-ai/molecule-ai-status#7

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
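
The rolling aggregation described in the commit message can be sketched roughly as follows. This is not the contents of scripts/aggregate.py; it is a minimal illustration under stated assumptions. The `success` and `latency_ms` fields appear in the workflow's run-summary step, but the `timestamp` field name, the window lengths, and the output keys `uptime` and `time` are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# Windows named in the commit message; the exact lengths are assumptions.
WINDOWS = {
    "day": timedelta(days=1),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def parse_jsonl(text):
    """Parse one JSON object per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def aggregate(records, now):
    """Return {window: {"uptime": percent, "time": mean latency in ms}}.

    Assumes each record carries `timestamp` (ISO 8601 with a UTC offset;
    hypothetical field name), `success` (bool) and `latency_ms` (number).
    """
    out = {}
    for name, span in WINDOWS.items():
        # Keep only probes that fall inside the rolling window.
        recent = [
            r for r in records
            if now - datetime.fromisoformat(r["timestamp"]) <= span
        ]
        if not recent:
            out[name] = {"uptime": None, "time": None}
            continue
        up = sum(1 for r in recent if r["success"])
        out[name] = {
            "uptime": round(100.0 * up / len(recent), 2),
            "time": round(sum(r["latency_ms"] for r in recent) / len(recent)),
        }
    return out


if __name__ == "__main__":
    sample = "\n".join([
        '{"timestamp": "2026-05-10T11:55:00+00:00", "success": true, "latency_ms": 100}',
        '{"timestamp": "2026-05-10T11:50:00+00:00", "success": false, "latency_ms": 300}',
    ])
    now = datetime(2026, 5, 10, 12, 0, tzinfo=timezone.utc)
    print(json.dumps(aggregate(parse_jsonl(sample), now), indent=2))
```

Counting every probe equally keeps the arithmetic simple; an aggregator that weights by elapsed time between probes would give different numbers when the cadence is uneven.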
129 lines
5.4 KiB
YAML
name: Uptime probe (Gitea-native — replaces upptime)
#
# Runs the molecule-ai-uptime-probe binary on a 5-minute cadence,
# appends per-site JSONL results to history/, and commits the changes
# back to main. Replaces the five upptime workflows that lived in this
# repo before they were moved to .github/workflows-disabled/ (because
# every upptime call to api.github.com 401s post-2026-05-06 GitHub
# org suspension).
#
# See molecule-ai/molecule-ai-status#2 for the design rationale +
# molecule-ai/molecule-ai-uptime-probe for the probe binary itself.
#
# Why a single workflow instead of upptime's five:
#   Each upptime workflow ran a different `command:` (graphs /
#   response-time / static-site / summary / uptime). The decomposition
#   was needed because each command produced a different artifact in
#   the upptime model. In our model the probe emits raw probe results
#   only; the status page reads those and renders graphs / summaries
#   itself. One concern per tool. One workflow.

on:
  schedule:
    # Every 5 minutes, matching the upptime default cadence.
    - cron: "*/5 * * * *"
  # Manual trigger for ad-hoc checks.
  workflow_dispatch:
  # Re-run when probe-list config changes so a new endpoint gets a
  # baseline immediately, not at the next /5 mark.
  push:
    branches: [main]
    paths: [".upptimerc.yml"]

permissions:
  contents: write  # required to commit history/ updates

jobs:
  probe:
    name: Probe + commit
    runs-on: ubuntu-latest
    # Concurrency: at most one probe run at a time per branch. Two
    # cron firings overlapping would race on history/ commits.
    concurrency:
      group: uptime-probe-${{ github.ref }}
      cancel-in-progress: false
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 1
          persist-credentials: true

      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.23'
          token: ${{ secrets.GITEA_TOKEN }}  # see molecule-ai/internal#75

      - name: Install probe
        # Build directly from the probe's repo at the ref in GOPROBE_REF.
        # The intent is to pin a ref and update it explicitly in this
        # workflow file when the probe itself ships a new
        # behaviour-changing version, which avoids supply-chain
        # ambiguity. NOTE: GOPROBE_REF is currently `main`, a moving
        # branch, so the build is not actually pinned. `git clone
        # --branch` accepts a branch or tag name, not a commit SHA.
        run: |
          set -euo pipefail
          GOPROBE_REPO=https://git.moleculesai.app/molecule-ai/molecule-ai-uptime-probe.git
          GOPROBE_REF=main
          tmp=$(mktemp -d)
          git clone --depth 1 --branch "$GOPROBE_REF" "$GOPROBE_REPO" "$tmp/probe"
          (cd "$tmp/probe" && go build -o /usr/local/bin/uptime-probe ./cmd/probe)
          /usr/local/bin/uptime-probe -h 2>&1 | head -5

      - name: Run probes
        # The probe exits 1 when any site fails, but a single failing
        # site must not abort the workflow before the commit step.
        # `|| true` swallows the non-zero exit; the failure shows up as
        # success=false in the JSONL history, where the status page
        # picks it up.
        run: |
          mkdir -p history
          /usr/local/bin/uptime-probe \
            -config .upptimerc.yml \
            -history-dir history \
            -timeout 30s \
            > /tmp/run.json || true
          echo "== run summary =="
          jq -r '.[] | "\(.name): \(.status_code) \(.latency_ms)ms success=\(.success)"' /tmp/run.json || cat /tmp/run.json

      - name: Aggregate probe results → Upptime format
        # Reads history/<slug>.jsonl files, computes rolling uptime/response-time
        # aggregates, and writes history/<slug>.yml + history/summary.json.
        # This fills the gap left by the Upptime → custom-probe migration:
        # the probe binary handles JSONL appends; this step handles the
        # aggregator outputs that the status page UI reads.
        # See molecule-ai/molecule-ai-status#7.
        run: |
          set -euo pipefail
          python3 scripts/aggregate.py --history-dir history

      - name: Commit history changes
        # Fails fast if Gitea is unhealthy rather than silently swallowing
        # the push. The next /5 cron firing picks up where this left off once
        # Gitea recovers. Also guarded against a concurrent-run race: the
        # job-level concurrency group (defined on the `probe` job above)
        # ensures at most one probe run per branch is in flight at any time.
        run: |
          set -euo pipefail

          # Health gate: fail fast if Gitea is 502 or otherwise unreachable.
          # The probe ran successfully; we just can't persist the results yet.
          GATEWAY="https://git.moleculesai.app"
          HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
            --max-time 10 "$GATEWAY/api/v1/version" || echo 000)
          if [ "$HTTP_CODE" != "200" ]; then
            echo "::error::Gitea unhealthy (HTTP $HTTP_CODE) — cannot push results."
            echo "::error::Probe data is in history/. Next successful push after Gitea"
            echo "::error::recovers will commit all buffered results."
            exit 1
          fi

          git config user.name "uptime-probe[bot]"
          git config user.email "uptime-probe@bots.moleculesai.app"
          git add history/
          if git diff --cached --quiet; then
            echo "no history changes to commit"
            exit 0
          fi
          git commit -m "chore(uptime): probe results $(date -u +%Y-%m-%dT%H:%M:%SZ)"
          git push origin HEAD:main