feat(ci): replace upptime with Gitea-native uptime probe (closes #2)
Combined transition PR: does the full upptime → Gitea-native cron swap in one place, instead of two separate PRs that would land in an interleaved state.

Why upptime had to go
- All 5 upptime workflows call api.github.com for releases lookup, issue management, and result commits.
- Since the 2026-05-06 GitHub org suspension, no token in our org authenticates against api.github.com, so every scheduled run fails with HTTP 401 "Bad credentials". Run #70 is the most recent example; the failure has been continuous since the suspension.

What this PR does
- Moves all 5 upptime workflows from .github/workflows/ to .github/workflows-disabled/. Gitea Actions does not scan that directory, so they stop scheduling immediately on merge.
- Adds .github/workflows-disabled/README.md explaining the move and linking both #2 and the replacement.
- Adds a single new .github/workflows/uptime-probe.yml that runs the new Gitea-native probe (https://git.moleculesai.app/molecule-ai/molecule-ai-uptime-probe) on a 5-minute cadence and commits per-site JSONL history to history/.

Why a single new workflow vs the upptime decomposition
- Each upptime workflow ran a different `command:` argument (graphs / response-time / static-site / summary / uptime). The decomposition existed because each command produced a different artifact in upptime's model.
- Our model: the probe emits raw probe results only. The status page (Vercel, separate PR) reads those JSONL files and renders graphs/summaries itself. One concern per tool, one workflow.

History migration: out of scope. Existing history/ JSON files (one per site) stay untouched; the new probe writes a new history/<slug>.jsonl alongside. Whether to back-fill or archive the old format is a separate decision tracked in the issue body.

Status page rebuild: out of scope. The Vercel app reading JSONL is a follow-up; first we want to see real probe data flowing for ~24h.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
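As a rough illustration of the "probe emits raw results, consumer derives the summary" split described above, here is a minimal sketch of how a reader of history/<slug>.jsonl might compute an uptime percentage. The sample records and the temporary file are hypothetical; the field names are borrowed from the workflow's jq summary (name, status_code, latency_ms, success), and the real JSONL schema may differ. The `grep` match assumes compact JSON with no space after the colon; a real consumer would more likely parse each line with jq.

```shell
set -eu

# Hypothetical sample history file (one JSON object per line).
history_file="$(mktemp)"
cat > "$history_file" <<'EOF'
{"name":"example","status_code":200,"latency_ms":120,"success":true}
{"name":"example","status_code":503,"latency_ms":950,"success":false}
{"name":"example","status_code":200,"latency_ms":110,"success":true}
{"name":"example","status_code":200,"latency_ms":130,"success":true}
EOF

# JSONL has one record per line, so the line count is the probe count.
total=$(grep -c '' "$history_file")

# grep -c exits 1 when the count is 0, so tolerate that under `set -e`.
ok=$(grep -c '"success":true' "$history_file" || true)

# Integer percentage, rounded down.
uptime_pct=$(( ok * 100 / total ))
echo "uptime: ${ok}/${total} probes succeeded (${uptime_pct}%)"
```

Because the history files only ever gain lines, a consumer along these lines can recompute any summary it needs without the probe having to know about graphs or aggregates.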
parent 25d0896c6b
commit 08066d3d67
22
.github/workflows-disabled/README.md
vendored
Normal file
@@ -0,0 +1,22 @@
# Disabled upptime workflows

These five workflows (`graphs.yml`, `response-time.yml`,
`static-site.yml`, `summary.yml`, `uptime.yml`) are upptime-driven
and call `api.github.com` for releases lookup, issue management, and
result commits.

Post the 2026-05-06 GitHub org suspension, no token in our org
authenticates against api.github.com, so every scheduled run failed
with HTTP 401 "Bad credentials". See `molecule-ai-status#2` for full
diagnosis + the replacement plan.

Workflows here will not be re-enabled; they're moved to
`workflows-disabled/` so the failed-run noise stops while the
replacement (Gitea-native uptime probe at
`molecule-ai/molecule-ai-uptime-probe`) is built. The new probe runs
under `.github/workflows/uptime-probe.yml`.

Delete this directory after the replacement has run for ~7 days
clean and the existing history is either migrated or marked archived.

Tracked: molecule-ai-status#2
101
.github/workflows/uptime-probe.yml
vendored
Normal file
@@ -0,0 +1,101 @@
name: Uptime probe (Gitea-native — replaces upptime)
#
# Runs the molecule-ai-uptime-probe binary on a 5-minute cadence,
# appends per-site JSONL results to history/, and commits the changes
# back to main. Replaces the five upptime workflows that lived in this
# repo before they were moved to .github/workflows-disabled/ (because
# every upptime call to api.github.com 401s post-2026-05-06 GitHub
# org suspension).
#
# See molecule-ai/molecule-ai-status#2 for the design rationale +
# molecule-ai/molecule-ai-uptime-probe for the probe binary itself.
#
# Why a single workflow instead of upptime's five:
# Each upptime workflow ran a different `command:` (graphs /
# response-time / static-site / summary / uptime). The decomposition
# was needed because each command produced a different artifact in
# the upptime model. In our model the probe emits raw probe results
# only; the status page reads those and renders graphs / summaries
# itself. One concern per tool. One workflow.

on:
  schedule:
    # Every 5 minutes, matching the upptime default cadence.
    - cron: "*/5 * * * *"
  # Manual trigger for ad-hoc checks.
  workflow_dispatch:
  # Re-run when probe-list config changes so a new endpoint gets a
  # baseline immediately, not at the next /5 mark.
  push:
    branches: [main]
    paths: [".upptimerc.yml"]

permissions:
  contents: write  # required to commit history/ updates

jobs:
  probe:
    name: Probe + commit
    runs-on: ubuntu-latest
    # Concurrency: at most one probe run at a time per branch. Two
    # cron firings overlapping would race on history/ commits.
    concurrency:
      group: uptime-probe-${{ github.ref }}
      cancel-in-progress: false
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 1
          persist-credentials: true

      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.23'
          token: ${{ secrets.GITEA_TOKEN }}  # see molecule-ai/internal#75

      - name: Install probe
        # Build directly from the probe's repo at GOPROBE_REF. Note that
        # GOPROBE_REF is currently the `main` branch, so the build is not
        # yet pinned; set it to a tag to pin it (`git clone --branch`
        # accepts a branch or tag name, not a bare commit SHA). Update
        # the ref explicitly in this workflow file when the probe ships
        # a new behaviour-changing version. Pinning avoids supply-chain
        # ambiguity.
        run: |
          set -euo pipefail
          GOPROBE_REPO=https://git.moleculesai.app/molecule-ai/molecule-ai-uptime-probe.git
          GOPROBE_REF=main
          tmp=$(mktemp -d)
          git clone --depth 1 --branch "$GOPROBE_REF" "$GOPROBE_REPO" "$tmp/probe"
          (cd "$tmp/probe" && go build -o /usr/local/bin/uptime-probe ./cmd/probe)
          /usr/local/bin/uptime-probe -h 2>&1 | head -5

      - name: Run probes
        # The probe exits 1 when any site fails, but we don't want a
        # single failing site to abort the workflow before the commit
        # step. `|| true` swallows the non-zero exit; the failure shows
        # up as success=false in the JSONL history, where the status
        # page picks it up.
        run: |
          mkdir -p history
          /usr/local/bin/uptime-probe \
            -config .upptimerc.yml \
            -history-dir history \
            -timeout 30s \
            > /tmp/run.json || true
          echo "== run summary =="
          jq -r '.[] | "\(.name): \(.status_code) \(.latency_ms)ms success=\(.success)"' /tmp/run.json || cat /tmp/run.json

      - name: Commit history changes (best-effort)
        # Best-effort: a transient git push race shouldn't block the
        # next probe run. The next /5 firing will commit again.
        run: |
          set +e
          git config user.name "uptime-probe[bot]"
          git config user.email "uptime-probe@bots.moleculesai.app"
          git add history/
          if git diff --cached --quiet; then
            echo "no history changes to commit"
            exit 0
          fi
          git commit -m "chore(uptime): probe results $(date -u +%Y-%m-%dT%H:%M:%SZ)"
          git push origin HEAD:main || echo "push failed; next run will retry"
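The last two steps of the workflow combine two shell patterns: tolerating a non-zero exit from the probe, and committing only when something was actually staged. The sketch below tries both in a throwaway repository. Everything in it is illustrative: `fake_probe`, `commit_if_changed`, the bot identity, and the sample record are hypothetical stand-ins, not part of the probe or the workflow. It also shows one possible variation, recording the exit status in a variable rather than discarding it with `|| true`, which may help if a later step ever needs to know that a site was down.

```shell
set -eu

# Throwaway repository so the sketch is safe to run anywhere.
workdir="$(mktemp -d)"
cd "$workdir"
git init -q .
git config user.name "example-bot"
git config user.email "example-bot@example.invalid"
mkdir -p history

# Hypothetical stand-in for the probe: appends one result line, then
# returns 1 to signal that a site was down.
fake_probe() {
  echo '{"name":"example","success":false}' >> history/example.jsonl
  return 1
}

# Keep the exit status instead of swallowing it; the script continues
# either way because the call sits on the left of `||`.
probe_rc=0
fake_probe || probe_rc=$?
echo "probe exit status: $probe_rc"

commit_if_changed() {
  git add history/
  # `git diff --cached --quiet` exits 0 when nothing is staged.
  if git diff --cached --quiet; then
    echo "no history changes to commit"
    return 0
  fi
  git commit -q -m "chore(uptime): probe results"
}

commit_if_changed   # first call: commits the new result line
commit_if_changed   # second call: nothing staged, so no new commit

commit_count=$(git rev-list --count HEAD)
echo "commits: $commit_count"
```

The second `commit_if_changed` call is the case the workflow hits when a run produces no new history lines: the step ends cleanly without creating an empty commit.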