Adds the missing Upptime-format aggregator step that was lost in the
Upptime → custom-probe migration (post-2026-05-06 GitHub suspension).
Changes:
- scripts/aggregate.py (NEW): Python script that reads history/<slug>.jsonl,
computes rolling uptime% and response-time aggregates, writes
history/<slug>.yml (latest status) and history/summary.json
(day/week/month/year per-site aggregates)
- .github/workflows/uptime-probe.yml: adds "Aggregate probe results"
step between probe run and commit; ensures .yml and summary.json
are regenerated on every probe tick
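For reviewers, the aggregation can be sketched roughly as below. This is an illustrative sketch, not the contents of scripts/aggregate.py: the record fields (`ts`, `up`, `ms`), the window lengths in days, and the `aggregate` function name are all assumptions.

```python
import json
from datetime import datetime, timedelta

# Window names come from the commit message; the day counts are assumed.
WINDOWS = {"day": 1, "week": 7, "month": 30, "year": 365}


def aggregate(lines, now):
    """Summarise JSONL probe records into per-window uptime% and mean latency.

    Assumes each record looks like {"ts": ISO-8601, "up": bool, "ms": number};
    these field names are hypothetical.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    summary = {}
    for name, days in WINDOWS.items():
        cutoff = now - timedelta(days=days)
        recent = [r for r in records if datetime.fromisoformat(r["ts"]) >= cutoff]
        if not recent:
            # No samples in this window: report nothing rather than 0% or 100%.
            summary[name] = {"uptime": None, "avg_ms": None}
            continue
        up = sum(1 for r in recent if r["up"])
        summary[name] = {
            "uptime": round(100.0 * up / len(recent), 2),
            "avg_ms": round(sum(r["ms"] for r in recent) / len(recent), 1),
        }
    return summary
```

Passing `now` in explicitly keeps the function deterministic, which makes it easy to check against a fixed history file.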
Immediate effect: fixes false-positive "down" status on Canvas pricing
and legal routes (stuck at 404 from 2026-04-19); refreshes all rolling
uptime aggregates to reflect current probe data.
See: molecule-ai/molecule-ai-status#7
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
curl without -f returns non-zero on connection failure (DNS/timeout)
before reading the HTTP status line, causing a bare exit under set -euo
pipefail. Wrapping with || echo 000 ensures the friendly ::error::
message fires for all failure modes (HTTP 502 and connect failures alike).
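The same idea, sketched in Python for illustration only (the workflow itself uses curl in shell): connection-level failures are mapped to a "000" sentinel so one code path reports both HTTP errors and connect failures. The `http_status` name and the injectable `opener` parameter are hypothetical.

```python
import urllib.error
import urllib.request


def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status as a 3-digit string, or "000" if no response arrived."""
    try:
        with opener(url, timeout=timeout) as resp:
            return f"{resp.status:03d}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 502.
        return f"{err.code:03d}"
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: no status line was read.
        return "000"
```

The caller can then compare the result against "200" and print a single ::error:: line for every other value.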
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Probe runs on GitHub Actions (ubuntu-latest) — confirmed independent of
Gitea Actions runner. Previously the commit step silently swallowed push
failures with `|| echo "push failed"`. Now:
1. Health gate: checks git.moleculesai.app/api/v1/version returns 200
before pushing. Fails fast with a clear ::error:: message if Gitea is
502 or unreachable, rather than silently skipping the push.
2. Fail loudly: `set -euo pipefail` replaces `set +e`, so any push error
surfaces as a workflow failure (visible in GitHub Actions UI).
3. Self-heals: the next */5 cron firing picks up the buffered history/
results once Gitea recovers.
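The gate-then-push behaviour above can be sketched as follows. This is a Python illustration of the control flow only, since the real step is shell; `push_if_healthy`, its `push` parameter, and the wording of the error message are assumptions.

```python
import subprocess
import sys


def _git_push():
    # check=True raises CalledProcessError on a non-zero exit, so a failed
    # push is never silently swallowed.
    subprocess.run(["git", "push"], check=True)


def push_if_healthy(status, push=_git_push):
    """Push only when the health check returned "200"; otherwise fail the job.

    `status` is the HTTP status string from the health check; "000" meaning
    unreachable is an assumed convention.
    """
    if status != "200":
        # GitHub Actions renders lines starting with ::error:: as annotations.
        print(f"::error::Gitea health check returned {status}; not pushing")
        sys.exit(1)
    push()
```

Exiting non-zero before the push keeps the committed history/ results in the working tree of the next run's checkout path untouched, and the failure shows up in the Actions UI instead of in a log line.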
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Combined transition PR — does the full upptime → Gitea-native cron
swap in one place, vs two separate PRs that would land in interleaved
state.
Why upptime had to go
- All 5 upptime workflows call api.github.com for releases lookup,
issue management, and result commits.
- Since the 2026-05-06 GitHub org suspension, no token in our org
authenticates against api.github.com — every scheduled run fails
with HTTP 401 "Bad credentials". Run #70 is the most recent
example; the failure mode has been continuous since the suspension.
What this PR does
- Moves all 5 upptime workflows from .github/workflows/ to
.github/workflows-disabled/. Gitea Actions does not scan that
directory, so they stop scheduling immediately on merge.
- Adds .github/workflows-disabled/README.md explaining the move +
linking #2 + linking the replacement.
- Adds a single new .github/workflows/uptime-probe.yml that runs the
new Gitea-native probe (https://git.moleculesai.app/molecule-ai/
molecule-ai-uptime-probe) on a 5-minute cadence and commits per-site
JSONL history to history/.
Why a single new workflow vs the upptime decomposition
- Each upptime workflow ran a different `command:` argument
(graphs / response-time / static-site / summary / uptime). The
decomposition existed because each command produced a different
artifact in upptime's model.
- Our model: probe emits raw probe results only. Status page (Vercel,
separate PR) reads those JSONL files and renders graphs/summaries
itself. One concern per tool, one workflow.
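To make the "raw probe results only" model concrete, here is a minimal sketch of appending to and reading back a per-site JSONL file. It is not the probe's actual code: the function names and the record fields used in the example are hypothetical.

```python
import json
from pathlib import Path


def append_result(history_dir, slug, record):
    """Append one probe result as a single JSON line to <history_dir>/<slug>.jsonl."""
    path = Path(history_dir) / f"{slug}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path


def read_results(path):
    """Read a history file back, one dict per non-empty line."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each tick only appends a line, the file stays diff-friendly in git, and any consumer (status page or aggregator) can derive its own summaries from the same raw data.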
History migration: out of scope. Existing history/ JSON files (one
per site) stay untouched; the new probe writes a new
history/<slug>.jsonl alongside. Whether to back-fill or archive the
old format is a separate decision tracked in the issue body.
Status page rebuild: out of scope. The Vercel app reading JSONL is a
follow-up; first we want to see real probe data flowing for ~24h.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After PR #5 moved all 5 upptime workflows out of .github/workflows/,
no CI fires on push to main. The dashboard's CI status badge is
sticky on the LAST CI run, which was the broken upptime cron from
before the disable — so the repo displays a permanent red X.
Add a tiny noop workflow that prints why the repo is idle and
exits 0. Fires on push + daily cron so the badge stays accurate.
Replacement tracked in internal#97 (external uptime monitor RFC).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five status-page workflows have been red on every cron tick (5x/hour)
since the 2026-05-06 GitHub org suspension. Symptom from the latest
run (run 8002):
url: api.github.com/repos/upptime/uptime-monitor/releases?per_page=1
data: { message: 'Bad credentials', status: '401' }
upptime fundamentally cannot work on this infra:
- upstream upptime/uptime-monitor action calls api.github.com on
every run to check its own version
- GitHub Molecule-AI org PAT is dead
- operator-host anonymous IP is rate-limited
- re-tokenizing with a personal PAT recreates the bot-ring shape
that triggered the original suspension (memory:
feedback_github_botring_fingerprint)
Move the five workflow files to .github/workflows-disabled-post-suspension/
so Gitea Actions stops dispatching them. This eliminates the 5x/hour
red CI noise on dashboards and stops paging on a known-impossible run.
Replacement plan: external uptime monitor (StatusPage.io, BetterStack,
healthchecks.io). RFC follow-up filed separately on internal#.
Files moved (no functional change to YAML):
- uptime.yml
- response-time.yml
- graphs.yml
- summary.yml
- static-site.yml
Plus a README explaining why under the new dir.
Rollback: git mv them back if upptime ever becomes runnable again.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Static Site CI was building the Sapper export successfully but had no
step to publish it; the `gh-pages` branch never existed, so GitHub Pages
couldn't serve anything. This change matches the canonical Upptime template:
- static-site.yml: adds peaceiris/actions-gh-pages@v4 step publishing
site/status-page/__sapper__/export/ to gh-pages
- All 5 workflows: pin upptime/uptime-monitor to @v1.41.0 instead of
@master (reproducibility + matches upstream template expectations)
The Upptime action reads .upptimerc.yml from the current working
directory; without an explicit checkout step the runner starts in an
empty dir and every run fails with ENOENT. This adds actions/checkout@v4
with fetch-depth: 0 (full history required by the commit-back step)
as the first step of each workflow.
Observed on the initial-commit runs of uptime.yml + static-site.yml
which both failed with:
ERROR [Error: ENOENT: no such file or directory, open '.upptimerc.yml']
Seeds the Upptime-powered status page for Molecule AI. Zero-infra:
GitHub Actions cron every 5min checks each endpoint, commits the
result to history/, and rebuilds the static site into the gh-pages
branch. Incident detection auto-opens Issues in this repo.
- .upptimerc.yml — five sites monitored on first cut:
- molecule-cp /health + /legal/terms
- moleculesai.app / + /pricing + /legal/terms
Each has a display name that matches the status page UI.
- .github/workflows/uptime.yml — 5min uptime check
- .github/workflows/response-time.yml — hourly latency histogram
- .github/workflows/graphs.yml — daily long-term graphs
- .github/workflows/static-site.yml — hourly site rebuild
- .github/workflows/summary.yml — daily README badge refresh
- README.md — landing page with workflow status badges, Upptime
markers for auto-populated status section
- history/.gitkeep — placeholder so the workflows' first run has a
dir to commit into
- LICENSE — MIT
Next steps documented separately: enable GitHub Pages (Settings →
Pages → Source: gh-pages branch), add DNS CNAME record for
status.moleculesai.app → molecule-ai.github.io.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>