Commit Graph

9 Commits

Author SHA1 Message Date
4cf1393feb fix(status): add probe result aggregator + update uptime-probe workflow
Adds the missing Upptime-format aggregator step that was lost in the
Upptime → custom-probe migration (post-2026-05-06 GitHub suspension).

Changes:
- scripts/aggregate.py (NEW): Python script that reads history/<slug>.jsonl,
  computes rolling uptime% and response-time aggregates, writes
  history/<slug>.yml (latest status) and history/summary.json
  (day/week/month/year per-site aggregates)
- .github/workflows/uptime-probe.yml: adds "Aggregate probe results"
  step between probe run and commit; ensures .yml and summary.json
  are regenerated on every probe tick

Immediate effect: fixes false-positive "down" status on Canvas pricing
and legal routes (stuck at 404 from 2026-04-19); refreshes all rolling
uptime aggregates to reflect current probe data.
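
The aggregation this commit describes could look roughly like the following Python sketch. The record fields (`ts`, `status`, `ms`) and the window lengths in days are assumptions for illustration; this log does not show the real JSONL schema or the contents of scripts/aggregate.py.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical window lengths; the commit only names day/week/month/year.
WINDOWS = {"day": 1, "week": 7, "month": 30, "year": 365}


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def aggregate(records, now=None):
    """Compute uptime % and mean response time per rolling window.

    Assumes each record looks like
    {"ts": <ISO-8601 with offset>, "status": "up" | "down", "ms": <int>}.
    """
    now = now or datetime.now(timezone.utc)
    summary = {}
    for name, days in WINDOWS.items():
        cutoff = now - timedelta(days=days)
        recent = [r for r in records if datetime.fromisoformat(r["ts"]) >= cutoff]
        if not recent:
            # No probes in this window: report "unknown" rather than 0%.
            summary[name] = {"uptime": None, "avg_ms": None}
            continue
        up = sum(1 for r in recent if r["status"] == "up")
        summary[name] = {
            "uptime": round(100.0 * up / len(recent), 2),
            "avg_ms": round(sum(r["ms"] for r in recent) / len(recent)),
        }
    return summary
```

Recomputing every window from the raw records on each tick is what lets a stale status (such as a route stuck at 404) clear itself once fresh probe data arrives.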

See: molecule-ai/molecule-ai-status#7

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 14:59:52 +00:00
09412a76e4 chore(ci): wrap curl with || echo 000 in uptime-probe health gate
Even without -f, curl exits non-zero on a connection failure
(DNS/timeout) because it never reads an HTTP status line, and that
caused a bare exit under set -euo pipefail. Wrapping the call with
|| echo 000 ensures the friendly ::error:: message fires for all
failure modes (HTTP 502 and connection failures alike).
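
The same idea, expressed as a Python sketch rather than the workflow's shell (the actual workflow step is not reproduced in this log): any failure that happens before an HTTP status line is read is mapped to the sentinel `000`, so a single error path handles both cases. The function name and the injectable `opener` parameter are hypothetical.

```python
import urllib.error
import urllib.request


def http_status_or_000(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status as a 3-character string, or "000" on connect failure."""
    try:
        with opener(url, timeout=timeout) as resp:
            return f"{resp.status:03d}"
    except urllib.error.HTTPError as err:
        # The server answered, just not with success (e.g. 502): keep the real code.
        return f"{err.code:03d}"
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: no status line was ever read.
        return "000"
```

`HTTPError` is caught first because it is a subclass of `URLError`; reversing the order would collapse a 502 into `000`.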

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 13:39:19 +00:00
edf2bfeefa fix(ci): add Gitea health gate before push in uptime-probe
Probe runs on GitHub Actions (ubuntu-latest) — confirmed independent of
Gitea Actions runner. Previously the commit step silently swallowed push
failures with `|| echo "push failed"`. Now:

1. Health gate: checks that git.moleculesai.app/api/v1/version returns
   200 before pushing. Fails fast with a clear ::error:: message if
   Gitea returns 502 or is unreachable, rather than silently skipping
   the push.

2. Fail loudly: `set -euo pipefail` replaces `set +e`, so any push error
   surfaces as a workflow failure (visible in GitHub Actions UI).

3. Self-heals: the next */5 cron firing picks up the buffered history/
   results once Gitea recovers.
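
The gate in steps 1 and 2 amounts to a small decision function. The Python sketch below is illustrative only (the real step is shell and is not shown here); `gate` and its message wording are hypothetical, while `::error::` is the GitHub Actions workflow command that surfaces an error annotation in the run UI.

```python
def gate(status):
    """Return (exit_code, message) for a Gitea health-check status string.

    `status` is the HTTP status as text, with "000" meaning the request
    never got an HTTP response at all.
    """
    if status == "200":
        return 0, "Gitea healthy, proceeding to push"
    if status == "000":
        reason = "unreachable (DNS, timeout, or refused connection)"
    else:
        reason = f"returned HTTP {status}"
    # A non-zero exit code fails the step loudly instead of hiding the problem.
    return 1, f"::error::Gitea health check failed: {reason}; aborting before push"
```

A caller would print the message and exit with the returned code, so an unhealthy Gitea shows up as a failed workflow run rather than a swallowed push error.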

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 13:32:18 +00:00
claude-ceo-assistant
c3550f5e16 feat(ci): replace upptime with Gitea-native uptime probe (closes #2)
Combined transition PR: does the full upptime → Gitea-native cron
swap in one place, rather than in two separate PRs that would leave
the repo in an interleaved state.

Why upptime had to go
- All 5 upptime workflows call api.github.com for releases lookup,
  issue management, and result commits.
- Since the 2026-05-06 GitHub org suspension, no token in our org
  authenticates against api.github.com, so every scheduled run fails
  with HTTP 401 "Bad credentials". Run #70 is the most recent
  example; the failure has been continuous since the suspension.

What this PR does
- Moves all 5 upptime workflows from .github/workflows/ to
  .github/workflows-disabled/. Gitea Actions does not scan that
  directory, so they stop scheduling immediately on merge.
- Adds .github/workflows-disabled/README.md explaining the move +
  linking #2 + linking the replacement.
- Adds a single new .github/workflows/uptime-probe.yml that runs the
  new Gitea-native probe (https://git.moleculesai.app/molecule-ai/
  molecule-ai-uptime-probe) on a 5-minute cadence and commits per-site
  JSONL history to history/.

Why a single new workflow vs the upptime decomposition
- Each upptime workflow passed a different `command:` argument
  (graphs / response-time / static-site / summary / uptime). The
  decomposition existed because each command produced a different
  artifact in upptime's model.
- Our model: the probe emits raw probe results only. The status page
  (Vercel, separate PR) reads those JSONL files and renders
  graphs/summaries itself. One concern per tool, one workflow.
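
Emitting raw results as JSONL means one JSON object per line, appended to a per-site file. A minimal sketch of such a writer follows, assuming hypothetical field names in the result dict; the real schema used by molecule-ai-uptime-probe is not shown in this log.

```python
import json
from pathlib import Path


def append_result(history_dir, slug, result):
    """Append one probe result as a single JSON line to <history_dir>/<slug>.jsonl."""
    path = Path(history_dir) / f"{slug}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier results intact; each tick adds exactly one line.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(result, sort_keys=True) + "\n")
    return path
```

Append-only lines keep each probe tick's diff small and let a downstream reader process the file line by line without parsing it as one large document.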

History migration: out of scope. Existing history/ JSON files (one
per site) stay untouched; the new probe writes a new
history/<slug>.jsonl alongside. Whether to back-fill or archive the
old format is a separate decision tracked in the issue body.

Status page rebuild: out of scope. The Vercel app that reads the JSONL
is a follow-up; first we want to see real probe data flowing for ~24h.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:39:08 +00:00
dev-lead
4acb103181 chore(ci): noop workflow to clear stale red badge
After PR #5 moved all 5 upptime workflows out of .github/workflows/,
no CI fires on push to main. The dashboard's CI status badge is
sticky on the LAST CI run, which was the broken upptime cron from
before the disable — so the repo displays a permanent red X.

Add a tiny noop workflow that prints why the repo is idle and
exits 0. Fires on push + daily cron so the badge stays accurate.

Replacement tracked in internal#97 (external uptime monitor RFC).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:00:37 -07:00
dev-lead
f7d5342c44 chore(ci): disable upptime workflows (post-suspension)
Five status-page workflows have been red on every cron tick (5x/hour)
since the 2026-05-06 GitHub org suspension. Symptom from the latest
run (run 8002):

  url: api.github.com/repos/upptime/uptime-monitor/releases?per_page=1
  data: { message: 'Bad credentials', status: '401' }

upptime fundamentally cannot work on this infra:
  - upstream upptime/uptime-monitor action calls api.github.com on
    every run to check its own version
  - GitHub Molecule-AI org PAT is dead
  - operator-host anonymous IP is rate-limited
  - re-tokenizing with a personal access token recreates the bot-ring
    shape that triggered the original suspension (memory:
    feedback_github_botring_fingerprint)

Move the five workflow files to .github/workflows-disabled-post-suspension/
so Gitea Actions stops dispatching them. This eliminates the 5x/hour
red CI noise on dashboards and stops paging on a known-impossible run.

Replacement plan: external uptime monitor (StatusPage.io, BetterStack,
healthchecks.io). RFC follow-up filed separately on internal#.

Files moved (no functional change to YAML):
  - uptime.yml
  - response-time.yml
  - graphs.yml
  - summary.yml
  - static-site.yml

Plus a README in the new directory explaining why.

Rollback: git mv them back if upptime ever becomes runnable again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:56:17 -07:00
Hongming Wang
50d221341b fix(ci): pin Upptime to v1.41.0 + add gh-pages deploy step
Static Site CI was building the Sapper export successfully but had no
step to publish it — `gh-pages` branch never existed so GitHub Pages
couldn't serve anything. Matches the canonical Upptime template:

- static-site.yml: adds peaceiris/actions-gh-pages@v4 step publishing
  site/status-page/__sapper__/export/ to gh-pages
- All 5 workflows: pin upptime/uptime-monitor to @v1.41.0 instead of
  @master (reproducibility + matches upstream template expectations)
2026-04-15 14:25:53 -07:00
Hongming Wang
2d4048b088 fix(ci): add actions/checkout to all 5 Upptime workflows
The Upptime action reads .upptimerc.yml from the current working
directory; without an explicit checkout step the runner starts in an
empty dir and every run fails with ENOENT. Adding actions/checkout@v4
with fetch-depth: 0 (full history required by the commit-back step)
as the first step of each workflow.

Observed on the initial-commit runs of uptime.yml + static-site.yml
which both failed with:
  ERROR [Error: ENOENT: no such file or directory, open '.upptimerc.yml']
2026-04-15 14:22:55 -07:00
Hongming Wang
967313cbca chore: initial Upptime scaffold for status.moleculesai.app
Seeds the Upptime-powered status page for Molecule AI. Zero-infra:
GitHub Actions cron every 5min checks each endpoint, commits the
result to history/, and rebuilds the static site into the gh-pages
branch. Incident detection auto-opens Issues in this repo.

- .upptimerc.yml — five sites monitored on first cut:
  - molecule-cp /health + /legal/terms
  - moleculesai.app / + /pricing + /legal/terms
  Each has a display name that matches the status page UI.
- .github/workflows/uptime.yml       — 5min uptime check
- .github/workflows/response-time.yml — hourly latency histogram
- .github/workflows/graphs.yml        — daily long-term graphs
- .github/workflows/static-site.yml   — hourly site rebuild
- .github/workflows/summary.yml       — daily README badge refresh
- README.md — landing page with workflow status badges, Upptime
  markers for auto-populated status section
- history/.gitkeep — placeholder so the workflows' first run has a
  dir to commit into
- LICENSE — MIT

Next steps documented separately: enable GitHub Pages (Settings →
Pages → Source: gh-pages branch), add DNS CNAME record for
status.moleculesai.app → molecule-ai.github.io.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:20:45 -07:00