[bug] [post-suspension] All upptime workflows fail with api.github.com 401 — replace upptime with Gitea-native uptime stack #2

Open
opened 2026-05-08 01:06:35 +00:00 by claude-ceo-assistant · 1 comment

Summary

Every Gitea Actions run in molecule-ai-status fails because upptime/uptime-monitor calls api.github.com and gets 401 Bad credentials. Diagnosis: upptime is structurally coupled to GitHub (it assumes GitHub Pages, GitHub Actions, and the GitHub API for releases, issues, result commits, and status badges). Since the 2026-05-06 GitHub org suspension, no token in our org authenticates there. Gitea Actions auto-populates secrets.GITHUB_TOKEN, but with a Gitea token, which api.github.com rejects.

Failing run

https://git.moleculesai.app/molecule-ai/molecule-ai-status/actions/runs/70/jobs/0

Last log lines:

::error::Bad credentials
❌  Failure - Main upptime/uptime-monitor@v1.41.0
exitcode '1': failure
🏁  Job failed
Job 'Update response time graphs' failed

The action's HTTP call:

url: 'https://api.github.com/repos/upptime/uptime-monitor/releases?per_page=1'
status: 401
data: { message: 'Bad credentials', documentation_url: 'https://docs.github.com/rest', status: '401' }

Affected surface

All 5 workflows in this repo are upptime-dependent:

  • graphs.yml — daily long-term graphs
  • response-time.yml — hourly response-time histograms
  • static-site.yml — daily static-site rebuild + publish
  • summary.yml — README summary update
  • uptime.yml — actual uptime probes

Each calls upptime/uptime-monitor@v1.41.0 with command: "<phase>". All fail at the same api.github.com 401.

Why this won't self-recover

  • GH_PAT from suspended-org members can't be re-issued (the org is gone).
  • Even a fresh personal GitHub PAT authenticates as that personal account, not the org, and upptime expects to write back to its host repo (molecule-ai/molecule-ai-status), which lives on Gitea, not GitHub. A read-only PAT works for the version-lookup call, but the result-commit and issue-create steps still fail because they try to write to api.github.com/repos/molecule-ai/molecule-ai-status/... (which returns 404 for our suspended org).
  • Per the directive "everything goes to our git domain," continuing to depend on api.github.com for writes is the wrong direction even if it could be temporarily fixed.

Proper fix — replace upptime with a Gitea-native uptime stack

Per feedback_no_single_source_of_truth + the post-2026-05-06 north-star: vendor-neutral, runs on our own infra. Three viable replacements, each with an effort estimate:

Option A — Uptime Kuma (self-hosted on operator host)

  • Standalone Node.js app with an embedded SQLite database by default (MariaDB is an option in 2.x; Postgres is not supported), runs as a Docker container on operator host (or its own EC2).
  • Nice UI at status.moleculesai.app. Built-in Slack/email/webhook alerting.
  • Recovery story: container restart from compose; data in a single SQLite file on a mounted volume.
  • Effort: 1-2 hours to stand up, 1 day to migrate the existing probe list from .upptimerc.yml.
  • Footprint: ~50MB container; SQLite file for state.

Option B — Custom Go probe binary + Gitea Actions cron + Vercel-served static page

  • 200-line Go program that reads .upptimerc.yml (or our own format) and probes the listed endpoints.
  • Cron-scheduled Gitea Actions runs the binary every N minutes, commits results to history/ directory, and triggers a Vercel/Gitea Pages rebuild of the status page.
  • Zero external API dependencies. All reads/writes on git.moleculesai.app.
  • Effort: 4-8 hours initial, then near-zero maintenance.
  • Aligns with: long-term, abstract, SSOT, proper.
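
As a rough illustration of the probe loop Option B describes, here is a minimal Go sketch. It is not the real probe: the `Result` field names, the 10-second timeout, and the "2xx or 3xx means up" policy are all assumptions made for the example, and endpoints are taken from the command line rather than parsed from .upptimerc.yml.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// Result is one probe outcome. The field names are illustrative only,
// not the schema used by the real probe.
type Result struct {
	URL       string `json:"url"`
	Status    int    `json:"status"` // 0 when the request itself failed
	Up        bool   `json:"up"`
	LatencyMS int64  `json:"latency_ms"`
	CheckedAt string `json:"checked_at"` // RFC 3339, UTC
}

// isUp treats any 2xx or 3xx status as healthy (an assumed policy).
func isUp(status int) bool {
	return status >= 200 && status < 400
}

// probe issues one GET and records the status code and latency.
func probe(client *http.Client, url string) Result {
	start := time.Now()
	res := Result{URL: url, CheckedAt: start.UTC().Format(time.RFC3339)}
	resp, err := client.Get(url)
	res.LatencyMS = time.Since(start).Milliseconds()
	if err != nil {
		return res // Status stays 0 and Up stays false
	}
	defer resp.Body.Close()
	res.Status = resp.StatusCode
	res.Up = isUp(resp.StatusCode)
	return res
}

func main() {
	// Endpoints come from the command line in this sketch.
	client := &http.Client{Timeout: 10 * time.Second}
	enc := json.NewEncoder(os.Stdout) // Encode writes one JSON object per line
	for _, url := range os.Args[1:] {
		if err := enc.Encode(probe(client, url)); err != nil {
			fmt.Fprintln(os.Stderr, "encode:", err)
			os.Exit(1)
		}
	}
}
```

Run with one or more URLs as arguments, this prints one JSON line per endpoint, which a cron-scheduled workflow could redirect into a history file and commit.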

Option C — External SaaS (Better Stack, Cronitor, Checkly, Datadog Synthetics)

  • 5-minute setup, no infra to maintain.
  • $20-100/mo depending on probe frequency + features.
  • Trade-off: re-introduces vendor lock-in elsewhere. Less aligned with the post-2026-05-06 self-reliance posture, but lowest engineering effort.

Recommendation

Option B if we want the long-term proper answer (matches the migration philosophy). Option A if we want a working status page within an afternoon. Option C if uptime-monitoring is not strategic and we just want it off our radar.

Whatever we pick, the existing 5 workflows in this repo should be disabled or removed in the same PR — leaving them red is noise, and they'll keep firing on schedule.

Acceptance criteria

  • 0 failing CI runs in molecule-ai-status over a full 7-day window.
  • Status page (wherever we host it) shows current uptime data within 1 hour of probe.
  • Probe list mirrors today's .upptimerc.yml content.
  • None of our scheduled jobs call api.github.com (verified via outbound traffic audit).
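
A cheap static check could complement the outbound traffic audit by failing CI when a workflow file still mentions GitHub's API or the upptime action. The sketch below is only that, a complement: it cannot see calls made by third-party actions at runtime. The default directory and the list of forbidden substrings are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// forbidden lists substrings that should no longer appear in any workflow.
// The list is an assumption for this sketch.
var forbidden = []string{"api.github.com", "upptime/uptime-monitor"}

// violations returns one "line N: <needle>" entry for every forbidden
// substring found in content.
func violations(content string) []string {
	var out []string
	for i, line := range strings.Split(content, "\n") {
		for _, needle := range forbidden {
			if strings.Contains(line, needle) {
				out = append(out, fmt.Sprintf("line %d: %s", i+1, needle))
			}
		}
	}
	return out
}

func main() {
	dir := ".github/workflows" // assumed location; override with the first argument
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	files, err := filepath.Glob(filepath.Join(dir, "*.yml"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "glob:", err)
		os.Exit(2)
	}
	failed := false
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			fmt.Fprintln(os.Stderr, "read:", err)
			os.Exit(2)
		}
		for _, v := range violations(string(data)) {
			fmt.Printf("%s: %s\n", f, v)
			failed = true
		}
	}
	if failed {
		os.Exit(1) // non-zero exit turns the CI job red
	}
}
```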

Out of scope

  • Migrating historical uptime data (history/ JSON files) — track separately if archive value warrants it.
  • Status badge embedding in other repos' READMEs — easy follow-up after the new stack is up.

Reporter

Hongming asked "why this CICD red" pointing at run #70. Diagnosed via the Gitea web log endpoint (/{owner}/{repo}/actions/runs/{id}/jobs/{idx}/logs works with admin-scope token, contradicting earlier 404s under the persona-token v2 contract). 2026-05-08.

Author
Owner

Phase 1 + 2 done — page live, awaiting one DNS record

Built

| Artifact | Location | Status |
|---|---|---|
| Probe binary | molecule-ai-uptime-probe repo (initial commit 9e8511f) | Built, smoke-tested against real .upptimerc.yml (7/7 endpoints green, 148-357ms latency) |
| Cron workflow | This repo, PR #4 | Open, mergeable. .github/workflows/uptime-probe.yml runs every 5 min |
| Static status page | This repo site/ (PR #4) + Vercel project molecule-ai-status | Deployed at https://molecule-ai-status-bz12p53o9-molecule-ai.vercel.app/ |
| Vercel rewrites | site/vercel.json | /data/* → git.moleculesai.app/.../raw/branch/main/* (works around Gitea raw-URL CORS) |
| Custom domain attached | Vercel project molecule-ai-status | status.moleculesai.app registered (verified=true) |

Architecture (loosely coupled)

.upptimerc.yml ─→ uptime-probe binary ─→ JSONL results
                                                │
                          Gitea Actions cron ───┤   (every 5 min)
                          (this repo)           │
                                                │
                                         history/<slug>.jsonl
                                                │
                                          ┌─────┴─────┐
                                          │           │
                                          ▼           ▼
                                Vercel rewrite     Vercel-served
                                /data/* proxy      static page
                                         (worked around                
                                          Gitea CORS gap)
                                                │
                                                ▼
                                  status.moleculesai.app
                                  (DNS update needed below)
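
The history/<slug>.jsonl step in the diagram can be sketched as an append-only write. This is an illustration under assumptions: the `Record` fields and the slug rule (lowercase ASCII letters and digits, runs of anything else collapsed to one hyphen) are hypothetical and may differ from what the probe in PR #4 actually does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// Record is one line in history/<slug>.jsonl. The fields are illustrative.
type Record struct {
	CheckedAt string `json:"checked_at"`
	Up        bool   `json:"up"`
	LatencyMS int64  `json:"latency_ms"`
}

// slugify lowercases name and collapses every run of characters other
// than a-z and 0-9 into a single hyphen, with no leading or trailing hyphen.
func slugify(name string) string {
	var b strings.Builder
	dash := false
	for _, r := range strings.ToLower(name) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
			dash = false
		} else if !dash && b.Len() > 0 {
			b.WriteByte('-')
			dash = true
		}
	}
	return strings.TrimSuffix(b.String(), "-")
}

// appendRecord appends rec as a single JSON line, creating the directory
// and file on first use, and returns the path written.
func appendRecord(dir, site string, rec Record) (string, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	path := filepath.Join(dir, slugify(site)+".jsonl")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return "", err
	}
	defer f.Close()
	// Encode ends each value with a newline, which gives JSONL framing.
	if err := json.NewEncoder(f).Encode(rec); err != nil {
		return "", err
	}
	return path, nil
}

func main() {
	// Demo only: write into a throwaway directory instead of history/.
	dir, err := os.MkdirTemp("", "history-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	rec := Record{CheckedAt: time.Now().UTC().Format(time.RFC3339), Up: true, LatencyMS: 148}
	path, err := appendRecord(dir, "Example Site", rec) // hypothetical site name
	if err != nil {
		panic(err)
	}
	fmt.Println("appended to", path)
}
```

Appending one line per probe keeps each cron commit small and lets the static page read the file incrementally, since every line parses on its own.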

One DNS update needed

status.moleculesai.app currently CNAMEs to molecule-ai.github.io, the dead GitHub Pages site from the old upptime setup. It needs to point at Vercel instead.

In Cloudflare → moleculesai.app → DNS → edit the existing status CNAME:

| Field | New value |
|---|---|
| Type | CNAME |
| Name | status |
| Target | cname.vercel-dns.com |
| Proxy status | DNS only (grey cloud); Vercel handles its own TLS termination |
| TTL | Auto |

My API token lacks the Zone:DNS:Edit scope (the same gap that blocked the go.moleculesai.app responder deploy, responder#1), so I can't make this change via the API. It takes about 20 seconds in the dashboard.

Verification once DNS lands

# 1. Resolution flips off GitHub Pages onto Vercel
dig +short status.moleculesai.app   # was 185.199.108.153 (github.io); should become Vercel anycast

# 2. Page loads under the canonical domain (grep -q is silent, so echo on success)
curl -fsS https://status.moleculesai.app/ | grep -q "Molecules AI · Status" && echo "page OK"

# 3. Vercel rewrite works against the live data (-f makes HTTP errors fail loudly)
curl -fsS https://status.moleculesai.app/data/.upptimerc.yml | head -3

After PR #4 merges and the cron runs at least once, the page will render real probe data. Until then the page loads but shows "no probe data yet" for each site.

What's left (smaller follow-ups)

  • Once DNS lands and PR #4 merges, observe one full day's cron firings and confirm steady-state success.
  • Migrate / archive the old history/*.json files (upptime-format) — separate decision.
  • Alerting (Slack/email/Telegram on N consecutive failures) — wait for real-world false-positive rates first.
  • Status-page smoke test as a Vercel deploy gate (probe a known-good URL before promoting a deploy) — nice-to-have.
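
For the alerting follow-up, the "N consecutive failures" rule could look roughly like the sketch below. It assumes results arrive oldest first as a slice of up/down booleans and that an alert should fire once, on the probe where the failure streak first reaches N, rather than on every failing probe after that. Both are assumptions, not decisions recorded in this issue.

```go
package main

import "fmt"

// trailingFailures counts how many of the most recent results are
// failures, walking back from the end until the first success.
func trailingFailures(up []bool) int {
	n := 0
	for i := len(up) - 1; i >= 0 && !up[i]; i-- {
		n++
	}
	return n
}

// shouldAlert fires exactly when the failure streak reaches threshold,
// so a long outage produces one alert instead of one per probe.
func shouldAlert(up []bool, threshold int) bool {
	return threshold > 0 && trailingFailures(up) == threshold
}

func main() {
	history := []bool{true, false, false, false} // oldest first
	fmt.Println("trailing failures:", trailingFailures(history))
	fmt.Println("alert at threshold 3:", shouldAlert(history, 3))
}
```

Raising the threshold trades alert latency for fewer false positives, which is why it makes sense to pick N after observing real probe data.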

Generated with Claude Code.
