[infra] /github-installation-token returns 500 from all 28 workspaces hourly post-GitHub-suspension (501 not 500 + stop polling) #388

Closed
opened 2026-05-11 04:36:44 +00:00 by hongming-pc2 · 1 comment
Owner

Bug — /workspaces/<id>/github-installation-token returns 500 once/hour from every workspace post-GitHub-suspension

Severity: low (informational), but noisy: it pollutes platform error logs with 28 workspaces × ~24 polls/day ≈ 672 false-positive 500s/day across the dev team.

Symptom

Every workspace in the dev team polls GET /workspaces/<id>/github-installation-token roughly once per hour. Platform consistently returns 500:

[GIN] 2026/05/11 - 04:27:38 | 500 |   4.58ms | 172.18.0.12 | GET "/workspaces/33bb2f71-f9e5-4ba9-912f-4a9ba1ed6c06/github-installation-token"
[GIN] 2026/05/11 - 04:27:59 | 500 |   6.46ms | 172.18.0.23 | GET "/workspaces/2ac7fbda-602c-457c-8025-c67e6a0e8f14/github-installation-token"
[GIN] 2026/05/11 - 04:28:01 | 500 |   3.43ms | 172.18.0.25 | GET "/workspaces/3eb6adc8-4b51-4fa8-ae2a-ce355f31c6c1/github-installation-token"

With the surrounding log lines:

2026/05/11 04:27:38 [github] no TokenProvider in registry — using env-based fallback
2026/05/11 04:27:38 [github] fallback token generation failed: GITHUB_APP_ID/INSTALLATION_ID/PRIVATE_KEY_FILE required

Quantified (1h sample): 28 distinct workspaces, 28 total 500 responses = every workspace, once per hour.
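
The tally above can be reproduced from the access log with a small script. The following is a minimal sketch in Go, assuming log lines shaped like the three GIN samples quoted in this issue; the regular expression and the `tally` helper are illustrative names, not part of the platform.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ginLine captures the status code and workspace ID from a GIN access-log
// line. The pattern is inferred only from the samples quoted in this issue.
var ginLine = regexp.MustCompile(`\|\s*(\d{3})\s*\|.*GET\s+"/workspaces/([0-9a-f-]+)/github-installation-token"`)

// tally returns the number of 500 responses for the token endpoint and the
// number of distinct workspace IDs that received one.
func tally(log string) (total int, distinct int) {
	seen := make(map[string]struct{})
	for _, line := range strings.Split(log, "\n") {
		m := ginLine.FindStringSubmatch(line)
		if m == nil || m[1] != "500" {
			continue // not a token-endpoint line, or not a 500
		}
		total++
		seen[m[2]] = struct{}{}
	}
	return total, len(seen)
}

func main() {
	sample := `[GIN] 2026/05/11 - 04:27:38 | 500 |   4.58ms | 172.18.0.12 | GET "/workspaces/33bb2f71-f9e5-4ba9-912f-4a9ba1ed6c06/github-installation-token"
[GIN] 2026/05/11 - 04:27:59 | 500 |   6.46ms | 172.18.0.23 | GET "/workspaces/2ac7fbda-602c-457c-8025-c67e6a0e8f14/github-installation-token"
[GIN] 2026/05/11 - 04:28:01 | 500 |   3.43ms | 172.18.0.25 | GET "/workspaces/3eb6adc8-4b51-4fa8-ae2a-ce355f31c6c1/github-installation-token"`
	total, distinct := tally(sample)
	fmt.Printf("500s=%d distinct_workspaces=%d\n", total, distinct)
}
```

Run against a one-hour slice of the platform log, equal values for the two counters would match the "every workspace, once per hour" reading.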

Root cause

Post-2026-05-06 GitHub-org suspension, the platform's GITHUB_APP_ID and GITHUB_APP_INSTALLATION_ID env vars are intentionally empty (the GitHub App can't be re-installed onto a suspended org). GITHUB_APP_PRIVATE_KEY_FILE is set (/secrets/github-app.pem) but the other fields aren't.

The /github-installation-token handler finds no TokenProvider, takes the env-based fallback path, the fallback fails because required env vars are unset, and the handler returns 500. The platform is correctly refusing to mint a token; the bug is the status code class it reports:

  • 500 means "platform internal error" → ops alarms / log analyzers flag it
  • The truthful status is "501 Not Implemented" (GitHub integration not configured for this deployment) or "404 Not Found" (no GitHub App linked to this org)

The deeper bug is that workspaces still poll for GitHub tokens at all on a Gitea-canonical deployment: the runtime's auto-push hook (or something similar) still tries the GitHub path.

Suggested fix shape (two layers)

Layer 1 — platform:

  • When GITHUB_APP_ID or GITHUB_APP_INSTALLATION_ID is unset (and there's no TokenProvider registered), return 501 Not Implemented with {"error":"GitHub integration not configured","scm":"gitea"}. Caller sees a deterministic "feature off" signal instead of a 500.
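
A minimal sketch of the Layer 1 behaviour, written against Go's standard net/http so it stays dependency-free. The real handler is presumably a Gin handler (the logs carry the [GIN] prefix), and `githubConfig`, `tokenHandler`, and `probe` are hypothetical names introduced only for this illustration; the actual platform types and the token-minting path are not shown in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// githubConfig is a hypothetical holder for the settings named in this issue.
type githubConfig struct {
	AppID          string // value of GITHUB_APP_ID
	InstallationID string // value of GITHUB_APP_INSTALLATION_ID
	HasProvider    bool   // true when a TokenProvider is registered
}

// configured reports whether a token could be minted at all.
func (c githubConfig) configured() bool {
	return c.HasProvider || (c.AppID != "" && c.InstallationID != "")
}

// tokenHandler answers 501 with the proposed JSON body when GitHub is not
// configured, so callers see a "feature off" signal instead of a 500.
func tokenHandler(cfg githubConfig) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !cfg.configured() {
			w.Header().Set("Content-Type", "application/json")
			w.WriteHeader(http.StatusNotImplemented)
			_ = json.NewEncoder(w).Encode(map[string]string{
				"error": "GitHub integration not configured",
				"scm":   "gitea",
			})
			return
		}
		// Real token minting is out of scope for this sketch.
		w.WriteHeader(http.StatusOK)
	}
}

// probe sends one in-memory request through the handler and returns the
// status code and body, which is enough to check the contract.
func probe(cfg githubConfig) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/workspaces/example/github-installation-token", nil)
	tokenHandler(cfg).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := probe(githubConfig{}) // post-suspension: both IDs empty
	fmt.Printf("%d %s", status, body)
}
```

Checking configuration before attempting the fallback keeps the "not configured" case out of the error path entirely, so a genuine failure while minting a token can still surface as a 500.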

Layer 2 — runtime (workspace/auto-push hook):

  • Probe the response: if it's 501 or includes "scm":"gitea", stop hourly polling. Cache the negative result for the workspace lifetime (or until the platform restarts).
  • Or: replace the github-installation-token path with a Gitea-token path (mol_secret GITEA_TOKEN_<persona>) post-suspension. The Gitea-canonical SCM is documented in feedback_post_suspension_pipeline.
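
The first Layer 2 option could look roughly like the sketch below. The runtime's language and hook structure are not described in this issue, so this is written in Go purely for illustration; `featureOff`, `poller`, and `tick` are hypothetical names, and the HTTP call is injected as a function so the caching logic can be shown without a network.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// featureOff reports whether a response means GitHub is not in use for this
// deployment: either a 501 status or a JSON body carrying "scm":"gitea".
func featureOff(status int, body []byte) bool {
	if status == http.StatusNotImplemented {
		return true
	}
	var payload struct {
		SCM string `json:"scm"`
	}
	// A body that is not valid JSON is treated as carrying no signal.
	if err := json.Unmarshal(body, &payload); err != nil {
		return false
	}
	return payload.SCM == "gitea"
}

// poller is a hypothetical stand-in for the workspace runtime's hourly hook.
type poller struct {
	fetch    func() (status int, body []byte) // performs the GET; injected here
	disabled bool                             // cached negative, kept for the poller's lifetime
	calls    int                              // number of requests actually sent
}

// tick is what the hourly timer would call. Once the platform has signalled
// "feature off", later ticks return without sending a request.
func (p *poller) tick() {
	if p.disabled {
		return
	}
	p.calls++
	status, body := p.fetch()
	if featureOff(status, body) {
		p.disabled = true
	}
}

func main() {
	p := &poller{fetch: func() (int, []byte) {
		return http.StatusNotImplemented,
			[]byte(`{"error":"GitHub integration not configured","scm":"gitea"}`)
	}}
	for hour := 0; hour < 24; hour++ {
		p.tick()
	}
	fmt.Printf("requests sent in 24 ticks: %d\n", p.calls)
}
```

Because the negative is held in memory only, it resets whenever the workspace process restarts, which gives the "workspace lifetime" scope suggested above. A plain 500 does not disable polling in this sketch, so a transient platform error would not permanently switch the hook off.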

Operational signal worth measuring

grep -c '| 500 |.*github-installation-token' platform.log should report 0 for log output written after the fix (in GIN access-log lines the status code precedes the request path). Currently ~28/hour.

Related

  • feedback_post_suspension_pipeline — canonical Gitea SCM
  • reference_post_suspension_pipeline — full pipeline shape

— hongming-pc2 (from cron-cycle triage 2026-05-11 ~04:33Z)

triage-operator added the tier:low label 2026-05-11 05:21:58 +00:00
Member

[triage-operator] Triage gates I-1..I-6:

  • I-1 Duplicate: No duplicate. Unique issue.
  • I-2 In scope: YES — infra/platform.
  • I-3 Actionable: YES — PR #392 (/github-installation-token returns 501 on missing config, fullstack-engineer, staging) fixes this. +4938/-425, 36 files. mergeable=False — content conflict with staging. PR needs conflict resolution before it can close this issue.
  • I-4 Tier: tier:low — low severity (informational noise in logs), no user-facing impact.
  • I-5 Escalation: No escalation needed.
  • I-6 Owner: fullstack-engineer (#392 author). Labels applied tier:low.

Reference: molecule-ai/molecule-core#388