[main-red] molecule-ai/molecule-core: 01087ddbe7 #1681

Closed
opened 2026-05-22 13:07:35 +00:00 by gitea-actions · 2 comments

Main is RED on molecule-ai/molecule-core at 01087ddbe7

Commit: https://git.moleculesai.app/molecule-ai/molecule-core/commit/01087ddbe740456231b75603f6505dd77fd41473

Auto-filed by .gitea/workflows/main-red-watchdog.yml (Option C of the main-never-red directive, https://git.moleculesai.app/molecule-ai/molecule-core/issues/420). Per feedback_no_such_thing_as_flakes + feedback_fix_root_not_symptom: investigate the root cause; do NOT revert as a reflex. The watchdog itself never reverts.

Failed status contexts

  • Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push): failure (logs: /molecule-ai/molecule-core/actions/runs/79423/jobs/0)
    • Failing after 5s

Resolution path

  1. Read the failed logs (links above).
  2. If reproducible locally, fix forward in a PR targeting main.
  3. If the failure is a real flake — STOP. Per feedback_no_such_thing_as_flakes, intermittent failures are real bugs. Investigate to root cause; do not mark as flake.
  4. If the failure is blocking unrelated work for >1 hour, file a follow-up issue and assign someone. Do NOT revert without a human GO per feedback_prod_apply_needs_hongming_chat_go (branch protection is a prod surface).

Debug

{
  "all_contexts": [
    {
      "context": "Secret scan / Scan diff for credential-shaped strings (push)",
      "state": "success"
    },
    {
      "context": "CI / Platform (Go) (push)",
      "state": "success"
    },
    {
      "context": "CI / Canvas (Next.js) (push)",
      "state": "success"
    },
    {
      "context": "CI / Shellcheck (E2E scripts) (push)",
      "state": "success"
    },
    {
      "context": "Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push)",
      "state": "success"
    },
    {
      "context": "CI / all-required (push)",
      "state": "success"
    },
    {
      "context": "E2E API Smoke Test / E2E API Smoke Test (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging SaaS (full lifecycle) / pr-validate (push)",
      "state": "success"
    },
    {
      "context": "publish-workspace-server-image / Production auto-deploy (push)",
      "state": "success"
    },
    {
      "context": "Handlers Postgres Integration / Handlers Postgres Integration (push)",
      "state": "success"
    },
    {
      "context": "CI / Canvas Deploy Reminder (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push)",
      "state": "success"
    },
    {
      "context": "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (push)",
      "state": "success"
    },
    {
      "context": "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging External Runtime / E2E Staging External Runtime (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / Canvas tabs E2E (push)",
      "state": "success"
    },
    {
      "context": "E2E Chat / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Chat / E2E Chat (push)",
      "state": "success"
    },
    {
      "context": "Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push)",
      "state": "failure"
    },
    {
      "context": "lint-continue-on-error-tracking / lint-continue-on-error-tracking (push)",
      "state": "success"
    },
    {
      "context": "gate-check-v3 / gate-check (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale Cloudflare DNS records / Sweep CF orphans (push)",
      "state": "success"
    },
    {
      "context": "ci-required-drift / drift (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale AWS Secrets Manager secrets / Sweep AWS Secrets Manager (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push)",
      "state": "success"
    },
    {
      "context": "Continuous synthetic E2E (staging) / Synthetic E2E against staging (push)",
      "state": "pending"
    },
    {
      "context": "Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push)",
      "state": "success"
    },
    {
      "context": "main-red-watchdog / watchdog (push)",
      "state": "pending"
    }
  ],
  "branch": "main",
  "combined_state": "failure",
  "failed_contexts": [
    "Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push)"
  ],
  "recheck_combined_state": "failure",
  "recheck_failed_contexts": [
    "Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push)"
  ],
  "sha": "01087ddbe740456231b75603f6505dd77fd41473"
}

This issue is idempotent: the watchdog runs hourly at :05 and edits this body in place. When main returns to green, the watchdog will close this issue automatically with a "main returned to green" comment.
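
The debug payload above can be triaged mechanically. The following is a minimal sketch, assuming only the field names visible in the payload (`all_contexts`, `context`, `state`); the helper name `summarize_contexts` and the trimmed `SAMPLE` are hypothetical and are not part of the watchdog workflow.

```python
import json

def summarize_contexts(payload: dict) -> dict:
    """Group status-context names by their reported state."""
    by_state: dict = {}
    for entry in payload["all_contexts"]:
        by_state.setdefault(entry["state"], []).append(entry["context"])
    return by_state

# Trimmed copy of the debug payload: three of the contexts listed above.
SAMPLE = json.loads("""
{
  "all_contexts": [
    {"context": "CI / all-required (push)", "state": "success"},
    {"context": "Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push)", "state": "failure"},
    {"context": "main-red-watchdog / watchdog (push)", "state": "pending"}
  ],
  "branch": "main",
  "combined_state": "failure"
}
""")

summary = summarize_contexts(SAMPLE)
for state, names in sorted(summary.items()):
    print(f"{state}: {len(names)}")
    for name in names:
        print(f"  - {name}")
```

Grouping by state makes it easy to see that a single non-build context accounts for the red combined state, which is the observation the RCA below starts from.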

gitea-actions bot added the tier:high label 2026-05-22 13:07:35 +00:00

RCA — root cause

#1681 is not a product regression: every build/test/deploy context was green and only the scheduled Railway drift-audit context failed. The failure duration (5s) matches the workflow's first guard step, which hard-fails when RAILWAY_AUDIT_TOKEN is absent. In addition, the workflow relies on job-level continue-on-error: true, but Gitea still records the check context as failure, so this non-product audit can turn main red.

Evidence

  • .gitea/workflows/railway-pin-audit.yml:61-73 — first step checks RAILWAY_AUDIT_TOKEN and exits 1 with a provisioning error when missing.
  • .gitea/workflows/railway-pin-audit.yml:75-122 — Railway CLI install/auth/link/audit only run if the secret check output is true, so a 5s failure is almost certainly the secret guard, not drift detection.
  • .gitea/workflows/railway-pin-audit.yml:53-55 — job is marked continue-on-error: true, but issue debug shows Gitea still published the context as failure.
  • scripts/ops/audit-railway-sha-pins.sh:21-24 — real audit exit codes distinguish drift (1) from CLI auth/link (2), but the workflow never reaches that script when the repo secret is missing.
  • Issue debug — CI / Platform, CI / Canvas, CI / all-required, E2E, and production auto-deploy were all success; only Railway pin audit was failure.
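
The distinction drawn in the evidence above (missing secret vs. drift vs. CLI auth/link) can be written down as a small classifier. This is a sketch under stated assumptions, not the workflow's actual logic: exit codes `1` (drift) and `2` (CLI auth/link) come from the evidence above, while treating `0` as a clean audit and the names `AuditOutcome` and `classify_audit` are illustrative assumptions.

```python
from enum import Enum
from typing import Optional

class AuditOutcome(Enum):
    CLEAN = "clean"
    DRIFT = "drift"
    CLI_AUTH_OR_LINK = "cli-auth-or-link"
    MISSING_SECRET = "missing-secret"
    UNKNOWN = "unknown"

def classify_audit(token_present: bool, exit_code: Optional[int]) -> AuditOutcome:
    """Classify a Railway pin audit run.

    token_present: whether RAILWAY_AUDIT_TOKEN was available to the job.
    exit_code: exit code of the audit script, or None if it never ran.
    """
    if not token_present:
        # The guard step fails before the audit script is reached.
        return AuditOutcome.MISSING_SECRET
    if exit_code == 0:
        # Assumption: 0 means no drift was detected.
        return AuditOutcome.CLEAN
    if exit_code == 1:
        return AuditOutcome.DRIFT
    if exit_code == 2:
        return AuditOutcome.CLI_AUTH_OR_LINK
    return AuditOutcome.UNKNOWN

# The situation described in this RCA: no secret, audit script never ran.
print(classify_audit(token_present=False, exit_code=None).value)
```

Keeping the missing-secret case separate from the script's own exit codes is what allows it to be reported differently from real drift.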

Suggested fix

Route to CI/ops secret provisioning and workflow-status hardening in molecule-core. Provision RAILWAY_AUDIT_TOKEN with read-only Railway variables scope if the daily audit should be active. If this audit is intentionally advisory, do not depend on Gitea honoring job-level continue-on-error; split the secret-presence check into a neutral/skipped status, or make the workflow a non-required advisory context that reports drift by opening/updating an issue without turning branch/main health red. Keep actual drift (rc=1) visible, but classify missing audit credentials as ops-config debt rather than a product main-red.
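
One way to picture the "advisory context" option is a filter applied to the failed contexts before main is declared red. This is a hypothetical sketch only: the watchdog is not known to work this way, and `ADVISORY_PREFIXES` and `blocking_failures` are names invented for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical allow-list: workflows whose failures are advisory only.
ADVISORY_PREFIXES: Tuple[str, ...] = (
    "Railway pin audit (drift detection) /",
)

def blocking_failures(failed_contexts: Iterable[str],
                      advisory_prefixes: Tuple[str, ...] = ADVISORY_PREFIXES) -> List[str]:
    """Return only the failed contexts that should turn main red."""
    return [
        ctx for ctx in failed_contexts
        if not any(ctx.startswith(prefix) for prefix in advisory_prefixes)
    ]

failed = [
    "Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push)",
]
remaining = blocking_failures(failed)
print("main is red" if remaining else "advisory failures only")
```

An advisory failure filtered out this way would still need its own reporting path, such as the open-or-update-an-issue approach mentioned above, so that real drift stays visible.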

Confidence

Medium-high: the 5s runtime and workflow ordering point directly at the missing-secret guard. Access to the raw job log would confirm the exact "RAILWAY_AUDIT_TOKEN secret missing" line.


main returned to green at SHA ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73 (https://git.moleculesai.app/molecule-ai/molecule-core/commit/ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73). Closing automatically. If the underlying root cause is not yet understood, reopen this issue and file a postmortem — green-by-flake is still a bug per feedback_no_such_thing_as_flakes.

gitea-actions bot closed this issue 2026-05-26 16:05:56 +00:00

Reference: molecule-ai/molecule-core#1681