[main-red] molecule-ai/molecule-core: 50720fb84a #1776

Closed
opened 2026-05-24 05:15:56 +00:00 by gitea-actions · 3 comments

Main is RED on molecule-ai/molecule-core at 50720fb84a

Commit: https://git.moleculesai.app/molecule-ai/molecule-core/commit/50720fb84aa416d6bddb9f8246790fa7ea098c0f

Auto-filed by .gitea/workflows/main-red-watchdog.yml (Option C of the main-never-red directive, https://git.moleculesai.app/molecule-ai/molecule-core/issues/420). Per feedback_no_such_thing_as_flakes + feedback_fix_root_not_symptom: investigate the root cause; do NOT revert as a reflex. The watchdog itself never reverts.

Failed status contexts

  • publish-workspace-server-image / Production auto-deploy (push): failure (logs: /molecule-ai/molecule-core/actions/runs/83840/jobs/1)
    • Failing after 30m14s

Resolution path

  1. Read the failed logs (links above).
  2. If reproducible locally, fix forward in a PR targeting main.
  3. If the failure is a real flake — STOP. Per feedback_no_such_thing_as_flakes, intermittent failures are real bugs. Investigate to root cause; do not mark as flake.
  4. If the failure is blocking unrelated work for >1 hour, file a follow-up issue and assign someone. Do NOT revert without a human GO per feedback_prod_apply_needs_hongming_chat_go (branch protection is a prod surface).

Debug

{
  "all_contexts": [
    {
      "context": "CI / Python Lint & Test (push)",
      "state": "success"
    },
    {
      "context": "E2E API Smoke Test / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Chat / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "Handlers Postgres Integration / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "CI / all-required (push)",
      "state": "success"
    },
    {
      "context": "Lint curl status-code capture / Scan workflows for curl status-capture pollution (push)",
      "state": "success"
    },
    {
      "context": "Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push)",
      "state": "success"
    },
    {
      "context": "Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (push)",
      "state": "success"
    },
    {
      "context": "lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (push)",
      "state": "success"
    },
    {
      "context": "lint-continue-on-error-tracking / lint-continue-on-error-tracking (push)",
      "state": "success"
    },
    {
      "context": "Secret scan / Scan diff for credential-shaped strings (push)",
      "state": "success"
    },
    {
      "context": "Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (push)",
      "state": "success"
    },
    {
      "context": "Ops Scripts Tests / Ops scripts (unittest) (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale AWS Secrets Manager secrets / Sweep AWS Secrets Manager (push)",
      "state": "success"
    },
    {
      "context": "Continuous synthetic E2E (staging) / Synthetic E2E against staging (push)",
      "state": "pending"
    },
    {
      "context": "publish-workspace-server-image / Production auto-deploy (push)",
      "state": "failure"
    },
    {
      "context": "Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push)",
      "state": "success"
    },
    {
      "context": "CI / Platform (Go) (push)",
      "state": "success"
    },
    {
      "context": "CI / Canvas (Next.js) (push)",
      "state": "success"
    },
    {
      "context": "CI / Shellcheck (E2E scripts) (push)",
      "state": "success"
    },
    {
      "context": "E2E API Smoke Test / E2E API Smoke Test (push)",
      "state": "success"
    },
    {
      "context": "E2E Chat / E2E Chat (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / Canvas tabs E2E (push)",
      "state": "success"
    },
    {
      "context": "Handlers Postgres Integration / Handlers Postgres Integration (push)",
      "state": "success"
    },
    {
      "context": "Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push)",
      "state": "pending"
    },
    {
      "context": "Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push)",
      "state": "success"
    },
    {
      "context": "SECRET_PATTERNS drift lint / Detect SECRET_PATTERNS drift (push)",
      "state": "success"
    },
    {
      "context": "CI / Canvas Deploy Reminder (push)",
      "state": "success"
    },
    {
      "context": "main-red-watchdog / watchdog (push)",
      "state": "pending"
    }
  ],
  "branch": "main",
  "combined_state": "failure",
  "failed_contexts": [
    "publish-workspace-server-image / Production auto-deploy (push)"
  ],
  "recheck_combined_state": "failure",
  "recheck_failed_contexts": [
    "publish-workspace-server-image / Production auto-deploy (push)"
  ],
  "sha": "50720fb84aa416d6bddb9f8246790fa7ea098c0f"
}

This issue is idempotent: the watchdog runs hourly at :05 and edits this body in place. When main returns to green, the watchdog will close this issue automatically with a "main returned to green" comment.
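To read the Debug payload above quickly, the status matrix can be grouped by state. This is a minimal sketch, assuming the payload keeps the shape shown in this issue; the helper name `summarize_contexts` is hypothetical and is not taken from the watchdog workflow.

```python
import json
from collections import defaultdict


def summarize_contexts(payload: dict) -> dict:
    """Group status-context names by state (hypothetical helper, not the watchdog's code)."""
    by_state = defaultdict(list)
    for entry in payload.get("all_contexts", []):
        by_state[entry["state"]].append(entry["context"])
    return dict(by_state)


# Trimmed excerpt of the Debug payload from this issue.
sample = json.loads("""
{
  "all_contexts": [
    {"context": "CI / all-required (push)", "state": "success"},
    {"context": "publish-workspace-server-image / Production auto-deploy (push)", "state": "failure"},
    {"context": "main-red-watchdog / watchdog (push)", "state": "pending"}
  ],
  "failed_contexts": ["publish-workspace-server-image / Production auto-deploy (push)"]
}
""")

summary = summarize_contexts(sample)
# The failure bucket should match the payload's own failed_contexts field.
assert summary["failure"] == sample["failed_contexts"]
print(summary)
```

Run against the full payload, the same grouping would show the single failure next to a green CI / all-required (push), which is the mismatch the comments below investigate.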

gitea-actions bot added the tier:high label 2026-05-24 05:15:56 +00:00
Member

MECHANISM: #1776 was a production auto-deploy waiter failure, not an application test regression in commit 50720fb84aa416d6bddb9f8246790fa7ea098c0f. The publish-workspace-server-image / Production auto-deploy job reached .gitea/scripts/prod-auto-deploy.py wait-ci, then timed out after 1800s because required push CI contexts still read pending: CI / Platform (Go), CI / Canvas (Next.js), and CI / Shellcheck (E2E scripts). The commit itself moved CI / all-required onto the ci-meta lane and made that sentinel path-aware, but prod-auto-deploy.py still waits on its fixed required-context list before rollout; when those contexts do not drain inside 30 minutes, production deploy fails closed even though CI / all-required is already green.

EVIDENCE: The logs for actions run 83840, job 1, show "Timed out waiting 1800s for required CI contexts" with last states CI / Platform (Go)=pending, CI / Canvas (Next.js)=pending, and CI / Shellcheck (E2E scripts)=pending, while CI / Python Lint & Test, CI / all-required, and Secret scan were success. .gitea/scripts/prod-auto-deploy.py:334-374 implements the polling loop and raises the timeout with last_states. .gitea/workflows/publish-workspace-server-image.yml:293-297 runs that waiter before any CP rollout call. git show 50720fb84a shows that the commit changed .gitea/workflows/ci.yml to run all-required on ci-meta and to wait only for path-relevant CI jobs.

RECOMMENDED FIX SHAPE: Align prod-auto-deploy.py with the same path-aware CI contract used by .gitea/workflows/ci.yml after #1766, or make production deploy wait on the authoritative CI / all-required (push) sentinel plus secret scan instead of a hard-coded pre-#1766 context set. Keep failing closed for real red contexts, but do not block rollout on contexts that the new meta sentinel intentionally no longer requires for that SHA.
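The fix shape above can be sketched as a small polling waiter. This is an illustration under stated assumptions, not the contents of prod-auto-deploy.py: the names `wait_for_contexts`, `WaitTimeout`, `RedContext`, and `FAILED_STATES` are hypothetical, as are the injectable `sleep` and `clock` parameters (added only so the loop can be exercised without real waiting). Only the 1800s timeout, the context names, and the fail-closed behaviour come from this issue.

```python
import time
from typing import Callable, Dict, Iterable

# Assumption: these states count as "real red" and should fail immediately.
FAILED_STATES = {"failure", "error"}


class WaitTimeout(Exception):
    """Required contexts did not all reach success before the deadline."""


class RedContext(Exception):
    """At least one required context reported a failed state."""


def wait_for_contexts(
    fetch_states: Callable[[], Dict[str, str]],
    required: Iterable[str],
    timeout_s: float = 1800,
    poll_s: float = 15,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll until every required context is success; fail closed otherwise."""
    required = list(required)
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        # Contexts that have not reported yet are treated as not ready.
        last_states = {name: states.get(name, "missing") for name in required}
        red = sorted(n for n, s in last_states.items() if s in FAILED_STATES)
        if red:
            raise RedContext(f"required contexts failed: {red}")
        if all(s == "success" for s in last_states.values()):
            return last_states
        if clock() >= deadline:
            raise WaitTimeout(
                f"Timed out waiting {timeout_s}s for required CI contexts: {last_states}"
            )
        sleep(poll_s)
```

With `required` limited to CI / all-required (push) and the secret scan, the states recorded for this SHA would let the waiter return on the first poll. With the per-lane contexts still in the list, the same states would run to the timeout, which matches the failure described in MECHANISM.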


main returned to green at SHA ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73 (https://git.moleculesai.app/molecule-ai/molecule-core/commit/ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73). Closing automatically. If the underlying root cause is not yet understood, reopen this issue and file a postmortem — green-by-flake is still a bug per feedback_no_such_thing_as_flakes.

gitea-actions bot closed this issue 2026-05-26 16:05:53 +00:00
Member

RCA update — prod auto-deploy required-context drift is fixed on current main

RCA — root cause

The #1776 failure was caused by prod-auto-deploy.py wait-ci waiting on a pre-#1766 static required-context list after CI / all-required became the path-aware push sentinel. At that time, production deploy could fail closed while CI / all-required (push) was already green because retired/not-path-required contexts such as Platform/Canvas/Shellcheck stayed pending longer than the deploy waiter timeout.

Evidence

  • #1776 original status matrix shows CI / all-required (push)=success while publish-workspace-server-image / Production auto-deploy (push)=failure.
  • The earlier Researcher comment on #1776 captured the failing waiter state: CI / Platform (Go), CI / Canvas (Next.js), and CI / Shellcheck (E2E scripts) were still pending when wait-ci timed out.
  • Current main has corrected the static list: .gitea/scripts/prod-auto-deploy.py:22-25 defaults to only CI / all-required (push) plus Secret scan / Scan diff for credential-shaped strings (push).
  • .gitea/scripts/prod-auto-deploy.py:126-130 still allows an explicit PROD_AUTO_DEPLOY_REQUIRED_CONTEXTS override, so operators can add external gates without hardcoding retired CI sub-jobs.
  • .gitea/scripts/tests/test_prod_auto_deploy.py:151-155 now locks the intended contract: prod deploy delegates path gating to CI / all-required (push).

Suggested fix

No new PR is needed for #1776 on current main. The concrete list-update PR shape from the original RCA has already landed: keep the default required contexts to CI / all-required (push) and Secret scan only. If #1776 recurs, inspect the workflow/env for a stale PROD_AUTO_DEPLOY_REQUIRED_CONTEXTS override reintroducing Platform/Canvas/Shellcheck, not the helper default.
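The default-plus-override contract described above, and the stale-override check suggested for a recurrence, can be sketched as follows. The function names `required_contexts` and `stale_override` are hypothetical, and the comma-separated format of PROD_AUTO_DEPLOY_REQUIRED_CONTEXTS is an assumption; this issue does not state how the real helper parses the variable.

```python
import os
from typing import List, Mapping

# Defaults named in this RCA: the path-aware sentinel plus the secret scan.
DEFAULT_REQUIRED_CONTEXTS = (
    "CI / all-required (push)",
    "Secret scan / Scan diff for credential-shaped strings (push)",
)

# Per-lane contexts that the sentinel now gates on behalf of the deploy.
RETIRED_CONTEXTS = (
    "CI / Platform (Go) (push)",
    "CI / Canvas (Next.js) (push)",
    "CI / Shellcheck (E2E scripts) (push)",
)


def required_contexts(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the override list if set, else the defaults.

    Assumption: the override is a comma-separated list of context names.
    """
    raw = env.get("PROD_AUTO_DEPLOY_REQUIRED_CONTEXTS", "").strip()
    if not raw:
        return list(DEFAULT_REQUIRED_CONTEXTS)
    return [item.strip() for item in raw.split(",") if item.strip()]


def stale_override(contexts: List[str]) -> List[str]:
    """List any retired per-lane contexts present in the effective list."""
    return [name for name in contexts if name in RETIRED_CONTEXTS]


if __name__ == "__main__":
    effective = required_contexts()
    print("effective required contexts:", effective)
    print("retired contexts reintroduced:", stale_override(effective))
```

A non-empty result from `stale_override` would point at the workflow or environment override rather than the helper default, which is where the RCA says to look first.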

Confidence

High — current source and regression test no longer reference the retired per-lane CI contexts in the default production deploy waiter.
