[main-red] molecule-ai/molecule-core: 4d32736e25 #1757

Closed
opened 2026-05-24 00:51:07 +00:00 by gitea-actions · 2 comments

Main is RED on molecule-ai/molecule-core at 4d32736e25

Commit: https://git.moleculesai.app/molecule-ai/molecule-core/commit/4d32736e2503b534e43230318cbaeb03eb9d0b7f

Auto-filed by .gitea/workflows/main-red-watchdog.yml (Option C of the main-never-red directive, https://git.moleculesai.app/molecule-ai/molecule-core/issues/420). Per feedback_no_such_thing_as_flakes + feedback_fix_root_not_symptom: investigate the root cause; do NOT revert as a reflex. The watchdog itself never reverts.

Failed status contexts

  • E2E Chat / E2E Chat (push): failure (logs: /molecule-ai/molecule-core/actions/runs/82702/jobs/1)
    • Failing after 3m30s

Resolution path

  1. Read the failed logs (links above).
  2. If reproducible locally, fix forward in a PR targeting main.
  3. If the failure looks like a flake, STOP. Per feedback_no_such_thing_as_flakes, intermittent failures are real bugs. Investigate to root cause; do not mark as flake.
  4. If the failure is blocking unrelated work for >1 hour, file a follow-up issue and assign someone. Do NOT revert without a human GO per feedback_prod_apply_needs_hongming_chat_go (branch protection is a prod surface).

Debug

{
  "all_contexts": [
    {
      "context": "Block internal-flavored paths / Block forbidden paths (push)",
      "state": "success"
    },
    {
      "context": "CI / Python Lint & Test (push)",
      "state": "success"
    },
    {
      "context": "E2E API Smoke Test / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Chat / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "Handlers Postgres Integration / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "Harness Replays / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push)",
      "state": "success"
    },
    {
      "context": "Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (push)",
      "state": "success"
    },
    {
      "context": "Secret scan / Scan diff for credential-shaped strings (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging SaaS (full lifecycle) / pr-validate (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale AWS Secrets Manager secrets / Sweep AWS Secrets Manager (push)",
      "state": "success"
    },
    {
      "context": "Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push)",
      "state": "success"
    },
    {
      "context": "Continuous synthetic E2E (staging) / Synthetic E2E against staging (push)",
      "state": "success"
    },
    {
      "context": "CI / Canvas (Next.js) (push)",
      "state": "success"
    },
    {
      "context": "CI / Shellcheck (E2E scripts) (push)",
      "state": "success"
    },
    {
      "context": "E2E API Smoke Test / E2E API Smoke Test (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / Canvas tabs E2E (push)",
      "state": "success"
    },
    {
      "context": "Handlers Postgres Integration / Handlers Postgres Integration (push)",
      "state": "success"
    },
    {
      "context": "CI / Platform (Go) (push)",
      "state": "success"
    },
    {
      "context": "E2E Chat / E2E Chat (push)",
      "state": "failure"
    },
    {
      "context": "CI / all-required (push)",
      "state": "success"
    },
    {
      "context": "Harness Replays / Harness Replays (push)",
      "state": "success"
    },
    {
      "context": "publish-workspace-server-image / Production auto-deploy (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push)",
      "state": "success"
    },
    {
      "context": "CI / Canvas Deploy Reminder (push)",
      "state": "success"
    },
    {
      "context": "main-red-watchdog / watchdog (push)",
      "state": "pending"
    },
    {
      "context": "gate-check-v3 / gate-check (push)",
      "state": "pending"
    }
  ],
  "branch": "main",
  "combined_state": "failure",
  "failed_contexts": [
    "E2E Chat / E2E Chat (push)"
  ],
  "recheck_combined_state": "failure",
  "recheck_failed_contexts": [
    "E2E Chat / E2E Chat (push)"
  ],
  "sha": "4d32736e2503b534e43230318cbaeb03eb9d0b7f"
}
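The payload reports `combined_state: "failure"` even though two contexts are still `pending`, so a failure evidently outranks pending work. A minimal Go sketch of deriving `failed_contexts` and a combined state from `all_contexts`, assuming the precedence failure > pending > success (inferred from this payload only; the watchdog's actual logic is not shown here, and `Status`, `failedContexts`, and `combinedState` are hypothetical names):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Status mirrors one entry of "all_contexts" in the Debug payload.
type Status struct {
	Context string `json:"context"`
	State   string `json:"state"`
}

// failedContexts keeps contexts whose state is "failure", in input order.
func failedContexts(all []Status) []string {
	var out []string
	for _, s := range all {
		if s.State == "failure" {
			out = append(out, s.Context)
		}
	}
	return out
}

// combinedState ASSUMES failure outranks pending and pending outranks
// success; this precedence is inferred from the payload, not the watchdog.
func combinedState(all []Status) string {
	state := "success"
	for _, s := range all {
		switch s.State {
		case "failure":
			return "failure"
		case "pending":
			state = "pending"
		}
	}
	return state
}

func main() {
	// Excerpt of the Debug payload: one failure plus pending contexts.
	raw := `[
	  {"context": "CI / all-required (push)", "state": "success"},
	  {"context": "E2E Chat / E2E Chat (push)", "state": "failure"},
	  {"context": "main-red-watchdog / watchdog (push)", "state": "pending"},
	  {"context": "gate-check-v3 / gate-check (push)", "state": "pending"}
	]`
	var all []Status
	if err := json.Unmarshal([]byte(raw), &all); err != nil {
		panic(err)
	}
	fmt.Println(combinedState(all))  // failure
	fmt.Println(failedContexts(all)) // [E2E Chat / E2E Chat (push)]
}
```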

This issue is idempotent: the watchdog runs hourly at :05 and edits this body in place. When main returns to green, the watchdog will close this issue automatically with a "main returned to green" comment.
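The idempotency described above reduces to a small per-run decision. A hypothetical Go sketch, assuming the watchdog keys only on "is main red" and "is a main-red issue already open" (the real workflow may also key on the SHA; `decide` and the `Action` values are illustrative names, not taken from main-red-watchdog.yml):

```go
package main

import "fmt"

// Action is what a single hourly watchdog run does to the tracking issue.
type Action string

const (
	Noop   Action = "noop"
	Create Action = "create"
	Edit   Action = "edit-in-place"
	Close  Action = "comment-and-close"
)

// decide is a hypothetical reconstruction of the behavior described in the
// issue footer; it is not the watchdog's actual code.
func decide(mainRed, issueOpen bool) Action {
	switch {
	case mainRed && !issueOpen:
		return Create
	case mainRed && issueOpen:
		return Edit // refresh the body; never file a duplicate
	case !mainRed && issueOpen:
		return Close // post "main returned to green", then close
	default:
		return Noop
	}
}

func main() {
	// Simulate consecutive hourly runs: red, still red, green, green.
	issueOpen := false
	for _, red := range []bool{true, true, false, false} {
		a := decide(red, issueOpen)
		switch a {
		case Create:
			issueOpen = true
		case Close:
			issueOpen = false
		}
		fmt.Println(a)
	}
}
```

Running it repeatedly against the same state converges: a second red run edits rather than re-files, and a second green run does nothing.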

gitea-actions bot added the tier:high label 2026-05-24 00:51:08 +00:00
Member

MECHANISM: #1757 is an E2E Chat persistence race, not a failure of the merged #1751 busy-queue fallback itself. The Playwright test sent `Persistence test`, observed the live `Echo: Persistence test`, then reloaded and failed because `/chat-history` did not hydrate the user bubble within 5s. The live echo can arrive from the A2A response/broadcast path while the durable history row is still being written asynchronously: `logA2ASuccess` schedules `LogActivity` in `h.goAsync`, and `chat-history` later reads only committed `activity_logs` rows where `activity_type='a2a_receive'` and `source_id IS NULL`. That makes reload-immediately-after-echo a real race between the UI reload and activity-log persistence.

EVIDENCE: actions run 82702 job 1 failed in `canvas/e2e/chat-desktop.spec.ts:77` waiting for the exact text `Persistence test` after reload; the same test had already waited for `Echo: Persistence test` at `chat-desktop.spec.ts:67`. The server log shows repeated successful `GET /workspaces/.../chat-history?limit=10` and `POST /workspaces/.../a2a` requests around the failure, with no HTTP 5xx. `workspace-server/internal/handlers/a2a_proxy_helpers.go:357-409` records successful A2A sends through async `h.goAsync(LogActivity...)` and separately broadcasts live A2A responses. `workspace-server/internal/messagestore/postgres_store.go:165-186` hydrates history from committed `activity_logs` only.

RECOMMENDED FIX SHAPE: Make canvas-user A2A success persistence synchronous, or at least durable before the live echo for the chat-history source of truth, matching the discipline already documented for poll-mode queued receives in `logA2AReceiveQueued` (`a2a_proxy_helpers.go:624-650`). If async logging is kept for throughput, the E2E test and the ChatTab reload path need a post-send durable-history acknowledgment or a retry-until-message-present contract before reload assertions; otherwise this class of failure will remain a timing-dependent main-red.

main returned to green at SHA ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73 (https://git.moleculesai.app/molecule-ai/molecule-core/commit/ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73). Closing automatically. If the underlying root cause is not yet understood, reopen this issue and file a postmortem — green-by-flake is still a bug per feedback_no_such_thing_as_flakes.

gitea-actions bot closed this issue 2026-05-26 16:05:54 +00:00
Reference: molecule-ai/molecule-core#1757