[main-red] molecule-ai/molecule-core: def18f28fa #1638

Closed
opened 2026-05-21 08:07:35 +00:00 by gitea-actions · 98 comments

Main is RED on molecule-ai/molecule-core at def18f28fa

Commit: https://git.moleculesai.app/molecule-ai/molecule-core/commit/def18f28fa7477ac82d114d570c9dc9fe63ce7d7

Auto-filed by .gitea/workflows/main-red-watchdog.yml (Option C of the main-never-red directive, https://git.moleculesai.app/molecule-ai/molecule-core/issues/420). Per feedback_no_such_thing_as_flakes + feedback_fix_root_not_symptom: investigate the root cause; do NOT revert as a reflex. The watchdog itself never reverts.

Failed status contexts

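  • E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push): failure (logs: /molecule-ai/molecule-core/actions/runs/77595/jobs/2)
    • Failing after 2m17s
  • Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push): failure (logs: /molecule-ai/molecule-core/actions/runs/77680/jobs/0)
    • Failing after 3s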

Resolution path

  1. Read the failed logs (links above).
  2. If reproducible locally, fix forward in a PR targeting main.
  3. If the failure is a real flake — STOP. Per feedback_no_such_thing_as_flakes, intermittent failures are real bugs. Investigate to root cause; do not mark as flake.
  4. If the failure is blocking unrelated work for >1 hour, file a follow-up issue and assign someone. Do NOT revert without a human GO per feedback_prod_apply_needs_hongming_chat_go (branch protection is a prod surface).

Debug

{
  "all_contexts": [
    {
      "context": "Secret scan / Scan diff for credential-shaped strings (push)",
      "state": "success"
    },
    {
      "context": "Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (push)",
      "state": "success"
    },
    {
      "context": "Handlers Postgres Integration / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E API Smoke Test / E2E API Smoke Test (push)",
      "state": "success"
    },
    {
      "context": "CI / Shellcheck (E2E scripts) (push)",
      "state": "success"
    },
    {
      "context": "Ops Scripts Tests / Ops scripts (unittest) (push)",
      "state": "success"
    },
    {
      "context": "Handlers Postgres Integration / Handlers Postgres Integration (push)",
      "state": "success"
    },
    {
      "context": "publish-workspace-server-image / build-and-push (push)",
      "state": "success"
    },
    {
      "context": "CI / Platform (Go) (push)",
      "state": "success"
    },
    {
      "context": "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (push)",
      "state": "success"
    },
    {
      "context": "CI / Canvas (Next.js) (push)",
      "state": "success"
    },
    {
      "context": "CI / all-required (push)",
      "state": "success"
    },
    {
      "context": "CI / Canvas Deploy Reminder (push)",
      "state": "success"
    },
    {
      "context": "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push)",
      "state": "failure"
    },
    {
      "context": "publish-workspace-server-image / Production auto-deploy (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging External Runtime / E2E Staging External Runtime (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / Canvas tabs E2E (push)",
      "state": "success"
    },
    {
      "context": "E2E Chat / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Chat / E2E Chat (push)",
      "state": "success"
    },
    {
      "context": "Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push)",
      "state": "failure"
    },
    {
      "context": "lint-continue-on-error-tracking / lint-continue-on-error-tracking (push)",
      "state": "success"
    },
    {
      "context": "gate-check-v3 / gate-check (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale Cloudflare DNS records / Sweep CF orphans (push)",
      "state": "success"
    },
    {
      "context": "ci-required-drift / drift (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push)",
      "state": "success"
    },
    {
      "context": "Continuous synthetic E2E (staging) / Synthetic E2E against staging (push)",
      "state": "pending"
    },
    {
      "context": "main-red-watchdog / watchdog (push)",
      "state": "pending"
    },
    {
      "context": "Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push)",
      "state": "success"
    }
  ],
  "branch": "main",
  "combined_state": "failure",
  "failed_contexts": [
    "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push)",
    "Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push)"
  ],
  "recheck_combined_state": "failure",
  "recheck_failed_contexts": [
    "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push)",
    "Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push)"
  ],
  "sha": "def18f28fa7477ac82d114d570c9dc9fe63ce7d7"
}
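For readers checking the payload by hand, here is a minimal sketch of how `failed_contexts` and `combined_state` could be derived from `all_contexts`. The reduction rule (any failure wins, then pending, else success) is an assumption for illustration, not the watchdog's confirmed logic, and the payload is trimmed to four of the contexts listed above.

```python
import json

# Trimmed copy of the Debug payload; only four of the contexts are kept.
payload = json.loads("""
{
  "all_contexts": [
    {"context": "CI / all-required (push)", "state": "success"},
    {"context": "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push)", "state": "failure"},
    {"context": "Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins (push)", "state": "failure"},
    {"context": "main-red-watchdog / watchdog (push)", "state": "pending"}
  ]
}
""")


def failed_contexts(all_contexts):
    """Return the names of contexts whose state is 'failure', in payload order."""
    return [c["context"] for c in all_contexts if c["state"] == "failure"]


def combined_state(all_contexts):
    """Assumed reduction: any failure wins, then pending, otherwise success."""
    states = {c["state"] for c in all_contexts}
    if "failure" in states:
        return "failure"
    if "pending" in states:
        return "pending"
    return "success"


failed = failed_contexts(payload["all_contexts"])
print(len(failed))                              # 2
print(combined_state(payload["all_contexts"]))  # failure
```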

This issue is idempotent: the watchdog runs hourly at :05 and edits this body in place. When main returns to green, the watchdog will close this issue automatically with a "main returned to green" comment.

gitea-actions bot added the tier:high label 2026-05-21 08:07:35 +00:00
Owner

2026-05-21 01:25 PDT fresh triage update

  • molecule-core/main@def18f28fa74 is red on exactly one current failed status: E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility, run 77595 job 2. Status API summary: failure:1,pending:43,success:42 (includes stale duplicate pending rows).
  • Direct failure evidence: the staging peer-visibility job provisioned org e2e-pv-20260521-90478-1, tenant reached running, /health became OK on attempt 16, parent workspace was created, then the hermes sibling workspace response was status=provisioning and workspace_access=none with no auth_token: hermes workspace did not return or mint an auth_token — cannot drive its MCP call. Teardown successfully purged the tenant.
  • Publish is no longer the active red: workspace-server publish run 77587 job 0 succeeded in 6m2s on canonical runner publish-2 and production auto-deploy job 1 succeeded in 3m31s. CI all-required succeeded in 7m34s.
  • Main publish workflows still lack workspace-local DOCKER_CONFIG / BUILDX_CONFIG; #1614 still adds those exports to both workspace-server and canvas publish workflows. With current publishes succeeding on canonical runners, #1614 remains hardening/recurrence prevention rather than an immediate unblocker.
  • PR #1614 remains open/mergeable at f9f9a0100a72 against base def18f28fa74; visible failures remain review/SOP gates only.

needs-hongming: the current main-red root appears to be the staging peer-visibility/hermes auth-token path, not image publish. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
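The status summaries in these updates count every status row, including stale duplicates. A rough sketch of how such a tally, and a de-duplicated view of it, might be computed is shown here. The row shape is an assumption: `context` and `state` mirror the Debug payload, while `id` is a hypothetical monotonically increasing row identifier used to pick the newest row per context.

```python
from collections import Counter


def summarize(rows):
    """Format counts per state as 'failure:N,pending:N,success:N'."""
    counts = Counter(r["state"] for r in rows)
    return ",".join(f"{state}:{counts[state]}" for state in sorted(counts))


def latest_per_context(rows):
    """Keep only the newest row (highest hypothetical id) for each context."""
    latest = {}
    for row in rows:
        current = latest.get(row["context"])
        if current is None or row["id"] > current["id"]:
            latest[row["context"]] = row
    return list(latest.values())


# Illustrative rows only; not taken from the real status API response.
rows = [
    {"id": 1, "context": "E2E Chat / E2E Chat (push)", "state": "pending"},
    {"id": 2, "context": "E2E Chat / E2E Chat (push)", "state": "success"},
    {"id": 3, "context": "E2E Peer Visibility (push)", "state": "pending"},
    {"id": 4, "context": "E2E Peer Visibility (push)", "state": "failure"},
]

print(summarize(rows))                      # failure:1,pending:2,success:1
print(summarize(latest_per_context(rows)))  # failure:1,success:1
```

Comparing the two lines makes it easier to see how much of a large pending count comes from superseded rows rather than from jobs that are still running.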

Owner

2026-05-21 02:25 PDT fresh triage update

  • molecule-core/main@def18f28fa74 remains red on the same single failed status: E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility, run 77595 job 2. Status API summary is now failure:1,pending:57,success:56 with stale duplicate pending rows.
  • Direct log evidence is unchanged: the staging E2E provisioned org e2e-pv-20260521-90478-1, tenant reached running, /health became OK on attempt 16, parent workspace was created, then the hermes sibling create response had status=provisioning, workspace_access=none, and no auth_token; the script also attempted the fallback token mint endpoints before failing. Teardown purged the tenant.
  • Source check: tests/e2e/test_peer_visibility_mcp_staging.sh first reads auth_token or connection.auth_token, then tries POST /admin/workspaces/$WID/tokens, then GET /admin/workspaces/$WID/test-token; only after all three are empty does it fail. That narrows this to the hermes workspace token/provisioning surface, not a test parsing miss.
  • Publish/deploy remain green: workspace-server publish run 77587 job 0 succeeded in 6m2s on canonical runner publish-2, production auto-deploy job 1 succeeded in 3m31s, and CI all-required succeeded in 7m34s.
  • PR #1614 remains open/mergeable at f9f9a0100a72 against base def18f28fa74; visible failures remain review/SOP gates only. Since current publishes are green on canonical runners, #1614 remains hardening/recurrence prevention rather than the active unblocker.

needs-hongming: the active main-red remains peer-visibility/hermes auth-token provisioning. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
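The fallback order described in the source check can be sketched as follows. This is an illustrative Python rendering, not the actual contents of tests/e2e/test_peer_visibility_mcp_staging.sh: `post_json` and `get_json` are hypothetical callables standing in for the HTTP calls, and the assumption that both fallback endpoints return the token under `auth_token` is mine.

```python
from typing import Callable, Optional


def extract_token(body: dict) -> Optional[str]:
    """Read auth_token first, then connection.auth_token, from a JSON body."""
    token = body.get("auth_token")
    if token:
        return token
    connection = body.get("connection") or {}
    return connection.get("auth_token") or None


def resolve_token(
    create_body: dict,
    wid: str,
    post_json: Callable[[str], Optional[dict]],
    get_json: Callable[[str], Optional[dict]],
) -> Optional[str]:
    """Try the create response, then the two mint endpoints, in order."""
    token = extract_token(create_body)
    if token:
        return token
    for fetch, path in (
        (post_json, f"/admin/workspaces/{wid}/tokens"),
        (get_json, f"/admin/workspaces/{wid}/test-token"),
    ):
        # A fetcher returns None when the call fails or yields no JSON object.
        token = extract_token(fetch(path) or {})
        if token:
            return token
    return None


# Shape of the failing case: create response without a token, both fallbacks empty.
create_body = {"status": "provisioning", "workspace_access": "none"}
not_found = lambda path: None
print(resolve_token(create_body, "example-wid", not_found, not_found))  # None
```

Only when all three sources come back empty does the sketch return `None`, which corresponds to the point where the staging script fails.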

Owner

2026-05-21 03:25 PDT fresh triage update

  • molecule-core/main@def18f28fa74 remains red on the same single failed status: E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility, run 77595 job 2. Status API summary is now failure:1,pending:70,success:69 with stale duplicate pending rows.
  • Direct log evidence remains the same: staging E2E provisioned e2e-pv-20260521-90478-1, tenant reached running, /health OK, parent workspace created, then hermes sibling create response had status=provisioning, workspace_access=none, and no auth_token; teardown purged the tenant.
  • Source/code scan this pass: the staging test’s failure is after all token extraction fallbacks (auth_token, connection.auth_token, POST /admin/workspaces/$WID/tokens, GET /admin/workspaces/$WID/test-token). admin_test_token.go confirms /admin/workspaces/:id/test-token mints via wsauth.IssueToken and is hidden only when test tokens are disabled/production; router.go documents it as the bootstrap endpoint, not behind AdminAuth; tokens.go returns auth_token exactly once on token create. workspace_provision.go also documents both Docker and CP paths minting workspace auth token + platform inbound secret before start. This keeps the working theory on hermes workspace token/provisioning behavior, not runner/publish or response parsing.
  • Publish/deploy remain green: workspace-server publish run 77587 job 0, production auto-deploy job 1, and CI all-required all succeeded on canonical runner/Docker root /mnt/ci-runner-storage-1/docker.
  • PR #1614 remains open/mergeable at f9f9a0100a72 against base def18f28fa74; visible failures remain review/SOP gates only. Since current publishes are green, #1614 remains hardening/recurrence prevention rather than the active unblocker.

needs-hongming: active main-red remains peer-visibility/hermes auth-token provisioning. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 08:29 PDT fresh triage update

  • molecule-core/main@def18f28fa74 now has two failed push statuses in the status API: E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility, run 77595 job 2, and Railway pin audit (drift detection) / Audit Railway env vars for drift-prone pins, run 77680 job 0. Aggregate status rows are failure:2,pending:138,success:135 including stale duplicate pending rows.
  • Peer-visibility direct log evidence is unchanged: run 77595 provisioned e2e-pv-20260521-90478-1, tenant reached running, /health was OK, then hermes sibling create returned status=provisioning, workspace_access=none, and no auth_token; teardown purged the tenant.
  • New Railway pin audit failure is a secret-presence failure, not a drift finding: run 77680 job 0 fails at Verify RAILWAY_AUDIT_TOKEN present with RAILWAY_AUDIT_TOKEN secret missing — schedule trigger requires it. A local mol_secret_v2 RAILWAY_AUDIT_TOKEN presence probe returned no value within the bounded check, so this likely needs secret provisioning/sync rather than code change.
  • Publish/deploy evidence from earlier remains green on canonical runner; current runner check still shows Docker Name=molecule-canonical-1, Docker root /mnt/ci-runner-storage-1/docker, molecule-core-net present, and /mnt/ci-runner-storage-1 at 55% used.
  • PR #1614 remains open/mergeable at f9f9a0100a72; visible PR-head failures remain review/SOP gates only. It remains hardening/recurrence prevention, not the active mainline unblocker.

needs-hongming: active main-red is now split between peer-visibility/hermes auth-token provisioning and missing RAILWAY_AUDIT_TOKEN repo secret/SSOT sync for the scheduled Railway audit. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
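The Railway audit failure is a precondition check rather than a drift result. A minimal sketch of that kind of guard is below; `require_secret` is a hypothetical helper and the real workflow step may be implemented differently.

```python
import os


def require_secret(name, env=None):
    """Return the secret value, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        # Fail before any audit logic runs, so the log shows a missing
        # secret rather than a misleading drift finding.
        raise RuntimeError(f"{name} secret missing")
    return value


# Simulate the failing run with an empty environment.
try:
    require_secret("RAILWAY_AUDIT_TOKEN", env={})
except RuntimeError as exc:
    print(exc)  # RAILWAY_AUDIT_TOKEN secret missing
```

A job that fails within seconds at a step like this points at secret provisioning or sync, which matches the 3s failure reported for run 77680.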

Owner

2026-05-21 09:29 PDT fresh triage update

  • molecule-core/main@def18f28fa74 still has two failed push statuses: peer-visibility/hermes auth-token provisioning (run 77595 job 2) and Railway pin audit missing RAILWAY_AUDIT_TOKEN (run 77680 job 0). Status API now shows failure:2,pending:151,success:148 including stale duplicate pending rows.
  • Peer-visibility log evidence remains unchanged: staging tenant e2e-pv-20260521-90478-1 reached running, /health was OK, hermes sibling create returned status=provisioning, workspace_access=none, and no auth_token; teardown purged the tenant.
  • Railway pin audit evidence remains a secret-presence failure before drift checking: RAILWAY_AUDIT_TOKEN was empty in the workflow env, and prior bounded mol_secret_v2 RAILWAY_AUDIT_TOKEN probe found no value.
  • PR #1614 remains open/mergeable with only review/SOP gate failures; no new PR-head code failures observed.
  • Runner isolation remains healthy: Docker Name=molecule-canonical-1, Docker root /mnt/ci-runner-storage-1/docker, molecule-core-net present, runner storage 55%.

needs-hongming remains: approve/mint/sync RAILWAY_AUDIT_TOKEN; peer-visibility/hermes auth-token provisioning remains the code-path red. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 10:29 PDT fresh triage update

  • molecule-core/main advanced to 51dcca592dcb (docs: clarify multi external workspace config). The current head has no failed statuses in the status API; it is pending/running with pending:24,success:15.
  • The previous red statuses from def18f28fa (peer-visibility/hermes auth-token provisioning and Railway pin audit missing RAILWAY_AUDIT_TOKEN) are no longer attached to the current head, but this does not prove either root cause is fixed. Current head has fresh E2E/publish/secret-scan jobs still running or queued.
  • Repo secret inventory check confirms RAILWAY_AUDIT_TOKEN is still absent from molecule-core action secrets; mol_secret_v2 RAILWAY_AUDIT_TOKEN still found no value in the bounded check. The next scheduled Railway audit is expected to fail again until that secret is minted/synced.
  • Runner isolation remains healthy: Docker Name=molecule-canonical-1, Docker root /mnt/ci-runner-storage-1/docker, molecule-core-net present, runner storage 57%.

needs-hongming remains only for the Railway audit secret decision/sync. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 11:29 PDT fresh triage update

  • molecule-core/main advanced to c9795a6c4dc7 (Merge pull request #1642 from chore/retrigger-peer-visibility-after-publish), and the peer-visibility retrigger is red again: E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility, run 77909 job 2. Status API shows failure:2,pending:31,success:28; both failed rows point at the same run/job.
  • Fresh log evidence is more precise than earlier runs: tenant e2e-pv-20260521-91139-1 reached running, /health was OK on attempt 1, parent workspace 06999483-86d0-4bc1-866d-509975752c48 was created, then hermes sibling create returned status=provisioning, workspace_access=none, and no auth_token.
  • Token fallback evidence: POST /admin/workspaces/846df538-282d-4b69-a036-e1a93bf2460e/tokens returned HTTP 404 with Next.js HTML, while GET /admin/workspaces/846df538-282d-4b69-a036-e1a93bf2460e/test-token reached workspace-server shape and returned HTTP 404 {"error":"not found"}. The tenant was purged successfully.
  • Working theory is now narrower: not runner/publish and not just response parsing. The hermes workspace is either not inserted/visible to token minting at this point, or the admin token route/proxy shape is wrong for POST /admin/workspaces/:id/tokens while /test-token cannot find the workspace row.

needs-hongming: active main-red remains peer-visibility/hermes auth-token provisioning. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
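The two 404s carry different information, and the distinction can be made mechanically. The sketch below is one way to classify them. The labels are hypothetical, and treating an HTML body as "the request never reached workspace-server" is an inference from the Next.js response shape, not a confirmed routing fact.

```python
import json


def classify_404(status, content_type, body):
    """Distinguish a JSON not-found from an HTML 404 served by a frontend."""
    if status != 404:
        return "not-a-404"
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    if isinstance(parsed, dict) and "error" in parsed:
        # JSON error shape: a backend handler ran and found no matching record.
        return "backend-not-found"
    looks_html = body.lstrip().lower().startswith(("<!doctype", "<html"))
    if "text/html" in content_type.lower() or looks_html:
        # HTML page: likely answered by the web frontend, so the route or
        # proxy mapping is suspect rather than the data.
        return "frontend-route-miss"
    return "unknown-404"


print(classify_404(404, "text/html; charset=utf-8", "<!DOCTYPE html><html></html>"))
# frontend-route-miss
print(classify_404(404, "application/json", '{"error":"not found"}'))
# backend-not-found
```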

Owner

2026-05-21 12:29 PDT fresh triage update

  • molecule-core/main advanced to da4b86a1593d (Merge pull request #1643 from fix/mcp-delegate-platform-path) and is red on four status rows: staging synthetic E2E run 77974 job 0, staging SaaS smoke run 77973 job 0, and peer-visibility run 77959 job 2, which accounts for two duplicate rows.
  • New staging tenant-health evidence: both smoke slug e2e-smoke-20260521-smoke-91241 and synthetic slug e2e-20260521-synth-91244 provisioned far enough to get tenant URLs, but /health never returned 2xx within 900s and final curl diagnostics show Cloudflare HTTP/2 502. Both runs accepted teardown and reported no orphan leak.
  • Peer-visibility remains the same hermes auth-token path: tenant e2e-pv-20260521-91206-1 reached running, /health was OK on attempt 10, parent workspace 3e909a11-e562-404a-bb44-2768d54253ec was created, hermes sibling 3004c9e8-084a-485a-85ce-0befa473ddf5 returned status=provisioning, workspace_access=none, and no auth_token; fallback POST /admin/workspaces/:id/tokens returned Next.js HTML 404, while GET /admin/workspaces/:id/test-token returned JSON 404 {"error":"not found"}.
  • Working theory split: peer-visibility remains a hermes workspace token/provisioning bug, while the staging smoke/synthetic red is a broader tenant /health 502 regression after current main. I did not assert a deeper root cause without live service evidence.

needs-hongming: current main is red on staging tenant health plus peer-visibility/hermes token provisioning. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
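
For reference, the "never returned 2xx within 900s" condition can be expressed as a deadline-bounded poll. This is only a sketch: the 900s deadline comes from the logs above, while the 10s interval, the function name, and the injectable `clock`/`sleep` parameters are assumptions made so the loop can be exercised without a live tenant.

```python
import time
from typing import Callable


def wait_for_health(
    fetch_status: Callable[[], int],
    timeout_s: float = 900.0,
    interval_s: float = 10.0,  # assumed interval, not taken from the workflow
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, list[int]]:
    """Poll until fetch_status() returns a 2xx code or the deadline passes.

    Returns (healthy, statuses_seen) so the caller can report the last
    status code, such as 502, when the deadline expires.
    """
    deadline = clock() + timeout_s
    seen: list[int] = []
    while True:
        status = fetch_status()
        seen.append(status)
        if 200 <= status < 300:
            return True, seen
        # Stop if another full interval would overshoot the deadline.
        if clock() + interval_s > deadline:
            return False, seen
        sleep(interval_s)
```

Returning the full list of observed codes keeps the attempt count and the final code available for the diagnostics line.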

Owner

2026-05-21 13:29 PDT fresh triage update

  • molecule-core/main advanced to ff2557d89920 (test(e2e): forbid dev token path in staging peer visibility, #1650). Current status API has no failed rows on this head: pending:31,success:22.
  • The 12:29 PDT failures on staging smoke/synthetic /health 502 and peer-visibility/hermes token provisioning are no longer attached to current main. I am treating this as improved but not proven fixed until the pending current-head E2E rows finish green.
  • Open PRs #1614 and #1596 are still non-mergeable because their bases moved to ff2557d89920 and they still have review/SOP-only failures (plus #1596's known lint-continue-on-error-tracking failure). No new failed current-head secret-scan status observed.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
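
The `failure:N,pending:N,success:N` summaries in these updates are per-state counts over the commit-status rows. A minimal sketch of that tally, assuming rows shaped like the `all_contexts` entries in the Debug block at the top of this issue (a `context` and a `state` per row); the function name is hypothetical:

```python
from collections import Counter


def summarize_states(rows: list[dict]) -> str:
    """Render per-state counts as 'failure:2,pending:31,success:28'.

    States with a zero count are omitted, which matches summaries such as
    'pending:31,success:22' on a head with no failed rows.
    """
    counts = Counter(row["state"] for row in rows)
    return ",".join(f"{state}:{counts[state]}" for state in sorted(counts))
```

Because the counts are per row, a job that is reported under two contexts contributes two failures to the summary.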

Owner

2026-05-21 14:29 PDT fresh triage update

  • molecule-core/main advanced to 96c37cb098bf (Make AWS secrets janitor fail loud, #1652). Current status API is red on staging synthetic E2E run 78267 job 0 and staging SaaS smoke run 78266 job 0: failure:2,pending:37,success:32.
  • Synthetic E2E evidence: slug e2e-20260521-synth-91627 got TENANT_URL=https://e2e-20260521-synth-91627.staging.moleculesai.app, became reachable, then later failed with curl HTTP 502; teardown was accepted and clean.
  • Staging SaaS smoke evidence: slug e2e-smoke-20260521-smoke-91624 failed with curl HTTP 503, then teardown was accepted and clean; the workflow emitted the standard staging-smoke FAILED alert.
  • Peer-visibility/hermes auth-token is not currently a failed status on 96c37cb098bf; the active current-head failures are staging tenant/app 5xx paths.

needs-hongming: current main is red again on staging smoke/synthetic 5xx. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 15:29 PDT fresh triage update

  • molecule-core/main advanced to 9981a5099a61 (Use literal region for AWS secrets janitor, #1655). Current status API is red on staging synthetic E2E run 78353 job 0 and staging SaaS smoke run 78352 job 0: failure:2,pending:35,success:31.
  • Failure shape changed from the prior 502/503-only evidence: both current runs reached CP and tenant URL, then failed at terminal diagnose ssh-probe with exit status 255 — Connection timed out during banner exchange.
  • Synthetic: slug e2e-20260521-synth-91764, CP reachable, tenant URL created, tenant reachable, workspace 72690c21-c4a4-407b-9d95-2cac1d13755a failed ssh-probe; teardown accepted and clean.
  • Smoke: slug e2e-smoke-20260521-smoke-91758, CP reachable, tenant reachable, workspace 15fab997-c6d9-4e9d-9049-21f0ae7b261e failed ssh-probe; teardown accepted and clean.
  • Peer-visibility/hermes auth-token is not currently a failed status on 9981a5099a61; the active current-head failures are staging terminal SSH/banner exchange paths.

needs-hongming: current main remains red on staging smoke/synthetic terminal SSH probe timeouts. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
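
A timeout "during banner exchange" means the TCP connection opened but the server's SSH identification line did not arrive in time. The sketch below separates those two steps so each can be timed on its own. It is an illustration only, not the ssh-probe used by terminal diagnose: the function names and the 10s default are assumptions, and it simplifies the protocol by reading only the first line the server sends.

```python
import socket
from typing import Optional


def read_ssh_banner(sock: socket.socket, timeout_s: float = 10.0) -> Optional[str]:
    """Return the server's SSH identification line, or None.

    None means no line starting with 'SSH-' arrived before the timeout
    (or the peer closed the connection first).
    """
    sock.settimeout(timeout_s)
    data = b""
    try:
        while b"\n" not in data and len(data) < 255:
            chunk = sock.recv(255 - len(data))
            if not chunk:  # peer closed before sending a full line
                break
            data += chunk
    except socket.timeout:
        return None
    line = data.split(b"\n", 1)[0].rstrip(b"\r").decode("ascii", "replace")
    return line if line.startswith("SSH-") else None


def probe(host: str, port: int = 22, timeout_s: float = 10.0) -> Optional[str]:
    """Connect over TCP first, then wait for the banner."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        return read_ssh_banner(sock, timeout_s)
```

If `probe` raises on connect, the failure is at the TCP layer; if it returns `None`, the connection was accepted but no banner followed, which is the shape reported in these runs.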

Owner

2026-05-21 16:29 PDT fresh triage update

  • molecule-core/main is still 9981a5099a61 (Use literal region for AWS secrets janitor, #1655). Current status API is now failure:7,pending:48,success:40.
  • Staging smoke/synthetic failures are repeating on terminal SSH/banner-exchange after tenant reachability. Latest examples: synthetic run 78369 slug e2e-20260521-synth-91858 reached CP and tenant URL, then workspace 1519bad6-34bf-43d8-ad83-b472b2c27fd6 failed ssh-probe with exit status 255 — Connection timed out during banner exchange; smoke run 78368 slug e2e-smoke-20260521-smoke-91855 failed the same way for workspace e7b4c8be-43f9-44a4-af5d-b26d982ffd8a. Both teardowns were clean.
  • New security/secret-drift signal: Sweep stale AWS Secrets Manager secrets run 78360 job 0 now fails loud. The workflow confirmed that the required Gitea secrets were present, then aws secretsmanager list-secrets in region us-east-2 failed with UnrecognizedClientException: The security token included in the request is invalid.
  • Working theory split: staging smoke/synthetic is currently a terminal SSH/proxy/banner path after tenant reachability; AWS janitor is credential/token drift, not missing-secret detection.

needs-hongming: current main remains red on staging terminal SSH probe timeouts and invalid AWS janitor credentials. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 17:29 PDT fresh triage update

  • molecule-core/main advanced to a356bc94f3ad (feat(activity): chat_upload_receive flat-upload-manifest arm for attachments projection). Current status API is down to one failed row: synthetic E2E run 78417 job 0 (failure:1,pending:30,success:28). The AWS janitor and staging-smoke failures from the prior pass are no longer attached to current main, but AWS janitor still needs a fresh green before closing the loop.
  • Synthetic E2E evidence remains the same terminal path: run 78417 slug e2e-20260522-synth-91984 reached CP, tenant URL, and started parent/child workspace provisioning, then workspace e1e72cf5-8dd3-4654-90b0-aed0c5372ca6 failed terminal diagnose at ssh-probe with exit status 255 — Connection timed out during banner exchange; teardown was accepted and clean.
  • Staging smoke is not a current failed status on this SHA in the status API sample.

needs-hongming: current main remains red on synthetic E2E terminal SSH probe timeout. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 18:29 PDT fresh triage update

  • molecule-core/main advanced to 51284546d22c (fix(ci): point CI status reconciliation at canonical DB). Current status API has one failed row: E2E Staging SaaS (full lifecycle), run 78731 job 1 (failure:1,pending:26,success:18).
  • Full-lifecycle evidence: CP reachable, tenant admin token retrieved, tenant URL https://e2e-20260522-92359-1.staging.moleculesai.app reachable, parent/child workspace provisioning started, then workspace 06b34d16-c34a-4d65-9122-27ae1367b4a1 failed terminal diagnose at ssh-probe with exit status 255 — Connection timed out during banner exchange.
  • Teardown was accepted and clean, including [aws-leak-check] no live EC2 instances for slug=e2e-20260522-92359-1.
  • Staging smoke, AWS janitor invalid-token, and peer-visibility/hermes are not current failed statuses on this SHA in the status API sample, but still need fresh green evidence before closing their loops.

needs-hongming: current main remains red on full-lifecycle staging SaaS terminal SSH probe timeout. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 19:29 PDT fresh triage update

  • molecule-core/main is still 51284546d22c (fix(ci): point CI status reconciliation at canonical DB). Current status API shows failure:5,pending:43,success:36: latest synthetic run 78749, latest smoke run 78748, earlier smoke/synthetic runs 78741/78744, and full-lifecycle run 78731.
  • Latest synthetic evidence: run 78749 slug e2e-20260522-synth-92424 reached CP, retrieved a tenant admin token, reached the tenant URL, and started parent/child workspace provisioning (CHILD_ID=299f8977-d0c3-4caf-add2-291ee3da5cbb). Workspace c0d5269a-cb52-4cee-9969-6bd0b723c7cf then failed terminal diagnose at ssh-probe with exit status 255 — Connection timed out during banner exchange. Teardown was accepted and clean; the AWS leak check found no live EC2 for the slug.
  • Latest smoke evidence: run 78748 slug e2e-smoke-20260522-smoke-92421 reached CP, retrieved a tenant admin token, reached the tenant URL, then canary-mode workspace abac130a-f37c-4d33-8b7b-88ee88e79a5f failed the same ssh-probe banner-exchange timeout. Teardown and AWS leak check were clean.
  • Open PR heads still need their review-gate/status cleanup before merge eligibility: #1614 has failing sop-checklist, security-review, and qa-review; #1596 has those plus lint-continue-on-error-tracking.
  • Working theory remains focused on terminal SSH/proxy/banner exchange after tenant reachability and workspace provisioning, rather than CP reachability, tenant DNS/health, or teardown/leak cleanup.

needs-hongming: current main remains red on repeated staging workspace SSH probe timeouts. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 20:29 PDT fresh triage update

  • molecule-core/main advanced to 992ccfbd5e50 (Clarify EIC diagnose SG guidance (#1664)). Current status API shows failure:3,pending:36,success:30: synthetic run 78818, smoke run 78817, and full-lifecycle run 78809.
  • Important change from the prior hour: the latest failures are no longer stuck at ssh-probe. In all three sampled current failures the workspaces reached online, Canvas-terminal EIC diagnose passed as terminal-reachable, and config.yaml Files API round-trip succeeded.
  • Latest synthetic evidence: run 78818 slug e2e-20260522-synth-92533 created parent 48236a55-16d2-47aa-973e-d04b9c39ae4c and child fa616d03-1e16-49a7-9161-bc497759549b; both reached online, terminal reachable, and config PUT OK. Step 8/11 then failed sending the A2A message to parent with curl: (22) The requested URL returned error: 503. Teardown was accepted and AWS leak check found no live EC2.
  • Latest smoke evidence: run 78817 slug e2e-smoke-20260522-smoke-92530, parent 2d44db24-c313-4a6c-a641-bdc98b0ba24b, reached online, terminal reachable, config PUT OK, then A2A request returned HTTP 503. The runtime LLM key preflight passed (MOLECULE_STAGING_MINIMAX_API_KEY, len=125). Teardown and leak check were clean.
  • Latest full-lifecycle evidence: run 78809 slug e2e-20260522-92516-1, parent 46078135-718d-4481-9bc6-744ac9d07b3a and child 82d5a075-8187-4360-9910-4dbd3e2ba423, both terminal reachable and config PUT OK, then A2A request returned HTTP 503. Teardown and leak check were clean.
  • Working theory has shifted from terminal SSH/proxy/banner exchange to the post-terminal A2A/chat path returning 503 after workspace online + Files API success.

needs-hongming: current main remains red, but the failure surface moved forward to A2A HTTP 503. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 21:29 PDT fresh triage update

  • molecule-core/main remains 992ccfbd5e50 (Clarify EIC diagnose SG guidance (#1664)). Current status API shows failure:7,pending:50,success:41: latest synthetic/smoke runs 78852/78851, prior synthetic/smoke runs 78828/78826 and 78818/78817, plus full-lifecycle run 78809.
  • The current failure class is stable and no longer matches the earlier SSH/banner-exchange failure: sampled runs reach CP, retrieve tenant admin token, propagate tenant TLS, provision workspaces to online, pass Canvas-terminal EIC diagnose, and pass Files API config.yaml PUT. They then fail Step 8/11 when sending A2A/chat to the parent with HTTP 503.
  • Latest synthetic evidence: run 78852 slug e2e-20260522-synth-92596, parent 79d544fe-1975-4c4b-befc-cb89e3f969e5, child 77f4f81b-e9d3-4212-b13c-8af05bb9dbec; both online/terminal/config OK, then A2A returned 503. Teardown accepted; AWS leak check found no live EC2.
  • Latest smoke evidence: run 78851 slug e2e-smoke-20260522-smoke-92593, parent 11b19e2c-918c-4540-b8d7-10b9f6ddd9d5; LLM key preflight passed for MOLECULE_STAGING_MINIMAX_API_KEY, workspace online/terminal/config OK, then A2A returned 503. Teardown accepted; AWS leak check found no live EC2.
  • Prior 03:30/03:32 UTC smoke/synthetic runs show the same A2A 503 after terminal/config success, so this is recurring rather than one isolated tenant.
  • Working theory remains focused on the post-terminal A2A/chat path behind the workspace-server/tenant service, not CP reachability, workspace boot, terminal proxy, Files API, teardown, or EC2 cleanup.

needs-hongming: current main remains red on repeated A2A/chat HTTP 503 after workspace readiness. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

Owner

2026-05-21 22:29 PDT fresh triage update

  • molecule-core/main advanced to f7183cc0d833 (Merge pull request #1668 from molecule-ai/fix/e2e-a2a-busy-retry). Current status API now shows no failed rows: pending:27,success:24.
  • This is a material improvement from the prior repeated A2A/chat HTTP 503 failures on 992ccfbd5e50; the fix PR #1668 is merged/closed. I am not calling the incident fully resolved yet because current-head statuses are still pending and need the next green staging synthetic/smoke/full-lifecycle evidence.
  • Open PR hygiene: #1614 remains open/non-mergeable with failing sop-checklist, security-review, and qa-review; #1596 remains open/non-mergeable with those gates plus lint-continue-on-error-tracking.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.
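
The branch name of #1668 (fix/e2e-a2a-busy-retry) suggests a retry around the A2A send, but the actual change is not quoted in this thread. The following is therefore only a generic sketch of retrying on HTTP 503 with exponential backoff; the function name, attempt count, and delays are all assumptions.

```python
import time
from typing import Callable


def send_with_busy_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, int]:
    """Call send() until it returns a status other than 503.

    Returns (final_status, attempts_used). Only 503 is retried; any other
    status, success or failure, is returned immediately so that other
    errors are not hidden by the retry loop.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 503
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status != 503:
            return status, attempt
        if attempt < max_attempts:
            # Exponential backoff: 2s, 4s, 8s, ... with the defaults.
            sleep(base_delay_s * 2 ** (attempt - 1))
    return status, max_attempts
```

Returning the attempt count lets the caller log how often the busy path was hit, so a persistent 503 stays visible in the run output instead of being absorbed silently.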

Owner

2026-05-21 23:29 PDT fresh triage update

  • molecule-core/main advanced again to 7fb0da3ed59e (Merge pull request #1677 from molecule-ai/fix/e2e-wait-after-config-put). Current status API shows no failed rows: pending:24,success:10.
  • This continues the recovery trend after #1668/#1677; the prior A2A/chat HTTP 503 failures are not attached to the current main head. I am still waiting on pending staging statuses before calling this closed.
  • Open PR hygiene remains unchanged: #1614 is open with failing sop-checklist, security-review, and qa-review; #1596 is open with those gates plus lint-continue-on-error-tracking.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.

Owner

2026-05-22 00:29 PDT fresh triage update

  • molecule-core/main advanced to 01087ddbe740 (Merge pull request #1678 from molecule-ai/fix/ci-path-scope-main-push). Current status API shows failure:2,pending:44,success:42; both failure rows are duplicate commit-status rows for peer-visibility run 79259 job 2.
  • Fresh log evidence for run 79259 job 2: the job failed very early at the staging CP health preflight with `Staging CP unhealthy (HTTP 502) — infra, not a workspace bug`. No workspace provisioning / A2A / list_peers phase was reached.
  • A fresh direct health probe later in this pass returned https://staging-api.moleculesai.app/health HTTP 200, so this looks like a transient staging CP 502 captured by the peer-visibility gate, not a repeat of the prior A2A/chat 503 class or terminal SSH class.
  • Open PR hygiene remains unchanged: #1614 is open with failing sop-checklist, security-review, and qa-review; #1596 is open with those gates plus lint-continue-on-error-tracking.

needs-hongming: current main is red on a transient-looking staging CP health preflight 502 in peer visibility; next green peer-visibility/staging evidence is needed before closing the loop. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.
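
Illustrative sketch of collapsing duplicate failed rows so that one failed job counts once. It assumes each row carries `state` and a `target_url` that points at the run/job; the field names, the function name, and the example URL are assumptions for illustration, not confirmed details of the status API payload.

```python
def collapse_failed_rows(statuses):
    """Group failed rows by target URL; duplicates of one job share a key."""
    grouped = {}
    for row in statuses:
        if row["state"] not in ("failure", "error"):
            continue
        # Rows without a target_url are grouped under None rather than dropped.
        grouped.setdefault(row.get("target_url"), []).append(row)
    return grouped


if __name__ == "__main__":
    job_url = "https://ci.example.invalid/runs/79259/jobs/2"  # hypothetical URL
    rows = [
        {"context": "E2E Peer Visibility (push)", "state": "failure", "target_url": job_url},
        {"context": "E2E Peer Visibility (push)", "state": "failure", "target_url": job_url},
        {"context": "CI / Shellcheck (E2E scripts) (push)", "state": "success", "target_url": None},
    ]
    for url, dupes in collapse_failed_rows(rows).items():
        print(url, len(dupes))
```

With the sample above, failure:2 collapses to a single distinct failed job, which matches the reading that both rows come from run 79259 job 2.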

Owner

2026-05-22 01:29 PDT fresh triage update

  • molecule-core/main remains 01087ddbe740 (Merge pull request #1678 from molecule-ai/fix/ci-path-scope-main-push). Current status API shows duplicate peer-visibility failed rows plus more pending/success rows: failure:2,pending:59,success:58.
  • Fresh log recheck for run 79259 job 2 is unchanged: the job failed at staging CP health preflight with `Staging CP unhealthy (HTTP 502) — infra, not a workspace bug`; no workspace provisioning, A2A, or list_peers phase was reached.
  • Fresh direct probe during this pass returned https://staging-api.moleculesai.app/health HTTP 200, so this still looks like a transient/point-in-time staging CP 502 captured by the peer-visibility gate rather than recurrence of the prior A2A/chat 503 or terminal SSH classes.
  • Open PR hygiene unchanged: #1614 is open with failing sop-checklist, security-review, and qa-review; #1596 is open with those gates plus lint-continue-on-error-tracking.

needs-hongming: current main remains red only because of the duplicate peer-visibility CP-health-preflight 502 rows; next green peer-visibility/staging evidence is needed. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.

Owner

2026-05-22 02:29 PDT fresh triage update

  • molecule-core/main remains 01087ddbe740 (Merge pull request #1678 from molecule-ai/fix/ci-path-scope-main-push). Current status API shows the same duplicate peer-visibility failed rows, now with more completed/pending statuses: failure:2,pending:72,success:72.
  • Fresh log recheck for run 79259 job 2 is unchanged: admin token was present, then staging CP health preflight failed with `Staging CP unhealthy (HTTP 502) — infra, not a workspace bug`; no workspace provisioning, A2A, or list_peers phase was reached.
  • Fresh direct probe during this pass again returned https://staging-api.moleculesai.app/health HTTP 200, so this remains a stale/point-in-time staging CP 502 row pending replacement by a green peer-visibility run, not a current live CP outage in the direct sample.
  • Open PR hygiene unchanged: #1614 is open with failing sop-checklist, security-review, and qa-review; #1596 is open with those gates plus lint-continue-on-error-tracking.

needs-hongming: current main remains red only because of the duplicate peer-visibility CP-health-preflight 502 rows; next green peer-visibility/staging evidence is needed. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.

Owner

2026-05-22 03:29 PDT fresh triage update

  • molecule-core/main remains 01087ddbe740 (Merge pull request #1678 from molecule-ai/fix/ci-path-scope-main-push). Current status API still shows the duplicate peer-visibility failed rows for run 79259 job 2, with failure:2,pending:86,success:86.
  • Fresh log recheck remains the same: admin token present, then staging CP health preflight failed with `Staging CP unhealthy (HTTP 502) — infra, not a workspace bug`; no workspace provisioning, A2A, or list_peers phase was reached.
  • Fresh direct probe during this pass again returned https://staging-api.moleculesai.app/health HTTP 200, so this remains a stale/point-in-time staging CP 502 row pending replacement by green peer-visibility evidence.
  • Open PR hygiene unchanged: #1614 is open with failing sop-checklist, security-review, and qa-review; #1596 is open with those gates plus lint-continue-on-error-tracking.

needs-hongming: current main remains red only because of the duplicate peer-visibility CP-health-preflight 502 rows; next green peer-visibility/staging evidence is needed. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.

Owner

2026-05-22 09:59 PDT fresh triage update

  • molecule-core/main@01087ddbe740 is now red on failure:7,pending:178,success:173.
  • New current-head failures: latest staging smoke run 79466 and synthetic E2E run 79467 both fail during tenant provisioning with Cloudflare tunnel DNS create returning code 1000 `There was an unknown error`; teardown reports clean/no leaked EC2 for those sample slugs.
  • Railway pin audit run 79423 fails before audit execution because repo secret RAILWAY_AUDIT_TOKEN is missing.
  • Older duplicate peer-visibility rows for run 79259 job 2 remain: staging CP health preflight HTTP 502 before workspace provisioning/A2A/list_peers.
  • Direct probe during this pass: staging-api.moleculesai.app/health HTTP 200, OBS /api/health HTTP 200, api.moleculesai.app/healthz HTTP 404, staging.moleculesai.app did not resolve from this client.

needs-hongming: two current actionable blockers need owner decision/credential help: (1) Cloudflare DNS create failures in staging tenant provisioning, and (2) missing RAILWAY_AUDIT_TOKEN repo secret for the scheduled Railway pin audit. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.
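
Illustrative sketch of a direct probe that separates the three outcomes reported above (an HTTP status such as 200 or 404, versus a hostname that does not resolve). It uses only the Python standard library; the `probe` function and its injectable `opener` parameter are hypothetical, and this is not the script the triage pass actually ran.

```python
import socket
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Return a short label: 'HTTP <code>', 'DNS unresolved', or 'unreachable (...)'."""
    try:
        with opener(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # 4xx/5xx answers still prove the host resolved and responded.
        return f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Name-resolution failures surface as URLError wrapping socket.gaierror.
        if isinstance(exc.reason, socket.gaierror):
            return "DNS unresolved"
        return f"unreachable ({exc.reason})"
    except OSError as exc:
        # Covers timeouts and resets raised outside the URLError wrapper.
        return f"unreachable ({exc})"


if __name__ == "__main__":
    for target in (
        "https://staging-api.moleculesai.app/health",
        "https://api.moleculesai.app/healthz",
    ):
        print(target, probe(target))
```

Keeping "HTTP 404" distinct from "DNS unresolved" matters for triage: the first means the service answered on a path it does not serve, the second means the client never reached any service at all.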

Owner

2026-05-22 10:59 PDT fresh triage update

  • molecule-core/main@01087ddbe740 remains red at failure:7,pending:192,success:187.
  • Current staging smoke/synthetic failures remain the same class: latest sampled smoke run 79466 and synthetic run 79467 fail during tenant provisioning because Cloudflare tunnel DNS create returns code 1000 `There was an unknown error`; teardown reports clean/no leaked EC2 for sampled slugs.
  • Railway pin audit run 79423 still fails before audit execution because repo secret RAILWAY_AUDIT_TOKEN is missing.
  • Older duplicate peer-visibility rows for run 79259 job 2 remain: staging CP health preflight HTTP 502 before workspace provisioning/A2A/list_peers.
  • Direct probe during this pass: staging-api.moleculesai.app/health HTTP 200, OBS /api/health HTTP 200, api.moleculesai.app/healthz HTTP 404, staging.moleculesai.app did not resolve from this client.

needs-hongming: no safe agent-side fix was applied. The same two current actionable blockers still need owner/credential help: Cloudflare DNS create failures in staging tenant provisioning, and missing RAILWAY_AUDIT_TOKEN repo secret for the scheduled Railway pin audit. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.

Owner

2026-05-22 11:59 PDT fresh triage update

Good current-head change: molecule-core/main advanced to cace2eb7d3f8 (Merge pull request #1683 ... raise staging SaaS provisioning timeout) and now has no failed current-head status rows (pending:24,success:23). The earlier Cloudflare DNS create code 1000 smoke/synthetic rows, missing RAILWAY_AUDIT_TOKEN Railway pin-audit row, and duplicate peer-visibility CP-health row are no longer current-head failures for main in the status API sample.

Caveat: PR #1683 is merged, but its PR-head statuses still show review/SOP failures (sop-checklist, security-review, qa-review), so that remains hygiene evidence to follow up separately from current main health.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.

Owner

2026-05-22 12:59 PDT fresh triage update

molecule-core/main@cace2eb7d3f8 still has no failed current-head status rows (pending:38,success:37). The earlier Cloudflare DNS create, RAILWAY_AUDIT_TOKEN, and peer-visibility CP-health failures remain old-head/stale evidence in this sample, not current main failures.

Caveat remains: PR #1683 is merged, but its PR-head statuses still show review/SOP failures (sop-checklist, security-review, qa-review), so that remains hygiene evidence separate from current main health.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.
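
Illustrative sketch of the current-head versus stale distinction used in these updates. It assumes each failed row has been tagged with the commit it was fetched for under a `sha` key; that tagging and both function names are assumptions for illustration. Short and long SHA forms are compared by prefix.

```python
def same_commit(a, b):
    """True if one SHA is a prefix of the other (short vs. full form)."""
    a, b = a.lower(), b.lower()
    return a.startswith(b) or b.startswith(a)


def split_failures_by_head(statuses, head_sha):
    """Return (current, stale) lists of failed rows relative to head_sha."""
    current, stale = [], []
    for row in statuses:
        if row["state"] not in ("failure", "error"):
            continue
        bucket = current if same_commit(row["sha"], head_sha) else stale
        bucket.append(row)
    return current, stale


if __name__ == "__main__":
    rows = [
        {"context": "peer-visibility", "state": "failure", "sha": "01087ddbe740"},
        {"context": "railway-pin-audit", "state": "failure", "sha": "01087ddbe740"},
        {"context": "ci", "state": "success", "sha": "cace2eb7d3f8"},
    ]
    current, stale = split_failures_by_head(rows, "cace2eb7d3")
    print(len(current), len(stale))
```

In the sample, both failures belong to the old head 01087ddbe740, so they are reported as stale evidence and not as blockers on cace2eb7d3f8.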

Owner

2026-05-22 13:59 PDT fresh triage update

molecule-core/main@cace2eb7d3f8 still has no failed current-head status rows (pending:52,success:51). The earlier Cloudflare DNS create, RAILWAY_AUDIT_TOKEN, and peer-visibility CP-health failures remain old-head/stale evidence in this sample, not current main failures.

Caveat remains: PR #1683 is merged, but its PR-head statuses still show review/SOP failures (sop-checklist, security-review, qa-review), separate from current main health.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed by this triage pass.

Owner

needs-hongming heartbeat verification 2026-05-22 14:59 PDT: current-head status remains recovered on main after the #1683 merge.

Fresh evidence:

  • molecule-core/main = cace2eb7d3, branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:68, success:66, failure/error:0.
  • The stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 65% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-22 15:59 PDT: current-head status remains recovered on main after the #1683 merge.

Fresh evidence:

  • molecule-core/main = cace2eb7d3, branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:80, success:79, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 66% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-22 16:59 PDT: current-head status remains recovered on main after the #1683 merge.

Fresh evidence:

  • molecule-core/main = cace2eb7d3, branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:94, success:93, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 69% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-22 17:59 PDT: current-head status remains green after molecule-core main advanced to the #1685 scheduler fix.

Fresh evidence:

  • molecule-core/main = 2357aec4bf (fix(scheduler): #1684 — native_session adapters now use platform a2a_queue ... (#1685)), branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:26, success:24, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 69% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-22 18:59 PDT: current-head status remains green after molecule-core main advanced to the #1685 scheduler fix.

Fresh evidence:

  • molecule-core/main = 2357aec4bf (fix(scheduler): #1684 — native_session adapters now use platform a2a_queue ... (#1685)), branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:40, success:38, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 69% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-22 19:59 PDT: current-head status remains green after molecule-core main advanced to the #1698 scheduler/A2A error-detection fix.

Fresh evidence:

  • molecule-core/main = b6373e7026 (fix(scheduler): #1696 — detect A2A adapter errors in 2xx response body (#1698)), branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:30, success:29, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 73% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-22 20:59 PDT: current-head status remains green after molecule-core main advanced to the #1699 scheduler/SDK-layer error-detection fix.

Fresh evidence:

  • molecule-core/main = 1df028f05b (fix(scheduler): #1696 — detect SDK-layer errors inside HTTP 200 responses (#1699)), branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:33, success:31, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 73% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-22 21:59 PDT: current-head status remains green after molecule-core main advanced to the #1699 scheduler/SDK-layer error-detection fix.

Fresh evidence:

  • molecule-core/main = 1df028f05b (fix(scheduler): #1696 — detect SDK-layer errors inside HTTP 200 responses (#1699)), branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:49, success:46, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 73% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-22 22:59 PDT: current-head status remains green on molecule-core main.

Fresh evidence:

  • molecule-core/main = 1df028f05b (fix(scheduler): #1696 — detect SDK-layer errors inside HTTP 200 responses (#1699)), branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:61, success:60, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 73% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-22 23:59 PDT: current-head status remains green after molecule-core main advanced to the #1705 Container Config tab skeleton merge.

Fresh evidence:

  • molecule-core/main = 5cc570a18f (Merge pull request ... add Container Config tab skeleton (#1705)), branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:25, success:21, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 dropped to 61% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming heartbeat verification 2026-05-23 00:59 PDT: current-head status remains green after molecule-core main advanced to the #1707 OpenAPI schedules spec merge.

Fresh evidence:

  • molecule-core/main = e6e9731bf3 (RFC #1706 Phase 1: OpenAPI spec from workspace-server schedules handler (#1707)), branch-protected, required context CI / all-required (pull_request), 2 approvals, merge whitelist disabled.
  • Current-head statuses: pending:26, success:25, failure/error:0.
  • Stale red evidence remains tied to PR heads/old heads only: PR #1614 has 3 failures, PR #1596 has 4 failures, and merged PR #1683 still has 3 PR-head review/SOP failures, but no current main-head blocker.
  • Runner/infra spot check: canonical runner storage /mnt/ci-runner-storage-1 is 61% used; Docker root is /mnt/ci-runner-storage-1/docker; molecule-core-net exists.
  • Probe spot check: obs health 200, staging-api health 200; known api /healthz 404 and staging.moleculesai.app DNS unresolved are unchanged.

No merge, branch-protection mutation, secret rotation, or runner mutation performed.

Owner

needs-hongming CI/security triage 2026-05-23 01:59 PDT / 08:59 UTC

Fresh evidence:

  • molecule-core main is at 665f0a2405 (merged #1718 Add display control lock endpoints).
  • Current-head statuses: failure:1, pending:27, success:18.
  • Failing context: publish-workspace-server-image / Production auto-deploy (push), run 80822 job 1, failed after 30m23s.
  • Log evidence: prod-auto-deploy wait-ci timed out after 1800s. Last states at timeout: Platform (Go), Canvas (Next.js), Shellcheck, and all-required were still pending; Python lint/test and secret scan had succeeded.
  • Branch protection remains enabled with required context CI / all-required (pull_request), required approvals=2, dismiss stale approvals=true, merge whitelist disabled.

Impact: production auto-deploy for main did not complete for 665f0a2. This looks like a wait-window/current-CI-latency failure rather than a failing required test, but it is a current-head failure and should not be ignored.

No destructive action taken: I did not rerun deploy, mutate protection, rotate secrets, or restart runners.
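The timeout described above can be pictured with a minimal sketch of a wait-for-CI loop. This is not the repository's prod-auto-deploy code; the function name, the injected `fetch_states` callable, and the 30-second poll interval are assumptions for illustration. Only the 1800-second window and the context names come from the log.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def wait_for_contexts(
    fetch_states: Callable[[], Dict[str, str]],  # hypothetical: returns {context: state}
    required: Iterable[str],
    timeout_s: float = 1800.0,   # window quoted in the log
    poll_s: float = 30.0,        # assumed poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Dict[str, str]]:
    """Poll until every required context succeeds, one fails, or time runs out.

    Returns (outcome, last_states) so a timeout can report the last observed
    state of each context, as the log excerpt above does.
    """
    required = list(required)
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        # A context that has not reported yet is treated as pending.
        last = {name: states.get(name, "pending") for name in required}
        if any(s in ("failure", "error") for s in last.values()):
            return "failure", last
        if all(s == "success" for s in last.values()):
            return "success", last
        if clock() >= deadline:
            return "timeout", last
        sleep(poll_s)
```

With this shape, a "timeout" outcome where required contexts are still pending points at CI latency rather than a failing test, which matches the reading in the Impact paragraph.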

Owner

needs-hongming CI/security triage 2026-05-23 02:59 PDT / 09:59 UTC

Fresh molecule-core evidence:

  • main advanced to 010ec0f81b ( #1715).
  • Status list on the head has no failure/error contexts (), but combined status remains pending because multiple push jobs are blocked/waiting: CI all-required, Platform, Canvas, Shellcheck, E2E/API/chat/canvas, handlers integration, harness replays, shellcheck-arm64, and production auto-deploy.
  • Previous current-head production auto-deploy failure on 665f0a2 is now old-head/stale; no deploy rerun or branch-protection mutation performed.
  • Branch protection remains enabled with required context , approvals=2, dismiss stale approvals=true, merge whitelist disabled.

Impact: no fresh failed molecule-core head context this wake, but the new head is not green yet because push CI is still pending/blocked.

Owner

needs-hongming CI/security triage 2026-05-23 02:59 PDT / 09:59 UTC — corrected evidence note superseding my immediately prior malformed comment.

Fresh molecule-core evidence:

  • main advanced to 010ec0f81b (fix(server): add ReadHeaderTimeout to http.Server #1715).
  • Status list on the head has no failure/error contexts (pending:24,success:12), but combined status remains pending because multiple push jobs are blocked/waiting: CI all-required, Platform, Canvas, Shellcheck, E2E/API/chat/canvas, handlers integration, harness replays, shellcheck-arm64, and production auto-deploy.
  • Previous current-head production auto-deploy failure on 665f0a2 is now old-head/stale; no deploy rerun or branch-protection mutation performed.
  • Branch protection remains enabled with required context CI / all-required (pull_request), approvals=2, dismiss stale approvals=true, merge whitelist disabled.

Impact: no fresh failed molecule-core head context this wake, but the new head is not green yet because push CI is still pending/blocked.
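For reference, the "no failure contexts, yet combined status pending" reading can be reproduced with a small sketch. It assumes status entries shaped like the `all_contexts` objects in the Debug block of the issue body (`context` and `state` keys) and a "worst state wins" rule for the combined state; both the function name and that rule are assumptions here, not a statement of how Gitea computes it.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize(statuses: Iterable[Dict[str, str]]) -> Tuple[Counter, str]:
    """Tally per-context states and derive a combined state.

    Assumed rule: any failure/error makes the head red, otherwise any
    pending context keeps it pending, otherwise it is green.
    """
    counts = Counter(s["state"] for s in statuses)
    if counts["failure"] or counts["error"]:
        combined = "failure"
    elif counts["pending"]:
        combined = "pending"
    else:
        combined = "success"
    return counts, combined


if __name__ == "__main__":
    # Counts taken from this comment: pending:24, success:12.
    sample = [{"context": f"ctx-{i}", "state": "pending"} for i in range(24)]
    sample += [{"context": f"ok-{i}", "state": "success"} for i in range(12)]
    counts, combined = summarize(sample)
    print(dict(counts), combined)
```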

Owner

needs-hongming 2026-05-23 12:20 PDT triage evidence

Fresh main evidence for molecule-core:

  • main head: e05fc4daae
  • combined state: failure; status counts: failure=3, pending=144, success=139
  • current failures sampled from Gitea Actions DB/logs:
    • lint-continue-on-error-tracking / lint-continue-on-error-tracking (push), run 81589, failed after 1m11s
    • Railway pin audit / Audit Railway env vars for drift-prone pins (push), run 81585, failed after 3s
    • older duplicate lint-continue-on-error-tracking run 81449

Safe fix opened:

  • #1731
  • Adds internal#418 tracker adjacent to the advisory Mac arm64 continue-on-error mask in .gitea/workflows/ci-arm64-advisory.yml.
  • Verification: python3 .gitea/scripts/lint_continue_on_error_tracking.py exited 0 locally. The local run still reported token-scope caveats for tracker lookups, but accepted all 36 continue-on-error directives.
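As a rough illustration of what a check of this kind looks for, here is a minimal sketch. It is not the contents of `.gitea/scripts/lint_continue_on_error_tracking.py`; the function name, the two-line window, and the tracker pattern (something shaped like `internal#418`) are all assumptions.

```python
import re
from typing import List

# A `continue-on-error: true` directive on its own YAML line.
DIRECTIVE = re.compile(r"^\s*continue-on-error:\s*true\b")
# Assumed tracker shape: a name followed by '#' and digits, e.g. internal#418.
TRACKER = re.compile(r"[A-Za-z][\w-]*#\d+")


def untracked_directives(text: str, window: int = 2) -> List[int]:
    """Return 1-based line numbers of directives with no tracker nearby.

    "Nearby" means within `window` lines above or below the directive,
    which is one possible reading of "adjacent".
    """
    lines = text.splitlines()
    missing = []
    for i, line in enumerate(lines):
        if not DIRECTIVE.match(line):
            continue
        nearby = lines[max(0, i - window): i + window + 1]
        if not any(TRACKER.search(n) for n in nearby):
            missing.append(i + 1)
    return missing
```

A workflow passes this sketch when the returned list is empty, so adding a tracker comment next to the masked job is enough to clear it.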

Human/secret help still needed:

  • Railway pin audit fails because RAILWAY_AUDIT_TOKEN is absent on scheduled trigger. The log requests a read-only variables-scope token on the molecule-platform Railway project stored as repo secret RAILWAY_AUDIT_TOKEN. I did not provision or rotate secrets.
Owner

2026-05-23 13:20 PDT refresh

Fresh molecule-core evidence:

  • main head advanced to 656176d511.
  • API status counts for current main: pending=30, success=27, failure=0.
  • Current main is no longer showing the lint/Railway failures sampled at 12:20; many push lanes are still pending/queued or blocked by conditions.
  • PR #1731 remains open/mergeable for the arm64 advisory tracker change. DB evidence for head aee31f30 shows most PR action runs succeeded, with failures limited to review/SOP gate surfaces rather than a new code change.

No merge, force-push, main push, secret rotation, branch-protection mutation, runner restart, or destructive cleanup was performed.

Owner

2026-05-23 14:20 PDT refresh

Fresh molecule-core evidence:

  • main head advanced to 436fae8949.
  • API status counts for the SHA: failure=8, pending=117, success=92.
  • The visible failures attached to the SHA are PR/review/SOP gate statuses for refs/pull/1702, refs/pull/1709, refs/pull/1710, and refs/pull/1712 (sop-checklist and security-review). Push lanes for main are still heavily pending/running; DB shows current-main push runs 98048/98084 still running, with old/stale task-log oddities from prior runs.
  • PR #1731 remains open and mergeable; its base has updated to 436fae8949.

No safe code-only fix was obvious from this sample. No merge, force-push, main push, secret rotation, branch-protection mutation, runner restart, or destructive cleanup was performed.
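One way to separate main-head push failures from PR gate failures attached to the same SHA is to group failing contexts by the event suffix in the context name. Context names elsewhere in this issue end in `(push)` or `(pull_request)`; relying on that suffix is an assumption of this sketch, as is the function name.

```python
import re
from typing import Dict, Iterable, List

# Trailing "(push)" or "(pull_request)" in a status context name.
EVENT = re.compile(r"\((push|pull_request)\)\s*$")


def failures_by_event(statuses: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group failing or errored contexts by the event named in their suffix."""
    groups: Dict[str, List[str]] = {"push": [], "pull_request": [], "other": []}
    for s in statuses:
        if s["state"] not in ("failure", "error"):
            continue
        m = EVENT.search(s["context"])
        groups[m.group(1) if m else "other"].append(s["context"])
    return groups
```

Under that assumption, an empty `push` group alongside a non-empty `pull_request` group corresponds to the situation described here: red statuses on the SHA, but none from a main push lane.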

Owner

2026-05-23 15:20 PDT refresh

Fresh molecule-core evidence:

  • main head advanced to b13c9f94f1.
  • API status counts for current main: pending=25, success=1, failure=0.
  • Current non-success contexts are all queued or blocked-by-required-condition push lanes; no current-head failure is attached to this SHA in the sampled API response.
  • PR #1745 (ci: add internal#418 tracker for arm64 advisory continue-on-error) is merged. My earlier PR #1731 remains open but is now not mergeable and appears superseded by #1745; I did not close it.

No safe code-only fix was obvious from this sample. No merge, force-push, main push, secret rotation, branch-protection mutation, runner restart, or destructive cleanup was performed.

Owner

2026-05-23 16:20 PDT refresh

Fresh molecule-core evidence:

  • main head advanced to ef42e17224.
  • API status counts for current main: pending=37, success=36, failure=0.
  • Current non-success contexts are queued/running or blocked-by-required-condition push lanes; no current-head failure is attached to this SHA in the sampled API response.
  • PR #1745 (arm64 advisory tracker) and PR #1746 (production auto-deploy nonblocking) are merged. Older PR #1731 remains open/not mergeable and appears superseded; I did not close it.

No safe code-only fix was obvious from this sample. No merge, force-push, main push, secret rotation, branch-protection mutation, runner restart, or destructive cleanup was performed.

Owner

needs-hongming 2026-05-23 17:20 PDT refresh

Fresh molecule-core evidence:

  • main head advanced to 4d32736e25.
  • API status counts for current main: failure=1, pending=32, success=30.
  • Current failure: E2E Chat / E2E Chat (push), run 82702 job 1, failed after 3m30s.
  • Non-success contexts otherwise are queued/running or blocked-by-required-condition push lanes.
  • Attempted direct log read at /opt/molecule/gitea/actions_log/molecule-ai/molecule-core/0e/82702.log, but that file contains old 2026-05-14 issue_comment SOP output, so the current failure log path/index appears stale or colliding in local storage. I did not infer root cause from the mismatched log.
  • PR #1751 is merged; PR #1752 and #1749 are open/mergeable. Older PR #1731 remains open/not mergeable and appears superseded; I did not close it.

No safe code-only fix was obvious from this sample. No merge, force-push, main push, secret rotation, branch-protection mutation, runner restart, or destructive cleanup was performed.
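A possible explanation for the mismatched log, offered as an inference rather than a confirmed root cause: the path above and the task-log path quoted for run 83125 in this thread (`.../8b/153227.log`) are both consistent with a layout where the shard directory is the low byte of the numeric ID in two-digit hex (82702 % 256 = 0x0e, 153227 % 256 = 0x8b). In the run 83125 case the file name is a task ID, not the run number. If the same holds generally, `82702.log` would belong to task 82702, which need not be the task of run 82702. The sketch below only encodes that assumed layout; the function name is hypothetical.

```python
def task_log_path(root: str, repo: str, task_id: int) -> str:
    """Build a log path under the assumed layout <root>/<repo>/<id % 256 as hex>/<id>.log.

    `task_id` is assumed to be the Actions task ID, not the run number.
    """
    return f"{root}/{repo}/{task_id % 256:02x}/{task_id}.log"


if __name__ == "__main__":
    root = "/opt/molecule/gitea/actions_log"
    repo = "molecule-ai/molecule-core"
    print(task_log_path(root, repo, 82702))
    print(task_log_path(root, repo, 153227))
```

If this reading is right, the next step would be to look up the task ID for run 82702 job 1 in the Actions DB and read that task's log instead.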

Owner

Hourly CI/security triage - 2026-05-23 18:20 PDT

Fresh evidence:

  • main: 0ffb29f371, protected=true.
  • Current-head statuses: no failures observed; counts were pending=27, success=8 at check time.
  • Pending/blocked contexts include E2E API/Chat/Canvas/SaaS, forbidden-token lint, arm64 shellcheck, secret scan, production auto-deploy, and CI / all-required.
  • Recent DB failures were for PR heads or old main 4d32736, not current main 0ffb29f.
  • Superseded arm64 tracker PR #1731 remains open and non-mergeable against current main; no safe agent action taken to close or merge it.

Action: no branch/protection mutation and no merge performed. Continuing to watch current-head completion.

Owner

Hourly CI/security triage - 2026-05-23 19:20 PDT

Fresh evidence:

  • main: 43422e0ba9, protected=true.
  • Current-head status counts: failure=1, pending=26, success=14.
  • Current failure: publish-workspace-server-image / Production auto-deploy (push), run 83125 job 1, task log /opt/molecule/gitea/actions_log/molecule-ai/molecule-core/8b/153227.log.
  • Log root cause: prod-auto-deploy wait-ci timed out after 1800s. Last states at timeout: Platform (Go)=pending, Canvas=pending, Shellcheck=pending, Python Lint & Test=success, CI / all-required=pending, Secret scan=success.
  • DB shows ci.yml for the same SHA still running at sample time; several E2E and gate workflows are waiting/blocked.
  • PR #1752 is merged; superseded #1731 remains open and non-mergeable and was not closed.

No workflow rerun, branch/protection mutation, merge, or destructive cleanup performed.

Owner

Hourly CI/security triage - 2026-05-23 20:20 PDT

Fresh evidence:

  • main: e5521c7675, protected=true.
  • Current-head status counts: pending=25, success=2, failure/error=0.
  • Non-success contexts are queued/blocked push lanes including CI, E2E, handlers, harness, shellcheck-arm64, publish workspace-server, and secret scan.
  • The prior red main 43422e0 production auto-deploy timeout is no longer current-head evidence.
  • Relevant PR state: #1765 is merged; #1766 (fix(ci): move all-required to meta runner lane) is open/mergeable; #1731 remains open and non-mergeable and was not closed.

No workflow rerun, branch/protection mutation, merge, or destructive cleanup performed.

Owner

Hourly CI/security triage - 2026-05-23 21:20 PDT

Fresh evidence:

  • main: d594190653, protected=true.
  • Current-head status counts: pending=30, success=2, failure/error=0.
  • Non-success contexts are queued/blocked push lanes including CI, E2E API/Chat/Peer Visibility/Canvas/External Runtime/SaaS, handlers, harness, forbidden-token lint, shellcheck-arm64, publish workspace-server, and secret scan.
  • DB shows publish-workspace-server-image for this SHA is running; no current-head failure was observed.
  • Relevant PR state: #1766 (fix(ci): move all-required to meta runner lane) remains open/mergeable; #1731 remains open/non-mergeable and was not closed.

No workflow rerun, branch/protection mutation, merge, or destructive cleanup performed.


Hourly CI/security triage - 2026-05-23 22:20 PDT

Fresh evidence:

  • main: 220a04b1b3, protected=true.
  • Current-head status counts: pending=11 at first sample; a later non-success listing showed 30 pending/queued contexts and no failure/error for this SHA.
  • Non-success contexts are queued/blocked push lanes including CI, E2E API/Chat/Peer Visibility/Canvas/External Runtime/SaaS, handlers, harness, forbidden-token lint, shellcheck-arm64, publish workspace-server, and secret scan.
  • Recent DB failures in molecule-core are on prior main heads or PR heads, not current 220a04b.
  • PR #1766 (fix(ci): move all-required to meta runner lane) is merged. PR #1774 (fix(ci): make prod deploy wait on aggregate context) is open/mergeable. Old #1731 remains open/non-mergeable and was not closed.

No workflow rerun, branch/protection mutation, merge, or destructive cleanup performed.


Hourly CI/security triage - 2026-05-24 00:20 PDT

Fresh evidence from Gitea API/DB during this pass:

  • molecule-core@main initially sampled at 0fc1649a0c01297934cf9d31f4f8f9a47c18d557 with one current-head failure: Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) at /molecule-ai/molecule-core/actions/runs/84214/jobs/0, failing after 12m49s. Log tail showed staging full-SaaS smoke reached org creation and tenant provisioning (e2e-smoke-20260524-smoke-100822) before the failure window.
  • DB evidence also showed main push run 100822 for core 0fc1649 ended Failure at 2026-05-24 06:44:14Z.
  • Re-sampled before closing the heartbeat: current molecule-core@main advanced to 878c74eafef0dc65a5d11385e5892346038e7f22 and has no failure/error statuses yet (pending=32, success=1).
  • Health probe correlation remains: OBS health 200; https://api.moleculesai.app/healthz 404; https://staging-api.moleculesai.app/healthz 404; https://staging.moleculesai.app DNS unresolved (curl 000).

No rerun, deploy, cleanup, secret mutation, or branch-protection mutation was performed.
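
The probe results above (200, 404, and `curl 000` for an unresolved host) can be collected in one pass. This is a rough sketch using only the standard library; `probe` and `describe` are hypothetical names, and returning `0` for an unreachable host is an assumption chosen to mirror curl's `000`.

```python
import urllib.error
import urllib.request

def probe(url, timeout=5.0):
    """Return the HTTP status code, or 0 if no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status code worth recording.
        return exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return 0

def describe(code):
    """Map a probe result to a short label for the triage note."""
    if code == 0:
        return "unreachable"
    if 200 <= code < 300:
        return "healthy"
    return f"http {code}"
```

For example, `describe(probe("https://staging-api.moleculesai.app/healthz"))` would yield `http 404` if the endpoint still answers as sampled above.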


Hourly CI/security triage - 2026-05-24 01:20 PDT

Fresh current-head sample:

  • molecule-core@272cb8b7d6be24035ea2557e21c1d5356f59a7d4 has no failure/error statuses (pending=35, success=34).
  • Recent DB rows still show superseded or PR-head failures in the last ~80m, including prior main SHAs 878c74, 406d73, and cancelled 272cb8 push run 101187, but the current Gitea commit-status surface for 272cb8 is not red.
  • Related open PRs remain mixed: #1768/#1772/#1773/#1770/#1764 are mergeable; #1731 remains non-mergeable.

No rerun, merge, PR closure, deploy, cleanup, or mutation was performed.
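
The distinction drawn above between current-head failures and superseded ones can be sketched as a prefix match on the head SHA. The row shape (`sha`, `status`, `run`) is an assumption for illustration and does not reflect the actual Gitea DB schema; `split_failures` is a hypothetical helper.

```python
def split_failures(rows, head_sha):
    """Partition failing rows into (current_head, superseded) lists.

    A row counts as current-head when its (possibly abbreviated) SHA is a
    prefix of the full head SHA. Cancelled runs are not treated as failures.
    """
    current, superseded = [], []
    for row in rows:
        if row["status"] not in ("failure", "error"):
            continue
        if head_sha.startswith(row["sha"]):
            current.append(row)
        else:
            superseded.append(row)
    return current, superseded

head = "272cb8b7d6be24035ea2557e21c1d5356f59a7d4"
rows = [
    {"sha": "878c74", "status": "failure", "run": None},
    {"sha": "406d73", "status": "failure", "run": None},
    {"sha": "272cb8", "status": "cancelled", "run": 101187},
]
current, superseded = split_failures(rows, head)
print(len(current), len(superseded))
```

With these rows the current-head list is empty, consistent with the observation that the commit-status surface for `272cb8` is not red even though recent failures exist on older SHAs.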


Hourly CI/security triage - 2026-05-24 02:20 PDT

Fresh current-head sample:

  • molecule-core@272cb8b7d6be24035ea2557e21c1d5356f59a7d4 has no failure/error statuses at close (pending=51, success=49).
  • DB shows recent PR-head failures for #1790 (test(e2e): add real staging image upload smoke) across several superseded heads, plus one running current PR-head run, but no current main red status for 272cb8.
  • Open PR state: #1790 is open/mergeable; #1731 remains open/non-mergeable.

No rerun, merge, PR closure, deploy, cleanup, or mutation was performed.


Hourly CI/security triage - 2026-05-24 03:20 PDT

Fresh current-head sample:

  • molecule-core@a0944605809efe3b91cce86dd0e20d18d946c6ce is protected and has no failure/error statuses in the sampled commit-status API. At sample time it had only one pending status, so coverage is still filling in for this fresh main push.
  • DB evidence shows current-head main runs 101716, 101731, and 101732 are running for a094460.
  • DB also shows recent failures on superseded main SHAs 39c861 (#1796) and e75372 (#1794), plus repeated PR-head failures for #1790, but those are not current-head red statuses for a094460.
  • Open PR state: #1790 no longer appears in the first 12 open core PRs; #1731 remains open/non-mergeable.

No rerun, merge, PR closure, deploy, cleanup, or mutation was performed.


Hourly CI/security triage - 2026-05-24 04:20 PDT

Fresh current-head sample:

  • molecule-core@7604e113d2da6f9e6359f738b64bc62528757c30 is protected and has no failure/error statuses at close (pending=33, success=32).
  • DB evidence in the last ~75m shows recent failures on PR heads (#1800, #1799, #1773) and superseded main SHAs (bf0f88, a09446, 39c861, etc.), but not a current-head red status for 7604e1.
  • Open PR state: #1800/#1799/#1773/#1772/#1770/#1768/#1764 are mergeable; #1731 remains non-mergeable.

No rerun, merge, PR closure, deploy, cleanup, or mutation was performed.


2026-05-24 12:40 PDT CI/security heartbeat evidence:

  • Critical repo current-head status check showed no failure/error statuses on main heads. Sampled heads: internal@3301055, molecule-core@a773973, operator-config@2a198f3, molecule-controlplane@a33e672, workspace-runtime@ef64a20, codex-channel@94c7051, molecule-ci@3b464de.
  • Recent DB failures are concentrated in molecule-core PR/staging surfaces, not current main: PR #1805 (120ca0d) and #1804/#1803/#1802 are failing sop-checklist / all-items-acked, security-review / approved, and qa-review / approved; staging SHA f3b168b has E2E Chat / E2E Chat (push) failing after 6m12s.
  • Branch protection remains aligned with the post-2026-05-18 model (required_approvals=2, enable_merge_whitelist=false, dismiss_stale_approvals=true) except the known molecule-ci status-context gap (enable_status_check=false, no required contexts). internal and operator-config still have stale merge_whitelist_usernames:["devops-engineer"] metadata with enforcement disabled.
  • Token/workflow spot check: local token returned 200 for internal/issues/1 and internal/branches/main; no permissions: write-all found in local critical workflow checkouts. Narrow write scopes remain limited to known workflows (pull-requests: write, contents: write, id-token: write).
  • Runner/host sample: /mnt/ci-runner-storage-1 is 71% used (354G/500G), DockerRootDir /mnt/ci-runner-storage-1/docker, host restart still required.
  • Health probes unchanged: OBS 200; prod /healthz 404; staging API /healthz 404; staging.moleculesai.app DNS unresolved.

No destructive action taken: no merge, force-push, main push, secret rotation, branch-protection mutation, deploy/workflow rerun, runner restart, or cleanup.
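
The branch-protection comparison above can be expressed as a diff against the expected model. This is a minimal sketch: the expected values are the ones quoted in this comment, while `protection_drift` and the sample payload are hypothetical and only mimic the fields named here.

```python
# Expected post-2026-05-18 model, as quoted in the heartbeat above.
EXPECTED = {
    "required_approvals": 2,
    "enable_merge_whitelist": False,
    "dismiss_stale_approvals": True,
}

def protection_drift(actual, expected=EXPECTED):
    """Return {field: (expected, actual)} for every field that differs."""
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }

# Illustrative payload only: aligned with the model, but carrying stale
# whitelist metadata while enforcement is disabled.
sample = {
    "required_approvals": 2,
    "enable_merge_whitelist": False,
    "dismiss_stale_approvals": True,
    "merge_whitelist_usernames": ["devops-engineer"],
}
print(protection_drift(sample))
```

Because only the fields in `EXPECTED` are compared, stale `merge_whitelist_usernames` entries do not count as drift while `enable_merge_whitelist` is false. A gap such as `enable_status_check=false` would need its own entry in the expected model to be flagged.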


2026-05-24 13:40 PDT CI/security heartbeat evidence:

  • Critical repo current-head status check showed no failure/error statuses on main heads. Sampled heads: internal@3301055, molecule-core@a773973, operator-config@2a198f3, molecule-controlplane@a33e672, workspace-runtime@ef64a20, codex-channel@94c7051, molecule-ci@3b464d.
  • New recent DB failures in the critical repo set are molecule-core PR #1806 heads only: a53223a and prior 239bc19 are failing sop-checklist / all-items-acked, security-review / approved, and qa-review / approved.
  • Branch protection remains aligned with the post-2026-05-18 model except the known molecule-ci status-context gap (enable_status_check=false, no required contexts). internal and operator-config still have stale devops-engineer merge-whitelist metadata with enforcement disabled.
  • Token/workflow spot check: local token returned 200 for internal/issues/1 and internal/branches/main; no permissions: write-all found in local critical workflow checkouts. Narrow write scopes unchanged.
  • Runner/host sample: /mnt/ci-runner-storage-1 is 71% used (353G/500G), DockerRootDir /mnt/ci-runner-storage-1/docker, host restart still required.
  • Health probes unchanged: OBS 200; prod /healthz 404; staging API /healthz 404; staging.moleculesai.app DNS unresolved.

No destructive action taken: no merge, force-push, main push, secret rotation, branch-protection mutation, deploy/workflow rerun, runner restart, or cleanup.


2026-05-24 15:40 PDT CI/security heartbeat evidence:

  • Critical repo current-head commit-status check showed no failure/error statuses on main heads. Sampled heads: internal@3301055, molecule-core@4a610ca, operator-config@2a198f3, molecule-controlplane@a33e672, workspace-runtime@ef64a20, codex-channel@94c7051, molecule-ci@3b464d.
  • Recent DB rows show molecule-core main churn with superseded main cancellations/failures: 09614f4 had cancelled/failing push statuses, e0f9a16 had CI / all-required (push) failure plus many cancellations, while current 4a610ca has no failure/error commit statuses yet.
  • Current PR-head failures remain review/SOP-gate failures: PR #1809 (94bdd8f) and PR #1810 (b739dae) fail sop-checklist / all-items-acked, security-review / approved, and qa-review / approved; PR #1808 (ca1b900) shows the same gate failures.
  • Branch protection remains aligned with the post-2026-05-18 model except the known molecule-ci status-context gap (enable_status_check=false, no required contexts). internal and operator-config still have stale devops-engineer merge-whitelist metadata with enforcement disabled.
  • Token/workflow spot check: local token returned 200 for internal/issues/1 and internal/branches/main; no permissions: write-all found in local critical workflow checkouts. Narrow write scopes unchanged.
  • Runner/host sample: /mnt/ci-runner-storage-1 is 71% used (355G/500G), DockerRootDir /mnt/ci-runner-storage-1/docker, host restart still required.
  • Health probes unchanged: OBS 200; prod /healthz 404; staging API /healthz 404; staging.moleculesai.app DNS unresolved.

No destructive action taken: no merge, force-push, main push, secret rotation, branch-protection mutation, deploy/workflow rerun, runner restart, or cleanup.
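
The `permissions: write-all` spot check above can be approximated with a plain-text scan of local workflow checkouts. This is a sketch under stated assumptions: it matches lines with a regular expression rather than parsing YAML, it assumes workflows live under `.gitea/workflows/*.yml`, and `find_write_all` and `scan_checkout` are hypothetical names.

```python
import re
from pathlib import Path

# Matches a line that sets permissions to write-all, with an optional comment.
WRITE_ALL = re.compile(r"^\s*permissions:\s*write-all\s*(#.*)?$")

def find_write_all(text):
    """Return 1-based line numbers that set `permissions: write-all`."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if WRITE_ALL.match(line)
    ]

def scan_checkout(root):
    """Map each workflow file under root to its offending line numbers."""
    hits = {}
    for path in sorted(Path(root).glob(".gitea/workflows/*.yml")):
        lines = find_write_all(path.read_text(encoding="utf-8"))
        if lines:
            hits[str(path)] = lines
    return hits
```

An empty result from `scan_checkout` corresponds to the "no `permissions: write-all` found" line above. Narrow scopes such as `contents: write` are deliberately not matched.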


2026-05-24 16:40 PDT CI/security heartbeat evidence:

  • Critical repo current-head commit-status check showed no failure/error statuses on sampled main heads. Sampled heads: internal@3301055, molecule-core@8019231, operator-config@2a198f3, molecule-controlplane@a33e672, workspace-runtime@ef64a20, codex-channel@94c7051, molecule-ci@3b464d.
  • molecule-core@8019231 is still filling in rather than red: second status poll showed pending=36, success=22, no failure/error statuses.
  • Recent DB rows show current molecule-core@8019231 push runs still running, plus PR-head gate failures: PR #1816 (a551f64), PR #1815 (9b096b0), and PR #1814 (0cdf669) fail sop-checklist / all-items-acked, security-review / approved, and qa-review / approved.
  • Branch protection remains aligned with the post-2026-05-18 model except the known molecule-ci status-context gap (enable_status_check=false, no required contexts; tracked in molecule-ci#23). internal and operator-config still have stale devops-engineer merge-whitelist metadata with enforcement disabled.
  • Token/workflow spot check: local token returned 200 for internal/issues/1 and internal/branches/main; no permissions: write-all found in local critical workflow checkouts. Narrow write scopes unchanged.
  • Runner/host sample: /mnt/ci-runner-storage-1 jumped to 77% used (383G/500G), DockerRootDir /mnt/ci-runner-storage-1/docker, host restart still required, 7 zombie processes, runners 4 and 5 recently started.
  • Health probes unchanged: OBS 200; prod /healthz 404; staging API /healthz 404; staging.moleculesai.app DNS unresolved.

No destructive action taken: no merge, force-push, main push, secret rotation, branch-protection mutation, deploy/workflow rerun, runner restart, or cleanup.


2026-05-24 17:40 PDT CI/security heartbeat evidence:

  • Critical repo current-head status check found one current-head failure outside core: molecule-controlplane@6872294 has Deploy main → staging (push) failing after 20s (also posted to controlplane#184).
  • molecule-core@286a499 current head was running/filling in with pending=35, success=32, no failure/error commit statuses in the sampled current-head status API response.
  • Recent molecule-core failures remain PR/superseded-head gate failures: PR #1821 (40343db) and #1820 (cb6760f) fail sop-checklist, security-review, and qa-review; superseded main 6964b26 had a cancelled production auto-deploy status.
  • Branch protection remains aligned with the post-2026-05-18 model except the known molecule-ci required-context gap (enable_status_check=false, tracked in molecule-ci#23). internal and operator-config still have stale devops-engineer merge-whitelist metadata with enforcement disabled.
  • Runner/host sample: /mnt/ci-runner-storage-1 remains high at 77% used (383G/500G), DockerRootDir /mnt/ci-runner-storage-1/docker, host restart still required.
  • Health probes unchanged: OBS 200; prod /healthz 404; staging API /healthz 404; staging.moleculesai.app DNS unresolved.

No destructive action taken: no merge, force-push, main push, secret rotation, branch-protection mutation, deploy/workflow rerun, runner restart, or cleanup.


2026-05-24 18:40 PDT CI/security heartbeat evidence:

  • Current molecule-core@main head 286a499819112ea07a2ab5faf5bca91c81605c84 has one failure status: Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) at /molecule-ai/molecule-core/actions/runs/86015/jobs/0, description Failing after 15m34s.
  • Fetched the run log for 86015/jobs/0; the tail shows explicit ::error::staging-smoke FAILED — staging SaaS canary is red, listing common causes as CP_STAGING_ADMIN_API_TOKEN secret drift, staging-api 5xx, MiniMax/Anthropic key failure, or AMI/CF/WorkOS drift. Secrets were redacted as expected.
  • Current molecule-controlplane@main head 6b197ba79882110bd113daca3d0c6c8d7e6fa3be also has Deploy main → staging (push) failing after 14s; posted to controlplane#184.
  • Branch protection remains aligned with the post-2026-05-18 model except the known molecule-ci status-context gap (enable_status_check=false, no required contexts; tracked in molecule-ci#23). internal and operator-config still have stale devops-engineer merge-whitelist metadata with enforcement disabled.
  • Runner/host sample: /mnt/ci-runner-storage-1 remains high at 77% used (384G/500G), DockerRootDir /mnt/ci-runner-storage-1/docker, host restart still required, 7 zombie processes.
  • Health probes unchanged: OBS 200; prod /healthz 404; staging API /healthz 404; staging.moleculesai.app DNS unresolved.

No destructive action taken: no merge, force-push, main push, secret rotation, branch-protection mutation, deploy/workflow rerun, runner restart, or cleanup.
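
Pulling the explicit `::error::` annotation out of a run-log tail, as done for run 86015 above, can be sketched as follows. `error_annotations` is a hypothetical helper and the 200-line tail window is an arbitrary assumption; the log text in the usage line is abbreviated from the message quoted above.

```python
def error_annotations(log_text, tail_lines=200):
    """Return the messages of `::error::` annotations in the log tail."""
    marker = "::error::"
    tail = log_text.splitlines()[-tail_lines:]
    return [
        line.split(marker, 1)[1].strip()
        for line in tail
        if marker in line
    ]

log = "provisioning tenant\n::error::staging-smoke FAILED\ncleanup done\n"
print(error_annotations(log))
```

This only surfaces what the job itself annotated. It does not identify which of the listed common causes applies.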


2026-05-24 19:40 PDT CI/security heartbeat evidence:

  • Critical repo current-head commit-status check showed no failure/error statuses on sampled main heads. Current heads included molecule-core@50e7173, molecule-controlplane@6e55d32, workspace-runtime@9356572, codex-channel@94c7051, molecule-ci@3b464d.
  • molecule-core@50e7173 is still filling in: status summary pending=43, success=42, no failure/error commit statuses.
  • molecule-controlplane@6e55d32 is also filling in: status summary pending=17, success=12, no failure/error commit statuses; this supersedes the previous red 6b197ba deploy-stage head.
  • Recent PR-head failures remain review/SOP-gate failures: core PR #1826 (a00e2be) failed sop-checklist, security-review, and qa-review; core PR #1825 and #1821 still show sop-checklist failures on sampled heads.
  • Runner/host sample worsened materially: /mnt/ci-runner-storage-1 is now 83% used (412G/500G, only 89G free), DockerRootDir /mnt/ci-runner-storage-1/docker, host restart still required, 7 zombie processes, load 7.48. No cleanup or runner restart performed.
  • Branch protection remains aligned with the post-2026-05-18 model except the known molecule-ci status-context gap (enable_status_check=false, no required contexts; tracked in molecule-ci#23). internal and operator-config still have stale devops-engineer merge-whitelist metadata with enforcement disabled.
  • Health probes unchanged: OBS 200; prod /healthz 404; staging API /healthz 404; staging.moleculesai.app DNS unresolved.

No destructive action taken: no merge, force-push, main push, secret rotation, branch-protection mutation, deploy/workflow rerun, runner restart, or cleanup.
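
The runner-storage percentages in these heartbeats are consistent with rounding up, as `df` does: 412G of 500G reads as 83%. A minimal sketch of that arithmetic follows. The 80% alert threshold and the helper names are assumptions, and `df` itself computes the ratio against used plus available space, so real output can differ slightly.

```python
import shutil

def percent_used(used, total):
    """Percentage used, rounded up with integer arithmetic."""
    return -(-used * 100 // total)

def check_mount(path, threshold=80):
    """Return (percent_used, over_threshold) for the filesystem at path."""
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.used, usage.total)
    return pct, pct >= threshold

print(percent_used(354, 500), percent_used(383, 500), percent_used(412, 500))
```

On the runner host, `check_mount("/mnt/ci-runner-storage-1")` would report the live figure; the printed values here only replay the used/total pairs recorded in the samples above.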


Hourly CI/security triage evidence (2026-05-24 21:40 PDT / 2026-05-25 04:40 UTC):

  • Current molecule-core@main head 4c86f047c7b98b2b921a52248fe15ab5fb9b171c has no failure/error commit statuses in the commit-status API sample (pending=37, success=24). Recent DB rows show its push runs still running/succeeding.
  • Fresh open-PR head scan found the newest red PR as #1833 (c7211610, "Wire native LLM auth selection into workspace creation") failing sop-checklist / all-items-acked, qa-review / approved, and security-review / approved.
  • #1826 (34179e64) improved to only sop-checklist / all-items-acked failing; its review gates are no longer red in the sampled status set.
  • Older open PRs continue to have SOP/review-gate failures; #1772/#1768 still include actual CI/Platform or integration failures in addition to SOP/review gates.
  • No workflow reruns, merges, status mutations, or branch-protection changes performed.

Hourly CI/security triage evidence (2026-05-24 22:40 PDT / 2026-05-25 05:40 UTC):

  • Current molecule-core@main head b364c16ea6af9294e3e6422733d4beb24a5a9173 is red: publish-workspace-server-image / Production auto-deploy (push), target /molecule-ai/molecule-core/actions/runs/86836/jobs/1, Failing after 10m33s.
  • Fetched the run log. The workflow waited for green CI / all-required (push), then called POST https://api.moleculesai.app/cp/admin/tenants/redeploy-fleet with target_tag=staging-b364c16. The call returned HTTP 502; log ended with ::error::redeploy-fleet returned HTTP 502 and Job 'Production auto-deploy' failed.
  • The same log also showed GET .../actions/variables/PROD_AUTO_DEPLOY_DISABLED -> HTTP 403: user should be the owner of the repo, but the plan still resolved enabled=true and proceeded to deploy.
  • Health probes during the same pass: OBS login 200; prod /healthz 404; staging API /healthz 404; staging frontend DNS unresolved.
  • Open PR scan: many older core PRs still have SOP/review-gate failures; #1772/#1768 still include real CI/Platform or integration failures.
  • No rerun, merge, deploy retry, secret read, status mutation, or branch-protection change performed.
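
The log in this pass shows the PROD_AUTO_DEPLOY_DISABLED read returning HTTP 403 while the plan still resolved enabled=true. One possible way to handle an unreadable kill switch is to fail closed. The Python sketch below illustrates that idea only: `deploy_enabled` is a hypothetical helper, the set of "truthy" values is an assumption, and treating 404 as "variable not defined, so not disabled" is also an assumption rather than the workflow's documented behavior.

```python
# Values assumed to mean "auto-deploy is disabled" (hypothetical convention).
DISABLED_VALUES = {"1", "true", "yes"}


def deploy_enabled(status_code, disabled_value):
    """Decide whether to deploy from the kill-switch read result.

    status_code:    HTTP status of the variable read.
    disabled_value: the variable's value when the read succeeded.
    """
    if status_code == 404:
        # Assumption: an undefined variable means the switch is not set.
        return True
    if status_code != 200:
        # Unreadable switch (for example 403): fail closed, do not deploy.
        return False
    return str(disabled_value or "").strip().lower() not in DISABLED_VALUES


if __name__ == "__main__":
    print(deploy_enabled(403, None))
```

Under these assumptions a 403 on the variable read would stop the deploy instead of proceeding as enabled.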

Hourly CI/security triage evidence (2026-05-25 00:55 PDT / 07:55 UTC):

  • Initial sample caught molecule-core@b364c16e still red on publish-workspace-server-image / Production auto-deploy (push), run /molecule-ai/molecule-core/actions/runs/86836/jobs/1, Failing after 10m33s.
  • Re-fetched the log: production auto-deploy called POST https://api.moleculesai.app/cp/admin/tenants/redeploy-fleet for target_tag=staging-b364c16 and failed on HTTP 502 (::error::redeploy-fleet returned HTTP 502). The same log still shows the PROD_AUTO_DEPLOY_DISABLED variable read returning HTTP 403: user should be the owner of the repo, after which the workflow still proceeded as enabled.
  • During the pass, molecule-core@main advanced to ef1d35c32b1c8cd12a44002cf5cb8189de15dd7a from PR #1838 (Split production auto-deploy into scoped redeploy calls). Fresh status sample on that new head had no failure/error (pending=28, success=1), so the old red head is now superseded but the redeploy-fleet 502 remains useful RCA evidence.
  • Open PR scan still found older core PR failures, including #1768 and #1772 with real CI/Platform/integration failures plus SOP/review gates; #1837 has SOP/review-gate failures.
  • No rerun, merge, deploy retry, secret read, status mutation, or branch-protection change performed.

Hourly CI/security triage evidence (2026-05-25 01:55 PDT / 08:55 UTC):

  • Current molecule-core@main head 8e97765a303aef6a35cef565292032ba60d8f7cd had no failure/error commit statuses in fresh samples (pending=28, success=24 initially; later recheck still 0 failure/error).
  • Recent DB rows show current main #1841 push rows mostly Success with some Running/Cancelled churn; no active current-head failure was observed.
  • Older superseded main 8df1fef4 had one DB failure row, but current 8e97765a supersedes it.
  • Open PR scan still shows older red PRs: #1837 on SOP/security/QA review gates, and #1772/#1768 with real CI/Platform/integration failures plus SOP/review gates.
  • No rerun, merge, deploy retry, secret read, status mutation, or branch-protection change performed.

Hourly CI/security triage evidence (2026-05-25 02:55 PDT / 09:55 UTC):

  • Current molecule-core@main head 8e97765a303aef6a35cef565292032ba60d8f7cd has no failure/error commit statuses: pending=44, success=44.
  • Recent DB rows for the same SHA are successful; no active current-head DB failure observed this pass.
  • Fresh open PR scan still shows older red PRs, mainly SOP/review gates. Actual CI failures remain on #1772 and #1768 (CI / all-required, CI / Platform, and integration failures for #1768).
  • No rerun, merge, deploy retry, secret read, status mutation, or branch-protection change performed.

Hourly CI/security triage evidence (2026-05-25 03:55 PDT / 10:55 UTC):

  • Current main head is 6eda20b3c01918877ec3b5f100f059b3545eec81.
  • Gitea commit-status API showed no failure/error states: pending=26, success=25.
  • Gitea DB cross-check for this same SHA on refs/heads/main showed Success=13, Cancelled=1; no Failure rows observed.
  • Branch protection remains aligned with the current gate model: required_approvals=2, enable_status_check=true, required context CI / all-required (pull_request), enable_merge_whitelist=false, dismiss_stale_approvals=true.
  • Health probes remain unchanged outside this repo: OBS /login 200; prod /healthz 404; staging API/frontend DNS unresolved.
  • No rerun, merge, branch-protection mutation, deploy, or secret action performed.

Hourly CI/security triage evidence (2026-05-25 04:55 PDT / 11:55 UTC):

  • Current main head is e865d656f89c8a569968c8c420bad90ae9e0bbfd.
  • Gitea commit-status API showed no failure/error states: pending=36, success=35.
  • Gitea DB cross-check for this same SHA on refs/heads/main showed Success=24, Cancelled=1; no Failure rows observed.
  • Open PR scan still shows older red PRs, mostly SOP/review gates; #1772 still has CI / all-required and CI / Platform (Go), #1768 still has Handlers Postgres Integration, CI / all-required, and CI / Platform (Go), and #1731 has lint-continue-on-error-tracking.
  • Branch protection remains aligned: required_approvals=2, enable_status_check=true, required context CI / all-required (pull_request), enable_merge_whitelist=false, dismiss_stale_approvals=true.
  • Health probes remain unchanged outside this repo: OBS /login 200; prod /healthz 404; staging API/frontend DNS unresolved.
  • No rerun, merge, branch-protection mutation, deploy, or secret action performed.

Hourly CI/security triage evidence (2026-05-25 06:55 PDT / 13:55 UTC):

  • Current main head is 0c15480c507fb69e1a5705a957b3e1ddcafab7f2.
  • Gitea commit-status API showed no failure/error states: pending=26, success=3.
  • Gitea DB cross-check for this same SHA on refs/heads/main showed Running=2, Waiting=10, Success=2; no Failure rows observed.
  • Open PR scan still shows older red PRs, mostly SOP/review gates; #1772 still has CI / all-required and CI / Platform (Go), #1768 still has Handlers Postgres Integration, CI / all-required, and CI / Platform (Go), and #1731 has lint-continue-on-error-tracking.
  • Branch protection remains aligned: required_approvals=2, enable_status_check=true, required context CI / all-required (pull_request), enable_merge_whitelist=false, dismiss_stale_approvals=true.
  • No rerun, merge, branch-protection mutation, deploy, or secret action performed.

Hourly CI/security triage evidence (2026-05-25 07:55 PDT / 14:55 UTC):

  • Current main head is e1496055e2b5c74e50f6d735b932df9b9979b9db.
  • Gitea commit-status API showed no failure/error states: pending=25, success=2.
  • Gitea DB cross-check for this same SHA on refs/heads/main showed Success=1, Cancelled=1, Waiting=10, Running=1, Failure=2. Follow-up on representative failed runs showed their jobs were Cancelled (e.g. ci.yml, handlers-postgres-integration.yml, secret-scan.yml jobs all cancelled/no task log), so this looks like cancellation/status drift while the new head is still active rather than a task-backed failure.
  • Open PR scan still shows older red PRs, mostly SOP/review gates; #1772 still has CI / all-required and CI / Platform (Go), #1768 still has Handlers Postgres Integration, CI / all-required, and CI / Platform (Go), and #1731 has lint-continue-on-error-tracking.
  • Branch protection remains aligned: required_approvals=2, enable_status_check=true, required context CI / all-required (pull_request), enable_merge_whitelist=false, dismiss_stale_approvals=true.
  • No rerun, merge, branch-protection mutation, deploy, or secret action performed.
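
The follow-up in this pass distinguishes a DB Failure row whose jobs actually failed from one whose jobs were all Cancelled. A small Python sketch of that distinction follows, assuming the job statuses for a run are available as a list of strings such as "Failure" or "Cancelled". The function name and the returned labels are hypothetical, and the status strings are taken from the wording of these notes rather than from Gitea's schema.

```python
def classify_failed_run(job_statuses):
    """Classify a run that the DB marks as Failure by its job statuses.

    Returns one of: "task-backed failure", "cancellation drift", "unknown".
    """
    if not job_statuses:
        # No jobs recorded: nothing to base a conclusion on.
        return "unknown"
    if any(status == "Failure" for status in job_statuses):
        # At least one job really failed, so the run failure is real.
        return "task-backed failure"
    if all(status == "Cancelled" for status in job_statuses):
        # Run says Failure, but every job was cancelled (no task log).
        return "cancellation drift"
    return "unknown"


if __name__ == "__main__":
    print(classify_failed_run(["Cancelled", "Cancelled"]))
```

Only runs classified as task-backed would then need a log fetch; the others match the cancellation/status drift described above.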

Hourly CI/security triage evidence (2026-05-25 08:55 PDT / 15:55 UTC):

  • Current main head is 37972fa6352df6469e4787175fc55e9e6302d057 and is red in the Gitea commit-status API: failure=2, pending=26, success=3.
  • Failing contexts are CI / all-required (push) (/molecule-ai/molecule-core/actions/runs/87786/jobs/6, failing after 40m26s) and publish-workspace-server-image / Production auto-deploy (push) (/molecule-ai/molecule-core/actions/runs/87796/jobs/1, failing after 30m20s).
  • Fetched both logs. CI / all-required timed out waiting for CI / Detect changes (push), CI / Python Lint & Test (push), and CI / Canvas (Next.js) (push) to leave pending. Production auto-deploy timed out waiting 1800s for CI / all-required (push) and secret scan to go green before deploying. This appears to be required-context/status propagation or runner backlog, not an application test failure in the fetched logs.
  • DB cross-check for the same SHA showed Failure=2, Success=2, Cancelled=6, Waiting=20, matching active red state plus queue churn.
  • Branch protection remains aligned: required_approvals=2, enable_status_check=true, required context CI / all-required (pull_request), enable_merge_whitelist=false, dismiss_stale_approvals=true.
  • No rerun, merge, branch-protection mutation, deploy, or secret action performed.
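
Both failures in this pass are wait loops that hit their deadline while upstream contexts stayed pending. The Python sketch below shows the general shape of such a gate, assuming a caller-supplied `fetch_states` callable that returns a mapping of context name to state. All names here are hypothetical; the 1800s default mirrors the timeout quoted from the log, while the 30s polling interval is an arbitrary assumption.

```python
import time


def wait_for_contexts(fetch_states, required, timeout_s=1800, interval_s=30,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until every required context leaves pending, or time out.

    Returns ("green", []), ("failed", [contexts]) or ("timeout", [contexts]).
    """
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        failed = sorted(c for c in required
                        if states.get(c) in {"failure", "error"})
        # A context with no status yet is treated as still pending.
        pending = sorted(c for c in required
                         if states.get(c, "pending") == "pending")
        if failed:
            return "failed", failed
        if not pending:
            return "green", []
        if clock() >= deadline:
            return "timeout", pending
        sleep(interval_s)


if __name__ == "__main__":
    always_green = lambda: {"CI / all-required (push)": "success"}
    print(wait_for_contexts(always_green, ["CI / all-required (push)"]))
```

Returning the list of still-pending contexts on timeout makes it clearer that the gate expired because of upstream backlog rather than a test failure.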

Hourly CI/security triage evidence (2026-05-25 09:55 PDT / 16:55 UTC):

  • Current main head remains 37972fa6352df6469e4787175fc55e9e6302d057 and is still red in the Gitea commit-status API: failure=2, pending=26, success=12.
  • Failing contexts are still CI / all-required (push) (/molecule-ai/molecule-core/actions/runs/87786/jobs/6) and publish-workspace-server-image / Production auto-deploy (push) (/molecule-ai/molecule-core/actions/runs/87796/jobs/1).
  • Fresh log grep confirms the same root symptom: all-required timed out waiting for CI / Detect changes (push), CI / Python Lint & Test (push), and CI / Canvas (Next.js) (push) to leave pending; production auto-deploy timed out waiting for CI / all-required (push) and Secret scan / Scan diff for credential-shaped strings (push).
  • DB cross-check for the same SHA showed Failure=2, Success=6, Cancelled=22, Waiting=16, consistent with the red status plus queue churn.
  • Open PR scan still shows older red PRs; #1837 now also includes CI / all-required, while #1772/#1768 retain CI/Platform/integration failures and many older PRs retain SOP/review gate failures.
  • Branch protection remains aligned: required_approvals=2, enable_status_check=true, required context CI / all-required (pull_request), enable_merge_whitelist=false, dismiss_stale_approvals=true.
  • No rerun, merge, branch-protection mutation, deploy, or secret action performed.

2026-05-25 10:55 PDT triage update (fresh evidence): current main is still 37972fa635 and now has three failing push contexts in the status API: Handlers Postgres Integration / Handlers Postgres Integration (push) (run 87790 job 1), CI / all-required (push) (run 87786 job 6), and publish-workspace-server-image / Production auto-deploy (push) (run 87796 job 1). DB cross-check maps all three action_run rows to status 2=Failure on refs/heads/main at 37972fa635.

Fresh log evidence:

  • Handlers Postgres Integration reached migrations/tests, then failed TestIntegration_PendingUploads_PutGetAckRoundTrip: Get after Ack: got <nil>, want ErrNotFound; same log also includes a check-constraint violation on pending_uploads_size_bytes_check.
  • all-required timed out waiting for CI / Detect changes (push), CI / Python Lint & Test (push), and CI / Canvas (Next.js) (push) to leave pending.
  • Production auto-deploy timed out after 1800s waiting for CI / all-required (push) and Secret scan / Scan diff for credential-shaped strings (push).

Other critical current heads sampled in this pass had no failure/error commit statuses. No reruns, merges, branch-protection mutations, secret reads/rotation, runner restarts, or destructive cleanup performed.
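
The assertion message from the handlers test implies a contract: after Ack, a Get for the same upload should report not-found. The Python sketch below models only that contract with an in-memory store. It is not the real handler code (which runs against Postgres), and the class, method, and exception names are hypothetical.

```python
class NotFoundError(KeyError):
    """Stand-in for the ErrNotFound named in the test output."""


class PendingUploads:
    """In-memory model of a put/get/ack round trip."""

    def __init__(self):
        self._items = {}

    def put(self, upload_id, data):
        self._items[upload_id] = data

    def get(self, upload_id):
        try:
            return self._items[upload_id]
        except KeyError:
            raise NotFoundError(upload_id) from None

    def ack(self, upload_id):
        # Acknowledging removes the upload, so later gets must not find it.
        if upload_id not in self._items:
            raise NotFoundError(upload_id)
        del self._items[upload_id]


if __name__ == "__main__":
    store = PendingUploads()
    store.put("u1", b"payload")
    store.ack("u1")
    try:
        store.get("u1")
    except NotFoundError:
        print("get after ack: not found, as the test expects")
```

The failing run corresponds to the case where the final get returns without error (got <nil>), meaning the acked row was still visible.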


Hourly CI/security triage evidence (2026-05-25 11:55 PDT / 18:55 UTC):

Current main remains 37972fa6352df6469e4787175fc55e9e6302d057 and is still red in the commit-status API: failure=3, pending=40, success=28.

Failing current-head contexts:

  • Handlers Postgres Integration / Handlers Postgres Integration (push) -> /molecule-ai/molecule-core/actions/runs/87790/jobs/1, Failing after 1m49s
  • CI / all-required (push) -> /molecule-ai/molecule-core/actions/runs/87786/jobs/6, Failing after 40m26s
  • publish-workspace-server-image / Production auto-deploy (push) -> /molecule-ai/molecule-core/actions/runs/87796/jobs/1, Failing after 30m20s

Fresh log evidence is unchanged from the prior pass: handlers integration fails at TestIntegration_PendingUploads_PutGetAckRoundTrip with Get after Ack: got <nil>, want ErrNotFound, and the same run logs a pending_uploads_size_bytes_check constraint violation. all-required timed out waiting for CI / Detect changes (push), CI / Python Lint & Test (push), and CI / Canvas (Next.js) (push) to leave pending. Production auto-deploy timed out after 1800s waiting for CI / all-required (push) and secret scan.

DB cross-check still maps runs 87786, 87790, and 87796 to action_run status 2=Failure on refs/heads/main at this SHA; there are also queued/waiting action_run rows now, consistent with runner pressure. Open PR spot check continues to show #1837 red on CI / all-required plus review/SOP gates, #1772/#1768 red on CI/platform/integration contexts, and many older PRs held by SOP/review gates.

No rerun, merge, branch-protection mutation, secret read/rotation, runner restart, or destructive cleanup performed.


Hourly CI/security triage evidence (2026-05-25 12:55 PDT / 19:55 UTC):

Current main advanced to 89476ae3303f639d16dfe00bc95f65480131f7da. The prior 37972fa all-required / production auto-deploy failures are no longer current-head failures, but this new head is still red in the commit-status API: failure=2, pending=28, success=23.

Failing current-head contexts:

  • CI / Canvas (Next.js) (push) -> /molecule-ai/molecule-core/actions/runs/88409/jobs/2, Failing after 7m26s
  • Handlers Postgres Integration / Handlers Postgres Integration (push) -> /molecule-ai/molecule-core/actions/runs/88413/jobs/1, Failing after 3m23s

Fresh log evidence:

  • Canvas fails canvas/src/components/__tests__/CreateWorkspaceDialog.a11y.test.tsx with 2 failures: the tests look for the placeholder "e.g. seo-agent (from workspace-configs-templates/)", while the rendered dialog now exposes the Name input with the placeholder "e.g. SEO Agent". This appears to be a focused test/UI expectation drift around the template input.
  • Handlers Postgres Integration is the same failure class as the prior hour: TestIntegration_PendingUploads_PutGetAckRoundTrip fails with Get after Ack: got <nil>, want ErrNotFound; the same run logs pending_uploads_size_bytes_check constraint violation.

DB cross-check for this SHA: Success=13, Failure=1, Cancelled=1, Skipped=1, Running=1. Notably, action_run 88413 is Failure; action_run 88409 is Skipped in DB even though the commit-status API/log URL show a failed Canvas job, so there is likely Gitea run/status drift for that context.
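
The cross-check above can be sketched as a comparison between the commit-status state and the action_run row for the same context. This is a minimal illustration, not the tooling used for the triage: the `check` type and `drifted` helper are hypothetical, and only status codes 1, 2, 5 and 6 are stated in this thread; 3 and 4 are assumed.

```go
package main

import "fmt"

// actionRunStatus maps action_run.status integers to names.
// 1, 2, 5 and 6 appear in this thread; 3 and 4 are assumptions.
var actionRunStatus = map[int]string{
	1: "Success",
	2: "Failure",
	3: "Cancelled", // assumption
	4: "Skipped",   // assumption
	5: "Waiting",
	6: "Running",
}

// check pairs one commit-status context with the action_run row behind it.
// The type is hypothetical and exists only for this sketch.
type check struct {
	Context     string // commit-status context name
	APIState    string // state reported by the commit-status API
	DBRunStatus int    // action_run.status for the linked run
}

// drifted reports whether the API and the DB disagree about failure.
func drifted(c check) bool {
	apiFailed := c.APIState == "failure" || c.APIState == "error"
	dbFailed := c.DBRunStatus == 2
	return apiFailed != dbFailed
}

func main() {
	checks := []check{
		{"CI / Canvas (Next.js) (push)", "failure", 4},
		{"Handlers Postgres Integration / Handlers Postgres Integration (push)", "failure", 2},
	}
	for _, c := range checks {
		if drifted(c) {
			fmt.Printf("drift: %s api=%s db=%s\n", c.Context, c.APIState, actionRunStatus[c.DBRunStatus])
		}
	}
}
```

With the two rows from this pass, only the Canvas context would be flagged, matching the suspected run/status drift for run 88409.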

Open PR spot check: newer PRs #1856/#1855/#1852/#1810 are red on E2E API Smoke and/or Handlers Postgres Integration plus review/SOP gates; older #1837/#1821/#1772/#1770 remain red on CI/all-required or review/SOP gates.

I did not rerun, merge, mutate branch protection, rotate/read secrets, restart runners, or perform cleanup. A focused Canvas test correction may be safe, but I left this pass as verified evidence because the concurrent runner pressure and handler integration failure need care before stacking another CI branch.

Owner

Hourly CI/security triage evidence (2026-05-25 13:55 PDT / 20:55 UTC):

Current main advanced to 648ac4d61bdc6a632c06bc01f51915075b115f43 and currently has no failure/error commit statuses (pending=35, success=32, bad=[]). This is a material improvement from the prior 89476ae head where Canvas and Handlers Postgres Integration were red.

Important branch-protection hardening observed on molecule-core@main: required contexts now include CI / all-required (pull_request), E2E API Smoke Test / E2E API Smoke Test (pull_request), and Handlers Postgres Integration / Handlers Postgres Integration (pull_request), with required_approvals=2, merge whitelist enforcement disabled, and stale approvals dismissed. This brings the two recently red surfaces into the protected PR gate set.

DB cross-check for 648ac4d shows Success=22, Cancelled=1, Running=1, and no Failure rows. Open PR spot check still shows red PRs (#1858 review/SOP gates; #1855/#1852/#1810 E2E API Smoke and/or Handlers Postgres Integration plus review/SOP gates; older #1837/#1821/#1772/#1770 CI/all-required or review/SOP gates), so current main is cleaner than the PR queue.

No rerun, merge, branch-protection mutation, secret read/rotation, runner restart, or destructive cleanup performed.
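
A read-only check for required-context drift could look like the sketch below. It assumes the branch-protection payload exposes the JSON field names used in the struct tags; those names, the `branchProtection` type and the `missingContexts` helper are assumptions for illustration, and the sample payload is hand-written from the values reported above rather than captured from the API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// branchProtection mirrors the settings cited in the comment above.
// The JSON field names are assumptions about the API payload.
type branchProtection struct {
	StatusCheckContexts   []string `json:"status_check_contexts"`
	RequiredApprovals     int      `json:"required_approvals"`
	EnableMergeWhitelist  bool     `json:"enable_merge_whitelist"`
	DismissStaleApprovals bool     `json:"dismiss_stale_approvals"`
}

// missingContexts returns every expected context absent from the protection.
func missingContexts(bp branchProtection, expected []string) []string {
	have := make(map[string]bool, len(bp.StatusCheckContexts))
	for _, c := range bp.StatusCheckContexts {
		have[c] = true
	}
	var missing []string
	for _, c := range expected {
		if !have[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	// Hand-written sample built from the values reported in this thread.
	raw := `{
	  "status_check_contexts": [
	    "CI / all-required (pull_request)",
	    "E2E API Smoke Test / E2E API Smoke Test (pull_request)",
	    "Handlers Postgres Integration / Handlers Postgres Integration (pull_request)"
	  ],
	  "required_approvals": 2,
	  "enable_merge_whitelist": false,
	  "dismiss_stale_approvals": true
	}`
	var bp branchProtection
	if err := json.Unmarshal([]byte(raw), &bp); err != nil {
		panic(err)
	}
	expected := []string{
		"CI / all-required (pull_request)",
		"E2E API Smoke Test / E2E API Smoke Test (pull_request)",
		"Handlers Postgres Integration / Handlers Postgres Integration (pull_request)",
	}
	fmt.Println("missing:", missingContexts(bp, expected))
	fmt.Println("approvals:", bp.RequiredApprovals, "dismiss stale:", bp.DismissStaleApprovals)
}
```

The check only reads and compares, so it stays within the no-mutation boundary these passes follow.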

Owner

Hourly CI/security triage evidence (2026-05-25 14:55 PDT / 21:55 UTC):

Current main remains 648ac4d61bdc6a632c06bc01f51915075b115f43 and still has no failure/error commit statuses (pending=48, success=45, bad=[]). DB cross-check shows Success=75, Cancelled=2, Skipped=2, Running=1, and no Failure rows for this SHA.

Branch protection remains in the improved shape observed last hour: required contexts include CI / all-required (pull_request), E2E API Smoke Test / E2E API Smoke Test (pull_request), and Handlers Postgres Integration / Handlers Postgres Integration (pull_request), with required_approvals=2, merge whitelist enforcement disabled, and stale approvals dismissed.

Open PR spot check is still noisy: #1858 red on review/SOP gates; #1855/#1852/#1810 red on E2E API Smoke and/or Handlers Postgres Integration plus review/SOP gates; older #1837/#1821/#1772/#1770 remain red on CI/all-required or review/SOP gates. Current main remains cleaner than the PR queue.

No rerun, merge, branch-protection mutation, secret read/rotation, runner restart, or destructive cleanup performed.

Owner

Hourly CI/security triage evidence (2026-05-25 15:55 PDT / 22:55 UTC):

Current main remains 648ac4d61bdc6a632c06bc01f51915075b115f43 and still has no failure/error commit statuses (pending=62, success=59, bad=[]). DB cross-check shows Success=93, Cancelled=2, Skipped=2, Running=1, and no Failure rows for this SHA.

Branch protection remains in the improved shape: required contexts include CI / all-required (pull_request), E2E API Smoke Test / E2E API Smoke Test (pull_request), and Handlers Postgres Integration / Handlers Postgres Integration (pull_request), with required_approvals=2, merge whitelist enforcement disabled, and stale approvals dismissed.

Open PR spot check is still noisy: #1858 red on review/SOP gates; #1855/#1852/#1810 red on E2E API Smoke and/or Handlers Postgres Integration plus review/SOP gates; #1837/#1772/#1770 remain red on CI/all-required or review/SOP gates. Current main remains cleaner than the PR queue.

No rerun, merge, branch-protection mutation, secret read/rotation, runner restart, or destructive cleanup performed.

Owner

2026-05-25 23:55Z heartbeat triage evidence:

  • Current main: 648ac4d61bdc6a632c06bc01f51915075b115f43.
  • Commit statuses sampled from Gitea API: success=73, pending=76, failure/error=0.
  • Branch protection now requires status checks for CI / all-required, E2E API Smoke Test, and Handlers Postgres Integration; required_approvals=2, merge whitelist disabled, stale approvals dismissed.
  • Gitea DB action_run sample for this SHA: Success=109, Cancelled=2, Skipped=2, Running=1; no Failure rows.
  • Open PR queue remains noisy with failing historical heads (#1858, #1855, #1852, #1837, etc.), but no current-head regression found.

No branch protection mutation, rerun, merge, or runner cleanup performed.
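
The success/pending/failure counts in these heartbeats can be reproduced with a small tally over the commit-status list. The sketch below is an illustration under assumptions: the `commitStatus` type, its JSON field names and the sample payload are not taken from the thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// commitStatus holds the two fields this sketch reads from each entry.
// The JSON field names are assumptions about the API payload.
type commitStatus struct {
	Context string `json:"context"`
	State   string `json:"status"`
}

// tally counts entries per state and collects the failing contexts.
func tally(statuses []commitStatus) (map[string]int, []string) {
	counts := map[string]int{}
	var bad []string
	for _, s := range statuses {
		counts[s.State]++
		if s.State == "failure" || s.State == "error" {
			bad = append(bad, s.Context)
		}
	}
	return counts, bad
}

func main() {
	// Illustrative payload, not real API output.
	raw := `[
	  {"context": "CI / all-required (push)", "status": "success"},
	  {"context": "E2E API Smoke Test / E2E API Smoke Test (push)", "status": "pending"},
	  {"context": "E2E Chat / E2E Chat (push)", "status": "failure"}
	]`
	var statuses []commitStatus
	if err := json.Unmarshal([]byte(raw), &statuses); err != nil {
		panic(err)
	}
	counts, bad := tally(statuses)
	fmt.Println("counts:", counts)
	fmt.Println("bad:", bad)
}
```

This counts raw entries. If the list holds more than one entry per context, the totals will exceed the number of distinct contexts, so a per-context view would need to keep only the newest entry for each.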

Owner

2026-05-26 00:55Z heartbeat triage evidence:

  • Current main advanced to 19b4d816709ca764dd71dbaafd4a225ea179faf4.
  • Commit statuses sampled from Gitea API: success=30, pending=31, failure/error=0.
  • Branch protection still requires CI / all-required, E2E API Smoke Test, and Handlers Postgres Integration; required_approvals=2, merge whitelist disabled, stale approvals dismissed.
  • Gitea DB action_run sample for this SHA: Success=28, Cancelled=1; no Failure rows.
  • Open PR queue remains noisy with failing historical heads (#1855, #1852, #1837, #1810, #1768, etc.), but no current-head regression found.

No branch protection mutation, rerun, merge, or runner cleanup performed.

Owner

2026-05-26 01:55Z heartbeat triage evidence:

  • Current main advanced to 9843a970d370fc5b883f009362a0a4f56fe9427a.
  • Commit statuses sampled from Gitea API: success=30, pending=31, failure/error=0.
  • Branch protection still requires CI / all-required, E2E API Smoke Test, and Handlers Postgres Integration; required_approvals=2, merge whitelist disabled, stale approvals dismissed.
  • Gitea DB action_run sample for this SHA: Success=24, Cancelled=1; no Failure rows.
  • Open PR queue remains noisy with failing historical heads (#1855, #1852, #1837, #1810, #1772, #1770, etc.), but no current-head regression found.

No branch protection mutation, rerun, merge, or runner cleanup performed.

Owner

2026-05-26 02:55Z heartbeat triage evidence:

  • Current main remains 9843a970d370fc5b883f009362a0a4f56fe9427a.
  • Commit statuses sampled from Gitea API: success=44, pending=45, failure/error=0.
  • Branch protection still requires CI / all-required, E2E API Smoke Test, and Handlers Postgres Integration; required_approvals=2, merge whitelist disabled, stale approvals dismissed.
  • Gitea DB action_run sample for this SHA: Success=54, Cancelled=1; no Failure rows.
  • Open PR queue remains noisy with failing historical heads (#1855, #1852, #1837, #1810, #1772, #1770, etc.), but no current-head regression found.

No branch protection mutation, rerun, merge, or runner cleanup performed.

Owner

2026-05-26 03:55Z heartbeat triage evidence:

  • Current main remains 9843a970d370fc5b883f009362a0a4f56fe9427a.
  • Commit statuses sampled from Gitea API: success=59, pending=60, failure/error=0.
  • Branch protection still requires CI / all-required, E2E API Smoke Test, and Handlers Postgres Integration; required_approvals=2, merge whitelist disabled, stale approvals dismissed.
  • Gitea DB action_run sample for this SHA: Success=77, Cancelled=1; no Failure rows.
  • Open PR queue remains noisy with failing historical heads (#1862, #1861, #1855, #1852, #1837, etc.), but no current-head regression found.

No branch protection mutation, rerun, merge, or runner cleanup performed.

Owner

2026-05-26 04:55Z heartbeat triage evidence:

  • Current main advanced to 01295486575877167c6f8dfffe6258c8e9780278.
  • Commit statuses sampled from Gitea API: success=38, pending=40, failure=1: E2E Chat / E2E Chat (push) at /molecule-ai/molecule-core/actions/runs/89352/jobs/1.
  • Failing log evidence: Playwright MobileChat > history persists across reload failed because expected locator was not visible / element not found; artifact paths included test-results/chat-mobile-MobileChat-history-persists-across-reload-chromium/test-failed-1.png and error-context.md. Logs also showed platform/canvas startup health echoes but the terminal failure was the Playwright visibility assertion.
  • Branch protection still requires CI / all-required, E2E API Smoke Test, and Handlers Postgres Integration; it does not require this push-only E2E Chat context directly.
  • Gitea DB sample for this SHA: Success=25, Failure=1, Cancelled=1, Waiting=2.

No branch protection mutation, rerun, merge, or runner cleanup performed.

Member

RCA — root cause

#1638 is a compound main-red, not a single regression from merge commit def18f28. The two red contexts are independent guardrail lanes: the staging peer-visibility gate is exercising the literal MCP list_peers user path and reproducing a platform peer/auth regression, while the Railway pin audit is failing in its secret preflight before it can audit drift.

Evidence

  • Gitea status API for def18f28fa7477ac82d114d570c9dc9fe63ce7d7 reports exactly two failures: peer visibility after 2m17s and Railway pin audit after 3s.
  • .gitea/workflows/e2e-peer-visibility.yml:296-346 runs the real staging gate on push, with CP_STAGING_ADMIN_API_TOKEN, LLM keys, staging health, then tests/e2e/test_peer_visibility_mcp_staging.sh.
  • tests/e2e/lib/peer_visibility_assert.sh:70-90 fails the gate on non-200 MCP list_peers; :137-150 fails native fallback or missing peer sets, so this is an honest user-path assertion rather than registry/heartbeat proxy health.
  • .gitea/workflows/railway-pin-audit.yml:31-33 says scheduled audit hard-fails on missing RAILWAY_AUDIT_TOKEN; :61-73 exits immediately with an error when that secret is absent, matching the 3s failure.

Suggested fix

Split ownership. Peer visibility belongs to the staging peer/auth runtime path: inspect run 77595 to classify HTTP 401, native fallback, or missing peer IDs, then route to the workspace MCP/auth owner. Railway audit belongs to CI/secrets hygiene: provision/repair RAILWAY_AUDIT_TOKEN with read-only Railway variable scope and keep the fail-fast preflight. Do not treat this as a product rollback candidate for PR #1637; that merge touched status-reaper compensation while the failing lanes are external guardrails.

Confidence

High — both failing contexts and their workflow code point to separate guardrail failures; raw logs would only refine the peer-visibility sub-classification.
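
The peer-visibility sub-classification suggested above (non-200 such as HTTP 401, native fallback, or missing peer IDs) could be sketched as follows. The `classify` function, its parameters and its labels are hypothetical; the real assertions live in tests/e2e/lib/peer_visibility_assert.sh and may order or name the cases differently.

```go
package main

import "fmt"

// classify names the first failing condition for one list_peers attempt.
// All parameter names and labels are hypothetical.
func classify(httpStatus int, nativeFallback bool, got, want []string) string {
	if httpStatus != 200 {
		return fmt.Sprintf("http-%d", httpStatus)
	}
	if nativeFallback {
		return "native-fallback"
	}
	seen := make(map[string]bool, len(got))
	for _, id := range got {
		seen[id] = true
	}
	for _, id := range want {
		if !seen[id] {
			return "missing-peer:" + id
		}
	}
	return "ok"
}

func main() {
	fmt.Println(classify(401, false, nil, []string{"peer-a"}))
	fmt.Println(classify(200, true, nil, []string{"peer-a"}))
	fmt.Println(classify(200, false, []string{"peer-a"}, []string{"peer-a", "peer-b"}))
	fmt.Println(classify(200, false, []string{"peer-a"}, []string{"peer-a"}))
}
```

Checking the HTTP status first keeps an auth failure from being misread as a missing peer set, which is the distinction the routing decision depends on.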

Owner

CI/security heartbeat — 2026-05-26 06:17 PDT / 13:17 UTC

Fresh evidence:

  • main is 4001f54057fdc2cbfdf5f659d1879e663ac9580d.
  • commit-status counts: success=15, pending=27, failure/error=0. Current head is not red, but several push checks are still waiting/running in the action DB (status 5 Waiting x7, status 6 Running x2; status 1 Success x8 at sample time).
  • Branch protection still requires CI / all-required, E2E API Smoke Test, and Handlers Postgres Integration for pull_request, with required_approvals=2 and stale approvals dismissed.
  • Open PR sample: #1906 has no failed contexts yet (pending=30, success=1); #1905/#1904/#1899 and older PRs remain blocked mostly on SOP/security/QA gate contexts, with #1676/#1669 also red on protected CI contexts.

Security posture:

  • No required-context drift observed on main branch protection.
  • No unsafe write-all permission found in the critical repo workflow scan for molecule-core.
  • Runner evidence: /mnt/ci-runner-storage-1 is 73% used, / is 35%, restart-required is present, load average was 19.24/23.97/15.88, zombies=17, and live Gitea action containers were active. No runner restart or cleanup performed.

main returned to green at SHA ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73 (https://git.moleculesai.app/molecule-ai/molecule-core/commit/ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73). Closing automatically. If the underlying root cause is not yet understood, reopen this issue and file a postmortem — green-by-flake is still a bug per feedback_no_such_thing_as_flakes.

gitea-actions bot closed this issue 2026-05-26 16:05:59 +00:00
Owner

CI/security heartbeat — 2026-05-26 10:28 PDT / 17:28 UTC

Fresh evidence:

  • main advanced to a176900e7c and is red: success=33, pending=38, failure=4.
  • Failing push contexts: publish-workspace-server-image / Production auto-deploy, Handlers Postgres Integration, CI / all-required, and CI / Platform (Go).
  • Root cause visible in Platform (Go) log: .
  • Production auto-deploy correctly refused deploy because required CI context was failing.
  • DB sample for this head: Success=21, Failure=3, Cancelled=1.

Safe focused fix opened:

  • PR #1923: https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1923
  • Change: replace stale GitHub monorepo db import with .
  • Verification: passed from ; passed.
  • Broader targeted handlers MCP tests still fail on existing sqlmock delegation expectations (), separate from this import-path fix.

No merge, deploy rerun, branch protection mutation, secret rotation, or destructive cleanup performed.

Owner

Correction to the immediately previous heartbeat comment: the shell mangled a few inline code spans while posting. Authoritative evidence is below.

CI/security heartbeat — 2026-05-26 10:28 PDT / 17:28 UTC

Fresh evidence:

  • main advanced to a176900e7cf695c6f1a716142b76d0ffbedbd78b and is red: success=33, pending=38, failure=4.
  • Failing push contexts: publish-workspace-server-image / Production auto-deploy, Handlers Postgres Integration, CI / all-required, and CI / Platform (Go).
  • Root cause visible in Platform (Go) log: internal/handlers/mcp_tools_test.go:9:2: no required module provides package github.com/Molecule-AI/molecule-monorepo/platform/internal/db.
  • Production auto-deploy correctly refused deploy because required CI context CI / all-required (push) was failing.
  • DB sample for this head: Success=21, Failure=3, Cancelled=1.

Safe focused fix opened:

  • PR #1923: https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1923
  • Change: replace stale GitHub monorepo db import with git.moleculesai.app/molecule-ai/molecule-core/workspace-server/internal/db.
  • Verification: go vet ./... passed from workspace-server; go test ./internal/handlers -run TestExtractA2AText passed.
  • Broader targeted handlers MCP tests still fail on existing sqlmock delegation expectations (expected 5, got 6 arguments), separate from this import-path fix.

No merge, deploy rerun, branch protection mutation, secret rotation, or destructive cleanup performed.
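
A stale import of this kind can be caught before CI with a scan of each file's import block. The sketch below uses the standard go/parser package; the `staleImports` helper, the `stalePrefix` constant and the sample source are assumptions for illustration, and only the single failing import path comes from the log above.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// stalePrefix is an assumed prefix covering the old monorepo module path.
const stalePrefix = "github.com/Molecule-AI/molecule-monorepo/"

// staleImports parses only the import declarations of src and
// returns every import path that starts with stalePrefix.
func staleImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var stale []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if strings.HasPrefix(path, stalePrefix) {
			stale = append(stale, path)
		}
	}
	return stale, nil
}

// sample is an illustrative file body, not the real test file.
const sample = `package handlers

import (
	"testing"

	"github.com/Molecule-AI/molecule-monorepo/platform/internal/db"
)
`

func main() {
	stale, err := staleImports("mcp_tools_test.go", sample)
	if err != nil {
		panic(err)
	}
	for _, p := range stale {
		fmt.Println("stale import:", p)
	}
}
```

Parsing with parser.ImportsOnly stops after the import declarations, so the scan stays cheap even across a large tree.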

Owner

CI/security heartbeat — 2026-05-26 11:28 PDT / 18:28 UTC

Fresh evidence:

  • main advanced to 12319f1ffd32f08e77f592aaed6bc7f295d33f1f; current head is not red: API statuses success=24, pending=32, failure/error=0.
  • DB sample for this head: Success=13, Waiting=1, Running=4.
  • PR #1923 remains open and unmerged: https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1923
    • Head 4b153dbc748e559ed3c0350df4a8894956b32147 has statuses success=25, pending=32, failure=5.
    • The original stale-import go vet failure is fixed by #1923, but PR CI is now failing in broader internal/handlers sqlmock expectation drift (INSERT INTO structure_events... vs expected UPDATE workspaces SET status, plus fulfilled expectations around workspace_config persistence).
    • Added PR comment #50054 with this evidence.
  • Branch protection still requires CI / all-required, E2E API Smoke Test, and Handlers Postgres Integration for pull_request, with required_approvals=2, merge whitelist disabled, stale approvals dismissed.

No merge, force-push, deploy rerun, branch-protection mutation, or destructive cleanup performed.

Owner

2026-05-26 12:28 PDT triage update:

Fresh evidence:

  • molecule-core@12319f1 now has one red push context: Continuous synthetic E2E (staging) / Synthetic E2E against staging (push), run 92757/job 0.
  • Gitea log shows the full SaaS E2E reached parent A2A successfully, then failed at step 9 on `POST /workspaces/.../memories` with HTTP 500.
  • Loki around the failure shows `Commit memory error (plugin): memory-plugin: internal: commit memory: pq: insert or update on table "memory_records" violates foreign key constraint "memory_records_namespace_fkey"`.

Action taken:

  • Opened PR #1925: https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1925
  • The PR upserts the v2 memory namespace before commit_memory_v2 writes, matching the existing backfill contract that namespace creation must precede memory_records inserts.

Validation:

  • go test ./internal/handlers -run 'TestCommitMemoryV2|TestCommitMemoryLegacyShim|TestToolCommitMemory'
  • go test ./internal/handlers
  • go test ./internal/memory/... ./internal/handlers -run 'TestCommitMemoryV2|TestCommitMemoryLegacyShim|TestMemoryV2|TestMemoriesV2'

No merge, deploy, runner restart, or secret rotation performed.

Owner

2026-05-26 13:28 PDT triage update:

Fresh evidence:

  • molecule-core@12319f1 has three red continuous synthetic E2E push contexts: runs 92757, 92766, and 92793.
  • Runs 92766 and 92793 both reached parent A2A `PONG`, then failed on step 9 `Writing + reading HMA memory on parent` with `memory POST failed` / HTTP 500.
  • This matches the Loki evidence from the previous wake: memory_records_namespace_fkey on v2 memory writes.

Active fix:

  • PR #1925 remains open: https://git.moleculesai.app/molecule-ai/molecule-core/pulls/1925
  • PR head 42b16b33fb069aab19d6a9f2cde69fd5525437f3: code CI is pending or green so far, and the status API shows no code-test failures; the red statuses are review/SOP gates (sop-checklist, security-review, qa-review).
  • No merge or deploy performed. Staging synthetic E2E is expected to continue failing until the fix merges and reaches staging.
Owner

CI/security heartbeat 2026-05-26 14:28 PDT fresh sample:

  • molecule-core@12319f1ffd32f08e77f592aaed6bc7f295d33f1f: API statuses success=68, pending=74, failure=5.
  • Latest failing staging synthetic E2E run `/molecule-ai/molecule-core/actions/runs/92877/jobs/0` reached workspace online, image round-trip, files API, route recovery, and A2A `PONG`, then failed at step 9 `Writing + reading HMA memory on parent` with HTTP 500.
  • Open fix PR remains #1925 (fix(memory): upsert namespace before v2 commit). PR head status: success=27, pending=32, failure=3; failures are SOP checklist/security-review/QA-review gates, not the local targeted tests from the fix branch.
  • No merge/deploy/rerun/runner cleanup performed.
Owner

CI/security heartbeat 2026-05-26 15:28 PDT fresh sample:

  • molecule-core@cffe4bec431522e82cd9fb113c80e934e9a880c3: API statuses success=32, pending=34, failure=1.
  • Latest failing staging synthetic E2E run `/molecule-ai/molecule-core/actions/runs/93017/jobs/0` reached tenant/workspace online, image round-trip, files API, route recovery, and A2A `PONG`, then failed at step 9 `Writing + reading HMA memory on parent` with HTTP 500.
  • Fix PR #1925 is still open/unmerged; PR head 42b16b33fb069aab19d6a9f2cde69fd5525437f3 remains gated by sop-checklist, security-review, and qa-review statuses.
  • No merge/deploy/rerun/runner cleanup performed.
Reference: molecule-ai/molecule-core#1638