[main-red] molecule-ai/molecule-core: 6c7f66fa31 #1729

Closed
opened 2026-05-23 11:07:09 +00:00 by gitea-actions · 3 comments

Main is RED on molecule-ai/molecule-core at 6c7f66fa31

Commit: https://git.moleculesai.app/molecule-ai/molecule-core/commit/6c7f66fa3106db8648bea1ce5fec96717a78341d

Auto-filed by .gitea/workflows/main-red-watchdog.yml (Option C of the main-never-red directive, https://git.moleculesai.app/molecule-ai/molecule-core/issues/420). Per feedback_no_such_thing_as_flakes + feedback_fix_root_not_symptom: investigate the root cause; do NOT revert as a reflex. The watchdog itself never reverts.

Failed status contexts

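  • E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push): failure (logs: /molecule-ai/molecule-core/actions/runs/81279/jobs/2)
    • Failing after 2m9s
  • CI / Platform (Go) (push): failure (logs: /molecule-ai/molecule-core/actions/runs/81276/jobs/1)
    • Failing after 4m42s
  • publish-workspace-server-image / Production auto-deploy (push): failure (logs: /molecule-ai/molecule-core/actions/runs/81289/jobs/1)
    • Failing after 18m43s
  • CI / all-required (push): failure (logs: /molecule-ai/molecule-core/actions/runs/81276/jobs/6)
    • Failing after 14m56s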

Resolution path

  1. Read the failed logs (links above).
  2. If reproducible locally, fix forward in a PR targeting main.
  3. If the failure is a real flake — STOP. Per feedback_no_such_thing_as_flakes, intermittent failures are real bugs. Investigate to root cause; do not mark as flake.
  4. If the failure is blocking unrelated work for >1 hour, file a follow-up issue and assign someone. Do NOT revert without a human GO per feedback_prod_apply_needs_hongming_chat_go (branch protection is a prod surface).

Debug

{
  "all_contexts": [
    {
      "context": "E2E Chat / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E API Smoke Test / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push)",
      "state": "failure"
    },
    {
      "context": "E2E Staging SaaS (full lifecycle) / pr-validate (push)",
      "state": "success"
    },
    {
      "context": "Handlers Postgres Integration / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "Harness Replays / detect-changes (push)",
      "state": "success"
    },
    {
      "context": "Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push)",
      "state": "success"
    },
    {
      "context": "Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (push)",
      "state": "success"
    },
    {
      "context": "Secret scan / Scan diff for credential-shaped strings (push)",
      "state": "success"
    },
    {
      "context": "ci-required-drift / drift (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging External Runtime / E2E Staging External Runtime (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (push)",
      "state": "success"
    },
    {
      "context": "CI / Canvas (Next.js) (push)",
      "state": "success"
    },
    {
      "context": "CI / Shellcheck (E2E scripts) (push)",
      "state": "success"
    },
    {
      "context": "E2E API Smoke Test / E2E API Smoke Test (push)",
      "state": "success"
    },
    {
      "context": "E2E Staging Canvas (Playwright) / Canvas tabs E2E (push)",
      "state": "success"
    },
    {
      "context": "Harness Replays / Harness Replays (push)",
      "state": "success"
    },
    {
      "context": "Handlers Postgres Integration / Handlers Postgres Integration (push)",
      "state": "success"
    },
    {
      "context": "CI / Platform (Go) (push)",
      "state": "failure"
    },
    {
      "context": "E2E Chat / E2E Chat (push)",
      "state": "success"
    },
    {
      "context": "publish-workspace-server-image / Production auto-deploy (push)",
      "state": "failure"
    },
    {
      "context": "CI / all-required (push)",
      "state": "failure"
    },
    {
      "context": "Sweep stale AWS Secrets Manager secrets / Sweep AWS Secrets Manager (push)",
      "state": "success"
    },
    {
      "context": "Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push)",
      "state": "success"
    },
    {
      "context": "Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push)",
      "state": "pending"
    },
    {
      "context": "Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push)",
      "state": "success"
    },
    {
      "context": "Continuous synthetic E2E (staging) / Synthetic E2E against staging (push)",
      "state": "pending"
    },
    {
      "context": "main-red-watchdog / watchdog (push)",
      "state": "pending"
    }
  ],
  "branch": "main",
  "combined_state": "failure",
  "failed_contexts": [
    "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push)",
    "CI / Platform (Go) (push)",
    "publish-workspace-server-image / Production auto-deploy (push)",
    "CI / all-required (push)"
  ],
  "recheck_combined_state": "failure",
  "recheck_failed_contexts": [
    "E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push)",
    "CI / Platform (Go) (push)",
    "publish-workspace-server-image / Production auto-deploy (push)",
    "CI / all-required (push)"
  ],
  "sha": "6c7f66fa3106db8648bea1ce5fec96717a78341d"
}

This issue is idempotent: the watchdog runs hourly at :05 and edits this body in place. When main returns to green, the watchdog will close this issue automatically with a "main returned to green" comment.
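
As a rough illustration of how the failed_contexts and combined_state fields in the debug payload relate to all_contexts, here is a minimal Python sketch. The function name summarize_contexts and the precedence rule (any failure wins, then pending, otherwise success) are assumptions made for illustration; the watchdog's actual logic is not shown in this issue.

```python
import json

def summarize_contexts(payload: dict) -> dict:
    """Derive the failed context names and a combined state from all_contexts."""
    contexts = payload["all_contexts"]
    failed = [c["context"] for c in contexts if c["state"] == "failure"]
    pending = [c["context"] for c in contexts if c["state"] == "pending"]
    # Assumed precedence: any failure wins, then pending, otherwise success.
    if failed:
        combined = "failure"
    elif pending:
        combined = "pending"
    else:
        combined = "success"
    return {"combined_state": combined, "failed_contexts": failed}

# A trimmed copy of the debug payload above (three of the thirty contexts).
sample = json.loads("""
{
  "all_contexts": [
    {"context": "E2E Chat / E2E Chat (push)", "state": "success"},
    {"context": "CI / Platform (Go) (push)", "state": "failure"},
    {"context": "main-red-watchdog / watchdog (push)", "state": "pending"}
  ]
}
""")

summary = summarize_contexts(sample)
print(summary["combined_state"], summary["failed_contexts"])
```

Under this assumed rule the pending contexts do not mask the failure, which matches the payload above: three contexts are still pending, yet combined_state is failure.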

gitea-actions bot added the tier:high label 2026-05-23 11:07:10 +00:00
Member

RCA — root cause

#1729 is a compound main-red. The user-facing E2E failure is in the peer-visibility staging harness: when workspace creation does not return a workspace auth_token, the script falls back to the tenant admin/session bearer for POST /workspaces/:id/mcp, but the actual WorkspaceAuth middleware only accepts the exact tenant ADMIN_TOKEN, org-scoped API tokens, or a token bound to that workspace. Separately, CI / Platform (Go) is consistent with stale handler/sqlmock expectations after commit 6c7f66fa made model explicit and added workspace create side effects; CI / all-required and production auto-deploy are downstream failures of those contexts.
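
The auth mismatch described above can be sketched in a few lines of Python. This is not the Go middleware itself: the function name workspace_auth_status, the token stores, and the use of a constant-time comparison are all assumptions made for illustration. Only the three accepted credential kinds and the 401 fallthrough come from the RCA.

```python
import hmac

def workspace_auth_status(bearer, workspace_id, admin_token, org_tokens, workspace_tokens):
    """Return 200 if the bearer is acceptable for this workspace, else 401."""
    # 1. The exact tenant ADMIN_TOKEN (constant-time compare is a choice of this sketch).
    if admin_token and hmac.compare_digest(bearer, admin_token):
        return 200
    # 2. Any org-scoped API token.
    if bearer in org_tokens:
        return 200
    # 3. A token bound to this specific workspace.
    if workspace_tokens.get(bearer) == workspace_id:
        return 200
    return 401

# Hypothetical values; none of these strings appear in the real system.
ADMIN = "admin-secret"
ORG_TOKENS = {"org-tok-1"}
WS_TOKENS = {"ws-tok-1": "ws-1"}  # token -> workspace it is bound to

# A tenant session bearer is none of the three accepted kinds.
session_status = workspace_auth_status("tenant-session", "ws-1", ADMIN, ORG_TOKENS, WS_TOKENS)
# A workspace token works only for the workspace it is bound to.
own_status = workspace_auth_status("ws-tok-1", "ws-1", ADMIN, ORG_TOKENS, WS_TOKENS)
other_status = workspace_auth_status("ws-tok-1", "ws-2", ADMIN, ORG_TOKENS, WS_TOKENS)
print(session_status, own_status, other_status)
```

In this model the harness fallback fails for the same reason the RCA gives: a session bearer that is neither the exact admin token nor an org token falls through every branch.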

Evidence

  • tests/e2e/test_peer_visibility_mcp_staging.sh:237-255 — creates parent/sibling workspaces without model and falls back to TENANT_TOKEN when no workspace token is returned.
  • tests/e2e/lib/peer_visibility_assert.sh:72-89 — the gate then calls POST /workspaces/$wid/mcp and fails any non-200 response.
  • workspace-server/internal/middleware/wsauth_middleware.go:74-110 — WorkspaceAuth accepts only env ADMIN_TOKEN, org tokens, or per-workspace tokens before returning 401.
  • workspace-server/internal/router/router.go:413-419 and handlers/admin_workspace_tokens.go:23-71 — a real admin token-mint route exists, but this staging script explicitly avoids using it.
  • workspace-server/internal/handlers/workspace.go:386-428 — commit 6c7f66fa changed create semantics to fail closed on missing model for spawned runtimes, while handlers_extended_test.go:821-847 pins that new 422 contract.
  • Issue log excerpt — E2E Peer Visibility, CI / Platform (Go), and Production auto-deploy are the failed contexts; auto-deploy waits on CI rather than being an independent deploy regression.
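
To make the fail-closed model contract from the evidence above concrete, here is a small Python sketch. The request field runtime, the runtime names, and the 201 success code are hypothetical; only the model field and the 422 response for a missing model on spawned runtimes come from the RCA.

```python
# Hypothetical set of runtimes that the server spawns itself.
SPAWNED_RUNTIMES = {"managed-runtime"}

def create_workspace_status(body: dict) -> int:
    """Return the HTTP status a create request would get under the assumed contract."""
    runtime = body.get("runtime")
    # Fail closed: a spawned runtime with no explicit model is rejected.
    if runtime in SPAWNED_RUNTIMES and not body.get("model"):
        return 422
    return 201  # assumed success code

# A request shaped like the harness's: spawned runtime, no model.
without_model = create_workspace_status({"runtime": "managed-runtime"})
# The same request with an explicit (hypothetical) model name.
with_model = create_workspace_status({"runtime": "managed-runtime", "model": "example-model"})
print(without_model, with_model)
```

A test or harness that still creates workspaces without model would hit the 422 branch here, which is the kind of stale expectation the RCA attributes to the Platform (Go) failure.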

Suggested fix

Treat this as two fix tracks in molecule-core. For peer visibility, stop authenticating MCP calls with the tenant admin/session token unless it is intentionally an org token accepted by WorkspaceAuth; instead have the E2E mint per-workspace tokens through the production-supported admin token route, or update the create path to return a first-use workspace token for these managed test workspaces. For Platform Go, update the affected handler/sqlmock tests around workspace creation to include the new explicit model requirement and the post-create side effects now exercised by 6c7f66fa. Once those two roots are fixed, all-required and production auto-deploy should clear without deploy-specific changes.
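
One way to read the peer-visibility half of the suggested fix is as a token-selection rule for the harness, sketched below in Python. The real harness is a shell script; the function pick_mcp_bearer and the injected mint_workspace_token callable are hypothetical stand-ins, and the shape of the admin token-mint route is deliberately left out because this issue does not show it.

```python
from typing import Callable, Optional

def pick_mcp_bearer(
    create_response: dict,
    workspace_id: str,
    mint_workspace_token: Callable[[str], Optional[str]],
) -> str:
    """Choose a bearer for POST /workspaces/:id/mcp without a tenant-token fallback."""
    # Prefer the token returned by workspace creation, when there is one.
    token = create_response.get("auth_token")
    if token:
        return token
    # Otherwise mint a per-workspace token (stand-in for the admin token route).
    token = mint_workspace_token(workspace_id)
    if token:
        return token
    # Fail loudly instead of silently using a bearer WorkspaceAuth may reject.
    raise RuntimeError(
        f"no workspace-scoped token for {workspace_id}; refusing tenant-token fallback"
    )

def fake_mint(workspace_id: str) -> Optional[str]:
    """Test double for the mint call; returns a predictable token."""
    return f"minted-{workspace_id}"

from_create = pick_mcp_bearer({"auth_token": "ws-tok-1"}, "ws-1", fake_mint)
from_mint = pick_mcp_bearer({}, "ws-2", fake_mint)
print(from_create, from_mint)
```

Raising instead of falling back turns a later 401 from the MCP call into an immediate, clearly labelled harness error, which should make this class of failure easier to diagnose.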

Confidence

Medium: the issue body exposes the failed contexts, and the cited code shows the auth/model contract mismatch. The action log endpoint returns 404 for this token, so the exact failing test names from the Platform Go run could not be checked; having them would raise confidence further.


main returned to green at SHA ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73 (https://git.moleculesai.app/molecule-ai/molecule-core/commit/ca9fe8dbfca459f4b4a61f55dcd21fecae6c1b73). Closing automatically. If the underlying root cause is not yet understood, reopen this issue and file a postmortem — green-by-flake is still a bug per feedback_no_such_thing_as_flakes.

gitea-actions bot closed this issue 2026-05-26 16:05:55 +00:00
Member

Formal closure pass: this RCA is already closed and current main is green (12319f1f, combined status success). Treat #1729 as resolved for pending-task tracking; reopen only if the same compound auth-flow + Platform Go failure recurs on a fresh main SHA.

Reference: molecule-ai/molecule-core#1729