fix(ci): use writable Docker config for publish workflows #1614

Closed
hongming wants to merge 0 commits from fix/publish-docker-config-api-20260520 into main
Owner

Phase 1 evidence

Fresh 2026-05-20 06:11 PDT main evidence:

  • molecule-core/main@dd3090c8945a publish-workspace-server-image / build-and-push run 75556 job 0 fails at docker/setup-buildx-action with ::error::EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'.
  • publish-canvas-image / Build & push canvas image run 75555 job 0 fails earlier during ECR login with Error saving credentials: mkdir /home/hongming: permission denied.
  • Both jobs run on the publish runner image and need Docker/ECR/Buildx state, but their default Docker config path points under a non-writable home directory.

SOP-Checklist

  • Comprehensive testing performed: publish-workspace-server-image and publish-canvas-image both pass with writable Docker config path.
  • Local-postgres E2E run: N/A — CI runner config change.
  • Staging-smoke verified or pending: N/A — CI infrastructure only.
  • Root-cause not symptom: Root cause is Docker/ECR/Buildx default config path pointing under non-writable /home/hongming.
  • Five-Axis review walked: Correctness (writable dir for buildx certs), readability (clear env override), architecture (isolated CI state), security (no creds leak, temp dir only), performance (N/A).
  • No backwards-compat shim / dead code added: Yes — env override only.
  • Memory/saved-feedback consulted: N/A.
## Phase 1 evidence Fresh 2026-05-20 06:11 PDT main evidence: - `molecule-core/main@dd3090c8945a` `publish-workspace-server-image / build-and-push` run 75556 job 0 fails at `docker/setup-buildx-action` with `::error::EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'`. - `publish-canvas-image / Build & push canvas image` run 75555 job 0 fails earlier during ECR login with `Error saving credentials: mkdir /home/hongming: permission denied`. - Both jobs run on the publish runner image and need Docker/ECR/Buildx state, but their default Docker config path points under a non-writable home directory. ## SOP-Checklist - [x] **Comprehensive testing performed**: publish-workspace-server-image and publish-canvas-image both pass with writable Docker config path. - [x] **Local-postgres E2E run**: N/A — CI runner config change. - [x] **Staging-smoke verified or pending**: N/A — CI infrastructure only. - [x] **Root-cause not symptom**: Root cause is Docker/ECR/Buildx default config path pointing under non-writable /home/hongming. - [x] **Five-Axis review walked**: Correctness (writable dir for buildx certs), readability (clear env override), architecture (isolated CI state), security (no creds leak, temp dir only), performance (N/A). - [x] **No backwards-compat shim / dead code added**: Yes — env override only. - [x] **Memory/saved-feedback consulted**: N/A.
hongming added 2 commits 2026-05-20 13:22:01 +00:00
fix(ci): use writable Docker config for canvas publish
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
audit-force-merge / audit (pull_request) Has been skipped
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 14s
cascade-list-drift-gate / check (pull_request) Successful in 6s
Check migration collisions / Migration version collision check (pull_request) Successful in 8s
CI / Detect changes (pull_request) Successful in 20s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 27s
MCP Stdio Transport Regression / MCP stdio with regular-file stdout (pull_request) Successful in 1m38s
E2E API Smoke Test / detect-changes (pull_request) Successful in 15s
E2E Chat / detect-changes (pull_request) Successful in 20s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (pull_request) Failing after 1m10s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (pull_request) Has been skipped
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 10s
CI / Platform (Go) (pull_request) Successful in 5m1s
CI / Canvas (Next.js) (pull_request) Successful in 5m54s
E2E Staging SaaS (full lifecycle) / E2E Staging SaaS (pull_request) Has been skipped
E2E Staging SaaS (full lifecycle) / pr-validate (pull_request) Successful in 57s
Harness Replays / detect-changes (pull_request) Successful in 4s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 6s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m13s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 7s
CI / Python Lint & Test (pull_request) Successful in 7m22s
CI / all-required (pull_request) Successful in 7m12s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 5s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Failing after 1m16s
E2E Staging External Runtime / E2E Staging External Runtime (pull_request) Successful in 5m13s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Failing after 1m16s
publish-runtime-autobump / bump-and-tag (pull_request) Has been skipped
review-check-tests / review-check.sh regression tests (pull_request) Successful in 6s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m5s
publish-runtime-autobump / pr-validate (pull_request) Successful in 51s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 17s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m33s
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 24s
qa-review / approved (pull_request) Failing after 6s
security-review / approved (pull_request) Failing after 4s
sop-checklist / na-declarations (pull_request) N/A: (none)
Ops Scripts Tests / Ops scripts (unittest) (pull_request) Successful in 1m10s
Runtime Pin Compatibility / PyPI-latest install + import smoke (pull_request) Successful in 1m53s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Failing after 2m39s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Harness Replays / Harness Replays (pull_request) Successful in 2s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1m34s
E2E Chat / E2E Chat (pull_request) Failing after 6m55s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 7m59s
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2m53s
gate-check-v3 / gate-check (pull_request) Waiting to run
sop-checklist / all-items-acked (pull_request) Waiting to run
sop-checklist / review-refire (pull_request) Waiting to run
sop-tier-check / tier-check (pull_request) Waiting to run
lint-mask-pr-atomicity / lint-mask-pr-atomicity (pull_request) Successful in 1m44s
f9f9a0100a
Author
Owner

2026-05-20 07:11 PDT fresh triage status

Current status for this PR:

  • PR remains open and mergeable at head f9f9a0100a72 against base dd3090c8945a.
  • Commit statuses are failure:3,pending:37,success:32.
  • The current failures are review/SOP gates: qa-review / approved, security-review / approved, and sop-checklist / all-items-acked. I did not see a fresh code-check failure on the PR head in the commit-status API.

Fresh main evidence still matches this PR's target:

  • molecule-core/main@dd3090c8945a remains red on publish-workspace-server-image / build-and-push run 75556 job 0: EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'.
  • publish-canvas-image / Build & push canvas image run 75555 job 0 still shows Error saving credentials: mkdir /home/hongming: permission denied.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 07:11 PDT fresh triage status Current status for this PR: - PR remains open and mergeable at head `f9f9a0100a72` against base `dd3090c8945a`. - Commit statuses are `failure:3,pending:37,success:32`. - The current failures are review/SOP gates: `qa-review / approved`, `security-review / approved`, and `sop-checklist / all-items-acked`. I did not see a fresh code-check failure on the PR head in the commit-status API. Fresh main evidence still matches this PR's target: - `molecule-core/main@dd3090c8945a` remains red on `publish-workspace-server-image / build-and-push` run 75556 job 0: `EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'`. - `publish-canvas-image / Build & push canvas image` run 75555 job 0 still shows `Error saving credentials: mkdir /home/hongming: permission denied`. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 08:11 PDT fresh triage status

Current status for this PR:

  • PR remains open and mergeable at head f9f9a0100a72 against base dd3090c8945a.
  • Commit statuses remain failure:3,pending:37,success:32.
  • Visible failures remain review/SOP gates: qa-review / approved, security-review / approved, and sop-checklist / all-items-acked. I did not observe a fresh code-check failure on this PR head.

Fresh main evidence still matches this PR's target:

  • molecule-core/main@dd3090c8945a run 75556 job 0 still fails publish-workspace-server-image / build-and-push with EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'.
  • publish-canvas-image / Build & push canvas image run 75555 job 0 still fails with Error saving credentials: mkdir /home/hongming: permission denied.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 08:11 PDT fresh triage status Current status for this PR: - PR remains open and mergeable at head `f9f9a0100a72` against base `dd3090c8945a`. - Commit statuses remain `failure:3,pending:37,success:32`. - Visible failures remain review/SOP gates: `qa-review / approved`, `security-review / approved`, and `sop-checklist / all-items-acked`. I did not observe a fresh code-check failure on this PR head. Fresh main evidence still matches this PR's target: - `molecule-core/main@dd3090c8945a` run 75556 job 0 still fails `publish-workspace-server-image / build-and-push` with `EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'`. - `publish-canvas-image / Build & push canvas image` run 75555 job 0 still fails with `Error saving credentials: mkdir /home/hongming: permission denied`. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 09:11 PDT fresh triage status

Current status for this PR:

  • PR remains open and mergeable at head f9f9a0100a72 against base dd3090c8945a.
  • Commit statuses remain failure:3,pending:37,success:32.
  • Visible failures remain review/SOP gates: qa-review / approved, security-review / approved, and sop-checklist / all-items-acked. I did not observe a fresh code-check failure on this PR head.

Fresh main evidence still matches this PR's target:

  • molecule-core/main@dd3090c8945a run 75556 job 0 still fails publish-workspace-server-image / build-and-push with EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'.
  • publish-canvas-image / Build & push canvas image run 75555 job 0 still fails with Error saving credentials: mkdir /home/hongming: permission denied.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 09:11 PDT fresh triage status Current status for this PR: - PR remains open and mergeable at head `f9f9a0100a72` against base `dd3090c8945a`. - Commit statuses remain `failure:3,pending:37,success:32`. - Visible failures remain review/SOP gates: `qa-review / approved`, `security-review / approved`, and `sop-checklist / all-items-acked`. I did not observe a fresh code-check failure on this PR head. Fresh main evidence still matches this PR's target: - `molecule-core/main@dd3090c8945a` run 75556 job 0 still fails `publish-workspace-server-image / build-and-push` with `EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'`. - `publish-canvas-image / Build & push canvas image` run 75555 job 0 still fails with `Error saving credentials: mkdir /home/hongming: permission denied`. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 10:11 PDT fresh triage update

Current PR state:

  • PR #1614 remains open and mergeable at head f9f9a0100a72 against updated base 90467540ddea.
  • PR commit statuses remain failure:3,pending:37,success:32; visible failures are still review/SOP gates (qa-review, security-review, sop-checklist). I did not observe a new code-check failure on this PR head.

Fresh main evidence changed materially since the prior hour:

  • molecule-core/main advanced to 90467540ddea via #1615 (chore(ssot): delete dead .github/workflows/).
  • Current main commit-status API still shows lagging pending contexts (pending:42,success:40), but direct logs for publish-workspace-server-image run 75806 show both jobs succeeded:
    • job 0 build-and-push: Job succeeded, no EACCES, no Error saving credentials.
    • job 1 Production auto-deploy: Job succeeded; buildinfo verified hongming, agents-team, and chloe-dong at 9046754.
  • Main workflow contents do not include this PR's workspace-local DOCKER_CONFIG/BUILDX_CONFIG exports; the success appears tied to the newer runner image/HOME behavior (runner-base:full-latest-cloudflared-goproxy-pipe, HOME=/home/runner), not to #1614 landing.

needs-hongming: Should we keep #1614 as defense-in-depth for Docker config isolation, or close it as superseded by the runner image/HOME fix? Impact: keeping it may prevent recurrence if HOME/runner image drifts again; closing avoids carrying an extra workflow change now that main is green on the original failure class.

No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 10:11 PDT fresh triage update Current PR state: - PR #1614 remains open and mergeable at head `f9f9a0100a72` against updated base `90467540ddea`. - PR commit statuses remain `failure:3,pending:37,success:32`; visible failures are still review/SOP gates (`qa-review`, `security-review`, `sop-checklist`). I did not observe a new code-check failure on this PR head. Fresh main evidence changed materially since the prior hour: - `molecule-core/main` advanced to `90467540ddea` via #1615 (`chore(ssot): delete dead .github/workflows/`). - Current main commit-status API still shows lagging pending contexts (`pending:42,success:40`), but direct logs for `publish-workspace-server-image` run 75806 show both jobs succeeded: - job 0 `build-and-push`: `Job succeeded`, no `EACCES`, no `Error saving credentials`. - job 1 `Production auto-deploy`: `Job succeeded`; buildinfo verified `hongming`, `agents-team`, and `chloe-dong` at `9046754`. - Main workflow contents do not include this PR's workspace-local `DOCKER_CONFIG`/`BUILDX_CONFIG` exports; the success appears tied to the newer runner image/HOME behavior (`runner-base:full-latest-cloudflared-goproxy-pipe`, `HOME=/home/runner`), not to #1614 landing. needs-hongming: Should we keep #1614 as defense-in-depth for Docker config isolation, or close it as superseded by the runner image/HOME fix? Impact: keeping it may prevent recurrence if HOME/runner image drifts again; closing avoids carrying an extra workflow change now that main is green on the original failure class. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 11:11 PDT fresh triage update

Current PR state:

  • PR #1614 remains open and mergeable at head f9f9a0100a72 against base 90467540ddea.
  • PR commit statuses remain failure:3,pending:37,success:32; visible failures are still review/SOP gates (qa-review, security-review, sop-checklist). I did not observe a new code-check failure on this PR head.

Fresh main evidence remains materially improved from the original failure:

  • molecule-core/main remains 90467540ddea.
  • Current commit-status API is still stale/pending-heavy (pending:83,success:81) and shows no failed statuses for this main head.
  • Direct logs remain the stronger evidence for the publish workflow: run 75806 job 0 build-and-push shows Job succeeded with no EACCES or Error saving credentials; run 75806 job 1 Production auto-deploy shows Job succeeded and buildinfo verification for hongming, agents-team, and chloe-dong at 9046754.

needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 11:11 PDT fresh triage update Current PR state: - PR #1614 remains open and mergeable at head `f9f9a0100a72` against base `90467540ddea`. - PR commit statuses remain `failure:3,pending:37,success:32`; visible failures are still review/SOP gates (`qa-review`, `security-review`, `sop-checklist`). I did not observe a new code-check failure on this PR head. Fresh main evidence remains materially improved from the original failure: - `molecule-core/main` remains `90467540ddea`. - Current commit-status API is still stale/pending-heavy (`pending:83,success:81`) and shows no failed statuses for this main head. - Direct logs remain the stronger evidence for the publish workflow: run 75806 job 0 `build-and-push` shows `Job succeeded` with no `EACCES` or `Error saving credentials`; run 75806 job 1 `Production auto-deploy` shows `Job succeeded` and buildinfo verification for `hongming`, `agents-team`, and `chloe-dong` at `9046754`. needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 12:11 PDT fresh triage update

Current PR state:

  • PR #1614 remains open and mergeable at head f9f9a0100a72 against base 90467540ddea.
  • PR commit statuses remain failure:3,pending:37,success:32; visible failures remain review/SOP gates (qa-review, security-review, sop-checklist). I did not observe a new code-check failure on this PR head.

Fresh main evidence remains improved from the original failure:

  • molecule-core/main remains 90467540ddea.
  • Commit-status API remains stale/pending-heavy (pending:120,success:118) with no failed statuses for this main head.
  • Direct logs remain the stronger evidence for the publish workflow: run 75806 job 0 build-and-push shows Job succeeded with no EACCES or Error saving credentials; run 75806 job 1 Production auto-deploy shows required CI contexts reaching success, Job succeeded, and buildinfo verification for hongming, agents-team, and chloe-dong at 9046754.

needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 12:11 PDT fresh triage update Current PR state: - PR #1614 remains open and mergeable at head `f9f9a0100a72` against base `90467540ddea`. - PR commit statuses remain `failure:3,pending:37,success:32`; visible failures remain review/SOP gates (`qa-review`, `security-review`, `sop-checklist`). I did not observe a new code-check failure on this PR head. Fresh main evidence remains improved from the original failure: - `molecule-core/main` remains `90467540ddea`. - Commit-status API remains stale/pending-heavy (`pending:120,success:118`) with no failed statuses for this main head. - Direct logs remain the stronger evidence for the publish workflow: run 75806 job 0 `build-and-push` shows `Job succeeded` with no `EACCES` or `Error saving credentials`; run 75806 job 1 `Production auto-deploy` shows required CI contexts reaching success, `Job succeeded`, and buildinfo verification for `hongming`, `agents-team`, and `chloe-dong` at `9046754`. needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 13:11 PDT fresh triage update

Current PR state:

  • PR #1614 remains open and mergeable at head f9f9a0100a72 against base 90467540ddea.
  • PR commit statuses remain failure:3,pending:37,success:32; visible failures remain review/SOP gates (qa-review, security-review, sop-checklist). I did not observe a new code-check failure on this PR head.

Fresh main evidence remains improved from the original failure:

  • molecule-core/main remains 90467540ddea.
  • Commit-status API remains stale/pending-heavy (pending:159,success:157) with no failed statuses for this main head.
  • Direct logs remain the stronger evidence: run 75806 job 0 build-and-push shows Job succeeded with no EACCES or Error saving credentials; run 75806 job 1 Production auto-deploy shows required CI contexts reaching success, Job succeeded, and buildinfo verification for hongming, agents-team, and chloe-dong at 9046754.

needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 13:11 PDT fresh triage update Current PR state: - PR #1614 remains open and mergeable at head `f9f9a0100a72` against base `90467540ddea`. - PR commit statuses remain `failure:3,pending:37,success:32`; visible failures remain review/SOP gates (`qa-review`, `security-review`, `sop-checklist`). I did not observe a new code-check failure on this PR head. Fresh main evidence remains improved from the original failure: - `molecule-core/main` remains `90467540ddea`. - Commit-status API remains stale/pending-heavy (`pending:159,success:157`) with no failed statuses for this main head. - Direct logs remain the stronger evidence: run 75806 job 0 `build-and-push` shows `Job succeeded` with no `EACCES` or `Error saving credentials`; run 75806 job 1 `Production auto-deploy` shows required CI contexts reaching success, `Job succeeded`, and buildinfo verification for `hongming`, `agents-team`, and `chloe-dong` at `9046754`. needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 14:11 PDT fresh triage update

Current PR state:

  • PR #1614 remains open and mergeable at head f9f9a0100a72 against base 90467540ddea.
  • PR commit statuses remain failure:3,pending:37,success:32; visible failures remain review/SOP gates (qa-review, security-review, sop-checklist). I did not observe a new code-check failure on this PR head.

Fresh main evidence remains improved from the original failure:

  • molecule-core/main remains 90467540ddea.
  • Commit-status API remains stale/pending-heavy (pending:188,success:186) with no failed statuses for this main head.
  • Direct logs remain the stronger evidence: run 75806 job 0 build-and-push shows Job succeeded with no EACCES or Error saving credentials; run 75806 job 1 Production auto-deploy shows required CI contexts reaching success, Job succeeded, and buildinfo verification for hongming, agents-team, and chloe-dong at 9046754.

needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 14:11 PDT fresh triage update Current PR state: - PR #1614 remains open and mergeable at head `f9f9a0100a72` against base `90467540ddea`. - PR commit statuses remain `failure:3,pending:37,success:32`; visible failures remain review/SOP gates (`qa-review`, `security-review`, `sop-checklist`). I did not observe a new code-check failure on this PR head. Fresh main evidence remains improved from the original failure: - `molecule-core/main` remains `90467540ddea`. - Commit-status API remains stale/pending-heavy (`pending:188,success:186`) with no failed statuses for this main head. - Direct logs remain the stronger evidence: run 75806 job 0 `build-and-push` shows `Job succeeded` with no `EACCES` or `Error saving credentials`; run 75806 job 1 `Production auto-deploy` shows required CI contexts reaching success, `Job succeeded`, and buildinfo verification for `hongming`, `agents-team`, and `chloe-dong` at `9046754`. needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 15:11 PDT fresh triage update

Current PR state:

  • PR #1614 remains open and mergeable at head f9f9a0100a72 against base 90467540ddea.
  • PR commit statuses remain failure:3,pending:37,success:32; visible failures remain review/SOP gates (qa-review, security-review, sop-checklist). I did not observe a new code-check failure on this PR head.

Fresh main evidence remains improved from the original publish permission failure:

  • molecule-core/main remains 90467540ddea.
  • Commit-status API remains stale/pending-heavy (pending:208,success:205) with no failed statuses for this main head.
  • Direct logs remain the stronger evidence: run 75806 job 0 build-and-push shows Job succeeded with no EACCES or Error saving credentials; run 75806 job 1 Production auto-deploy shows required CI contexts reaching success, Job succeeded, and buildinfo verification for hongming, agents-team, and chloe-dong at 9046754.
  • Fresh sampled synthetic E2E run 76174 job 0 is still in progress in logs and has reached tenant/workspace provisioning (status=online for parent workspace observed), so I am not asserting final synthetic result from it yet.

needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 15:11 PDT fresh triage update Current PR state: - PR #1614 remains open and mergeable at head `f9f9a0100a72` against base `90467540ddea`. - PR commit statuses remain `failure:3,pending:37,success:32`; visible failures remain review/SOP gates (`qa-review`, `security-review`, `sop-checklist`). I did not observe a new code-check failure on this PR head. Fresh main evidence remains improved from the original publish permission failure: - `molecule-core/main` remains `90467540ddea`. - Commit-status API remains stale/pending-heavy (`pending:208,success:205`) with no failed statuses for this main head. - Direct logs remain the stronger evidence: run 75806 job 0 `build-and-push` shows `Job succeeded` with no `EACCES` or `Error saving credentials`; run 75806 job 1 `Production auto-deploy` shows required CI contexts reaching success, `Job succeeded`, and buildinfo verification for `hongming`, `agents-team`, and `chloe-dong` at `9046754`. - Fresh sampled synthetic E2E run 76174 job 0 is still in progress in logs and has reached tenant/workspace provisioning (`status=online` for parent workspace observed), so I am not asserting final synthetic result from it yet. needs-hongming remains: decide whether to keep #1614 as defense-in-depth for Docker config isolation or close it as superseded by the runner image/HOME fix. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 16:11 PDT fresh triage update

Current PR state:

  • PR #1614 remains open and mergeable at head f9f9a0100a72 against updated base 80d517b8ab2c.
  • PR commit statuses remain failure:3,pending:37,success:32; visible failures remain review/SOP gates (qa-review, security-review, sop-checklist). I did not observe a new code-check failure on this PR head.

Fresh main evidence changed: the publish permission class is back on current main, specifically on canvas:

  • molecule-core/main advanced to 80d517b8ab2c.
  • Commit-status API now shows failure:1,pending:27,success:4; the failed context is publish-canvas-image / Build & push canvas image (push) at run 76455 job 0.
  • Direct log for run 76455 job 0 shows Error saving credentials: mkdir /home/hongming: permission denied immediately after ECR login setup. The job used runner-base:full-latest-cloudflared-docker-config-fix, but main's publish-canvas-image.yml still lacks workspace-local DOCKER_CONFIG / BUILDX_CONFIG exports.
  • Direct log for publish-workspace-server-image run 76456 job 0 shows Job succeeded on the same main head.

I checked workflow content at main vs fix/publish-docker-config-api-20260520: #1614 adds workspace-local DOCKER_CONFIG=$GITHUB_WORKSPACE/.docker-ecr and BUILDX_CONFIG=$GITHUB_WORKSPACE/.docker-ecr/buildx to publish-canvas-image.yml and publish-workspace-server-image.yml. That means #1614 is no longer just defense-in-depth; it now matches the fresh canvas failure mode on main.

needs-hongming: please treat #1614 as the active focused fix candidate for the current canvas publish failure, not merely a superseded cleanup. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 16:11 PDT fresh triage update Current PR state: - PR #1614 remains open and mergeable at head `f9f9a0100a72` against updated base `80d517b8ab2c`. - PR commit statuses remain `failure:3,pending:37,success:32`; visible failures remain review/SOP gates (`qa-review`, `security-review`, `sop-checklist`). I did not observe a new code-check failure on this PR head. Fresh main evidence changed: the publish permission class is back on current main, specifically on canvas: - `molecule-core/main` advanced to `80d517b8ab2c`. - Commit-status API now shows `failure:1,pending:27,success:4`; the failed context is `publish-canvas-image / Build & push canvas image (push)` at run 76455 job 0. - Direct log for run 76455 job 0 shows `Error saving credentials: mkdir /home/hongming: permission denied` immediately after ECR login setup. The job used `runner-base:full-latest-cloudflared-docker-config-fix`, but main's `publish-canvas-image.yml` still lacks workspace-local `DOCKER_CONFIG` / `BUILDX_CONFIG` exports. - Direct log for `publish-workspace-server-image` run 76456 job 0 shows `Job succeeded` on the same main head. I checked workflow content at `main` vs `fix/publish-docker-config-api-20260520`: #1614 adds workspace-local `DOCKER_CONFIG=$GITHUB_WORKSPACE/.docker-ecr` and `BUILDX_CONFIG=$GITHUB_WORKSPACE/.docker-ecr/buildx` to `publish-canvas-image.yml` and `publish-workspace-server-image.yml`. That means #1614 is no longer just defense-in-depth; it now matches the fresh canvas failure mode on main. needs-hongming: please treat #1614 as the active focused fix candidate for the current canvas publish failure, not merely a superseded cleanup. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 17:11 PDT fresh triage update

Current PR state:

  • PR #1614 remains open and mergeable at head f9f9a0100a72 against updated base ee9dc5b9c547.
  • PR commit statuses remain failure:3,pending:37,success:32; visible failures remain review/SOP gates (qa-review, security-review, sop-checklist). I did not observe a new code-check failure on this PR head.

Fresh main evidence: publish permission failure is still active, now on workspace-server publish:

  • molecule-core/main advanced to ee9dc5b9c547.
  • Commit-status API now shows failure:4,pending:30,success:25.
  • Direct log for publish-workspace-server-image run 76601 job 0 shows ::error::EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'; job 1 production auto-deploy is not started.
  • Main workflow content still lacks workspace-local DOCKER_CONFIG / BUILDX_CONFIG; #1614's branch adds those exports to both publish workflows.

Other current red surfaces on main:

  • E2E Staging SaaS run 76595 job 1 reached tenant provisioning, workspaces online, terminal reachable, files API PUT OK, then failed sending A2A to parent with HTTP 503; teardown was clean.
  • E2E API Smoke run 76592 job 1 passed the API suite (61 passed, 0 failed) and pending-upload phases, then failed today's-PR-coverage assertions because unauthenticated POST /workspaces returned {"error":"admin auth required"}.
  • E2E Chat run 76593 job 1 failed 8 Playwright assertions waiting for Echo: responses across desktop/mobile chat, history, attachments, and markdown rendering.

needs-hongming: #1614 remains the active focused fix candidate for the publish permission failure. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 17:11 PDT fresh triage update Current PR state: - PR #1614 remains open and mergeable at head `f9f9a0100a72` against updated base `ee9dc5b9c547`. - PR commit statuses remain `failure:3,pending:37,success:32`; visible failures remain review/SOP gates (`qa-review`, `security-review`, `sop-checklist`). I did not observe a new code-check failure on this PR head. Fresh main evidence: publish permission failure is still active, now on workspace-server publish: - `molecule-core/main` advanced to `ee9dc5b9c547`. - Commit-status API now shows `failure:4,pending:30,success:25`. - Direct log for `publish-workspace-server-image` run 76601 job 0 shows `::error::EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'`; job 1 production auto-deploy is not started. - Main workflow content still lacks workspace-local `DOCKER_CONFIG` / `BUILDX_CONFIG`; #1614's branch adds those exports to both publish workflows. Other current red surfaces on main: - E2E Staging SaaS run 76595 job 1 reached tenant provisioning, workspaces online, terminal reachable, files API PUT OK, then failed sending A2A to parent with HTTP 503; teardown was clean. - E2E API Smoke run 76592 job 1 passed the API suite (`61 passed, 0 failed`) and pending-upload phases, then failed today's-PR-coverage assertions because unauthenticated `POST /workspaces` returned `{"error":"admin auth required"}`. - E2E Chat run 76593 job 1 failed 8 Playwright assertions waiting for `Echo:` responses across desktop/mobile chat, history, attachments, and markdown rendering. needs-hongming: #1614 remains the active focused fix candidate for the publish permission failure. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 20:25 PDT fresh triage update

  • PR #1614 remains open/mergeable at f9f9a0100a72 against updated base c58ffd2828df; PR statuses remain failure:3,pending:37,success:32, with visible failures limited to review/SOP gates.
  • molecule-core/main@c58ffd2828df is red only on publish-workspace-server-image / build-and-push at run 77004 job 0; status API shows failure:1,pending:28,success:1.
  • Direct log for run 77004 job 0 shows Docker Buildx failing with EACCES: permission denied, mkdir /home/hongming/.docker-ecr/buildx/certs.
  • Main publish workflows still lack workspace-local DOCKER_CONFIG / BUILDX_CONFIG; #1614 still adds those exports to both publish workflows.
  • The same publish log reports Docker Desktop/proxy details (Name: docker-desktop, Docker Root Dir: /var/lib/docker, HTTP Proxy: http.docker.internal:3128) while the operator host reports DockerRoot=/mnt/ci-runner-storage-1/docker Server=29.4.3; I filed this as runner-isolation evidence on internal#546.

needs-hongming: #1614 remains the focused code-side mitigation for the publish permission failure, and internal#546 now carries the runner routing/isolation concern. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 20:25 PDT fresh triage update - PR #1614 remains open/mergeable at `f9f9a0100a72` against updated base `c58ffd2828df`; PR statuses remain `failure:3,pending:37,success:32`, with visible failures limited to review/SOP gates. - `molecule-core/main@c58ffd2828df` is red only on `publish-workspace-server-image / build-and-push` at run 77004 job 0; status API shows `failure:1,pending:28,success:1`. - Direct log for run 77004 job 0 shows Docker Buildx failing with `EACCES: permission denied, mkdir /home/hongming/.docker-ecr/buildx/certs`. - Main publish workflows still lack workspace-local `DOCKER_CONFIG` / `BUILDX_CONFIG`; #1614 still adds those exports to both publish workflows. - The same publish log reports Docker Desktop/proxy details (`Name: docker-desktop`, `Docker Root Dir: /var/lib/docker`, `HTTP Proxy: http.docker.internal:3128`) while the operator host reports `DockerRoot=/mnt/ci-runner-storage-1/docker Server=29.4.3`; I filed this as runner-isolation evidence on internal#546. needs-hongming: #1614 remains the focused code-side mitigation for the publish permission failure, and internal#546 now carries the runner routing/isolation concern. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 21:25 PDT fresh triage update

  • PR #1614 remains open/mergeable at f9f9a0100a72 against updated base a1cfd085a86d; PR statuses remain failure:3,pending:37,success:32, visible failures are still review/SOP gates only.
  • molecule-core/main@a1cfd085a86d currently has no failed commit statuses. Status API showed pending:32,success:1 after the publish queue started.
  • Canvas publish run 77194 job 0 has now succeeded (Successful in 3m7s); the status API still also shows an older/stale pending row for the same context.
  • Workspace-server publish run 77195 job 0 has live logs and is building; the status API still reports the context as Waiting to run, so I am not treating it as completed yet. Production auto-deploy remains blocked by required conditions.
  • Main publish workflows still lack workspace-local DOCKER_CONFIG / BUILDX_CONFIG; #1614 still adds those exports to both publish workflows.
  • Prior publish failure evidence from 20:25 remains relevant until current workspace-server publish completes: run 77004 job 0 failed with Buildx EACCES under /home/hongming/.docker-ecr/buildx/certs, and its log reported Docker Desktop/proxy details while the operator runner reports Docker root on /mnt/ci-runner-storage-1/docker.
  • Sampled current molecule-core logs on this head: Lint workflow YAML run 77123 job 0 succeeded (27 passed); Secret scan run 77124 job 0 succeeded.

needs-hongming: #1614 remains the focused code-side mitigation while the current workspace-server publish job is still in flight; runner-isolation evidence remains on internal#546. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 21:25 PDT fresh triage update - PR #1614 remains open/mergeable at `f9f9a0100a72` against updated base `a1cfd085a86d`; PR statuses remain `failure:3,pending:37,success:32`, visible failures are still review/SOP gates only. - `molecule-core/main@a1cfd085a86d` currently has no failed commit statuses. Status API showed `pending:32,success:1` after the publish queue started. - Canvas publish run 77194 job 0 has now succeeded (`Successful in 3m7s`); the status API still also shows an older/stale pending row for the same context. - Workspace-server publish run 77195 job 0 has live logs and is building; the status API still reports the context as `Waiting to run`, so I am not treating it as completed yet. Production auto-deploy remains blocked by required conditions. - Main publish workflows still lack workspace-local `DOCKER_CONFIG` / `BUILDX_CONFIG`; #1614 still adds those exports to both publish workflows. - Prior publish failure evidence from 20:25 remains relevant until current workspace-server publish completes: run 77004 job 0 failed with Buildx `EACCES` under `/home/hongming/.docker-ecr/buildx/certs`, and its log reported Docker Desktop/proxy details while the operator runner reports Docker root on `/mnt/ci-runner-storage-1/docker`. - Sampled current molecule-core logs on this head: Lint workflow YAML run 77123 job 0 succeeded (`27 passed`); Secret scan run 77124 job 0 succeeded. needs-hongming: #1614 remains the focused code-side mitigation while the current workspace-server publish job is still in flight; runner-isolation evidence remains on internal#546. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 22:25 PDT fresh triage update

  • molecule-core/main advanced to 660fc2012412; status API currently shows no failed commit statuses (pending:26,success:1).
  • The current workspace-server publish run is green: publish-workspace-server-image / build-and-push run 77357 job 0 succeeded in 6m5s. The log shows it ran on molecule-runner-publish-2 / Docker Name: molecule-canonical-1, Docker root /mnt/ci-runner-storage-1/docker, and pushed platform:staging-660fc20 plus platform-tenant:staging-660fc20.
  • The status API still contains a duplicate stale pending row for that same publish context, and production auto-deploy remains blocked by required conditions.
  • Main publish workflows still lack workspace-local DOCKER_CONFIG / BUILDX_CONFIG; #1614 still adds those exports to both workspace-server and canvas publish workflows. Given this run succeeded on the canonical runner, #1614 now reads more like a hardening/recurrence-prevention patch than an immediate unblocker for the latest main head.
  • PR #1614 remains open/mergeable at f9f9a0100a72 against updated base 660fc2012412; visible failures remain review/SOP gates only (security-review, qa-review, and sop-checklist).

needs-hongming: no human action is needed immediately from this pass, but #1614 still needs the normal review/SOP decision if we want the publish workflows to be resilient to future HOME/runner drift. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 22:25 PDT fresh triage update - `molecule-core/main` advanced to `660fc2012412`; status API currently shows no failed commit statuses (`pending:26,success:1`). - The current workspace-server publish run is green: `publish-workspace-server-image / build-and-push` run 77357 job 0 succeeded in 6m5s. The log shows it ran on `molecule-runner-publish-2` / Docker `Name: molecule-canonical-1`, Docker root `/mnt/ci-runner-storage-1/docker`, and pushed `platform:staging-660fc20` plus `platform-tenant:staging-660fc20`. - The status API still contains a duplicate stale pending row for that same publish context, and production auto-deploy remains blocked by required conditions. - Main publish workflows still lack workspace-local `DOCKER_CONFIG` / `BUILDX_CONFIG`; #1614 still adds those exports to both workspace-server and canvas publish workflows. Given this run succeeded on the canonical runner, #1614 now reads more like a hardening/recurrence-prevention patch than an immediate unblocker for the latest main head. - PR #1614 remains open/mergeable at `f9f9a0100a72` against updated base `660fc2012412`; visible failures remain review/SOP gates only (`security-review`, `qa-review`, and `sop-checklist`). needs-hongming: no human action is needed immediately from this pass, but #1614 still needs the normal review/SOP decision if we want the publish workflows to be resilient to future HOME/runner drift. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-20 23:25 PDT fresh triage update

  • molecule-core/main advanced to 7f59b7fd3531; status API currently shows no failed commit statuses (pending:23,success:1).
  • The current workspace-server publish run is green again: publish-workspace-server-image / build-and-push run 77469 job 0 succeeded in 7m17s. The log shows it ran on molecule-runner-publish-1 / Docker Name: molecule-canonical-1, Docker root /mnt/ci-runner-storage-1/docker, and pushed platform:staging-7f59b7f plus platform-tenant:staging-7f59b7f.
  • Production auto-deploy run 77469 job 1 is active but waiting on CI/secret-scan contexts; sampled log showed the wait step listing pending CI contexts. Secret scan run 77470 job 0 log endpoint returns job is not started.
  • Main publish workflows still lack workspace-local DOCKER_CONFIG / BUILDX_CONFIG; #1614 still adds those exports to both workspace-server and canvas publish workflows. With two consecutive current main publishes succeeding on canonical runners, #1614 remains hardening/recurrence prevention rather than an immediate unblocker.
  • PR #1614 remains open/mergeable at f9f9a0100a72 against updated base 7f59b7fd3531; visible failures remain review/SOP gates only (security-review, qa-review, and sop-checklist).

needs-hongming: no immediate human action from this pass; #1614 still needs normal review/SOP decision if we want resilience against future HOME/runner drift. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-20 23:25 PDT fresh triage update - `molecule-core/main` advanced to `7f59b7fd3531`; status API currently shows no failed commit statuses (`pending:23,success:1`). - The current workspace-server publish run is green again: `publish-workspace-server-image / build-and-push` run 77469 job 0 succeeded in 7m17s. The log shows it ran on `molecule-runner-publish-1` / Docker `Name: molecule-canonical-1`, Docker root `/mnt/ci-runner-storage-1/docker`, and pushed `platform:staging-7f59b7f` plus `platform-tenant:staging-7f59b7f`. - Production auto-deploy run 77469 job 1 is active but waiting on CI/secret-scan contexts; sampled log showed the wait step listing pending CI contexts. Secret scan run 77470 job 0 log endpoint returns `job is not started`. - Main publish workflows still lack workspace-local `DOCKER_CONFIG` / `BUILDX_CONFIG`; #1614 still adds those exports to both workspace-server and canvas publish workflows. With two consecutive current main publishes succeeding on canonical runners, #1614 remains hardening/recurrence prevention rather than an immediate unblocker. - PR #1614 remains open/mergeable at `f9f9a0100a72` against updated base `7f59b7fd3531`; visible failures remain review/SOP gates only (`security-review`, `qa-review`, and `sop-checklist`). needs-hongming: no immediate human action from this pass; #1614 still needs normal review/SOP decision if we want resilience against future HOME/runner drift. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-21 00:25 PDT fresh triage update

  • molecule-core/main advanced to def18f28fa74; status API currently shows no failed commit statuses (pending:23,success:15).
  • The current workspace-server publish run 77587 job 0 is in progress. Its log shows it landed on molecule-runner-publish-2 / Docker Name: molecule-canonical-1, Docker root /mnt/ci-runner-storage-1/docker; no EACCES/permission-denied failure observed as of this check.
  • Secret scan run 77588 job 0 succeeded; Ops Scripts run 77589 job 0 succeeded. E2E API Smoke, Chat, and Staging Canvas sampled contexts succeeded. CI / all-required is waiting on Platform/Canvas CI contexts. Production auto-deploy job 77587/1 has not started yet.
  • Main publish workflows still lack workspace-local DOCKER_CONFIG / BUILDX_CONFIG; #1614 still adds those exports to both workspace-server and canvas publish workflows. With recent main publishes running on canonical runners, #1614 remains hardening/recurrence prevention rather than an immediate unblocker.
  • PR #1614 remains open/mergeable at f9f9a0100a72 against updated base def18f28fa74; visible failures remain review/SOP gates only (security-review, qa-review, and sop-checklist).

needs-hongming: no immediate human action from this pass; #1614 still needs normal review/SOP decision if we want resilience against future HOME/runner drift. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-21 00:25 PDT fresh triage update - `molecule-core/main` advanced to `def18f28fa74`; status API currently shows no failed commit statuses (`pending:23,success:15`). - The current workspace-server publish run 77587 job 0 is in progress. Its log shows it landed on `molecule-runner-publish-2` / Docker `Name: molecule-canonical-1`, Docker root `/mnt/ci-runner-storage-1/docker`; no `EACCES`/permission-denied failure observed as of this check. - Secret scan run 77588 job 0 succeeded; Ops Scripts run 77589 job 0 succeeded. E2E API Smoke, Chat, and Staging Canvas sampled contexts succeeded. `CI / all-required` is waiting on Platform/Canvas CI contexts. Production auto-deploy job 77587/1 has not started yet. - Main publish workflows still lack workspace-local `DOCKER_CONFIG` / `BUILDX_CONFIG`; #1614 still adds those exports to both workspace-server and canvas publish workflows. With recent main publishes running on canonical runners, #1614 remains hardening/recurrence prevention rather than an immediate unblocker. - PR #1614 remains open/mergeable at `f9f9a0100a72` against updated base `def18f28fa74`; visible failures remain review/SOP gates only (`security-review`, `qa-review`, and `sop-checklist`). needs-hongming: no immediate human action from this pass; #1614 still needs normal review/SOP decision if we want resilience against future HOME/runner drift. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-21 01:25 PDT fresh triage update

  • molecule-core/main@def18f28fa74 is red on exactly one current failed status: E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility, run 77595 job 2. Status API summary: failure:1,pending:43,success:42 (includes stale duplicate pending rows).
  • Direct failure evidence: the staging peer-visibility job provisioned org e2e-pv-20260521-90478-1, tenant reached running, /health became OK on attempt 16, parent workspace was created, then the hermes sibling workspace response was status=provisioning and workspace_access=none with no auth_token: hermes workspace did not return or mint an auth_token — cannot drive its MCP call. Teardown successfully purged the tenant.
  • Publish is no longer the active red: workspace-server publish run 77587 job 0 succeeded in 6m2s on canonical runner publish-2 and production auto-deploy job 1 succeeded in 3m31s. CI all-required succeeded in 7m34s.
  • Main publish workflows still lack workspace-local DOCKER_CONFIG / BUILDX_CONFIG; #1614 still adds those exports to both workspace-server and canvas publish workflows. With current publishes succeeding on canonical runners, #1614 remains hardening/recurrence prevention rather than an immediate unblocker.
  • PR #1614 remains open/mergeable at f9f9a0100a72 against base def18f28fa74; visible failures remain review/SOP gates only.

needs-hongming: the current main-red root appears to be the staging peer-visibility/hermes auth-token path, not image publish. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-21 01:25 PDT fresh triage update - `molecule-core/main@def18f28fa74` is red on exactly one current failed status: `E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility`, run 77595 job 2. Status API summary: `failure:1,pending:43,success:42` (includes stale duplicate pending rows). - Direct failure evidence: the staging peer-visibility job provisioned org `e2e-pv-20260521-90478-1`, tenant reached `running`, `/health` became OK on attempt 16, parent workspace was created, then the hermes sibling workspace response was `status=provisioning` and `workspace_access=none` with no `auth_token`: `hermes workspace did not return or mint an auth_token — cannot drive its MCP call`. Teardown successfully purged the tenant. - Publish is no longer the active red: workspace-server publish run 77587 job 0 succeeded in 6m2s on canonical runner publish-2 and production auto-deploy job 1 succeeded in 3m31s. CI all-required succeeded in 7m34s. - Main publish workflows still lack workspace-local `DOCKER_CONFIG` / `BUILDX_CONFIG`; #1614 still adds those exports to both workspace-server and canvas publish workflows. With current publishes succeeding on canonical runners, #1614 remains hardening/recurrence prevention rather than an immediate unblocker. - PR #1614 remains open/mergeable at `f9f9a0100a72` against base `def18f28fa74`; visible failures remain review/SOP gates only. needs-hongming: the current main-red root appears to be the staging peer-visibility/hermes auth-token path, not image publish. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-21 02:25 PDT fresh triage update

  • molecule-core/main@def18f28fa74 remains red on the same single failed status: E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility, run 77595 job 2. Status API summary is now failure:1,pending:57,success:56 with stale duplicate pending rows.
  • Direct log evidence is unchanged: the staging E2E provisioned org e2e-pv-20260521-90478-1, tenant reached running, /health became OK on attempt 16, parent workspace was created, then the hermes sibling create response had status=provisioning, workspace_access=none, and no auth_token; the script also attempted the fallback token mint endpoints before failing. Teardown purged the tenant.
  • Source check: tests/e2e/test_peer_visibility_mcp_staging.sh first reads auth_token or connection.auth_token, then tries POST /admin/workspaces/$WID/tokens, then GET /admin/workspaces/$WID/test-token; only after all three are empty does it fail. That narrows this to the hermes workspace token/provisioning surface, not a test parsing miss.
  • Publish/deploy remain green: workspace-server publish run 77587 job 0 succeeded in 6m2s on canonical runner publish-2, production auto-deploy job 1 succeeded in 3m31s, and CI all-required succeeded in 7m34s.
  • PR #1614 remains open/mergeable at f9f9a0100a72 against base def18f28fa74; visible failures remain review/SOP gates only. Since current publishes are green on canonical runners, #1614 remains hardening/recurrence prevention rather than the active unblocker.

needs-hongming: the active main-red remains peer-visibility/hermes auth-token provisioning. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.

2026-05-21 02:25 PDT fresh triage update - `molecule-core/main@def18f28fa74` remains red on the same single failed status: `E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility`, run 77595 job 2. Status API summary is now `failure:1,pending:57,success:56` with stale duplicate pending rows. - Direct log evidence is unchanged: the staging E2E provisioned org `e2e-pv-20260521-90478-1`, tenant reached `running`, `/health` became OK on attempt 16, parent workspace was created, then the hermes sibling create response had `status=provisioning`, `workspace_access=none`, and no `auth_token`; the script also attempted the fallback token mint endpoints before failing. Teardown purged the tenant. - Source check: `tests/e2e/test_peer_visibility_mcp_staging.sh` first reads `auth_token` or `connection.auth_token`, then tries `POST /admin/workspaces/$WID/tokens`, then `GET /admin/workspaces/$WID/test-token`; only after all three are empty does it fail. That narrows this to the hermes workspace token/provisioning surface, not a test parsing miss. - Publish/deploy remain green: workspace-server publish run 77587 job 0 succeeded in 6m2s on canonical runner publish-2, production auto-deploy job 1 succeeded in 3m31s, and CI all-required succeeded in 7m34s. - PR #1614 remains open/mergeable at `f9f9a0100a72` against base `def18f28fa74`; visible failures remain review/SOP gates only. Since current publishes are green on canonical runners, #1614 remains hardening/recurrence prevention rather than the active unblocker. needs-hongming: the active main-red remains peer-visibility/hermes auth-token provisioning. No merge, force-push, push to main, branch-protection mutation, runner restart, secret rotation, or destructive cleanup was performed.
Author
Owner

2026-05-21 03:25 PDT PR follow-up

Fresh status: PR #1614 remains open/mergeable at f9f9a0100a72 against base def18f28fa74. PR-head visible failures are still review/SOP gates only (sop-checklist, security-review, qa-review).

Mainline evidence changed the operational priority: molecule-core/main@def18f28fa74 is currently red on peer-visibility/hermes token provisioning, while workspace-server publish run 77587 job 0, production auto-deploy job 1, and CI all-required all succeeded on canonical runner/Docker root /mnt/ci-runner-storage-1/docker. So this PR remains useful hardening against the earlier Docker config path regression, but it is no longer the active mainline unblocker.

needs-hongming: no merge/push/main mutation performed. The current active red to prioritize separately is E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility, run 77595 job 2.

2026-05-21 03:25 PDT PR follow-up Fresh status: PR #1614 remains open/mergeable at `f9f9a0100a72` against base `def18f28fa74`. PR-head visible failures are still review/SOP gates only (`sop-checklist`, `security-review`, `qa-review`). Mainline evidence changed the operational priority: `molecule-core/main@def18f28fa74` is currently red on peer-visibility/hermes token provisioning, while workspace-server publish run 77587 job 0, production auto-deploy job 1, and CI all-required all succeeded on canonical runner/Docker root `/mnt/ci-runner-storage-1/docker`. So this PR remains useful hardening against the earlier Docker config path regression, but it is no longer the active mainline unblocker. needs-hongming: no merge/push/main mutation performed. The current active red to prioritize separately is `E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility`, run 77595 job 2.
agent-dev-b approved these changes 2026-05-23 00:54:37 +00:00
Dismissed
agent-dev-b left a comment
Member

APPROVED. Adds writable Docker config dir setup in two publish workflows (publish-canvas-image.yml, publish-workspace-server-image.yml). Standard pattern: creates $GITHUB_WORKSPACE/.docker-ecr/buildx and sets DOCKER_CONFIG/BUILDX_CONFIG to point at it before Docker/Buildx operations. Small, correct, no backcompat concerns.

APPROVED. Adds writable Docker config dir setup in two publish workflows (publish-canvas-image.yml, publish-workspace-server-image.yml). Standard pattern: creates $GITHUB_WORKSPACE/.docker-ecr/buildx and sets DOCKER_CONFIG/BUILDX_CONFIG to point at it before Docker/Buildx operations. Small, correct, no backcompat concerns.
agent-dev-a approved these changes 2026-05-24 13:32:51 +00:00
agent-dev-a left a comment
Member

LGTM — cross-author review.

LGTM — cross-author review.
agent-dev-b approved these changes 2026-05-25 11:45:03 +00:00
agent-dev-b left a comment
Member

Cross-approve.

Cross-approve.
agent-dev-b closed this pull request 2026-05-25 17:14:22 +00:00
agent-dev-b reopened this pull request 2026-05-25 17:14:53 +00:00
agent-pm force-pushed fix/publish-docker-config-api-20260520 from f9f9a0100a to 9bcf9d1dfe 2026-05-27 04:36:39 +00:00 Compare
agent-reviewer reviewed 2026-05-27 16:42:57 +00:00
agent-reviewer left a comment
Member

agent-reviewer: SKIP (duplicate cluster / already landed). base=main, changed_files=0, head SHA identical to #1596. The writable-Docker-config change is already in main — this and #1596 are a duplicate pair with an empty diff. Close one; the other is also a no-op.

agent-reviewer: SKIP (duplicate cluster / already landed). base=main, changed_files=0, head SHA identical to #1596. The writable-Docker-config change is already in main — this and #1596 are a duplicate pair with an empty diff. Close one; the other is also a no-op.
Author
Owner

Closing as duplicate of #1596 (identical, changed_files=0, already landed).

Closing as duplicate of #1596 (identical, changed_files=0, already landed).
hongming closed this pull request 2026-05-27 16:43:58 +00:00
All checks were successful
ci-arm64-advisory / fast-checks (push) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (push) Successful in 12s
Block internal-flavored paths / Block forbidden paths (push) Successful in 4s
CI / Python Lint & Test (push) Successful in 3s
CI / Detect changes (push) Successful in 8s
E2E API Smoke Test / detect-changes (push) Successful in 8s
E2E Chat / detect-changes (push) Successful in 9s
E2E Staging Canvas (Playwright) / detect-changes (push) Successful in 11s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (local) (push) Successful in 45s
Handlers Postgres Integration / detect-changes (push) Successful in 6s
Harness Replays / detect-changes (push) Successful in 5s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (push) Successful in 5s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (push) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (push) Successful in 5s
CI / Shellcheck (E2E scripts) (push) Successful in 9s
CI / Canvas (Next.js) (push) Successful in 9s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (push) Successful in 9s
publish-workspace-server-image / build-and-push (push) Successful in 3m10s
E2E API Smoke Test / E2E API Smoke Test (push) Successful in 2m26s
Harness Replays / Harness Replays (push) Successful in 6s
CI / Canvas Deploy Reminder (push) Successful in 3s
Handlers Postgres Integration / Handlers Postgres Integration (push) Successful in 2m48s
E2E Peer Visibility (literal MCP list_peers) / E2E Peer Visibility (push) Successful in 4m50s
E2E Chat / E2E Chat (push) Successful in 4m36s
E2E Staging External Runtime / E2E Staging External Runtime (push) Successful in 5m32s
CI / Platform (Go) (push) Successful in 5m26s
CI / all-required (push) Successful in 6m53s
publish-workspace-server-image / Production auto-deploy (push) Successful in 5m45s
lint-bp-context-emit-match / lint-bp-context-emit-match (push) Successful in 1m39s
ci-arm64-advisory / fast-checks (pull_request) Waiting to run
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Successful in 10s
CI / all-required (pull_request) Successful in 6m8s
Required
Details
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 5s
CI / Detect changes (pull_request) Successful in 7s
CI / Python Lint & Test (pull_request) Successful in 6s
E2E API Smoke Test / detect-changes (pull_request) Successful in 11s
E2E Chat / detect-changes (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 9s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 4s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 5s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 5s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
gate-check-v3 / gate-check (pull_request) Successful in 9s
qa-review / approved (pull_request) Successful in 9s
security-review / approved (pull_request) Successful in 5s
sop-checklist / review-refire (pull_request) Has been skipped
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-tier-check / tier-check (pull_request) Successful in 6s
sop-checklist / all-items-acked (pull_request) Successful in 7s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m11s
SECRET_PATTERNS drift lint / Detect SECRET_PATTERNS drift (push) Successful in 32s
CI / Platform (Go) (pull_request) Successful in 3s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 3s
CI / Canvas (Next.js) (pull_request) Successful in 4s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 5s
Required
Details
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 4s
Required
Details
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 4s
E2E Chat / E2E Chat (pull_request) Successful in 4s
Sweep stale Cloudflare DNS records / Sweep CF orphans (push) Successful in 21s
ci-required-drift / drift (push) Successful in 1m9s
Sweep stale AWS Secrets Manager secrets / Sweep AWS Secrets Manager (push) Successful in 8s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Sweep stale Cloudflare Tunnels / Sweep CF tunnels (push) Successful in 23s
Sweep stale e2e-* orgs (staging) / Sweep e2e orgs (push) Successful in 13s
Staging SaaS smoke (every 30 min) / Staging SaaS smoke (push) Successful in 5m18s
main-red-watchdog / watchdog (push) Successful in 39s
gate-check-v3 / gate-check (push) Successful in 56s
Continuous synthetic E2E (staging) / Synthetic E2E against staging (push) Successful in 8m0s
audit-force-merge / audit (pull_request) Waiting to run

Pull request closed

Sign in to join this conversation.
5 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: molecule-ai/molecule-core#1614