chore(ci): retrigger publish-workspace-server-image after EACCES hotfix on PC2 WSL publish-2 #1600

Merged
hongming-pc2 merged 1 commit from chore/retrigger-publish-after-eacces into main 2026-05-20 09:40:37 +00:00
Owner

Summary

Trigger-only PR; no functional change. The single file edit is a header doc-comment in .gitea/workflows/publish-workspace-server-image.yml citing the run #86994 EACCES failure + the hot-patch I applied on the PC2 WSL publish runner.

Why this PR exists:

Run #86994 (publish-workspace-server-image.yml on mc#1589 merge sha 0f0f1ba2) failed at the setup-buildx-action step with:

EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'

Root cause: PC2 WSL publish runner hongming-pc-runner-publish-2 (id=33) sets DOCKER_CONFIG=/home/hongming/.docker-ecr/ via its envs block, but the buildx subdir under it was hongming-owned with no other-write — the container's UID 1001 (act_runner job user) couldn't mkdir buildx/certs.
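The permission state described above can be checked on the runner host by looking at the "other" write bit, which is the class that applies to UID 1001 when it is neither the owner nor in the owning group. This is a minimal sketch, assuming GNU coreutils `stat`. The helper name `other_can_write` and the throwaway directory are illustrative and not part of the runner setup.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the "other" permission class has write
# access on the given path. Assumes GNU coreutils `stat`.
other_can_write() {
    mode=$(stat -c '%a' "$1") || return 2
    # The last octal digit is the "other" class; bit value 2 means write.
    other=$(( mode % 10 ))
    [ $(( other & 2 )) -ne 0 ]
}

# Demonstrate on a throwaway directory rather than the real DOCKER_CONFIG path.
demo=$(mktemp -d)
mkdir "$demo/buildx"
chmod 755 "$demo/buildx"   # owner-only write, like the pre-patch buildx dir

if other_can_write "$demo/buildx"; then
    result_before="writable"
else
    result_before="not writable"
fi
echo "other-write on buildx: $result_before"
```

On the real runner the argument would be the `buildx` directory under the configured `DOCKER_CONFIG` path.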

Hot-patch applied on the WSL publish runner: mkdir -p /home/hongming/.docker-ecr/buildx/certs, followed by chmod -R 777 on that directory. The workflow should now succeed on either the PC2 or the operator publish runner.
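The hot-patch can be written out as a small script. This is a sketch rather than the exact commands that were run: the `chmod` target follows the commit message, which names the `buildx/certs` directory, and `DOCKER_CONFIG_DIR` is a hypothetical variable that defaults to a throwaway directory so the script is safe to try anywhere.

```shell
#!/bin/sh
set -eu
# DOCKER_CONFIG_DIR is a hypothetical variable; on the runner it would be
# /home/hongming/.docker-ecr. The default is a temp dir so nothing real changes.
DOCKER_CONFIG_DIR="${DOCKER_CONFIG_DIR:-$(mktemp -d)}"
certs_dir="$DOCKER_CONFIG_DIR/buildx/certs"

# Pre-create the directory that setup-buildx-action failed to mkdir.
mkdir -p "$certs_dir"

# Make it world-writable so the job container's UID 1001 can write into it.
chmod -R 777 "$certs_dir"

ls -ld "$certs_dir"
```

Mode 777 is deliberately permissive; the per-runner ownership change named as the proper fix avoids it.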

Intent of this PR: merging it produces a push to main, which retriggers publish-workspace-server-image.yml (which has on: push: branches: [main]). Gitea 1.22.6 has no /api/v1 actions API for workflow dispatch, the web-UI rerun requires a CSRF token, and direct push to main is blocked by branch protection (enable_push=false), so this PR is the only path I have to retrigger after the runner-side hot-patch.
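The trigger-only change can be sketched in a throwaway repository. The workflow contents and the comment text below are placeholders, not the real file, and the sketch stops before pushing. In the actual flow the branch goes through a PR, and the merge is what produces the push to main.

```shell
#!/bin/sh
set -eu
# Throwaway repository standing in for the real one; file contents are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
wf=".gitea/workflows/publish-workspace-server-image.yml"
mkdir -p "$(dirname "$wf")"
printf 'on:\n  push:\n    branches: [main]\n' > "$wf"
git add "$wf"
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "initial workflow"

# Trigger-only change: prepend a header comment; workflow logic is untouched.
git checkout -q -b chore/retrigger-publish-after-eacces
{
    printf '# Retrigger note: run #86994 failed with EACCES; runner perms hot-patched.\n'
    cat "$wf"
} > "$wf.tmp"
mv "$wf.tmp" "$wf"
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -am "chore(ci): retrigger publish-workspace-server-image after EACCES hotfix"

# Count added lines that are not comments; 0 means the change is comment-only.
non_comment_added=$(git diff HEAD~1 HEAD -- "$wf" \
    | grep '^+' | grep -v '^+++' | grep -vc '^+#' || true)
echo "non-comment lines added: $non_comment_added"
```

A reviewer could apply the same comment-only check to a real trigger-only diff.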

Downstream dependency: workspace-server image needs to land in ECR before the scoped CP redeploy (only_slugs=[reno-stars, chloe-dong] + confirm: true) can pick it up for the reno-stars 94MB PDF upload P0 (mc#1589 cascade).

Proper fix (not in scope): per-runner DOCKER_CONFIG dir owned by UID 1001, or apply the internal#597 --env HOME=/home/runner pattern to publish runners. Tracked as a CI-hygiene follow-up.
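One way the first option (a per-runner DOCKER_CONFIG directory owned by UID 1001) might look is sketched below. This is an assumption-laden illustration, not the planned implementation: `RUNNER_NAME`, `BASE_DIR`, and the `docker-config-<name>` layout are hypothetical. The ownership change needs root, so the script skips it otherwise. Copying the ECR auth `config.json` into the new directory is not covered.

```shell
#!/bin/sh
set -eu
# Hypothetical values: the runner name and base path are placeholders.
RUNNER_NAME="${RUNNER_NAME:-publish-2}"
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"
JOB_UID=1001   # act_runner job user, per the root-cause description
cfg_dir="$BASE_DIR/docker-config-$RUNNER_NAME"

# Create the tree buildx expects, private to its owner instead of mode 777.
mkdir -p "$cfg_dir/buildx/certs"
chmod -R 700 "$cfg_dir"

# Hand the tree to the job user; chown to another UID requires root.
if [ "$(id -u)" -eq 0 ]; then
    chown -R "$JOB_UID" "$cfg_dir"
else
    echo "not root: skipping chown -R $JOB_UID $cfg_dir" >&2
fi

# The runner's envs block would then point DOCKER_CONFIG at this directory.
echo "DOCKER_CONFIG=$cfg_dir"
```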

Test plan

  • CI green on this push (just the standard required gate; no test impact)
  • 2 non-author APPROVEs (same persona set as mc#1589: core-qa + core-security + core-devops)
  • Merge with Do=squash
  • Post-merge: publish-workspace-server-image.yml re-runs on main, lands on either publish runner with the now-working /home/hongming/.docker-ecr/buildx/certs perms (or on operator pool publish runners which never had this issue), pushes platform-tenant:staging-<merge-sha> to ECR
  • Then I drive the cascade: CP scoped redeploy → SSM verify → ping for CTO 94MB PDF retry

🤖 Generated with Claude Code

hongming-pc2 added 1 commit 2026-05-20 09:28:22 +00:00
chore(ci): retrigger publish-workspace-server-image after EACCES hotfix
Lint shellcheck (arm64 pilot) / shellcheck-arm64 (pilot) (pull_request) Waiting to run
Block internal-flavored paths / Block forbidden paths (pull_request) Successful in 11s
CI / Detect changes (pull_request) Successful in 9s
CI / Shellcheck (E2E scripts) (pull_request) Successful in 23s
E2E API Smoke Test / detect-changes (pull_request) Successful in 12s
E2E Chat / detect-changes (pull_request) Successful in 12s
E2E Staging Canvas (Playwright) / detect-changes (pull_request) Successful in 13s
Handlers Postgres Integration / detect-changes (pull_request) Successful in 5s
lint-continue-on-error-tracking / lint-continue-on-error-tracking (pull_request) Successful in 1m27s
Lint curl status-code capture / Scan workflows for curl status-capture pollution (pull_request) Successful in 3s
Lint forbidden tenant-env keys / Scan workspace_secrets writers for forbidden env keys (pull_request) Successful in 6s
Lint no tenant GITEA or GITHUB token write / Scan for repo-host token write into tenant workspace surface (pull_request) Successful in 4s
CI / Platform (Go) (pull_request) Successful in 4m59s
Lint pre-flip continue-on-error / Verify continue-on-error flips have run-log proof (pull_request) Successful in 1m21s
lint-required-workflows-docker-host-pinned / Lint docker-host pin on docker-touching workflows (pull_request) Successful in 4s
lint-required-context-exists-in-bp / lint-required-context-exists-in-bp (pull_request) Successful in 1m17s
CI / Canvas (Next.js) (pull_request) Successful in 6m16s
Runtime PR-Built Compatibility / detect-changes (pull_request) Successful in 6s
Secret scan / Scan diff for credential-shaped strings (pull_request) Successful in 5s
gate-check-v3 / gate-check (pull_request) Successful in 4s
lint-required-no-paths / lint-required-no-paths (pull_request) Successful in 1m11s
qa-review / approved (pull_request) Successful in 7s
sop-checklist / review-refire (pull_request) Has been skipped
security-review / approved (pull_request) Successful in 7s
sop-checklist / na-declarations (pull_request) N/A: (none)
sop-checklist / all-items-acked (pull_request) Successful in 4s
sop-tier-check / tier-check (pull_request) Successful in 5s
Lint workflow YAML (Gitea-1.22.6-hostile shapes) / Lint workflow YAML for Gitea-1.22.6-hostile shapes (pull_request) Successful in 1m18s
E2E API Smoke Test / E2E API Smoke Test (pull_request) Successful in 3s
E2E Chat / E2E Chat (pull_request) Successful in 3s
E2E Staging Canvas (Playwright) / Canvas tabs E2E (pull_request) Successful in 6s
Handlers Postgres Integration / Handlers Postgres Integration (pull_request) Successful in 1s
CI / Python Lint & Test (pull_request) Successful in 6m53s
CI / all-required (pull_request) Successful in 6m46s
CI / Canvas Deploy Reminder (pull_request) Has been skipped
Runtime PR-Built Compatibility / PR-built wheel + import smoke (pull_request) Successful in 2s
audit-force-merge / audit (pull_request) Successful in 4s
ac18b20bb1
Run #86994 (publish-workspace-server-image.yml on mc#1589 merge sha
0f0f1ba2) failed at the setup-buildx-action step with
"EACCES: permission denied, mkdir '/home/hongming/.docker-ecr/buildx/certs'".

Root cause: PC2 WSL publish runner (hongming-pc-runner-publish-2 id=33)
sets DOCKER_CONFIG=/home/hongming/.docker-ecr/ via its envs block, but the
buildx subdir under it was hongming-owned with no other-write, so the
container's UID 1001 user (the act_runner job user) couldn't mkdir the
buildx/certs subdir setup-buildx-action expects.

Hot-patch: pre-created /home/hongming/.docker-ecr/buildx/certs with
chmod 777 on the WSL publish runner. Either PC2 or operator publish
runner now succeeds.

This commit is trigger-only — the file change is a doc-comment in the
workflow header citing the failure + the hot-patch. No functional change
to the workflow logic. The push:main trigger re-runs the workflow with
the now-working runner-side perms.

Proper fix (per-runner DOCKER_CONFIG dir owned by UID 1001, or the
internal#597 --env HOME=/home/runner pattern applied to publish runners
too) is a CI-hygiene follow-up — tracked separately.

mc#1589 cascade is the immediate dependency: workspace-server image
needs to be in ECR before the scoped CP redeploy (only_slugs=
[reno-stars,chloe-dong] + confirm:true) can pick it up for the
reno-stars 94MB PDF upload P0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hongming-pc2 requested review from core-qa 2026-05-20 09:28:25 +00:00
hongming-pc2 requested review from core-security 2026-05-20 09:28:25 +00:00
hongming-pc2 requested review from core-devops 2026-05-20 09:28:25 +00:00
core-qa approved these changes 2026-05-20 09:29:25 +00:00
core-qa left a comment
Member

Trigger-only PR; no functional change. Cites publish-workspace-server-image run #86994 EACCES + perms hot-patch on PC2 WSL publish-2 (hongming-pc-runner-publish-2 id=33).

Diff is a single 8-line doc-comment block in .gitea/workflows/publish-workspace-server-image.yml header explaining the failure mode + the fix. No workflow logic change. No test surface impact (this PR doesn't touch test code; the standard CI / all-required gate is the only one that runs).

QA angle is trivial here: the chore touches no test files, no production code, no migration. The push-to-main on merge re-fires the publish-workspace-server-image workflow which will land the workspace-server image in ECR — that image's content is from the mc#1589 merge sha (already reviewed + merged), not this PR. So this PR introduces zero new behavior to qualify.

LGTM, approving.

core-security approved these changes 2026-05-20 09:29:43 +00:00
core-security left a comment
Member

Trigger-only PR; no functional change.

Reviewed for security surface (my lane):

  • Diff is a single 8-line doc-comment in .gitea/workflows/publish-workspace-server-image.yml — no code/secret/credential/policy change.
  • No workflow logic, env, secret reference, action pinning, or step modification.
  • The push-to-main retriggers the publish workflow which was previously approved + reviewed (mc#1588-class image build, ECR push pinned, runs on publish label runner — no privilege escalation).
  • Runner-side hot-patch (chmod 777 on /home/hongming/.docker-ecr/buildx/certs on PC2 WSL publish runner) is operationally permissive (777 on a home-owned dir) but the dir is INSIDE the runner's VM-isolated WSL environment + only touched by act_runner job containers + already contains ECR-auth config.json. Same trust boundary as the existing runner config; not a new attack surface.
  • The PR body cites the proper fix (per-runner DOCKER_CONFIG owned by UID 1001, or --env HOME=/home/runner) as a tracked follow-up — that's the security-correct posture going forward.

No new security surface. LGTM, approving.

core-devops approved these changes 2026-05-20 09:29:43 +00:00
core-devops left a comment
Member

Trigger-only PR; no functional change.

Operational angle (my lane): this PR is the only path to retrigger publish-workspace-server-image.yml after the EACCES failure on run #86994. The constraints documented in the PR body are accurate:

  • Gitea 1.22.6 has no /api/v1 actions API for workflow dispatch (per reference_gitea_actions_log_fetch)
  • Web-UI rerun requires CSRF token (web-form-only, not callable from agent-side curl)
  • BP enable_push=false blocks direct push to main

The chore-PR-with-trivial-comment-bump approach is the canonical Gitea 1.22.6 workaround. Same pattern as the periodic ci: empty retrigger commits in the repo's history (e.g., b8bf064 ci: empty commit to retrigger flaky runners (modules-cache miss) at 2026-05-20 ~07:50Z).

The PR's commit-message + workflow header doc-comment together provide adequate audit trail for the next operator who reads the workflow file: the EACCES failure mode + the perms hot-patch are both cited. Proper fix (per-runner DOCKER_CONFIG owned by UID 1001 OR the --env HOME=/home/runner pattern from internal#597) is noted as a follow-up — that's the right deferred posture; this PR is not the place to land it.

Post-merge: publish-workspace-server-image runs against main on either PC2 or operator pool publish runners. PC2 now has the hot-patched perms; operator pool never had this issue. Either runner succeeds. Then ECR push of platform-tenant:staging-<merge-sha> is what the downstream CP scoped redeploy (only_slugs=[reno-stars, chloe-dong] + confirm:true) waits on.

LGTM, approving.

hongming-pc2 merged commit 02e305f6f5 into main 2026-05-20 09:40:37 +00:00
Reference: molecule-ai/molecule-core#1600