chore: sync main into staging (Option A, RFC #229 P2 + PR #285) #325

Closed
fullstack-engineer wants to merge 3 commits from chore/sync-main-to-staging into main

3 Commits

Author SHA1 Message Date
b4269a53e4 Merge origin/main into staging (sync for Option A)
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 2s
sop-tier-check / tier-check (pull_request) Successful in 34s
audit-force-merge / audit (pull_request) Has been skipped
Syncs main into staging per Option (A) of the 2026-05-10 release-coordination
plan. Brings the RFC #229 P2 batch (PRs #294, #295) and PR #285 (docker.sock
health-check guard) onto staging.

Commits absorbed from main:
- #285: ci: add Docker daemon health-check step before build
- RFC #229 P2-4: fix(workspace-server): respect MOLECULE_IMAGE_REGISTRY in
  imagewatch + admin_workspace_images
- RFC #229 P2-5: fix(workspace-server): emit Gitea/PyPI URLs for external
  user instructions (external_connection.go)
- provisioner/registry.go: new RegistryHost() helper + tests
- Workflow SHA-pin restoration (8b6a11cc, 6d94fd30)

Staging-specific content preserved:
- #298: fix(ci): retry git clone in clone-manifest.sh (7ff5622a + de9f46ea)

Merge commit only — no rebase, no squash. Awaiting PM/CEO authorization.

🤖 Generated with Claude Code
2026-05-10 13:54:19 +00:00
de9f46ea30 Merge pull request '[release-blocker] fix(ci): retry git clone in clone-manifest.sh (publish-workspace-server-image OOM flake)' (#298) from fix/publish-workspace-server-ci-clone-manifest-retry into staging
Some checks are pending
Secret scan / Scan diff for credential-shaped strings (push) Waiting to run
2026-05-10 12:44:35 +00:00
7ff5622a42 [infra-lead-agent] fix(ci): retry git clone in clone-manifest.sh (publish-workspace-server-image flake)
Some checks failed
Secret scan / Scan diff for credential-shaped strings (pull_request) Failing after 1s
sop-tier-check / tier-check (pull_request) Failing after 1s
audit-force-merge / audit (pull_request) Failing after 2s
The publish-workspace-server-image / build-and-push job clones the full
manifest (~36 repos) serially in the "Pre-clone manifest deps" step on a
memory-constrained Gitea Actions runner. Under host memory pressure the
OOM killer SIGKILLs git-remote-https mid-clone:

  cloning .../molecule-ai-plugin-molecule-skill-code-review.git ...
  error: git-remote-https died of signal 9
  fatal: the remote end hung up unexpectedly
    Failure - Main Pre-clone manifest deps
  exitcode '128': failure

Observed in run 4622 (2026-05-10, staging HEAD b5d2ab88): the job died on the
14th of 36 clones, which fails CI and blocks staging→main.

Wrap each `git clone` in clone-manifest.sh in a bounded retry with backoff
(3 attempts, waiting 3s and then 6s), wiping any partial checkout between
tries. A single transient SIGKILL or network blip no longer fails the whole
tenant image rebuild. This benefits every caller of the script
(publish-workspace-server-image, harness-replays, Dockerfile builds, local
quickstart).
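
The retry described above could look roughly like the following. This is a
minimal sketch, not the actual contents of clone-manifest.sh: the function
name `clone_with_retry` and the `MAX_ATTEMPTS`, `BACKOFF_BASE`, and `GIT_BIN`
variables are hypothetical, and it reads "3s/6s" as waits of 3 and 6 seconds
between the three attempts.

```shell
#!/usr/bin/env bash
# Sketch only: clone_with_retry and its environment knobs are hypothetical
# names, not taken from the real clone-manifest.sh.

# clone_with_retry <repo-url> <dest-dir>
# Runs `git clone` up to MAX_ATTEMPTS times (default 3). After a failed
# attempt it removes any partial checkout, then sleeps BACKOFF_BASE * attempt
# seconds (3s, then 6s with the defaults) before trying again.
clone_with_retry() {
  local url="$1" dest="$2"
  local max_attempts="${MAX_ATTEMPTS:-3}"
  local backoff_base="${BACKOFF_BASE:-3}"
  local git_bin="${GIT_BIN:-git}"   # overridable so the logic can be tested offline
  local attempt=1
  local delay

  while :; do
    if "$git_bin" clone --quiet "$url" "$dest"; then
      return 0
    fi

    # A killed clone can leave a half-written directory behind; wipe it so
    # the next attempt starts from a clean slate.
    rm -rf -- "$dest"

    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "clone failed after ${attempt} attempts: ${url}" >&2
      return 1
    fi

    delay=$(( backoff_base * attempt ))
    echo "clone attempt ${attempt} failed for ${url}; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

Under these assumptions, a caller would replace a bare `git clone "$url" "$dir"`
with `clone_with_retry "$url" "$dir"`. The attempt count stays bounded so that
a persistent failure (bad URL, revoked credentials) still fails the job quickly
instead of looping.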

This is a mitigation; the durable fix is more runner RAM/swap on the
operator host — tracked separately with Infra-SRE.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:58:09 +00:00