The publish-workspace-server-image / build-and-push job clones the full
manifest (~36 repos) serially in the "Pre-clone manifest deps" step on a
memory-constrained Gitea Actions runner. Under host memory pressure the
OOM killer SIGKILLs git-remote-https mid-clone:
cloning .../molecule-ai-plugin-molecule-skill-code-review.git ...
error: git-remote-https died of signal 9
fatal: the remote end hung up unexpectedly
❌ Failure - Main Pre-clone manifest deps
exitcode '128': failure
Observed in run 4622 (2026-05-10, staging HEAD b5d2ab88) — died on the
14th of 36 clones, which red-lights CI and wedges staging→main.
Wrap each `git clone` in clone-manifest.sh with bounded retry + backoff
(3 attempts, 3s/6s), wiping any partial checkout between tries. A single
transient SIGKILL / network blip no longer fails the whole tenant image
rebuild. Benefits every caller of the script (publish-workspace-server-image,
harness-replays, Dockerfile builds, local quickstart).
This is a mitigation; the durable fix is more runner RAM/swap on the
operator host — tracked separately with Infra-SRE.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>