From 5216e781cd6001f16e43c08ba0ae41f5807da6b4 Mon Sep 17 00:00:00 2001
From: Molecule AI Infra-SRE
Date: Sun, 10 May 2026 10:01:01 +0000
Subject: [PATCH] ci: add Docker daemon health-check step before build
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Run `docker info` as the first CI step to catch runner Docker socket
permission issues (docker.sock unreadable, daemon restarted, group
membership drift) before the expensive `docker build` step. The error
now surfaces immediately with a clear `::error::` message rather than
silently continuing into `docker build` where the same failure would
appear 60-90s later as a cryptic ECR auth error.

Gitea Actions run 4350 (2026-05-10 05:58 UTC) is the trigger: the
runner's docker.sock became inaccessible for ~6 minutes, `docker build`
failed at step 2 with `permission denied...docker.sock`, and `go build`
(step 3) was never reached, masking the compile errors that were
already on main. The downstream code errors only surfaced once run 4407
succeeded at `docker build` and finally reached `go build`.

Now: `docker info` → fail in ~1s with actionable error.

Co-Authored-By: Claude Opus 4.7
---
 .../publish-workspace-server-image.yml        | 19 +++++++++++++++++++
 .../publish-workspace-server-image.yml        | 16 ++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/.gitea/workflows/publish-workspace-server-image.yml b/.gitea/workflows/publish-workspace-server-image.yml
index 96a03b7e..6b2fcee4 100644
--- a/.gitea/workflows/publish-workspace-server-image.yml
+++ b/.gitea/workflows/publish-workspace-server-image.yml
@@ -59,6 +59,25 @@ jobs:
       - name: Checkout
         uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

+      # Health check: verify Docker daemon is accessible before attempting any
+      # build steps. This fails loudly at step 1 when the runner's docker.sock
+      # is inaccessible (e.g. permission change, daemon restart, or group-membership
+      # drift) rather than silently continuing to step 2 where `docker build`
+      # fails deep in the process with a cryptic ECR auth error that doesn't
+      # surface the root cause. Also reports the daemon version so operator
+      # can correlate with runner host logs.
+      - name: Verify Docker daemon access
+        run: |
+          set -euo pipefail
+          echo "::group::Docker daemon health check"
+          docker info 2>&1 | head -5 || {
+            echo "::error::Docker daemon is not accessible at /var/run/docker.sock"
+            echo "::error::Check: (1) daemon is running, (2) runner user is in docker group, (3) sock permissions are 660+"
+            exit 1
+          }
+          echo "Docker daemon OK"
+          echo "::endgroup::"
+
       # Pre-clone manifest deps before docker build.
       #
       # Why: workspace-template-* repos on Gitea are private. The pre-fix
diff --git a/.github/workflows/publish-workspace-server-image.yml b/.github/workflows/publish-workspace-server-image.yml
index be88f2cc..63767d9d 100644
--- a/.github/workflows/publish-workspace-server-image.yml
+++ b/.github/workflows/publish-workspace-server-image.yml
@@ -107,6 +107,22 @@ jobs:
         run: |
           echo "sha=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"

+      # Health check: verify Docker daemon is accessible before attempting any
+      # build steps. This fails loudly at step 1 when the runner's docker.sock
+      # is inaccessible rather than silently continuing to the build step
+      # where docker build fails deep in ECR auth with a cryptic error.
+      - name: Verify Docker daemon access
+        run: |
+          set -euo pipefail
+          echo "::group::Docker daemon health check"
+          docker info 2>&1 | head -5 || {
+            echo "::error::Docker daemon is not accessible at /var/run/docker.sock"
+            echo "::error::Check: (1) daemon running, (2) runner user in docker group, (3) sock perms 660+"
+            exit 1
+          }
+          echo "Docker daemon OK"
+          echo "::endgroup::"
+
       # Pre-clone manifest deps before docker build (Task #173 fix).
       #
       # Why pre-clone: post-2026-05-06, every workspace-template-* repo on