Switch publish-image cache to type=registry (GHA cache emulator unreachable from buildkit) #1182

Open
opened 2026-05-15 12:20:15 +00:00 by hongming · 0 comments
Owner

Problem

publish-image.yml in workspace templates was using cache-to: type=gha,mode=max + cache-from: type=gha for buildkit cache. The act_runner provides a GHA-emulator cache server (e.g. at 172.18.0.7:37871 on the runner bridge), but buildkit cannot reach this address because it runs in a separate docker container with its own network namespace.
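
For reference, the failing step probably looked roughly like the sketch below. It assumes the workflow builds with docker/build-push-action, which the issue does not name; the step name, action version, and tag are illustrative. Only the two cache lines are taken from the issue.

```yaml
# Sketch of the failing step; only the two cache lines come from the issue.
- name: Build and push                    # hypothetical step name
  uses: docker/build-push-action@v6       # assumed action and version
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE_NAME }}:latest    # hypothetical tag
    # buildkit resolves the GHA cache URL from inside its own container,
    # where the runner-bridge address (172.18.0.7:37871) is not reachable.
    cache-from: type=gha
    cache-to: type=gha,mode=max
```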

Repro

ERROR: failed to build: failed to solve: Get "http://172.18.0.7:37871/_apis/artifactcache/cache?...": dial tcp 172.18.0.7:37871: i/o timeout

Hermes publish-image run 109: 17m39s failure. Openclaw publish-image run 66: 8m7s failure. Both at the cache-to/cache-from step.

Workaround in place

openclaw#8 and hermes#19 (both merged) drop both cache lines. This gives up caching entirely: uncached builds take 6-8 minutes instead of 2-3.

Proper fix options

A. type=registry,ref=$IMAGE_NAME:buildcache. Buildkit pushes and pulls the cache to and from ECR over standard HTTPS, so it works across container boundaries. Recommended.

B. type=inline. Embeds cache metadata in the pushed image itself instead of a separate cache ref. The cache is smaller (inline export supports only mode=min, so intermediate build stages are not cached), but there is no extra cache push to the registry. Good for templates with little dependency churn.

C. Fix the act_runner cache server to bind on the host network so buildkit containers can reach it. This requires upstream gitea/act_runner work.
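
Options A and B could be sketched as the step inputs below. This is a minimal sketch, not the templates' actual workflow. It assumes docker/build-push-action, that IMAGE_NAME holds the full ECR repository URI, and that an earlier step has already logged in to ECR. The `image-manifest=true` and `oci-mediatypes=true` parameters are included because ECR has needed an OCI image manifest to accept registry cache exports; whether they are still required should be checked against the buildkit version in use.

```yaml
# Use ONE of the two steps below, not both.

# Option A: dedicated cache ref stored next to the image in the registry.
- name: Build and push (registry cache)   # hypothetical step name
  uses: docker/build-push-action@v6       # assumed action and version
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE_NAME }}:latest    # hypothetical tag
    cache-from: type=registry,ref=${{ env.IMAGE_NAME }}:buildcache
    # mode=max also caches intermediate stages, matching the old gha setup.
    cache-to: type=registry,ref=${{ env.IMAGE_NAME }}:buildcache,mode=max,image-manifest=true,oci-mediatypes=true

# Option B: cache metadata embedded in the pushed image (mode=min only).
- name: Build and push (inline cache)     # hypothetical step name
  uses: docker/build-push-action@v6       # assumed action and version
  with:
    context: .
    push: true
    tags: ${{ env.IMAGE_NAME }}:latest    # hypothetical tag
    # The previously pushed image doubles as the cache source.
    cache-from: type=registry,ref=${{ env.IMAGE_NAME }}:latest
    cache-to: type=inline
```

With option A the `buildcache` tag is overwritten on every run, so the cache manifests it replaces become untagged in ECR and still count toward storage until something removes them.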

Recommendation

Try option A (type=registry,ref=...:buildcache) on one template first (say openclaw) and measure rebuild time. If it works, roll out to other templates.

Acceptance criteria

  • publish-image rebuild time back to ~2-3 min for cached layer hits
  • Works in act_runner DinD topology
  • Cache doesn't bloat ECR storage (stale cache manifests are evicted rather than accumulating)

Repos affected

All workspace templates with publish-image.yml:

  • molecule-ai-workspace-template-openclaw
  • molecule-ai-workspace-template-hermes
  • molecule-ai-workspace-template-deepagents
  • (presumably others)

Reference: molecule-ai/molecule-core#1182