Cherry-pick of d79a4bd2 from PR #318 onto fresh main base (PR #318 closed).
Issue #310: platform a2a-proxy logs ~300/hr
`timeout awaiting response headers` because ResponseHeaderTimeout was hardcoded
to 60s. Opus agent turns (big context + internal delegate_task round-trips)
routinely exceed 60s, so the proxy gave up before headers arrived even when
the workspace agent was healthy.
Changes:
- a2a_proxy.go: ResponseHeaderTimeout: 60s hardcoded →
envx.Duration("A2A_PROXY_RESPONSE_HEADER_TIMEOUT", 180s).
180s gives Opus turns comfortable headroom. The X-Timeout caller header
still bounds the absolute request ceiling independently.
- a2a_proxy_test.go: TestA2AClientResponseHeaderTimeout verifies the 180s
default and env-override parsing logic.
Env var: A2A_PROXY_RESPONSE_HEADER_TIMEOUT (e.g. 5m, 300s).
Closes#310.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ports the bounded retry+backoff around each `git clone` in
scripts/clone-manifest.sh onto main, mirroring PR #298 which landed the
same change on staging. CI-infra carve-out: publish-workspace-server-image.yml
fires on `push: branches:[main]`, so the retry mitigation must be on main for
the workflow to be resilient to the OOM-killed-git-mid-clone flake
(`error: git-remote-https died of signal 9`, run 4622) when triggered by a
main push. Same one-file change as #298 (+45/-5), POSIX-sh, sh -n clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Docker daemon health-check fix should not change which branches trigger
the build. Revert accidental addition of 'staging' to branch filters.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cover the canvas image publish workflow with the same `docker info`
guard added to publish-workspace-server-image.yml (commit 5216e781).
publish-canvas-image.yml was the only docker-build workflow still
missing the step.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Molecule-AI GitHub org was suspended 2026-05-06; canonical SCM is
now git.moleculesai.app. external_connection.go was still emitting
github.com URLs in operator-facing copy-paste blocks, breaking
external-agent onboarding silently.
Per-site decisions (8 emit sites in 1 file):
- L124 (channel template doc comment): swap source-of-truth comment to
Gitea host.
- L137 /plugin marketplace add Molecule-AI/...: swap to explicit Gitea
HTTPS URL form. End-to-end-verified path per internal#37 § 1.A.
- L138 /plugin install molecule@molecule-mcp-claude-channel: marketplace
name is molecule-channel (per remote .claude-plugin/marketplace.json),
not the repo name. Fix to molecule@molecule-channel.
- L157 --channels plugin:molecule@molecule-mcp-claude-channel: same
marketplace-name fix.
- L179 user-facing GitHub URL: swap to Gitea.
- L261 pip install git+https://github.com/Molecule-AI/molecule-sdk-python:
not on PyPI; swap to git+https://git.moleculesai.app/molecule-ai/...
- L310 hermes-channel doc comment: swap source-of-truth comment.
- L339 pip install git+https://github.com/Molecule-AI/hermes-channel-molecule:
not on PyPI; swap to Gitea.
- L369 issue-tracker URL: swap to Gitea.
Verification:
- molecule-ai-workspace-runtime, codex-channel-molecule are on PyPI (200);
no swap needed for those pip lines (they were already package-name form).
- molecule-mcp-claude-channel, molecule-sdk-python, hermes-channel-molecule
are NOT on PyPI; swapped to git+https://git.moleculesai.app/molecule-ai/
form. All three repos are public on Gitea (default branch main) and
serve git-upload-pack unauthenticated (verified curl 200 against
/info/refs?service=git-upload-pack).
- Third-party github URLs (gin import, openai/codex, NousResearch/
hermes-agent upstream issue trackers, npm @openai/codex) intentionally
preserved.
Adds TestExternalTemplates_NoBrokenMoleculeAIGitHubURLs regression guard
to prevent the same broken URLs from re-emerging on future template
edits.
go vet / go build / existing TestExternal* — all clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two surfaces in workspace-server hardcoded `ghcr.io` and silently bypassed
the `MOLECULE_IMAGE_REGISTRY` env override that flips every other image
operation to the configured private mirror (e.g. AWS ECR in production):
1. internal/imagewatch/watch.go — image-auto-refresh polled
`https://ghcr.io/v2/...` and `https://ghcr.io/token` directly. Post-
suspension, with the platform pointed at ECR, the watcher silently
stopped seeing digest changes (every poll either 404'd or hung on a
registry it has no business talking to).
2. internal/handlers/admin_workspace_images.go — Docker Engine auth
payload pinned `serveraddress: "ghcr.io"`, so when the operator sets
`MOLECULE_IMAGE_REGISTRY=…ecr…/molecule-ai` the engine matched the
wrong credential entry on every authenticated pull.
Fix: extract `provisioner.RegistryHost()` returning the host portion of
`RegistryPrefix()` (e.g. `ghcr.io` ← `ghcr.io/molecule-ai`, or
`004947743811.dkr.ecr.us-east-2.amazonaws.com` ← the ECR mirror prefix),
and route both surfaces through it. Default behavior is unchanged for
OSS users on GHCR.
Tests
- New `TestRegistryHost_SplitsHostFromOrgPath` and
`TestRegistryHost_NeverEmpty` pin the helper across GHCR / ECR /
self-hosted Gitea / bare-host edge cases.
- New `TestGHCRAuthHeader_RespectsRegistryEnv` asserts the Docker auth
payload's `serveraddress` follows MOLECULE_IMAGE_REGISTRY (and never
leaks the org-path suffix).
- New `TestRemoteDigest_RegistryHostFollowsEnv` stands up an httptest
server, points MOLECULE_IMAGE_REGISTRY at it, and confirms both the
token endpoint and the manifest HEAD land there — i.e. the full image-
watch loop respects the env override end-to-end.
Both new tests were verified to FAIL on the pre-fix code path before the
helper was wired in, so a future revert can't silently re-introduce the
bug.
Out of scope (followup needed)
ECR uses `aws ecr get-authorization-token` (SigV4 + basic-auth) instead
of GHCR's `/token?service=…&scope=…` flow. This PR makes the URL host-
configurable; the bearer-token negotiation in `fetchPullToken` still
speaks the GHCR flavor. On ECR with `IMAGE_AUTO_REFRESH=true`, the
watcher will now fail loudly at the token fetch (logged per tick) rather
than silently hitting ghcr.io. Operators on ECR should keep
IMAGE_AUTO_REFRESH=false until ECR auth is wired — tracked as a separate
task. Net effect of this PR alone is strictly better than pre-fix:
fail-loud > silent-broken.
Refs: RFC #229 P2-4
tier:low
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run `docker info` as the first CI step to catch runner Docker socket
permission issues (docker.sock unreadable, daemon restarted, group
membership drift) before the expensive `docker build` step. The error
now surfaces immediately with a clear `::error::` message rather than
silently continuing into `docker build` where the same failure would
appear 60-90s later as a cryptic ECR auth error.
Gitea Actions run 4350 (2026-05-10 05:58 UTC) is the trigger: the runner's
docker.sock became inaccessible for ~6 minutes, `docker build` failed
at step 2 with `permission denied...docker.sock`, and `go build` (step 3)
was never reached — masking the compile errors that were already on
main. The downstream code errors only surfaced once run 4407 succeeded
at `docker build` and finally reached `go build`.
Now: `docker info` → fail in ~1s with actionable error.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>