The auto-promote-staging.yml gate-check (line 99) treats "workflow
didn't run" as failure. Path-filtered triggers on E2E API Smoke Test
and E2E Staging Canvas meant a platform-only or test-only push to
staging — say, the prior PR #2201 which only touched
tests/e2e/test_staging_full_saas.sh — never triggered the canvas
workflow, and auto-promote saw `missing/none`, marked all_green=false,
and aborted. The same failure hits any push that doesn't touch a
gate's watched paths. It was a deadlock by design, unnoticed only
because the gate was new.
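A minimal sketch of the failure mode, not the actual gate-check code: the function name `all_green` and the literal `missing` are illustrative stand-ins for however auto-promote-staging.yml represents a gate with no run for the SHA.

```shell
# Hypothetical model of the gate check: every gate must report "success".
# A gate that never ran has no conclusion, modelled here as "missing".
all_green() {
  for conclusion in "$@"; do
    if [ "$conclusion" != "success" ]; then
      echo false
      return 0
    fi
  done
  echo true
}

# Canvas gate was path-filtered out, so there is no run for this SHA.
all_green success missing   # prints: false
# Once every gate always runs and concludes "success":
all_green success success   # prints: true
```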
Fix per Design B (always-run + fast-skip):
- Drop `paths:` from the push/pull_request triggers on both gate
workflows. The workflow now always fires on every staging+main
push/PR.
- Add a `detect-changes` job using `dorny/paths-filter@v3` that
decides whether to do real work, scoped to the same paths the
trigger filter used to watch.
- Real work job (e2e-api / playwright) gates on
`needs: detect-changes; if: needs.detect-changes.outputs.X == 'true'`.
- Add a sibling `no-op` job that runs when the filter output is
false, emitting `::notice::… no-op pass`. The workflow run's
conclusion is `success` either way — auto-promote sees green and
proceeds.
Manual `workflow_dispatch` and the weekly canvas `schedule` bypass
detect-changes and always run: those triggers exist precisely to
exercise the suite and shouldn't be silently no-op'd.
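The decision described above can be sketched as a small shell function. This is an illustration under assumptions, not the workflow's actual step: `decide_run` is a hypothetical name, and the second argument stands for the `true`/`false` output of the paths filter.

```shell
# decide_run EVENT FILTER_MATCHED
# EVENT is the triggering event name (e.g. push, pull_request, schedule).
# FILTER_MATCHED is "true" when the paths filter matched the change.
decide_run() {
  case "$1" in
    # Manual and scheduled runs always do real work.
    workflow_dispatch|schedule) echo true ;;
    # Everything else follows the paths filter.
    *) if [ "$2" = "true" ]; then echo true; else echo false; fi ;;
  esac
}

decide_run push false      # prints: false (the no-op job would run)
decide_run schedule false  # prints: true  (the real-work job would run)
```

Inside a workflow step the result could be exported with something like `echo "run=$(decide_run "$GITHUB_EVENT_NAME" "$matched")" >> "$GITHUB_OUTPUT"`, where `$matched` is an assumed variable holding the filter output.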
Why this approach over making auto-promote-staging smarter:
The alternative (Design A, considered + rejected) was to teach
auto-promote-staging to read each gate's `paths:` filter and treat
"no run because filter excluded the commit" as conditional pass.
That couples auto-promote to other workflows' YAML schema and breaks
silently if a gate is renamed or its filter changes. Design B keeps
the auto-promote contract simple ("each gate emits success") and
makes each gate self-describing — adding a new gate doesn't require
touching auto-promote.
Cost: ~10-30s of runner overhead per gate per push for the no-op when
paths don't match. Negligible compared with a deadlocked auto-promote
chain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-contained happy-path E2E for the two runtimes the project commits
to first-class support for (task #116, completes the loop on the
"both must work end-to-end with tests" requirement).
What it proves per runtime:
1. POST /workspaces succeeds with the runtime + secrets
2. Workspace reaches status=online within its cold-boot window
(claude-code: 240s, hermes: 900s on cold apt + uv + sidecar)
3. POST /a2a (message/send "Reply with PONG") returns a non-error,
non-empty reply
4. activity_logs row written with method=message/send and ok|error
status (a2a_proxy.LogActivity contract)
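Step 2 is a poll-with-deadline loop. A hedged sketch of that pattern follows; `wait_for_online` and its probe-command argument are hypothetical and not taken from the script itself.

```shell
# wait_for_online TIMEOUT_SECONDS INTERVAL_SECONDS PROBE_COMMAND...
# Runs PROBE_COMMAND repeatedly until it prints "online" or the deadline
# passes. INTERVAL_SECONDS must be greater than zero.
wait_for_online() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  while [ "$elapsed" -le "$timeout_s" ]; do
    if [ "$("$@")" = "online" ]; then
      return 0
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 1
}

# Example call, assuming a hypothetical get_status helper that prints the
# workspace status:
#   wait_for_online 240 5 get_status "$workspace_id"   # claude-code window
#   wait_for_online 900 5 get_status "$workspace_id"   # hermes window
```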
Skip semantics: each phase independently checks for its required env
key (CLAUDE_CODE_OAUTH_TOKEN / E2E_OPENAI_API_KEY) and skips cleanly
if absent. The script exits 0 whenever every phase either passed or
skipped, so wiring it into a no-keys CI job checks that the script
itself stays clean without false failures.
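A minimal sketch of these skip semantics, assuming hypothetical helpers `run_phase` and `summary`; only the env key names and the results line format come from this commit.

```shell
passed=0
failed=0
skipped=0

# run_phase NAME REQUIRED_ENV_VAR COMMAND...
# Skips the phase when REQUIRED_ENV_VAR is unset or empty; otherwise runs
# COMMAND and records a pass or a fail. REQUIRED_ENV_VAR must be a plain
# variable name, because it is expanded through eval.
run_phase() {
  name=$1
  key=$2
  shift 2
  eval "value=\${$key:-}"
  if [ -z "$value" ]; then
    echo "SKIP $name ($key not set)"
    skipped=$((skipped + 1))
  elif "$@"; then
    echo "PASS $name"
    passed=$((passed + 1))
  else
    echo "FAIL $name"
    failed=$((failed + 1))
  fi
}

# summary prints the results line and succeeds only if nothing failed.
summary() {
  echo "=== Results: $passed passed, $failed failed, $skipped skipped ==="
  [ "$failed" -eq 0 ]
}

# Example wiring (test_claude_code and test_hermes are hypothetical):
#   run_phase claude-code CLAUDE_CODE_OAUTH_TOKEN test_claude_code
#   run_phase hermes E2E_OPENAI_API_KEY test_hermes
#   summary || exit 1
```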
Idempotent: pre-sweeps any prior "Priority E2E (claude-code)" /
"Priority E2E (hermes)" workspaces so a run interrupted by SIGPIPE /
kill -9 (which bypasses the EXIT trap) doesn't poison the next run.
Same defensive pattern as test_notify_attachments_e2e.sh.
CI wiring:
- e2e-api.yml — runs on every PR with no LLM keys, both phases skip,
catches script-level regressions (set -u bugs, syntax issues, etc.)
- canary-staging.yml + e2e-staging-saas.yml already have the keys
via secrets.MOLECULE_STAGING_OPENAI_KEY and exercise real
over-the-wire behavior; this script could be wired in there as an
opt-in if claude-code coverage is wanted too.
Local runs (from this branch, no keys):
=== Results: 0 passed, 0 failed, 2 skipped ===
Validates the capability primitives shipped in PRs #2137-2144: once
template PRs #12 (claude-code) + #25 (hermes) merge with their
declared provides_native_session=True + idle_timeout_override=900,
a manual run with both keys validates the full native+pluggable chain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User asked to "keep optimizing and comprehensive e2e testings to prove all
works as expected" for the communication path. Adds three layers of coverage
for PR #2130 (agent → user file attachments via send_message_to_user) since
that path has the most user-visible blast radius:
1. Shell E2E (tests/e2e/test_notify_attachments_e2e.sh) — pure platform test,
no workspace container needed. 14 assertions covering: notify text-only
round-trip, notify-with-attachments persists parts[].kind=file in the
shape extractFilesFromTask reads, per-element validation rejects empty
uri/name (regression for the missing gin `dive` bug), and a real
/chat/uploads → /notify URI round-trip when a container is up.
2. Canvas AGENT_MESSAGE handler tests (canvas-events.test.ts +5) — pin the
WebSocket-side filtering that drops malformed attachments, allows
attachments-only bubbles, ignores non-array payloads, and no-ops on
pure-empty events.
3. Persisted response_body shape test (message-parser.test.ts +1) — pins
the {result, parts} contract the chat history loader hydrates on
reload, so refreshing after an agent attachment restores both caption
and download chips.
Also wires the new shell E2E into e2e-api.yml so contract regressions
are caught in CI rather than only in manual runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds branches: [main, staging] to e2e-api.yml triggers so the
auto-promote workflow can see E2E API status on staging SHA.
Without this, the promoter gate for E2E API always reports missing
and auto-promotion is permanently blocked.
molecule-core is a public repo — GHA-hosted minutes are free. The
self-hosted Mac mini was only in play to dodge GHA rate limits
(memory feedback_selfhosted_runner), but for these specific
workflows it came with real costs:
- Docker-push workflows emulated linux/amd64 from arm64 via QEMU —
every canvas + platform image build ran ~2-3x slower than native.
- Six PRs' worth of keychain-avoidance hacks in publish-* because
`docker login` on macOS writes to osxkeychain unconditionally,
and the Mac mini's launchd user-agent keychain is locked.
- Homebrew pin-down environment variables (HOMEBREW_NO_*) sprinkled
everywhere to work around the shared /opt/homebrew symlink mess
on the runner.
- actions/setup-python@v5 couldn't write to /Users/runner, so ci.yml
python-lint resorted to a hand-rolled Homebrew python3.11 dance.
- Single runner → fan-out contention; CodeQL's 45-min analysis
fought the canvas publish for the one slot.
Changes across the 7 workflows:
- runs-on: [self-hosted, macos, arm64] → ubuntu-latest (every job)
- publish-canvas-image + publish-workspace-server-image:
drop the hand-rolled auths-map step + QEMU setup + buildx v4
→ docker/login-action@v3 + docker/setup-buildx-action@v3.
Linux + amd64 target = native build.
- canary-verify + promote-latest: replace `brew install crane` +
HOMEBREW_NO_* incantations with imjasonh/setup-crane@v0.4.
- codeql.yml: drop `brew install jq` — jq is preinstalled on
ubuntu-latest.
- ci.yml shellcheck: drop the self-hosted existence check —
shellcheck is preinstalled via apt.
- ci.yml python-lint: replace the Homebrew python3.11 path dance
with actions/setup-python@v5 (which works fine on GHA-hosted),
add requirements.txt caching while we're there.
- Remove stale comments referencing "the self-hosted runner",
"Mac mini", keychain, osxkeychain etc.
The self-hosted Mac mini remains in service for private-repo
workflows only. Memory feedback_selfhosted_runner updated to
reflect the public-repo scope carve-out.
Net -96 lines across the 7 files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI now detects which paths changed and skips irrelevant jobs:
- Platform (Go): only runs when platform/** changes
- Canvas (Next.js): only runs when canvas/** changes
- Python Lint: only runs when workspace-template/** changes
- Shellcheck: only runs when tests/e2e/** or scripts/** change
- E2E API: only runs when platform/** or tests/e2e/** change
Docs-only PRs (*.md, docs/**) skip all 5 jobs, saving ~15 min of
runner time per PR. Uses dorny/paths-filter for the CI workflow and
native paths: filter for the E2E workflow.
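The path-to-job mapping above can be modelled in a few lines of shell. This is only an illustration of the rules listed in this message: the real filtering is done by dorny/paths-filter and the native `paths:` filter, and the function names and job labels below are assumptions.

```shell
# jobs_for_path CHANGED_PATH
# Prints the CI jobs that a single changed path should trigger, one per line.
jobs_for_path() {
  case "$1" in
    platform/*)           echo platform-build; echo e2e-api ;;
    canvas/*)             echo canvas-build ;;
    workspace-template/*) echo python-lint ;;
    tests/e2e/*)          echo shellcheck; echo e2e-api ;;
    scripts/*)            echo shellcheck ;;
  esac
}

# jobs_for_change CHANGED_PATH...
# Prints the sorted, deduplicated set of jobs for a whole change.
jobs_for_change() {
  for changed in "$@"; do
    jobs_for_path "$changed"
  done | sort -u
}

# A docs-only change selects no jobs, so this prints nothing:
jobs_for_change README.md docs/setup.md
```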
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Job-level `concurrency.cancel-in-progress: false` only prevents sibling jobs
from killing each other — it does not protect the parent workflow run from
being cancelled when a new push arrives. Every PR push was cancelling the
in-progress E2E run, forcing manual `gh run rerun` across 7+ active PRs.
Fix: move e2e-api into `.github/workflows/e2e-api.yml` with a workflow-level
concurrency group (`e2e-api-${{ github.ref }}`, cancel-in-progress: false).
New pushes now queue behind the running E2E job instead of cancelling it.
Fast jobs (platform-build, canvas-build, shellcheck, python-lint) stay in
ci.yml and retain normal run-level cancellation for quick iteration feedback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>