Tonight's wire-real E2E sweep exposed 12+ root causes across the post-
#87 template extraction. Most would have been caught by an actual
provision-and-online test running on each template — but the test only
covered claude-code + hermes. Extending it to cover all 8 ensures any
future regression in any template fails the test, not production.
What's added:
- run_openai_runtime(runtime, label): generic provisioner for the 5
OpenAI-backed templates (langgraph, crewai, autogen, deepagents,
openclaw). Same shape as run_hermes minus the HERMES_* config block
that hermes-agent needs.
- run_gemini_cli: separate function — gemini-cli wants a Google AI
key (E2E_GEMINI_API_KEY), not OpenAI.
- Each new runtime registered in the dispatch loop. New `all` keyword
for E2E_RUNTIMES runs every covered runtime.
claude-code + hermes keep their dedicated functions; both have unique
provisioning quirks (claude-code OAuth + claude-code-specific volume
mounts; hermes 15-min cold-boot) that don't generalize cleanly.
Skip-if-no-key pattern matches the existing one — partially-keyed CI
gets clean skips, not false-fails.
Usage:
E2E_OPENAI_API_KEY=... E2E_RUNTIMES=langgraph ./test_priority_runtimes_e2e.sh
E2E_OPENAI_API_KEY=... E2E_RUNTIMES=all ./test_priority_runtimes_e2e.sh
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>