fix(e2e): pin HERMES_* env vars so openai/* routes deterministically

Root cause of the sustained E2E step-8 A2A 401 failures (3+/3 runs,
2026-04-24 03h–04h): the A2A endpoint returns HTTP 200 with a JSON-RPC
result whose text is OpenRouter's error format —
  {'message': 'Missing Authentication header', 'code': 401}
(integer code, not OpenAI's string 'invalid_api_key'). template-hermes's
derive-provider.sh was still picking PROVIDER=openrouter for openai/*
models, despite template-hermes#19 (the fix that flips openai/* → custom
when OPENAI_API_KEY is set) having been merged at 01:30Z.

Verified via probe workspaces on the staging canary tenant:
  probe 1 (just OPENAI_API_KEY): → OpenRouter's 401 shape
  probe 2 (+ HERMES_INFERENCE_PROVIDER=custom + HERMES_CUSTOM_*):
           → OpenAI's 401 shape ('code': 'invalid_api_key')
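The two shapes can be told apart mechanically. A minimal sketch of the
triage used when reading probe output; `classify_401` is a hypothetical
helper, not part of the test suite, and it assumes OpenAI nests its error
under an "error" key while the OpenRouter payload is flat, as observed:

```shell
# Hypothetical triage helper: decide which upstream produced the 401
# embedded in an A2A reply. Uses python3 for JSON parsing, as the E2E
# script already does.
classify_401() {
  printf '%s' "$1" | python3 -c '
import json, sys
err = json.load(sys.stdin)
# OpenAI nests errors under "error"; the observed OpenRouter payload is flat.
code = err.get("code") or err.get("error", {}).get("code")
if code == 401:
    print("openrouter")        # integer code
elif code == "invalid_api_key":
    print("openai")            # string code
else:
    print("unknown")
'
}

classify_401 '{"message": "Missing Authentication header", "code": 401}'
# openrouter
classify_401 '{"error": {"message": "Incorrect API key provided", "code": "invalid_api_key"}}'
# openai
```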

So derive-provider.sh's updates apparently aren't reaching every
staging tenant on re-provision — possibly because tenant EC2s cache
/opt/adapter from an earlier boot, or the CP's user-data snapshot
bundles a pre-fix template-hermes. That's a separate follow-up (needs
forced re-clone of /opt/adapter on every workspace boot).
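That follow-up could look roughly like the sketch below in the workspace
boot path; `force_reclone` and `ADAPTER_REPO_URL` are hypothetical names,
not part of the existing provisioning code:

```shell
# Sketch of the proposed follow-up (explicitly NOT this PR): discard any
# cached /opt/adapter checkout and re-clone template-hermes on every
# workspace boot, so a stale checkout cannot survive a re-provision.
force_reclone() {
  repo_url=$1
  dest=$2
  rm -rf "$dest"                          # never trust the cached copy
  git clone --depth 1 "$repo_url" "$dest"
}

# e.g. from user-data / boot scripts (ADAPTER_REPO_URL is a placeholder):
# force_reclone "$ADAPTER_REPO_URL" /opt/adapter
```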

This PR is the test-side workaround. Pinning the HERMES_* bridge env
vars bypasses derive-provider.sh entirely, so the test works regardless
of which template-hermes commit any given tenant happens to have on
disk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hongming Wang 2026-04-23 22:25:33 -07:00
parent faba17a84c
commit 884fff1145


@@ -243,7 +243,28 @@ if [ -n "${E2E_OPENAI_API_KEY:-}" ]; then
 # model name → 404 model_not_found. Also set OPENAI_BASE_URL to
 # OpenAI's own endpoint — default is openrouter.ai which would need
 # a different key format.
-SECRETS_JSON="{\"OPENAI_API_KEY\":\"$E2E_OPENAI_API_KEY\",\"OPENAI_BASE_URL\":\"https://api.openai.com/v1\",\"MODEL_PROVIDER\":\"openai:gpt-4o\"}"
+#
+# The HERMES_* fields below bypass template-hermes/scripts/derive-provider.sh
+# — verified 2026-04-24 that even with template-hermes#19's fix in main,
+# staging tenants sometimes resolve openai/* to PROVIDER=openrouter and
+# emit {'message':'Missing Authentication header','code':401} (OpenRouter's
+# shape) in the A2A reply. Setting HERMES_INFERENCE_PROVIDER=custom +
+# HERMES_CUSTOM_{BASE_URL,API_KEY,API_MODE} pins the bridge deterministically
+# so the test doesn't depend on every tenant EC2 having a freshly-cloned
+# template-hermes.
+SECRETS_JSON=$(python3 -c "
+import json, os
+k = os.environ['E2E_OPENAI_API_KEY']
+print(json.dumps({
+    'OPENAI_API_KEY': k,
+    'OPENAI_BASE_URL': 'https://api.openai.com/v1',
+    'MODEL_PROVIDER': 'openai:gpt-4o',
+    'HERMES_INFERENCE_PROVIDER': 'custom',
+    'HERMES_CUSTOM_BASE_URL': 'https://api.openai.com/v1',
+    'HERMES_CUSTOM_API_KEY': k,
+    'HERMES_CUSTOM_API_MODE': 'chat_completions',
+}))
+")
 fi
 # Model slug MUST be provider-prefixed for hermes — the template's