From 884fff1145288f65fc2e7b43899d9095aa968759 Mon Sep 17 00:00:00 2001
From: Hongming Wang
Date: Thu, 23 Apr 2026 22:25:33 -0700
Subject: [PATCH] fix(e2e): pin HERMES_* env vars so openai/* routes
 deterministically
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Root cause of the sustained E2E step-8 A2A 401 failures (3+/3 runs,
2026-04-24 03h–04h): the A2A returns 200 with a JSON-RPC result whose
text is OpenRouter's error format — {'message': 'Missing Authentication
header', 'code': 401} (integer code, not OpenAI's string
'invalid_api_key'). template-hermes's derive-provider.sh was picking
PROVIDER=openrouter for openai/* models despite template-hermes#19 (the
fix that flips openai/* → custom when OPENAI_API_KEY is set) having
been merged at 01:30Z.

Verified via probe workspaces on the staging canary tenant:

  probe 1 (just OPENAI_API_KEY):
    → OpenRouter's 401 shape
  probe 2 (+ HERMES_INFERENCE_PROVIDER=custom + HERMES_CUSTOM_*):
    → OpenAI's 401 shape ('code': 'invalid_api_key')

So derive-provider.sh's updates apparently aren't reaching every
staging tenant on re-provision — possibly because tenant EC2s cache
/opt/adapter from an earlier boot, or because the CP's user-data
snapshot bundles a pre-fix template-hermes. That's a separate follow-up
(it needs a forced re-clone of /opt/adapter on every workspace boot).

This PR is the test-side workaround. Pinning the HERMES_* bridge env
vars bypasses derive-provider.sh entirely, so the test works regardless
of which template-hermes commit any given tenant happens to have on
disk.
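For reference, the two 401 shapes above can be told apart mechanically.
A minimal sketch (the classifier function and its name are illustrative,
not part of the test suite; it assumes the A2A reply text parses as JSON
and that the provider's error code appears either at the top level or
under an 'error' object):

```python
import json

def classify_401(reply_text: str) -> str:
    """Distinguish OpenRouter's 401 shape (integer 'code') from
    OpenAI's ('code' is a string such as 'invalid_api_key')."""
    body = json.loads(reply_text)
    # OpenRouter puts 'code' at the top level; OpenAI nests it
    # under an 'error' object.
    code = body.get("code", body.get("error", {}).get("code"))
    if isinstance(code, int):
        return "openrouter"  # e.g. {'message': ..., 'code': 401}
    if isinstance(code, str):
        return "openai"      # e.g. {'error': {'code': 'invalid_api_key'}}
    return "unknown"

# The shape observed in the failing runs classifies as OpenRouter:
print(classify_401(
    '{"message": "Missing Authentication header", "code": 401}'))
```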
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 tests/e2e/test_staging_full_saas.sh | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tests/e2e/test_staging_full_saas.sh b/tests/e2e/test_staging_full_saas.sh
index 317c761b..aea0f8a0 100755
--- a/tests/e2e/test_staging_full_saas.sh
+++ b/tests/e2e/test_staging_full_saas.sh
@@ -243,7 +243,28 @@ if [ -n "${E2E_OPENAI_API_KEY:-}" ]; then
   # model name → 404 model_not_found. Also set OPENAI_BASE_URL to
   # OpenAI's own endpoint — default is openrouter.ai which would need
   # a different key format.
-  SECRETS_JSON="{\"OPENAI_API_KEY\":\"$E2E_OPENAI_API_KEY\",\"OPENAI_BASE_URL\":\"https://api.openai.com/v1\",\"MODEL_PROVIDER\":\"openai:gpt-4o\"}"
+  #
+  # The HERMES_* fields below bypass template-hermes/scripts/derive-provider.sh
+  # — verified 2026-04-24 that even with template-hermes#19's fix in main,
+  # staging tenants sometimes resolve openai/* to PROVIDER=openrouter and
+  # emit {'message':'Missing Authentication header','code':401} (OpenRouter's
+  # shape) in the A2A reply. Setting HERMES_INFERENCE_PROVIDER=custom +
+  # HERMES_CUSTOM_{BASE_URL,API_KEY,API_MODE} pins the bridge deterministically
+  # so the test doesn't depend on every tenant EC2 having a freshly-cloned
+  # template-hermes.
+  SECRETS_JSON=$(python3 -c "
+import json, os
+k = os.environ['E2E_OPENAI_API_KEY']
+print(json.dumps({
+    'OPENAI_API_KEY': k,
+    'OPENAI_BASE_URL': 'https://api.openai.com/v1',
+    'MODEL_PROVIDER': 'openai:gpt-4o',
+    'HERMES_INFERENCE_PROVIDER': 'custom',
+    'HERMES_CUSTOM_BASE_URL': 'https://api.openai.com/v1',
+    'HERMES_CUSTOM_API_KEY': k,
+    'HERMES_CUSTOM_API_MODE': 'chat_completions',
+}))
+")
 fi

 # Model slug MUST be provider-prefixed for hermes — the template's