fix(e2e): pin HERMES_* env vars so openai/* routes deterministically

Root cause of the sustained E2E step-8 A2A 401 failures (3+/3 runs,
2026-04-24 03h–04h): the A2A endpoint returns HTTP 200 with a JSON-RPC
result whose text is OpenRouter's error format —
  {'message': 'Missing Authentication header', 'code': 401}
(integer code, not OpenAI's string 'invalid_api_key'). template-hermes's
derive-provider.sh was still picking PROVIDER=openrouter for openai/*
models, despite template-hermes#19 (the fix that flips openai/* → custom
when OPENAI_API_KEY is set) having been merged at 01:30Z.

Verified via probe workspaces on the staging canary tenant:
  probe 1 (just OPENAI_API_KEY): → OpenRouter's 401 shape
  probe 2 (+ HERMES_INFERENCE_PROVIDER=custom + HERMES_CUSTOM_*):
           → OpenAI's 401 shape ('code': 'invalid_api_key')
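The two shapes can be told apart mechanically. A minimal sketch of the
triage used when reading probe output; `classify_401` is a hypothetical
helper, not part of the test suite, and it assumes OpenAI nests its error
under an "error" key while the OpenRouter payload is flat, as observed:

```shell
# Hypothetical triage helper: decide which upstream produced the 401
# embedded in an A2A reply. Uses python3 for JSON parsing, as the E2E
# script already does.
classify_401() {
  printf '%s' "$1" | python3 -c '
import json, sys
err = json.load(sys.stdin)
# OpenAI nests errors under "error"; the observed OpenRouter payload is flat.
code = err.get("code") or err.get("error", {}).get("code")
if code == 401:
    print("openrouter")        # integer code
elif code == "invalid_api_key":
    print("openai")            # string code
else:
    print("unknown")
'
}

classify_401 '{"message": "Missing Authentication header", "code": 401}'
# openrouter
classify_401 '{"error": {"message": "Incorrect API key provided", "code": "invalid_api_key"}}'
# openai
```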

So derive-provider.sh's updates apparently aren't reaching every
staging tenant on re-provision — possibly because tenant EC2s cache
/opt/adapter from an earlier boot, or the CP's user-data snapshot
bundles a pre-fix template-hermes. That's a separate follow-up (needs
forced re-clone of /opt/adapter on every workspace boot).
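That follow-up could look roughly like the sketch below in the workspace
boot path; `force_reclone` and `ADAPTER_REPO_URL` are hypothetical names,
not part of the existing provisioning code:

```shell
# Sketch of the proposed follow-up (explicitly NOT this PR): discard any
# cached /opt/adapter checkout and re-clone template-hermes on every
# workspace boot, so a stale checkout cannot survive a re-provision.
force_reclone() {
  repo_url=$1
  dest=$2
  rm -rf "$dest"                          # never trust the cached copy
  git clone --depth 1 "$repo_url" "$dest"
}

# e.g. from user-data / boot scripts (ADAPTER_REPO_URL is a placeholder):
# force_reclone "$ADAPTER_REPO_URL" /opt/adapter
```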

This PR is the test-side workaround. Pinning the HERMES_* bridge env
vars bypasses derive-provider.sh entirely, so the test works regardless
of which template-hermes commit any given tenant happens to have on
disk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hongming Wang 2026-04-23 22:25:33 -07:00
parent faba17a84c
commit 884fff1145


@@ -243,7 +243,28 @@ if [ -n "${E2E_OPENAI_API_KEY:-}" ]; then
 # model name → 404 model_not_found. Also set OPENAI_BASE_URL to
 # OpenAI's own endpoint — default is openrouter.ai which would need
 # a different key format.
-SECRETS_JSON="{\"OPENAI_API_KEY\":\"$E2E_OPENAI_API_KEY\",\"OPENAI_BASE_URL\":\"https://api.openai.com/v1\",\"MODEL_PROVIDER\":\"openai:gpt-4o\"}"
+#
+# The HERMES_* fields below bypass template-hermes/scripts/derive-provider.sh
+# — verified 2026-04-24 that even with template-hermes#19's fix in main,
+# staging tenants sometimes resolve openai/* to PROVIDER=openrouter and
+# emit {'message':'Missing Authentication header','code':401} (OpenRouter's
+# shape) in the A2A reply. Setting HERMES_INFERENCE_PROVIDER=custom +
+# HERMES_CUSTOM_{BASE_URL,API_KEY,API_MODE} pins the bridge deterministically
+# so the test doesn't depend on every tenant EC2 having a freshly-cloned
+# template-hermes.
+SECRETS_JSON=$(python3 -c "
+import json, os
+k = os.environ['E2E_OPENAI_API_KEY']
+print(json.dumps({
+    'OPENAI_API_KEY': k,
+    'OPENAI_BASE_URL': 'https://api.openai.com/v1',
+    'MODEL_PROVIDER': 'openai:gpt-4o',
+    'HERMES_INFERENCE_PROVIDER': 'custom',
+    'HERMES_CUSTOM_BASE_URL': 'https://api.openai.com/v1',
+    'HERMES_CUSTOM_API_KEY': k,
+    'HERMES_CUSTOM_API_MODE': 'chat_completions',
+}))
+")
 fi
 # Model slug MUST be provider-prefixed for hermes — the template's