forked from molecule-ai/molecule-core
Root cause of the sustained E2E step-8 A2A 401 failures (3+/3 runs
2026-04-24 03h–04h): the A2A returns 200 with a JSON-RPC result whose
text is OpenRouter's error format —
{'message': 'Missing Authentication header', 'code': 401}
(integer code, not OpenAI's string 'invalid_api_key'). template-hermes's
derive-provider.sh was picking PROVIDER=openrouter for openai/* models
despite template-hermes#19 (the fix that flips openai/* → custom when
OPENAI_API_KEY is set) having been merged 01:30Z.
Verified via probe workspaces on the staging canary tenant:
probe 1 (just OPENAI_API_KEY): → OpenRouter's 401 shape
probe 2 (+ HERMES_INFERENCE_PROVIDER=custom + HERMES_CUSTOM_*):
→ OpenAI's 401 shape ('code': 'invalid_api_key')
So derive-provider.sh's updates apparently aren't reaching every
staging tenant on re-provision — possibly because tenant EC2s cache
/opt/adapter from an earlier boot, or the CP's user-data snapshot
bundles a pre-fix template-hermes. That's a separate follow-up (needs
forced re-clone of /opt/adapter on every workspace boot).
This PR is the test-side workaround. Pinning the HERMES_* bridge env
vars bypasses derive-provider.sh entirely, so the test works regardless
of which template-hermes commit any given tenant happens to have on
disk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| _extract_token.py | ||
| _lib.sh | ||
| STAGING_SAAS_E2E.md | ||
| test_a2a_e2e.sh | ||
| test_activity_e2e.sh | ||
| test_api.sh | ||
| test_claude_code_e2e.sh | ||
| test_comprehensive_e2e.sh | ||
| test_dev_mode.sh | ||
| test_saas_tenant.sh | ||
| test_staging_full_saas.sh | ||