fix(e2e/staging-saas): send provider-prefixed model slug for hermes

The E2E posts a bare "gpt-4o" as the workspace model. Hermes
template's derive-provider.sh parses the slug PREFIX (before the
slash) to set HERMES_INFERENCE_PROVIDER at install time. With no
prefix, provider falls back to hermes's auto-detect, which picks
the compiled-in Anthropic default. Hermes-agent then tries the
Anthropic API with the OpenAI key the E2E passed in SECRETS_JSON
and returns 401 "Invalid API key" at step 8/11 (A2A call).

Same trap PR #1714 fixed for the canvas Create flow. The E2E
was quietly broken on the same vector — it masked before today
because workspaces never reached "online" (pre-#231 install.sh
hook missing on staging; staging now deploys #231 via CP #236).

Fix: pin MODEL_SLUG="openai/gpt-4o" since the E2E's secret is
always the OpenAI key. Non-hermes runtimes ignore the prefix.

Now that both layers are fixed (install.sh runs AND the slug
steers hermes to OpenAI), the E2E should reach step 11/11.

Evidence from run 24822173171 attempt 2 (post-CP-#236 deploy):
  07:55:25  CP reachable
  07:57:28  Tenant provisioning complete (2:03, canary)
  08:04:56  Workspace 52107c1a online (7:28, install.sh ran!)
  08:05:06  Workspace 34a286df online
  08:05:06  A2A 401 — hermes tried Anthropic with OpenAI key

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hongming Wang 2026-04-23 01:07:08 -07:00
parent e739b49938
commit 786a8470e5

View File

@ -246,10 +246,20 @@ if [ -n "${E2E_OPENAI_API_KEY:-}" ]; then
SECRETS_JSON="{\"OPENAI_API_KEY\":\"$E2E_OPENAI_API_KEY\",\"OPENAI_BASE_URL\":\"https://api.openai.com/v1\",\"MODEL_PROVIDER\":\"openai:gpt-4o\"}"
fi
# Model slug MUST be provider-prefixed for hermes — the template's
# derive-provider.sh parses the slug prefix (`openai/…`, `anthropic/…`,
# `minimax/…`) to set HERMES_INFERENCE_PROVIDER at install time. A bare
# "gpt-4o" has no prefix → provider falls back to hermes auto-detect →
# picks Anthropic default → tries Anthropic API with the OpenAI key →
# 401 on A2A. Same trap that trapped prod users in PR #1714. We pin
# "openai/gpt-4o" here because the E2E's secret is always the OpenAI
# key; non-hermes runtimes ignore the prefix.
MODEL_SLUG="openai/gpt-4o"
log "5/11 Provisioning parent workspace (runtime=$RUNTIME)..."
PARENT_RESP=$(tenant_call POST /workspaces \
-H "Content-Type: application/json" \
-d "{\"name\":\"E2E Parent\",\"runtime\":\"$RUNTIME\",\"tier\":2,\"model\":\"gpt-4o\",\"secrets\":$SECRETS_JSON}")
-d "{\"name\":\"E2E Parent\",\"runtime\":\"$RUNTIME\",\"tier\":2,\"model\":\"$MODEL_SLUG\",\"secrets\":$SECRETS_JSON}")
PARENT_ID=$(echo "$PARENT_RESP" | python3 -c "import json,sys; print(json.load(sys.stdin)['id'])")
log " PARENT_ID=$PARENT_ID"
@ -259,7 +269,7 @@ if [ "$MODE" = "full" ]; then
log "6/11 Provisioning child workspace..."
CHILD_RESP=$(tenant_call POST /workspaces \
-H "Content-Type: application/json" \
-d "{\"name\":\"E2E Child\",\"runtime\":\"$RUNTIME\",\"tier\":2,\"model\":\"gpt-4o\",\"parent_id\":\"$PARENT_ID\",\"secrets\":$SECRETS_JSON}")
-d "{\"name\":\"E2E Child\",\"runtime\":\"$RUNTIME\",\"tier\":2,\"model\":\"$MODEL_SLUG\",\"parent_id\":\"$PARENT_ID\",\"secrets\":$SECRETS_JSON}")
CHILD_ID=$(echo "$CHILD_RESP" | python3 -c "import json,sys; print(json.load(sys.stdin)['id'])")
log " CHILD_ID=$CHILD_ID"
else