fix(start.sh): read provider/model from /configs/config.yaml (Option B PR-4)

Closes the gap where CP user-data (PR-3, task #197) writes
runtime_config.{model,provider} into /configs/config.yaml but start.sh
only reads HERMES_DEFAULT_MODEL / HERMES_INFERENCE_PROVIDER env vars
that CP doesn't set. Result: every CP-provisioned hermes workspace
booted with the built-in `nousresearch/hermes-4-70b` default and
500'd at first prompt with "No LLM provider configured" — visible
in the 2026-04-30 hongmingwang tenant screenshots.

New scripts/load-workspace-config.sh, sourced by start.sh before the
existing DEFAULT_MODEL/PROVIDER derivation. Reads /configs/config.yaml
via python3 + PyYAML and exports HERMES_DEFAULT_MODEL +
HERMES_INFERENCE_PROVIDER if they're not already set.

Precedence (highest to lowest):
  1. HERMES_* env vars (operator override via workspace secrets)
  2. /configs/config.yaml runtime_config.{model,provider} (canvas UI)
  3. start.sh hard-coded fallback (nousresearch/hermes-4-70b)

Resilience:
  - Missing config.yaml → silent skip (dev containers)
  - Malformed YAML → silent skip (don't kill boot)
  - python3 missing → silent skip
  - PyYAML missing → silent skip
  - Empty/non-dict runtime_config → silent skip

Tests: scripts/test-load-workspace-config.sh — 11 cases covering all
silent-skip paths + happy paths + operator override + non-string scalar
coercion. Existing scripts/test-derive-provider.sh (12 cases) re-verified.

Wires shell tests into CI via a new shell-tests job — those tests
weren't running anywhere before, opportunistically closes that gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Hongming Wang 2026-04-30 23:50:23 -07:00
parent 6906a3fe91
commit 43c2569faa
4 changed files with 308 additions and 3 deletions

View File

@ -3,3 +3,25 @@ on: [push, pull_request]
jobs:
validate:
uses: Molecule-AI/molecule-ci/.github/workflows/validate-workspace-template.yml@main
shell-tests:
# Shell-level unit tests for scripts/*.sh. Pre-existing
# test-derive-provider.sh + test-install-prefix-strip.sh weren't wired
# to CI before; landing the new test-load-workspace-config.sh fix
# opportunistically closes that gap. Runs in <5s on a fresh runner.
name: Shell unit tests
runs-on: ubuntu-latest
timeout-minutes: 3
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
# PyYAML is required by load-workspace-config.sh's python helper.
# In production it's transitive via molecule-ai-workspace-runtime;
# in this minimal CI env we install it explicitly so the YAML path
# is exercised instead of the script's silent ImportError fallback.
- run: pip install -q pyyaml
- run: bash scripts/test-derive-provider.sh
- run: bash scripts/test-install-prefix-strip.sh
- run: bash scripts/test-load-workspace-config.sh

118
scripts/load-workspace-config.sh Executable file
View File

@ -0,0 +1,118 @@
#!/usr/bin/env bash
# load-workspace-config.sh — bridge the workspace-level /configs/config.yaml
# (written by molecule-controlplane user-data per task #197) into the
# hermes-specific HERMES_DEFAULT_MODEL / HERMES_INFERENCE_PROVIDER env
# vars that start.sh + derive-provider.sh consume.
#
# Why this exists: PR-3 in the Option B series taught CP to write
# `runtime_config.model` and `runtime_config.provider` into
# /configs/config.yaml at provision time so the canvas Config tab can
# round-trip the operator's pick. start.sh used to only read the
# HERMES_* env vars, which CP doesn't set, so the config.yaml fields
# were silently ignored — every workspace booted with the built-in
# `nousresearch/hermes-4-70b` default and 500'd at first prompt with
# "No LLM provider configured" (visible in the 2026-04-30 hongmingwang
# tenant screenshots).
#
# Precedence (highest to lowest):
# 1. HERMES_DEFAULT_MODEL / HERMES_INFERENCE_PROVIDER env vars
# (operator override via workspace secrets — they win)
# 2. /configs/config.yaml runtime_config.{model,provider}
# (canvas Config tab — set via UI, written by CP user-data)
# 3. start.sh's hard-coded fallback (nousresearch/hermes-4-70b)
#
# Contract:
# Reads: /configs/config.yaml (or $MOLECULE_CONFIG_PATH/config.yaml)
# $HERMES_DEFAULT_MODEL, $HERMES_INFERENCE_PROVIDER
# Writes: HERMES_DEFAULT_MODEL (only if unset and config.yaml has it)
# HERMES_INFERENCE_PROVIDER (only if unset and config.yaml has it)
#
# Failure modes (silent — never blocks boot):
# - /configs/config.yaml doesn't exist → no-op
# - python3 not on PATH → no-op (start.sh's fallback still works)
# - PyYAML not importable → no-op
# - Malformed YAML → no-op
# - runtime_config absent or not a dict → no-op
#
# Resilience over completeness — same philosophy as the claude-code
# adapter's _load_providers fallback. A workspace with a missing or
# malformed config.yaml should still boot and fall through to the
# env-var/built-in defaults instead of dying at this step.
# Source-only safety: don't `set -e` here — this script is `.`-sourced
# by start.sh which already has its own set -euo pipefail. Errors here
# would otherwise kill the parent shell.
_lwc_config_path="${MOLECULE_CONFIG_PATH:-/configs}/config.yaml"
# Skip silently if the file isn't there. Workspaces booted before PR-3
# rolled out, or non-CP-provisioned dev containers, won't have it.
if [ ! -f "$_lwc_config_path" ]; then
unset _lwc_config_path
return 0 2>/dev/null || true
fi
# Skip if python3 is missing — start.sh's existing logic still works.
if ! command -v python3 >/dev/null 2>&1; then
unset _lwc_config_path
return 0 2>/dev/null || true
fi
# Single python invocation extracts both fields and prints `key=value`
# lines. Importing yaml inside try/except so a runtime missing PyYAML
# (shouldn't happen in production — molecule-ai-workspace-runtime
# brings it transitively — but defensive against dev images) yields
# zero output instead of an error. Any exception → empty output → the
# read loop below sets nothing and start.sh keeps its fallbacks.
_lwc_extracted=$(MOLECULE_CONFIG_FILE="$_lwc_config_path" python3 - <<'PYEOF' 2>/dev/null
import os, sys
try:
import yaml
except ImportError:
sys.exit(0)
try:
with open(os.environ["MOLECULE_CONFIG_FILE"]) as f:
data = yaml.safe_load(f) or {}
except Exception:
sys.exit(0)
rc = data.get("runtime_config") or {}
if not isinstance(rc, dict):
sys.exit(0)
# Print one key=value per line; empty values omitted so the bash side
# can use [ -n "$value" ] without ambiguity. Values are stringified so
# non-string YAML scalars (e.g. integers) don't break the read loop.
for env_name, yaml_key in (
("HERMES_DEFAULT_MODEL", "model"),
("HERMES_INFERENCE_PROVIDER", "provider"),
):
v = rc.get(yaml_key)
if v is None:
continue
s = str(v).strip()
if s:
print(f"{env_name}={s}")
PYEOF
)
# Apply the extracted values, but only when the corresponding env var
# isn't already set — operator override (env-var pre-set, e.g. via
# workspace secrets) must beat the YAML.
while IFS='=' read -r _lwc_key _lwc_value; do
[ -z "$_lwc_key" ] && continue
case "$_lwc_key" in
HERMES_DEFAULT_MODEL)
if [ -z "${HERMES_DEFAULT_MODEL:-}" ] && [ -n "$_lwc_value" ]; then
export HERMES_DEFAULT_MODEL="$_lwc_value"
echo "[load-workspace-config] HERMES_DEFAULT_MODEL=$_lwc_value (from $_lwc_config_path)" >&2
fi
;;
HERMES_INFERENCE_PROVIDER)
if [ -z "${HERMES_INFERENCE_PROVIDER:-}" ] && [ -n "$_lwc_value" ]; then
export HERMES_INFERENCE_PROVIDER="$_lwc_value"
echo "[load-workspace-config] HERMES_INFERENCE_PROVIDER=$_lwc_value (from $_lwc_config_path)" >&2
fi
;;
esac
done <<<"$_lwc_extracted"
unset _lwc_config_path _lwc_extracted _lwc_key _lwc_value

View File

@ -0,0 +1,150 @@
#!/usr/bin/env bash
# test-load-workspace-config.sh — offline unit tests for load-workspace-config.sh.
#
# load-workspace-config.sh has one external dep (python3 + pyyaml) but
# both are present on every workspace image and any CI runner with
# Python — making this cheap to exercise as a shell-level test:
# write a fixture YAML, point MOLECULE_CONFIG_PATH at it, source the
# script, check $HERMES_DEFAULT_MODEL / $HERMES_INFERENCE_PROVIDER.
#
# Pins the Option B PR-4 contract that fixed the 2026-04-30 hongmingwang
# tenant 500: hermes start.sh must consume runtime_config.{model,provider}
# from /configs/config.yaml so canvas Config-tab picks reach the gateway.
#
# Covered cases:
# - YAML missing → no-op (silent skip)
# - Malformed YAML → no-op (silent skip, no exit)
# - runtime_config absent → no-op
# - runtime_config.{model,provider} present → exported
# - Pre-set env vars win over YAML values (operator override)
# - Only model present (provider stays unset)
# - Only provider present (model stays unset)
# - Non-string scalar values (int) coerce to string
set -u
HERE="$(cd "$(dirname "$0")" && pwd)"
SCRIPT="$HERE/load-workspace-config.sh"
if [ ! -f "$SCRIPT" ]; then
echo "FAIL: load-workspace-config.sh not found at $SCRIPT"
exit 2
fi
if ! command -v python3 >/dev/null 2>&1; then
echo "SKIP: python3 not on PATH — load-workspace-config.sh requires it"
exit 0
fi
if ! python3 -c "import yaml" 2>/dev/null; then
echo "SKIP: pyyaml not importable — install with: pip install pyyaml"
exit 0
fi
PASS=0
FAIL=0
TMPDIR=$(mktemp -d -t lwc-test.XXXXXX)
trap 'rm -rf "$TMPDIR"' EXIT
# Run the script in a fresh subshell with given env + a fixture
# config.yaml, then echo the resulting env-var values for comparison.
# Subshell isolation prevents env mutations from leaking between cases.
run_case() {
local label="$1" yaml_content="$2" pre_env="$3" expected_model="$4" expected_provider="$5"
local case_dir="$TMPDIR/$$.$RANDOM"
mkdir -p "$case_dir"
if [ "$yaml_content" != "<missing>" ]; then
printf '%s' "$yaml_content" > "$case_dir/config.yaml"
fi
local actual
actual=$(bash -c "
set +e
unset HERMES_DEFAULT_MODEL HERMES_INFERENCE_PROVIDER
$pre_env
export MOLECULE_CONFIG_PATH='$case_dir'
. '$SCRIPT' 2>/dev/null
echo \"MODEL=\${HERMES_DEFAULT_MODEL:-}\"
echo \"PROVIDER=\${HERMES_INFERENCE_PROVIDER:-}\"
")
local actual_model actual_provider
actual_model=$(echo "$actual" | grep -E '^MODEL=' | sed 's/^MODEL=//')
actual_provider=$(echo "$actual" | grep -E '^PROVIDER=' | sed 's/^PROVIDER=//')
if [ "$actual_model" = "$expected_model" ] && [ "$actual_provider" = "$expected_provider" ]; then
echo " PASS $label (model='$actual_model' provider='$actual_provider')"
PASS=$((PASS+1))
else
echo " FAIL $label"
echo " got model='$actual_model' provider='$actual_provider'"
echo " expected model='$expected_model' provider='$expected_provider'"
FAIL=$((FAIL+1))
fi
rm -rf "$case_dir"
}
echo "== load-workspace-config.sh =="
# --- File missing → no-op ---
run_case "missing config.yaml" "<missing>" "" "" ""
# --- Malformed YAML → no-op (silent skip) ---
run_case "malformed YAML" "providers: [not valid: {{{" "" "" ""
# --- runtime_config absent → no-op ---
run_case "no runtime_config block" "name: foo
runtime: hermes
" "" "" ""
# --- runtime_config not a dict → no-op ---
run_case "runtime_config is a string" "runtime_config: \"not-a-dict\"
" "" "" ""
# --- Both fields present → both exported ---
run_case "both fields" "runtime_config:
model: nousresearch/hermes-4-70b
provider: openrouter
" "" "nousresearch/hermes-4-70b" "openrouter"
# --- Operator env var beats YAML for both ---
run_case "env override beats both" "runtime_config:
model: nousresearch/hermes-4-70b
provider: openrouter
" "export HERMES_DEFAULT_MODEL=anthropic/claude-sonnet-4-6
export HERMES_INFERENCE_PROVIDER=anthropic" "anthropic/claude-sonnet-4-6" "anthropic"
# --- Only model present (provider stays unset) ---
run_case "model only" "runtime_config:
model: deepseek/deepseek-v4-pro
" "" "deepseek/deepseek-v4-pro" ""
# --- Only provider present (model stays unset) ---
run_case "provider only" "runtime_config:
provider: zai
" "" "" "zai"
# --- Empty string values are skipped ---
run_case "empty-string fields skipped" "runtime_config:
model: \"\"
provider: \"\"
" "" "" ""
# --- Non-string scalar (int) coerces to string ---
# Not a real-world case but pins the str() coercion in the python helper
# so a future YAML schema with numeric fields doesn't crash the boot.
run_case "non-string scalar coerces" "runtime_config:
model: 42
provider: 7
" "" "42" "7"
# --- Mixed: env var set for one, YAML provides the other ---
run_case "partial env override" "runtime_config:
model: minimax/MiniMax-M2.7-highspeed
provider: minimax
" "export HERMES_INFERENCE_PROVIDER=anthropic" "minimax/MiniMax-M2.7-highspeed" "anthropic"
echo
echo "Total: PASS=$PASS FAIL=$FAIL"
[ "$FAIL" -eq 0 ]

View File

@ -107,6 +107,21 @@ chmod 600 "$ENV_FILE"
# the selection; operators override via HERMES_INFERENCE_PROVIDER
# + HERMES_DEFAULT_MODEL env, or by editing config.yaml at runtime
# inside the container.
# Pull HERMES_DEFAULT_MODEL + HERMES_INFERENCE_PROVIDER out of
# /configs/config.yaml (canvas Config tab values, written by CP
# user-data per task #197). Env-var overrides still win — the helper
# only sets vars that aren't already set. Sourced for env mutation.
# Dockerfile COPYs scripts/ to /app/scripts; fall back to /scripts
# for dev environments running start.sh with a different WORKDIR.
#
# Runs BEFORE the API-key-based auto-selection block below so a
# canvas-set provider/model wins over a key-presence guess. Operators
# who explicitly picked GLM-4.6 in the UI shouldn't get bumped to
# anthropic/* just because ANTHROPIC_API_KEY happens to be in env too.
LOAD_CONFIG_SCRIPT="/app/scripts/load-workspace-config.sh"
[ -f "$LOAD_CONFIG_SCRIPT" ] || LOAD_CONFIG_SCRIPT="/scripts/load-workspace-config.sh"
[ -f "$LOAD_CONFIG_SCRIPT" ] && . "$LOAD_CONFIG_SCRIPT"
# Pick a default model. The fallback used to be `nousresearch/hermes-4-70b`
# unconditionally, which derives PROVIDER=openrouter when no Nous key is
# present — and if OPENROUTER_API_KEY isn't set either, hermes-agent boots
@ -119,9 +134,9 @@ chmod 600 "$ENV_FILE"
# Fix: when HERMES_DEFAULT_MODEL is unset and HERMES_INFERENCE_PROVIDER
# is unset, pick the default model based on which API key is actually
# present in env. Keeps the behaviour-when-everything-is-set unchanged
# (operator-supplied HERMES_DEFAULT_MODEL still wins). Order below is
# rough preference (direct providers preferred over OR routing for the
# same model family).
# (operator-supplied HERMES_DEFAULT_MODEL still wins, including the
# config.yaml-sourced one above). Order below is rough preference
# (direct providers preferred over OR routing for the same model family).
if [ -z "${HERMES_DEFAULT_MODEL:-}" ] && [ -z "${HERMES_INFERENCE_PROVIDER:-}" ]; then
if [ -n "${HERMES_API_KEY:-}" ] || [ -n "${NOUS_API_KEY:-}" ]; then
HERMES_DEFAULT_MODEL="nousresearch/hermes-4-70b"