PR #763 (feat/issue-733-agents-md-impl) branched before PR #743 landed the
claude-opus-4-7 model default upgrade. config.py still had the old
claude-sonnet-4-6 default, which would have silently regressed the upgrade.
Restore both occurrences:
- WorkspaceConfig.model default: claude-sonnet-4-6 → claude-opus-4-7
- load_config() fallback: claude-sonnet-4-6 → claude-opus-4-7
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Turns the QA TDD spec from PR #755 GREEN: all 14 tests pass.
Changes:
- workspace-template/agents_md.py (new): generate_agents_md(config_dir, output_path)
Writes AAIF-compliant AGENTS.md with name, role, description, A2A endpoint,
and MCP tools sections. AGENT_URL env var overrides the derived localhost URL.
Falls back to description when role is absent (graceful legacy compat).
Always overwrites — no stale-file guard.
- workspace-template/config.py: add role field to WorkspaceConfig
New top-level field `role: str = ""` with load_config support.
Falls back to description in agents_md.py for backward compat.
- workspace-template/main.py: wire generate_agents_md into startup (step 1a)
Fires after load_config + preflight. Non-fatal: exception is caught and
printed as a warning so a bad /workspace mount never kills the agent.
- workspace-template/tests/test_agents_md.py (new): pulled from PR #755 branch
Test results:
pytest tests/test_agents_md.py -v → 14 passed (was: 14 RED / import error)
pytest (full suite) → 1044 passed, 2 xfailed
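The role-to-description fallback and the AGENT_URL override described above can be sketched roughly like this (function and key names are illustrative, not the real agents_md.py API):

```python
import os

def agent_display_role(config: dict) -> str:
    # Prefer the new top-level `role`; fall back to `description`
    # for legacy configs that predate the field (graceful compat).
    return config.get("role") or config.get("description", "")

def agent_endpoint_url(port: int = 8080) -> str:
    # AGENT_URL env var overrides the derived localhost URL.
    return os.environ.get("AGENT_URL") or f"http://localhost:{port}"
```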
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Baidu MeDo hackathon integration was sitting in builtin_tools/ as dead
code — not imported by any loader but shipped with every workspace image,
misleadingly suggesting it was a core builtin.
Changes:
- Move builtin_tools/medo.py → plugins/molecule-medo/skills/medo-tools/scripts/medo.py
(git detects this as a rename — no code changes, identical tool surface)
- Add plugins/molecule-medo/plugin.yaml (manifest: name, version, runtimes, tags)
- Add plugins/molecule-medo/skills/medo-tools/SKILL.md (frontmatter + setup docs)
- Move workspace-template/tests/test_medo.py → plugins/molecule-medo/tests/test_medo.py
(update _MEDO_PATH to resolve from plugin root; add conftest.py for langchain mock)
- Update .gitignore: change /plugins/ blanket ignore to /plugins/* so this plugin
can be tracked until it gets its own standalone repo
Acceptance criteria met:
- builtin_tools/medo.py removed from core
- plugins/molecule-medo/ created with identical tool surface (9/9 tests pass)
- cd workspace-template && pytest → 1021 passed, 2 xfailed (no regression)
- MEDO_API_KEY was never in default provisioning (.env.example / config.py clean)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the anthropic:claude-sonnet-4-6 default across config, handlers,
env example, and litellm proxy config. All tests updated to match the new
default; sonnet-4-6 alias kept in litellm_config.yml for pinned workspaces.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The optional $1 argument flowed directly into Docker image tag names
(workspace-template:<runtime>) and filesystem paths (RUNTIME_DIR) with
no validation, enabling path traversal or unexpected tag injection via
e.g. `bash rebuild-runtime-images.sh '../evil'`.
Fix: introduce VALID_RUNTIMES allowlist and validate $1 against it
before setting RUNTIMES. Any unlisted value now exits with a clear
error message. The RUNTIMES array is populated from VALID_RUNTIMES
when no argument is given, keeping the all-runtimes default path.
shellcheck clean; $1 only appears inside the validated block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bug 1: TMPDIR is a POSIX-reserved variable used by mktemp, Docker
BuildKit, and git subprocesses as their system temp directory.
Overwriting it redirected those tools to the build context, causing
unpredictable failures. Renamed all 6 occurrences to RUNTIME_DIR.
Bug 2: `docker build ... | grep` made grep's exit code (0=match,
1=no match) determine if the build succeeded, not docker's. Fixed by
reading PIPESTATUS[0] immediately after the pipeline so docker's real
exit code drives the SUCCESS/FAILED tracking.
Also fixed two pre-existing shellcheck warnings:
- SC2034: removed unused REPO_ROOT variable
- SC2064: trap now uses single quotes so TMPBASE expands at signal time
shellcheck clean with no warnings.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cover the four paths that were exercised only via mock in the
_build_options tests: valid YAML, missing file, malformed YAML,
and empty file (safe_load → None → {} via `or {}`).
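The empty-file case hinges on a PyYAML quirk: yaml.safe_load("") returns None rather than {}. A minimal sketch of the `or {}` normalization (standing in for the real loader, without the PyYAML dependency):

```python
def normalize_loaded_config(parsed) -> dict:
    # yaml.safe_load("") yields None, not {}; `parsed or {}` restores
    # the empty-mapping contract so downstream .get() calls don't crash.
    return parsed or {}
```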
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds _load_config_dict() helper to ClaudeSDKExecutor and wires the new
effort and task_budget config fields into _build_options() before the
Anthropic API call:
- effort (str): low|medium|high|xhigh|max — populates output_config.effort
- task_budget (int): advisory total-token budget; must be >= 20000 when set;
automatically adds task-budgets-2026-03-13 beta header
Also adds WorkspaceConfig.effort and WorkspaceConfig.task_budget fields in
config.py and 5 acceptance tests covering all code paths.
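A hedged sketch of how the two fields might fold into the API kwargs; the option names, the validation step, and the header value are taken from the description above, but the exact shape of _build_options is assumed:

```python
MIN_TASK_BUDGET = 20_000  # per the commit: budget must be >= 20000 when set
VALID_EFFORT = {"low", "medium", "high", "xhigh", "max"}

def build_budget_options(effort=None, task_budget=None) -> dict:
    """Illustrative stand-in for the _build_options wiring."""
    opts = {}
    if effort is not None:
        if effort not in VALID_EFFORT:
            raise ValueError(f"invalid effort: {effort!r}")
        opts["output_config"] = {"effort": effort}
    if task_budget is not None:
        if task_budget < MIN_TASK_BUDGET:
            raise ValueError("task_budget must be >= 20000")
        opts["task_budget"] = task_budget
        # the beta header is attached automatically when a budget is set
        opts["extra_headers"] = {"anthropic-beta": "task-budgets-2026-03-13"}
    return opts
```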
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standalone adapter images (langgraph, claude-code, etc.) use
ENTRYPOINT ["molecule-runtime"] which bypasses entrypoint.sh. PR #640's
entrypoint.sh fix therefore never runs in adapter images. The correct fix
is to bake git config --system into the image at build time.
This script:
1. Rebuilds workspace-template:base from the monorepo Dockerfile (which
has the fixed entrypoint.sh and molecule-git-token-helper.sh)
2. For each of the 6 runtime adapters: clones the standalone repo, patches
its Dockerfile to COPY the credential helper and run git config --system,
then builds the final image tagged as workspace-template:<runtime>
Usage (run on the host machine, not inside a workspace container):
bash workspace-template/rebuild-runtime-images.sh # all 6
bash workspace-template/rebuild-runtime-images.sh claude-code # one
See issue #658 for the architectural explanation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both PRs restructured the same chat.completions.create() call to use a
create_kwargs dict. Resolved by keeping both __init__ params and both
conditionals in the create_kwargs block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs prevented the git credential helper (merged in #567) from ever
running at workspace boot:
1. Dockerfile never COPY'd scripts/molecule-git-token-helper.sh into the
image — only gh-wrapper.sh was copied from scripts/. Result: the helper
binary did not exist in any built container image.
2. entrypoint.sh looked for the helper at /workspace-template/scripts/...
but /workspace-template/ is not a path that exists inside the container
(WORKDIR is /app, no /workspace-template mount). The `if [ -f ... ]`
guard silently fell through to the WARNING branch on every boot since
#567 merged — the helper was never registered.
Fix:
- Add `COPY scripts/molecule-git-token-helper.sh ./scripts/` to Dockerfile
so the script lands at /app/scripts/ in the image (matching WORKDIR /app)
- Update HELPER_SCRIPT path in entrypoint.sh from
/workspace-template/scripts/... to /app/scripts/...
After this fix, every workspace container registers the helper at boot via:
git config --global credential.https://github.com.helper \
"!/app/scripts/molecule-git-token-helper.sh"
Closes #613.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Unconditional list.extend() on repeated plugin install caused every
hook handler to be appended on each reinstall, leading to 3-4x duplicate
firings per event (PreToolUse, PostToolUse, Stop, etc.).
Fix: before appending each incoming handler, compute a fingerprint of
(matcher, frozenset-of-commands). Skip append if the fingerprint is
already present in the merged list. First-time installs are unaffected —
new handlers still land correctly.
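The fingerprint dedup can be sketched as below; the {matcher, hooks: [{command}, ...]} handler shape is an assumption about the settings layout, not the verified schema:

```python
def merge_hooks(existing: list, incoming: list) -> list:
    """Sketch of the dedup fix: fingerprint each handler and skip
    appends whose fingerprint is already in the merged list."""
    def fingerprint(handler: dict):
        # (matcher, frozenset-of-commands), as described in the commit
        commands = frozenset(h.get("command", "") for h in handler.get("hooks", []))
        return (handler.get("matcher"), commands)

    merged = list(existing)
    seen = {fingerprint(h) for h in merged}
    for handler in incoming:
        fp = fingerprint(handler)
        if fp in seen:
            continue  # already installed: avoid duplicate firings
        merged.append(handler)
        seen.add(fp)
    return merged
```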
Adds 7 unit tests covering: first install, double install, triple install,
different-matcher co-existence, different-command co-existence, existing
user hook preservation, and top-level key merge semantics.
Closes #566
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds response_format support to HermesA2AExecutor so callers can request
structured JSON output via the OpenAI-native response_format parameter.
Changes:
- _validate_response_format(): validates type (json_schema/json_object/text)
and required sub-fields; returns None if valid, error message if invalid
- HermesA2AExecutor.__init__: new response_format kwarg, stored as _response_format
- execute(): validates before API call — invalid schema enqueues error and
returns early without hitting Hermes API; valid and non-None adds
response_format= to create_kwargs; None omits the field entirely
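A simplified sketch of the validator contract (None when valid, message when invalid); the exact sub-field rules are assumptions beyond what the commit states:

```python
def validate_response_format(rf):
    """Illustrative version of _validate_response_format."""
    if rf is None:
        return None  # caller omits the field entirely
    if not isinstance(rf, dict) or "type" not in rf:
        return "response_format must be a dict with a 'type' key"
    if rf["type"] not in ("json_schema", "json_object", "text"):
        return f"unsupported response_format type: {rf['type']!r}"
    if rf["type"] == "json_schema" and "json_schema" not in rf:
        return "type 'json_schema' requires a 'json_schema' sub-field"
    return None
```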
Tests (12 new):
- _validate_response_format: all valid types, invalid type, missing fields
- constructor stores response_format correctly
- valid response_format forwarded to API call
- response_format omitted when None (no key in call kwargs)
- invalid schema → error message enqueued, API not called
Closes #498
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of injecting tool definitions as text into the system prompt,
HermesA2AExecutor now accepts a tools: list[dict] | None constructor
parameter containing OpenAI-format tool definitions and forwards them
via the native tools= parameter on chat.completions.create().
Empty list / None rule: when tools is falsy, the tools key is omitted
from the API call entirely — never sent as tools=[] — so providers
that reject an empty tools array don't return a 400.
Tool-call response handling: when the model returns finish_reason
"tool_calls" with no text content, the executor serialises the call
list as a JSON string and enqueues it as the A2A reply. This keeps
the executor thin (single API call per turn, no ReAct loop) while
surfacing function-call intent in a structured, parseable format.
Changes:
- HermesA2AExecutor.__init__: new tools kwarg; stored as self._tools
(copy; mutating the input list has no effect)
- execute(): builds create_kwargs dict and conditionally adds tools=
only when self._tools is non-empty; handles tool_calls response
- Module docstring: new "Native tools (#497)" section with schema
reference and edge-case explanation
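The falsy-tools rule and the tool_calls reply path reduce to two small helpers; this is a sketch of the logic, not the executor's real method signatures:

```python
import json

def build_create_kwargs(model: str, messages: list, tools=None) -> dict:
    # The tools key is added only when the list is non-empty;
    # None and [] both omit it, so tools=[] is never sent.
    kwargs = {"model": model, "messages": messages}
    if tools:
        kwargs["tools"] = list(tools)  # copy: mutating input has no effect
    return kwargs

def reply_for_choice(content, tool_calls) -> str:
    # Text content wins; otherwise the call list is serialised to
    # JSON and enqueued as the A2A reply.
    if content:
        return content
    return json.dumps(tool_calls or [])
```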
Tests (12 new, 47 total in hermes test file, 1002 total suite):
- tools stored correctly in constructor (copy, None, [], non-empty)
- non-empty tools forwarded as tools= in API call
- multiple tools all forwarded
- empty list ([] and None and default) → tools key absent from call
- model tool_call response → JSON-serialised list as A2A reply
- multiple tool_calls → all in JSON reply
- text content present → text wins over tool_calls
Closes #497
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: the github-app-auth plugin injects GH_TOKEN + GITHUB_TOKEN
into each workspace container's env at provision time (EnvMutator). Those
are GitHub App installation tokens with a fixed ~60 min TTL. The plugin
has an in-process cache that proactively refreshes 5 min before expiry —
but the workspace env is set once at container start and never updated.
Any workspace alive >60 min ends up with an expired token.
Fix (Option B — on-demand endpoint):
pkg/provisionhook:
- Add TokenProvider interface: Token(ctx) (token, expiresAt, error)
Lives in pkg/ (public) so the github-app-auth plugin can implement it.
- Add Registry.FirstTokenProvider() — discovers the first mutator that
also satisfies TokenProvider via interface assertion. Safe under
concurrent reads (existing RWMutex).
platform/internal/handlers/github_token.go:
- New GitHubTokenHandler serving GET /admin/github-installation-token
- Delegates to the registered TokenProvider (plugin cache — always fresh)
- 404 if no GitHub App configured, 500 + [github] prefix log on error
- Never logs the token itself
platform/internal/handlers/workspace.go:
- Add TokenRegistry() getter so the router can wire the handler without
coupling to WorkspaceHandler internals
platform/internal/router/router.go:
- Register GET /admin/github-installation-token under AdminAuth
workspace-template/:
- scripts/molecule-git-token-helper.sh — git credential helper; calls
the platform endpoint on every push/fetch; falls through to next
helper (operator PAT) if platform unreachable
- entrypoint.sh — configure the credential helper at startup
Why Option B over Option A (background goroutine):
- The plugin already has its own cache refresh; nothing to refresh here.
- Pushing env updates into running containers requires docker exec, which
the architecture explicitly rejects (issue #547 "Alternatives").
- Pull-based is stateless, trivially testable, zero extra goroutines.
Closes #547
Co-authored-by: Molecule AI DevOps Engineer <devops-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements WorkspaceAdapter for Google's Agent Development Kit (google-adk
v1.x, Apache-2.0). Ships four files under workspace-template/adapters/google-adk/:
- adapter.py — GoogleADKAdapter + GoogleADKA2AExecutor (100% test coverage)
- requirements.txt — pinned google-adk==1.30.0 + google-genai>=1.16.0
- README.md — overview, install, usage, config, architecture diagram
- test_adapter.py — 46 unit tests, all passing, no live API calls
Supports AI Studio (GOOGLE_API_KEY) and Vertex AI (GOOGLE_GENAI_USE_VERTEXAI=1).
Model prefix stripping: "google:gemini-2.0-flash" → "gemini-2.0-flash".
Error sanitization mirrors the hermes_executor convention.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hermes 4 is a hybrid-reasoning model trained on <think> tags; without asking
for thinking we pay flagship $/tok but get non-reasoning quality. This adds a
dedicated HermesA2AExecutor that dispatches to any OpenAI-compat endpoint
(OpenRouter, Nous Portal) and enables native reasoning for Hermes 4 models.
Key decisions:
- ProviderConfig + _reasoning_supported() detect Hermes 4 by model slug
substring ("hermes-4", "hermes4") — case-insensitive, no config needed
- extra_body={"reasoning": {"enabled": True}} sent only to Hermes 4 entries;
Hermes 3 path unchanged (no extra_body, no regressions)
- choices[0].message.reasoning + reasoning_details extracted and written to
an OTEL span (hermes.reasoning) — deliberately NOT echoed in the A2A reply
so the reasoning trace never contaminates the agent's next-turn context
- API key / base URL default to OPENAI_API_KEY / OPENAI_BASE_URL env vars
with openrouter.ai/api/v1 as the fallback endpoint
- _client injection parameter for unit tests (no live API calls needed)
- Error sanitization: only exception class name surfaces to user (mirrors
sanitize_agent_error() convention from cli_executor.py)
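The slug detection and the conditional extra_body can be sketched as follows (a minimal illustration of the decisions above, not the real executor code):

```python
def reasoning_supported(model_slug: str) -> bool:
    # Case-insensitive substring match, no config needed.
    slug = (model_slug or "").lower()
    return "hermes-4" in slug or "hermes4" in slug

def inference_kwargs(model_slug: str) -> dict:
    kwargs = {"model": model_slug}
    if reasoning_supported(model_slug):
        # Only Hermes 4 entries get the reasoning flag; the
        # Hermes 3 path stays untouched (no extra_body).
        kwargs["extra_body"] = {"reasoning": {"enabled": True}}
    return kwargs
```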
Test coverage: 35 tests, 100% coverage on all new code paths including:
- _reasoning_supported() — Hermes 4/3/unknown/empty/uppercase
- ProviderConfig — field assignment and capability flags
- extra_body presence for Hermes 4, absence for Hermes 3
- reasoning not in A2A reply; _log_reasoning called when trace present
- reasoning_details forwarded; span attributes set correctly
- Telemetry failure swallowed (never blocks response)
- API error → sanitized class-name-only reply
- cancel() → TaskStatusUpdateEvent(state=canceled)
Full suite: 990 passed, 0 failed (no regressions).
Resolves #496
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code review fixes:
- 🟡#1: Replace python3 with jq in Dockerfile template stages (~50MB → ~2MB)
- 🟡#2: Add clone count verification to scripts/clone-manifest.sh
(set -e + expected vs actual count check — fails build if any clone fails)
- 🟡#3: Drop 'unsafe-eval' from CSP (not needed for Next.js production
standalone builds, only dev mode). Updated test assertion.
- 🟡#4: Remove broken pyproject.toml from workspace-template/ (it claimed
to package as molecule-ai-workspace-runtime but the directory structure
didn't match — the real package ships from the standalone repo)
- 🔵#1: Add version-pinning TODO comment to manifest.json
- 🔵#3: Add full repo URLs + test counts for SDK/MCP/CLI/runtime in CLAUDE.md
Security (GitGuardian alert):
- Removed Telegram bot token (8633739353:AA...) from template-molecule-dev
pm/.env — replaced with ${TELEGRAM_BOT_TOKEN} placeholder
- Removed Claude OAuth token (sk-ant-oat01-...) from template-molecule-dev
root .env — replaced with ${CLAUDE_CODE_OAUTH_TOKEN} placeholder
- Both tokens need immediate rotation by the operator
Tests: Platform middleware tests updated + all pass.
PR #471 removed Dockerfiles/requirements from adapters/ but left the
Python source files. This commit finishes the extraction:
1. Moved shared_runtime.py → workspace-template/shared_runtime.py
(used by prompt.py, a2a_executor.py, coordinator.py — not adapter-specific)
2. Moved base.py → workspace-template/adapter_base.py
(BaseAdapter + AdapterConfig — the interface adapters implement)
3. Updated imports in prompt.py, a2a_executor.py, coordinator.py
4. Rewrote adapters/__init__.py as a thin shim that:
- Reads ADAPTER_MODULE env var (production: standalone repos set this)
- Re-exports BaseAdapter/AdapterConfig for backward compat
5. adapters/base.py + adapters/shared_runtime.py remain as re-export shims
6. Deleted all 8 adapter subdirectories (autogen, claude_code, crewai,
deepagents, gemini_cli, hermes, langgraph, openclaw)
7. Removed 11 test files that imported adapter-specific code
Tests: 955 passed, 0 failed (down from 1216 — the difference is
adapter-specific tests that moved to standalone repos).
test_first_party_plugins.py, test_plugins_builtins_drift.py, and
test_hermes_adapter.py all referenced files under plugins/ and
adapters/ which were extracted to standalone repos. These tests
belong in those repos now, not in the core workspace-template.
1216 passed, 0 failed after removal.
These files have moved to the standalone template repos:
https://github.com/Molecule-AI/molecule-ai-workspace-template-<runtime>
Each adapter repo now has its own Dockerfile (FROM python:3.11-slim + pip install
molecule-ai-workspace-runtime) and requirements.txt. The adapter Python source
files (.py) stay in the monorepo for local development and testing.
Adapters removed from workspace-template/adapters/*/: Dockerfile, requirements.txt
Adapters retained: adapter.py, __init__.py (+ hermes extras: escalation.py, executor.py, providers.py)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Every agent in the template currently uses the same GitHub PAT, so
`gh pr list` shows every PR as authored by the CEO's account with
no signal which agent opened each one. Commits already carry
per-agent authors (GIT_AUTHOR_NAME from #402). This wrapper extends
the identity split to the PR/issue metadata surface layer that
commit attribution can't reach.
## How it works
A tiny bash script installed at `/usr/local/bin/gh` sits earlier in
PATH than the real binary at `/usr/bin/gh`. For `gh pr create` and
`gh issue create`:
- Title gets prefixed with `[Role Name]` — e.g. `[Frontend Engineer]
fix: canvas grid index`
- Body gets `\n\n---\n_Opened by: Molecule AI <Role>_` appended
Role is read from `GIT_AUTHOR_NAME`, which the platform provisioner
sets to `Molecule AI <Role>` (shipped with #402). Accepts both
`--title X` and `--title=X` forms. Same for `--body`.
Anything that isn't `gh pr create` or `gh issue create` (e.g.
`gh pr list`, `gh issue view`, `gh run watch`) passes through
untouched. No behaviour change for read-side operations.
## Idempotent
- If the title already starts with `[...]` the wrapper does not
re-prefix, so `gh pr edit` flows that resubmit the title won't
layer multiple tags.
- If the body already contains `Opened by: Molecule AI` the footer
is not re-appended.
## Fail-open
When `GIT_AUTHOR_NAME` is absent or doesn't start with
`Molecule AI `, the wrapper exec's the real gh with unchanged args.
No call is ever blocked by this script.
## Test coverage
`tests/test_gh_wrapper.sh` — 12 cases, no network, no Docker:
- Passthrough for non-create subcommands (pr list)
- pr create title prefix + body footer
- issue create with `--title=X` `--body=X` equals-form
- Idempotent title re-prefix
- Idempotent body footer (count = 1 after two applies)
- Missing GIT_AUTHOR_NAME → passthrough, title preserved
- Malformed GIT_AUTHOR_NAME (not "Molecule AI ...") → passthrough
All 12 pass. Test script is standalone bash + a temp fake gh binary
that echoes argv; safe to run in CI's Python Lint & Test job via
subprocess shell-out.
## Deployment note
This lands in the workspace image. Existing containers keep their
old /usr/bin/gh until the image is rebuilt and they're re-provisioned
(POST /workspaces/:id/restart {}). No migration required; the wrapper
just starts tagging PRs once the new image is rolled.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3 escalation ladder added `from .escalation import ...` to
executor.py. The phase-2 dispatch tests load executor.py via
`exec(compile(src, ...))` with the relative import rewritten — this
broke because (a) the rewrite didn't know about escalation and (b) the
exec namespace lacked `__name__`, which executor.py needs at import
time for `logging.getLogger(__name__)`.
Fix both in all 8 exec sites:
- Rewrite both `from .providers import` AND `from .escalation import`
- Pre-register escalation + providers in sys.modules under the fake
package name
- Seed the exec namespace with `__name__ = "hermes_executor_under_test"`
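The harness pattern looks roughly like this (helper name and fake package name are illustrative; the real tests inline this at each exec site):

```python
import sys
import types

def exec_module_source(src: str, fake_pkg: str = "hermes_executor_under_test") -> dict:
    """Pre-register stub submodules under a fake package name, rewrite
    the relative imports to absolute ones, and seed __name__ so
    module-level logging.getLogger(__name__) works inside exec()."""
    sys.modules.setdefault(fake_pkg, types.ModuleType(fake_pkg))
    for sub in ("providers", "escalation"):
        sys.modules.setdefault(f"{fake_pkg}.{sub}", types.ModuleType(f"{fake_pkg}.{sub}"))
        src = src.replace(f"from .{sub} import", f"from {fake_pkg}.{sub} import")
    namespace = {"__name__": fake_pkg}  # needed by getLogger(__name__)
    exec(compile(src, "<executor-under-test>", "exec"), namespace)
    return namespace
```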
54/54 hermes tests pass (28 escalation truth-table + 6 ladder-integration
+ 20 existing phase-2 dispatch).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ships scoped Phase 3 of the Hermes multi-provider work. Every workspace
can now declare an ordered list of (provider, model) rungs; when the
pinned model hits rate-limit / 5xx / context-length / overload, the
executor advances to the next rung before raising.
## Why
3× Claude Max saturation is a routine occurrence now — the "first 429 on
a batch delegation" is the common path, not the exception. A workspace
pinned to Haiku that hits a context-length limit has no recovery today;
same for Sonnet hitting rate-limit mid-synthesis. Escalation promotes
to the next tier for that single call, preserves coordination, avoids
restart cascades.
## New module: adapters/hermes/escalation.py
- ``LadderRung(provider, model)`` — one config entry.
- ``parse_ladder(raw)`` — tolerant config parser; skips malformed rungs
with a warning rather than raising so boot stays resilient.
- ``should_escalate(exc) -> bool`` — truth table over 15+ error shapes:
- Typed classes (RateLimitError, OverloadedError, APITimeoutError,
APIConnectionError, InternalServerError)
- Context-length markers (each provider uses different phrasing)
- Gateway markers (502/503/504, overloaded, temporarily unavailable)
- Status-code substrings (429, 529, 5xx)
- Hard-rejects auth failures (401/403/invalid_api_key) even if the
outer exception class is RateLimitError — wrapping case matters.
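A heavily simplified sketch of the truth table; the real implementation also checks the typed SDK exception classes listed above, and the marker lists here are a partial illustration:

```python
ESCALATE_MARKERS = (
    "429", "529", "502", "503", "504",
    "rate limit", "overloaded", "temporarily unavailable",
    "context length",
)
AUTH_MARKERS = ("401", "403", "invalid_api_key")

def should_escalate(exc: Exception) -> bool:
    text = str(exc).lower()
    # Hard-reject auth failures first, even when the outer class
    # looks retryable (the wrapping case).
    if any(m in text for m in AUTH_MARKERS):
        return False
    return any(m in text for m in ESCALATE_MARKERS)
```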
## Executor wiring
``HermesA2AExecutor`` now accepts ``escalation_ladder`` in its
constructor + ``create_executor()`` factory. ``_do_inference()`` walks
the ladder:
1. First attempt = pinned provider:model (matches pre-ladder behaviour)
2. On escalatable error, try each rung in order
3. On non-escalatable error, raise immediately (auth, malformed payload)
4. On exhaustion, raise the last error
Rung switches temporarily rebind ``self.provider_cfg`` / ``self.model``
/ ``self.api_key`` / ``self.base_url`` in a try/finally, so any raised
error leaves the executor in its original state for the next call. Key
resolution for non-pinned rungs goes through ``resolve_provider`` which
reads the rung-provider's env vars fresh.
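The walk itself can be sketched as a small class; this is a minimal model of the rebind-in-try/finally behaviour, not the executor's actual attribute set:

```python
class LadderWalker:
    """First attempt is the pinned model; escalatable errors advance a
    rung; the finally restores pinned state after every attempt."""

    def __init__(self, pinned, ladder, escalatable):
        self.model = pinned
        self.ladder = ladder
        self.escalatable = escalatable  # predicate like should_escalate

    def run(self, call):
        last_error = None
        for rung in (self.model, *self.ladder):
            original = self.model
            try:
                self.model = rung  # temporary rebind for this attempt
                return call(rung)
            except Exception as exc:
                if not self.escalatable(exc):
                    raise  # auth / malformed payload: fail fast
                last_error = exc
            finally:
                self.model = original  # leave pinned state intact
        raise last_error  # ladder exhausted
```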
## Config shape
``config.yaml`` (rendered from ``org.yaml`` → workspace secrets):
runtime_config:
  escalation_ladder:
    - provider: gemini
      model: gemini-2.5-flash
    - provider: anthropic
      model: claude-sonnet-4-5-20250929
    - provider: anthropic
      model: claude-opus-4-1-20250805
Empty / absent = single-shot behaviour, full backwards-compat with
every existing workspace.
## Tests
34 passing, all isolated (no network):
- ``test_hermes_escalation.py`` (28): parser + truth-table across
rate-limit, overload, context-length, gateway, auth-reject, unrelated
exceptions, and case-insensitivity.
- ``test_hermes_ladder_integration.py`` (6): no-ladder single call,
ladder-not-triggered on success, escalate-on-rate-limit-then-succeed,
stop-on-non-escalatable, raise-last-error-when-exhausted,
skip-unknown-provider-in-rung.
## Not in this PR
- Uncertainty-driven escalation (judge pass after successful reply).
- Per-workspace budget tracking (#305 covers this separately).
- Live streaming reuse across rungs (ladder retries the whole call).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(a2a): add missing Authorization header to delegation and message calls
Three A2A client functions were missing the Bearer token on their HTTP calls
after the Phase 30.1 workspace-auth enforcement rollout:
1. send_a2a_message (a2a_client.py): POST to target workspace's /message/send
used WorkspaceAuth middleware that fails-closed on missing auth header.
Fix: headers=auth_headers() — auth_headers() already imported.
2. tool_delegate_task_async (a2a_tools.py): POST to platform /delegate endpoint
requires the caller's workspace bearer token since Phase 30.1.
Fix: headers=_auth_headers_for_heartbeat()
3. tool_check_task_status (a2a_tools.py): GET /delegations endpoint, same issue.
Fix: headers=_auth_headers_for_heartbeat()
tool_list_peers already uses _auth_headers_for_heartbeat() correctly —
that's why list_peers works while delegation returns 401/[A2A_ERROR].
Root cause of the multi-session A2A outage. PR #386 (TTL fix) addressed
the workspace-restart cascade; this fixes the underlying 401 on each call.
Closes #391
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(a2a): add missing auth headers to /activity and /notify endpoints
Two more Phase 30.1 regressions in a2a_tools.py found during send_message_to_user
debugging (it was returning 401):
- tool_report_activity: POST /workspaces/:id/activity missing headers
- tool_send_message_to_user: POST /workspaces/:id/notify missing headers
Both now use headers=_auth_headers_for_heartbeat() matching the pattern used
by commit_memory, recall_memory, and the heartbeat POST in the same file.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: PM (Molecule AI) <pm@molecule-ai.internal>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Severity HIGH. The /transcript route in main.py used `if expected:`
around the bearer-token compare, so `get_token()` returning None (no
/configs/.auth_token on disk — bootstrap window, deleted file, OSError)
silently skipped the entire auth check. Any container on
molecule-monorepo-net could GET /transcript during the provisioning
window and walk away with the full session log (user messages, Claude
tool calls, assistant replies).
The platform's TranscriptHandler always has a valid token (it acquired
one at workspace registration), so tightening this gate has no
legitimate-caller impact. Only unauthenticated sniffers lose access,
which was never the intended contract of #287.
Fix:
1. Extracted the auth gate into `workspace-template/transcript_auth.py`
— a 20-line module with no heavy imports so the security-critical
code is unit-testable without standing up the full uvicorn/a2a/httpx
stack (the former inline guard could only be tested end-to-end,
which explains why the regression shipped in #287).
2. `transcript_authorized(expected, auth_header)` returns False when
`expected` is None or empty — the #328 fix — and otherwise does
strict equality against "Bearer <expected>".
3. main.py's inline handler calls the extracted function:
   if not _transcript_authorized(get_token(), auth_header):
       return 401
4. New tests/test_transcript_auth.py covers: None token, empty token,
valid bearer, wrong bearer, missing header, case-sensitive prefix,
whitespace fuzzing. All 7 pass.
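The extracted guard is small enough to reproduce almost verbatim from the description; a sketch of the fail-closed contract:

```python
def transcript_authorized(expected, auth_header) -> bool:
    """Fail closed when no token exists on disk, otherwise strict
    equality against 'Bearer <expected>' (case-sensitive prefix)."""
    if not expected:
        return False  # the #328 fix: no token means no access
    return auth_header == f"Bearer {expected}"
```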
Closes #328
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #287
Any container on molecule-monorepo-net could previously read the full Claude session log without authentication. Guard uses get_token() from platform_auth — skipped only before workspace registration (dev-mode).
Adds auth_headers to recall_memory and commit_memory in a2a_tools.py. Fixes the #215-class auth regression for A2A memory tools. Test mocks updated to accept headers kwarg.
The #215-class fix in memory.py (859a60e) adds headers=_headers to the
direct-httpx commit_memory + search_memory paths, but 9 existing tests
in test_memory.py had FakeAsyncClient.post/get signatures like
`async def post(self, url, json):` with no headers kwarg. Python
raised TypeError: unexpected keyword argument 'headers' on every call,
commit_memory caught it and returned {success: False}, tests failed.
Fixes applied:
1. Add `headers=None` to every FakeAsyncClient.post + .get signature
across test_memory.py. Uses replace_all so all 9+ fakes match.
2. For tests that capture a single captured["url"]:
- test_commit_memory_uses_awareness_client_when_configured
- test_commit_memory_uses_platform_fallback_without_awareness
- test_commit_memory_httpx_201_success
filter to only capture /memories URLs. Without the filter, the
subsequent _record_memory_activity fire-and-forget post to /activity
overwrites captured["url"] and the assertion fails.
3. For test_commit_memory_promoted_packet_logs_skill_promotion: bump
expected captured["calls"] from 3 to 4. Pre-fix, the memory_write
/activity call (from _record_memory_activity #125) was silently
dropped because the fake rejected headers=; post-fix it succeeds
and lands in the captured list alongside the skill_promotion
/activity and /registry/heartbeat calls. Also extend that test's
fake to accept /registry/heartbeat (was raising AssertionError).
Total: 36/36 memory tests pass. Full workspace-template suite 1189/1189.
This is strictly test-infrastructure work — zero production code
changed. CI never caught the break because the Mac mini runner has
been stuck for ~4 hours (tick-33/34/35/36 reports).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Context: platform now gates `GET /workspaces/:id/memories` and
`POST /workspaces/:id/memories` behind workspace auth (post-#166 /
#167 AdminAuth wave). The `builtin_tools.memory` tool had three HTTP
call sites:
1. commit_memory POST fallback (line 121) ← NO auth_headers
2. search_memory GET fallback (line 269) ← NO auth_headers
3. activity-log helper POST (line 371) ← HAS auth_headers
Path 3 was already fixed. Paths 1 + 2 silently 401 every call, but the
tool's error-handling path returns `{"success": False}` without surfacing
the auth failure to the agent. Result: the agent sees an empty memory
backlog on every call and assumes there's nothing to do.
## Discovered today
Technical Researcher is the first workspace opted in to the idle-loop
pilot from #216 (reflection-on-completion pattern). The pilot fires
every 10 min, the agent calls `search_memory "research-backlog:..."` as
the first step, gets back an empty result, writes "tr-idle clean" to
memory, and stops. Clean-idle outcome every tick, 9 consecutive ticks.
Looking at TR's activity_logs response bodies:
"Memory auth has failed on every tick this session — skipping the call"
"tr-idle — step 2 done. Memory unavailable (auth token missing..."
"tr-idle 04:15 — clean (memory auth still down, 3rd consecutive tick)"
The AGENT knew the memory calls were failing. The platform 401 error
was surfacing in the tool response, but our instrumentation wasn't
counting it as a defect — we saw "tr-idle clean" writes and assumed
the pilot was working as designed. It was actually silently broken.
## Fix
Import `platform_auth.auth_headers` lazily (same pattern as the
activity-log path already uses), attach `headers=_auth()` to both
httpx call sites. Matches the #225 fix for the register call.
## Not in this PR
- awareness_client.py also makes HTTP calls to a separate AWARENESS_URL
service (not the platform), which may or may not need the same fix
depending on that service's auth posture. Out of scope for this PR.
- TR's specific token problem: TR's `/configs/.auth_token` file is
empty because it was re-provisioned via `apply_template: true`
(recovery path from the failed-volume incident) and Phase 30.1
only mints a token on FIRST register per workspace. This fix
doesn't help TR until TR gets a fresh token — tracked separately.
## Test plan
- [x] Python syntax check on memory.py passes
- [ ] CI: all memory-related tests should still pass (the new code
paths only add header passing, no shape change)
- [ ] Real-world verification: after TR gets a fresh token, idle-loop
pilot should produce a dispatch within 10 min (seeded backlog
already in place from this session)
## Related
- #215 / #225 — register call auth_headers fix (same pattern)
- #216 — TR idle-loop pilot (couldn't measure until this lands)
- #166 / #167 — platform AdminAuth wave that surfaced this gap
The Hermes adapter never read /configs/system-prompt.md. Any role that
switched to runtime: hermes was silently losing its role identity because
the system prompt wasn't passed to the model. This PR fixes that by:
1. HermesA2AExecutor.__init__ takes new optional `config_path` kwarg
2. `create_executor(config_path=...)` forwards to the constructor
3. `adapter.py` passes `config.config_path` through from AdapterConfig
4. `execute()` reads system-prompt.md via executor_helpers.get_system_prompt
(hot-reload-capable — reads on every turn, not just at startup)
5. `_do_inference(user_message, history, system_prompt)` — new arg threads
through the dispatch to each native path
6. Each path uses the provider's NATIVE system field:
- OpenAI-compat: prepends `{"role":"system", "content":...}` to messages
- Anthropic: top-level `system=` kwarg (NOT in messages — Anthropic
requires system at the top level)
- Gemini: `config=GenerateContentConfig(system_instruction=...)`
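The three placements above can be sketched as pure payload builders (a sketch following the bullet list, not the executor's actual methods):

```python
# Illustrative builders showing where each native path puts the system
# prompt. Shapes follow the list above; names here are hypothetical.

def openai_compat_messages(system_prompt, messages):
    # OpenAI-compat: the system prompt is just a leading message.
    if system_prompt:
        return [{"role": "system", "content": system_prompt}] + messages
    return messages

def anthropic_create_kwargs(system_prompt, messages):
    # Anthropic: system goes in the top-level `system=` kwarg,
    # never inside the messages list.
    kwargs = {"messages": messages}
    if system_prompt:
        kwargs["system"] = system_prompt
    return kwargs

def gemini_config_kwargs(system_prompt):
    # Gemini: system_instruction rides on the generation config.
    return {"system_instruction": system_prompt} if system_prompt else {}
```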
## Phase scoreboard
- 2a (in main) — native Anthropic dispatch infra
- 2b (in main) — native Gemini dispatch
- 2c (in main) — multi-turn history on all paths
- **2d-i (this PR)** — system prompts on all paths
- 2d-ii (future) — tool calling on native paths
- 2d-iii (future) — vision content blocks on native paths
- 2d-iv (future) — streaming
## Test coverage
46/46 tests pass (20 Phase 2 dispatch + 26 Phase 1 registry):
- Existing dispatch tests updated to assert the 3-arg call shape
`("hello", None, None)` — history + system_prompt both None
- 4 new tests:
- `dispatch_passes_system_prompt_to_anthropic` — happy path, third arg flows
- `dispatch_passes_system_prompt_to_gemini` — happy path
- `dispatch_passes_system_prompt_to_openai` — happy path
- `executor_accepts_config_path_kwarg` — constructor stores config_path
- `create_executor_forwards_config_path` — both back-compat and registry
resolution paths forward config_path through to the executor
## Back-compat
- `config_path=None` (default) → execute() skips system-prompt injection,
same behavior as pre-2d-i
- Workspaces with `runtime: hermes` but no `/configs/system-prompt.md`
file get `system_prompt=None` (get_system_prompt returns fallback),
same as before
- The 13 OpenAI-compat providers work identically — system_prompt just
adds a leading message, which every OpenAI-compat endpoint already
supports
- Anthropic + Gemini previously got zero system context; now they get
the same system prompt the workspace's system-prompt.md carries
## Why this matters
Before this PR: if someone flipped a workspace from `runtime: claude-code`
to `runtime: hermes`, the agent would act generically (no role identity,
no project conventions, no CLAUDE.md context) because the Hermes executor
never looked at system-prompt.md. That's a silent correctness regression
the test suite wouldn't catch because none of our live workspaces use
the hermes runtime today.
With this PR: Hermes workspaces get the same system prompt injection as
Claude-code workspaces, making the `runtime: hermes` switch a true drop-in
alternative.
## Related
- #267 Phase 2c (multi-turn history — in main)
- #255 Phase 2b (gemini native — in main)
- #240 Phase 2a (anthropic native — in main)
- #208 Phase 1 (provider registry — in main)
- project_hermes_multi_provider.md — Phase 2d-i was the next queued item
Closes #N (issue to be filed)
Lets canvas / operators see live tool calls + AI thinking instead of
waiting for the high-level activity log to flush. Right now the only
way to "look over an agent's shoulder" is `docker exec ws-XXX cat
/home/agent/.claude/projects/.../<session>.jsonl`, which:
- doesn't work for remote workspaces (Phase 30 / Fly Machines)
- requires shell access on the host
- has no pagination
This PR adds:
1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning
`{runtime, supported, lines, cursor, more, source}`. Default returns
`supported: false` so non-claude-code runtimes pass through gracefully.
2. `ClaudeCodeAdapter.transcript_lines` override — reads the most-
recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves
cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the
project dir name matches what Claude Code actually writes to. Limit
capped at 1000 to prevent OOM.
3. Workspace HTTP route `GET /transcript` — Starlette handler added
alongside the A2A app. Trusts the internal Docker network (same
model as POST / for A2A); Phase 30 remote-workspace auth is a
follow-up.
4. Platform proxy `GET /workspaces/:id/transcript` — looks up the
workspace's URL, forwards GET, caps response at 1MB. Gated by
existing `WorkspaceAuth` middleware (same as /traces, /memories,
/delegations).
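The adapter override in point 2 can be sketched roughly like this, assuming the hook's return contract from point 1 (field names from this PR; the real code also resolves cwd via `_resolve_cwd()`, which is omitted here):

```python
import json
from pathlib import Path

MAX_LIMIT = 1000  # cap to prevent OOM, per the adapter override

def transcript_lines(project_dir: str, since: int = 0, limit: int = 100) -> dict:
    """Sketch: read the most-recently-modified .jsonl in the project dir
    and paginate by line offset, skipping malformed lines."""
    limit = min(limit, MAX_LIMIT)
    sessions = sorted(Path(project_dir).glob("*.jsonl"),
                      key=lambda p: p.stat().st_mtime)
    if not sessions:
        return {"runtime": "claude-code", "supported": True,
                "lines": [], "cursor": 0, "more": False, "source": None}
    latest = sessions[-1]
    raw = latest.read_text().splitlines()
    page = raw[since:since + limit]
    lines = []
    for entry in page:
        try:
            lines.append(json.loads(entry))
        except json.JSONDecodeError:
            continue  # tolerate malformed lines rather than failing the page
    cursor = since + len(page)
    return {"runtime": "claude-code", "supported": True, "lines": lines,
            "cursor": cursor, "more": cursor < len(raw),
            "source": latest.name}
```

A client polls with `since=<last cursor>` until `more` comes back false.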
Tests: 6 Python unit tests cover empty dir / pagination / multi-session
/ malformed lines / limit cap, plus 4 Go tests cover 404 / proxy
forwarding / query-string propagation / unreachable-workspace 502.
Verified end-to-end on a live workspace — returns real claude-code
session entries through the platform proxy.
## Follow-ups
- WebSocket variant for live streaming (instead of polling)
- Canvas UI tab "Transcript" between Activity and Traces
- LangGraph / DeepAgents / OpenClaw transcript adapters
- Phase 30 remote-workspace auth on /transcript
Completes the Phase 2 scope by keeping conversation turns as turns across
all three dispatch paths. Pre-2c, history was flattened into a single user
message via shared_runtime.build_task_text, which worked as a fallback but
lost the model's native multi-turn awareness (role attribution,
instruction-following on mid-conversation corrections, system-prompt
grounding against prior turns).
Phase 2a + 2b shipped the dispatch infrastructure + per-provider native
paths. This PR uses them properly.
## What's new
- **`_history_to_openai_messages(user_message, history)`** (static) — maps
A2A `(role, text)` tuples to OpenAI Chat Completions
`[{"role":"user"|"assistant","content":str}]`. Roles: `human`→`user`,
`ai`→`assistant`. Current turn appended as the final user message.
- **`_history_to_anthropic_messages`** (static) — identical wire shape to
OpenAI for text-only turns, so it delegates. Phase 2d tool_use/vision
blocks will diverge here.
- **`_history_to_gemini_contents`** (static) — Gemini uses a different
  shape: `role="user"|"model"` (NOT "assistant") and text wrapped in
  `parts=[{"text":...}]`. Does not delegate to either of the others.
- **`_do_openai_compat(user_message, history=None)`** — accepts history,
builds messages via `_history_to_openai_messages`. Back-compat: pass
`history=None` to get the old single-turn behavior.
- **`_do_anthropic_native(user_message, history=None)`** — same signature
change, calls `_history_to_anthropic_messages`. Still uses
`anthropic.AsyncAnthropic().messages.create()`, just with proper
multi-turn.
- **`_do_gemini_native(user_message, history=None)`** — same pattern,
calls `_history_to_gemini_contents`, passes to Gemini's
`generate_content(contents=...)`.
- **`_do_inference(user_message, history=None)`** — new signature,
dispatches by auth_scheme as before, passes both args through.
- **`execute()`** — no longer calls `build_task_text`. Calls
`extract_history(context)` directly and forwards to `_do_inference`.
Removes the `build_task_text` import (not needed in this file anymore).
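The two mapper shapes can be sketched as plain functions (illustrative, assuming the `(role, text)` tuple shape with roles `human`/`ai` described above):

```python
# Sketch of the OpenAI and Gemini mappers from the bullet list. The
# anthropic mapper is omitted because, for text-only turns, it produces
# the same wire shape as the OpenAI one.

def history_to_openai_messages(user_message, history):
    role_map = {"human": "user", "ai": "assistant"}
    messages = [{"role": role_map[r], "content": t} for r, t in (history or [])]
    messages.append({"role": "user", "content": user_message})
    return messages

def history_to_gemini_contents(user_message, history):
    # Gemini differs: role "model" instead of "assistant", and text
    # wrapped in parts=[{"text": ...}].
    role_map = {"human": "user", "ai": "model"}
    contents = [{"role": role_map[r], "parts": [{"text": t}]}
                for r, t in (history or [])]
    contents.append({"role": "user", "parts": [{"text": user_message}]})
    return contents
```

Passing `history=None` degrades both to the single-turn shape, which is the back-compat path the tests below exercise.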
## Tests
Existing 7 dispatch tests updated for the new `(user_message, history)`
signature — they assert the path is called with `("hello", None)` since
they pass no history.
5 NEW tests:
- `test_history_to_openai_messages_empty_history` — empty history degrades
to single user message (back-compat)
- `test_history_to_openai_messages_multi_turn` — round-trip of a 3-turn
history + current turn
- `test_history_to_anthropic_messages_same_as_openai` — cross-check that
anthropic path produces identical wire shape for text-only
- `test_history_to_gemini_contents_uses_model_role_and_parts_wrapper` —
verifies the Gemini-specific role mapping (`ai`→`model`) + parts wrapper
- `test_dispatch_passes_history_through` — end-to-end: _do_inference
forwards history to the chosen provider path
All 41 tests pass (15 Phase 2 dispatch + 26 Phase 1 registry):
pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
41 passed in 0.07s
## Back-compat
- No public API changes to `create_executor()`. Callers that hit
`execute()` via A2A get the new multi-turn behavior automatically via
`extract_history(context)`.
- Callers that passed an empty history list (or None) get the same
single-turn behavior as pre-2c.
- The `build_task_text` helper in shared_runtime is unchanged — other
adapters (AutoGen, LangGraph) that use it keep working. Only Hermes
bypasses it now.
## What's NOT in this PR (Phase 2d)
- Tool calling / function calling on native paths (anthropic `tools=`,
gemini `tools=Tool(function_declarations=[...])`)
- Vision content blocks (image_url → anthropic `{type:"image", source:
{type:"base64",...}}` / gemini `{inline_data:{mime_type,data}}`)
- System instructions pass-through (anthropic `system=`, gemini
`system_instruction=`)
- Streaming (`astream_messages` / `streamGenerateContent` stream variants)
- Extended thinking (anthropic `thinking={"type":"enabled"}`) / Gemini
thinking config
Phase 2c is the **multi-turn upgrade**. Tool + vision + streaming are
Phase 2d, scoped in project_hermes_multi_provider.md.
## Related
- #240 Phase 2a (native Anthropic dispatch — in main)
- #255 Phase 2b (native Gemini dispatch — in main)
- Phase 1 (#208 — provider registry baseline, in main)
- `project_hermes_multi_provider.md` queued memory
- CEO 2026-04-15: "focus on supporting hermes agent"
Pre-existing flaky test: when the full workspace-template suite ran in
collection order, test_hermes_smoke.py::test_create_executor_raises_without_keys
failed with "DID NOT RAISE ValueError". The failure only surfaced when
test_hermes_providers ran first.
Root cause: test_hermes_providers had an autouse fixture that used
monkeypatch.delenv on entry, but several tests in that file mutate
os.environ directly (e.g. `os.environ["HERMES_API_KEY"] = "test"`),
bypassing monkeypatch. monkeypatch only tracks its own deltas, so on
fixture teardown the direct-mutation values stayed in os.environ.
HERMES_API_KEY leaked across file boundaries into test_hermes_smoke,
which then saw a key present when it expected absence.
Fix: replace monkeypatch-based fixture with pure snapshot/restore:
- Snapshot all provider env vars at entry
- Clear them
- yield (test runs, may mutate freely)
- try/finally restore the exact pre-test state
This is deterministic regardless of whether a test uses monkeypatch,
direct mutation, or neither. Also adds a comment documenting WHY we
switched away from monkeypatch so a future reviewer doesn't revert.
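The snapshot/restore shape looks roughly like this (sketched as a context manager so it stands alone; the real fixture wraps the same body in a pytest `@pytest.fixture(autouse=True)`, and the env-var list here is an illustrative subset):

```python
import os
from contextlib import contextmanager

PROVIDER_ENV_VARS = ("HERMES_API_KEY", "OPENROUTER_API_KEY")  # illustrative subset

@contextmanager
def clean_provider_env(names=PROVIDER_ENV_VARS):
    # Snapshot exact values (including absence), clear, then restore in
    # finally — correct even if the test mutates os.environ directly,
    # which monkeypatch's delta tracking cannot see.
    snapshot = {k: os.environ.get(k) for k in names}
    for k in names:
        os.environ.pop(k, None)
    try:
        yield
    finally:
        for k, v in snapshot.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v
```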
Full workspace-template suite: 1169 passed, 9 skipped, 2 xfailed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.
## What's new in this PR
1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
update `base_url` from the OpenAI-compat endpoint
(`/v1beta/openai`) to the bare host
(`https://generativelanguage.googleapis.com`) which the native SDK
uses.
2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
`google.genai.Client().aio.models.generate_content(...)`. Dispatch
table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
Same fail-loud semantics as `_do_anthropic_native` — missing SDK
raises a clear RuntimeError with install instructions.
3. `requirements.txt`: add `google-genai>=1.0.0`.
4. `test_hermes_phase2_dispatch.py`: +3 tests
- `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
validated
- `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
gemini native, not openai-compat or anthropic-native
- `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
on missing `google-genai` package
Plus updated existing dispatch tests to mock `_do_gemini_native`
alongside the other paths so "no cross-calls" assertions stay tight.
All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):
pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
36 passed in 0.07s
## Dispatch table after this PR
auth_scheme="openai" → _do_openai_compat (13 providers)
auth_scheme="anthropic" → _do_anthropic_native (1 provider, Phase 2a)
auth_scheme="gemini" → _do_gemini_native (1 provider, Phase 2b) ← NEW
<unknown> → _do_openai_compat + warning (forward-compat)
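Rendered as code, the table is a small dict dispatch — a sketch with stubbed path bodies, not the executor's real methods:

```python
# Illustrative dispatch-by-auth_scheme mirroring the table above.
# Method names come from the PR; bodies are stubs for demonstration.

class ExecutorSketch:
    def __init__(self, auth_scheme: str):
        self.auth_scheme = auth_scheme

    async def _do_openai_compat(self, task_text):    return "openai"
    async def _do_anthropic_native(self, task_text): return "anthropic"
    async def _do_gemini_native(self, task_text):    return "gemini"

    async def _do_inference(self, task_text):
        table = {
            "openai": self._do_openai_compat,
            "anthropic": self._do_anthropic_native,
            "gemini": self._do_gemini_native,
        }
        path = table.get(self.auth_scheme)
        if path is None:
            # Unknown scheme: warn and fall back for forward-compat.
            print(f"warning: unknown auth_scheme {self.auth_scheme!r}; "
                  "falling back to openai-compat")
            path = self._do_openai_compat
        return await path(task_text)
```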
## Back-compat
- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
automatically — no caller changes needed
## What's NOT in this PR (future phases)
- Streaming support (`astream_messages` / `streamGenerateContent` stream
variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path
Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.
## Stacking
This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main, GitHub retargets this PR's base onto the
new main head. If the reviewer prefers a single combined PR, close #240
and land this one instead — the commits on feat/hermes-phase2-native-sdks
are already included in this branch's history.
## Related
- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory — Phase 2 was the next
item, this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
eco-watch entry that catalogued Hermes's native provider list and
shaped the original Phase 2 scope
Completes the Hermes adapter's native-SDK plan for the provider that gains
the most from leaving OpenAI-compat: Anthropic. OpenAI-compat works fine for
plain text turns on every provider (Phase 1 covered that with one code path
for all 15 providers), but Anthropic's Messages API has first-class tool use,
vision content blocks, and extended thinking that the OpenAI-compat shim
strips or mis-translates.
Rather than ship all native SDK paths in one PR (Anthropic + Gemini + future),
this lands Anthropic only (Phase 2a). Gemini is Phase 2b, shipping after a
production measurement window on Phase 2a.
## Design
Providers now dispatch by `auth_scheme` field. Phase 1 added the field but
every provider used `"openai"`. Phase 2 flips `anthropic` to `"anthropic"`
and wires a second inference path keyed on that:
- `HermesA2AExecutor._do_openai_compat(task_text)` — existing path, handles
14 of 15 providers (Nous Portal, OpenRouter, OpenAI, xAI, Gemini, Qwen,
GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral)
- `HermesA2AExecutor._do_anthropic_native(task_text)` — NEW, uses the
official `anthropic` Python SDK's `AsyncAnthropic().messages.create(...)`
- `HermesA2AExecutor._do_inference(task_text)` — dispatches by
`self.provider_cfg.auth_scheme`
Unknown schemes fall back to OpenAI-compat with a logged warning, so future
provider additions don't crash if a native SDK path ships late.
## Fail-loud on missing SDK
`_do_anthropic_native` raises a clear `RuntimeError` with install
instructions if the `anthropic` package is missing at runtime:
Hermes anthropic native path requires the `anthropic` package. Install
in the workspace image with `pip install anthropic>=0.39.0` or set
HERMES provider=openrouter to route Claude models through OpenRouter's
OpenAI-compat shim instead.
This is intentional: silent fallback would mask fidelity loss (tool_use
blocks become plain text, vision gets stripped). Loud failure is better.
`requirements.txt` adds `anthropic>=0.39.0` so the package is baked into
the workspace-template image build path. Operators building custom workspace
images without anthropic installed get the loud error.
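The fail-loud pattern generalizes to any lazily-imported SDK; a sketch (the real method inlines the `import anthropic` and its specific error text, so this helper and its wording are hypothetical):

```python
import importlib

def require_sdk(module_name: str, install_hint: str):
    """Import the SDK or raise an actionable error. Silent fallback to
    the compat shim would mask fidelity loss, so loud failure wins."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Hermes native path requires the `{module_name}` package. "
            f"Install it with `{install_hint}`, or select an OpenAI-compat "
            f"provider to route through the compat shim instead."
        ) from exc
```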
## Back-compat
- `create_executor(hermes_api_key="x")` → still routes to Nous Portal
(`auth_scheme="openai"`), unchanged
- `HERMES_API_KEY` env var → still first in RESOLUTION_ORDER
- `OPENROUTER_API_KEY` env var → still second
- All 14 OpenAI-compat providers unchanged — they take the same code path
as before
- ONLY `anthropic` provider changes behavior: it now uses the native
Messages API instead of the `/v1/chat/completions` compat shim
## Constructor signature change
`HermesA2AExecutor.__init__` now takes `provider_cfg: ProviderConfig`
instead of separate `api_key + base_url + model`. The three fields are
derived from `provider_cfg` + an optional model override. This is a
breaking change for any external caller building an executor directly,
but the only documented public entry point is `create_executor()`, which
is updated in the same commit to pass the cfg through.
## Test coverage
`workspace-template/tests/test_hermes_phase2_dispatch.py` — 7 new tests:
1. `test_anthropic_entry_has_anthropic_scheme` — registry flip
2. `test_all_other_providers_still_openai_scheme` — regression guard
3. `test_dispatch_openai_scheme_calls_openai_compat` — happy path
4. `test_dispatch_anthropic_scheme_calls_anthropic_native` — happy path
5. `test_dispatch_unknown_scheme_falls_back_to_openai_compat` — forward compat
6. `test_anthropic_native_raises_clear_error_when_sdk_missing` — fail-loud
7. `test_create_executor_passes_provider_cfg` — constructor wiring
All pass locally (pytest tests/test_hermes_phase2_dispatch.py -v, 0.04s).
Phase 1 tests unchanged: `test_hermes_providers.py` 26/26 pass, no
regressions.
## What's NOT in this PR (Phase 2b)
- Gemini native `generateContent` path (`auth_scheme="gemini"`)
- Streaming support across both native paths (`astream_messages`, `streamGenerateContent`)
- Tool calling on the anthropic native path (the `tools` + `tool_use` blocks)
- Vision content blocks (image_url → anthropic image blocks)
- Extended thinking parameter passthrough
All scoped in `project_hermes_multi_provider.md`. Phase 2a is the minimum
viable native Anthropic dispatch — single-turn text in, text out, no tools.
## Related
- Phase 1 baseline (already in main): #208 — provider registry + OpenAI-compat path
- Queued memory: `project_hermes_multi_provider.md` — full phased plan
- Triggering directive: CEO 2026-04-15 — "once current works are cleared,
focus on supporting hermes agent"
Closes #220. #215 added auth_headers() to /registry/register but missed
two other self-post paths from the same workspace container:
1. initial_prompt (_do_send_sync) — fires once on first boot after the
A2A server is ready. Posts to /workspaces/:id/a2a via the platform
proxy. Missing headers meant the initial prompt got silently
dropped as 401 on any token-enrolled workspace.
2. idle loop (_post_sync) — fires every idle_interval_seconds while
the workspace has no active task (#205 pattern). Same proxy path,
same missing headers, same silent 401 in multi-tenant mode.
Both now build headers as
{"Content-Type": "application/json", **auth_headers()}
auth_headers() returns {"Authorization": "Bearer <token>"} when
/auth-token.txt exists, empty dict otherwise (first boot before
register issues the token). The existing lazy-bootstrap fail-open
on the platform side covers the empty-dict case.
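That behavior can be sketched in a few lines (the `/auth-token.txt` path and empty-dict fallback come from the PR text; the function body here is an assumption):

```python
from pathlib import Path

TOKEN_PATH = Path("/auth-token.txt")  # minted by the register call

def auth_headers(token_path: Path = TOKEN_PATH) -> dict:
    """Bearer header when the token file exists and is non-empty;
    {} otherwise (first boot, before register issues the token)."""
    try:
        token = token_path.read_text().strip()
    except FileNotFoundError:
        return {}
    if not token:
        return {}
    return {"Authorization": f"Bearer {token}"}
```

The empty-dict case is what the platform's lazy-bootstrap fail-open covers on first boot.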
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>