HermesA2AExecutor now supports sending system context as ordered, separate
role=system messages instead of a single concatenated string — the model
format recommended by NousResearch.
Changes:
- HermesA2AExecutor.__init__: new system_blocks kwarg (list[str|None]|None)
stored as an independent copy; None blocks and empty strings silently skipped
- _build_messages(): when system_blocks is not None, emits each non-empty
block as a separate {"role": "system"} entry in Hermes-recommended order
(persona → tools context → reasoning policy); falls through to legacy
system_prompt path when system_blocks is None (backward compatible)
Backward compatibility: existing callers that pass a single system_prompt
string continue to work identically — no changes required.
Tests (12 new, 47 total):
- system_blocks stored as independent copy (mutation safe)
- three-block stacked ordering preserved
- empty / None blocks silently skipped
- all-empty list → zero system messages
- system_blocks overrides system_prompt when both provided
- legacy system_prompt path unchanged
- stacked blocks appear in the live API call kwargs
Closes#499
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes#790. Depends on feat/issue-583-1-checkpoint-persistence (PR #788).
Platform (Go) — checkpoints_integration_test.go (5 new tests):
1. ThreeStepPersistence: POST task_receive/llm_call/task_complete → GET returns
all 3 in step_index DESC order with correct names and payloads.
2. CrashResume_HighestStepIsResumptionPoint: POST steps 0+1 only (crash before
step 2) → GET shows step_index=1 as the resume point; task_complete absent.
3. UpsertIdempotency_LatestPayloadWins: POST same (wf_id, step_name) twice with
different payloads → List returns only the second payload (ON CONFLICT DO UPDATE).
4. PostCascadeDelete_Returns404: simulate post ON-DELETE-CASCADE state (empty
rows) → List returns 404 as expected after workspace deletion.
5. AuthGate_NoToken_Returns401: router-level test with WorkspaceAuth middleware;
POST/GET/DELETE all return 401 without a bearer token (no DB calls made).
workspace-template — _save_checkpoint + 4 Python tests:
- Add async _save_checkpoint() to temporal_workflow.py: POST to the platform
checkpoint endpoint after each activity stage; fully non-fatal (try/except
inside the function, plus defence-in-depth try/except at every call site).
- 4 new pytest cases (test_temporal_workflow.py):
- nonfatal_on_http_error: _save_checkpoint raises HTTPStatusError (500) →
task_receive_activity still returns {"status":"received"}.
- nonfatal_on_network_error: _save_checkpoint raises ConnectError →
llm_call_activity still returns success LLMResult.
- success_path: _save_checkpoint no-op → activity returns correctly;
checkpoint called with correct args.
- standalone_http_error_is_swallowed: real _save_checkpoint function swallows
HTTP 500 from a mocked httpx.AsyncClient; returns None.
All 36 temporal workflow Python tests pass.
Go tests: Go binary not in this container; test file verified for syntax and
against the sqlmock patterns used throughout the handlers package.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Baidu MeDo hackathon integration was sitting in builtin_tools/ as dead
code — not imported by any loader but shipped with every workspace image,
misleadingly suggesting it was a core builtin.
Changes:
- Move builtin_tools/medo.py → plugins/molecule-medo/skills/medo-tools/scripts/medo.py
(git detects this as a rename — no code changes, identical tool surface)
- Add plugins/molecule-medo/plugin.yaml (manifest: name, version, runtimes, tags)
- Add plugins/molecule-medo/skills/medo-tools/SKILL.md (frontmatter + setup docs)
- Move workspace-template/tests/test_medo.py → plugins/molecule-medo/tests/test_medo.py
(update _MEDO_PATH to resolve from plugin root; add conftest.py for langchain mock)
- Update .gitignore: change /plugins/ blanket ignore to /plugins/* so this plugin
can be tracked until it gets its own standalone repo
Acceptance criteria met:
- builtin_tools/medo.py removed from core
- plugins/molecule-medo/ created with identical tool surface (9/9 tests pass)
- cd workspace-template && pytest → 1021 passed, 2 xfailed (no regression)
- MEDO_API_KEY was never in default provisioning (.env.example / config.py clean)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the anthropic:claude-sonnet-4-6 default across config, handlers,
env example, and litellm proxy config. All tests updated to match the new
default; sonnet-4-6 alias kept in litellm_config.yml for pinned workspaces.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cover the four paths that were exercised only via mock in the
_build_options tests: valid YAML, missing file, malformed YAML,
and empty file (safe_load → None → {} via `or {}`).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds _load_config_dict() helper to ClaudeSDKExecutor and wires the new
effort and task_budget config fields into _build_options() before the
Anthropic API call:
- effort (str): low|medium|high|xhigh|max — populates output_config.effort
- task_budget (int): advisory total-token budget; must be >= 20000 when set;
automatically adds task-budgets-2026-03-13 beta header
Also adds WorkspaceConfig.effort and WorkspaceConfig.task_budget fields in
config.py and 5 acceptance tests covering all code paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace == HMAC comparisons with hmac.compare_digest (Python) and
hmac.Equal (Go) in ledger.py, verify.py, and audit.go to prevent
timing oracle attacks (Fixes 1-6)
- Increase PBKDF2 iterations from 100K to 210K in both ledger.py and
audit.go — must match for cross-language verification (Fix 7)
- Return chain_valid: null when offset > 0 (paginated views cannot
verify a truncated chain; null means "not computed") (Fix 8)
- Remove module-level AUDIT_LEDGER_SALT attribute from ledger.py; read
the secret exclusively from os.environ inside _get_hmac_key() so the
salt is not exposed in the module namespace (Fix 9)
- Update tests: use monkeypatch.setenv/delenv instead of setattr on the
removed AUDIT_LEDGER_SALT attribute; update testAuditKey helper to
use 210K iterations; add TestAuditQuery_PaginatedOffsetReturnsNullChainValid
- Fix migration 028: workspace_id column type TEXT → UUID to match
workspaces.id UUID primary key
All tests pass: 1043 pytest + 0 Go test failures.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both PRs restructured the same chat.completions.create() call to use a
create_kwargs dict. Resolved by keeping both __init__ params and both
conditionals in the create_kwargs block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Unconditional list.extend() on repeated plugin install caused every
hook handler to be appended on each reinstall, leading to 3-4x duplicate
firings per event (PreToolUse, PostToolUse, Stop, etc.).
Fix: before appending each incoming handler, compute a fingerprint of
(matcher, frozenset-of-commands). Skip append if the fingerprint is
already present in the merged list. First-time installs are unaffected —
new handlers still land correctly.
Adds 7 unit tests covering: first install, double install, triple install,
different-matcher co-existence, different-command co-existence, existing
user hook preservation, and top-level key merge semantics.
Closes#566
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds response_format support to HermesA2AExecutor so callers can request
structured JSON output via the OpenAI-native response_format parameter.
Changes:
- _validate_response_format(): validates type (json_schema/json_object/text)
and required sub-fields; returns None if valid, error message if invalid
- HermesA2AExecutor.__init__: new response_format kwarg, stored as _response_format
- execute(): validates before API call — invalid schema enqueues error and
returns early without hitting Hermes API; valid and non-None adds
response_format= to create_kwargs; None omits the field entirely
Tests (12 new):
- _validate_response_format: all valid types, invalid type, missing fields
- constructor stores response_format correctly
- valid response_format forwarded to API call
- response_format omitted when None (no key in call kwargs)
- invalid schema → error message enqueued, API not called
Closes#498
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of injecting tool definitions as text into the system prompt,
HermesA2AExecutor now accepts a tools: list[dict] | None constructor
parameter containing OpenAI-format tool definitions and forwards them
via the native tools= parameter on chat.completions.create().
Empty list / None rule: when tools is falsy, the tools key is omitted
from the API call entirely — never sent as tools=[] — so providers
that reject an empty tools array don't return a 400.
Tool-call response handling: when the model returns finish_reason
"tool_calls" with no text content, the executor serialises the call
list as a JSON string and enqueues it as the A2A reply. This keeps
the executor thin (single API call per turn, no ReAct loop) while
surfacing function-call intent in a structured, parseable format.
Changes:
- HermesA2AExecutor.__init__: new tools kwarg; stored as self._tools
(copy; mutating the input list has no effect)
- execute(): builds create_kwargs dict and conditionally adds tools=
only when self._tools is non-empty; handles tool_calls response
- Module docstring: new "Native tools (#497)" section with schema
reference and edge-case explanation
Tests (12 new, 47 total in hermes test file, 1002 total suite):
- tools stored correctly in constructor (copy, None, [], non-empty)
- non-empty tools forwarded as tools= in API call
- multiple tools all forwarded
- empty list ([] and None and default) → tools key absent from call
- model tool_call response → JSON-serialised list as A2A reply
- multiple tool_calls → all in JSON reply
- text content present → text wins over tool_calls
Closes#497
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hermes 4 is a hybrid-reasoning model trained on <think> tags; without asking
for thinking we pay flagship $/tok but get non-reasoning quality. This adds a
dedicated HermesA2AExecutor that dispatches to any OpenAI-compat endpoint
(OpenRouter, Nous Portal) and enables native reasoning for Hermes 4 models.
Key decisions:
- ProviderConfig + _reasoning_supported() detect Hermes 4 by model slug
substring ("hermes-4", "hermes4") — case-insensitive, no config needed
- extra_body={"reasoning": {"enabled": True}} sent only to Hermes 4 entries;
Hermes 3 path unchanged (no extra_body, no regressions)
- choices[0].message.reasoning + reasoning_details extracted and written to
an OTEL span (hermes.reasoning) — deliberately NOT echoed in the A2A reply
so the reasoning trace never contaminates the agent's next-turn context
- API key / base URL default to OPENAI_API_KEY / OPENAI_BASE_URL env vars
with openrouter.ai/api/v1 as the fallback endpoint
- _client injection parameter for unit tests (no live API calls needed)
- Error sanitization: only exception class name surfaces to user (mirrors
sanitize_agent_error() convention from cli_executor.py)
Test coverage: 35 tests, 100% coverage on all new code paths including:
- _reasoning_supported() — Hermes 4/3/unknown/empty/uppercase
- ProviderConfig — field assignment and capability flags
- extra_body presence for Hermes 4, absence for Hermes 3
- reasoning not in A2A reply; _log_reasoning called when trace present
- reasoning_details forwarded; span attributes set correctly
- Telemetry failure swallowed (never blocks response)
- API error → sanitized class-name-only reply
- cancel() → TaskStatusUpdateEvent(state=canceled)
Full suite: 990 passed, 0 failed (no regressions).
Resolves#496
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #471 removed Dockerfiles/requirements from adapters/ but left the
Python source files. This commit finishes the extraction:
1. Moved shared_runtime.py → workspace-template/shared_runtime.py
(used by prompt.py, a2a_executor.py, coordinator.py — not adapter-specific)
2. Moved base.py → workspace-template/adapter_base.py
(BaseAdapter + AdapterConfig — the interface adapters implement)
3. Updated imports in prompt.py, a2a_executor.py, coordinator.py
4. Rewritten adapters/__init__.py as a thin shim that:
- Reads ADAPTER_MODULE env var (production: standalone repos set this)
- Re-exports BaseAdapter/AdapterConfig for backward compat
5. adapters/base.py + adapters/shared_runtime.py remain as re-export shims
6. Deleted all 8 adapter subdirectories (autogen, claude_code, crewai,
deepagents, gemini_cli, hermes, langgraph, openclaw)
7. Removed 11 test files that imported adapter-specific code
Tests: 955 passed, 0 failed (down from 1216 — the difference is
adapter-specific tests that moved to standalone repos).
test_first_party_plugins.py, test_plugins_builtins_drift.py, and
test_hermes_adapter.py all referenced files under plugins/ and
adapters/ which were extracted to standalone repos. These tests
belong in those repos now, not in the core workspace-template.
1216 passed, 0 failed after removal.
Every agent in the template currently uses the same GitHub PAT, so
\`gh pr list\` shows every PR as authored by the CEO's account with
no signal which agent opened each one. Commits already carry
per-agent authors (GIT_AUTHOR_NAME from #402). This wrapper extends
the identity split to the PR/issue metadata surface layer that
commit attribution can't reach.
## How it works
A tiny bash script installed at \`/usr/local/bin/gh\`, which sits
earlier in PATH than the real binary at \`/usr/bin/gh\`. For \`gh pr
create\` and \`gh issue create\`:
- Title gets prefixed with \`[Role Name]\` — e.g. \`[Frontend Engineer]
fix: canvas grid index\`
- Body gets \`\n\n---\n_Opened by: Molecule AI <Role>_\` appended
Role is read from \`GIT_AUTHOR_NAME\` which the platform provisioner
sets to \`Molecule AI <Role>\` (shipped with #402). Accepts both
\`--title X\` and \`--title=X\` forms. Same for \`--body\`.
Anything that isn't \`gh pr create\` or \`gh issue create\` (e.g.
\`gh pr list\`, \`gh issue view\`, \`gh run watch\`) passes through
untouched. No behaviour change for read-side operations.
## Idempotent
- If the title already starts with \`[...]\` the wrapper does not
re-prefix. \`gh pr edit\` flows that resubmit title won't layer
multiple tags.
- If the body already contains \`Opened by: Molecule AI\` the footer
is not re-appended.
## Fail-open
When \`GIT_AUTHOR_NAME\` is absent or doesn't start with \`Molecule
AI \`, the wrapper exec's the real gh with unchanged args. No call
is ever blocked by this script.
## Test coverage
\`tests/test_gh_wrapper.sh\` — 12 cases, no network, no Docker:
- Passthrough for non-create subcommands (pr list)
- pr create title prefix + body footer
- issue create with \`--title=X\` \`--body=X\` equals-form
- Idempotent title re-prefix
- Idempotent body footer (count = 1 after two applies)
- Missing GIT_AUTHOR_NAME → passthrough, title preserved
- Malformed GIT_AUTHOR_NAME (not "Molecule AI ...") → passthrough
All 12 pass. Test script is standalone bash + a temp fake gh binary
that echoes argv; safe to run in CI's Python Lint & Test job via
subprocess shell-out.
## Deployment note
This lands in the workspace image. Existing containers keep their
old /usr/bin/gh until the image is rebuilt and they're re-provisioned
(POST /workspaces/:id/restart {}). No migration required; the wrapper
just starts tagging PRs once the new image is rolled.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3 escalation ladder added `from .escalation import ...` to
executor.py. The phase-2 dispatch tests load executor.py via
`exec(compile(src, ...))` with the relative import rewritten — this
broke because (a) the rewrite didn't know about escalation and (b) the
exec namespace lacked `__name__`, which executor.py needs at import
time for `logging.getLogger(__name__)`.
Fix both in all 8 exec sites:
- Rewrite both `from .providers import` AND `from .escalation import`
- Pre-register escalation + providers in sys.modules under the fake
package name
- Seed the exec namespace with `__name__ = "hermes_executor_under_test"`
54/54 hermes tests pass (28 escalation truth-table + 6 ladder-integration
+ 20 existing phase-2 dispatch).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ships scoped Phase 3 of the Hermes multi-provider work. Every workspace
can now declare an ordered list of (provider, model) rungs; when the
pinned model hits rate-limit / 5xx / context-length / overload, the
executor advances to the next rung before raising.
## Why
3× Claude Max saturation is a routine occurrence now — the "first 429 on
a batch delegation" is the common path, not the exception. A workspace
pinned to Haiku that hits a context-length limit has no recovery today;
same for Sonnet hitting rate-limit mid-synthesis. Escalation promotes
to the next tier for that single call, preserves coordination, avoids
restart cascades.
## New module: adapters/hermes/escalation.py
- ``LadderRung(provider, model)`` — one config entry.
- ``parse_ladder(raw)`` — tolerant config parser; skips malformed rungs
with a warning rather than raising so boot stays resilient.
- ``should_escalate(exc) -> bool`` — truth table over 15+ error shapes:
- Typed classes (RateLimitError, OverloadedError, APITimeoutError,
APIConnectionError, InternalServerError)
- Context-length markers (each provider uses different phrasing)
- Gateway markers (502/503/504, overloaded, temporarily unavailable)
- Status-code substrings (429, 529, 5xx)
- Hard-rejects auth failures (401/403/invalid_api_key) even if the
outer exception class is RateLimitError — wrapping case matters.
## Executor wiring
``HermesA2AExecutor`` now accepts ``escalation_ladder`` in its
constructor + ``create_executor()`` factory. ``_do_inference()`` walks
the ladder:
1. First attempt = pinned provider:model (matches pre-ladder behaviour)
2. On escalatable error, try each rung in order
3. On non-escalatable error, raise immediately (auth, malformed payload)
4. On exhaustion, raise the last error
Rung switches temporarily rebind ``self.provider_cfg`` / ``self.model``
/ ``self.api_key`` / ``self.base_url`` in a try/finally, so any raised
error leaves the executor in its original state for the next call. Key
resolution for non-pinned rungs goes through ``resolve_provider`` which
reads the rung-provider's env vars fresh.
## Config shape
``config.yaml`` (rendered from ``org.yaml`` → workspace secrets):
runtime_config:
escalation_ladder:
- provider: gemini
model: gemini-2.5-flash
- provider: anthropic
model: claude-sonnet-4-5-20250929
- provider: anthropic
model: claude-opus-4-1-20250805
Empty / absent = single-shot behaviour, full backwards-compat with
every existing workspace.
## Tests
34 passing, all isolated (no network):
- ``test_hermes_escalation.py`` (28): parser + truth-table across
rate-limit, overload, context-length, gateway, auth-reject, unrelated
exceptions, and case-insensitivity.
- ``test_hermes_ladder_integration.py`` (6): no-ladder single call,
ladder-not-triggered on success, escalate-on-rate-limit-then-succeed,
stop-on-non-escalatable, raise-last-error-when-exhausted, skip-
unknown-provider-in-rung.
## Not in this PR
- Uncertainty-driven escalation (judge pass after successful reply).
- Per-workspace budget tracking (#305 covers this separately).
- Live streaming reuse across rungs (ladder retries the whole call).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Severity HIGH. The /transcript route in main.py used `if expected:`
around the bearer-token compare, so `get_token()` returning None (no
/configs/.auth_token on disk — bootstrap window, deleted file, OSError)
silently skipped the entire auth check. Any container on
molecule-monorepo-net could GET /transcript during the provisioning
window and walk away with the full session log (user messages, Claude
tool calls, assistant replies).
The platform's TranscriptHandler always has a valid token (it acquired
one at workspace registration), so tightening this gate has no
legitimate-caller impact. Only unauthenticated sniffers lose access,
which was never the intended contract of #287.
Fix:
1. Extracted the auth gate into `workspace-template/transcript_auth.py`
— a 20-line module with no heavy imports so the security-critical
code is unit-testable without standing up the full uvicorn/a2a/httpx
stack (the former inline guard could only be tested end-to-end,
which explains why the regression shipped in #287).
2. `transcript_authorized(expected, auth_header)` returns False when
`expected` is None or empty — the #328 fix — and otherwise does
strict equality against "Bearer <expected>".
3. main.py's inline handler calls the extracted function:
if not _transcript_authorized(get_token(), auth_header):
return 401
4. New tests/test_transcript_auth.py covers: None token, empty token,
valid bearer, wrong bearer, missing header, case-sensitive prefix,
whitespace fuzzing. All 7 pass.
Closes#328
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds auth_headers to recall_memory and commit_memory in a2a_tools.py. Fixes the #215-class auth regression for A2A memory tools. Test mocks updated to accept headers kwarg.
The #215-class fix in memory.py (859a60e) adds headers=_headers to the
direct-httpx commit_memory + search_memory paths, but 9 existing tests
in test_memory.py had FakeAsyncClient.post/get signatures like
`async def post(self, url, json):` with no headers kwarg. Python
raised TypeError: unexpected keyword argument 'headers' on every call,
commit_memory caught it and returned {success: False}, tests failed.
Fixes applied:
1. Add `headers=None` to every FakeAsyncClient.post + .get signature
across test_memory.py. Uses replace_all so all 9+ fakes match.
2. For tests that capture a single captured["url"]:
- test_commit_memory_uses_awareness_client_when_configured
- test_commit_memory_uses_platform_fallback_without_awareness
- test_commit_memory_httpx_201_success
filter to only capture /memories URLs. Without the filter, the
subsequent _record_memory_activity fire-and-forget post to /activity
overwrites captured["url"] and the assertion fails.
3. For test_commit_memory_promoted_packet_logs_skill_promotion: bump
expected captured["calls"] from 3 to 4. Pre-fix, the memory_write
/activity call (from _record_memory_activity #125) was silently
dropped because the fake rejected headers=; post-fix it succeeds
and lands in the captured list alongside the skill_promotion
/activity and /registry/heartbeat calls. Also extend that test's
fake to accept /registry/heartbeat (was raising AssertionError).
Total: 36/36 memory tests pass. Full workspace-template suite 1189/1189.
This is strictly test-infrastructure work — zero production code
changed. CI never caught the break because the Mac mini runner has
been stuck for ~4 hours (tick-33/34/35/36 reports).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Hermes adapter never read /configs/system-prompt.md. Any role that
switched to runtime: hermes was silently losing its role identity because
the system prompt wasn't passed to the model. This PR fixes that by:
1. HermesA2AExecutor.__init__ takes new optional `config_path` kwarg
2. `create_executor(config_path=...)` forwards to the constructor
3. `adapter.py` passes `config.config_path` through from AdapterConfig
4. `execute()` reads system-prompt.md via executor_helpers.get_system_prompt
(hot-reload-capable — reads on every turn, not just at startup)
5. `_do_inference(user_message, history, system_prompt)` — new arg threads
through the dispatch to each native path
6. Each path uses the provider's NATIVE system field:
- OpenAI-compat: prepends `{"role":"system", "content":...}` to messages
- Anthropic: top-level `system=` kwarg (NOT in messages — Anthropic
requires system at the top level)
- Gemini: `config=GenerateContentConfig(system_instruction=...)`
## Phase scoreboard
- 2a (in main) — native Anthropic dispatch infra
- 2b (in main) — native Gemini dispatch
- 2c (in main) — multi-turn history on all paths
- **2d-i (this PR)** — system prompts on all paths
- 2d-ii (future) — tool calling on native paths
- 2d-iii (future) — vision content blocks on native paths
- 2d-iv (future) — streaming
## Test coverage
46/46 tests pass (20 Phase 2 dispatch + 26 Phase 1 registry):
- Existing dispatch tests updated to assert the 3-arg call shape
`("hello", None, None)` — history + system_prompt both None
- 4 new tests:
- `dispatch_passes_system_prompt_to_anthropic` — happy path, third arg flows
- `dispatch_passes_system_prompt_to_gemini` — happy path
- `dispatch_passes_system_prompt_to_openai` — happy path
- `executor_accepts_config_path_kwarg` — constructor stores config_path
- `create_executor_forwards_config_path` — both back-compat and registry
resolution paths forward config_path through to the executor
## Back-compat
- `config_path=None` (default) → execute() skips system-prompt injection,
same behavior as pre-2d-i
- Workspaces with `runtime: hermes` but no `/configs/system-prompt.md`
file get `system_prompt=None` (get_system_prompt returns fallback),
same as before
- The 13 OpenAI-compat providers work identically — system_prompt just
adds a leading message, which every OpenAI-compat endpoint already
supports
- Anthropic + Gemini previously got zero system context; now they get
the same system prompt the workspace's system-prompt.md carries
## Why this matters
Before this PR: if someone flipped a workspace from `runtime: claude-code`
to `runtime: hermes`, the agent would act generically (no role identity,
no project conventions, no CLAUDE.md context) because the Hermes executor
never looked at system-prompt.md. That's a silent correctness regression
the test suite wouldn't catch because none of our live workspaces use
the hermes runtime today.
With this PR: Hermes workspaces get the same system prompt injection as
Claude-code workspaces, making the `runtime: hermes` switch a true drop-in
alternative.
## Related
- #267 Phase 2c (multi-turn history — in main)
- #255 Phase 2b (gemini native — in main)
- #240 Phase 2a (anthropic native — in main)
- #208 Phase 1 (provider registry — in main)
- project_hermes_multi_provider.md — Phase 2d-i was the next queued item
Closes #N (issue to be filed)
Lets canvas / operators see live tool calls + AI thinking instead of
waiting for the high-level activity log to flush. Right now the only
way to "look over an agent's shoulder" is `docker exec ws-XXX cat
/home/agent/.claude/projects/.../<session>.jsonl`, which:
- doesn't work for remote workspaces (Phase 30 / Fly Machines)
- requires shell access on the host
- has no pagination
This PR adds:
1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning
`{runtime, supported, lines, cursor, more, source}`. Default returns
`supported: false` so non-claude-code runtimes pass through gracefully.
2. `ClaudeCodeAdapter.transcript_lines` override — reads the most-
recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves
cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the
project dir name matches what Claude Code actually writes to. Limit
capped at 1000 to prevent OOM.
3. Workspace HTTP route `GET /transcript` — Starlette handler added
alongside the A2A app. Trusts the internal Docker network (same
model as POST / for A2A); Phase 30 remote-workspace auth is a
follow-up.
4. Platform proxy `GET /workspaces/:id/transcript` — looks up the
workspace's URL, forwards GET, caps response at 1MB. Gated by
existing `WorkspaceAuth` middleware (same as /traces, /memories,
/delegations).
Tests: 6 Python unit tests cover empty dir / pagination / multi-session
/ malformed lines / limit cap, plus 4 Go tests cover 404 / proxy
forwarding / query-string propagation / unreachable-workspace 502.
Verified end-to-end on a live workspace — returns real claude-code
session entries through the platform proxy.
## Follow-ups
- WebSocket variant for live streaming (instead of polling)
- Canvas UI tab "Transcript" between Activity and Traces
- LangGraph / DeepAgents / OpenClaw transcript adapters
- Phase 30 remote-workspace auth on /transcript
Completes the Phase 2 scope by keeping conversation turns as turns across
all three dispatch paths. Pre-2c, history was flattened into a single user
message via shared_runtime.build_task_text, which worked as a fallback but
lost the model's native multi-turn awareness (role attribution,
instruction-following on mid-conversation corrections, system-prompt
grounding against prior turns).
Phase 2a + 2b shipped the dispatch infrastructure + per-provider native
paths. This PR uses them properly.
## What's new
- **`_history_to_openai_messages(user_message, history)`** (static) — maps
A2A `(role, text)` tuples to OpenAI Chat Completions
`[{"role":"user"|"assistant","content":str}]`. Roles: `human`→`user`,
`ai`→`assistant`. Current turn appended as the final user message.
- **`_history_to_anthropic_messages`** (static) — identical wire shape to
OpenAI for text-only turns, so it delegates. Phase 2d tool_use/vision
blocks will diverge here.
- **`_history_to_gemini_contents`** (static) — Gemini uses a different
shape: `role="user"|"model"` (NOT "assistant") and text wrapped in
`parts=[{"text":...}]`. Delegates to none of the others.
- **`_do_openai_compat(user_message, history=None)`** — accepts history,
builds messages via `_history_to_openai_messages`. Back-compat: pass
`history=None` to get the old single-turn behavior.
- **`_do_anthropic_native(user_message, history=None)`** — same signature
change, calls `_history_to_anthropic_messages`. Still uses
`anthropic.AsyncAnthropic().messages.create()`, just with proper
multi-turn.
- **`_do_gemini_native(user_message, history=None)`** — same pattern,
calls `_history_to_gemini_contents`, passes to Gemini's
`generate_content(contents=...)`.
- **`_do_inference(user_message, history=None)`** — new signature,
dispatches by auth_scheme as before, passes both args through.
- **`execute()`** — no longer calls `build_task_text`. Calls
`extract_history(context)` directly and forwards to `_do_inference`.
Removes the `build_task_text` import (not needed in this file anymore).
## Tests
Existing 7 dispatch tests updated for the new `(user_message, history)`
signature — they assert the path is called with `("hello", None)` since
they pass no history.
5 NEW tests:
- `test_history_to_openai_messages_empty_history` — empty history degrades
to single user message (back-compat)
- `test_history_to_openai_messages_multi_turn` — round-trip of a 3-turn
history + current turn
- `test_history_to_anthropic_messages_same_as_openai` — cross-check that
anthropic path produces identical wire shape for text-only
- `test_history_to_gemini_contents_uses_model_role_and_parts_wrapper` —
verifies the Gemini-specific role mapping (`ai`→`model`) + parts wrapper
- `test_dispatch_passes_history_through` — end-to-end: _do_inference
forwards history to the chosen provider path
All 41 tests pass (15 Phase 2 dispatch + 26 Phase 1 registry):
pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
41 passed in 0.07s
## Back-compat
- No public API changes to `create_executor()`. Callers that hit
`execute()` via A2A get the new multi-turn behavior automatically via
`extract_history(context)`.
- Callers that passed an empty history list (or None) get the same
single-turn behavior as pre-2c.
- The `build_task_text` helper in shared_runtime is unchanged — other
adapters (AutoGen, LangGraph) that use it keep working. Only Hermes
bypasses it now.
## What's NOT in this PR (Phase 2d)
- Tool calling / function calling on native paths (anthropic `tools=`,
gemini `tools=Tool(function_declarations=[...])`)
- Vision content blocks (image_url → anthropic `{type:"image", source:
{type:"base64",...}}` / gemini `{inline_data:{mime_type,data}}`)
- System instructions pass-through (anthropic `system=`, gemini
`system_instruction=`)
- Streaming (`astream_messages` / `streamGenerateContent` stream variants)
- Extended thinking (anthropic `thinking={"type":"enabled"}`) / Gemini
thinking config
Phase 2c is the **multi-turn upgrade**. Tool + vision + streaming are
Phase 2d, scoped in project_hermes_multi_provider.md.
## Related
- #240 Phase 2a (native Anthropic dispatch — in main)
- #255 Phase 2b (native Gemini dispatch — in main)
- Phase 1 (#208 — provider registry baseline, in main)
- `project_hermes_multi_provider.md` queued memory
- CEO 2026-04-15: "focus on supporting hermes agent"
Pre-existing flaky test: when the full workspace-template suite ran in
collection order, test_hermes_smoke.py::test_create_executor_raises_
without_keys failed with "DID NOT RAISE ValueError". Failure only
surfaced when test_hermes_providers ran first.
Root cause: test_hermes_providers had an autouse fixture that used
monkeypatch.delenv on entry, but several tests in that file mutate
os.environ directly (e.g. `os.environ["HERMES_API_KEY"] = "test"`),
bypassing monkeypatch. monkeypatch only tracks its own deltas, so on
fixture teardown the direct-mutation values stayed in os.environ.
HERMES_API_KEY leaked across file boundaries into test_hermes_smoke,
which then saw a key present when it expected absence.
Fix: replace monkeypatch-based fixture with pure snapshot/restore:
- Snapshot all provider env vars at entry
- Clear them
- yield (test runs, may mutate freely)
- try/finally restore the exact pre-test state
This is deterministic regardless of whether a test uses monkeypatch,
direct mutation, or neither. Also adds a comment documenting WHY we
switched away from monkeypatch so a future reviewer doesn't revert.
Full workspace-template suite: 1169 passed, 9 skipped, 2 xfailed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.
## What's new in this PR
1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
update `base_url` from the OpenAI-compat endpoint
(`/v1beta/openai`) to the bare host
(`https://generativelanguage.googleapis.com`) which the native SDK
uses.
2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
`google.genai.Client().aio.models.generate_content(...)`. Dispatch
table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
Same fail-loud semantics as `_do_anthropic_native` — missing SDK
raises a clear RuntimeError with install instructions.
3. `requirements.txt`: add `google-genai>=1.0.0`.
4. `test_hermes_phase2_dispatch.py`: +3 tests
- `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
validated
- `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
gemini native, not openai-compat or anthropic-native
- `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
on missing `google-genai` package
Plus updated existing dispatch tests to mock `_do_gemini_native`
alongside the other paths so "no cross-calls" assertions stay tight.
All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):
pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
36 passed in 0.07s
## Dispatch table after this PR
auth_scheme="openai" → _do_openai_compat (13 providers)
auth_scheme="anthropic" → _do_anthropic_native (1 provider, Phase 2a)
auth_scheme="gemini" → _do_gemini_native (1 provider, Phase 2b) ← NEW
<unknown> → _do_openai_compat + warning (forward-compat)
## Back-compat
- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
automatically — no caller changes needed
## What's NOT in this PR (future phases)
- Streaming support (`astream_messages` / `streamGenerateContent` stream
variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path
Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.
## Stacking
This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main, GitHub auto-rebases this PR onto the new
main head. If reviewer prefers a single combined PR, close#240 and land
this one instead — the commits on feat/hermes-phase2-native-sdks are
already included in this branch's history.
## Related
- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory — Phase 2 was the next
item, this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
eco-watch entry that catalogued Hermes's native provider list and
shaped the original Phase 2 scope
Completes the Hermes adapter's native-SDK plan for the provider that gains
the most from leaving OpenAI-compat: Anthropic. OpenAI-compat works fine for
plain text turns on every provider (Phase 1 covered that with one code path
for all 15 providers), but Anthropic's Messages API has first-class tool use,
vision content blocks, and extended thinking that the OpenAI-compat shim
strips or mis-translates.
Rather than ship all native SDK paths in one PR (Anthropic + Gemini + future),
this lands Anthropic only (Phase 2a). Gemini is Phase 2b, shipping after a
production measurement window on Phase 2a.
## Design
Providers now dispatch by `auth_scheme` field. Phase 1 added the field but
every provider used `"openai"`. Phase 2 flips `anthropic` to `"anthropic"`
and wires a second inference path keyed on that:
- `HermesA2AExecutor._do_openai_compat(task_text)` — existing path, handles
14 of 15 providers (Nous Portal, OpenRouter, OpenAI, xAI, Gemini, Qwen,
GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral)
- `HermesA2AExecutor._do_anthropic_native(task_text)` — NEW, uses the
official `anthropic` Python SDK's `AsyncAnthropic().messages.create(...)`
- `HermesA2AExecutor._do_inference(task_text)` — dispatches by
`self.provider_cfg.auth_scheme`
Unknown schemes fall back to OpenAI-compat with a logged warning, so future
provider additions don't crash if a native SDK path ships late.
## Fail-loud on missing SDK
`_do_anthropic_native` raises a clear `RuntimeError` with install
instructions if the `anthropic` package is missing at runtime:
Hermes anthropic native path requires the `anthropic` package. Install
in the workspace image with `pip install anthropic>=0.39.0` or set
HERMES provider=openrouter to route Claude models through OpenRouter's
OpenAI-compat shim instead.
This is intentional: silent fallback would mask fidelity loss (tool_use
blocks become plain text, vision gets stripped). Loud failure is better.
`requirements.txt` adds `anthropic>=0.39.0` so the package is baked into
the workspace-template image build path. Operators building custom workspace
images without anthropic installed get the loud error.
## Back-compat
- `create_executor(hermes_api_key="x")` → still routes to Nous Portal
(`auth_scheme="openai"`), unchanged
- `HERMES_API_KEY` env var → still first in RESOLUTION_ORDER
- `OPENROUTER_API_KEY` env var → still second
- All 14 OpenAI-compat providers unchanged — they take the same code path
as before
- ONLY `anthropic` provider changes behavior: it now uses the native
Messages API instead of the `/v1/chat/completions` compat shim
## Constructor signature change
`HermesA2AExecutor.__init__` now takes `provider_cfg: ProviderConfig`
instead of separate `api_key + base_url + model`. The three fields are
derived from `provider_cfg` + an optional model override. This is a
breaking change for any external caller building an executor directly,
but the only documented public entry point is `create_executor()`, which
is updated in the same commit to pass the cfg through.
## Test coverage
`workspace-template/tests/test_hermes_phase2_dispatch.py` — 7 new tests:
1. `test_anthropic_entry_has_anthropic_scheme` — registry flip
2. `test_all_other_providers_still_openai_scheme` — regression guard
3. `test_dispatch_openai_scheme_calls_openai_compat` — happy path
4. `test_dispatch_anthropic_scheme_calls_anthropic_native` — happy path
5. `test_dispatch_unknown_scheme_falls_back_to_openai_compat` — forward compat
6. `test_anthropic_native_raises_clear_error_when_sdk_missing` — fail-loud
7. `test_create_executor_passes_provider_cfg` — constructor wiring
All pass locally (pytest tests/test_hermes_phase2_dispatch.py -v, 0.04s).
Phase 1 tests unchanged: `test_hermes_providers.py` 26/26 pass, no
regressions.
## What's NOT in this PR (Phase 2b)
- Gemini native `generateContent` path (`auth_scheme="gemini"`)
- Streaming support across both native paths (`astream_messages`, `streamGenerateContent`)
- Tool calling on the anthropic native path (the `tools` + `tool_use` blocks)
- Vision content blocks (image_url → anthropic image blocks)
- Extended thinking parameter passthrough
All scoped in `project_hermes_multi_provider.md`. Phase 2a is the minimum
viable native Anthropic dispatch — single-turn text in, text out, no tools.
## Related
- Phase 1 baseline (already in main): #208 — provider registry + OpenAI-compat path
- Queued memory: `project_hermes_multi_provider.md` — full phased plan
- Triggering directive: CEO 2026-04-15 — "once current works are cleared,
focus on supporting hermes agent"
Context: when the claude-agent-sdk wraps a stream error from the CLI
subprocess that it can't categorize (rate limit, auth, network), it
raises a bare `Exception("Command failed with exit code 1\nError output:
Check stderr output for details")`. The exception has no `.stderr` or
`.exit_code` attributes, so #66's `_format_process_error` — which reads
those attributes — has nothing to surface. The log line becomes:
SDK agent error [claude-code]: Exception: Command failed with exit
code 1 (exit code: 1)\nError output: Check stderr output for details
That's the placeholder text from the SDK's error path, not the actual
error. Operators chasing a stuck workspace are forced to `docker exec
ws-xxx claude --print` manually to discover the real cause. Observed
today during the rate-limit incident: every PM error line was identical
"Check stderr output for details" while the real cause ("You've hit
your limit · resets Apr 17, 11pm (UTC)") was only visible via manual
reproduction — that cost ~20 minutes of diagnosis time.
## Fix
Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs
`claude --print` with a small probe input, captures stderr+stdout, and
returns the real error string. Bounded by 30s timeout so a hung CLI
can't stall the error path.
Extend `_format_process_error` with ONE narrow fallback: if the
exception has no stderr/exit_code AND its message contains the specific
"Check stderr output for details" marker, call the probe and append
`probed_cli_error=<real error>` to the formatted line.
Critically: the probe only runs in the narrow case where we have
nothing else to log. If `.stderr` or `.exit_code` are present (the
normal ProcessError path from #66), the probe is skipped — no wasted
subprocess, no 30s latency on every error.
## Test coverage
`workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests:
- `test_format_process_error_probes_cli_when_stderr_swallowed` — the
happy path: exception matches the marker, probe runs, result appears
in the formatted line. Probe is monkeypatched so no subprocess spawns
in the test.
- `test_format_process_error_does_not_probe_when_stderr_already_present` —
negative: regular ProcessError with `.stderr` set does NOT trigger
the probe (skip the wasted call).
- `test_format_process_error_does_not_probe_without_swallowed_marker` —
negative: unrelated plain exceptions (e.g. RuntimeError) do NOT
trigger the probe (so the common-case error path stays fast).
All 7 `_format_process_error` tests pass locally (4 existing + 3 new):
\`\`\`
pytest tests/test_claude_sdk_executor.py -k format_process_error
======================= 7 passed in 0.06s ========================
\`\`\`
## Impact
Next time the SDK swallows a real error (rate limit, auth failure,
network outage), the workspace log will contain the actual error string
alongside the generic placeholder:
SDK agent error [claude-code]: Exception: Command failed with exit
code 1 ... | probed_cli_error="You've hit your limit · resets Apr
17, 11pm (UTC)"
Diagnosis time drops from "docker exec each ws, run claude --print,
read stderr" (~20 min) to "grep probed_cli_error in platform logs"
(~10 seconds).
Closes#160.
Ships the first half of the queued Hermes adapter expansion. PR 2 only
supported Nous Portal + OpenRouter; this adds 13 more providers reachable
via OpenAI-compat endpoints. Native SDK paths for Anthropic + Gemini are
Phase 2 (better tool-calling + vision fidelity).
## What's new
**`workspace-template/adapters/hermes/providers.py`** (new file, 220 LOC):
- ``ProviderConfig`` dataclass: name, env vars, base URL, default model, auth scheme, docs
- ``PROVIDERS`` dict with 15 entries across 4 groups:
- PR 2 baseline: nous_portal, openrouter
- Frontier commercial: openai, anthropic, xai, gemini
- Chinese providers: qwen, glm, kimi, minimax, deepseek
- OSS/alt: groq, together, fireworks, mistral
- ``RESOLUTION_ORDER`` tuple: priority for auto-detect (back-compat first,
then commercial, then Chinese, then OSS/alt)
- ``resolve_provider(explicit=None)`` -> (ProviderConfig, api_key)
- With explicit name: routes to that provider, raises if env var empty
- Without: walks RESOLUTION_ORDER, first env-var-set provider wins
**`workspace-template/adapters/hermes/executor.py`** (refactored):
- `create_executor(hermes_api_key=None, provider=None, model=None)` now has
three parameters:
- `hermes_api_key`: PR 2 back-compat — routes to Nous Portal
- `provider`: canonical short name from the registry (e.g. "anthropic")
- `model`: optional override of the provider's default model
- Delegates all resolution to `providers.resolve_provider()` — no more
hardcoded URLs or env var lookups in the executor itself
- `HermesA2AExecutor.__init__` no longer has Nous-specific defaults; callers
pass base_url + model explicitly (which create_executor always does)
**`workspace-template/tests/test_hermes_providers.py`** (new file, 26 tests):
- Registry shape invariants (count >= 15, no duplicates, every config valid)
- PR 2 back-compat: HERMES_API_KEY / OPENROUTER_API_KEY still route correctly
- Auto-detect for every provider in the registry (parametrized — guards against
typos in env var lists)
- Explicit `provider=` bypass of auto-detect
- Error cases: unknown provider, explicit-but-empty, auto-detect-with-no-env
- All 26 tests pass locally in 0.08s
## Back-compat guarantees
| Scenario | PR 2 behavior | This PR behavior |
|---|---|---|
| `create_executor(hermes_api_key="x")` | Nous Portal | Nous Portal (unchanged) |
| `HERMES_API_KEY=x` env, auto-detect | Nous Portal | Nous Portal (unchanged) |
| `OPENROUTER_API_KEY=x` env, auto-detect | OpenRouter | OpenRouter (unchanged) |
| Both env + explicit hermes_api_key param | Nous Portal (param wins) | Nous Portal (param wins, unchanged) |
Nothing existing can break. New callers gain access to 13 more providers.
## What's NOT in this PR (Phase 2)
- **Native Anthropic Messages API path** — better tool calling, vision, extended
thinking. Requires pulling in `anthropic` SDK. ~50 LOC.
- **Native Gemini generateContent path** — for vision + google tools. Requires
`google-genai` SDK. ~50 LOC.
- **Streaming support across all providers** — current executor is non-streaming
(single chat.completions.create call). Streaming works with openai.AsyncOpenAI
but hasn't been wired to the A2A event queue path. ~30 LOC.
- **Per-provider model overrides in config.yaml** — Phase 1 uses the registry's
default_model. Phase 2 adds a `hermes: { provider: qwen, model: qwen3-coder-plus }`
block in the workspace config.
- **`.env.example` updates** — not critical since the registry itself documents
every env var via the `env_vars` field, but nice-to-have.
## Related
- Queued memory: `project_hermes_multi_provider.md`
- CEO directive 2026-04-15: *"once current works are cleared, I want you to
focus on supporting hermes agent, right now it doesnt take too much providers"*
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch
entry listed "Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …" which
shaped this registry's initial set
## Test plan
- [x] Unit tests: 26/26 pass locally (pytest)
- [ ] CI will run on the self-hosted macOS arm64 runner
- [ ] Smoke test in a real workspace: set QWEN_API_KEY and verify Technical
Researcher actually hits Alibaba DashScope successfully
- [ ] Integration test per provider with real API keys (gated on env, skip
when not set — Phase 2 CI addition)
#173 — implement cancel() in LangGraphA2AExecutor: emits
TaskStatusUpdateEvent(state=canceled, final=True) so clients see the
state transition rather than silence. Removes pragma: no cover.
Test: test_cancel_emits_canceled_event.
#174 — add stateTransitionHistory=True to AgentCapabilities in main.py
so microsoft/agent-framework clients know they can request full task
history via the A2A protocol.
#175 — wire InMemoryPushNotificationConfigStore and PushNotificationSender
into DefaultRequestHandler so the advertised pushNotifications capability
is backed by a real store. Both classes live in a2a.server.tasks (a2a-sdk
0.3.25); import confirmed by probe.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
get_peers() was sending no auth headers to /registry/:id/peers — this would
return 401 for every workspace agent after PR #31 (WorkspaceAuth middleware)
deploys, breaking peer discovery entirely.
discover_peer() had X-Workspace-ID but was missing the bearer token, also
required by Phase 30.6 for /registry/discover/:id.
Both functions now send {"X-Workspace-ID": WORKSPACE_ID, **auth_headers()}.
get_workspace_info() was already correct (auth_headers() present since PR #39).
Adds test_request_sends_workspace_id_header to TestGetPeers; hardens the
discover_peer header assertion to use presence-check rather than exact equality.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both watcher.py (ConfigWatcher) and skill_loader/watcher.py
(SkillsWatcher) used hashlib.md5() for file-integrity change detection.
MD5 is collision-prone: a crafted config file could produce the same
hash as a benign one, silently suppressing the hot-reload callback and
preventing agents from picking up legitimate config changes.
Replace hashlib.md5 → hashlib.sha256 in both _hash_file() methods.
Update docstrings, comments, and the type-annotation comment
(rel_path → md5 hex → sha256 hex).
Test update: test_skills_watcher.py — rename helper _md5 → _sha256,
update the hash-length assertion from 32 (MD5) to 64 (SHA-256), and
rename the test from test_hash_file_returns_md5_for_existing_file to
test_hash_file_returns_sha256_for_existing_file. All 25 watcher tests
pass.
Note: H2 (a2a_client.py timeout=None) was already fixed in Cycle 5
(timeout=httpx.Timeout(connect=30.0, read=300.0, ...)) — confirmed by
code review before opening this PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix A — platform/internal/middleware/wsauth_middleware.go (NEW):
WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on
ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as
secrets.Values: workspaces with no live token are grandfathered through.
Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously.
Fix A — platform/internal/router/router.go:
Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id)
and /a2a remain on root router; all other /workspaces/:id/* sub-routes
moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)).
CORS AllowHeaders updated to include Authorization so browser/agent callers
can send the bearer token cross-origin.
Fix B — workspace-template/heartbeat.py:
_check_delegations(): validate source_id == self.workspace_id before
accepting a delegation result. Attacker-crafted records with a foreign
source_id are silently skipped with a WARNING log (injection attempt).
trigger_msg no longer embeds raw response_preview text; references
delegation_id + status only — removes the prompt-injection vector.
Fix C — workspace-template/skill_loader/loader.py:
load_skill_tools(): before exec_module(), verify script is within
scripts_dir (path traversal guard) and temporarily scrub sensitive env
vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY,
WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore
in finally block. Defence-in-depth even if /plugins auth gate is bypassed.
Fix D — platform/internal/handlers/socket.go:
HandleConnect(): agent connections (X-Workspace-ID present) validated via
wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade.
Canvas clients (no X-Workspace-ID) remain unauthenticated.
Fix D — workspace-template/events.py:
PlatformEventSubscriber._connect(): include platform_auth bearer token in
WebSocket upgrade headers alongside X-Workspace-ID.
Fix E — workspace-template/executor_helpers.py:
recall_memories() and commit_memory() now pass platform_auth bearer token
in Authorization header so WorkspaceAuth middleware allows access.
Fix F — workspace-template/a2a_client.py:
send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300,
write=30, pool=30). Resolves H2 flagged across 5 consecutive audits.
Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to
assert new source_id validation behaviour and allow Authorization header).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>