The #215-class fix in memory.py (859a60e) adds headers=_headers to the
direct-httpx commit_memory + search_memory paths, but 9 existing tests
in test_memory.py had FakeAsyncClient.post/get signatures like
`async def post(self, url, json):` with no headers kwarg. Python
raised TypeError: unexpected keyword argument 'headers' on every call,
commit_memory caught it and returned {success: False}, so the tests failed.
Fixes applied:
1. Add `headers=None` to every FakeAsyncClient.post + .get signature
across test_memory.py. Uses replace_all so all 9+ fakes match.
2. For tests that capture a single captured["url"]:
- test_commit_memory_uses_awareness_client_when_configured
- test_commit_memory_uses_platform_fallback_without_awareness
- test_commit_memory_httpx_201_success
filter to only capture /memories URLs. Without the filter, the
subsequent _record_memory_activity fire-and-forget post to /activity
overwrites captured["url"] and the assertion fails.
3. For test_commit_memory_promoted_packet_logs_skill_promotion: bump
expected captured["calls"] from 3 to 4. Pre-fix, the memory_write
/activity call (from _record_memory_activity #125) was silently
dropped because the fake rejected headers=; post-fix it succeeds
and lands in the captured list alongside the skill_promotion
/activity and /registry/heartbeat calls. Also extend that test's
fake to accept /registry/heartbeat (was raising AssertionError).
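A minimal sketch of the fake-client shape that fixes 1 and 2 describe. `FakeResponse`, the URLs, and the `demo` helper are hypothetical stand-ins for illustration, not the actual fakes in test_memory.py:

```python
import asyncio


class FakeResponse:
    """Hypothetical stand-in for an httpx response."""

    def __init__(self, status_code=201, payload=None):
        self.status_code = status_code
        self._payload = payload or {}

    def json(self):
        return self._payload


class FakeAsyncClient:
    """Hypothetical fake that accepts headers=, so callers passing auth
    headers no longer hit TypeError."""

    def __init__(self, captured):
        self.captured = captured

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        return False

    async def post(self, url, json=None, headers=None):
        # Only capture /memories URLs, so a later fire-and-forget
        # /activity post cannot overwrite the value under test.
        if url.endswith("/memories"):
            self.captured["url"] = url
            self.captured["headers"] = headers
        return FakeResponse(201, {"id": "mem-1"})


async def demo():
    captured = {}
    async with FakeAsyncClient(captured) as client:
        await client.post(
            "http://platform/workspaces/ws-1/memories",
            json={"content": "x"},
            headers={"Authorization": "Bearer t"},
        )
        # Simulates the follow-up activity post that used to clobber captured["url"].
        await client.post(
            "http://platform/workspaces/ws-1/activity",
            json={"type": "memory_write"},
            headers={},
        )
    return captured
```

With the filter in place, `asyncio.run(demo())` leaves the /memories URL in `captured["url"]` even though the /activity post ran afterwards.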
Total: 36/36 memory tests pass. Full workspace-template suite 1189/1189.
This is strictly test-infrastructure work — zero production code
changed. CI never caught the break because the Mac mini runner has
been stuck for ~4 hours (tick-33/34/35/36 reports).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Context: platform now gates `GET /workspaces/:id/memories` and
`POST /workspaces/:id/memories` behind workspace auth (post-#166 /
#167 AdminAuth wave). The `builtin_tools.memory` tool had three HTTP
call sites:
1. commit_memory POST fallback (line 121) ← NO auth_headers
2. search_memory GET fallback (line 269) ← NO auth_headers
3. activity-log helper POST (line 371) ← HAS auth_headers
Path 3 was already fixed. Paths 1 + 2 silently 401 every call, but the
tool's error-handling path returns `{"success": False}` without surfacing
the auth failure to the agent. Result: the agent sees an empty memory
backlog on every call and assumes there's nothing to do.
## Discovered today
Technical Researcher is the first workspace opted in to the idle-loop
pilot from #216 (reflection-on-completion pattern). The pilot fires
every 10 min, the agent calls `search_memory "research-backlog:..."` as
the first step, gets back an empty result, writes "tr-idle clean" to
memory, and stops. Clean-idle outcome every tick, 9 consecutive ticks.
Looking at TR's activity_logs response bodies:
"Memory auth has failed on every tick this session — skipping the call"
"tr-idle — step 2 done. Memory unavailable (auth token missing..."
"tr-idle 04:15 — clean (memory auth still down, 3rd consecutive tick)"
The AGENT knew the memory calls were failing. The platform 401 error
was surfacing in the tool response, but our instrumentation wasn't
counting it as a defect — we saw "tr-idle clean" writes and assumed
the pilot was working as designed. It was actually silently broken.
## Fix
Import `platform_auth.auth_headers` lazily (same pattern as the
activity-log path already uses), attach `headers=_auth()` to both
httpx call sites. Matches the #225 fix for the register call.
## Not in this PR
- awareness_client.py also makes HTTP calls to a separate AWARENESS_URL
service (not the platform), which may or may not need the same fix
depending on that service's auth posture. Out of scope for this PR.
- TR's specific token problem: TR's `/configs/.auth_token` file is
empty because it was re-provisioned via `apply_template: true`
(recovery path from the failed-volume incident) and Phase 30.1
only mints a token on FIRST register per workspace. This fix
doesn't help TR until TR gets a fresh token — tracked separately.
## Test plan
- [x] Python syntax check on memory.py passes
- [ ] CI: all memory-related tests should still pass (the new code
paths only add header passing, no shape change)
- [ ] Real-world verification: after TR gets a fresh token, idle-loop
pilot should produce a dispatch within 10 min (seeded backlog
already in place from this session)
## Related
- #215 / #225 — register call auth_headers fix (same pattern)
- #216 — TR idle-loop pilot (couldn't measure until this lands)
- #166 / #167 — platform AdminAuth wave that surfaced this gap
The Hermes adapter never read /configs/system-prompt.md. Any role that
switched to runtime: hermes was silently losing its role identity because
the system prompt wasn't passed to the model. This PR fixes that by:
1. HermesA2AExecutor.__init__ takes new optional `config_path` kwarg
2. `create_executor(config_path=...)` forwards to the constructor
3. `adapter.py` passes `config.config_path` through from AdapterConfig
4. `execute()` reads system-prompt.md via executor_helpers.get_system_prompt
(hot-reload-capable — reads on every turn, not just at startup)
5. `_do_inference(user_message, history, system_prompt)` — new arg threads
through the dispatch to each native path
6. Each path uses the provider's NATIVE system field:
- OpenAI-compat: prepends `{"role":"system", "content":...}` to messages
- Anthropic: top-level `system=` kwarg (NOT in messages — Anthropic
requires system at the top level)
- Gemini: `config=GenerateContentConfig(system_instruction=...)`
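The per-provider placement in step 6 can be sketched as three request builders. The function names are hypothetical, and plain dicts stand in for the SDK objects (for example `GenerateContentConfig`), so this only illustrates where the system prompt lands:

```python
def openai_request(user_message, system_prompt=None):
    """OpenAI-compat: the system prompt is a leading message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    return {"messages": messages}


def anthropic_request(user_message, system_prompt=None):
    """Anthropic: the system prompt is a top-level kwarg, not a message."""
    kwargs = {"messages": [{"role": "user", "content": user_message}]}
    if system_prompt:
        kwargs["system"] = system_prompt
    return kwargs


def gemini_request(user_message, system_prompt=None):
    """Gemini: the system prompt rides in the config object."""
    kwargs = {"contents": [{"role": "user", "parts": [{"text": user_message}]}]}
    if system_prompt:
        # A dict stands in for GenerateContentConfig(system_instruction=...).
        kwargs["config"] = {"system_instruction": system_prompt}
    return kwargs
```

When `system_prompt` is None, every builder returns the same shape as before, which matches the back-compat behavior described for `config_path=None`.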
## Phase scoreboard
- 2a (in main) — native Anthropic dispatch infra
- 2b (in main) — native Gemini dispatch
- 2c (in main) — multi-turn history on all paths
- **2d-i (this PR)** — system prompts on all paths
- 2d-ii (future) — tool calling on native paths
- 2d-iii (future) — vision content blocks on native paths
- 2d-iv (future) — streaming
## Test coverage
46/46 tests pass (20 Phase 2 dispatch + 26 Phase 1 registry):
- Existing dispatch tests updated to assert the 3-arg call shape
`("hello", None, None)` — history + system_prompt both None
- 5 new tests:
- `dispatch_passes_system_prompt_to_anthropic` — happy path, third arg flows
- `dispatch_passes_system_prompt_to_gemini` — happy path
- `dispatch_passes_system_prompt_to_openai` — happy path
- `executor_accepts_config_path_kwarg` — constructor stores config_path
- `create_executor_forwards_config_path` — both back-compat and registry
resolution paths forward config_path through to the executor
## Back-compat
- `config_path=None` (default) → execute() skips system-prompt injection,
same behavior as pre-2d-i
- Workspaces with `runtime: hermes` but no `/configs/system-prompt.md`
file get `system_prompt=None` (get_system_prompt returns fallback),
same as before
- The 13 OpenAI-compat providers work identically — system_prompt just
adds a leading message, which every OpenAI-compat endpoint already
supports
- Anthropic + Gemini previously got zero system context; now they get
the same system prompt the workspace's system-prompt.md carries
## Why this matters
Before this PR: if someone flipped a workspace from `runtime: claude-code`
to `runtime: hermes`, the agent would act generically (no role identity,
no project conventions, no CLAUDE.md context) because the Hermes executor
never looked at system-prompt.md. That's a silent correctness regression
the test suite wouldn't catch because none of our live workspaces use
the hermes runtime today.
With this PR: Hermes workspaces get the same system prompt injection as
Claude-code workspaces, making the `runtime: hermes` switch a true drop-in
alternative.
## Related
- #267 Phase 2c (multi-turn history — in main)
- #255 Phase 2b (gemini native — in main)
- #240 Phase 2a (anthropic native — in main)
- #208 Phase 1 (provider registry — in main)
- project_hermes_multi_provider.md — Phase 2d-i was the next queued item
Completes the Phase 2 scope by keeping conversation turns as turns across
all three dispatch paths. Pre-2c, history was flattened into a single user
message via shared_runtime.build_task_text, which worked as a fallback but
lost the model's native multi-turn awareness (role attribution,
instruction-following on mid-conversation corrections, system-prompt
grounding against prior turns).
Phase 2a + 2b shipped the dispatch infrastructure + per-provider native
paths. This PR uses them properly.
## What's new
- **`_history_to_openai_messages(user_message, history)`** (static) — maps
A2A `(role, text)` tuples to OpenAI Chat Completions
`[{"role":"user"|"assistant","content":str}]`. Roles: `human`→`user`,
`ai`→`assistant`. Current turn appended as the final user message.
- **`_history_to_anthropic_messages`** (static) — identical wire shape to
OpenAI for text-only turns, so it delegates. Phase 2d tool_use/vision
blocks will diverge here.
- **`_history_to_gemini_contents`** (static) — Gemini uses a different
shape: `role="user"|"model"` (NOT "assistant") and text wrapped in
`parts=[{"text":...}]`. Delegates to none of the others.
- **`_do_openai_compat(user_message, history=None)`** — accepts history,
builds messages via `_history_to_openai_messages`. Back-compat: pass
`history=None` to get the old single-turn behavior.
- **`_do_anthropic_native(user_message, history=None)`** — same signature
change, calls `_history_to_anthropic_messages`. Still uses
`anthropic.AsyncAnthropic().messages.create()`, just with proper
multi-turn.
- **`_do_gemini_native(user_message, history=None)`** — same pattern,
calls `_history_to_gemini_contents`, passes to Gemini's
`generate_content(contents=...)`.
- **`_do_inference(user_message, history=None)`** — new signature,
dispatches by auth_scheme as before, passes both args through.
- **`execute()`** — no longer calls `build_task_text`. Calls
`extract_history(context)` directly and forwards to `_do_inference`.
Removes the `build_task_text` import (not needed in this file anymore).
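A rough sketch of the two history mappings described above, written as free functions rather than the static methods in executor.py. It assumes history arrives as `(role, text)` tuples with roles `human` and `ai` only; handling of any other role is not covered here:

```python
ROLE_TO_OPENAI = {"human": "user", "ai": "assistant"}
ROLE_TO_GEMINI = {"human": "user", "ai": "model"}


def history_to_openai_messages(user_message, history=None):
    """Map (role, text) tuples to Chat Completions messages."""
    messages = [
        {"role": ROLE_TO_OPENAI[role], "content": text}
        for role, text in (history or [])
    ]
    # The current turn is always the final user message.
    messages.append({"role": "user", "content": user_message})
    return messages


def history_to_gemini_contents(user_message, history=None):
    """Map (role, text) tuples to Gemini contents with the parts wrapper."""
    contents = [
        {"role": ROLE_TO_GEMINI[role], "parts": [{"text": text}]}
        for role, text in (history or [])
    ]
    contents.append({"role": "user", "parts": [{"text": user_message}]})
    return contents
```

Passing `history=None` or an empty list yields a single user message in both shapes, which is the single-turn behavior from before Phase 2c.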
## Tests
Existing 7 dispatch tests updated for the new `(user_message, history)`
signature — they assert the path is called with `("hello", None)` since
they pass no history.
5 NEW tests:
- `test_history_to_openai_messages_empty_history` — empty history degrades
to single user message (back-compat)
- `test_history_to_openai_messages_multi_turn` — round-trip of a 3-turn
history + current turn
- `test_history_to_anthropic_messages_same_as_openai` — cross-check that
anthropic path produces identical wire shape for text-only
- `test_history_to_gemini_contents_uses_model_role_and_parts_wrapper` —
verifies the Gemini-specific role mapping (`ai`→`model`) + parts wrapper
- `test_dispatch_passes_history_through` — end-to-end: _do_inference
forwards history to the chosen provider path
All 41 tests pass (15 Phase 2 dispatch + 26 Phase 1 registry):
pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
41 passed in 0.07s
## Back-compat
- No public API changes to `create_executor()`. Callers that hit
`execute()` via A2A get the new multi-turn behavior automatically via
`extract_history(context)`.
- Callers that passed an empty history list (or None) get the same
single-turn behavior as pre-2c.
- The `build_task_text` helper in shared_runtime is unchanged — other
adapters (AutoGen, LangGraph) that use it keep working. Only Hermes
bypasses it now.
## What's NOT in this PR (Phase 2d)
- Tool calling / function calling on native paths (anthropic `tools=`,
gemini `tools=Tool(function_declarations=[...])`)
- Vision content blocks (image_url → anthropic `{type:"image", source:
{type:"base64",...}}` / gemini `{inline_data:{mime_type,data}}`)
- System instructions pass-through (anthropic `system=`, gemini
`system_instruction=`)
- Streaming (`astream_messages` / `streamGenerateContent` stream variants)
- Extended thinking (anthropic `thinking={"type":"enabled"}`) / Gemini
thinking config
Phase 2c is the **multi-turn upgrade**. Tool + vision + streaming are
Phase 2d, scoped in project_hermes_multi_provider.md.
## Related
- #240 Phase 2a (native Anthropic dispatch — in main)
- #255 Phase 2b (native Gemini dispatch — in main)
- Phase 1 (#208 — provider registry baseline, in main)
- `project_hermes_multi_provider.md` queued memory
- CEO 2026-04-15: "focus on supporting hermes agent"
Pre-existing flaky test: when the full workspace-template suite ran in
collection order,
test_hermes_smoke.py::test_create_executor_raises_without_keys failed
with "DID NOT RAISE ValueError". The failure only surfaced when
test_hermes_providers ran first.
Root cause: test_hermes_providers had an autouse fixture that used
monkeypatch.delenv on entry, but several tests in that file mutate
os.environ directly (e.g. `os.environ["HERMES_API_KEY"] = "test"`),
bypassing monkeypatch. monkeypatch only tracks its own deltas, so on
fixture teardown the direct-mutation values stayed in os.environ.
HERMES_API_KEY leaked across file boundaries into test_hermes_smoke,
which then saw a key present when it expected absence.
Fix: replace monkeypatch-based fixture with pure snapshot/restore:
- Snapshot all provider env vars at entry
- Clear them
- yield (test runs, may mutate freely)
- try/finally restore the exact pre-test state
This is deterministic regardless of whether a test uses monkeypatch,
direct mutation, or neither. Also adds a comment documenting WHY we
switched away from monkeypatch so a future reviewer doesn't revert.
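The snapshot/restore steps above can be sketched as a context manager. The name `isolated_provider_env` and the two-variable list are illustrative assumptions; the real fixture covers all provider env vars and is wired up as an autouse pytest fixture:

```python
import contextlib
import os

# Illustrative subset; the real fixture snapshots every provider env var.
PROVIDER_ENV_VARS = ("HERMES_API_KEY", "OPENROUTER_API_KEY")


@contextlib.contextmanager
def isolated_provider_env(names=PROVIDER_ENV_VARS):
    # Snapshot the exact pre-test state (None means "was not set").
    snapshot = {name: os.environ.get(name) for name in names}
    for name in names:
        os.environ.pop(name, None)
    try:
        yield  # the test runs here and may mutate os.environ freely
    finally:
        # Restore regardless of how the test changed the variables.
        for name, value in snapshot.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
```

Because the restore does not depend on tracking individual mutations, a direct `os.environ[...] = ...` inside the test is undone the same way a monkeypatch change would be.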
Full workspace-template suite: 1169 passed, 9 skipped, 2 xfailed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.
## What's new in this PR
1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
update `base_url` from the OpenAI-compat endpoint
(`/v1beta/openai`) to the bare host
(`https://generativelanguage.googleapis.com`) which the native SDK
uses.
2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
`google.genai.Client().aio.models.generate_content(...)`. Dispatch
table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
Same fail-loud semantics as `_do_anthropic_native` — missing SDK
raises a clear RuntimeError with install instructions.
3. `requirements.txt`: add `google-genai>=1.0.0`.
4. `test_hermes_phase2_dispatch.py`: +3 tests
- `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
validated
- `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
gemini native, not openai-compat or anthropic-native
- `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
on missing `google-genai` package
Plus updated existing dispatch tests to mock `_do_gemini_native`
alongside the other paths so "no cross-calls" assertions stay tight.
All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):
pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
36 passed in 0.07s
## Dispatch table after this PR
auth_scheme="openai" → _do_openai_compat (13 providers)
auth_scheme="anthropic" → _do_anthropic_native (1 provider, Phase 2a)
auth_scheme="gemini" → _do_gemini_native (1 provider, Phase 2b) ← NEW
<unknown> → _do_openai_compat + warning (forward-compat)
## Back-compat
- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
automatically — no caller changes needed
## What's NOT in this PR (future phases)
- Streaming support (`astream_messages` / `streamGenerateContent` stream
variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path
Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.
## Stacking
This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main, GitHub auto-rebases this PR onto the new
main head. If the reviewer prefers a single combined PR, close #240 and land
this one instead — the commits on feat/hermes-phase2-native-sdks are
already included in this branch's history.
## Related
- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory — Phase 2 was the next
item, this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
eco-watch entry that catalogued Hermes's native provider list and
shaped the original Phase 2 scope
Completes the Hermes adapter's native-SDK plan for the provider that gains
the most from leaving OpenAI-compat: Anthropic. OpenAI-compat works fine for
plain text turns on every provider (Phase 1 covered that with one code path
for all 15 providers), but Anthropic's Messages API has first-class tool use,
vision content blocks, and extended thinking that the OpenAI-compat shim
strips or mis-translates.
Rather than ship all native SDK paths in one PR (Anthropic + Gemini + future),
this lands Anthropic only (Phase 2a). Gemini is Phase 2b, shipping after a
production measurement window on Phase 2a.
## Design
Providers now dispatch by `auth_scheme` field. Phase 1 added the field but
every provider used `"openai"`. Phase 2 flips `anthropic` to `"anthropic"`
and wires a second inference path keyed on that:
- `HermesA2AExecutor._do_openai_compat(task_text)` — existing path, handles
14 of 15 providers (Nous Portal, OpenRouter, OpenAI, xAI, Gemini, Qwen,
GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral)
- `HermesA2AExecutor._do_anthropic_native(task_text)` — NEW, uses the
official `anthropic` Python SDK's `AsyncAnthropic().messages.create(...)`
- `HermesA2AExecutor._do_inference(task_text)` — dispatches by
`self.provider_cfg.auth_scheme`
Unknown schemes fall back to OpenAI-compat with a logged warning, so future
provider additions don't crash if a native SDK path ships late.
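A simplified sketch of the dispatch-with-fallback behavior. `DispatchSketch` is a hypothetical stand-in for `HermesA2AExecutor`; the handlers here are synchronous and return tagged strings instead of calling any SDK:

```python
import logging

logger = logging.getLogger("hermes.dispatch_sketch")


class DispatchSketch:
    """Hypothetical stand-in for the executor's dispatch logic."""

    def __init__(self, auth_scheme):
        self.auth_scheme = auth_scheme

    def _do_openai_compat(self, task_text):
        return f"openai-compat:{task_text}"

    def _do_anthropic_native(self, task_text):
        return f"anthropic-native:{task_text}"

    def _do_inference(self, task_text):
        table = {
            "openai": self._do_openai_compat,
            "anthropic": self._do_anthropic_native,
        }
        handler = table.get(self.auth_scheme)
        if handler is None:
            # Unknown scheme: warn and fall back rather than crash.
            logger.warning(
                "Unknown auth_scheme %r; falling back to OpenAI-compat",
                self.auth_scheme,
            )
            handler = self._do_openai_compat
        return handler(task_text)
```

Adding a new native path under this shape means registering one more entry in the table; schemes without an entry keep working through the compat path.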
## Fail-loud on missing SDK
`_do_anthropic_native` raises a clear `RuntimeError` with install
instructions if the `anthropic` package is missing at runtime:
Hermes anthropic native path requires the `anthropic` package. Install
in the workspace image with `pip install anthropic>=0.39.0` or set
HERMES provider=openrouter to route Claude models through OpenRouter's
OpenAI-compat shim instead.
This is intentional: silent fallback would mask fidelity loss (tool_use
blocks become plain text, vision gets stripped). Loud failure is better.
`requirements.txt` adds `anthropic>=0.39.0` so the package is baked into
the workspace-template image build path. Operators building custom workspace
images without anthropic installed get the loud error.
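The fail-loud import can be sketched with a small helper. `require_sdk` is a hypothetical name, and the message text is abbreviated relative to the real error quoted above:

```python
import importlib


def require_sdk(module_name, install_hint):
    """Import an optional SDK or raise a RuntimeError with install guidance."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail loudly instead of silently falling back to a lower-fidelity path.
        raise RuntimeError(
            f"Hermes native path requires the `{module_name}` package. {install_hint}"
        ) from exc
```

Calling it at the top of the native path (for example with `"anthropic"` and a pip hint) keeps the import lazy, so workspaces that never select that provider are unaffected by a missing package.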
## Back-compat
- `create_executor(hermes_api_key="x")` → still routes to Nous Portal
(`auth_scheme="openai"`), unchanged
- `HERMES_API_KEY` env var → still first in RESOLUTION_ORDER
- `OPENROUTER_API_KEY` env var → still second
- All 14 OpenAI-compat providers unchanged — they take the same code path
as before
- ONLY `anthropic` provider changes behavior: it now uses the native
Messages API instead of the `/v1/chat/completions` compat shim
## Constructor signature change
`HermesA2AExecutor.__init__` now takes `provider_cfg: ProviderConfig`
instead of separate `api_key + base_url + model`. The three fields are
derived from `provider_cfg` + an optional model override. This is a
breaking change for any external caller building an executor directly,
but the only documented public entry point is `create_executor()`, which
is updated in the same commit to pass the cfg through.
## Test coverage
`workspace-template/tests/test_hermes_phase2_dispatch.py` — 7 new tests:
1. `test_anthropic_entry_has_anthropic_scheme` — registry flip
2. `test_all_other_providers_still_openai_scheme` — regression guard
3. `test_dispatch_openai_scheme_calls_openai_compat` — happy path
4. `test_dispatch_anthropic_scheme_calls_anthropic_native` — happy path
5. `test_dispatch_unknown_scheme_falls_back_to_openai_compat` — forward compat
6. `test_anthropic_native_raises_clear_error_when_sdk_missing` — fail-loud
7. `test_create_executor_passes_provider_cfg` — constructor wiring
All pass locally (pytest tests/test_hermes_phase2_dispatch.py -v, 0.04s).
Phase 1 tests unchanged: `test_hermes_providers.py` 26/26 pass, no
regressions.
## What's NOT in this PR (Phase 2b)
- Gemini native `generateContent` path (`auth_scheme="gemini"`)
- Streaming support across both native paths (`astream_messages`, `streamGenerateContent`)
- Tool calling on the anthropic native path (the `tools` + `tool_use` blocks)
- Vision content blocks (image_url → anthropic image blocks)
- Extended thinking parameter passthrough
All scoped in `project_hermes_multi_provider.md`. Phase 2a is the minimum
viable native Anthropic dispatch — single-turn text in, text out, no tools.
## Related
- Phase 1 baseline (already in main): #208 — provider registry + OpenAI-compat path
- Queued memory: `project_hermes_multi_provider.md` — full phased plan
- Triggering directive: CEO 2026-04-15 — "once current works are cleared,
focus on supporting hermes agent"
Closes #220. #215 added auth_headers() to /registry/register but missed
two other self-post paths from the same workspace container:
1. initial_prompt (_do_send_sync) — fires once on first boot after the
A2A server is ready. Posts to /workspaces/:id/a2a via the platform
proxy. Missing headers meant the initial prompt got silently
dropped as 401 on any token-enrolled workspace.
2. idle loop (_post_sync) — fires every idle_interval_seconds while
the workspace has no active task (#205 pattern). Same proxy path,
same missing headers, same silent 401 in multi-tenant mode.
Both now build headers as
{"Content-Type": "application/json", **auth_headers()}
auth_headers() returns {"Authorization": "Bearer <token>"} when
/auth-token.txt exists, empty dict otherwise (first boot before
register issues the token). The existing lazy-bootstrap fail-open
on the platform side covers the empty-dict case.
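A minimal sketch of that header construction. Here `auth_headers` takes the token path as a parameter purely so the example is self-contained; the real helper lives in `platform_auth` and its signature may differ. Treating an empty file like a missing one is also an assumption:

```python
from pathlib import Path


def auth_headers(token_path):
    """Return a bearer header if a token is on file, else an empty dict."""
    path = Path(token_path)
    if not path.exists():
        return {}
    token = path.read_text().strip()
    if not token:
        # Assumption: an empty token file is treated the same as no file.
        return {}
    return {"Authorization": f"Bearer {token}"}


def build_post_headers(token_path):
    """Merge the content type with whatever auth headers are available."""
    return {"Content-Type": "application/json", **auth_headers(token_path)}
```

On first boot the merge simply yields the content-type header alone, which is the empty-dict case the platform's lazy bootstrap is described as covering.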
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses items 4, 5, 7 from the self-review of the batch merge. PR A
(#228) covered items 1, 2, 3, 6 on the Go side.
## workspace-template/main.py — idle loop hardening
- Replace asyncio.get_event_loop() with asyncio.get_running_loop() —
the former is deprecated in 3.12+ and emits a DeprecationWarning on
every idle fire.
- Replace hardcoded urlopen timeout=600 with IDLE_FIRE_TIMEOUT_SECONDS
clamped to max(60, min(300, idle_interval_seconds)). Long cadence
workspaces no longer hold dangling requests open for 10 minutes; the
cap adapts automatically when the interval is short.
- Type the exception handling: split HTTPError (has .code) from URLError
(connection-level) from the generic catch-all. Log status + error
class separately so operators can grep for specific failure modes
instead of a bare "post failed".
- Fire-and-forget no longer loses exceptions. run_in_executor Future
now has an add_done_callback that logs the outcome, so an exception in
_post_sync surfaces as "Idle loop: post failed — status=None err=..."
instead of Python's default "Task exception was never retrieved"
warning buried in stderr.
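The hardening items above can be sketched together as follows. The helper names and the log format are illustrative assumptions, not the code in main.py:

```python
import asyncio
import logging
import urllib.error

logger = logging.getLogger("idle_loop_sketch")


def idle_fire_timeout(idle_interval_seconds):
    """Clamp the per-fire timeout to the 60..300 second range."""
    return max(60, min(300, idle_interval_seconds))


def classify_post_error(exc):
    """Return (status, error_class) so log lines are greppable."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code, "HTTPError"
    if isinstance(exc, urllib.error.URLError):
        return None, "URLError"
    return None, type(exc).__name__


def log_outcome(future):
    """Done-callback: retrieve the exception so it is logged, not dropped."""
    if future.cancelled():
        return
    exc = future.exception()
    if exc is not None:
        status, kind = classify_post_error(exc)
        logger.error("idle post failed status=%s err=%s", status, kind)


async def fire_and_forget(post_sync):
    """Run the blocking post in the default executor without awaiting it."""
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, post_sync)
    future.add_done_callback(log_outcome)
    return future
```

With this clamp, a 10-second interval gets a 60-second timeout, a 120-second interval keeps 120, and a 600-second interval is capped at 300.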
## org-templates/molecule-dev/org.yaml — discoverability
Added idle_prompt + idle_interval_seconds to the defaults: block with
explanatory comments. Without this, users had to read main.py to
discover the feature.
## docs/runbooks/admin-auth.md — new
Documents the three middleware variants (AdminAuth strict,
CanvasOrBearer soft, WorkspaceAuth per-id), the exact contract of each,
and the three-question test for adding a new route to CanvasOrBearer.
Also flags the session-cookie follow-up as Phase H.
Referenced PRs: #138, #164, #165, #166, #167, #168, #190, #194, #203,
#228.
No code deltas in platform/ beyond the Python + YAML + docs changes.
Full pytest suite unchanged except the pre-existing test_hermes_smoke
flake that fails in full-suite but passes in isolation (test isolation
bug, not introduced by this PR).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Context: when the claude-agent-sdk wraps a stream error from the CLI
subprocess that it can't categorize (rate limit, auth, network), it
raises a bare `Exception("Command failed with exit code 1\nError output:
Check stderr output for details")`. The exception has no `.stderr` or
`.exit_code` attributes, so #66's `_format_process_error` — which reads
those attributes — has nothing to surface. The log line becomes:
SDK agent error [claude-code]: Exception: Command failed with exit
code 1 (exit code: 1)\nError output: Check stderr output for details
That's the placeholder text from the SDK's error path, not the actual
error. Operators chasing a stuck workspace are forced to `docker exec
ws-xxx claude --print` manually to discover the real cause. Observed
today during the rate-limit incident: every PM error line was identical
"Check stderr output for details" while the real cause ("You've hit
your limit · resets Apr 17, 11pm (UTC)") was only visible via manual
reproduction — that cost ~20 minutes of diagnosis time.
## Fix
Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs
`claude --print` with a small probe input, captures stderr+stdout, and
returns the real error string. Bounded by 30s timeout so a hung CLI
can't stall the error path.
Extend `_format_process_error` with ONE narrow fallback: if the
exception has no stderr/exit_code AND its message contains the specific
"Check stderr output for details" marker, call the probe and append
`probed_cli_error=<real error>` to the formatted line.
Critically: the probe only runs in the narrow case where we have
nothing else to log. If `.stderr` or `.exit_code` are present (the
normal ProcessError path from #66), the probe is skipped — no wasted
subprocess, no 30s latency on every error.
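A sketch of the narrow fallback condition, assuming a simplified formatter that takes the probe as an injectable callable. The output layout and the `probe` parameter are illustrative, not the actual `_format_process_error` signature:

```python
SWALLOWED_MARKER = "Check stderr output for details"


def format_process_error(exc, probe=None):
    """Format an exception; probe the CLI only when nothing else is available."""
    parts = [f"{type(exc).__name__}: {exc}"]
    stderr = getattr(exc, "stderr", None)
    exit_code = getattr(exc, "exit_code", None)
    if exit_code is not None:
        parts.append(f"exit_code={exit_code}")
    if stderr:
        parts.append(f"stderr={stderr}")

    # Narrow fallback: no stderr, no exit code, and the SDK placeholder text.
    nothing_to_log = not stderr and exit_code is None
    if probe is not None and nothing_to_log and SWALLOWED_MARKER in str(exc):
        probed = probe()
        if probed:
            parts.append(f'probed_cli_error="{probed}"')
    return " | ".join(parts)
```

In the real code the probe would be the bounded `claude --print` subprocess call; injecting it here keeps the sketch free of subprocess spawns and makes the skip conditions easy to check.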
## Test coverage
`workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests:
- `test_format_process_error_probes_cli_when_stderr_swallowed` — the
happy path: exception matches the marker, probe runs, result appears
in the formatted line. Probe is monkeypatched so no subprocess spawns
in the test.
- `test_format_process_error_does_not_probe_when_stderr_already_present` —
negative: regular ProcessError with `.stderr` set does NOT trigger
the probe (skip the wasted call).
- `test_format_process_error_does_not_probe_without_swallowed_marker` —
negative: unrelated plain exceptions (e.g. RuntimeError) do NOT
trigger the probe (so the common-case error path stays fast).
All 7 `_format_process_error` tests pass locally (4 existing + 3 new):
```
pytest tests/test_claude_sdk_executor.py -k format_process_error
======================= 7 passed in 0.06s ========================
```
## Impact
Next time the SDK swallows a real error (rate limit, auth failure,
network outage), the workspace log will contain the actual error string
alongside the generic placeholder:
SDK agent error [claude-code]: Exception: Command failed with exit
code 1 ... | probed_cli_error="You've hit your limit · resets Apr
17, 11pm (UTC)"
Diagnosis time drops from "docker exec each ws, run claude --print,
read stderr" (~20 min) to "grep probed_cli_error in platform logs"
(~10 seconds).
Closes #160.
The register call was missing headers=auth_headers(), so workspaces that
already have a persisted token (i.e. every restart after the first boot)
were sending an unauthenticated request. The platform's register handler
returns 401 for requests missing a valid bearer token once a token has
been issued, causing re-registration to fail on every restart.
Import auth_headers at the module level (alongside the existing save_token
inline import) and pass it to the httpx POST. auth_headers() returns {}
when no token is on file yet (first boot), so there is no regression for
fresh workspaces — the platform still issues a token on the 200 response
and save_token() persists it for all subsequent restarts.
Closes #215
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #204. PR #198 wired push_sender=PushNotificationSender() into
DefaultRequestHandler to satisfy #175's push-notification capability,
but PushNotificationSender in a2a-sdk is an abstract base class and
cannot be instantiated. Every workspace container crashed on startup
with TypeError.
Reverted to DefaultRequestHandler's defaults. The pushNotifications
capability still appears in AgentCard.capabilities (advertised to A2A
clients) but actual implementation of the sender is deferred to a
Phase-H follow-up that subclasses PushNotificationSender properly.
Existing pytest suite unchanged (the crash was only at runtime on
main.py import, which no existing test exercises directly).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ships the first half of the queued Hermes adapter expansion. PR 2 only
supported Nous Portal + OpenRouter; this adds 13 more providers reachable
via OpenAI-compat endpoints. Native SDK paths for Anthropic + Gemini are
Phase 2 (better tool-calling + vision fidelity).
## What's new
**`workspace-template/adapters/hermes/providers.py`** (new file, 220 LOC):
- ``ProviderConfig`` dataclass: name, env vars, base URL, default model, auth scheme, docs
- ``PROVIDERS`` dict with 15 entries across 4 groups:
- PR 2 baseline: nous_portal, openrouter
- Frontier commercial: openai, anthropic, xai, gemini
- Chinese providers: qwen, glm, kimi, minimax, deepseek
- OSS/alt: groq, together, fireworks, mistral
- ``RESOLUTION_ORDER`` tuple: priority for auto-detect (back-compat first,
then commercial, then Chinese, then OSS/alt)
- ``resolve_provider(explicit=None)`` -> (ProviderConfig, api_key)
- With explicit name: routes to that provider, raises if env var empty
- Without: walks RESOLUTION_ORDER, first env-var-set provider wins
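A cut-down sketch of the resolution logic with two registry entries. The base URLs and model names are placeholders, the `environ` parameter is added only to make the example testable, and raising `ValueError` is an assumption about the error type:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    env_vars: tuple
    base_url: str
    default_model: str
    auth_scheme: str = "openai"


# Placeholder entries; the real registry has 15 providers.
PROVIDERS = {
    "nous_portal": ProviderConfig(
        "nous_portal", ("HERMES_API_KEY",), "https://example.invalid/nous/v1", "placeholder-model"
    ),
    "openrouter": ProviderConfig(
        "openrouter", ("OPENROUTER_API_KEY",), "https://example.invalid/openrouter/v1", "placeholder-model"
    ),
}
RESOLUTION_ORDER = ("nous_portal", "openrouter")


def _first_key(cfg, env):
    """Return the first non-empty API key among the provider's env vars."""
    for var in cfg.env_vars:
        value = env.get(var)
        if value:
            return value
    return None


def resolve_provider(explicit=None, environ=None):
    env = os.environ if environ is None else environ
    if explicit is not None:
        if explicit not in PROVIDERS:
            raise ValueError(f"unknown provider: {explicit}")
        cfg = PROVIDERS[explicit]
        key = _first_key(cfg, env)
        if not key:
            raise ValueError(f"no API key set for provider: {explicit}")
        return cfg, key
    # Auto-detect: the first provider in priority order with a key set wins.
    for name in RESOLUTION_ORDER:
        key = _first_key(PROVIDERS[name], env)
        if key:
            return PROVIDERS[name], key
    raise ValueError("no provider API key found in environment")
```

Keeping the back-compat providers first in `RESOLUTION_ORDER` is what preserves the PR 2 behavior when both keys are set.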
**`workspace-template/adapters/hermes/executor.py`** (refactored):
- `create_executor(hermes_api_key=None, provider=None, model=None)` now has
three parameters:
- `hermes_api_key`: PR 2 back-compat — routes to Nous Portal
- `provider`: canonical short name from the registry (e.g. "anthropic")
- `model`: optional override of the provider's default model
- Delegates all resolution to `providers.resolve_provider()` — no more
hardcoded URLs or env var lookups in the executor itself
- `HermesA2AExecutor.__init__` no longer has Nous-specific defaults; callers
pass base_url + model explicitly (which create_executor always does)
**`workspace-template/tests/test_hermes_providers.py`** (new file, 26 tests):
- Registry shape invariants (count >= 15, no duplicates, every config valid)
- PR 2 back-compat: HERMES_API_KEY / OPENROUTER_API_KEY still route correctly
- Auto-detect for every provider in the registry (parametrized — guards against
typos in env var lists)
- Explicit `provider=` bypass of auto-detect
- Error cases: unknown provider, explicit-but-empty, auto-detect-with-no-env
- All 26 tests pass locally in 0.08s
## Back-compat guarantees
| Scenario | PR 2 behavior | This PR behavior |
|---|---|---|
| `create_executor(hermes_api_key="x")` | Nous Portal | Nous Portal (unchanged) |
| `HERMES_API_KEY=x` env, auto-detect | Nous Portal | Nous Portal (unchanged) |
| `OPENROUTER_API_KEY=x` env, auto-detect | OpenRouter | OpenRouter (unchanged) |
| Both env + explicit hermes_api_key param | Nous Portal (param wins) | Nous Portal (param wins, unchanged) |
Nothing existing can break. New callers gain access to 13 more providers.
## What's NOT in this PR (Phase 2)
- **Native Anthropic Messages API path** — better tool calling, vision, extended
thinking. Requires pulling in `anthropic` SDK. ~50 LOC.
- **Native Gemini generateContent path** — for vision + google tools. Requires
`google-genai` SDK. ~50 LOC.
- **Streaming support across all providers** — current executor is non-streaming
(single chat.completions.create call). Streaming works with openai.AsyncOpenAI
but hasn't been wired to the A2A event queue path. ~30 LOC.
- **Per-provider model overrides in config.yaml** — Phase 1 uses the registry's
default_model. Phase 2 adds a `hermes: { provider: qwen, model: qwen3-coder-plus }`
block in the workspace config.
- **`.env.example` updates** — not critical since the registry itself documents
every env var via the `env_vars` field, but nice-to-have.
## Related
- Queued memory: `project_hermes_multi_provider.md`
- CEO directive 2026-04-15: *"once current works are cleared, I want you to
focus on supporting hermes agent, right now it doesnt take too much providers"*
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch
entry listed "Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …" which
shaped this registry's initial set
## Test plan
- [x] Unit tests: 26/26 pass locally (pytest)
- [ ] CI will run on the self-hosted macOS arm64 runner
- [ ] Smoke test in a real workspace: set QWEN_API_KEY and verify Technical
Researcher actually hits Alibaba DashScope successfully
- [ ] Integration test per provider with real API keys (gated on env, skip
when not set — Phase 2 CI addition)
Today's multi-framework research (Hermes, Letta, Trigger.dev, Inngest, AG2,
Rivet, n8n, Composio, SWE-agent — see docs/ecosystem-watch.md) confirmed
that nobody runs while(true) per agent. The working patterns are:
(a) event-driven + hibernation (Hermes, Letta, Trigger.dev, Inngest)
(b) cron/user-triggered ephemeral runs (AG2, Rivet, n8n, SWE-agent)
Molecule AI is currently 100% in category (b). Observed team utilization:
~0.5% — agents idle 99.5% of the time because cron fires and CEO-typed
A2A are the only initiating signals. The CEO's north star is 24/7 iteration,
and the current cadence falls short.
This PR closes the gap by adding an in-workspace idle loop that wakes the
agent periodically ONLY when it has no active task. The shape is the
Hermes reflection-on-completion pattern combined with the Letta backlog-pull
pattern, collapsed into a ~60 LOC change in the workspace-template. Zero
new Go code. Zero new DB tables. Zero new API endpoints.
## How it works
1. `config.py` gets two new fields on WorkspaceConfig:
- `idle_prompt: str = ""` — the prompt to self-send when idle
- `idle_interval_seconds: int = 600` — how often to check (default 10 min)
Both support inline or file ref (matching the initial_prompt pattern).
2. `main.py` spawns an `_run_idle_loop()` asyncio task alongside the
existing initial_prompt task (same lifecycle hooks — cancelled in the
`finally:` of the server.serve() block).
3. The loop body:
a. Sleep interval
b. Check `heartbeat.active_tasks == 0` LOCALLY (no LLM call, no HTTP)
c. If idle → self-POST the idle_prompt via the existing /workspaces/{id}/a2a proxy
d. Loop
The agent's own concurrency control rejects the post if it becomes busy
between the check and the POST — that's the safety valve.
4. Gated on `config.idle_prompt` being non-empty. Default = "" = no loop.
Existing workspaces upgrade silently as no-ops until someone explicitly
opts in by setting idle_prompt in org.yaml (either defaults: or
per-workspace:).
## Cost analysis (from the research report)
- while(true) pattern: ~$93/day/org (12 agents × 12 thinks/hour × $0.027). Unshippable.
- Hermes reflection-on-completion: ~$0.45/day/org. Cost ∝ useful work.
- This PR's idle loop at 10-min cadence: upper bound 12 × 6/hour × 24h
× ~3k tokens × Sonnet rate ≈ $5/day/org PER ROLE, only if they're
genuinely idle every check. In practice far less because busy periods
skip the LLM call entirely (the active_tasks check is local).
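The while(true) figure and the call volume behind the idle-loop bound can be checked with a few lines of arithmetic. The per-token Sonnet rate is not stated in this description, so the sketch stops at call and token counts for the idle loop instead of guessing a dollar amount.

```python
AGENTS = 12
HOURS_PER_DAY = 24

# while(true): 12 thinks per agent per hour at $0.027 per think (figures from the list above).
while_true_daily = AGENTS * 12 * HOURS_PER_DAY * 0.027

# Idle loop upper bound: one check every 10 minutes is 6 per hour, at ~3k tokens per call.
idle_calls_per_day = AGENTS * 6 * HOURS_PER_DAY
idle_tokens_per_day = idle_calls_per_day * 3_000

print(f"while(true): ${while_true_daily:.2f}/day/org")
print(f"idle loop upper bound: {idle_calls_per_day} calls, {idle_tokens_per_day:,} tokens per day")
```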
## Rollout plan
Research report recommended rolling to ONE workspace first (Technical
Researcher) and measuring 24h of activity_logs before enabling for
all 12. This PR enables the mechanism; it does NOT add any default
idle_prompt to org-templates/molecule-dev/org.yaml. That's a follow-up
PR after this one lands and one workspace has been manually opted in
for measurement.
## Not touched in this PR
- No Go code (no new platform endpoint, no new DB columns)
- No org.yaml changes (zero-impact until someone opts in)
- No scheduler changes (the idle loop is a workspace concern, not a
scheduler concern — matches the research report's layering)
## Test plan
- [x] Python syntax check (ast.parse) on main.py + config.py
- [ ] Unit test: WorkspaceConfig parses idle_prompt / idle_interval_seconds from yaml
- [ ] Integration test: set idle_prompt on Technical Researcher, measure that
an A2A message is received every ~10 min while idle, and NOT received
while busy with a delegation
- [ ] Dogfood: enable on Technical Researcher for 24h, count activity_logs
delta vs baseline, confirm cost stays within model
## Related
- Today's research report (conversation output, summarized in commit trailer)
- docs/ecosystem-watch.md → `### Hermes Agent` (the canonical reflection-on-completion example)
- #159 orchestrator/worker split — complementary: leaders pulse for dispatch,
workers idle-loop for pull. Together: leaders push work, workers pull work,
no role ever sits idle with a cold queue.
#173 — implement cancel() in LangGraphA2AExecutor: emits
TaskStatusUpdateEvent(state=canceled, final=True) so clients see the
state transition rather than silence. Removes pragma: no cover.
Test: test_cancel_emits_canceled_event.
#174 — add stateTransitionHistory=True to AgentCapabilities in main.py
so microsoft/agent-framework clients know they can request full task
history via the A2A protocol.
#175 — wire InMemoryPushNotificationConfigStore and PushNotificationSender
into DefaultRequestHandler so the advertised pushNotifications capability
is backed by a real store. Both classes live in a2a.server.tasks (a2a-sdk
0.3.25); import confirmed by probe.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the proposed monolithic molecule-guardrails plugin with 12
single-purpose plugins users can install à la carte. Powered by a
small extension to the AgentskillsAdaptor base class so any plugin can
ship hooks/, commands/, and a settings-fragment.json without writing a
custom adapter.
## Base adapter changes
workspace-template/plugins_registry/builtins.py + sdk/python/molecule_plugin/builtins.py
(both copies — drift-tested):
- New _install_claude_layer() helper called at the end of install()
- Conditionally copies hooks/ → /configs/.claude/hooks/ (preserving exec bit)
- Conditionally copies commands/*.md → /configs/.claude/commands/
- Conditionally merges settings-fragment.json into /configs/.claude/settings.json
with ${CLAUDE_DIR} placeholder rewritten to the workspace's absolute install
path. Existing user hooks are preserved (deep-merge by event name).
- All steps no-op when the plugin doesn't ship the corresponding files,
so existing skill+rule plugins (molecule-dev, superpowers, ecc,
browser-automation) are unchanged.
Drift test (tests/test_plugins_builtins_drift.py) still passes.
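The settings-fragment merge can be sketched on plain dicts. This is an illustration under assumptions, not the code in builtins.py: the `{"hooks": {event: [entries]}}` shape, the `merge_fragment` name, and the "append unless already present" rule are guesses at what "deep-merge by event name" means, and only the `hooks` key is handled.

```python
import copy
from typing import Any

PLACEHOLDER = "${CLAUDE_DIR}"


def _rewrite_paths(value: Any, claude_dir: str) -> Any:
    """Replace the placeholder in every string, recursing through lists and dicts."""
    if isinstance(value, str):
        return value.replace(PLACEHOLDER, claude_dir)
    if isinstance(value, list):
        return [_rewrite_paths(item, claude_dir) for item in value]
    if isinstance(value, dict):
        return {key: _rewrite_paths(item, claude_dir) for key, item in value.items()}
    return value


def merge_fragment(settings: dict, fragment: dict, claude_dir: str) -> dict:
    """Return a copy of settings with the fragment's hooks merged in per event."""
    merged = copy.deepcopy(settings)
    hooks = merged.setdefault("hooks", {})
    rewritten = _rewrite_paths(fragment, claude_dir)
    for event, entries in rewritten.get("hooks", {}).items():
        bucket = hooks.setdefault(event, [])
        for entry in entries:
            if entry not in bucket:  # keep user hooks, avoid duplicates on reinstall
                bucket.append(entry)
    return merged
```

Rewriting the placeholder after parsing, not on the raw JSON text, avoids producing invalid JSON if the install path ever contains characters that need escaping.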
## 12 new plugins
Hook plugins (ambient enforcement):
- molecule-careful-bash — refuses destructive bash; ships careful-mode skill
- molecule-freeze-scope — locks edits via .claude/freeze
- molecule-audit-trail — appends every Edit/Write to audit.jsonl
- molecule-session-context — auto-loads cron-learnings at session start
- molecule-prompt-watchdog — injects warnings on destructive prompt keywords
Skill plugins (on-demand):
- molecule-skill-code-review — 16-criteria multi-axis review
- molecule-skill-cross-vendor-review — adversarial second-model review
- molecule-skill-llm-judge — deliverable-vs-request scoring
- molecule-skill-update-docs — post-merge doc sync
- molecule-skill-cron-learnings — operational-memory JSONL format
Workflow plugins (slash commands):
- molecule-workflow-triage — /triage full PR-triage cycle
- molecule-workflow-retro — /retro + cron-retro skill, weekly retrospective
Each ships only what it needs — most have just plugin.yaml + skills/ or
hooks/ + adapter (one-line stub: `from plugins_registry.builtins import
AgentskillsAdaptor as Adaptor`). Total ~120 files but each plugin is
small and self-contained.
## Verification
- python3 -m molecule_plugin validate plugins/molecule-* → all 13 valid
(12 new + pre-existing molecule-dev)
- End-to-end install smoke test on representative samples: hook plugin
(molecule-careful-bash), skill-only plugin (molecule-skill-code-review),
workflow plugin (molecule-workflow-triage). All produce expected
/configs/ tree, settings.json paths rewritten, exec bits preserved,
zero warnings.
- workspace-template pytest tests/test_plugins_builtins_drift.py → passes
(SDK + runtime stay in sync).
## CLAUDE.md repo-doc updated
Lists all 12 new plugins under the existing Plugins section, organized
by category (hook / skill / workflow). Each entry is one line, with
recommend-together hints where dependencies make sense.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Docker Desktop (macOS/Windows), host-path bind mounts often appear
root-owned inside the container. The previous entrypoint only chowned
/workspace top-level, so agents (uid 1000) still couldn't write to
/workspace/repo/* — git clone, pip install, and file edits failed with
EACCES and fell back to /tmp. Detect the root-owned-contents case by
sampling the first entry; if it's root-owned, recursively chown the
tree. On normal Linux Docker with matching uids this is a no-op, so the
fast-startup path is preserved for the common case.
Part B of the issue (private-repo initial_prompt clone) was addressed
by PR #20.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
get_peers() was sending no auth headers to /registry/:id/peers — this would
return 401 for every workspace agent after PR #31 (WorkspaceAuth middleware)
deploys, breaking peer discovery entirely.
discover_peer() had X-Workspace-ID but was missing the bearer token, also
required by Phase 30.6 for /registry/discover/:id.
Both functions now send {"X-Workspace-ID": WORKSPACE_ID, **auth_headers()}.
get_workspace_info() was already correct (auth_headers() present since PR #39).
Adds test_request_sends_workspace_id_header to TestGetPeers; hardens the
discover_peer header assertion to use presence-check rather than exact equality.
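A small sketch of the header construction and of the presence-check style of assertion. `auth_headers()` below is a stand-in for `platform_auth.auth_headers()`, and the `Authorization: Bearer` shape and the `WORKSPACE_ID` value are assumptions for illustration.

```python
WORKSPACE_ID = "ws-example"  # placeholder value


def auth_headers() -> dict:
    """Stand-in for platform_auth.auth_headers(); returns {} when no token is saved."""
    token = "example-token"  # hypothetical; the real helper reads the stored token
    return {"Authorization": f"Bearer {token}"} if token else {}


def peer_request_headers() -> dict:
    # Same expression the commit message quotes for get_peers() and discover_peer().
    return {"X-Workspace-ID": WORKSPACE_ID, **auth_headers()}


headers = peer_request_headers()

# Presence check: passes whether or not auth headers are also present.
workspace_header_ok = headers.get("X-Workspace-ID") == WORKSPACE_ID

# Exact equality: breaks as soon as auth_headers() contributes a key.
exact_match = headers == {"X-Workspace-ID": WORKSPACE_ID}
```

The second comparison shows why an exact-equality assertion would start failing once the bearer token is sent.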
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
H3 (compliance.py): GitHub fine-grained PATs use the github_pat_ prefix
with an 82-character alphanumeric+underscore suffix — different from
classic tokens (36 chars). Add the missing pattern to _PII_PATTERNS so
fine-grained PATs are redacted in compliance logs alongside classic tokens.
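A minimal sketch of the two token shapes described above. The pattern names, the `redact` helper, and the `[REDACTED]` marker are illustrative; the real `_PII_PATTERNS` structure in compliance.py is not shown here and may differ.

```python
import re

# Classic token: ghp_ prefix plus 36 alphanumeric characters.
CLASSIC_PAT = re.compile(r"ghp_[A-Za-z0-9]{36}")
# Fine-grained token: github_pat_ prefix plus 82 alphanumeric or underscore characters.
FINE_GRAINED_PAT = re.compile(r"github_pat_[A-Za-z0-9_]{82}")


def redact(text: str) -> str:
    """Replace any GitHub token matched by either pattern with a marker."""
    for pattern in (FINE_GRAINED_PAT, CLASSIC_PAT):
        text = pattern.sub("[REDACTED]", text)
    return text


# Synthetic tokens built by repetition, so no real credential appears in the example.
fake_fine_grained = "github_pat_" + "A1b2_" * 16 + "Zz"
fake_classic = "ghp_" + "a1B2" * 9
```

For example, `redact(f"token={fake_fine_grained} ok")` yields `"token=[REDACTED] ok"`.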
M4 (platform_auth.py): Replace write_text()+chmod() in save_token() with
os.open(O_WRONLY|O_CREAT|O_TRUNC, 0o600) + os.write(). The old approach
had a TOCTOU window where a concurrent reader could access the token file
before chmod restricted permissions. os.open with an explicit mode creates a
new file with 0600 permissions atomically in a single syscall (the mode is
applied only at creation, so a pre-existing file keeps its current mode).
H2 (a2a_client.py): Already fixed in commit 6c78962 (Cycle 5); no-op.
Tests: 1136 passed, 2 skipped (workspace-template pytest suite)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
IMPACT WITHOUT THIS FIX: deploying PR #31 (WorkspaceAuth middleware on
/workspaces/*) without this patch causes EVERY delegation cycle to silently
break — the heartbeat poll returns 401, the self-message A2A POST returns
401, agents never wake up after task completion, and memory consolidation
stops. The entire multi-agent coordination system degrades to single-shot
interactions with no result delivery.
Changes (all using the existing platform_auth.auth_headers() pattern
already used for POST /registry/heartbeat):
heartbeat.py — 5 calls fixed:
- GET /workspaces/:id/delegations (delegation poll)
- GET /workspaces/:id (self workspace info for parent lookup)
- GET /workspaces/{parent_id} (parent workspace name lookup)
- POST /workspaces/:id/a2a (self-message to wake agent on results)
- POST /workspaces/:id/notify (canvas delegation result notification)
Also moved `from platform_auth import auth_headers` from inline (per-call)
to module-level import so _check_delegations() can use it without re-importing.
consolidation.py — 4 calls fixed:
- GET /workspaces/:id/memories (fetch memories for consolidation)
- POST /workspaces/:id/memories (write consolidated summary — agent path)
- DELETE /workspaces/:id/memories/:id (delete original memories post-consolidation)
- POST /workspaces/:id/memories (write consolidated summary — fallback path)
a2a_client.py — 1 call fixed:
- GET /workspaces/:id (get_workspace_info())
⚠️ DEPLOYMENT NOTE: This PR MUST be merged and deployed at the same time as
PR #31 (WorkspaceAuth middleware). Deploying #31 without this fix will
immediately break all delegation result delivery.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both watcher.py (ConfigWatcher) and skill_loader/watcher.py
(SkillsWatcher) used hashlib.md5() for file-integrity change detection.
MD5 is collision-prone: a crafted config file could produce the same
hash as a benign one, silently suppressing the hot-reload callback and
preventing agents from picking up legitimate config changes.
Replace hashlib.md5 → hashlib.sha256 in both _hash_file() methods.
Update docstrings, comments, and the type-annotation comment
(from "rel_path → md5 hex" to "rel_path → sha256 hex").
Test update: test_skills_watcher.py — rename helper _md5 → _sha256,
update the hash-length assertion from 32 (MD5) to 64 (SHA-256), and
rename the test from test_hash_file_returns_md5_for_existing_file to
test_hash_file_returns_sha256_for_existing_file. All 25 watcher tests
pass.
Note: H2 (a2a_client.py timeout=None) was already fixed in Cycle 5
(timeout=httpx.Timeout(connect=30.0, read=300.0, ...)) — confirmed by
code review before opening this PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix A — platform/internal/middleware/wsauth_middleware.go (NEW):
WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on
ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as
secrets.Values: workspaces with no live token are grandfathered through.
Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously.
Fix A — platform/internal/router/router.go:
Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id)
and /a2a remain on root router; all other /workspaces/:id/* sub-routes
moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)).
CORS AllowHeaders updated to include Authorization so browser/agent callers
can send the bearer token cross-origin.
Fix B — workspace-template/heartbeat.py:
_check_delegations(): validate source_id == self.workspace_id before
accepting a delegation result. Attacker-crafted records with a foreign
source_id are silently skipped with a WARNING log (injection attempt).
trigger_msg no longer embeds raw response_preview text; references
delegation_id + status only — removes the prompt-injection vector.
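The two parts of Fix B can be sketched as below. The function names and the record layout are illustrative; only the field names `source_id`, `delegation_id`, `status`, and `response_preview` come from the description above.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("heartbeat")


def accept_delegation_results(records: List[Dict], workspace_id: str) -> List[Dict]:
    """Keep only results whose source_id matches this workspace."""
    accepted = []
    for record in records:
        if record.get("source_id") != workspace_id:
            # A foreign source_id is treated as a possible injection attempt.
            logger.warning(
                "skipping delegation %s: foreign source_id %r",
                record.get("delegation_id"),
                record.get("source_id"),
            )
            continue
        accepted.append(record)
    return accepted


def build_trigger_message(record: Dict) -> str:
    """Reference the delegation by id and status only, never the response text."""
    return f"Delegation {record['delegation_id']} finished with status {record['status']}."
```

Leaving `response_preview` out of the trigger message means text authored by another agent never reaches the prompt through this path.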
Fix C — workspace-template/skill_loader/loader.py:
load_skill_tools(): before exec_module(), verify script is within
scripts_dir (path traversal guard) and temporarily scrub sensitive env
vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY,
WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore
in finally block. Defence-in-depth even if /plugins auth gate is bypassed.
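Both guards in Fix C can be sketched with the standard library. The helper names are hypothetical and the real load_skill_tools() is not reproduced; the env var list is the one given above.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Sequence

SENSITIVE_ENV_VARS = (
    "CLAUDE_CODE_OAUTH_TOKEN",
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "WORKSPACE_AUTH_TOKEN",
    "GITHUB_TOKEN",
    "GH_TOKEN",
)


def is_within(script: Path, scripts_dir: Path) -> bool:
    """True if script resolves to a location inside scripts_dir (blocks ../ escapes)."""
    try:
        script.resolve().relative_to(scripts_dir.resolve())
        return True
    except ValueError:
        return False


@contextmanager
def scrubbed_env(names: Sequence[str] = SENSITIVE_ENV_VARS) -> Iterator[None]:
    """Temporarily remove the named variables from os.environ, restoring them on exit."""
    saved = {name: os.environ.pop(name) for name in names if name in os.environ}
    try:
        yield
    finally:
        os.environ.update(saved)
```

A caller would check `is_within(script, scripts_dir)` first and then run the module import inside `with scrubbed_env():`, so the skill's top-level code cannot read the scrubbed variables from the environment.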
Fix D — platform/internal/handlers/socket.go:
HandleConnect(): agent connections (X-Workspace-ID present) validated via
wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade.
Canvas clients (no X-Workspace-ID) remain unauthenticated.
Fix D — workspace-template/events.py:
PlatformEventSubscriber._connect(): include platform_auth bearer token in
WebSocket upgrade headers alongside X-Workspace-ID.
Fix E — workspace-template/executor_helpers.py:
recall_memories() and commit_memory() now pass platform_auth bearer token
in Authorization header so WorkspaceAuth middleware allows access.
Fix F — workspace-template/a2a_client.py:
send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300,
write=30, pool=30). Resolves H2 flagged across 5 consecutive audits.
Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to
assert new source_id validation behaviour and allow Authorization header).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>