Commit Graph

42 Commits

Author SHA1 Message Date
Hongming Wang
c7477047c2 Merge pull request #338 from Molecule-AI/fix/issue-328-transcript-fail-closed
fix(security): /transcript fails closed when auth token missing (#328)
2026-04-15 21:30:56 -07:00
Hongming Wang
0e46afa4b9 fix(security): hitl task-id ownership + wire fail_open_if_no_scanner in loader (closes #265, #268)
Security audit cycle 13: hitl.py LGTM (workspace-scoped task IDs). Loader.py fix applied (commit 0557f73): fail_open_if_no_scanner now read from config and forwarded to scan_skill_dependencies(); regression test added. CI 5/6 pass (E2E cancel = run-supersession pattern). Closes #265. Closes #268.
2026-04-15 21:18:52 -07:00
Hongming Wang
e1cdb5c9c6 fix(security): /transcript endpoint fails closed when auth token missing (#328)
Severity HIGH. The /transcript route in main.py used `if expected:`
around the bearer-token compare, so `get_token()` returning None (no
/configs/.auth_token on disk — bootstrap window, deleted file, OSError)
silently skipped the entire auth check. Any container on
molecule-monorepo-net could GET /transcript during the provisioning
window and walk away with the full session log (user messages, Claude
tool calls, assistant replies).

The platform's TranscriptHandler always has a valid token (it acquired
one at workspace registration), so tightening this gate has no
legitimate-caller impact. Only unauthenticated sniffers lose access,
which was never the intended contract of #287.

Fix:

1. Extracted the auth gate into `workspace-template/transcript_auth.py`
   — a 20-line module with no heavy imports so the security-critical
   code is unit-testable without standing up the full uvicorn/a2a/httpx
   stack (the former inline guard could only be tested end-to-end,
   which explains why the regression shipped in #287).

2. `transcript_authorized(expected, auth_header)` returns False when
   `expected` is None or empty — the #328 fix — and otherwise does
   strict equality against "Bearer <expected>".

3. main.py's inline handler calls the extracted function:
     if not _transcript_authorized(get_token(), auth_header):
         return 401

4. New tests/test_transcript_auth.py covers: None token, empty token,
   valid bearer, wrong bearer, missing header, case-sensitive prefix,
   whitespace fuzzing. All 7 pass.
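A minimal sketch of the extracted guard, assuming only the contract described in points 1 and 2 (the function body here is hypothetical; the fail-closed behavior is what the commit specifies):

```python
def transcript_authorized(expected, auth_header):
    """Fail closed: a missing or empty token on disk means no access."""
    if not expected:  # None or "" (bootstrap window, deleted file, OSError)
        return False
    # Strict equality against "Bearer <expected>"; case-sensitive prefix,
    # and a None/missing Authorization header also compares unequal.
    return auth_header == f"Bearer {expected}"
```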

Closes #328

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:17:37 -07:00
Hongming Wang
5451164cba fix(security): add bearer token auth to /transcript endpoint (#287)
Closes #287

Any container on molecule-monorepo-net could previously read the full Claude session log without authentication. Guard uses get_token() from platform_auth — skipped only before workspace registration (dev-mode).
2026-04-15 19:47:23 -07:00
Hongming Wang
84d5e395d4 fix(a2a-tools): auth_headers on recall_memory + commit_memory (#304)
Adds auth_headers to recall_memory and commit_memory in a2a_tools.py. Fixes the #215-class auth regression for A2A memory tools. Test mocks updated to accept headers kwarg.
2026-04-15 19:12:18 -07:00
Hongming Wang
72d30c0b14 Merge pull request #270 from Molecule-AI/feat/workspace-transcript-endpoint
feat: GET /workspaces/:id/transcript — live agent session log
2026-04-15 17:55:41 -07:00
Hongming Wang
6898391dd0 fix(tests): update memory fakes for auth_headers kwarg + activity overwrite
The #215-class fix in memory.py (859a60e) adds headers=_headers to the
direct-httpx commit_memory + search_memory paths, but 9 existing tests
in test_memory.py had FakeAsyncClient.post/get signatures like
`async def post(self, url, json):` with no headers kwarg. Python
raised TypeError: unexpected keyword argument 'headers' on every call,
commit_memory caught it and returned {success: False}, tests failed.

Fixes applied:

1. Add `headers=None` to every FakeAsyncClient.post + .get signature
   across test_memory.py. Uses replace_all so all 9+ fakes match.

2. For tests that capture a single captured["url"]:
   - test_commit_memory_uses_awareness_client_when_configured
   - test_commit_memory_uses_platform_fallback_without_awareness
   - test_commit_memory_httpx_201_success
   filter to only capture /memories URLs. Without the filter, the
   subsequent _record_memory_activity fire-and-forget post to /activity
   overwrites captured["url"] and the assertion fails.

3. For test_commit_memory_promoted_packet_logs_skill_promotion: bump
   expected captured["calls"] from 3 to 4. Pre-fix, the memory_write
   /activity call (from _record_memory_activity #125) was silently
   dropped because the fake rejected headers=; post-fix it succeeds
   and lands in the captured list alongside the skill_promotion
   /activity and /registry/heartbeat calls. Also extend that test's
   fake to accept /registry/heartbeat (was raising AssertionError).
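Items 1 and 2 can be sketched like this (class and URL names are illustrative stand-ins, not the repo's actual fixtures):

```python
import asyncio

class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

class FakeAsyncClient:
    """Hypothetical reconstruction of the test double: headers=None is added
    to the signature so production's headers kwarg is accepted, and URL
    capture is filtered to /memories so the fire-and-forget /activity post
    can't overwrite the captured value."""
    def __init__(self):
        self.captured = {}

    async def post(self, url, json=None, headers=None):
        if "/memories" in url:
            self.captured["url"] = url
        return FakeResponse(201)

async def demo():
    fake = FakeAsyncClient()
    await fake.post("http://platform/workspaces/ws1/memories",
                    json={}, headers={"Authorization": "Bearer t"})
    await fake.post("http://platform/workspaces/ws1/activity", json={})
    return fake.captured["url"]
```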

Total: 36/36 memory tests pass. Full workspace-template suite 1189/1189.

This is strictly test-infrastructure work — zero production code
changed. CI never caught the break because the Mac mini runner has
been stuck for ~4 hours (tick-33/34/35/36 reports).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:29:15 -07:00
rabbitblood
9923159f2e fix(memory-tools): #215-class — auth_headers on commit_memory + search_memory HTTP fallback
Context: platform now gates `GET /workspaces/:id/memories` and
`POST /workspaces/:id/memories` behind workspace auth (post-#166 /
#167 AdminAuth wave). The `builtin_tools.memory` tool had three HTTP
call sites:

  1. commit_memory POST fallback (line 121)        ← NO auth_headers
  2. search_memory GET fallback (line 269)         ← NO auth_headers
  3. activity-log helper POST (line 371)           ← HAS auth_headers

Path 3 was already fixed. Paths 1 + 2 silently 401 on every call, but the
tool's error-handling path returns `{"success": False}` without surfacing
the auth failure to the agent. Result: the agent sees an empty memory
backlog on every call and assumes there's nothing to do.

## Discovered today

Technical Researcher is the first workspace opted in to the idle-loop
pilot from #216 (reflection-on-completion pattern). The pilot fires
every 10 min, the agent calls `search_memory "research-backlog:..."` as
the first step, gets back an empty result, writes "tr-idle clean" to
memory, and stops. Clean-idle outcome every tick, 9 consecutive ticks.

Looking at TR's activity_logs response bodies:

    "Memory auth has failed on every tick this session — skipping the call"
    "tr-idle — step 2 done. Memory unavailable (auth token missing..."
    "tr-idle 04:15 — clean (memory auth still down, 3rd consecutive tick)"

The AGENT knew the memory calls were failing. The platform 401 error
was surfacing in the tool response, but our instrumentation wasn't
counting it as a defect — we saw "tr-idle clean" writes and assumed
the pilot was working as designed. It was actually silently broken.

## Fix

Import `platform_auth.auth_headers` lazily (same pattern as the
activity-log path already uses), attach `headers=_auth()` to both
httpx call sites. Matches the #225 fix for the register call.

## Not in this PR

- awareness_client.py also makes HTTP calls to a separate AWARENESS_URL
  service (not the platform), which may or may not need the same fix
  depending on that service's auth posture. Out of scope for this PR.

- TR's specific token problem: TR's `/configs/.auth_token` file is
  empty because it was re-provisioned via `apply_template: true`
  (recovery path from the failed-volume incident) and Phase 30.1
  only mints a token on FIRST register per workspace. This fix
  doesn't help TR until TR gets a fresh token — tracked separately.

## Test plan

- [x] Python syntax check on memory.py passes
- [ ] CI: all memory-related tests should still pass (the new code
      paths only add header passing, no shape change)
- [ ] Real-world verification: after TR gets a fresh token, idle-loop
      pilot should produce a dispatch within 10 min (seeded backlog
      already in place from this session)

## Related
- #215 / #225 — register call auth_headers fix (same pattern)
- #216 — TR idle-loop pilot (couldn't measure until this lands)
- #166 / #167 — platform AdminAuth wave that surfaced this gap
2026-04-15 17:26:26 -07:00
rabbitblood
88ce2a18cd feat(hermes): Phase 2d-i — system-prompt.md injection on all 3 dispatch paths
The Hermes adapter never read /configs/system-prompt.md. Any role that
switched to runtime: hermes was silently losing its role identity because
the system prompt wasn't passed to the model. This PR fixes that by:

1. HermesA2AExecutor.__init__ takes new optional `config_path` kwarg
2. `create_executor(config_path=...)` forwards to the constructor
3. `adapter.py` passes `config.config_path` through from AdapterConfig
4. `execute()` reads system-prompt.md via executor_helpers.get_system_prompt
   (hot-reload-capable — reads on every turn, not just at startup)
5. `_do_inference(user_message, history, system_prompt)` — new arg threads
   through the dispatch to each native path
6. Each path uses the provider's NATIVE system field:
   - OpenAI-compat: prepends `{"role":"system", "content":...}` to messages
   - Anthropic: top-level `system=` kwarg (NOT in messages — Anthropic
     requires system at the top level)
   - Gemini: `config=GenerateContentConfig(system_instruction=...)`
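The three native shapes in item 6 can be illustrated side by side (a hedged sketch: field names follow the public provider APIs, but the dispatcher itself is hypothetical and plain dicts stand in for the SDK config objects):

```python
def build_request_kwargs(scheme, system_prompt, messages):
    """Illustrative only: where each native path carries the system prompt."""
    if scheme == "openai":
        # OpenAI-compat: the system prompt is just a leading message
        return {"messages": [{"role": "system", "content": system_prompt}]
                + messages}
    if scheme == "anthropic":
        # Anthropic: top-level system= kwarg, never inside messages
        return {"system": system_prompt, "messages": messages}
    if scheme == "gemini":
        # Gemini: system_instruction inside the generation config
        return {"config": {"system_instruction": system_prompt},
                "contents": messages}
    raise ValueError(f"unknown scheme: {scheme}")
```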

## Phase scoreboard
- 2a (in main) — native Anthropic dispatch infra
- 2b (in main) — native Gemini dispatch
- 2c (in main) — multi-turn history on all paths
- **2d-i (this PR)** — system prompts on all paths
- 2d-ii (future) — tool calling on native paths
- 2d-iii (future) — vision content blocks on native paths
- 2d-iv (future) — streaming

## Test coverage

46/46 tests pass (20 Phase 2 dispatch + 26 Phase 1 registry):

- Existing dispatch tests updated to assert the 3-arg call shape
  `("hello", None, None)` — history + system_prompt both None
- 5 new tests:
  - `dispatch_passes_system_prompt_to_anthropic` — happy path, third arg flows
  - `dispatch_passes_system_prompt_to_gemini` — happy path
  - `dispatch_passes_system_prompt_to_openai` — happy path
  - `executor_accepts_config_path_kwarg` — constructor stores config_path
  - `create_executor_forwards_config_path` — both back-compat and registry
    resolution paths forward config_path through to the executor

## Back-compat

- `config_path=None` (default) → execute() skips system-prompt injection,
  same behavior as pre-2d-i
- Workspaces with `runtime: hermes` but no `/configs/system-prompt.md`
  file get `system_prompt=None` (get_system_prompt returns fallback),
  same as before
- The 13 OpenAI-compat providers work identically — system_prompt just
  adds a leading message, which every OpenAI-compat endpoint already
  supports
- Anthropic + Gemini previously got zero system context; now they get
  the same system prompt the workspace's system-prompt.md carries

## Why this matters

Before this PR: if someone flipped a workspace from `runtime: claude-code`
to `runtime: hermes`, the agent would act generically (no role identity,
no project conventions, no CLAUDE.md context) because the Hermes executor
never looked at system-prompt.md. That's a silent correctness regression
the test suite wouldn't catch because none of our live workspaces use
the hermes runtime today.

With this PR: Hermes workspaces get the same system prompt injection as
Claude-code workspaces, making the `runtime: hermes` switch a true drop-in
alternative.

## Related
- #267 Phase 2c (multi-turn history — in main)
- #255 Phase 2b (gemini native — in main)
- #240 Phase 2a (anthropic native — in main)
- #208 Phase 1 (provider registry — in main)
- project_hermes_multi_provider.md — Phase 2d-i was the next queued item
2026-04-15 16:21:47 -07:00
airenostars
853734aa4e feat: GET /workspaces/:id/transcript — live agent session log
Closes #N (issue to be filed)

Lets canvas / operators see live tool calls + AI thinking instead of
waiting for the high-level activity log to flush. Right now the only
way to "look over an agent's shoulder" is `docker exec ws-XXX cat
/home/agent/.claude/projects/.../<session>.jsonl`, which:
  - doesn't work for remote workspaces (Phase 30 / Fly Machines)
  - requires shell access on the host
  - has no pagination

This PR adds:

1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning
   `{runtime, supported, lines, cursor, more, source}`. Default returns
   `supported: false` so non-claude-code runtimes pass through gracefully.

2. `ClaudeCodeAdapter.transcript_lines` override — reads the most-
   recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves
   cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the
   project dir name matches what Claude Code actually writes to. Limit
   capped at 1000 to prevent OOM.

3. Workspace HTTP route `GET /transcript` — Starlette handler added
   alongside the A2A app. Trusts the internal Docker network (same
   model as POST / for A2A); Phase 30 remote-workspace auth is a
   follow-up.

4. Platform proxy `GET /workspaces/:id/transcript` — looks up the
   workspace's URL, forwards GET, caps response at 1MB. Gated by
   existing `WorkspaceAuth` middleware (same as /traces, /memories,
   /delegations).

Tests: 6 Python unit tests cover empty dir / pagination / multi-session
/ malformed lines / limit cap, plus 4 Go tests cover 404 / proxy
forwarding / query-string propagation / unreachable-workspace 502.

Verified end-to-end on a live workspace — returns real claude-code
session entries through the platform proxy.

## Follow-ups
- WebSocket variant for live streaming (instead of polling)
- Canvas UI tab "Transcript" between Activity and Traces
- LangGraph / DeepAgents / OpenClaw transcript adapters
- Phase 30 remote-workspace auth on /transcript
2026-04-15 14:29:43 -07:00
rabbitblood
d40a9d940c feat(hermes): Phase 2c — multi-turn history passed natively to all paths
Completes the Phase 2 scope by keeping conversation turns as turns across
all three dispatch paths. Pre-2c, history was flattened into a single user
message via shared_runtime.build_task_text, which worked as a fallback but
lost the model's native multi-turn awareness (role attribution,
instruction-following on mid-conversation corrections, system-prompt
grounding against prior turns).

Phase 2a + 2b shipped the dispatch infrastructure + per-provider native
paths. This PR uses them properly.

## What's new

- **`_history_to_openai_messages(user_message, history)`** (static) — maps
  A2A `(role, text)` tuples to OpenAI Chat Completions
  `[{"role":"user"|"assistant","content":str}]`. Roles: `human`→`user`,
  `ai`→`assistant`. Current turn appended as the final user message.

- **`_history_to_anthropic_messages`** (static) — identical wire shape to
  OpenAI for text-only turns, so it delegates. Phase 2d tool_use/vision
  blocks will diverge here.

- **`_history_to_gemini_contents`** (static) — Gemini uses a different
  shape: `role="user"|"model"` (NOT "assistant") and text wrapped in
  `parts=[{"text":...}]`. Delegates to none of the others.

- **`_do_openai_compat(user_message, history=None)`** — accepts history,
  builds messages via `_history_to_openai_messages`. Back-compat: pass
  `history=None` to get the old single-turn behavior.

- **`_do_anthropic_native(user_message, history=None)`** — same signature
  change, calls `_history_to_anthropic_messages`. Still uses
  `anthropic.AsyncAnthropic().messages.create()`, just with proper
  multi-turn.

- **`_do_gemini_native(user_message, history=None)`** — same pattern,
  calls `_history_to_gemini_contents`, passes to Gemini's
  `generate_content(contents=...)`.

- **`_do_inference(user_message, history=None)`** — new signature,
  dispatches by auth_scheme as before, passes both args through.

- **`execute()`** — no longer calls `build_task_text`. Calls
  `extract_history(context)` directly and forwards to `_do_inference`.
  Removes the `build_task_text` import (not needed in this file anymore).
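The role mappings above can be sketched as follows (helper names simplified from the underscored originals; only the text-only mapping described in this commit is shown):

```python
def history_to_openai_messages(user_message, history=None):
    """A2A (role, text) tuples -> OpenAI Chat Completions messages.
    Roles: human -> user, ai -> assistant; current turn appended last."""
    role_map = {"human": "user", "ai": "assistant"}
    messages = [{"role": role_map[r], "content": t}
                for r, t in (history or [])]
    messages.append({"role": "user", "content": user_message})
    return messages

def history_to_gemini_contents(user_message, history=None):
    """Gemini shape: role is "user"/"model" (NOT "assistant"), and text is
    wrapped in parts=[{"text": ...}]."""
    role_map = {"human": "user", "ai": "model"}
    contents = [{"role": role_map[r], "parts": [{"text": t}]}
                for r, t in (history or [])]
    contents.append({"role": "user", "parts": [{"text": user_message}]})
    return contents
```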

## Tests

Existing 7 dispatch tests updated for the new `(user_message, history)`
signature — they assert the path is called with `("hello", None)` since
they pass no history.

5 NEW tests:

- `test_history_to_openai_messages_empty_history` — empty history degrades
  to single user message (back-compat)
- `test_history_to_openai_messages_multi_turn` — round-trip of a 3-turn
  history + current turn
- `test_history_to_anthropic_messages_same_as_openai` — cross-check that
  anthropic path produces identical wire shape for text-only
- `test_history_to_gemini_contents_uses_model_role_and_parts_wrapper` —
  verifies the Gemini-specific role mapping (`ai`→`model`) + parts wrapper
- `test_dispatch_passes_history_through` — end-to-end: _do_inference
  forwards history to the chosen provider path

All 41 tests pass (15 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    41 passed in 0.07s

## Back-compat

- No public API changes to `create_executor()`. Callers that hit
  `execute()` via A2A get the new multi-turn behavior automatically via
  `extract_history(context)`.
- Callers that passed an empty history list (or None) get the same
  single-turn behavior as pre-2c.
- The `build_task_text` helper in shared_runtime is unchanged — other
  adapters (AutoGen, LangGraph) that use it keep working. Only Hermes
  bypasses it now.

## What's NOT in this PR (Phase 2d)

- Tool calling / function calling on native paths (anthropic `tools=`,
  gemini `tools=Tool(function_declarations=[...])`)
- Vision content blocks (image_url → anthropic `{type:"image", source:
  {type:"base64",...}}` / gemini `{inline_data:{mime_type,data}}`)
- System instructions pass-through (anthropic `system=`, gemini
  `system_instruction=`)
- Streaming (`astream_messages` / `streamGenerateContent` stream variants)
- Extended thinking (anthropic `thinking={"type":"enabled"}`) / Gemini
  thinking config

Phase 2c is the **multi-turn upgrade**. Tool + vision + streaming are
Phase 2d, scoped in project_hermes_multi_provider.md.

## Related

- #240 Phase 2a (native Anthropic dispatch — in main)
- #255 Phase 2b (native Gemini dispatch — in main)
- Phase 1 (#208 — provider registry baseline, in main)
- `project_hermes_multi_provider.md` queued memory
- CEO 2026-04-15: "focus on supporting hermes agent"
2026-04-15 14:21:10 -07:00
Hongming Wang
825b8a227f Merge pull request #255 from Molecule-AI/feat/hermes-phase2b-gemini-native
feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path
2026-04-15 14:01:00 -07:00
Hongming Wang
353dc306e9 Merge pull request #240 from Molecule-AI/feat/hermes-phase2-native-sdks
feat(hermes): Phase 2a — native Anthropic Messages API dispatch (auth_scheme='anthropic')
2026-04-15 14:00:51 -07:00
Hongming Wang
66120e6c37 fix(tests): hermes provider env-var leak broke test_hermes_smoke
Pre-existing flaky test: when the full workspace-template suite ran in
collection order, test_hermes_smoke.py::test_create_executor_raises_without_keys
failed with "DID NOT RAISE ValueError". Failure only

surfaced when test_hermes_providers ran first.

Root cause: test_hermes_providers had an autouse fixture that used
monkeypatch.delenv on entry, but several tests in that file mutate
os.environ directly (e.g. `os.environ["HERMES_API_KEY"] = "test"`),
bypassing monkeypatch. monkeypatch only tracks its own deltas, so on
fixture teardown the direct-mutation values stayed in os.environ.
HERMES_API_KEY leaked across file boundaries into test_hermes_smoke,
which then saw a key present when it expected absence.

Fix: replace monkeypatch-based fixture with pure snapshot/restore:
- Snapshot all provider env vars at entry
- Clear them
- yield (test runs, may mutate freely)
- try/finally restore the exact pre-test state

This is deterministic regardless of whether a test uses monkeypatch,
direct mutation, or neither. Also adds a comment documenting WHY we
switched away from monkeypatch so a future reviewer doesn't revert.
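The snapshot/restore pattern, shown here as a plain context manager rather than the pytest fixture the commit describes (the env-var list is an illustrative subset):

```python
import os
from contextlib import contextmanager

PROVIDER_ENV_VARS = ["HERMES_API_KEY", "OPENROUTER_API_KEY", "GEMINI_API_KEY"]

@contextmanager
def clean_provider_env():
    """Snapshot all provider env vars, clear them, and restore the exact
    pre-test state on exit. Unlike monkeypatch, this survives tests that
    mutate os.environ directly."""
    snapshot = {k: os.environ.get(k) for k in PROVIDER_ENV_VARS}
    for k in PROVIDER_ENV_VARS:
        os.environ.pop(k, None)
    try:
        yield
    finally:
        for k, v in snapshot.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v
```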

Full workspace-template suite: 1169 passed, 9 skipped, 2 xfailed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:59:48 -07:00
rabbitblood
485dcb4cae feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.

## What's new in this PR

1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
   update `base_url` from the OpenAI-compat endpoint
   (`/v1beta/openai`) to the bare host
   (`https://generativelanguage.googleapis.com`) which the native SDK
   uses.

2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
   `google.genai.Client().aio.models.generate_content(...)`. Dispatch
   table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
   Same fail-loud semantics as `_do_anthropic_native` — missing SDK
   raises a clear RuntimeError with install instructions.

3. `requirements.txt`: add `google-genai>=1.0.0`.

4. `test_hermes_phase2_dispatch.py`: +3 tests
   - `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
     validated
   - `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
     gemini native, not openai-compat or anthropic-native
   - `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
     on missing `google-genai` package
   Plus updated existing dispatch tests to mock `_do_gemini_native`
   alongside the other paths so "no cross-calls" assertions stay tight.

All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    36 passed in 0.07s

## Dispatch table after this PR

    auth_scheme="openai"     → _do_openai_compat (13 providers)
    auth_scheme="anthropic"  → _do_anthropic_native (1 provider, Phase 2a)
    auth_scheme="gemini"     → _do_gemini_native (1 provider, Phase 2b) ← NEW
    <unknown>                → _do_openai_compat + warning (forward-compat)

## Back-compat

- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
  instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
  automatically — no caller changes needed

## What's NOT in this PR (future phases)

- Streaming support (`astream_messages` / `streamGenerateContent` stream
  variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
  gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path

Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.

## Stacking

This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main, GitHub auto-rebases this PR onto the new
main head. If reviewer prefers a single combined PR, close #240 and land
this one instead — the commits on feat/hermes-phase2-native-sdks are
already included in this branch's history.

## Related

- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory — Phase 2 was the next
  item, this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
  eco-watch entry that catalogued Hermes's native provider list and
  shaped the original Phase 2 scope
2026-04-15 13:20:39 -07:00
rabbitblood
3985d80220 feat(hermes): Phase 2a — native Anthropic Messages API dispatch path
Completes the Hermes adapter's native-SDK plan for the provider that gains
the most from leaving OpenAI-compat: Anthropic. OpenAI-compat works fine for
plain text turns on every provider (Phase 1 covered that with one code path
for all 15 providers), but Anthropic's Messages API has first-class tool use,
vision content blocks, and extended thinking that the OpenAI-compat shim
strips or mis-translates.

Rather than ship all native SDK paths in one PR (Anthropic + Gemini + future),
this lands Anthropic only (Phase 2a). Gemini is Phase 2b, shipping after a
production measurement window on Phase 2a.

## Design

Providers now dispatch by `auth_scheme` field. Phase 1 added the field but
every provider used `"openai"`. Phase 2 flips `anthropic` to `"anthropic"`
and wires a second inference path keyed on that:

- `HermesA2AExecutor._do_openai_compat(task_text)` — existing path, handles
  14 of 15 providers (Nous Portal, OpenRouter, OpenAI, xAI, Gemini, Qwen,
  GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral)
- `HermesA2AExecutor._do_anthropic_native(task_text)` — NEW, uses the
  official `anthropic` Python SDK's `AsyncAnthropic().messages.create(...)`
- `HermesA2AExecutor._do_inference(task_text)` — dispatches by
  `self.provider_cfg.auth_scheme`

Unknown schemes fall back to OpenAI-compat with a logged warning, so future
provider additions don't crash if a native SDK path ships late.
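The dispatch-plus-fallback logic can be sketched as a table-driven stand-in for `_do_inference` (names and the `paths` mapping are hypothetical):

```python
import logging

def do_inference(scheme, task_text, paths):
    """paths maps auth_scheme -> inference callable. Unknown schemes fall
    back to the OpenAI-compat path with a logged warning, so a provider
    added before its native SDK path ships does not crash."""
    handler = paths.get(scheme)
    if handler is None:
        logging.warning("unknown auth_scheme %r, falling back to openai-compat",
                        scheme)
        handler = paths["openai"]
    return handler(task_text)
```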

## Fail-loud on missing SDK

`_do_anthropic_native` raises a clear `RuntimeError` with install
instructions if the `anthropic` package is missing at runtime:

    Hermes anthropic native path requires the `anthropic` package. Install
    in the workspace image with `pip install anthropic>=0.39.0` or set
    HERMES provider=openrouter to route Claude models through OpenRouter's
    OpenAI-compat shim instead.

This is intentional: silent fallback would mask fidelity loss (tool_use
blocks become plain text, vision gets stripped). Loud failure is better.

`requirements.txt` adds `anthropic>=0.39.0` so the package is baked into
the workspace-template image build path. Operators building custom workspace
images without anthropic installed get the loud error.

## Back-compat

- `create_executor(hermes_api_key="x")` → still routes to Nous Portal
  (`auth_scheme="openai"`), unchanged
- `HERMES_API_KEY` env var → still first in RESOLUTION_ORDER
- `OPENROUTER_API_KEY` env var → still second
- All 14 OpenAI-compat providers unchanged — they take the same code path
  as before
- ONLY `anthropic` provider changes behavior: it now uses the native
  Messages API instead of the `/v1/chat/completions` compat shim

## Constructor signature change

`HermesA2AExecutor.__init__` now takes `provider_cfg: ProviderConfig`
instead of separate `api_key + base_url + model`. The three fields are
derived from `provider_cfg` + an optional model override. This is a
breaking change for any external caller building an executor directly,
but the only documented public entry point is `create_executor()`, which
is updated in the same commit to pass the cfg through.

## Test coverage

`workspace-template/tests/test_hermes_phase2_dispatch.py` — 7 new tests:

1. `test_anthropic_entry_has_anthropic_scheme` — registry flip
2. `test_all_other_providers_still_openai_scheme` — regression guard
3. `test_dispatch_openai_scheme_calls_openai_compat` — happy path
4. `test_dispatch_anthropic_scheme_calls_anthropic_native` — happy path
5. `test_dispatch_unknown_scheme_falls_back_to_openai_compat` — forward compat
6. `test_anthropic_native_raises_clear_error_when_sdk_missing` — fail-loud
7. `test_create_executor_passes_provider_cfg` — constructor wiring

All pass locally (pytest tests/test_hermes_phase2_dispatch.py -v, 0.04s).
Phase 1 tests unchanged: `test_hermes_providers.py` 26/26 pass, no
regressions.

## What's NOT in this PR (Phase 2b)

- Gemini native `generateContent` path (`auth_scheme="gemini"`)
- Streaming support across both native paths (`astream_messages`, `streamGenerateContent`)
- Tool calling on the anthropic native path (the `tools` + `tool_use` blocks)
- Vision content blocks (image_url → anthropic image blocks)
- Extended thinking parameter passthrough

All scoped in `project_hermes_multi_provider.md`. Phase 2a is the minimum
viable native Anthropic dispatch — single-turn text in, text out, no tools.

## Related

- Phase 1 baseline (already in main): #208 — provider registry + OpenAI-compat path
- Queued memory: `project_hermes_multi_provider.md` — full phased plan
- Triggering directive: CEO 2026-04-15 — "once current works are cleared,
  focus on supporting hermes agent"
2026-04-15 12:23:56 -07:00
Hongming Wang
1c41c30310 fix(workspace-template): #220 — send auth_headers() on initial_prompt + idle loop
Closes #220. #215 added auth_headers() to /registry/register but missed
two other self-post paths from the same workspace container:

1. initial_prompt (_do_send_sync) — fires once on first boot after the
   A2A server is ready. Posts to /workspaces/:id/a2a via the platform
   proxy. Missing headers meant the initial prompt got silently
   dropped as 401 on any token-enrolled workspace.

2. idle loop (_post_sync) — fires every idle_interval_seconds while
   the workspace has no active task (#205 pattern). Same proxy path,
   same missing headers, same silent 401 in multi-tenant mode.

Both now build headers as
  {"Content-Type": "application/json", **auth_headers()}

auth_headers() returns {"Authorization": "Bearer <token>"} when
/auth-token.txt exists, empty dict otherwise (first boot before
register issues the token). The existing lazy-bootstrap fail-open
on the platform side covers the empty-dict case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:02:01 -07:00
Hongming Wang
dc5c4b9dfa Merge pull request #231 from Molecule-AI/fix/160-sdk-error-probe
fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr
2026-04-15 11:58:59 -07:00
Hongming Wang
a202db15e1 Merge branch 'main' into fix/160-sdk-error-probe 2026-04-15 11:54:13 -07:00
Hongming Wang
715ecc2caf Merge branch 'main' into fix/issue-215-register-auth 2026-04-15 11:54:09 -07:00
Hongming Wang
54b49ffd1b fix(code-review): idle loop hardening + idle_prompt docs + admin-auth runbook
Addresses items 4, 5, 7 from the self-review of the batch merge. PR A
(#228) covered items 1, 2, 3, 6 on the Go side.

## workspace-template/main.py — idle loop hardening

- Replace asyncio.get_event_loop() with asyncio.get_running_loop() —
  the former is deprecated in 3.12+ and emits a DeprecationWarning on
  every idle fire.
- Replace hardcoded urlopen timeout=600 with IDLE_FIRE_TIMEOUT_SECONDS
  clamped to max(60, min(300, idle_interval_seconds)). Long-cadence
  workspaces no longer hold dangling requests open for 10 minutes; the
  cap adapts automatically when the interval is short.
- Type the exception handling: split HTTPError (has .code) from URLError
  (connection-level) from the generic catch-all. Log status + error
  class separately so operators can grep for specific failure modes
  instead of a bare "post failed".
- Fire-and-forget no longer loses exceptions. run_in_executor Future
  now has an add_done_callback that logs the outcome, so a panic in
  _post_sync surfaces as "Idle loop: post failed — status=None err=..."
  instead of Python's default "Task exception was never retrieved"
  warning buried in stderr.
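The timeout clamp in the second bullet, as a one-liner sketch:

```python
def idle_fire_timeout(idle_interval_seconds):
    """Cap dangling idle-fire requests at 5 minutes, floor at 1 minute,
    and track the configured interval in between."""
    return max(60, min(300, idle_interval_seconds))
```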

## org-templates/molecule-dev/org.yaml — discoverability

Added idle_prompt + idle_interval_seconds to the defaults: block with
explanatory comments. Without this, users had to read main.py to
discover the feature.

## docs/runbooks/admin-auth.md — new

Documents the three middleware variants (AdminAuth strict,
CanvasOrBearer soft, WorkspaceAuth per-id), the exact contract of each,
and the three-question test for adding a new route to CanvasOrBearer.
Also flags the session-cookie follow-up as Phase H.

Referenced PRs: #138, #164, #165, #166, #167, #168, #190, #194, #203,
#228.

No code deltas in platform/ beyond the Python + YAML + docs changes.
Full pytest suite unchanged except the pre-existing test_hermes_smoke
flake that fails in full-suite but passes in isolation (test isolation
bug, not introduced by this PR).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:52:01 -07:00
rabbitblood
1151265b72 fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr
Context: when the claude-agent-sdk wraps a stream error from the CLI
subprocess that it can't categorize (rate limit, auth, network), it
raises a bare `Exception("Command failed with exit code 1\nError output:
Check stderr output for details")`. The exception has no `.stderr` or
`.exit_code` attributes, so #66's `_format_process_error` — which reads
those attributes — has nothing to surface. The log line becomes:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 (exit code: 1)\nError output: Check stderr output for details

That's the placeholder text from the SDK's error path, not the actual
error. Operators chasing a stuck workspace are forced to `docker exec
ws-xxx claude --print` manually to discover the real cause. Observed
today during the rate-limit incident: every PM error line was identical
"Check stderr output for details" while the real cause ("You've hit
your limit · resets Apr 17, 11pm (UTC)") was only visible via manual
reproduction — that cost ~20 minutes of diagnosis time.

## Fix

Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs
`claude --print` with a small probe input, captures stderr+stdout, and
returns the real error string. Bounded by 30s timeout so a hung CLI
can't stall the error path.

Extend `_format_process_error` with ONE narrow fallback: if the
exception has no stderr/exit_code AND its message contains the specific
"Check stderr output for details" marker, call the probe and append
`probed_cli_error=<real error>` to the formatted line.

Critically: the probe only runs in the narrow case where we have
nothing else to log. If `.stderr` or `.exit_code` are present (the
normal ProcessError path from #66), the probe is skipped — no wasted
subprocess, no 30s latency on every error.

## Test coverage

`workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests:
- `test_format_process_error_probes_cli_when_stderr_swallowed` — the
  happy path: exception matches the marker, probe runs, result appears
  in the formatted line. Probe is monkeypatched so no subprocess spawns
  in the test.
- `test_format_process_error_does_not_probe_when_stderr_already_present` —
  negative: regular ProcessError with `.stderr` set does NOT trigger
  the probe (skip the wasted call).
- `test_format_process_error_does_not_probe_without_swallowed_marker` —
  negative: unrelated plain exceptions (e.g. RuntimeError) do NOT
  trigger the probe (so the common-case error path stays fast).

All 7 `_format_process_error` tests pass locally (4 existing + 3 new):
```
pytest tests/test_claude_sdk_executor.py -k format_process_error
======================= 7 passed in 0.06s ========================
```

## Impact

Next time the SDK swallows a real error (rate limit, auth failure,
network outage), the workspace log will contain the actual error string
alongside the generic placeholder:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 ... | probed_cli_error="You've hit your limit · resets Apr
    17, 11pm (UTC)"

Diagnosis time drops from "docker exec each ws, run claude --print,
read stderr" (~20 min) to "grep probed_cli_error in platform logs"
(~10 seconds).

Closes #160.
2026-04-15 11:50:55 -07:00
Dev Lead Agent
b8f810dd21 fix(workspace-template): include auth_headers() on /registry/register POST
The register call was missing headers=auth_headers(), so workspaces that
already have a persisted token (i.e. every restart after the first boot)
were sending an unauthenticated request. The platform's register handler
returns 401 for requests missing a valid bearer token once a token has
been issued, causing re-registration to fail on every restart.

Import auth_headers at the module level (alongside the existing save_token
inline import) and pass it to the httpx POST. auth_headers() returns {}
when no token is on file yet (first boot), so there is no regression for
fresh workspaces — the platform still issues a token on the 200 response
and save_token() persists it for all subsequent restarts.

Closes #215

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:44:53 +00:00
Hongming Wang
7f11328e22 Merge pull request #205 from Molecule-AI/feat/workspace-idle-loop
feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape, opt-in, ~90 LOC)
2026-04-15 11:21:47 -07:00
Hongming Wang
8a011c9f51 Merge remote-tracking branch 'origin/main' into feat/workspace-idle-loop 2026-04-15 11:21:15 -07:00
Hongming Wang
80ae2bd6ad Merge remote-tracking branch 'origin/main' into feat/hermes-phase1-provider-registry 2026-04-15 11:20:51 -07:00
Hongming Wang
7d7d5995e0 fix(workspace-template): #204 — drop PushNotificationSender (abstract class)
Closes #204. PR #198 wired push_sender=PushNotificationSender() into
DefaultRequestHandler to satisfy #175's push-notification capability,
but PushNotificationSender in a2a-sdk is an abstract base class and
cannot be instantiated. Every workspace container crashed on startup
with TypeError.

Reverted to DefaultRequestHandler's defaults. The pushNotifications
capability still appears in AgentCard.capabilities (advertised to A2A
clients) but actual implementation of the sender is deferred to a
Phase-H follow-up that subclasses PushNotificationSender properly.

Existing pytest suite unchanged (the crash was only at runtime on
main.py import, which no existing test exercises directly).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:18:52 -07:00
rabbitblood
8d8ca18bc0 feat(hermes): Phase 1 — multi-provider registry (15 providers, back-compat preserved)
Ships the first half of the queued Hermes adapter expansion. PR 2 only
supported Nous Portal + OpenRouter; this adds 13 more providers reachable
via OpenAI-compat endpoints. Native SDK paths for Anthropic + Gemini are
Phase 2 (better tool-calling + vision fidelity).

## What's new

**`workspace-template/adapters/hermes/providers.py`** (new file, 220 LOC):
- ``ProviderConfig`` dataclass: name, env vars, base URL, default model, auth scheme, docs
- ``PROVIDERS`` dict with 15 entries across 4 groups:
  - PR 2 baseline: nous_portal, openrouter
  - Frontier commercial: openai, anthropic, xai, gemini
  - Chinese providers: qwen, glm, kimi, minimax, deepseek
  - OSS/alt: groq, together, fireworks, mistral
- ``RESOLUTION_ORDER`` tuple: priority for auto-detect (back-compat first,
  then commercial, then Chinese, then OSS/alt)
- ``resolve_provider(explicit=None)`` -> (ProviderConfig, api_key)
  - With explicit name: routes to that provider, raises if env var empty
  - Without: walks RESOLUTION_ORDER, first env-var-set provider wins

**`workspace-template/adapters/hermes/executor.py`** (refactored):
- `create_executor(hermes_api_key=None, provider=None, model=None)` now has
  three parameters:
  - `hermes_api_key`: PR 2 back-compat — routes to Nous Portal
  - `provider`: canonical short name from the registry (e.g. "anthropic")
  - `model`: optional override of the provider's default model
- Delegates all resolution to `providers.resolve_provider()` — no more
  hardcoded URLs or env var lookups in the executor itself
- `HermesA2AExecutor.__init__` no longer has Nous-specific defaults; callers
  pass base_url + model explicitly (which create_executor always does)

**`workspace-template/tests/test_hermes_providers.py`** (new file, 26 tests):
- Registry shape invariants (count >= 15, no duplicates, every config valid)
- PR 2 back-compat: HERMES_API_KEY / OPENROUTER_API_KEY still route correctly
- Auto-detect for every provider in the registry (parametrized — guards against
  typos in env var lists)
- Explicit `provider=` bypass of auto-detect
- Error cases: unknown provider, explicit-but-empty, auto-detect-with-no-env
- All 26 tests pass locally in 0.08s

## Back-compat guarantees

| Scenario | PR 2 behavior | This PR behavior |
|---|---|---|
| `create_executor(hermes_api_key="x")` | Nous Portal | Nous Portal (unchanged) |
| `HERMES_API_KEY=x` env, auto-detect | Nous Portal | Nous Portal (unchanged) |
| `OPENROUTER_API_KEY=x` env, auto-detect | OpenRouter | OpenRouter (unchanged) |
| Both env + explicit hermes_api_key param | Nous Portal (param wins) | Nous Portal (param wins, unchanged) |

Nothing existing can break. New callers gain access to 13 more providers.

## What's NOT in this PR (Phase 2)

- **Native Anthropic Messages API path** — better tool calling, vision, extended
  thinking. Requires pulling in `anthropic` SDK. ~50 LOC.
- **Native Gemini generateContent path** — for vision + google tools. Requires
  `google-genai` SDK. ~50 LOC.
- **Streaming support across all providers** — current executor is non-streaming
  (single chat.completions.create call). Streaming works with openai.AsyncOpenAI
  but hasn't been wired to the A2A event queue path. ~30 LOC.
- **Per-provider model overrides in config.yaml** — Phase 1 uses the registry's
  default_model. Phase 2 adds a `hermes: { provider: qwen, model: qwen3-coder-plus }`
  block in the workspace config.
- **`.env.example` updates** — not critical since the registry itself documents
  every env var via the `env_vars` field, but nice-to-have.

## Related
- Queued memory: `project_hermes_multi_provider.md`
- CEO directive 2026-04-15: *"once current works are cleared, I want you to
  focus on supporting hermes agent, right now it doesnt take too much providers"*
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch
  entry listed "Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …" which
  shaped this registry's initial set

## Test plan
- [x] Unit tests: 26/26 pass locally (pytest)
- [ ] CI will run on the self-hosted macOS arm64 runner
- [ ] Smoke test in a real workspace: set QWEN_API_KEY and verify Technical
      Researcher actually hits Alibaba DashScope successfully
- [ ] Integration test per provider with real API keys (gated on env, skip
      when not set — Phase 2 CI addition)
2026-04-15 11:14:35 -07:00
rabbitblood
37bca9176e feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape)
Today's multi-framework research (Hermes, Letta, Trigger.dev, Inngest, AG2,
Rivet, n8n, Composio, SWE-agent — see docs/ecosystem-watch.md) confirmed
that nobody runs while(true) per agent. The working patterns are:

  (a) event-driven + hibernation (Hermes, Letta, Trigger.dev, Inngest)
  (b) cron/user-triggered ephemeral runs (AG2, Rivet, n8n, SWE-agent)

Molecule AI is currently 100% in category (b). Observed team utilization:
~0.5% — agents idle 99.5% of the time because cron fires and CEO-typed
A2A are the only initiating signals. CEO's north-star is 24/7 iteration,
current cadence falls short.

This PR closes the gap by adding an in-workspace idle loop that wakes the
agent periodically ONLY when it has no active task. The shape is the
Hermes reflection-on-completion pattern combined with the Letta backlog-pull
pattern, collapsed into a ~60 LOC change in the workspace-template. Zero
new Go code. Zero new DB tables. Zero new API endpoints.

## How it works

1. `config.py` gets two new fields on WorkspaceConfig:
   - `idle_prompt: str = ""` — the prompt to self-send when idle
   - `idle_interval_seconds: int = 600` — how often to check (default 10 min)
   Both support inline or file ref (matching the initial_prompt pattern).

2. `main.py` spawns an `_run_idle_loop()` asyncio task alongside the
   existing initial_prompt task (same lifecycle hooks — cancelled in the
   `finally:` of the server.serve() block).

3. The loop body:
   a. Sleep interval
   b. Check `heartbeat.active_tasks == 0` LOCALLY (no LLM call, no HTTP)
   c. If idle → self-POST the idle_prompt via the existing /workspaces/{id}/a2a proxy
   d. Loop
   The agent's own concurrency control rejects the post if it becomes busy
   between the check and the POST — that's the safety valve.

4. Gated on `config.idle_prompt` being non-empty. Default = "" = no loop.
   Existing workspaces upgrade silently as no-ops until someone explicitly
   opts in by setting idle_prompt in org.yaml (either defaults: or
   per-workspace:).

## Cost analysis (from the research report)

- while(true) pattern: ~$93/day/org (12 agents × 12 thinks/hour × 24 h × $0.027). Unshippable.
- Hermes reflection-on-completion: ~$0.45/day/org. Cost ∝ useful work.
- This PR's idle loop at 10-min cadence: upper bound 12 × 6/hour × 24h
  × ~3k tokens × Sonnet rate ≈ $5/day/org PER ROLE, only if they're
  genuinely idle every check. In practice far less because busy periods
  skip the LLM call entirely (the active_tasks check is local).

## Rollout plan

Research report recommended rolling to ONE workspace first (Technical
Researcher) and measuring 24h of activity_logs before enabling for
all 12. This PR enables the mechanism; it does NOT add any default
idle_prompt to org-templates/molecule-dev/org.yaml. That's a follow-up
PR after this one lands and one workspace has been manually opted in
for measurement.

## Not touched in this PR

- No Go code (no new platform endpoint, no new DB columns)
- No org.yaml changes (zero-impact until someone opts in)
- No scheduler changes (the idle loop is a workspace concern, not a
  scheduler concern — matches the research report's layering)

## Test plan

- [x] Python syntax check (ast.parse) on main.py + config.py
- [ ] Unit test: WorkspaceConfig parses idle_prompt / idle_interval_seconds from yaml
- [ ] Integration test: set idle_prompt on Technical Researcher, measure that
      an A2A message is received every ~10 min while idle, and NOT received
      while busy with a delegation
- [ ] Dogfood: enable on Technical Researcher for 24h, count activity_logs
      delta vs baseline, confirm cost stays within model

## Related

- Today's research report (conversation output, summarized in commit trailer)
- docs/ecosystem-watch.md → `### Hermes Agent` (the canonical reflection-on-completion example)
- #159 orchestrator/worker split — complementary: leaders pulse for dispatch,
  workers idle-loop for pull. Together: leaders push work, workers pull work,
  no role ever sits idle with a cold queue.
2026-04-15 11:09:43 -07:00
Backend Engineer
4cea1c6478 fix(a2a): cancel() event, stateTransitionHistory capability, wire push store (#173 #174 #175)
#173 — implement cancel() in LangGraphA2AExecutor: emits
TaskStatusUpdateEvent(state=canceled, final=True) so clients see the
state transition rather than silence. Removes pragma: no cover.
Test: test_cancel_emits_canceled_event.

#174 — add stateTransitionHistory=True to AgentCapabilities in main.py
so microsoft/agent-framework clients know they can request full task
history via the A2A protocol.

#175 — wire InMemoryPushNotificationConfigStore and PushNotificationSender
into DefaultRequestHandler so the advertised pushNotifications capability
is backed by a real store. Both classes live in a2a.server.tasks (a2a-sdk
0.3.25); import confirmed by probe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:58:10 +00:00
Hongming Wang
119b02c544 feat(plugins): split guardrails into 12 modular plugins
Replaces the proposed monolithic molecule-guardrails plugin with 12
single-purpose plugins users can install à la carte. Powered by a
small extension to the AgentskillsAdaptor base class so any plugin can
ship hooks/, commands/, and a settings-fragment.json without writing a
custom adapter.

## Base adapter changes

workspace-template/plugins_registry/builtins.py + sdk/python/molecule_plugin/builtins.py
(both copies — drift-tested):
- New _install_claude_layer() helper called at the end of install()
- Conditionally copies hooks/ → /configs/.claude/hooks/ (preserving exec bit)
- Conditionally copies commands/*.md → /configs/.claude/commands/
- Conditionally merges settings-fragment.json into /configs/.claude/settings.json
  with ${CLAUDE_DIR} placeholder rewritten to the workspace's absolute install
  path. Existing user hooks are preserved (deep-merge by event name).
- All steps no-op when the plugin doesn't ship the corresponding files,
  so existing skill+rule plugins (molecule-dev, superpowers, ecc,
  browser-automation) are unchanged.
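The deep-merge-by-event-name behaviour can be sketched as below. The fragment shape (a top-level `hooks` object keyed by event name) and the merge policy are assumptions inferred from this commit message, not the shipped builtins.py:

```python
import json
from pathlib import Path

def merge_settings_fragment(settings_path: Path, fragment: dict,
                            claude_dir: str) -> dict:
    """Merge a plugin's settings-fragment into settings.json, rewriting the
    ${CLAUDE_DIR} placeholder and preserving existing user hooks."""
    # Rewrite the placeholder everywhere in the fragment.
    fragment = json.loads(
        json.dumps(fragment).replace("${CLAUDE_DIR}", claude_dir))
    existing = (json.loads(settings_path.read_text())
                if settings_path.exists() else {})
    merged_hooks = dict(existing.get("hooks", {}))
    for event, entries in fragment.get("hooks", {}).items():
        # Preserve user hooks: append plugin entries rather than replace.
        merged_hooks[event] = merged_hooks.get(event, []) + entries
    merged = {**existing, **{k: v for k, v in fragment.items() if k != "hooks"}}
    merged["hooks"] = merged_hooks
    settings_path.write_text(json.dumps(merged, indent=2))
    return merged
```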

Drift test (tests/test_plugins_builtins_drift.py) still passes.

## 12 new plugins

Hook plugins (ambient enforcement):
- molecule-careful-bash       — refuses destructive bash; ships careful-mode skill
- molecule-freeze-scope       — locks edits via .claude/freeze
- molecule-audit-trail        — appends every Edit/Write to audit.jsonl
- molecule-session-context    — auto-loads cron-learnings at session start
- molecule-prompt-watchdog    — injects warnings on destructive prompt keywords

Skill plugins (on-demand):
- molecule-skill-code-review        — 16-criteria multi-axis review
- molecule-skill-cross-vendor-review — adversarial second-model review
- molecule-skill-llm-judge          — deliverable-vs-request scoring
- molecule-skill-update-docs        — post-merge doc sync
- molecule-skill-cron-learnings     — operational-memory JSONL format

Workflow plugins (slash commands):
- molecule-workflow-triage  — /triage full PR-triage cycle
- molecule-workflow-retro   — /retro + cron-retro skill, weekly retrospective

Each ships only what it needs — most have just plugin.yaml + skills/ or
hooks/ + adapter (one-line stub: `from plugins_registry.builtins import
AgentskillsAdaptor as Adaptor`). Total ~120 files but each plugin is
small and self-contained.

## Verification

- python3 -m molecule_plugin validate plugins/molecule-* → all 13 valid
  (12 new + pre-existing molecule-dev)
- End-to-end install smoke test on representative samples: hook plugin
  (molecule-careful-bash), skill-only plugin (molecule-skill-code-review),
  workflow plugin (molecule-workflow-triage). All produce expected
  /configs/ tree, settings.json paths rewritten, exec bits preserved,
  zero warnings.
- workspace-template pytest tests/test_plugins_builtins_drift.py → passes
  (SDK + runtime stay in sync).

## CLAUDE.md repo-doc updated

Lists all 12 new plugins under the existing Plugins section, organized
by category (hook / skill / workflow). Each entry one line, recommend-
together hints where dependencies make sense.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:20:04 -07:00
Hongming Wang
cd4eb9c590 Merge pull request #49 from Molecule-AI/feat/hermes-pr2
feat(hermes): implement create_executor() with HERMES_API_KEY / OPENROUTER_API_KEY fallback + smoke tests
2026-04-14 08:16:15 -07:00
Hongming Wang
b6c2f15933 fix(workspace): recursive chown when /workspace bind mount is root-owned (#13)
On Docker Desktop (macOS/Windows), host-path bind mounts often appear
root-owned inside the container. The previous entrypoint only chowned
/workspace top-level, so agents (uid 1000) still couldn't write to
/workspace/repo/* — git clone, pip install, and file edits failed with
EACCES and fell back to /tmp. Detect the root-owned-contents case by
sampling the first entry; if it's root-owned, recursively chown the
tree. On normal Linux Docker with matching uids this is a no-op, so the
fast-startup path is preserved for the common case.
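The sample-then-recurse detection reads roughly like this (a Python sketch of the entrypoint logic described above; the real fix lives in a shell entrypoint, and uid/gid 1000 is taken from the commit text):

```python
import os
from pathlib import Path

AGENT_UID, AGENT_GID = 1000, 1000  # the workspace agent user

def fix_bind_mount_ownership(root: str) -> bool:
    """Detect the Docker-Desktop root-owned bind-mount case and repair it.

    Samples the first entry under `root`: if it is root-owned, chown the
    whole tree; otherwise no-op, preserving the fast-startup path on
    normal Linux Docker with matching uids.
    """
    entries = sorted(Path(root).iterdir())
    if not entries or entries[0].stat().st_uid != 0:
        return False  # common case: nothing to do
    for dirpath, dirnames, filenames in os.walk(root):
        os.chown(dirpath, AGENT_UID, AGENT_GID)
        for name in filenames:
            os.chown(os.path.join(dirpath, name), AGENT_UID, AGENT_GID)
    return True
```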

Part B of the issue (private-repo initial_prompt clone) was addressed
by PR #20.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:29:30 -07:00
Dev Lead Agent
363a55782b fix(security): complete Phase 30.6 auth headers in a2a_client get_peers and discover_peer
get_peers() was sending no auth headers to /registry/:id/peers — this would
return 401 for every workspace agent after PR #31 (WorkspaceAuth middleware)
deploys, breaking peer discovery entirely.

discover_peer() had X-Workspace-ID but was missing the bearer token, also
required by Phase 30.6 for /registry/discover/:id.

Both functions now send {"X-Workspace-ID": WORKSPACE_ID, **auth_headers()}.
get_workspace_info() was already correct (auth_headers() present since PR #39).

Adds test_request_sends_workspace_id_header to TestGetPeers; hardens the
discover_peer header assertion to use presence-check rather than exact equality.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 13:23:44 +00:00
Hongming Wang
a565c49bce Merge pull request #41 from Molecule-AI/fix/security-h3-m4
noteworthy: secrets-handling — H3 github_pat_ redaction + M4 atomic 0600 token write. 7-gate verification PASS.
2026-04-14 03:21:49 -07:00
Dev Lead Agent
1a109b3263 fix(security): H3 github_pat_ redaction + M4 atomic token write (audit cycle 10)
H3 (compliance.py): GitHub fine-grained PATs use the github_pat_ prefix
with an 82-character alphanumeric+underscore suffix — different from
classic tokens (36 chars). Add the missing pattern to _PII_PATTERNS so
fine-grained PATs are redacted in compliance logs alongside classic tokens.
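The pattern shapes can be sketched as below — the regexes are assumptions reconstructed from the prefix/length description in this commit, not the shipped compliance.py patterns:

```python
import re

_PII_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),           # classic PAT: 36-char suffix
    re.compile(r"github_pat_[A-Za-z0-9_]{82}"),   # fine-grained PAT (the H3 fix)
]

def redact(text: str) -> str:
    """Replace any token matching a PII pattern before it reaches the logs."""
    for pat in _PII_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text
```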

M4 (platform_auth.py): Replace write_text()+chmod() in save_token() with
os.open(O_WRONLY|O_CREAT|O_TRUNC, 0o600) + os.write(). The old approach
had a TOCTOU window where a concurrent reader could access the token file
before chmod restricted permissions. os.open with explicit mode creates the
file with 0600 permissions atomically in a single syscall.
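The fixed write path has this shape (a minimal sketch of the os.open pattern described above):

```python
import os

def save_token(path: str, token: str) -> None:
    """Create the token file with mode 0600 in the open() syscall itself —
    no TOCTOU window between write_text() and a later chmod()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, token.encode())
    finally:
        os.close(fd)
```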

H2 (a2a_client.py): Already fixed in commit 6c78962 (Cycle 5); no-op.

Tests: 1136 passed, 2 skipped (workspace-template pytest suite)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 09:34:27 +00:00
Backend Engineer
9649311d51 fix(security): N1 — add auth headers to all platform calls in Python callers
IMPACT WITHOUT THIS FIX: deploying PR #31 (WorkspaceAuth middleware on
/workspaces/*) without this patch causes EVERY delegation cycle to silently
break — the heartbeat poll returns 401, the self-message A2A POST returns
401, agents never wake up after task completion, and memory consolidation
stops. The entire multi-agent coordination system degrades to single-shot
interactions with no result delivery.

Changes (all using the existing platform_auth.auth_headers() pattern
already used for POST /registry/heartbeat):

heartbeat.py — 5 calls fixed:
  - GET  /workspaces/:id/delegations     (delegation poll)
  - GET  /workspaces/:id                 (self workspace info for parent lookup)
  - GET  /workspaces/{parent_id}         (parent workspace name lookup)
  - POST /workspaces/:id/a2a             (self-message to wake agent on results)
  - POST /workspaces/:id/notify          (canvas delegation result notification)
  Also moved `from platform_auth import auth_headers` from inline (per-call)
  to module-level import so _check_delegations() can use it without re-importing.

consolidation.py — 4 calls fixed:
  - GET    /workspaces/:id/memories      (fetch memories for consolidation)
  - POST   /workspaces/:id/memories      (write consolidated summary — agent path)
  - DELETE /workspaces/:id/memories/:id  (delete original memories post-consolidation)
  - POST   /workspaces/:id/memories      (write consolidated summary — fallback path)

a2a_client.py — 1 call fixed:
  - GET /workspaces/:id                  (get_workspace_info())

⚠️  DEPLOYMENT NOTE: This PR MUST be merged and deployed at the same time as
PR #31 (WorkspaceAuth middleware). Deploying #31 without this fix will
immediately break all delegation result delivery.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 08:37:50 +00:00
Hongming Wang
7d3e369632 fix(gate-3): update watcher test to expect SHA-256 hash
Align test_hash_file_real_file with the SHA-256 switch in watcher.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 01:21:35 -07:00
Dev Lead Agent
486275868d fix(security): H1 — replace MD5 with SHA-256 in config/skill watchers
Both watcher.py (ConfigWatcher) and skill_loader/watcher.py
(SkillsWatcher) used hashlib.md5() for file-integrity change detection.
MD5 is collision-prone: a crafted config file could produce the same
hash as a benign one, silently suppressing the hot-reload callback and
preventing agents from picking up legitimate config changes.

Replace hashlib.md5 → hashlib.sha256 in both _hash_file() methods.
Update docstrings, comments, and the type-annotation comment
(rel_path → md5 hex → sha256 hex).

Test update: test_skills_watcher.py — rename helper _md5 → _sha256,
update the hash-length assertion from 32 (MD5) to 64 (SHA-256), and
rename the test from test_hash_file_returns_md5_for_existing_file to
test_hash_file_returns_sha256_for_existing_file. All 25 watcher tests
pass.

Note: H2 (a2a_client.py timeout=None) was already fixed in Cycle 5
(timeout=httpx.Timeout(connect=30.0, read=300.0, ...)) — confirmed by
code review before opening this PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 07:52:07 +00:00
Dev Lead Agent
6c78962a33 fix(security): Cycle 5 — auth middleware, injection hardening, skill sandbox
Fix A — platform/internal/middleware/wsauth_middleware.go (NEW):
  WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on
  ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as
  secrets.Values: workspaces with no live token are grandfathered through.
  Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously.

Fix A — platform/internal/router/router.go:
  Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id)
  and /a2a remain on root router; all other /workspaces/:id/* sub-routes
  moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)).
  CORS AllowHeaders updated to include Authorization so browser/agent callers
  can send the bearer token cross-origin.

Fix B — workspace-template/heartbeat.py:
  _check_delegations(): validate source_id == self.workspace_id before
  accepting a delegation result. Attacker-crafted records with a foreign
  source_id are silently skipped with a WARNING log (injection attempt).
  trigger_msg no longer embeds raw response_preview text; references
  delegation_id + status only — removes the prompt-injection vector.

Fix C — workspace-template/skill_loader/loader.py:
  load_skill_tools(): before exec_module(), verify script is within
  scripts_dir (path traversal guard) and temporarily scrub sensitive env
  vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY,
  WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore
  in finally block. Defence-in-depth even if /plugins auth gate is bypassed.
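The scrub-and-restore step can be sketched as a context manager (an illustration of the pattern; the env var list is from this commit, the context-manager packaging is mine):

```python
import os
from contextlib import contextmanager

_SENSITIVE = ("CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
              "WORKSPACE_AUTH_TOKEN", "GITHUB_TOKEN", "GH_TOKEN")

@contextmanager
def scrubbed_env():
    """Temporarily remove sensitive vars around exec_module(); restore in a
    finally block even if the skill script raises."""
    saved = {k: os.environ.pop(k) for k in _SENSITIVE if k in os.environ}
    try:
        yield
    finally:
        os.environ.update(saved)
```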

Fix D — platform/internal/handlers/socket.go:
  HandleConnect(): agent connections (X-Workspace-ID present) validated via
  wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade.
  Canvas clients (no X-Workspace-ID) remain unauthenticated.

Fix D — workspace-template/events.py:
  PlatformEventSubscriber._connect(): include platform_auth bearer token in
  WebSocket upgrade headers alongside X-Workspace-ID.

Fix E — workspace-template/executor_helpers.py:
  recall_memories() and commit_memory() now pass platform_auth bearer token
  in Authorization header so WorkspaceAuth middleware allows access.

Fix F — workspace-template/a2a_client.py:
  send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300,
  write=30, pool=30). Resolves H2 flagged across 5 consecutive audits.

Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to
assert new source_id validation behaviour and allow Authorization header).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 04:44:42 +00:00
Dev Lead Agent
08fe37aee1 feat: implement Hermes adapter create_executor() with OpenRouter fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 16:47:29 -07:00
Hongming Wang
24fec62d7f initial commit — Molecule AI platform
Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1)
with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo.

Brand: Starfire → Molecule AI.
Slug: starfire / agent-molecule → molecule.
Env vars: STARFIRE_* → MOLECULE_*.
Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform.
Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent.
DB: agentmolecule → molecule.

History truncated; see public repo for prior commits and contributor
attribution. Verified green: go test -race ./... (platform), pytest
(workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:55:37 -07:00