Commit Graph

42 Commits

Author SHA1 Message Date
Hongming Wang
c7477047c2 Merge pull request #338 from Molecule-AI/fix/issue-328-transcript-fail-closed
fix(security): /transcript fails closed when auth token missing (#328)
2026-04-15 21:30:56 -07:00
Hongming Wang
0e46afa4b9 fix(security): hitl task-id ownership + wire fail_open_if_no_scanner in loader (closes #265, #268)
Security audit cycle 13: hitl.py LGTM (workspace-scoped task IDs). Loader.py fix applied (commit 0557f73): fail_open_if_no_scanner now read from config and forwarded to scan_skill_dependencies(); regression test added. CI 5/6 pass (E2E cancel = run-supersession pattern). Closes #265. Closes #268.
2026-04-15 21:18:52 -07:00
Hongming Wang
e1cdb5c9c6 fix(security): /transcript endpoint fails closed when auth token missing (#328)
Severity HIGH. The /transcript route in main.py used `if expected:`
around the bearer-token compare, so `get_token()` returning None (no
/configs/.auth_token on disk — bootstrap window, deleted file, OSError)
silently skipped the entire auth check. Any container on
molecule-monorepo-net could GET /transcript during the provisioning
window and walk away with the full session log (user messages, Claude
tool calls, assistant replies).

The platform's TranscriptHandler always has a valid token (it acquired
one at workspace registration), so tightening this gate has no
legitimate-caller impact. Only unauthenticated sniffers lose access,
which was never the intended contract of #287.

Fix:

1. Extracted the auth gate into `workspace-template/transcript_auth.py`
   — a 20-line module with no heavy imports so the security-critical
   code is unit-testable without standing up the full uvicorn/a2a/httpx
   stack (the former inline guard could only be tested end-to-end,
   which explains why the regression shipped in #287).

2. `transcript_authorized(expected, auth_header)` returns False when
   `expected` is None or empty — the #328 fix — and otherwise does
   strict equality against "Bearer <expected>".

3. main.py's inline handler calls the extracted function:
     if not _transcript_authorized(get_token(), auth_header):
         return 401

4. New tests/test_transcript_auth.py covers: None token, empty token,
   valid bearer, wrong bearer, missing header, case-sensitive prefix,
   whitespace fuzzing. All 7 pass.
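A minimal sketch of the extracted guard, assuming only the contract described in points 1 and 2 (the function body here is hypothetical; the fail-closed behavior is what the commit specifies):

```python
def transcript_authorized(expected, auth_header):
    """Fail closed: a missing or empty token on disk means no access."""
    if not expected:  # None or "" (bootstrap window, deleted file, OSError)
        return False
    # Strict equality against "Bearer <expected>"; case-sensitive prefix,
    # and a None/missing Authorization header also compares unequal.
    return auth_header == f"Bearer {expected}"
```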

Closes #328

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:17:37 -07:00
Hongming Wang
5451164cba fix(security): add bearer token auth to /transcript endpoint (#287)
Closes #287

Any container on molecule-monorepo-net could previously read the full Claude session log without authentication. Guard uses get_token() from platform_auth — skipped only before workspace registration (dev-mode).
2026-04-15 19:47:23 -07:00
Hongming Wang
84d5e395d4 fix(a2a-tools): auth_headers on recall_memory + commit_memory (#304)
Adds auth_headers to recall_memory and commit_memory in a2a_tools.py. Fixes the #215-class auth regression for A2A memory tools. Test mocks updated to accept headers kwarg.
2026-04-15 19:12:18 -07:00
Hongming Wang
72d30c0b14 Merge pull request #270 from Molecule-AI/feat/workspace-transcript-endpoint
feat: GET /workspaces/:id/transcript — live agent session log
2026-04-15 17:55:41 -07:00
Hongming Wang
6898391dd0 fix(tests): update memory fakes for auth_headers kwarg + activity overwrite
The #215-class fix in memory.py (859a60e) adds headers=_headers to the
direct-httpx commit_memory + search_memory paths, but 9 existing tests
in test_memory.py had FakeAsyncClient.post/get signatures like
`async def post(self, url, json):` with no headers kwarg. Python
raised TypeError: unexpected keyword argument 'headers' on every call,
commit_memory caught it and returned {success: False}, tests failed.

Fixes applied:

1. Add `headers=None` to every FakeAsyncClient.post + .get signature
   across test_memory.py. Uses replace_all so all 9+ fakes match.

2. For tests that capture a single captured["url"]:
   - test_commit_memory_uses_awareness_client_when_configured
   - test_commit_memory_uses_platform_fallback_without_awareness
   - test_commit_memory_httpx_201_success
   filter to only capture /memories URLs. Without the filter, the
   subsequent _record_memory_activity fire-and-forget post to /activity
   overwrites captured["url"] and the assertion fails.

3. For test_commit_memory_promoted_packet_logs_skill_promotion: bump
   expected captured["calls"] from 3 to 4. Pre-fix, the memory_write
   /activity call (from _record_memory_activity #125) was silently
   dropped because the fake rejected headers=; post-fix it succeeds
   and lands in the captured list alongside the skill_promotion
   /activity and /registry/heartbeat calls. Also extend that test's
   fake to accept /registry/heartbeat (was raising AssertionError).
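Items 1 and 2 can be sketched like this (class and URL names are illustrative stand-ins, not the repo's actual fixtures):

```python
import asyncio

class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

class FakeAsyncClient:
    """Hypothetical reconstruction of the test double: headers=None is added
    to the signature so production's headers kwarg is accepted, and URL
    capture is filtered to /memories so the fire-and-forget /activity post
    can't overwrite the captured value."""
    def __init__(self):
        self.captured = {}

    async def post(self, url, json=None, headers=None):
        if "/memories" in url:
            self.captured["url"] = url
        return FakeResponse(201)

async def demo():
    fake = FakeAsyncClient()
    await fake.post("http://platform/workspaces/ws1/memories",
                    json={}, headers={"Authorization": "Bearer t"})
    await fake.post("http://platform/workspaces/ws1/activity", json={})
    return fake.captured["url"]
```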

Total: 36/36 memory tests pass. Full workspace-template suite 1189/1189.

This is strictly test-infrastructure work — zero production code
changed. CI never caught the break because the Mac mini runner has
been stuck for ~4 hours (tick-33/34/35/36 reports).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:29:15 -07:00
rabbitblood
9923159f2e fix(memory-tools): #215-class — auth_headers on commit_memory + search_memory HTTP fallback
Context: platform now gates `GET /workspaces/:id/memories` and
`POST /workspaces/:id/memories` behind workspace auth (post-#166 /
#167 AdminAuth wave). The `builtin_tools.memory` tool had three HTTP
call sites:

  1. commit_memory POST fallback (line 121)        ← NO auth_headers
  2. search_memory GET fallback (line 269)         ← NO auth_headers
  3. activity-log helper POST (line 371)           ← HAS auth_headers

Path 3 was already fixed. Paths 1 + 2 silently 401 on every call, but the
tool's error-handling path returns `{"success": False}` without surfacing
the auth failure to the agent. Result: the agent sees an empty memory
backlog on every call and assumes there's nothing to do.

## Discovered today

Technical Researcher is the first workspace opted in to the idle-loop
pilot from #216 (reflection-on-completion pattern). The pilot fires
every 10 min, the agent calls `search_memory "research-backlog:..."` as
the first step, gets back an empty result, writes "tr-idle clean" to
memory, and stops. Clean-idle outcome every tick, 9 consecutive ticks.

Looking at TR's activity_logs response bodies:

    "Memory auth has failed on every tick this session — skipping the call"
    "tr-idle — step 2 done. Memory unavailable (auth token missing..."
    "tr-idle 04:15 — clean (memory auth still down, 3rd consecutive tick)"

The AGENT knew the memory calls were failing. The platform 401 error
was surfacing in the tool response, but our instrumentation wasn't
counting it as a defect — we saw "tr-idle clean" writes and assumed
the pilot was working as designed. It was actually silently broken.

## Fix

Import `platform_auth.auth_headers` lazily (same pattern as the
activity-log path already uses), attach `headers=_auth()` to both
httpx call sites. Matches the #225 fix for the register call.

## Not in this PR

- awareness_client.py also makes HTTP calls to a separate AWARENESS_URL
  service (not the platform), which may or may not need the same fix
  depending on that service's auth posture. Out of scope for this PR.

- TR's specific token problem: TR's `/configs/.auth_token` file is
  empty because it was re-provisioned via `apply_template: true`
  (recovery path from the failed-volume incident) and Phase 30.1
  only mints a token on FIRST register per workspace. This fix
  doesn't help TR until TR gets a fresh token — tracked separately.

## Test plan

- [x] Python syntax check on memory.py passes
- [ ] CI: all memory-related tests should still pass (the new code
      paths only add header passing, no shape change)
- [ ] Real-world verification: after TR gets a fresh token, idle-loop
      pilot should produce a dispatch within 10 min (seeded backlog
      already in place from this session)

## Related
- #215 / #225 — register call auth_headers fix (same pattern)
- #216 — TR idle-loop pilot (couldn't measure until this lands)
- #166 / #167 — platform AdminAuth wave that surfaced this gap
2026-04-15 17:26:26 -07:00
rabbitblood
88ce2a18cd feat(hermes): Phase 2d-i — system-prompt.md injection on all 3 dispatch paths
The Hermes adapter never read /configs/system-prompt.md. Any role that
switched to runtime: hermes was silently losing its role identity because
the system prompt wasn't passed to the model. This PR fixes that by:

1. HermesA2AExecutor.__init__ takes new optional `config_path` kwarg
2. `create_executor(config_path=...)` forwards to the constructor
3. `adapter.py` passes `config.config_path` through from AdapterConfig
4. `execute()` reads system-prompt.md via executor_helpers.get_system_prompt
   (hot-reload-capable — reads on every turn, not just at startup)
5. `_do_inference(user_message, history, system_prompt)` — new arg threads
   through the dispatch to each native path
6. Each path uses the provider's NATIVE system field:
   - OpenAI-compat: prepends `{"role":"system", "content":...}` to messages
   - Anthropic: top-level `system=` kwarg (NOT in messages — Anthropic
     requires system at the top level)
   - Gemini: `config=GenerateContentConfig(system_instruction=...)`
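The three native shapes in item 6 can be illustrated side by side (a hedged sketch: field names follow the public provider APIs, but the dispatcher itself is hypothetical and plain dicts stand in for the SDK config objects):

```python
def build_request_kwargs(scheme, system_prompt, messages):
    """Illustrative only: where each native path carries the system prompt."""
    if scheme == "openai":
        # OpenAI-compat: the system prompt is just a leading message
        return {"messages": [{"role": "system", "content": system_prompt}]
                + messages}
    if scheme == "anthropic":
        # Anthropic: top-level system= kwarg, never inside messages
        return {"system": system_prompt, "messages": messages}
    if scheme == "gemini":
        # Gemini: system_instruction inside the generation config
        return {"config": {"system_instruction": system_prompt},
                "contents": messages}
    raise ValueError(f"unknown scheme: {scheme}")
```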

## Phase scoreboard
- 2a (in main) — native Anthropic dispatch infra
- 2b (in main) — native Gemini dispatch
- 2c (in main) — multi-turn history on all paths
- **2d-i (this PR)** — system prompts on all paths
- 2d-ii (future) — tool calling on native paths
- 2d-iii (future) — vision content blocks on native paths
- 2d-iv (future) — streaming

## Test coverage

46/46 tests pass (20 Phase 2 dispatch + 26 Phase 1 registry):

- Existing dispatch tests updated to assert the 3-arg call shape
  `("hello", None, None)` — history + system_prompt both None
- 5 new tests:
  - `dispatch_passes_system_prompt_to_anthropic` — happy path, third arg flows
  - `dispatch_passes_system_prompt_to_gemini` — happy path
  - `dispatch_passes_system_prompt_to_openai` — happy path
  - `executor_accepts_config_path_kwarg` — constructor stores config_path
  - `create_executor_forwards_config_path` — both back-compat and registry
    resolution paths forward config_path through to the executor

## Back-compat

- `config_path=None` (default) → execute() skips system-prompt injection,
  same behavior as pre-2d-i
- Workspaces with `runtime: hermes` but no `/configs/system-prompt.md`
  file get `system_prompt=None` (get_system_prompt returns fallback),
  same as before
- The 13 OpenAI-compat providers work identically — system_prompt just
  adds a leading message, which every OpenAI-compat endpoint already
  supports
- Anthropic + Gemini previously got zero system context; now they get
  the same system prompt the workspace's system-prompt.md carries

## Why this matters

Before this PR: if someone flipped a workspace from `runtime: claude-code`
to `runtime: hermes`, the agent would act generically (no role identity,
no project conventions, no CLAUDE.md context) because the Hermes executor
never looked at system-prompt.md. That's a silent correctness regression
the test suite wouldn't catch because none of our live workspaces use
the hermes runtime today.

With this PR: Hermes workspaces get the same system prompt injection as
Claude-code workspaces, making the `runtime: hermes` switch a true drop-in
alternative.

## Related
- #267 Phase 2c (multi-turn history — in main)
- #255 Phase 2b (gemini native — in main)
- #240 Phase 2a (anthropic native — in main)
- #208 Phase 1 (provider registry — in main)
- project_hermes_multi_provider.md — Phase 2d-i was the next queued item
2026-04-15 16:21:47 -07:00
airenostars
853734aa4e feat: GET /workspaces/:id/transcript — live agent session log
Closes #N (issue to be filed)

Lets canvas / operators see live tool calls + AI thinking instead of
waiting for the high-level activity log to flush. Right now the only
way to "look over an agent's shoulder" is `docker exec ws-XXX cat
/home/agent/.claude/projects/.../<session>.jsonl`, which:
  - doesn't work for remote workspaces (Phase 30 / Fly Machines)
  - requires shell access on the host
  - has no pagination

This PR adds:

1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning
   `{runtime, supported, lines, cursor, more, source}`. Default returns
   `supported: false` so non-claude-code runtimes pass through gracefully.

2. `ClaudeCodeAdapter.transcript_lines` override — reads the most-
   recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves
   cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the
   project dir name matches what Claude Code actually writes to. Limit
   capped at 1000 to prevent OOM.

3. Workspace HTTP route `GET /transcript` — Starlette handler added
   alongside the A2A app. Trusts the internal Docker network (same
   model as POST / for A2A); Phase 30 remote-workspace auth is a
   follow-up.

4. Platform proxy `GET /workspaces/:id/transcript` — looks up the
   workspace's URL, forwards GET, caps response at 1MB. Gated by
   existing `WorkspaceAuth` middleware (same as /traces, /memories,
   /delegations).

Tests: 6 Python unit tests cover empty dir / pagination / multi-session
/ malformed lines / limit cap, plus 4 Go tests cover 404 / proxy
forwarding / query-string propagation / unreachable-workspace 502.

Verified end-to-end on a live workspace — returns real claude-code
session entries through the platform proxy.

## Follow-ups
- WebSocket variant for live streaming (instead of polling)
- Canvas UI tab "Transcript" between Activity and Traces
- LangGraph / DeepAgents / OpenClaw transcript adapters
- Phase 30 remote-workspace auth on /transcript
2026-04-15 14:29:43 -07:00
rabbitblood
d40a9d940c feat(hermes): Phase 2c — multi-turn history passed natively to all paths
Completes the Phase 2 scope by keeping conversation turns as turns across
all three dispatch paths. Pre-2c, history was flattened into a single user
message via shared_runtime.build_task_text, which worked as a fallback but
lost the model's native multi-turn awareness (role attribution,
instruction-following on mid-conversation corrections, system-prompt
grounding against prior turns).

Phase 2a + 2b shipped the dispatch infrastructure + per-provider native
paths. This PR uses them properly.

## What's new

- **`_history_to_openai_messages(user_message, history)`** (static) — maps
  A2A `(role, text)` tuples to OpenAI Chat Completions
  `[{"role":"user"|"assistant","content":str}]`. Roles: `human`→`user`,
  `ai`→`assistant`. Current turn appended as the final user message.

- **`_history_to_anthropic_messages`** (static) — identical wire shape to
  OpenAI for text-only turns, so it delegates. Phase 2d tool_use/vision
  blocks will diverge here.

- **`_history_to_gemini_contents`** (static) — Gemini uses a different
  shape: `role="user"|"model"` (NOT "assistant") and text wrapped in
  `parts=[{"text":...}]`. Delegates to none of the others.

- **`_do_openai_compat(user_message, history=None)`** — accepts history,
  builds messages via `_history_to_openai_messages`. Back-compat: pass
  `history=None` to get the old single-turn behavior.

- **`_do_anthropic_native(user_message, history=None)`** — same signature
  change, calls `_history_to_anthropic_messages`. Still uses
  `anthropic.AsyncAnthropic().messages.create()`, just with proper
  multi-turn.

- **`_do_gemini_native(user_message, history=None)`** — same pattern,
  calls `_history_to_gemini_contents`, passes to Gemini's
  `generate_content(contents=...)`.

- **`_do_inference(user_message, history=None)`** — new signature,
  dispatches by auth_scheme as before, passes both args through.

- **`execute()`** — no longer calls `build_task_text`. Calls
  `extract_history(context)` directly and forwards to `_do_inference`.
  Removes the `build_task_text` import (not needed in this file anymore).
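The role mappings above can be sketched as follows (helper names simplified from the underscored originals; only the text-only mapping described in this commit is shown):

```python
def history_to_openai_messages(user_message, history=None):
    """A2A (role, text) tuples -> OpenAI Chat Completions messages.
    Roles: human -> user, ai -> assistant; current turn appended last."""
    role_map = {"human": "user", "ai": "assistant"}
    messages = [{"role": role_map[r], "content": t}
                for r, t in (history or [])]
    messages.append({"role": "user", "content": user_message})
    return messages

def history_to_gemini_contents(user_message, history=None):
    """Gemini shape: role is "user"/"model" (NOT "assistant"), and text is
    wrapped in parts=[{"text": ...}]."""
    role_map = {"human": "user", "ai": "model"}
    contents = [{"role": role_map[r], "parts": [{"text": t}]}
                for r, t in (history or [])]
    contents.append({"role": "user", "parts": [{"text": user_message}]})
    return contents
```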

## Tests

Existing 7 dispatch tests updated for the new `(user_message, history)`
signature — they assert the path is called with `("hello", None)` since
they pass no history.

5 NEW tests:

- `test_history_to_openai_messages_empty_history` — empty history degrades
  to single user message (back-compat)
- `test_history_to_openai_messages_multi_turn` — round-trip of a 3-turn
  history + current turn
- `test_history_to_anthropic_messages_same_as_openai` — cross-check that
  anthropic path produces identical wire shape for text-only
- `test_history_to_gemini_contents_uses_model_role_and_parts_wrapper` —
  verifies the Gemini-specific role mapping (`ai`→`model`) + parts wrapper
- `test_dispatch_passes_history_through` — end-to-end: _do_inference
  forwards history to the chosen provider path

All 41 tests pass (15 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    41 passed in 0.07s

## Back-compat

- No public API changes to `create_executor()`. Callers that hit
  `execute()` via A2A get the new multi-turn behavior automatically via
  `extract_history(context)`.
- Callers that passed an empty history list (or None) get the same
  single-turn behavior as pre-2c.
- The `build_task_text` helper in shared_runtime is unchanged — other
  adapters (AutoGen, LangGraph) that use it keep working. Only Hermes
  bypasses it now.

## What's NOT in this PR (Phase 2d)

- Tool calling / function calling on native paths (anthropic `tools=`,
  gemini `tools=Tool(function_declarations=[...])`)
- Vision content blocks (image_url → anthropic `{type:"image", source:
  {type:"base64",...}}` / gemini `{inline_data:{mime_type,data}}`)
- System instructions pass-through (anthropic `system=`, gemini
  `system_instruction=`)
- Streaming (`astream_messages` / `streamGenerateContent` stream variants)
- Extended thinking (anthropic `thinking={"type":"enabled"}`) / Gemini
  thinking config

Phase 2c is the **multi-turn upgrade**. Tool + vision + streaming are
Phase 2d, scoped in project_hermes_multi_provider.md.

## Related

- #240 Phase 2a (native Anthropic dispatch — in main)
- #255 Phase 2b (native Gemini dispatch — in main)
- Phase 1 (#208 — provider registry baseline, in main)
- `project_hermes_multi_provider.md` queued memory
- CEO 2026-04-15: "focus on supporting hermes agent"
2026-04-15 14:21:10 -07:00
Hongming Wang
825b8a227f Merge pull request #255 from Molecule-AI/feat/hermes-phase2b-gemini-native
feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path
2026-04-15 14:01:00 -07:00
Hongming Wang
353dc306e9 Merge pull request #240 from Molecule-AI/feat/hermes-phase2-native-sdks
feat(hermes): Phase 2a — native Anthropic Messages API dispatch (auth_scheme='anthropic')
2026-04-15 14:00:51 -07:00
Hongming Wang
66120e6c37 fix(tests): hermes provider env-var leak broke test_hermes_smoke
Pre-existing flaky test: when the full workspace-template suite ran in
collection order, test_hermes_smoke.py::test_create_executor_raises_without_keys
failed with "DID NOT RAISE ValueError". Failure only

surfaced when test_hermes_providers ran first.

Root cause: test_hermes_providers had an autouse fixture that used
monkeypatch.delenv on entry, but several tests in that file mutate
os.environ directly (e.g. `os.environ["HERMES_API_KEY"] = "test"`),
bypassing monkeypatch. monkeypatch only tracks its own deltas, so on
fixture teardown the direct-mutation values stayed in os.environ.
HERMES_API_KEY leaked across file boundaries into test_hermes_smoke,
which then saw a key present when it expected absence.

Fix: replace monkeypatch-based fixture with pure snapshot/restore:
- Snapshot all provider env vars at entry
- Clear them
- yield (test runs, may mutate freely)
- try/finally restore the exact pre-test state

This is deterministic regardless of whether a test uses monkeypatch,
direct mutation, or neither. Also adds a comment documenting WHY we
switched away from monkeypatch so a future reviewer doesn't revert.
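The snapshot/restore pattern, shown here as a plain context manager rather than the pytest fixture the commit describes (the env-var list is an illustrative subset):

```python
import os
from contextlib import contextmanager

PROVIDER_ENV_VARS = ["HERMES_API_KEY", "OPENROUTER_API_KEY", "GEMINI_API_KEY"]

@contextmanager
def clean_provider_env():
    """Snapshot all provider env vars, clear them, and restore the exact
    pre-test state on exit. Unlike monkeypatch, this survives tests that
    mutate os.environ directly."""
    snapshot = {k: os.environ.get(k) for k in PROVIDER_ENV_VARS}
    for k in PROVIDER_ENV_VARS:
        os.environ.pop(k, None)
    try:
        yield
    finally:
        for k, v in snapshot.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v
```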

Full workspace-template suite: 1169 passed, 9 skipped, 2 xfailed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:59:48 -07:00
rabbitblood
485dcb4cae feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.

## What's new in this PR

1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
   update `base_url` from the OpenAI-compat endpoint
   (`/v1beta/openai`) to the bare host
   (`https://generativelanguage.googleapis.com`) which the native SDK
   uses.

2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
   `google.genai.Client().aio.models.generate_content(...)`. Dispatch
   table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
   Same fail-loud semantics as `_do_anthropic_native` — missing SDK
   raises a clear RuntimeError with install instructions.

3. `requirements.txt`: add `google-genai>=1.0.0`.

4. `test_hermes_phase2_dispatch.py`: +3 tests
   - `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
     validated
   - `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
     gemini native, not openai-compat or anthropic-native
   - `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
     on missing `google-genai` package
   Plus updated existing dispatch tests to mock `_do_gemini_native`
   alongside the other paths so "no cross-calls" assertions stay tight.

All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    36 passed in 0.07s

## Dispatch table after this PR

    auth_scheme="openai"     → _do_openai_compat (13 providers)
    auth_scheme="anthropic"  → _do_anthropic_native (1 provider, Phase 2a)
    auth_scheme="gemini"     → _do_gemini_native (1 provider, Phase 2b) ← NEW
    <unknown>                → _do_openai_compat + warning (forward-compat)

## Back-compat

- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
  instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
  automatically — no caller changes needed

## What's NOT in this PR (future phases)

- Streaming support (`astream_messages` / `streamGenerateContent` stream
  variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
  gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path

Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.

## Stacking

This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main, GitHub auto-rebases this PR onto the new
main head. If reviewer prefers a single combined PR, close #240 and land
this one instead — the commits on feat/hermes-phase2-native-sdks are
already included in this branch's history.

## Related

- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory — Phase 2 was the next
  item, this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
  eco-watch entry that catalogued Hermes's native provider list and
  shaped the original Phase 2 scope
2026-04-15 13:20:39 -07:00
rabbitblood
3985d80220 feat(hermes): Phase 2a — native Anthropic Messages API dispatch path
Completes the Hermes adapter's native-SDK plan for the provider that gains
the most from leaving OpenAI-compat: Anthropic. OpenAI-compat works fine for
plain text turns on every provider (Phase 1 covered that with one code path
for all 15 providers), but Anthropic's Messages API has first-class tool use,
vision content blocks, and extended thinking that the OpenAI-compat shim
strips or mis-translates.

Rather than ship all native SDK paths in one PR (Anthropic + Gemini + future),
this lands Anthropic only (Phase 2a). Gemini is Phase 2b, shipping after a
production measurement window on Phase 2a.

## Design

Providers now dispatch by `auth_scheme` field. Phase 1 added the field but
every provider used `"openai"`. Phase 2 flips `anthropic` to `"anthropic"`
and wires a second inference path keyed on that:

- `HermesA2AExecutor._do_openai_compat(task_text)` — existing path, handles
  14 of 15 providers (Nous Portal, OpenRouter, OpenAI, xAI, Gemini, Qwen,
  GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral)
- `HermesA2AExecutor._do_anthropic_native(task_text)` — NEW, uses the
  official `anthropic` Python SDK's `AsyncAnthropic().messages.create(...)`
- `HermesA2AExecutor._do_inference(task_text)` — dispatches by
  `self.provider_cfg.auth_scheme`

Unknown schemes fall back to OpenAI-compat with a logged warning, so future
provider additions don't crash if a native SDK path ships late.
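The dispatch-plus-fallback logic can be sketched as a table-driven stand-in for `_do_inference` (names and the `paths` mapping are hypothetical):

```python
import logging

def do_inference(scheme, task_text, paths):
    """paths maps auth_scheme -> inference callable. Unknown schemes fall
    back to the OpenAI-compat path with a logged warning, so a provider
    added before its native SDK path ships does not crash."""
    handler = paths.get(scheme)
    if handler is None:
        logging.warning("unknown auth_scheme %r, falling back to openai-compat",
                        scheme)
        handler = paths["openai"]
    return handler(task_text)
```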

## Fail-loud on missing SDK

`_do_anthropic_native` raises a clear `RuntimeError` with install
instructions if the `anthropic` package is missing at runtime:

    Hermes anthropic native path requires the `anthropic` package. Install
    in the workspace image with `pip install anthropic>=0.39.0` or set
    HERMES provider=openrouter to route Claude models through OpenRouter's
    OpenAI-compat shim instead.

This is intentional: silent fallback would mask fidelity loss (tool_use
blocks become plain text, vision gets stripped). Loud failure is better.

`requirements.txt` adds `anthropic>=0.39.0` so the package is baked into
the workspace-template image build path. Operators building custom workspace
images without anthropic installed get the loud error.

## Back-compat

- `create_executor(hermes_api_key="x")` → still routes to Nous Portal
  (`auth_scheme="openai"`), unchanged
- `HERMES_API_KEY` env var → still first in RESOLUTION_ORDER
- `OPENROUTER_API_KEY` env var → still second
- All 14 OpenAI-compat providers unchanged — they take the same code path
  as before
- ONLY `anthropic` provider changes behavior: it now uses the native
  Messages API instead of the `/v1/chat/completions` compat shim

## Constructor signature change

`HermesA2AExecutor.__init__` now takes `provider_cfg: ProviderConfig`
instead of separate `api_key + base_url + model`. The three fields are
derived from `provider_cfg` + an optional model override. This is a
breaking change for any external caller building an executor directly,
but the only documented public entry point is `create_executor()`, which
is updated in the same commit to pass the cfg through.

## Test coverage

`workspace-template/tests/test_hermes_phase2_dispatch.py` — 7 new tests:

1. `test_anthropic_entry_has_anthropic_scheme` — registry flip
2. `test_all_other_providers_still_openai_scheme` — regression guard
3. `test_dispatch_openai_scheme_calls_openai_compat` — happy path
4. `test_dispatch_anthropic_scheme_calls_anthropic_native` — happy path
5. `test_dispatch_unknown_scheme_falls_back_to_openai_compat` — forward compat
6. `test_anthropic_native_raises_clear_error_when_sdk_missing` — fail-loud
7. `test_create_executor_passes_provider_cfg` — constructor wiring

All pass locally (pytest tests/test_hermes_phase2_dispatch.py -v, 0.04s).
Phase 1 tests unchanged: `test_hermes_providers.py` 26/26 pass, no
regressions.

## What's NOT in this PR (Phase 2b)

- Gemini native `generateContent` path (`auth_scheme="gemini"`)
- Streaming support across both native paths (`astream_messages`, `streamGenerateContent`)
- Tool calling on the anthropic native path (the `tools` + `tool_use` blocks)
- Vision content blocks (image_url → anthropic image blocks)
- Extended thinking parameter passthrough

All scoped in `project_hermes_multi_provider.md`. Phase 2a is the minimum
viable native Anthropic dispatch — single-turn text in, text out, no tools.

## Related

- Phase 1 baseline (already in main): #208 — provider registry + OpenAI-compat path
- Queued memory: `project_hermes_multi_provider.md` — full phased plan
- Triggering directive: CEO 2026-04-15 — "once current works are cleared,
  focus on supporting hermes agent"
2026-04-15 12:23:56 -07:00
Hongming Wang
1c41c30310 fix(workspace-template): #220 — send auth_headers() on initial_prompt + idle loop
Closes #220. #215 added auth_headers() to /registry/register but missed
two other self-post paths from the same workspace container:

1. initial_prompt (_do_send_sync) — fires once on first boot after the
   A2A server is ready. Posts to /workspaces/:id/a2a via the platform
   proxy. Missing headers meant the initial prompt got silently
   dropped as 401 on any token-enrolled workspace.

2. idle loop (_post_sync) — fires every idle_interval_seconds while
   the workspace has no active task (#205 pattern). Same proxy path,
   same missing headers, same silent 401 in multi-tenant mode.

Both now build headers as
  {"Content-Type": "application/json", **auth_headers()}

auth_headers() returns {"Authorization": "Bearer <token>"} when
/auth-token.txt exists, empty dict otherwise (first boot before
register issues the token). The existing lazy-bootstrap fail-open
on the platform side covers the empty-dict case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:02:01 -07:00
Hongming Wang
dc5c4b9dfa Merge pull request #231 from Molecule-AI/fix/160-sdk-error-probe
fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr
2026-04-15 11:58:59 -07:00
Hongming Wang
a202db15e1 Merge branch 'main' into fix/160-sdk-error-probe 2026-04-15 11:54:13 -07:00
Hongming Wang
715ecc2caf Merge branch 'main' into fix/issue-215-register-auth 2026-04-15 11:54:09 -07:00
Hongming Wang
54b49ffd1b fix(code-review): idle loop hardening + idle_prompt docs + admin-auth runbook
Addresses items 4, 5, 7 from the self-review of the batch merge. PR A
(#228) covered items 1, 2, 3, 6 on the Go side.

## workspace-template/main.py — idle loop hardening

- Replace asyncio.get_event_loop() with asyncio.get_running_loop() —
  the former is deprecated in 3.12+ and emits a DeprecationWarning on
  every idle fire.
- Replace hardcoded urlopen timeout=600 with IDLE_FIRE_TIMEOUT_SECONDS
  clamped to max(60, min(300, idle_interval_seconds)). Long-cadence
  workspaces no longer hold dangling requests open for 10 minutes; the
  cap adapts automatically when the interval is short.
- Type the exception handling: split HTTPError (has .code) from URLError
  (connection-level) from the generic catch-all. Log status + error
  class separately so operators can grep for specific failure modes
  instead of a bare "post failed".
- Fire-and-forget no longer loses exceptions. run_in_executor Future
  now has an add_done_callback that logs the outcome, so a panic in
  _post_sync surfaces as "Idle loop: post failed — status=None err=..."
  instead of Python's default "Task exception was never retrieved"
  warning buried in stderr.
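The timeout clamp in the second bullet, as a one-liner sketch:

```python
def idle_fire_timeout(idle_interval_seconds):
    """Cap dangling idle-fire requests at 5 minutes, floor at 1 minute,
    and track the configured interval in between."""
    return max(60, min(300, idle_interval_seconds))
```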

## org-templates/molecule-dev/org.yaml — discoverability

Added idle_prompt + idle_interval_seconds to the defaults: block with
explanatory comments. Without this, users had to read main.py to
discover the feature.

## docs/runbooks/admin-auth.md — new

Documents the three middleware variants (AdminAuth strict,
CanvasOrBearer soft, WorkspaceAuth per-id), the exact contract of each,
and the three-question test for adding a new route to CanvasOrBearer.
Also flags the session-cookie follow-up as Phase H.

Referenced PRs: #138, #164, #165, #166, #167, #168, #190, #194, #203,
#228.

No code deltas in platform/ beyond the Python + YAML + docs changes.
Full pytest suite unchanged except the pre-existing test_hermes_smoke
flake that fails in full-suite but passes in isolation (test isolation
bug, not introduced by this PR).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:52:01 -07:00
rabbitblood
1151265b72 fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr
Context: when the claude-agent-sdk wraps a stream error from the CLI
subprocess that it can't categorize (rate limit, auth, network), it
raises a bare `Exception("Command failed with exit code 1\nError output:
Check stderr output for details")`. The exception has no `.stderr` or
`.exit_code` attributes, so #66's `_format_process_error` — which reads
those attributes — has nothing to surface. The log line becomes:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 (exit code: 1)\nError output: Check stderr output for details

That's the placeholder text from the SDK's error path, not the actual
error. Operators chasing a stuck workspace are forced to `docker exec
ws-xxx claude --print` manually to discover the real cause. Observed
today during the rate-limit incident: every PM error line was identical
"Check stderr output for details" while the real cause ("You've hit
your limit · resets Apr 17, 11pm (UTC)") was only visible via manual
reproduction — that cost ~20 minutes of diagnosis time.

## Fix

Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs
`claude --print` with a small probe input, captures stderr+stdout, and
returns the real error string. Bounded by 30s timeout so a hung CLI
can't stall the error path.

Extend `_format_process_error` with ONE narrow fallback: if the
exception has no stderr/exit_code AND its message contains the specific
"Check stderr output for details" marker, call the probe and append
`probed_cli_error=<real error>` to the formatted line.

Critically: the probe only runs in the narrow case where we have
nothing else to log. If `.stderr` or `.exit_code` are present (the
normal ProcessError path from #66), the probe is skipped — no wasted
subprocess, no 30s latency on every error.

## Test coverage

`workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests:
- `test_format_process_error_probes_cli_when_stderr_swallowed` — the
  happy path: exception matches the marker, probe runs, result appears
  in the formatted line. Probe is monkeypatched so no subprocess spawns
  in the test.
- `test_format_process_error_does_not_probe_when_stderr_already_present` —
  negative: regular ProcessError with `.stderr` set does NOT trigger
  the probe (skip the wasted call).
- `test_format_process_error_does_not_probe_without_swallowed_marker` —
  negative: unrelated plain exceptions (e.g. RuntimeError) do NOT
  trigger the probe (so the common-case error path stays fast).

All 7 `_format_process_error` tests pass locally (4 existing + 3 new):
```
pytest tests/test_claude_sdk_executor.py -k format_process_error
======================= 7 passed in 0.06s ========================
```

## Impact

Next time the SDK swallows a real error (rate limit, auth failure,
network outage), the workspace log will contain the actual error string
alongside the generic placeholder:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 ... | probed_cli_error="You've hit your limit · resets Apr
    17, 11pm (UTC)"

Diagnosis time drops from "docker exec each ws, run claude --print,
read stderr" (~20 min) to "grep probed_cli_error in platform logs"
(~10 seconds).

Closes #160.
2026-04-15 11:50:55 -07:00
Dev Lead Agent
b8f810dd21 fix(workspace-template): include auth_headers() on /registry/register POST
The register call was missing headers=auth_headers(), so workspaces that
already have a persisted token (i.e. every restart after the first boot)
were sending an unauthenticated request. The platform's register handler
returns 401 for requests missing a valid bearer token once a token has
been issued, causing re-registration to fail on every restart.

Import auth_headers at the module level (alongside the existing save_token
inline import) and pass it to the httpx POST. auth_headers() returns {}
when no token is on file yet (first boot), so there is no regression for
fresh workspaces — the platform still issues a token on the 200 response
and save_token() persists it for all subsequent restarts.

Closes #215

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:44:53 +00:00
Hongming Wang
7f11328e22 Merge pull request #205 from Molecule-AI/feat/workspace-idle-loop
feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape, opt-in, ~90 LOC)
2026-04-15 11:21:47 -07:00
Hongming Wang
8a011c9f51 Merge remote-tracking branch 'origin/main' into feat/workspace-idle-loop 2026-04-15 11:21:15 -07:00
Hongming Wang
80ae2bd6ad Merge remote-tracking branch 'origin/main' into feat/hermes-phase1-provider-registry 2026-04-15 11:20:51 -07:00
Hongming Wang
7d7d5995e0 fix(workspace-template): #204 — drop PushNotificationSender (abstract class)
Closes #204. PR #198 wired push_sender=PushNotificationSender() into
DefaultRequestHandler to satisfy #175's push-notification capability,
but PushNotificationSender in a2a-sdk is an abstract base class and
cannot be instantiated. Every workspace container crashed on startup
with TypeError.

Reverted to DefaultRequestHandler's defaults. The pushNotifications
capability still appears in AgentCard.capabilities (advertised to A2A
clients) but actual implementation of the sender is deferred to a
Phase-H follow-up that subclasses PushNotificationSender properly.

Existing pytest suite unchanged (the crash was only at runtime on
main.py import, which no existing test exercises directly).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:18:52 -07:00
rabbitblood
8d8ca18bc0 feat(hermes): Phase 1 — multi-provider registry (15 providers, back-compat preserved)
Ships the first half of the queued Hermes adapter expansion. PR 2 only
supported Nous Portal + OpenRouter; this adds 13 more providers reachable
via OpenAI-compat endpoints. Native SDK paths for Anthropic + Gemini are
Phase 2 (better tool-calling + vision fidelity).

## What's new

**`workspace-template/adapters/hermes/providers.py`** (new file, 220 LOC):
- ``ProviderConfig`` dataclass: name, env vars, base URL, default model, auth scheme, docs
- ``PROVIDERS`` dict with 15 entries across 4 groups:
  - PR 2 baseline: nous_portal, openrouter
  - Frontier commercial: openai, anthropic, xai, gemini
  - Chinese providers: qwen, glm, kimi, minimax, deepseek
  - OSS/alt: groq, together, fireworks, mistral
- ``RESOLUTION_ORDER`` tuple: priority for auto-detect (back-compat first,
  then commercial, then Chinese, then OSS/alt)
- ``resolve_provider(explicit=None)`` -> (ProviderConfig, api_key)
  - With explicit name: routes to that provider, raises if env var empty
  - Without: walks RESOLUTION_ORDER, first env-var-set provider wins

**`workspace-template/adapters/hermes/executor.py`** (refactored):
- `create_executor(hermes_api_key=None, provider=None, model=None)` now has
  three parameters:
  - `hermes_api_key`: PR 2 back-compat — routes to Nous Portal
  - `provider`: canonical short name from the registry (e.g. "anthropic")
  - `model`: optional override of the provider's default model
- Delegates all resolution to `providers.resolve_provider()` — no more
  hardcoded URLs or env var lookups in the executor itself
- `HermesA2AExecutor.__init__` no longer has Nous-specific defaults; callers
  pass base_url + model explicitly (which create_executor always does)

**`workspace-template/tests/test_hermes_providers.py`** (new file, 26 tests):
- Registry shape invariants (count >= 15, no duplicates, every config valid)
- PR 2 back-compat: HERMES_API_KEY / OPENROUTER_API_KEY still route correctly
- Auto-detect for every provider in the registry (parametrized — guards against
  typos in env var lists)
- Explicit `provider=` bypass of auto-detect
- Error cases: unknown provider, explicit-but-empty, auto-detect-with-no-env
- All 26 tests pass locally in 0.08s

## Back-compat guarantees

| Scenario | PR 2 behavior | This PR behavior |
|---|---|---|
| `create_executor(hermes_api_key="x")` | Nous Portal | Nous Portal (unchanged) |
| `HERMES_API_KEY=x` env, auto-detect | Nous Portal | Nous Portal (unchanged) |
| `OPENROUTER_API_KEY=x` env, auto-detect | OpenRouter | OpenRouter (unchanged) |
| Both env + explicit hermes_api_key param | Nous Portal (param wins) | Nous Portal (param wins, unchanged) |

Nothing existing can break. New callers gain access to 13 more providers.

## What's NOT in this PR (Phase 2)

- **Native Anthropic Messages API path** — better tool calling, vision, extended
  thinking. Requires pulling in `anthropic` SDK. ~50 LOC.
- **Native Gemini generateContent path** — for vision + google tools. Requires
  `google-genai` SDK. ~50 LOC.
- **Streaming support across all providers** — current executor is non-streaming
  (single chat.completions.create call). Streaming works with openai.AsyncOpenAI
  but hasn't been wired to the A2A event queue path. ~30 LOC.
- **Per-provider model overrides in config.yaml** — Phase 1 uses the registry's
  default_model. Phase 2 adds a `hermes: { provider: qwen, model: qwen3-coder-plus }`
  block in the workspace config.
- **`.env.example` updates** — not critical since the registry itself documents
  every env var via the `env_vars` field, but nice-to-have.

## Related
- Queued memory: `project_hermes_multi_provider.md`
- CEO directive 2026-04-15: *"once current works are cleared, I want you to
  focus on supporting hermes agent, right now it doesnt take too much providers"*
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch
  entry listed "Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …" which
  shaped this registry's initial set

## Test plan
- [x] Unit tests: 26/26 pass locally (pytest)
- [ ] CI will run on the self-hosted macOS arm64 runner
- [ ] Smoke test in a real workspace: set QWEN_API_KEY and verify Technical
      Researcher actually hits Alibaba DashScope successfully
- [ ] Integration test per provider with real API keys (gated on env, skip
      when not set — Phase 2 CI addition)
2026-04-15 11:14:35 -07:00
rabbitblood
37bca9176e feat(workspace): add idle-loop reflection pattern (Hermes/Letta shape)
Today's multi-framework research (Hermes, Letta, Trigger.dev, Inngest, AG2,
Rivet, n8n, Composio, SWE-agent — see docs/ecosystem-watch.md) confirmed
that nobody runs while(true) per agent. The working patterns are:

  (a) event-driven + hibernation (Hermes, Letta, Trigger.dev, Inngest)
  (b) cron/user-triggered ephemeral runs (AG2, Rivet, n8n, SWE-agent)

Molecule AI is currently 100% in category (b). Observed team utilization:
~0.5% — agents idle 99.5% of the time because cron fires and CEO-typed
A2A are the only initiating signals. CEO's north-star is 24/7 iteration,
current cadence falls short.

This PR closes the gap by adding an in-workspace idle loop that wakes the
agent periodically ONLY when it has no active task. The shape is the
Hermes reflection-on-completion pattern combined with the Letta backlog-pull
pattern, collapsed into a ~60 LOC change in the workspace-template. Zero
new Go code. Zero new DB tables. Zero new API endpoints.

## How it works

1. `config.py` gets two new fields on WorkspaceConfig:
   - `idle_prompt: str = ""` — the prompt to self-send when idle
   - `idle_interval_seconds: int = 600` — how often to check (default 10 min)
   Both support inline or file ref (matching the initial_prompt pattern).

2. `main.py` spawns an `_run_idle_loop()` asyncio task alongside the
   existing initial_prompt task (same lifecycle hooks — cancelled in the
   `finally:` of the server.serve() block).

3. The loop body:
   a. Sleep interval
   b. Check `heartbeat.active_tasks == 0` LOCALLY (no LLM call, no HTTP)
   c. If idle → self-POST the idle_prompt via the existing /workspaces/{id}/a2a proxy
   d. Loop
   The agent's own concurrency control rejects the post if it becomes busy
   between the check and the POST — that's the safety valve.

4. Gated on `config.idle_prompt` being non-empty. Default = "" = no loop.
   Existing workspaces upgrade silently as no-ops until someone explicitly
   opts in by setting idle_prompt in org.yaml (either defaults: or
   per-workspace:).

## Cost analysis (from the research report)

- while(true) pattern: ~$93/day/org (12 agents × 12 thinks/hour × 24 h × $0.027). Unshippable.
- Hermes reflection-on-completion: ~$0.45/day/org. Cost ∝ useful work.
- This PR's idle loop at 10-min cadence: upper bound 12 × 6/hour × 24h
  × ~3k tokens × Sonnet rate ≈ $5/day/org PER ROLE, only if they're
  genuinely idle every check. In practice far less because busy periods
  skip the LLM call entirely (the active_tasks check is local).

## Rollout plan

Research report recommended rolling to ONE workspace first (Technical
Researcher) and measuring 24h of activity_logs before enabling for
all 12. This PR enables the mechanism; it does NOT add any default
idle_prompt to org-templates/molecule-dev/org.yaml. That's a follow-up
PR after this one lands and one workspace has been manually opted in
for measurement.

## Not touched in this PR

- No Go code (no new platform endpoint, no new DB columns)
- No org.yaml changes (zero-impact until someone opts in)
- No scheduler changes (the idle loop is a workspace concern, not a
  scheduler concern — matches the research report's layering)

## Test plan

- [x] Python syntax check (ast.parse) on main.py + config.py
- [ ] Unit test: WorkspaceConfig parses idle_prompt / idle_interval_seconds from yaml
- [ ] Integration test: set idle_prompt on Technical Researcher, measure that
      an A2A message is received every ~10 min while idle, and NOT received
      while busy with a delegation
- [ ] Dogfood: enable on Technical Researcher for 24h, count activity_logs
      delta vs baseline, confirm cost stays within model

## Related

- Today's research report (conversation output, summarized in commit trailer)
- docs/ecosystem-watch.md → `### Hermes Agent` (the canonical reflection-on-completion example)
- #159 orchestrator/worker split — complementary: leaders pulse for dispatch,
  workers idle-loop for pull. Together: leaders push work, workers pull work,
  no role ever sits idle with a cold queue.
2026-04-15 11:09:43 -07:00
Backend Engineer
4cea1c6478 fix(a2a): cancel() event, stateTransitionHistory capability, wire push store (#173 #174 #175)
#173 — implement cancel() in LangGraphA2AExecutor: emits
TaskStatusUpdateEvent(state=canceled, final=True) so clients see the
state transition rather than silence. Removes pragma: no cover.
Test: test_cancel_emits_canceled_event.

#174 — add stateTransitionHistory=True to AgentCapabilities in main.py
so microsoft/agent-framework clients know they can request full task
history via the A2A protocol.

#175 — wire InMemoryPushNotificationConfigStore and PushNotificationSender
into DefaultRequestHandler so the advertised pushNotifications capability
is backed by a real store. Both classes live in a2a.server.tasks (a2a-sdk
0.3.25); import confirmed by probe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:58:10 +00:00
Hongming Wang
119b02c544 feat(plugins): split guardrails into 12 modular plugins
Replaces the proposed monolithic molecule-guardrails plugin with 12
single-purpose plugins users can install à la carte. Powered by a
small extension to the AgentskillsAdaptor base class so any plugin can
ship hooks/, commands/, and a settings-fragment.json without writing a
custom adapter.

## Base adapter changes

workspace-template/plugins_registry/builtins.py + sdk/python/molecule_plugin/builtins.py
(both copies — drift-tested):
- New _install_claude_layer() helper called at the end of install()
- Conditionally copies hooks/ → /configs/.claude/hooks/ (preserving exec bit)
- Conditionally copies commands/*.md → /configs/.claude/commands/
- Conditionally merges settings-fragment.json into /configs/.claude/settings.json
  with ${CLAUDE_DIR} placeholder rewritten to the workspace's absolute install
  path. Existing user hooks are preserved (deep-merge by event name).
- All steps no-op when the plugin doesn't ship the corresponding files,
  so existing skill+rule plugins (molecule-dev, superpowers, ecc,
  browser-automation) are unchanged.
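The deep-merge-by-event-name behaviour can be sketched as below. The fragment shape (a top-level `hooks` object keyed by event name) and the merge policy are assumptions inferred from this commit message, not the shipped builtins.py:

```python
import json
from pathlib import Path

def merge_settings_fragment(settings_path: Path, fragment: dict,
                            claude_dir: str) -> dict:
    """Merge a plugin's settings-fragment into settings.json, rewriting the
    ${CLAUDE_DIR} placeholder and preserving existing user hooks."""
    # Rewrite the placeholder everywhere in the fragment.
    fragment = json.loads(
        json.dumps(fragment).replace("${CLAUDE_DIR}", claude_dir))
    existing = (json.loads(settings_path.read_text())
                if settings_path.exists() else {})
    merged_hooks = dict(existing.get("hooks", {}))
    for event, entries in fragment.get("hooks", {}).items():
        # Preserve user hooks: append plugin entries rather than replace.
        merged_hooks[event] = merged_hooks.get(event, []) + entries
    merged = {**existing, **{k: v for k, v in fragment.items() if k != "hooks"}}
    merged["hooks"] = merged_hooks
    settings_path.write_text(json.dumps(merged, indent=2))
    return merged
```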

Drift test (tests/test_plugins_builtins_drift.py) still passes.

## 12 new plugins

Hook plugins (ambient enforcement):
- molecule-careful-bash       — refuses destructive bash; ships careful-mode skill
- molecule-freeze-scope       — locks edits via .claude/freeze
- molecule-audit-trail        — appends every Edit/Write to audit.jsonl
- molecule-session-context    — auto-loads cron-learnings at session start
- molecule-prompt-watchdog    — injects warnings on destructive prompt keywords

Skill plugins (on-demand):
- molecule-skill-code-review        — 16-criteria multi-axis review
- molecule-skill-cross-vendor-review — adversarial second-model review
- molecule-skill-llm-judge          — deliverable-vs-request scoring
- molecule-skill-update-docs        — post-merge doc sync
- molecule-skill-cron-learnings     — operational-memory JSONL format

Workflow plugins (slash commands):
- molecule-workflow-triage  — /triage full PR-triage cycle
- molecule-workflow-retro   — /retro + cron-retro skill, weekly retrospective

Each ships only what it needs — most have just plugin.yaml + skills/ or
hooks/ + adapter (one-line stub: `from plugins_registry.builtins import
AgentskillsAdaptor as Adaptor`). Total ~120 files but each plugin is
small and self-contained.

## Verification

- python3 -m molecule_plugin validate plugins/molecule-* → all 13 valid
  (12 new + pre-existing molecule-dev)
- End-to-end install smoke test on representative samples: hook plugin
  (molecule-careful-bash), skill-only plugin (molecule-skill-code-review),
  workflow plugin (molecule-workflow-triage). All produce expected
  /configs/ tree, settings.json paths rewritten, exec bits preserved,
  zero warnings.
- workspace-template pytest tests/test_plugins_builtins_drift.py → passes
  (SDK + runtime stay in sync).

## CLAUDE.md repo-doc updated

Lists all 12 new plugins under the existing Plugins section, organized
by category (hook / skill / workflow). Each entry one line, recommend-
together hints where dependencies make sense.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:20:04 -07:00
Hongming Wang
cd4eb9c590 Merge pull request #49 from Molecule-AI/feat/hermes-pr2
feat(hermes): implement create_executor() with HERMES_API_KEY / OPENROUTER_API_KEY fallback + smoke tests
2026-04-14 08:16:15 -07:00
Hongming Wang
b6c2f15933 fix(workspace): recursive chown when /workspace bind mount is root-owned (#13)
On Docker Desktop (macOS/Windows), host-path bind mounts often appear
root-owned inside the container. The previous entrypoint only chowned
/workspace top-level, so agents (uid 1000) still couldn't write to
/workspace/repo/* — git clone, pip install, and file edits failed with
EACCES and fell back to /tmp. Detect the root-owned-contents case by
sampling the first entry; if it's root-owned, recursively chown the
tree. On normal Linux Docker with matching uids this is a no-op, so the
fast-startup path is preserved for the common case.
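The sample-then-recurse detection reads roughly like this (a Python sketch of the entrypoint logic described above; the real fix lives in a shell entrypoint, and uid/gid 1000 is taken from the commit text):

```python
import os
from pathlib import Path

AGENT_UID, AGENT_GID = 1000, 1000  # the workspace agent user

def fix_bind_mount_ownership(root: str) -> bool:
    """Detect the Docker-Desktop root-owned bind-mount case and repair it.

    Samples the first entry under `root`: if it is root-owned, chown the
    whole tree; otherwise no-op, preserving the fast-startup path on
    normal Linux Docker with matching uids.
    """
    entries = sorted(Path(root).iterdir())
    if not entries or entries[0].stat().st_uid != 0:
        return False  # common case: nothing to do
    for dirpath, dirnames, filenames in os.walk(root):
        os.chown(dirpath, AGENT_UID, AGENT_GID)
        for name in filenames:
            os.chown(os.path.join(dirpath, name), AGENT_UID, AGENT_GID)
    return True
```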

Part B of the issue (private-repo initial_prompt clone) was addressed
by PR #20.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:29:30 -07:00
Dev Lead Agent
363a55782b fix(security): complete Phase 30.6 auth headers in a2a_client get_peers and discover_peer
get_peers() was sending no auth headers to /registry/:id/peers — this would
return 401 for every workspace agent after PR #31 (WorkspaceAuth middleware)
deploys, breaking peer discovery entirely.

discover_peer() had X-Workspace-ID but was missing the bearer token, also
required by Phase 30.6 for /registry/discover/:id.

Both functions now send {"X-Workspace-ID": WORKSPACE_ID, **auth_headers()}.
get_workspace_info() was already correct (auth_headers() present since PR #39).

Adds test_request_sends_workspace_id_header to TestGetPeers; hardens the
discover_peer header assertion to use presence-check rather than exact equality.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 13:23:44 +00:00
Hongming Wang
a565c49bce Merge pull request #41 from Molecule-AI/fix/security-h3-m4
noteworthy: secrets-handling — H3 github_pat_ redaction + M4 atomic 0600 token write. 7-gate verification PASS.
2026-04-14 03:21:49 -07:00
Dev Lead Agent
1a109b3263 fix(security): H3 github_pat_ redaction + M4 atomic token write (audit cycle 10)
H3 (compliance.py): GitHub fine-grained PATs use the github_pat_ prefix
with an 82-character alphanumeric+underscore suffix — different from
classic tokens (36 chars). Add the missing pattern to _PII_PATTERNS so
fine-grained PATs are redacted in compliance logs alongside classic tokens.
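The pattern shapes can be sketched as below — the regexes are assumptions reconstructed from the prefix/length description in this commit, not the shipped compliance.py patterns:

```python
import re

_PII_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),           # classic PAT: 36-char suffix
    re.compile(r"github_pat_[A-Za-z0-9_]{82}"),   # fine-grained PAT (the H3 fix)
]

def redact(text: str) -> str:
    """Replace any token matching a PII pattern before it reaches the logs."""
    for pat in _PII_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text
```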

M4 (platform_auth.py): Replace write_text()+chmod() in save_token() with
os.open(O_WRONLY|O_CREAT|O_TRUNC, 0o600) + os.write(). The old approach
had a TOCTOU window where a concurrent reader could access the token file
before chmod restricted permissions. os.open with explicit mode creates the
file with 0600 permissions atomically in a single syscall.
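The fixed write path has this shape (a minimal sketch of the os.open pattern described above):

```python
import os

def save_token(path: str, token: str) -> None:
    """Create the token file with mode 0600 in the open() syscall itself —
    no TOCTOU window between write_text() and a later chmod()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, token.encode())
    finally:
        os.close(fd)
```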

H2 (a2a_client.py): Already fixed in commit 6c78962 (Cycle 5); no-op.

Tests: 1136 passed, 2 skipped (workspace-template pytest suite)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 09:34:27 +00:00
Backend Engineer
9649311d51 fix(security): N1 — add auth headers to all platform calls in Python callers
IMPACT WITHOUT THIS FIX: deploying PR #31 (WorkspaceAuth middleware on
/workspaces/*) without this patch causes EVERY delegation cycle to silently
break — the heartbeat poll returns 401, the self-message A2A POST returns
401, agents never wake up after task completion, and memory consolidation
stops. The entire multi-agent coordination system degrades to single-shot
interactions with no result delivery.

Changes (all using the existing platform_auth.auth_headers() pattern
already used for POST /registry/heartbeat):

heartbeat.py — 5 calls fixed:
  - GET  /workspaces/:id/delegations     (delegation poll)
  - GET  /workspaces/:id                 (self workspace info for parent lookup)
  - GET  /workspaces/{parent_id}         (parent workspace name lookup)
  - POST /workspaces/:id/a2a             (self-message to wake agent on results)
  - POST /workspaces/:id/notify          (canvas delegation result notification)
  Also moved `from platform_auth import auth_headers` from inline (per-call)
  to module-level import so _check_delegations() can use it without re-importing.

consolidation.py — 4 calls fixed:
  - GET    /workspaces/:id/memories      (fetch memories for consolidation)
  - POST   /workspaces/:id/memories      (write consolidated summary — agent path)
  - DELETE /workspaces/:id/memories/:id  (delete original memories post-consolidation)
  - POST   /workspaces/:id/memories      (write consolidated summary — fallback path)

a2a_client.py — 1 call fixed:
  - GET /workspaces/:id                  (get_workspace_info())

⚠️  DEPLOYMENT NOTE: This PR MUST be merged and deployed at the same time as
PR #31 (WorkspaceAuth middleware). Deploying #31 without this fix will
immediately break all delegation result delivery.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 08:37:50 +00:00
Hongming Wang
7d3e369632 fix(gate-3): update watcher test to expect SHA-256 hash
Align test_hash_file_real_file with the SHA-256 switch in watcher.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 01:21:35 -07:00
Dev Lead Agent
486275868d fix(security): H1 — replace MD5 with SHA-256 in config/skill watchers
Both watcher.py (ConfigWatcher) and skill_loader/watcher.py
(SkillsWatcher) used hashlib.md5() for file-integrity change detection.
MD5 is collision-prone: a crafted config file could produce the same
hash as a benign one, silently suppressing the hot-reload callback and
preventing agents from picking up legitimate config changes.

Replace hashlib.md5 → hashlib.sha256 in both _hash_file() methods.
Update docstrings, comments, and the type-annotation comment
(rel_path → md5 hex → sha256 hex).

Test update: test_skills_watcher.py — rename helper _md5 → _sha256,
update the hash-length assertion from 32 (MD5) to 64 (SHA-256), and
rename the test from test_hash_file_returns_md5_for_existing_file to
test_hash_file_returns_sha256_for_existing_file. All 25 watcher tests
pass.

Note: H2 (a2a_client.py timeout=None) was already fixed in Cycle 5
(timeout=httpx.Timeout(connect=30.0, read=300.0, ...)) — confirmed by
code review before opening this PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 07:52:07 +00:00
Dev Lead Agent
6c78962a33 fix(security): Cycle 5 — auth middleware, injection hardening, skill sandbox
Fix A — platform/internal/middleware/wsauth_middleware.go (NEW):
  WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on
  ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as
  secrets.Values: workspaces with no live token are grandfathered through.
  Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously.

Fix A — platform/internal/router/router.go:
  Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id)
  and /a2a remain on root router; all other /workspaces/:id/* sub-routes
  moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)).
  CORS AllowHeaders updated to include Authorization so browser/agent callers
  can send the bearer token cross-origin.

Fix B — workspace-template/heartbeat.py:
  _check_delegations(): validate source_id == self.workspace_id before
  accepting a delegation result. Attacker-crafted records with a foreign
  source_id are silently skipped with a WARNING log (injection attempt).
  trigger_msg no longer embeds raw response_preview text; references
  delegation_id + status only — removes the prompt-injection vector.

Fix C — workspace-template/skill_loader/loader.py:
  load_skill_tools(): before exec_module(), verify script is within
  scripts_dir (path traversal guard) and temporarily scrub sensitive env
  vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY,
  WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore
  in finally block. Defence-in-depth even if /plugins auth gate is bypassed.
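The scrub-and-restore step can be sketched as a context manager (an illustration of the pattern; the env var list is from this commit, the context-manager packaging is mine):

```python
import os
from contextlib import contextmanager

_SENSITIVE = ("CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
              "WORKSPACE_AUTH_TOKEN", "GITHUB_TOKEN", "GH_TOKEN")

@contextmanager
def scrubbed_env():
    """Temporarily remove sensitive vars around exec_module(); restore in a
    finally block even if the skill script raises."""
    saved = {k: os.environ.pop(k) for k in _SENSITIVE if k in os.environ}
    try:
        yield
    finally:
        os.environ.update(saved)
```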

Fix D — platform/internal/handlers/socket.go:
  HandleConnect(): agent connections (X-Workspace-ID present) validated via
  wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade.
  Canvas clients (no X-Workspace-ID) remain unauthenticated.

Fix D — workspace-template/events.py:
  PlatformEventSubscriber._connect(): include platform_auth bearer token in
  WebSocket upgrade headers alongside X-Workspace-ID.

Fix E — workspace-template/executor_helpers.py:
  recall_memories() and commit_memory() now pass platform_auth bearer token
  in Authorization header so WorkspaceAuth middleware allows access.

Fix F — workspace-template/a2a_client.py:
  send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300,
  write=30, pool=30). Resolves H2 flagged across 5 consecutive audits.

Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to
assert new source_id validation behaviour and allow Authorization header).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 04:44:42 +00:00
Dev Lead Agent
08fe37aee1 feat: implement Hermes adapter create_executor() with OpenRouter fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 16:47:29 -07:00
Hongming Wang
24fec62d7f initial commit — Molecule AI platform
Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1)
with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo.

Brand: Starfire → Molecule AI.
Slug: starfire / agent-molecule → molecule.
Env vars: STARFIRE_* → MOLECULE_*.
Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform.
Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent.
DB: agentmolecule → molecule.

History truncated; see public repo for prior commits and contributor
attribution. Verified green: go test -race ./... (platform), pytest
(workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:55:37 -07:00