molecule-core

Author	SHA1	Message	Date
Hongming Wang	825b8a227f	Merge pull request #255 from Molecule-AI/feat/hermes-phase2b-gemini-native feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path	2026-04-15 14:01:00 -07:00
Hongming Wang	353dc306e9	Merge pull request #240 from Molecule-AI/feat/hermes-phase2-native-sdks feat(hermes): Phase 2a — native Anthropic Messages API dispatch (auth_scheme='anthropic')	2026-04-15 14:00:51 -07:00
Hongming Wang	66120e6c37	fix(tests): hermes provider env-var leak broke test_hermes_smoke Pre-existing flaky test: when the full workspace-template suite ran in collection order, test_hermes_smoke.py::test_create_executor_raises_without_keys failed with "DID NOT RAISE ValueError". Failure only surfaced when test_hermes_providers ran first. Root cause: test_hermes_providers had an autouse fixture that used monkeypatch.delenv on entry, but several tests in that file mutate os.environ directly (e.g. `os.environ["HERMES_API_KEY"] = "test"`), bypassing monkeypatch. monkeypatch only tracks its own deltas, so on fixture teardown the direct-mutation values stayed in os.environ. HERMES_API_KEY leaked across file boundaries into test_hermes_smoke, which then saw a key present when it expected absence. Fix: replace monkeypatch-based fixture with pure snapshot/restore: - Snapshot all provider env vars at entry - Clear them - yield (test runs, may mutate freely) - try/finally restore the exact pre-test state This is deterministic regardless of whether a test uses monkeypatch, direct mutation, or neither. Also adds a comment documenting WHY we switched away from monkeypatch so a future reviewer doesn't revert. Full workspace-template suite: 1169 passed, 9 skipped, 2 xfailed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 13:59:48 -07:00
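A minimal sketch of the snapshot/restore approach. It is written as a plain context manager so it can run without pytest. The `PROVIDER_ENV_VARS` tuple and the `isolated_env` name are hypothetical, and the real fixture in test_hermes_providers may cover a different variable list. The key property is that restoration comes from the snapshot, not from monkeypatch's delta log. As a result, direct `os.environ[...] = ...` writes made inside the test are also undone.

```python
import os
from contextlib import contextmanager
from typing import Iterable, Iterator

# Hypothetical subset; the real fixture covers every provider env var in the registry.
PROVIDER_ENV_VARS = ("HERMES_API_KEY", "OPENROUTER_API_KEY", "GEMINI_API_KEY")


@contextmanager
def isolated_env(names: Iterable[str] = PROVIDER_ENV_VARS) -> Iterator[None]:
    """Snapshot, clear, then restore the exact pre-test values of `names`."""
    names = tuple(names)
    snapshot = {name: os.environ.get(name) for name in names}
    for name in names:
        os.environ.pop(name, None)
    try:
        yield  # the test body may mutate os.environ however it likes
    finally:
        for name, value in snapshot.items():
            if value is None:
                os.environ.pop(name, None)  # absent before: drop any leaked value
            else:
                os.environ[name] = value  # present before: put the original back


# As an autouse pytest fixture (sketch):
#
# @pytest.fixture(autouse=True)
# def _clean_provider_env():
#     with isolated_env():
#         yield
```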
rabbitblood	485dcb4cae	feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini via the official `google-genai` Python SDK. Stacked on top of Phase 2a (feat/hermes-phase2-native-sdks) which introduced the dispatch infra + the anthropic native path. ## What's new in this PR 1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and update `base_url` from the OpenAI-compat endpoint (`/v1beta/openai`) to the bare host (`https://generativelanguage.googleapis.com`) which the native SDK uses. 2. `executor.py`: new method `_do_gemini_native(task_text)` that uses `google.genai.Client().aio.models.generate_content(...)`. Dispatch table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`. Same fail-loud semantics as `_do_anthropic_native` — missing SDK raises a clear RuntimeError with install instructions. 3. `requirements.txt`: add `google-genai>=1.0.0`. 4. `test_hermes_phase2_dispatch.py`: +3 tests - `test_gemini_entry_has_gemini_scheme` — registry flip + base URL validated - `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs gemini native, not openai-compat or anthropic-native - `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud on missing `google-genai` package Plus updated existing dispatch tests to mock `_do_gemini_native` alongside the other paths so "no cross-calls" assertions stay tight. 
All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):
  pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
  36 passed in 0.07s
## Dispatch table after this PR
  auth_scheme="openai"    → _do_openai_compat (13 providers)
  auth_scheme="anthropic" → _do_anthropic_native (1 provider, Phase 2a)
  auth_scheme="gemini"    → _do_gemini_native (1 provider, Phase 2b) ← NEW
  <unknown>               → _do_openai_compat + warning (forward-compat)
## Back-compat - All 13 openai-scheme providers unchanged - `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged - Only `gemini` provider changes behavior: now uses native generateContent instead of the `/v1beta/openai` compat shim - Existing Gemini callers setting `GEMINI_API_KEY` get the native path automatically — no caller changes needed ## What's NOT in this PR (future phases) - Streaming support (`astream_messages` / `streamGenerateContent` stream variants) for either native path - Tool calling / function calling on native paths - Vision content blocks (image_url → anthropic image blocks; image_url → gemini inline_data with base64 + mime_type) - Extended thinking (anthropic) / thinking config (gemini) - System instructions pass-through on the gemini native path Phase 2c/2d will layer these on. This PR is the minimum-viable native dispatch — single-turn text in, text out — same shape as Phase 2a. ## Stacking This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base branch, NOT main, so the diff shows only the Gemini-specific additions. When Phase 2a merges to main, GitHub auto-rebases this PR onto the new main head. If the reviewer prefers a single combined PR, close #240 and land this one instead — the commits on feat/hermes-phase2-native-sdks are already included in this branch's history. 
## Related - #240 Phase 2a (parent branch) - #208 Phase 1 (registry + openai-compat path — already in main) - `project_hermes_multi_provider.md` queued memory — Phase 2 was the next item, this PR completes it - `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch entry that catalogued Hermes's native provider list and shaped the original Phase 2 scope	2026-04-15 13:20:39 -07:00
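The dispatch table for Phase 2b can be sketched as a dictionary keyed by `auth_scheme`. An unknown scheme falls back to the OpenAI-compat path and logs a warning. `ExecutorSketch` and the trimmed `ProviderConfig` are hypothetical stand-ins for `HermesA2AExecutor` and the registry dataclass. The handler bodies return tagged strings instead of calling any SDK.

```python
import asyncio
import logging
from dataclasses import dataclass

logger = logging.getLogger("hermes.dispatch_sketch")


@dataclass(frozen=True)
class ProviderConfig:  # trimmed stand-in: only the fields dispatch needs
    name: str
    auth_scheme: str


class ExecutorSketch:
    """Hypothetical stand-in that models only the _do_inference dispatch."""

    def __init__(self, provider_cfg: ProviderConfig) -> None:
        self.provider_cfg = provider_cfg

    async def _do_openai_compat(self, task_text: str) -> str:
        return f"openai-compat:{task_text}"

    async def _do_anthropic_native(self, task_text: str) -> str:
        return f"anthropic-native:{task_text}"

    async def _do_gemini_native(self, task_text: str) -> str:
        return f"gemini-native:{task_text}"

    async def _do_inference(self, task_text: str) -> str:
        dispatch = {
            "openai": self._do_openai_compat,
            "anthropic": self._do_anthropic_native,
            "gemini": self._do_gemini_native,
        }
        scheme = self.provider_cfg.auth_scheme
        handler = dispatch.get(scheme)
        if handler is None:
            # Forward-compat: a scheme without a native path degrades to OpenAI-compat, loudly.
            logger.warning("unknown auth_scheme %r for provider %s; using openai-compat",
                           scheme, self.provider_cfg.name)
            handler = self._do_openai_compat
        return await handler(task_text)


if __name__ == "__main__":
    print(asyncio.run(ExecutorSketch(ProviderConfig("gemini", "gemini"))._do_inference("hi")))
```

Keeping the table inside `_do_inference` means a new native path is one new method plus one dictionary entry. Providers whose scheme has no entry yet still work through the compat shim.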
rabbitblood	3985d80220	feat(hermes): Phase 2a — native Anthropic Messages API dispatch path Completes the Hermes adapter's native-SDK plan for the provider that gains the most from leaving OpenAI-compat: Anthropic. OpenAI-compat works fine for plain text turns on every provider (Phase 1 covered that with one code path for all 15 providers), but Anthropic's Messages API has first-class tool use, vision content blocks, and extended thinking that the OpenAI-compat shim strips or mis-translates. Rather than ship all native SDK paths in one PR (Anthropic + Gemini + future), this lands Anthropic only (Phase 2a). Gemini is Phase 2b, shipping after a production measurement window on Phase 2a. ## Design Providers now dispatch by `auth_scheme` field. Phase 1 added the field but every provider used `"openai"`. Phase 2 flips `anthropic` to `"anthropic"` and wires a second inference path keyed on that: - `HermesA2AExecutor._do_openai_compat(task_text)` — existing path, handles 14 of 15 providers (Nous Portal, OpenRouter, OpenAI, xAI, Gemini, Qwen, GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral) - `HermesA2AExecutor._do_anthropic_native(task_text)` — NEW, uses the official `anthropic` Python SDK's `AsyncAnthropic().messages.create(...)` - `HermesA2AExecutor._do_inference(task_text)` — dispatches by `self.provider_cfg.auth_scheme` Unknown schemes fall back to OpenAI-compat with a logged warning, so future provider additions don't crash if a native SDK path ships late. ## Fail-loud on missing SDK `_do_anthropic_native` raises a clear `RuntimeError` with install instructions if the `anthropic` package is missing at runtime: Hermes anthropic native path requires the `anthropic` package. Install in the workspace image with `pip install anthropic>=0.39.0` or set HERMES provider=openrouter to route Claude models through OpenRouter's OpenAI-compat shim instead. 
This is intentional: silent fallback would mask fidelity loss (tool_use blocks become plain text, vision gets stripped). Loud failure is better. `requirements.txt` adds `anthropic>=0.39.0` so the package is baked into the workspace-template image build path. Operators building custom workspace images without anthropic installed get the loud error. ## Back-compat - `create_executor(hermes_api_key="x")` → still routes to Nous Portal (`auth_scheme="openai"`), unchanged - `HERMES_API_KEY` env var → still first in RESOLUTION_ORDER - `OPENROUTER_API_KEY` env var → still second - All 14 OpenAI-compat providers unchanged — they take the same code path as before - ONLY `anthropic` provider changes behavior: it now uses the native Messages API instead of the `/v1/chat/completions` compat shim ## Constructor signature change `HermesA2AExecutor.__init__` now takes `provider_cfg: ProviderConfig` instead of separate `api_key + base_url + model`. The three fields are derived from `provider_cfg` + an optional model override. This is a breaking change for any external caller building an executor directly, but the only documented public entry point is `create_executor()`, which is updated in the same commit to pass the cfg through. ## Test coverage `workspace-template/tests/test_hermes_phase2_dispatch.py` — 7 new tests: 1. `test_anthropic_entry_has_anthropic_scheme` — registry flip 2. `test_all_other_providers_still_openai_scheme` — regression guard 3. `test_dispatch_openai_scheme_calls_openai_compat` — happy path 4. `test_dispatch_anthropic_scheme_calls_anthropic_native` — happy path 5. `test_dispatch_unknown_scheme_falls_back_to_openai_compat` — forward compat 6. `test_anthropic_native_raises_clear_error_when_sdk_missing` — fail-loud 7. `test_create_executor_passes_provider_cfg` — constructor wiring All pass locally (pytest tests/test_hermes_phase2_dispatch.py -v, 0.04s). Phase 1 tests unchanged: `test_hermes_providers.py` 26/26 pass, no regressions. 
## What's NOT in this PR (Phase 2b) - Gemini native `generateContent` path (`auth_scheme="gemini"`) - Streaming support across both native paths (`astream_messages`, `streamGenerateContent`) - Tool calling on the anthropic native path (the `tools` + `tool_use` blocks) - Vision content blocks (image_url → anthropic image blocks) - Extended thinking parameter passthrough All scoped in `project_hermes_multi_provider.md`. Phase 2a is the minimum viable native Anthropic dispatch — single-turn text in, text out, no tools. ## Related - Phase 1 baseline (already in main): #208 — provider registry + OpenAI-compat path - Queued memory: `project_hermes_multi_provider.md` — full phased plan - Triggering directive: CEO 2026-04-15 — "once current works are cleared, focus on supporting hermes agent"	2026-04-15 12:23:56 -07:00
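One way to express the fail-loud behaviour, assuming the SDK is imported lazily inside the native path. `require_sdk` is a hypothetical helper and not a function from this codebase. Its message text only paraphrases the one quoted in the commit.

```python
import importlib
from types import ModuleType


def require_sdk(module_name: str, pip_spec: str, alternative: str) -> ModuleType:
    """Import an optional SDK, or raise a RuntimeError that says how to fix it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise RuntimeError(
            f"Hermes native path requires the `{module_name}` package. "
            f"Install it in the workspace image with `pip install {pip_spec}` "
            f"or {alternative}."
        ) from exc


# Inside a native path (sketch):
# anthropic = require_sdk("anthropic", "anthropic>=0.39.0",
#                         "set HERMES provider=openrouter to use the OpenAI-compat shim")
```

Raising instead of silently falling back keeps the fidelity loss visible. The chained `ImportError` is preserved as `__cause__` for debugging.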
rabbitblood	1151265b72	fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr Context: when the claude-agent-sdk wraps a stream error from the CLI subprocess that it can't categorize (rate limit, auth, network), it raises a bare `Exception("Command failed with exit code 1\nError output: Check stderr output for details")`. The exception has no `.stderr` or `.exit_code` attributes, so #66's `_format_process_error` — which reads those attributes — has nothing to surface. The log line becomes: SDK agent error [claude-code]: Exception: Command failed with exit code 1 (exit code: 1)\nError output: Check stderr output for details That's the placeholder text from the SDK's error path, not the actual error. Operators chasing a stuck workspace are forced to `docker exec ws-xxx claude --print` manually to discover the real cause. Observed today during the rate-limit incident: every PM error line was identical "Check stderr output for details" while the real cause ("You've hit your limit · resets Apr 17, 11pm (UTC)") was only visible via manual reproduction — that cost ~20 minutes of diagnosis time. ## Fix Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs `claude --print` with a small probe input, captures stderr+stdout, and returns the real error string. Bounded by 30s timeout so a hung CLI can't stall the error path. Extend `_format_process_error` with ONE narrow fallback: if the exception has no stderr/exit_code AND its message contains the specific "Check stderr output for details" marker, call the probe and append `probed_cli_error=<real error>` to the formatted line. Critically: the probe only runs in the narrow case where we have nothing else to log. If `.stderr` or `.exit_code` are present (the normal ProcessError path from #66), the probe is skipped — no wasted subprocess, no 30s latency on every error. 
## Test coverage `workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests: - `test_format_process_error_probes_cli_when_stderr_swallowed` — the happy path: exception matches the marker, probe runs, result appears in the formatted line. Probe is monkeypatched so no subprocess spawns in the test. - `test_format_process_error_does_not_probe_when_stderr_already_present` — negative: regular ProcessError with `.stderr` set does NOT trigger the probe (skip the wasted call). - `test_format_process_error_does_not_probe_without_swallowed_marker` — negative: unrelated plain exceptions (e.g. RuntimeError) do NOT trigger the probe (so the common-case error path stays fast). All 7 `_format_process_error` tests pass locally (4 existing + 3 new):
```
pytest tests/test_claude_sdk_executor.py -k format_process_error
======================= 7 passed in 0.06s ========================
```
## Impact Next time the SDK swallows a real error (rate limit, auth failure, network outage), the workspace log will contain the actual error string alongside the generic placeholder:
SDK agent error [claude-code]: Exception: Command failed with exit code 1 ... | probed_cli_error="You've hit your limit · resets Apr 17, 11pm (UTC)"
Diagnosis time drops from "docker exec each ws, run claude --print, read stderr" (~20 min) to "grep probed_cli_error in platform logs" (~10 seconds). Closes #160.	2026-04-15 11:50:55 -07:00
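A sketch of the narrow fallback, under stated assumptions. `probe_cli_error` and `format_process_error` are hypothetical names that mirror `_probe_claude_cli_error` and `_format_process_error`. The probe input `"ping"` and the 500-character cap are invented, and the exact line format differs from the real one. The probe is injected as a parameter so tests can replace it, which matches how the commit's tests monkeypatch it.

```python
import subprocess
from typing import Callable, Sequence

SWALLOWED_MARKER = "Check stderr output for details"


def probe_cli_error(argv: Sequence[str] = ("claude", "--print"),
                    probe_input: str = "ping", timeout: float = 30.0) -> str:
    """Best-effort: rerun the CLI with a tiny input and return what it printed."""
    try:
        proc = subprocess.run(list(argv), input=probe_input, capture_output=True,
                              text=True, timeout=timeout)
    except FileNotFoundError:
        return f"probe failed: {argv[0]} not found"
    except subprocess.TimeoutExpired:
        return f"probe timed out after {timeout}s"  # bounded: a hung CLI cannot stall us
    # Prefer stderr; cap the length (arbitrary 500 chars) so one log line stays readable.
    return (proc.stderr.strip() or proc.stdout.strip())[:500]


def format_process_error(exc: BaseException,
                         probe: Callable[[], str] = probe_cli_error) -> str:
    stderr = getattr(exc, "stderr", None)
    exit_code = getattr(exc, "exit_code", None)
    line = f"{type(exc).__name__}: {exc}"
    if exit_code is not None:
        line += f" (exit code: {exit_code})"
    if stderr:
        line += f" | stderr={stderr!r}"
    # Probe only when nothing else is available AND the SDK placeholder is present.
    if stderr is None and exit_code is None and SWALLOWED_MARKER in str(exc):
        line += f' | probed_cli_error="{probe()}"'
    return line
```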
rabbitblood	8d8ca18bc0	feat(hermes): Phase 1 — multi-provider registry (15 providers, back-compat preserved) Ships the first half of the queued Hermes adapter expansion. PR 2 only supported Nous Portal + OpenRouter; this adds 13 more providers reachable via OpenAI-compat endpoints. Native SDK paths for Anthropic + Gemini are Phase 2 (better tool-calling + vision fidelity). ## What's new `workspace-template/adapters/hermes/providers.py` (new file, 220 LOC): - `ProviderConfig` dataclass: name, env vars, base URL, default model, auth scheme, docs - `PROVIDERS` dict with 15 entries across 4 groups: - PR 2 baseline: nous_portal, openrouter - Frontier commercial: openai, anthropic, xai, gemini - Chinese providers: qwen, glm, kimi, minimax, deepseek - OSS/alt: groq, together, fireworks, mistral - `RESOLUTION_ORDER` tuple: priority for auto-detect (back-compat first, then commercial, then Chinese, then OSS/alt) - `resolve_provider(explicit=None)` -> (ProviderConfig, api_key) - With explicit name: routes to that provider, raises if env var empty - Without: walks RESOLUTION_ORDER, first env-var-set provider wins `workspace-template/adapters/hermes/executor.py` (refactored): - `create_executor(hermes_api_key=None, provider=None, model=None)` now has three parameters: - `hermes_api_key`: PR 2 back-compat — routes to Nous Portal - `provider`: canonical short name from the registry (e.g. "anthropic") - `model`: optional override of the provider's default model - Delegates all resolution to `providers.resolve_provider()` — no more hardcoded URLs or env var lookups in the executor itself - `HermesA2AExecutor.__init__` no longer has Nous-specific defaults; callers pass base_url + model explicitly (which create_executor always does) `workspace-template/tests/test_hermes_providers.py` (new file, 26 tests): - Registry shape invariants (count >= 15, no duplicates, every config valid) - PR 2 back-compat: HERMES_API_KEY / OPENROUTER_API_KEY still route correctly - Auto-detect for every provider in the registry (parametrized — guards against typos in env var lists) - Explicit `provider=` bypass of auto-detect - Error cases: unknown provider, explicit-but-empty, auto-detect-with-no-env - All 26 tests pass locally in 0.08s ## Back-compat guarantees
| Scenario | PR 2 behavior | This PR behavior |
|---|---|---|
| `create_executor(hermes_api_key="x")` | Nous Portal | Nous Portal (unchanged) |
| `HERMES_API_KEY=x` env, auto-detect | Nous Portal | Nous Portal (unchanged) |
| `OPENROUTER_API_KEY=x` env, auto-detect | OpenRouter | OpenRouter (unchanged) |
| Both env + explicit hermes_api_key param | Nous Portal (param wins) | Nous Portal (param wins, unchanged) |

Nothing existing can break. New callers gain access to 13 more providers. ## What's NOT in this PR (Phase 2) - Native Anthropic Messages API path — better tool calling, vision, extended thinking. Requires pulling in `anthropic` SDK. ~50 LOC. - Native Gemini generateContent path — for vision + google tools. Requires `google-genai` SDK. ~50 LOC. - Streaming support across all providers — current executor is non-streaming (single chat.completions.create call). Streaming works with openai.AsyncOpenAI but hasn't been wired to the A2A event queue path. ~30 LOC. - Per-provider model overrides in config.yaml — Phase 1 uses the registry's default_model. 
Phase 2 adds a `hermes: { provider: qwen, model: qwen3-coder-plus }` block in the workspace config. - `.env.example` updates — not critical since the registry itself documents every env var via the `env_vars` field, but nice-to-have. ## Related - Queued memory: `project_hermes_multi_provider.md` - CEO directive 2026-04-15: "once current works are cleared, I want you to focus on supporting hermes agent, right now it doesnt take too much providers" - `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch entry listed "Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …" which shaped this registry's initial set ## Test plan - [x] Unit tests: 26/26 pass locally (pytest) - [ ] CI will run on the self-hosted macOS arm64 runner - [ ] Smoke test in a real workspace: set QWEN_API_KEY and verify Technical Researcher actually hits Alibaba DashScope successfully - [ ] Integration test per provider with real API keys (gated on env, skip when not set — Phase 2 CI addition)	2026-04-15 11:14:35 -07:00
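A minimal sketch of the resolution rules in this commit. An explicit provider name wins first. Otherwise, the first provider in `RESOLUTION_ORDER` with a non-empty env var is chosen. A `hermes_api_key` argument always means Nous Portal. The three-entry registry uses placeholder URLs and model names. `resolve_for_create_executor` is a hypothetical helper standing in for the logic inside `create_executor()`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    env_vars: Tuple[str, ...]
    base_url: str
    default_model: str
    auth_scheme: str = "openai"


# Illustrative subset only: URLs and model names are placeholders, not the real entries.
PROVIDERS = {
    "nous_portal": ProviderConfig("nous_portal", ("HERMES_API_KEY",), "https://nous.example/v1", "nous-default"),
    "openrouter": ProviderConfig("openrouter", ("OPENROUTER_API_KEY",), "https://openrouter.example/v1", "or-default"),
    "qwen": ProviderConfig("qwen", ("QWEN_API_KEY",), "https://qwen.example/v1", "qwen-default"),
}
RESOLUTION_ORDER = ("nous_portal", "openrouter", "qwen")  # back-compat providers first


def _key_for(cfg: ProviderConfig, env: Mapping[str, str]) -> Optional[str]:
    # First non-empty env var listed for this provider, if any.
    return next((env[v] for v in cfg.env_vars if env.get(v)), None)


def resolve_provider(explicit: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> Tuple[ProviderConfig, str]:
    if explicit is not None:
        if explicit not in PROVIDERS:
            raise ValueError(f"unknown provider {explicit!r}")
        cfg = PROVIDERS[explicit]
        key = _key_for(cfg, env)
        if not key:
            raise ValueError(f"provider {explicit!r} selected but none of {cfg.env_vars} is set")
        return cfg, key
    for name in RESOLUTION_ORDER:  # auto-detect: first provider with a key wins
        cfg = PROVIDERS[name]
        key = _key_for(cfg, env)
        if key:
            return cfg, key
    raise ValueError("no provider API key found in environment")


def resolve_for_create_executor(hermes_api_key: Optional[str] = None,
                                provider: Optional[str] = None,
                                env: Mapping[str, str] = os.environ) -> Tuple[ProviderConfig, str]:
    if hermes_api_key:  # PR 2 back-compat: the explicit param always routes to Nous Portal
        return PROVIDERS["nous_portal"], hermes_api_key
    return resolve_provider(provider, env)
```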
Backend Engineer	4cea1c6478	fix(a2a): cancel() event, stateTransitionHistory capability, wire push store (#173 #174 #175) #173 — implement cancel() in LangGraphA2AExecutor: emits TaskStatusUpdateEvent(state=canceled, final=True) so clients see the state transition rather than silence. Removes pragma: no cover. Test: test_cancel_emits_canceled_event. #174 — add stateTransitionHistory=True to AgentCapabilities in main.py so microsoft/agent-framework clients know they can request full task history via the A2A protocol. #175 — wire InMemoryPushNotificationConfigStore and PushNotificationSender into DefaultRequestHandler so the advertised pushNotifications capability is backed by a real store. Both classes live in a2a.server.tasks (a2a-sdk 0.3.25); import confirmed by probe. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 17:58:10 +00:00
Hongming Wang	cd4eb9c590	Merge pull request #49 from Molecule-AI/feat/hermes-pr2 feat(hermes): implement create_executor() with HERMES_API_KEY / OPENROUTER_API_KEY fallback + smoke tests	2026-04-14 08:16:15 -07:00
Dev Lead Agent	363a55782b	fix(security): complete Phase 30.6 auth headers in a2a_client get_peers and discover_peer get_peers() was sending no auth headers to /registry/:id/peers — this would return 401 for every workspace agent after PR #31 (WorkspaceAuth middleware) deploys, breaking peer discovery entirely. discover_peer() had X-Workspace-ID but was missing the bearer token, also required by Phase 30.6 for /registry/discover/:id. Both functions now send {"X-Workspace-ID": WORKSPACE_ID, **auth_headers()}. get_workspace_info() was already correct (auth_headers() present since PR #39). Adds test_request_sends_workspace_id_header to TestGetPeers; hardens the discover_peer header assertion to use presence-check rather than exact equality. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 13:23:44 +00:00
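A sketch of the header shape that both functions now send. `auth_headers()` here is a hypothetical stand-in that reads a `WORKSPACE_AUTH_TOKEN` env var. The real helper's token source is not stated in this commit, and the `ws-example` default is a placeholder.

```python
import os
from typing import Dict, Optional

WORKSPACE_ID = os.environ.get("WORKSPACE_ID", "ws-example")  # placeholder default


def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical stand-in: bearer header, or nothing if no token is provisioned."""
    token = token if token is not None else os.environ.get("WORKSPACE_AUTH_TOKEN", "")
    return {"Authorization": f"Bearer {token}"} if token else {}


def peer_request_headers(token: Optional[str] = None) -> Dict[str, str]:
    # get_peers() and discover_peer() both send the workspace ID plus the bearer token.
    return {"X-Workspace-ID": WORKSPACE_ID, **auth_headers(token)}
```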
Hongming Wang	7d3e369632	fix(gate-3): update watcher test to expect SHA-256 hash Align test_hash_file_real_file with the SHA-256 switch in watcher.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 01:21:35 -07:00
Dev Lead Agent	486275868d	fix(security): H1 — replace MD5 with SHA-256 in config/skill watchers Both watcher.py (ConfigWatcher) and skill_loader/watcher.py (SkillsWatcher) used hashlib.md5() for file-integrity change detection. MD5 is collision-prone: a crafted config file could produce the same hash as a benign one, silently suppressing the hot-reload callback and preventing agents from picking up legitimate config changes. Replace hashlib.md5 → hashlib.sha256 in both _hash_file() methods. Update docstrings, comments, and the type-annotation comment (`rel_path → md5 hex` becomes `rel_path → sha256 hex`). Test update: test_skills_watcher.py — rename helper _md5 → _sha256, update the hash-length assertion from 32 (MD5) to 64 (SHA-256), and rename the test from test_hash_file_returns_md5_for_existing_file to test_hash_file_returns_sha256_for_existing_file. All 25 watcher tests pass. Note: H2 (a2a_client.py timeout=None) was already fixed in Cycle 5 (timeout=httpx.Timeout(connect=30.0, read=300.0, ...)) — confirmed by code review before opening this PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 07:52:07 +00:00
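A sketch of SHA-256 change detection for watched files. It assumes a watcher that compares `rel_path → sha256 hex` snapshots between polls. `hash_file` and `snapshot` are hypothetical names. Returning `None` for a missing file is an assumption and not documented behaviour of `_hash_file()`.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def hash_file(path: Path, chunk_size: int = 65536) -> Optional[str]:
    """SHA-256 hex digest of a file (64 hex chars), or None if it does not exist."""
    h = hashlib.sha256()
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):  # stream large files
                h.update(chunk)
    except FileNotFoundError:
        return None
    return h.hexdigest()


def snapshot(root: Path) -> Dict[str, Optional[str]]:
    """Map each file's path relative to root to its SHA-256 hex digest."""
    return {str(p.relative_to(root)): hash_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}
```

A watcher would fire its reload callback whenever two consecutive snapshots differ.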
Dev Lead Agent	6c78962a33	fix(security): Cycle 5 — auth middleware, injection hardening, skill sandbox Fix A — platform/internal/middleware/wsauth_middleware.go (NEW): WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as secrets.Values: workspaces with no live token are grandfathered through. Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously. Fix A — platform/internal/router/router.go: Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id) and /a2a remain on root router; all other /workspaces/:id/* sub-routes moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)). CORS AllowHeaders updated to include Authorization so browser/agent callers can send the bearer token cross-origin. Fix B — workspace-template/heartbeat.py: _check_delegations(): validate source_id == self.workspace_id before accepting a delegation result. Attacker-crafted records with a foreign source_id are silently skipped with a WARNING log (injection attempt). trigger_msg no longer embeds raw response_preview text; references delegation_id + status only — removes the prompt-injection vector. Fix C — workspace-template/skill_loader/loader.py: load_skill_tools(): before exec_module(), verify script is within scripts_dir (path traversal guard) and temporarily scrub sensitive env vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY, WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore in finally block. Defence-in-depth even if /plugins auth gate is bypassed. Fix D — platform/internal/handlers/socket.go: HandleConnect(): agent connections (X-Workspace-ID present) validated via wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade. Canvas clients (no X-Workspace-ID) remain unauthenticated. 
Fix D — workspace-template/events.py: PlatformEventSubscriber._connect(): include platform_auth bearer token in WebSocket upgrade headers alongside X-Workspace-ID. Fix E — workspace-template/executor_helpers.py: recall_memories() and commit_memory() now pass platform_auth bearer token in Authorization header so WorkspaceAuth middleware allows access. Fix F — workspace-template/a2a_client.py: send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300, write=30, pool=30). Resolves H2 flagged across 5 consecutive audits. Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to assert new source_id validation behaviour and allow Authorization header). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 04:44:42 +00:00
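Fix C's two guards can be sketched with the standard library alone. `ensure_within` and `scrubbed_env` are hypothetical names. The variable list is copied from the commit message, and the real loader may resolve paths differently.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterable, Iterator

SENSITIVE_ENV_VARS = (
    "CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
    "WORKSPACE_AUTH_TOKEN", "GITHUB_TOKEN", "GH_TOKEN",
)


def ensure_within(script: Path, scripts_dir: Path) -> Path:
    """Reject a script whose resolved path is not strictly inside scripts_dir."""
    resolved = script.resolve()  # collapses ../ segments and follows symlinks
    base = scripts_dir.resolve()
    if base not in resolved.parents:
        raise PermissionError(f"{script} escapes {scripts_dir}")
    return resolved


@contextmanager
def scrubbed_env(names: Iterable[str] = SENSITIVE_ENV_VARS) -> Iterator[None]:
    """Temporarily remove secrets from os.environ; put the scrubbed values back after."""
    saved = {n: os.environ.pop(n) for n in names if n in os.environ}
    try:
        yield
    finally:
        os.environ.update(saved)


# Usage around module execution (sketch):
# path = ensure_within(script_path, scripts_dir)
# with scrubbed_env():
#     spec.loader.exec_module(module)
```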
Dev Lead Agent	08fe37aee1	feat: implement Hermes adapter create_executor() with OpenRouter fallback Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-13 16:47:29 -07:00
Hongming Wang	24fec62d7f	initial commit — Molecule AI platform Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1) with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo. Brand: Starfire → Molecule AI. Slug: starfire / agent-molecule → molecule. Env vars: STARFIRE_* → MOLECULE_*. Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform. Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent. DB: agentmolecule → molecule. History truncated; see public repo for prior commits and contributor attribution. Verified green: go test -race ./... (platform), pytest (workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:55:37 -07:00

15 Commits