Commit Graph

49 Commits

Author SHA1 Message Date
molecule-ai[bot]
bcd256946f
Merge pull request #890 from Molecule-AI/test/issue-790-crash-resume-integration
test(integration): crash-resume integration tests for Temporal checkpoints (#790)
2026-04-18 00:02:48 +00:00
Molecule AI Backend Engineer
9d171bda7f feat(hermes): stacked system messages — persona + tools + reasoning policy (#499)
HermesA2AExecutor now supports sending system context as ordered, separate
role=system messages instead of a single concatenated string — the model
format recommended by NousResearch.

Changes:
- HermesA2AExecutor.__init__: new system_blocks kwarg (list[str|None]|None)
  stored as an independent copy; None blocks and empty strings silently skipped
- _build_messages(): when system_blocks is not None, emits each non-empty
  block as a separate {"role": "system"} entry in Hermes-recommended order
  (persona → tools context → reasoning policy); falls through to legacy
  system_prompt path when system_blocks is None (backward compatible)

Backward compatibility: existing callers that pass a single system_prompt
string continue to work identically — no changes required.

Tests (12 new, 47 total):
  - system_blocks stored as independent copy (mutation safe)
  - three-block stacked ordering preserved
  - empty / None blocks silently skipped
  - all-empty list → zero system messages
  - system_blocks overrides system_prompt when both provided
  - legacy system_prompt path unchanged
  - stacked blocks appear in the live API call kwargs

Closes #499
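The stacked-block rule above can be sketched as follows (illustrative function and names, not the actual HermesA2AExecutor internals):

```python
def build_messages(system_blocks, system_prompt, user_message):
    # Hypothetical sketch of the stacked system-message logic described
    # above; the real code lives in HermesA2AExecutor._build_messages().
    messages = []
    if system_blocks is not None:
        # Each non-empty block becomes its own role=system message,
        # preserving caller order (persona -> tools -> reasoning policy).
        for block in system_blocks:
            if block:  # None and "" are silently skipped
                messages.append({"role": "system", "content": block})
    elif system_prompt:
        # Legacy single-string path, unchanged for existing callers.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    return messages
```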

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 23:53:12 +00:00
Molecule AI QA Engineer
a663c8de81 test(integration): crash-resume integration tests for Temporal checkpoints (#790)
Closes #790. Depends on feat/issue-583-1-checkpoint-persistence (PR #788).

Platform (Go) — checkpoints_integration_test.go (5 new tests):
1. ThreeStepPersistence: POST task_receive/llm_call/task_complete → GET returns
   all 3 in step_index DESC order with correct names and payloads.
2. CrashResume_HighestStepIsResumptionPoint: POST steps 0+1 only (crash before
   step 2) → GET shows step_index=1 as the resume point; task_complete absent.
3. UpsertIdempotency_LatestPayloadWins: POST same (wf_id, step_name) twice with
   different payloads → List returns only the second payload (ON CONFLICT DO UPDATE).
4. PostCascadeDelete_Returns404: simulate post ON-DELETE-CASCADE state (empty
   rows) → List returns 404 as expected after workspace deletion.
5. AuthGate_NoToken_Returns401: router-level test with WorkspaceAuth middleware;
   POST/GET/DELETE all return 401 without a bearer token (no DB calls made).

workspace-template — _save_checkpoint + 4 Python tests:
- Add async _save_checkpoint() to temporal_workflow.py: POST to the platform
  checkpoint endpoint after each activity stage; fully non-fatal (try/except
  inside the function, plus defence-in-depth try/except at every call site).
- 4 new pytest cases (test_temporal_workflow.py):
  - nonfatal_on_http_error: _save_checkpoint raises HTTPStatusError (500) →
    task_receive_activity still returns {"status":"received"}.
  - nonfatal_on_network_error: _save_checkpoint raises ConnectError →
    llm_call_activity still returns success LLMResult.
  - success_path: _save_checkpoint no-op → activity returns correctly;
    checkpoint called with correct args.
  - standalone_http_error_is_swallowed: real _save_checkpoint function swallows
    HTTP 500 from a mocked httpx.AsyncClient; returns None.

All 36 temporal workflow Python tests pass.
Go tests: Go binary not in this container; test file verified for syntax and
against the sqlmock patterns used throughout the handlers package.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 19:17:29 +00:00
molecule-ai[bot]
4e4d21a8ac
Merge pull request #651 from Molecule-AI/feat/issue-594-audit-ledger
feat: molecule-audit-ledger — HMAC-SHA256 immutable agent event log (#594)
2026-04-17 16:37:01 +00:00
molecule-ai[bot]
705c0a46ce
Merge pull request #763 from Molecule-AI/feat/issue-733-agents-md-impl
feat(#733): implement AGENTS.md auto-generation
2026-04-17 16:21:58 +00:00
molecule-ai[bot]
8a00c338ee
feat(#733): implement AGENTS.md auto-generation 2026-04-17 16:20:39 +00:00
molecule-ai[bot]
c14f9f04d9
refactor(#741): extract medo.py from builtin_tools to plugins/molecule-medo
The Baidu MeDo hackathon integration was sitting in builtin_tools/ as dead
code — not imported by any loader but shipped with every workspace image,
misleadingly suggesting it was a core builtin.

Changes:
- Move builtin_tools/medo.py → plugins/molecule-medo/skills/medo-tools/scripts/medo.py
  (git detects this as a rename — no code changes, identical tool surface)
- Add plugins/molecule-medo/plugin.yaml (manifest: name, version, runtimes, tags)
- Add plugins/molecule-medo/skills/medo-tools/SKILL.md (frontmatter + setup docs)
- Move workspace-template/tests/test_medo.py → plugins/molecule-medo/tests/test_medo.py
  (update _MEDO_PATH to resolve from plugin root; add conftest.py for langchain mock)
- Update .gitignore: change /plugins/ blanket ignore to /plugins/* so this plugin
  can be tracked until it gets its own standalone repo

Acceptance criteria met:
- builtin_tools/medo.py removed from core
- plugins/molecule-medo/ created with identical tool surface (9/9 tests pass)
- cd workspace-template && pytest → 1021 passed, 2 xfailed (no regression)
- MEDO_API_KEY was never in default provisioning (.env.example / config.py clean)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 16:03:50 +00:00
Molecule AI Backend Engineer
ebfafb9139 feat: upgrade default workspace model to claude-opus-4-7 (#727)
Replace the anthropic:claude-sonnet-4-6 default across config, handlers,
env example, and litellm proxy config. All tests updated to match the new
default; sonnet-4-6 alias kept in litellm_config.yml for pinned workspaces.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 15:30:57 +00:00
Molecule AI QA Engineer
5c95c6dc42 test: add _load_config_dict coverage for issue #652
Cover the four paths that were exercised only via mock in the
_build_options tests: valid YAML, missing file, malformed YAML,
and empty file (safe_load → None → {} via `or {}`).
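The empty-file path above hinges on `safe_load` returning None; a minimal sketch (illustrative signature, not the real _load_config_dict, and the malformed-YAML fallback to `{}` is an assumption here):

```python
import yaml  # PyYAML; the commit's helper wraps yaml.safe_load

def load_config_dict(text):
    # safe_load("") returns None; `or {}` coerces that to an empty dict
    # so callers can always .get() safely.
    try:
        return yaml.safe_load(text) or {}
    except yaml.YAMLError:
        return {}  # assumed fallback for the malformed-YAML path
```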

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 10:08:45 +00:00
Molecule AI Backend Engineer
cf5428664b feat(issue-652): wire effort and task_budget to claude sdk output_config
Adds _load_config_dict() helper to ClaudeSDKExecutor and wires the new
effort and task_budget config fields into _build_options() before the
Anthropic API call:

- effort (str): low|medium|high|xhigh|max — populates output_config.effort
- task_budget (int): advisory total-token budget; must be >= 20000 when set;
  automatically adds task-budgets-2026-03-13 beta header

Also adds WorkspaceConfig.effort and WorkspaceConfig.task_budget fields in
config.py and 5 acceptance tests covering all code paths.
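The wiring above can be sketched as follows; the key and header names follow the commit text (output_config.effort, task-budgets-2026-03-13), but the exact SDK call shape is an assumption:

```python
def build_output_kwargs(effort=None, task_budget=None):
    # Hypothetical sketch of the _build_options() wiring described above.
    kwargs, beta_headers = {}, []
    if effort is not None:
        if effort not in ("low", "medium", "high", "xhigh", "max"):
            raise ValueError(f"invalid effort: {effort}")
        kwargs.setdefault("output_config", {})["effort"] = effort
    if task_budget is not None:
        if task_budget < 20000:
            raise ValueError("task_budget must be >= 20000 when set")
        kwargs.setdefault("output_config", {})["task_budget"] = task_budget
        # the advisory budget requires the beta header
        beta_headers.append("task-budgets-2026-03-13")
    return kwargs, beta_headers
```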

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:33:07 +00:00
Molecule AI Backend Engineer
7584267a80 fix(security): address Security Auditor findings on audit-ledger (#651)
- Replace == HMAC comparisons with hmac.compare_digest (Python) and
  hmac.Equal (Go) in ledger.py, verify.py, and audit.go to prevent
  timing oracle attacks (Fixes 1-6)
- Increase PBKDF2 iterations from 100K to 210K in both ledger.py and
  audit.go — must match for cross-language verification (Fix 7)
- Return chain_valid: null when offset > 0 (paginated views cannot
  verify a truncated chain; null means "not computed") (Fix 8)
- Remove module-level AUDIT_LEDGER_SALT attribute from ledger.py; read
  the secret exclusively from os.environ inside _get_hmac_key() so the
  salt is not exposed in the module namespace (Fix 9)
- Update tests: use monkeypatch.setenv/delenv instead of setattr on the
  removed AUDIT_LEDGER_SALT attribute; update testAuditKey helper to
  use 210K iterations; add TestAuditQuery_PaginatedOffsetReturnsNullChainValid
- Fix migration 028: workspace_id column type TEXT → UUID to match
  workspaces.id UUID primary key

All tests pass: 1043 pytest + 0 Go test failures.
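The constant-time comparison pattern from Fixes 1-6 looks like this in Python (the event field layout is illustrative, not the actual ledger schema):

```python
import hashlib
import hmac

def verify_event_hmac(key: bytes, payload: bytes, claimed_hex: str) -> bool:
    # Recompute the HMAC for a stored event and compare it to the claim.
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # hmac.compare_digest closes the timing oracle that a plain `==`
    # string comparison leaves open.
    return hmac.compare_digest(expected, claimed_hex)
```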

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:30:10 +00:00
af00a6c128 fix(merge): combine response_format (#498) and tools (#497) in hermes_executor
Both PRs restructured the same chat.completions.create() call to use a
create_kwargs dict. Resolved by keeping both __init__ params and both
conditionals in the create_kwargs block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 07:03:22 +00:00
molecule-ai[bot]
c9b8c26d5f
feat(hermes): native tools=[] parameter instead of text-in-prompt workaround (#497)
2026-04-17 06:56:10 +00:00
Molecule AI Backend Engineer
951ea163fa feat: molecule-audit-ledger — HMAC-SHA256 immutable agent event log (#594)
Implements EU AI Act Annex III compliance (Art. 12 record-keeping, Art. 13
transparency) via an append-only HMAC-SHA256-chained agent event log.

Python (workspace-template/molecule_audit/):
- ledger.py: SQLAlchemy 2.0 AuditEvent model + PBKDF2 key derivation +
  append_event() with prev_hmac chain linkage + verify_chain() CLI helper.
- hooks.py: LedgerHooks — on_task_start/on_llm_call/on_tool_call/on_task_end
  pipeline hooks; exception-safe (_safe_append); context manager support.
- verify.py: `python -m molecule_audit.verify --agent-id <id>` CLI;
  exits 0=valid, 1=broken, 2=missing SALT, 3=DB error.
- tests/test_audit_ledger.py: 46 tests covering HMAC determinism, field
  sensitivity, chain verification, LedgerHooks lifecycle, CLI.

Go (platform/):
- migrations/028_audit_events.up.sql: audit_events table with indexes.
- internal/handlers/audit.go: GET /workspaces/:id/audit — parameterized
  queries, inline chain verification (chain_valid: bool|null), PBKDF2
  key cached via sync.Once.
- internal/handlers/audit_test.go: 14 tests — HMAC, chain verify, handler
  query/filter/pagination/cap/error paths.
- internal/router/router.go: wire wsAuth.GET("/audit", audh.Query).
- .env.example: document AUDIT_LEDGER_SALT.
- requirements.txt: add sqlalchemy>=2.0.0.
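The prev_hmac chain linkage described above can be sketched in a few lines (in-memory list instead of the real SQLAlchemy model, and the PBKDF2 key derivation is elided):

```python
import hashlib
import hmac

def append_event(chain, key: bytes, payload: str):
    # Each event's HMAC covers its own payload plus the previous HMAC,
    # linking the rows into an append-only chain.
    prev = chain[-1]["hmac"] if chain else ""
    digest = hmac.new(key, (prev + payload).encode(), hashlib.sha256).hexdigest()
    chain.append({"payload": payload, "prev_hmac": prev, "hmac": digest})

def verify_chain(chain, key: bytes) -> bool:
    # Recompute every link; tampering with any payload breaks its own
    # HMAC and, via prev_hmac, every HMAC after it.
    prev = ""
    for row in chain:
        expected = hmac.new(key, (prev + row["payload"]).encode(),
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, row["hmac"]):
            return False
        prev = row["hmac"]
    return True
```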

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:55:36 +00:00
4eb56ebec6 fix(plugins_registry): deduplicate handlers in _deep_merge_hooks()
Unconditional list.extend() on repeated plugin install caused every
hook handler to be appended on each reinstall, leading to 3-4x duplicate
firings per event (PreToolUse, PostToolUse, Stop, etc.).

Fix: before appending each incoming handler, compute a fingerprint of
(matcher, frozenset-of-commands). Skip append if the fingerprint is
already present in the merged list. First-time installs are unaffected —
new handlers still land correctly.

Adds 7 unit tests covering: first install, double install, triple install,
different-matcher co-existence, different-command co-existence, existing
user hook preservation, and top-level key merge semantics.

Closes #566
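The fingerprint-dedup fix can be sketched as follows; the handler shape ({"matcher": ..., "commands": [...]}) is an assumption about the hooks config, not the exact plugins_registry structure:

```python
def merge_hook_handlers(existing, incoming):
    # Skip incoming handlers whose (matcher, commands) fingerprint is
    # already present, so reinstalls no longer duplicate hook firings.
    def fingerprint(h):
        return (h.get("matcher"), frozenset(h.get("commands", [])))
    merged = list(existing)
    seen = {fingerprint(h) for h in merged}
    for handler in incoming:
        fp = fingerprint(handler)
        if fp in seen:
            continue  # reinstall: identical handler already merged
        merged.append(handler)
        seen.add(fp)
    return merged
```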

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:22:00 +00:00
Molecule AI Backend Engineer
1d41f23ddd feat(hermes): plumb response_format=json_schema for structured output (#498)
Adds response_format support to HermesA2AExecutor so callers can request
structured JSON output via the OpenAI-native response_format parameter.

Changes:
- _validate_response_format(): validates type (json_schema/json_object/text)
  and required sub-fields; returns None if valid, error message if invalid
- HermesA2AExecutor.__init__: new response_format kwarg, stored as _response_format
- execute(): validates before API call — invalid schema enqueues error and
  returns early without hitting Hermes API; valid and non-None adds
  response_format= to create_kwargs; None omits the field entirely

Tests (12 new):
  - _validate_response_format: all valid types, invalid type, missing fields
  - constructor stores response_format correctly
  - valid response_format forwarded to API call
  - response_format omitted when None (no key in call kwargs)
  - invalid schema → error message enqueued, API not called

Closes #498
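The validation contract above (None when valid, an error message when not) can be sketched like this; exact messages and sub-field checks are assumptions:

```python
def validate_response_format(rf):
    # Hypothetical sketch of _validate_response_format(); runs before the
    # API call so an invalid schema never reaches Hermes.
    if rf is None:
        return None  # caller omits the field entirely
    if not isinstance(rf, dict):
        return "response_format must be a dict"
    if rf.get("type") not in ("json_schema", "json_object", "text"):
        return "type must be json_schema, json_object, or text"
    if rf["type"] == "json_schema" and "json_schema" not in rf:
        return "json_schema type requires a json_schema sub-field"
    return None
```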

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 01:19:51 +00:00
Molecule AI Backend Engineer
6d253b961d feat(hermes): pass tools via native tools[] parameter instead of text-in-prompt (#497)
Instead of injecting tool definitions as text into the system prompt,
HermesA2AExecutor now accepts a tools: list[dict] | None constructor
parameter containing OpenAI-format tool definitions and forwards them
via the native tools= parameter on chat.completions.create().

Empty list / None rule: when tools is falsy, the tools key is omitted
from the API call entirely — never sent as tools=[] — so providers
that reject an empty tools array don't return a 400.

Tool-call response handling: when the model returns finish_reason
"tool_calls" with no text content, the executor serialises the call
list as a JSON string and enqueues it as the A2A reply. This keeps
the executor thin (single API call per turn, no ReAct loop) while
surfacing function-call intent in a structured, parseable format.

Changes:
- HermesA2AExecutor.__init__: new tools kwarg; stored as self._tools
  (copy; mutating the input list has no effect)
- execute(): builds create_kwargs dict and conditionally adds tools=
  only when self._tools is non-empty; handles tool_calls response
- Module docstring: new "Native tools (#497)" section with schema
  reference and edge-case explanation

Tests (12 new, 47 total in hermes test file, 1002 total suite):
  - tools stored correctly in constructor (copy, None, [], non-empty)
  - non-empty tools forwarded as tools= in API call
  - multiple tools all forwarded
  - empty list ([] and None and default) → tools key absent from call
  - model tool_call response → JSON-serialised list as A2A reply
  - multiple tool_calls → all in JSON reply
  - text content present → text wins over tool_calls

Closes #497
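The falsy-tools rule above reduces to one conditional (illustrative helper, not the executor's actual create_kwargs construction):

```python
def build_create_kwargs(model, messages, tools=None):
    # Add the tools key only when the list is non-empty; never send
    # tools=[], which some providers reject with a 400.
    kwargs = {"model": model, "messages": messages}
    if tools:
        kwargs["tools"] = list(tools)  # copy: caller mutation has no effect
    return kwargs
```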

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 01:00:23 +00:00
Molecule AI Backend Engineer
3d817a42b7 feat(hermes): expose reasoning mode for Hermes 4 via OpenAI-compat API (#496)
Hermes 4 is a hybrid-reasoning model trained on <think> tags; without asking
for thinking we pay flagship $/tok but get non-reasoning quality. This adds a
dedicated HermesA2AExecutor that dispatches to any OpenAI-compat endpoint
(OpenRouter, Nous Portal) and enables native reasoning for Hermes 4 models.

Key decisions:
- ProviderConfig + _reasoning_supported() detect Hermes 4 by model slug
  substring ("hermes-4", "hermes4") — case-insensitive, no config needed
- extra_body={"reasoning": {"enabled": True}} sent only to Hermes 4 entries;
  Hermes 3 path unchanged (no extra_body, no regressions)
- choices[0].message.reasoning + reasoning_details extracted and written to
  an OTEL span (hermes.reasoning) — deliberately NOT echoed in the A2A reply
  so the reasoning trace never contaminates the agent's next-turn context
- API key / base URL default to OPENAI_API_KEY / OPENAI_BASE_URL env vars
  with openrouter.ai/api/v1 as the fallback endpoint
- _client injection parameter for unit tests (no live API calls needed)
- Error sanitization: only exception class name surfaces to user (mirrors
  sanitize_agent_error() convention from cli_executor.py)

Test coverage: 35 tests, 100% coverage on all new code paths including:
  - _reasoning_supported() — Hermes 4/3/unknown/empty/uppercase
  - ProviderConfig — field assignment and capability flags
  - extra_body presence for Hermes 4, absence for Hermes 3
  - reasoning not in A2A reply; _log_reasoning called when trace present
  - reasoning_details forwarded; span attributes set correctly
  - Telemetry failure swallowed (never blocks response)
  - API error → sanitized class-name-only reply
  - cancel() → TaskStatusUpdateEvent(state=canceled)

Full suite: 990 passed, 0 failed (no regressions).

Resolves #496
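The slug-substring detection described above is a one-liner in spirit (illustrative name, mirroring the _reasoning_supported() behaviour the tests cover):

```python
def reasoning_supported(model_slug: str) -> bool:
    # Case-insensitive match on "hermes-4" / "hermes4"; no config needed.
    slug = (model_slug or "").lower()
    return "hermes-4" in slug or "hermes4" in slug
```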

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 20:38:45 +00:00
Hongming Wang
55a2ee0153 fix: properly remove adapter subdirectories + move shared code to root
PR #471 removed Dockerfiles/requirements from adapters/ but left the
Python source files. This commit finishes the extraction:

1. Moved shared_runtime.py → workspace-template/shared_runtime.py
   (used by prompt.py, a2a_executor.py, coordinator.py — not adapter-specific)
2. Moved base.py → workspace-template/adapter_base.py
   (BaseAdapter + AdapterConfig — the interface adapters implement)
3. Updated imports in prompt.py, a2a_executor.py, coordinator.py
4. Rewrote adapters/__init__.py as a thin shim that:
   - Reads ADAPTER_MODULE env var (production: standalone repos set this)
   - Re-exports BaseAdapter/AdapterConfig for backward compat
5. adapters/base.py + adapters/shared_runtime.py remain as re-export shims
6. Deleted all 8 adapter subdirectories (autogen, claude_code, crewai,
   deepagents, gemini_cli, hermes, langgraph, openclaw)
7. Removed 11 test files that imported adapter-specific code

Tests: 955 passed, 0 failed (down from 1216 — the difference is
adapter-specific tests that moved to standalone repos).
2026-04-16 04:59:13 -07:00
Hongming Wang
8ea8c1d7af fix: remove tests that referenced removed plugins/ directory
test_first_party_plugins.py, test_plugins_builtins_drift.py, and
test_hermes_adapter.py all referenced files under plugins/ and
adapters/ which were extracted to standalone repos. These tests
belong in those repos now, not in the core workspace-template.

1216 passed, 0 failed after removal.
2026-04-16 04:39:31 -07:00
Hongming Wang
cb74f0d6ae chore: extract workspace runtime to PyPI + move adapter Dockerfiles to template repos
Published `molecule-ai-workspace-runtime==0.1.0` to PyPI:
  https://pypi.org/project/molecule-ai-workspace-runtime/0.1.0/

Source repo: https://github.com/Molecule-AI/molecule-ai-workspace-runtime

Each adapter's Dockerfile and requirements.txt have moved to the corresponding
standalone template repo (molecule-ai-workspace-template-<runtime>). The adapter
Python code (.py files) stays in the monorepo for local dev and testing.

Changes:
- workspace-template/pyproject.toml — new, packages the shared runtime as a PyPI package
- workspace-template/adapters/*/Dockerfile — removed (now in template repos)
- workspace-template/adapters/*/requirements.txt — removed (now in template repos)
- workspace-template/Dockerfile — drop COPY adapters/ (still copies .py files via *.py glob)
- workspace-template/build-all.sh — simplified to base-image-only build
- workspace-template/entrypoint.sh — remove adapter requirements.txt install step
- workspace-template/tests/test_hermes_adapter.py — skip Dockerfile/requirements.txt checks
- CLAUDE.md — update architecture description + workspace image table
- docs/workspace-runtime-package.md — new, explains the package + adapter repo layout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 04:33:10 -07:00
rabbitblood
067a8333ce feat(workspace): gh-wrapper — auto-tag agent PRs + issues with role
Every agent in the template currently uses the same GitHub PAT, so
`gh pr list` shows every PR as authored by the CEO's account with
no signal which agent opened each one. Commits already carry
per-agent authors (GIT_AUTHOR_NAME from #402). This wrapper extends
the identity split to the PR/issue metadata surface layer that
commit attribution can't reach.

## How it works

A tiny bash script installed at `/usr/local/bin/gh`, which sits
earlier in PATH than the real binary at `/usr/bin/gh`. For `gh pr
create` and `gh issue create`:

- Title gets prefixed with `[Role Name]` — e.g. `[Frontend Engineer]
  fix: canvas grid index`
- Body gets `\n\n---\n_Opened by: Molecule AI <Role>_` appended

Role is read from `GIT_AUTHOR_NAME` which the platform provisioner
sets to `Molecule AI <Role>` (shipped with #402). Accepts both
`--title X` and `--title=X` forms. Same for `--body`.

Anything that isn't `gh pr create` or `gh issue create` (e.g.
`gh pr list`, `gh issue view`, `gh run watch`) passes through
untouched. No behaviour change for read-side operations.

## Idempotent

- If the title already starts with `[...]` the wrapper does not
  re-prefix. `gh pr edit` flows that resubmit the title won't layer
  multiple tags.
- If the body already contains \`Opened by: Molecule AI\` the footer
  is not re-appended.
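The title rule above (the wrapper itself is bash) can be sketched in Python, combining the fail-open and idempotency checks; names are illustrative:

```python
def tag_title(title: str, author: str) -> str:
    # Sketch of the wrapper's title transformation for `gh pr create`.
    prefix = "Molecule AI "
    if not author or not author.startswith(prefix):
        return title  # fail-open: malformed/missing GIT_AUTHOR_NAME
    if title.startswith("["):
        return title  # idempotent: never layer a second [...] tag
    return f"[{author[len(prefix):]}] {title}"
```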

## Fail-open

When `GIT_AUTHOR_NAME` is absent or doesn't start with `Molecule
AI `, the wrapper exec's the real gh with unchanged args. No call
is ever blocked by this script.

## Test coverage

`tests/test_gh_wrapper.sh` — 12 cases, no network, no Docker:
- Passthrough for non-create subcommands (pr list)
- pr create title prefix + body footer
- issue create with `--title=X` `--body=X` equals-form
- Idempotent title re-prefix
- Idempotent body footer (count = 1 after two applies)
- Missing GIT_AUTHOR_NAME → passthrough, title preserved
- Malformed GIT_AUTHOR_NAME (not "Molecule AI ...") → passthrough

All 12 pass. Test script is standalone bash + a temp fake gh binary
that echoes argv; safe to run in CI's Python Lint & Test job via
subprocess shell-out.

## Deployment note

This lands in the workspace image. Existing containers keep their
old /usr/bin/gh until the image is rebuilt and they're re-provisioned
(POST /workspaces/:id/restart {}). No migration required; the wrapper
just starts tagging PRs once the new image is rolled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 03:10:46 -07:00
rabbitblood
b30d8d431c fix(tests): test_hermes_phase2_dispatch exec-load needs escalation + __name__
Phase 3 escalation ladder added `from .escalation import ...` to
executor.py. The phase-2 dispatch tests load executor.py via
`exec(compile(src, ...))` with the relative import rewritten — this
broke because (a) the rewrite didn't know about escalation and (b) the
exec namespace lacked `__name__`, which executor.py needs at import
time for `logging.getLogger(__name__)`.

Fix both in all 8 exec sites:
- Rewrite both `from .providers import` AND `from .escalation import`
- Pre-register escalation + providers in sys.modules under the fake
  package name
- Seed the exec namespace with `__name__ = "hermes_executor_under_test"`

54/54 hermes tests pass (28 escalation truth-table + 6 ladder-integration
+ 20 existing phase-2 dispatch).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 02:43:02 -07:00
rabbitblood
3cd18929c4 feat(hermes): escalation ladder — promote to stronger models on transient failure
Ships scoped Phase 3 of the Hermes multi-provider work. Every workspace
can now declare an ordered list of (provider, model) rungs; when the
pinned model hits rate-limit / 5xx / context-length / overload, the
executor advances to the next rung before raising.

## Why

3× Claude Max saturation is a routine occurrence now — the "first 429 on
a batch delegation" is the common path, not the exception. A workspace
pinned to Haiku that hits a context-length limit has no recovery today;
same for Sonnet hitting rate-limit mid-synthesis. Escalation promotes
to the next tier for that single call, preserves coordination, avoids
restart cascades.

## New module: adapters/hermes/escalation.py

- ``LadderRung(provider, model)`` — one config entry.
- ``parse_ladder(raw)`` — tolerant config parser; skips malformed rungs
  with a warning rather than raising so boot stays resilient.
- ``should_escalate(exc) -> bool`` — truth table over 15+ error shapes:
  - Typed classes (RateLimitError, OverloadedError, APITimeoutError,
    APIConnectionError, InternalServerError)
  - Context-length markers (each provider uses different phrasing)
  - Gateway markers (502/503/504, overloaded, temporarily unavailable)
  - Status-code substrings (429, 529, 5xx)
  - Hard-rejects auth failures (401/403/invalid_api_key) even if the
    outer exception class is RateLimitError — wrapping case matters.
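A compressed sketch of that truth table (the real module also checks typed exception classes, and its marker lists are longer):

```python
def should_escalate(exc: Exception) -> bool:
    # Message-substring sketch of the escalation decision described above.
    msg = str(exc).lower()
    # Hard-reject auth failures first: they win even when the outer
    # exception class looks transient (the wrapping case).
    if any(m in msg for m in ("401", "403", "invalid_api_key")):
        return False
    transient = ("429", "529", "502", "503", "504", "rate limit",
                 "overloaded", "context length", "temporarily unavailable")
    return any(m in msg for m in transient)
```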

## Executor wiring

``HermesA2AExecutor`` now accepts ``escalation_ladder`` in its
constructor + ``create_executor()`` factory. ``_do_inference()`` walks
the ladder:

  1. First attempt = pinned provider:model (matches pre-ladder behaviour)
  2. On escalatable error, try each rung in order
  3. On non-escalatable error, raise immediately (auth, malformed payload)
  4. On exhaustion, raise the last error

Rung switches temporarily rebind ``self.provider_cfg`` / ``self.model``
/ ``self.api_key`` / ``self.base_url`` in a try/finally, so any raised
error leaves the executor in its original state for the next call. Key
resolution for non-pinned rungs goes through ``resolve_provider`` which
reads the rung-provider's env vars fresh.

## Config shape

``config.yaml`` (rendered from ``org.yaml`` → workspace secrets):

    runtime_config:
      escalation_ladder:
        - provider: gemini
          model: gemini-2.5-flash
        - provider: anthropic
          model: claude-sonnet-4-5-20250929
        - provider: anthropic
          model: claude-opus-4-1-20250805

Empty / absent = single-shot behaviour, full backwards-compat with
every existing workspace.

## Tests

34 passing, all isolated (no network):

- ``test_hermes_escalation.py`` (28): parser + truth-table across
  rate-limit, overload, context-length, gateway, auth-reject, unrelated
  exceptions, and case-insensitivity.
- ``test_hermes_ladder_integration.py`` (6): no-ladder single call,
  ladder-not-triggered on success, escalate-on-rate-limit-then-succeed,
  stop-on-non-escalatable, raise-last-error-when-exhausted,
  skip-unknown-provider-in-rung.

## Not in this PR

- Uncertainty-driven escalation (judge pass after successful reply).
- Per-workspace budget tracking (#305 covers this separately).
- Live streaming reuse across rungs (ladder retries the whole call).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 02:27:27 -07:00
Hongming Wang
0aec76400a
feat(adapters): add gemini-cli runtime adapter (closes #332) (#379)
Adds a `gemini-cli` workspace runtime backed by Google's Gemini CLI
(@google/gemini-cli, ~101k ★, Apache 2.0). Mirrors the claude-code
adapter pattern: Docker image installs the CLI, CLIAgentExecutor
drives the subprocess, A2A MCP tools wire via ~/.gemini/settings.json.

Changes:
- workspace-template/adapters/gemini_cli/ — new adapter (Dockerfile,
  adapter.py, __init__.py, requirements.txt); setup() seeds GEMINI.md
  from system-prompt.md and injects A2A MCP server into settings.json
- workspace-template/cli_executor.py — adds gemini-cli to
  RUNTIME_PRESETS (--yolo flag, -p prompt, --model, GEMINI_API_KEY env
  auth); adds mcp_via_settings preset flag to skip --mcp-config
  injection for runtimes that own their own settings file
- workspace-configs-templates/gemini-cli/ — default config.yaml +
  system-prompt.md template
- tests/test_adapters.py — adds gemini-cli to expected adapter set
- CLAUDE.md — documents new runtime row in the image table

Requires: GEMINI_API_KEY global secret. Build:
  bash workspace-template/build-all.sh gemini-cli

Co-authored-by: DevOps Engineer <devops@molecule.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 23:30:00 -07:00
Hongming Wang
e7bde9a919
Merge pull request #338 from Molecule-AI/fix/issue-328-transcript-fail-closed
fix(security): /transcript fails closed when auth token missing (#328)
2026-04-15 21:30:56 -07:00
Hongming Wang
c11d8f3ec3
fix(security): hitl task-id ownership + wire fail_open_if_no_scanner in loader (closes #265, #268)
Security audit cycle 13: hitl.py LGTM (workspace-scoped task IDs). Loader.py fix applied (commit 0557f73): fail_open_if_no_scanner now read from config and forwarded to scan_skill_dependencies(); regression test added. CI 5/6 pass (E2E cancel = run-supersession pattern). Closes #265. Closes #268.
2026-04-15 21:18:52 -07:00
Hongming Wang
5eb08332ee fix(security): /transcript endpoint fails closed when auth token missing (#328)
Severity HIGH. The /transcript route in main.py used `if expected:`
around the bearer-token compare, so `get_token()` returning None (no
/configs/.auth_token on disk — bootstrap window, deleted file, OSError)
silently skipped the entire auth check. Any container on
molecule-monorepo-net could GET /transcript during the provisioning
window and walk away with the full session log (user messages, Claude
tool calls, assistant replies).

The platform's TranscriptHandler always has a valid token (it acquired
one at workspace registration), so tightening this gate has no
legitimate-caller impact. Only unauthenticated sniffers lose access,
which was never the intended contract of #287.

Fix:

1. Extracted the auth gate into `workspace-template/transcript_auth.py`
   — a 20-line module with no heavy imports so the security-critical
   code is unit-testable without standing up the full uvicorn/a2a/httpx
   stack (the former inline guard could only be tested end-to-end,
   which explains why the regression shipped in #287).

2. `transcript_authorized(expected, auth_header)` returns False when
   `expected` is None or empty — the #328 fix — and otherwise does
   strict equality against "Bearer <expected>".

3. main.py's inline handler calls the extracted function:
     if not _transcript_authorized(get_token(), auth_header):
         return 401

4. New tests/test_transcript_auth.py covers: None token, empty token,
   valid bearer, wrong bearer, missing header, case-sensitive prefix,
   whitespace fuzzing. All 7 pass.

Closes #328
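The extracted gate in point 2 is small enough to reproduce almost verbatim (sketch of transcript_authorized as described above):

```python
def transcript_authorized(expected, auth_header) -> bool:
    # Fail closed: a missing or empty expected token denies the request
    # instead of skipping the check (the #328 fix).
    if not expected:
        return False
    # Otherwise strict equality against "Bearer <expected>".
    return auth_header == f"Bearer {expected}"
```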

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:17:37 -07:00
Hongming Wang
e88ae9f6d0
fix(a2a-tools): auth_headers on recall_memory + commit_memory (#304)
Adds auth_headers to recall_memory and commit_memory in a2a_tools.py. Fixes the #215-class auth regression for A2A memory tools. Test mocks updated to accept headers kwarg.
2026-04-15 19:12:18 -07:00
Hongming Wang
472495c380
Merge pull request #270 from Molecule-AI/feat/workspace-transcript-endpoint
feat: GET /workspaces/:id/transcript — live agent session log
2026-04-15 17:55:41 -07:00
Hongming Wang
469d24c23a fix(tests): update memory fakes for auth_headers kwarg + activity overwrite
The #215-class fix in memory.py (859a60e) adds headers=_headers to the
direct-httpx commit_memory + search_memory paths, but 9 existing tests
in test_memory.py had FakeAsyncClient.post/get signatures like
`async def post(self, url, json):` with no headers kwarg. Python
raised TypeError: unexpected keyword argument 'headers' on every call,
commit_memory caught it and returned {success: False}, tests failed.

Fixes applied:

1. Add `headers=None` to every FakeAsyncClient.post + .get signature
   across test_memory.py. Uses replace_all so all 9+ fakes match.

2. For tests that capture a single captured["url"]:
   - test_commit_memory_uses_awareness_client_when_configured
   - test_commit_memory_uses_platform_fallback_without_awareness
   - test_commit_memory_httpx_201_success
   filter to only capture /memories URLs. Without the filter, the
   subsequent _record_memory_activity fire-and-forget post to /activity
   overwrites captured["url"] and the assertion fails.

3. For test_commit_memory_promoted_packet_logs_skill_promotion: bump
   expected captured["calls"] from 3 to 4. Pre-fix, the memory_write
   /activity call (from _record_memory_activity #125) was silently
   dropped because the fake rejected headers=; post-fix it succeeds
   and lands in the captured list alongside the skill_promotion
   /activity and /registry/heartbeat calls. Also extend that test's
   fake to accept /registry/heartbeat (was raising AssertionError).

Total: 36/36 memory tests pass. Full workspace-template suite 1189/1189.

This is strictly test-infrastructure work — zero production code
changed. CI never caught the break because the Mac mini runner has
been stuck for ~4 hours (tick-33/34/35/36 reports).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 17:29:15 -07:00
rabbitblood
baffc6b0c3 feat(hermes): Phase 2d-i — system-prompt.md injection on all 3 dispatch paths
The Hermes adapter never read /configs/system-prompt.md. Any role that
switched to runtime: hermes was silently losing its role identity because
the system prompt wasn't passed to the model. This PR fixes that by:

1. HermesA2AExecutor.__init__ takes new optional `config_path` kwarg
2. `create_executor(config_path=...)` forwards to the constructor
3. `adapter.py` passes `config.config_path` through from AdapterConfig
4. `execute()` reads system-prompt.md via executor_helpers.get_system_prompt
   (hot-reload-capable — reads on every turn, not just at startup)
5. `_do_inference(user_message, history, system_prompt)` — new arg threads
   through the dispatch to each native path
6. Each path uses the provider's NATIVE system field:
   - OpenAI-compat: prepends `{"role":"system", "content":...}` to messages
   - Anthropic: top-level `system=` kwarg (NOT in messages — Anthropic
     requires system at the top level)
   - Gemini: `config=GenerateContentConfig(system_instruction=...)`
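As a hedged sketch (helper names are hypothetical and the SDK calls are omitted — only the request-kwarg shapes are shown), the three native system-field placements look like:

```python
# Illustrative request-shape builders for the three paths described above.
def openai_compat_kwargs(user_message, system_prompt=None):
    messages = []
    if system_prompt:
        # OpenAI-compat: system prompt is just a leading message
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    return {"messages": messages}

def anthropic_kwargs(user_message, system_prompt=None):
    kwargs = {"messages": [{"role": "user", "content": user_message}]}
    if system_prompt:
        kwargs["system"] = system_prompt  # top-level kwarg, never in messages
    return kwargs

def gemini_kwargs(user_message, system_prompt=None):
    kwargs = {"contents": [{"role": "user", "parts": [{"text": user_message}]}]}
    if system_prompt:
        # real code wraps this in GenerateContentConfig(system_instruction=...)
        kwargs["config"] = {"system_instruction": system_prompt}
    return kwargs
```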

## Phase scoreboard
- 2a (in main) — native Anthropic dispatch infra
- 2b (in main) — native Gemini dispatch
- 2c (in main) — multi-turn history on all paths
- **2d-i (this PR)** — system prompts on all paths
- 2d-ii (future) — tool calling on native paths
- 2d-iii (future) — vision content blocks on native paths
- 2d-iv (future) — streaming

## Test coverage

46/46 tests pass (20 Phase 2 dispatch + 26 Phase 1 registry):

- Existing dispatch tests updated to assert the 3-arg call shape
  `("hello", None, None)` — history + system_prompt both None
- 5 new tests:

  - `dispatch_passes_system_prompt_to_anthropic` — happy path, third arg flows
  - `dispatch_passes_system_prompt_to_gemini` — happy path
  - `dispatch_passes_system_prompt_to_openai` — happy path
  - `executor_accepts_config_path_kwarg` — constructor stores config_path
  - `create_executor_forwards_config_path` — both back-compat and registry
    resolution paths forward config_path through to the executor

## Back-compat

- `config_path=None` (default) → execute() skips system-prompt injection,
  same behavior as pre-2d-i
- Workspaces with `runtime: hermes` but no `/configs/system-prompt.md`
  file get `system_prompt=None` (get_system_prompt returns fallback),
  same as before
- The 13 OpenAI-compat providers work identically — system_prompt just
  adds a leading message, which every OpenAI-compat endpoint already
  supports
- Anthropic + Gemini previously got zero system context; now they get
  the same system prompt the workspace's system-prompt.md carries

## Why this matters

Before this PR: if someone flipped a workspace from `runtime: claude-code`
to `runtime: hermes`, the agent would act generically (no role identity,
no project conventions, no CLAUDE.md context) because the Hermes executor
never looked at system-prompt.md. That's a silent correctness regression
the test suite wouldn't catch because none of our live workspaces use
the hermes runtime today.

With this PR: Hermes workspaces get the same system prompt injection as
Claude-code workspaces, making the `runtime: hermes` switch a true drop-in
alternative.

## Related
- #267 Phase 2c (multi-turn history — in main)
- #255 Phase 2b (gemini native — in main)
- #240 Phase 2a (anthropic native — in main)
- #208 Phase 1 (provider registry — in main)
- project_hermes_multi_provider.md — Phase 2d-i was the next queued item
2026-04-15 16:21:47 -07:00
airenostars
1f22d7df1b feat: GET /workspaces/:id/transcript — live agent session log
Closes #N (issue to be filed)

Lets canvas / operators see live tool calls + AI thinking instead of
waiting for the high-level activity log to flush. Right now the only
way to "look over an agent's shoulder" is `docker exec ws-XXX cat
/home/agent/.claude/projects/.../<session>.jsonl`, which:
  - doesn't work for remote workspaces (Phase 30 / Fly Machines)
  - requires shell access on the host
  - has no pagination

This PR adds:

1. `BaseAdapter.transcript_lines(since, limit)` — async hook returning
   `{runtime, supported, lines, cursor, more, source}`. Default returns
   `supported: false` so non-claude-code runtimes pass through gracefully.

2. `ClaudeCodeAdapter.transcript_lines` override — reads the most-
   recently-modified `.jsonl` in `~/.claude/projects/<cwd>/`. Resolves
   cwd the same way `ClaudeSDKExecutor._resolve_cwd()` does so the
   project dir name matches what Claude Code actually writes to. Limit
   capped at 1000 to prevent OOM.

3. Workspace HTTP route `GET /transcript` — Starlette handler added
   alongside the A2A app. Trusts the internal Docker network (same
   model as POST / for A2A); Phase 30 remote-workspace auth is a
   follow-up.

4. Platform proxy `GET /workspaces/:id/transcript` — looks up the
   workspace's URL, forwards GET, caps response at 1MB. Gated by
   existing `WorkspaceAuth` middleware (same as /traces, /memories,
   /delegations).
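A minimal sketch of the hook contract from item 1 plus the limit cap from item 2 (names follow the commit text; the bodies are stand-ins, not the actual adapter code):

```python
# Hypothetical sketch of the BaseAdapter hook + limit cap described above.
class BaseAdapter:
    runtime = "base"

    async def transcript_lines(self, since=None, limit=100):
        # Default: non-claude-code runtimes report "not supported" and the
        # route degrades gracefully instead of 500ing.
        return {
            "runtime": self.runtime,
            "supported": False,
            "lines": [],
            "cursor": None,
            "more": False,
            "source": None,
        }


MAX_TRANSCRIPT_LIMIT = 1000  # cap to prevent OOM on huge .jsonl sessions

def clamp_limit(limit):
    return max(1, min(limit, MAX_TRANSCRIPT_LIMIT))
```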

Tests: 6 Python unit tests cover empty dir / pagination / multi-session
/ malformed lines / limit cap, plus 4 Go tests cover 404 / proxy
forwarding / query-string propagation / unreachable-workspace 502.

Verified end-to-end on a live workspace — returns real claude-code
session entries through the platform proxy.

## Follow-ups
- WebSocket variant for live streaming (instead of polling)
- Canvas UI tab "Transcript" between Activity and Traces
- LangGraph / DeepAgents / OpenClaw transcript adapters
- Phase 30 remote-workspace auth on /transcript
2026-04-15 14:29:43 -07:00
rabbitblood
cb3c7dcf91 feat(hermes): Phase 2c — multi-turn history passed natively to all paths
Completes the Phase 2 scope by keeping conversation turns as turns across
all three dispatch paths. Pre-2c, history was flattened into a single user
message via shared_runtime.build_task_text, which worked as a fallback but
lost the model's native multi-turn awareness (role attribution,
instruction-following on mid-conversation corrections, system-prompt
grounding against prior turns).

Phase 2a + 2b shipped the dispatch infrastructure + per-provider native
paths. This PR uses them properly.

## What's new

- **`_history_to_openai_messages(user_message, history)`** (static) — maps
  A2A `(role, text)` tuples to OpenAI Chat Completions
  `[{"role":"user"|"assistant","content":str}]`. Roles: `human`→`user`,
  `ai`→`assistant`. Current turn appended as the final user message.

- **`_history_to_anthropic_messages`** (static) — identical wire shape to
  OpenAI for text-only turns, so it delegates. Phase 2d tool_use/vision
  blocks will diverge here.

- **`_history_to_gemini_contents`** (static) — Gemini uses a different
  shape: `role="user"|"model"` (NOT "assistant") and text wrapped in
  `parts=[{"text":...}]`. Delegates to none of the others.

- **`_do_openai_compat(user_message, history=None)`** — accepts history,
  builds messages via `_history_to_openai_messages`. Back-compat: pass
  `history=None` to get the old single-turn behavior.

- **`_do_anthropic_native(user_message, history=None)`** — same signature
  change, calls `_history_to_anthropic_messages`. Still uses
  `anthropic.AsyncAnthropic().messages.create()`, just with proper
  multi-turn.

- **`_do_gemini_native(user_message, history=None)`** — same pattern,
  calls `_history_to_gemini_contents`, passes to Gemini's
  `generate_content(contents=...)`.

- **`_do_inference(user_message, history=None)`** — new signature,
  dispatches by auth_scheme as before, passes both args through.

- **`execute()`** — no longer calls `build_task_text`. Calls
  `extract_history(context)` directly and forwards to `_do_inference`.
  Removes the `build_task_text` import (not needed in this file anymore).

## Tests

Existing 7 dispatch tests updated for the new `(user_message, history)`
signature — they assert the path is called with `("hello", None)` since
they pass no history.

5 NEW tests:

- `test_history_to_openai_messages_empty_history` — empty history degrades
  to single user message (back-compat)
- `test_history_to_openai_messages_multi_turn` — round-trip of a 3-turn
  history + current turn
- `test_history_to_anthropic_messages_same_as_openai` — cross-check that
  anthropic path produces identical wire shape for text-only
- `test_history_to_gemini_contents_uses_model_role_and_parts_wrapper` —
  verifies the Gemini-specific role mapping (`ai`→`model`) + parts wrapper
- `test_dispatch_passes_history_through` — end-to-end: _do_inference
  forwards history to the chosen provider path

All 41 tests pass (15 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    41 passed in 0.07s

## Back-compat

- No public API changes to `create_executor()`. Callers that hit
  `execute()` via A2A get the new multi-turn behavior automatically via
  `extract_history(context)`.
- Callers that passed an empty history list (or None) get the same
  single-turn behavior as pre-2c.
- The `build_task_text` helper in shared_runtime is unchanged — other
  adapters (AutoGen, LangGraph) that use it keep working. Only Hermes
  bypasses it now.

## What's NOT in this PR (Phase 2d)

- Tool calling / function calling on native paths (anthropic `tools=`,
  gemini `tools=Tool(function_declarations=[...])`)
- Vision content blocks (image_url → anthropic `{type:"image", source:
  {type:"base64",...}}` / gemini `{inline_data:{mime_type,data}}`)
- System instructions pass-through (anthropic `system=`, gemini
  `system_instruction=`)
- Streaming (`astream_messages` / `streamGenerateContent` stream variants)
- Extended thinking (anthropic `thinking={"type":"enabled"}`) / Gemini
  thinking config

Phase 2c is the **multi-turn upgrade**. Tool + vision + streaming are
Phase 2d, scoped in project_hermes_multi_provider.md.

## Related

- #240 Phase 2a (native Anthropic dispatch — in main)
- #255 Phase 2b (native Gemini dispatch — in main)
- Phase 1 (#208 — provider registry baseline, in main)
- `project_hermes_multi_provider.md` queued memory
- CEO 2026-04-15: "focus on supporting hermes agent"
2026-04-15 14:21:10 -07:00
Hongming Wang
3828693897
Merge pull request #255 from Molecule-AI/feat/hermes-phase2b-gemini-native
feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path
2026-04-15 14:01:00 -07:00
Hongming Wang
df4740bf26
Merge pull request #240 from Molecule-AI/feat/hermes-phase2-native-sdks
feat(hermes): Phase 2a — native Anthropic Messages API dispatch (auth_scheme='anthropic')
2026-04-15 14:00:51 -07:00
Hongming Wang
1d9ddb8c67 fix(tests): hermes provider env-var leak broke test_hermes_smoke
Pre-existing flaky test: when the full workspace-template suite ran in
collection order, test_hermes_smoke.py::test_create_executor_raises_
without_keys failed with "DID NOT RAISE ValueError". Failure only
surfaced when test_hermes_providers ran first.

Root cause: test_hermes_providers had an autouse fixture that used
monkeypatch.delenv on entry, but several tests in that file mutate
os.environ directly (e.g. `os.environ["HERMES_API_KEY"] = "test"`),
bypassing monkeypatch. monkeypatch only tracks its own deltas, so on
fixture teardown the direct-mutation values stayed in os.environ.
HERMES_API_KEY leaked across file boundaries into test_hermes_smoke,
which then saw a key present when it expected absence.

Fix: replace monkeypatch-based fixture with pure snapshot/restore:
- Snapshot all provider env vars at entry
- Clear them
- yield (test runs, may mutate freely)
- try/finally restore the exact pre-test state

This is deterministic regardless of whether a test uses monkeypatch,
direct mutation, or neither. Also adds a comment documenting WHY we
switched away from monkeypatch so a future reviewer doesn't revert.
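The snapshot/restore shape, sketched as a plain generator (the real fixture would additionally be decorated with `@pytest.fixture(autouse=True)`; the var list here is illustrative):

```python
# Sketch of the deterministic snapshot/restore fixture described above.
import os

PROVIDER_ENV_VARS = ("HERMES_API_KEY", "OPENROUTER_API_KEY", "GEMINI_API_KEY")

def provider_env_isolation():
    # Snapshot the exact pre-test state, including absence (None).
    snapshot = {k: os.environ.get(k) for k in PROVIDER_ENV_VARS}
    for k in PROVIDER_ENV_VARS:
        os.environ.pop(k, None)  # clear before the test body runs
    try:
        yield  # test runs; it may mutate os.environ directly or via monkeypatch
    finally:
        # Restore the exact pre-test state regardless of how the test mutated.
        for k, v in snapshot.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v
```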

Full workspace-template suite: 1169 passed, 9 skipped, 2 xfailed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:59:48 -07:00
rabbitblood
adcaa69e42 feat(hermes): Phase 2b — native Google Gemini generateContent dispatch path
Completes Hermes Phase 2 by adding the second native SDK path: Google Gemini
via the official `google-genai` Python SDK. Stacked on top of Phase 2a
(feat/hermes-phase2-native-sdks) which introduced the dispatch infra +
the anthropic native path.

## What's new in this PR

1. `providers.py`: flip `gemini` entry to `auth_scheme="gemini"` and
   update `base_url` from the OpenAI-compat endpoint
   (`/v1beta/openai`) to the bare host
   (`https://generativelanguage.googleapis.com`) which the native SDK
   uses.

2. `executor.py`: new method `_do_gemini_native(task_text)` that uses
   `google.genai.Client().aio.models.generate_content(...)`. Dispatch
   table in `_do_inference` now routes `"gemini"` → `_do_gemini_native`.
   Same fail-loud semantics as `_do_anthropic_native` — missing SDK
   raises a clear RuntimeError with install instructions.

3. `requirements.txt`: add `google-genai>=1.0.0`.

4. `test_hermes_phase2_dispatch.py`: +3 tests
   - `test_gemini_entry_has_gemini_scheme` — registry flip + base URL
     validated
   - `test_dispatch_gemini_scheme_calls_gemini_native` — dispatch runs
     gemini native, not openai-compat or anthropic-native
   - `test_gemini_native_raises_clear_error_when_sdk_missing` — fail-loud
     on missing `google-genai` package
   Plus updated existing dispatch tests to mock `_do_gemini_native`
   alongside the other paths so "no cross-calls" assertions stay tight.
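The fail-loud-on-missing-SDK behavior shared by both native paths can be factored as a small helper — a hypothetical generalization for illustration, not code from this diff:

```python
# Hypothetical generalization of the fail-loud import guard described above.
import importlib

def require_sdk(module_name, install_hint):
    """Import a native SDK at call time; raise a clear RuntimeError if absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Hermes native path requires the `{module_name}` package. "
            f"{install_hint}"
        ) from exc
```

A native path would then open with something like `genai = require_sdk("google.genai", "Install with `pip install google-genai>=1.0.0`.")`, keeping the error message actionable instead of a bare ImportError.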

All 36 tests pass locally (10 Phase 2 dispatch + 26 Phase 1 registry):

    pytest tests/test_hermes_phase2_dispatch.py tests/test_hermes_providers.py
    36 passed in 0.07s

## Dispatch table after this PR

    auth_scheme="openai"     → _do_openai_compat (13 providers)
    auth_scheme="anthropic"  → _do_anthropic_native (1 provider, Phase 2a)
    auth_scheme="gemini"     → _do_gemini_native (1 provider, Phase 2b) ← NEW
    <unknown>                → _do_openai_compat + warning (forward-compat)

## Back-compat

- All 13 openai-scheme providers unchanged
- `hermes_api_key` / `HERMES_API_KEY` / `OPENROUTER_API_KEY` paths unchanged
- Only `gemini` provider changes behavior: now uses native generateContent
  instead of the `/v1beta/openai` compat shim
- Existing Gemini callers setting `GEMINI_API_KEY` get the native path
  automatically — no caller changes needed

## What's NOT in this PR (future phases)

- Streaming support (`astream_messages` / `streamGenerateContent` stream
  variants) for either native path
- Tool calling / function calling on native paths
- Vision content blocks (image_url → anthropic image blocks; image_url →
  gemini inline_data with base64 + mime_type)
- Extended thinking (anthropic) / thinking config (gemini)
- System instructions pass-through on the gemini native path

Phase 2c/2d will layer these on. This PR is the minimum-viable native
dispatch — single-turn text in, text out — same shape as Phase 2a.

## Stacking

This PR targets `feat/hermes-phase2-native-sdks` (Phase 2a) as its base
branch, NOT main, so the diff shows only the Gemini-specific additions.
When Phase 2a merges to main, GitHub auto-rebases this PR onto the new
main head. If the reviewer prefers a single combined PR, close #240 and land

this one instead — the commits on feat/hermes-phase2-native-sdks are
already included in this branch's history.

## Related

- #240 Phase 2a (parent branch)
- #208 Phase 1 (registry + openai-compat path — already in main)
- `project_hermes_multi_provider.md` queued memory — Phase 2 was the next
  item, this PR completes it
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's
  eco-watch entry that catalogued Hermes's native provider list and
  shaped the original Phase 2 scope
2026-04-15 13:20:39 -07:00
rabbitblood
3dd8df585e feat(hermes): Phase 2a — native Anthropic Messages API dispatch path
Completes the Hermes adapter's native-SDK plan for the provider that gains
the most from leaving OpenAI-compat: Anthropic. OpenAI-compat works fine for
plain text turns on every provider (Phase 1 covered that with one code path
for all 15 providers), but Anthropic's Messages API has first-class tool use,
vision content blocks, and extended thinking that the OpenAI-compat shim
strips or mis-translates.

Rather than ship all native SDK paths in one PR (Anthropic + Gemini + future),
this lands Anthropic only (Phase 2a). Gemini is Phase 2b, shipping after a
production measurement window on Phase 2a.

## Design

Providers now dispatch by `auth_scheme` field. Phase 1 added the field but
every provider used `"openai"`. Phase 2 flips `anthropic` to `"anthropic"`
and wires a second inference path keyed on that:

- `HermesA2AExecutor._do_openai_compat(task_text)` — existing path, handles
  14 of 15 providers (Nous Portal, OpenRouter, OpenAI, xAI, Gemini, Qwen,
  GLM, Kimi, MiniMax, DeepSeek, Groq, Together, Fireworks, Mistral)
- `HermesA2AExecutor._do_anthropic_native(task_text)` — NEW, uses the
  official `anthropic` Python SDK's `AsyncAnthropic().messages.create(...)`
- `HermesA2AExecutor._do_inference(task_text)` — dispatches by
  `self.provider_cfg.auth_scheme`

Unknown schemes fall back to OpenAI-compat with a logged warning, so future
provider additions don't crash if a native SDK path ships late.
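Sketched with stubbed bodies (the stubs stand in for the real inference calls; `provider_cfg` only needs an `auth_scheme` attribute here):

```python
# Sketch of the dispatch-by-auth_scheme design described above.
import logging

logger = logging.getLogger("hermes")

class HermesA2AExecutor:
    def __init__(self, provider_cfg):
        self.provider_cfg = provider_cfg

    async def _do_openai_compat(self, task_text):
        return f"openai-compat:{task_text}"     # stub for the real call

    async def _do_anthropic_native(self, task_text):
        return f"anthropic-native:{task_text}"  # stub for the real call

    async def _do_inference(self, task_text):
        scheme = self.provider_cfg.auth_scheme
        if scheme == "anthropic":
            return await self._do_anthropic_native(task_text)
        if scheme != "openai":
            # Unknown scheme: warn and fall back so a provider added before
            # its native path ships doesn't crash the executor.
            logger.warning("unknown auth_scheme %r, using openai-compat", scheme)
        return await self._do_openai_compat(task_text)
```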

## Fail-loud on missing SDK

`_do_anthropic_native` raises a clear `RuntimeError` with install
instructions if the `anthropic` package is missing at runtime:

    Hermes anthropic native path requires the `anthropic` package. Install
    in the workspace image with `pip install anthropic>=0.39.0` or set
    HERMES provider=openrouter to route Claude models through OpenRouter's
    OpenAI-compat shim instead.

This is intentional: silent fallback would mask fidelity loss (tool_use
blocks become plain text, vision gets stripped). Loud failure is better.

`requirements.txt` adds `anthropic>=0.39.0` so the package is baked into
the workspace-template image build path. Operators building custom workspace
images without anthropic installed get the loud error.

## Back-compat

- `create_executor(hermes_api_key="x")` → still routes to Nous Portal
  (`auth_scheme="openai"`), unchanged
- `HERMES_API_KEY` env var → still first in RESOLUTION_ORDER
- `OPENROUTER_API_KEY` env var → still second
- All 14 OpenAI-compat providers unchanged — they take the same code path
  as before
- ONLY `anthropic` provider changes behavior: it now uses the native
  Messages API instead of the `/v1/chat/completions` compat shim

## Constructor signature change

`HermesA2AExecutor.__init__` now takes `provider_cfg: ProviderConfig`
instead of separate `api_key + base_url + model`. The three fields are
derived from `provider_cfg` + an optional model override. This is a
breaking change for any external caller building an executor directly,
but the only documented public entry point is `create_executor()`, which
is updated in the same commit to pass the cfg through.

## Test coverage

`workspace-template/tests/test_hermes_phase2_dispatch.py` — 7 new tests:

1. `test_anthropic_entry_has_anthropic_scheme` — registry flip
2. `test_all_other_providers_still_openai_scheme` — regression guard
3. `test_dispatch_openai_scheme_calls_openai_compat` — happy path
4. `test_dispatch_anthropic_scheme_calls_anthropic_native` — happy path
5. `test_dispatch_unknown_scheme_falls_back_to_openai_compat` — forward compat
6. `test_anthropic_native_raises_clear_error_when_sdk_missing` — fail-loud
7. `test_create_executor_passes_provider_cfg` — constructor wiring

All pass locally (pytest tests/test_hermes_phase2_dispatch.py -v, 0.04s).
Phase 1 tests unchanged: `test_hermes_providers.py` 26/26 pass, no
regressions.

## What's NOT in this PR (Phase 2b)

- Gemini native `generateContent` path (`auth_scheme="gemini"`)
- Streaming support across both native paths (`astream_messages`, `streamGenerateContent`)
- Tool calling on the anthropic native path (the `tools` + `tool_use` blocks)
- Vision content blocks (image_url → anthropic image blocks)
- Extended thinking parameter passthrough

All scoped in `project_hermes_multi_provider.md`. Phase 2a is the minimum
viable native Anthropic dispatch — single-turn text in, text out, no tools.

## Related

- Phase 1 baseline (already in main): #208 — provider registry + OpenAI-compat path
- Queued memory: `project_hermes_multi_provider.md` — full phased plan
- Triggering directive: CEO 2026-04-15 — "once current works are cleared,
  focus on supporting hermes agent"
2026-04-15 12:23:56 -07:00
rabbitblood
0f2ed6bf0a fix(claude-sdk): #160 — probe CLI directly when SDK swallowed the real stderr
Context: when the claude-agent-sdk wraps a stream error from the CLI
subprocess that it can't categorize (rate limit, auth, network), it
raises a bare `Exception("Command failed with exit code 1\nError output:
Check stderr output for details")`. The exception has no `.stderr` or
`.exit_code` attributes, so #66's `_format_process_error` — which reads
those attributes — has nothing to surface. The log line becomes:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 (exit code: 1)\nError output: Check stderr output for details

That's the placeholder text from the SDK's error path, not the actual
error. Operators chasing a stuck workspace are forced to `docker exec
ws-xxx claude --print` manually to discover the real cause. Observed
today during the rate-limit incident: every PM error line was identical
"Check stderr output for details" while the real cause ("You've hit
your limit · resets Apr 17, 11pm (UTC)") was only visible via manual
reproduction — that cost ~20 minutes of diagnosis time.

## Fix

Add `_probe_claude_cli_error()`: a best-effort subprocess call that runs
`claude --print` with a small probe input, captures stderr+stdout, and
returns the real error string. Bounded by 30s timeout so a hung CLI
can't stall the error path.

Extend `_format_process_error` with ONE narrow fallback: if the
exception has no stderr/exit_code AND its message contains the specific
"Check stderr output for details" marker, call the probe and append
`probed_cli_error=<real error>` to the formatted line.

Critically: the probe only runs in the narrow case where we have
nothing else to log. If `.stderr` or `.exit_code` are present (the
normal ProcessError path from #66), the probe is skipped — no wasted
subprocess, no 30s latency on every error.

## Test coverage

`workspace-template/tests/test_claude_sdk_executor.py` adds 3 new tests:
- `test_format_process_error_probes_cli_when_stderr_swallowed` — the
  happy path: exception matches the marker, probe runs, result appears
  in the formatted line. Probe is monkeypatched so no subprocess spawns
  in the test.
- `test_format_process_error_does_not_probe_when_stderr_already_present` —
  negative: regular ProcessError with `.stderr` set does NOT trigger
  the probe (skip the wasted call).
- `test_format_process_error_does_not_probe_without_swallowed_marker` —
  negative: unrelated plain exceptions (e.g. RuntimeError) do NOT
  trigger the probe (so the common-case error path stays fast).

All 7 `_format_process_error` tests pass locally (4 existing + 3 new):

    pytest tests/test_claude_sdk_executor.py -k format_process_error
    ======================= 7 passed in 0.06s ========================

## Impact

Next time the SDK swallows a real error (rate limit, auth failure,
network outage), the workspace log will contain the actual error string
alongside the generic placeholder:

    SDK agent error [claude-code]: Exception: Command failed with exit
    code 1 ... | probed_cli_error="You've hit your limit · resets Apr
    17, 11pm (UTC)"

Diagnosis time drops from "docker exec each ws, run claude --print,
read stderr" (~20 min) to "grep probed_cli_error in platform logs"
(~10 seconds).

Closes #160.
2026-04-15 11:50:55 -07:00
rabbitblood
376c9574a3 feat(hermes): Phase 1 — multi-provider registry (15 providers, back-compat preserved)
Ships the first half of the queued Hermes adapter expansion. PR 2 only
supported Nous Portal + OpenRouter; this adds 13 more providers reachable
via OpenAI-compat endpoints. Native SDK paths for Anthropic + Gemini are
Phase 2 (better tool-calling + vision fidelity).

## What's new

**`workspace-template/adapters/hermes/providers.py`** (new file, 220 LOC):
- ``ProviderConfig`` dataclass: name, env vars, base URL, default model, auth scheme, docs
- ``PROVIDERS`` dict with 15 entries across 4 groups:
  - PR 2 baseline: nous_portal, openrouter
  - Frontier commercial: openai, anthropic, xai, gemini
  - Chinese providers: qwen, glm, kimi, minimax, deepseek
  - OSS/alt: groq, together, fireworks, mistral
- ``RESOLUTION_ORDER`` tuple: priority for auto-detect (back-compat first,
  then commercial, then Chinese, then OSS/alt)
- ``resolve_provider(explicit=None)`` -> (ProviderConfig, api_key)
  - With explicit name: routes to that provider, raises if env var empty
  - Without: walks RESOLUTION_ORDER, first env-var-set provider wins
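The resolution contract can be sketched with a two-entry registry standing in for the real 15 (URLs and default models below are placeholders):

```python
# Sketch of the resolve_provider() contract described above.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class ProviderConfig:
    name: str
    env_vars: tuple
    base_url: str
    default_model: str

PROVIDERS = {
    "nous_portal": ProviderConfig("nous_portal", ("HERMES_API_KEY",),
                                  "https://example.invalid", "hermes"),
    "openrouter": ProviderConfig("openrouter", ("OPENROUTER_API_KEY",),
                                 "https://example.invalid", "hermes"),
}
RESOLUTION_ORDER = ("nous_portal", "openrouter")  # back-compat first

def resolve_provider(explicit=None):
    if explicit is not None:
        cfg = PROVIDERS.get(explicit)
        if cfg is None:
            raise ValueError(f"unknown provider: {explicit}")
        key = next((os.environ[v] for v in cfg.env_vars if os.environ.get(v)), None)
        if not key:
            raise ValueError(f"provider {explicit!r} selected but its env var is empty")
        return cfg, key
    for name in RESOLUTION_ORDER:  # first env-var-set provider wins
        cfg = PROVIDERS[name]
        for var in cfg.env_vars:
            if os.environ.get(var):
                return cfg, os.environ[var]
    raise ValueError("no provider API key found in environment")
```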

**`workspace-template/adapters/hermes/executor.py`** (refactored):
- `create_executor(hermes_api_key=None, provider=None, model=None)` now has
  three parameters:
  - `hermes_api_key`: PR 2 back-compat — routes to Nous Portal
  - `provider`: canonical short name from the registry (e.g. "anthropic")
  - `model`: optional override of the provider's default model
- Delegates all resolution to `providers.resolve_provider()` — no more
  hardcoded URLs or env var lookups in the executor itself
- `HermesA2AExecutor.__init__` no longer has Nous-specific defaults; callers
  pass base_url + model explicitly (which create_executor always does)

**`workspace-template/tests/test_hermes_providers.py`** (new file, 26 tests):
- Registry shape invariants (count >= 15, no duplicates, every config valid)
- PR 2 back-compat: HERMES_API_KEY / OPENROUTER_API_KEY still route correctly
- Auto-detect for every provider in the registry (parametrized — guards against
  typos in env var lists)
- Explicit `provider=` bypass of auto-detect
- Error cases: unknown provider, explicit-but-empty, auto-detect-with-no-env
- All 26 tests pass locally in 0.08s

## Back-compat guarantees

| Scenario | PR 2 behavior | This PR behavior |
|---|---|---|
| `create_executor(hermes_api_key="x")` | Nous Portal | Nous Portal (unchanged) |
| `HERMES_API_KEY=x` env, auto-detect | Nous Portal | Nous Portal (unchanged) |
| `OPENROUTER_API_KEY=x` env, auto-detect | OpenRouter | OpenRouter (unchanged) |
| Both env + explicit hermes_api_key param | Nous Portal (param wins) | Nous Portal (param wins, unchanged) |

Nothing existing can break. New callers gain access to 13 more providers.

## What's NOT in this PR (Phase 2)

- **Native Anthropic Messages API path** — better tool calling, vision, extended
  thinking. Requires pulling in `anthropic` SDK. ~50 LOC.
- **Native Gemini generateContent path** — for vision + google tools. Requires
  `google-genai` SDK. ~50 LOC.
- **Streaming support across all providers** — current executor is non-streaming
  (single chat.completions.create call). Streaming works with openai.AsyncOpenAI
  but hasn't been wired to the A2A event queue path. ~30 LOC.
- **Per-provider model overrides in config.yaml** — Phase 1 uses the registry's
  default_model. Phase 2 adds a `hermes: { provider: qwen, model: qwen3-coder-plus }`
  block in the workspace config.
- **`.env.example` updates** — not critical since the registry itself documents
  every env var via the `env_vars` field, but nice-to-have.

## Related
- Queued memory: `project_hermes_multi_provider.md`
- CEO directive 2026-04-15: *"once current works are cleared, I want you to
  focus on supporting hermes agent, right now it doesnt take too much providers"*
- `docs/ecosystem-watch.md` → `### Hermes Agent` — Research Lead's eco-watch
  entry listed "Nous Portal, OpenRouter, GLM, Kimi, MiniMax, OpenAI, …" which
  shaped this registry's initial set

## Test plan
- [x] Unit tests: 26/26 pass locally (pytest)
- [ ] CI will run on the self-hosted macOS arm64 runner
- [ ] Smoke test in a real workspace: set QWEN_API_KEY and verify Technical
      Researcher actually hits Alibaba DashScope successfully
- [ ] Integration test per provider with real API keys (gated on env, skip
      when not set — Phase 2 CI addition)
2026-04-15 11:14:35 -07:00
Backend Engineer
1c07046332 fix(a2a): cancel() event, stateTransitionHistory capability, wire push store (#173 #174 #175)
#173 — implement cancel() in LangGraphA2AExecutor: emits
TaskStatusUpdateEvent(state=canceled, final=True) so clients see the
state transition rather than silence. Removes pragma: no cover.
Test: test_cancel_emits_canceled_event.

#174 — add stateTransitionHistory=True to AgentCapabilities in main.py
so microsoft/agent-framework clients know they can request full task
history via the A2A protocol.

#175 — wire InMemoryPushNotificationConfigStore and PushNotificationSender
into DefaultRequestHandler so the advertised pushNotifications capability
is backed by a real store. Both classes live in a2a.server.tasks (a2a-sdk
0.3.25); import confirmed by probe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:58:10 +00:00
Hongming Wang
a2ea1b183b
Merge pull request #49 from Molecule-AI/feat/hermes-pr2
feat(hermes): implement create_executor() with HERMES_API_KEY / OPENROUTER_API_KEY fallback + smoke tests
2026-04-14 08:16:15 -07:00
Dev Lead Agent
b99497cd3f fix(security): complete Phase 30.6 auth headers in a2a_client get_peers and discover_peer
get_peers() was sending no auth headers to /registry/:id/peers — this would
return 401 for every workspace agent after PR #31 (WorkspaceAuth middleware)
deploys, breaking peer discovery entirely.

discover_peer() had X-Workspace-ID but was missing the bearer token, also
required by Phase 30.6 for /registry/discover/:id.

Both functions now send {"X-Workspace-ID": WORKSPACE_ID, **auth_headers()}.
get_workspace_info() was already correct (auth_headers() present since PR #39).

Adds test_request_sends_workspace_id_header to TestGetPeers; hardens the
discover_peer header assertion to use presence-check rather than exact equality.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 13:23:44 +00:00
Hongming Wang
892f41bc3e fix(gate-3): update watcher test to expect SHA-256 hash
Align test_hash_file_real_file with the SHA-256 switch in watcher.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 01:21:35 -07:00
Dev Lead Agent
7f3274391e fix(security): H1 — replace MD5 with SHA-256 in config/skill watchers
Both watcher.py (ConfigWatcher) and skill_loader/watcher.py
(SkillsWatcher) used hashlib.md5() for file-integrity change detection.
MD5 is collision-prone: a crafted config file could produce the same
hash as a benign one, silently suppressing the hot-reload callback and
preventing agents from picking up legitimate config changes.

Replace hashlib.md5 → hashlib.sha256 in both _hash_file() methods.
Update docstrings, comments, and the type-annotation comment
(rel_path → md5 hex → sha256 hex).

Test update: test_skills_watcher.py — rename helper _md5 → _sha256,
update the hash-length assertion from 32 (MD5) to 64 (SHA-256), and
rename the test from test_hash_file_returns_md5_for_existing_file to
test_hash_file_returns_sha256_for_existing_file. All 25 watcher tests
pass.

Note: H2 (a2a_client.py timeout=None) was already fixed in Cycle 5
(timeout=httpx.Timeout(connect=30.0, read=300.0, ...)) — confirmed by
code review before opening this PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 07:52:07 +00:00
Dev Lead Agent
bea0e96a86 fix(security): Cycle 5 — auth middleware, injection hardening, skill sandbox
Fix A — platform/internal/middleware/wsauth_middleware.go (NEW):
  WorkspaceAuth() gin middleware enforces per-workspace bearer-token auth on
  ALL /workspaces/:id/* sub-routes. Same lazy-bootstrap contract as
  secrets.Values: workspaces with no live token are grandfathered through.
  Blocks C2, C3, C4, C5, C7, C8, C9, C12, C13 simultaneously.

Fix A — platform/internal/router/router.go:
  Reorganised route registration: bare CRUD (/workspaces, /workspaces/:id)
  and /a2a remain on root router; all other /workspaces/:id/* sub-routes
  moved into wsAuth = r.Group("/workspaces/:id", middleware.WorkspaceAuth(db.DB)).
  CORS AllowHeaders updated to include Authorization so browser/agent callers
  can send the bearer token cross-origin.

Fix B — workspace-template/heartbeat.py:
  _check_delegations(): validate source_id == self.workspace_id before
  accepting a delegation result. Attacker-crafted records with a foreign
  source_id are silently skipped with a WARNING log (injection attempt).
  trigger_msg no longer embeds raw response_preview text; references
  delegation_id + status only — removes the prompt-injection vector.
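
In outline, the validation reads roughly like this (function and field names are illustrative, not the exact heartbeat.py code):

```python
import logging
from typing import Optional

logger = logging.getLogger("heartbeat")

def accept_delegation(record: dict, workspace_id: str) -> Optional[str]:
    # Reject records whose source_id is not our own workspace: attacker-crafted
    # rows are skipped with a WARNING rather than acted on.
    if record.get("source_id") != workspace_id:
        logger.warning("ignoring delegation with foreign source_id=%r",
                       record.get("source_id"))
        return None
    # The trigger message references identifiers only; raw response text is
    # never embedded, which closes the prompt-injection vector.
    return (f"Delegation {record['delegation_id']} finished "
            f"with status {record['status']}")
```
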

Fix C — workspace-template/skill_loader/loader.py:
  load_skill_tools(): before exec_module(), verify script is within
  scripts_dir (path traversal guard) and temporarily scrub sensitive env
  vars (CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_API_KEY, OPENAI_API_KEY,
  WORKSPACE_AUTH_TOKEN, GITHUB_TOKEN, GH_TOKEN) from os.environ; restore
  in finally block. Defence-in-depth even if /plugins auth gate is bypassed.
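
The two guards can be sketched as follows (a minimal outline; the helper names and context-manager shape are assumptions, not the loader's actual structure):

```python
import os
from contextlib import contextmanager
from pathlib import Path

SENSITIVE_ENV = (
    "CLAUDE_CODE_OAUTH_TOKEN", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
    "WORKSPACE_AUTH_TOKEN", "GITHUB_TOKEN", "GH_TOKEN",
)

def ensure_within(scripts_dir: Path, script: Path) -> Path:
    """Path-traversal guard: the resolved script must live under scripts_dir."""
    resolved = script.resolve()
    if not str(resolved).startswith(str(scripts_dir.resolve()) + os.sep):
        raise ValueError(f"script escapes scripts_dir: {script}")
    return resolved

@contextmanager
def scrubbed_env(keys=SENSITIVE_ENV):
    """Temporarily drop sensitive vars from os.environ; restore in finally."""
    saved = {k: os.environ.pop(k) for k in keys if k in os.environ}
    try:
        yield
    finally:
        os.environ.update(saved)
```

Wrapping exec_module() in scrubbed_env() means a malicious skill script cannot simply read the tokens out of its own environment, even if the upstream auth gate fails.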

Fix D — platform/internal/handlers/socket.go:
  HandleConnect(): agent connections (X-Workspace-ID present) validated via
  wsauth.HasAnyLiveToken + wsauth.ValidateToken before WebSocket upgrade.
  Canvas clients (no X-Workspace-ID) remain unauthenticated.

Fix D — workspace-template/events.py:
  PlatformEventSubscriber._connect(): include platform_auth bearer token in
  WebSocket upgrade headers alongside X-Workspace-ID.

Fix E — workspace-template/executor_helpers.py:
  recall_memories() and commit_memory() now pass platform_auth bearer token
  in Authorization header so WorkspaceAuth middleware allows access.
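
Fixes D (events.py) and E share one pattern: send the workspace bearer token with every platform call. Roughly (helper and header names are illustrative):

```python
def platform_headers(workspace_id: str, platform_auth: str) -> dict:
    """Headers for WebSocket upgrades and memory HTTP calls alike: the bearer
    token lets the WorkspaceAuth middleware admit the request."""
    return {
        "X-Workspace-ID": workspace_id,
        "Authorization": f"Bearer {platform_auth}",
    }
```
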

Fix F — workspace-template/a2a_client.py:
  send_a2a_message(): timeout=None → httpx.Timeout(connect=30, read=300,
  write=30, pool=30). Resolves H2 flagged across 5 consecutive audits.

Tests: 149/149 Python tests pass (test_heartbeat + test_events updated to
assert new source_id validation behaviour and allow Authorization header).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 04:44:42 +00:00
Dev Lead Agent
791def3fdf feat: implement Hermes adapter create_executor() with OpenRouter fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 16:47:29 -07:00
Hongming Wang
24fec62d7f initial commit — Molecule AI platform
Forked clean from public hackathon repo (Starfire-AgentTeam, BSL 1.1)
with full rebrand to Molecule AI under github.com/Molecule-AI/molecule-monorepo.

Brand: Starfire → Molecule AI.
Slug: starfire / agent-molecule → molecule.
Env vars: STARFIRE_* → MOLECULE_*.
Go module: github.com/agent-molecule/platform → github.com/Molecule-AI/molecule-monorepo/platform.
Python packages: starfire_plugin → molecule_plugin, starfire_agent → molecule_agent.
DB: agentmolecule → molecule.

History truncated; see public repo for prior commits and contributor
attribution. Verified green: go test -race ./... (platform), pytest
(workspace-template 1129 + sdk 132), vitest (canvas 352), build (mcp).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:55:37 -07:00