The @requires_approval decorator and request_approval() call executed the
approval gate correctly but never wrote the outcome to the activity log.
EU AI Act Article 14 requires documented evidence that HITL measures were
exercised — the missing log_event() calls meant GET /workspaces/:id/activity
could not surface HITL gate outcomes.
Add log_event() at both resolution points in the requires_approval wrapper:
- Denial: event_type="hitl", action="approve", outcome="denied", actor=decided_by
- Grant: event_type="hitl", action="approve", outcome="granted", actor=decided_by
Both calls follow the existing try/except pattern used for audit calls elsewhere
in hitl.py, so a missing audit module never blocks the approval flow.
Tests: TestRequiresApproval.test_logs_hitl_denied_event and
test_logs_hitl_approved_event verify log_event is called with the correct
outcome on each resolution path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Windows Docker Desktop copies host files with CRLF even when
.gitattributes says eol=lf. The entrypoint now strips \r from all
hook .sh/.py files before dropping to agent user. Permanent fix for
the #507 CRLF regression that reappeared after every restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HermesA2AExecutor now supports sending system context as ordered, separate
role=system messages instead of a single concatenated string — the model
format recommended by NousResearch.
Changes:
- HermesA2AExecutor.__init__: new system_blocks kwarg (list[str|None]|None)
stored as an independent copy; None blocks and empty strings silently skipped
- _build_messages(): when system_blocks is not None, emits each non-empty
block as a separate {"role": "system"} entry in Hermes-recommended order
(persona → tools context → reasoning policy); falls through to legacy
system_prompt path when system_blocks is None (backward compatible)
Backward compatibility: existing callers that pass a single system_prompt
string continue to work identically — no changes required.
Tests (12 new, 47 total):
- system_blocks stored as independent copy (mutation safe)
- three-block stacked ordering preserved
- empty / None blocks silently skipped
- all-empty list → zero system messages
- system_blocks overrides system_prompt when both provided
- legacy system_prompt path unchanged
- stacked blocks appear in the live API call kwargs
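The stacked-message rule can be sketched as follows (a standalone approximation; the real method is `HermesA2AExecutor._build_messages` and its exact signature may differ):

```python
def build_messages(system_blocks, system_prompt, user_text):
    """Emit one role=system message per non-empty block, in order."""
    messages = []
    if system_blocks is not None:
        # Stacked path: skip None and "" silently, preserve order
        # (persona -> tools context -> reasoning policy).
        for block in system_blocks:
            if block:
                messages.append({"role": "system", "content": block})
    elif system_prompt:
        # Legacy path: single concatenated system string, unchanged.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages
```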
Closes #499
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add _redact_secrets() in builtin_tools/security.py and apply it at every
commit_memory call site before content reaches the memories table.
Patterns scrubbed (replaced with [REDACTED]):
- sk-[A-Za-z0-9_-]{20,} OpenAI/Anthropic keys (sk-, sk-ant-, sk-proj-)
- ghp_[A-Za-z0-9]{36} GitHub classic PAT
- ghs_[A-Za-z0-9]{36} GitHub server-to-server token
- github_pat_[A-Za-z0-9_]{82} GitHub fine-grained PAT
- AKIA[0-9A-Z]{16} AWS access key ID
- key/token/secret/password/api_key=<40+ chars> Generic contextual (value replaced,
keyword preserved: "api_key=[REDACTED]" not "[REDACTED]")
Call sites wired:
- builtin_tools/memory.py::commit_memory() — LangChain tool (LangGraph path)
- a2a_tools.py::tool_commit_memory() — MCP server path
- executor_helpers.py::commit_memory() — CLI/SDK executor path
Implementation guarantees:
- Pure function (no side effects, no I/O)
- Idempotent: [REDACTED] does not match any pattern
- No false positives on normal prose (all patterns require ≥20-char prefix
or ≥40-char value after known keyword)
Tests (36 passing):
- Per-pattern unit tests for all 6 secret types
- Idempotency tests
- Normal prose non-regression tests
- Integration: a2a_tools.tool_commit_memory scrubs ghp_ tokens before HTTP POST
- Integration: executor_helpers.commit_memory scrubs AWS keys and OpenAI keys
- Source inspection: memory.py imports and applies _redact_secrets before
build_awareness_client() (i.e. before any storage operation)
conftest.py updated to load the real builtin_tools/security.py so that
executor_helpers and a2a_tools can import _redact_secrets during test collection.
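A sketch of the scrub, assuming `_redact_secrets` roughly follows the pattern table above (regex details here are reconstructed from the description, not copied from security.py):

```python
import re

# Prefix-anchored token patterns; each replaced wholesale.
_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),          # OpenAI/Anthropic keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),            # GitHub classic PAT
    re.compile(r"ghs_[A-Za-z0-9]{36}"),            # GitHub server-to-server
    re.compile(r"github_pat_[A-Za-z0-9_]{82}"),    # GitHub fine-grained PAT
    re.compile(r"AKIA[0-9A-Z]{16}"),               # AWS access key ID
]
# Generic contextual rule: keep the keyword, replace only the long value.
_CONTEXTUAL = re.compile(
    r"\b(key|token|secret|password|api_key)(\s*=\s*)\S{40,}", re.IGNORECASE)

def redact_secrets(text: str) -> str:
    """Pure and idempotent: [REDACTED] matches none of the patterns."""
    for pat in _PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return _CONTEXTUAL.sub(r"\1\2[REDACTED]", text)
```

Applying this at every commit_memory call site means content is scrubbed before any storage or HTTP operation.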
Closes #834
Sub-issue of #725
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace denylist approach with strict allowlist: only PATH, HOME, LANG,
PYTHONPATH, WORKSPACE_ID, WORKSPACE_NAME, PLATFORM_URL (and a small set
of locale/Python runtime vars) pass through to agent-executed code. Every
other env var — including ANTHROPIC_API_KEY, GH_TOKEN, DATABASE_URL,
REDIS_URL, *_SECRET, *_PASSWORD — is stripped from os.environ for the
duration of SafeLocalPythonExecutor.__call__ and restored on exit.
- make_safe_env() is a pure read (never mutates os.environ)
- _ENV_PATCH_LOCK serialises concurrent calls for thread safety
- os.environ fully restored even on exception (try/finally)
- 38 unit tests covering all secret categories, thread safety, import
restrictions, and env-restore guarantees
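The patch/restore discipline can be sketched like this (a simplified stand-in for the `SafeLocalPythonExecutor.__call__` wiring; the allowlist below is the subset named above, not the full locale/runtime set):

```python
import os
import threading
from contextlib import contextmanager

_ENV_PATCH_LOCK = threading.Lock()
_ALLOWLIST = {"PATH", "HOME", "LANG", "PYTHONPATH",
              "WORKSPACE_ID", "WORKSPACE_NAME", "PLATFORM_URL"}

def make_safe_env() -> dict:
    """Pure read: computes the allowed subset, never mutates os.environ."""
    return {k: v for k, v in os.environ.items() if k in _ALLOWLIST}

@contextmanager
def scrubbed_environ():
    """Strip everything outside the allowlist for the duration of the
    block; fully restore os.environ even on exception."""
    with _ENV_PATCH_LOCK:  # serialise concurrent callers
        saved = dict(os.environ)
        try:
            os.environ.clear()
            os.environ.update(make_safe_env())
            yield
        finally:
            os.environ.clear()
            os.environ.update(saved)
```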
Closes #826
Sub-issue of #804
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #790. Depends on feat/issue-583-1-checkpoint-persistence (PR #788).
Platform (Go) — checkpoints_integration_test.go (5 new tests):
1. ThreeStepPersistence: POST task_receive/llm_call/task_complete → GET returns
all 3 in step_index DESC order with correct names and payloads.
2. CrashResume_HighestStepIsResumptionPoint: POST steps 0+1 only (crash before
step 2) → GET shows step_index=1 as the resume point; task_complete absent.
3. UpsertIdempotency_LatestPayloadWins: POST same (wf_id, step_name) twice with
different payloads → List returns only the second payload (ON CONFLICT DO UPDATE).
4. PostCascadeDelete_Returns404: simulate the post-ON-DELETE-CASCADE state (empty
rows) → List returns 404 as expected after workspace deletion.
5. AuthGate_NoToken_Returns401: router-level test with WorkspaceAuth middleware;
POST/GET/DELETE all return 401 without a bearer token (no DB calls made).
workspace-template — _save_checkpoint + 4 Python tests:
- Add async _save_checkpoint() to temporal_workflow.py: POST to the platform
checkpoint endpoint after each activity stage; fully non-fatal (try/except
inside the function, plus defence-in-depth try/except at every call site).
- 4 new pytest cases (test_temporal_workflow.py):
- nonfatal_on_http_error: _save_checkpoint raises HTTPStatusError (500) →
task_receive_activity still returns {"status":"received"}.
- nonfatal_on_network_error: _save_checkpoint raises ConnectError →
llm_call_activity still returns success LLMResult.
- success_path: _save_checkpoint no-op → activity returns correctly;
checkpoint called with correct args.
- standalone_http_error_is_swallowed: real _save_checkpoint function swallows
HTTP 500 from a mocked httpx.AsyncClient; returns None.
All 36 temporal workflow Python tests pass.
Go tests: Go binary not in this container; test file verified for syntax and
against the sqlmock patterns used throughout the handlers package.
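The non-fatal shape of `_save_checkpoint` can be sketched as below (the endpoint path and JSON field names are assumptions; the client is injected so the sketch stays test-friendly, mirroring the mocked-httpx tests described above):

```python
async def save_checkpoint(client, base_url, wf_id, step_index,
                          step_name, payload):
    """POST one checkpoint row; swallow every failure and return None.

    `client` is an httpx.AsyncClient-like object. Checkpointing is
    advisory: an HTTP 500 or a network error must never fail the
    activity that called it.
    """
    try:
        resp = await client.post(
            f"{base_url}/checkpoints",  # hypothetical endpoint path
            json={"workflow_id": wf_id, "step_index": step_index,
                  "step_name": step_name, "payload": payload},
        )
        resp.raise_for_status()
    except Exception:
        # Defence in depth: call sites also wrap this in try/except.
        pass
    return None
```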
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #763 (feat/issue-733-agents-md-impl) branched before PR #743 landed the
claude-opus-4-7 model default upgrade. config.py still had the old
claude-sonnet-4-6 default, which would have silently regressed the upgrade.
Restore both occurrences:
- WorkspaceConfig.model default: claude-sonnet-4-6 → claude-opus-4-7
- load_config() fallback: claude-sonnet-4-6 → claude-opus-4-7
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Turns the QA TDD spec from PR #755 GREEN: all 14 tests pass.
Changes:
- workspace-template/agents_md.py (new): generate_agents_md(config_dir, output_path)
Writes AAIF-compliant AGENTS.md with name, role, description, A2A endpoint,
and MCP tools sections. AGENT_URL env var overrides the derived localhost URL.
Falls back to description when role is absent (graceful legacy compat).
Always overwrites — no stale-file guard.
- workspace-template/config.py: add role field to WorkspaceConfig
New top-level field `role: str = ""` with load_config support.
Falls back to description in agents_md.py for backward compat.
- workspace-template/main.py: wire generate_agents_md into startup (step 1a)
Fires after load_config + preflight. Non-fatal: exception is caught and
printed as a warning so a bad /workspace mount never kills the agent.
- workspace-template/tests/test_agents_md.py (new): pulled from PR #755 branch
Test results:
pytest tests/test_agents_md.py -v → 14 passed (was: 14 RED / import error)
pytest (full suite) → 1044 passed, 2 xfailed
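Two of the behaviours above can be sketched in isolation (function names and the derived-URL form are assumptions for illustration; the real code lives in agents_md.py and config.py):

```python
import os

def agent_url(port: int = 8000) -> str:
    # AGENT_URL env var overrides the derived localhost URL
    # (assumed here to be http://localhost:<port>).
    return os.environ.get("AGENT_URL") or f"http://localhost:{port}"

def resolve_role(config: dict) -> str:
    # Graceful legacy compat: fall back to description when role is absent.
    return config.get("role") or config.get("description", "")
```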
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Baidu MeDo hackathon integration was sitting in builtin_tools/ as dead
code — not imported by any loader but shipped with every workspace image,
misleadingly suggesting it was a core builtin.
Changes:
- Move builtin_tools/medo.py → plugins/molecule-medo/skills/medo-tools/scripts/medo.py
(git detects this as a rename — no code changes, identical tool surface)
- Add plugins/molecule-medo/plugin.yaml (manifest: name, version, runtimes, tags)
- Add plugins/molecule-medo/skills/medo-tools/SKILL.md (frontmatter + setup docs)
- Move workspace-template/tests/test_medo.py → plugins/molecule-medo/tests/test_medo.py
(update _MEDO_PATH to resolve from plugin root; add conftest.py for langchain mock)
- Update .gitignore: change /plugins/ blanket ignore to /plugins/* so this plugin
can be tracked until it gets its own standalone repo
Acceptance criteria met:
- builtin_tools/medo.py removed from core
- plugins/molecule-medo/ created with identical tool surface (9/9 tests pass)
- cd workspace-template && pytest → 1021 passed, 2 xfailed (no regression)
- MEDO_API_KEY was never in default provisioning (.env.example / config.py clean)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the anthropic:claude-sonnet-4-6 default across config, handlers,
env example, and litellm proxy config. All tests updated to match the new
default; sonnet-4-6 alias kept in litellm_config.yml for pinned workspaces.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The optional $1 argument flowed directly into Docker image tag names
(workspace-template:<runtime>) and filesystem paths (RUNTIME_DIR) with
no validation, enabling path traversal or unexpected tag injection via
e.g. `bash rebuild-runtime-images.sh '../evil'`.
Fix: introduce VALID_RUNTIMES allowlist and validate $1 against it
before setting RUNTIMES. Any unlisted value now exits with a clear
error message. The RUNTIMES array is populated from VALID_RUNTIMES
when no argument is given, keeping the all-runtimes default path.
shellcheck clean; $1 only appears inside the validated block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bug 1: TMPDIR is a POSIX-reserved variable used by mktemp, Docker
BuildKit, and git subprocesses as their system temp directory.
Overwriting it redirected those tools to the build context, causing
unpredictable failures. Renamed all 6 occurrences to RUNTIME_DIR.
Bug 2: `docker build ... | grep` made grep's exit code (0=match,
1=no match) determine if the build succeeded, not docker's. Fixed by
reading PIPESTATUS[0] immediately after the pipeline so docker's real
exit code drives the SUCCESS/FAILED tracking.
Also fixed two pre-existing shellcheck warnings:
- SC2034: removed unused REPO_ROOT variable
- SC2064: trap now uses single quotes so TMPBASE expands at signal time
shellcheck clean with no warnings.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cover the four paths that were exercised only via mock in the
_build_options tests: valid YAML, missing file, malformed YAML,
and empty file (safe_load → None → {} via `or {}`).
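The four paths can be covered by a loader of roughly this shape (a sketch; the real helper is `_load_config_dict` on ClaudeSDKExecutor and may differ in error handling):

```python
import yaml  # PyYAML

def load_config_dict(path: str) -> dict:
    """Valid YAML -> dict; missing file, malformed YAML, or empty file
    (safe_load returns None, coerced via `or {}`) -> empty dict."""
    try:
        with open(path) as f:
            return yaml.safe_load(f) or {}
    except (FileNotFoundError, yaml.YAMLError):
        return {}
```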
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds _load_config_dict() helper to ClaudeSDKExecutor and wires the new
effort and task_budget config fields into _build_options() before the
Anthropic API call:
- effort (str): low|medium|high|xhigh|max — populates output_config.effort
- task_budget (int): advisory total-token budget; must be >= 20000 when set;
automatically adds task-budgets-2026-03-13 beta header
Also adds WorkspaceConfig.effort and WorkspaceConfig.task_budget fields in
config.py and 5 acceptance tests covering all code paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace == HMAC comparisons with hmac.compare_digest (Python) and
hmac.Equal (Go) in ledger.py, verify.py, and audit.go to prevent
timing oracle attacks (Fixes 1-6)
- Increase PBKDF2 iterations from 100K to 210K in both ledger.py and
audit.go — must match for cross-language verification (Fix 7)
- Return chain_valid: null when offset > 0 (paginated views cannot
verify a truncated chain; null means "not computed") (Fix 8)
- Remove module-level AUDIT_LEDGER_SALT attribute from ledger.py; read
the secret exclusively from os.environ inside _get_hmac_key() so the
salt is not exposed in the module namespace (Fix 9)
- Update tests: use monkeypatch.setenv/delenv instead of setattr on the
removed AUDIT_LEDGER_SALT attribute; update testAuditKey helper to
use 210K iterations; add TestAuditQuery_PaginatedOffsetReturnsNullChainValid
- Fix migration 028: workspace_id column type TEXT → UUID to match
workspaces.id UUID primary key
All tests pass: 1043 pytest + 0 Go test failures.
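The Python side of Fixes 1-7 looks roughly like this (a sketch, not the ledger.py code; hash algorithm and key length are assumptions):

```python
import hashlib
import hmac

def derive_key(secret: bytes, salt: bytes) -> bytes:
    # 210_000 iterations: must match the Go side (pbkdf2 in audit.go)
    # or cross-language chain verification breaks.
    return hashlib.pbkdf2_hmac("sha256", secret, salt, 210_000)

def verify_entry_hmac(key: bytes, message: bytes, claimed_mac: str) -> bool:
    """Constant-time comparison; a plain == on MAC strings leaks a
    timing oracle (the Go side uses hmac.Equal for the same reason)."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, claimed_mac)
```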
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standalone adapter images (langgraph, claude-code, etc.) use
ENTRYPOINT ["molecule-runtime"] which bypasses entrypoint.sh. PR #640's
entrypoint.sh fix therefore never runs in adapter images. The correct fix
is to bake git config --system into the image at build time.
This script:
1. Rebuilds workspace-template:base from the monorepo Dockerfile (which
has the fixed entrypoint.sh and molecule-git-token-helper.sh)
2. For each of the 6 runtime adapters: clones the standalone repo, patches
its Dockerfile to COPY the credential helper and run git config --system,
then builds the final image tagged as workspace-template:<runtime>
Usage (run on the host machine, not inside a workspace container):
bash workspace-template/rebuild-runtime-images.sh # all 6
bash workspace-template/rebuild-runtime-images.sh claude-code # one
See issue #658 for the architectural explanation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both PRs restructured the same chat.completions.create() call to use a
create_kwargs dict. Resolved by keeping both __init__ params and both
conditionals in the create_kwargs block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs prevented the git credential helper (merged in #567) from ever
running at workspace boot:
1. Dockerfile never COPY'd scripts/molecule-git-token-helper.sh into the
image — only gh-wrapper.sh was copied from scripts/. Result: the helper
binary did not exist in any built container image.
2. entrypoint.sh looked for the helper at /workspace-template/scripts/...
but /workspace-template/ is not a path that exists inside the container
(WORKDIR is /app, no /workspace-template mount). The `if [ -f ... ]`
guard silently fell through to the WARNING branch on every boot since
#567 merged — the helper was never registered.
Fix:
- Add `COPY scripts/molecule-git-token-helper.sh ./scripts/` to Dockerfile
so the script lands at /app/scripts/ in the image (matching WORKDIR /app)
- Update HELPER_SCRIPT path in entrypoint.sh from
/workspace-template/scripts/... to /app/scripts/...
After this fix, every workspace container registers the helper at boot via:
git config --global credential.https://github.com.helper \
"!/app/scripts/molecule-git-token-helper.sh"
Closes #613.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Unconditional list.extend() on repeated plugin install caused every
hook handler to be appended on each reinstall, leading to 3-4x duplicate
firings per event (PreToolUse, PostToolUse, Stop, etc.).
Fix: before appending each incoming handler, compute a fingerprint of
(matcher, frozenset-of-commands). Skip append if the fingerprint is
already present in the merged list. First-time installs are unaffected —
new handlers still land correctly.
Adds 7 unit tests covering: first install, double install, triple install,
different-matcher co-existence, different-command co-existence, existing
user hook preservation, and top-level key merge semantics.
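The fingerprint merge can be sketched as below (the handler dict shape, with a `matcher` key and a `hooks` list of `command` entries, is an assumption based on this description):

```python
def merge_hooks(existing: list, incoming: list) -> list:
    """Append each incoming handler only if its (matcher,
    frozenset-of-commands) fingerprint is absent, so repeated plugin
    installs become no-ops instead of duplicating handlers."""
    def fingerprint(handler):
        return (handler.get("matcher"),
                frozenset(h.get("command") for h in handler.get("hooks", [])))

    merged = list(existing)
    seen = {fingerprint(h) for h in merged}
    for handler in incoming:
        fp = fingerprint(handler)
        if fp not in seen:
            merged.append(handler)
            seen.add(fp)
    return merged
```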
Closes #566
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds response_format support to HermesA2AExecutor so callers can request
structured JSON output via the OpenAI-native response_format parameter.
Changes:
- _validate_response_format(): validates type (json_schema/json_object/text)
and required sub-fields; returns None if valid, error message if invalid
- HermesA2AExecutor.__init__: new response_format kwarg, stored as _response_format
- execute(): validates before API call — invalid schema enqueues error and
returns early without hitting Hermes API; valid and non-None adds
response_format= to create_kwargs; None omits the field entirely
Tests (12 new):
- _validate_response_format: all valid types, invalid type, missing fields
- constructor stores response_format correctly
- valid response_format forwarded to API call
- response_format omitted when None (no key in call kwargs)
- invalid schema → error message enqueued, API not called
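A sketch of the validator contract (return None when valid, an error string when not; the exact sub-field checks in the real `_validate_response_format` may be stricter):

```python
def validate_response_format(rf):
    """None means valid; a string is the error message to enqueue."""
    if rf is None:
        return None  # caller omits the field from the API call entirely
    allowed = {"json_schema", "json_object", "text"}
    rf_type = rf.get("type") if isinstance(rf, dict) else None
    if rf_type not in allowed:
        return f"response_format.type must be one of {sorted(allowed)}"
    if rf_type == "json_schema" and "json_schema" not in rf:
        return "type json_schema requires a json_schema sub-field"
    return None
```

On a non-None error string, execute() enqueues it and returns early, so an invalid schema never reaches the Hermes API.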
Closes #498
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of injecting tool definitions as text into the system prompt,
HermesA2AExecutor now accepts a tools: list[dict] | None constructor
parameter containing OpenAI-format tool definitions and forwards them
via the native tools= parameter on chat.completions.create().
Empty list / None rule: when tools is falsy, the tools key is omitted
from the API call entirely — never sent as tools=[] — so providers
that reject an empty tools array don't return a 400.
Tool-call response handling: when the model returns finish_reason
"tool_calls" with no text content, the executor serialises the call
list as a JSON string and enqueues it as the A2A reply. This keeps
the executor thin (single API call per turn, no ReAct loop) while
surfacing function-call intent in a structured, parseable format.
Changes:
- HermesA2AExecutor.__init__: new tools kwarg; stored as self._tools
(copy; mutating the input list has no effect)
- execute(): builds create_kwargs dict and conditionally adds tools=
only when self._tools is non-empty; handles tool_calls response
- Module docstring: new "Native tools (#497)" section with schema
reference and edge-case explanation
Tests (12 new, 47 total in hermes test file, 1002 total suite):
- tools stored correctly in constructor (copy, None, [], non-empty)
- non-empty tools forwarded as tools= in API call
- multiple tools all forwarded
- empty list ([] and None and default) → tools key absent from call
- model tool_call response → JSON-serialised list as A2A reply
- multiple tool_calls → all in JSON reply
- text content present → text wins over tool_calls
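The two rules above (omit falsy tools, serialise tool calls when there is no text) can be sketched as standalone helpers (names and the message attribute shapes are assumptions modelled on the OpenAI client response objects):

```python
import json

def build_create_kwargs(model, messages, tools):
    """Omit the tools key entirely when falsy — never send tools=[],
    which some providers reject with a 400."""
    kwargs = {"model": model, "messages": messages}
    if tools:
        kwargs["tools"] = tools
    return kwargs

def reply_from_choice(message) -> str:
    """Text content wins; otherwise serialise the tool-call list as a
    JSON string so function-call intent stays parseable in the A2A reply."""
    if getattr(message, "content", None):
        return message.content
    calls = getattr(message, "tool_calls", None) or []
    return json.dumps([
        {"name": c.function.name, "arguments": c.function.arguments}
        for c in calls
    ])
```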
Closes #497
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: the github-app-auth plugin injects GH_TOKEN + GITHUB_TOKEN
into each workspace container's env at provision time (EnvMutator). Those
are GitHub App installation tokens with a fixed ~60 min TTL. The plugin
has an in-process cache that proactively refreshes 5 min before expiry —
but the workspace env is set once at container start and never updated.
Any workspace alive >60 min ends up with an expired token.
Fix (Option B — on-demand endpoint):
pkg/provisionhook:
- Add TokenProvider interface: Token(ctx) (token, expiresAt, error)
Lives in pkg/ (public) so the github-app-auth plugin can implement it.
- Add Registry.FirstTokenProvider() — discovers the first mutator that
also satisfies TokenProvider via interface assertion. Safe under
concurrent reads (existing RWMutex).
platform/internal/handlers/github_token.go:
- New GitHubTokenHandler serving GET /admin/github-installation-token
- Delegates to the registered TokenProvider (plugin cache — always fresh)
- 404 if no GitHub App configured, 500 + [github] prefix log on error
- Never logs the token itself
platform/internal/handlers/workspace.go:
- Add TokenRegistry() getter so the router can wire the handler without
coupling to WorkspaceHandler internals
platform/internal/router/router.go:
- Register GET /admin/github-installation-token under AdminAuth
workspace-template/:
- scripts/molecule-git-token-helper.sh — git credential helper; calls
the platform endpoint on every push/fetch; falls through to next
helper (operator PAT) if platform unreachable
- entrypoint.sh — configure the credential helper at startup
Why Option B over Option A (background goroutine):
- The plugin already has its own cache refresh; nothing to refresh here.
- Pushing env updates into running containers requires docker exec, which
the architecture explicitly rejects (issue #547 "Alternatives").
- Pull-based is stateless, trivially testable, zero extra goroutines.
Closes #547
Co-authored-by: Molecule AI DevOps Engineer <devops-engineer@agents.moleculesai.app>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements WorkspaceAdapter for Google's Agent Development Kit (google-adk
v1.x, Apache-2.0). Ships four files under workspace-template/adapters/google-adk/:
- adapter.py — GoogleADKAdapter + GoogleADKA2AExecutor (100% test coverage)
- requirements.txt — pinned google-adk==1.30.0 + google-genai>=1.16.0
- README.md — overview, install, usage, config, architecture diagram
- test_adapter.py — 46 unit tests, all passing, no live API calls
Supports AI Studio (GOOGLE_API_KEY) and Vertex AI (GOOGLE_GENAI_USE_VERTEXAI=1).
Model prefix stripping: "google:gemini-2.0-flash" → "gemini-2.0-flash".
Error sanitization mirrors the hermes_executor convention.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hermes 4 is a hybrid-reasoning model trained on <think> tags; without asking
for thinking we pay flagship $/tok but get non-reasoning quality. This adds a
dedicated HermesA2AExecutor that dispatches to any OpenAI-compat endpoint
(OpenRouter, Nous Portal) and enables native reasoning for Hermes 4 models.
Key decisions:
- ProviderConfig + _reasoning_supported() detect Hermes 4 by model slug
substring ("hermes-4", "hermes4") — case-insensitive, no config needed
- extra_body={"reasoning": {"enabled": True}} sent only to Hermes 4 entries;
Hermes 3 path unchanged (no extra_body, no regressions)
- choices[0].message.reasoning + reasoning_details extracted and written to
an OTEL span (hermes.reasoning) — deliberately NOT echoed in the A2A reply
so the reasoning trace never contaminates the agent's next-turn context
- API key / base URL default to OPENAI_API_KEY / OPENAI_BASE_URL env vars
with openrouter.ai/api/v1 as the fallback endpoint
- _client injection parameter for unit tests (no live API calls needed)
- Error sanitization: only exception class name surfaces to user (mirrors
sanitize_agent_error() convention from cli_executor.py)
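The slug detection and conditional extra_body can be sketched as (standalone approximations of `_reasoning_supported` and the dispatch logic):

```python
def reasoning_supported(model: str) -> bool:
    """Detect Hermes 4 by model slug substring — case-insensitive,
    no configuration needed."""
    slug = (model or "").lower()
    return "hermes-4" in slug or "hermes4" in slug

def build_extra_body(model: str):
    # Only Hermes 4 gets the reasoning flag; the Hermes 3 path sends
    # no extra_body at all, so existing behaviour is unchanged.
    if reasoning_supported(model):
        return {"reasoning": {"enabled": True}}
    return None
```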
Test coverage: 35 tests, 100% coverage on all new code paths including:
- _reasoning_supported() — Hermes 4/3/unknown/empty/uppercase
- ProviderConfig — field assignment and capability flags
- extra_body presence for Hermes 4, absence for Hermes 3
- reasoning not in A2A reply; _log_reasoning called when trace present
- reasoning_details forwarded; span attributes set correctly
- Telemetry failure swallowed (never blocks response)
- API error → sanitized class-name-only reply
- cancel() → TaskStatusUpdateEvent(state=canceled)
Full suite: 990 passed, 0 failed (no regressions).
Resolves #496
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code review fixes:
- 🟡#1: Replace python3 with jq in Dockerfile template stages (~50MB → ~2MB)
- 🟡#2: Add clone count verification to scripts/clone-manifest.sh
(set -e + expected vs actual count check — fails build if any clone fails)
- 🟡#3: Drop 'unsafe-eval' from CSP (not needed for Next.js production
standalone builds, only dev mode). Updated test assertion.
- 🟡#4: Remove broken pyproject.toml from workspace-template/ (it claimed
to package as molecule-ai-workspace-runtime but the directory structure
didn't match — the real package ships from the standalone repo)
- 🔵#1: Add version-pinning TODO comment to manifest.json
- 🔵#3: Add full repo URLs + test counts for SDK/MCP/CLI/runtime in CLAUDE.md
Security (GitGuardian alert):
- Removed Telegram bot token (8633739353:AA...) from template-molecule-dev
pm/.env — replaced with ${TELEGRAM_BOT_TOKEN} placeholder
- Removed Claude OAuth token (sk-ant-oat01-...) from template-molecule-dev
root .env — replaced with ${CLAUDE_CODE_OAUTH_TOKEN} placeholder
- Both tokens need immediate rotation by the operator
Tests: Platform middleware tests updated + all pass.
PR #471 removed Dockerfiles/requirements from adapters/ but left the
Python source files. This commit finishes the extraction:
1. Moved shared_runtime.py → workspace-template/shared_runtime.py
(used by prompt.py, a2a_executor.py, coordinator.py — not adapter-specific)
2. Moved base.py → workspace-template/adapter_base.py
(BaseAdapter + AdapterConfig — the interface adapters implement)
3. Updated imports in prompt.py, a2a_executor.py, coordinator.py
4. Rewrote adapters/__init__.py as a thin shim that:
- Reads ADAPTER_MODULE env var (production: standalone repos set this)
- Re-exports BaseAdapter/AdapterConfig for backward compat
5. adapters/base.py + adapters/shared_runtime.py remain as re-export shims
6. Deleted all 8 adapter subdirectories (autogen, claude_code, crewai,
deepagents, gemini_cli, hermes, langgraph, openclaw)
7. Removed 11 test files that imported adapter-specific code
Tests: 955 passed, 0 failed (down from 1216 — the difference is
adapter-specific tests that moved to standalone repos).
test_first_party_plugins.py, test_plugins_builtins_drift.py, and
test_hermes_adapter.py all referenced files under plugins/ and
adapters/ which were extracted to standalone repos. These tests
belong in those repos now, not in the core workspace-template.
1216 passed, 0 failed after removal.
These files have moved to the standalone template repos:
https://github.com/Molecule-AI/molecule-ai-workspace-template-<runtime>
Each adapter repo now has its own Dockerfile (FROM python:3.11-slim + pip install
molecule-ai-workspace-runtime) and requirements.txt. The adapter Python source
files (.py) stay in the monorepo for local development and testing.
Adapters removed from workspace-template/adapters/*/: Dockerfile, requirements.txt
Adapters retained: adapter.py, __init__.py (+ hermes extras: escalation.py, executor.py, providers.py)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Every agent in the template currently uses the same GitHub PAT, so
`gh pr list` shows every PR as authored by the CEO's account with
no signal which agent opened each one. Commits already carry
per-agent authors (GIT_AUTHOR_NAME from #402). This wrapper extends
the identity split to the PR/issue metadata surface layer that
commit attribution can't reach.
## How it works
A tiny bash script installed at `/usr/local/bin/gh`, which sits
earlier in PATH than the real binary at `/usr/bin/gh`. For `gh pr
create` and `gh issue create`:
- Title gets prefixed with `[Role Name]` — e.g. `[Frontend Engineer]
fix: canvas grid index`
- Body gets `\n\n---\n_Opened by: Molecule AI <Role>_` appended
Role is read from `GIT_AUTHOR_NAME`, which the platform provisioner
sets to `Molecule AI <Role>` (shipped with #402). Accepts both
`--title X` and `--title=X` forms. Same for `--body`.
Anything that isn't `gh pr create` or `gh issue create` (e.g.
`gh pr list`, `gh issue view`, `gh run watch`) passes through
untouched. No behaviour change for read-side operations.
## Idempotent
- If the title already starts with `[...]` the wrapper does not
re-prefix. `gh pr edit` flows that resubmit the title won't layer
multiple tags.
- If the body already contains `Opened by: Molecule AI` the footer
is not re-appended.
## Fail-open
When `GIT_AUTHOR_NAME` is absent or doesn't start with `Molecule
AI `, the wrapper exec's the real gh with unchanged args. No call
is ever blocked by this script.
## Test coverage
`tests/test_gh_wrapper.sh` — 12 cases, no network, no Docker:
- Passthrough for non-create subcommands (pr list)
- pr create title prefix + body footer
- issue create with `--title=X` and `--body=X` equals-forms
- Idempotent title re-prefix
- Idempotent body footer (count = 1 after two applies)
- Missing GIT_AUTHOR_NAME → passthrough, title preserved
- Malformed GIT_AUTHOR_NAME (not "Molecule AI ...") → passthrough
All 12 pass. Test script is standalone bash + a temp fake gh binary
that echoes argv; safe to run in CI's Python Lint & Test job via
subprocess shell-out.
## Deployment note
This lands in the workspace image. Existing containers keep their
old /usr/bin/gh until the image is rebuilt and they're re-provisioned
(POST /workspaces/:id/restart {}). No migration required; the wrapper
just starts tagging PRs once the new image is rolled.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3 escalation ladder added `from .escalation import ...` to
executor.py. The phase-2 dispatch tests load executor.py via
`exec(compile(src, ...))` with the relative import rewritten — this
broke because (a) the rewrite didn't know about escalation and (b) the
exec namespace lacked `__name__`, which executor.py needs at import
time for `logging.getLogger(__name__)`.
Fix both in all 8 exec sites:
- Rewrite both `from .providers import` AND `from .escalation import`
- Pre-register escalation + providers in sys.modules under the fake
package name
- Seed the exec namespace with `__name__ = "hermes_executor_under_test"`
54/54 hermes tests pass (28 escalation truth-table + 6 ladder-integration
+ 20 existing phase-2 dispatch).
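The loader fix can be sketched as a standalone helper (a simplification of the 8 exec sites; the fake package name matches the one described above, everything else is illustrative):

```python
import sys
import types

def load_module_for_test(src: str,
                         fake_pkg: str = "hermes_executor_under_test"):
    """Exec executor-style source with relative imports rewritten,
    sibling modules pre-registered, and __name__ seeded so
    logging.getLogger(__name__) works at import time."""
    for sibling in ("providers", "escalation"):
        # Pre-register stubs under the fake package name.
        sys.modules.setdefault(f"{fake_pkg}.{sibling}",
                               types.ModuleType(f"{fake_pkg}.{sibling}"))
        src = src.replace(f"from .{sibling} import",
                          f"from {fake_pkg}.{sibling} import")
    namespace = {"__name__": fake_pkg}  # the missing seed that broke exec
    exec(compile(src, "<executor.py>", "exec"), namespace)
    return namespace
```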
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ships scoped Phase 3 of the Hermes multi-provider work. Every workspace
can now declare an ordered list of (provider, model) rungs; when the
pinned model hits rate-limit / 5xx / context-length / overload, the
executor advances to the next rung before raising.
## Why
3× Claude Max saturation is a routine occurrence now — the "first 429 on
a batch delegation" is the common path, not the exception. A workspace
pinned to Haiku that hits a context-length limit has no recovery today;
same for Sonnet hitting rate-limit mid-synthesis. Escalation promotes
to the next tier for that single call, preserves coordination, avoids
restart cascades.
## New module: adapters/hermes/escalation.py
- ``LadderRung(provider, model)`` — one config entry.
- ``parse_ladder(raw)`` — tolerant config parser; skips malformed rungs
with a warning rather than raising so boot stays resilient.
- ``should_escalate(exc) -> bool`` — truth table over 15+ error shapes:
- Typed classes (RateLimitError, OverloadedError, APITimeoutError,
APIConnectionError, InternalServerError)
- Context-length markers (each provider uses different phrasing)
- Gateway markers (502/503/504, overloaded, temporarily unavailable)
- Status-code substrings (429, 529, 5xx)
- Hard-rejects auth failures (401/403/invalid_api_key) even if the
outer exception class is RateLimitError — wrapping case matters.
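A condensed sketch of that truth table (marker lists abbreviated; the real `should_escalate` checks typed exception classes directly rather than by name):

```python
def should_escalate(exc: Exception) -> bool:
    """Escalate on rate-limit / overload / 5xx / context-length shapes;
    hard-reject auth failures even when the outer class is retryable."""
    text = f"{type(exc).__name__} {exc}".lower()
    # Auth failures never escalate — a wrapped 401 inside a
    # RateLimitError must still stop the ladder.
    if any(m in text for m in ("401", "403", "invalid_api_key")):
        return False
    retryable = ("ratelimiterror", "overloadederror", "apitimeouterror",
                 "apiconnectionerror", "internalservererror",
                 "context length", "context_length",
                 "429", "529", "502", "503", "504",
                 "overloaded", "temporarily unavailable")
    return any(m in text for m in retryable)
```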
## Executor wiring
``HermesA2AExecutor`` now accepts ``escalation_ladder`` in its
constructor + ``create_executor()`` factory. ``_do_inference()`` walks
the ladder:
1. First attempt = pinned provider:model (matches pre-ladder behaviour)
2. On escalatable error, try each rung in order
3. On non-escalatable error, raise immediately (auth, malformed payload)
4. On exhaustion, raise the last error
Rung switches temporarily rebind ``self.provider_cfg`` / ``self.model``
/ ``self.api_key`` / ``self.base_url`` in a try/finally, so any raised
error leaves the executor in its original state for the next call. Key
resolution for non-pinned rungs goes through ``resolve_provider`` which
reads the rung-provider's env vars fresh.
## Config shape
``config.yaml`` (rendered from ``org.yaml`` → workspace secrets):
runtime_config:
  escalation_ladder:
    - provider: gemini
      model: gemini-2.5-flash
    - provider: anthropic
      model: claude-sonnet-4-5-20250929
    - provider: anthropic
      model: claude-opus-4-1-20250805
Empty / absent = single-shot behaviour, full backwards-compat with
every existing workspace.
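A sketch of how the tolerant ``parse_ladder`` described above would consume this config — the skip-with-warning behaviour is from the commit text, the exact signature is an assumption:

```python
# Sketch: malformed rungs are skipped with a warning rather than raising,
# so a bad config entry never blocks boot. Exact signature is assumed.
import logging

def parse_ladder(raw):
    rungs = []
    for entry in raw or []:  # None / absent config => empty ladder
        if isinstance(entry, dict) and entry.get("provider") and entry.get("model"):
            rungs.append((entry["provider"], entry["model"]))
        else:
            logging.warning("skipping malformed ladder rung: %r", entry)
    return rungs
```

An empty result falls back to the single-shot path, which is what gives existing workspaces full backwards-compat for free.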
## Tests
34 passing, all isolated (no network):
- ``test_hermes_escalation.py`` (28): parser + truth-table across
rate-limit, overload, context-length, gateway, auth-reject, unrelated
exceptions, and case-insensitivity.
- ``test_hermes_ladder_integration.py`` (6): no-ladder single call,
  ladder-not-triggered on success, escalate-on-rate-limit-then-succeed,
  stop-on-non-escalatable, raise-last-error-when-exhausted,
  skip-unknown-provider-in-rung.
## Not in this PR
- Uncertainty-driven escalation (judge pass after successful reply).
- Per-workspace budget tracking (#305 covers this separately).
- Live streaming reuse across rungs (ladder retries the whole call).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(a2a): add missing Authorization header to delegation and message calls
Three A2A client functions were missing the Bearer token on their HTTP calls
after the Phase 30.1 workspace-auth enforcement rollout:
1. send_a2a_message (a2a_client.py): POST to target workspace's /message/send
used WorkspaceAuth middleware that fails-closed on missing auth header.
Fix: headers=auth_headers() — auth_headers() already imported.
2. tool_delegate_task_async (a2a_tools.py): POST to platform /delegate endpoint
requires the caller's workspace bearer token since Phase 30.1.
Fix: headers=_auth_headers_for_heartbeat()
3. tool_check_task_status (a2a_tools.py): GET /delegations endpoint, same issue.
Fix: headers=_auth_headers_for_heartbeat()
tool_list_peers already uses _auth_headers_for_heartbeat() correctly —
that's why list_peers works while delegation returns 401/[A2A_ERROR].
Root cause of the multi-session A2A outage. PR #386 (TTL fix) addressed
the workspace-restart cascade; this fixes the underlying 401 on each call.
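The fix pattern for all three calls can be sketched as below — the request function is injected so the pattern is testable without a network; ``_auth_headers_for_heartbeat`` is assumed to return a standard Bearer header dict, and the signatures are simplified assumptions:

```python
# Sketch of the fix pattern: every A2A HTTP call now attaches the
# workspace bearer token. Signatures simplified for illustration.
def _auth_headers_for_heartbeat(token):
    return {"Authorization": f"Bearer {token}"}

def tool_check_task_status(get, base_url, token):
    # Before the fix this call omitted headers= entirely, so the
    # fail-closed WorkspaceAuth middleware returned 401 on every call.
    return get(f"{base_url}/delegations",
               headers=_auth_headers_for_heartbeat(token))
```
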
Closes #391
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(a2a): add missing auth headers to /activity and /notify endpoints
Two more Phase 30.1 regressions in a2a_tools.py found during send_message_to_user
debugging (it was returning 401):
- tool_report_activity: POST /workspaces/:id/activity missing headers
- tool_send_message_to_user: POST /workspaces/:id/notify missing headers
Both now use headers=_auth_headers_for_heartbeat() matching the pattern used
by commit_memory, recall_memory, and the heartbeat POST in the same file.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: PM (Molecule AI) <pm@molecule-ai.internal>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Severity HIGH. The /transcript route in main.py used `if expected:`
around the bearer-token compare, so `get_token()` returning None (no
/configs/.auth_token on disk — bootstrap window, deleted file, OSError)
silently skipped the entire auth check. Any container on
molecule-monorepo-net could GET /transcript during the provisioning
window and walk away with the full session log (user messages, Claude
tool calls, assistant replies).
The platform's TranscriptHandler always has a valid token (it acquired
one at workspace registration), so tightening this gate has no
legitimate-caller impact. Only unauthenticated sniffers lose access,
which was never the intended contract of #287.
Fix:
1. Extracted the auth gate into `workspace-template/transcript_auth.py`
— a 20-line module with no heavy imports so the security-critical
code is unit-testable without standing up the full uvicorn/a2a/httpx
stack (the former inline guard could only be tested end-to-end,
which explains why the regression shipped in #287).
2. `transcript_authorized(expected, auth_header)` returns False when
`expected` is None or empty — the #328 fix — and otherwise does
strict equality against "Bearer <expected>".
3. main.py's inline handler calls the extracted function:
if not _transcript_authorized(get_token(), auth_header):
return 401
4. New tests/test_transcript_auth.py covers: None token, empty token,
valid bearer, wrong bearer, missing header, case-sensitive prefix,
whitespace fuzzing. All 7 pass.
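The extracted gate described in points 1–2 can be sketched as below — a hedged reconstruction from the commit text, not the verbatim contents of ``workspace-template/transcript_auth.py``:

```python
# Sketch of the fail-closed gate: a None/empty expected token (bootstrap
# window, deleted /configs/.auth_token, OSError) now denies instead of
# silently skipping the check as the old `if expected:` guard did.
def transcript_authorized(expected, auth_header):
    if not expected:
        return False  # fail closed — the #328 fix
    # Strict, case-sensitive equality against "Bearer <expected>".
    return auth_header == f"Bearer {expected}"
```
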
Closes #328
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #287
Any container on molecule-monorepo-net could previously read the full Claude session log without authentication. Guard uses get_token() from platform_auth — skipped only before workspace registration (dev-mode).
Adds auth_headers to recall_memory and commit_memory in a2a_tools.py. Fixes the #215-class auth regression for A2A memory tools. Test mocks updated to accept headers kwarg.